├── DESCRIPTION
├── NAMESPACE
├── R
│   ├── DEsingle.R
│   ├── DEtype.R
│   └── TestData.R
├── README.md
├── data
│   └── TestData.rda
├── inst
│   ├── CITATION
│   └── NEWS
├── man
│   ├── DEsingle.Rd
│   ├── DEtype.Rd
│   └── TestData.Rd
└── vignettes
    ├── DEsingle.Rmd
    └── DEsingle_LOGO.png


/DESCRIPTION:
--------------------------------------------------------------------------------
 1 | Package: DEsingle
 2 | Type: Package
 3 | Title: DEsingle for detecting three types of differential expression in single-cell RNA-seq data
 4 | Version: 1.19.1
 5 | Date: 2018-12-01
 6 | Author: Zhun Miao <miaoz13@tsinghua.org.cn>
 7 | Maintainer: Zhun Miao <miaoz13@tsinghua.org.cn>
 8 | Description: DEsingle is an R package for differential expression (DE) analysis of
 9 |     single-cell RNA-seq (scRNA-seq) data. It defines and detects 3 types of differentially
10 |     expressed genes between two groups of single cells, with regard to different expression
11 |     status (DEs), differential expression abundance (DEa), and general differential expression
12 |     (DEg). DEsingle employs a Zero-Inflated Negative Binomial (ZINB) model to estimate the proportion
13 |     of real and dropout zeros and to define and detect the 3 types of DE genes. Results showed
14 |     that DEsingle outperforms existing methods for scRNA-seq DE analysis, and can reveal
15 |     different types of DE genes that are enriched in different biological functions.
16 | License: GPL-2
17 | Encoding: UTF-8
18 | LazyData: true
19 | Depends: R (>= 3.4.0)
20 | Imports:
21 |     stats,
22 |     Matrix (>= 1.2-14),
23 |     MASS (>= 7.3-45),
24 |     VGAM (>= 1.0-2),
25 |     bbmle (>= 1.0.18),
26 |     gamlss (>= 4.4-0),
27 |     maxLik (>= 1.3-4),
28 |     pscl (>= 1.4.9),
29 |     BiocParallel (>= 1.12.0)
30 | Suggests:
31 |     knitr,
32 |     rmarkdown,
33 |     SingleCellExperiment
34 | VignetteBuilder: knitr
35 | URL: https://miaozhun.github.io/DEsingle/
36 | biocViews: DifferentialExpression, GeneExpression, SingleCell, ImmunoOncology, RNASeq, Transcriptomics, Sequencing, Preprocessing, Software
37 | RoxygenNote: 6.0.1
38 | 
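The Description says DEsingle models each group of cells with a Zero-Inflated Negative Binomial (ZINB) distribution. As a minimal base-R sketch, the ZINB probability mass function mixes a point mass at zero with a negative binomial. It uses the `size`/`prob`/`pstr0` arguments that R/DEsingle.R passes to VGAM::dzinegbin. We assume `pstr0` there has the same structural-zero meaning as here. The helper name `dzinb` and the parameter values are illustrative and are not part of the package.

```r
# Hypothetical helper (not part of DEsingle): ZINB probability mass function.
# A structural zero occurs with probability `pstr0`; otherwise the count is
# drawn from a negative binomial with the same `size`/`prob` arguments as
# stats::dnbinom().
dzinb <- function(x, size, prob, pstr0) {
  nb <- dnbinom(x, size = size, prob = prob)
  ifelse(x == 0, pstr0 + (1 - pstr0) * nb, (1 - pstr0) * nb)
}

# Illustrative parameter values; prob = size / (size + mu) as in DEsingle.R,
# so mu is the mean of the negative binomial component.
mu    <- 5
size  <- 2
prob  <- size / (size + mu)
theta <- 0.3

k <- 0:2000                      # wide enough that the omitted tail is negligible
p <- dzinb(k, size, prob, theta)

total_prob <- sum(p)             # should be 1
zinb_mean  <- sum(k * p)         # should be (1 - theta) * mu
p_zero     <- p[1]               # theta + (1 - theta) * dnbinom(0, size, prob)

print(c(total_prob = total_prob, zinb_mean = zinb_mean, p_zero = p_zero))
```

Setting `pstr0 = 0` reduces `dzinb` to `dnbinom`. This corresponds to DEsingle fitting a plain negative binomial (theta fixed at 0) when a group contains no zero counts.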


--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
 1 | # Generated by roxygen2: do not edit by hand
 2 | 
 3 | export(DEsingle)
 4 | export(DEtype)
 5 | import(stats)
 6 | importFrom(BiocParallel,bplapply)
 7 | importFrom(BiocParallel,bpparam)
 8 | importFrom(MASS,fitdistr)
 9 | importFrom(MASS,glm.nb)
10 | importFrom(Matrix,Matrix)
11 | importFrom(VGAM,dzinegbin)
12 | importFrom(bbmle,mle2)
13 | importFrom(gamlss,gamlssML)
14 | importFrom(maxLik,maxLik)
15 | importFrom(pscl,zeroinfl)
16 | importMethodsFrom(Matrix,colSums)
17 | 
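The normalization step in `DEsingle()` (R/DEsingle.R) works in four stages:

- It computes each gene's geometric mean over its non-zero counts.
- It takes each cell's size factor as the median of that cell's non-zero count-to-geometric-mean ratios.
- It divides the counts by these factors.
- It rounds the result up with `ceiling()`.

The sketch below restates that loop in vectorized base R on a made-up 3 x 3 matrix. The name `size_factors` is hypothetical, and the sketch stops before `ceiling()`.

```r
# Hypothetical vectorized restatement of the normalization loop in DEsingle().
# Assumes all-zero genes (rows) have already been removed, as DEsingle does.
size_factors <- function(counts) {
  # Geometric mean of each gene's non-zero counts
  geo_mean <- apply(counts, 1, function(x) exp(mean(log(x[x > 0]))))
  # Divide every row by its gene's geometric mean
  ratios <- counts / geo_mean
  # Size factor of each cell: median of its non-zero ratios
  apply(ratios, 2, function(r) median(r[r != 0]))
}

# Toy 3 genes x 3 cells matrix, made up for illustration
counts <- matrix(c( 2,  4,  8,
                    1,  0,  4,
                   10, 20, 40), nrow = 3, byrow = TRUE)

S <- size_factors(counts)
scaled <- sweep(counts, 2, S, "/")   # DEsingle then applies ceiling()
print(S)
print(scaled)
```

Gene 2 has a zero in cell 2, which is excluded from that gene's geometric mean and from cell 2's median. After scaling, each gene therefore has the same value in every cell where it is detected. Stopping before `ceiling()` also sidesteps a floating-point subtlety: a value such as 4 may be computed as 4.000000000000001, which `ceiling()` would round up to 5.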


--------------------------------------------------------------------------------
/R/DEsingle.R:
--------------------------------------------------------------------------------
  1 | #' DEsingle: Detecting differentially expressed genes from scRNA-seq data
  2 | #'
  3 | #' This function detects differentially expressed genes between two specified groups of cells in a raw read counts matrix of single-cell RNA-seq (scRNA-seq) data. It takes as input either a non-negative integer matrix of scRNA-seq raw read counts or a \code{SingleCellExperiment} object. Users should therefore map the reads (obtained from the sequencing libraries of the samples) to the corresponding genome and count the reads mapped to each gene according to the gene annotation to obtain the raw read counts matrix in advance.
  4 | #'
  5 | #' @param counts A non-negative integer matrix of scRNA-seq raw read counts or a \code{SingleCellExperiment} object which contains the read counts matrix. The rows of the matrix are genes and columns are samples/cells.
  6 | #' @param group A factor specifying the two groups to be compared, with one element per column of the counts matrix.
  7 | #' @param parallel If FALSE (default), no parallel computation is used; if TRUE, computation is parallelized with \code{BiocParallel}, using the back-end specified by \code{BPPARAM}.
  8 | #' @param BPPARAM An optional parameter object passed internally to \code{\link{bplapply}} when \code{parallel=TRUE}. If not specified, \code{\link{bpparam}()} (default) will be used.
  9 | #' @return
 10 | #' A data frame containing the differential expression (DE) analysis results, whose rows are genes and whose columns contain the following items:
 11 | #' \itemize{
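 12 | #'   \item theta_1, theta_2, mu_1, mu_2, size_1, size_2, prob_1, prob_2: Maximum likelihood estimates (MLE) of the zero-inflated negative binomial (ZINB) parameters of group 1 and group 2: theta is the proportion of structural zeros, mu the mean of the negative binomial component, and size and prob = size/(size + mu) its size and probability parameters.
 13 | #'   \item total_mean_1, total_mean_2: Mean of the raw read counts of group 1 and group 2.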
 14 | #'   \item foldChange: total_mean_1/total_mean_2.
 15 | #'   \item norm_total_mean_1, norm_total_mean_2: Mean of normalized read counts of group 1 and group 2.
 16 | #'   \item norm_foldChange: norm_total_mean_1/norm_total_mean_2.
 17 | #'   \item chi2LR1: Likelihood-ratio chi-square statistic for testing H0.
 18 | #'   \item pvalue_LR2: P value of hypothesis testing of H20 (Used to determine the type of a DE gene).
 19 | #'   \item pvalue_LR3: P value of hypothesis testing of H30 (Used to determine the type of a DE gene).
 20 | #'   \item FDR_LR2: Adjusted P value of pvalue_LR2 using Benjamini & Hochberg's method (Used to determine the type of a DE gene).
 21 | #'   \item FDR_LR3: Adjusted P value of pvalue_LR3 using Benjamini & Hochberg's method (Used to determine the type of a DE gene).
 22 | #'   \item pvalue: P value of hypothesis testing of H0 (Used to determine whether a gene is a DE gene).
 23 | #'   \item pvalue.adj.FDR: Adjusted P value of H0's pvalue using Benjamini & Hochberg's method (Used to determine whether a gene is a DE gene).
 24 | #'   \item Remark: Notes on abnormal events during fitting, e.g. a failed MLE of the ZINB or NB model.
 25 | #' }
 26 | #'
 27 | #' @author Zhun Miao.
 28 | #' @seealso
 29 | #' \code{\link{DEtype}}, for the classification of differentially expressed genes found by \code{\link{DEsingle}}.
 30 | #'
 31 | #' \code{\link{TestData}}, a test dataset for DEsingle.
 32 | #'
 33 | #' @examples
 34 | #' # Load test data for DEsingle
 35 | #' data(TestData)
 36 | #'
 37 | #' # Specifying the two groups to be compared
 38 | #' # The numbers of samples in group 1 and group 2 are 50 and 100, respectively
 39 | #' group <- factor(c(rep(1,50), rep(2,100)))
 40 | #'
 41 | #' # Detecting the differentially expressed genes
 42 | #' results <- DEsingle(counts = counts, group = group)
 43 | #'
 44 | #' # Dividing the differentially expressed genes into 3 categories
 45 | #' results.classified <- DEtype(results = results, threshold = 0.05)
 46 | #'
 47 | #' @import stats
 48 | #' @importFrom BiocParallel bpparam bplapply
 49 | #' @importFrom Matrix Matrix
 50 | #' @importFrom MASS glm.nb fitdistr
 51 | #' @importFrom VGAM dzinegbin
 52 | #' @importFrom bbmle mle2
 53 | #' @importFrom gamlss gamlssML
 54 | #' @importFrom maxLik maxLik
 55 | #' @importFrom pscl zeroinfl
 56 | #' @importMethodsFrom Matrix colSums
 57 | #' @export
 58 | 
 59 | 
 60 | 
 61 | DEsingle <- function(counts, group, parallel = FALSE, BPPARAM = bpparam()){
 62 | 
 63 |   # Handle SingleCellExperiment
 64 |   if(class(counts)[1] == "SingleCellExperiment"){
 65 |     if(!require(SingleCellExperiment))
 66 |       stop("To use a SingleCellExperiment object as input, please install the SingleCellExperiment package first")
 67 |     counts <- counts(counts)
 68 |   }
 69 | 
 70 |   # Invalid input control
 71 |   if(!is.matrix(counts) & !is.data.frame(counts) & class(counts)[1] != "dgCMatrix")
 72 |     stop("Wrong data type of 'counts'")
 73 |   if(any(is.na(counts)))
 74 |     stop("NA detected in 'counts'")
 75 |   if(any(counts < 0))
 76 |     stop("Negative value detected in 'counts'")
 77 |   if(all(counts == 0))
 78 |     stop("All elements of 'counts' are zero")
 79 |   if(any(colSums(counts) == 0))
 80 |     warning("Library size of zero detected in 'counts'")
 81 | 
 82 |   if(!is.factor(group))
 83 |     stop("'group' must be a factor")
 84 |   if(length(levels(group)) != 2)
 85 |     stop("'group' must have exactly two levels")
 86 |   if(any(table(group) < 2))
 87 |     stop("Too few samples (< 2) in a group")
 88 |   if(ncol(counts) != length(group))
 89 |     stop("Length of 'group' must equal the number of columns of 'counts'")
 90 | 
 91 |   if(!is.logical(parallel))
 92 |     stop("Data type of 'parallel' is not logical")
 93 |   if(length(parallel) != 1)
 94 |     stop("Length of 'parallel' is not one")
 95 | 
 96 |   # Preprocessing
 97 |   counts <- round(as.matrix(counts))
 98 |   storage.mode(counts) <- "integer"
 99 |   if(any(rowSums(counts) == 0))
100 |     message("Removing ", sum(rowSums(counts) == 0), " rows of genes with all zero counts")
101 |   counts <- counts[rowSums(counts) != 0, , drop = FALSE]
102 |   geneNum <- nrow(counts)
103 |   sampleNum <- ncol(counts)
104 |   gc()
105 | 
106 |   # Normalization
107 |   message("Normalizing the data")
108 |   GEOmean <- rep(NA,geneNum)
109 |   for (i in 1:geneNum)
110 |   {
111 |     gene_NZ <- counts[i,counts[i,] > 0]
112 |     GEOmean[i] <- exp(sum(log(gene_NZ), na.rm=TRUE) / length(gene_NZ))
113 |   }
114 |   S <- rep(NA, sampleNum)
115 |   counts_norm <- counts
116 |   for (j in 1:sampleNum)
117 |   {
118 |     sample_j <- counts[,j]/GEOmean
119 |     S[j] <- median(sample_j[which(sample_j != 0)])
120 |     counts_norm[,j] <- counts[,j]/S[j]
121 |   }
122 |   counts_norm <- ceiling(counts_norm)
123 |   remove(GEOmean, gene_NZ, S, sample_j, i, j)
124 |   gc()
125 | 
126 |   # Cache totalMean and foldChange for each gene
127 |   totalMean_1 <- rowMeans(counts[, group == levels(group)[1], drop = FALSE])
128 |   totalMean_2 <- rowMeans(counts[, group == levels(group)[2], drop = FALSE])
129 |   foldChange <- totalMean_1/totalMean_2
130 |   All_Mean_FC <- cbind(totalMean_1, totalMean_2, foldChange)
131 | 
132 |   # Memory management
133 |   remove(counts, totalMean_1, totalMean_2, foldChange)
134 |   counts_norm <- Matrix(counts_norm, sparse = TRUE)
135 |   gc()
136 | 
137 | 
138 |   # Function for testing the homogeneity of two ZINB populations (one gene per call)
139 |   CallDE <- function(i){
140 | 
141 |     # Memory management
142 |     if(i %% 100 == 0)
143 |       gc()
144 | 
145 |     # Function input and output
146 |     counts_1 <- counts_norm[i, group == levels(group)[1]]
147 |     counts_2 <- counts_norm[i, group == levels(group)[2]]
148 |     results_gene <- data.frame(row.names = row.names(counts_norm)[i], theta_1 = NA, theta_2 = NA, mu_1 = NA, mu_2 = NA, size_1 = NA, size_2 = NA, prob_1 = NA, prob_2 = NA, total_mean_1 = NA, total_mean_2 = NA, foldChange = NA, norm_total_mean_1 = NA, norm_total_mean_2 = NA, norm_foldChange = NA, chi2LR1 = NA, pvalue_LR2 = NA, pvalue_LR3 = NA, FDR_LR2 = NA, FDR_LR3 = NA, pvalue = NA, pvalue.adj.FDR = NA, Remark = NA)
149 | 
150 |     # Log likelihood functions
151 |     logL <- function(counts_1, theta_1, size_1, prob_1, counts_2, theta_2, size_2, prob_2){
152 |       logL_1 <- sum(dzinegbin(counts_1, size = size_1, prob = prob_1, pstr0 = theta_1, log = TRUE))
153 |       logL_2 <- sum(dzinegbin(counts_2, size = size_2, prob = prob_2, pstr0 = theta_2, log = TRUE))
154 |       logL <- logL_1 + logL_2
155 |       logL
156 |     }
157 |     logL2 <- function(param){
158 |       theta_resL2 <- param[1]
159 |       size_1_resL2 <- param[2]
160 |       prob_1_resL2 <- param[3]
161 |       size_2_resL2 <- param[4]
162 |       prob_2_resL2 <- param[5]
163 |       logL_1 <- sum(dzinegbin(counts_1, size = size_1_resL2, prob = prob_1_resL2, pstr0 = theta_resL2, log = TRUE))
164 |       logL_2 <- sum(dzinegbin(counts_2, size = size_2_resL2, prob = prob_2_resL2, pstr0 = theta_resL2, log = TRUE))
165 |       logL <- logL_1 + logL_2
166 |       logL
167 |     }
168 |     logL2NZ <- function(param){
169 |       theta_resL2 <- 0
170 |       size_1_resL2 <- param[1]
171 |       prob_1_resL2 <- param[2]
172 |       size_2_resL2 <- param[3]
173 |       prob_2_resL2 <- param[4]
174 |       logL_1 <- sum(dzinegbin(counts_1, size = size_1_resL2, prob = prob_1_resL2, pstr0 = theta_resL2, log = TRUE))
175 |       logL_2 <- sum(dzinegbin(counts_2, size = size_2_resL2, prob = prob_2_resL2, pstr0 = theta_resL2, log = TRUE))
176 |       logL <- logL_1 + logL_2
177 |       logL
178 |     }
179 |     logL3 <- function(param){
180 |       theta_1_resL3 <- param[1]
181 |       size_resL3 <- param[2]
182 |       prob_resL3 <- param[3]
183 |       theta_2_resL3 <- param[4]
184 |       logL_1 <- sum(dzinegbin(counts_1, size = size_resL3, prob = prob_resL3, pstr0 = theta_1_resL3, log = TRUE))
185 |       logL_2 <- sum(dzinegbin(counts_2, size = size_resL3, prob = prob_resL3, pstr0 = theta_2_resL3, log = TRUE))
186 |       logL <- logL_1 + logL_2
187 |       logL
188 |     }
189 |     logL3NZ1 <- function(param){
190 |       theta_1_resL3 <- 0
191 |       size_resL3 <- param[1]
192 |       prob_resL3 <- param[2]
193 |       theta_2_resL3 <- param[3]
194 |       logL_1 <- sum(dzinegbin(counts_1, size = size_resL3, prob = prob_resL3, pstr0 = theta_1_resL3, log = TRUE))
195 |       logL_2 <- sum(dzinegbin(counts_2, size = size_resL3, prob = prob_resL3, pstr0 = theta_2_resL3, log = TRUE))
196 |       logL <- logL_1 + logL_2
197 |       logL
198 |     }
199 |     logL3NZ2 <- function(param){
200 |       theta_1_resL3 <- param[1]
201 |       size_resL3 <- param[2]
202 |       prob_resL3 <- param[3]
203 |       theta_2_resL3 <- 0
204 |       logL_1 <- sum(dzinegbin(counts_1, size = size_resL3, prob = prob_resL3, pstr0 = theta_1_resL3, log = TRUE))
205 |       logL_2 <- sum(dzinegbin(counts_2, size = size_resL3, prob = prob_resL3, pstr0 = theta_2_resL3, log = TRUE))
206 |       logL <- logL_1 + logL_2
207 |       logL
208 |     }
209 |     logL3AZ1 <- function(param){
210 |       theta_1_resL3 <- 1
211 |       size_resL3 <- param[1]
212 |       prob_resL3 <- param[2]
213 |       theta_2_resL3 <- param[3]
214 |       logL_1 <- sum(dzinegbin(counts_1, size = size_resL3, prob = prob_resL3, pstr0 = theta_1_resL3, log = TRUE))
215 |       logL_2 <- sum(dzinegbin(counts_2, size = size_resL3, prob = prob_resL3, pstr0 = theta_2_resL3, log = TRUE))
216 |       logL <- logL_1 + logL_2
217 |       logL
218 |     }
219 |     logL3AZ2 <- function(param){
220 |       theta_1_resL3 <- param[1]
221 |       size_resL3 <- param[2]
222 |       prob_resL3 <- param[3]
223 |       theta_2_resL3 <- 1
224 |       logL_1 <- sum(dzinegbin(counts_1, size = size_resL3, prob = prob_resL3, pstr0 = theta_1_resL3, log = TRUE))
225 |       logL_2 <- sum(dzinegbin(counts_2, size = size_resL3, prob = prob_resL3, pstr0 = theta_2_resL3, log = TRUE))
226 |       logL <- logL_1 + logL_2
227 |       logL
228 |     }
229 |     logL3NZ1AZ2 <- function(param){
230 |       theta_1_resL3 <- 0
231 |       size_resL3 <- param[1]
232 |       prob_resL3 <- param[2]
233 |       theta_2_resL3 <- 1
234 |       logL_1 <- sum(dzinegbin(counts_1, size = size_resL3, prob = prob_resL3, pstr0 = theta_1_resL3, log = TRUE))
235 |       logL_2 <- sum(dzinegbin(counts_2, size = size_resL3, prob = prob_resL3, pstr0 = theta_2_resL3, log = TRUE))
236 |       logL <- logL_1 + logL_2
237 |       logL
238 |     }
239 |     logL3NZ2AZ1 <- function(param){
240 |       theta_1_resL3 <- 1
241 |       size_resL3 <- param[1]
242 |       prob_resL3 <- param[2]
243 |       theta_2_resL3 <- 0
244 |       logL_1 <- sum(dzinegbin(counts_1, size = size_resL3, prob = prob_resL3, pstr0 = theta_1_resL3, log = TRUE))
245 |       logL_2 <- sum(dzinegbin(counts_2, size = size_resL3, prob = prob_resL3, pstr0 = theta_2_resL3, log = TRUE))
246 |       logL <- logL_1 + logL_2
247 |       logL
248 |     }
249 |     judgeParam <- function(param){
250 |       if((param >= 0) & (param <= 1))
251 |         res <- TRUE
252 |       else
253 |         res <- FALSE
254 |       res
255 |     }
256 | 
257 |     # MLE of parameters of ZINB counts_1
258 |     if(sum(counts_1 == 0) > 0){
259 |       if(sum(counts_1 == 0) == length(counts_1)){
260 |         theta_1 <- 1
261 |         mu_1 <- 0
262 |         size_1 <- 1
263 |         prob_1 <- size_1/(size_1 + mu_1)
264 |       }else{
265 |         options(show.error.messages = FALSE)
266 |         zinb_try <- try(gamlssML(counts_1, family="ZINBI"), silent=TRUE)
267 |         options(show.error.messages = TRUE)
268 |         if('try-error' %in% class(zinb_try)){
269 |           zinb_try_twice <- try(zeroinfl(formula = counts_1 ~ 1 | 1, dist = "negbin"), silent=TRUE)
270 |           if('try-error' %in% class(zinb_try_twice)){
271 |             print("MLE of ZINB failed!");
272 |             results_gene[1,"Remark"] <- "ZINB failed!"
273 |             return(results_gene)
274 |           }else{
275 |             zinb_1 <- zinb_try_twice
276 |             theta_1 <- plogis(zinb_1$coefficients$zero);names(theta_1) <- NULL
277 |             mu_1 <- exp(zinb_1$coefficients$count);names(mu_1) <- NULL
278 |             size_1 <- zinb_1$theta;names(size_1) <- NULL
279 |             prob_1 <- size_1/(size_1 + mu_1);names(prob_1) <- NULL
280 |           }
281 |         }else{
282 |           zinb_1 <- zinb_try
283 |           theta_1 <- zinb_1$nu;names(theta_1) <- NULL
284 |           mu_1 <- zinb_1$mu;names(mu_1) <- NULL
285 |           size_1 <- 1/zinb_1$sigma;names(size_1) <- NULL
286 |           prob_1 <- size_1/(size_1 + mu_1);names(prob_1) <- NULL
287 |         }
288 |       }
289 |     }else{
290 |       op <- options(warn=2)
291 |       nb_try <- try(glm.nb(formula = counts_1 ~ 1), silent=TRUE)
292 |       options(op)
293 |       if('try-error' %in% class(nb_try)){
294 |         nb_try_twice <- try(fitdistr(counts_1, "Negative Binomial"), silent=TRUE)
295 |         if('try-error' %in% class(nb_try_twice)){
296 |           nb_try_again <- try(mle2(counts_1~dnbinom(mu=exp(logmu),size=1/invk), data=data.frame(counts_1), start=list(logmu=0,invk=1), method="L-BFGS-B", lower=c(logmu=-Inf,invk=1e-8)), silent=TRUE)
297 |           if('try-error' %in% class(nb_try_again)){
298 |             nb_try_fourth <- try(glm.nb(formula = counts_1 ~ 1), silent=TRUE)
299 |             if('try-error' %in% class(nb_try_fourth)){
300 |               print("MLE of NB failed!");
301 |               results_gene[1,"Remark"] <- "NB failed!"
302 |               return(results_gene)
303 |             }else{
304 |               nb_1 <- nb_try_fourth
305 |               theta_1 <- 0
306 |               mu_1 <- exp(nb_1$coefficients);names(mu_1) <- NULL
307 |               size_1 <- nb_1$theta;names(size_1) <- NULL
308 |               prob_1 <- size_1/(size_1 + mu_1);names(prob_1) <- NULL
309 |             }
310 |           }else{
311 |             nb_1 <- nb_try_again
312 |             theta_1 <- 0
313 |             mu_1 <- exp(nb_1@coef["logmu"]);names(mu_1) <- NULL
314 |             size_1 <- 1/nb_1@coef["invk"];names(size_1) <- NULL
315 |             prob_1 <- size_1/(size_1 + mu_1);names(prob_1) <- NULL
316 |           }
317 |         }else{
318 |           nb_1 <- nb_try_twice
319 |           theta_1 <- 0
320 |           mu_1 <- nb_1$estimate["mu"];names(mu_1) <- NULL
321 |           size_1 <- nb_1$estimate["size"];names(size_1) <- NULL
322 |           prob_1 <- size_1/(size_1 + mu_1);names(prob_1) <- NULL
323 |         }
324 |       }else{
325 |         nb_1 <- nb_try
326 |         theta_1 <- 0
327 |         mu_1 <- exp(nb_1$coefficients);names(mu_1) <- NULL
328 |         size_1 <- nb_1$theta;names(size_1) <- NULL
329 |         prob_1 <- size_1/(size_1 + mu_1);names(prob_1) <- NULL
330 |       }
331 |     }
332 | 
333 |     # MLE of parameters of ZINB counts_2
334 |     if(sum(counts_2 == 0) > 0){
335 |       if(sum(counts_2 == 0) == length(counts_2)){
336 |         theta_2 <- 1
337 |         mu_2 <- 0
338 |         size_2 <- 1
339 |         prob_2 <- size_2/(size_2 + mu_2)
340 |       }else{
341 |         options(show.error.messages = FALSE)
342 |         zinb_try <- try(gamlssML(counts_2, family="ZINBI"), silent=TRUE)
343 |         options(show.error.messages = TRUE)
344 |         if('try-error' %in% class(zinb_try)){
345 |           zinb_try_twice <- try(zeroinfl(formula = counts_2 ~ 1 | 1, dist = "negbin"), silent=TRUE)
346 |           if('try-error' %in% class(zinb_try_twice)){
347 |             print("MLE of ZINB failed!");
348 |             results_gene[1,"Remark"] <- "ZINB failed!"
349 |             return(results_gene)
350 |           }else{
351 |             zinb_2 <- zinb_try_twice
352 |             theta_2 <- plogis(zinb_2$coefficients$zero);names(theta_2) <- NULL
353 |             mu_2 <- exp(zinb_2$coefficients$count);names(mu_2) <- NULL
354 |             size_2 <- zinb_2$theta;names(size_2) <- NULL
355 |             prob_2 <- size_2/(size_2 + mu_2);names(prob_2) <- NULL
356 |           }
357 |         }else{
358 |           zinb_2 <- zinb_try
359 |           theta_2 <- zinb_2$nu;names(theta_2) <- NULL
360 |           mu_2 <- zinb_2$mu;names(mu_2) <- NULL
361 |           size_2 <- 1/zinb_2$sigma;names(size_2) <- NULL
362 |           prob_2 <- size_2/(size_2 + mu_2);names(prob_2) <- NULL
363 |         }
364 |       }
365 |     }else{
366 |       op <- options(warn=2)
367 |       nb_try <- try(glm.nb(formula = counts_2 ~ 1), silent=TRUE)
368 |       options(op)
369 |       if('try-error' %in% class(nb_try)){
370 |         nb_try_twice <- try(fitdistr(counts_2, "Negative Binomial"), silent=TRUE)
371 |         if('try-error' %in% class(nb_try_twice)){
372 |           nb_try_again <- try(mle2(counts_2~dnbinom(mu=exp(logmu),size=1/invk), data=data.frame(counts_2), start=list(logmu=0,invk=1), method="L-BFGS-B", lower=c(logmu=-Inf,invk=1e-8)), silent=TRUE)
373 |           if('try-error' %in% class(nb_try_again)){
374 |             nb_try_fourth <- try(glm.nb(formula = counts_2 ~ 1), silent=TRUE)
375 |             if('try-error' %in% class(nb_try_fourth)){
376 |               print("MLE of NB failed!");
377 |               results_gene[1,"Remark"] <- "NB failed!"
378 |               return(results_gene)
379 |             }else{
380 |               nb_2 <- nb_try_fourth
381 |               theta_2 <- 0
382 |               mu_2 <- exp(nb_2$coefficients);names(mu_2) <- NULL
383 |               size_2 <- nb_2$theta;names(size_2) <- NULL
384 |               prob_2 <- size_2/(size_2 + mu_2);names(prob_2) <- NULL
385 |             }
386 |           }else{
387 |             nb_2 <- nb_try_again
388 |             theta_2 <- 0
389 |             mu_2 <- exp(nb_2@coef["logmu"]);names(mu_2) <- NULL
390 |             size_2 <- 1/nb_2@coef["invk"];names(size_2) <- NULL
391 |             prob_2 <- size_2/(size_2 + mu_2);names(prob_2) <- NULL
392 |           }
393 |         }else{
394 |           nb_2 <- nb_try_twice
395 |           theta_2 <- 0
396 |           mu_2 <- nb_2$estimate["mu"];names(mu_2) <- NULL
397 |           size_2 <- nb_2$estimate["size"];names(size_2) <- NULL
398 |           prob_2 <- size_2/(size_2 + mu_2);names(prob_2) <- NULL
399 |         }
400 |       }else{
401 |         nb_2 <- nb_try
402 |         theta_2 <- 0
403 |         mu_2 <- exp(nb_2$coefficients);names(mu_2) <- NULL
404 |         size_2 <- nb_2$theta;names(size_2) <- NULL
405 |         prob_2 <- size_2/(size_2 + mu_2);names(prob_2) <- NULL
406 |       }
407 |     }
408 | 
409 |     # Restricted MLE under H0 (MLE of c(counts_1, counts_2))
410 |     if(sum(c(counts_1, counts_2) == 0) > 0){
411 |       options(show.error.messages = FALSE)
412 |       zinb_try <- try(gamlssML(c(counts_1, counts_2), family="ZINBI"), silent=TRUE)
413 |       options(show.error.messages = TRUE)
414 |       if('try-error' %in% class(zinb_try)){
415 |         zinb_try_twice <- try(zeroinfl(formula = c(counts_1, counts_2) ~ 1 | 1, dist = "negbin"), silent=TRUE)
416 |         if('try-error' %in% class(zinb_try_twice)){
417 |           print("MLE of ZINB failed!");
418 |           results_gene[1,"Remark"] <- "ZINB failed!"
419 |           return(results_gene)
420 |         }else{
421 |           zinb_res <- zinb_try_twice
422 |           theta_res <- plogis(zinb_res$coefficients$zero);names(theta_res) <- NULL
423 |           mu_res <- exp(zinb_res$coefficients$count);names(mu_res) <- NULL
424 |           size_res <- zinb_res$theta;names(size_res) <- NULL
425 |           prob_res <- size_res/(size_res + mu_res);names(prob_res) <- NULL
426 |         }
427 |       }else{
428 |         zinb_res <- zinb_try
429 |         theta_res <- zinb_res$nu;names(theta_res) <- NULL
430 |         mu_res <- zinb_res$mu;names(mu_res) <- NULL
431 |         size_res <- 1/zinb_res$sigma;names(size_res) <- NULL
432 |         prob_res <- size_res/(size_res + mu_res);names(prob_res) <- NULL
433 |       }
434 |     }else{
435 |       op <- options(warn=2)
436 |       nb_try <- try(glm.nb(formula = c(counts_1, counts_2) ~ 1), silent=TRUE)
437 |       options(op)
438 |       if('try-error' %in% class(nb_try)){
439 |         nb_try_twice <- try(fitdistr(c(counts_1, counts_2), "Negative Binomial"), silent=TRUE)
440 |         if('try-error' %in% class(nb_try_twice)){
441 |           nb_try_again <- try(mle2(c(counts_1, counts_2)~dnbinom(mu=exp(logmu),size=1/invk), data=data.frame(c(counts_1, counts_2)), start=list(logmu=0,invk=1), method="L-BFGS-B", lower=c(logmu=-Inf,invk=1e-8)), silent=TRUE)
442 |           if('try-error' %in% class(nb_try_again)){
443 |             nb_try_fourth <- try(glm.nb(formula = c(counts_1, counts_2) ~ 1), silent=TRUE)
444 |             if('try-error' %in% class(nb_try_fourth)){
445 |               print("MLE of NB failed!");
446 |               results_gene[1,"Remark"] <- "NB failed!"
447 |               return(results_gene)
448 |             }else{
449 |               nb_res <- nb_try_fourth
450 |               theta_res <- 0
451 |               mu_res <- exp(nb_res$coefficients);names(mu_res) <- NULL
452 |               size_res <- nb_res$theta;names(size_res) <- NULL
453 |               prob_res <- size_res/(size_res + mu_res);names(prob_res) <- NULL
454 |             }
455 |           }else{
456 |             nb_res <- nb_try_again
457 |             theta_res <- 0
458 |             mu_res <- exp(nb_res@coef["logmu"]);names(mu_res) <- NULL
459 |             size_res <- 1/nb_res@coef["invk"];names(size_res) <- NULL
460 |             prob_res <- size_res/(size_res + mu_res);names(prob_res) <- NULL
461 |           }
462 |         }else{
463 |           nb_res <- nb_try_twice
464 |           theta_res <- 0
465 |           mu_res <- nb_res$estimate["mu"];names(mu_res) <- NULL
466 |           size_res <- nb_res$estimate["size"];names(size_res) <- NULL
467 |           prob_res <- size_res/(size_res + mu_res);names(prob_res) <- NULL
468 |         }
469 |       }else{
470 |         nb_res <- nb_try
471 |         theta_res <- 0
472 |         mu_res <- exp(nb_res$coefficients);names(mu_res) <- NULL
473 |         size_res <- nb_res$theta;names(size_res) <- NULL
474 |         prob_res <- size_res/(size_res + mu_res);names(prob_res) <- NULL
475 |       }
476 |     }
477 | 
478 |     # Likelihood-ratio test (LRT) of H0
479 |     chi2LR1 <- 2 *(logL(counts_1, theta_1, size_1, prob_1, counts_2, theta_2, size_2, prob_2) - logL(counts_1, theta_res, size_res, prob_res, counts_2, theta_res, size_res, prob_res))
480 |     pvalue <- pchisq(chi2LR1, df = 3, lower.tail = FALSE)
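    # Degrees of freedom: the unrestricted fit estimates (theta, size, prob)
    # separately for each group (2 x 3 = 6 parameters), whereas H0 fits one
    # shared set (theta_res, size_res, prob_res; 3 parameters), so chi2LR1 is
    # referred to a chi-square distribution with 6 - 3 = 3 degrees of freedom.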
481 | 
482 |     # Format output
483 |     results_gene[1,"theta_1"] <- theta_1
484 |     results_gene[1,"theta_2"] <- theta_2
485 |     results_gene[1,"mu_1"] <- mu_1
486 |     results_gene[1,"mu_2"] <- mu_2
487 |     results_gene[1,"size_1"] <- size_1
488 |     results_gene[1,"size_2"] <- size_2
489 |     results_gene[1,"prob_1"] <- prob_1
490 |     results_gene[1,"prob_2"] <- prob_2
491 |     results_gene[1,"norm_total_mean_1"] <- mean(counts_1)
492 |     results_gene[1,"norm_total_mean_2"] <- mean(counts_2)
493 |     results_gene[1,"norm_foldChange"] <- results_gene[1,"norm_total_mean_1"] / results_gene[1,"norm_total_mean_2"]
494 |     results_gene[1,"chi2LR1"] <- chi2LR1
495 |     results_gene[1,"pvalue"] <- pvalue
496 | 
497 |     # Restricted MLE of logL2 and logL3 under H20 and H30 when pvalue <= 0.05
498 |     if(pvalue <= 0.05){
499 |       if(sum(c(counts_1, counts_2) == 0) > 0){
500 |         op_warn <- options(warn=-1); on.exit(options(op_warn), add = TRUE)
501 |         # Restricted MLE of logL2
502 |         A <- matrix(rbind(c(1, 0, 0, 0, 0), c(-1, 0, 0, 0, 0), c(0, 0, 1, 0 ,0), c(0, 0, -1, 0 ,0), c(0, 0, 0, 0 ,1), c(0, 0, 0, 0 ,-1)), 6, 5)
503 |         B <- c(1e-10, 1+1e-10, 1e-10, 1+1e-10, 1e-10, 1+1e-10)
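        # maxLik() treats ineqA/ineqB as the linear constraints
        # A %*% param + B > 0 on param = (theta_resL2, size_1_resL2, prob_1_resL2,
        # size_2_resL2, prob_2_resL2). The rows of A above therefore keep
        # theta_resL2, prob_1_resL2 and prob_2_resL2 strictly between -1e-10 and
        # 1 + 1e-10, and leave both sizes unconstrained. For the first start value
        # below, A %*% c(0.5, 1, 0.5, 1, 0.5) + B equals 0.5 + 1e-10 in every row;
        # the 1e-10 slack presumably lets the fallback starts theta_resL2 = 0 and
        # theta_resL2 = 1 satisfy the strict inequality as well.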
504 |         mleL2 <- try(maxLik(logLik = logL2, start = c(theta_resL2 = 0.5, size_1_resL2 = 1, prob_1_resL2 = 0.5, size_2_resL2 = 1, prob_2_resL2 = 0.5), constraints=list(ineqA=A, ineqB=B)), silent=TRUE)
505 |         if('try-error' %in% class(mleL2)){
506 |           mleL2 <- try(maxLik(logLik = logL2, start = c(theta_resL2 = 0, size_1_resL2 = 1, prob_1_resL2 = 0.5, size_2_resL2 = 1, prob_2_resL2 = 0.5), constraints=list(ineqA=A, ineqB=B)), silent=TRUE)
507 |         }
508 |         if('try-error' %in% class(mleL2)){
509 |           mleL2 <- try(maxLik(logLik = logL2, start = c(theta_resL2 = 1, size_1_resL2 = 1, prob_1_resL2 = 0.5, size_2_resL2 = 1, prob_2_resL2 = 0.5), constraints=list(ineqA=A, ineqB=B)), silent=TRUE)
510 |         }
511 |         if('try-error' %in% class(mleL2)){
512 |           A <- matrix(rbind(c(0, 1, 0, 0), c(0, -1, 0, 0), c(0, 0, 0 ,1), c(0, 0, 0 ,-1)), 4, 4)
513 |           B <- c(1e-10, 1+1e-10, 1e-10, 1+1e-10)
514 |           mleL2 <- maxLik(logLik = logL2NZ, start = c(size_1_resL2 = 1, prob_1_resL2 = 0.5, size_2_resL2 = 1, prob_2_resL2 = 0.5), constraints=list(ineqA=A, ineqB=B))
515 |           theta_resL2 <- 0
516 |           size_1_resL2 <- mleL2$estimate["size_1_resL2"];names(size_1_resL2) <- NULL
517 |           prob_1_resL2 <- mleL2$estimate["prob_1_resL2"];names(prob_1_resL2) <- NULL
518 |           size_2_resL2 <- mleL2$estimate["size_2_resL2"];names(size_2_resL2) <- NULL
519 |           prob_2_resL2 <- mleL2$estimate["prob_2_resL2"];names(prob_2_resL2) <- NULL
520 |         }else{
521 |           theta_resL2 <- mleL2$estimate["theta_resL2"];names(theta_resL2) <- NULL
522 |           size_1_resL2 <- mleL2$estimate["size_1_resL2"];names(size_1_resL2) <- NULL
523 |           prob_1_resL2 <- mleL2$estimate["prob_1_resL2"];names(prob_1_resL2) <- NULL
524 |           size_2_resL2 <- mleL2$estimate["size_2_resL2"];names(size_2_resL2) <- NULL
525 |           prob_2_resL2 <- mleL2$estimate["prob_2_resL2"];names(prob_2_resL2) <- NULL
526 |         }
527 | 
528 |         # Restricted MLE of logL3
529 |         if((sum(counts_1 == 0) > 0) & (sum(counts_2 == 0) > 0)){
530 |           # logL3
531 |           if(sum(counts_1 == 0) == length(counts_1)){
532 |             A <- matrix(rbind(c(0, 1, 0), c(0, -1, 0), c(0, 0 ,1), c(0, 0 ,-1)), 4, 3)
533 |             B <- c(1e-10, 1+1e-10, 1e-10, 1+1e-10)
534 |             mleL3 <- maxLik(logLik = logL3AZ1, start = c(size_resL3 = 1, prob_resL3 = 0.5, theta_2_resL3 = 0.5), constraints=list(ineqA=A, ineqB=B))
535 |             theta_1_resL3 <- 1
536 |             size_resL3 <- mleL3$estimate["size_resL3"];names(size_resL3) <- NULL
537 |             prob_resL3 <- mleL3$estimate["prob_resL3"];names(prob_resL3) <- NULL
538 |             theta_2_resL3 <- mleL3$estimate["theta_2_resL3"];names(theta_2_resL3) <- NULL
539 |           }else if(sum(counts_2 == 0) == length(counts_2)){
540 |             A <- matrix(rbind(c(1, 0, 0), c(-1, 0, 0), c(0, 0 ,1), c(0, 0 ,-1)), 4, 3)
541 |             B <- c(1e-10, 1+1e-10, 1e-10, 1+1e-10)
542 |             mleL3 <- maxLik(logLik = logL3AZ2, start = c(theta_1_resL3 = 0.5, size_resL3 = 1, prob_resL3 = 0.5), constraints=list(ineqA=A, ineqB=B))
543 |             theta_1_resL3 <- mleL3$estimate["theta_1_resL3"];names(theta_1_resL3) <- NULL
544 |             size_resL3 <- mleL3$estimate["size_resL3"];names(size_resL3) <- NULL
545 |             prob_resL3 <- mleL3$estimate["prob_resL3"];names(prob_resL3) <- NULL
546 |             theta_2_resL3 <- 1
547 |           }else{
548 |             A <- matrix(rbind(c(1, 0, 0, 0), c(-1, 0, 0, 0), c(0, 0, 1, 0), c(0, 0, -1, 0), c(0, 0, 0 ,1), c(0, 0, 0 ,-1)), 6, 4)
549 |             B <- c(1e-10, 1+1e-10, 1e-10, 1+1e-10, 1e-10, 1+1e-10)
550 |             mleL3 <- maxLik(logLik = logL3, start = c(theta_1_resL3 = 0.5, size_resL3 = 1, prob_resL3 = 0.5, theta_2_resL3 = 0.5), constraints=list(ineqA=A, ineqB=B))
551 |             theta_1_resL3 <- mleL3$estimate["theta_1_resL3"];names(theta_1_resL3) <- NULL
552 |             size_resL3 <- mleL3$estimate["size_resL3"];names(size_resL3) <- NULL
553 |             prob_resL3 <- mleL3$estimate["prob_resL3"];names(prob_resL3) <- NULL
554 |             theta_2_resL3 <- mleL3$estimate["theta_2_resL3"];names(theta_2_resL3) <- NULL
555 |           }
556 |         }else if(sum(counts_1 == 0) == 0){
557 |           # logL3
558 |           if(sum(counts_2 == 0) == length(counts_2)){
559 |             A <- matrix(rbind(c(0, 1), c(0, -1)), 2, 2)
560 |             B <- c(1e-10, 1+1e-10)
561 |             mleL3 <- maxLik(logLik = logL3NZ1AZ2, start = c(size_resL3 = 1, prob_resL3 = 0.5), constraints=list(ineqA=A, ineqB=B))
562 |             theta_1_resL3 <- 0
563 |             size_resL3 <- mleL3$estimate["size_resL3"];names(size_resL3) <- NULL
564 |             prob_resL3 <- mleL3$estimate["prob_resL3"];names(prob_resL3) <- NULL
565 |             theta_2_resL3 <- 1
566 |           }else{
567 |             A <- matrix(rbind(c(0, 1, 0), c(0, -1, 0), c(0, 0 ,1), c(0, 0 ,-1)), 4, 3)
568 |             B <- c(1e-10, 1+1e-10, 1e-10, 1+1e-10)
569 |             mleL3 <- maxLik(logLik = logL3NZ1, start = c(size_resL3 = 1, prob_resL3 = 0.5, theta_2_resL3 = 0.5), constraints=list(ineqA=A, ineqB=B))
570 |             theta_1_resL3 <- 0
571 |             size_resL3 <- mleL3$estimate["size_resL3"];names(size_resL3) <- NULL
572 |             prob_resL3 <- mleL3$estimate["prob_resL3"];names(prob_resL3) <- NULL
573 |             theta_2_resL3 <- mleL3$estimate["theta_2_resL3"];names(theta_2_resL3) <- NULL
574 |           }
575 |         }else if(sum(counts_2 == 0) == 0){
576 |           # logL3
577 |           if(sum(counts_1 == 0) == length(counts_1)){
578 |             A <- matrix(rbind(c(0, 1), c(0, -1)), 2, 2)
579 |             B <- c(1e-10, 1+1e-10)
580 |             mleL3 <- maxLik(logLik = logL3NZ2AZ1, start = c(size_resL3 = 1, prob_resL3 = 0.5), constraints=list(ineqA=A, ineqB=B))
581 |             theta_1_resL3 <- 1
582 |             size_resL3 <- mleL3$estimate["size_resL3"];names(size_resL3) <- NULL
583 |             prob_resL3 <- mleL3$estimate["prob_resL3"];names(prob_resL3) <- NULL
584 |             theta_2_resL3 <- 0
585 |           }else{
586 |             A <- matrix(rbind(c(1, 0, 0), c(-1, 0, 0), c(0, 0 ,1), c(0, 0 ,-1)), 4, 3)
587 |             B <- c(1e-10, 1+1e-10, 1e-10, 1+1e-10)
588 |             mleL3 <- maxLik(logLik = logL3NZ2, start = c(theta_1_resL3 = 0.5, size_resL3 = 1, prob_resL3 = 0.5), constraints=list(ineqA=A, ineqB=B))
589 |             theta_1_resL3 <- mleL3$estimate["theta_1_resL3"];names(theta_1_resL3) <- NULL
590 |             size_resL3 <- mleL3$estimate["size_resL3"];names(size_resL3) <- NULL
591 |             prob_resL3 <- mleL3$estimate["prob_resL3"];names(prob_resL3) <- NULL
592 |             theta_2_resL3 <- 0
593 |           }
594 |         }
595 |         options(warn=0)
596 |       }else{
597 |         # Restricted MLE of logL2
598 |         theta_resL2 <- 0
599 |         size_1_resL2 <- size_1
600 |         prob_1_resL2 <- prob_1
601 |         size_2_resL2 <- size_2
602 |         prob_2_resL2 <- prob_2
603 | 
604 |         # Restricted MLE of logL3
605 |         theta_1_resL3 <- 0
606 |         size_resL3 <- size_res
607 |         prob_resL3 <- prob_res
608 |         theta_2_resL3 <- 0
609 |       }
610 | 
611 |       # Judge parameters
612 |       if(!(judgeParam(theta_resL2) & judgeParam(prob_1_resL2) & judgeParam(prob_2_resL2)))
613 |         results_gene[1,"Remark"] <- "logL2 failed!"
614 |       if(!(judgeParam(theta_1_resL3) & judgeParam(theta_2_resL3) & judgeParam(prob_resL3)))
615 |         results_gene[1,"Remark"] <- "logL3 failed!"
616 | 
617 |       # LRT test of H20 and H30
618 |       chi2LR2 <- 2 *(logL(counts_1, theta_1, size_1, prob_1, counts_2, theta_2, size_2, prob_2) - logL(counts_1, theta_resL2, size_1_resL2, prob_1_resL2, counts_2, theta_resL2, size_2_resL2, prob_2_resL2))
619 |       pvalue_LR2 <- 1 - pchisq(chi2LR2, df = 1)
620 |       chi2LR3 <- 2 *(logL(counts_1, theta_1, size_1, prob_1, counts_2, theta_2, size_2, prob_2) - logL(counts_1, theta_1_resL3, size_resL3, prob_resL3, counts_2, theta_2_resL3, size_resL3, prob_resL3))
621 |       pvalue_LR3 <- 1 - pchisq(chi2LR3, df = 2)
622 | 
623 |       # Format output
624 |       results_gene[1,"pvalue_LR2"] <- pvalue_LR2
625 |       results_gene[1,"pvalue_LR3"] <- pvalue_LR3
626 |     }
627 | 
628 |     # Return results_gene
629 |     return(results_gene)
630 |   }
631 | 
632 | 
633 |   # Call DEG gene by gene
634 |   if(!parallel){
635 |     results <- matrix(data=NA, nrow = geneNum, ncol = 22, dimnames = list(row.names(counts_norm), c("theta_1", "theta_2", "mu_1", "mu_2", "size_1", "size_2", "prob_1", "prob_2", "total_mean_1", "total_mean_2", "foldChange", "norm_total_mean_1", "norm_total_mean_2", "norm_foldChange", "chi2LR1", "pvalue_LR2", "pvalue_LR3", "FDR_LR2", "FDR_LR3", "pvalue", "pvalue.adj.FDR", "Remark")))
636 |     results <- as.data.frame(results)
637 |     for(i in 1:geneNum){
638 |       cat("\r",paste0("DEsingle is analyzing ", i," of ",geneNum," expressed genes"))
639 |       results[i,] <- CallDE(i)
640 |     }
641 |   }else{
642 |     message("DEsingle is analyzing ", geneNum, " expressed genes in parallel")
643 |     results <- do.call(rbind, bplapply(1:geneNum, CallDE, BPPARAM = BPPARAM))
644 |   }
645 | 
646 |   # Format output results
647 |   results[, c("total_mean_1", "total_mean_2", "foldChange")] <- All_Mean_FC
648 |   results[,"FDR_LR2"] <- p.adjust(results[,"pvalue_LR2"], method="fdr")
649 |   results[,"FDR_LR3"] <- p.adjust(results[,"pvalue_LR3"], method="fdr")
650 |   results[,"pvalue.adj.FDR"] <- p.adjust(results[,"pvalue"], method="fdr")
651 |   results <- results[order(results[,"chi2LR1"], decreasing = TRUE),]
652 | 
653 |   # Abnormity control
654 |   if(exists("lastFuncGrad") & exists("lastFuncParam"))
655 |     remove(lastFuncGrad, lastFuncParam, envir=.GlobalEnv)
656 |   if(sum(!is.na(results[,"Remark"])) != 0)
657 |     cat(paste0("\n\n ",sum(!is.na(results[,"Remark"])), " gene(s) failed.\n\n"))
658 | 
659 |   return(results)
660 | 
661 | 
662 | }
663 | 
664 | 
665 | 
666 | 
667 | 


--------------------------------------------------------------------------------
/R/DEtype.R:
--------------------------------------------------------------------------------
  1 | #' DEtype: Classifying differentially expressed genes from DEsingle
  2 | #'
  3 | #' This function is used to classify the differentially expressed genes of single-cell RNA-seq (scRNA-seq) data found by \code{DEsingle}. It takes the output data frame from \code{DEsingle} as input.
  4 | #'
  5 | #' @param results An output data frame from \code{DEsingle}, which contains the unclassified differential expression analysis results.
  6 | #' @param threshold A number in (0, 0.1] specifying the FDR threshold.
  7 | #' @return
  8 | #' A data frame containing the differential expression (DE) analysis results and DE gene types and states.
  9 | #' \itemize{
 10 | #'   \item theta_1, theta_2, mu_1, mu_2, size_1, size_2, prob_1, prob_2: MLE of the zero-inflated negative binomial distribution's parameters of group 1 and group 2.
 11 | #'   \item total_mean_1, total_mean_2: Mean of read counts of group 1 and group 2.
 12 | #'   \item foldChange: total_mean_1/total_mean_2.
 13 | #'   \item norm_total_mean_1, norm_total_mean_2: Mean of normalized read counts of group 1 and group 2.
 14 | #'   \item norm_foldChange: norm_total_mean_1/norm_total_mean_2.
 15 | #'   \item chi2LR1: Chi-square statistic for hypothesis testing of H0.
 16 | #'   \item pvalue_LR2: P value of hypothesis testing of H20 (Used to determine the type of a DE gene).
 17 | #'   \item pvalue_LR3: P value of hypothesis testing of H30 (Used to determine the type of a DE gene).
 18 | #'   \item FDR_LR2: Adjusted P value of pvalue_LR2 using Benjamini & Hochberg's method (Used to determine the type of a DE gene).
 19 | #'   \item FDR_LR3: Adjusted P value of pvalue_LR3 using Benjamini & Hochberg's method (Used to determine the type of a DE gene).
 20 | #'   \item pvalue: P value of hypothesis testing of H0 (Used to determine whether a gene is a DE gene).
 21 | #'   \item pvalue.adj.FDR: Adjusted P value of H0's pvalue using Benjamini & Hochberg's method (Used to determine whether a gene is a DE gene).
 22 | #'   \item Remark: Record of abnormal program information.
 23 | #'   \item Type: Types of DE genes. DEs represents different expression status; DEa represents differential expression abundance; DEg represents general differential expression.
 24 | #'   \item State: State of DE genes, up represents up-regulated; down represents down-regulated.
 25 | #' }
 26 | #'
 27 | #' @author Zhun Miao.
 28 | #' @seealso
 29 | #' \code{\link{DEsingle}}, for the detection of differentially expressed genes from scRNA-seq data.
 30 | #'
 31 | #' \code{\link{TestData}}, a test dataset for DEsingle.
 32 | #'
 33 | #' @examples
 34 | #' # Load test data for DEsingle
 35 | #' data(TestData)
 36 | #'
 37 | #' # Specifying the two groups to be compared
 38 | #' # The sample numbers in group 1 and group 2 are 50 and 100, respectively
 39 | #' group <- factor(c(rep(1,50), rep(2,100)))
 40 | #'
 41 | #' # Detecting the differentially expressed genes
 42 | #' results <- DEsingle(counts = counts, group = group)
 43 | #'
 44 | #' # Dividing the differentially expressed genes into 3 categories
 45 | #' results.classified <- DEtype(results = results, threshold = 0.05)
 46 | #'
 47 | #' @import stats
 48 | #' @importFrom BiocParallel bpparam bplapply
 49 | #' @importFrom Matrix Matrix
 50 | #' @importFrom MASS glm.nb fitdistr
 51 | #' @importFrom VGAM dzinegbin
 52 | #' @importFrom bbmle mle2
 53 | #' @importFrom gamlss gamlssML
 54 | #' @importFrom maxLik maxLik
 55 | #' @importFrom pscl zeroinfl
 56 | #' @importMethodsFrom Matrix colSums
 57 | #' @export
 58 | 
 59 | 
 60 | 
 61 | DEtype <- function(results, threshold){
 62 |   # Invalid input judge
 63 |   if(!is.data.frame(results))
 64 |     stop("Invalid input of wrong data type of results")
 65 |   if(ncol(results) != 22)
 66 |     stop("Invalid input of wrong column number of results")
 67 |   if(colnames(results)[21] != "pvalue.adj.FDR" || colnames(results)[16] != "pvalue_LR2" || colnames(results)[17] != "pvalue_LR3")
 68 |     stop("Invalid input of wrong column name of results")
 69 |   if(!is.numeric(threshold) || length(threshold) != 1)
 70 |     stop("Invalid input of wrong data type of threshold")
 71 |   if(threshold <= 0 || threshold > 0.1)
 72 |     stop("Invalid input of wrong range of threshold")
 73 | 
 74 |   # Classify the types of DE genes
 75 |   results <- cbind(results, NA, NA)
 76 |   colnames(results)[c(ncol(results)-1, ncol(results))] <- c("Type", "State")
 77 |   for(i in 1:nrow(results)){
 78 |     if(!is.na(results[i,"pvalue.adj.FDR"]) && results[i,"pvalue.adj.FDR"] < threshold)
 79 |     {
 80 |       if(results[i,"pvalue_LR2"] < threshold & results[i,"pvalue_LR3"] < threshold){
 81 |         results[i,"Type"] <- "DEg"
 82 |         if(results[i,"mu_1"] * (1 - results[i,"theta_1"]) >= results[i,"mu_2"] * (1 - results[i,"theta_2"]))
 83 |           results[i,"State"] <- "up"
 84 |         else
 85 |           results[i,"State"] <- "down"
 86 |       }
 87 |       else if(results[i,"pvalue_LR2"] < threshold){
 88 |         results[i,"Type"] <- "DEs"
 89 |         if(results[i,"theta_1"] <= results[i,"theta_2"])
 90 |           results[i,"State"] <- "up"
 91 |         else
 92 |           results[i,"State"] <- "down"
 93 |       }
 94 |       else if(results[i,"pvalue_LR3"] < threshold){
 95 |         results[i,"Type"] <- "DEa"
 96 |         if(results[i,"mu_1"] >= results[i,"mu_2"])
 97 |           results[i,"State"] <- "up"
 98 |         else
 99 |           results[i,"State"] <- "down"
100 |       }
101 |       else{
102 |         results[i,"Type"] <- "DEg"
103 |         if(results[i,"mu_1"] * (1 - results[i,"theta_1"]) >= results[i,"mu_2"] * (1 - results[i,"theta_2"]))
104 |           results[i,"State"] <- "up"
105 |         else
106 |           results[i,"State"] <- "down"
107 |       }
108 |     }
109 |     else
110 |       next;
111 |   }
112 |   results
113 | }
114 | 
115 | 
116 | 
117 | 
118 | 


--------------------------------------------------------------------------------
/R/TestData.R:
--------------------------------------------------------------------------------
 1 | #' TestData: A test dataset for DEsingle
 2 | #'
 3 | #' A toy dataset containing a single-cell RNA-seq (scRNA-seq) read counts matrix and its grouping information.
 4 | #'
 5 | #' \itemize{
 6 | #'   \item counts. A matrix of raw read counts of scRNA-seq data which has 200 genes (rows) and 150 cells (columns).
 7 | #'   \item group. A factor specifying the two groups to be compared in \code{counts}. It can also be generated by: \code{group <- factor(c(rep(1,50), rep(2,100)))}
 8 | #' }
 9 | #'
10 | #' @name TestData
11 | #' @aliases counts group
12 | #' @docType data
13 | #' @keywords data
14 | #' @usage data(TestData)
15 | #' @format
16 | #' \itemize{
17 | #'   \item counts. A non-negative integer matrix of scRNA-seq raw read counts, rows are genes and columns are cells.
18 | #'   \item group. A factor specifying the two groups to be compared, corresponding to the columns of \code{counts}.
19 | #' }
20 | #' @source Petropoulos S, et al. Cell, 2016, 165(4): 1012-1026.
21 | #' @seealso
22 | #' \code{\link{DEsingle}}, for the detection of differentially expressed genes from scRNA-seq data.
23 | #'
24 | #' \code{\link{DEtype}}, for the classification of differentially expressed genes found by \code{\link{DEsingle}}.
25 | #'
26 | #' @examples
27 | #' # Load test data for DEsingle
28 | #' data(TestData)
29 | #'
30 | #' # Specifying the two groups to be compared
31 | #' # The sample numbers in group 1 and group 2 are 50 and 100, respectively
32 | #' group <- factor(c(rep(1,50), rep(2,100)))
33 | #'
34 | #' # Detecting the differentially expressed genes
35 | #' results <- DEsingle(counts = counts, group = group)
36 | #'
37 | #' # Dividing the differentially expressed genes into 3 categories
38 | #' results.classified <- DEtype(results = results, threshold = 0.05)
39 | #'
40 | NULL
41 | 
42 | 
43 | 
44 | 
45 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # DEsingle
  2 | 
  3 | *Zhun Miao*
  4 | 
  5 | *2018-06-21*
  6 | 
  7 | [![build](https://bioconductor.org/shields/build/release/bioc/DEsingle.svg)](http://bioconductor.org/checkResults/release/bioc-LATEST/DEsingle/)
  8 | [![platform](https://bioconductor.org/shields/availability/3.7/DEsingle.svg)](https://miaozhun.github.io/DEsingle/#downloads)
  9 | [![downloads](https://bioconductor.org/shields/downloads/DEsingle.svg)](https://bioconductor.org/packages/release/bioc/src/contrib/DEsingle_1.0.5.tar.gz)
 10 | 
 11 | ![Logo](https://github.com/miaozhun/DEsingle/blob/master/vignettes/DEsingle_LOGO.png?raw=true)
 12 | 
 13 | 
 14 | ## Introduction
 15 | 
 16 | **`DEsingle`** is an R package for **differential expression (DE) analysis of single-cell RNA-seq (scRNA-seq) data**. It detects differentially expressed genes between two groups of cells in a scRNA-seq raw read counts matrix.
 17 | 
 18 | **`DEsingle`** employs the Zero-Inflated Negative Binomial model for differential expression analysis. By estimating the proportion of real and dropout zeros, it not only detects DE genes **at higher accuracy** but also **subdivides three types of differential expression with different regulatory and functional mechanisms**.
 19 | 
 20 | For more information, please refer to the [manuscript](https://doi.org/10.1093/bioinformatics/bty332) by *Zhun Miao, Ke Deng, Xiaowo Wang and Xuegong Zhang*.
 21 | 
 22 | 
 23 | ## Citation
 24 | 
 25 | If you use **`DEsingle`** in published research, please cite:
 26 | 
 27 | > Zhun Miao, Ke Deng, Xiaowo Wang, Xuegong Zhang (2018). DEsingle for detecting three types of differential expression in single-cell RNA-seq data. Bioinformatics, bty332. [10.1093/bioinformatics/bty332.](https://doi.org/10.1093/bioinformatics/bty332)
 28 | 
 29 | 
 30 | ## Installation
 31 | 
 32 | To install **`DEsingle`** from [**Bioconductor**](http://bioconductor.org/packages/DEsingle/):
 33 | 
 34 | ```{r Installation from Bioconductor, eval = FALSE}
 35 | if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
 36 | BiocManager::install("DEsingle")
 37 | ```
 38 | 
 39 | To install the *development version* from [**GitHub**](https://github.com/miaozhun/DEsingle/):
 40 | 
 41 | ```{r Installation from GitHub, eval = FALSE}
 42 | if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
 43 | devtools::install_github("miaozhun/DEsingle", build_vignettes = TRUE)
 44 | ```
 45 | 
 46 | To load the installed **`DEsingle`** in R:
 47 | 
 48 | ```{r Load DEsingle, eval = FALSE}
 49 | library(DEsingle)
 50 | ```
 51 | 
 52 | 
 53 | ## Input
 54 | 
 55 | **`DEsingle`** takes two inputs: `counts` and `group`.
 56 | 
 57 | The input `counts` is a scRNA-seq **raw read counts matrix** or a **`SingleCellExperiment`** object which contains the read counts matrix. The rows of the matrix are genes and columns are cells.
 58 | 
 59 | The other input, `group`, is a factor that specifies the two groups of cells to be compared; its elements correspond to the columns of `counts`.
 60 | 
 61 | 
 62 | ## Test data
 63 | 
 64 | Users can load the test data shipped with **`DEsingle`** as follows:
 65 | 
 66 | ```{r Load TestData}
 67 | library(DEsingle)
 68 | data(TestData)
 69 | ```
 70 | 
 71 | The toy data `counts` in `TestData` is a scRNA-seq read counts matrix which has 200 genes (rows) and 150 cells (columns).
 72 | 
 73 | ```{r counts}
 74 | dim(counts)
 75 | counts[1:6, 1:6]
 76 | ```
 77 | 
 78 | The object `group` in `TestData` is a factor with two levels whose length equals the number of columns of `counts`.
 79 | 
 80 | ```{r group}
 81 | length(group)
 82 | summary(group)
 83 | ```
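Before calling `DEsingle`, it can be worth checking that the two inputs agree with each other. The sketch below restates the requirements described above (two groups, one group label per column, non-negative integer counts) in base R; `check_inputs()` and the toy matrix are hypothetical and are not part of **`DEsingle`**'s own input validation.

```r
# Hypothetical helper (not part of DEsingle): restate the input requirements
# for `counts` and `group` as a named logical vector.
check_inputs <- function(counts, group) {
  counts <- as.matrix(counts)
  group  <- factor(group)
  c(
    two_groups     = nlevels(group) == 2,
    length_matches = length(group) == ncol(counts),
    non_negative   = all(counts >= 0),
    integer_counts = all(counts == round(counts))
  )
}

# Toy example: 2 genes x 3 cells, cells 1-2 in group 1 and cell 3 in group 2
toy_counts <- matrix(c(0, 3, 5, 0, 1, 2), nrow = 2)
toy_group  <- factor(c(1, 1, 2))
check_inputs(toy_counts, toy_group)
```

With the test data loaded, the same call can be made as `check_inputs(counts, group)`.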
 84 | 
 85 | 
 86 | ## Usage
 87 | 
 88 | ### With read counts matrix input
 89 | 
 90 | Here is an example to run **`DEsingle`** with read counts matrix input:
 91 | 
 92 | ```{r demo1, eval = FALSE}
 93 | # Load library and the test data for DEsingle
 94 | library(DEsingle)
 95 | data(TestData)
 96 | 
 97 | # Specifying the two groups to be compared
 98 | # The sample numbers in group 1 and group 2 are 50 and 100, respectively
 99 | group <- factor(c(rep(1,50), rep(2,100)))
100 | 
101 | # Detecting the DE genes
102 | results <- DEsingle(counts = counts, group = group)
103 | 
104 | # Dividing the DE genes into 3 categories at threshold of FDR < 0.05
105 | results.classified <- DEtype(results = results, threshold = 0.05)
106 | ```
107 | 
108 | ### With SingleCellExperiment input
109 | 
110 | The [`SingleCellExperiment`](http://bioconductor.org/packages/SingleCellExperiment/) class is a widely used S4 class for storing single-cell genomics data. **`DEsingle`** can also take the `SingleCellExperiment` data representation as input.
111 | 
112 | Here is an example to run **`DEsingle`** with `SingleCellExperiment` input:
113 | 
114 | ```{r demo2, eval = FALSE}
115 | # Load library and the test data for DEsingle
116 | library(DEsingle)
117 | library(SingleCellExperiment)
118 | data(TestData)
119 | 
120 | # Convert the test data in DEsingle to SingleCellExperiment data representation
121 | sce <- SingleCellExperiment(assays = list(counts = as.matrix(counts)))
122 | 
123 | # Specifying the two groups to be compared
124 | # The sample numbers in group 1 and group 2 are 50 and 100, respectively
125 | group <- factor(c(rep(1,50), rep(2,100)))
126 | 
127 | # Detecting the DE genes with SingleCellExperiment input sce
128 | results <- DEsingle(counts = sce, group = group)
129 | 
130 | # Dividing the DE genes into 3 categories at threshold of FDR < 0.05
131 | results.classified <- DEtype(results = results, threshold = 0.05)
132 | ```
133 | 
134 | 
135 | ## Output
136 | 
137 | `DEtype` subdivides the DE genes found by `DEsingle` into 3 types: **`DEs`**, **`DEa`** and **`DEg`**.
138 | 
139 | * **`DEs`** refers to ***“different expression status”***. These genes show a significant difference in the proportion of real zeros between the two groups, but no significant difference in expression among the remaining cells.
140 | 
141 | * **`DEa`** is for ***“differential expression abundance”***, which refers to genes that are significantly differentially expressed between the groups without significant difference in the proportion of real zeros.
142 | 
143 | * **`DEg`** or ***“general differential expression”*** refers to genes that have significant difference in both the proportions of real zeros and the expression abundances between the two groups.
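The decision rule behind these three types can be read off the loop in `R/DEtype.R`: a gene must first pass `pvalue.adj.FDR < threshold`; it is then `DEs` if only `pvalue_LR2` is below the threshold, `DEa` if only `pvalue_LR3` is, and `DEg` otherwise. The vectorized sketch below restates that rule, including the `State` direction used for each type. `classify_de()` and the toy data frame are hypothetical, and the sketch treats a missing adjusted p value as non-significant.

```r
# Hypothetical helper: vectorized restatement of the per-gene rule in DEtype()
classify_de <- function(res, threshold = 0.05) {
  # A missing adjusted p value is treated as non-significant (assumption)
  sig <- !is.na(res$pvalue.adj.FDR) & res$pvalue.adj.FDR < threshold
  lr2 <- res$pvalue_LR2 < threshold
  lr3 <- res$pvalue_LR3 < threshold

  # DEs: only H20 rejected; DEa: only H30 rejected; DEg: both or neither
  type <- ifelse(lr2 & !lr3, "DEs", ifelse(!lr2 & lr3, "DEa", "DEg"))

  # Direction of change, using the comparison DEtype() applies to each type
  up_s <- res$theta_1 <= res$theta_2
  up_a <- res$mu_1 >= res$mu_2
  up_g <- res$mu_1 * (1 - res$theta_1) >= res$mu_2 * (1 - res$theta_2)
  up   <- ifelse(type == "DEs", up_s, ifelse(type == "DEa", up_a, up_g))
  state <- ifelse(up, "up", "down")

  # Genes that are not significant keep NA in both new columns
  type[!sig]  <- NA
  state[!sig] <- NA
  data.frame(res, Type = type, State = state, stringsAsFactors = FALSE)
}

# Toy input with only the columns the rule needs (values are made up)
toy <- data.frame(
  theta_1 = c(0.2, 0.1, 0.5, 0.3), theta_2 = c(0.6, 0.1, 0.5, 0.3),
  mu_1 = c(5, 20, 3, 4),           mu_2 = c(5, 2, 9, 4),
  pvalue_LR2 = c(0.001, 0.50, 0.40, 0.9),
  pvalue_LR3 = c(0.300, 0.001, 0.60, 0.9),
  pvalue.adj.FDR = c(0.01, 0.01, 0.02, 0.5)
)
out <- classify_de(toy, threshold = 0.05)
out[, c("Type", "State")]
```

In the toy input the four genes come out as `DEs`/up, `DEa`/up, `DEg`/down and unclassified (not significant).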
144 | 
145 | The output of `DEtype` is a data frame containing the DE analysis results, whose rows are genes and columns contain the following items:
146 | 
147 | * `theta_1`, `theta_2`, `mu_1`, `mu_2`, `size_1`, `size_2`, `prob_1`, `prob_2`: MLE of the zero-inflated negative binomial distribution's parameters of group 1 and group 2.
148 | * `total_mean_1`, `total_mean_2`: Mean of read counts of group 1 and group 2.
149 | * `foldChange`: total_mean_1/total_mean_2.
150 | * `norm_total_mean_1`, `norm_total_mean_2`: Mean of normalized read counts of group 1 and group 2.
151 | * `norm_foldChange`: norm_total_mean_1/norm_total_mean_2.
152 | * `chi2LR1`: Chi-square statistic for hypothesis testing of H0.
153 | * `pvalue_LR2`: P value of hypothesis testing of H20 (Used to determine the type of a DE gene).
154 | * `pvalue_LR3`: P value of hypothesis testing of H30 (Used to determine the type of a DE gene).
155 | * `FDR_LR2`: Adjusted P value of pvalue_LR2 using Benjamini & Hochberg's method (Used to determine the type of a DE gene).
156 | * `FDR_LR3`: Adjusted P value of pvalue_LR3 using Benjamini & Hochberg's method (Used to determine the type of a DE gene).
157 | * `pvalue`: P value of hypothesis testing of H0 (Used to determine whether a gene is a DE gene).
158 | * `pvalue.adj.FDR`: Adjusted P value of H0's pvalue using Benjamini & Hochberg's method (Used to determine whether a gene is a DE gene).
159 | * `Remark`: Record of abnormal program information.
160 | * `Type`: Types of DE genes. *DEs* represents different expression status; *DEa* represents differential expression abundance; *DEg* represents general differential expression.
161 | * `State`: State of DE genes, *up* represents up-regulated; *down* represents down-regulated.
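The `pvalue_LR2` and `pvalue_LR3` columns come from likelihood-ratio tests in `R/DEsingle.R`: twice the difference between the unrestricted two-group ZINB log-likelihood and a restricted one (H20: a shared zero proportion `theta`; H30: shared `size` and `prob`) is compared with a chi-square distribution on 1 and 2 degrees of freedom, respectively. A minimal sketch of that last step follows; `lrt_pvalue()` is a hypothetical helper, and it uses `lower.tail = FALSE`, which gives the same value as the `1 - pchisq()` in the source but keeps precision for very small p values.

```r
# Hypothetical helper: p value of a likelihood-ratio test.
# loglik_full:       log-likelihood of the unrestricted two-group ZINB fit
# loglik_restricted: log-likelihood under the null (H20 or H30)
# df:                number of constraints (1 for H20, 2 for H30 in R/DEsingle.R)
lrt_pvalue <- function(loglik_full, loglik_restricted, df) {
  chi2 <- 2 * (loglik_full - loglik_restricted)
  # Upper tail of the chi-square distribution, i.e. 1 - pchisq(chi2, df)
  pchisq(chi2, df = df, lower.tail = FALSE)
}

# A log-likelihood drop of qchisq(0.95, 1) / 2 corresponds to p = 0.05 at df = 1
lrt_pvalue(loglik_full = -100,
           loglik_restricted = -100 - qchisq(0.95, df = 1) / 2,
           df = 1)
```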
162 | 
163 | To extract the significantly differentially expressed genes from the output of `DEtype` (**note that the same threshold of FDR should be used in this step as in `DEtype`**):
164 | 
165 | ```{r extract DE, eval = FALSE}
166 | # Extract DE genes at threshold of FDR < 0.05
167 | results.sig <- results.classified[results.classified$pvalue.adj.FDR < 0.05, ]
168 | ```
169 | 
170 | To further extract the three types of DE genes separately:
171 | 
172 | ```{r extract subtypes, eval = FALSE}
173 | # Extract three types of DE genes separately
174 | results.DEs <- results.sig[results.sig$Type == "DEs", ]
175 | results.DEa <- results.sig[results.sig$Type == "DEa", ]
176 | results.DEg <- results.sig[results.sig$Type == "DEg", ]
177 | ```
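For a quick overview of how many DE genes fall into each type and direction of change, a cross-tabulation with base R's `table()` is enough. The toy `results.sig` below is hypothetical and keeps only the `Type` and `State` columns; with real output the same `table()` call applies unchanged.

```r
# Hypothetical stand-in for results.sig (the real output has many more columns)
results.sig <- data.frame(
  Type  = c("DEs", "DEa", "DEg", "DEg", "DEa"),
  State = c("up", "down", "up", "down", "down"),
  stringsAsFactors = FALSE
)

# Count DE genes by type and by direction of change
type_state <- table(Type = results.sig$Type, State = results.sig$State)
type_state
```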
178 | 
179 | 
180 | ## Parallelization
181 | 
182 | **`DEsingle`** supports parallel computing through the [`BiocParallel`](http://bioconductor.org/packages/BiocParallel/) package. Users can simply set `parallel = TRUE` in `DEsingle` to enable parallelization and leave the `BPPARAM` parameter at its default, `bpparam()`.
183 | 
184 | ```{r demo3, eval = FALSE}
185 | # Load library
186 | library(DEsingle)
187 | 
188 | # Detecting the DE genes in parallel
189 | results <- DEsingle(counts = counts, group = group, parallel = TRUE)
190 | ```
191 | 
192 | Advanced users can pass a `BiocParallelParam` object from the `BiocParallel` package as the `BPPARAM` argument to specify the parallel back-end and its configuration.
193 | 
194 | ### For Unix and Mac users
195 | 
196 | The best choice for Unix and Mac users is to use `MulticoreParam` to configure a multicore parallel back-end:
197 | 
198 | ```{r demo4, eval = FALSE}
199 | # Load library
200 | library(DEsingle)
201 | library(BiocParallel)
202 | 
203 | # Set the parameters and register the back-end to be used
204 | param <- MulticoreParam(workers = 18, progressbar = TRUE)
205 | register(param)
206 | 
207 | # Detecting the DE genes in parallel with 18 cores
208 | results <- DEsingle(counts = counts, group = group, parallel = TRUE, BPPARAM = param)
209 | ```
210 | 
211 | ### For Windows users
212 | 
213 | For Windows users, using `SnowParam` to configure a Snow back-end is a good choice:
214 | 
215 | ```{r demo5, eval = FALSE}
216 | # Load library
217 | library(DEsingle)
218 | library(BiocParallel)
219 | 
220 | # Set the parameters and register the back-end to be used
221 | param <- SnowParam(workers = 8, type = "SOCK", progressbar = TRUE)
222 | register(param)
223 | 
224 | # Detecting the DE genes in parallel with 8 cores
225 | results <- DEsingle(counts = counts, group = group, parallel = TRUE, BPPARAM = param)
226 | ```
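The examples above hard-code 18 and 8 workers. A minimal sketch for deriving the worker count from the machine instead, using base R's `parallel::detectCores()`; leaving one core free is an assumption, not a **`DEsingle`** requirement.

```r
# Derive the number of workers from the machine instead of hard-coding it.
# Leaving one core free for the rest of the system is an assumption;
# detectCores() may return NA on some platforms, so fall back to 1 worker.
n_cores   <- parallel::detectCores()
n_workers <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)
n_workers
```

The value can then be used with either back-end shown above, e.g. `MulticoreParam(workers = n_workers, progressbar = TRUE)` or `SnowParam(workers = n_workers, type = "SOCK", progressbar = TRUE)`.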
227 | 
228 | See the [*Reference Manual*](https://bioconductor.org/packages/release/bioc/manuals/BiocParallel/man/BiocParallel.pdf) of [`BiocParallel`](http://bioconductor.org/packages/BiocParallel/) package for more details of the `BiocParallelParam` class.
229 | 
230 | 
231 | ## Visualization of results
232 | 
233 | Users can use the `heatmap()` function in `stats` or the `heatmap.2()` function in `gplots` to plot a heatmap of the DE genes found by DEsingle, as we did in Figure S3 of the [*manuscript*](https://doi.org/10.1093/bioinformatics/bty332).
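A minimal sketch using base R's `stats::heatmap()` on log-transformed counts. In real use the matrix would be the rows of `counts` for the DE genes (for example `counts[row.names(results.sig), ]`); here a small simulated matrix `counts_de` stands in for it, and the `log2(x + 1)` transform and row scaling are assumptions, not necessarily the preprocessing used for Figure S3.

```r
set.seed(1)
group <- factor(c(rep(1, 15), rep(2, 15)))

# Simulated means: genes 1-5 higher in group 1, genes 6-10 higher in group 2
mu_1 <- c(rep(20, 5), rep(4, 5))
mu_2 <- c(rep(4, 5), rep(20, 5))
mu   <- cbind(matrix(mu_1, nrow = 10, ncol = 15), matrix(mu_2, nrow = 10, ncol = 15))

# Negative binomial counts standing in for the DE genes' raw read counts
counts_de <- matrix(rnbinom(length(mu), mu = mu, size = 2), nrow = 10,
                    dimnames = list(paste0("gene", 1:10), paste0("cell", 1:30)))

# Log-transform, scale each gene (row), and mark the two groups above the columns
log_expr <- log2(counts_de + 1)
hm <- heatmap(log_expr, scale = "row",
              ColSideColors = c("orange", "steelblue")[as.integer(group)])
```

`heatmap()` invisibly returns the row and column orderings (`hm$rowInd`, `hm$colInd`) produced by its clustering, which can be reused to reorder the gene list.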
234 | 
235 | 
236 | ## Interpretation of results
237 | 
238 | For the interpretation of results when **`DEsingle`** is applied to real data, please refer to the *Three types of DE genes between E3 and E4 of human embryonic cells* part in the [*Supplementary Materials*](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty332/4983067#supplementary-data) of our [*manuscript*](https://doi.org/10.1093/bioinformatics/bty332).
239 | 
240 | 
241 | ## Help
242 | 
243 | Use `browseVignettes("DEsingle")` to see the vignettes of **`DEsingle`** in R after installation.
244 | 
245 | Use the following code in R to get access to the help documentation for **`DEsingle`**:
246 | 
247 | ```{r help1, eval = FALSE}
248 | # Documentation for DEsingle
249 | ?DEsingle
250 | ```
251 | 
252 | ```{r help2, eval = FALSE}
253 | # Documentation for DEtype
254 | ?DEtype
255 | ```
256 | 
257 | ```{r help3, eval = FALSE}
258 | # Documentation for TestData
259 | ?TestData
260 | ?counts
261 | ?group
262 | ```
263 | 
264 | You are also welcome to view and post *DEsingle* tagged questions on [Bioconductor Support Site of DEsingle](https://support.bioconductor.org/t/desingle/) or contact the author by email for help.
265 | 
266 | 
267 | ## Author
268 | 
269 | *Zhun Miao* <<miaoz13@tsinghua.org.cn>>
270 | 
271 | MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing 100084, China.
272 | 
273 | 


--------------------------------------------------------------------------------
/data/TestData.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/miaozhun/DEsingle/3cf7d9b3b35d6282f782b8c02116a1b3efde2ba3/data/TestData.rda


--------------------------------------------------------------------------------
/inst/CITATION:
--------------------------------------------------------------------------------
 1 | citEntry(entry="article",
 2 |          title = "DEsingle for detecting three types of differential expression in single-cell RNA-seq data",
 3 |          author = personList( as.person("Zhun Miao"),
 4 |                               as.person("Ke Deng"),
 5 |                               as.person("Xiaowo Wang"),
 6 |                               as.person("Xuegong Zhang")),
 7 |          year = 2018,
 8 |          journal = "Bioinformatics",
 9 |          doi = "10.1093/bioinformatics/bty332",
10 |          pages = "bty332",
11 |          textVersion = 
12 |          paste("Zhun Miao, Ke Deng, Xiaowo Wang, Xuegong Zhang.", 
13 |                "DEsingle for detecting three types of differential expression in single-cell RNA-seq data.",
14 |                "Bioinformatics (2018): bty332."))
15 | 


--------------------------------------------------------------------------------
/inst/NEWS:
--------------------------------------------------------------------------------
 1 | VERSION 1.0.5
 2 | -------------------------
 3 |    o Optimization of speed and memory.
 4 | 
 5 | VERSION 1.0.1
 6 | -------------------------
 7 |    o Optimization of memory management.
 8 | 
 9 | VERSION 1.0.0
10 | -------------------------
11 |    o Package released in Bioconductor.
12 | 
13 | VERSION 0.99.12
14 | -------------------------
15 |    o Documentation improvements.
16 | 
17 | VERSION 0.99.9
18 | -------------------------
19 |    o Add Parallelization.
20 | 
21 | VERSION 0.99.0
22 | -------------------------
23 |    o Package released.
24 | 


--------------------------------------------------------------------------------
/man/DEsingle.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/DEsingle.R
 3 | \name{DEsingle}
 4 | \alias{DEsingle}
 5 | \title{DEsingle: Detecting differentially expressed genes from scRNA-seq data}
 6 | \usage{
 7 | DEsingle(counts, group, parallel = FALSE, BPPARAM = bpparam())
 8 | }
 9 | \arguments{
10 | \item{counts}{A non-negative integer matrix of scRNA-seq raw read counts or a \code{SingleCellExperiment} object which contains the read counts matrix. The rows of the matrix are genes and columns are samples/cells.}
11 | 
12 | \item{group}{A vector of factor which specifies the two groups to be compared, corresponding to the columns in the counts matrix.}
13 | 
14 | \item{parallel}{If FALSE (default), no parallel computation is used; if TRUE, parallel computation using \code{BiocParallel}, with argument \code{BPPARAM}.}
15 | 
16 | \item{BPPARAM}{An optional parameter object passed internally to \code{\link{bplapply}} when \code{parallel=TRUE}. If not specified, \code{\link{bpparam}()} (default) will be used.}
17 | }
18 | \value{
19 | A data frame containing the differential expression (DE) analysis results, rows are genes and columns contain the following items:
20 | \itemize{
21 |   \item theta_1, theta_2, mu_1, mu_2, size_1, size_2, prob_1, prob_2: MLE of the zero-inflated negative binomial distribution's parameters of group 1 and group 2.
22 |   \item total_mean_1, total_mean_2: Mean of read counts of group 1 and group 2.
23 |   \item foldChange: total_mean_1/total_mean_2.
24 |   \item norm_total_mean_1, norm_total_mean_2: Mean of normalized read counts of group 1 and group 2.
25 |   \item norm_foldChange: norm_total_mean_1/norm_total_mean_2.
26 |   \item chi2LR1: Chi-square statistic for hypothesis testing of H0.
27 |   \item pvalue_LR2: P value of hypothesis testing of H20 (Used to determine the type of a DE gene).
28 |   \item pvalue_LR3: P value of hypothesis testing of H30 (Used to determine the type of a DE gene).
29 |   \item FDR_LR2: Adjusted P value of pvalue_LR2 using Benjamini & Hochberg's method (Used to determine the type of a DE gene).
30 |   \item FDR_LR3: Adjusted P value of pvalue_LR3 using Benjamini & Hochberg's method (Used to determine the type of a DE gene).
31 |   \item pvalue: P value of hypothesis testing of H0 (Used to determine whether a gene is a DE gene).
32 |   \item pvalue.adj.FDR: Adjusted P value of H0's pvalue using Benjamini & Hochberg's method (Used to determine whether a gene is a DE gene).
33 |   \item Remark: Record of abnormal program information.
34 | }
35 | }
36 | \description{
37 | This function is used to detect differentially expressed genes between two specified groups of cells in a raw read counts matrix of single-cell RNA-seq (scRNA-seq) data. It takes a non-negative integer matrix of scRNA-seq raw read counts or a \code{SingleCellExperiment} object as input. Users should therefore first map the reads (obtained from the sequencing libraries of the samples) to the corresponding genome and count the reads mapped to each gene according to the gene annotation to obtain the raw read counts matrix.
38 | }
39 | \examples{
40 | # Load test data for DEsingle
41 | data(TestData)
42 | 
43 | # Specifying the two groups to be compared
44 | # The sample number in group 1 and group 2 is 50 and 100 respectively
45 | group <- factor(c(rep(1,50), rep(2,100)))
46 | 
47 | # Detecting the differentially expressed genes
48 | results <- DEsingle(counts = counts, group = group)
49 | 
50 | # Dividing the differentially expressed genes into 3 categories
51 | results.classified <- DEtype(results = results, threshold = 0.05)
52 | 
53 | }
54 | \seealso{
55 | \code{\link{DEtype}}, for the classification of differentially expressed genes found by \code{\link{DEsingle}}.
56 | 
57 | \code{\link{TestData}}, a test dataset for DEsingle.
58 | }
59 | \author{
60 | Zhun Miao.
61 | }
62 | 


--------------------------------------------------------------------------------
/man/DEtype.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/DEtype.R
 3 | \name{DEtype}
 4 | \alias{DEtype}
 5 | \title{DEtype: Classifying differentially expressed genes from DEsingle}
 6 | \usage{
 7 | DEtype(results, threshold)
 8 | }
 9 | \arguments{
10 | \item{results}{An output data frame from \code{DEsingle}, which contains the unclassified differential expression analysis results.}
11 | 
12 | \item{threshold}{A number in (0, 1) specifying the FDR threshold.}
13 | }
14 | \value{
15 | A data frame containing the differential expression (DE) analysis results together with the types and states of DE genes; rows are genes and columns contain the following items:
16 | \itemize{
17 |   \item theta_1, theta_2, mu_1, mu_2, size_1, size_2, prob_1, prob_2: MLE of the zero-inflated negative binomial distribution's parameters of group 1 and group 2.
18 |   \item total_mean_1, total_mean_2: Mean of read counts of group 1 and group 2.
19 |   \item foldChange: total_mean_1/total_mean_2.
20 |   \item norm_total_mean_1, norm_total_mean_2: Mean of normalized read counts of group 1 and group 2.
21 |   \item norm_foldChange: norm_total_mean_1/norm_total_mean_2.
22 |   \item chi2LR1: Chi-square statistic for hypothesis testing of H0.
23 |   \item pvalue_LR2: P value of hypothesis testing of H20 (Used to determine the type of a DE gene).
24 |   \item pvalue_LR3: P value of hypothesis testing of H30 (Used to determine the type of a DE gene).
25 |   \item FDR_LR2: Adjusted P value of pvalue_LR2 using Benjamini & Hochberg's method (Used to determine the type of a DE gene).
26 |   \item FDR_LR3: Adjusted P value of pvalue_LR3 using Benjamini & Hochberg's method (Used to determine the type of a DE gene).
27 |   \item pvalue: P value of hypothesis testing of H0 (Used to determine whether a gene is a DE gene).
28 |   \item pvalue.adj.FDR: Adjusted P value of H0's pvalue using Benjamini & Hochberg's method (Used to determine whether a gene is a DE gene).
29 |   \item Remark: Record of abnormal program information.
30 |   \item Type: Types of DE genes. DEs represents different expression status; DEa represents differential expression abundance; DEg represents general differential expression.
31 |   \item State: State of DE genes, up represents up-regulated; down represents down-regulated.
32 | }
33 | }
34 | \description{
35 | This function is used to classify the differentially expressed genes of single-cell RNA-seq (scRNA-seq) data found by \code{DEsingle}. It takes the output data frame from \code{DEsingle} as input.
36 | }
37 | \examples{
38 | # Load test data for DEsingle
39 | data(TestData)
40 | 
41 | # Specifying the two groups to be compared
42 | # The sample number in group 1 and group 2 is 50 and 100 respectively
43 | group <- factor(c(rep(1,50), rep(2,100)))
44 | 
45 | # Detecting the differentially expressed genes
46 | results <- DEsingle(counts = counts, group = group)
47 | 
48 | # Dividing the differentially expressed genes into 3 categories
49 | results.classified <- DEtype(results = results, threshold = 0.05)
50 | 
51 | }
52 | \seealso{
53 | \code{\link{DEsingle}}, for the detection of differentially expressed genes from scRNA-seq data.
54 | 
55 | \code{\link{TestData}}, a test dataset for DEsingle.
56 | }
57 | \author{
58 | Zhun Miao.
59 | }
60 | 


--------------------------------------------------------------------------------
/man/TestData.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/TestData.R
 3 | \docType{data}
 4 | \name{TestData}
 5 | \alias{TestData}
 6 | \alias{counts}
 7 | \alias{group}
 8 | \title{TestData: A test dataset for DEsingle}
 9 | \format{\itemize{
10 |   \item counts. A non-negative integer matrix of scRNA-seq raw read counts, rows are genes and columns are cells.
11 |   \item group. A factor specifying the two groups to be compared, corresponding to the columns of \code{counts}.
12 | }}
13 | \source{
14 | Petropoulos S, et al. Cell, 2016, 165(4): 1012-1026.
15 | }
16 | \usage{
17 | data(TestData)
18 | }
19 | \description{
20 | A toy dataset containing a single-cell RNA-seq (scRNA-seq) read counts matrix and its grouping information.
21 | }
22 | \details{
23 | \itemize{
24 |   \item counts. A matrix of raw read counts of scRNA-seq data which has 200 genes (rows) and 150 cells (columns).
25 |   \item group. A factor specifying the two groups to be compared in \code{counts}. It can also be generated by: \code{group <- factor(c(rep(1,50), rep(2,100)))}
26 | }
27 | }
28 | \examples{
29 | # Load test data for DEsingle
30 | data(TestData)
31 | 
32 | # Specifying the two groups to be compared
33 | # The sample number in group 1 and group 2 is 50 and 100 respectively
34 | group <- factor(c(rep(1,50), rep(2,100)))
35 | 
36 | # Detecting the differentially expressed genes
37 | results <- DEsingle(counts = counts, group = group)
38 | 
39 | # Dividing the differentially expressed genes into 3 categories
40 | results.classified <- DEtype(results = results, threshold = 0.05)
41 | 
42 | }
43 | \seealso{
44 | \code{\link{DEsingle}}, for the detection of differentially expressed genes from scRNA-seq data.
45 | 
46 | \code{\link{DEtype}}, for the classification of differentially expressed genes found by \code{\link{DEsingle}}.
47 | }
48 | \keyword{data}
49 | 


--------------------------------------------------------------------------------
/vignettes/DEsingle.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "DEsingle"
  3 | author: "Zhun Miao"
  4 | date: "2018-06-21"
  5 | output: 
  6 |   html_document:
  7 |     toc: TRUE
  8 |     toc_depth: 3
  9 |     toc_float: TRUE
 10 |     collapsed: TRUE
 11 | vignette: >
 12 |   %\VignetteIndexEntry{DEsingle}
 13 |   %\VignetteEngine{knitr::rmarkdown}
 14 |   %\VignetteEncoding{UTF-8}
 15 | ---
 16 | 
 17 | ```{r setup, include = FALSE}
 18 | knitr::opts_chunk$set(
 19 |   echo = TRUE,
 20 |   collapse = TRUE,
 21 |   comment = "#>"
 22 | )
 23 | ```
 24 | 
 25 | ![](DEsingle_LOGO.png)
 26 | 
 27 | 
 28 | ## Introduction
 29 | 
 30 | **`DEsingle`** is an R package for **differential expression (DE) analysis of single-cell RNA-seq (scRNA-seq) data**. It detects differentially expressed genes between two groups of cells in a scRNA-seq raw read counts matrix.
 31 | 
 32 | **`DEsingle`** employs the Zero-Inflated Negative Binomial model for differential expression analysis. By estimating the proportion of real and dropout zeros, it not only detects DE genes **at higher accuracy** but also **subdivides them into three types of differential expression with different regulatory and functional mechanisms**.
 33 | 
 34 | For more information, please refer to the [manuscript](https://doi.org/10.1093/bioinformatics/bty332) by *Zhun Miao, Ke Deng, Xiaowo Wang and Xuegong Zhang*.
 35 | 
 36 | 
 37 | ## Citation
 38 | 
 39 | If you use **`DEsingle`** in published research, please cite:
 40 | 
 41 | > Zhun Miao, Ke Deng, Xiaowo Wang, Xuegong Zhang (2018). DEsingle for detecting three types of differential expression in single-cell RNA-seq data. Bioinformatics, bty332. [10.1093/bioinformatics/bty332.](https://doi.org/10.1093/bioinformatics/bty332)
 42 | 
 43 | 
 44 | ## Installation
 45 | 
 46 | To install **`DEsingle`** from [**Bioconductor**](http://bioconductor.org/packages/DEsingle/):
 47 | 
 48 | ```{r Installation from Bioconductor, eval = FALSE}
 49 | if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
 50 | BiocManager::install("DEsingle")
 51 | ```
 52 | 
 53 | To install the *development version* from [**GitHub**](https://github.com/miaozhun/DEsingle/):
 54 | 
 55 | ```{r Installation from GitHub, eval = FALSE}
 56 | if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
 57 | devtools::install_github("miaozhun/DEsingle", build_vignettes = TRUE)
 58 | ```
 59 | 
 60 | To load the installed **`DEsingle`** in R:
 61 | 
 62 | ```{r Load DEsingle, eval = FALSE}
 63 | library(DEsingle)
 64 | ```
 65 | 
 66 | 
 67 | ## Input
 68 | 
 69 | **`DEsingle`** takes two inputs: `counts` and `group`.
 70 | 
 71 | The input `counts` is a scRNA-seq **raw read counts matrix** or a **`SingleCellExperiment`** object which contains the read counts matrix. The rows of the matrix are genes and columns are cells.
 72 | 
 73 | The other input, `group`, is a factor specifying the two groups of cells to be compared, with one entry per column of `counts`.
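
A minimal sketch of preparing and checking these two inputs before calling `DEsingle()`. The toy matrix `counts_toy` and the cell labels `E3`/`E4` are hypothetical stand-ins for your own data; the checks only mirror the requirements stated above (non-negative integer counts, genes in rows, one group label per column, exactly two groups). We also assume that "group 1" in the output corresponds to the first factor level, so the level order is set explicitly here.

```{r check input, eval = FALSE}
# Hypothetical toy counts: 20 genes (rows) x 6 cells (columns)
set.seed(1)
counts_toy <- matrix(rnbinom(20 * 6, size = 1, mu = 5), nrow = 20,
                     dimnames = list(paste0("gene", 1:20), paste0("cell", 1:6)))

# Hypothetical per-cell labels, in the same order as the columns of counts_toy
cell_labels <- c("E3", "E3", "E4", "E4", "E4", "E3")

# Fix the level order explicitly (assumption: the first level is "group 1")
group_toy <- factor(cell_labels, levels = c("E3", "E4"))

# Checks mirroring the input requirements described above
stopifnot(
  is.matrix(counts_toy),
  all(counts_toy >= 0),
  all(counts_toy == round(counts_toy)),
  length(group_toy) == ncol(counts_toy),
  nlevels(group_toy) == 2,
  !anyNA(group_toy)
)
table(group_toy)
```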
 74 | 
 75 | 
 76 | ## Test data
 77 | 
 78 | Users can load the test data in **`DEsingle`** with:
 79 | 
 80 | ```{r Load TestData}
 81 | library(DEsingle)
 82 | data(TestData)
 83 | ```
 84 | 
 85 | The toy data `counts` in `TestData` is a scRNA-seq read counts matrix which has 200 genes (rows) and 150 cells (columns).
 86 | 
 87 | ```{r counts}
 88 | dim(counts)
 89 | counts[1:6, 1:6]
 90 | ```
 91 | 
 92 | The object `group` in `TestData` is a factor with two levels whose length equals the number of columns of `counts`.
 93 | 
 94 | ```{r group}
 95 | length(group)
 96 | summary(group)
 97 | ```
 98 | 
 99 | 
100 | ## Usage
101 | 
102 | ### With read counts matrix input
103 | 
104 | Here is an example to run **`DEsingle`** with read counts matrix input:
105 | 
106 | ```{r demo1, eval = FALSE}
107 | # Load library and the test data for DEsingle
108 | library(DEsingle)
109 | data(TestData)
110 | 
111 | # Specifying the two groups to be compared
112 | # The sample number in group 1 and group 2 is 50 and 100 respectively
113 | group <- factor(c(rep(1,50), rep(2,100)))
114 | 
115 | # Detecting the DE genes
116 | results <- DEsingle(counts = counts, group = group)
117 | 
118 | # Dividing the DE genes into 3 categories at threshold of FDR < 0.05
119 | results.classified <- DEtype(results = results, threshold = 0.05)
120 | ```
121 | 
122 | ### With SingleCellExperiment input
123 | 
124 | The [`SingleCellExperiment`](http://bioconductor.org/packages/SingleCellExperiment/) class is a widely used S4 class for storing single-cell genomics data. **`DEsingle`** can also take the `SingleCellExperiment` data representation as input.
125 | 
126 | Here is an example to run **`DEsingle`** with `SingleCellExperiment` input:
127 | 
128 | ```{r demo2, eval = FALSE}
129 | # Load library and the test data for DEsingle
130 | library(DEsingle)
131 | library(SingleCellExperiment)
132 | data(TestData)
133 | 
134 | # Convert the test data in DEsingle to SingleCellExperiment data representation
135 | sce <- SingleCellExperiment(assays = list(counts = as.matrix(counts)))
136 | 
137 | # Specifying the two groups to be compared
138 | # The sample number in group 1 and group 2 is 50 and 100 respectively
139 | group <- factor(c(rep(1,50), rep(2,100)))
140 | 
141 | # Detecting the DE genes with SingleCellExperiment input sce
142 | results <- DEsingle(counts = sce, group = group)
143 | 
144 | # Dividing the DE genes into 3 categories at threshold of FDR < 0.05
145 | results.classified <- DEtype(results = results, threshold = 0.05)
146 | ```
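
Alternatively, the group labels can be kept in the `colData` of the `SingleCellExperiment` object, so that they stay aligned with the cells, and then passed to `DEsingle()` as the `group` argument. A sketch on a hypothetical toy matrix; the column name `condition` is our own choice, not something **`DEsingle`** requires.

```{r group in colData, eval = FALSE}
library(SingleCellExperiment)

# Hypothetical toy counts: 10 genes x 4 cells
counts_toy <- matrix(0:39, nrow = 10,
                     dimnames = list(paste0("gene", 1:10), paste0("cell", 1:4)))
sce <- SingleCellExperiment(assays = list(counts = counts_toy))

# Store the grouping as a colData column ("condition" is an arbitrary name)
sce$condition <- factor(c("E3", "E3", "E4", "E4"), levels = c("E3", "E4"))

# One label per cell, aligned with the columns of the counts assay
stopifnot(length(sce$condition) == ncol(counts(sce)))

# The stored factor can then be used as the group argument, e.g.
# results <- DEsingle(counts = sce, group = sce$condition)
```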
147 | 
148 | 
149 | ## Output
150 | 
151 | `DEtype` subdivides the DE genes found by `DEsingle` into 3 types: **`DEs`**, **`DEa`** and **`DEg`**.
152 | 
153 | * **`DEs`** refers to ***“different expression status”***. It is the type of genes that show a significant difference in the proportion of real zeros between the two groups, but no significant difference in expression among the remaining cells.
154 | 
155 | * **`DEa`** is for ***“differential expression abundance”***, which refers to genes that are significantly differentially expressed between the groups without significant difference in the proportion of real zeros.
156 | 
157 | * **`DEg`** or ***“general differential expression”*** refers to genes that have significant difference in both the proportions of real zeros and the expression abundances between the two groups.
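
For reference, the zero-inflated negative binomial (ZINB) model behind these definitions can be written in its standard form as below. This is a sketch under our own assumptions: we take `theta` in the output to be the proportion of real zeros $\theta$, and `size`, `prob` and `mu` to follow the parameterization of R's `dnbinom()` ($r$, $p$ and $\mu$); please check the [manuscript](https://doi.org/10.1093/bioinformatics/bty332) for the exact definitions used by **`DEsingle`**.

$$
P(Y = y) =
\begin{cases}
\theta + (1 - \theta)\, p^{r}, & y = 0, \\
(1 - \theta)\, \dfrac{\Gamma(y + r)}{\Gamma(r)\, y!}\, p^{r} (1 - p)^{y}, & y = 1, 2, \ldots
\end{cases}
\qquad
\mu = \frac{r (1 - p)}{p}
$$

Under this model, $\theta$ describes the real zeros and the negative binomial part ($r$, $p$) describes the remaining cells, so DEs genes differ between the groups in $\theta$ only, DEa genes in the negative binomial part only, and DEg genes in both.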
158 | 
159 | The output of `DEtype` is a data frame containing the DE analysis results, whose rows are genes and whose columns contain the following items:
160 | 
161 | * `theta_1`, `theta_2`, `mu_1`, `mu_2`, `size_1`, `size_2`, `prob_1`, `prob_2`: MLE of the zero-inflated negative binomial distribution's parameters of group 1 and group 2.
162 | * `total_mean_1`, `total_mean_2`: Mean of read counts of group 1 and group 2.
163 | * `foldChange`: total_mean_1/total_mean_2.
164 | * `norm_total_mean_1`, `norm_total_mean_2`: Mean of normalized read counts of group 1 and group 2.
165 | * `norm_foldChange`: norm_total_mean_1/norm_total_mean_2.
166 | * `chi2LR1`: Chi-square statistic for hypothesis testing of H0.
167 | * `pvalue_LR2`: P value of hypothesis testing of H20 (Used to determine the type of a DE gene).
168 | * `pvalue_LR3`: P value of hypothesis testing of H30 (Used to determine the type of a DE gene).
169 | * `FDR_LR2`: Adjusted P value of pvalue_LR2 using Benjamini & Hochberg's method (Used to determine the type of a DE gene).
170 | * `FDR_LR3`: Adjusted P value of pvalue_LR3 using Benjamini & Hochberg's method (Used to determine the type of a DE gene).
171 | * `pvalue`: P value of hypothesis testing of H0 (Used to determine whether a gene is a DE gene).
172 | * `pvalue.adj.FDR`: Adjusted P value of H0's pvalue using Benjamini & Hochberg's method (Used to determine whether a gene is a DE gene).
173 | * `Remark`: Record of abnormal program information.
174 | * `Type`: Types of DE genes. *DEs* represents different expression status; *DEa* represents differential expression abundance; *DEg* represents general differential expression.
175 | * `State`: State of DE genes, *up* represents up-regulated; *down* represents down-regulated.
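
Since `foldChange` and `norm_foldChange` are plain ratios of group 1 to group 2, they are often easier to read on a log2 scale. A minimal sketch on a hypothetical three-gene stand-in for the `DEtype` output, in which only the two normalized-mean columns are mocked; with real data you would apply the same steps to `results.classified`.

```{r log2 fold change, eval = FALSE}
# Hypothetical stand-in with the two normalized-mean columns of the output
res <- data.frame(norm_total_mean_1 = c(12, 0.5, 3),
                  norm_total_mean_2 = c(3, 2, 3),
                  row.names = c("geneA", "geneB", "geneC"))

# norm_foldChange as defined above: group 1 mean / group 2 mean
res$norm_foldChange <- res$norm_total_mean_1 / res$norm_total_mean_2

# log2 scale: positive = higher in group 1, negative = higher in group 2
# (a zero mean in one group gives -Inf or Inf)
res$log2FC <- log2(res$norm_foldChange)
res
```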
176 | 
177 | To extract the significantly differentially expressed genes from the output of `DEtype` (**note that the same threshold of FDR should be used in this step as in `DEtype`**):
178 | 
179 | ```{r extract DE, eval = FALSE}
180 | # Extract DE genes at threshold of FDR < 0.05
181 | results.sig <- results.classified[results.classified$pvalue.adj.FDR < 0.05, ]
182 | ```
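
One way to follow the note above is to store the FDR threshold in a single variable, reuse it in both steps, and then sort the significant genes by adjusted p-value. A sketch: the name `fdr_threshold` is our own, and the small data frame is a hypothetical stand-in for `results.classified` (in practice you would first run `DEtype(results = results, threshold = fdr_threshold)`).

```{r sort DE, eval = FALSE}
fdr_threshold <- 0.05

# Hypothetical stand-in for the output of
# DEtype(results = results, threshold = fdr_threshold)
results.classified <- data.frame(
  pvalue.adj.FDR = c(0.20, 0.001, 0.04, 0.01),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Keep genes below the same threshold, most significant first
results.sig <- results.classified[results.classified$pvalue.adj.FDR < fdr_threshold, , drop = FALSE]
results.sig <- results.sig[order(results.sig$pvalue.adj.FDR), , drop = FALSE]
rownames(results.sig)
```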
183 | 
184 | To further extract the three types of DE genes separately:
185 | 
186 | ```{r extract subtypes, eval = FALSE}
187 | # Extract three types of DE genes separately
188 | results.DEs <- results.sig[results.sig$Type == "DEs", ]
189 | results.DEa <- results.sig[results.sig$Type == "DEa", ]
190 | results.DEg <- results.sig[results.sig$Type == "DEg", ]
191 | ```
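
For an overview of how many genes fall into each type and direction, the `Type` and `State` columns can be cross-tabulated; `split()` returns the three subsets in one step as an alternative to the three lines above. A sketch on a hypothetical stand-in for `results.sig` in which only these two columns are mocked.

```{r tabulate types, eval = FALSE}
# Hypothetical stand-in for results.sig (only Type and State are mocked)
results.sig <- data.frame(
  Type  = c("DEs", "DEa", "DEg", "DEa", "DEs"),
  State = c("up", "down", "up", "up", "down"),
  row.names = paste0("gene", 1:5)
)

# Number of DE genes per type and regulation direction
table(Type = results.sig$Type, State = results.sig$State)

# List of data frames, one per type, named DEa, DEg and DEs
results.by.type <- split(results.sig, results.sig$Type)
names(results.by.type)
```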
192 | 
193 | 
194 | ## Parallelization
195 | 
196 | **`DEsingle`** supports parallel computing through the [`BiocParallel`](http://bioconductor.org/packages/BiocParallel/) package. Users can simply set `parallel = TRUE` in the function `DEsingle` to enable parallelization and leave the `BPPARAM` parameter at its default.
197 | 
198 | ```{r demo3, eval = FALSE}
199 | # Load library
200 | library(DEsingle)
201 | 
202 | # Detecting the DE genes in parallelization
203 | results <- DEsingle(counts = counts, group = group, parallel = TRUE)
204 | ```
205 | 
206 | Advanced users can pass a `BiocParallelParam` object from the `BiocParallel` package to the `BPPARAM` parameter to specify the parallel back-end to be used and its configuration.
207 | 
208 | ### For Unix and Mac users
209 | 
210 | For Unix and Mac users, a good choice is `MulticoreParam`, which configures a multicore parallel back-end:
211 | 
212 | ```{r demo4, eval = FALSE}
213 | # Load library
214 | library(DEsingle)
215 | library(BiocParallel)
216 | 
217 | # Set the parameters and register the back-end to be used
218 | param <- MulticoreParam(workers = 18, progressbar = TRUE)
219 | register(param)
220 | 
221 | # Detecting the DE genes in parallel with 18 cores
222 | results <- DEsingle(counts = counts, group = group, parallel = TRUE, BPPARAM = param)
223 | ```
224 | 
225 | ### For Windows users
226 | 
227 | For Windows users, using `SnowParam` to configure a SNOW back-end is a good choice:
228 | 
229 | ```{r demo5, eval = FALSE}
230 | # Load library
231 | library(DEsingle)
232 | library(BiocParallel)
233 | 
234 | # Set the parameters and register the back-end to be used
235 | param <- SnowParam(workers = 8, type = "SOCK", progressbar = TRUE)
236 | register(param)
237 | 
238 | # Detecting the DE genes in parallel with 8 cores
239 | results <- DEsingle(counts = counts, group = group, parallel = TRUE, BPPARAM = param)
240 | ```
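
If the same script has to run on several platforms, the back-end can also be chosen at run time, combining the two configurations above. A sketch under our own assumptions: one core is left free for the system and a fallback of 2 cores is used when the core count is unknown; neither is a **`DEsingle`** requirement.

```{r choose backend, eval = FALSE}
library(BiocParallel)

# Number of available cores; detectCores() can return NA on some systems
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 2L
n_workers <- max(1L, n_cores - 1L)

# Forked workers on Unix/Mac, socket (SNOW) workers on Windows
param <- if (.Platform$OS.type == "windows") {
  SnowParam(workers = n_workers, type = "SOCK", progressbar = TRUE)
} else {
  MulticoreParam(workers = n_workers, progressbar = TRUE)
}

# results <- DEsingle(counts = counts, group = group, parallel = TRUE, BPPARAM = param)
```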
241 | 
242 | See the [*Reference Manual*](https://bioconductor.org/packages/release/bioc/manuals/BiocParallel/man/BiocParallel.pdf) of [`BiocParallel`](http://bioconductor.org/packages/BiocParallel/) package for more details of the `BiocParallelParam` class.
243 | 
244 | 
245 | ## Visualization of results
246 | 
247 | Users can use the `heatmap()` function in `stats` or the `heatmap.2()` function in `gplots` to plot a heatmap of the DE genes found by **`DEsingle`**, as we did in Figure S3 of the [*manuscript*](https://doi.org/10.1093/bioinformatics/bty332).
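
A minimal sketch with `heatmap()` from `stats`, on hypothetical toy data: the counts are simulated and the DE gene list is made up, whereas with real results you would use `counts[rownames(results.sig), ]` and your own `group` factor. Counts are `log2(x + 1)` transformed and scaled per gene, columns are kept in input order, and a side bar marks the two groups.

```{r heatmap sketch, eval = FALSE}
set.seed(42)
# Hypothetical toy counts: 30 genes x 12 cells, two groups of 6 cells
counts_toy <- matrix(rnbinom(30 * 12, size = 2, mu = 10), nrow = 30,
                     dimnames = list(paste0("gene", 1:30), paste0("cell", 1:12)))
group_toy <- factor(rep(c("E3", "E4"), each = 6))

# Hypothetical DE gene list; in practice use rownames(results.sig)
de_genes <- paste0("gene", 1:10)

# log2(count + 1) for the DE genes only
expr <- log2(counts_toy[de_genes, ] + 1)

# Rows scaled per gene; Colv = NA keeps the cell order; side colours mark the groups
hm <- heatmap(expr, scale = "row", Colv = NA,
              ColSideColors = c("orange", "steelblue")[group_toy])
```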
248 | 
249 | 
250 | ## Interpretation of results
251 | 
252 | For the interpretation of results when **`DEsingle`** is applied to real data, please refer to the *Three types of DE genes between E3 and E4 of human embryonic cells* part in the [*Supplementary Materials*](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty332/4983067#supplementary-data) of our [*manuscript*](https://doi.org/10.1093/bioinformatics/bty332).
253 | 
254 | 
255 | ## Help
256 | 
257 | Use `browseVignettes("DEsingle")` to see the vignettes of **`DEsingle`** in R after installation.
258 | 
259 | Use the following code in R to get access to the help documentation for **`DEsingle`**:
260 | 
261 | ```{r help1, eval = FALSE}
262 | # Documentation for DEsingle
263 | ?DEsingle
264 | ```
265 | 
266 | ```{r help2, eval = FALSE}
267 | # Documentation for DEtype
268 | ?DEtype
269 | ```
270 | 
271 | ```{r help3, eval = FALSE}
272 | # Documentation for TestData
273 | ?TestData
274 | ?counts
275 | ?group
276 | ```
277 | 
278 | You are also welcome to view and post *DEsingle*-tagged questions on the [Bioconductor Support Site](https://support.bioconductor.org/t/desingle/) or to contact the author by email for help.
279 | 
280 | 
281 | ## Author
282 | 
283 | *Zhun Miao* <<miaoz13@tsinghua.org.cn>>
284 | 
285 | MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing 100084, China.
286 | 
287 | 
288 | ## Session info
289 | 
290 | ```{r sessionInfo}
291 | sessionInfo()
292 | ```
293 | 
294 | 


--------------------------------------------------------------------------------
/vignettes/DEsingle_LOGO.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/miaozhun/DEsingle/3cf7d9b3b35d6282f782b8c02116a1b3efde2ba3/vignettes/DEsingle_LOGO.png


--------------------------------------------------------------------------------