├── .gitignore ├── DESCRIPTION ├── LICENSE ├── NAMESPACE ├── R ├── bottomly.R ├── nls-functions.R ├── plot-superSeq.R ├── summary-superSeq.R └── superSeq.R ├── README.md ├── data └── bottomly.RData ├── man ├── bottomly.Rd ├── defaultNLSControl.Rd ├── defaultNLSStarts.Rd ├── fitnls.Rd ├── plot.superSeq.Rd ├── summary.superSeq.Rd └── superSeq.Rd └── vignettes └── superSeq.Rmd /.gitignore: -------------------------------------------------------------------------------- 1 | inst/doc 2 | # History files 3 | .Rhistory 4 | .Rapp.history 5 | 6 | # Session Data files 7 | .RData 8 | .DS_Store 9 | # Example code in package build process 10 | *-Ex.R 11 | # Output files from R CMD build 12 | /*.tar.gz 13 | # Output files from R CMD check 14 | /*.Rcheck/ 15 | # RStudio files 16 | .Rproj.user/ 17 | # produced vignettes 18 | vignettes/*.html 19 | vignettes/*.pdf 20 | # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3 21 | .httr-oauth 22 | # knitr and R markdown default cache directories 23 | /*_cache/ 24 | /cache/ 25 | # Temporary files created by R markdown 26 | *.utf8.md 27 | *.knit.md 28 | # Shiny token, see https://shiny.rstudio.com/articles/shinyapps.html 29 | rsconnect/ 30 | .Rproj.user 31 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: superSeq 2 | Type: Package 3 | Title: Determining sufficient read depth in RNA-Seq experiments 4 | Version: 0.0.0.99 5 | Date: 2019-05-09 6 | Author: Andrew J. Bass, David G. Robinson, and John D. Storey 7 | Maintainer: Andrew J. Bass 8 | Description: The superSeq package helps determine the sufficient read depth to achieve desired statistical power in an RNA-Seq study. 
9 | Imports: 10 | subSeq, 11 | dplyr, 12 | purrr, 13 | ggplot2 14 | Suggests: 15 | knitr, rmarkdown, Biobase, 16 | Depends: 17 | R(>= 3.1.0) 18 | VignetteBuilder: knitr 19 | URL: https://github.com/StoreyLab/superSeq 20 | License: MIT + file LICENSE 21 | RoxygenNote: 6.1.0 22 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Storey Lab 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | S3method(plot,superSeq) 4 | S3method(summary,superSeq) 5 | export(defaultNLSControl) 6 | export(defaultNLSStarts) 7 | export(fitnls) 8 | export(superSeq) 9 | import(dplyr) 10 | import(ggplot2) 11 | import(purrr) 12 | import(subSeq) 13 | -------------------------------------------------------------------------------- /R/bottomly.R: -------------------------------------------------------------------------------- 1 | #' @name bottomly 2 | #' 3 | #' @title RNA-Seq counts from Bottomly et al 2011 4 | #' 5 | #' @docType data 6 | #' 7 | #' @description An expressionSet object of the Bottomly et al study. 8 | #' 9 | #' This was downloaded from the ReCount database of analysis-ready RNA-Seq datasets (Frazee et al 2011). 10 | #' 11 | #' Bottomly D, Walter NA, Hunter JE, et al. Evaluating gene expression in C57BL/6J and DBA/2J mouse striatum 12 | #' using RNA-Seq and microarrays. PLoS One. 2011;6(3):e17820. Published 2011 Mar 24. doi:10.1371/journal.pone.0017820 13 | #' 14 | #' Frazee, A. C., Langmead, B., and Leek, J. T. (2011). ReCount: a multi-experiment resource of analysis-ready 15 | #' RNA-seq gene count datasets. BMC Bioinformatics, 12, 449. 16 | #' http://bowtie-bio.sourceforge.net/recount/ 17 | NULL 18 | -------------------------------------------------------------------------------- /R/nls-functions.R: -------------------------------------------------------------------------------- 1 | #' Fit significance curve using nonlinear least squares 2 | #' 3 | #' Fit a nonlinear least squares model explaining the number of significant 4 | #' genes based on the proportion of reads used. Tries each of the starting 5 | #' combinations of parameters in the given order. If all fail, returns NULL. 
6 | #' 7 | #' @param dat a data.frame from summary.subsamples, containing columns for 8 | #' "proportion" and "significant" 9 | #' @param method specifies the method used to fit the model. Default is probit (superSeq). "logit" and "smoother" are possible 10 | #' options but are mainly for use in our manuscript. 11 | #' @param starts A list of combinations of starting guesses to try with 12 | #' \code{nls}. If NULL, uses \code{\link{defaultNLSStarts}} 13 | #' @param control control for \code{nls}, by default \link{defaultNLSControl} 14 | #' 15 | #' @seealso \code{\link{defaultNLSStarts}} 16 | #' @export 17 | fitnls <- function (dat, method = "probit", starts = NULL, control = defaultNLSControl) { 18 | nls0 <- purrr::possibly(nls, NULL, quiet = TRUE) 19 | # starting parameters 20 | if (is.null(starts)) { 21 | starts <- defaultNLSStarts(max(dat$significant)) 22 | } 23 | # Create upper bound for k 24 | pi0 <- unique(dat$pi0) 25 | up_lim <- pmin(unique(dat$genes) * (1 - pi0 / 1.4) / 0.95, unique(dat$genes)) 26 | for (s in starts) { 27 | if (method == "probit") { 28 | fit = nls0(significant ~ k * plnorm(proportion + b, mu, s), 29 | dat, 30 | start = s, 31 | control = control, 32 | lower = c(.95 * max(dat$significant), -15, 0.00001, 0), 33 | upper = c(up_lim, 15, 15, 10), 34 | algorithm = "port") 35 | } else if (method == "logit") { 36 | fit = nls0(significant ~ k * plogis(log10(proportion + b), location = mu, scale = s), 37 | dat, 38 | start = s, 39 | lower = c(.95 * max(dat$significant), -15, 0.00001, 0), 40 | upper = c(unique(dat$genes), 15, 15, 10), 41 | control = control, 42 | algorithm = "port") 43 | 44 | } else { 45 | fit = smooth.spline(x = dat$proportion, y = dat$significant, df = 4) 46 | } 47 | if (!is.null(fit)) { 48 | return(list(fit = fit)) 49 | } 50 | } 51 | warning("Failed to converge. 
Try different starting parameters") 52 | return(NULL) 53 | } 54 | 55 | 56 | #' Default control for nonlinear least squares fitting of saturation curves 57 | #' 58 | #' This is a wrapper for nls.control. It is a list that can be directly appended. 59 | #' See nls.control for more details. 60 | #' @export 61 | defaultNLSControl <- nls.control(maxiter=2000, minFactor=1/1e3, tol=1e-2, printEval=FALSE, 62 | warnOnly=FALSE) 63 | 64 | #' Default combinations of starting values for nonlinear least squares fitting 65 | #' of saturation curves 66 | #' 67 | #' @param maxsig the maximum number of significant genes in the curve 68 | #' 69 | #' @return a list of lists, each of which serves as a set of starting 70 | #' parameter estimates for nls 71 | #' 72 | #' @export 73 | defaultNLSStarts <- function(maxsig) { 74 | list( 75 | list(k=1 * maxsig, mu=0, s=2, b=0), 76 | list(k=1 * maxsig, mu=-2, s=1, b=5), 77 | list(k=1 * maxsig, mu=.5, s=.2, b=1), 78 | list(k=1 * maxsig, mu=0, s=20, b=10 ), 79 | list(k=1 * maxsig, mu=-2, s=8, b=-10), 80 | list(k=2 * maxsig, mu=.5, s=.2, b=1), 81 | list(k=2 * maxsig, mu=0, s=2, b=0), 82 | list(k=2 * maxsig, mu=-2, s=1, b=0), 83 | list(k=2 * maxsig, mu=.5, s=7, b=7), 84 | list(k=2 * maxsig, mu=0, s=9, b=-7), 85 | list(k=3 * maxsig, mu=-2, s=1, b=0), 86 | list(k=3 * maxsig, mu=.5, s=.2, b=1), 87 | list(k=3 * maxsig, mu=0, s=2, b=0), 88 | list(k=3 * maxsig, mu=-2, s=3.1, b=10), 89 | list(k=3 * maxsig, mu=.5, s=2.2, b=-20), 90 | list(k=4 * maxsig, mu=0, s=2, b=0), 91 | list(k=4 * maxsig, mu=-2, s=1, b=0), 92 | list(k=4 * maxsig, mu=.5, s=.2, b=1), 93 | list(k=4 * maxsig, mu=0, s=8, b=2), 94 | list(k=4 * maxsig, mu=-2, s=4, b=10), 95 | list(k=1.5 * maxsig, mu=1.5, s=.2, b=0), 96 | list(k=1.5 * maxsig, mu=10, s=2.6, b=0), 97 | list(k=1.5 * maxsig, mu=-20, s=1.2, b=0), 98 | list(k=1.5 * maxsig, mu=5, s=5.2, b=0), 99 | list(k=1.5 * maxsig, mu=2, s=22, b=0), 100 | list(k=2.5 * maxsig, mu=-2, s=1, b=0), 101 | list(k=2.5 * maxsig, mu=.5, s=.2, b=1), 102 | 
list(k=2.5 * maxsig, mu=0.1, s=2, b=0), 103 | list(k=2.5 * maxsig, mu=-.2, s=0.01, b=0), 104 | list(k=3.5 * maxsig, mu=1.5, s=.2, b=1), 105 | list(k=3.5 * maxsig, mu=20, s=2, b=0), 106 | list(k=3.5 * maxsig, mu=-2.3, s=1, b=0), 107 | list(k=3.5 * maxsig, mu=2.5, s=.2, b=1), 108 | list(k = 1 * maxsig, mu = 0, s = 3, b = 0), 109 | list(k = 2 * maxsig, mu = -1, s = 2, b = 0), 110 | list(k = 3 * maxsig, mu = -1, s = 0.001, b = 0), 111 | list(k = 5 * maxsig, mu = -2, s = 6, b = 0), 112 | list(k = 3 * maxsig, mu = -6, s = 0.2, b = 0), 113 | list(k = 6 * maxsig, mu = 0, s = 0, b = 1), 114 | list(k = 3 * maxsig, mu = -40, s = -10, b = 0), 115 | list(k = 2 * maxsig, mu = -2, s = -5, b = 0), 116 | list(k = 1 * maxsig, mu = 0, s = 2, b = 0), 117 | list(k = 1 *maxsig, mu = -2, s = 1, b = 5), 118 | list(k = 1 * maxsig, mu = 0.5, s = 0.2, b = 1), 119 | list(k = 1 * maxsig, mu = 0, s = 20, b = 10), 120 | list(k = 1 * maxsig, mu = -2, s = 8, b = -10), 121 | list(k = 2 * maxsig, mu = 0.5, s = 0.2, b = 1), 122 | list(k = 2 * maxsig, mu = 0, s = 2, b = 0), 123 | list(k = 2 * maxsig, mu = -2, s = 1, b = 0), 124 | list(k = 2 * maxsig, mu = 0.5, s = 7, b = 7), 125 | list(k = 2 * maxsig, mu = 0, s = 9, b = -7), 126 | list(k = 3 * maxsig, mu = -2, s = 1, b = 0), 127 | list(k = 3 * maxsig, mu = 0.5, s = 0.2, b = 1), 128 | list(k = 3 * maxsig, mu = 0, s = 2, b = 0), 129 | list(k = 3 * maxsig, mu = -2, s = 3.1, b = 10), 130 | list(k = 3 * maxsig, mu = 0.5, s = 2.2, b = -20), 131 | list(k = 4 * maxsig, mu = 0, s = 2, b = 0), 132 | list(k = 4 * maxsig, mu = -2, s = 1, b = 0), 133 | list(k = 4 * maxsig, mu = 0.5, s = 0.2, b = 1), 134 | list(k = 4 * maxsig, mu = 0, s = 8, b = 2), 135 | list(k = 4 * maxsig, mu = -2, s = 4, b = 10), 136 | list(k = 1.5 * maxsig, mu = 1.5, s = 0.2, b = 0), 137 | list(k = 1.5 * maxsig, mu = 10, s = 2.6, b = 0), 138 | list(k = 1.5 * maxsig, mu = -20, s = 1.2, b = 0), 139 | list(k = 1.5 * maxsig, mu = 5, s = 5.2, b = 0), 140 | list(k = 1.5 * maxsig, mu = 2, s = 22, b = 
0), 141 | list(k = 2.5 * maxsig, mu = -2, s = 1, b = 0), 142 | list(k = 2.5 * maxsig, mu = 0.5, s = 0.2, b = 1), 143 | list(k = 2.5 * maxsig, mu = 0.1, s = 2, b = 0), 144 | list(k = 2.5 * maxsig, mu = -0.2, s = 0.01, b = 0), 145 | list(k = 3.5 * maxsig, mu = 1.5, s = 0.2, b = 1), 146 | list(k = 3.5 * maxsig, mu = 20, s = 2, b = 0), 147 | list(k = 3.5 * maxsig, mu = -2.3, s = 1, b = 0), 148 | list(k = 3.5 * maxsig, mu = 2.5, s = 0.2, b = 1), 149 | list(k = 1 * maxsig, mu = 0, s = 1/3, b = 0), 150 | list(k = 2 * maxsig, mu = -1, s = 1/2, b = 0), 151 | list(k = 3 * maxsig, mu = -1, s = 1/0.001, b = 0), 152 | list(k = 5 * maxsig, mu = -2, s = 6, b = 0), 153 | list(k = 3 * maxsig, mu = -6, s = 1/0.2, b = 0), 154 | list(k = 6 * maxsig, mu = 0.00010, s = 1, b = 1), 155 | list(k = 3 * maxsig, mu = -40, s = -10, b = 0), 156 | list(k = 2 * maxsig, mu = -2, s = -5, b = 0), 157 | list(k = 1 * maxsig, mu = 0, s = 12, b = 0), 158 | list(k = 1 * maxsig, mu = -2, s = 11, b = 5), 159 | list(k = 1 * maxsig, mu = 0.05, s = 0.2, b = 1), 160 | list(k = 1 * maxsig, mu = 0, s = 1/20, b = 10), 161 | list(k = 1 * maxsig, mu = -2, s = 1/8, b = -10), 162 | list(k = 2 * maxsig, mu = 0.5, s = 1/0.2, b = 1), 163 | list(k = 2 * maxsig, mu = 0, s = 2, b = 0), 164 | list(k = 2 * maxsig, mu = -0.002, s = 1, b = 0), 165 | list(k = 2 * maxsig, mu = 0.5, s = 3/7, b = 7), 166 | list(k = 2 * maxsig, mu = 0, s = 1/9, b = -7), 167 | list(k = 3 * maxsig, mu = -0.002, s = .00011, b = 0), 168 | list(k = 3 * maxsig, mu = 0.5, s = 0.2, b = 1), 169 | list(k = 3 * maxsig, mu = 0, s = 2, b = 0), 170 | list(k = 3 * maxsig, mu = -2, s = 3.1, b = 10), 171 | list(k = 3 * maxsig, mu = 0.5, s = 2.2, b = -20), 172 | list(k = 3 * maxsig, mu = 0, s = 2, b = 0), 173 | list(k = 4 * maxsig, mu = -2, s = 31, b = 0), 174 | list(k = 4 * maxsig, mu = 0.5, s = 10.2, b = 1), 175 | list(k = 4 * maxsig, mu = 0, s = 8, b = 2), 176 | list(k = 4 * maxsig, mu = -2, s = 42, b = 0), 177 | list(k = 1.5 * maxsig, mu = 1.5, s = 0.2, b = 0), 
178 | list(k = 1.5 * maxsig, mu = 2, s = 2.6, b = 0), 179 | list(k = 1.5 * maxsig, mu = -2, s = 1.2, b = 0), 180 | list(k = 1.5 * maxsig, mu = -1, s =3.2, b = 0), 181 | list(k = 1.5 * maxsig, mu = 1, s = 2, b = 0), 182 | list(k = 2.5 * maxsig, mu = -2, s = 0.4, b = 0), 183 | list(k = 2.5 * maxsig, mu = 0.5, s = 0.2, b = 0), 184 | list(k = 2.5 * maxsig, mu = 0.1, s = 2, b = 0), 185 | list(k = 2.5 * maxsig, mu = -0.2, s = 0.01, b = 0), 186 | list(k = 3.5 * maxsig, mu = 1.5, s = 0.2, b = 1), 187 | list(k = 2 * maxsig, mu = 2, s = 2, b = .0005), 188 | list(k = 2 * maxsig, mu = -4, s = 1, b = 0), 189 | list(k = 2 * maxsig, mu = 4, s = 2, b = -.5) 190 | ) 191 | } 192 | -------------------------------------------------------------------------------- /R/plot-superSeq.R: -------------------------------------------------------------------------------- 1 | #' Plotting for superSeq object 2 | #' 3 | #' @param x superSeq object 4 | #' @param ... not used 5 | #' 6 | #' @keywords plot 7 | #' @aliases plot, plot.superSeq 8 | #' @export 9 | plot.superSeq <- function(x, ...) 
{ 10 | subseq_object <- x$subsample 11 | predictions <- x$predictions 12 | cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7") 13 | 14 | p0 <- subseq_object %>% 15 | ggplot(aes(proportion, significant, color = method)) + 16 | geom_point(size = 1) + 17 | geom_line(data = predictions, aes(y = predicted, x = proportion, color = method), lty = 2, size = 1.0) + 18 | geom_vline(xintercept = 1, lty = 3) + 19 | scale_colour_manual(values=cbbPalette) + 20 | ylab("Number of significant genes") + 21 | xlab("Read depth proportion") + 22 | theme_bw() 23 | p0 24 | } -------------------------------------------------------------------------------- /R/summary-superSeq.R: -------------------------------------------------------------------------------- 1 | #' Summary function of superSeq object 2 | #' 3 | #' @param object superSeq object 4 | #' @param depth Specified read depth proportion to provide predictions in the summary. 5 | #' @param digits Number of digits to print. 6 | #' @param ... Not currently used 7 | #' 8 | #' @export 9 | summary.superSeq <- function(object, depth = c(1, 2, 3), digits = getOption("digits"), ...) 
{ 10 | # Call 11 | model_fit <- object$fits$fit[[1]] 12 | total_dis <- object$predictions$total_discoveries[1] 13 | cat("\nCall:\n", deparse(object$call), "\n\n", sep = "") 14 | # Expected number of discoveries 15 | cat("Asymptotic number of discoveries:", format(total_dis, digits=digits), "\n", sep="\t") 16 | cat("\n") 17 | # Predicted number of discoveries and estimated power at the requested read depths 18 | new_p <- predict(model_fit[[1]], newdata = data.frame(proportion = depth)) 19 | df <- data.frame(depth, format(new_p, digits = digits), format(new_p / total_dis, digits = digits)) 20 | 21 | colnames(df) <- c("Read depth proportion", "Estimated number of discoveries", "Estimated experimental power") 22 | print(df) 23 | cat("\n") 24 | } -------------------------------------------------------------------------------- /R/superSeq.R: -------------------------------------------------------------------------------- 1 | #' Apply superSeq model to subsampling data 2 | #' 3 | #' The superSeq function fits a non-linear least squares model to subsampling data to learn the relationship between statistical power and read depth. 4 | #' 5 | #' @param object A subSeq summary object. 6 | #' @param control Specify convergence criteria for non-linear least squares algorithm. 7 | #' See \code{\link{defaultNLSControl}}. 8 | #' @param starts A list of combinations of starting guesses to try with 9 | #' \code{nls}. If NULL, uses \code{\link{defaultNLSStarts}}. 10 | #' @param new_p A vector of subsampling proportions to predict using the superSeq model fits. By default, proportions from 0 to 3 11 | #' (up to triple the current read depth) are predicted. 
12 | #' 13 | #' @return A superSeq object, which is a \code{list} with the following elements: 14 | #' 15 | #' \item{fits}{The fitted objects from the \code{nls} function} 16 | #' \item{subsample}{The subsampled object from subSeq} 17 | #' \item{predictions}{A data frame with the following columns: method used, total predicted discoveries, proportion read depth, predicted number of DE genes, and estimated statistical power.} 18 | #' 19 | #' @examples 20 | #'\dontrun{ 21 | #'library(superSeq) 22 | #'library(subSeq) 23 | #'library(Biobase) 24 | #'# Load bottomly data 25 | #'data(bottomly) 26 | #'bottomly_counts <- exprs(bottomly) 27 | #'bottomly_design <- pData(bottomly) 28 | #'bottomly_counts <- bottomly_counts[rowSums(bottomly_counts) >= 10, ] 29 | #'bottomly_proportions <- 10 ^ seq(-2, 0, 0.1) 30 | #'# Apply subsampling methodology subSeq 31 | #'ss = subsample(counts = bottomly_counts, 32 | #' proportions = bottomly_proportions, 33 | #' treatment=bottomly_design$strain, 34 | #' method=c("voomLimma"), 35 | #' replications = 3, 36 | #' seed = 12345) 37 | #' ss_sum <- summary(ss) 38 | #' 39 | #'# apply superSeq model 40 | #'ss_obj <- superSeq(ss_sum) 41 | #' 42 | #'# plot results 43 | #'plot(ss_obj) 44 | #' 45 | #'# summarise results 46 | #'summary(ss_obj) 47 | #'} 48 | #' 49 | #' @seealso \code{\link{fitnls}} 50 | #' 51 | #' @import dplyr subSeq purrr ggplot2 52 | #' @export 53 | superSeq <- function(object, control = defaultNLSControl, starts = NULL, new_p = NULL) { 54 | if (is.null(new_p)) new_p <- seq(0, 3, 0.05) 55 | # apply non-linear least squares algorithm 56 | fits <- object %>% 57 | group_by(method) %>% 58 | do(fit = fitnls(., control = control, starts = starts)) %>% 59 | ungroup() 60 | # use model to provide predictions 61 | predictions <- fits %>% 62 | group_by(method) %>% 63 | do(data.frame(proportion = new_p, 64 | predicted = predict_func(fit = .$fit[[1]][[1]], data = new_p))) %>% 65 | distinct() 66 | 67 | predictions <- fits %>% 68 | group_by(method) %>% 69 | 
do(data.frame(total_discoveries = coef(.$fit[[1]]$fit)[1])) %>% 70 | right_join(predictions, by = "method") %>% 71 | mutate(estimated_power = predicted / total_discoveries) 72 | out <- list(call = match.call(), subsample = object, fits = fits, predictions = predictions) 73 | class(out) <- "superSeq" 74 | out 75 | } 76 | 77 | # Function to get predictions from NLS model fits 78 | predict_func <- function(fit, data) { 79 | predict(fit, newdata = data.frame(proportion = data)) 80 | } 81 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | superSeq: Determining sufficient read depth in RNA-Seq experiments 2 | ======= 3 | 4 | The superSeq package models the relationship between statistical power and read depth in an RNA sequencing study. Our algorithm can help predict how many additional reads, if any, should be sequenced to achieve desired statistical power. 5 | 6 | See also [superSeq: Determining sufficient sequencing depth in RNA-Seq differential expression studies](https://www.biorxiv.org/content/10.1101/635623v1). 7 | 8 | Installation 9 | ------------- 10 | 11 | First install the Bioconductor dependencies: 12 | 13 | source("http://bioconductor.org/biocLite.R") 14 | biocLite(c("qvalue", "limma", "edgeR", "DESeq2", "DEXSeq", "pasilla")) 15 | 16 | Then install the [devtools](https://github.com/hadley/devtools) package, and use it to install the [subSeq](https://github.com/StoreyLab/subSeq) and superSeq packages. 
17 | 18 | install.packages("devtools") 19 | library(devtools) 20 | install_github("StoreyLab/subSeq") 21 | install_github("StoreyLab/superSeq", build_opts = c("--no-resave-data", "--no-manual"), build_vignettes = TRUE) 22 | 23 | Vignette 24 | --------------------- 25 | 26 | Once you've installed the package, you can access the vignette with 27 | 28 | library(superSeq) 29 | vignette("superSeq") 30 | 31 | If you run into a problem or have a question about the software's usage, please open a [GitHub issue](https://github.com/StoreyLab/superSeq/issues). 32 | -------------------------------------------------------------------------------- /data/bottomly.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/StoreyLab/superSeq/52c287dae1c9daa6c141fa866dc2536298c8b815/data/bottomly.RData -------------------------------------------------------------------------------- /man/bottomly.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/bottomly.R 3 | \docType{data} 4 | \name{bottomly} 5 | \alias{bottomly} 6 | \title{RNA-Seq counts from Bottomly et al 2011} 7 | \description{ 8 | An expressionSet object of the Bottomly et al study. 9 | 10 | This was downloaded from the ReCount database of analysis-ready RNA-Seq datasets (Frazee et al 2011). 11 | 12 | Bottomly D, Walter NA, Hunter JE, et al. Evaluating gene expression in C57BL/6J and DBA/2J mouse striatum 13 | using RNA-Seq and microarrays. PLoS One. 2011;6(3):e17820. Published 2011 Mar 24. doi:10.1371/journal.pone.0017820 14 | 15 | Frazee, A. C., Langmead, B., and Leek, J. T. (2011). ReCount: a multi-experiment resource of analysis-ready 16 | RNA-seq gene count datasets. BMC Bioinformatics, 12, 449. 
17 | http://bowtie-bio.sourceforge.net/recount/ 18 | } 19 | -------------------------------------------------------------------------------- /man/defaultNLSControl.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/nls-functions.R 3 | \docType{data} 4 | \name{defaultNLSControl} 5 | \alias{defaultNLSControl} 6 | \title{Default control for nonlinear least squares fitting of saturation curves} 7 | \format{An object of class \code{list} of length 5.} 8 | \usage{ 9 | defaultNLSControl 10 | } 11 | \description{ 12 | This is a wrapper for nls.control. It is a list that can be directly appended. 13 | See nls.control for more details. 14 | } 15 | \keyword{datasets} 16 | -------------------------------------------------------------------------------- /man/defaultNLSStarts.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/nls-functions.R 3 | \name{defaultNLSStarts} 4 | \alias{defaultNLSStarts} 5 | \title{Default combinations of starting values for nonlinear least squares fitting 6 | of saturation curves} 7 | \usage{ 8 | defaultNLSStarts(maxsig) 9 | } 10 | \arguments{ 11 | \item{maxsig}{the maximum number of significant genes in the curve} 12 | } 13 | \value{ 14 | a list of lists, each of which serves as a set of starting 15 | parameter estimates for nls 16 | } 17 | \description{ 18 | Default combinations of starting values for nonlinear least squares fitting 19 | of saturation curves 20 | } 21 | -------------------------------------------------------------------------------- /man/fitnls.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/nls-functions.R 3 | \name{fitnls} 4 | \alias{fitnls} 5 | \title{Fit significance curve 
using nonlinear least squares} 6 | \usage{ 7 | fitnls(dat, method = "probit", starts = NULL, 8 | control = defaultNLSControl) 9 | } 10 | \arguments{ 11 | \item{dat}{a data.frame from summary.subsamples, containing columns for 12 | "proportion" and "significant"} 13 | 14 | \item{method}{specifies the method used to fit the model. Default is probit (superSeq). "logit" and "smoother" are possible 15 | options but are mainly for use in our manuscript.} 16 | 17 | \item{starts}{A list of combinations of starting guesses to try with 18 | \code{nls}. If NULL, uses \code{\link{defaultNLSStarts}}} 19 | 20 | \item{control}{control for \code{nls}, by default \link{defaultNLSControl}} 21 | } 22 | \description{ 23 | Fit a nonlinear least squares model explaining the number of significant 24 | genes based on the proportion of reads used. Tries each of the starting 25 | combinations of parameters in the given order. If all fail, returns NULL. 26 | } 27 | \seealso{ 28 | \code{\link{defaultNLSStarts}} 29 | } 30 | -------------------------------------------------------------------------------- /man/plot.superSeq.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/plot-superSeq.R 3 | \name{plot.superSeq} 4 | \alias{plot.superSeq} 5 | \alias{plot,} 6 | \title{Plotting for superSeq object} 7 | \usage{ 8 | \method{plot}{superSeq}(x, ...) 
9 | } 10 | \arguments{ 11 | \item{x}{superSeq object} 12 | 13 | \item{...}{not used} 14 | } 15 | \description{ 16 | Plotting for superSeq object 17 | } 18 | \keyword{plot} 19 | -------------------------------------------------------------------------------- /man/summary.superSeq.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/summary-superSeq.R 3 | \name{summary.superSeq} 4 | \alias{summary.superSeq} 5 | \title{Summary function of superSeq object} 6 | \usage{ 7 | \method{summary}{superSeq}(object, depth = c(1, 2, 3), 8 | digits = getOption("digits"), ...) 9 | } 10 | \arguments{ 11 | \item{object}{superSeq object} 12 | 13 | \item{depth}{Specified read depth proportion to provide predictions in the summary.} 14 | 15 | \item{digits}{Number of digits to print.} 16 | 17 | \item{...}{Not currently used} 18 | } 19 | \description{ 20 | Summary function of superSeq object 21 | } 22 | -------------------------------------------------------------------------------- /man/superSeq.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/superSeq.R 3 | \name{superSeq} 4 | \alias{superSeq} 5 | \title{Apply superSeq model to subsampling data} 6 | \usage{ 7 | superSeq(object, control = defaultNLSControl, starts = NULL, 8 | new_p = NULL) 9 | } 10 | \arguments{ 11 | \item{object}{A subSeq summary object.} 12 | 13 | \item{control}{Specify convergence criteria for non-linear least squares algorithm. 14 | See \code{\link{defaultNLSControl}}.} 15 | 16 | \item{starts}{A list of combinations of starting guesses to try with 17 | \code{nls}. If NULL, uses \code{\link{defaultNLSStarts}}.} 18 | 19 | \item{new_p}{A vector of subsampling proportions to predict using the superSeq model fits. 
By default, proportions from 0 to 3 20 | (up to triple the current read depth) are predicted.} 21 | } 22 | \value{ 23 | A superSeq object, which is a \code{list} with the following elements: 24 | 25 | \item{fits}{The fitted objects from the \code{nls} function} 26 | \item{subsample}{The subsampled object from subSeq} 27 | \item{predictions}{A data frame with the following columns: method used, total predicted discoveries, proportion read depth, predicted number of DE genes, and estimated statistical power.} 28 | } 29 | \description{ 30 | The superSeq function fits a non-linear least squares model to subsampling data to learn the relationship between statistical power and read depth. 31 | } 32 | \examples{ 33 | \dontrun{ 34 | library(superSeq) 35 | library(subSeq) 36 | library(Biobase) 37 | # Load bottomly data 38 | data(bottomly) 39 | bottomly_counts <- exprs(bottomly) 40 | bottomly_design <- pData(bottomly) 41 | bottomly_counts <- bottomly_counts[rowSums(bottomly_counts) >= 10, ] 42 | bottomly_proportions <- 10 ^ seq(-2, 0, 0.1) 43 | # Apply subsampling methodology subSeq 44 | ss = subsample(counts = bottomly_counts, 45 | proportions = bottomly_proportions, 46 | treatment=bottomly_design$strain, 47 | method=c("voomLimma"), 48 | replications = 3, 49 | seed = 12345) 50 | ss_sum <- summary(ss) 51 | 52 | # apply superSeq model 53 | ss_obj <- superSeq(ss_sum) 54 | 55 | # plot results 56 | plot(ss_obj) 57 | 58 | # summarise results 59 | summary(ss_obj) 60 | } 61 | 62 | } 63 | \seealso{ 64 | \code{\link{fitnls}} 65 | } 66 | -------------------------------------------------------------------------------- /vignettes/superSeq.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Determining sufficient read depth with superSeq" 3 | author: "Andrew J. Bass, David G. Robinson, John D. 
Storey" 4 | output: rmarkdown::html_vignette 5 | vignette: > 6 | %\VignetteIndexEntry{Determining sufficient read depth with superSeq} 7 | %\VignetteEngine{knitr::rmarkdown} 8 | \usepackage[utf8]{inputenc} 9 | --- 10 | 11 | ```{r setup, include=FALSE} 12 | knitr::opts_chunk$set(echo = TRUE) 13 | ``` 14 | 15 | The superSeq package can help determine whether a completed study has sufficient read depth to achieve desired statistical power. Using superSeq, we can predict the increase in statistical power from increasing the read depth (given a fixed number of biological replicates). For more details, see our recent [preprint](https://www.biorxiv.org/content/10.1101/635623v1). 16 | 17 | ## Quick start guide 18 | 19 | The superSeq package predicts the relationship between statistical power and read depth using subsampling data. Therefore, we need to apply a subsampling methodology to the experiment and then fit our model using the `superSeq` function. In this package, we use the computationally efficient [subSeq](https://github.com/StoreyLab/subSeq) package. 
20 | 21 | As an example, let's first load the package and apply the subsampling function `subsample` to the `bottomly` data set: 22 | 23 | ```{r quick_start, message = FALSE, warning = FALSE} 24 | library(superSeq) 25 | library(subSeq) 26 | library(Biobase) 27 | data(bottomly) 28 | 29 | # Extract count matrix, experimental design and filter low count genes 30 | bottomly_counts <- exprs(bottomly) 31 | bottomly_design <- pData(bottomly) 32 | bottomly_counts <- bottomly_counts[rowSums(bottomly_counts) >= 10, ] 33 | 34 | # Generate the subsampling data for this study at specified proportions 35 | bottomly_proportions <- 10 ^ seq(-2, 0, 0.1) 36 | ss = subsample(counts = bottomly_counts, 37 | proportions = bottomly_proportions, 38 | treatment = bottomly_design$strain, 39 | method = c("voomLimma"), 40 | replications = 3, 41 | seed = 12345) 42 | ss_sum <- summary(ss) 43 | head(ss_sum) 44 | ``` 45 | 46 | Type `?subSeq` for additional details on the subsampling implementation. Now that we have the subsampling results, we can apply the `superSeq` function and view the predictions from the model using the `plot` function: 47 | 48 | ```{r, fig.width=5} 49 | ss_obj <- superSeq(ss_sum) 50 | plot(ss_obj) 51 | ``` 52 | 53 | It is evident from the above plot that the study is undersaturated, i.e., the study can expect a substantial increase in statistical power from sequencing additional reads. We can extract a summary of our predictions as follows: 54 | 55 | ```{r} 56 | summary(ss_obj) 57 | ``` 58 | 59 | The estimated asymptotic number of discoveries is the expected number of differentially expressed genes when the technical variability is minimized. Thus the current read depth provides 53.3\% of the total power and doubling or tripling the read depth will provide a 14.3\% or 21.8\% increase in power, respectively. 60 | --------------------------------------------------------------------------------
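The estimated power reported by `summary` is the predicted number of discoveries divided by the asymptotic number of discoveries `k`, where the prediction follows the default "probit" curve in `fitnls`, `k * plnorm(proportion + b, mu, s)`. The following is a minimal sketch of that calculation. The parameter values are hypothetical, chosen only for illustration and not estimated from the `bottomly` data, and `predict_discoveries` is a helper name introduced here that is not part of the package.

```r
# Hypothetical parameter values (illustration only, not a real fit).
k  <- 1000  # asymptotic number of discoveries
mu <- 0     # meanlog of the log-normal CDF
s  <- 1     # sdlog of the log-normal CDF
b  <- 0     # shift added to the read depth proportion

# Curve used by the default "probit" method in fitnls():
# significant ~ k * plnorm(proportion + b, mu, s)
predict_discoveries <- function(proportion, k, mu, s, b) {
  k * plnorm(proportion + b, meanlog = mu, sdlog = s)
}

# Same default depths as summary.superSeq(): current, double and triple depth
depth     <- c(1, 2, 3)
predicted <- predict_discoveries(depth, k, mu, s, b)

# Estimated power is the predicted discoveries relative to the asymptote k
power <- predicted / k

data.frame(depth, predicted, power)
```

In practice the four parameters would come from `coef()` on the fitted `nls` object rather than being set by hand.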