├── .gitignore
├── DESCRIPTION
├── LICENSE
├── NAMESPACE
├── R
    ├── bottomly.R
    ├── nls-functions.R
    ├── plot-superSeq.R
    ├── summary-superSeq.R
    └── superSeq.R
├── README.md
├── data
    └── bottomly.RData
├── man
    ├── bottomly.Rd
    ├── defaultNLSControl.Rd
    ├── defaultNLSStarts.Rd
    ├── fitnls.Rd
    ├── plot.superSeq.Rd
    ├── summary.superSeq.Rd
    └── superSeq.Rd
└── vignettes
    └── superSeq.Rmd


/.gitignore:
--------------------------------------------------------------------------------
 1 | inst/doc
 2 | # History files
 3 | .Rhistory
 4 | .Rapp.history
 5 | 
 6 | # Session Data files
 7 | .RData
 8 | .DS_Store
 9 | # Example code in package build process
10 | *-Ex.R
11 | # Output files from R CMD build
12 | /*.tar.gz
13 | # Output files from R CMD check
14 | /*.Rcheck/
15 | # RStudio files
16 | .Rproj.user/
17 | # produced vignettes
18 | vignettes/*.html
19 | vignettes/*.pdf
20 | # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
21 | .httr-oauth
22 | # knitr and R markdown default cache directories
23 | /*_cache/
24 | /cache/
25 | # Temporary files created by R markdown
26 | *.utf8.md
27 | *.knit.md
28 | # Shiny token, see https://shiny.rstudio.com/articles/shinyapps.html
29 | rsconnect/
30 | .Rproj.user
31 | 


--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
 1 | Package: superSeq
 2 | Type: Package
 3 | Title: Determining Sufficient Read Depth in RNA-Seq Experiments
 4 | Version: 0.0.0.99
 5 | Date: 2019-05-09
 6 | Author: Andrew J. Bass, David G. Robinson, and John D. Storey
 7 | Maintainer: Andrew J. Bass <ajbass@princeton.edu>
 8 | Description: The superSeq package helps determine the sufficient read depth to achieve desired statistical power in an RNA-Seq study.
 9 | Imports: 
10 |     subSeq,
11 |     dplyr,
12 |     purrr,
13 |     ggplot2
14 | Suggests:
15 |     knitr, rmarkdown, Biobase
16 | Depends:
17 |     R (>= 3.1.0)
18 | VignetteBuilder: knitr
19 | URL: https://github.com/StoreyLab/superSeq
20 | License: MIT + file LICENSE
21 | RoxygenNote: 6.1.0
22 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2019 Storey Lab
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
 1 | # Generated by roxygen2: do not edit by hand
 2 | 
 3 | S3method(plot,superSeq)
 4 | S3method(summary,superSeq)
 5 | export(defaultNLSControl)
 6 | export(defaultNLSStarts)
 7 | export(fitnls)
 8 | export(superSeq)
 9 | import(dplyr)
10 | import(ggplot2)
11 | import(purrr)
12 | import(subSeq)
13 | 


--------------------------------------------------------------------------------
/R/bottomly.R:
--------------------------------------------------------------------------------
 1 | #' @name bottomly
 2 | #' 
 3 | #' @title RNA-Seq counts from Bottomly et al. (2011)
 4 | #' 
 5 | #' @docType data
 6 | #' 
 7 | #' @description An \code{ExpressionSet} object containing the RNA-Seq gene counts from the Bottomly et al. (2011) study.
 8 | #' 
 9 | #' The data were downloaded from the ReCount database of analysis-ready RNA-Seq datasets (Frazee et al. 2011).
10 | #' 
11 | #' Bottomly, D., Walter, N. A., Hunter, J. E., et al. (2011). Evaluating gene expression in C57BL/6J and DBA/2J mouse striatum
12 | #' using RNA-Seq and microarrays. PLoS One, 6(3), e17820. doi:10.1371/journal.pone.0017820
13 | #' 
14 | #' Frazee, A. C., Langmead, B., and Leek, J. T. (2011). ReCount: a multi-experiment resource of analysis-ready
15 | #' RNA-seq gene count datasets. BMC Bioinformatics, 12, 449.
16 | #' http://bowtie-bio.sourceforge.net/recount/
17 | NULL
18 | 


--------------------------------------------------------------------------------
/R/nls-functions.R:
--------------------------------------------------------------------------------
  1 | #' Fit significance curve using nonlinear least squares
  2 | #'
  3 | #' Fit a nonlinear least squares model of the number of significant genes as a
  4 | #' function of the proportion of reads used. The default probit model is \code{significant ~ k * plnorm(proportion + b, mu, s)}, where \code{k} is the asymptotic number of discoveries.
  5 | #' Tries each combination of starting parameters in the given order and returns the first successful fit; if all fail, it warns and returns NULL.
  6 | #'
  7 | #' @param dat a data.frame from summary.subsamples, containing columns
  8 | #' "proportion", "significant", "pi0" and "genes"
  9 | #' @param method the model to fit. The default, "probit", is the superSeq model; "logit" and "smoother" (a smoothing spline)
 10 | #' are also available but are mainly for use in our manuscript.
 11 | #' @param starts A list of combinations of starting guesses to try with
 12 | #' \code{nls}. If NULL, uses \code{\link{defaultNLSStarts}}
 13 | #' @param control control for \code{nls}, by default \code{\link{defaultNLSControl}}
 14 | #' @return A list with element \code{fit} holding the fitted model, or NULL if every set of starting values fails.
 15 | #' @seealso \code{\link{defaultNLSStarts}}
 16 | #' @export
 17 | fitnls <- function (dat, method = "probit", starts = NULL, control = defaultNLSControl) {
 18 |   nls0 <- purrr::possibly(nls, NULL, quiet = TRUE)  
 19 |   # starting parameters
 20 |   if (is.null(starts)) {
 21 |     starts <- defaultNLSStarts(max(dat$significant))
 22 |   }
 23 |   # Create upper bound for k
 24 |   pi0 <- unique(dat$pi0)
 25 |   up_lim <- pmin(unique(dat$genes) * (1 - pi0 / 1.4) / 0.95, unique(dat$genes))
 26 |   for (st in starts) {  # st: one list of starting values for k, mu, s and b
 27 |     if (method == "probit") {
 28 |       fit <- nls0(significant ~ k * plnorm(proportion + b, mu, s),
 29 |                   dat,
 30 |                   start = st,
 31 |                   control = control,
 32 |                   lower = c(.95 * max(dat$significant), -15, 0.00001, 0),  # bounds in the order k, mu, s, b
 33 |                   upper = c(up_lim, 15, 15, 10), 
 34 |                   algorithm = "port")
 35 |     } else if (method == "logit") {
 36 |       fit <- nls0(significant ~ k * plogis(log10(proportion + b), location = mu, scale = s),
 37 |                   dat,
 38 |                   start = st,
 39 |                   lower = c(.95 * max(dat$significant), -15, 0.00001, 0), 
 40 |                   upper = c(unique(dat$genes), 15, 15, 10),
 41 |                   control = control,
 42 |                   algorithm = "port")
 43 | 
 44 |     } else {
 45 |       fit <- smooth.spline(x = dat$proportion, y = dat$significant, df = 4)
 46 |     }
 47 |     if (!is.null(fit)) {
 48 |       return(list(fit = fit))
 49 |     }
 50 |   }
 51 |   warning("Failed to converge. Try different starting parameters")
 52 |   return(NULL)
 53 | }
 54 | 
 55 | 
 56 | #' Default control for nonlinear least squares fitting of saturation curves
 57 | #'
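 58 | #' The list returned by \code{nls.control(maxiter = 2000, minFactor = 1/1e3, tol = 1e-2, printEval = FALSE, warnOnly = FALSE)};
 59 | #' pass it as the \code{control} argument of \code{nls}, \code{\link{fitnls}} or \code{\link{superSeq}}. See \code{\link[stats]{nls.control}} for details.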
 60 | #' @export
 61 | defaultNLSControl <- nls.control(maxiter=2000, minFactor=1/1e3, tol=1e-2, printEval=FALSE,
 62 |                                  warnOnly=FALSE)
 63 | 
 64 | #' Default combinations of starting values for nonlinear least squares fitting
 65 | #' of saturation curves
 66 | #'
 67 | #' @param maxsig the maximum number of significant genes in the curve
 68 | #'
 69 | #' @return a list of lists, each of which serves as a set of starting
 70 | #' parameter estimates for nls
 71 | #'
 72 | #' @export
 73 | defaultNLSStarts <- function(maxsig) {
 74 |   list(
 75 |     list(k=1 * maxsig, mu=0, s=2, b=0),
 76 |     list(k=1 * maxsig, mu=-2, s=1, b=5),
 77 |     list(k=1 * maxsig, mu=.5, s=.2, b=1),
 78 |     list(k=1 * maxsig, mu=0, s=20, b=10 ),
 79 |     list(k=1 * maxsig, mu=-2, s=8, b=-10),
 80 |     list(k=2 * maxsig, mu=.5, s=.2, b=1),
 81 |     list(k=2 * maxsig, mu=0, s=2, b=0),
 82 |     list(k=2 * maxsig, mu=-2, s=1, b=0),
 83 |     list(k=2 * maxsig, mu=.5, s=7, b=7),
 84 |     list(k=2 * maxsig, mu=0, s=9, b=-7),
 85 |     list(k=3 * maxsig, mu=-2, s=1, b=0),
 86 |     list(k=3 * maxsig, mu=.5, s=.2, b=1),
 87 |     list(k=3 * maxsig, mu=0, s=2, b=0),
 88 |     list(k=3 * maxsig, mu=-2, s=3.1, b=10),
 89 |     list(k=3 * maxsig, mu=.5, s=2.2, b=-20),
 90 |     list(k=4 * maxsig, mu=0, s=2, b=0),
 91 |     list(k=4 * maxsig, mu=-2, s=1, b=0),
 92 |     list(k=4 * maxsig, mu=.5, s=.2, b=1),
 93 |     list(k=4 * maxsig, mu=0, s=8, b=2),
 94 |     list(k=4 * maxsig, mu=-2, s=4, b=10),
 95 |     list(k=1.5 * maxsig, mu=1.5, s=.2, b=0),
 96 |     list(k=1.5 * maxsig, mu=10, s=2.6, b=0),
 97 |     list(k=1.5 * maxsig, mu=-20, s=1.2, b=0),
 98 |     list(k=1.5 * maxsig, mu=5, s=5.2, b=0),
 99 |     list(k=1.5 * maxsig, mu=2, s=22, b=0),
100 |     list(k=2.5 * maxsig, mu=-2, s=1, b=0),
101 |     list(k=2.5 * maxsig, mu=.5, s=.2, b=1),
102 |     list(k=2.5 * maxsig, mu=0.1, s=2, b=0),
103 |     list(k=2.5 * maxsig, mu=-.2, s=0.01, b=0),
104 |     list(k=3.5 * maxsig, mu=1.5, s=.2, b=1),
105 |     list(k=3.5 * maxsig, mu=20, s=2, b=0),
106 |     list(k=3.5 * maxsig, mu=-2.3, s=1, b=0),
107 |     list(k=3.5 * maxsig, mu=2.5, s=.2, b=1),
108 |     list(k = 1 * maxsig, mu = 0, s = 3, b = 0),
109 |     list(k = 2 * maxsig, mu = -1, s = 2, b = 0),
110 |     list(k = 3 * maxsig, mu = -1, s = 0.001, b = 0),
111 |     list(k = 5 * maxsig, mu = -2, s = 6, b = 0),
112 |     list(k = 3 * maxsig, mu = -6, s = 0.2, b = 0),
113 |     list(k = 6 * maxsig, mu = 0, s = 0, b = 1),
114 |     list(k = 3 * maxsig, mu = -40, s = -10, b = 0),
115 |     list(k = 2 * maxsig, mu = -2, s = -5, b = 0),
116 |     list(k = 1 * maxsig, mu = 0, s = 2, b = 0),
117 |     list(k = 1 *maxsig,  mu = -2, s = 1, b = 5),
118 |     list(k = 1 * maxsig, mu = 0.5, s = 0.2, b = 1),
119 |     list(k = 1 * maxsig, mu = 0, s = 20, b = 10),
120 |     list(k = 1 * maxsig, mu = -2, s = 8, b = -10),
121 |     list(k = 2 * maxsig, mu = 0.5, s = 0.2, b = 1),
122 |     list(k = 2 * maxsig, mu = 0, s = 2, b = 0),
123 |     list(k = 2 * maxsig, mu = -2, s = 1, b = 0),
124 |     list(k = 2 * maxsig, mu = 0.5, s = 7, b = 7),
125 |     list(k = 2 * maxsig, mu = 0, s = 9, b = -7),
126 |     list(k = 3 * maxsig, mu = -2, s = 1, b = 0),
127 |     list(k = 3 * maxsig, mu = 0.5, s = 0.2, b = 1),
128 |     list(k = 3 * maxsig, mu = 0, s = 2, b = 0),
129 |     list(k = 3 * maxsig, mu = -2, s = 3.1, b = 10),
130 |     list(k = 3 * maxsig, mu = 0.5, s = 2.2, b = -20),
131 |     list(k = 4 * maxsig, mu = 0, s = 2, b = 0),
132 |     list(k = 4 * maxsig, mu = -2, s = 1, b = 0),
133 |     list(k = 4 * maxsig, mu = 0.5, s = 0.2, b = 1),
134 |     list(k = 4 * maxsig, mu = 0, s = 8, b = 2),
135 |     list(k = 4 * maxsig, mu = -2, s = 4, b = 10),
136 |     list(k = 1.5 * maxsig, mu = 1.5, s = 0.2, b = 0),
137 |     list(k = 1.5 * maxsig, mu = 10, s = 2.6, b = 0),
138 |     list(k = 1.5 * maxsig, mu = -20, s = 1.2, b = 0),
139 |     list(k = 1.5 * maxsig, mu = 5, s = 5.2, b = 0),
140 |     list(k = 1.5 * maxsig, mu = 2, s = 22, b = 0),
141 |     list(k = 2.5 * maxsig, mu = -2, s = 1, b = 0),
142 |     list(k = 2.5 * maxsig, mu = 0.5, s = 0.2, b = 1),
143 |     list(k = 2.5 * maxsig, mu = 0.1, s = 2, b = 0),
144 |     list(k = 2.5 * maxsig, mu = -0.2, s = 0.01, b = 0),
145 |     list(k = 3.5 * maxsig, mu = 1.5, s = 0.2, b = 1),
146 |     list(k = 3.5 * maxsig, mu = 20, s = 2, b = 0),
147 |     list(k = 3.5 * maxsig, mu = -2.3, s = 1, b = 0),
148 |     list(k = 3.5 * maxsig, mu = 2.5, s = 0.2, b = 1),
149 |     list(k = 1 * maxsig, mu = 0, s = 1/3, b = 0),
150 |     list(k = 2 * maxsig, mu = -1, s = 1/2, b = 0),
151 |     list(k = 3 * maxsig, mu = -1, s = 1/0.001, b = 0),
152 |     list(k = 5 * maxsig, mu = -2, s = 6, b = 0),
153 |     list(k = 3 * maxsig, mu = -6, s = 1/0.2, b = 0),
154 |     list(k = 6 * maxsig, mu = 0.00010, s = 1, b = 1),
155 |     list(k = 3 * maxsig, mu = -40, s = -10, b = 0),
156 |     list(k = 2 * maxsig, mu = -2, s = -5, b = 0),
157 |     list(k = 1 * maxsig, mu = 0, s = 12, b = 0),
158 |     list(k = 1 * maxsig, mu = -2, s = 11, b = 5),
159 |     list(k = 1 * maxsig, mu = 0.05, s = 0.2, b = 1),
160 |     list(k = 1 * maxsig, mu = 0, s = 1/20, b = 10),
161 |     list(k = 1 * maxsig, mu = -2, s = 1/8, b = -10),
162 |     list(k = 2 * maxsig, mu = 0.5, s = 1/0.2, b = 1),
163 |     list(k = 2 * maxsig, mu = 0, s = 2, b = 0),
164 |     list(k = 2 * maxsig, mu = -0.002, s = 1, b = 0),
165 |     list(k = 2 * maxsig, mu = 0.5, s = 3/7, b = 7),
166 |     list(k = 2 * maxsig, mu = 0, s = 1/9, b = -7),
167 |     list(k = 3 * maxsig, mu = -0.002, s = .00011, b = 0),
168 |     list(k = 3 * maxsig, mu = 0.5, s = 0.2, b = 1),
169 |     list(k = 3 * maxsig, mu = 0, s = 2, b = 0),
170 |     list(k = 3 * maxsig, mu = -2, s = 3.1, b = 10),
171 |     list(k = 3 * maxsig, mu = 0.5, s = 2.2, b = -20),
172 |     list(k = 3 * maxsig, mu = 0, s = 2, b = 0),
173 |     list(k = 4 * maxsig, mu = -2, s = 31, b = 0),
174 |     list(k = 4 * maxsig, mu = 0.5, s = 10.2, b = 1),
175 |     list(k = 4 * maxsig, mu = 0, s = 8, b = 2),
176 |     list(k = 4 * maxsig, mu = -2, s = 42, b = 0),
177 |     list(k = 1.5 * maxsig, mu = 1.5, s = 0.2, b = 0),
178 |     list(k = 1.5 *  maxsig, mu = 2, s = 2.6, b = 0),
179 |     list(k = 1.5 * maxsig, mu = -2, s = 1.2, b = 0),
180 |     list(k = 1.5 * maxsig, mu = -1, s =3.2, b = 0),
181 |     list(k = 1.5 * maxsig, mu = 1, s = 2, b = 0),
182 |     list(k = 2.5 * maxsig, mu = -2, s = 0.4, b = 0),
183 |     list(k = 2.5 * maxsig, mu = 0.5, s = 0.2, b = 0),
184 |     list(k = 2.5 * maxsig, mu = 0.1, s = 2, b = 0),
185 |     list(k = 2.5 * maxsig, mu = -0.2, s = 0.01, b = 0),
186 |     list(k = 3.5 * maxsig, mu = 1.5, s = 0.2, b = 1),
187 |     list(k = 2 *  maxsig, mu = 2, s = 2, b = .0005),
188 |     list(k = 2 * maxsig, mu = -4, s = 1, b = 0),
189 |     list(k = 2 * maxsig, mu = 4, s = 2, b = -.5)
190 |   )
191 | }
192 | 


--------------------------------------------------------------------------------
/R/plot-superSeq.R:
--------------------------------------------------------------------------------
 1 | #' Plotting for superSeq object
 2 | #'
 3 | #' @param x superSeq object
 4 | #' @param ... not used
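 5 | #' @return A \code{ggplot} object showing the observed number of significant genes at each read depth proportion (points) and the superSeq predictions (dashed lines), colored by method; the dotted vertical line marks the current read depth.
 6 | #' @keywords plot
 7 | #' @aliases plot.superSeq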
 8 | #' @export
 9 | plot.superSeq <- function(x, ...) {
10 |   subseq_object <- x$subsample
11 |   predictions <- x$predictions
12 |   cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
13 |   
14 |   p0 <- subseq_object %>% 
15 |     ggplot(aes(proportion, significant, color = method))  +
16 |     geom_point(size = 1) +
17 |     geom_line(data = predictions, aes(y = predicted, x = proportion, color = method), lty = 2, size = 1.0) +
18 |     geom_vline(xintercept = 1, lty = 3) +
19 |     scale_colour_manual(values=cbbPalette) +
20 |     ylab("Number of significant genes") + 
21 |     xlab("Read depth proportion") +
22 |     theme_bw() 
23 |   p0
24 | }
25 | 


--------------------------------------------------------------------------------
/R/summary-superSeq.R:
--------------------------------------------------------------------------------
 1 | #' Summary function of superSeq object
 2 | #' 
 3 | #' @param object superSeq object. Only the first method in the object is summarized.
 4 | #' @param depth Read depth proportions at which to report predictions, relative to the current depth (1 is the current read depth).
 5 | #' @param digits Number of digits to print.
 6 | #' @param ... Not currently used
 7 | #' 
 8 | #' @export
 9 | summary.superSeq <- function(object, depth = c(1, 2, 3), digits = getOption("digits"), ...) {
10 |   # Fitted model and asymptotic number of discoveries (k) for the first method
11 |   model_fit <- object$fits$fit[[1]]
12 |   total_dis <- object$predictions$total_discoveries[1]
13 |   cat("\nCall:\n", deparse(object$call), "\n\n", sep = "")
14 |   # Expected number of discoveries as read depth grows without bound
15 |   cat("Asymptotic number of discoveries:", format(total_dis, digits=digits), "\n", sep="\t")
16 |   cat("\n")
17 |   # Predicted discoveries at each depth; power = predicted / asymptotic discoveries
18 |   new_p <- predict(model_fit[[1]], newdata = data.frame(proportion = depth))
19 |   df <- data.frame(depth, format(new_p, digits = digits), format(new_p / total_dis, digits = digits))
20 | 
21 |   colnames(df) <- c("Read depth proportion", "Estimated number of discoveries", "Estimated experimental power")
22 |   print(df)
23 |   cat("\n")
24 | }


--------------------------------------------------------------------------------
/R/superSeq.R:
--------------------------------------------------------------------------------
 1 | #' Apply superSeq model to subsampling data
 2 | #'
 3 | #' The superSeq function fits a non-linear least squares model to subsampling data to learn the relationship between statistical power and read depth.
 4 | #'
 5 | #' @param object The data.frame returned by calling \code{summary} on the output of \code{subSeq::subsample}.
 6 | #' @param control Control settings (such as convergence criteria) for the nonlinear least squares fits.
 7 | #' See \code{\link{defaultNLSControl}}.
 8 | #' @param starts A list of combinations of starting guesses to try with
 9 | #' \code{nls}. If NULL, uses \code{\link{defaultNLSStarts}}.
10 | #' @param new_p A vector of read depth proportions at which to predict using the superSeq model fits. Defaults to
11 | #' \code{seq(0, 3, 0.05)}, i.e., up to triple the current read depth.
12 | #' 
13 | #' @return An object of class \code{superSeq}, a list with elements:
14 | #' \item{call}{The matched call.}
15 | #' \item{fits}{A data frame with one row per method; its \code{fit} column holds the output of \code{\link{fitnls}}.}
16 | #' \item{subsample}{The subSeq summary object supplied as \code{object}.}
17 | #' \item{predictions}{A data frame with columns \code{method}, \code{total_discoveries} (asymptotic number of discoveries), \code{proportion} (read depth proportion), \code{predicted} (predicted number of DE genes) and \code{estimated_power}.}
18 | #' 
19 | #' @examples
20 | #'\dontrun{
21 | #'library(superSeq)
22 | #'library(subSeq)
23 | #'library(Biobase)
24 | #'# Load bottomly data
25 | #'data(bottomly)
26 | #'bottomly_counts <- exprs(bottomly)
27 | #'bottomly_design <- pData(bottomly)
28 | #'bottomly_counts <- bottomly_counts[rowSums(bottomly_counts) >= 10, ]
29 | #'bottomly_proportions <- 10 ^ seq(-2, 0, 0.1)
30 | #'# Apply subsampling methodology subSeq
31 | #'ss = subsample(counts = bottomly_counts,
32 | #'               proportions = bottomly_proportions,
33 | #'               treatment=bottomly_design$strain, 
34 | #'               method=c("voomLimma"),
35 | #'               replications = 3,
36 | #'               seed = 12345)
37 | #'ss_sum <- summary(ss)
38 | #'
39 | #'# apply superSeq model
40 | #'ss_obj <- superSeq(ss_sum)
41 | #'
42 | #'# plot results
43 | #'plot(ss_obj)
44 | #'
45 | #'# summarise results
46 | #'summary(ss_obj)
47 | #'}
48 | #' 
49 | #' @seealso \code{\link{fitnls}}
50 | #'
51 | #' @import dplyr subSeq purrr ggplot2
52 | #' @export
53 | superSeq <- function(object, control = defaultNLSControl, starts = NULL, new_p = NULL) {
54 |   if (is.null(new_p)) new_p <- seq(0, 3, 0.05)
55 |   # apply non-linear least squares algorithm
56 |   fits <- object %>%
57 |     group_by(method) %>%
58 |     do(fit = fitnls(., control = control, starts = starts)) %>%
59 |     ungroup()
60 |   # use model to provide predictions
61 |   predictions <- fits %>% 
62 |     group_by(method) %>%
63 |     do(data.frame(proportion = new_p,
64 |                   predicted = predict_func(fit =.$fit[[1]][[1]], data = new_p))) %>%
65 |     distinct()
66 |   
67 |   predictions <- fits %>%
68 |     group_by(method) %>%
69 |     do(data.frame(total_discoveries = coef(.$fit[[1]]$fit)[1])) %>%
70 |     right_join(predictions, by = "method") %>%
71 |     mutate(estimated_power = predicted / total_discoveries)
72 |   out <- list(call = match.call(), subsample = object, fits = fits, predictions = predictions)
73 |   class(out) <- "superSeq"
74 |   out
75 | }
76 | 
77 | # Function to get predictions from NLS model fits 
78 | predict_func <- function(fit, data) {
79 |   predict(fit, newdata = data.frame(proportion = data))
80 | }
81 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | superSeq: Determining sufficient read depth in RNA-Seq experiments
 2 | =======
 3 | 
 4 | The superSeq package models the relationship between statistical power and read depth in an RNA sequencing study. Our algorithm can help predict how many additional reads, if any, should be sequenced to achieve desired statistical power.
 5 | 
 6 | See also [superSeq: Determining sufficient sequencing depth in RNA-Seq differential expression studies](https://www.biorxiv.org/content/10.1101/635623v1).
 7 | 
 8 | Installation
 9 | -------------
10 | 
11 | First install the Bioconductor dependencies:
12 | 
13 |     if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
14 |     BiocManager::install(c("qvalue", "limma", "edgeR", "DESeq2", "DEXSeq", "pasilla"))
15 | 
16 | Then install the [devtools](https://github.com/r-lib/devtools) package and use it to install the [subSeq](https://github.com/StoreyLab/subSeq) and superSeq packages:
17 | 
18 |     install.packages("devtools")
19 |     library(devtools)
20 |     install_github("StoreyLab/subSeq")
21 |     install_github("StoreyLab/superSeq", build_opts = c("--no-resave-data", "--no-manual"), build_vignettes = TRUE)
22 | 
23 | Vignette
24 | ---------------------
25 | 
26 | Once you've installed the package, you can access the vignette with
27 | 
28 |     library(superSeq)
29 |     vignette("superSeq")
30 | 
31 | If you run into a problem or have a question about the software's usage, please open a [GitHub issue](https://github.com/StoreyLab/superSeq/issues).
32 | 


--------------------------------------------------------------------------------
/data/bottomly.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/StoreyLab/superSeq/52c287dae1c9daa6c141fa866dc2536298c8b815/data/bottomly.RData


--------------------------------------------------------------------------------
/man/bottomly.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/bottomly.R
 3 | \docType{data}
 4 | \name{bottomly}
 5 | \alias{bottomly}
 6 | \title{RNA-Seq counts from Bottomly et al. (2011)}
 7 | \description{
 8 | An \code{ExpressionSet} object containing the RNA-Seq gene counts from the Bottomly et al. (2011) study.
 9 | 
10 | The data were downloaded from the ReCount database of analysis-ready RNA-Seq datasets (Frazee et al. 2011).
11 | 
12 | Bottomly, D., Walter, N. A., Hunter, J. E., et al. (2011). Evaluating gene expression in C57BL/6J and DBA/2J mouse striatum
13 | using RNA-Seq and microarrays. PLoS One, 6(3), e17820. doi:10.1371/journal.pone.0017820
14 | 
15 | Frazee, A. C., Langmead, B., and Leek, J. T. (2011). ReCount: a multi-experiment resource of analysis-ready
16 | RNA-seq gene count datasets. BMC Bioinformatics, 12, 449.
17 | http://bowtie-bio.sourceforge.net/recount/
18 | }
19 | 


--------------------------------------------------------------------------------
/man/defaultNLSControl.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/nls-functions.R
 3 | \docType{data}
 4 | \name{defaultNLSControl}
 5 | \alias{defaultNLSControl}
 6 | \title{Default control for nonlinear least squares fitting of saturation curves}
 7 | \format{An object of class \code{list} of length 5.}
 8 | \usage{
 9 | defaultNLSControl
10 | }
11 | \description{
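12 | The list returned by \code{nls.control(maxiter = 2000, minFactor = 1/1e3, tol = 1e-2, printEval = FALSE, warnOnly = FALSE)};
13 | pass it as the \code{control} argument of \code{nls}, \code{\link{fitnls}} or \code{\link{superSeq}}. See \code{\link[stats]{nls.control}} for details.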
14 | }
15 | \keyword{datasets}
16 | 


--------------------------------------------------------------------------------
/man/defaultNLSStarts.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/nls-functions.R
 3 | \name{defaultNLSStarts}
 4 | \alias{defaultNLSStarts}
 5 | \title{Default combinations of starting values for nonlinear least squares fitting
 6 | of saturation curves}
 7 | \usage{
 8 | defaultNLSStarts(maxsig)
 9 | }
10 | \arguments{
11 | \item{maxsig}{the maximum number of significant genes in the curve}
12 | }
13 | \value{
14 | a list of lists, each of which serves as a set of starting
15 | parameter estimates for nls
16 | }
17 | \description{
18 | Default combinations of starting values for nonlinear least squares fitting
19 | of saturation curves
20 | }
21 | 


--------------------------------------------------------------------------------
/man/fitnls.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/nls-functions.R
 3 | \name{fitnls}
 4 | \alias{fitnls}
 5 | \title{Fit significance curve using nonlinear least squares}
 6 | \usage{
 7 | fitnls(dat, method = "probit", starts = NULL,
 8 |   control = defaultNLSControl)
 9 | }
10 | \arguments{
11 | \item{dat}{a data.frame from summary.subsamples, containing columns
12 | "proportion", "significant", "pi0" and "genes"}
13 | 
14 | \item{method}{the model to fit. The default, "probit", is the superSeq model; "logit" and "smoother" (a smoothing spline)
15 | are also available but are mainly for use in our manuscript.}
16 | 
17 | \item{starts}{A list of combinations of starting guesses to try with
18 | \code{nls}. If NULL, uses \code{\link{defaultNLSStarts}}}
19 | 
20 | \item{control}{control for \code{nls}, by default \code{\link{defaultNLSControl}}}
21 | }
22 | \value{
23 | A list with element \code{fit} holding the fitted model, or NULL if every set of starting values fails.
24 | }
25 | \description{
26 | Fit a nonlinear least squares model of the number of significant genes as a
27 | function of the proportion of reads used. The default probit model is \code{significant ~ k * plnorm(proportion + b, mu, s)}, where \code{k} is the asymptotic number of discoveries.
28 | Tries each combination of starting parameters in the given order and returns the first successful fit; if all fail, it warns and returns NULL.
29 | }
30 | \seealso{
31 | \code{\link{defaultNLSStarts}}
32 | }
33 | 


--------------------------------------------------------------------------------
/man/plot.superSeq.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/plot-superSeq.R
 3 | \name{plot.superSeq}
 4 | \alias{plot.superSeq}
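 5 | \title{Plotting for superSeq object}
 6 | \usage{
 7 | \method{plot}{superSeq}(x, ...)
 8 | }
 9 | \arguments{
10 | \item{x}{superSeq object}
11 | 
12 | \item{...}{not used}
13 | }
14 | \value{
15 | A \code{ggplot} object showing the observed number of significant genes at each read depth proportion (points) and the superSeq predictions (dashed lines), colored by method; the dotted vertical line marks the current read depth.
16 | }
17 | \description{
18 | Plotting for superSeq object
19 | }
20 | \keyword{plot}
21 | 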


--------------------------------------------------------------------------------
/man/summary.superSeq.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/summary-superSeq.R
 3 | \name{summary.superSeq}
 4 | \alias{summary.superSeq}
 5 | \title{Summary function of superSeq object}
 6 | \usage{
 7 | \method{summary}{superSeq}(object, depth = c(1, 2, 3),
 8 |   digits = getOption("digits"), ...)
 9 | }
10 | \arguments{
11 | \item{object}{superSeq object. Only the first method in the object is summarized.}
12 | 
13 | \item{depth}{Read depth proportions at which to report predictions, relative to the current depth (1 is the current read depth).}
14 | 
15 | \item{digits}{Number of digits to print.}
16 | 
17 | \item{...}{Not currently used}
18 | }
19 | \description{
20 | Summary function of superSeq object
21 | }
22 | 


--------------------------------------------------------------------------------
/man/superSeq.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/superSeq.R
 3 | \name{superSeq}
 4 | \alias{superSeq}
 5 | \title{Apply superSeq model to subsampling data}
 6 | \usage{
 7 | superSeq(object, control = defaultNLSControl, starts = NULL,
 8 |   new_p = NULL)
 9 | }
10 | \arguments{
11 | \item{object}{The data.frame returned by calling \code{summary} on the output of \code{subSeq::subsample}.}
12 | 
13 | \item{control}{Control settings (such as convergence criteria) for the nonlinear least squares fits.
14 | See \code{\link{defaultNLSControl}}.}
15 | 
16 | \item{starts}{A list of combinations of starting guesses to try with
17 | \code{nls}. If NULL, uses \code{\link{defaultNLSStarts}}.}
18 | 
19 | \item{new_p}{A vector of read depth proportions at which to predict using the superSeq model fits. Defaults to
20 | \code{seq(0, 3, 0.05)}, i.e., up to triple the current read depth.}
21 | }
22 | \value{
23 | An object of class \code{superSeq}, a list with elements:
24 | \item{call}{The matched call.}
25 | \item{fits}{A data frame with one row per method; its \code{fit} column holds the output of \code{\link{fitnls}}.}
26 | \item{subsample}{The subSeq summary object supplied as \code{object}.}
27 | \item{predictions}{A data frame with columns \code{method}, \code{total_discoveries} (asymptotic number of discoveries), \code{proportion} (read depth proportion), \code{predicted} (predicted number of DE genes) and \code{estimated_power}.}
28 | }
29 | \description{
30 | The superSeq function fits a non-linear least squares model to subsampling data to learn the relationship between statistical power and read depth.
31 | }
32 | \examples{
33 | \dontrun{
34 | library(superSeq)
35 | library(subSeq)
36 | library(Biobase)
37 | # Load bottomly data
38 | data(bottomly)
39 | bottomly_counts <- exprs(bottomly)
40 | bottomly_design <- pData(bottomly)
41 | bottomly_counts <- bottomly_counts[rowSums(bottomly_counts) >= 10, ]
42 | bottomly_proportions <- 10 ^ seq(-2, 0, 0.1)
43 | # Apply subsampling methodology subSeq
44 | ss = subsample(counts = bottomly_counts,
45 |               proportions = bottomly_proportions,
46 |               treatment=bottomly_design$strain, 
47 |               method=c("voomLimma"),
48 |               replications = 3,
49 |               seed = 12345)
50 | ss_sum <- summary(ss)
51 | 
52 | # apply superSeq model
53 | ss_obj <- superSeq(ss_sum)
54 | 
55 | # plot results
56 | plot(ss_obj)
57 | 
58 | # summarise results
59 | summary(ss_obj)
60 | }
61 | 
62 | }
63 | \seealso{
64 | \code{\link{fitnls}}
65 | }
66 | 


--------------------------------------------------------------------------------
/vignettes/superSeq.Rmd:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: "Determining sufficient read depth with superSeq"
 3 | author: "Andrew J. Bass, David G. Robinson, John D. Storey"
 4 | output: rmarkdown::html_vignette
 5 | vignette: >
 6 |   %\VignetteIndexEntry{Determining sufficient read depth with superSeq}
 7 |   %\VignetteEngine{knitr::rmarkdown}
 8 |   \usepackage[utf8]{inputenc}
 9 | ---
10 | 
11 | ```{r setup, include=FALSE}
12 | knitr::opts_chunk$set(echo = TRUE)
13 | ```
14 | 
15 | The superSeq package can help determine whether a completed study has sufficient read depth to achieve the desired statistical power. Using superSeq, we can predict how much statistical power would increase with additional read depth (for a fixed number of biological replicates). For more details, see our [preprint](https://www.biorxiv.org/content/10.1101/635623v1).
16 | 
17 | ## Quick start guide
18 | 
19 | The superSeq package predicts the relationship between statistical power and read depth using subsampling data. Therefore, we need to apply a subsampling methodology to the experiment and then fit our model using the `superSeq` function. In this package, we use the computationally efficient [subSeq](https://github.com/StoreyLab/subSeq) package. 
20 | 
21 | As an example, let's first load the package and apply the subsampling function `subsample` to the `bottomly` data set:
22 | 
23 | ```{r quick_start, message = FALSE, warning = FALSE}
24 | library(superSeq)
25 | library(subSeq)
26 | library(Biobase)
27 | data(bottomly)
28 | 
29 | # Extract count matrix, experimental design and filter low count genes
30 | bottomly_counts <- exprs(bottomly)
31 | bottomly_design <- pData(bottomly)
32 | bottomly_counts <- bottomly_counts[rowSums(bottomly_counts) >= 10, ]
33 | 
34 | # Generate the subsampling data for this study at specified proportions
35 | bottomly_proportions <- 10 ^ seq(-2, 0, 0.1)
36 | ss = subsample(counts = bottomly_counts,
37 |                proportions = bottomly_proportions,
38 |                treatment = bottomly_design$strain, 
39 |                method = c("voomLimma"),
40 |                replications = 3,
41 |                seed = 12345)
42 | ss_sum <- summary(ss)
43 | head(ss_sum)
44 | ```
45 | 
46 | Type `?subSeq` for additional details on the subsampling implementation. Now that we have the subsampling results, we can apply the `superSeq` function and view the predictions from the model using the `plot` function:
47 | 
48 | ```{r, fig.width=5}
49 | ss_obj <- superSeq(ss_sum)
50 | plot(ss_obj)
51 | ```
52 | 
53 | The plot indicates that the study is undersaturated, i.e., the study can expect a substantial increase in statistical power from sequencing additional reads. We can extract a summary of the predictions as follows:
54 | 
55 | ```{r}
56 | summary(ss_obj)
57 | ```
58 | 
59 | The asymptotic number of discoveries is the expected number of differentially expressed genes as read depth increases without bound, i.e., when technical variability is minimized. The estimated power at a given depth is the predicted number of discoveries divided by this asymptote. Thus the current read depth provides an estimated 53.3\% of the total power, and doubling or tripling the read depth is predicted to provide a 14.3\% or 21.8\% increase in power, respectively.
60 | 
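61 | ## How the power estimates are computed
62 | 
63 | Each estimated power in the summary is a ratio. The numerator is the predicted number of discoveries at a given read depth. The denominator is the fitted asymptotic number of discoveries, which is the parameter `k` in the default probit model `significant ~ k * plnorm(proportion + b, mu, s)` used by `fitnls`. The chunk below is a minimal, self-contained sketch of that calculation on simulated data, under several simplifying assumptions. The parameter values are hypothetical, the offset `b` is fixed at 0, and the fit uses a single starting guess with the default `nls` algorithm. superSeq itself fits all four parameters with bounded `"port"` fits, trying the starting values in `defaultNLSStarts()`.
64 | 
65 | ```{r power_sketch}
66 | # Simulate a saturation curve of the probit form (hypothetical parameter values)
67 | set.seed(1)
68 | toy <- data.frame(proportion = 10 ^ seq(-2, 0, 0.1))
69 | toy$significant <- 5000 * plnorm(toy$proportion, meanlog = -1, sdlog = 1.5) +
70 |   rnorm(nrow(toy), sd = 20)
71 | 
72 | # Fit k * plnorm(proportion, mu, s), with b fixed at 0 and one starting guess
73 | toy_fit <- nls(significant ~ k * plnorm(proportion, mu, s), data = toy,
74 |                start = list(k = 4500, mu = -0.8, s = 1.3))
75 | 
76 | # Estimated power = predicted discoveries / asymptotic discoveries (k)
77 | k_hat <- coef(toy_fit)[["k"]]
78 | depth <- c(1, 2, 3)
79 | predicted <- predict(toy_fit, newdata = data.frame(proportion = depth))
80 | power <- predicted / k_hat
81 | data.frame(depth, predicted = round(predicted), power = round(power, 3))
82 | ```
83 | 
84 | `summary(ss_obj)` reports the same ratio as "Estimated experimental power", using the fitted `k` of the first method as the denominator.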


--------------------------------------------------------------------------------