├── .Rbuildignore
├── .gitignore
├── .travis.yml
├── DESCRIPTION
├── NAMESPACE
├── NEWS.md
├── R
    ├── bootstrapify.R
    ├── collect.R
    ├── compat-dplyr.R
    ├── compat-purrr.R
    ├── compat-tidyr.R
    ├── samplify.R
    ├── util.R
    └── zzz.R
├── README.Rmd
├── README.md
├── _pkgdown.yml
├── codecov.yml
├── cran-comments.md
├── man
    ├── bootstrapify.Rd
    ├── collect.resampled_df.Rd
    ├── figures
    │   ├── README-bootstrap-plots-1.png
    │   ├── README-non-bootstrap-plots-1.png
    │   └── strap-god.jpg
    ├── reexports.Rd
    └── samplify.Rd
├── strapgod.Rproj
├── tests
    ├── testthat.R
    └── testthat
    │   ├── test-bootstrapify.R
    │   ├── test-collect.R
    │   ├── test-compat-tidyr.R
    │   ├── test-dplyr-compat.R
    │   ├── test-dplyr-do.R
    │   ├── test-dplyr-group-funs.R
    │   ├── test-dplyr-summarise.R
    │   └── test-samplify.R
└── vignettes
    ├── .gitignore
    ├── dplyr-support.Rmd
    └── virtual-bootstraps.Rmd


/.Rbuildignore:
--------------------------------------------------------------------------------
 1 | ^CRAN-RELEASE$
 2 | ^cran-comments\.md$
 3 | ^pkgdown$
 4 | ^_pkgdown\.yml$
 5 | ^docs$
 6 | ^codecov\.yml$
 7 | ^\.travis\.yml$
 8 | ^README\.Rmd$
 9 | ^strapgod\.Rproj$
10 | ^\.Rproj\.user$
11 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | docs/
2 | inst/doc
3 | .Rhistory
4 | .RData
5 | .Rproj.user
6 | 


--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
 1 | # R for travis: see documentation at https://docs.travis-ci.com/user/languages/r
 2 | 
 3 | language: r
 4 | sudo: false
 5 | cache: packages
 6 | 
 7 | matrix:
 8 |   include:
 9 |   - r: devel
10 |   - r: release
11 |     after_success:
12 |     - Rscript -e 'covr::codecov()'
13 |     before_cache:
14 |       - Rscript -e 'remotes::install_cran("pkgdown")'
15 |       # For the README
16 |       - Rscript -e 'remotes::install_cran("ggplot2")'
17 |     deploy:
18 |       provider: script
19 |       script: Rscript -e 'pkgdown::deploy_site_github(run_dont_run = TRUE)'
20 |       skip_cleanup: true
21 |   - r: oldrel
22 |   - r: 3.2
23 | 


--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
 1 | Package: strapgod
 2 | Title: Resampled Data Frames
 3 | Version: 0.0.4.9000
 4 | Authors@R: 
 5 |     person(given = "Davis",
 6 |            family = "Vaughan",
 7 |            role = c("aut", "cre"),
 8 |            email = "davis@rstudio.com")
 9 | Description: Create data frames with virtual groups that can be used with 
10 |     'dplyr' to efficiently compute resampled statistics, generate the data for
11 |     hypothetical outcome plots, and fit multiple models on resampled variations
12 |     of the original data.
13 | License: GPL-3
14 | Encoding: UTF-8
15 | LazyData: true
16 | Roxygen: list(markdown = TRUE)
17 | RoxygenNote: 6.1.1
18 | Depends:
19 |     R (>= 3.2.0)
20 | Imports: 
21 |     dplyr (>= 0.8.3),
22 |     tidyr (>= 1.0.0),
23 |     rlang,
24 |     tibble,
25 |     lifecycle
26 | Suggests: 
27 |     broom,
28 |     testthat,
29 |     covr,
30 |     knitr,
31 |     rmarkdown
32 | URL: https://github.com/DavisVaughan/strapgod
33 | BugReports: https://github.com/DavisVaughan/strapgod/issues
34 | VignetteBuilder: knitr
35 | 


--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
 1 | # Generated by roxygen2: do not edit by hand
 2 | 
 3 | S3method(anti_join,resampled_df)
 4 | S3method(arrange,resampled_df)
 5 | S3method(arrange_,resampled_df)
 6 | S3method(bootstrapify,data.frame)
 7 | S3method(bootstrapify,grouped_df)
 8 | S3method(bootstrapify,tbl_df)
 9 | S3method(collect,resampled_df)
10 | S3method(distinct,resampled_df)
11 | S3method(filter,resampled_df)
12 | S3method(full_join,resampled_df)
13 | S3method(group_by,resampled_df)
14 | S3method(group_indices,resampled_df)
15 | S3method(group_indices_,resampled_df)
16 | S3method(group_nest,resampled_df)
17 | S3method(group_split,resampled_df)
18 | S3method(inner_join,resampled_df)
19 | S3method(left_join,resampled_df)
20 | S3method(mutate,resampled_df)
21 | S3method(mutate_,resampled_df)
22 | S3method(nest,resampled_df)
23 | S3method(pull,resampled_df)
24 | S3method(rename,resampled_df)
25 | S3method(right_join,resampled_df)
26 | S3method(samplify,data.frame)
27 | S3method(samplify,grouped_df)
28 | S3method(samplify,tbl_df)
29 | S3method(select,resampled_df)
30 | S3method(semi_join,resampled_df)
31 | S3method(slice,resampled_df)
32 | S3method(slice_,resampled_df)
33 | S3method(summarise,resampled_df)
34 | S3method(summarise_,resampled_df)
35 | S3method(transmute,resampled_df)
36 | export(bootstrapify)
37 | export(filter)
38 | export(samplify)
39 | importFrom(dplyr,anti_join)
40 | importFrom(dplyr,arrange)
41 | importFrom(dplyr,arrange_)
42 | importFrom(dplyr,collect)
43 | importFrom(dplyr,distinct)
44 | importFrom(dplyr,filter)
45 | importFrom(dplyr,full_join)
46 | importFrom(dplyr,group_by)
47 | importFrom(dplyr,group_indices)
48 | importFrom(dplyr,group_indices_)
49 | importFrom(dplyr,group_nest)
50 | importFrom(dplyr,group_split)
51 | importFrom(dplyr,inner_join)
52 | importFrom(dplyr,left_join)
53 | importFrom(dplyr,mutate)
54 | importFrom(dplyr,mutate_)
55 | importFrom(dplyr,pull)
56 | importFrom(dplyr,rename)
57 | importFrom(dplyr,right_join)
58 | importFrom(dplyr,select)
59 | importFrom(dplyr,semi_join)
60 | importFrom(dplyr,slice)
61 | importFrom(dplyr,slice_)
62 | importFrom(dplyr,summarise)
63 | importFrom(dplyr,summarise_)
64 | importFrom(dplyr,transmute)
65 | importFrom(lifecycle,deprecated)
66 | importFrom(rlang,":=")
67 | importFrom(tidyr,nest)
68 | 


--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
 1 | # strapgod (development version)
 2 | 
 3 | # strapgod 0.0.4
 4 | 
 5 | * The resampled data frame method for `tidyr::nest()` has been updated to
 6 |   support the changes in tidyr 1.0.0.
 7 | 
 8 | # strapgod 0.0.3
 9 | 
10 | * In dplyr 0.8.2, the `ptype` is carried along as an attribute 
11 |   in `group_split()`. A test was updated to reflect this.
12 | 
13 | * In dplyr 0.8.2, the behavior of `tbl_vars()` was changed to return a classed
14 |   object. A test has been updated to reflect this (#13).
15 | 
16 | # strapgod 0.0.2
17 | 
18 | * In dplyr 0.8.1, the behavior of `group_map()` was moved to `group_modify()`,
19 |   and `group_map()` was repurposed to always return a list. strapgod has been
20 |   updated to reflect these changes.
21 | 
22 | # strapgod 0.0.1
23 | 
24 | * Added a `NEWS.md` file to track changes to the package.
25 | 


--------------------------------------------------------------------------------
/R/bootstrapify.R:
--------------------------------------------------------------------------------
 1 | #' Create a bootstrapped tibble
 2 | #'
 3 | #' @description
 4 | #'
 5 | #' `bootstrapify()` creates a bootstrapped tibble with _virtual groups_.
 6 | #'
 7 | #' @inherit samplify details
 8 | #'
 9 | #' @inherit samplify return
10 | #'
11 | #' @seealso [collect.resampled_df()]
12 | #'
13 | #' @inheritParams samplify
14 | #'
15 | #' @examples
16 | #' library(dplyr)
17 | #' library(broom)
18 | #'
19 | #' bootstrapify(iris, 5)
20 | #'
21 | #' iris %>%
22 | #'   bootstrapify(5) %>%
23 | #'   summarise(per_strap_mean = mean(Petal.Width))
24 | #'
25 | #' iris %>%
26 | #'   group_by(Species) %>%
27 | #'   bootstrapify(5) %>%
28 | #'   summarise(per_strap_species_mean = mean(Petal.Width))
29 | #'
30 | #' iris %>%
31 | #'   bootstrapify(5) %>%
32 | #'   do(tidy(lm(Sepal.Width ~ Sepal.Length + Species, data = .)))
33 | #'
34 | #' # Alternatively, use the newer group_modify()
35 | #' iris %>%
36 | #'   bootstrapify(5) %>%
37 | #'   group_modify(~tidy(lm(Sepal.Width ~ Sepal.Length + Species, data = .x)))
38 | #'
39 | #' # Alter the name of the group with `key`
40 | #' # Materialize them with collect()
41 | #' straps <- bootstrapify(iris, 5, key = ".straps")
42 | #' collect(straps)
43 | #'
44 | #' @family virtual samplers
45 | #'
46 | #' @name bootstrapify
47 | 
48 | #' @rdname bootstrapify
49 | #' @export
50 | bootstrapify <- function(data, times, ..., key = ".bootstrap") {
51 |   UseMethod("bootstrapify")
52 | }
53 | 
54 | #' @export
55 | bootstrapify.data.frame <- function(data, times, ..., key = ".bootstrap") {
56 |   bootstrapify(dplyr::as_tibble(data), times, ..., key = key)
57 | }
58 | 
59 | #' @export
60 | bootstrapify.tbl_df <- function(data, times, ..., key = ".bootstrap") {
61 |   samplify(
62 |     data = data,
63 |     times = times,
64 |     size = nrow(data),
65 |     ...,
66 |     replace = TRUE,
67 |     key = key
68 |   )
69 | }
70 | 
71 | #' @export
72 | bootstrapify.grouped_df <- function(data, times, ..., key = ".bootstrap") {
73 |   samplify(
74 |     data = data,
75 |     times = times,
76 |     size = dplyr::group_size(data),
77 |     ...,
78 |     replace = TRUE,
79 |     key = key
80 |   )
81 | }
82 | 


--------------------------------------------------------------------------------
/R/collect.R:
--------------------------------------------------------------------------------
  1 | #' Force virtual groups to become explicit rows
  2 | #'
  3 | #' When `collect()` is used on a `resampled_df`, the virtual bootstrap groups
  4 | #' are made explicit.
  5 | #'
  6 | #' @param x A `resampled_df`.
  7 | #'
  8 | #' @param ... Not used.
  9 | #'
 10 | #' @param id Optional. A single character that specifies a name for a column
 11 | #' containing a sequence from `1:n` for each bootstrap group.
 12 | #'
 13 | #' @param original_id Optional. A single character that specifies a name for
 14 | #' a column containing the original position of the bootstrapped row.
 15 | #'
 16 | #' @examples
 17 | #' library(dplyr)
 18 | #'
 19 | #' # virtual groups become real rows
 20 | #' collect(bootstrapify(iris, 5))
 21 | #'
 22 | #' # add on the id column for an identifier per bootstrap
 23 | #' collect(bootstrapify(iris, 5), id = ".id")
 24 | #'
 25 | #' # add on the original_id column to know which row this bootstrapped row
 26 | #' # originally came from
 27 | #' collect(bootstrapify(iris, 5), original_id = ".original_id")
 28 | #'
 29 | #' @export
 30 | collect.resampled_df <- function(x, ..., id = NULL, original_id = NULL) {
 31 | 
 32 |   check_empty_dots(...)
 33 |   validate_null_or_single_character(id, "id")
 34 |   validate_null_or_single_character(original_id, "original_id")
 35 | 
 36 |   group_syms <- dplyr::groups(x)
 37 |   group_tbl <- dplyr::group_data(x)
 38 |   x <- dplyr::ungroup(x)
 39 | 
 40 |   # The only column names that are not in the x and are not '.rows' is the .key
 41 |   # Could potentially have multiple bootstrap columns
 42 |   .key <- setdiff(colnames(group_tbl), c(colnames(x), ".rows"))
 43 | 
 44 |   # Strip off non-virtual groups
 45 |   .out <- dplyr::select(group_tbl, !!!.key, .rows)
 46 | 
 47 |   # Order of these calls matters
 48 |   .out <- maybe_use_id(.out, id)
 49 |   .out <- add_straps(.out, x)
 50 |   .out <- maybe_use_original_id(.out, original_id)
 51 | 
 52 |   # Flatten
 53 |   .out <- tidyr::unnest(.out, cols = c(!!id, !!original_id, ...x))
 54 | 
 55 |   .out <- dplyr::group_by(.out, !!!group_syms)
 56 | 
 57 |   .out
 58 | }
 59 | 
 60 | # ------------------------------------------------------------------------------
 61 | 
 62 | # id = 1:n for each group
 63 | maybe_use_id <- function(.out, id) {
 64 | 
 65 |   if(!is.null(id)) {
 66 | 
 67 |     id_col <- map(.out[[".rows"]], seq_along)
 68 | 
 69 |     .out <- tibble::add_column(.out, !!id := id_col, .before = ".rows")
 70 |   }
 71 | 
 72 |   .out
 73 | }
 74 | 
 75 | # Repeat `x` rows to generate the bootstraps
 76 | # Does vctrs::vec_slice() actually speed this up?
 77 | # Limited benchmarking seemed inconclusive
 78 | add_straps <- function(.out, x) {
 79 | 
 80 |   .out[["...x"]] <- map(
 81 |     .x = .out[[".rows"]],
 82 |     .f = function(idx) x[idx, , drop = FALSE]
 83 |   )
 84 | 
 85 |   .out
 86 | }
 87 | 
 88 | maybe_use_original_id <- function(.out, original_id) {
 89 | 
 90 |   if (!is.null(original_id)) {
 91 |     .out <- dplyr::rename(.out, !!original_id := .rows)
 92 |   }
 93 |   else {
 94 |     .out[[".rows"]] <- NULL
 95 |   }
 96 | 
 97 |   .out
 98 | }
 99 | 
100 | # ------------------------------------------------------------------------------
101 | 
102 | validate_null_or_single_character <- function(.x, .x_nm) {
103 | 
104 |   if (is.null(.x)) {
105 |     return(invisible(.x))
106 |   }
107 | 
108 |   if (!rlang::is_scalar_character(.x)) {
109 |     msg <- paste0("`", .x_nm, "` must be a character of size 1.")
110 |     rlang::abort(msg)
111 |   }
112 | 
113 |   invisible(.x)
114 | }
115 | 


--------------------------------------------------------------------------------
/R/compat-dplyr.R:
--------------------------------------------------------------------------------
  1 | # ------------------------------------------------------------------------------
  2 | # Interesting dplyr functions
  3 | 
  4 | # summarise()
  5 | # do()
  6 | # ungroup()
  7 | # group_nest()
  8 | # group_map()
  9 | # group_modify()
 10 | # group_walk()
 11 | # group_split()
 12 | # group_keys()
 13 | # group_indices()
 14 | 
 15 | # In theory we could let the default `summarise()` do its thing. But if the
 16 | # user did a double `bootstrapify()` call, only one level of it will be removed
 17 | # and the post-summarise() object will still be a resampled_df, even though
 18 | # all of the bootstrap rows have been materialized.
 19 | 
 20 | #' @importFrom dplyr summarise
 21 | #' @export
 22 | summarise.resampled_df <- function(.data, ...) {
 23 |   maybe_new_grouped_df(NextMethod())
 24 | }
 25 | 
 26 | # For `group_nest()`, the default method works unless `keep = TRUE`. In that
 27 | # case, we need to `collect()` so the groups are available to be 'kept'.
 28 | 
 29 | #' @importFrom dplyr group_nest
 30 | #' @export
 31 | group_nest.resampled_df <- function(.tbl, ..., .key = "data", keep = FALSE) {
 32 | 
 33 |   if (keep) {
 34 |     dplyr::group_nest(collect(.tbl), ..., .key = .key, keep = keep)
 35 |   }
 36 |   else {
 37 |     NextMethod()
 38 |   }
 39 | 
 40 | }
 41 | 
 42 | # Same idea as group_nest()
 43 | 
 44 | #' @importFrom dplyr group_split
 45 | #' @export
 46 | group_split.resampled_df <- function(.tbl, ..., keep = TRUE) {
 47 | 
 48 |   if (keep) {
 49 |     dplyr::group_split(collect(.tbl), ..., keep = keep)
 50 |   }
 51 |   else {
 52 |     NextMethod()
 53 |   }
 54 | 
 55 | }
 56 | 
 57 | # `group_indices()` returns garbage unless we `collect()` first
 58 | 
 59 | #' @importFrom dplyr group_indices
 60 | #' @export
 61 | group_indices.resampled_df <- function(.data, ...) {
 62 |   dplyr::group_indices(collect(.data), ...)
 63 | }
 64 | 
 65 | # ------------------------------------------------------------------------------
 66 | # Interesting dplyr functions - Standard evaluation backwards compat
 67 | 
 68 | # nocov start
 69 | 
 70 | #' @importFrom dplyr summarise_
 71 | #' @export
 72 | summarise_.resampled_df <- function(.data, ..., .dots = list()) {
 73 |   maybe_new_grouped_df(NextMethod())
 74 | }
 75 | 
 76 | #' @importFrom dplyr group_indices_
 77 | #' @export
 78 | group_indices_.resampled_df <- function(.data, ..., .dots = list()) {
 79 |   dplyr::group_indices_(collect(.data), ..., .dots = list())
 80 | }
 81 | 
 82 | # nocov end
 83 | 
 84 | # ------------------------------------------------------------------------------
 85 | # dplyr support
 86 | 
 87 | #' @importFrom dplyr mutate
 88 | #' @export
 89 | mutate.resampled_df <- function(.data, ...) {
 90 |   dplyr::mutate(collect(.data), ...)
 91 | }
 92 | 
 93 | #' @importFrom dplyr transmute
 94 | #' @export
 95 | transmute.resampled_df <- function(.data, ...) {
 96 |   dplyr::transmute(collect(.data), ...)
 97 | }
 98 | 
 99 | # Required to export filter, otherwise:
100 | # Warning: declared S3 method 'filter.resampled_df' not found
101 | # because of stats::filter
102 | 
103 | #' @export
104 | dplyr::filter
105 | 
106 | #' @importFrom dplyr filter
107 | #' @export
108 | filter.resampled_df <- function(.data, ...) {
109 |   dplyr::filter(collect(.data), ...)
110 | }
111 | 
112 | #' @importFrom dplyr arrange
113 | #' @export
114 | arrange.resampled_df <- function(.data, ...) {
115 |   dplyr::arrange(collect(.data), ...)
116 | }
117 | 
118 | #' @importFrom dplyr distinct
119 | #' @export
120 | distinct.resampled_df <- function(.data, ..., .keep_all = FALSE) {
121 |   dplyr::distinct(collect(.data), ..., .keep_all = .keep_all)
122 | }
123 | 
124 | #' @importFrom dplyr select
125 | #' @export
126 | select.resampled_df <- function(.data, ...) {
127 |   dplyr::select(collect(.data), ...)
128 | }
129 | 
130 | #' @importFrom dplyr slice
131 | #' @export
132 | slice.resampled_df <- function(.data, ...) {
133 |   dplyr::slice(collect(.data), ...)
134 | }
135 | 
136 | #' @importFrom dplyr pull
137 | #' @export
138 | pull.resampled_df <- function(.data, var = -1) {
139 |   dplyr::pull(collect(.data), var = !!rlang::enquo(var))
140 | }
141 | 
142 | #' @importFrom dplyr rename
143 | #' @export
144 | rename.resampled_df <- function(.data, ...) {
145 |   dplyr::rename(collect(.data), ...)
146 | }
147 | 
148 | #' @importFrom dplyr full_join
149 | #' @export
150 | full_join.resampled_df <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) {
151 |   dplyr::full_join(collect(x), collect(y), by = by, copy = copy, suffix = suffix, ...)
152 | }
153 | 
154 | #' @importFrom dplyr inner_join
155 | #' @export
156 | inner_join.resampled_df <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) {
157 |   dplyr::inner_join(collect(x), collect(y), by = by, copy = copy, suffix = suffix, ...)
158 | }
159 | 
160 | #' @importFrom dplyr left_join
161 | #' @export
162 | left_join.resampled_df <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) {
163 |   dplyr::left_join(collect(x), collect(y), by = by, copy = copy, suffix = suffix, ...)
164 | }
165 | 
166 | #' @importFrom dplyr right_join
167 | #' @export
168 | right_join.resampled_df <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) {
169 |   dplyr::right_join(collect(x), collect(y), by = by, copy = copy, suffix = suffix, ...)
170 | }
171 | 
172 | #' @importFrom dplyr anti_join
173 | #' @export
174 | anti_join.resampled_df <- function(x, y, by = NULL, copy = FALSE, ...) {
175 |   dplyr::anti_join(collect(x), collect(y), by = by, copy = copy, ...)
176 | }
177 | 
178 | #' @importFrom dplyr semi_join
179 | #' @export
180 | semi_join.resampled_df <- function(x, y, by = NULL, copy = FALSE, ...) {
181 |   dplyr::semi_join(collect(x), collect(y), by = by, copy = copy, ...)
182 | }
183 | 
184 | #' @importFrom dplyr group_by
185 | #' @export
186 | group_by.resampled_df <- function(.data, ..., add = FALSE, .drop = FALSE) {
187 | 
188 |   if (add) {
189 |     .data <- collect(.data)
190 |   }
191 |   else {
192 |     .data <- dplyr::ungroup(.data)
193 |   }
194 | 
195 |  dplyr::group_by(.data, ..., add = add, .drop = .drop)
196 | }
197 | 
198 | # ------------------------------------------------------------------------------
199 | # Backwards compat support for deprecated standard eval dplyr
200 | 
201 | # nocov start
202 | 
203 | # Only a few of them need it. arrange_.grouped_df()
204 | # directly calls arrange_impl() causing a problem.
205 | 
206 | #' @importFrom dplyr arrange_
207 | #' @export
208 | arrange_.resampled_df <- function(.data, ..., .dots = list()) {
209 |   dplyr::arrange_(collect(.data), ..., .dots = .dots)
210 | }
211 | 
212 | #' @importFrom dplyr mutate_
213 | #' @export
214 | mutate_.resampled_df <- function(.data, ..., .dots = list()) {
215 |   dplyr::mutate_(collect(.data), ..., .dots = .dots)
216 | }
217 | 
218 | #' @importFrom dplyr slice_
219 | #' @export
220 | slice_.resampled_df <- function(.data, ..., .dots = list()) {
221 |   dplyr::slice_(collect(.data), ..., .dots = .dots)
222 | }
223 | 
224 | # nocov end
225 | 
226 | # ------------------------------------------------------------------------------
227 | # Util
228 | 
229 | maybe_new_grouped_df <- function(x) {
230 | 
231 |   if (dplyr::is_grouped_df(x)) {
232 |     x <- dplyr::new_grouped_df(x = x, groups = dplyr::group_data(x))
233 |   }
234 | 
235 |   x
236 | }
237 | 


--------------------------------------------------------------------------------
/R/compat-purrr.R:
--------------------------------------------------------------------------------
 1 | # nocov start - compat-purrr (last updated: rlang 0.3.0.9000)
 2 | 
 3 | # This file serves as a reference for compatibility functions for
 4 | # purrr. They are not drop-in replacements but allow a similar style
 5 | # of programming. This is useful in cases where purrr is too heavy a
 6 | # package to depend on. Please find the most recent version in rlang's
 7 | # repository.
 8 | 
 9 | map <- function(.x, .f, ...) {
10 |   lapply(.x, .f, ...)
11 | }
12 | 
13 | map2 <- function(.x, .y, .f, ...) {
14 |   out <- mapply(.f, .x, .y, MoreArgs = list(...), SIMPLIFY = FALSE)
15 |   if (length(out) == length(.x)) {
16 |     rlang::set_names(out, names(.x))
17 |   } else {
18 |     rlang::set_names(out, NULL)
19 |   }
20 | }
21 | 
22 | # nocov end
23 | 


--------------------------------------------------------------------------------
/R/compat-tidyr.R:
--------------------------------------------------------------------------------
 1 | # ------------------------------------------------------------------------------
 2 | # tidyr support
 3 | 
 4 | #' @importFrom tidyr nest
 5 | #' @importFrom lifecycle deprecated
 6 | #' @export
 7 | nest.resampled_df <- function(.data, ..., .key = deprecated()) {
 8 |   tidyr::nest(collect(.data), ..., .key = .key)
 9 | }
10 | 


--------------------------------------------------------------------------------
/R/samplify.R:
--------------------------------------------------------------------------------
  1 | #' Created a resampled tibble
  2 | #'
  3 | #' `samplify()` creates a resampled tibble with _virtual groups_.
  4 | #'
  5 | #' @details
  6 | #'
  7 | #' The following functions have special / interesting behavior when used with
  8 | #' a `resampled_df`:
  9 | #'
 10 | #' - [dplyr::collect()]
 11 | #'
 12 | #' - [dplyr::summarise()]
 13 | #'
 14 | #' - [dplyr::do()]
 15 | #'
 16 | #' - [dplyr::group_map()]
 17 | #'
 18 | #' - [dplyr::group_modify()]
 19 | #'
 20 | #' - [dplyr::group_walk()]
 21 | #'
 22 | #' - [dplyr::group_nest()]
 23 | #'
 24 | #' - [dplyr::group_split()]
 25 | #'
 26 | #' @param data A tbl.
 27 | #'
 28 | #' @param times A single integer specifying the number of resamples.
 29 | #' If the `tibble` is grouped, this is the number of resamples per group.
 30 | #'
 31 | #' @param size A single integer specifying the size of each resample. For a
 32 | #' grouped data frame, this is also allowed to be an integer vector with size
 33 | #' equal to the number of groups in `data`. This can be helpful when sampling
 34 | #' without replacement when the number of rows per group is very different.
 35 | #'
 36 | #' @param ... Not used.
 37 | #'
 38 | #' @param replace Whether or not to sample with replacement.
 39 | #'
 40 | #' @param key A single character specifying the name of the virtual group
 41 | #' that is added.
 42 | #'
 43 | #' @return A `resampled_df` with an extra group specified by the `key`.
 44 | #'
 45 | #' @examples
 46 | #' library(dplyr)
 47 | #' library(broom)
 48 | #'
 49 | #' samplify(iris, times = 3, size = 20)
 50 | #'
 51 | #' iris %>%
 52 | #'   samplify(times = 3, size = 20) %>%
 53 | #'   summarise(per_strap_mean = mean(Petal.Width))
 54 | #'
 55 | #' iris %>%
 56 | #'   group_by(Species) %>%
 57 | #'   samplify(times = 3, size = 20) %>%
 58 | #'   summarise(per_strap_species_mean = mean(Petal.Width))
 59 | #'
 60 | #' # Alter the name of the group with `key`
 61 | #' # Materialize them with collect()
 62 | #' samps <- samplify(iris, times = 3, size = 5, key = ".samps")
 63 | #' collect(samps)
 64 | #'
 65 | #' collect(samps, id = ".id", original_id = ".orig_id")
 66 | #'
 67 | #' #----------------------------------------------------------------------------
 68 | #'
 69 | #' # Be careful not to specify a `size` larger
 70 | #' # than one of your groups! This will throw an error.
 71 | #'
 72 | #' iris_group_sizes_of_50_and_5 <- iris[1:55,] %>%
 73 | #'   group_by(Species) %>%
 74 | #'   group_trim()
 75 | #'
 76 | #' count(iris_group_sizes_of_50_and_5, Species)
 77 | #'
 78 | #' # size = 10 > min_group_size = 5
 79 | #' \dontrun{
 80 | #' iris_group_sizes_of_50_and_5 %>%
 81 | #'   samplify(times = 2, size = 10)
 82 | #' }
 83 | #'
 84 | #' # Instead, pass a vector of sizes to `samplify()` if this
 85 | #' # structure is absolutely required for your use case.
 86 | #'
 87 | #' # size of 10 for the first group
 88 | #' # size of 5 for the second group
 89 | #' # total number of rows is 10 * 2 + 5 * 2 = 30
 90 | #' iris_group_sizes_of_50_and_5 %>%
 91 | #'   samplify(times = 2, size = c(10, 5)) %>%
 92 | #'   collect()
 93 | #'
 94 | #' @family virtual samplers
 95 | #'
 96 | #' @seealso [collect.resampled_df()]
 97 | #'
 98 | #' @name samplify
 99 | NULL
100 | 
101 | #' @rdname samplify
102 | #' @export
103 | samplify <- function(data, times, size, ...,
104 |                      replace = FALSE, key = ".sample") {
105 | 
106 |   check_empty_dots(...)
107 |   validate_is_scalar_character(key, "key")
108 |   validate_is_bool(replace, "replace")
109 |   validate_is_scalar_positive_integerish(times, "times")
110 |   validate_is_positive_integerish(size, "size")
111 | 
112 |   UseMethod("samplify")
113 | }
114 | 
115 | #' @export
116 | samplify.data.frame <- function(data, times, size, ...,
117 |                                 replace = FALSE, key = ".sample") {
118 |   samplify(
119 |     data = dplyr::as_tibble(data),
120 |     times = times,
121 |     size = size,
122 |     ...,
123 |     replace = replace,
124 |     key = key
125 |   )
126 | }
127 | 
128 | #' @export
129 | samplify.tbl_df <- function(data, times, size, ...,
130 |                             replace = FALSE, key = ".sample") {
131 | 
132 |   # additional check for size 1 here
133 |   validate_is(
134 |     size, "size", rlang::is_scalar_integerish,
135 |     "a single integer (for ungrouped data frames)"
136 |   )
137 | 
138 |   .row_slice_ids <- seq_len(nrow(data))
139 | 
140 |   group_info <- index_sampler(
141 |     .row_slice_ids = .row_slice_ids,
142 |     times = times,
143 |     key = key,
144 |     size = size,
145 |     replace = replace
146 |   )
147 | 
148 |   dplyr::new_grouped_df(data, group_info, class = "resampled_df")
149 | }
150 | 
151 | #' @export
152 | samplify.grouped_df <- function(data, times, size, ...,
153 |                                 replace = FALSE, key = ".sample") {
154 | 
155 |   size <- recycle_size(size, dplyr::n_groups(data))
156 | 
157 |   # extract existing group_info
158 |   group_info <- dplyr::group_data(data)
159 |   index_list <- group_info[[".rows"]]
160 | 
161 |   new_row_index_tbl <- map2(
162 |     .x = index_list,
163 |     .y = size,
164 |     .f = function(.x, .y) {
165 |       index_sampler(
166 |         .row_slice_ids = .x,
167 |         times = times,
168 |         key = key,
169 |         size = .y,
170 |         replace = replace
171 |       )
172 |     }
173 |   )
174 | 
175 |   # overwrite current .rows and unnest
176 |   group_info[[".rows"]] <- new_row_index_tbl
177 |   group_info <- tidyr::unnest(group_info, cols = .rows, names_repair = "universal")
178 | 
179 |   dplyr::new_grouped_df(data, group_info, class = "resampled_df")
180 | }
181 | 
182 | # ------------------------------------------------------------------------------
183 | # Utility
184 | 
185 | # Actually perform the resampling of the row indices
186 | # and create the group tbl information from that
187 | 
188 | index_sampler <- function(.row_slice_ids,
189 |                           times,
190 |                           key,
191 |                           size,
192 |                           replace = FALSE) {
193 | 
194 |   check_size(size, length(.row_slice_ids), replace)
195 | 
196 |   .bootstrap_id <- seq_len(times)
197 | 
198 |   # must unquote the colname as `.rows` is an arg to tibble()
199 |   .row_col <- ".rows"
200 | 
201 |   .index_list <- replicate(
202 |     n = times,
203 |     expr = sample(
204 |       x = .row_slice_ids,
205 |       size = size,
206 |       replace = replace,
207 |       prob = NULL
208 |     ),
209 |     simplify = FALSE
210 |   )
211 | 
212 |   dplyr::tibble(
213 |     !!key := .bootstrap_id,
214 |     !!.row_col := .index_list
215 |   )
216 | 
217 | }
218 | 
219 | # dplyr:::check_size()
220 | check_size <- function (size, n, replace = FALSE) {
221 |   if (size <= n || replace)
222 |     return(invisible(size))
223 | 
224 |   msg <- paste0(
225 |     "`size` (%i) must be less than or equal to the ",
226 |     "size of the data / current group (%i), ",
227 |     "set `replace = TRUE` to use sampling with replacement."
228 |   )
229 | 
230 |   stop(sprintf(msg, size, n), call. = FALSE)
231 | }
232 | 
233 | recycle_size <- function(.x, .n) {
234 | 
235 |   .n_x <- length(.x)
236 | 
237 |   if (.n_x == .n) {
238 |     return(.x)
239 |   }
240 | 
241 |   if (.n_x == 1L) {
242 |     .x <- rep(.x, times = .n)
243 |     return(.x)
244 |   }
245 | 
246 |   msg <- paste0("`size` must be size 1 or ", .n, " (the number of groups).")
247 |   rlang::abort(msg)
248 | }
249 | 
250 | # ------------------------------------------------------------------------------
251 | # Validation
252 | 
253 | validate_is <- function(.x, .x_nm, .is_f, .expected) {
254 | 
255 |   if (!.is_f(.x)) {
256 |     msg <- paste0("`", .x_nm, "` must be ", .expected, ".")
257 |     rlang::abort(msg)
258 |   }
259 | 
260 |   invisible(.x)
261 | }
262 | 
263 | validate_is_scalar_character <- function(.x, .x_nm) {
264 |   validate_is(.x, .x_nm, rlang::is_scalar_character, "a single character")
265 | }
266 | 
267 | is_bool <- function (x) {
268 |   rlang::is_logical(x, n = 1) && !is.na(x)
269 | }
270 | 
271 | validate_is_bool <- function(.x, .x_nm) {
272 |   validate_is(.x, .x_nm, is_bool, "a single logical (TRUE/FALSE)")
273 | }
274 | 
275 | is_positive <- function(x) {
276 |   isTRUE(all(x > 0))
277 | }
278 | 
279 | validate_is_scalar_positive_integerish <- function(.x, .x_nm) {
280 |   validate_is(.x, .x_nm, rlang::is_scalar_integerish, "a single integer")
281 |   validate_is(.x, .x_nm, is_positive, "a positive integer")
282 | }
283 | 
284 | validate_is_positive_integerish <- function(.x, .x_nm) {
285 |   validate_is(.x, .x_nm, rlang::is_integerish, "an integer vector")
286 |   validate_is(.x, .x_nm, is_positive, "a positive integer vector")
287 | }
288 | 


--------------------------------------------------------------------------------
/R/util.R:
--------------------------------------------------------------------------------
 1 | check_empty_dots <- function(...) {
 2 |   dots <- rlang::enquos(...)
 3 | 
 4 |   dots_are_empty <- length(dots) == 0L
 5 | 
 6 |   if (!dots_are_empty) {
 7 |     rlang::abort("`...` must be empty.")
 8 |   }
 9 | 
10 |   invisible()
11 | }
12 | 


--------------------------------------------------------------------------------
/R/zzz.R:
--------------------------------------------------------------------------------
 1 | #' @importFrom dplyr collect
 2 | NULL
 3 | 
 4 | #' @importFrom rlang :=
 5 | NULL
 6 | 
 7 | # For collect()
 8 | utils::globalVariables(
 9 |   c(
10 |     ".rows",
11 |     "...x"
12 |   )
13 | )
14 | 


--------------------------------------------------------------------------------
/README.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | output: github_document
  3 | editor_options: 
  4 |   chunk_output_type: console
  5 | ---
  6 | 
  7 | <!-- README.md is generated from README.Rmd. Please edit that file -->
  8 | 
  9 | ```{r setup, include = FALSE}
 10 | knitr::opts_chunk$set(
 11 |   collapse = TRUE,
 12 |   comment = "#>",
 13 |   fig.path = "man/figures/README-",
 14 |   out.width = "100%"
 15 | )
 16 | ```
 17 | # strapgod
 18 | 
 19 | <!-- badges: start -->
 20 | [![Codecov test coverage](https://codecov.io/gh/DavisVaughan/strapgod/branch/master/graph/badge.svg)](https://codecov.io/gh/DavisVaughan/strapgod?branch=master)
 21 | [![Travis build status](https://travis-ci.org/DavisVaughan/strapgod.svg?branch=master)](https://travis-ci.org/DavisVaughan/strapgod)
 22 | [![CRAN status](https://www.r-pkg.org/badges/version/strapgod)](https://cran.r-project.org/package=strapgod)
 23 | [![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
 24 | <!-- badges: end -->
 25 | 
 26 | <p align="center">
 27 |   <img src="./man/figures/strap-god.jpg">
 28 | </p>
 29 | 
 30 | ## Introduction
 31 | 
 32 | The goal of strapgod is to create _virtual groups_ on top of a `tibble` or `grouped_df` as a way of resampling the original data frame. You can then efficiently perform various dplyr operations on this `resampled_df`, like: `summarise()`, `do()`, `group_map()`, and more, to easily compute bootstrapped and resampled statistics.
 33 | 
 34 | ## Installation
 35 | 
 36 | You can install the released version of strapgod from [CRAN](https://CRAN.R-project.org) with:
 37 | 
 38 | ``` r
 39 | install.packages("strapgod")
 40 | ```
 41 | 
 42 | Install the development version from GitHub with:
 43 | 
 44 | ``` r
 45 | devtools::install_github("DavisVaughan/strapgod")
 46 | ```
 47 | 
 48 | ## Learning about strapgod
 49 | 
 50 | If you aren't already on the [pkgdown site](https://davisvaughan.github.io/strapgod/), I would encourage starting there. From there, you will be able to click on these two vignettes to learn about working with resampled tibbles.
 51 | 
 52 | - `vignette("virtual-bootstraps", "strapgod")`
 53 | 
 54 | - `vignette("dplyr-support", "strapgod")`
 55 | 
 56 | ## Example
 57 | 
 58 | Create resampled data frames with `bootstrapify()` or `samplify()`. Notice how we grouped by the _virtual_ column, `.bootstrap` and there are still only 150 rows even though we bootstrapped this dataset 10 times.
 59 | 
 60 | ```{r first-strap, warning=FALSE, message=FALSE}
 61 | library(strapgod)
 62 | library(dplyr)
 63 | set.seed(123)
 64 | 
 65 | bootstrapify(iris, 10)
 66 | ```
 67 | 
 68 | You can feed a `resampled_df` into `summarise()` or `group_modify()` to perform
 69 | efficient bootstrapped computations.
 70 | 
 71 | ```{r summarise}
 72 | iris %>%
 73 |   bootstrapify(10) %>%
 74 |   summarise(per_strap_mean = mean(Petal.Width))
 75 | ```
 76 | 
 77 | The original data can be grouped as well, and the bootstraps will be created for each group.
 78 | 
 79 | ```{r summarise-by-group}
 80 | iris %>%
 81 |   group_by(Species) %>%
 82 |   bootstrapify(10) %>%
 83 |   summarise(per_strap_per_species_mean = mean(Petal.Width))
 84 | ```
 85 | 
 86 | ## Plotting bootstrapped results
 87 | 
 88 | A fun example of using strapgod is to create bootstrapped visualizations quickly and easily for hypothetical outcome plots. 
 89 | 
 90 | ```{r non-bootstrap-plots, warning=FALSE, message=FALSE}
 91 | set.seed(123)
 92 | library(ggplot2)
 93 | 
 94 | # without bootstrap
 95 | mtcars %>%
 96 |   ggplot(aes(hp, mpg)) + 
 97 |   geom_smooth(se = FALSE) +
 98 |   ylim(y = c(0, 40))
 99 | ```
100 | 
101 | ```{r bootstrap-plots, warning=FALSE, message=FALSE}
102 | # with bootstrap
103 | mtcars %>%
104 |   bootstrapify(10) %>%
105 |   collect() %>%
106 |   ggplot(aes(hp, mpg, group = .bootstrap)) + 
107 |   geom_smooth(se = FALSE) +
108 |   ylim(y = c(0, 40))
109 | ```
110 | 
111 | ## In the wild
112 | 
113 | - Claus Wilke has used strapgod to power some pieces of his [ungeviz](https://github.com/wilkelab/ungeviz) package for visualizing uncertainty.
114 | 
115 | - You can watch Claus's [rstudio::conf 2019](https://resources.rstudio.com/rstudio-conf-2019/visualizing-uncertainty-with-hypothetical-outcomes-plots) talk to see ungeviz and strapgod in action.
116 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | 
  2 | <!-- README.md is generated from README.Rmd. Please edit that file -->
  3 | 
  4 | # strapgod
  5 | 
  6 | <!-- badges: start -->
  7 | 
  8 | [![Codecov test
  9 | coverage](https://codecov.io/gh/DavisVaughan/strapgod/branch/master/graph/badge.svg)](https://codecov.io/gh/DavisVaughan/strapgod?branch=master)
 10 | [![Travis build
 11 | status](https://travis-ci.org/DavisVaughan/strapgod.svg?branch=master)](https://travis-ci.org/DavisVaughan/strapgod)
 12 | [![CRAN
 13 | status](https://www.r-pkg.org/badges/version/strapgod)](https://cran.r-project.org/package=strapgod)
 14 | [![Lifecycle:
 15 | experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
 16 | <!-- badges: end -->
 17 | 
 18 | <p align="center">
 19 | 
 20 | <img src="./man/figures/strap-god.jpg">
 21 | 
 22 | </p>
 23 | 
 24 | ## Introduction
 25 | 
 26 | The goal of strapgod is to create *virtual groups* on top of a `tibble`
 27 | or `grouped_df` as a way of resampling the original data frame. You can
 28 | then efficiently perform various dplyr operations on this
 29 | `resampled_df`, like: `summarise()`, `do()`, `group_map()`, and more, to
 30 | easily compute bootstrapped and resampled statistics.
 31 | 
 32 | ## Installation
 33 | 
 34 | You can install the released version of strapgod from
 35 | [CRAN](https://CRAN.R-project.org) with:
 36 | 
 37 | ``` r
 38 | install.packages("strapgod")
 39 | ```
 40 | 
 41 | Install the development version from GitHub with:
 42 | 
 43 | ``` r
 44 | devtools::install_github("DavisVaughan/strapgod")
 45 | ```
 46 | 
 47 | ## Learning about strapgod
 48 | 
 49 | If you aren’t already on the [pkgdown
 50 | site](https://davisvaughan.github.io/strapgod/), I would encourage
 51 | starting there. From there, you will be able to click on these two
 52 | vignettes to learn about working with resampled tibbles.
 53 | 
 54 |   - `vignette("virtual-bootstraps", "strapgod")`
 55 | 
 56 |   - `vignette("dplyr-support", "strapgod")`
 57 | 
 58 | ## Example
 59 | 
 60 | Create resampled data frames with `bootstrapify()` or `samplify()`.
 61 | Notice how we grouped by the *virtual* column, `.bootstrap` and there
 62 | are still only 150 rows even though we bootstrapped this dataset 10
 63 | times.
 64 | 
 65 | ``` r
 66 | library(strapgod)
 67 | library(dplyr)
 68 | set.seed(123)
 69 | 
 70 | bootstrapify(iris, 10)
 71 | #> # A tibble: 150 x 5
 72 | #> # Groups:   .bootstrap [10]
 73 | #>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
 74 | #>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 75 | #>  1          5.1         3.5          1.4         0.2 setosa 
 76 | #>  2          4.9         3            1.4         0.2 setosa 
 77 | #>  3          4.7         3.2          1.3         0.2 setosa 
 78 | #>  4          4.6         3.1          1.5         0.2 setosa 
 79 | #>  5          5           3.6          1.4         0.2 setosa 
 80 | #>  6          5.4         3.9          1.7         0.4 setosa 
 81 | #>  7          4.6         3.4          1.4         0.3 setosa 
 82 | #>  8          5           3.4          1.5         0.2 setosa 
 83 | #>  9          4.4         2.9          1.4         0.2 setosa 
 84 | #> 10          4.9         3.1          1.5         0.1 setosa 
 85 | #> # … with 140 more rows
 86 | ```
 87 | 
 88 | You can feed a `resampled_df` into `summarise()` or `group_map()` to
 89 | perform efficient bootstrapped computations.
 90 | 
 91 | ``` r
 92 | iris %>%
 93 |   bootstrapify(10) %>%
 94 |   summarise(per_strap_mean = mean(Petal.Width))
 95 | #> # A tibble: 10 x 2
 96 | #>    .bootstrap per_strap_mean
 97 | #>         <int>          <dbl>
 98 | #>  1          1           1.20
 99 | #>  2          2           1.22
100 | #>  3          3           1.23
101 | #>  4          4           1.13
102 | #>  5          5           1.20
103 | #>  6          6           1.15
104 | #>  7          7           1.18
105 | #>  8          8           1.13
106 | #>  9          9           1.31
107 | #> 10         10           1.19
108 | ```
109 | 
110 | The original data can be grouped as well, and the bootstraps will be
111 | created for each group.
112 | 
113 | ``` r
114 | iris %>%
115 |   group_by(Species) %>%
116 |   bootstrapify(10) %>%
117 |   summarise(per_strap_per_species_mean = mean(Petal.Width))
118 | #> # A tibble: 30 x 3
119 | #> # Groups:   Species [3]
120 | #>    Species .bootstrap per_strap_per_species_mean
121 | #>    <fct>        <int>                      <dbl>
122 | #>  1 setosa           1                      0.25 
123 | #>  2 setosa           2                      0.246
124 | #>  3 setosa           3                      0.24 
125 | #>  4 setosa           4                      0.238
126 | #>  5 setosa           5                      0.252
127 | #>  6 setosa           6                      0.274
128 | #>  7 setosa           7                      0.238
129 | #>  8 setosa           8                      0.258
130 | #>  9 setosa           9                      0.252
131 | #> 10 setosa          10                      0.256
132 | #> # … with 20 more rows
133 | ```
134 | 
135 | ## Plotting bootstrapped results
136 | 
137 | A fun example of using strapgod is to create bootstrapped visualizations
138 | quickly and easily for hypothetical outcome plots.
139 | 
140 | ``` r
141 | set.seed(123)
142 | library(ggplot2)
143 | 
144 | # without bootstrap
145 | mtcars %>%
146 |   ggplot(aes(hp, mpg)) + 
147 |   geom_smooth(se = FALSE) +
148 |   ylim(y = c(0, 40))
149 | ```
150 | 
151 | <img src="man/figures/README-non-bootstrap-plots-1.png" width="100%" />
152 | 
153 | ``` r
154 | # with bootstrap
155 | mtcars %>%
156 |   bootstrapify(10) %>%
157 |   collect() %>%
158 |   ggplot(aes(hp, mpg, group = .bootstrap)) + 
159 |   geom_smooth(se = FALSE) +
160 |   ylim(y = c(0, 40))
161 | ```
162 | 
163 | <img src="man/figures/README-bootstrap-plots-1.png" width="100%" />
164 | 
165 | ## In the wild
166 | 
167 |   - Claus Wilke has used strapgod to power some pieces of his
168 |     [ungeviz](https://github.com/wilkelab/ungeviz) package for
169 |     visualizing uncertainty.
170 | 
171 |   - You can watch Claus’s
172 |     [rstudio::conf 2019](https://resources.rstudio.com/rstudio-conf-2019/visualizing-uncertainty-with-hypothetical-outcomes-plots)
173 |     talk to see ungeviz and strapgod in action.
174 | 


--------------------------------------------------------------------------------
/_pkgdown.yml:
--------------------------------------------------------------------------------
1 | destination: docs
2 | 


--------------------------------------------------------------------------------
/codecov.yml:
--------------------------------------------------------------------------------
 1 | comment: false
 2 | 
 3 | coverage:
 4 |   status:
 5 |     project:
 6 |       default:
 7 |         target: auto
 8 |         threshold: 1%
 9 |     patch:
10 |       default:
11 |         target: auto
12 |         threshold: 1%
13 | 


--------------------------------------------------------------------------------
/cran-comments.md:
--------------------------------------------------------------------------------
 1 | ## Comments
 2 | This is the fourth release of this package.
 3 | 
 4 | This release requires tidyr 1.0.0 and updates the `resampled_df` method for
 5 | `tidyr::nest()`.
 6 | 
 7 | ## Test environments
 8 | * local OS X install, R 3.6.0
 9 | * ubuntu 14.04 (on travis-ci) (devel and release)
10 | * win-builder (devel and release)
11 | 
12 | ## R CMD check results
13 | 
14 | 0 errors | 0 warnings | 0 notes
15 | 


--------------------------------------------------------------------------------
/man/bootstrapify.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/bootstrapify.R
 3 | \name{bootstrapify}
 4 | \alias{bootstrapify}
 5 | \title{Create a bootstrapped tibble}
 6 | \usage{
 7 | bootstrapify(data, times, ..., key = ".bootstrap")
 8 | }
 9 | \arguments{
10 | \item{data}{A tbl.}
11 | 
12 | \item{times}{A single integer specifying the number of resamples.
13 | If the \code{tibble} is grouped, this is the number of resamples per group.}
14 | 
15 | \item{...}{Not used.}
16 | 
17 | \item{key}{A single character specifying the name of the virtual group
18 | that is added.}
19 | }
20 | \value{
21 | A \code{resampled_df} with an extra group specified by the \code{key}.
22 | }
23 | \description{
24 | \code{bootstrapify()} creates a bootstrapped tibble with \emph{virtual groups}.
25 | }
26 | \details{
27 | The following functions have special / interesting behavior when used with
28 | a \code{resampled_df}:
29 | \itemize{
30 | \item \code{\link[dplyr:collect]{dplyr::collect()}}
31 | \item \code{\link[dplyr:summarise]{dplyr::summarise()}}
32 | \item \code{\link[dplyr:do]{dplyr::do()}}
33 | \item \code{\link[dplyr:group_map]{dplyr::group_map()}}
34 | \item \code{\link[dplyr:group_modify]{dplyr::group_modify()}}
35 | \item \code{\link[dplyr:group_walk]{dplyr::group_walk()}}
36 | \item \code{\link[dplyr:group_nest]{dplyr::group_nest()}}
37 | \item \code{\link[dplyr:group_split]{dplyr::group_split()}}
38 | }
39 | }
40 | \examples{
41 | library(dplyr)
42 | library(broom)
43 | 
44 | bootstrapify(iris, 5)
45 | 
46 | iris \%>\%
47 |   bootstrapify(5) \%>\%
48 |   summarise(per_strap_mean = mean(Petal.Width))
49 | 
50 | iris \%>\%
51 |   group_by(Species) \%>\%
52 |   bootstrapify(5) \%>\%
53 |   summarise(per_strap_species_mean = mean(Petal.Width))
54 | 
55 | iris \%>\%
56 |   bootstrapify(5) \%>\%
57 |   do(tidy(lm(Sepal.Width ~ Sepal.Length + Species, data = .)))
58 | 
59 | # Alternatively, use the newer group_modify()
60 | iris \%>\%
61 |   bootstrapify(5) \%>\%
62 |   group_modify(~tidy(lm(Sepal.Width ~ Sepal.Length + Species, data = .x)))
63 | 
64 | # Alter the name of the group with `key`
65 | # Materialize them with collect()
66 | straps <- bootstrapify(iris, 5, key = ".straps")
67 | collect(straps)
68 | 
69 | }
70 | \seealso{
71 | \code{\link[=collect.resampled_df]{collect.resampled_df()}}
72 | 
73 | Other virtual samplers: \code{\link{samplify}}
74 | }
75 | \concept{virtual samplers}
76 | 


--------------------------------------------------------------------------------
/man/collect.resampled_df.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/collect.R
 3 | \name{collect.resampled_df}
 4 | \alias{collect.resampled_df}
 5 | \title{Force virtual groups to become explicit rows}
 6 | \usage{
 7 | \method{collect}{resampled_df}(x, ..., id = NULL, original_id = NULL)
 8 | }
 9 | \arguments{
10 | \item{x}{A \code{resampled_df}.}
11 | 
12 | \item{...}{Not used.}
13 | 
14 | \item{id}{Optional. A single character that specifies a name for a column
15 | containing a sequence from \code{1:n} for each bootstrap group.}
16 | 
17 | \item{original_id}{Optional. A single character that specifies a name for
18 | a column containing the original position of the bootstrapped row.}
19 | }
20 | \description{
21 | When \code{collect()} is used on a \code{resampled_df}, the virtual bootstrap groups
22 | are made explicit.
23 | }
24 | \examples{
25 | library(dplyr)
26 | 
27 | # virtual groups become real rows
28 | collect(bootstrapify(iris, 5))
29 | 
30 | # add on the id column for an identifier per bootstrap
31 | collect(bootstrapify(iris, 5), id = ".id")
32 | 
33 | # add on the original_id column to know which row this bootstrapped row
34 | # originally came from
35 | collect(bootstrapify(iris, 5), original_id = ".original_id")
36 | 
37 | }
38 | 


--------------------------------------------------------------------------------
/man/figures/README-bootstrap-plots-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavisVaughan/strapgod/ea2b1ecfc780a44ffa934c9bc7c2032954f4ffaa/man/figures/README-bootstrap-plots-1.png


--------------------------------------------------------------------------------
/man/figures/README-non-bootstrap-plots-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavisVaughan/strapgod/ea2b1ecfc780a44ffa934c9bc7c2032954f4ffaa/man/figures/README-non-bootstrap-plots-1.png


--------------------------------------------------------------------------------
/man/figures/strap-god.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavisVaughan/strapgod/ea2b1ecfc780a44ffa934c9bc7c2032954f4ffaa/man/figures/strap-god.jpg


--------------------------------------------------------------------------------
/man/reexports.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/compat-dplyr.R
 3 | \docType{import}
 4 | \name{reexports}
 5 | \alias{reexports}
 6 | \alias{filter}
 7 | \title{Objects exported from other packages}
 8 | \keyword{internal}
 9 | \description{
10 | These objects are imported from other packages. Follow the links
11 | below to see their documentation.
12 | 
13 | \describe{
14 |   \item{dplyr}{\code{\link[dplyr]{filter}}}
15 | }}
16 | 
17 | 


--------------------------------------------------------------------------------
/man/samplify.Rd:
--------------------------------------------------------------------------------
  1 | % Generated by roxygen2: do not edit by hand
  2 | % Please edit documentation in R/samplify.R
  3 | \name{samplify}
  4 | \alias{samplify}
  5 | \title{Created a resampled tibble}
  6 | \usage{
  7 | samplify(data, times, size, ..., replace = FALSE, key = ".sample")
  8 | }
  9 | \arguments{
 10 | \item{data}{A tbl.}
 11 | 
 12 | \item{times}{A single integer specifying the number of resamples.
 13 | If the \code{tibble} is grouped, this is the number of resamples per group.}
 14 | 
 15 | \item{size}{A single integer specifying the size of each resample. For a
 16 | grouped data frame, this is also allowed to be an integer vector with size
 17 | equal to the number of groups in \code{data}. This can be helpful when sampling
 18 | without replacement when the number of rows per group is very different.}
 19 | 
 20 | \item{...}{Not used.}
 21 | 
 22 | \item{replace}{Whether or not to sample with replacement.}
 23 | 
 24 | \item{key}{A single character specifying the name of the virtual group
 25 | that is added.}
 26 | }
 27 | \value{
 28 | A \code{resampled_df} with an extra group specified by the \code{key}.
 29 | }
 30 | \description{
 31 | \code{samplify()} creates a resampled tibble with \emph{virtual groups}.
 32 | }
 33 | \details{
 34 | The following functions have special / interesting behavior when used with
 35 | a \code{resampled_df}:
 36 | \itemize{
 37 | \item \code{\link[dplyr:collect]{dplyr::collect()}}
 38 | \item \code{\link[dplyr:summarise]{dplyr::summarise()}}
 39 | \item \code{\link[dplyr:do]{dplyr::do()}}
 40 | \item \code{\link[dplyr:group_map]{dplyr::group_map()}}
 41 | \item \code{\link[dplyr:group_modify]{dplyr::group_modify()}}
 42 | \item \code{\link[dplyr:group_walk]{dplyr::group_walk()}}
 43 | \item \code{\link[dplyr:group_nest]{dplyr::group_nest()}}
 44 | \item \code{\link[dplyr:group_split]{dplyr::group_split()}}
 45 | }
 46 | }
 47 | \examples{
 48 | library(dplyr)
 49 | library(broom)
 50 | 
 51 | samplify(iris, times = 3, size = 20)
 52 | 
 53 | iris \%>\%
 54 |   samplify(times = 3, size = 20) \%>\%
 55 |   summarise(per_strap_mean = mean(Petal.Width))
 56 | 
 57 | iris \%>\%
 58 |   group_by(Species) \%>\%
 59 |   samplify(times = 3, size = 20) \%>\%
 60 |   summarise(per_strap_species_mean = mean(Petal.Width))
 61 | 
 62 | # Alter the name of the group with `key`
 63 | # Materialize them with collect()
 64 | samps <- samplify(iris, times = 3, size = 5, key = ".samps")
 65 | collect(samps)
 66 | 
 67 | collect(samps, id = ".id", original_id = ".orig_id")
 68 | 
 69 | #----------------------------------------------------------------------------
 70 | 
 71 | # Be careful not to specify a `size` larger
 72 | # than one of your groups! This will throw an error.
 73 | 
 74 | iris_group_sizes_of_50_and_5 <- iris[1:55,] \%>\%
 75 |   group_by(Species) \%>\%
 76 |   group_trim()
 77 | 
 78 | count(iris_group_sizes_of_50_and_5, Species)
 79 | 
 80 | # size = 10 > min_group_size = 5
 81 | \dontrun{
 82 | iris_group_sizes_of_50_and_5 \%>\%
 83 |   samplify(times = 2, size = 10)
 84 | }
 85 | 
 86 | # Instead, pass a vector of sizes to `samplify()` if this
 87 | # structure is absolutely required for your use case.
 88 | 
 89 | # size of 10 for the first group
 90 | # size of 5 for the second group
 91 | # total number of rows is 10 * 2 + 5 * 2 = 30
 92 | iris_group_sizes_of_50_and_5 \%>\%
 93 |   samplify(times = 2, size = c(10, 5)) \%>\%
 94 |   collect()
 95 | 
 96 | }
 97 | \seealso{
 98 | \code{\link[=collect.resampled_df]{collect.resampled_df()}}
 99 | 
100 | Other virtual samplers: \code{\link{bootstrapify}}
101 | }
102 | \concept{virtual samplers}
103 | 


--------------------------------------------------------------------------------
/strapgod.Rproj:
--------------------------------------------------------------------------------
 1 | Version: 1.0
 2 | 
 3 | RestoreWorkspace: No
 4 | SaveWorkspace: No
 5 | AlwaysSaveHistory: Default
 6 | 
 7 | EnableCodeIndexing: Yes
 8 | UseSpacesForTab: Yes
 9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 | 
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 | 
15 | AutoAppendNewline: Yes
16 | StripTrailingWhitespace: Yes
17 | 
18 | BuildType: Package
19 | PackageUseDevtools: Yes
20 | PackageInstallArgs: --no-multiarch --with-keep.source
21 | PackageRoxygenize: rd,collate,namespace
22 | 


--------------------------------------------------------------------------------
/tests/testthat.R:
--------------------------------------------------------------------------------
1 | library(testthat)
2 | library(strapgod)
3 | 
4 | test_check("strapgod")
5 | 


--------------------------------------------------------------------------------
/tests/testthat/test-bootstrapify.R:
--------------------------------------------------------------------------------
  1 | context("test-bootstrapify")
  2 | 
  3 | library(dplyr)
  4 | 
  5 | test_that("can create bootstrapped data frames", {
  6 | 
  7 |   expect_error(
  8 |     x <- bootstrapify(iris, 2),
  9 |     NA
 10 |   )
 11 | 
 12 |   expect_is(x, "resampled_df")
 13 |   expect_is(x, "grouped_df")
 14 | 
 15 |   x_gd <- group_data(x)
 16 | 
 17 |   expect_equal(
 18 |     colnames(x_gd),
 19 |     c(".bootstrap", ".rows")
 20 |   )
 21 | 
 22 |   expect_equal(
 23 |     x_gd[[".bootstrap"]],
 24 |     c(1, 2)
 25 |   )
 26 | 
 27 |   expect_equal(
 28 |     nrow(x_gd),
 29 |     2
 30 |   )
 31 | 
 32 |   expect_equal(
 33 |     unique(vapply(x_gd$.rows, length, integer(1))),
 34 |     150
 35 |   )
 36 | 
 37 | })
 38 | 
 39 | test_that("can bootstrap with groups", {
 40 | 
 41 |   x <- iris %>%
 42 |     group_by(Species) %>%
 43 |     bootstrapify(5)
 44 | 
 45 |   x_gd <- group_data(x)
 46 | 
 47 |   expect_equal(
 48 |     colnames(x_gd),
 49 |     c("Species", ".bootstrap", ".rows")
 50 |   )
 51 | 
 52 |   expect_equal(
 53 |     nrow(x_gd),
 54 |     15
 55 |   )
 56 | 
 57 |   expect_equal(
 58 |     unique(vapply(x_gd$.rows, length, integer(1))),
 59 |     50
 60 |   )
 61 | 
 62 | })
 63 | 
 64 | test_that("can correctly double bootstrap", {
 65 | 
 66 |   once <- iris %>%
 67 |     bootstrapify(5)
 68 | 
 69 |   twice <- once %>%
 70 |     bootstrapify(5, key = ".bootstrap1")
 71 | 
 72 |   once_gd <- group_data(once)
 73 |   twice_gd <- group_data(twice)
 74 | 
 75 |   once_strap_1 <- once_gd$.rows[[1]]
 76 | 
 77 |   twice_1 <- dplyr::filter(twice_gd, .bootstrap == 1)
 78 | 
 79 |   # Check that the indices generated from bootstrapping the second
 80 |   # time are subsets of the indices in each group from the first
 81 |   # bootstrap
 82 | 
 83 |   each_twice_is_subset_of_once <- vapply(
 84 |     X = twice_1$.rows,
 85 |     FUN = function(x) {all(x %in% once_strap_1)},
 86 |     FUN.VALUE = logical(1)
 87 |   )
 88 | 
 89 |   expect_true(all(each_twice_is_subset_of_once))
 90 | 
 91 |   expect_equal(
 92 |     colnames(twice_gd),
 93 |     c(".bootstrap", ".bootstrap1", ".rows")
 94 |   )
 95 | 
 96 | })
 97 | 
 98 | test_that("universal name repair kicks in with double bootstrap", {
 99 |   once <- iris %>%
100 |     bootstrapify(5)
101 | 
102 |   # Expect universal name repair
103 |   expect_message(
104 |     twice <- once %>%
105 |       bootstrapify(5),
106 |     ".bootstrap -> .bootstrap...1"
107 |   )
108 | 
109 |   expect_equal(
110 |     colnames(group_data(twice)),
111 |     c(".bootstrap...1", ".bootstrap...2", ".rows")
112 |   )
113 | })
114 | 
115 | test_that("can alter the key", {
116 |   x <- bootstrapify(iris, 5, key = ".boot")
117 | 
118 |   expect_equal(
119 |     colnames(group_data(x))[1],
120 |     ".boot"
121 |   )
122 | })
123 | 
124 | test_that("cannot pass into the `...`", {
125 |   expect_error(
126 |     bootstrapify(iris, 5, "key-in-dots"),
127 |     "`...` must be empty"
128 |   )
129 | })
130 | 
131 | test_that("invalid inputs are caught", {
132 |   expect_error(
133 |     bootstrapify(iris, 5, key = 1),
134 |     "`key` must be a single character."
135 |   )
136 | })
137 | 


--------------------------------------------------------------------------------
/tests/testthat/test-collect.R:
--------------------------------------------------------------------------------
  1 | context("test-collect")
  2 | 
  3 | library(dplyr)
  4 | 
  5 | test_that("can collect()", {
  6 | 
  7 |   x <- bootstrapify(iris, 5)
  8 | 
  9 |   expect_error(
 10 |     x_c <- collect(x),
 11 |     NA
 12 |   )
 13 | 
 14 |   expect_equal(
 15 |     nrow(x_c),
 16 |     750
 17 |   )
 18 | 
 19 |   expect_is(
 20 |     x_c,
 21 |     "grouped_df"
 22 |   )
 23 | 
 24 |   expect_false("resample_df" %in% class(x_c))
 25 | 
 26 |   expect_equal(
 27 |     colnames(x_c),
 28 |     c(".bootstrap", colnames(iris))
 29 |   )
 30 | 
 31 | })
 32 | 
 33 | test_that("can do a double bootstrap collect", {
 34 | 
 35 |   x <- iris %>%
 36 |     bootstrapify(5) %>%
 37 |     bootstrapify(10, key = ".bootstrap_2")
 38 | 
 39 |   x_c <- collect(x)
 40 | 
 41 |   expect_equal(
 42 |     nrow(x_c),
 43 |     nrow(iris) * 5 * 10
 44 |   )
 45 | 
 46 |   expect_equal(
 47 |     x_c$.bootstrap,
 48 |     rep(1:5, each = nrow(iris) * 10)
 49 |   )
 50 | 
 51 |   expect_equal(
 52 |     x_c$.bootstrap_2,
 53 |     rep(1:10, each = nrow(iris), times = 5)
 54 |   )
 55 | 
 56 | })
 57 | 
 58 | test_that("all groups are preserved", {
 59 | 
 60 |   iris_g <- iris %>%
 61 |     group_by(Species)
 62 | 
 63 |   x <- bootstrapify(iris_g, 5)
 64 | 
 65 |   x_c <- collect(x)
 66 | 
 67 |   expect_equal(
 68 |     group_vars(x_c),
 69 |     c("Species", ".bootstrap")
 70 |   )
 71 | 
 72 |   expect_equal(
 73 |     nrow(x_c),
 74 |     sum(group_size(iris_g) * 5)
 75 |   )
 76 | 
 77 | })
 78 | 
 79 | test_that("`key` is propagated to `collect()`", {
 80 | 
 81 |   x <- bootstrapify(iris, 5, key = ".boot")
 82 | 
 83 |   expect_equal(
 84 |     colnames(collect(x))[1],
 85 |     ".boot"
 86 |   )
 87 | 
 88 | })
 89 | 
 90 | test_that("can collect with `id`", {
 91 | 
 92 |   x <- bootstrapify(iris, 5)
 93 | 
 94 |   x_c <- collect(x, id = ".id")
 95 | 
 96 |   expect_equal(
 97 |     colnames(x_c),
 98 |     c(".bootstrap", ".id", colnames(iris))
 99 |   )
100 | 
101 |   expect_equal(
102 |     x_c$.id,
103 |     rep(1:150, times = 5)
104 |   )
105 | 
106 | })
107 | 
108 | test_that("can collect with `original_id`", {
109 | 
110 |   x <- bootstrapify(iris, 5)
111 | 
112 |   x_c <- collect(x, original_id = ".o_id")
113 | 
114 |   expect_equal(
115 |     colnames(x_c),
116 |     c(".bootstrap", ".o_id", colnames(iris))
117 |   )
118 | 
119 |   expect_equal(
120 |     x_c$.o_id,
121 |     unlist(group_data(x)$.rows)
122 |   )
123 | 
124 | })
125 | 
126 | test_that("can collect with `id` and `original_id`", {
127 | 
128 |   x <- bootstrapify(iris, 5)
129 | 
130 |   x_c <- collect(x, id = ".id", original_id = ".o_id")
131 | 
132 |   # want this column order. most intuitive.
133 |   expect_equal(
134 |     colnames(x_c),
135 |     c(".bootstrap", ".id", ".o_id", colnames(iris))
136 |   )
137 | 
138 | })
139 | 
140 | test_that("`id` must be a single character", {
141 | 
142 |   x <- bootstrapify(iris, 5)
143 | 
144 |   expect_error(
145 |     collect(x, id = 1),
146 |     "`id` must be a character of size 1."
147 |   )
148 | 
149 | })
150 | 
151 | test_that("`original_id` must be a single character", {
152 | 
153 |   x <- bootstrapify(iris, 5)
154 | 
155 |   expect_error(
156 |     collect(x, original_id = 1),
157 |     "`original_id` must be a character of size 1."
158 |   )
159 | 
160 | })
161 | 


--------------------------------------------------------------------------------
/tests/testthat/test-compat-tidyr.R:
--------------------------------------------------------------------------------
 1 | context("test-compat-tidyr")
 2 | 
 3 | test_that("nest()", {
 4 | 
 5 |   x <- bootstrapify(iris, 5)
 6 | 
 7 |   expect_error(
 8 |     x_n <- nest(x),
 9 |     NA
10 |   )
11 | 
12 |   expect_equal(
13 |     x_n$.bootstrap,
14 |     1:5
15 |   )
16 | 
17 |   expect_equal(
18 |     colnames(nest(x, .x = -.bootstrap)),
19 |     c(".bootstrap", ".x")
20 |   )
21 | 
22 | })
23 | 


--------------------------------------------------------------------------------
/tests/testthat/test-dplyr-compat.R:
--------------------------------------------------------------------------------
  1 | context("test-dplyr-compat")
  2 | 
  3 | library(dplyr)
  4 | 
  5 | test_that("can ungroup() to lose `resampled_df`", {
  6 | 
  7 |   x <- bootstrapify(iris, 5)
  8 | 
  9 |   x_ug <- ungroup(x)
 10 | 
 11 |   expect_is(x_ug, "tbl_df")
 12 |   expect_false("resampled_df" %in% class(x_ug))
 13 |   expect_equal(attr(x_ug, "groups"), NULL)
 14 | 
 15 | })
 16 | 
 17 | test_that("can group_by() after bootstrapify()", {
 18 | 
 19 |   x <- bootstrapify(iris, 5)
 20 | 
 21 |   x_g1 <- group_by(x, Species)
 22 | 
 23 |   expect_is(x_g1, "grouped_df")
 24 |   expect_false("resampled_df" %in% class(x_g1))
 25 |   expect_false(".bootstrap" %in% colnames(x_g1))
 26 | 
 27 |   x_g2 <- group_by(x, Species, add = TRUE)
 28 | 
 29 |   expect_is(x_g2, "grouped_df")
 30 |   expect_false("resampled_df" %in% class(x_g2))
 31 |   expect_true(".bootstrap" %in% colnames(x_g2))
 32 |   expect_equal(group_vars(x_g2), c(".bootstrap", "Species"))
 33 | 
 34 |   expect_equal(
 35 |     nrow(x_g2),
 36 |     5 * 150
 37 |   )
 38 | 
 39 |   x_g3 <- group_by(x, add = TRUE)
 40 | 
 41 |   expect_is(x_g3, "grouped_df")
 42 |   expect_false("resampled_df" %in% class(x_g3))
 43 |   expect_true(".bootstrap" %in% colnames(x_g3))
 44 |   expect_equal(group_vars(x_g3), ".bootstrap")
 45 | 
 46 |   x_g4 <- group_by(x)
 47 | 
 48 |   expect_is(x_g4, "tbl_df")
 49 |   expect_false("grouped_df" %in% class(x_g4))
 50 |   expect_false("resampled_df" %in% class(x_g4))
 51 |   expect_false(".bootstrap" %in% colnames(x_g4))
 52 | 
 53 |   # We know we get an error here
 54 |   # as the bootstraps aren't materialized
 55 |   expect_error(
 56 |     group_by(x, .bootstrap)
 57 |   )
 58 | 
 59 | })
 60 | 
 61 | test_that("mutate()", {
 62 |   x <- bootstrapify(iris, 2)
 63 |   expect_equal(
 64 |     nrow(mutate(x, x = 4)),
 65 |     300
 66 |   )
 67 | })
 68 | 
 69 | test_that("transmute()", {
 70 |   x <- bootstrapify(iris, 2)
 71 |   expect_equal(
 72 |     nrow(transmute(x, x = 4)),
 73 |     300
 74 |   )
 75 | })
 76 | 
 77 | test_that("filter()", {
 78 |   x <- bootstrapify(iris, 2)
 79 |   expect_equal(
 80 |     nrow(filter(x, .bootstrap <= 1)),
 81 |     150
 82 |   )
 83 | })
 84 | 
 85 | test_that("arrange()", {
 86 |   x <- bootstrapify(iris, 2)
 87 | 
 88 |   expect_equal(
 89 |     nrow(arrange(x, desc(.bootstrap))),
 90 |     300
 91 |   )
 92 | 
 93 |   expect_equal(
 94 |     arrange(x, desc(.bootstrap))$.bootstrap,
 95 |     rep(c(2,1), each = 150)
 96 |   )
 97 | })
 98 | 
 99 | test_that("distinct()", {
100 |   x <- bootstrapify(iris, 2)
101 | 
102 |   expect_equal(
103 |     colnames(distinct(x)),
104 |     c(".bootstrap", colnames(iris))
105 |   )
106 | 
107 |   expect_equal(
108 |     distinct(x, .bootstrap),
109 |     group_by(tibble(.bootstrap = 1:2), .bootstrap)
110 |   )
111 | 
112 | })
113 | 
114 | test_that("select()", {
115 |   x <- bootstrapify(iris, 2)
116 | 
117 |   expect_equal(
118 |     colnames(select(x, group_cols())),
119 |     ".bootstrap"
120 |   )
121 | 
122 |   expect_equal(
123 |     nrow(select(x, group_cols())),
124 |     300
125 |   )
126 | 
127 | })
128 | 
129 | test_that("slice()", {
130 |   x <- bootstrapify(iris, 2)
131 | 
132 |   expect_equal(
133 |     nrow(slice(x, 1)),
134 |     2
135 |   )
136 | })
137 | 
138 | test_that("pull()", {
139 |   x <- bootstrapify(iris, 2)
140 | 
141 |   expect_equal(
142 |     length(pull(x, ".bootstrap")),
143 |     300
144 |   )
145 | 
146 |   expect_equal(
147 |     length(pull(x, "Sepal.Length")),
148 |     300
149 |   )
150 | })
151 | 
152 | test_that("rename()", {
153 |   x <- bootstrapify(iris, 2)
154 | 
155 |   expect_equal(
156 |     colnames(rename(x, y = .bootstrap)),
157 |     c("y", colnames(iris))
158 |   )
159 | })
160 | 
161 | test_that("full_join()", {
162 | 
163 |   mini_iris <- iris[1:20, 1:2]
164 |   x <- bootstrapify(mini_iris, 2)
165 |   y <- tibble(Sepal.Length = 100, new = 1)
166 | 
167 |   x_fj <- full_join(x, y, by = "Sepal.Length")
168 | 
169 |   expect_equal(
170 |     nrow(x_fj),
171 |     41
172 |   )
173 | 
174 |   expect_equal(
175 |     x_fj$new,
176 |     c(rep(NA_real_, 40), 1)
177 |   )
178 | 
179 |   expect_equal(
180 |     colnames(x_fj),
181 |     c(".bootstrap", "Sepal.Length", "Sepal.Width", "new")
182 |   )
183 | 
184 | })
185 | 
186 | test_that("inner_join()", {
187 | 
188 |   mini_iris <- iris[1:20, 1:2]
189 |   x <- bootstrapify(mini_iris, 2)
190 |   y <- tibble(Sepal.Length = 100, new = 1)
191 | 
192 |   x_ij <- inner_join(x, y, by = "Sepal.Length")
193 | 
194 |   expect_equal(
195 |     nrow(x_ij),
196 |     0
197 |   )
198 | 
199 |   expect_equal(
200 |     colnames(x_ij),
201 |     c(".bootstrap", "Sepal.Length", "Sepal.Width", "new")
202 |   )
203 | 
204 | })
205 | 
206 | test_that("left_join()", {
207 | 
208 |   mini_iris <- iris[1:20, 1:2]
209 |   x <- bootstrapify(mini_iris, 2)
210 |   y <- tibble(Sepal.Length = 100, new = 1)
211 | 
212 |   x_lj <- left_join(x, y, by = "Sepal.Length")
213 | 
214 |   expect_equal(
215 |     nrow(x_lj),
216 |     40
217 |   )
218 | 
219 |   expect_equal(
220 |     x_lj$new,
221 |     rep(NA_real_, times = 40)
222 |   )
223 | 
224 |   expect_equal(
225 |     colnames(x_lj),
226 |     c(".bootstrap", "Sepal.Length", "Sepal.Width", "new")
227 |   )
228 | 
229 | })
230 | 
231 | test_that("right_join()", {
232 | 
233 |   mini_iris <- iris[1:20, 1:2]
234 |   x <- bootstrapify(mini_iris, 2)
235 |   y <- tibble(Sepal.Length = 100, new = 1)
236 | 
237 |   x_rj <- right_join(x, y, by = "Sepal.Length")
238 | 
239 |   expect_equal(
240 |     nrow(x_rj),
241 |     1
242 |   )
243 | 
244 |   expect_equal(
245 |     x_rj$new,
246 |     1
247 |   )
248 | 
249 |   expect_equal(
250 |     colnames(x_rj),
251 |     c(".bootstrap", "Sepal.Length", "Sepal.Width", "new")
252 |   )
253 | 
254 | })
255 | 
256 | test_that("anti_join()", {
257 | 
258 |   mini_iris <- iris[1:20, 1:2]
259 |   x <- bootstrapify(mini_iris, 2)
260 |   y <- tibble(Sepal.Length = 100, new = 1)
261 | 
262 |   x_aj <- anti_join(x, y, by = "Sepal.Length")
263 | 
264 |   expect_equal(
265 |     x_aj,
266 |     collect(x)
267 |   )
268 | 
269 | })
270 | 
271 | test_that("semi_join()", {
272 | 
273 |   mini_iris <- iris[1:20, 1:2]
274 |   x <- bootstrapify(mini_iris, 2)
275 |   y <- tibble(Sepal.Length = 100, new = 1)
276 | 
277 |   x_sj <- semi_join(x, y, by = "Sepal.Length")
278 | 
279 |   expect_equal(
280 |     x_sj,
281 |     collect(x)[0,]
282 |   )
283 | 
284 | })
285 | 
286 | # ------------------------------------------------------------------------------
287 | # dplyr functions implictly supported
288 | 
289 | context("test-dplyr-compat-extra")
290 | 
291 | test_that("add_count()", {
292 |   x <- bootstrapify(iris, 2)
293 |   expect_equal(
294 |     nrow(add_count(x)),
295 |     300
296 |   )
297 | })
298 | 
299 | test_that("add_tally()", {
300 |   x <- bootstrapify(iris, 2)
301 | 
302 |   expect_equal(
303 |     nrow(add_tally(x)),
304 |     300
305 |   )
306 | 
307 |   expect_equal(
308 |     nrow(add_tally(x, .bootstrap, sort = TRUE)),
309 |     300
310 |   )
311 | 
312 |   expect_equal(
313 |     unique(add_tally(x, .bootstrap, sort = TRUE)$n),
314 |     c(300, 150)
315 |   )
316 | })
317 | 
318 | test_that("as_tibble()", {
319 | 
320 |   x <- bootstrapify(iris, 2)
321 | 
322 |   x_at <- as_tibble(x)
323 | 
324 |   # can convert to tibble without expanding
325 |   # virtual groups
326 |   expect_equal(
327 |     nrow(x_at),
328 |     150
329 |   )
330 | 
331 |   expect_false(".bootstrap" %in% colnames(x_at))
332 |   expect_false("resampled_df" %in% class(x_at))
333 | 
334 | })
335 | 
336 | test_that("bind_rows() fails sadly", {
337 | 
338 |   x <- bootstrapify(iris, 2)
339 | 
340 |   # Cant currently do anything about this
341 |   expect_error(
342 |     bind_rows(x, iris),
343 |     "Column `.bootstrap` is unknown"
344 |   )
345 | 
346 | })
347 | 
348 | test_that("bind_cols() works", {
349 | 
350 |   x <- bootstrapify(iris, 2)
351 | 
352 |   x_bc_1 <- bind_cols(x, iris)
353 | 
354 |   expect_is(x_bc_1, "resampled_df")
355 | 
356 |   expect_equal(
357 |     ncol(x_bc_1),
358 |     10
359 |   )
360 | 
361 |   expect_equal(
362 |     nrow(x_bc_1),
363 |     150
364 |   )
365 | 
366 |   expect_equal(
367 |     nrow(collect(x_bc_1)),
368 |     300
369 |   )
370 | 
371 |   x_bc_2 <- bind_cols(iris, x)
372 | 
373 |   expect_is(x_bc_2, "data.frame")
374 |   expect_false("tbl_df" %in% class(x_bc_2))
375 | })
376 | 
377 | test_that("sample_n()", {
378 |   x <- bootstrapify(iris, 2)
379 | 
380 |   expect_equal(
381 |     nrow(sample_n(x, 5)),
382 |     10
383 |   )
384 | })
385 | 


--------------------------------------------------------------------------------
/tests/testthat/test-dplyr-do.R:
--------------------------------------------------------------------------------
 1 | context("test-dplyr-do")
 2 | 
 3 | library(dplyr)
 4 | 
 5 | test_that("can use do() with bootstraps", {
 6 | 
 7 |   x <- bootstrapify(iris, 5)
 8 | 
 9 |   expect_equal(
10 |     do(x, .),
11 |     collect(x)
12 |   )
13 | 
14 | })
15 | 
16 | test_that("can return non-data frames", {
17 | 
18 |   x <- bootstrapify(iris, 5)
19 | 
20 |   x_do <- do(x, mod = lm(Sepal.Width ~ Species, data = .))
21 | 
22 |   expect_is(x_do, "rowwise_df")
23 |   expect_false("resampled_df" %in% class(x_do))
24 | 
25 |   expect_equal(
26 |     nrow(x_do),
27 |     5
28 |   )
29 | 
30 | })
31 | 
32 | test_that("multiple groups", {
33 | 
34 |   x <- iris %>%
35 |     group_by(Species) %>%
36 |     bootstrapify(5)
37 | 
38 |   x_do <- do(x, mod = lm(Sepal.Width ~ Sepal.Length, data = .))
39 | 
40 |   expect_equal(
41 |     nrow(x_do),
42 |     15
43 |   )
44 | 
45 | })
46 | 
47 | test_that("double bootstrap", {
48 | 
49 |   x <- iris %>%
50 |     bootstrapify(5) %>%
51 |     bootstrapify(10, key = "b2")
52 | 
53 |   x_do <- do(x, mod = lm(Sepal.Width ~ Sepal.Length, data = .))
54 | 
55 |   expect_equal(
56 |     colnames(x_do),
57 |     c(".bootstrap", "b2", "mod")
58 |   )
59 | 
60 | })
61 | 


--------------------------------------------------------------------------------
/tests/testthat/test-dplyr-group-funs.R:
--------------------------------------------------------------------------------
  1 | context("test-dplyr-group-funs")
  2 | 
  3 | library(dplyr)
  4 | 
  5 | test_that("group_nest()", {
  6 | 
  7 |   x <- iris %>%
  8 |     bootstrapify(5)
  9 | 
 10 |   x_gn <- group_nest(x)
 11 | 
 12 |   expect_equal(nrow(x_gn), 5)
 13 |   expect_false("resampled_df" %in% class(x_gn))
 14 | 
 15 |   xx <- bootstrapify(x, 10, key = "bs2")
 16 | 
 17 |   xx_gn <- group_nest(xx)
 18 | 
 19 |   expect_equal(nrow(xx_gn), 50)
 20 |   expect_false("resampled_df" %in% class(xx_gn))
 21 | 
 22 | })
 23 | 
 24 | test_that("group_nest() with `keep = TRUE`", {
 25 | 
 26 |   x <- iris %>%
 27 |     bootstrapify(5)
 28 | 
 29 |   x_gn <- group_nest(x, keep = TRUE)
 30 | 
 31 |   expect_equal(
 32 |     colnames(x_gn$data[[1]])[1],
 33 |     ".bootstrap"
 34 |   )
 35 | 
 36 | })
 37 | 
 38 | test_that("group_modify()", {
 39 | 
 40 |   x <- iris %>%
 41 |     bootstrapify(5)
 42 | 
 43 |   expect_equal(
 44 |     group_modify(x, ~.x),
 45 |     collect(x)
 46 |   )
 47 | 
 48 |   x_gm <- group_modify(x, ~dplyr::tibble(.g = list(.y)))
 49 | 
 50 |   expect_equal(nrow(x_gm), 5)
 51 | 
 52 |   expect_equal(
 53 |     x_gm$.g[[1]],
 54 |     dplyr::tibble(.bootstrap = 1L)
 55 |   )
 56 | 
 57 | })
 58 | 
 59 | test_that("group_map()", {
 60 | 
 61 |   x <- iris %>%
 62 |     bootstrapify(5)
 63 | 
 64 |   # Don't check attributes, as group_split()
 65 |   # also adds the ptype as an attribute as of dplyr 0.8.2
 66 |   expect_equivalent(
 67 |     group_map(x, ~.x),
 68 |     group_split(x, keep = FALSE)
 69 |   )
 70 | 
 71 |   x_gm <- group_map(x, ~dplyr::tibble(.g = list(.y)))
 72 | 
 73 |   expect_is(x_gm, "list")
 74 | 
 75 |   expect_identical(
 76 |     x_gm[[1]],
 77 |     tibble(.g = list(tibble(.bootstrap = 1L)))
 78 |   )
 79 | 
 80 |   # `keep` argument
 81 |   expect_equal(
 82 |     unlist(group_map(x, ~ncol(.x), keep = TRUE)),
 83 |     rep(6, times = 5)
 84 |   )
 85 | 
 86 | })
 87 | 
 88 | test_that("group_walk()", {
 89 | 
 90 |   x <- iris %>%
 91 |     bootstrapify(5)
 92 | 
 93 |   res <- NULL
 94 | 
 95 |   group_walk(x, ~{ res <<- dplyr::bind_rows(res, .y) })
 96 | 
 97 |   expect_equal(
 98 |     res,
 99 |     dplyr::tibble(.bootstrap = 1:5)
100 |   )
101 | 
102 |   res <- NULL
103 | 
104 |   group_walk(x, ~{ res <<- dplyr::bind_rows(res, .x) })
105 | 
106 |   expect_equal(
107 |     res,
108 |     select(ungroup(collect(x)), -.bootstrap)
109 |   )
110 | 
111 | })
112 | 
113 | test_that("group_split() - `keep` argument", {
114 | 
115 |   x <- iris %>%
116 |     bootstrapify(5)
117 | 
118 |   x_gs <- group_split(x, keep = TRUE)
119 | 
120 |   expect_equal(
121 |     colnames(x_gs[[1]]),
122 |     c(".bootstrap", colnames(iris))
123 |   )
124 | 
125 |   x_gs2 <- group_split(x, keep = FALSE)
126 | 
127 |   expect_equal(
128 |     colnames(x_gs2[[1]]),
129 |     colnames(iris)
130 |   )
131 | })
132 | 
133 | test_that("group_keys() can find the virtual groups", {
134 | 
135 |   x <- iris %>%
136 |     group_by(Species) %>%
137 |     bootstrapify(1) %>%
138 |     bootstrapify(2)
139 | 
140 |   x_keys <- group_keys(x)
141 | 
142 |   expect_equal(
143 |     x_keys[[1]],
144 |     rep(unique(iris$Species), each = 2)
145 |   )
146 | 
147 |   expect_equal(
148 |     x_keys[[2]],
149 |     rep(1, times = 6)
150 |   )
151 | 
152 |   expect_equal(
153 |     x_keys[[3]],
154 |     rep(1:2, times = 3)
155 |   )
156 | })
157 | 
158 | test_that("group_data() finds virtual groups", {
159 | 
160 |   x <- iris %>%
161 |     bootstrapify(2)
162 | 
163 |   x_gd <- group_data(x)
164 | 
165 |   expect_equal(
166 |     x_gd$.bootstrap,
167 |     c(1, 2)
168 |   )
169 | })
170 | 
171 | test_that("group_indices() returns collect()ed indices", {
172 | 
173 |   x <- iris %>%
174 |     bootstrapify(2)
175 | 
176 |   x_gi <- group_indices(x)
177 | 
178 |   expect_equal(
179 |     x_gi,
180 |     c(rep(1, times = 150), rep(2, times = 150))
181 |   )
182 | 
183 | })
184 | 
185 | test_that("group_vars() returns virtual groups", {
186 | 
187 |   x <- iris %>%
188 |     bootstrapify(2)
189 | 
190 |   expect_equal(
191 |     group_vars(x),
192 |     ".bootstrap"
193 |   )
194 | 
195 |   # I think it is correct to expect that tbl_vars()
196 |   # doesn't return the virtual group
197 |   expect_equal(
198 |     as.character(tbl_vars(x)),
199 |     colnames(iris)
200 |   )
201 | 
202 | })
203 | 


--------------------------------------------------------------------------------
/tests/testthat/test-dplyr-summarise.R:
--------------------------------------------------------------------------------
 1 | context("test-dplyr-summarise")
 2 | 
 3 | library(dplyr)
 4 | 
 5 | test_that("summarise() works with a simple bootstrap", {
 6 | 
 7 |   x <- bootstrapify(iris, 5)
 8 | 
 9 |   expect_error(
10 |     x_s <- summarise(x, y = mean(Sepal.Width)),
11 |     NA
12 |   )
13 | 
14 |   expect_equal(
15 |     x_s$.bootstrap,
16 |     1:5
17 |   )
18 | 
19 |   expect_false("grouped_df" %in% class(x_s))
20 |   expect_false("resampled_df" %in% class(x_s))
21 | 
22 | })
23 | 
24 | test_that("summarise() == summarize()", {
25 | 
26 |   x <- bootstrapify(iris, 5)
27 | 
28 |   expect_equal(
29 |     summarise(x, y = mean(Sepal.Width)),
30 |     summarize(x, y = mean(Sepal.Width))
31 |   )
32 | 
33 | })
34 | 
35 | test_that("works with a double bootstrap", {
36 | 
37 |   x <- iris %>%
38 |     bootstrapify(5) %>%
39 |     bootstrapify(10, key = "bs2")
40 | 
41 |   expect_error(
42 |     x_s <- summarise(x, y = mean(Sepal.Width)),
43 |     NA
44 |   )
45 | 
46 |   # We want it to be a grouped_df, but no longer a resampled_df
47 |   # because the groups have been materialized
48 |   expect_is(x_s, "grouped_df")
49 |   expect_false("resampled_df" %in% class(x_s))
50 | 
51 |   expect_equal(
52 |     nrow(x_s),
53 |     5 * 10
54 |   )
55 | 
56 | })
57 | 
58 | test_that("works with existing groups", {
59 | 
60 |   x <- iris %>%
61 |     group_by(Species) %>%
62 |     bootstrapify(5)
63 | 
64 |   expect_error(
65 |     x_s <- summarise(x, y = mean(Sepal.Width)),
66 |     NA
67 |   )
68 | 
69 |   # We want it to be a grouped_df, but no longer a resampled_df
70 |   # because the groups have been materialized
71 |   expect_is(x_s, "grouped_df")
72 |   expect_false("resampled_df" %in% class(x_s))
73 | 
74 |   expect_equal(
75 |     nrow(x_s),
76 |     5 * 3
77 |   )
78 | 
79 |   expect_equal(
80 |     colnames(x_s),
81 |     c("Species", ".bootstrap", "y")
82 |   )
83 | 
84 | })
85 | 


--------------------------------------------------------------------------------
/tests/testthat/test-samplify.R:
--------------------------------------------------------------------------------
  1 | context("test-samplify")
  2 | 
  3 | # many of the would-be tests here are already covered by `test-bootstrapify.R`
  4 | 
  5 | library(dplyr)
  6 | 
  7 | test_that("can alter the `size` of each resample", {
  8 | 
  9 |   x <- samplify(iris, 1, 5)
 10 | 
 11 |   expect_equal(
 12 |     length(group_data(x)$.rows[[1]]),
 13 |     5
 14 |   )
 15 | 
 16 | })
 17 | 
 18 | test_that("cannot specify vector `size` for ungrouped data frames", {
 19 | 
 20 |   expect_error(
 21 |     samplify(iris, 2, c(1, 5)),
 22 |     "a single integer \\(for ungrouped data frames\\)"
 23 |   )
 24 | 
 25 | })
 26 | 
 27 | test_that("can specify vector `size` for grouped data frames", {
 28 | 
 29 |   iris_g <- group_by(iris, Species)
 30 | 
 31 |   expect_error(
 32 |     x <- samplify(iris_g, 1, c(1, 2, 3)),
 33 |     NA
 34 |   )
 35 | 
 36 |   x_gd <- group_data(x)
 37 | 
 38 |   expect_equal(
 39 |     vapply(x_gd$.rows, length, integer(1)),
 40 |     c(1, 2, 3)
 41 |   )
 42 | 
 43 |   expect_error(
 44 |     samplify(x, 2, c(1, 5)),
 45 |     "must be size 1 or 3 \\(the number of groups\\)"
 46 |   )
 47 | 
 48 | })
 49 | 
 50 | test_that("cannot sample more than the number of rows without replacement", {
 51 | 
 52 |   expect_error(
 53 |     samplify(iris, 1, 151),
 54 |     "`size` \\(151\\) must be less than or equal to the size of the data / current group \\(150\\)"
 55 |   )
 56 | 
 57 |   iris_g <- group_by(iris, Species)
 58 | 
 59 |   expect_error(
 60 |     samplify(iris_g, 1, c(49, 49, 51)),
 61 |     "`size` \\(51\\) must be less than or equal to the size of the data / current group \\(50\\)"
 62 |   )
 63 | 
 64 | })
 65 | 
 66 | test_that("can sample with replacement past the number of rows", {
 67 | 
 68 |   expect_error(
 69 |     x <- samplify(iris, 1, 151, replace = TRUE),
 70 |     NA
 71 |   )
 72 | 
 73 |   x_gd <- group_data(x)
 74 | 
 75 |   expect_equal(
 76 |     length(x_gd$.rows[[1]]),
 77 |     151
 78 |   )
 79 | 
 80 |   iris_g <- group_by(iris, Species)
 81 | 
 82 |   expect_error(
 83 |     xx <- samplify(iris_g, 1, c(51, 55, 40), replace = TRUE),
 84 |     NA
 85 |   )
 86 | 
 87 |   xx_gd <- group_data(xx)
 88 | 
 89 | 
 90 |   expect_equal(
 91 |     vapply(xx_gd$.rows, length, integer(1)),
 92 |     c(51, 55, 40)
 93 |   )
 94 | 
 95 | })
 96 | 
 97 | test_that("`replace` must be a bool", {
 98 |   expect_error(
 99 |     samplify(iris, 1, 1, replace = NA),
100 |     "a single logical \\(TRUE/FALSE\\)"
101 |   )
102 | })
103 | 
104 | test_that("`size` is recycled as necessary", {
105 | 
106 |   iris_g <- group_by(iris, Species)
107 | 
108 |   x <- samplify(iris_g, 2, 5)
109 | 
110 |   x_gd <- group_data(x)
111 | 
112 |   expect_equal(
113 |     unique(vapply(x_gd$.rows, length, integer(1))),
114 |     5
115 |   )
116 | 
117 | })
118 | 


--------------------------------------------------------------------------------
/vignettes/.gitignore:
--------------------------------------------------------------------------------
1 | *.html
2 | *.R
3 | 


--------------------------------------------------------------------------------
/vignettes/dplyr-support.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Support for dplyr"
  3 | output: rmarkdown::html_vignette
  4 | vignette: >
  5 |   %\VignetteIndexEntry{Support for dplyr}
  6 |   %\VignetteEncoding{UTF-8}
  7 |   %\VignetteEngine{knitr::rmarkdown}
  8 | editor_options: 
  9 |   chunk_output_type: console
 10 | ---
 11 | 
 12 | ```{r, include = FALSE}
 13 | knitr::opts_chunk$set(
 14 |   collapse = TRUE,
 15 |   comment = "#>"
 16 | )
 17 | ```
 18 | 
 19 | ```{r setup, message=FALSE, warning=FALSE}
 20 | set.seed(123)
 21 | library(strapgod)
 22 | library(dplyr)
 23 | ```
 24 | 
 25 | # Introduction
 26 | 
 27 | As much as possible, strapgod attempts to let you use any dplyr function that you want on the `resampled_df` object that is returned by `bootstrapify()` and `samplify()`. Some functions have specialized behavior, like `summarise()`, while most others just call `collect()` to materialize the bootstrap rows before passing on to the underlying dplyr function.
 28 | 
 29 | What follows is a list of the dplyr functions that have "special" properties when used on a `resampled_df`.
 30 | 
 31 | # collect()
 32 | 
 33 | The most important dplyr function for strapgod is `collect()`. Generally, this has been used to force a computation from a data base query and return the results as a tibble, and it has a similar context here. `collect()` forces the materialization of the virtual groups, and returns the full grouped tibble back to you.
 34 | 
 35 | ```{r}
 36 | x <- bootstrapify(iris, 10)
 37 | 
 38 | # Not materialized
 39 | x
 40 | 
 41 | # Materialized
 42 | collect(x)
 43 | ```
 44 | 
 45 | When calling `collect()` directly, there are two arguments available to extract extra information about the bootstraps. 
 46 | 
 47 | `id` adds a sequence of integers from `1:n` for each bootstrap group. It would be equivalent to adding the `row_number()` by group after the `collect()`, but saves some typing.
 48 | 
 49 | ```{r}
 50 | collect(x, id = ".id")
 51 | ```
 52 | 
 53 | `original_id` tacks on the original row of the current bootstrap observation. It is generally more useful than `id`, as it provides a way to link the bootstrap rows back to the original data.
 54 | 
 55 | ```{r}
 56 | collect(x, original_id = ".original_id")
 57 | ```
 58 | 
 59 | # summarise()
 60 | 
 61 | The motivation for this package was `summarise()`. It efficiently computes the summary results, only materializing the bootstrap rows as they are needed at the C++ level.
 62 | 
 63 | ```{r}
 64 | summarise(x, mean_length = mean(Sepal.Length))
 65 | ```
 66 | 
 67 | You can group by other columns before creating the virtual groups, and `bootstrapify()` will respect those extra groups in the `summarise()` call. Pay attention to how easy it is to go from a non-bootstrapped version to a bootstrapped version. It's just one extra line!
 68 | 
 69 | ```{r}
 70 | # Non-bootstrapped
 71 | iris %>%
 72 |   group_by(Species) %>%
 73 |   summarise(
 74 |     mean_length_across_species = mean(Sepal.Length)
 75 |   )
 76 | 
 77 | # Bootstrapped
 78 | iris %>%
 79 |   group_by(Species) %>%
 80 |   bootstrapify(5) %>%
 81 |   summarise(
 82 |     mean_length_across_species = mean(Sepal.Length)
 83 |   )
 84 | ```
 85 | 
 86 | # do()
 87 | 
 88 | While `dplyr::do()` is basically deprecated and has been replaced by `group_modify()`, it still has its uses sometimes. Like `summarise()`, `do()` materializes the groups only when they are required. Here we run the same linear model on each bootstrapped set of data.
 89 | 
 90 | ```{r}
 91 | do(x, model = lm(Sepal.Length ~ Sepal.Width, data = .))
 92 | ```
 93 | 
 94 | # group_nest()
 95 | 
 96 | `group_nest()` will materialize the groups so that they become columns in the outer tibble after the nest has been performed.
 97 | 
 98 | ```{r}
 99 | group_nest(x)
100 | ```
101 | 
102 | You can set `keep = TRUE` to include the groups in the inner tibbles as well.
103 | 
104 | ```{r}
105 | group_nest(x, keep = TRUE)$data[[1]]
106 | ```
107 | 
108 | # group_split()
109 | 
110 | `group_split()` allows you to materialize all of the bootstrap tibbles into separate tibbles, all bundled together into a list.
111 | 
112 | ```{r}
113 | group_split(x) %>% head(n = 3)
114 | ```
115 | 
116 | You can specify `keep = FALSE` if you never want to see the bootstrap columns.
117 | 
118 | ```{r}
119 | group_split(x, keep = FALSE) %>% head(n = 3)
120 | ```
121 | 
122 | # group_modify()
123 | 
124 | `group_modify()` is similar to `do()`, but (as of dplyr 0.8.0.1) always returns a data frame and gives you access to the non-group and group data separately.
125 | 
126 | ```{r}
127 | # Just show the first 2 rows of each bootstrap
128 | group_modify(x, ~head(.x, n = 2))
129 | 
130 | # As you iterate though each group, you have access to that
131 | # group's metadata through `.y` if you need it.
132 | group_modify_group_data <- group_modify(x, ~tibble(.g = list(.y)))
133 | 
134 | group_modify_group_data
135 | 
136 | group_modify_group_data$.g[[1]]
137 | ```
138 | 
139 | Like `do()`, it can be a convenient way to run multiple models as long as you return a data frame from each one.
140 | 
141 | ```{r}
142 | x %>%
143 |   group_by(Species, add = TRUE) %>%
144 |   group_modify(~ broom::tidy(lm(Petal.Length ~ Sepal.Length, data = .x)))
145 | ```
146 | 
147 | # ungroup()
148 | 
149 | `ungroup()` will return the original tibble back to you, without materializing the virtual groups.
150 | 
151 | ```{r}
152 | ungroup(x)
153 | ```
154 | 
155 | # as_tibble()
156 | 
157 | Like `ungroup()`, you can get the original tibble back by converting it to one explicitly with `as_tibble()`.
158 | 
159 | ```{r}
160 | as_tibble(x)
161 | ```
162 | 
163 | # Other dplyr functions
164 | 
165 | Most other dplyr functions work by first calling `collect()`, and then passing off to the underlying dplyr implementation. This means you can use `mutate()` like so:
166 | 
167 | ```{r}
168 | mutate(x, mean = mean(Sepal.Length))
169 | ```
170 | 
171 | This doesn't really get you anything in terms of speed, but can be convenient as an automatic way to convert back to a tibble and keep going with your workflow.
172 | 


--------------------------------------------------------------------------------
/vignettes/virtual-bootstraps.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Virtual Bootstraps"
  3 | output: rmarkdown::html_vignette
  4 | vignette: >
  5 |   %\VignetteIndexEntry{Virtual Bootstraps}
  6 |   %\VignetteEncoding{UTF-8}
  7 |   %\VignetteEngine{knitr::rmarkdown}
  8 | editor_options: 
  9 |   chunk_output_type: console
 10 | ---
 11 | 
 12 | ```{r, include = FALSE}
 13 | knitr::opts_chunk$set(
 14 |   collapse = TRUE,
 15 |   comment = "#>"
 16 | )
 17 | ```
 18 | 
 19 | ```{r setup, message=FALSE, warning=FALSE}
 20 | set.seed(123)
 21 | library(strapgod)
 22 | library(dplyr)
 23 | iris <- as_tibble(iris)
 24 | ```
 25 | 
 26 | # Introduction
 27 | 
 28 | The goal of strapgod is to make it easy to create _virtual groups_ on top of tibbles for use with resampling. This means that your tibble is grouped, but you don't actually "materialize" the groups until you actually need them. By doing this, some computations involving large amounts of bootstraps or resamples can be made much more efficient.
 29 | 
 30 | # Creating resampled data frames
 31 | 
 32 | There are two core functions that help you generate a `resampled_df` object.
 33 | 
 34 | `bootstrapify()` takes a data frame and bootstraps the rows of that data frame a set number of `times` to generate the virtual groups.
 35 | 
 36 | ```{r}
 37 | iris_boot <- bootstrapify(iris, times = 10)
 38 | 
 39 | nrow(iris)
 40 | nrow(iris_boot)
 41 | 
 42 | iris_boot
 43 | ```
 44 | 
 45 | What you'll immediately notice is that:
 46 | 
 47 | - The tibble still only has 150 rows.
 48 | 
 49 | - The tibble is now grouped by `.bootstrap`, which isn't a column in the tibble.
 50 | 
 51 | The invisible `.bootstrap` column is the virtual group. It hasn't been materialized (there are still only 150 rows, not 150 * 10 rows), but dplyr still seems to know about it.
 52 | 
 53 | `samplify()` is the other function that can generate resampled tibbles. It is a slight generalization of `bootstrapify()` that also allows you to specify the size of each resample, and if you want to resample with replacement or not.
 54 | 
 55 | ```{r}
 56 | iris_samp <- samplify(iris, times = 10, size = 20, replace = FALSE)
 57 | 
 58 | iris_samp
 59 | ```
 60 | 
 61 | This result:
 62 | 
 63 | - Has 10 resamples
 64 | 
 65 | - Each one is of size 20
 66 | 
 67 | - And the resampling was done without replacement each time
 68 | 
 69 | # Resampled summaries
 70 | 
 71 | What can you do with these neat resampled data frames? Great question! For one thing, you can `summarise()` the tibble to compute bootstrapped summaries quickly and efficiently.
 72 | 
 73 | ```{r}
 74 | # without the bootstrap
 75 | iris %>%
 76 |   summarise(
 77 |     mean_length = mean(Sepal.Length)
 78 |   )
 79 | 
 80 | # with the bootstrap
 81 | iris %>%
 82 |   bootstrapify(10) %>%
 83 |   summarise(
 84 |     mean_length = mean(Sepal.Length)
 85 |   )
 86 | ```
 87 | 
 88 | This makes it easy to compute bootstrapped estimates of individual statistics, along with bootstrapped standard deviations around those estimates.
 89 | 
 90 | ```{r}
 91 | iris %>%
 92 |   bootstrapify(10) %>%
 93 |   summarise(mean_length = mean(Sepal.Length)) %>%
 94 |   summarise(
 95 |     bootstrapped_mean = mean(mean_length),
 96 |     bootstrapped_sd   = sd(mean_length)
 97 |   )
 98 | ```
 99 | 
100 | If you want, you can take an existing grouped data frame and bootstrapify that as well, allowing you to compute bootstrapped statistics _across_ some other variable.
101 | 
102 | ```{r}
103 | iris_group_strap <- iris %>%
104 |   group_by(Species) %>%
105 |   bootstrapify(100) 
106 | 
107 | iris_group_strap
108 | ```
109 | 
110 | Reusing the code from above, we can now compute bootstrapped estimates for the mean `Sepal.Length` of each `Species`, along with standard deviations around those estimates.
111 | 
112 | ```{r}
113 | iris_group_strap %>%
114 |   summarise(mean_length = mean(Sepal.Length)) %>%
115 |   summarise(
116 |     bootstrapped_mean = mean(mean_length),
117 |     bootstrapped_sd   = sd(mean_length)
118 |   )
119 | ```
120 | 
121 | # Understanding virtual groups
122 | 
123 | The virtual groups are stored in the `group_data()` metadata of the `resampled_df` object. Every grouped data frame has one of these, and they are used internally to power the dplyr `group_by()` system.
124 | 
125 | ```{r}
126 | group_data(iris_boot)
127 | ```
128 | 
129 | The `.bootstrap` column contains the unique values of the groups, and the `.rows` column is a list column, where each element is an integer vector. That integer vector holds the rows that belong to that specific group. So, for `.bootstrap == 1`, there is a vector with 150 integers identifying the rows belonging to that resample.
130 | 
131 | ```{r}
132 | group_data(iris_boot)$.rows[[1]]
133 | ```
134 | 
135 | When a call to `collect()` is made, this row index information is used to construct the output. Essentially, we start with the `group_data()` and utilize the `.rows` info to replicate the rows of the original data frame for each group, building up the complete resampled data frame. Notice how we now have the `150 * 10 = 1500` rows from the 10 bootstraps.
136 | 
137 | ```{r}
138 | collect(iris_boot)
139 | ```
140 | 
141 | To learn more about `collect()`, and the other supported dplyr functions in strapgod, read the `vignette("dplyr-support", "strapgod")`.
142 | 


--------------------------------------------------------------------------------