├── .Rbuildignore
├── .gitignore
├── .travis.yml
├── DESCRIPTION
├── NAMESPACE
├── NEWS.md
├── R
├── bootstrapify.R
├── collect.R
├── compat-dplyr.R
├── compat-purrr.R
├── compat-tidyr.R
├── samplify.R
├── util.R
└── zzz.R
├── README.Rmd
├── README.md
├── _pkgdown.yml
├── codecov.yml
├── cran-comments.md
├── man
├── bootstrapify.Rd
├── collect.resampled_df.Rd
├── figures
│ ├── README-bootstrap-plots-1.png
│ ├── README-non-bootstrap-plots-1.png
│ └── strap-god.jpg
├── reexports.Rd
└── samplify.Rd
├── strapgod.Rproj
├── tests
├── testthat.R
└── testthat
│ ├── test-bootstrapify.R
│ ├── test-collect.R
│ ├── test-compat-tidyr.R
│ ├── test-dplyr-compat.R
│ ├── test-dplyr-do.R
│ ├── test-dplyr-group-funs.R
│ ├── test-dplyr-summarise.R
│ └── test-samplify.R
└── vignettes
├── .gitignore
├── dplyr-support.Rmd
└── virtual-bootstraps.Rmd
/.Rbuildignore:
--------------------------------------------------------------------------------
1 | ^CRAN-RELEASE$
2 | ^cran-comments\.md$
3 | ^pkgdown$
4 | ^_pkgdown\.yml$
5 | ^docs$
6 | ^codecov\.yml$
7 | ^\.travis\.yml$
8 | ^README\.Rmd$
9 | ^strapgod\.Rproj$
10 | ^\.Rproj\.user$
11 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | docs/
2 | inst/doc
3 | .Rhistory
4 | .RData
5 | .Rproj.user
6 |
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
1 | # R for travis: see documentation at https://docs.travis-ci.com/user/languages/r
2 |
3 | language: r
4 | sudo: false
5 | cache: packages
6 |
7 | matrix:
8 | include:
9 | - r: devel
10 | - r: release
11 | after_success:
12 | - Rscript -e 'covr::codecov()'
13 | before_cache:
14 | - Rscript -e 'remotes::install_cran("pkgdown")'
15 | # For the README
16 | - Rscript -e 'remotes::install_cran("ggplot2")'
17 | deploy:
18 | provider: script
19 | script: Rscript -e 'pkgdown::deploy_site_github(run_dont_run = TRUE)'
20 | skip_cleanup: true
21 | - r: oldrel
22 | - r: 3.2
23 |
--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
1 | Package: strapgod
2 | Title: Resampled Data Frames
3 | Version: 0.0.4.9000
4 | Authors@R:
5 | person(given = "Davis",
6 | family = "Vaughan",
7 | role = c("aut", "cre"),
8 | email = "davis@rstudio.com")
9 | Description: Create data frames with virtual groups that can be used with
10 | 'dplyr' to efficiently compute resampled statistics, generate the data for
11 | hypothetical outcome plots, and fit multiple models on resampled variations
12 | of the original data.
13 | License: GPL-3
14 | Encoding: UTF-8
15 | LazyData: true
16 | Roxygen: list(markdown = TRUE)
17 | RoxygenNote: 6.1.1
18 | Depends:
19 | R (>= 3.2.0)
20 | Imports:
21 | dplyr (>= 0.8.3),
22 | tidyr (>= 1.0.0),
23 | rlang,
24 | tibble,
25 | lifecycle
26 | Suggests:
27 | broom,
28 | testthat,
29 | covr,
30 | knitr,
31 | rmarkdown
32 | URL: https://github.com/DavisVaughan/strapgod
33 | BugReports: https://github.com/DavisVaughan/strapgod/issues
34 | VignetteBuilder: knitr
35 |
--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
1 | # Generated by roxygen2: do not edit by hand
2 |
3 | S3method(anti_join,resampled_df)
4 | S3method(arrange,resampled_df)
5 | S3method(arrange_,resampled_df)
6 | S3method(bootstrapify,data.frame)
7 | S3method(bootstrapify,grouped_df)
8 | S3method(bootstrapify,tbl_df)
9 | S3method(collect,resampled_df)
10 | S3method(distinct,resampled_df)
11 | S3method(filter,resampled_df)
12 | S3method(full_join,resampled_df)
13 | S3method(group_by,resampled_df)
14 | S3method(group_indices,resampled_df)
15 | S3method(group_indices_,resampled_df)
16 | S3method(group_nest,resampled_df)
17 | S3method(group_split,resampled_df)
18 | S3method(inner_join,resampled_df)
19 | S3method(left_join,resampled_df)
20 | S3method(mutate,resampled_df)
21 | S3method(mutate_,resampled_df)
22 | S3method(nest,resampled_df)
23 | S3method(pull,resampled_df)
24 | S3method(rename,resampled_df)
25 | S3method(right_join,resampled_df)
26 | S3method(samplify,data.frame)
27 | S3method(samplify,grouped_df)
28 | S3method(samplify,tbl_df)
29 | S3method(select,resampled_df)
30 | S3method(semi_join,resampled_df)
31 | S3method(slice,resampled_df)
32 | S3method(slice_,resampled_df)
33 | S3method(summarise,resampled_df)
34 | S3method(summarise_,resampled_df)
35 | S3method(transmute,resampled_df)
36 | export(bootstrapify)
37 | export(filter)
38 | export(samplify)
39 | importFrom(dplyr,anti_join)
40 | importFrom(dplyr,arrange)
41 | importFrom(dplyr,arrange_)
42 | importFrom(dplyr,collect)
43 | importFrom(dplyr,distinct)
44 | importFrom(dplyr,filter)
45 | importFrom(dplyr,full_join)
46 | importFrom(dplyr,group_by)
47 | importFrom(dplyr,group_indices)
48 | importFrom(dplyr,group_indices_)
49 | importFrom(dplyr,group_nest)
50 | importFrom(dplyr,group_split)
51 | importFrom(dplyr,inner_join)
52 | importFrom(dplyr,left_join)
53 | importFrom(dplyr,mutate)
54 | importFrom(dplyr,mutate_)
55 | importFrom(dplyr,pull)
56 | importFrom(dplyr,rename)
57 | importFrom(dplyr,right_join)
58 | importFrom(dplyr,select)
59 | importFrom(dplyr,semi_join)
60 | importFrom(dplyr,slice)
61 | importFrom(dplyr,slice_)
62 | importFrom(dplyr,summarise)
63 | importFrom(dplyr,summarise_)
64 | importFrom(dplyr,transmute)
65 | importFrom(lifecycle,deprecated)
66 | importFrom(rlang,":=")
67 | importFrom(tidyr,nest)
68 |
--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
1 | # strapgod (development version)
2 |
3 | # strapgod 0.0.4
4 |
5 | * The resampled data frame method for `tidyr::nest()` has been updated to
6 | support the changes in tidyr 1.0.0.
7 |
8 | # strapgod 0.0.3
9 |
10 | * In dplyr 0.8.2, the `ptype` is carried along as an attribute
11 | in `group_split()`. A test was updated to reflect this.
12 |
13 | * In dplyr 0.8.2, the behavior of `tbl_vars()` was changed to return a classed
14 | object. A test has been updated to reflect this (#13).
15 |
16 | # strapgod 0.0.2
17 |
18 | * In dplyr 0.8.1, the behavior of `group_map()` was moved to `group_modify()`,
19 | and `group_map()` was repurposed to always return a list. strapgod has been
20 | updated to reflect these changes.
21 |
22 | # strapgod 0.0.1
23 |
24 | * Added a `NEWS.md` file to track changes to the package.
25 |
--------------------------------------------------------------------------------
/R/bootstrapify.R:
--------------------------------------------------------------------------------
1 | #' Create a bootstrapped tibble
2 | #'
3 | #' @description
4 | #'
5 | #' `bootstrapify()` creates a bootstrapped tibble with _virtual groups_.
6 | #'
7 | #' @inherit samplify details
8 | #'
9 | #' @inherit samplify return
10 | #'
11 | #' @seealso [collect.resampled_df()]
12 | #'
13 | #' @inheritParams samplify
14 | #'
15 | #' @examples
16 | #' library(dplyr)
17 | #' library(broom)
18 | #'
19 | #' bootstrapify(iris, 5)
20 | #'
21 | #' iris %>%
22 | #' bootstrapify(5) %>%
23 | #' summarise(per_strap_mean = mean(Petal.Width))
24 | #'
25 | #' iris %>%
26 | #' group_by(Species) %>%
27 | #' bootstrapify(5) %>%
28 | #' summarise(per_strap_species_mean = mean(Petal.Width))
29 | #'
30 | #' iris %>%
31 | #' bootstrapify(5) %>%
32 | #' do(tidy(lm(Sepal.Width ~ Sepal.Length + Species, data = .)))
33 | #'
34 | #' # Alternatively, use the newer group_modify()
35 | #' iris %>%
36 | #' bootstrapify(5) %>%
37 | #' group_modify(~tidy(lm(Sepal.Width ~ Sepal.Length + Species, data = .x)))
38 | #'
39 | #' # Alter the name of the group with `key`
40 | #' # Materialize them with collect()
41 | #' straps <- bootstrapify(iris, 5, key = ".straps")
42 | #' collect(straps)
43 | #'
44 | #' @family virtual samplers
45 | #'
46 | #' @name bootstrapify
47 |
48 | #' @rdname bootstrapify
49 | #' @export
50 | bootstrapify <- function(data, times, ..., key = ".bootstrap") {
51 | UseMethod("bootstrapify")
52 | }
53 |
54 | #' @export
55 | bootstrapify.data.frame <- function(data, times, ..., key = ".bootstrap") {
56 | bootstrapify(dplyr::as_tibble(data), times, ..., key = key)
57 | }
58 |
59 | #' @export
60 | bootstrapify.tbl_df <- function(data, times, ..., key = ".bootstrap") {
61 | samplify(
62 | data = data,
63 | times = times,
64 | size = nrow(data),
65 | ...,
66 | replace = TRUE,
67 | key = key
68 | )
69 | }
70 |
71 | #' @export
72 | bootstrapify.grouped_df <- function(data, times, ..., key = ".bootstrap") {
73 | samplify(
74 | data = data,
75 | times = times,
76 | size = dplyr::group_size(data),
77 | ...,
78 | replace = TRUE,
79 | key = key
80 | )
81 | }
82 |
--------------------------------------------------------------------------------
/R/collect.R:
--------------------------------------------------------------------------------
1 | #' Force virtual groups to become explicit rows
2 | #'
3 | #' When `collect()` is used on a `resampled_df`, the virtual bootstrap groups
4 | #' are made explicit.
5 | #'
6 | #' @param x A `resampled_df`.
7 | #'
8 | #' @param ... Not used.
9 | #'
10 | #' @param id Optional. A single character that specifies a name for a column
11 | #' containing a sequence from `1:n` for each bootstrap group.
12 | #'
13 | #' @param original_id Optional. A single character that specifies a name for
14 | #' a column containing the original position of the bootstrapped row.
15 | #'
16 | #' @examples
17 | #' library(dplyr)
18 | #'
19 | #' # virtual groups become real rows
20 | #' collect(bootstrapify(iris, 5))
21 | #'
22 | #' # add on the id column for an identifier per bootstrap
23 | #' collect(bootstrapify(iris, 5), id = ".id")
24 | #'
25 | #' # add on the original_id column to know which row this bootstrapped row
26 | #' # originally came from
27 | #' collect(bootstrapify(iris, 5), original_id = ".original_id")
28 | #'
29 | #' @export
30 | collect.resampled_df <- function(x, ..., id = NULL, original_id = NULL) {
31 |
32 | check_empty_dots(...)
33 | validate_null_or_single_character(id, "id")
34 | validate_null_or_single_character(original_id, "original_id")
35 |
36 | group_syms <- dplyr::groups(x)
37 | group_tbl <- dplyr::group_data(x)
38 | x <- dplyr::ungroup(x)
39 |
40 | # The only column names that are not in the x and are not '.rows' is the .key
41 | # Could potentially have multiple bootstrap columns
42 | .key <- setdiff(colnames(group_tbl), c(colnames(x), ".rows"))
43 |
44 | # Strip off non-virtual groups
45 | .out <- dplyr::select(group_tbl, !!!.key, .rows)
46 |
47 | # Order of these calls matters
48 | .out <- maybe_use_id(.out, id)
49 | .out <- add_straps(.out, x)
50 | .out <- maybe_use_original_id(.out, original_id)
51 |
52 | # Flatten
53 | .out <- tidyr::unnest(.out, cols = c(!!id, !!original_id, ...x))
54 |
55 | .out <- dplyr::group_by(.out, !!!group_syms)
56 |
57 | .out
58 | }
59 |
60 | # ------------------------------------------------------------------------------
61 |
62 | # id = 1:n for each group
63 | maybe_use_id <- function(.out, id) {
64 |
65 | if(!is.null(id)) {
66 |
67 | id_col <- map(.out[[".rows"]], seq_along)
68 |
69 | .out <- tibble::add_column(.out, !!id := id_col, .before = ".rows")
70 | }
71 |
72 | .out
73 | }
74 |
75 | # Repeat `x` rows to generate the bootstraps
76 | # Does vctrs::vec_slice() actually speed this up?
77 | # Limited benchmarking seemed inconclusive
78 | add_straps <- function(.out, x) {
79 |
80 | .out[["...x"]] <- map(
81 | .x = .out[[".rows"]],
82 | .f = function(idx) x[idx, , drop = FALSE]
83 | )
84 |
85 | .out
86 | }
87 |
88 | maybe_use_original_id <- function(.out, original_id) {
89 |
90 | if (!is.null(original_id)) {
91 | .out <- dplyr::rename(.out, !!original_id := .rows)
92 | }
93 | else {
94 | .out[[".rows"]] <- NULL
95 | }
96 |
97 | .out
98 | }
99 |
100 | # ------------------------------------------------------------------------------
101 |
102 | validate_null_or_single_character <- function(.x, .x_nm) {
103 |
104 | if (is.null(.x)) {
105 | return(invisible(.x))
106 | }
107 |
108 | if (!rlang::is_scalar_character(.x)) {
109 | msg <- paste0("`", .x_nm, "` must be a character of size 1.")
110 | rlang::abort(msg)
111 | }
112 |
113 | invisible(.x)
114 | }
115 |
--------------------------------------------------------------------------------
/R/compat-dplyr.R:
--------------------------------------------------------------------------------
1 | # ------------------------------------------------------------------------------
2 | # Interesting dplyr functions
3 |
4 | # summarise()
5 | # do()
6 | # ungroup()
7 | # group_nest()
8 | # group_map()
9 | # group_modify()
10 | # group_walk()
11 | # group_split()
12 | # group_keys()
13 | # group_indices()
14 |
15 | # In theory we could let the default `summarise()` do its thing. But if the
16 | # user did a double `bootstrapify()` call, only one level of it will be removed
17 | # and the post-summarise() object will still be a resampled_df, even though
18 | # all of the bootstrap rows have been materialized.
19 |
20 | #' @importFrom dplyr summarise
21 | #' @export
22 | summarise.resampled_df <- function(.data, ...) {
23 | maybe_new_grouped_df(NextMethod())
24 | }
25 |
26 | # For `group_nest()`, the default method works unless `keep = TRUE`. In that
27 | # case, we need to `collect()` so the groups are available to be 'kept'.
28 |
29 | #' @importFrom dplyr group_nest
30 | #' @export
31 | group_nest.resampled_df <- function(.tbl, ..., .key = "data", keep = FALSE) {
32 |
33 | if (keep) {
34 | dplyr::group_nest(collect(.tbl), ..., .key = .key, keep = keep)
35 | }
36 | else {
37 | NextMethod()
38 | }
39 |
40 | }
41 |
42 | # Same idea as group_nest()
43 |
44 | #' @importFrom dplyr group_split
45 | #' @export
46 | group_split.resampled_df <- function(.tbl, ..., keep = TRUE) {
47 |
48 | if (keep) {
49 | dplyr::group_split(collect(.tbl), ..., keep = keep)
50 | }
51 | else {
52 | NextMethod()
53 | }
54 |
55 | }
56 |
57 | # `group_indices()` returns garbage unless we `collect()` first
58 |
59 | #' @importFrom dplyr group_indices
60 | #' @export
61 | group_indices.resampled_df <- function(.data, ...) {
62 | dplyr::group_indices(collect(.data), ...)
63 | }
64 |
65 | # ------------------------------------------------------------------------------
66 | # Interesting dplyr functions - Standard evaluation backwards compat
67 |
68 | # nocov start
69 |
70 | #' @importFrom dplyr summarise_
71 | #' @export
72 | summarise_.resampled_df <- function(.data, ..., .dots = list()) {
73 | maybe_new_grouped_df(NextMethod())
74 | }
75 |
76 | #' @importFrom dplyr group_indices_
77 | #' @export
78 | group_indices_.resampled_df <- function(.data, ..., .dots = list()) {
79 | dplyr::group_indices_(collect(.data), ..., .dots = list())
80 | }
81 |
82 | # nocov end
83 |
84 | # ------------------------------------------------------------------------------
85 | # dplyr support
86 |
87 | #' @importFrom dplyr mutate
88 | #' @export
89 | mutate.resampled_df <- function(.data, ...) {
90 | dplyr::mutate(collect(.data), ...)
91 | }
92 |
93 | #' @importFrom dplyr transmute
94 | #' @export
95 | transmute.resampled_df <- function(.data, ...) {
96 | dplyr::transmute(collect(.data), ...)
97 | }
98 |
99 | # Required to export filter, otherwise:
100 | # Warning: declared S3 method 'filter.resampled_df' not found
101 | # because of stats::filter
102 |
103 | #' @export
104 | dplyr::filter
105 |
106 | #' @importFrom dplyr filter
107 | #' @export
108 | filter.resampled_df <- function(.data, ...) {
109 | dplyr::filter(collect(.data), ...)
110 | }
111 |
112 | #' @importFrom dplyr arrange
113 | #' @export
114 | arrange.resampled_df <- function(.data, ...) {
115 | dplyr::arrange(collect(.data), ...)
116 | }
117 |
118 | #' @importFrom dplyr distinct
119 | #' @export
120 | distinct.resampled_df <- function(.data, ..., .keep_all = FALSE) {
121 | dplyr::distinct(collect(.data), ..., .keep_all = .keep_all)
122 | }
123 |
124 | #' @importFrom dplyr select
125 | #' @export
126 | select.resampled_df <- function(.data, ...) {
127 | dplyr::select(collect(.data), ...)
128 | }
129 |
130 | #' @importFrom dplyr slice
131 | #' @export
132 | slice.resampled_df <- function(.data, ...) {
133 | dplyr::slice(collect(.data), ...)
134 | }
135 |
136 | #' @importFrom dplyr pull
137 | #' @export
138 | pull.resampled_df <- function(.data, var = -1) {
139 | dplyr::pull(collect(.data), var = !!rlang::enquo(var))
140 | }
141 |
142 | #' @importFrom dplyr rename
143 | #' @export
144 | rename.resampled_df <- function(.data, ...) {
145 | dplyr::rename(collect(.data), ...)
146 | }
147 |
148 | #' @importFrom dplyr full_join
149 | #' @export
150 | full_join.resampled_df <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) {
151 | dplyr::full_join(collect(x), collect(y), by = by, copy = copy, suffix = suffix, ...)
152 | }
153 |
154 | #' @importFrom dplyr inner_join
155 | #' @export
156 | inner_join.resampled_df <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) {
157 | dplyr::inner_join(collect(x), collect(y), by = by, copy = copy, suffix = suffix, ...)
158 | }
159 |
160 | #' @importFrom dplyr left_join
161 | #' @export
162 | left_join.resampled_df <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) {
163 | dplyr::left_join(collect(x), collect(y), by = by, copy = copy, suffix = suffix, ...)
164 | }
165 |
166 | #' @importFrom dplyr right_join
167 | #' @export
168 | right_join.resampled_df <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) {
169 | dplyr::right_join(collect(x), collect(y), by = by, copy = copy, suffix = suffix, ...)
170 | }
171 |
172 | #' @importFrom dplyr anti_join
173 | #' @export
174 | anti_join.resampled_df <- function(x, y, by = NULL, copy = FALSE, ...) {
175 | dplyr::anti_join(collect(x), collect(y), by = by, copy = copy, ...)
176 | }
177 |
178 | #' @importFrom dplyr semi_join
179 | #' @export
180 | semi_join.resampled_df <- function(x, y, by = NULL, copy = FALSE, ...) {
181 | dplyr::semi_join(collect(x), collect(y), by = by, copy = copy, ...)
182 | }
183 |
184 | #' @importFrom dplyr group_by
185 | #' @export
186 | group_by.resampled_df <- function(.data, ..., add = FALSE, .drop = FALSE) {
187 |
188 | if (add) {
189 | .data <- collect(.data)
190 | }
191 | else {
192 | .data <- dplyr::ungroup(.data)
193 | }
194 |
195 | dplyr::group_by(.data, ..., add = add, .drop = .drop)
196 | }
197 |
198 | # ------------------------------------------------------------------------------
199 | # Backwards compat support for deprecated standard eval dplyr
200 |
201 | # nocov start
202 |
203 | # Only a few of them need it. arrange_.grouped_df()
204 | # directly calls arrange_impl() causing a problem.
205 |
206 | #' @importFrom dplyr arrange_
207 | #' @export
208 | arrange_.resampled_df <- function(.data, ..., .dots = list()) {
209 | dplyr::arrange_(collect(.data), ..., .dots = .dots)
210 | }
211 |
212 | #' @importFrom dplyr mutate_
213 | #' @export
214 | mutate_.resampled_df <- function(.data, ..., .dots = list()) {
215 | dplyr::mutate_(collect(.data), ..., .dots = .dots)
216 | }
217 |
218 | #' @importFrom dplyr slice_
219 | #' @export
220 | slice_.resampled_df <- function(.data, ..., .dots = list()) {
221 | dplyr::slice_(collect(.data), ..., .dots = .dots)
222 | }
223 |
224 | # nocov end
225 |
226 | # ------------------------------------------------------------------------------
227 | # Util
228 |
229 | maybe_new_grouped_df <- function(x) {
230 |
231 | if (dplyr::is_grouped_df(x)) {
232 | x <- dplyr::new_grouped_df(x = x, groups = dplyr::group_data(x))
233 | }
234 |
235 | x
236 | }
237 |
--------------------------------------------------------------------------------
/R/compat-purrr.R:
--------------------------------------------------------------------------------
1 | # nocov start - compat-purrr (last updated: rlang 0.3.0.9000)
2 |
3 | # This file serves as a reference for compatibility functions for
4 | # purrr. They are not drop-in replacements but allow a similar style
5 | # of programming. This is useful in cases where purrr is too heavy a
6 | # package to depend on. Please find the most recent version in rlang's
7 | # repository.
8 |
9 | map <- function(.x, .f, ...) {
10 | lapply(.x, .f, ...)
11 | }
12 |
13 | map2 <- function(.x, .y, .f, ...) {
14 | out <- mapply(.f, .x, .y, MoreArgs = list(...), SIMPLIFY = FALSE)
15 | if (length(out) == length(.x)) {
16 | rlang::set_names(out, names(.x))
17 | } else {
18 | rlang::set_names(out, NULL)
19 | }
20 | }
21 |
22 | # nocov end
23 |
--------------------------------------------------------------------------------
/R/compat-tidyr.R:
--------------------------------------------------------------------------------
1 | # ------------------------------------------------------------------------------
2 | # tidyr support
3 |
4 | #' @importFrom tidyr nest
5 | #' @importFrom lifecycle deprecated
6 | #' @export
7 | nest.resampled_df <- function(.data, ..., .key = deprecated()) {
8 | tidyr::nest(collect(.data), ..., .key = .key)
9 | }
10 |
--------------------------------------------------------------------------------
/R/samplify.R:
--------------------------------------------------------------------------------
1 | #' Created a resampled tibble
2 | #'
3 | #' `samplify()` creates a resampled tibble with _virtual groups_.
4 | #'
5 | #' @details
6 | #'
7 | #' The following functions have special / interesting behavior when used with
8 | #' a `resampled_df`:
9 | #'
10 | #' - [dplyr::collect()]
11 | #'
12 | #' - [dplyr::summarise()]
13 | #'
14 | #' - [dplyr::do()]
15 | #'
16 | #' - [dplyr::group_map()]
17 | #'
18 | #' - [dplyr::group_modify()]
19 | #'
20 | #' - [dplyr::group_walk()]
21 | #'
22 | #' - [dplyr::group_nest()]
23 | #'
24 | #' - [dplyr::group_split()]
25 | #'
26 | #' @param data A tbl.
27 | #'
28 | #' @param times A single integer specifying the number of resamples.
29 | #' If the `tibble` is grouped, this is the number of resamples per group.
30 | #'
31 | #' @param size A single integer specifying the size of each resample. For a
32 | #' grouped data frame, this is also allowed to be an integer vector with size
33 | #' equal to the number of groups in `data`. This can be helpful when sampling
34 | #' without replacement when the number of rows per group is very different.
35 | #'
36 | #' @param ... Not used.
37 | #'
38 | #' @param replace Whether or not to sample with replacement.
39 | #'
40 | #' @param key A single character specifying the name of the virtual group
41 | #' that is added.
42 | #'
43 | #' @return A `resampled_df` with an extra group specified by the `key`.
44 | #'
45 | #' @examples
46 | #' library(dplyr)
47 | #' library(broom)
48 | #'
49 | #' samplify(iris, times = 3, size = 20)
50 | #'
51 | #' iris %>%
52 | #' samplify(times = 3, size = 20) %>%
53 | #' summarise(per_strap_mean = mean(Petal.Width))
54 | #'
55 | #' iris %>%
56 | #' group_by(Species) %>%
57 | #' samplify(times = 3, size = 20) %>%
58 | #' summarise(per_strap_species_mean = mean(Petal.Width))
59 | #'
60 | #' # Alter the name of the group with `key`
61 | #' # Materialize them with collect()
62 | #' samps <- samplify(iris, times = 3, size = 5, key = ".samps")
63 | #' collect(samps)
64 | #'
65 | #' collect(samps, id = ".id", original_id = ".orig_id")
66 | #'
67 | #' #----------------------------------------------------------------------------
68 | #'
69 | #' # Be careful not to specify a `size` larger
70 | #' # than one of your groups! This will throw an error.
71 | #'
72 | #' iris_group_sizes_of_50_and_5 <- iris[1:55,] %>%
73 | #' group_by(Species) %>%
74 | #' group_trim()
75 | #'
76 | #' count(iris_group_sizes_of_50_and_5, Species)
77 | #'
78 | #' # size = 10 > min_group_size = 5
79 | #' \dontrun{
80 | #' iris_group_sizes_of_50_and_5 %>%
81 | #' samplify(times = 2, size = 10)
82 | #' }
83 | #'
84 | #' # Instead, pass a vector of sizes to `samplify()` if this
85 | #' # structure is absolutely required for your use case.
86 | #'
87 | #' # size of 10 for the first group
88 | #' # size of 5 for the second group
89 | #' # total number of rows is 10 * 2 + 5 * 2 = 30
90 | #' iris_group_sizes_of_50_and_5 %>%
91 | #' samplify(times = 2, size = c(10, 5)) %>%
92 | #' collect()
93 | #'
94 | #' @family virtual samplers
95 | #'
96 | #' @seealso [collect.resampled_df()]
97 | #'
98 | #' @name samplify
99 | NULL
100 |
101 | #' @rdname samplify
102 | #' @export
103 | samplify <- function(data, times, size, ...,
104 | replace = FALSE, key = ".sample") {
105 |
106 | check_empty_dots(...)
107 | validate_is_scalar_character(key, "key")
108 | validate_is_bool(replace, "replace")
109 | validate_is_scalar_positive_integerish(times, "times")
110 | validate_is_positive_integerish(size, "size")
111 |
112 | UseMethod("samplify")
113 | }
114 |
115 | #' @export
116 | samplify.data.frame <- function(data, times, size, ...,
117 | replace = FALSE, key = ".sample") {
118 | samplify(
119 | data = dplyr::as_tibble(data),
120 | times = times,
121 | size = size,
122 | ...,
123 | replace = replace,
124 | key = key
125 | )
126 | }
127 |
128 | #' @export
129 | samplify.tbl_df <- function(data, times, size, ...,
130 | replace = FALSE, key = ".sample") {
131 |
132 | # additional check for size 1 here
133 | validate_is(
134 | size, "size", rlang::is_scalar_integerish,
135 | "a single integer (for ungrouped data frames)"
136 | )
137 |
138 | .row_slice_ids <- seq_len(nrow(data))
139 |
140 | group_info <- index_sampler(
141 | .row_slice_ids = .row_slice_ids,
142 | times = times,
143 | key = key,
144 | size = size,
145 | replace = replace
146 | )
147 |
148 | dplyr::new_grouped_df(data, group_info, class = "resampled_df")
149 | }
150 |
151 | #' @export
152 | samplify.grouped_df <- function(data, times, size, ...,
153 | replace = FALSE, key = ".sample") {
154 |
155 | size <- recycle_size(size, dplyr::n_groups(data))
156 |
157 | # extract existing group_info
158 | group_info <- dplyr::group_data(data)
159 | index_list <- group_info[[".rows"]]
160 |
161 | new_row_index_tbl <- map2(
162 | .x = index_list,
163 | .y = size,
164 | .f = function(.x, .y) {
165 | index_sampler(
166 | .row_slice_ids = .x,
167 | times = times,
168 | key = key,
169 | size = .y,
170 | replace = replace
171 | )
172 | }
173 | )
174 |
175 | # overwrite current .rows and unnest
176 | group_info[[".rows"]] <- new_row_index_tbl
177 | group_info <- tidyr::unnest(group_info, cols = .rows, names_repair = "universal")
178 |
179 | dplyr::new_grouped_df(data, group_info, class = "resampled_df")
180 | }
181 |
182 | # ------------------------------------------------------------------------------
183 | # Utility
184 |
185 | # Actually perform the resampling of the row indices
186 | # and create the group tbl information from that
187 |
188 | index_sampler <- function(.row_slice_ids,
189 | times,
190 | key,
191 | size,
192 | replace = FALSE) {
193 |
194 | check_size(size, length(.row_slice_ids), replace)
195 |
196 | .bootstrap_id <- seq_len(times)
197 |
198 | # must unquote the colname as `.rows` is an arg to tibble()
199 | .row_col <- ".rows"
200 |
201 | .index_list <- replicate(
202 | n = times,
203 | expr = sample(
204 | x = .row_slice_ids,
205 | size = size,
206 | replace = replace,
207 | prob = NULL
208 | ),
209 | simplify = FALSE
210 | )
211 |
212 | dplyr::tibble(
213 | !!key := .bootstrap_id,
214 | !!.row_col := .index_list
215 | )
216 |
217 | }
218 |
219 | # dplyr:::check_size()
220 | check_size <- function (size, n, replace = FALSE) {
221 | if (size <= n || replace)
222 | return(invisible(size))
223 |
224 | msg <- paste0(
225 | "`size` (%i) must be less than or equal to the ",
226 | "size of the data / current group (%i), ",
227 | "set `replace = TRUE` to use sampling with replacement."
228 | )
229 |
230 | stop(sprintf(msg, size, n), call. = FALSE)
231 | }
232 |
233 | recycle_size <- function(.x, .n) {
234 |
235 | .n_x <- length(.x)
236 |
237 | if (.n_x == .n) {
238 | return(.x)
239 | }
240 |
241 | if (.n_x == 1L) {
242 | .x <- rep(.x, times = .n)
243 | return(.x)
244 | }
245 |
246 | msg <- paste0("`size` must be size 1 or ", .n, " (the number of groups).")
247 | rlang::abort(msg)
248 | }
249 |
250 | # ------------------------------------------------------------------------------
251 | # Validation
252 |
253 | validate_is <- function(.x, .x_nm, .is_f, .expected) {
254 |
255 | if (!.is_f(.x)) {
256 | msg <- paste0("`", .x_nm, "` must be ", .expected, ".")
257 | rlang::abort(msg)
258 | }
259 |
260 | invisible(.x)
261 | }
262 |
263 | validate_is_scalar_character <- function(.x, .x_nm) {
264 | validate_is(.x, .x_nm, rlang::is_scalar_character, "a single character")
265 | }
266 |
267 | is_bool <- function (x) {
268 | rlang::is_logical(x, n = 1) && !is.na(x)
269 | }
270 |
271 | validate_is_bool <- function(.x, .x_nm) {
272 | validate_is(.x, .x_nm, is_bool, "a single logical (TRUE/FALSE)")
273 | }
274 |
275 | is_positive <- function(x) {
276 | isTRUE(all(x > 0))
277 | }
278 |
279 | validate_is_scalar_positive_integerish <- function(.x, .x_nm) {
280 | validate_is(.x, .x_nm, rlang::is_scalar_integerish, "a single integer")
281 | validate_is(.x, .x_nm, is_positive, "a positive integer")
282 | }
283 |
284 | validate_is_positive_integerish <- function(.x, .x_nm) {
285 | validate_is(.x, .x_nm, rlang::is_integerish, "an integer vector")
286 | validate_is(.x, .x_nm, is_positive, "a positive integer vector")
287 | }
288 |
--------------------------------------------------------------------------------
/R/util.R:
--------------------------------------------------------------------------------
1 | check_empty_dots <- function(...) {
2 | dots <- rlang::enquos(...)
3 |
4 | dots_are_empty <- length(dots) == 0L
5 |
6 | if (!dots_are_empty) {
7 | rlang::abort("`...` must be empty.")
8 | }
9 |
10 | invisible()
11 | }
12 |
--------------------------------------------------------------------------------
/R/zzz.R:
--------------------------------------------------------------------------------
1 | #' @importFrom dplyr collect
2 | NULL
3 |
4 | #' @importFrom rlang :=
5 | NULL
6 |
7 | # For collect()
8 | utils::globalVariables(
9 | c(
10 | ".rows",
11 | "...x"
12 | )
13 | )
14 |
--------------------------------------------------------------------------------
/README.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | output: github_document
3 | editor_options:
4 | chunk_output_type: console
5 | ---
6 |
7 |
8 |
9 | ```{r setup, include = FALSE}
10 | knitr::opts_chunk$set(
11 | collapse = TRUE,
12 | comment = "#>",
13 | fig.path = "man/figures/README-",
14 | out.width = "100%"
15 | )
16 | ```
17 | # strapgod
18 |
19 |
20 | [](https://codecov.io/gh/DavisVaughan/strapgod?branch=master)
21 | [](https://travis-ci.org/DavisVaughan/strapgod)
22 | [](https://cran.r-project.org/package=strapgod)
23 | [](https://www.tidyverse.org/lifecycle/#experimental)
24 |
25 |
26 |
27 |
28 |
29 |
30 | ## Introduction
31 |
32 | The goal of strapgod is to create _virtual groups_ on top of a `tibble` or `grouped_df` as a way of resampling the original data frame. You can then efficiently perform various dplyr operations on this `resampled_df`, like: `summarise()`, `do()`, `group_map()`, and more, to easily compute bootstrapped and resampled statistics.
33 |
34 | ## Installation
35 |
36 | You can install the released version of strapgod from [CRAN](https://CRAN.R-project.org) with:
37 |
38 | ``` r
39 | install.packages("strapgod")
40 | ```
41 |
42 | Install the development version from GitHub with:
43 |
44 | ``` r
45 | devtools::install_github("DavisVaughan/strapgod")
46 | ```
47 |
48 | ## Learning about strapgod
49 |
50 | If you aren't already on the [pkgdown site](https://davisvaughan.github.io/strapgod/), I would encourage starting there. From there, you will be able to click on these two vignettes to learn about working with resampled tibbles.
51 |
52 | - `vignette("virtual-bootstraps", "strapgod")`
53 |
54 | - `vignette("dplyr-support", "strapgod")`
55 |
56 | ## Example
57 |
58 | Create resampled data frames with `bootstrapify()` or `samplify()`. Notice how we grouped by the _virtual_ column, `.bootstrap` and there are still only 150 rows even though we bootstrapped this dataset 10 times.
59 |
60 | ```{r first-strap, warning=FALSE, message=FALSE}
61 | library(strapgod)
62 | library(dplyr)
63 | set.seed(123)
64 |
65 | bootstrapify(iris, 10)
66 | ```
67 |
68 | You can feed a `resampled_df` into `summarise()` or `group_modify()` to perform
69 | efficient bootstrapped computations.
70 |
71 | ```{r summarise}
72 | iris %>%
73 | bootstrapify(10) %>%
74 | summarise(per_strap_mean = mean(Petal.Width))
75 | ```
76 |
77 | The original data can be grouped as well, and the bootstraps will be created for each group.
78 |
79 | ```{r summarise-by-group}
80 | iris %>%
81 | group_by(Species) %>%
82 | bootstrapify(10) %>%
83 | summarise(per_strap_per_species_mean = mean(Petal.Width))
84 | ```
85 |
86 | ## Plotting bootstrapped results
87 |
88 | A fun example of using strapgod is to create bootstrapped visualizations quickly and easily for hypothetical outcome plots.
89 |
90 | ```{r non-bootstrap-plots, warning=FALSE, message=FALSE}
91 | set.seed(123)
92 | library(ggplot2)
93 |
94 | # without bootstrap
95 | mtcars %>%
96 | ggplot(aes(hp, mpg)) +
97 | geom_smooth(se = FALSE) +
98 | ylim(y = c(0, 40))
99 | ```
100 |
101 | ```{r bootstrap-plots, warning=FALSE, message=FALSE}
102 | # with bootstrap
103 | mtcars %>%
104 | bootstrapify(10) %>%
105 | collect() %>%
106 | ggplot(aes(hp, mpg, group = .bootstrap)) +
107 | geom_smooth(se = FALSE) +
108 | ylim(y = c(0, 40))
109 | ```
110 |
111 | ## In the wild
112 |
113 | - Claus Wilke has used strapgod to power some pieces of his [ungeviz](https://github.com/wilkelab/ungeviz) package for visualizing uncertainty.
114 |
115 | - You can watch Claus's [rstudio::conf 2019](https://resources.rstudio.com/rstudio-conf-2019/visualizing-uncertainty-with-hypothetical-outcomes-plots) talk to see ungeviz and strapgod in action.
116 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | # strapgod
5 |
6 |
7 |
8 | [](https://codecov.io/gh/DavisVaughan/strapgod?branch=master)
10 | [](https://travis-ci.org/DavisVaughan/strapgod)
12 | [](https://cran.r-project.org/package=strapgod)
14 | [](https://www.tidyverse.org/lifecycle/#experimental)
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 | ## Introduction
25 |
26 | The goal of strapgod is to create *virtual groups* on top of a `tibble`
27 | or `grouped_df` as a way of resampling the original data frame. You can
28 | then efficiently perform various dplyr operations on this
29 | `resampled_df`, like: `summarise()`, `do()`, `group_map()`, and more, to
30 | easily compute bootstrapped and resampled statistics.
31 |
32 | ## Installation
33 |
34 | You can install the released version of strapgod from
35 | [CRAN](https://CRAN.R-project.org) with:
36 |
37 | ``` r
38 | install.packages("strapgod")
39 | ```
40 |
41 | Install the development version from GitHub with:
42 |
43 | ``` r
44 | devtools::install_github("DavisVaughan/strapgod")
45 | ```
46 |
47 | ## Learning about strapgod
48 |
49 | If you aren’t already on the [pkgdown
50 | site](https://davisvaughan.github.io/strapgod/), I would encourage
51 | starting there. From there, you will be able to click on these two
52 | vignettes to learn about working with resampled tibbles.
53 |
54 | - `vignette("virtual-bootstraps", "strapgod")`
55 |
56 | - `vignette("dplyr-support", "strapgod")`
57 |
58 | ## Example
59 |
60 | Create resampled data frames with `bootstrapify()` or `samplify()`.
61 | Notice how we grouped by the *virtual* column, `.bootstrap` and there
62 | are still only 150 rows even though we bootstrapped this dataset 10
63 | times.
64 |
65 | ``` r
66 | library(strapgod)
67 | library(dplyr)
68 | set.seed(123)
69 |
70 | bootstrapify(iris, 10)
71 | #> # A tibble: 150 x 5
72 | #> # Groups: .bootstrap [10]
73 | #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
74 | #>
75 | #> 1 5.1 3.5 1.4 0.2 setosa
76 | #> 2 4.9 3 1.4 0.2 setosa
77 | #> 3 4.7 3.2 1.3 0.2 setosa
78 | #> 4 4.6 3.1 1.5 0.2 setosa
79 | #> 5 5 3.6 1.4 0.2 setosa
80 | #> 6 5.4 3.9 1.7 0.4 setosa
81 | #> 7 4.6 3.4 1.4 0.3 setosa
82 | #> 8 5 3.4 1.5 0.2 setosa
83 | #> 9 4.4 2.9 1.4 0.2 setosa
84 | #> 10 4.9 3.1 1.5 0.1 setosa
85 | #> # … with 140 more rows
86 | ```
87 |
88 | You can feed a `resampled_df` into `summarise()` or `group_map()` to
89 | perform efficient bootstrapped computations.
90 |
91 | ``` r
92 | iris %>%
93 | bootstrapify(10) %>%
94 | summarise(per_strap_mean = mean(Petal.Width))
95 | #> # A tibble: 10 x 2
96 | #> .bootstrap per_strap_mean
97 | #>
98 | #> 1 1 1.20
99 | #> 2 2 1.22
100 | #> 3 3 1.23
101 | #> 4 4 1.13
102 | #> 5 5 1.20
103 | #> 6 6 1.15
104 | #> 7 7 1.18
105 | #> 8 8 1.13
106 | #> 9 9 1.31
107 | #> 10 10 1.19
108 | ```
109 |
110 | The original data can be grouped as well, and the bootstraps will be
111 | created for each group.
112 |
113 | ``` r
114 | iris %>%
115 | group_by(Species) %>%
116 | bootstrapify(10) %>%
117 | summarise(per_strap_per_species_mean = mean(Petal.Width))
118 | #> # A tibble: 30 x 3
119 | #> # Groups: Species [3]
120 | #> Species .bootstrap per_strap_per_species_mean
121 | #>
122 | #> 1 setosa 1 0.25
123 | #> 2 setosa 2 0.246
124 | #> 3 setosa 3 0.24
125 | #> 4 setosa 4 0.238
126 | #> 5 setosa 5 0.252
127 | #> 6 setosa 6 0.274
128 | #> 7 setosa 7 0.238
129 | #> 8 setosa 8 0.258
130 | #> 9 setosa 9 0.252
131 | #> 10 setosa 10 0.256
132 | #> # … with 20 more rows
133 | ```
134 |
135 | ## Plotting bootstrapped results
136 |
137 | A fun example of using strapgod is to create bootstrapped visualizations
138 | quickly and easily for hypothetical outcome plots.
139 |
140 | ``` r
141 | set.seed(123)
142 | library(ggplot2)
143 |
144 | # without bootstrap
145 | mtcars %>%
146 | ggplot(aes(hp, mpg)) +
147 | geom_smooth(se = FALSE) +
148 | ylim(y = c(0, 40))
149 | ```
150 |
151 |
152 |
153 | ``` r
154 | # with bootstrap
155 | mtcars %>%
156 | bootstrapify(10) %>%
157 | collect() %>%
158 | ggplot(aes(hp, mpg, group = .bootstrap)) +
159 | geom_smooth(se = FALSE) +
160 | ylim(y = c(0, 40))
161 | ```
162 |
163 |
164 |
165 | ## In the wild
166 |
167 | - Claus Wilke has used strapgod to power some pieces of his
168 | [ungeviz](https://github.com/wilkelab/ungeviz) package for
169 | visualizing uncertainty.
170 |
171 | - You can watch Claus’s
172 | [rstudio::conf 2019](https://resources.rstudio.com/rstudio-conf-2019/visualizing-uncertainty-with-hypothetical-outcomes-plots)
173 | talk to see ungeviz and strapgod in action.
174 |
--------------------------------------------------------------------------------
/_pkgdown.yml:
--------------------------------------------------------------------------------
1 | destination: docs
2 |
--------------------------------------------------------------------------------
/codecov.yml:
--------------------------------------------------------------------------------
1 | comment: false
2 |
3 | coverage:
4 | status:
5 | project:
6 | default:
7 | target: auto
8 | threshold: 1%
9 | patch:
10 | default:
11 | target: auto
12 | threshold: 1%
13 |
--------------------------------------------------------------------------------
/cran-comments.md:
--------------------------------------------------------------------------------
1 | ## Comments
2 | This is the fourth release of this package.
3 |
4 | This release requires tidyr 1.0.0 and updates the `resampled_df` method for
5 | `tidyr::nest()`.
6 |
7 | ## Test environments
8 | * local OS X install, R 3.6.0
9 | * ubuntu 14.04 (on travis-ci) (devel and release)
10 | * win-builder (devel and release)
11 |
12 | ## R CMD check results
13 |
14 | 0 errors | 0 warnings | 0 notes
15 |
--------------------------------------------------------------------------------
/man/bootstrapify.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/bootstrapify.R
3 | \name{bootstrapify}
4 | \alias{bootstrapify}
5 | \title{Create a bootstrapped tibble}
6 | \usage{
7 | bootstrapify(data, times, ..., key = ".bootstrap")
8 | }
9 | \arguments{
10 | \item{data}{A tbl.}
11 |
12 | \item{times}{A single integer specifying the number of resamples.
13 | If the \code{tibble} is grouped, this is the number of resamples per group.}
14 |
15 | \item{...}{Not used.}
16 |
17 | \item{key}{A single character specifying the name of the virtual group
18 | that is added.}
19 | }
20 | \value{
21 | A \code{resampled_df} with an extra group specified by the \code{key}.
22 | }
23 | \description{
24 | \code{bootstrapify()} creates a bootstrapped tibble with \emph{virtual groups}.
25 | }
26 | \details{
27 | The following functions have special / interesting behavior when used with
28 | a \code{resampled_df}:
29 | \itemize{
30 | \item \code{\link[dplyr:collect]{dplyr::collect()}}
31 | \item \code{\link[dplyr:summarise]{dplyr::summarise()}}
32 | \item \code{\link[dplyr:do]{dplyr::do()}}
33 | \item \code{\link[dplyr:group_map]{dplyr::group_map()}}
34 | \item \code{\link[dplyr:group_modify]{dplyr::group_modify()}}
35 | \item \code{\link[dplyr:group_walk]{dplyr::group_walk()}}
36 | \item \code{\link[dplyr:group_nest]{dplyr::group_nest()}}
37 | \item \code{\link[dplyr:group_split]{dplyr::group_split()}}
38 | }
39 | }
40 | \examples{
41 | library(dplyr)
42 | library(broom)
43 |
44 | bootstrapify(iris, 5)
45 |
46 | iris \%>\%
47 | bootstrapify(5) \%>\%
48 | summarise(per_strap_mean = mean(Petal.Width))
49 |
50 | iris \%>\%
51 | group_by(Species) \%>\%
52 | bootstrapify(5) \%>\%
53 | summarise(per_strap_species_mean = mean(Petal.Width))
54 |
55 | iris \%>\%
56 | bootstrapify(5) \%>\%
57 | do(tidy(lm(Sepal.Width ~ Sepal.Length + Species, data = .)))
58 |
59 | # Alternatively, use the newer group_modify()
60 | iris \%>\%
61 | bootstrapify(5) \%>\%
62 | group_modify(~tidy(lm(Sepal.Width ~ Sepal.Length + Species, data = .x)))
63 |
64 | # Alter the name of the group with `key`
65 | # Materialize them with collect()
66 | straps <- bootstrapify(iris, 5, key = ".straps")
67 | collect(straps)
68 |
69 | }
70 | \seealso{
71 | \code{\link[=collect.resampled_df]{collect.resampled_df()}}
72 |
73 | Other virtual samplers: \code{\link{samplify}}
74 | }
75 | \concept{virtual samplers}
76 |
--------------------------------------------------------------------------------
/man/collect.resampled_df.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/collect.R
3 | \name{collect.resampled_df}
4 | \alias{collect.resampled_df}
5 | \title{Force virtual groups to become explicit rows}
6 | \usage{
7 | \method{collect}{resampled_df}(x, ..., id = NULL, original_id = NULL)
8 | }
9 | \arguments{
10 | \item{x}{A \code{resampled_df}.}
11 |
12 | \item{...}{Not used.}
13 |
14 | \item{id}{Optional. A single character that specifies a name for a column
15 | containing a sequence from \code{1:n} for each bootstrap group.}
16 |
17 | \item{original_id}{Optional. A single character that specifies a name for
18 | a column containing the original position of the bootstrapped row.}
19 | }
20 | \description{
21 | When \code{collect()} is used on a \code{resampled_df}, the virtual bootstrap groups
22 | are made explicit.
23 | }
24 | \examples{
25 | library(dplyr)
26 |
27 | # virtual groups become real rows
28 | collect(bootstrapify(iris, 5))
29 |
30 | # add on the id column for an identifier per bootstrap
31 | collect(bootstrapify(iris, 5), id = ".id")
32 |
33 | # add on the original_id column to know which row this bootstrapped row
34 | # originally came from
35 | collect(bootstrapify(iris, 5), original_id = ".original_id")
36 |
37 | }
38 |
--------------------------------------------------------------------------------
/man/figures/README-bootstrap-plots-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavisVaughan/strapgod/ea2b1ecfc780a44ffa934c9bc7c2032954f4ffaa/man/figures/README-bootstrap-plots-1.png
--------------------------------------------------------------------------------
/man/figures/README-non-bootstrap-plots-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavisVaughan/strapgod/ea2b1ecfc780a44ffa934c9bc7c2032954f4ffaa/man/figures/README-non-bootstrap-plots-1.png
--------------------------------------------------------------------------------
/man/figures/strap-god.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DavisVaughan/strapgod/ea2b1ecfc780a44ffa934c9bc7c2032954f4ffaa/man/figures/strap-god.jpg
--------------------------------------------------------------------------------
/man/reexports.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/compat-dplyr.R
3 | \docType{import}
4 | \name{reexports}
5 | \alias{reexports}
6 | \alias{filter}
7 | \title{Objects exported from other packages}
8 | \keyword{internal}
9 | \description{
10 | These objects are imported from other packages. Follow the links
11 | below to see their documentation.
12 |
13 | \describe{
14 | \item{dplyr}{\code{\link[dplyr]{filter}}}
15 | }}
16 |
17 |
--------------------------------------------------------------------------------
/man/samplify.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/samplify.R
3 | \name{samplify}
4 | \alias{samplify}
5 | \title{Created a resampled tibble}
6 | \usage{
7 | samplify(data, times, size, ..., replace = FALSE, key = ".sample")
8 | }
9 | \arguments{
10 | \item{data}{A tbl.}
11 |
12 | \item{times}{A single integer specifying the number of resamples.
13 | If the \code{tibble} is grouped, this is the number of resamples per group.}
14 |
15 | \item{size}{A single integer specifying the size of each resample. For a
16 | grouped data frame, this is also allowed to be an integer vector with size
17 | equal to the number of groups in \code{data}. This can be helpful when sampling
18 | without replacement when the number of rows per group is very different.}
19 |
20 | \item{...}{Not used.}
21 |
22 | \item{replace}{Whether or not to sample with replacement.}
23 |
24 | \item{key}{A single character specifying the name of the virtual group
25 | that is added.}
26 | }
27 | \value{
28 | A \code{resampled_df} with an extra group specified by the \code{key}.
29 | }
30 | \description{
31 | \code{samplify()} creates a resampled tibble with \emph{virtual groups}.
32 | }
33 | \details{
34 | The following functions have special / interesting behavior when used with
35 | a \code{resampled_df}:
36 | \itemize{
37 | \item \code{\link[dplyr:collect]{dplyr::collect()}}
38 | \item \code{\link[dplyr:summarise]{dplyr::summarise()}}
39 | \item \code{\link[dplyr:do]{dplyr::do()}}
40 | \item \code{\link[dplyr:group_map]{dplyr::group_map()}}
41 | \item \code{\link[dplyr:group_modify]{dplyr::group_modify()}}
42 | \item \code{\link[dplyr:group_walk]{dplyr::group_walk()}}
43 | \item \code{\link[dplyr:group_nest]{dplyr::group_nest()}}
44 | \item \code{\link[dplyr:group_split]{dplyr::group_split()}}
45 | }
46 | }
47 | \examples{
48 | library(dplyr)
49 | library(broom)
50 |
51 | samplify(iris, times = 3, size = 20)
52 |
53 | iris \%>\%
54 | samplify(times = 3, size = 20) \%>\%
55 | summarise(per_strap_mean = mean(Petal.Width))
56 |
57 | iris \%>\%
58 | group_by(Species) \%>\%
59 | samplify(times = 3, size = 20) \%>\%
60 | summarise(per_strap_species_mean = mean(Petal.Width))
61 |
62 | # Alter the name of the group with `key`
63 | # Materialize them with collect()
64 | samps <- samplify(iris, times = 3, size = 5, key = ".samps")
65 | collect(samps)
66 |
67 | collect(samps, id = ".id", original_id = ".orig_id")
68 |
69 | #----------------------------------------------------------------------------
70 |
71 | # Be careful not to specify a `size` larger
72 | # than one of your groups! This will throw an error.
73 |
74 | iris_group_sizes_of_50_and_5 <- iris[1:55,] \%>\%
75 | group_by(Species) \%>\%
76 | group_trim()
77 |
78 | count(iris_group_sizes_of_50_and_5, Species)
79 |
80 | # size = 10 > min_group_size = 5
81 | \dontrun{
82 | iris_group_sizes_of_50_and_5 \%>\%
83 | samplify(times = 2, size = 10)
84 | }
85 |
86 | # Instead, pass a vector of sizes to `samplify()` if this
87 | # structure is absolutely required for your use case.
88 |
89 | # size of 10 for the first group
90 | # size of 5 for the second group
91 | # total number of rows is 10 * 2 + 5 * 2 = 30
92 | iris_group_sizes_of_50_and_5 \%>\%
93 | samplify(times = 2, size = c(10, 5)) \%>\%
94 | collect()
95 |
96 | }
97 | \seealso{
98 | \code{\link[=collect.resampled_df]{collect.resampled_df()}}
99 |
100 | Other virtual samplers: \code{\link{bootstrapify}}
101 | }
102 | \concept{virtual samplers}
103 |
--------------------------------------------------------------------------------
/strapgod.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 |
3 | RestoreWorkspace: No
4 | SaveWorkspace: No
5 | AlwaysSaveHistory: Default
6 |
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 |
15 | AutoAppendNewline: Yes
16 | StripTrailingWhitespace: Yes
17 |
18 | BuildType: Package
19 | PackageUseDevtools: Yes
20 | PackageInstallArgs: --no-multiarch --with-keep.source
21 | PackageRoxygenize: rd,collate,namespace
22 |
--------------------------------------------------------------------------------
/tests/testthat.R:
--------------------------------------------------------------------------------
1 | library(testthat)
2 | library(strapgod)
3 |
4 | test_check("strapgod")
5 |
--------------------------------------------------------------------------------
/tests/testthat/test-bootstrapify.R:
--------------------------------------------------------------------------------
1 | context("test-bootstrapify")
2 |
3 | library(dplyr)
4 |
5 | test_that("can create bootstrapped data frames", {
6 |
7 | expect_error(
8 | x <- bootstrapify(iris, 2),
9 | NA
10 | )
11 |
12 | expect_is(x, "resampled_df")
13 | expect_is(x, "grouped_df")
14 |
15 | x_gd <- group_data(x)
16 |
17 | expect_equal(
18 | colnames(x_gd),
19 | c(".bootstrap", ".rows")
20 | )
21 |
22 | expect_equal(
23 | x_gd[[".bootstrap"]],
24 | c(1, 2)
25 | )
26 |
27 | expect_equal(
28 | nrow(x_gd),
29 | 2
30 | )
31 |
32 | expect_equal(
33 | unique(vapply(x_gd$.rows, length, integer(1))),
34 | 150
35 | )
36 |
37 | })
38 |
39 | test_that("can bootstrap with groups", {
40 |
41 | x <- iris %>%
42 | group_by(Species) %>%
43 | bootstrapify(5)
44 |
45 | x_gd <- group_data(x)
46 |
47 | expect_equal(
48 | colnames(x_gd),
49 | c("Species", ".bootstrap", ".rows")
50 | )
51 |
52 | expect_equal(
53 | nrow(x_gd),
54 | 15
55 | )
56 |
57 | expect_equal(
58 | unique(vapply(x_gd$.rows, length, integer(1))),
59 | 50
60 | )
61 |
62 | })
63 |
64 | test_that("can correctly double bootstrap", {
65 |
66 | once <- iris %>%
67 | bootstrapify(5)
68 |
69 | twice <- once %>%
70 | bootstrapify(5, key = ".bootstrap1")
71 |
72 | once_gd <- group_data(once)
73 | twice_gd <- group_data(twice)
74 |
75 | once_strap_1 <- once_gd$.rows[[1]]
76 |
77 | twice_1 <- dplyr::filter(twice_gd, .bootstrap == 1)
78 |
79 | # Check that the indices generated from bootstrapping the second
80 | # time are subsets of the indices in each group from the first
81 | # bootstrap
82 |
83 | each_twice_is_subset_of_once <- vapply(
84 | X = twice_1$.rows,
85 | FUN = function(x) {all(x %in% once_strap_1)},
86 | FUN.VALUE = logical(1)
87 | )
88 |
89 | expect_true(all(each_twice_is_subset_of_once))
90 |
91 | expect_equal(
92 | colnames(twice_gd),
93 | c(".bootstrap", ".bootstrap1", ".rows")
94 | )
95 |
96 | })
97 |
98 | test_that("universal name repair kicks in with double bootstrap", {
99 | once <- iris %>%
100 | bootstrapify(5)
101 |
102 | # Expect universal name repair
103 | expect_message(
104 | twice <- once %>%
105 | bootstrapify(5),
106 | ".bootstrap -> .bootstrap...1"
107 | )
108 |
109 | expect_equal(
110 | colnames(group_data(twice)),
111 | c(".bootstrap...1", ".bootstrap...2", ".rows")
112 | )
113 | })
114 |
115 | test_that("can alter the key", {
116 | x <- bootstrapify(iris, 5, key = ".boot")
117 |
118 | expect_equal(
119 | colnames(group_data(x))[1],
120 | ".boot"
121 | )
122 | })
123 |
124 | test_that("cannot pass into the `...`", {
125 | expect_error(
126 | bootstrapify(iris, 5, "key-in-dots"),
127 | "`...` must be empty"
128 | )
129 | })
130 |
131 | test_that("invalid inputs are caught", {
132 | expect_error(
133 | bootstrapify(iris, 5, key = 1),
134 | "`key` must be a single character."
135 | )
136 | })
137 |
--------------------------------------------------------------------------------
/tests/testthat/test-collect.R:
--------------------------------------------------------------------------------
1 | context("test-collect")
2 |
3 | library(dplyr)
4 |
5 | test_that("can collect()", {
6 |
7 | x <- bootstrapify(iris, 5)
8 |
9 | expect_error(
10 | x_c <- collect(x),
11 | NA
12 | )
13 |
14 | expect_equal(
15 | nrow(x_c),
16 | 750
17 | )
18 |
19 | expect_is(
20 | x_c,
21 | "grouped_df"
22 | )
23 |
24 | expect_false("resample_df" %in% class(x_c))
25 |
26 | expect_equal(
27 | colnames(x_c),
28 | c(".bootstrap", colnames(iris))
29 | )
30 |
31 | })
32 |
33 | test_that("can do a double bootstrap collect", {
34 |
35 | x <- iris %>%
36 | bootstrapify(5) %>%
37 | bootstrapify(10, key = ".bootstrap_2")
38 |
39 | x_c <- collect(x)
40 |
41 | expect_equal(
42 | nrow(x_c),
43 | nrow(iris) * 5 * 10
44 | )
45 |
46 | expect_equal(
47 | x_c$.bootstrap,
48 | rep(1:5, each = nrow(iris) * 10)
49 | )
50 |
51 | expect_equal(
52 | x_c$.bootstrap_2,
53 | rep(1:10, each = nrow(iris), times = 5)
54 | )
55 |
56 | })
57 |
58 | test_that("all groups are preserved", {
59 |
60 | iris_g <- iris %>%
61 | group_by(Species)
62 |
63 | x <- bootstrapify(iris_g, 5)
64 |
65 | x_c <- collect(x)
66 |
67 | expect_equal(
68 | group_vars(x_c),
69 | c("Species", ".bootstrap")
70 | )
71 |
72 | expect_equal(
73 | nrow(x_c),
74 | sum(group_size(iris_g) * 5)
75 | )
76 |
77 | })
78 |
79 | test_that("`key` is propagated to `collect()`", {
80 |
81 | x <- bootstrapify(iris, 5, key = ".boot")
82 |
83 | expect_equal(
84 | colnames(collect(x))[1],
85 | ".boot"
86 | )
87 |
88 | })
89 |
90 | test_that("can collect with `id`", {
91 |
92 | x <- bootstrapify(iris, 5)
93 |
94 | x_c <- collect(x, id = ".id")
95 |
96 | expect_equal(
97 | colnames(x_c),
98 | c(".bootstrap", ".id", colnames(iris))
99 | )
100 |
101 | expect_equal(
102 | x_c$.id,
103 | rep(1:150, times = 5)
104 | )
105 |
106 | })
107 |
108 | test_that("can collect with `original_id`", {
109 |
110 | x <- bootstrapify(iris, 5)
111 |
112 | x_c <- collect(x, original_id = ".o_id")
113 |
114 | expect_equal(
115 | colnames(x_c),
116 | c(".bootstrap", ".o_id", colnames(iris))
117 | )
118 |
119 | expect_equal(
120 | x_c$.o_id,
121 | unlist(group_data(x)$.rows)
122 | )
123 |
124 | })
125 |
126 | test_that("can collect with `id` and `original_id`", {
127 |
128 | x <- bootstrapify(iris, 5)
129 |
130 | x_c <- collect(x, id = ".id", original_id = ".o_id")
131 |
132 | # want this column order. most intuitive.
133 | expect_equal(
134 | colnames(x_c),
135 | c(".bootstrap", ".id", ".o_id", colnames(iris))
136 | )
137 |
138 | })
139 |
140 | test_that("`id` must be a single character", {
141 |
142 | x <- bootstrapify(iris, 5)
143 |
144 | expect_error(
145 | collect(x, id = 1),
146 | "`id` must be a character of size 1."
147 | )
148 |
149 | })
150 |
151 | test_that("`original_id` must be a single character", {
152 |
153 | x <- bootstrapify(iris, 5)
154 |
155 | expect_error(
156 | collect(x, original_id = 1),
157 | "`original_id` must be a character of size 1."
158 | )
159 |
160 | })
161 |
--------------------------------------------------------------------------------
/tests/testthat/test-compat-tidyr.R:
--------------------------------------------------------------------------------
1 | context("test-compat-tidyr")
2 |
3 | test_that("nest()", {
4 |
5 | x <- bootstrapify(iris, 5)
6 |
7 | expect_error(
8 | x_n <- nest(x),
9 | NA
10 | )
11 |
12 | expect_equal(
13 | x_n$.bootstrap,
14 | 1:5
15 | )
16 |
17 | expect_equal(
18 | colnames(nest(x, .x = -.bootstrap)),
19 | c(".bootstrap", ".x")
20 | )
21 |
22 | })
23 |
--------------------------------------------------------------------------------
/tests/testthat/test-dplyr-compat.R:
--------------------------------------------------------------------------------
1 | context("test-dplyr-compat")
2 |
3 | library(dplyr)
4 |
5 | test_that("can ungroup() to lose `resampled_df`", {
6 |
7 | x <- bootstrapify(iris, 5)
8 |
9 | x_ug <- ungroup(x)
10 |
11 | expect_is(x_ug, "tbl_df")
12 | expect_false("resampled_df" %in% class(x_ug))
13 | expect_equal(attr(x_ug, "groups"), NULL)
14 |
15 | })
16 |
17 | test_that("can group_by() after bootstrapify()", {
18 |
19 | x <- bootstrapify(iris, 5)
20 |
21 | x_g1 <- group_by(x, Species)
22 |
23 | expect_is(x_g1, "grouped_df")
24 | expect_false("resampled_df" %in% class(x_g1))
25 | expect_false(".bootstrap" %in% colnames(x_g1))
26 |
27 | x_g2 <- group_by(x, Species, add = TRUE)
28 |
29 | expect_is(x_g2, "grouped_df")
30 | expect_false("resampled_df" %in% class(x_g2))
31 | expect_true(".bootstrap" %in% colnames(x_g2))
32 | expect_equal(group_vars(x_g2), c(".bootstrap", "Species"))
33 |
34 | expect_equal(
35 | nrow(x_g2),
36 | 5 * 150
37 | )
38 |
39 | x_g3 <- group_by(x, add = TRUE)
40 |
41 | expect_is(x_g3, "grouped_df")
42 | expect_false("resampled_df" %in% class(x_g3))
43 | expect_true(".bootstrap" %in% colnames(x_g3))
44 | expect_equal(group_vars(x_g3), ".bootstrap")
45 |
46 | x_g4 <- group_by(x)
47 |
48 | expect_is(x_g4, "tbl_df")
49 | expect_false("grouped_df" %in% class(x_g4))
50 | expect_false("resampled_df" %in% class(x_g4))
51 | expect_false(".bootstrap" %in% colnames(x_g4))
52 |
53 | # We know we get an error here
54 | # as the bootstraps aren't materialized
55 | expect_error(
56 | group_by(x, .bootstrap)
57 | )
58 |
59 | })
60 |
61 | test_that("mutate()", {
62 | x <- bootstrapify(iris, 2)
63 | expect_equal(
64 | nrow(mutate(x, x = 4)),
65 | 300
66 | )
67 | })
68 |
69 | test_that("transmute()", {
70 | x <- bootstrapify(iris, 2)
71 | expect_equal(
72 | nrow(transmute(x, x = 4)),
73 | 300
74 | )
75 | })
76 |
77 | test_that("filter()", {
78 | x <- bootstrapify(iris, 2)
79 | expect_equal(
80 | nrow(filter(x, .bootstrap <= 1)),
81 | 150
82 | )
83 | })
84 |
85 | test_that("arrange()", {
86 | x <- bootstrapify(iris, 2)
87 |
88 | expect_equal(
89 | nrow(arrange(x, desc(.bootstrap))),
90 | 300
91 | )
92 |
93 | expect_equal(
94 | arrange(x, desc(.bootstrap))$.bootstrap,
95 | rep(c(2,1), each = 150)
96 | )
97 | })
98 |
99 | test_that("distinct()", {
100 | x <- bootstrapify(iris, 2)
101 |
102 | expect_equal(
103 | colnames(distinct(x)),
104 | c(".bootstrap", colnames(iris))
105 | )
106 |
107 | expect_equal(
108 | distinct(x, .bootstrap),
109 | group_by(tibble(.bootstrap = 1:2), .bootstrap)
110 | )
111 |
112 | })
113 |
114 | test_that("select()", {
115 | x <- bootstrapify(iris, 2)
116 |
117 | expect_equal(
118 | colnames(select(x, group_cols())),
119 | ".bootstrap"
120 | )
121 |
122 | expect_equal(
123 | nrow(select(x, group_cols())),
124 | 300
125 | )
126 |
127 | })
128 |
129 | test_that("slice()", {
130 | x <- bootstrapify(iris, 2)
131 |
132 | expect_equal(
133 | nrow(slice(x, 1)),
134 | 2
135 | )
136 | })
137 |
138 | test_that("pull()", {
139 | x <- bootstrapify(iris, 2)
140 |
141 | expect_equal(
142 | length(pull(x, ".bootstrap")),
143 | 300
144 | )
145 |
146 | expect_equal(
147 | length(pull(x, "Sepal.Length")),
148 | 300
149 | )
150 | })
151 |
152 | test_that("rename()", {
153 | x <- bootstrapify(iris, 2)
154 |
155 | expect_equal(
156 | colnames(rename(x, y = .bootstrap)),
157 | c("y", colnames(iris))
158 | )
159 | })
160 |
161 | test_that("full_join()", {
162 |
163 | mini_iris <- iris[1:20, 1:2]
164 | x <- bootstrapify(mini_iris, 2)
165 | y <- tibble(Sepal.Length = 100, new = 1)
166 |
167 | x_fj <- full_join(x, y, by = "Sepal.Length")
168 |
169 | expect_equal(
170 | nrow(x_fj),
171 | 41
172 | )
173 |
174 | expect_equal(
175 | x_fj$new,
176 | c(rep(NA_real_, 40), 1)
177 | )
178 |
179 | expect_equal(
180 | colnames(x_fj),
181 | c(".bootstrap", "Sepal.Length", "Sepal.Width", "new")
182 | )
183 |
184 | })
185 |
186 | test_that("inner_join()", {
187 |
188 | mini_iris <- iris[1:20, 1:2]
189 | x <- bootstrapify(mini_iris, 2)
190 | y <- tibble(Sepal.Length = 100, new = 1)
191 |
192 | x_ij <- inner_join(x, y, by = "Sepal.Length")
193 |
194 | expect_equal(
195 | nrow(x_ij),
196 | 0
197 | )
198 |
199 | expect_equal(
200 | colnames(x_ij),
201 | c(".bootstrap", "Sepal.Length", "Sepal.Width", "new")
202 | )
203 |
204 | })
205 |
206 | test_that("left_join()", {
207 |
208 | mini_iris <- iris[1:20, 1:2]
209 | x <- bootstrapify(mini_iris, 2)
210 | y <- tibble(Sepal.Length = 100, new = 1)
211 |
212 | x_lj <- left_join(x, y, by = "Sepal.Length")
213 |
214 | expect_equal(
215 | nrow(x_lj),
216 | 40
217 | )
218 |
219 | expect_equal(
220 | x_lj$new,
221 | rep(NA_real_, times = 40)
222 | )
223 |
224 | expect_equal(
225 | colnames(x_lj),
226 | c(".bootstrap", "Sepal.Length", "Sepal.Width", "new")
227 | )
228 |
229 | })
230 |
231 | test_that("right_join()", {
232 |
233 | mini_iris <- iris[1:20, 1:2]
234 | x <- bootstrapify(mini_iris, 2)
235 | y <- tibble(Sepal.Length = 100, new = 1)
236 |
237 | x_rj <- right_join(x, y, by = "Sepal.Length")
238 |
239 | expect_equal(
240 | nrow(x_rj),
241 | 1
242 | )
243 |
244 | expect_equal(
245 | x_rj$new,
246 | 1
247 | )
248 |
249 | expect_equal(
250 | colnames(x_rj),
251 | c(".bootstrap", "Sepal.Length", "Sepal.Width", "new")
252 | )
253 |
254 | })
255 |
256 | test_that("anti_join()", {
257 |
258 | mini_iris <- iris[1:20, 1:2]
259 | x <- bootstrapify(mini_iris, 2)
260 | y <- tibble(Sepal.Length = 100, new = 1)
261 |
262 | x_aj <- anti_join(x, y, by = "Sepal.Length")
263 |
264 | expect_equal(
265 | x_aj,
266 | collect(x)
267 | )
268 |
269 | })
270 |
271 | test_that("semi_join()", {
272 |
273 | mini_iris <- iris[1:20, 1:2]
274 | x <- bootstrapify(mini_iris, 2)
275 | y <- tibble(Sepal.Length = 100, new = 1)
276 |
277 | x_sj <- semi_join(x, y, by = "Sepal.Length")
278 |
279 | expect_equal(
280 | x_sj,
281 | collect(x)[0,]
282 | )
283 |
284 | })
285 |
286 | # ------------------------------------------------------------------------------
287 | # dplyr functions implictly supported
288 |
289 | context("test-dplyr-compat-extra")
290 |
291 | test_that("add_count()", {
292 | x <- bootstrapify(iris, 2)
293 | expect_equal(
294 | nrow(add_count(x)),
295 | 300
296 | )
297 | })
298 |
299 | test_that("add_tally()", {
300 | x <- bootstrapify(iris, 2)
301 |
302 | expect_equal(
303 | nrow(add_tally(x)),
304 | 300
305 | )
306 |
307 | expect_equal(
308 | nrow(add_tally(x, .bootstrap, sort = TRUE)),
309 | 300
310 | )
311 |
312 | expect_equal(
313 | unique(add_tally(x, .bootstrap, sort = TRUE)$n),
314 | c(300, 150)
315 | )
316 | })
317 |
318 | test_that("as_tibble()", {
319 |
320 | x <- bootstrapify(iris, 2)
321 |
322 | x_at <- as_tibble(x)
323 |
324 | # can convert to tibble without expanding
325 | # virtual groups
326 | expect_equal(
327 | nrow(x_at),
328 | 150
329 | )
330 |
331 | expect_false(".bootstrap" %in% colnames(x_at))
332 | expect_false("resampled_df" %in% class(x_at))
333 |
334 | })
335 |
336 | test_that("bind_rows() fails sadly", {
337 |
338 | x <- bootstrapify(iris, 2)
339 |
340 | # Cant currently do anything about this
341 | expect_error(
342 | bind_rows(x, iris),
343 | "Column `.bootstrap` is unknown"
344 | )
345 |
346 | })
347 |
348 | test_that("bind_cols() works", {
349 |
350 | x <- bootstrapify(iris, 2)
351 |
352 | x_bc_1 <- bind_cols(x, iris)
353 |
354 | expect_is(x_bc_1, "resampled_df")
355 |
356 | expect_equal(
357 | ncol(x_bc_1),
358 | 10
359 | )
360 |
361 | expect_equal(
362 | nrow(x_bc_1),
363 | 150
364 | )
365 |
366 | expect_equal(
367 | nrow(collect(x_bc_1)),
368 | 300
369 | )
370 |
371 | x_bc_2 <- bind_cols(iris, x)
372 |
373 | expect_is(x_bc_2, "data.frame")
374 | expect_false("tbl_df" %in% class(x_bc_2))
375 | })
376 |
377 | test_that("sample_n()", {
378 | x <- bootstrapify(iris, 2)
379 |
380 | expect_equal(
381 | nrow(sample_n(x, 5)),
382 | 10
383 | )
384 | })
385 |
--------------------------------------------------------------------------------
/tests/testthat/test-dplyr-do.R:
--------------------------------------------------------------------------------
1 | context("test-dplyr-do")
2 |
3 | library(dplyr)
4 |
5 | test_that("can use do() with bootstraps", {
6 |
7 | x <- bootstrapify(iris, 5)
8 |
9 | expect_equal(
10 | do(x, .),
11 | collect(x)
12 | )
13 |
14 | })
15 |
16 | test_that("can return non-data frames", {
17 |
18 | x <- bootstrapify(iris, 5)
19 |
20 | x_do <- do(x, mod = lm(Sepal.Width ~ Species, data = .))
21 |
22 | expect_is(x_do, "rowwise_df")
23 | expect_false("resampled_df" %in% class(x_do))
24 |
25 | expect_equal(
26 | nrow(x_do),
27 | 5
28 | )
29 |
30 | })
31 |
32 | test_that("multiple groups", {
33 |
34 | x <- iris %>%
35 | group_by(Species) %>%
36 | bootstrapify(5)
37 |
38 | x_do <- do(x, mod = lm(Sepal.Width ~ Sepal.Length, data = .))
39 |
40 | expect_equal(
41 | nrow(x_do),
42 | 15
43 | )
44 |
45 | })
46 |
47 | test_that("double bootstrap", {
48 |
49 | x <- iris %>%
50 | bootstrapify(5) %>%
51 | bootstrapify(10, key = "b2")
52 |
53 | x_do <- do(x, mod = lm(Sepal.Width ~ Sepal.Length, data = .))
54 |
55 | expect_equal(
56 | colnames(x_do),
57 | c(".bootstrap", "b2", "mod")
58 | )
59 |
60 | })
61 |
--------------------------------------------------------------------------------
/tests/testthat/test-dplyr-group-funs.R:
--------------------------------------------------------------------------------
1 | context("test-dplyr-group-funs")
2 |
3 | library(dplyr)
4 |
5 | test_that("group_nest()", {
6 |
7 | x <- iris %>%
8 | bootstrapify(5)
9 |
10 | x_gn <- group_nest(x)
11 |
12 | expect_equal(nrow(x_gn), 5)
13 | expect_false("resampled_df" %in% class(x_gn))
14 |
15 | xx <- bootstrapify(x, 10, key = "bs2")
16 |
17 | xx_gn <- group_nest(xx)
18 |
19 | expect_equal(nrow(xx_gn), 50)
20 | expect_false("resampled_df" %in% class(xx_gn))
21 |
22 | })
23 |
24 | test_that("group_nest() with `keep = TRUE`", {
25 |
26 | x <- iris %>%
27 | bootstrapify(5)
28 |
29 | x_gn <- group_nest(x, keep = TRUE)
30 |
31 | expect_equal(
32 | colnames(x_gn$data[[1]])[1],
33 | ".bootstrap"
34 | )
35 |
36 | })
37 |
38 | test_that("group_modify()", {
39 |
40 | x <- iris %>%
41 | bootstrapify(5)
42 |
43 | expect_equal(
44 | group_modify(x, ~.x),
45 | collect(x)
46 | )
47 |
48 | x_gm <- group_modify(x, ~dplyr::tibble(.g = list(.y)))
49 |
50 | expect_equal(nrow(x_gm), 5)
51 |
52 | expect_equal(
53 | x_gm$.g[[1]],
54 | dplyr::tibble(.bootstrap = 1L)
55 | )
56 |
57 | })
58 |
59 | test_that("group_map()", {
60 |
61 | x <- iris %>%
62 | bootstrapify(5)
63 |
64 | # Don't check attributes, as group_split()
65 | # also adds the ptype as an attribute as of dplyr 0.8.2
66 | expect_equivalent(
67 | group_map(x, ~.x),
68 | group_split(x, keep = FALSE)
69 | )
70 |
71 | x_gm <- group_map(x, ~dplyr::tibble(.g = list(.y)))
72 |
73 | expect_is(x_gm, "list")
74 |
75 | expect_identical(
76 | x_gm[[1]],
77 | tibble(.g = list(tibble(.bootstrap = 1L)))
78 | )
79 |
80 | # `keep` argument
81 | expect_equal(
82 | unlist(group_map(x, ~ncol(.x), keep = TRUE)),
83 | rep(6, times = 5)
84 | )
85 |
86 | })
87 |
88 | test_that("group_walk()", {
89 |
90 | x <- iris %>%
91 | bootstrapify(5)
92 |
93 | res <- NULL
94 |
95 | group_walk(x, ~{ res <<- dplyr::bind_rows(res, .y) })
96 |
97 | expect_equal(
98 | res,
99 | dplyr::tibble(.bootstrap = 1:5)
100 | )
101 |
102 | res <- NULL
103 |
104 | group_walk(x, ~{ res <<- dplyr::bind_rows(res, .x) })
105 |
106 | expect_equal(
107 | res,
108 | select(ungroup(collect(x)), -.bootstrap)
109 | )
110 |
111 | })
112 |
113 | test_that("group_split() - `keep` argument", {
114 |
115 | x <- iris %>%
116 | bootstrapify(5)
117 |
118 | x_gs <- group_split(x, keep = TRUE)
119 |
120 | expect_equal(
121 | colnames(x_gs[[1]]),
122 | c(".bootstrap", colnames(iris))
123 | )
124 |
125 | x_gs2 <- group_split(x, keep = FALSE)
126 |
127 | expect_equal(
128 | colnames(x_gs2[[1]]),
129 | colnames(iris)
130 | )
131 | })
132 |
133 | test_that("group_keys() can find the virtual groups", {
134 |
135 | x <- iris %>%
136 | group_by(Species) %>%
137 | bootstrapify(1) %>%
138 | bootstrapify(2)
139 |
140 | x_keys <- group_keys(x)
141 |
142 | expect_equal(
143 | x_keys[[1]],
144 | rep(unique(iris$Species), each = 2)
145 | )
146 |
147 | expect_equal(
148 | x_keys[[2]],
149 | rep(1, times = 6)
150 | )
151 |
152 | expect_equal(
153 | x_keys[[3]],
154 | rep(1:2, times = 3)
155 | )
156 | })
157 |
158 | test_that("group_data() finds virtual groups", {
159 |
160 | x <- iris %>%
161 | bootstrapify(2)
162 |
163 | x_gd <- group_data(x)
164 |
165 | expect_equal(
166 | x_gd$.bootstrap,
167 | c(1, 2)
168 | )
169 | })
170 |
171 | test_that("group_indices() returns collect()ed indices", {
172 |
173 | x <- iris %>%
174 | bootstrapify(2)
175 |
176 | x_gi <- group_indices(x)
177 |
178 | expect_equal(
179 | x_gi,
180 | c(rep(1, times = 150), rep(2, times = 150))
181 | )
182 |
183 | })
184 |
185 | test_that("group_vars() returns virtual groups", {
186 |
187 | x <- iris %>%
188 | bootstrapify(2)
189 |
190 | expect_equal(
191 | group_vars(x),
192 | ".bootstrap"
193 | )
194 |
195 | # I think it is correct to expect that tbl_vars()
196 | # doesn't return the virtual group
197 | expect_equal(
198 | as.character(tbl_vars(x)),
199 | colnames(iris)
200 | )
201 |
202 | })
203 |
--------------------------------------------------------------------------------
/tests/testthat/test-dplyr-summarise.R:
--------------------------------------------------------------------------------
1 | context("test-dplyr-summarise")
2 |
3 | library(dplyr)
4 |
5 | test_that("summarise() works with a simple bootstrap", {
6 |
7 | x <- bootstrapify(iris, 5)
8 |
9 | expect_error(
10 | x_s <- summarise(x, y = mean(Sepal.Width)),
11 | NA
12 | )
13 |
14 | expect_equal(
15 | x_s$.bootstrap,
16 | 1:5
17 | )
18 |
19 | expect_false("grouped_df" %in% class(x_s))
20 | expect_false("resampled_df" %in% class(x_s))
21 |
22 | })
23 |
24 | test_that("summarise() == summarize()", {
25 |
26 | x <- bootstrapify(iris, 5)
27 |
28 | expect_equal(
29 | summarise(x, y = mean(Sepal.Width)),
30 | summarize(x, y = mean(Sepal.Width))
31 | )
32 |
33 | })
34 |
35 | test_that("works with a double bootstrap", {
36 |
37 | x <- iris %>%
38 | bootstrapify(5) %>%
39 | bootstrapify(10, key = "bs2")
40 |
41 | expect_error(
42 | x_s <- summarise(x, y = mean(Sepal.Width)),
43 | NA
44 | )
45 |
46 | # We want it to be a grouped_df, but no longer a resampled_df
47 | # because the groups have been materialized
48 | expect_is(x_s, "grouped_df")
49 | expect_false("resampled_df" %in% class(x_s))
50 |
51 | expect_equal(
52 | nrow(x_s),
53 | 5 * 10
54 | )
55 |
56 | })
57 |
58 | test_that("works with existing groups", {
59 |
60 | x <- iris %>%
61 | group_by(Species) %>%
62 | bootstrapify(5)
63 |
64 | expect_error(
65 | x_s <- summarise(x, y = mean(Sepal.Width)),
66 | NA
67 | )
68 |
69 | # We want it to be a grouped_df, but no longer a resampled_df
70 | # because the groups have been materialized
71 | expect_is(x_s, "grouped_df")
72 | expect_false("resampled_df" %in% class(x_s))
73 |
74 | expect_equal(
75 | nrow(x_s),
76 | 5 * 3
77 | )
78 |
79 | expect_equal(
80 | colnames(x_s),
81 | c("Species", ".bootstrap", "y")
82 | )
83 |
84 | })
85 |
--------------------------------------------------------------------------------
/tests/testthat/test-samplify.R:
--------------------------------------------------------------------------------
1 | context("test-samplify")
2 |
3 | # many of the would-be tests here are already covered by `test-bootstrapify.R`
4 |
5 | library(dplyr)
6 |
7 | test_that("can alter the `size` of each resample", {
8 |
9 | x <- samplify(iris, 1, 5)
10 |
11 | expect_equal(
12 | length(group_data(x)$.rows[[1]]),
13 | 5
14 | )
15 |
16 | })
17 |
18 | test_that("cannot specify vector `size` for ungrouped data frames", {
19 |
20 | expect_error(
21 | samplify(iris, 2, c(1, 5)),
22 | "a single integer \\(for ungrouped data frames\\)"
23 | )
24 |
25 | })
26 |
27 | test_that("can specify vector `size` for grouped data frames", {
28 |
29 | iris_g <- group_by(iris, Species)
30 |
31 | expect_error(
32 | x <- samplify(iris_g, 1, c(1, 2, 3)),
33 | NA
34 | )
35 |
36 | x_gd <- group_data(x)
37 |
38 | expect_equal(
39 | vapply(x_gd$.rows, length, integer(1)),
40 | c(1, 2, 3)
41 | )
42 |
43 | expect_error(
44 | samplify(x, 2, c(1, 5)),
45 | "must be size 1 or 3 \\(the number of groups\\)"
46 | )
47 |
48 | })
49 |
50 | test_that("cannot sample more than the number of rows without replacement", {
51 |
52 | expect_error(
53 | samplify(iris, 1, 151),
54 | "`size` \\(151\\) must be less than or equal to the size of the data / current group \\(150\\)"
55 | )
56 |
57 | iris_g <- group_by(iris, Species)
58 |
59 | expect_error(
60 | samplify(iris_g, 1, c(49, 49, 51)),
61 | "`size` \\(51\\) must be less than or equal to the size of the data / current group \\(50\\)"
62 | )
63 |
64 | })
65 |
66 | test_that("can sample with replacement past the number of rows", {
67 |
68 | expect_error(
69 | x <- samplify(iris, 1, 151, replace = TRUE),
70 | NA
71 | )
72 |
73 | x_gd <- group_data(x)
74 |
75 | expect_equal(
76 | length(x_gd$.rows[[1]]),
77 | 151
78 | )
79 |
80 | iris_g <- group_by(iris, Species)
81 |
82 | expect_error(
83 | xx <- samplify(iris_g, 1, c(51, 55, 40), replace = TRUE),
84 | NA
85 | )
86 |
87 | xx_gd <- group_data(xx)
88 |
89 |
90 | expect_equal(
91 | vapply(xx_gd$.rows, length, integer(1)),
92 | c(51, 55, 40)
93 | )
94 |
95 | })
96 |
97 | test_that("`replace` must be a bool", {
98 | expect_error(
99 | samplify(iris, 1, 1, replace = NA),
100 | "a single logical \\(TRUE/FALSE\\)"
101 | )
102 | })
103 |
104 | test_that("`size` is recycled as necessary", {
105 |
106 | iris_g <- group_by(iris, Species)
107 |
108 | x <- samplify(iris_g, 2, 5)
109 |
110 | x_gd <- group_data(x)
111 |
112 | expect_equal(
113 | unique(vapply(x_gd$.rows, length, integer(1))),
114 | 5
115 | )
116 |
117 | })
118 |
--------------------------------------------------------------------------------
/vignettes/.gitignore:
--------------------------------------------------------------------------------
1 | *.html
2 | *.R
3 |
--------------------------------------------------------------------------------
/vignettes/dplyr-support.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Support for dplyr"
3 | output: rmarkdown::html_vignette
4 | vignette: >
5 | %\VignetteIndexEntry{Support for dplyr}
6 | %\VignetteEncoding{UTF-8}
7 | %\VignetteEngine{knitr::rmarkdown}
8 | editor_options:
9 | chunk_output_type: console
10 | ---
11 |
12 | ```{r, include = FALSE}
13 | knitr::opts_chunk$set(
14 | collapse = TRUE,
15 | comment = "#>"
16 | )
17 | ```
18 |
19 | ```{r setup, message=FALSE, warning=FALSE}
20 | set.seed(123)
21 | library(strapgod)
22 | library(dplyr)
23 | ```
24 |
25 | # Introduction
26 |
27 | As much as possible, strapgod attempts to let you use any dplyr function that you want on the `resampled_df` object that is returned by `bootstrapify()` and `samplify()`. Some functions have specialized behavior, like `summarise()`, while most others just call `collect()` to materialize the bootstrap rows before passing on to the underlying dplyr function.
28 |
29 | What follows is a list of the dplyr functions that have "special" properties when used on a `resampled_df`.
30 |
31 | # collect()
32 |
33 | The most important dplyr function for strapgod is `collect()`. Generally, this has been used to force a computation from a data base query and return the results as a tibble, and it has a similar context here. `collect()` forces the materialization of the virtual groups, and returns the full grouped tibble back to you.
34 |
35 | ```{r}
36 | x <- bootstrapify(iris, 10)
37 |
38 | # Not materialized
39 | x
40 |
41 | # Materialized
42 | collect(x)
43 | ```
44 |
45 | When calling `collect()` directly, there are two arguments available to extract extra information about the bootstraps.
46 |
47 | `id` adds a sequence of integers from `1:n` for each bootstrap group. It would be equivalent to adding the `row_number()` by group after the `collect()`, but saves some typing.
48 |
49 | ```{r}
50 | collect(x, id = ".id")
51 | ```
52 |
53 | `original_id` tacks on the original row of the current bootstrap observation. It is generally more useful than `id`, as it provides a way to link the bootstrap rows back to the original data.
54 |
55 | ```{r}
56 | collect(x, original_id = ".original_id")
57 | ```
58 |
59 | # summarise()
60 |
61 | The motivation for this package was `summarise()`. It efficiently computes the summary results, only materializing the bootstrap rows as they are needed at the C++ level.
62 |
63 | ```{r}
64 | summarise(x, mean_length = mean(Sepal.Length))
65 | ```
66 |
67 | You can group by other columns before creating the virtual groups, and `bootstrapify()` will respect those extra groups in the `summarise()` call. Pay attention to how easy it is to go from a non-bootstrapped version to a bootstrapped version. It's just one extra line!
68 |
69 | ```{r}
70 | # Non-bootstrapped
71 | iris %>%
72 | group_by(Species) %>%
73 | summarise(
74 | mean_length_across_species = mean(Sepal.Length)
75 | )
76 |
77 | # Bootstrapped
78 | iris %>%
79 | group_by(Species) %>%
80 | bootstrapify(5) %>%
81 | summarise(
82 | mean_length_across_species = mean(Sepal.Length)
83 | )
84 | ```
85 |
86 | # do()
87 |
88 | While `dplyr::do()` is basically deprecated and has been replaced by `group_modify()`, it still has its uses sometimes. Like `summarise()`, `do()` materializes the groups only when they are required. Here we run the same linear model on each bootstrapped set of data.
89 |
90 | ```{r}
91 | do(x, model = lm(Sepal.Length ~ Sepal.Width, data = .))
92 | ```
93 |
94 | # group_nest()
95 |
96 | `group_nest()` will materialize the groups so that they become columns in the outer tibble after the nest has been performed.
97 |
98 | ```{r}
99 | group_nest(x)
100 | ```
101 |
102 | You can set `keep = TRUE` to include the groups in the inner tibbles as well.
103 |
104 | ```{r}
105 | group_nest(x, keep = TRUE)$data[[1]]
106 | ```
107 |
108 | # group_split()
109 |
110 | `group_split()` allows you to materialize all of the bootstrap tibbles into separate tibbles, all bundled together into a list.
111 |
112 | ```{r}
113 | group_split(x) %>% head(n = 3)
114 | ```
115 |
116 | You can specify `keep = FALSE` if you never want to see the bootstrap columns.
117 |
118 | ```{r}
119 | group_split(x, keep = FALSE) %>% head(n = 3)
120 | ```
121 |
122 | # group_modify()
123 |
124 | `group_modify()` is similar to `do()`, but (as of dplyr 0.8.0.1) always returns a data frame and gives you access to the non-group and group data separately.
125 |
126 | ```{r}
127 | # Just show the first 2 rows of each bootstrap
128 | group_modify(x, ~head(.x, n = 2))
129 |
130 | # As you iterate though each group, you have access to that
131 | # group's metadata through `.y` if you need it.
132 | group_modify_group_data <- group_modify(x, ~tibble(.g = list(.y)))
133 |
134 | group_modify_group_data
135 |
136 | group_modify_group_data$.g[[1]]
137 | ```
138 |
139 | Like `do()`, it can be a convenient way to run multiple models as long as you return a data frame from each one.
140 |
141 | ```{r}
142 | x %>%
143 | group_by(Species, add = TRUE) %>%
144 | group_modify(~ broom::tidy(lm(Petal.Length ~ Sepal.Length, data = .x)))
145 | ```
146 |
147 | # ungroup()
148 |
149 | `ungroup()` will return the original tibble back to you, without materializing the virtual groups.
150 |
151 | ```{r}
152 | ungroup(x)
153 | ```
154 |
155 | # as_tibble()
156 |
157 | Like `ungroup()`, you can get the original tibble back by converting it to one explicitly with `as_tibble()`.
158 |
159 | ```{r}
160 | as_tibble(x)
161 | ```
162 |
163 | # Other dplyr functions
164 |
165 | Most other dplyr functions work by first calling `collect()`, and then passing off to the underlying dplyr implementation. This means you can use `mutate()` like so:
166 |
167 | ```{r}
168 | mutate(x, mean = mean(Sepal.Length))
169 | ```
170 |
171 | This doesn't really get you anything in terms of speed, but can be convenient as an automatic way to convert back to a tibble and keep going with your workflow.
172 |
--------------------------------------------------------------------------------
/vignettes/virtual-bootstraps.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Virtual Bootstraps"
3 | output: rmarkdown::html_vignette
4 | vignette: >
5 | %\VignetteIndexEntry{Virtual Bootstraps}
6 | %\VignetteEncoding{UTF-8}
7 | %\VignetteEngine{knitr::rmarkdown}
8 | editor_options:
9 | chunk_output_type: console
10 | ---
11 |
12 | ```{r, include = FALSE}
13 | knitr::opts_chunk$set(
14 | collapse = TRUE,
15 | comment = "#>"
16 | )
17 | ```
18 |
19 | ```{r setup, message=FALSE, warning=FALSE}
20 | set.seed(123)
21 | library(strapgod)
22 | library(dplyr)
23 | iris <- as_tibble(iris)
24 | ```
25 |
26 | # Introduction
27 |
28 | The goal of strapgod is to make it easy to create _virtual groups_ on top of tibbles for use with resampling. This means that your tibble is grouped, but you don't actually "materialize" the groups until you actually need them. By doing this, some computations involving large amounts of bootstraps or resamples can be made much more efficient.
29 |
30 | # Creating resampled data frames
31 |
32 | There are two core functions that help you generate a `resampled_df` object.
33 |
34 | `bootstrapify()` takes a data frame and bootstraps the rows of that data frame a set number of `times` to generate the virtual groups.
35 |
36 | ```{r}
37 | iris_boot <- bootstrapify(iris, times = 10)
38 |
39 | nrow(iris)
40 | nrow(iris_boot)
41 |
42 | iris_boot
43 | ```
44 |
45 | What you'll immediately notice is that:
46 |
47 | - The tibble still only has 150 rows.
48 |
49 | - The tibble is now grouped by `.bootstrap`, which isn't a column in the tibble.
50 |
51 | The invisible `.bootstrap` column is the virtual group. It hasn't been materialized (there are still only 150 rows, not 150 * 10 rows), but dplyr still seems to know about it.
52 |
53 | `samplify()` is the other function that can generate resampled tibbles. It is a slight generalization of `bootstrapify()` that also allows you to specify the size of each resample, and if you want to resample with replacement or not.
54 |
55 | ```{r}
56 | iris_samp <- samplify(iris, times = 10, size = 20, replace = FALSE)
57 |
58 | iris_samp
59 | ```
60 |
61 | This result:
62 |
63 | - Has 10 resamples
64 |
65 | - Each one is of size 20
66 |
67 | - And the resampling was done without replacement each time
68 |
69 | # Resampled summaries
70 |
71 | What can you do with these neat resampled data frames? Great question! For one thing, you can `summarise()` the tibble to compute bootstrapped summaries quickly and efficiently.
72 |
73 | ```{r}
74 | # without the bootstrap
75 | iris %>%
76 | summarise(
77 | mean_length = mean(Sepal.Length)
78 | )
79 |
80 | # with the bootstrap
81 | iris %>%
82 | bootstrapify(10) %>%
83 | summarise(
84 | mean_length = mean(Sepal.Length)
85 | )
86 | ```
87 |
88 | This makes it easy to compute bootstrapped estimates of individual statistics, along with bootstrapped standard deviations around those estimates.
89 |
90 | ```{r}
91 | iris %>%
92 | bootstrapify(10) %>%
93 | summarise(mean_length = mean(Sepal.Length)) %>%
94 | summarise(
95 | bootstrapped_mean = mean(mean_length),
96 | bootstrapped_sd = sd(mean_length)
97 | )
98 | ```
99 |
100 | If you want, you can take an existing grouped data frame and bootstrapify that as well, allowing you to compute bootstrapped statistics _across_ some other variable.
101 |
102 | ```{r}
103 | iris_group_strap <- iris %>%
104 | group_by(Species) %>%
105 | bootstrapify(100)
106 |
107 | iris_group_strap
108 | ```
109 |
110 | Reusing the code from above, we can now compute bootstrapped estimates for the mean `Sepal.Length` of each `Species`, along with standard deviations around those estimates.
111 |
112 | ```{r}
113 | iris_group_strap %>%
114 | summarise(mean_length = mean(Sepal.Length)) %>%
115 | summarise(
116 | bootstrapped_mean = mean(mean_length),
117 | bootstrapped_sd = sd(mean_length)
118 | )
119 | ```
120 |
121 | # Understanding virtual groups
122 |
123 | The virtual groups are stored in the `group_data()` metadata of the `resampled_df` object. Every grouped data frame has one of these, and they are used internally to power the dplyr `group_by()` system.
124 |
125 | ```{r}
126 | group_data(iris_boot)
127 | ```
128 |
129 | The `.bootstrap` column contains the unique values of the groups, and the `.rows` column is a list column, where each element is an integer vector. That integer vector holds the rows that belong to that specific group. So, for `.bootstrap == 1`, there is a vector with 150 integers identifying the rows belonging to that resample.
130 |
131 | ```{r}
132 | group_data(iris_boot)$.rows[[1]]
133 | ```
134 |
135 | When a call to `collect()` is made, this row index information is used to construct the output. Essentially, we start with the `group_data()` and utilize the `.rows` info to replicate the rows of the original data frame for each group, building up the complete resampled data frame. Notice how we now have the `150 * 10 = 1500` rows from the 10 bootstraps.
136 |
137 | ```{r}
138 | collect(iris_boot)
139 | ```
140 |
141 | To learn more about `collect()`, and the other supported dplyr functions in strapgod, read the `vignette("dplyr-support", "strapgod")`.
142 |
--------------------------------------------------------------------------------