An S3 class single_exposure to represent exposure of data to one rule
72 | pack. It is a list of the following structure: pack_info - single
73 | pack_info object; report - tidy data validation report
74 | without column pack.
75 |
Details
80 |
Single exposure is implemented in order to encapsulate preliminary
81 | exposure data from a single rule pack. It is needed to impute possibly missing
82 | pack names during exposure. That is why single_exposure doesn't
83 | contain pack name in any form.
84 |
--------------------------------------------------------------------------------
/docs/sitemap.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | /404.html
5 |
6 |
7 | /LICENSE-text.html
8 |
9 |
10 | /LICENSE.html
11 |
12 |
13 | /articles/design-and-format.html
14 |
15 |
16 | /articles/index.html
17 |
18 |
19 | /articles/rule-packs.html
20 |
21 |
22 | /articles/validation.html
23 |
24 |
25 | /authors.html
26 |
27 |
28 | /index.html
29 |
30 |
31 | /news/index.html
32 |
33 |
34 | /reference/act_after_exposure.html
35 |
36 |
37 | /reference/add_pack_names.html
38 |
39 |
40 | /reference/any_breaker.html
41 |
42 |
43 | /reference/assert_any_breaker.html
44 |
45 |
46 | /reference/bind_exposures.html
47 |
48 |
49 | /reference/cell-pack.html
50 |
51 |
52 | /reference/column-pack.html
53 |
54 |
55 | /reference/data-pack.html
56 |
57 |
58 | /reference/expose.html
59 |
60 |
61 | /reference/expose_single.html
62 |
63 |
64 | /reference/exposure.html
65 |
66 |
67 | /reference/group-pack.html
68 |
69 |
70 | /reference/index.html
71 |
72 |
73 | /reference/inside_punct.html
74 |
75 |
76 | /reference/pack_info.html
77 |
78 |
79 | /reference/packs_info.html
80 |
81 |
82 | /reference/pipe.html
83 |
84 |
85 | /reference/reexports.html
86 |
87 |
88 | /reference/row-pack.html
89 |
90 |
91 | /reference/rule-packs.html
92 |
93 |
94 | /reference/ruler-package.html
95 |
96 |
97 | /reference/ruler-report.html
98 |
99 |
100 | /reference/rules.html
101 |
102 |
103 | /reference/single_exposure.html
104 |
105 |
106 | /reference/spread_groups.html
107 |
108 |
109 |
--------------------------------------------------------------------------------
/inst/WORDLIST:
--------------------------------------------------------------------------------
1 | ’s
2 | all’
3 | assertr
4 | assertthat
5 | behaviour
6 | binded
7 | chr
8 | dplyr
9 | funs
10 | github
11 | integerish
12 | keyholder
13 | lgl
14 | magrittr
15 | name’
16 | naniar
17 | obeyers
18 | pack’
19 | README
20 | reproducibility
21 | sealr
22 | skimr
23 | summarised
24 | tibble
25 | tibbles
26 | tidyverse
27 | whole’
28 |
--------------------------------------------------------------------------------
/man/act_after_exposure.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/actions.R
3 | \name{act_after_exposure}
4 | \alias{act_after_exposure}
5 | \title{Act after exposure}
6 | \usage{
7 | act_after_exposure(.tbl, .trigger, .actor)
8 | }
9 | \arguments{
10 | \item{.tbl}{Result of \link[=expose]{exposure}, i.e. data frame with \link{exposure}
11 | attribute.}
12 |
13 | \item{.trigger}{Function which takes \code{.tbl} as argument and returns \code{TRUE} if
14 | some action needs to be performed.}
15 |
16 | \item{.actor}{Function which takes \code{.tbl} as argument and performs the
17 | action.}
18 | }
19 | \description{
20 | A wrapper for consistent application of some actions based on the data after
21 | exposure.
22 | }
23 | \details{
24 | Basically \code{act_after_exposure()} does the following:
25 | \itemize{
26 | \item Check that \code{.tbl} has a proper \link{exposure} attribute.
27 | \item Decide whether to perform the intended action by computing \code{.trigger(.tbl)}.
28 | \item If the trigger results in \code{TRUE} then \code{.actor(.tbl)} \strong{is returned}.
29 | Otherwise \code{.tbl} is returned.
30 | }
31 |
32 | It is a good idea for \code{.actor} to do one of two things:
33 | \itemize{
34 | \item Making side effects. For example throwing an error (if condition in
35 | \code{.trigger} is met), printing some information and so on. In this case it
36 | should return \code{.tbl} to be used properly inside a \link[magrittr:pipe]{pipe}.
37 | \item Changing \code{.tbl} based on exposure information. In this case it should
38 | return the imputed version of \code{.tbl}.
39 | }
40 | }
41 | \examples{
42 | exposure_printer <- function(.tbl) {
43 | print(get_exposure(.tbl))
44 | .tbl
45 | }
46 | mtcars_exposed <- mtcars \%>\%
47 | expose(data_packs(. \%>\% dplyr::summarise(nrow_low = nrow(.) > 50))) \%>\%
48 | act_after_exposure(any_breaker, exposure_printer)
49 | }
50 | \seealso{
51 | \link{any_breaker} for trigger which returns \code{TRUE} in case any rule
52 | breaker is found in exposure.
53 |
54 | \link{assert_any_breaker} for usage of \code{act_after_exposure()} in building data
55 | validation pipelines.
56 | }
57 |
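The control flow listed in the details of act_after_exposure() can be sketched in base R. This is a minimal sketch, not ruler's source code: `act_after_sketch`, `has_breakers`, `drop_first_row` and the toy 'exposure' attribute are hypothetical names and data made up for illustration, and the attribute check is far looser than a real exposure validation.

```r
# Hypothetical sketch of the three steps from the details section.
# It only assumes that the exposure is stored in the 'exposure' attribute.
act_after_sketch <- function(.tbl, .trigger, .actor) {
  # Step 1: the input must carry an 'exposure' attribute
  if (is.null(attr(.tbl, "exposure"))) {
    stop("`.tbl` has no 'exposure' attribute.")
  }
  # Steps 2 and 3: return the actor's output only when the trigger fires
  if (isTRUE(.trigger(.tbl))) {
    .actor(.tbl)
  } else {
    .tbl
  }
}

# Toy input: a stand-in list instead of a real exposure object
toy <- data.frame(x = 1:3)
attr(toy, "exposure") <- list(n_breakers = 2)

# Hypothetical trigger and actor
has_breakers <- function(.tbl) attr(.tbl, "exposure")$n_breakers > 0
drop_first_row <- function(.tbl) .tbl[-1, , drop = FALSE]

res_fired <- act_after_sketch(toy, has_breakers, drop_first_row)
res_idle <- act_after_sketch(toy, function(.tbl) FALSE, drop_first_row)
nrow(res_fired)
nrow(res_idle)
```

Because the function returns either `.actor(.tbl)` or `.tbl`, an actor that only makes side effects has to return its input itself to stay usable in a pipe.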
--------------------------------------------------------------------------------
/man/add_pack_names.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/expose-helpers.R
3 | \name{add_pack_names}
4 | \alias{add_pack_names}
5 | \title{Add pack names to single exposures}
6 | \usage{
7 | add_pack_names(.single_exposures)
8 | }
9 | \arguments{
10 | \item{.single_exposures}{List of \link[=single_exposure]{single exposures}.}
11 | }
12 | \description{
13 | Function to add pack names to single exposures. Converts list of
14 | \link[=single_exposure]{single exposures} to list of \link[=exposure]{exposures} without
15 | validating.
16 | }
17 | \keyword{internal}
18 |
--------------------------------------------------------------------------------
/man/any_breaker.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/actions.R
3 | \name{any_breaker}
4 | \alias{any_breaker}
5 | \title{Is there any breaker in exposure?}
6 | \usage{
7 | any_breaker(.tbl)
8 | }
9 | \arguments{
10 | \item{.tbl}{Result of \link[=expose]{exposure}, i.e. data frame with \link{exposure}
11 | attribute.}
12 | }
13 | \description{
14 | Function designed to be used as trigger in \code{\link[=act_after_exposure]{act_after_exposure()}}. Returns
15 | \code{TRUE} if \link{exposure} attribute of \code{.tbl} has any information about data units
16 | not obeying the rules, i.e. rule breakers.
17 | }
18 | \examples{
19 | mtcars \%>\%
20 | expose(data_packs(. \%>\% dplyr::summarise(nrow_low = nrow(.) > 50))) \%>\%
21 | any_breaker()
22 | }
23 | \seealso{
24 | \link{assert_any_breaker} for implicit usage of \code{any_breaker()}.
25 | }
26 |
--------------------------------------------------------------------------------
/man/assert_any_breaker.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/actions.R
3 | \name{assert_any_breaker}
4 | \alias{assert_any_breaker}
5 | \title{Assert presence of rule breaker}
6 | \usage{
7 | assert_any_breaker(.tbl, .type = "error", .silent = FALSE, ...)
8 | }
9 | \arguments{
10 | \item{.tbl}{Result of \link[=expose]{exposure}, i.e. data frame with \link{exposure}
11 | attribute.}
12 |
13 | \item{.type}{The type of assertion. Can be only one of "error", "warning" or
14 | "message".}
15 |
16 | \item{.silent}{If \code{TRUE} no printing of rule breaker information is done.}
17 |
18 | \item{...}{Arguments for printing rule breaker information.}
19 | }
20 | \description{
21 | Function to assert if \link[=expose]{exposure} resulted in \link[=any_breaker]{detecting}
22 | some rule breakers.
23 | }
24 | \details{
25 | In case of breaker presence this function does the following:
26 | \itemize{
27 | \item In case \code{.silent} is \code{FALSE} print rows from exposure
28 | \link[=ruler-report]{report} corresponding to rule breakers.
29 | \item Make assertion of the chosen \code{.type} about breaker presence in exposure.
30 | \item Return \code{.tbl} (for using inside a \link[magrittr:pipe]{pipe}).
31 | }
32 |
33 | If there are no breakers only \code{.tbl} is returned.
34 | }
35 | \examples{
36 | \dontrun{
37 | mtcars \%>\%
38 | expose(data_packs(. \%>\% dplyr::summarise(nrow_low = nrow(.) > 50))) \%>\%
39 | assert_any_breaker()
40 | }
41 | }
42 | \seealso{
43 | \link{any_breaker} for checking of breaker presence in exposure result.
44 |
45 | \link{act_after_exposure} for making general actions based on exposure result.
46 | }
47 |
--------------------------------------------------------------------------------
/man/bind_exposures.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/expose-helpers.R
3 | \name{bind_exposures}
4 | \alias{bind_exposures}
5 | \title{Bind exposures}
6 | \usage{
7 | bind_exposures(..., .validate_output = TRUE)
8 | }
9 | \arguments{
10 | \item{...}{Exposures to bind.}
11 |
12 | \item{.validate_output}{Whether to validate with \code{\link[=is_exposure]{is_exposure()}} if the
13 | output is exposure.}
14 | }
15 | \description{
16 | Function to bind several exposures into one.
17 | }
18 | \details{
19 | \strong{Note} that the output might not have names in list-column \code{fun}
20 | in \link[=packs_info]{packs info}, which depends on version of
21 | \link[dplyr:dplyr-package]{dplyr} package.
22 | }
23 | \examples{
24 | my_data_packs <- data_packs(
25 | data_dims = . \%>\% dplyr::summarise(nrow_low = nrow(.) < 10),
26 | data_sum = . \%>\% dplyr::summarise(sum = sum(.) < 1000)
27 | )
28 |
29 | ref_exposure <- mtcars \%>\%
30 | expose(my_data_packs) \%>\%
31 | get_exposure()
32 |
33 | exposure_1 <- mtcars \%>\%
34 | expose(my_data_packs[1]) \%>\%
35 | get_exposure()
36 | exposure_2 <- mtcars \%>\%
37 | expose(my_data_packs[2]) \%>\%
38 | get_exposure()
39 | exposure_binded <- bind_exposures(exposure_1, exposure_2)
40 |
41 | exposure_pipe <- mtcars \%>\%
42 | expose(my_data_packs[1]) \%>\%
43 | expose(my_data_packs[2]) \%>\%
44 | get_exposure()
45 |
46 | identical(exposure_binded, ref_exposure)
47 |
48 | identical(exposure_pipe, ref_exposure)
49 | }
50 |
--------------------------------------------------------------------------------
/man/cell-pack.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/packs.R
3 | \name{cell-pack}
4 | \alias{cell-pack}
5 | \title{Cell rule pack}
6 | \description{
7 | Cell rule pack is a \link[=rule-packs]{rule pack} which defines a set of rules for
8 | cells, i.e. functions which convert cells of interest to logical values. It
9 | should return a data frame with the following properties:
10 | \itemize{
11 | \item Number of rows equals the \strong{number of rows for checked cells}.
12 | \item Column names should be treated as concatenation of
13 | \bold{'column name of check cell' + 'separator' + 'rule name'}.
14 | \item Values indicate whether the \strong{cell} follows the rule.
15 | }
16 | }
17 | \details{
18 | This format is inspired by \link[dplyr:mutate_all]{scoped variants of transmute()}.
19 |
20 | The most common way to define cell pack is by creating a
21 | \link[magrittr:pipe]{functional sequence} containing one of:
22 | \itemize{
23 | \item \code{transmute_all(.funs = rules(...))}.
24 | \item \code{transmute_if(.predicate, .funs = rules(...))}.
25 | \item \code{transmute_at(.vars, .funs = rules(...))}.
26 | }
27 |
28 | \strong{Note} that (as of \code{dplyr} version 0.7.4) when only one column is
29 | transmuted, names of the output don't have the necessary structure. The 'column
30 | name of check cell' is missing, which results (after \link[=expose]{exposure})
31 | in an empty string in \code{var} column of \link[=ruler-report]{validation report}. The
32 | current way of dealing with this is to name the input column (see examples).
33 | }
34 | \section{Using rules()}{
35 |
36 | Using \code{\link[=rules]{rules()}} to create list of functions for scoped \code{dplyr} "mutating"
37 | verbs (such as \link[dplyr:summarise_all]{summarise_all()} and
38 | \link[dplyr:mutate_all]{transmute_all()}) is recommended because:
39 | \itemize{
40 | \item It is a convenient way to ensure consistent naming of rules without manual
41 | naming.
42 | \item It adds a common prefix to all rule names. This helps in defining
43 | separator as prefix surrounded by any number of non-alphanumeric values.
44 | }
45 | }
46 |
47 | \section{Note about rearranging rows}{
48 |
49 | \strong{Note} that during exposure packs are applied to \link[keyholder:keys-set]{keyed object} with \link[keyholder:keyholder-id]{id key}. So they
50 | can rearrange rows as long as it is done with \link[keyholder:keyholder-supported-funs]{functions supported by keyholder}. Rows will be tracked and
51 | recognized as in the original data frame of interest.
52 | }
53 |
54 | \examples{
55 | cell_outlier_rules <- . \%>\% dplyr::transmute_at(
56 | c("disp", "qsec"),
57 | rules(z_score = abs(. - mean(.)) / sd(.) > 1)
58 | )
59 |
60 | cell_packs(outlier = cell_outlier_rules)
61 |
62 | # Dealing with one column edge case
63 | improper_pack <- . \%>\% dplyr::transmute_at(
64 | dplyr::vars(vs),
65 | rules(improper_is_neg = . < 0)
66 | )
67 |
68 | proper_pack <- . \%>\% dplyr::transmute_at(
69 | dplyr::vars(vs = vs),
70 | rules(proper_is_neg = . < 0)
71 | )
72 |
73 | mtcars[1:2, ] \%>\%
74 | expose(cell_packs(improper_pack, proper_pack)) \%>\%
75 | get_report()
76 | }
77 | \seealso{
78 | \link[=data-pack]{Data pack}, \link[=group-pack]{group pack}, \link[=column-pack]{column pack}, \link[=row-pack]{row pack}.
79 | }
80 |
--------------------------------------------------------------------------------
/man/column-pack.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/packs.R
3 | \name{column-pack}
4 | \alias{column-pack}
5 | \title{Column rule pack}
6 | \description{
7 | Column rule pack is a \link[=rule-packs]{rule pack} which defines a set of rules
8 | for columns as a whole, i.e. functions which convert columns of interest to
9 | logical values. It should return a data frame with the following properties:
10 | \itemize{
11 | \item Number of rows equals \strong{one}.
12 | \item Column names should be treated as concatenation of
13 | \bold{'check column name' + 'separator' + 'rule name'}.
14 | \item Values indicate whether the \strong{column as a whole} follows the rule.
15 | }
16 | }
17 | \details{
18 | This format is inspired by \code{dplyr}'s
19 | \link[dplyr:summarise_all]{scoped variants of summarise()} applied to non-grouped
20 | data.
21 |
22 | The most common way to define column pack is by creating a
23 | \link[magrittr:pipe]{functional sequence} with no grouping and ending with
24 | one of:
25 | \itemize{
26 | \item \code{summarise_all(.funs = rules(...))}.
27 | \item \code{summarise_if(.predicate, .funs = rules(...))}.
28 | \item \code{summarise_at(.vars, .funs = rules(...))}.
29 | }
30 |
31 | \strong{Note} that (as of \code{dplyr} version 0.7.4) when only one column is
32 | summarised, names of the output don't have the necessary structure. The 'check
33 | column name' is missing, which results (after \link[=expose]{exposure}) in an empty
34 | string in \code{var} column of \link[=ruler-report]{validation report}. The current way
35 | of dealing with this is to name the input column (see examples).
36 | }
37 | \section{Using rules()}{
38 |
39 | Using \code{\link[=rules]{rules()}} to create list of functions for scoped \code{dplyr} "mutating"
40 | verbs (such as \link[dplyr:summarise_all]{summarise_all()} and
41 | \link[dplyr:mutate_all]{transmute_all()}) is recommended because:
42 | \itemize{
43 | \item It is a convenient way to ensure consistent naming of rules without manual
44 | naming.
45 | \item It adds a common prefix to all rule names. This helps in defining
46 | separator as prefix surrounded by any number of non-alphanumeric values.
47 | }
48 | }
49 |
50 | \examples{
51 | # Validating present columns
52 | numeric_column_rules <- . \%>\% dplyr::summarise_if(
53 | is.numeric,
54 | rules(mean(.) > 5, sd(.) < 10)
55 | )
56 | character_column_rules <- . \%>\% dplyr::summarise_if(
57 | is.character,
58 | rules(. \%in\% letters[1:4])
59 | )
60 |
61 | col_packs(
62 | num_col = numeric_column_rules,
63 | chr_col = character_column_rules
64 | )
65 |
66 | # Dealing with one column edge case
67 | improper_pack <- . \%>\% dplyr::summarise_at(
68 | dplyr::vars(vs),
69 | rules(improper_is_chr = is.character)
70 | )
71 |
72 | proper_pack <- . \%>\% dplyr::summarise_at(
73 | dplyr::vars(vs = vs),
74 | rules(proper_is_chr = is.character)
75 | )
76 |
77 | mtcars \%>\%
78 | expose(col_packs(improper_pack, proper_pack)) \%>\%
79 | get_report()
80 | }
81 | \seealso{
82 | \link[=data-pack]{Data pack}, \link[=group-pack]{group pack}, \link[=row-pack]{row pack}, \link[=cell-pack]{cell pack}.
83 | }
84 |
--------------------------------------------------------------------------------
/man/data-pack.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/packs.R
3 | \name{data-pack}
4 | \alias{data-pack}
5 | \title{Data rule pack}
6 | \description{
7 | Data rule pack is a \link[=rule-packs]{rule pack} which defines a set of rules for
8 | data as a whole, i.e. functions which convert data to logical values. It
9 | should return a data frame with the following properties:
10 | \itemize{
11 | \item Number of rows equals \strong{one}.
12 | \item Column names should be treated as \strong{rule names}.
13 | \item Values indicate whether the \strong{data as a whole} follows the rule.
14 | }
15 | }
16 | \details{
17 | This format is inspired by \code{dplyr}'s \link[dplyr:summarise]{summarise()} applied
18 | to non-grouped data.
19 |
20 | The most common way to define data pack is by creating a
21 | \link[magrittr:pipe]{functional sequence} with no grouping and ending with
22 | \code{summarise(...)}.
23 | }
24 | \examples{
25 | data_dims_rules <- . \%>\%
26 | dplyr::summarise(
27 | nrow_low = nrow(.) > 10,
28 | nrow_up = nrow(.) < 20,
29 | ncol_low = ncol(.) > 5,
30 | ncol_up = ncol(.) < 10
31 | )
32 | data_na_rules <- . \%>\%
33 | dplyr::summarise(all_not_na = Negate(anyNA)(.))
34 |
35 | data_packs(
36 | data_nrow = data_dims_rules,
37 | data_na = data_na_rules
38 | )
39 | }
40 | \seealso{
41 | \link[=group-pack]{Group pack}, \link[=column-pack]{Column pack}, \link[=row-pack]{row pack}, \link[=cell-pack]{cell pack}.
42 | }
43 |
--------------------------------------------------------------------------------
/man/expose.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/expose.R
3 | \name{expose}
4 | \alias{expose}
5 | \title{Expose data to rule packs}
6 | \usage{
7 | expose(.tbl, ..., .rule_sep = inside_punct("\\\\._\\\\."),
8 | .remove_obeyers = TRUE, .guess = TRUE)
9 | }
10 | \arguments{
11 | \item{.tbl}{Data frame of interest.}
12 |
13 | \item{...}{Rule packs. They can be in pure form or inside a list
14 | (at any depth).}
15 |
16 | \item{.rule_sep}{Regular expression used as separator between column and
17 | rule names in \link[=column-pack]{col packs} and \link[=cell-pack]{cell packs}.}
18 |
19 | \item{.remove_obeyers}{Whether to remove elements which obey rules from
20 | report.}
21 |
22 | \item{.guess}{Whether to guess type of unsupported rule pack type (see
23 | Details).}
24 | }
25 | \value{
26 | A \code{.tbl} with possibly added 'exposure' attribute containing the
27 | resulting \link{exposure}. If \code{.tbl} already contains 'exposure' attribute then
28 | the result is bound with it.
29 | }
30 | \description{
31 | Function for applying rule packs to data.
32 | }
33 | \details{
34 | \code{expose()} applies all supplied rule packs to data, creates an
35 | \link{exposure} object based on results and stores it to attribute 'exposure'.
36 | It is guaranteed that \code{.tbl} is not modified in any other way in order to
37 | use \code{expose()} inside a \code{\link[magrittr]{pipe}}.
38 |
39 | It is a good idea to name all rule packs: explicitly in \code{...} (if they are
40 | supplied not inside list) or during creation with respective rule pack
41 | function. In case of missing name it is imputed based on possibly existing
42 | exposure attribute in \code{.tbl} and supplied rule packs. Imputation is similar
43 | to one in \code{\link[=rules]{rules()}} but applied to every pack type separately.
44 |
45 | Default value for \code{.rule_sep} is the regular expression \verb{characters ._. surrounded by non alphanumeric characters}. It is picked to be used
46 | smoothly with \code{dplyr}'s \link[dplyr:scoped]{scoped verbs} and \code{\link[=rules]{rules()}} instead
47 | of pure list. In most cases it shouldn't be changed but if needed it
48 | should align with \code{.prefix} in \code{\link[=rules]{rules()}}.
49 | }
50 | \section{Guessing}{
51 |
52 | To work properly in some edge cases one should specify pack types with
53 | \link[=rule-packs]{appropriate function}. However with \code{.guess} equal to \code{TRUE}
54 | \code{expose} will guess the pack type based on its output after applying to
55 | \code{.tbl}. It uses the following features:
56 | \itemize{
57 | \item Presence of non-logical columns: if present then the guess is \link[=group-pack]{group pack}. Grouping columns are guessed as all non-logical. This
58 | works incorrectly if some grouping column is logical: it will be guessed as
59 | result of applying the rule. \strong{Note} that on most occasions this edge case
60 | will produce an error about grouping columns defining non-unique levels.
61 | \item Combination of whether number of rows equals 1 (\code{n_rows_one}) and
62 | presence of \code{.rule_sep} in all column names (\code{all_contain_sep}). Guesses
63 | are:
64 | \itemize{
65 | \item \link[=data-pack]{Data pack} if \code{n_rows_one == TRUE} and \code{all_contain_sep == FALSE}.
66 | \item \link[=column-pack]{Column pack} if \code{n_rows_one == TRUE} and
67 | \code{all_contain_sep == TRUE}.
68 | \item \link[=row-pack]{Row pack} if \code{n_rows_one == FALSE} and \code{all_contain_sep == FALSE}. This works incorrectly if output has one row which is checked.
69 | In this case it will be guessed as data pack.
70 | \item \link[=cell-pack]{Cell pack} if \code{n_rows_one == FALSE} and \code{all_contain_sep == TRUE}. This works incorrectly if output has one row in which cells
71 | are checked. In this case it will be guessed as column pack.
72 | }
73 | }
74 | }
75 |
76 | \examples{
77 | my_rule_pack <- . \%>\% dplyr::summarise(nrow_neg = nrow(.) < 0)
78 | my_data_packs <- data_packs(my_data_pack_1 = my_rule_pack)
79 |
80 | # These pipes give identical results
81 | mtcars \%>\%
82 | expose(my_data_packs) \%>\%
83 | get_report()
84 |
85 | mtcars \%>\%
86 | expose(my_data_pack_1 = my_rule_pack) \%>\%
87 | get_report()
88 |
89 | # This throws an error because no pack type is specified for my_rule_pack
90 | \dontrun{
91 | mtcars \%>\% expose(my_data_pack_1 = my_rule_pack, .guess = FALSE)
92 | }
93 |
94 | # Edge cases against using '.guess = TRUE' for robust code
95 | group_rule_pack <- . \%>\%
96 | dplyr::mutate(vs_one = vs == 1) \%>\%
97 | dplyr::group_by(vs_one, am) \%>\%
98 | dplyr::summarise(n_low = dplyr::n() > 10)
99 | group_rule_pack_dummy <- . \%>\%
100 | dplyr::mutate(vs_one = vs == 1) \%>\%
101 | dplyr::group_by(mpg, vs_one, wt) \%>\%
102 | dplyr::summarise(n_low = dplyr::n() > 10)
103 | row_rule_pack <- . \%>\% dplyr::transmute(neg_row_sum = rowSums(.) < 0)
104 | cell_rule_pack <- . \%>\% dplyr::transmute_all(rules(neg_value = . < 0))
105 |
106 | # Only column 'am' is guessed as grouping which defines non-unique levels.
107 | \dontrun{
108 | mtcars \%>\%
109 | expose(group_rule_pack, .remove_obeyers = FALSE, .guess = TRUE) \%>\%
110 | get_report()
111 | }
112 |
113 | # Values in `var` should contain combination of three grouping columns but
114 | # column 'vs_one' is guessed as rule. No error is thrown because the guessed
115 | # grouping columns define unique levels.
116 | mtcars \%>\%
117 | expose(group_rule_pack_dummy, .remove_obeyers = FALSE, .guess = TRUE) \%>\%
118 | get_report()
119 |
120 | # Results should have in column 'id' value 1 and not 0.
121 | mtcars \%>\%
122 | dplyr::slice(1) \%>\%
123 | expose(row_rule_pack) \%>\%
124 | get_report()
125 |
126 | mtcars \%>\%
127 | dplyr::slice(1) \%>\%
128 | expose(cell_rule_pack) \%>\%
129 | get_report()
130 | }
131 |
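The guessing rules from the "Guessing" section can be written down as a small decision function. The following is a sketch under stated assumptions rather than the code used inside expose(): `guess_pack_type_sketch` is a hypothetical helper, the returned strings are labels chosen for this illustration, and the default separator is simplified to the bare characters `._.` without the surrounding non-alphanumeric part.

```r
# Hypothetical helper mirroring the documented guessing rules.
# `.output` is the data frame returned by a rule pack.
guess_pack_type_sketch <- function(.output, .rule_sep = "\\._\\.") {
  is_lgl <- vapply(.output, is.logical, logical(1))
  # Any non-logical column is taken as a grouping column
  if (!all(is_lgl)) {
    return("group_pack")
  }
  n_rows_one <- nrow(.output) == 1
  all_contain_sep <- all(grepl(.rule_sep, names(.output)))
  if (n_rows_one) {
    if (all_contain_sep) "col_pack" else "data_pack"
  } else {
    if (all_contain_sep) "cell_pack" else "row_pack"
  }
}

# Toy pack outputs
data_out <- data.frame(nrow_low = TRUE)
group_out <- data.frame(am = c(0, 1), n_low = c(TRUE, FALSE))
row_out <- data.frame(neg_row_sum = c(TRUE, FALSE))
cell_out <- data.frame(vs._.neg_value = c(TRUE, FALSE))
# Documented edge case: cell pack output with a single row
one_row_cell_out <- data.frame(vs._.neg_value = TRUE)

guesses <- vapply(
  list(data_out, group_out, row_out, cell_out, one_row_cell_out),
  guess_pack_type_sketch,
  character(1)
)
guesses
```

The last element shows why guessing is not robust: a cell pack applied to a one-row data frame is indistinguishable from a column pack by shape and names alone.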
--------------------------------------------------------------------------------
/man/expose_single.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/expose.R
3 | \name{expose_single}
4 | \alias{expose_single}
5 | \title{Expose data to single rule pack}
6 | \usage{
7 | expose_single(.tbl, .pack, .rule_sep, .remove_obeyers, ...)
8 | }
9 | \arguments{
10 | \item{.tbl}{Data frame of interest.}
11 |
12 | \item{.pack}{Rule pack function.}
13 |
14 | \item{.rule_sep}{Regular expression used as separator between column and
15 | rule names in \link[=column-pack]{col packs} and \link[=cell-pack]{cell packs}.}
16 |
17 | \item{.remove_obeyers}{Whether to remove elements which obey rules from
18 | report.}
19 |
20 | \item{...}{Further arguments passed to or from other methods.}
21 | }
22 | \description{
23 | The workhorse generic function for doing exposure. The result is
24 | \link{single_exposure}.
25 | }
26 | \keyword{internal}
27 |
--------------------------------------------------------------------------------
/man/exposure.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/exposure.R
3 | \name{exposure}
4 | \alias{exposure}
5 | \alias{is_exposure}
6 | \alias{get_exposure}
7 | \alias{remove_exposure}
8 | \title{Exposure}
9 | \usage{
10 | is_exposure(.x)
11 |
12 | get_exposure(.object)
13 |
14 | remove_exposure(.object)
15 | }
16 | \arguments{
17 | \item{.x}{Object to test.}
18 |
19 | \item{.object}{Object to get or remove \code{exposure} attribute from.}
20 | }
21 | \value{
22 | \code{get_exposure()} returns \code{.object} if it is exposure and its attribute
23 | 'exposure' otherwise.
24 |
25 | \code{remove_exposure()} returns \code{.object} with removed attribute 'exposure'.
26 | }
27 | \description{
28 | Exposure is a result of \link[=expose]{exposing} data to rules. It is
29 | implemented with S3 class \code{exposure} which is a list of the following
30 | structure: \code{packs_info} - a \link{packs_info} object; \code{report} -
31 | \link[=ruler-report]{tidy data validation report}.
32 | }
33 | \examples{
34 | my_col_packs <- col_packs(
35 | col_sum_props = . \%>\% dplyr::summarise_all(
36 | rules(
37 | col_sum_low = sum(.) > 100,
38 | col_sum_high = sum(.) < 1000
39 | )
40 | )
41 | )
42 | mtcars_exposed <- mtcars \%>\% expose(my_col_packs)
43 | mtcars_exposure <- mtcars_exposed \%>\% get_exposure()
44 |
45 | is_exposure(mtcars_exposure)
46 |
47 | identical(remove_exposure(mtcars_exposed), mtcars)
48 |
49 | identical(get_exposure(mtcars_exposure), mtcars_exposure)
50 | }
51 |
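The documented return values of get_exposure() and remove_exposure() can be illustrated with base R attribute handling. This is only a sketch: `get_exposure_sketch` and `remove_exposure_sketch` are hypothetical stand-ins, the class check is a simplification assumed here in place of is_exposure(), and the toy exposure contains empty data frames instead of real packs info and report.

```r
# Hypothetical stand-ins for get_exposure() and remove_exposure()
get_exposure_sketch <- function(.object) {
  # Return the object itself if it is an exposure, its attribute otherwise
  if (inherits(.object, "exposure")) .object else attr(.object, "exposure")
}
remove_exposure_sketch <- function(.object) {
  attr(.object, "exposure") <- NULL
  .object
}

# Toy exposure: a list with the two documented elements
toy_exposure <- structure(
  list(packs_info = data.frame(), report = data.frame()),
  class = "exposure"
)
toy_tbl <- mtcars
attr(toy_tbl, "exposure") <- toy_exposure

from_tbl <- get_exposure_sketch(toy_tbl)
from_exposure <- get_exposure_sketch(toy_exposure)
cleaned <- remove_exposure_sketch(toy_tbl)

identical(from_tbl, toy_exposure)
identical(from_exposure, toy_exposure)
identical(cleaned, mtcars)
```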
--------------------------------------------------------------------------------
/man/group-pack.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/packs.R
3 | \name{group-pack}
4 | \alias{group-pack}
5 | \title{Group rule pack}
6 | \description{
7 | Group rule pack is a \link[=rule-packs]{rule pack} which defines a set of rules
8 | for groups of rows as a whole, i.e. functions which convert groups of
9 | interest to logical values. It should return a data frame with the following
10 | properties:
11 | \itemize{
12 | \item Some columns should be present whose combined values \strong{uniquely}
13 | describe the group. They should be defined during creation with
14 | \link[=rule-packs]{group_packs()}.
15 | \item Number of rows equals the \strong{number of checked groups}.
16 | \item Names of non-grouping columns should be treated as \strong{rule names}.
17 | \item Values indicate whether the \strong{group as a whole} follows the rule.
18 | }
19 | }
20 | \details{
21 | This format is inspired by \code{dplyr}'s \link[dplyr:summarise]{summarise()} applied
22 | to grouped data.
23 |
24 | The most common way to define group pack is by creating a
25 | \link[magrittr:pipe]{functional sequence} with grouping and ending with
26 | \code{summarise(...)}.
27 | }
28 | \section{Interpretation}{
29 |
30 | Group pack output is interpreted in the following way:
31 | \itemize{
32 | \item All grouping columns are \link[tidyr:unite]{united} with delimiter \code{.group_sep}
33 | (which is an argument of \code{group_packs()}).
34 | \item Levels of the resulting column are treated as names of some new variables
35 | which should be exposed as a whole. Names of non-grouping columns are treated
36 | as rule names. They are transformed in \link[=column-pack]{column pack} format and
37 | interpreted accordingly.
38 | }
39 |
40 | Exposure result of group pack differs from others in that column
41 | \code{var} in \link[=ruler-report]{exposure report} doesn't represent the actual column
42 | in data.
43 | }
44 |
45 | \examples{
46 | vs_am_rules <- . \%>\%
47 | dplyr::group_by(vs, am) \%>\%
48 | dplyr::summarise(
49 | nrow_low = dplyr::n() > 10,
50 | nrow_up = dplyr::n() < 20,
51 | rowmeans_low = rowMeans(.) > 19
52 | )
53 |
54 | group_packs(vs_am = vs_am_rules, .group_vars = c("vs", "am"))
55 | }
56 | \seealso{
57 | \link[=data-pack]{Data pack}, \link[=column-pack]{Column pack}, \link[=row-pack]{row pack}, \link[=cell-pack]{cell pack}.
58 | }
59 |
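The interpretation of group pack output can be sketched in base R: grouping columns are pasted into one label and every rule column is laid out in long form against those labels. This is an illustration under assumptions, not ruler's implementation. The data, the value "." used for `group_sep`, and the column layout of `report_sketch` are all chosen for this sketch only.

```r
# Toy group pack output: two grouping columns and one rule column
group_output <- data.frame(
  vs = c(0, 0, 1, 1),
  am = c(0, 1, 0, 1),
  n_low = c(TRUE, FALSE, FALSE, FALSE)
)
group_vars <- c("vs", "am")
group_sep <- "." # stand-in for `.group_sep`; value assumed for this sketch

# Unite grouping columns into one label per group
var <- do.call(paste, c(as.list(group_output[group_vars]), sep = group_sep))

# Treat remaining columns as rules and build one row per (group, rule) pair
rule_names <- setdiff(names(group_output), group_vars)
report_sketch <- do.call(rbind, lapply(rule_names, function(rule) {
  data.frame(var = var, rule = rule, value = group_output[[rule]])
}))
report_sketch
```

The labels in `var` name groups rather than columns of the data, which is why `var` in the report of a group pack does not point at an actual column.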
--------------------------------------------------------------------------------
/man/inside_punct.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/utils.R
3 | \name{inside_punct}
4 | \alias{inside_punct}
5 | \title{Inside punctuation regular expression}
6 | \usage{
7 | inside_punct(.x = "\\\\._\\\\.")
8 | }
9 | \arguments{
10 | \item{.x}{Middle characters to be put between non alpha-numeric characters.}
11 | }
12 | \description{
13 | Function to construct regular expression of form: 'non alpha-numeric
14 | characters' + 'some characters' + 'non alpha-numeric characters'.
15 | }
16 | \examples{
17 | inside_punct()
18 |
19 | inside_punct("abc")
20 | }
21 |
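The form 'non alpha-numeric characters' + 'some characters' + 'non alpha-numeric characters' can be sketched as a plain regular expression. `inside_punct_sketch` is a hypothetical re-creation written for illustration; the exact character class used by inside_punct() is an assumption here. The sample name `disp_._.z_score` is likewise assumed: it imitates a column name joined with `_` to a rule name that carries the `._.` prefix.

```r
# Hypothetical re-creation of the documented pattern shape
inside_punct_sketch <- function(.x = "\\._\\.") {
  paste0("[^[:alnum:]]*", .x, "[^[:alnum:]]*")
}

pattern <- inside_punct_sketch()

# Split an assumed output name into column name and rule name
parts <- strsplit("disp_._.z_score", pattern)[[1]]
parts

# Custom middle characters
custom_pattern <- inside_punct_sketch("abc")
custom_pattern
```

Surrounding punctuation is absorbed into the separator, so the split keeps only the column name and the rule name.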
--------------------------------------------------------------------------------
/man/pack_info.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/exposure.R
3 | \name{pack_info}
4 | \alias{pack_info}
5 | \alias{new_pack_info}
6 | \title{Pack info}
7 | \usage{
8 | new_pack_info(.pack, .remove_obeyers)
9 | }
10 | \arguments{
11 | \item{.pack}{\link[=rule-packs]{Rule pack}.}
12 |
13 | \item{.remove_obeyers}{Value of \code{.remove_obeyers} argument of \code{\link[=expose]{expose()}} with
14 | which \code{.pack} was applied.}
15 | }
16 | \description{
17 | An S3 class \code{pack_info} to represent information about pack in \link[=single_exposure]{single exposure}. Its content is as in \link{packs_info} but without
18 | column 'name'.
19 | }
20 | \keyword{internal}
21 |
--------------------------------------------------------------------------------
/man/packs_info.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/exposure.R
3 | \name{packs_info}
4 | \alias{packs_info}
5 | \alias{is_packs_info}
6 | \alias{get_packs_info}
7 | \title{Packs info}
8 | \usage{
9 | is_packs_info(.x, .skip_class = FALSE)
10 |
11 | get_packs_info(.object)
12 | }
13 | \arguments{
14 | \item{.x}{Object to test.}
15 |
16 | \item{.skip_class}{Whether to skip checking inheritance from \code{packs_info}.}
17 |
18 | \item{.object}{Object to get \code{packs_info} value from \code{exposure} attribute.}
19 | }
20 | \value{
21 | \code{get_packs_info()} returns \code{packs_info} attribute of \code{.object} if it
22 | is an exposure, and of its 'exposure' attribute otherwise.
23 | }
24 | \description{
25 | An S3 class \code{packs_info} to represent information about packs in \link{exposure}.
26 | It is a tibble with the following structure:
27 | \itemize{
28 | \item \strong{name} \verb{<chr>} : Name of the pack.
29 | \item \strong{type} \verb{<chr>} : \link[=rule-packs]{Pack type}.
30 | \item \strong{fun} \verb{<list>} : List (preferably unnamed) of rule pack functions.
31 | \item \strong{remove_obeyers} \verb{<lgl>} : Value of \code{.remove_obeyers} argument of
32 | \code{\link[=expose]{expose()}} with which pack was applied.
33 | }
34 | }
35 | \details{
36 | To avoid possible confusion it is preferred (but not required) that
37 | list-column \code{fun} doesn't have names. Names of packs are stored in \code{name}
38 | column. During \link[=expose]{exposure} \code{fun} is always created without names.
39 | }
40 | \examples{
41 | my_row_packs <- row_packs(
42 | row_mean_props = . \%>\% dplyr::transmute(row_mean = rowMeans(.)) \%>\%
43 | dplyr::transmute(
44 | row_mean_low = row_mean > 20,
45 | row_mean_high = row_mean < 60
46 | ),
47 | row_outlier = . \%>\% dplyr::transmute(row_sum = rowSums(.)) \%>\%
48 | dplyr::transmute(
49 | not_row_outlier = abs(row_sum - mean(row_sum)) / sd(row_sum) < 1.5
50 | )
51 | )
52 | my_data_packs <- data_packs(
53 | data_dims = . \%>\% dplyr::summarise(
54 | nrow = nrow(.) == 32,
55 | ncol = ncol(.) == 5
56 | )
57 | )
58 |
59 | mtcars_exposed <- mtcars \%>\%
60 | expose(my_data_packs, .remove_obeyers = FALSE) \%>\%
61 | expose(my_row_packs)
62 |
63 | mtcars_exposed \%>\% get_packs_info()
64 |
65 | mtcars_exposed \%>\%
66 | get_packs_info() \%>\%
67 | is_packs_info()
68 | }
69 |
--------------------------------------------------------------------------------
/man/pipe.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/utils-pipe.R
3 | \name{\%>\%}
4 | \alias{\%>\%}
5 | \title{Pipe operator}
6 | \usage{
7 | lhs \%>\% rhs
8 | }
9 | \description{
10 | See \code{magrittr::\link[magrittr:pipe]{\%>\%}} for details.
11 | }
12 | \keyword{internal}
13 |
--------------------------------------------------------------------------------
/man/row-pack.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/packs.R
3 | \name{row-pack}
4 | \alias{row-pack}
5 | \title{Row rule pack}
6 | \description{
7 | Row rule pack is a \link[=rule-packs]{rule pack} which defines a set of rules for
8 | rows as a whole, i.e. functions which convert rows of interest to logical
9 | values. It should return a data frame with the following properties:
10 | \itemize{
11 | \item Number of rows equals the \strong{number of checked rows}.
12 | \item Column names should be treated as \strong{rule names}.
13 | \item Values indicate whether the \strong{row as a whole} follows the rule.
14 | }
15 | }
16 | \details{
17 | This format is inspired by \code{dplyr}'s \link[dplyr:transmute]{transmute()}.
18 |
19 | The most common way to define a row pack is by creating a
20 | \link[magrittr:pipe]{functional sequence} containing \code{transmute(...)}.
21 | }
22 | \section{Note about rearranging rows}{
23 |
24 | \strong{Note} that during exposure packs are applied to \link[keyholder:keys-set]{keyed object} with \link[keyholder:keyholder-id]{id key}. So they
25 | can rearrange rows as long as it is done with \link[keyholder:keyholder-supported-funs]{functions supported by keyholder}. Rows will be tracked and
26 | recognized as in the original data frame of interest.
27 | }
28 |
29 | \examples{
30 | some_row_mean_rules <- . \%>\%
31 | dplyr::slice(1:3) \%>\%
32 | dplyr::mutate(row_mean = rowMeans(.)) \%>\%
33 | dplyr::transmute(
34 | row_mean_low = row_mean > 10,
35 | row_mean_up = row_mean < 20
36 | )
37 | all_row_sum_rules <- . \%>\%
38 | dplyr::mutate(row_sum = rowSums(.)) \%>\%
39 | dplyr::transmute(row_sum_low = row_sum > 30)
40 |
41 | row_packs(
42 | some_row_mean_rules,
43 | all_row_sum_rules
44 | )
45 | }
46 | \seealso{
47 | \link[=data-pack]{Data pack}, \link[=group-pack]{group pack}, \link[=column-pack]{column pack}, \link[=cell-pack]{cell pack}.
48 | }
49 |
--------------------------------------------------------------------------------
/man/rule-packs.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/packs.R
3 | \name{rule-packs}
4 | \alias{rule-packs}
5 | \alias{packs}
6 | \alias{data_packs}
7 | \alias{group_packs}
8 | \alias{col_packs}
9 | \alias{row_packs}
10 | \alias{cell_packs}
11 | \title{Create rule packs}
12 | \usage{
13 | data_packs(...)
14 |
15 | group_packs(..., .group_vars, .group_sep = ".")
16 |
17 | col_packs(...)
18 |
19 | row_packs(...)
20 |
21 | cell_packs(...)
22 | }
23 | \arguments{
24 | \item{...}{Functions which define packs. They can be in pure form or inside a
25 | list (at any depth).}
26 |
27 | \item{.group_vars}{Character vector of names of grouping variables.}
28 |
29 | \item{.group_sep}{String to be used as separator when uniting grouping
30 | levels for \code{var} column in \link[=ruler-report]{exposure report}.}
31 | }
32 | \value{
33 | \code{data_packs()} returns a list of what should be \link[=data-pack]{data rule packs}, \code{group_packs()} - \link[=group-pack]{group rule packs},
34 | \code{col_packs()} - \link[=column-pack]{column rule packs}, \code{row_packs()} - \link[=row-pack]{row rule packs}, \code{cell_packs()} - \link[=cell-pack]{cell rule packs}.
35 | }
36 | \description{
37 | Functions for creating different kinds of rule packs. \strong{Rule} is a function
38 | which converts data unit of interest (data, group, column, row, cell) to
39 | logical value indicating whether this object satisfies certain condition.
40 | \strong{Rule pack} is a function which combines several rules into one functional
41 | block. It takes a data frame of interest and returns a data frame with
42 | certain structure and column naming scheme. Types of packs differ in
43 | interpretation of their output.
44 | }
45 | \details{
46 | These functions convert \code{...} to list, apply \code{rlang}'s
47 | \link[rlang:flatten]{squash()} and add appropriate classes (\code{group_packs()} also
48 | adds necessary attributes). They are only constructors and do not check the
49 | validity of any given pack. \strong{Note} that it is allowed for elements of
50 | \code{...} to not have names: they will be computed during exposure. However it is
51 | a good idea to manually name packs.
52 | }
53 |
--------------------------------------------------------------------------------
/man/ruler-package.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/ruler-package.R
3 | \docType{package}
4 | \name{ruler-package}
5 | \alias{ruler}
6 | \alias{ruler-package}
7 | \title{ruler: Rule Your Data}
8 | \description{
9 | \code{ruler} offers a set of tools for creating tidy data validation reports using
10 | \href{https://dplyr.tidyverse.org}{dplyr} grammar of data manipulation. It
11 | is designed to be flexible and extendable in terms of creating rules and
12 | using their output.
13 | }
14 | \details{
15 | The common workflow is:
16 | \itemize{
17 | \item Define dplyr-style \link[=rule-packs]{packs} of rules for basic data units (data,
18 | group, column, row, cell) to obey.
19 | \item \link[=expose]{Expose} some data to those rules. The result is the same data with
20 | possibly created \link{exposure} attribute. Exposure contains
21 | information \link[=packs_info]{about applied packs} and \link[=ruler-report]{tidy data validation report}.
22 | \item Use data and exposure to perform some \link[=act_after_exposure]{actions}:
23 | \link[=assert_any_breaker]{assert about rule breakers}, impute data, remove
24 | outliers and so on.
25 | }
26 |
27 | To learn more about \code{ruler} browse vignettes with \code{browseVignettes(package = "ruler")}. The preferred order is:
28 | \enumerate{
29 | \item Design process and exposure format.
30 | \item Rule packs.
31 | \item Validation.
32 | }
33 | }
34 | \seealso{
35 | Useful links:
36 | \itemize{
37 | \item \url{https://echasnovski.github.io/ruler/}
38 | \item \url{https://github.com/echasnovski/ruler}
39 | \item Report bugs at \url{https://github.com/echasnovski/ruler/issues}
40 | }
41 |
42 | }
43 | \author{
44 | \strong{Maintainer}: Evgeni Chasnovski \email{evgeni.chasnovski@gmail.com} (\href{https://orcid.org/0000-0002-1617-4019}{ORCID})
45 |
46 | }
47 |
--------------------------------------------------------------------------------
/man/ruler-report.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/exposure.R
3 | \name{ruler-report}
4 | \alias{ruler-report}
5 | \alias{is_report}
6 | \alias{get_report}
7 | \title{Tidy data validation report}
8 | \usage{
9 | is_report(.x, .skip_class = FALSE)
10 |
11 | get_report(.object)
12 | }
13 | \arguments{
14 | \item{.x}{Object to test.}
15 |
16 | \item{.skip_class}{Whether to skip checking inheritance from \code{ruler_report}.}
17 |
18 | \item{.object}{Object to get \code{report} value from \code{exposure} attribute.}
19 | }
20 | \value{
21 | \code{get_report()} returns \code{report} element of \code{.object} if it is an
22 | exposure, and of its 'exposure' attribute otherwise.
23 | }
24 | \description{
25 | A tibble representing the data validation result of certain data units in
26 | tidy way:
27 | \itemize{
28 | \item \strong{pack} \verb{<chr>} : Name of rule pack from column 'name' of corresponding
29 | \link{packs_info} object.
30 | \item \strong{rule} \verb{<chr>} : Name of the rule defined in rule pack.
31 | \item \strong{var} \verb{<chr>} : Name of the variable whose validation result is reported.
32 | Value '.all' is reserved and interpreted as 'all columns as a whole'.
33 | \strong{Note} that \code{var} doesn't always represent the actual column in data frame
34 | (see \link[=group-pack]{group packs}).
35 | \item \strong{id} \verb{<int>} : Index of the row in tested data frame whose validation
36 | result is reported. Value 0 is reserved and interpreted as 'all rows as a
37 | whole'.
38 | \item \strong{value} \verb{<lgl>} : Whether the described data unit obeys the rule.
39 | }
40 | }
41 | \details{
42 | There are four basic combinations of \code{var} and \code{id} values which
43 | define five basic data units:
44 | \itemize{
45 | \item \code{var == '.all'} and \code{id == 0}: Data as a whole.
46 | \item \code{var != '.all'} and \code{id == 0}: Group (\code{var} shouldn't be an actual column
47 | name) or column (\code{var} should be an actual column name) as a whole.
48 | \item \code{var == '.all'} and \code{id != 0}: Row as a whole.
49 | \item \code{var != '.all'} and \code{id != 0}: Described cell.
50 | }
51 | }
52 | \examples{
53 | my_row_packs <- row_packs(
54 | row_mean_props = . \%>\% dplyr::transmute(row_mean = rowMeans(.)) \%>\%
55 | dplyr::transmute(
56 | row_mean_low = row_mean > 20,
57 | row_mean_high = row_mean < 60
58 | ),
59 | row_outlier = . \%>\% dplyr::transmute(row_sum = rowSums(.)) \%>\%
60 | dplyr::transmute(
61 | not_row_outlier = abs(row_sum - mean(row_sum)) / sd(row_sum) < 1.5
62 | )
63 | )
64 | my_data_packs <- data_packs(
65 | data_dims = . \%>\% dplyr::summarise(
66 | nrow = nrow(.) == 32,
67 | ncol = ncol(.) == 5
68 | )
69 | )
70 |
71 | mtcars_exposed <- mtcars \%>\%
72 | expose(my_data_packs, .remove_obeyers = FALSE) \%>\%
73 | expose(my_row_packs)
74 |
75 | mtcars_exposed \%>\% get_report()
76 |
77 | mtcars_exposed \%>\%
78 | get_report() \%>\%
79 | is_report()
80 | }
81 |
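The four `var`/`id` combinations listed in the details can be turned into a small classifier. The sketch below is illustrative only: `describe_unit()` is a hypothetical helper, not a function exported by ruler, and it assumes that a non-`'.all'` `var` with `id == 0` denotes a column exactly when it matches a column name of the validated data.

```r
# Hypothetical helper (not part of ruler): name the data unit described by a
# report row, given the column names of the validated data frame.
describe_unit <- function(var, id, data_cols) {
  if (var == ".all" && id == 0) {
    "data"
  } else if (var != ".all" && id == 0) {
    # Groups and columns share this combination; they differ only in whether
    # `var` is an actual column name.
    if (var %in% data_cols) "column" else "group"
  } else if (var == ".all") {
    "row"
  } else {
    "cell"
  }
}

cols <- colnames(mtcars)
describe_unit(".all", 0L, cols)  # data as a whole
describe_unit("0.1", 0L, cols)   # group (not a column name)
describe_unit("cyl", 0L, cols)   # column
describe_unit(".all", 15L, cols) # row
describe_unit("mpg", 18L, cols)  # cell
```

This also shows why four combinations yield five data units: the second combination covers both groups and columns.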
--------------------------------------------------------------------------------
/man/rules.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/rules.R
3 | \name{rules}
4 | \alias{rules}
5 | \title{Create a list of rules}
6 | \usage{
7 | rules(..., .prefix = "._.")
8 | }
9 | \arguments{
10 | \item{...}{Bare expression(s) with \code{.} as input.}
11 |
12 | \item{.prefix}{Prefix to be added to function names.}
13 | }
14 | \description{
15 | \code{rules()} is a function designed to create input for \code{.funs} argument of
16 | scoped \code{dplyr} "mutating" verbs (such as
17 | \link[dplyr:summarise_all]{summarise_all()} and
18 | \link[dplyr:mutate_all]{transmute_all()}). It converts bare expressions
19 | with \code{.} as input into formulas and repairs names of the output.
20 | }
21 | \details{
22 | \code{rules()} repairs names by the following algorithm:
23 | \itemize{
24 | \item Absent names are replaced with the 'rule__\\{ind\\}' where \\{ind\\} is the
25 | index of function position in the \code{...} .
26 | \item \code{.prefix} is added at the beginning of all names. The default is \code{._.} . It
27 | is picked for its symbolism (it is the Morse code of letter 'R') and rare
28 | occurrence in names. In those rare cases it can be manually changed but
29 | this will not be tracked further. \strong{Note} that it is a good idea for
30 | \code{.prefix} to be \link[=make.names]{syntactic}, as \code{dplyr} will force tibble
31 | names to be syntactic. To check if string is "good", use it as input to
32 | \code{make.names()}: if output equals that string then it is a "good" choice.
33 | }
34 | }
35 | \examples{
36 | # `rules()` accepts bare expression calls with `.` as input, which is not
37 | # possible with advised `list()` approach of `dplyr`
38 | dplyr::summarise_all(mtcars[, 1:2], rules(sd, "sd", sd(.), ~ sd(.)))
39 |
40 | dplyr::summarise_all(mtcars[, 1:2], rules(sd, .prefix = "a_a_"))
41 |
42 | # Use `...` in `summarise_all()` to supply extra arguments
43 | dplyr::summarise_all(data.frame(x = c(1:2, NA)), rules(sd), na.rm = TRUE)
44 | }
45 |
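The name repair algorithm described in the details can be sketched in base R. This is an illustration under stated assumptions, not the code `rules()` actually runs: `repair_rule_names()` and `is_syntactic()` are hypothetical helpers, and absent names are assumed to be represented as empty strings or `NA`.

```r
# Hypothetical sketch of the described name repair (not ruler's own code).
repair_rule_names <- function(rule_names, .prefix = "._.") {
  # Absent names become 'rule__{ind}', where {ind} is the position in `...`
  absent <- is.na(rule_names) | rule_names == ""
  rule_names[absent] <- paste0("rule__", which(absent))
  # The prefix is added at the beginning of all names
  paste0(.prefix, rule_names)
}

repaired <- repair_rule_names(c("", "tot_sum", ""))
repaired

# A prefix is a "good" choice if `make.names()` leaves it unchanged
is_syntactic <- function(x) identical(make.names(x), x)
is_syntactic("._.")
is_syntactic("_._")
```

The second check fails because a syntactic name cannot start with an underscore, which is the situation the note about `.prefix` warns against.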
--------------------------------------------------------------------------------
/man/single_exposure.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/exposure.R
3 | \name{single_exposure}
4 | \alias{single_exposure}
5 | \title{Single exposure}
6 | \description{
7 | An S3 class \code{single_exposure} to represent exposure of data to \strong{one} rule
8 | pack. It is a list of the following structure: \code{pack_info} - single
9 | \link{pack_info} object; \code{report} - \link[=ruler-report]{tidy data validation report}
10 | without column \code{pack}.
11 | }
12 | \details{
13 | Single exposure is implemented in order to encapsulate preliminary
14 | exposure data from single rule pack. It is needed to impute possibly missing
15 | pack names during \link[=expose]{exposure}. That is why \code{single_exposure} doesn't
16 | contain pack name in any form.
17 | }
18 | \keyword{internal}
19 |
--------------------------------------------------------------------------------
/man/spread_groups.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/spread-groups.R
3 | \name{spread_groups}
4 | \alias{spread_groups}
5 | \title{Spread grouping columns}
6 | \usage{
7 | spread_groups(.tbl, ..., .group_sep = ".", .col_sep = "._.")
8 | }
9 | \arguments{
10 | \item{.tbl}{Data frame with result of grouped summary.}
11 |
12 | \item{...}{A selection of grouping columns (as in \code{\link[tidyr:unite]{tidyr::unite()}}).}
13 |
14 | \item{.group_sep}{A string to be used as separator of grouping levels.}
15 |
16 | \item{.col_sep}{A string to be used as separator in column pack.}
17 | }
18 | \value{
19 | A data frame in \link[=column-pack]{column pack} format.
20 | }
21 | \description{
22 | Function that is used during interpretation of \link[=group-pack]{group pack}
23 | output. It converts grouped \link[dplyr:summarise]{summary} into \link[=column-pack]{column pack} format.
24 | }
25 | \details{
26 | Multiple grouping variables are converted to one with
27 | \code{\link[tidyr:unite]{tidyr::unite()}} and separator \code{.group_sep}. New values are then treated as
28 | variable names which should be validated and which represent the group data
29 | as a whole.
30 | }
31 | \examples{
32 | mtcars_grouped_summary <- mtcars \%>\%
33 | dplyr::group_by(vs, am) \%>\%
34 | dplyr::summarise(n_low = dplyr::n() > 6, n_high = dplyr::n() < 10)
35 |
36 | spread_groups(mtcars_grouped_summary, vs, am)
37 |
38 | spread_groups(mtcars_grouped_summary, vs, am, .group_sep = "__")
39 |
40 | spread_groups(mtcars_grouped_summary, vs, am, .col_sep = "__")
41 | }
42 |
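The uniting step described in the details can be imitated in base R. This is a rough sketch that uses `paste()` as a stand-in for `tidyr::unite()`; `unite_groups()` is a hypothetical helper, and the sketch covers only how grouping levels are joined, not the full conversion to column pack format.

```r
# Toy grouped summary with two grouping columns
grouped_summary <- data.frame(
  vs = c(0, 0, 1, 1),
  am = c(0, 1, 0, 1),
  n_low = c(TRUE, FALSE, FALSE, FALSE)
)

# Hypothetical base R stand-in for uniting grouping columns:
# join the levels of `group_cols` row by row with `.group_sep`.
unite_groups <- function(.tbl, group_cols, .group_sep = ".") {
  do.call(paste, c(unname(as.list(.tbl[group_cols])), sep = .group_sep))
}

unite_groups(grouped_summary, c("vs", "am"))
unite_groups(grouped_summary, c("vs", "am"), .group_sep = "__")
```

The united values are what later appear as variable names standing for each group as a whole.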
--------------------------------------------------------------------------------
/pkgdown/extra.css:
--------------------------------------------------------------------------------
1 | .navbar-default {
2 | background-color: #008080;
3 | border-color: #008080;
4 | }
5 |
6 | #toc {
7 | font-size: 150%;
8 | }
9 |
10 | #toc .nav a {
11 | font-size: 100%;
12 | }
13 |
14 | pre {
15 | background-color: #ffffff;
16 | border-color: #000000;
17 | border-width: 1px;
18 | overflow-x: auto;
19 | }
20 |
21 | pre code {
22 | overflow-wrap: normal;
23 | white-space: pre;
24 | }
25 |
26 | /* Idea style */
27 | .hljs {
28 | display: block;
29 | overflow-x: auto;
30 | padding: 0.5em;
31 | color: #000;
32 | background: #fff;
33 | }
34 |
35 | .hljs-subst,
36 | .hljs-title {
37 | font-weight: normal;
38 | color: #000;
39 | }
40 |
41 | .hljs-comment,
42 | .hljs-quote {
43 | color: #808080;
44 | font-style: italic;
45 | }
46 |
47 | .hljs-meta {
48 | color: #808000;
49 | }
50 |
51 | .hljs-tag {
52 | background: #efefef;
53 | }
54 |
55 | .hljs-section,
56 | .hljs-name,
57 | .hljs-literal,
58 | .hljs-keyword,
59 | .hljs-selector-tag,
60 | .hljs-type,
61 | .hljs-selector-id,
62 | .hljs-selector-class {
63 | font-weight: bold;
64 | color: #000080;
65 | }
66 |
67 | .hljs-attribute,
68 | .hljs-number,
69 | .hljs-regexp,
70 | .hljs-link {
71 | font-weight: bold;
72 | color: #0000ff;
73 | }
74 |
75 | .hljs-number,
76 | .hljs-regexp,
77 | .hljs-link {
78 | font-weight: normal;
79 | }
80 |
81 | .hljs-string {
82 | color: #008000;
83 | font-weight: bold;
84 | }
85 |
86 | .hljs-symbol,
87 | .hljs-bullet,
88 | .hljs-formula {
89 | color: #000;
90 | background: #d0eded;
91 | font-style: italic;
92 | }
93 |
94 | .hljs-doctag {
95 | text-decoration: underline;
96 | }
97 |
98 | .hljs-variable,
99 | .hljs-template-variable {
100 | color: #660e7a;
101 | }
102 |
103 | .hljs-addition {
104 | background: #baeeba;
105 | }
106 |
107 | .hljs-deletion {
108 | background: #ffc8bd;
109 | }
110 |
111 | .hljs-emphasis {
112 | font-style: italic;
113 | }
114 |
115 | .hljs-strong {
116 | font-weight: bold;
117 | }
118 |
119 | /* Custom highlighting */
120 |
121 | .hljs-tag, .hljs-formula, .hljs-addition, .hljs-deletion {
122 | background: #ffffff;
123 | }
124 |
125 | /* Strings */
126 | .hljs-string {
127 | color: #008000;
128 | font-weight: bold;
129 | }
130 |
131 | /* Comments */
132 | .hljs-comment {
133 | color: #404080;
134 | font-style: normal;
135 | }
136 |
137 | .hljs-fun-param {
138 | color: #ff4000;
139 | }
140 |
141 | .hljs-pipe, .hljs-assign {
142 | font-weight: bold;
143 | }
144 |
--------------------------------------------------------------------------------
/ruler.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 |
3 | RestoreWorkspace: No
4 | SaveWorkspace: No
5 | AlwaysSaveHistory: Default
6 |
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 |
15 | AutoAppendNewline: Yes
16 | StripTrailingWhitespace: Yes
17 |
18 | BuildType: Package
19 | PackageUseDevtools: Yes
20 | PackageInstallArgs: --no-multiarch --with-keep.source
21 | PackageRoxygenize: rd,collate,namespace
22 |
--------------------------------------------------------------------------------
/tests/testthat.R:
--------------------------------------------------------------------------------
1 | library(testthat)
2 | library(dplyr)
3 | library(rlang)
4 | library(ruler)
5 |
6 | test_check("ruler")
7 |
--------------------------------------------------------------------------------
/tests/testthat/helper-expose-data.R:
--------------------------------------------------------------------------------
1 | # Results of some packs ---------------------------------------------------
2 | input_data_pack_out <- tibble::tibble("rule__1" = TRUE, "nrow" = FALSE)
3 | input_group_pack_out <- tibble::tibble(
4 | "vs" = c(0, 0, 1, 1), "am" = c(0, 1, 0, 1),
5 | "n_low" = c(TRUE, FALSE, FALSE, FALSE),
6 | "n_high" = c(TRUE, TRUE, TRUE, TRUE)
7 | )
8 | input_col_pack_out <- tibble::tibble(
9 | "vs_._.rule__1" = TRUE, "am_._.rule__1" = FALSE,
10 | "cyl_._.not_outlier" = TRUE, "vs_._.not_outlier" = TRUE
11 | )
12 | input_row_pack_out <- tibble::tibble(
13 | "row_rule__1" = rep(TRUE, 2),
14 | "._.rule__2" = c(TRUE, FALSE)
15 | ) %>% keyholder::assign_keys(tibble::tibble(.id = c(1, 3)))
16 | input_cell_pack_out <- tibble::tibble(
17 | "vs_._.rule__1" = rep(TRUE, 2), "am_._.rule__1" = rep(FALSE, 2),
18 | "cyl_._.not_outlier" = c(TRUE, FALSE), "vs_._.not_outlier" = c(TRUE, FALSE)
19 | ) %>% keyholder::assign_keys(tibble::tibble(.id = c(1, 4)))
20 |
21 |
22 | # Exposure data -----------------------------------------------------------
23 | input_packs <- list(
24 | data = data_packs(
25 | . %>% dplyr::summarise(
26 | nrow_low = nrow(.) > 10, nrow_high = nrow(.) < 20,
27 | ncol_low = ncol(.) > 5, ncol_high = ncol(.) < 10
28 | )
29 | )[[1]],
30 | group = group_packs(
31 | . %>% dplyr::group_by(vs, am) %>%
32 | dplyr::summarise(n_low = dplyr::n() > 10, n_high = dplyr::n() < 15) %>%
33 | dplyr::ungroup(),
34 | .group_vars = c("vs", "am"), .group_sep = "."
35 | )[[1]],
36 | col = col_packs(
37 | . %>% dplyr::summarise_if(
38 | rlang::is_integerish,
39 | rules(tot_sum = sum(.) > 100)
40 | )
41 | )[[1]],
42 | row = row_packs(
43 | . %>% dplyr::transmute(row_sum = rowSums(.)) %>%
44 | dplyr::transmute(
45 | outlier_sum = abs(row_sum - mean(row_sum)) / sd(row_sum) < 1
46 | ) %>%
47 | dplyr::slice(15:1)
48 | )[[1]],
49 | cell = cell_packs(
50 | . %>% dplyr::transmute_if(
51 | Negate(rlang::is_integerish),
52 | rules(abs(. - mean(.)) / sd(.) < 2)
53 | )
54 | )[[1]],
55 | col_other = col_packs(
56 | . %>% dplyr::summarise_if(
57 | rlang::is_integerish,
58 | rules(
59 | tot_sum = sum(.) > 100,
60 | .prefix = "_._"
61 | )
62 | )
63 | )[[1]],
64 | cell_other = cell_packs(
65 | . %>% dplyr::transmute_if(
66 | Negate(rlang::is_integerish),
67 | rules(abs(. - mean(.)) / sd(.) < 2,
68 | .prefix = "_._"
69 | )
70 | )
71 | )[[1]]
72 | )
73 | input_remove_obeyers <- c(
74 | data = TRUE, group = FALSE, col = FALSE,
75 | row = TRUE, cell = TRUE
76 | )
77 | input_reports <- list(
78 | data = tibble::tibble(
79 | rule = c("nrow_high", "ncol_high"),
80 | var = rep(".all", 2),
81 | id = rep(0L, 2),
82 | value = rep(FALSE, 2)
83 | ),
84 | group = tibble::tibble(
85 | rule = rep(c("n_low", "n_high"), each = 4),
86 | var = rep(c("0.0", "0.1", "1.0", "1.1"), times = 2),
87 | id = rep(0L, 8),
88 | value = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
89 | ),
90 | col = tibble::tibble(
91 | rule = rep("tot_sum", 6),
92 | var = c("cyl", "hp", "vs", "am", "gear", "carb"),
93 | id = rep(0L, 6),
94 | value = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
95 | ),
96 | row = tibble::tibble(
97 | rule = rep("outlier_sum", 2),
98 | var = rep(".all", 2),
99 | id = c(15L, 7L),
100 | value = rep(FALSE, 2)
101 | ),
102 | cell = tibble::tibble(
103 | rule = rep("rule__1", 7),
104 | var = c("mpg", "mpg", "drat", "wt", "wt", "wt", "qsec"),
105 | id = c(18L, 20L, 19L, 15L, 16L, 17L, 9L),
106 | value = rep(FALSE, 7)
107 | )
108 | )
109 |
110 | # Construction of exposure data
111 | add_pack_name_to_single_report <- function(.report, .pack_name) {
112 | res <- .report
113 | res[["pack"]] <- rep(.pack_name, nrow(.report))
114 |
115 | res[, c("pack", colnames(.report))] %>% add_class("ruler_report")
116 | }
117 |
118 | single_exposure_inds <- c(
119 | "data", "cell", "col", "col", "data", "row", "data",
120 | "group"
121 | )
122 | exposure_names <- c(
123 | "data_dims", "cell_not_outlier", "col_proper_sums",
124 | "new_col_proper_sums", "new_data_pack", "row_not_outlier",
125 | "another_data_pack", "first_group_pack"
126 | )
127 |
128 | input_single_exposures <- mapply(
129 | new_single_exposure,
130 | # `unname()` is needed to ensure that input vectors have no names. Otherwise
131 | # there can be issues with `dplyr::bind_rows()` (powered by
132 | # `vctrs::vec_rbind()`) removing those names but 'tibble'>=3.0.0 keeping them.
133 | unname(input_packs[single_exposure_inds]),
134 | unname(input_remove_obeyers[single_exposure_inds]),
135 | unname(input_reports[single_exposure_inds]),
136 | SIMPLIFY = FALSE
137 | ) %>%
138 | setNames(exposure_names)
139 |
140 | input_exposures <- mapply(
141 | new_exposure,
142 | mapply(
143 | new_packs_info,
144 | exposure_names,
145 | # `unname()` is needed to ensure that input vectors have no names
146 | lapply(unname(input_packs[single_exposure_inds]), list),
147 | unname(input_remove_obeyers[single_exposure_inds]),
148 | SIMPLIFY = FALSE
149 | ),
150 | mapply(
151 | add_pack_name_to_single_report,
152 | # `unname()` is needed to ensure that input vectors have no names
153 | unname(input_reports[single_exposure_inds]),
154 | exposure_names,
155 | SIMPLIFY = FALSE
156 | ),
157 | SIMPLIFY = FALSE
158 | ) %>%
159 | setNames(exposure_names)
160 |
161 | exposure_ref_inds <- c("col", "col", "cell", "data", "data", "row", "group")
162 | exposure_ref_pack_names <- c(
163 | "col_pack_n1", "col_pack_n2", "cell_pack_n1",
164 | "data_pack_n1", "data_pack_n2", "row_pack_n1",
165 | "group_pack_n1"
166 | )
167 | input_exposure_ref <- new_exposure(
168 | new_packs_info(
169 | exposure_ref_pack_names,
170 | # `unname()` is needed to ensure that input vectors have no names
171 | unname(input_packs[exposure_ref_inds]),
172 | unname(input_remove_obeyers[exposure_ref_inds])
173 | ),
174 | mapply(
175 | add_pack_name_to_single_report,
176 | # `unname()` is needed to ensure that input vectors have no names
177 | unname(input_reports[exposure_ref_inds]),
178 | exposure_ref_pack_names,
179 | SIMPLIFY = FALSE
180 | ) %>%
181 | dplyr::bind_rows() %>%
182 | as_report(.validate = FALSE)
183 | )
184 |
--------------------------------------------------------------------------------
/tests/testthat/test-actions.R:
--------------------------------------------------------------------------------
1 | context("actions")
2 |
3 |
4 | # Helper functions --------------------------------------------------------
5 | # Taken from https://github.com/harrelfe/Hmisc/blob/master/R/regexpEscape.s
6 | escape_regex <- function(string) {
7 | gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", string)
8 | }
9 |
10 |
11 | # Input data --------------------------------------------------------------
12 | mtcars_exposed <- mtcars %>% set_exposure(input_exposure_ref)
13 | rule_breakers <- input_exposure_ref %>%
14 | get_report() %>%
15 | filter(!(value %in% TRUE))
16 |
17 | trigger_nrow_30 <- function(.tbl) {
18 | nrow(get_report(.tbl)) > 40 # Threshold is 40, not 30: the reference report has 33 rows, so this stays FALSE
19 | }
20 | trigger_nrow_10 <- function(.tbl) {
21 | nrow(get_report(.tbl)) > 10
22 | }
23 | actor_print <- function(.tbl) {
24 | print(get_exposure(.tbl))
25 |
26 | .tbl
27 | }
28 |
29 | assert_text <- "assert_any_breaker: Some breakers found in exposure."
30 |
31 | exposure_no_breakers <- input_exposure_ref
32 | exposure_no_breakers$packs_info <- exposure_no_breakers$packs_info %>%
33 | slice(1) %>%
34 | as_packs_info()
35 | exposure_no_breakers$report <- exposure_no_breakers$report %>%
36 | slice(c(1, 2, 5)) %>%
37 | as_report()
38 |
39 | mtcars_exposed_no_breakers <- set_exposure(mtcars, exposure_no_breakers)
40 |
41 |
42 | # Custom expectations -----------------------------------------------------
43 | expect_asserts <- function(input, type, silent = FALSE, result = input,
44 | output_name = "Breakers report\n",
45 | output_report,
46 | warnings = character(0),
47 | messages = character(0),
48 | ...) {
49 | assert_evaluation <- evaluate_promise(
50 | assert_any_breaker(input, type, silent, ...)
51 | )
52 |
53 | expect_identical(assert_evaluation$result, result)
54 | expect_match(assert_evaluation$output, output_name)
55 | expect_match(assert_evaluation$output, output_report)
56 | expect_identical(assert_evaluation$warnings, warnings)
57 | expect_identical(assert_evaluation$messages, messages)
58 | }
59 |
60 |
61 | # act_after_exposure ------------------------------------------------------
62 | test_that("act_after_exposure works", {
63 | expect_error(
64 | act_after_exposure(mtcars, trigger_nrow_30, actor_print),
65 | "act_after_exposure:.*not.*have"
66 | )
67 |
68 | input_bad <- mtcars
69 | attr(input_bad, "exposure") <- "a"
70 |
71 | expect_error(
72 | act_after_exposure(input_bad, trigger_nrow_30, actor_print),
73 | "act_after_exposure:.*not.*proper.*exposure"
74 | )
75 |
76 | expect_silent(
77 | output_1 <- act_after_exposure(
78 | mtcars_exposed, trigger_nrow_30,
79 | actor_print
80 | )
81 | )
82 | expect_identical(output_1, mtcars_exposed)
83 |
84 | output_ref <- capture_output(print(input_exposure_ref))
85 |
86 | expect_output(
87 | output_2 <- act_after_exposure(
88 | mtcars_exposed, trigger_nrow_10,
89 | actor_print
90 | ),
91 | output_ref,
92 | fixed = TRUE
93 | )
94 | expect_identical(output_2, mtcars_exposed)
95 | })
96 |
97 |
98 | # assert_any_breaker ------------------------------------------------------
99 | test_that("assert_any_breaker works", {
100 | output_ref <- escape_regex(capture_output(print(rule_breakers)))
101 |
102 | # Error assertions
103 | expect_error(
104 | expect_output(assert_any_breaker(mtcars_exposed), output_ref),
105 | assert_text
106 | )
107 | expect_error(
108 | expect_output(assert_any_breaker(mtcars_exposed, "error"), output_ref),
109 | assert_text
110 | )
111 | expect_error(
112 | expect_output(assert_any_breaker(mtcars_exposed, "error", TRUE), ""),
113 | assert_text
114 | )
115 |
116 | # Warning and message assertions
117 | expect_asserts(
118 | mtcars_exposed,
119 | "warning",
120 | output_report = output_ref,
121 | warnings = assert_text
122 | )
123 | expect_asserts(
124 | mtcars_exposed,
125 | "message",
126 | output_report = output_ref,
127 | messages = paste0(assert_text, "\n")
128 | )
129 |
130 | # Absence of printing
131 | expect_asserts(
132 | mtcars_exposed,
133 | "warning",
134 | silent = TRUE,
135 | output_name = "",
136 | output_report = "",
137 | warnings = assert_text
138 | )
139 | expect_asserts(
140 | mtcars_exposed,
141 | "message",
142 | silent = TRUE,
143 | output_name = "",
144 | output_report = "",
145 | messages = paste0(assert_text, "\n")
146 | )
147 |
148 | # Absence of assertions
149 | expect_asserts(
150 | mtcars_exposed_no_breakers,
151 | "error",
152 | output_name = "",
153 | output_report = ""
154 | )
155 | expect_asserts(
156 | mtcars_exposed_no_breakers,
157 | "warning",
158 | output_name = "",
159 | output_report = ""
160 | )
161 | expect_asserts(
162 | mtcars_exposed_no_breakers,
163 | "message",
164 | output_name = "",
165 | output_report = ""
166 | )
167 | })
168 |
169 | test_that("assert_any_breaker accounts for printing options", {
170 | output_ref <- escape_regex(capture_output(print(rule_breakers, n = 3)))
171 |
172 | expect_error(
173 | expect_output(
174 | assert_any_breaker(mtcars_exposed, "error", n = 3),
175 | output_ref
176 | ),
177 | assert_text
178 | )
179 | expect_asserts(
180 | mtcars_exposed,
181 | "warning",
182 | output_report = output_ref,
183 | warnings = assert_text,
184 | n = 3
185 | )
186 | expect_asserts(
187 | mtcars_exposed,
188 | "message",
189 | output_report = output_ref,
190 | messages = paste0(assert_text, "\n"),
191 | n = 3
192 | )
193 | })
194 |
195 |
196 | # any_breaker -------------------------------------------------------------
197 | test_that("any_breaker works", {
198 | expect_error(any_breaker("a"), "any_breaker:.*not.*proper.*exposure")
199 | expect_true(any_breaker(input_exposure_ref))
200 | expect_false(any_breaker(exposure_no_breakers))
201 | })
202 |
203 |
204 | # generate_breakers_informer ----------------------------------------------
205 | test_that("generate_breakers_informer works", {
206 | custom_assert_text <- "Custom"
207 | informer <- generate_breakers_informer(
208 | .fun = warning,
209 | .message = custom_assert_text,
210 | .silent = FALSE
211 | )
212 |
213 | expect_is(informer, "function")
214 |
215 | output <- evaluate_promise(informer(.tbl = mtcars_exposed))
216 |
217 | expect_identical(output$result, mtcars_exposed)
218 | expect_match(
219 | output$output,
220 | escape_regex(capture_output(print(rule_breakers)))
221 | )
222 | expect_identical(output$warnings, custom_assert_text)
223 | expect_identical(output$messages, character(0))
224 | })
225 |
--------------------------------------------------------------------------------
/tests/testthat/test-expose-helpers.R:
--------------------------------------------------------------------------------
1 | context("expose-helpers")
2 |
3 |
4 | # guess_pack_type ---------------------------------------------------------
5 | test_that("guess_pack_type works", {
6 | expect_identical(guess_pack_type(input_data_pack_out), "data_pack")
7 | expect_identical(guess_pack_type(input_group_pack_out), "group_pack")
8 | expect_identical(guess_pack_type(input_col_pack_out), "col_pack")
9 | expect_identical(guess_pack_type(input_row_pack_out), "row_pack")
10 | expect_identical(guess_pack_type(input_cell_pack_out), "cell_pack")
11 |
12 | input_col_pack_out_1 <- input_col_pack_out
13 | names(input_col_pack_out_1) <-
14 | gsub("\\._\\.", "\\.___\\.", names(input_col_pack_out_1))
15 |
16 | expect_identical(
17 | guess_pack_type(
18 | input_col_pack_out_1,
19 | inside_punct("\\.___\\.")
20 | ),
21 | "col_pack"
22 | )
23 | })
24 |
25 |
26 | # remove_obeyers ----------------------------------------------------------
27 | test_that("remove_obeyers works", {
28 | input_report <- tibble::tibble(
29 | pack = rep("data_pack", 4), rule = paste0("rule__", 1:4),
30 | var = rep(".all", 4), id = rep(0L, 4),
31 | value = c(TRUE, FALSE, TRUE, NA)
32 | )
33 |
34 | expect_identical(remove_obeyers(input_report, FALSE), input_report)
35 | expect_identical(remove_obeyers(input_report, TRUE), input_report[c(2, 4), ])
36 | })
37 |
38 |
39 | # impute_exposure_pack_names ----------------------------------------------
40 | test_that("impute_exposure_pack_names works with NULL reference exposure", {
41 | expect_identical(
42 | impute_exposure_pack_names(input_single_exposures, input_exposure_ref),
43 | input_single_exposures
44 | )
45 |
46 | cur_input_single_exposures <- input_single_exposures
47 | names_remove_inds <- c(1, 2, 3, 5, 6, 8)
48 | names(cur_input_single_exposures)[names_remove_inds] <-
49 | rep("", length(names_remove_inds))
50 |
51 | expect_identical(
52 | names(impute_exposure_pack_names(cur_input_single_exposures, NULL)),
53 | c(
54 | "data_pack__1", "cell_pack__1", "col_pack__1", "new_col_proper_sums",
55 | "data_pack__2", "row_pack__1", "another_data_pack", "group_pack__1"
56 | )
57 | )
58 | })
59 |
60 | test_that("impute_exposure_pack_names works with not NULL reference exposure", {
61 | cur_input_single_exposures <- input_single_exposures
62 | names_remove_inds <- c(1, 2, 3, 5, 6, 8)
63 | names(cur_input_single_exposures)[names_remove_inds] <-
64 | rep("", length(names_remove_inds))
65 |
66 | expect_identical(
67 | names(impute_exposure_pack_names(
68 | cur_input_single_exposures,
69 | input_exposure_ref
70 | )),
71 | c(
72 | "data_pack__3", "cell_pack__2", "col_pack__3", "new_col_proper_sums",
73 | "data_pack__4", "row_pack__2", "another_data_pack", "group_pack__2"
74 | )
75 | )
76 | })
77 |
78 |
79 | # add_pack_names ----------------------------------------------------------
80 | test_that("add_pack_names works", {
81 | expect_identical(
82 | add_pack_names(input_single_exposures),
83 | input_exposures
84 | )
85 | })
86 |
87 |
88 | # bind_exposures ----------------------------------------------------------
89 | test_that("bind_exposures works", {
90 | expect_identical(
91 | bind_exposures(list(input_exposure_ref, NULL)),
92 | input_exposure_ref
93 | )
94 | expect_identical(
95 | bind_exposures(list(NULL, NULL)),
96 | NULL
97 | )
98 |
99 | output_ref <- new_exposure(
100 | .packs_info = new_packs_info(
101 | rep(input_exposure_ref$packs_info$name, 2),
102 | c(input_exposure_ref$packs_info$fun, input_exposure_ref$packs_info$fun),
103 | rep(input_exposure_ref$packs_info$remove_obeyers, 2)
104 | ),
105 | .report = bind_rows(
106 | input_exposure_ref$report,
107 | input_exposure_ref$report
108 | ) %>%
109 | add_class_cond("ruler_report")
110 | )
111 |
112 | expect_identical(
113 | bind_exposures(list(input_exposure_ref, input_exposure_ref)),
114 | output_ref
115 | )
116 | expect_identical(
117 | bind_exposures(input_exposure_ref, input_exposure_ref),
118 | output_ref
119 | )
120 | })
121 |
122 |
123 | # filter_not_null ---------------------------------------------------------
124 | test_that("filter_not_null works", {
125 | input <- list(NULL, 1, list(2), NULL, "a", "b", list(NULL))
126 | output_ref <- input[-c(1, 4)]
127 |
128 | expect_identical(filter_not_null(input), output_ref)
129 | })
130 |
131 |
132 | # assert_pack_out_one_row -------------------------------------------------
133 | test_that("assert_pack_out_one_row works", {
134 | expect_silent(assert_pack_out_one_row(input_data_pack_out, "data_pack"))
135 | expect_error(
136 | assert_pack_out_one_row(input_row_pack_out, "row_pack"),
137 | "row_pack.*not.*row"
138 | )
139 | })
140 |
141 |
142 | # assert_pack_out_all_logical ---------------------------------------------
143 | test_that("assert_pack_out_all_logical works", {
144 | expect_silent(assert_pack_out_all_logical(input_data_pack_out, "data_pack"))
145 |
146 | input_bad <- tibble::tibble(good = c(TRUE, FALSE), bad = 1:2)
147 |
148 | expect_error(
149 | assert_pack_out_all_logical(input_bad, "cell_pack"),
150 | "cell_pack.*not.*logical"
151 | )
152 | })
153 |
154 |
155 | # assert_pack_out_all_have_separator --------------------------------------
156 | test_that("assert_pack_out_all_have_separator works", {
157 | expect_silent(
158 | assert_pack_out_all_have_separator(
159 | input_col_pack_out, "col_pack", inside_punct("\\._\\.")
160 | )
161 | )
162 | expect_error(
163 | assert_pack_out_all_have_separator(
164 | input_data_pack_out, "data_pack", inside_punct("\\._\\.")
165 | ),
166 | "data_pack.*not.*separator"
167 | )
168 | expect_error(
169 | assert_pack_out_all_have_separator(
170 | input_col_pack_out, "col_pack", inside_punct("\\.___\\.")
171 | ),
172 | "col_pack.*not.*separator"
173 | )
174 | })
175 |
--------------------------------------------------------------------------------
/tests/testthat/test-packs.R:
--------------------------------------------------------------------------------
1 | context("packs")
2 |
3 |
4 | # Input data --------------------------------------------------------------
5 | input <- list(1, dot2 = "a", mean, list(new = 2, 3))
6 |
7 | compute_output_ref <- function(.extra_class) {
8 | list(
9 | structure(1, class = c(.extra_class, "rule_pack", "numeric")),
10 | dot2 = structure("a", class = c(.extra_class, "rule_pack", "character")),
11 | structure(mean, class = c(.extra_class, "rule_pack", "function")),
12 | new = structure(2, class = c(.extra_class, "rule_pack", "numeric")),
13 | structure(3, class = c(.extra_class, "rule_pack", "numeric"))
14 | )
15 | }
16 |
17 |
18 | # data_packs --------------------------------------------------------------
19 | test_that("data_packs works", {
20 | output <- data_packs(!!!input)
21 | output_ref <- compute_output_ref(.extra_class = "data_pack")
22 |
23 | expect_identical(output, output_ref)
24 | })
25 |
26 |
27 | # group_packs -------------------------------------------------------------
28 | test_that("group_packs works", {
29 | output_1 <- group_packs(!!!input, .group_vars = c("x", "y"))
30 | output_2 <- group_packs(
31 | !!!input,
32 | .group_vars = c("x", "y"),
33 | .group_sep = "+"
34 | )
35 | output_ref <- compute_output_ref(.extra_class = "group_pack") %>%
36 | lapply(`attr<-`, which = "group_vars", value = c("x", "y"))
37 | output_ref_1 <- lapply(output_ref, `attr<-`, which = "group_sep", value = ".")
38 | output_ref_2 <- lapply(output_ref, `attr<-`, which = "group_sep", value = "+")
39 |
40 | expect_identical(output_1, output_ref_1)
41 | expect_identical(output_2, output_ref_2)
42 | })
43 |
44 | test_that("group_packs throws errors", {
45 | expect_error(group_packs(!!!input, .group_vars = character(0)))
46 | expect_error(group_packs(!!!input, .group_vars = 1:2))
47 |
48 | expect_error(group_packs(!!!input, .group_vars = "a", .group_sep = 1))
49 | expect_error(
50 | group_packs(!!!input, .group_vars = "a", .group_sep = c("+", "-"))
51 | )
52 | })
53 |
54 |
55 | # col_packs ---------------------------------------------------------------
56 | test_that("col_packs works", {
57 | output <- col_packs(!!!input)
58 | output_ref <- compute_output_ref(.extra_class = "col_pack")
59 |
60 | expect_identical(output, output_ref)
61 | })
62 |
63 |
64 | # row_packs ---------------------------------------------------------------
65 | test_that("row_packs works", {
66 | output <- row_packs(!!!input)
67 | output_ref <- compute_output_ref(.extra_class = "row_pack")
68 |
69 | expect_identical(output, output_ref)
70 | })
71 |
72 |
73 | # cell_packs --------------------------------------------------------------
74 | test_that("cell_packs works", {
75 | output <- cell_packs(!!!input)
76 | output_ref <- compute_output_ref(.extra_class = "cell_pack")
77 |
78 | expect_identical(output, output_ref)
79 | })
80 |
81 |
82 | # squash_dots_rule_pack ---------------------------------------------------
83 | test_that("squash_dots_rule_pack returns a list", {
84 | output <- squash_dots_rule_pack(1, .extra_class = "extra")
85 | names(output) <- NULL
86 | output_ref <- list(structure(1, class = c("extra", "rule_pack", "numeric")))
87 |
88 | expect_identical(output, output_ref)
89 | })
90 |
91 | test_that("squash_dots_rule_pack returns a named list", {
92 | output <- squash_dots_rule_pack(!!!input[1:3], .extra_class = "extra")
93 | output_ref <- compute_output_ref(.extra_class = "extra")[1:3]
94 |
95 | expect_identical(output, output_ref)
96 | })
97 |
98 | test_that("squash_dots_rule_pack squashes", {
99 | output <- squash_dots_rule_pack(
100 | list(list(1L), list(2L, list(3L))),
101 | list(list(list(4L)), list(5L, list(6L))),
102 | .extra_class = "extra"
103 | )
104 | names(output) <- NULL
105 | output_ref <- lapply(
106 | 1:6,
107 | structure,
108 | class = c("extra", "rule_pack", "integer")
109 | )
110 |
111 | expect_identical(output, output_ref)
112 | })
113 |
114 |
115 | # print.data_pack ---------------------------------------------------------
116 | test_that("print.data_pack works", {
117 | expect_output(print(data_packs(!!!input)[[1]]), "Data.*ule.*ack")
118 | })
119 |
120 |
121 | # print.group_pack --------------------------------------------------------
122 | test_that("print.group_pack works", {
123 | expect_output(
124 | print(group_packs(!!!input, .group_vars = "a")[[1]]),
125 | "Group.*ule.*ack"
126 | )
127 | })
128 |
129 |
130 | # print.col_pack ----------------------------------------------------------
131 | test_that("print.col_pack works", {
132 | expect_output(print(col_packs(!!!input)[[1]]), "Column.*ule.*ack")
133 | })
134 |
135 |
136 | # print.row_pack ----------------------------------------------------------
137 | test_that("print.row_pack works", {
138 | expect_output(print(row_packs(!!!input)[[1]]), "Row.*ule.*ack")
139 | })
140 |
141 |
142 | # print.cell_pack ---------------------------------------------------------
143 | test_that("print.cell_pack works", {
144 | expect_output(print(cell_packs(!!!input)[[1]]), "Cell.*ule.*ack")
145 | })
146 |
--------------------------------------------------------------------------------
/tests/testthat/test-rules.R:
--------------------------------------------------------------------------------
1 | context("rules")
2 |
3 |
4 | # rules -------------------------------------------------------------------
5 | test_that("rules works", {
6 | output_1 <- rules(mean, "mean", mean(.), ~ mean(.))
7 | output_ref_1 <- list(
8 | ._.rule__1 = mean,
9 | ._.rule__2 = "mean",
10 | ._.rule__3 = ~ mean(.),
11 | ._.rule__4 = output_1[[4]]
12 | )
13 |
14 | expect_identical(output_1, output_ref_1)
15 |
16 | output_2 <- rules(~ mean(.), .prefix = "a_a_")
17 | output_ref_2 <- list(a_a_rule__1 = output_2[[1]])
18 |
19 | expect_identical(output_2, output_ref_2)
20 |
21 | expect_error(rules(mean2), "`mean2`")
22 | })
23 |
24 |
25 | # extract_funs_input ------------------------------------------------------
26 | # Tested in `rules()`
27 |
28 |
29 | # has_dot_symbol ----------------------------------------------------------
30 | # Tested in `rules()`
31 |
32 |
33 | # squash_expr -------------------------------------------------------------
34 | # Tested in `rules()`
35 |
36 |
37 | # quo_get_function --------------------------------------------------------
38 | # Tested in `rules()`
39 |
--------------------------------------------------------------------------------
/tests/testthat/test-spread-groups.R:
--------------------------------------------------------------------------------
1 | context("spread-groups")
2 |
3 |
4 | # Input data --------------------------------------------------------------
5 | input_grouped_summary <- mtcars %>%
6 | group_by(vs, am) %>%
7 | summarise(n_low = dplyr::n() > 6, n_high = dplyr::n() < 10)
8 |
9 |
10 | # spread_groups -----------------------------------------------------------
11 | test_that("spread_groups works", {
12 | output_ref_1 <- tibble::tibble(
13 | "0.0._.n_low" = TRUE, "0.1._.n_low" = FALSE,
14 | "1.0._.n_low" = TRUE, "1.1._.n_low" = TRUE,
15 | "0.0._.n_high" = FALSE, "0.1._.n_high" = TRUE,
16 | "1.0._.n_high" = TRUE, "1.1._.n_high" = TRUE
17 | )
18 |
19 | expect_identical(
20 | spread_groups(input_grouped_summary, vs, am),
21 | output_ref_1
22 | )
23 |
24 | output_ref_2 <- output_ref_1
25 | colnames(output_ref_2) <- gsub("^(.)\\.", "\\1__", colnames(output_ref_2))
26 |
27 | expect_identical(
28 | spread_groups(input_grouped_summary, vs, am, .group_sep = "__"),
29 | output_ref_2
30 | )
31 |
32 | output_ref_3 <- output_ref_1
33 | colnames(output_ref_3) <- gsub("\\._\\.", "___", colnames(output_ref_3))
34 |
35 | expect_identical(
36 | spread_groups(input_grouped_summary, vs, am, .col_sep = "___"),
37 | output_ref_3
38 | )
39 | })
40 |
41 | test_that("spread_groups throws errors", {
42 | expect_error(
43 | spread_groups(input_grouped_summary),
44 | "spread_groups: No group.*column"
45 | )
46 | expect_error(
47 | spread_groups(input_grouped_summary, ends_with("Absent")),
48 | "spread_groups: No group.*column"
49 | )
50 | expect_error(
51 | spread_groups(input_grouped_summary, vs),
52 | "spread_groups:.*non-unique"
53 | )
54 | expect_error(
55 | spread_groups(input_grouped_summary, everything()),
56 | "spread_groups: No rule.*column"
57 | )
58 | expect_error(
59 | input_grouped_summary %>%
60 | ungroup() %>%
61 | mutate(vs = 1:4) %>%
62 | spread_groups(vs),
63 | "spread_groups:.*logical"
64 | )
65 | })
66 |
--------------------------------------------------------------------------------
/tests/testthat/test-utils.R:
--------------------------------------------------------------------------------
1 | context("utils")
2 |
3 |
4 | # Input data --------------------------------------------------------------
5 | df <- mtcars
6 |
7 |
8 | # inside_punct ------------------------------------------------------------
9 | test_that("inside_punct works", {
10 | input1 <- c(
11 | "._.", "._.a", "a._.", "a._.a",
12 | "a_._.a", "a._._a", "a_._._a",
13 | "a__._._a", "a_._.__a", "a__._.__a",
14 | "._.*_.", "._.._.",
15 | "__.a", ".__a", "...", "a_a"
16 | )
17 |
18 | expect_identical(grep(inside_punct(), input1), 1:12)
19 |
20 | input2 <- c(
21 | "a", "_a", "a_", "_a_",
22 | "__a", "a__", "__a__",
23 | "_"
24 | )
25 |
26 | expect_identical(grep(inside_punct("a"), input2), 1:7)
27 | })
28 |
29 |
30 | # negate_select_cols ------------------------------------------------------
31 | test_that("negate_select_cols works", {
32 | output_1 <- negate_select_cols(mtcars, vs, am)
33 | output_ref_1 <- setdiff(colnames(mtcars), c("vs", "am"))
34 |
35 | expect_identical(output_1, output_ref_1)
36 |
37 | output_2 <- negate_select_cols(mtcars, one_of("vs", "am"))
38 | output_ref_2 <- output_ref_1
39 |
40 | expect_identical(output_2, output_ref_2)
41 |
42 | output_3 <- negate_select_cols(mtcars, dplyr::matches("p|a"))
43 | output_ref_3 <- c("cyl", "wt", "qsec", "vs")
44 |
45 | expect_identical(output_3, output_ref_3)
46 |
47 | output_4 <- negate_select_cols(mtcars, cyl:am)
48 | output_ref_4 <- c("mpg", "gear", "carb")
49 |
50 | expect_identical(output_4, output_ref_4)
51 |
52 | output_5 <- negate_select_cols(mtcars, -(cyl:am))
53 | output_ref_5 <- c("cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am")
54 |
55 | expect_identical(output_5, output_ref_5)
56 | })
57 |
58 |
59 | # assert_positive_length --------------------------------------------------
60 | test_that("assert_positive_length works", {
61 | expect_error(
62 | assert_positive_length(list(), "Some name"),
63 | "^Some name.*positive.*length"
64 | )
65 |
66 | expect_identical(assert_positive_length(1:2, "Some name"), 1:2)
67 | expect_identical(assert_positive_length(list(1:2), "Some name"), list(1:2))
68 | })
69 |
70 |
71 | # assert_length -----------------------------------------------------------
72 | test_that("assert_length works", {
73 | expect_error(
74 | assert_length(c("a", "b"), 1, "New name"),
75 | "^New name.*length.*1"
76 | )
77 | expect_error(
78 | assert_length(1, 2, "New name"),
79 | "^New name.*length.*2"
80 | )
81 |
82 | expect_identical(assert_length(list("c"), 1, "New name"), list("c"))
83 | })
84 |
85 |
86 | # assert_character --------------------------------------------------------
87 | test_that("assert_character works", {
88 | expect_error(
89 | assert_character(1L, "Tmp name"),
90 | "Tmp name.*character"
91 | )
92 | expect_error(
93 | assert_character(list("a"), "Tmp name"),
94 | "Tmp name.*character"
95 | )
96 |
97 | expect_identical(assert_character(c("a", "A"), "Tmp name"), c("a", "A"))
98 | })
99 |
100 |
101 | # add_class ---------------------------------------------------------------
102 | test_that("add_class works", {
103 | expect_equal(class(add_class(df, "some")), c("some", "data.frame"))
104 | })
105 |
106 |
107 | # add_class_cond ----------------------------------------------------------
108 | test_that("add_class_cond works", {
109 | expect_equal(class(add_class_cond(df, "data.frame")), "data.frame")
110 | expect_equal(class(add_class_cond(df, "some")), c("some", "data.frame"))
111 | })
112 |
113 |
114 | # remove_class_cond -------------------------------------------------------
115 | test_that("remove_class_cond works", {
116 | input <- structure(1, class = c("a", "b"))
117 |
118 | expect_equal(remove_class_cond(input, "a"), structure(1, class = "b"))
119 | expect_equal(remove_class_cond(input, "b"), input)
120 | })
121 |
122 |
123 | # compute_def_names -------------------------------------------------------
124 | test_that("compute_def_names works", {
125 | expect_identical(compute_def_names(0), character(0))
126 | expect_identical(compute_def_names(10), paste0("__", seq_len(10)))
127 | expect_identical(compute_def_names(10, "base"), paste0("base__", seq_len(10)))
128 | expect_identical(
129 | compute_def_names(10, .start_ind = 4),
130 | paste0("__", seq_len(10) + 3)
131 | )
132 | expect_identical(
133 | compute_def_names(10, "base", 4),
134 | paste0("base__", seq_len(10) + 3)
135 | )
136 | })
137 |
138 |
139 | # enhance_names -----------------------------------------------------------
140 | test_that("enhance_names works", {
141 | expect_identical(enhance_names(character(0)), character(0))
142 |
143 | input <- c("", "name", "", "name", "var")
144 | output_ref_1 <- c("__1", "name", "__3", "name", "var")
145 |
146 | expect_identical(enhance_names(input), output_ref_1)
147 | expect_identical(
148 | enhance_names(input, .prefix = "._."),
149 | paste0("._.", output_ref_1)
150 | )
151 | expect_identical(
152 | enhance_names(input, .suffix = "__"),
153 | paste0(output_ref_1, "__")
154 | )
155 | expect_identical(
156 | enhance_names(input, .prefix = "._.", .suffix = "__"),
157 | paste0("._.", output_ref_1, "__")
158 | )
159 |
160 | expect_identical(
161 | enhance_names(input, .root = "base"),
162 | c("base__1", "name", "base__3", "name", "var")
163 | )
164 | expect_identical(
165 | enhance_names(input, .root = "base", .start_ind = 5),
166 | c("base__5", "name", "base__7", "name", "var")
167 | )
168 | })
169 |
--------------------------------------------------------------------------------
/vignettes/design-and-format.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Design Process and Exposure Format"
3 | author: "Evgeni Chasnovski"
4 | date: "`r Sys.Date()`"
5 | output: rmarkdown::html_vignette
6 | vignette: >
7 | %\VignetteIndexEntry{Design Process and Exposure Format}
8 | %\VignetteEngine{knitr::rmarkdown}
9 | %\VignetteEncoding{UTF-8}
10 | ---
11 |
12 | The main idea of the `ruler` package is to create a format for validation results (along with a functional API) that works naturally with [tidyverse](https://www.tidyverse.org/) tools. This vignette will:
13 |
14 | - Guide you through the design process of __exposure__: `ruler`'s validation result format. This should help you understand the foundations of the `ruler` validation workflow.
15 | - Describe the exposure format.
16 |
17 | ## Design process
18 |
19 | The preferred local data structure in the `tidyverse` is the [tibble](https://tibble.tidyverse.org): "A modern re-imagining of the data frame", on which its implementation is based. That is why `ruler` uses data frames as the preferred format for data to be validated. However, the initial goal is to use tibbles as much as possible in creating the validation result format.
20 |
21 | Basically, a data frame is a list of variables of the same length. It is easier to think of it as a two-dimensional structure whose columns can be of different types.
22 |
23 | In abstract form, validation of a data frame can be put as ___asking whether a certain subset of the data frame (a data unit) obeys a certain rule___. The result of validation is a logical __value__ representing the answer.
24 |
25 | Following [dplyr](https://dplyr.tidyverse.org)'s grammar of data manipulation, a data frame can be represented in terms of the following data units:
26 |
27 | - Data frame as a whole. Validation can be done by `summarise()` _without_ grouping.
28 | - Collection of groups of rows. Validation can be done by `summarise()` _with_ grouping.
29 | - Collection of columns. Validation can be done by scoped variants of `summarise()` _without_ grouping: `summarise_all()`, `summarise_if()` and `summarise_at()`.
30 | - Collection of rows. Validation can be done by `transmute()`.
31 | - 2d-collection of cells. Validation can be done by scoped variants of `transmute()`: `transmute_all()`, `transmute_if()` and `transmute_at()`.
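
As a rough illustration of the list above, each data unit might be validated with plain `dplyr` calls as sketched here. This uses the built-in `mtcars` data; the rule names (`nrow_low`, `n_low`, `is_not_heavy`) and the thresholds are hypothetical and chosen only for demonstration, not taken from `ruler`.

```r
library(dplyr)

# Data frame as a whole: one row, one logical column per rule
data_res <- mtcars %>% summarise(nrow_low = n() > 10)

# Groups of rows: one row per (vs, am) group
group_res <- mtcars %>%
  group_by(vs, am) %>%
  summarise(n_low = n() > 6)

# Columns: one row, one logical value per column
col_res <- mtcars %>% summarise_all(function(x) !anyNA(x))

# Rows: one logical value per row
row_res <- mtcars %>% transmute(is_not_heavy = wt < 5)

# Cells: one logical value per cell (here: within 3 standard deviations)
cell_res <- mtcars %>%
  transmute_all(function(x) abs(x - mean(x)) / sd(x) <= 3)
```

Note how the shape of the result differs by data unit: a single row for data and columns, one row per group for groups, and one row per input row for rows and cells.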
32 |
33 | In `ruler`, data, group, column, row and cell are the five basic data units. Each of them can be described by a combination of two variables:
34 |
35 | - __var__, which represents the variable name of the data unit:
36 |     - Value '.all' is reserved for 'all columns as a whole'.
37 |     - A value _equal_ to some column name indicates the column of the data unit.
38 |     - A value _not equal_ to any column name indicates the name of a group: it is created by uniting (with a delimiter) the group levels of the grouping columns.
39 | - __id__, which represents the row index of the data unit:
40 |     - Value 0 is reserved for 'all rows as a whole'.
41 |     - A value not equal to 0 indicates the row index of the data unit.
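
To make the __var__/__id__ encoding concrete, here is a small sketch that writes down one example of each data unit as a row of a tibble. The `unit` column and the specific values (column `mpg`, row `3`, group name `"0.1"`) are hypothetical and exist only for illustration.

```r
library(tibble)

# One example per data unit, encoded with the `var` and `id` conventions
data_units <- tribble(
  ~unit,    ~var,   ~id,
  "data",   ".all", 0L,  # all columns, all rows
  "group",  "0.1",  0L,  # group name (united group levels), all rows
  "column", "mpg",  0L,  # one column, all rows
  "row",    ".all", 3L,  # all columns, one row
  "cell",   "mpg",  3L   # one column, one row
)
```

Under this reading, `var` narrows the unit along columns (or names a group) and `id` narrows it along rows, so the five unit types need no extra identifying fields.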
42 |
43 | Validation of data units can be done with the `dplyr` functions described above. Applying them to a data unit can answer multiple questions at once. That is why, by design, __rules__ (functions that answer one specific question about one type of data unit) are combined into __rule packs__ (functions that answer multiple questions about one type of data unit).
44 |
45 | Applying a rule pack to data involves several considerations:
46 |
47 | - Rule packs should have unique __names__ to be used as references.
48 | - For the same reason, rules should have names. However, uniqueness is required only within the corresponding rule pack, which makes the pair 'pack name' + 'rule name' a key identifying the actual rule.
49 | - Outputs of rule packs for different data units differ in structure. Therefore rule packs should have __types__ so that different interpretations can be applied to their outputs.
50 | - During actual validation, most results normally indicate obedience to rules. This can lead to storing a lot of redundant information in validation results. `ruler` has an option of __removing obeyers__ from results during validation.
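
The idea of removing obeyers can be sketched on a hand-built report. This is an illustration only, not `ruler`'s implementation; it assumes that rows whose `value` is `NA` are kept alongside rule breakers, which is consistent with the package's tests for `remove_obeyers()`.

```r
library(tibble)

# A hand-built report: rules 1 and 3 are obeyed, rule 2 is broken,
# rule 4 gave no definite answer
report <- tibble(
  pack = rep("data_pack", 4),
  rule = paste0("rule__", 1:4),
  var = rep(".all", 4),
  id = rep(0L, 4),
  value = c(TRUE, FALSE, TRUE, NA)
)

# Drop only the rows where the rule was definitely obeyed;
# `NA %in% TRUE` is FALSE, so rows with NA are kept
breakers <- report[!(report$value %in% TRUE), ]
```

Here `breakers` retains the rows for `rule__2` and `rule__4`, which is the part of the report that usually matters after validation.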
51 |
52 | In `ruler`, __exposing__ data to rules means applying rule packs to data, collecting the results in a common format and attaching them to the data as an `exposure` attribute. In this way, the actual exposure can be done in multiple steps and can also be part of a general data preparation pipeline.
53 |
54 | ## Exposure
55 |
56 | __Exposure__ is a format designed to contain uniform information about the validation of different data units. For reproducibility, it also saves information about the applied packs. Basically, an exposure is a list with two elements:
57 |
58 | 1. __Packs info__: a `tibble` with the following structure:
59 |     - _name_: Name of the pack. If not set manually, it will be imputed during exposure.
60 |     - _type_: Name of the pack type. Indicates which data unit the pack checks.
61 |     - _fun_: List (preferably unnamed) of rule pack functions.
62 |     - _remove_obeyers_: Whether rows about obeyers (data units that obey a certain rule) were removed from the report after applying the pack.
63 | 2. __Tidy data validation report__: a `tibble` with the following structure:
64 |     - _pack_: Name of the rule pack (from column 'name' in packs info).
65 |     - _rule_: Name of the rule defined in the rule pack.
66 |     - _var_: Name of the data unit variable.
67 |     - _id_: Row index of the data unit.
68 |     - _value_: Whether the described data unit obeys the rule.
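
The structure above can be mimicked by hand to see how the two elements relate. This is a sketch that deliberately avoids `ruler`'s own constructors; the pack name `data_pack__1`, the rule `enough_rows` and its threshold are hypothetical.

```r
library(tibble)

# Packs info: one row per applied rule pack
packs_info <- tibble(
  name = "data_pack__1",
  type = "data_pack",
  # List column holding the (hypothetical) rule pack function
  fun = list(function(.tbl) tibble(enough_rows = nrow(.tbl) > 40)),
  remove_obeyers = TRUE
)

# Tidy report: one row per (pack, rule, data unit) answer;
# here a single rule breaker for the data as a whole
report <- tibble(
  pack = "data_pack__1",
  rule = "enough_rows",
  var = ".all",
  id = 0L,
  value = FALSE
)

exposure_sketch <- list(packs_info = packs_info, report = report)
```

The `pack` column of the report refers back to `name` in packs info, so the report stays compact while the functions that produced it remain available for reproducibility.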
69 |
--------------------------------------------------------------------------------