├── .gitignore ├── 00-workshop-contents.Rmd ├── 01-intro-to-data-validation.Rmd ├── 02-scan-your-data.Rmd ├── 03-expect-test-functions.Rmd ├── 04-scaling-up-data-validation.Rmd ├── 05-intro-to-data-documentation.Rmd ├── 06-getting-deeper-into-documenting-data.Rmd ├── LICENSE ├── README.md ├── game_revenue-validation.R ├── informant-penguins.html ├── pointblank-workshop.Rproj ├── save_multiple_agents_to_disk.R ├── small_table_tests ├── agent-small_table_2022-10-13 ├── agent-small_table_2022-10-14 ├── agent-small_table_2022-10-15 ├── agent-small_table_2022-10-16 └── agent-small_table_2022-10-17 ├── storms-validation.R ├── tbl_scan-storms.html └── test-game_revenue.R /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | .DS_Store 6 | -------------------------------------------------------------------------------- /00-workshop-contents.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Workshop Contents" 3 | output: html_document 4 | --- 5 | 6 | This **pointblank** workshop will teach you *a lot* about what **pointblank** can do, and, it'll give you an opportunity to experiment with the package. The workshop modules will introduce you to a large variety of examples to get you well-acquainted with the basic functionality of the package. 
7 | 8 | Each module of the workshop focuses on a different subset of functions and they are all presented here as **R Markdown** (.Rmd) files, with one file for each workshop module: 9 | 10 | - `"01-intro-to-data-validation.Rmd"` (The `agent`, validation fns, interrogation/reports) 11 | 12 | - `"02-scan-your-data.Rmd"` (Looking at your data with `scan_data()`) 13 | 14 | - `"03-expect-test-functions.Rmd"` (Using the `expect_*()` and `test_*()` functions) 15 | 16 | - `"04-scaling-up-data-validation.Rmd"` (The `multiagent` and its reporting structures) 17 | 18 | - `"05-intro-to-data-documentation.Rmd"` (The `informant` and describing your data) 19 | 20 | - `"06-getting-deeper-into-documenting-data.Rmd"` (Using snippets and text tricks) 21 | 22 | You can navigate to any of these and modify the code within the self-contained **R Markdown** code chunks. Entire **R Markdown** files can be knit to HTML, where a separate window will show the rendered document. 23 | 24 | ### The **pointblank** Installation 25 | 26 | Normally you would install **pointblank** on your system by using `install.packages()`: 27 | 28 | ```{r eval=FALSE} 29 | # install.packages("pointblank") 30 | ``` 31 | 32 | For this workshop, however, we are going to use the development version of **pointblank** and install it from GitHub with `devtools::install_github()`. 
33 | 34 | ```{r eval=FALSE} 35 | # devtools::install_github("rich-iannone/pointblank") 36 | ``` 37 | -------------------------------------------------------------------------------- /01-intro-to-data-validation.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Introduction to Data Validation, pointblank Style" 3 | output: html_document 4 | --- 5 | 6 | ```{r setup, include=FALSE} 7 | knitr::opts_chunk$set(echo = TRUE) 8 | library(pointblank) 9 | library(tidyverse) 10 | library(blastula) 11 | library(palmerpenguins) 12 | ``` 13 | 14 | ## Intro 15 | 16 | A common workflow for data validation in **pointblank** involves three basic components: 17 | 18 | - the creation of an 'agent' (this is the main data collection and reporting object) 19 | - the declaration of validation steps using validation functions (as many as you need) 20 | - the interrogation of the data (here the agent finally carries out the validation tasks) 21 | 22 | We always start with `create_agent()` and define how the data can be reached, also providing some basic rules about how an interrogation of that data should eventually be carried out. While we are giving the agent some default behavior, we can override some of this on a step-by-step basis when declaring our validation steps. We always end with `interrogate()` and that function carries out the work of validating the data and generating the all-important reporting. To sum up, this is the construction: 23 | 24 | ```r 25 | agent <- 26 | create_agent(...) %>% 27 | << validation functions >> %>% 28 | interrogate() 29 | ``` 30 | 31 | ### A simple data validation on a small dataset called `small_table` 32 | 33 | The package contains a few datasets. A really small one for experimentation is called `small_table`: 34 | 35 | ```{r paged.print=FALSE} 36 | pointblank::small_table 37 | ``` 38 | 39 | We're going to break the validation process into steps.
First, let's create an `agent`, give it the `small_table`, and look at the report. 40 | 41 | ```{r} 42 | # Create the agent with `create_agent()`; the `tbl` is given to the agent 43 | agent_1 <- 44 | create_agent( 45 | tbl = small_table, 46 | tbl_name = "small_table", 47 | label = "Workshop agent No. 1" 48 | ) 49 | 50 | # Printing the `agent` will print the report with the default options 51 | agent_1 52 | ``` 53 | 54 | Okay. Let's provide a few validation functions. 55 | 56 | ```{r} 57 | agent_1 <- 58 | agent_1 %>% 59 | col_vals_gte(columns = d, value = 0) %>% 60 | col_vals_in_set(columns = f, set = c("low", "mid", "high")) %>% 61 | col_is_logical(columns = e) %>% 62 | col_is_numeric(columns = d) %>% 63 | col_is_character(columns = c(b, f)) %>% 64 | rows_distinct() 65 | 66 | agent_1 67 | ``` 68 | 69 | When looking at the report, we see that it contains the information about the validation steps but many of the table cells (to the right) have no entries. That area is the interrogation data, and, we haven't yet used the `interrogate()` function. Let's use it now: 70 | 71 | ```{r} 72 | agent_1 <- agent_1 %>% interrogate() 73 | 74 | agent_1 75 | ``` 76 | 77 | Now, we see a validation report we can use! Let's go over each of the columns and understand what they mean. 78 | 79 | - `STEP`: the name of the validation function used for the validation step and 80 | the step number. 81 | 82 | - `COLUMNS`: the names of the target columns used in the validation step (if applicable). 83 | 84 | - `VALUES`: the values used in the validation step, where applicable; these could be literal values, column names, an expression, etc. 85 | 86 | - `TBL`: indicates whether there were any changes to the target table just prior to interrogation. A rightward arrow from a small circle indicates that there was no mutation of the table. An arrow from a circle to a purple square indicates that 'preconditions' were used to modify the target table.
An arrow from a circle to a half-filled circle indicates that the target table has been 'segmented'. 87 | 88 | - `EVAL`: a symbol that denotes the success of interrogation evaluation for each step. A checkmark indicates no issues with evaluation. A warning sign indicates that a warning occurred during evaluation. An explosion symbol indicates that evaluation failed due to an error. Hover over the symbol for details on each condition. 89 | 90 | - `UNITS`: the total number of test units for the validation step (these are the atomic units of testing which depend on the type of validation). 91 | 92 | - `PASS`: on top is the absolute number of passing test units and below that is the fraction of passing test units over the total number of test units. 93 | 94 | - `FAIL`: on top is the absolute number of failing test units and below that is the fraction of failing test units over the total number of test units. 95 | 96 | - `W`, `S`, `N`: indicators that show whether the *warn*, *stop*, or *notify* states were entered; unset states appear as dashes, states that are set with thresholds appear as unfilled circles when not entered and filled when thresholds are exceeded (colors for `W`, `S`, and `N` are amber, red, and blue) 97 | 98 | - `EXT`: a column that provides buttons to download data extracts as CSV files for row-based validation steps having failing test units. Buttons only appear when there is data to collect. 99 | 100 | We see nothing in the `W`, `S`, and `N` columns. This is because we have to explicitly set thresholds for those to be active. We'll do that next... 101 | 102 | ### Data validation with threshold levels 103 | 104 | We often should think about what's tolerable in terms of data quality and implement that into our reporting. Let's set proportional failure thresholds to the `warn`, `stop`, and `notify` states using the `action_levels()` function. 105 | 106 | ```{r} 107 | # Create an `action_levels` object with the namesake function. 
108 | al <- 109 | action_levels( 110 | warn_at = 0.15, 111 | stop_at = 0.25, 112 | notify_at = 0.35 113 | ) 114 | 115 | # This can be printed for inspection 116 | al 117 | ``` 118 | 119 | We are using threshold fractions of test units (between `0` and `1`). For `0.15`, this means that if 15% of the test units are found to *fail* (i.e., don't meet the expectation), then the designated failure state is entered. 120 | 121 | Absolute values starting from `1` can be used instead, and this constitutes an absolute failure threshold (e.g., `10` means that if `10` of the test units are found to fail, the failure state is entered). 122 | 123 | What are test units? They make up the individual tests for a validation step. They will vary by the validation function used but, in simple terms, a validation function that validates values in a column will have the number of test units equal to the number of rows. A validation function that validates a column type will have exactly one test unit. This is always given in the `UNITS` column of the reporting table. 124 | 125 | Let’s use the `action_levels` object in a new validation process (based on the same `small_table` dataset). We'll make it so the validation functions used will result in more failing test units. 126 | 127 | ```{r} 128 | agent_2 <- 129 | create_agent( 130 | tbl = small_table, 131 | tbl_name = "small_table", 132 | label = "Workshop agent No.
2", 133 | actions = al 134 | ) %>% 135 | col_is_posix(columns = date_time) %>% 136 | col_vals_lt(columns = a, value = 7) %>% 137 | col_vals_regex(columns = b, regex = "^[0-9]-[a-w]{3}-[2-9]{3}$") %>% 138 | col_vals_between(columns = d, left = 0, right = 4000) %>% 139 | col_is_logical(columns = e) %>% 140 | col_is_character(columns = c(b, f)) %>% 141 | col_vals_lt(columns = d, value = 9600) %>% 142 | col_vals_in_set(columns = f, set = c("low", "mid")) %>% 143 | rows_distinct() %>% 144 | interrogate() 145 | 146 | agent_2 147 | ``` 148 | 149 | Some notes: 150 | 151 | - the thresholds for the `warn`, `stop`, and `notify` states are presented in the table header; these are defaults for every validation step 152 | - we now have some indicators of failure thresholds being met (look at the `W`, `S`, and `N` columns); steps `2`, `3`, `9`, and `10` have at least the `warn` condition 153 | - it's possible to have test unit failures but not enter a `warn` state (look at steps `4` and `8`); they still provide CSVs for failed rows but the `W` indicator circle isn't filled in 154 | 155 | How you set the default thresholds will depend on how strict your data quality requirements are. There might be certain validation steps where we'd like to be more stringent. For the next validation process we will apply the `action_levels()` function to individual steps, overriding the default setting. 156 | 157 | ```{r} 158 | agent_3 <- 159 | create_agent( 160 | tbl = small_table, 161 | tbl_name = "small_table", 162 | label = "Workshop agent No. 3", 163 | actions = al 164 | ) %>% 165 | col_is_posix(columns = date_time) %>% 166 | col_vals_lt(columns = a, value = 7) %>% 167 | col_vals_regex(columns = b, regex = "^[0-9]-[a-w]{3}-[2-9]{3}$") %>% 168 | col_vals_between( 169 | columns = d, 170 | left = 0, 171 | right = 4000, 172 | actions = action_levels( # Setting `actions` at the individual 173 | warn_at = 1, # validation step.
This time, using absolute 174 | stop_at = 3, # threshold values (i.e., a single test unit 175 | notify_at = 5 # failing triggers the `warn` state) 176 | ) 177 | ) %>% 178 | col_is_logical(columns = e) %>% 179 | col_is_character(columns = c(b, f)) %>% 180 | col_vals_lt(columns = d, value = 9600) %>% 181 | col_vals_in_set(columns = f, set = c("low", "mid")) %>% 182 | rows_distinct() %>% 183 | interrogate() 184 | 185 | agent_3 186 | ``` 187 | 188 | ### A look at the available validation functions 189 | 190 | There are 36 validation functions. Here they are: 191 | 192 | - `col_vals_lt()` 193 | - `col_vals_lte()` 194 | - `col_vals_equal()` 195 | - `col_vals_not_equal()` 196 | - `col_vals_gte()` 197 | - `col_vals_gt()` 198 | - `col_vals_between()` 199 | - `col_vals_not_between()` 200 | - `col_vals_in_set()` 201 | - `col_vals_not_in_set()` 202 | - `col_vals_make_set()` 203 | - `col_vals_make_subset()` 204 | - `col_vals_increasing()` 205 | - `col_vals_decreasing()` 206 | - `col_vals_null()` 207 | - `col_vals_not_null()` 208 | - `col_vals_regex()` 209 | - `col_vals_within_spec()` 210 | - `col_vals_expr()` 211 | - `rows_distinct()` 212 | - `rows_complete()` 213 | - `col_is_character()` 214 | - `col_is_numeric()` 215 | - `col_is_integer()` 216 | - `col_is_logical()` 217 | - `col_is_date()` 218 | - `col_is_posix()` 219 | - `col_is_factor()` 220 | - `col_exists()` 221 | - `col_schema_match()` 222 | - `row_count_match()` 223 | - `col_count_match()` 224 | - `tbl_match()` 225 | - `conjointly()` 226 | - `serially()` 227 | - `specially()` 228 | 229 | It's a lot to keep track of but they all try to use a consistent interface. Let's break this down. 230 | 231 | The `col_vals_*()` group will check individual cells within one or more columns. Aside from using an 'agent', we can use the validation functions *directly* on the data. It acts as a sort of validation 'filter': data will pass through unchanged if validation succeeds, or an error will be thrown if it doesn't.
Let's try that with `col_vals_between()`: 232 | 233 | ```{r paged.print=FALSE} 234 | small_table %>% col_vals_between(columns = a, left = 0, right = 10) 235 | ``` 236 | 237 | ```{r error=TRUE} 238 | small_table %>% col_vals_between(columns = a, left = 5, right = 10) 239 | ``` 240 | 241 | The `col_is_*()` group will check whether a column is of a certain type. Let's look at two cases: one passing and the other failing. 242 | 243 | ```{r paged.print=FALSE} 244 | small_table %>% col_is_character(columns = b) 245 | ``` 246 | 247 | ```{r error=TRUE} 248 | small_table %>% col_is_numeric(columns = date) 249 | ``` 250 | 251 | The two `rows_*()` functions (`rows_distinct()` and `rows_complete()`) will check entire rows (this can be narrowed down with the `columns` argument). Here are examples of both, with failing and then passing cases. 252 | 253 | `rows_distinct()`: 254 | 255 | ```{r error=TRUE} 256 | small_table %>% rows_distinct() 257 | ``` 258 | 259 | ```{r paged.print=FALSE} 260 | head(small_table) %>% rows_distinct() 261 | ``` 262 | 263 | `rows_complete()`: 264 | 265 | ```{r error=TRUE} 266 | small_table %>% rows_complete() 267 | ``` 268 | 269 | ```{r paged.print=FALSE} 270 | small_table %>% rows_complete(columns = vars(date_time, date, a, b)) 271 | ``` 272 | 273 | The `*_match()` functions validate whether some aspect of the table as a whole matches an expectation. 274 | 275 | - `col_schema_match()` - column schema matching 276 | - `row_count_match()` - tbl row count matching (with another tbl or fixed value) 277 | - `col_count_match()` - tbl col count matching (with another tbl or fixed value) 278 | - `tbl_match()` - does the target table match a comparison table? 
279 | 280 | Here are two (passing) examples: 281 | 282 | ```{r paged.print=FALSE} 283 | small_table %>% row_count_match(count = 13) 284 | ``` 285 | 286 | ```{r paged.print=FALSE} 287 | small_table %>% col_count_match(count = palmerpenguins::penguins) 288 | ``` 289 | 290 | ### Getting data extracts for failed rows from the 'agent' 291 | 292 | Those CSV buttons in the validation report are useful for sharing the report with others since they don't even need to know R to obtain those extracts. For the person familiar with R and **pointblank**, it is possible to get data frames for the failed rows (per validation step). 293 | 294 | We can use the `get_data_extracts()` function to obtain a list of data frames, or, use the `i` argument to get a data frame available for a specific step. Not all steps will have associated data frames. Also, not all validation functions will produce data frames here (they need to check values in columns). 295 | 296 | Let's use `get_data_extracts()` on `agent_3`. 297 | 298 | ```{r paged.print=FALSE} 299 | get_data_extracts(agent = agent_3) 300 | ``` 301 | 302 | The list components are named for the validation steps that have data extracts (i.e., filtered rows where test unit failures occurred). Let's get an individual data extract from step `9` (the `col_vals_in_set()` step, which looked at column `f`): 303 | 304 | ```{r paged.print=FALSE} 305 | get_data_extracts(agent = agent_3, i = 9) 306 | ``` 307 | 308 | ### Getting 'sundered' data back (either 'good' or 'bad' rows) 309 | 310 | Sometimes, if your methodology allows for it, you want to use the best part of the input data for something else. 
With `get_sundered_data()`, we provide an agent object that has interrogated the data, and what we get back could be: 311 | 312 | - the 'pass' data piece (rows with no failing test units across all row-based validation functions) 313 | - the 'fail' data piece (rows with at least one failing test unit across the same series of validations) 314 | - all the data with a new column that labels each row as passing or failing across validation steps (the labels can be customized). 315 | 316 | Let's make a new agent and validate `small_table` again. 317 | 318 | ```{r} 319 | agent_4 <- 320 | create_agent( 321 | tbl = small_table, 322 | tbl_name = "small_table", 323 | label = "Workshop agent No. 4" 324 | ) %>% 325 | col_vals_gt(columns = d, value = 1000) %>% 326 | col_vals_between( 327 | columns = c, 328 | left = vars(a), right = vars(d), # Using values in columns, not literal vals 329 | na_pass = TRUE 330 | ) %>% 331 | interrogate() 332 | 333 | agent_4 334 | ``` 335 | 336 | Get the sundered data piece that contains only rows that passed both validation steps (this is the default piece). This yields 5 of 13 total rows. 337 | 338 | ```{r paged.print=FALSE} 339 | agent_4 %>% get_sundered_data() 340 | ``` 341 | 342 | Get the complementary data piece: all of those rows that failed either of the two validation steps. This yields 8 of 13 total rows. 343 | 344 | ```{r paged.print=FALSE} 345 | agent_4 %>% get_sundered_data(type = "fail") 346 | ``` 347 | 348 | We can get all of the input data returned with a flag column (called `.pb_combined`). This is done by using `type = "combined"` and that rightmost column will contain `"pass"` and `"fail"` values.
349 | 350 | ```{r paged.print=FALSE} 351 | agent_4 %>% get_sundered_data(type = "combined") 352 | ``` 353 | 354 | The labels can be changed and this is flexible: 355 | 356 | ```{r paged.print=FALSE} 357 | agent_4 %>% get_sundered_data(type = "combined", pass_fail = c(TRUE, FALSE)) 358 | ``` 359 | 360 | ```{r paged.print=FALSE} 361 | agent_4 %>% get_sundered_data(type = "combined", pass_fail = 0:1) 362 | ``` 363 | 364 | ### Accessing the plan/interrogation data with `get_agent_x_list()` 365 | 366 | The agent's x-list is a record of information that the agent possesses at any given time. The x-list will contain the most complete information after an interrogation has taken place (before then, the data largely reflects the validation plan). 367 | 368 | The x-list can be constrained to a particular validation step (by supplying the step number to the `i` argument), or, we can get the information for all validation steps by leaving `i` unspecified. 369 | 370 | Let's obtain such a list from `agent_3`, which had 10 validation steps: 371 | 372 | ```{r paged.print=FALSE} 373 | # Generate the `x_list` object from `agent_3` 374 | x_list_3 <- agent_3 %>% get_agent_x_list() 375 | 376 | # Printing this gives us a console preview 377 | # of which components are available 378 | x_list_3 379 | ``` 380 | 381 | The amount of information contained in here is comprehensive (see `?get_agent_x_list` for a detailed breakdown) but we can provide a few examples. 382 | 383 | The number of test units in each validation step. 384 | 385 | ```{r} 386 | x_list_3$n 387 | ``` 388 | 389 | The number of *passing* test units in each validation step. 390 | 391 | ```{r} 392 | x_list_3$n_passed 393 | ``` 394 | 395 | The *fraction* of passing test units in each validation step. 396 | 397 | ```{r} 398 | x_list_3$f_passed 399 | ``` 400 | 401 | The `warn`, `stop`, and `notify` states. We can arrange that in a tibble and use the step numbers (`i`) as well. 
402 | 403 | ```{r paged.print=FALSE} 404 | dplyr::tibble( 405 | step = x_list_3$i, 406 | warn = x_list_3$warn, 407 | stop = x_list_3$stop, 408 | notify = x_list_3$notify 409 | ) 410 | ``` 411 | 412 | ### Emailing the interrogation report with `email_create()` 413 | 414 | We can choose to email the report if the `notify` state is entered. The message can be created with the agent through the `email_create()` function. Here's a useful bit of code that allows for conditional sending. 415 | 416 | ```{r eval=FALSE} 417 | 418 | if (any(x_list_3$notify)) { 419 | 420 | email_create(agent_3) %>% 421 | blastula::smtp_send( 422 | from = "sender@email.com", 423 | to = "recipient@email.com", 424 | credentials = creds_file(file = "email_secret") 425 | ) 426 | } 427 | ``` 428 | 429 | Such code might be useful during an automated process where data is periodically checked and failures beyond thresholds require notification. 430 | 431 | While `email_create()` will generate the email message body, functions in the **blastula** package are responsible for the sending of that email. For more information on sending HTML email, look at the help article found by using `?blastula::smtp_send`. 432 | 433 | ### Customizing the interrogation report with `get_agent_report()` 434 | 435 | We don't have to fully accept the defaults for a data validation report. Using `get_agent_report()` gives us options. 
436 | 437 | Here's how you can change the title: 438 | 439 | ```{r} 440 | agent_3 %>% get_agent_report(title = "The **3rd** Example") 441 | ``` 442 | 443 | You can bring the steps that had serious failures up to the top: 444 | 445 | ```{r} 446 | agent_3 %>% get_agent_report(arrange_by = "severity") 447 | ``` 448 | 449 | You can remove those steps that had no failures: 450 | 451 | ```{r} 452 | agent_3 %>% get_agent_report(arrange_by = "severity", keep = "fail_states") 453 | ``` 454 | 455 | You can change the language of the report: 456 | 457 | ```{r} 458 | agent_3 %>% get_agent_report(lang = "de") 459 | ``` 460 | 461 | ------ 462 | 463 | ### SUMMARY 464 | 465 | 1. Data validation in **pointblank** requires the creation of an agent, validation functions, and an interrogation. 466 | 2. The agent creates a report that tries to be informative and easily explainable. 467 | 3. We can set data quality thresholds with `action_levels()`; there can be default DQ thresholds and step-specific thresholds (in both cases, supplied to `actions`). 468 | 4. There are 36 validation functions (having a similar interface and many common arguments); they can be used with an agent or directly on the data. 469 | 5. We can get data extracts pertaining to failing test units in rows of the input dataset (with `get_data_extracts()`). 470 | 6. There is the option to obtain 'sundered' data, which is the input data split by whether cells contained failing test units (with `get_sundered_data()`) 471 | 7. A huge amount of validation data can be accessed with the `get_agent_x_list()` function (useful for programming with the validation results). 472 | 8. We can create an email message using a specialized version of the validation report with `email_create()`; this integrates with the **blastula** R package. 473 | 9. The report can be modified with `get_agent_report()`. 
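As a recap (this pipeline is not one of the original workshop chunks), the whole workflow from this module can be condensed into a single sketch that uses only the functions demonstrated above:

```r
library(pointblank)

# Default thresholds for every step
al <- action_levels(warn_at = 0.15, stop_at = 0.25, notify_at = 0.35)

# Plan the validation, then interrogate
agent <-
  create_agent(
    tbl = small_table,
    tbl_name = "small_table",
    label = "Module recap",
    actions = al
  ) %>%
  col_vals_gte(columns = d, value = 0) %>%
  col_vals_in_set(columns = f, set = c("low", "mid", "high")) %>%
  rows_distinct() %>%
  interrogate()

# Post-interrogation access: extracts, sundered data, and the x-list
extracts <- get_data_extracts(agent)
passing  <- get_sundered_data(agent, type = "pass")
x_list   <- get_agent_x_list(agent)
x_list$f_passed  # fraction of passing test units per step
```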
474 | 475 | -------------------------------------------------------------------------------- /02-scan-your-data.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Scan Your Data" 3 | output: html_document 4 | --- 5 | 6 | ```{r setup, include=FALSE} 7 | knitr::opts_chunk$set(echo = TRUE) 8 | library(pointblank) 9 | library(tidyverse) 10 | library(palmerpenguins) 11 | library(safetyData) 12 | ``` 13 | 14 | ## Intro 15 | 16 | Sometimes you know nothing about a new dataset. The **pointblank** package is here to help and it has the `scan_data()` function. So simple, and it gives you so much information on a data table. The function generates an HTML report that scours the input table data. 17 | 18 | In the same spirit, generating validation steps can be laborious and difficult at first. There's a function available to kickstart that process: `draft_validation()`. It'll generate a new .R file with a suggested validation plan that's meant to work and is tweakable. 19 | 20 | ### Performing table scans with `scan_data()` 21 | 22 | The `scan_data()` function is available for providing an interactive overview of a tabular dataset. 
The reporting output contains several sections to make everything more digestible, and these are: 23 | 24 | - `Overview`: Shows table dimensions, duplicate row counts, column types, and reproducibility information 25 | - `Variables`: Provides a summary for each table variable and further statistics and summaries depending on the variable type 26 | - `Interactions`: Displays a matrix plot that describes the interactions between variables 27 | - `Correlations`: This is a set of correlation matrix plots for numerical variables 28 | - `Missing Values`: A summary figure that shows the degree of missingness across variables 29 | - `Sample`: A table that provides the head and tail rows of the dataset 30 | 31 | The output HTML report will appear in the RStudio Viewer and can also be integrated in R Markdown or Quarto HTML output. Here’s an example that uses the `penguins_raw` dataset from the **palmerpenguins** package. 32 | 33 | ```{r} 34 | scan_data(tbl = palmerpenguins::penguins_raw, navbar = FALSE) 35 | ``` 36 | 37 | As can be seen, the first two sections have a lot of additional information tucked behind detail views (with the `Toggle details` buttons) and within tab sets. Should this amount of information be a little overwhelming, there is the option to disable one or more sections. With `scan_data()`’s `sections` argument, you can specify just the sections that are needed for a specific scan. 38 | 39 | The default value for `sections` is the string `"OVICMS"` and each letter of that stands for the following sections in their default order: 40 | 41 | `"O"`: `"overview"` 42 | `"V"`: `"variables"` 43 | `"I"`: `"interactions"` 44 | `"C"`: `"correlations"` 45 | `"M"`: `"missing"` 46 | `"S"`: `"sample"` 47 | 48 | This string can contain fewer key characters and the order can be changed to suit the desired layout of the report.
For example, if you just need the Overview, a Sample, and the description of Variables in the target table, the string to use for `sections` would be `"OSV"`. 49 | 50 | The `tbl` supplied could be a data frame, tibble, a `tbl_dbi` object, or a `tbl_spark` object. Here are a few more datasets that could be scanned, this time using `sections = "OSV"`: 51 | 52 | ```{r eval=FALSE} 53 | scan_data(tbl = safetyData::adam_adae, sections = "OSV") 54 | ``` 55 | 56 | ```{r eval=FALSE} 57 | scan_data(tbl = safetyData::adam_advs, sections = "OSV") 58 | ``` 59 | 60 | The reporting generated by `scan_data()` can be presented in one of thirteen spoken languages: English (`"en"`, the default), French (`"fr"`), German (`"de"`), Italian (`"it"`), Spanish (`"es"`), Portuguese (`"pt"`), Turkish (`"tr"`), Chinese (`"zh"`), Russian (`"ru"`), Polish (`"pl"`), Danish (`"da"`), Swedish (`"sv"`), and Dutch (`"nl"`). These two-letter language codes can be supplied to the `lang` argument. 61 | 62 | Here's an example that scans **dplyr**'s `starwars` dataset and creates the report in Danish. 63 | 64 | ```{r} 65 | scan_data(tbl = dplyr::starwars, sections = "OVS", lang = "da") 66 | ``` 67 | 68 | It's possible to export this reporting to a self-contained HTML file. To do so, use the `export_report()` function (this also works for every other type of reporting you'll see in the Viewer). 69 | 70 | ```{r eval=FALSE} 71 | # Use `scan_data()` and assign reporting to `tbl_scan` 72 | tbl_scan <- scan_data(tbl = dplyr::storms, sections = "OVS") 73 | 74 | # Write the `ptblank_tbl_scan` object to an HTML file 75 | export_report( 76 | tbl_scan, 77 | filename = "tbl_scan-storms.html" 78 | ) 79 | ``` 80 | 81 | ### Drafting a nice, new validation plan with `draft_validation()` 82 | 83 | We can generate a draft validation plan in a new `.R` or `.Rmd` file using an input data table (just like with `scan_data()`).
With `draft_validation()` the data table will be scanned to learn about its column data and a set of starter validation steps (constituting a validation plan) will be written. 84 | 85 | Let's draft a validation plan for the `dplyr::storms` dataset. Here's a quick look at that table: 86 | 87 | ```{r paged.print=FALSE} 88 | dplyr::storms 89 | ``` 90 | 91 | Here's how we generate the new `.R` file: 92 | 93 | ```{r eval=FALSE} 94 | 95 | draft_validation( 96 | tbl = ~dplyr::storms, # This `~` makes it an expression for getting the data 97 | tbl_name = "storms", 98 | filename = "storms-validation" 99 | ) 100 | ``` 101 | 102 | Check out the new file called `"storms-validation.R"`! It's ready to run, all the validation steps run without failing test units, and the process (thanks to column inference routines) knows what to do with certain types of columns (like the latitude and longitude ones). 103 | 104 | Once in the file, it's possible to tweak the validation steps to better fit the expectations to the particular domain. It's best to use a data extract that contains a good number of rows and is relatively free of spurious data. 105 | 106 | ------ 107 | 108 | ### SUMMARY 109 | 110 | 1. It's a great idea to examine data you're unfamiliar with using `scan_data()`! 111 | 2. The `draft_validation()` function can give you a super-quickstart for data validation (it scans your data, but in a different way).
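As noted above, `scan_data()` also accepts `tbl_dbi` objects. Here's a hypothetical sketch of that case (the in-memory SQLite connection and table name are placeholders for illustration, not part of the workshop materials, and assume the **DBI** and **RSQLite** packages are installed):

```r
library(pointblank)
library(DBI)

# Put `small_table` into an in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "small_table", pointblank::small_table)

# Scan the database table much like a local data frame; just the
# Overview and Sample sections here
scan_data(tbl = dplyr::tbl(con, "small_table"), sections = "OS")

DBI::dbDisconnect(con)
```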
112 | -------------------------------------------------------------------------------- /03-expect-test-functions.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Using the `expect_*()` and `test_*()` Functions" 3 | output: html_document 4 | --- 5 | 6 | ```{r setup, include=FALSE} 7 | knitr::opts_chunk$set(echo = TRUE) 8 | library(pointblank) 9 | library(tidyverse) 10 | ``` 11 | 12 | ## Intro 13 | 14 | Those validation functions used previously with an agent have two sets of variants, taking the forms `expect_*()` and `test_*()`. 15 | 16 | The 'expect' prefix indicates that those functions are to be used in a **testthat** unit testing workflow. The 'test' prefix indicates that those functions produce logical outputs (`TRUE`/`FALSE`), making them suitable for programming. 17 | 18 | ### Using the **pointblank** expectation functions 19 | 20 | The **testthat** package has a collection of functions that begin with `expect_`. The `expect_*()` functions in **pointblank** follow the same convention and can be used in the standard **testthat** workflow (in a `test-*.R` file, inside the `tests/testthat` folder). The big difference here is that instead of testing function outputs, we are testing data tables. 21 | 22 | Say we wanted to test the values in the `c` column of the `small_table` dataset. Let's look at the values first: 23 | 24 | ```{r} 25 | small_table$c 26 | ``` 27 | 28 | Our expectation is that values can be between `0` and `10` and `NA` values are permitted. We can use `expect_col_vals_between()` for that: 29 | 30 | ```{r} 31 | expect_col_vals_between(small_table, columns = c, 0, 10, na_pass = TRUE) 32 | ``` 33 | 34 | When running this, nothing is returned. The default threshold for error is one test unit (can be changed with the `threshold` argument). If there is an error, that is reported in the console.
35 | 36 | ```{r, error=TRUE} 37 | expect_col_vals_between(small_table, columns = c, 0, 7, na_pass = TRUE) 38 | ``` 39 | 40 | There are 36 `expect_*()` functions, which can feel overwhelming at first. If you wanted to test your dataset in the **testthat** framework, a nice beginning approach might be to take the dataset and do two things in sequence: 41 | 42 | - use the `draft_validation()` function to generate a validation plan with the dataset as the primary input 43 | - use the `write_testthat_file()` function to create a **testthat** .R file using the agent from the `draft_validation()` file 44 | 45 | Let's use the `game_revenue` dataset from the **pointblank** package in this two-step workflow. 46 | 47 | ```{r eval=FALSE} 48 | draft_validation( 49 | tbl = ~ pointblank::game_revenue, 50 | filename = "game_revenue-validation" 51 | ) 52 | ``` 53 | 54 | Going into the `"game_revenue-validation.R"` file, the following line was added to the bottom: 55 | 56 | ```{r eval=FALSE} 57 | write_testthat_file(agent = agent, name = "game_revenue", path = ".") 58 | ``` 59 | 60 | Then the entire file was executed, creating the `"test-game_revenue.R"` file. This can be run using the 'Run Tests' button. 61 | 62 | ### Using the **pointblank** test functions 63 | 64 | The collection of `test_*()` functions, 36 of them as well, is used to give us a single `TRUE` or `FALSE`. 65 | 66 | Say we wanted a script to error if there are `NA` values in the `date_time` column of the `small_table` dataset. 
We could write this: 67 | 68 | ```{r} 69 | if (!test_col_vals_not_null(small_table, columns = date_time)) { 70 | stop("There should not be any `NA` values in the `date_time` column.") 71 | } 72 | ``` 73 | 74 | This one does result in an error: 75 | 76 | ```{r error=TRUE} 77 | if ( 78 | !test_col_vals_increasing(small_table, date, allow_stationary = TRUE) || 79 | !test_col_vals_gt(small_table, a, 1) 80 | ) { 81 | stop("There are problems with `small_table`.") 82 | } 83 | ``` 84 | 85 | ------ 86 | 87 | ### SUMMARY 88 | 89 | 1. You can validate tabular data in a **testthat** workflow with the `expect_*()` functions. 90 | 2. The `test_*()` collection of functions can be useful for developing conditional logic in programming contexts. 91 | -------------------------------------------------------------------------------- /04-scaling-up-data-validation.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Scaling up Data Validation with the Multiagent" 3 | output: html_document 4 | --- 5 | 6 | ```{r setup, include=FALSE} 7 | knitr::opts_chunk$set(echo = TRUE) 8 | library(pointblank) 9 | library(tidyverse) 10 | ``` 11 | 12 | ## Intro 13 | 14 | If your data quality process involves frequent validation runs and a large variety of tables to validate, you can take advantage of functions available in **pointblank** to make all of that manageable. 15 | 16 | ### Writing agents to disk 17 | 18 | The `x_write_disk()` function lets you write a **pointblank** agent to disk. This is very useful if you want to continually save the interrogation results as part of a larger data quality process. Here's an example where the agent is saved to disk with the date as part of the filename. 
19 | 20 | ```{r eval=FALSE} 21 | 22 | # Create the agent, develop a validation plan, interrogate 23 | agent <- 24 | create_agent( 25 | tbl = ~ small_table, 26 | tbl_name = "small_table", 27 | label = "Daily check of `small_table`.", 28 | actions = action_levels( 29 | warn_at = 0.10, 30 | stop_at = 0.25, 31 | notify_at = 0.35 32 | ) 33 | ) %>% 34 | col_exists(columns = vars(date, date_time)) %>% 35 | col_vals_regex( 36 | columns = b, 37 | regex = "[0-9]-[a-z]{3}-[0-9]{3}" 38 | ) %>% 39 | rows_distinct() %>% 40 | col_vals_gt(columns = d, value = 100) %>% 41 | col_vals_lte(columns = c, value = 5) %>% 42 | interrogate() 43 | 44 | # Save the agent to disk with `x_write_disk()`; append the date 45 | x_write_disk( 46 | agent, 47 | filename = affix_date("agent-small_table"), 48 | path = "small_table_tests" 49 | ) 50 | ``` 51 | 52 | ### Reading agents from disk 53 | 54 | We have this on disk as `"small_table_tests/agent-small_table_2022-10-13"`. We can read this from disk using the `x_read_disk()` function (it recreates the object). 55 | 56 | ```{r} 57 | agent_2022_10_13 <- 58 | x_read_disk( 59 | filename = "agent-small_table_2022-10-13", 60 | path = "small_table_tests" 61 | ) 62 | ``` 63 | 64 | We can get the data validation report from it. 65 | 66 | ```{r} 67 | agent_2022_10_13 68 | ``` 69 | 70 | ### Creating a 'multiagent' to get a combined data validation report 71 | 72 | A common task might be to see how data quality is changing over time. If you have multiple saved agents that check the same table, we can make a combined validation report that shows all of those validations. 73 | 74 | We actually have five saved agents in the `"small_table_tests"` directory: 75 | 76 | - `"agent-small_table_2022-10-13"` 77 | - `"agent-small_table_2022-10-14"` 78 | - `"agent-small_table_2022-10-15"` 79 | - `"agent-small_table_2022-10-16"` 80 | - `"agent-small_table_2022-10-17"` 81 | 82 | Let's get them all into a single report. 
We do this by generating a `multiagent`; that object has its own `get_multiagent_report()` function for customizing the layout and content of the report. 83 | 84 | ```{r} 85 | multiagent <- 86 | create_multiagent( 87 | x_read_disk("small_table_tests/agent-small_table_2022-10-13"), 88 | x_read_disk("small_table_tests/agent-small_table_2022-10-14"), 89 | x_read_disk("small_table_tests/agent-small_table_2022-10-15"), 90 | x_read_disk("small_table_tests/agent-small_table_2022-10-16"), 91 | x_read_disk("small_table_tests/agent-small_table_2022-10-17") 92 | ) 93 | ``` 94 | 95 | We can get a combined data validation report from it. By default, all validation reports are stacked together in the `"long"` display mode. 96 | 97 | ```{r} 98 | multiagent 99 | ``` 100 | 101 | With `get_multiagent_report()` we can customize the reporting. Here, we will choose the `"wide"` display mode and provide a custom title. 102 | 103 | ```{r} 104 | get_multiagent_report( 105 | multiagent, 106 | display_mode = "wide", 107 | title = "Wide report from **Multiple** Table Validations" 108 | ) 109 | ``` 110 | 111 | ------ 112 | 113 | ### SUMMARY 114 | 115 | 1. We can save agents to disk and read them back. This is good for keeping records of data quality, and all data/reporting is preserved. 116 | 2. Multiple agents can be combined, generating specialized reports that can show the validations of multiple tables (long display) or the evolution of data quality for a single table (wide display). 
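As a recap, the whole record-keeping cycle can be sketched in one script (assuming the `small_table_tests` directory used above; gathering the saved agents with `lapply()`/`do.call()` is just one convenient pattern):

```r
library(pointblank)

# Interrogate today's data and save the agent with a date-stamped filename
agent <-
  create_agent(tbl = ~ small_table, tbl_name = "small_table") %>%
  rows_distinct() %>%
  col_vals_gt(columns = d, value = 100) %>%
  interrogate()

x_write_disk(
  agent,
  filename = affix_date("agent-small_table"),
  path = "small_table_tests"
)

# Later: read every saved agent back and combine them into a multiagent
agents <-
  lapply(list.files("small_table_tests", full.names = TRUE), x_read_disk)

multiagent <- do.call(create_multiagent, agents)

# The wide display mode shows data quality evolving over time
get_multiagent_report(multiagent, display_mode = "wide")
```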
117 | -------------------------------------------------------------------------------- /05-intro-to-data-documentation.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Introduction to Data Documentation" 3 | output: html_document 4 | --- 5 | 6 | ```{r setup, include=FALSE} 7 | knitr::opts_chunk$set(echo = TRUE) 8 | library(pointblank) 9 | library(tidyverse) 10 | ``` 11 | 12 | ## Intro 13 | 14 | Documenting our datasets is a good thing to do often. We can do this in **pointblank** through the use of several functions that let us define portions of information about a table. This 'info text' can pertain to individual columns, the table as a whole, and whatever additional information makes sense for your organization. 15 | 16 | ### A simple example using `small_table` 17 | 18 | Let's document the `small_table` dataset that's available in **pointblank**. Here's the table once again: 19 | 20 | ```{r paged.print=FALSE} 21 | pointblank::small_table 22 | ``` 23 | 24 | To start the process, the `create_informant()` function is used. This creates an 'informant' object that is quite a bit different from the 'agent' object. 25 | 26 | ```{r} 27 | 28 | # Create the informant 29 | informant <- 30 | create_informant( 31 | tbl = small_table, 32 | tbl_name = "small_table", 33 | label = "Metadata for the `small_table` dataset." 34 | ) 35 | 36 | # Print to get the information report for the table 37 | informant 38 | ``` 39 | 40 | Printing `informant` shows us the automatically generated information on the `small_table` dataset, including the *COLUMNS* section. 41 | 42 | What we get in the initial report is very basic. 
Next, we ought to add information with the following set of `info_*()` functions: 43 | 44 | - `info_tabular()`: Add info pertaining to the data table as a whole 45 | - `info_columns()`: Add info for each table column 46 | - `info_section()`: Add a section that provides ancillary information 47 | 48 | Let's try adding some information with each of these functions and then look at the resulting report. 49 | 50 | ```{r} 51 | informant <- 52 | create_informant( 53 | tbl = small_table, 54 | tbl_name = "small_table", 55 | label = "Example No. 2" 56 | ) %>% 57 | info_tabular( 58 | description = "This table is included in the **pointblank** pkg." 59 | ) %>% 60 | info_columns( 61 | columns = "date_time", 62 | info = "This column is full of timestamps." 63 | ) %>% 64 | info_section( 65 | section_name = "further information", 66 | `examples and documentation` = "Examples for how to use the `info_*()` functions 67 | (and many more) are available at the 68 | [**pointblank** site](https://rich-iannone.github.io/pointblank/)." 69 | ) 70 | 71 | informant 72 | ``` 73 | 74 | As can be seen, the report is a bit more filled out with information. The *TABLE* and *COLUMNS* sections are in their prescribed order and the new section we named *FURTHER INFORMATION* follows those (and it has one subsection called *EXAMPLES AND DOCUMENTATION*). Let's explore how each of the three different `info_*()` functions works. 75 | 76 | ### The *TABLE* section and `info_tabular()` 77 | 78 | The `info_tabular()` function adds information to the *TABLE* section. We use named arguments to define subsection names and their content. In the previous example 79 | 80 | ```r 81 | info_tabular(description = "This table is included in the **pointblank** pkg.") 82 | ``` 83 | 84 | was used to make the *DESCRIPTION* subsection (all section titles are automatically capitalized). We can define as many subsections in the *TABLE* section as we need, either in the same `info_tabular()` call or across multiple calls. 
85 | 86 | ```{r} 87 | informant %>% 88 | info_tabular(Updates = "This table is not regularly updated.") 89 | ``` 90 | 91 | The *TABLE* section is a great place to put all the information about the table that needs to be front and center. Examples of some useful topics for this section might include: 92 | 93 | - a high-level summary of the table, stating its purpose and importance 94 | - what each row of the table represents 95 | - the main users of the table within an organization 96 | - a description of how the table is generated 97 | - information on the frequency of updates 98 | 99 | ### The *COLUMNS* section and `info_columns()` 100 | 101 | The section that follows the *TABLE* section is *COLUMNS*. This section provides an opportunity to describe each table column in as much detail as necessary. Here, individual columns serve as subsections (automatically generated upon using `create_informant()`) and there can be subsections within each column as well. 102 | 103 | The interesting thing about the information provided here via `info_columns()` is that it's additive. We can make multiple calls of `info_columns()`, disperse common pieces of info text across multiple columns, and append the text to any existing info text. 104 | 105 | Let's use the `palmerpenguins::penguins` dataset and fill in information for each column by adapting documentation from the **palmerpenguins** package. 106 | 107 | ```{r} 108 | informant_pp <- 109 | create_informant( 110 | tbl = palmerpenguins::penguins, 111 | tbl_name = "penguins", 112 | label = "The `penguins` dataset from the **palmerpenguins** pkg." 113 | ) %>% 114 | info_columns( 115 | columns = "species", 116 | info = "A factor denoting penguin species (*Adélie*, *Chinstrap*, and *Gentoo*)." 117 | ) %>% 118 | info_columns( 119 | columns = "island", 120 | info = "A factor denoting island in Palmer Archipelago, Antarctica 121 | (*Biscoe*, *Dream*, or *Torgersen*). 
122 | ) %>% 123 | info_columns( 124 | columns = "bill_length_mm", 125 | info = "A number denoting bill length" 126 | ) %>% 127 | info_columns( 128 | columns = "bill_depth_mm", 129 | info = "A number denoting bill depth" 130 | ) %>% 131 | info_columns( 132 | columns = "flipper_length_mm", 133 | info = "An integer denoting flipper length" 134 | ) %>% 135 | info_columns( 136 | columns = ends_with("mm"), 137 | info = "(in units of millimeters)." 138 | ) %>% 139 | info_columns( 140 | columns = "body_mass_g", 141 | info = "An integer denoting body mass (grams)." 142 | ) %>% 143 | info_columns( 144 | columns = "sex", 145 | info = "A factor denoting penguin sex (`\"female\"`, `\"male\"`)." 146 | ) %>% 147 | info_columns( 148 | columns = "year", 149 | info = "The study year (e.g., `2007`, `2008`, `2009`)." 150 | ) 151 | 152 | informant_pp 153 | ``` 154 | 155 | We can use **tidyselect** functions like `ends_with()` to append info text to a common subsection that exists across multiple columns. This was useful for stating the units which were common across three columns: `bill_length_mm`, `bill_depth_mm`, and `flipper_length_mm`. The following **tidyselect** functions are available in pointblank to make this process easier: 156 | 157 | - `starts_with()`: Match columns that start with a prefix. 158 | - `ends_with()`: Match columns that end with a suffix. 159 | - `contains()`: Match columns that contain a literal string. 160 | - `matches()`: Perform matching with a regular expression. 161 | - `everything()`: Select all columns. 162 | 163 | ------ 164 | 165 | ### Creating extra sections with `info_section()` 166 | 167 | Any information that doesn't fit in the *TABLE* or *COLUMNS* sections can be placed in extra sections with `info_section()`. These sections go at the bottom (in the order of creation). Let’s include a *SOURCE* section that provides references and a note on the data. 
168 | 169 | ```{r} 170 | informant_pp <- 171 | informant_pp %>% 172 | info_section( 173 | section_name = "source", 174 | References = c( 175 | 176 | "Adélie penguins: Palmer Station Antarctica LTER and K. Gorman. 2020. Structural 177 | size measurements and isotopic signatures of foraging among adult male and female 178 | Adélie penguins (Pygoscelis adeliae) nesting along the Palmer Archipelago near 179 | Palmer Station, 2007-2009 ver 5. Environmental Data Initiative 180 | ", 181 | 182 | "Gentoo penguins: Palmer Station Antarctica LTER and K. Gorman. 2020. Structural 183 | size measurements and isotopic signatures of foraging among adult male and female 184 | Gentoo penguin (Pygoscelis papua) nesting along the Palmer Archipelago near Palmer 185 | Station, 2007-2009 ver 5. Environmental Data Initiative 186 | ", 187 | 188 | "Chinstrap penguins: Palmer Station Antarctica LTER and K. Gorman. 2020. 189 | Structural size measurements and isotopic signatures of foraging among adult male 190 | and female Chinstrap penguin (Pygoscelis antarcticus) nesting along the Palmer 191 | Archipelago near Palmer Station, 2007-2009 ver 6. Environmental Data Initiative 192 | " 193 | ), 194 | Note = 195 | "Originally published in: Gorman KB, Williams TD, Fraser WR (2014) Ecological Sexual 196 | Dimorphism and Environmental Variability within a Community of Antarctic Penguins 197 | (Genus Pygoscelis). PLoS ONE 9(3): e90081. doi:10.1371/journal.pone.0090081 198 | " 199 | ) 200 | 201 | informant_pp 202 | ``` 203 | 204 | What other types of information go well in these separate sections? Some ideas are: 205 | 206 | - any info related to the source of the data table (e.g., references, background, etc.) 
207 | - definitions/explanations of terms used above 208 | - persons responsible for the data table, perhaps with contact information 209 | - further details on how the table is produced 210 | - any important issues with the table and notes on upcoming changes 211 | - links to other information artifacts that pertain to the table 212 | - report generation metadata, which might include things like the update history, persons responsible, instructions on how to contribute, etc. 213 | 214 | ### Customizing the information report with `get_informant_report()` 215 | 216 | With `get_informant_report()`, it's possible to alter the title of the information report, change the width of the table, and more. Let's make the report slightly narrower (at `600px`) and set the title to the name of the table. 217 | 218 | ```{r} 219 | informant_report <- 220 | informant_pp %>% 221 | get_informant_report(size = "600px", title = ":tbl_name:") 222 | 223 | informant_report 224 | ``` 225 | 226 | Given that this report looks really good, it can be published in a variety of ways (e.g., Connect, Quarto Pub), and you can export it to a standalone HTML file with `export_report()`. 227 | 228 | ```{r eval=FALSE} 229 | export_report(informant_report, filename = "informant-penguins.html") 230 | ``` 231 | 232 | ### SUMMARY 233 | 234 | 1. Begin the process of documenting a dataset with `create_informant()`. 235 | 2. Use `info_tabular()` to describe the table in general terms. 236 | 3. With `info_columns()`, you can document each column in a dataset. 237 | 4. Arbitrary sections of additional information can be added with `info_section()`. 238 | 5. We can control the look and feel of the information report with `get_informant_report()`. 239 | 6. It's possible to export the informant to standalone HTML with `export_report()`. 
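The six steps above can be strung together as a single pipeline (a sketch — the info text and the output filename here are placeholders):

```r
library(pointblank)

# Create the informant and add info text for the table, a column,
# and one extra section
informant <-
  create_informant(
    tbl = small_table,
    tbl_name = "small_table",
    label = "Metadata for the `small_table` dataset."
  ) %>%
  info_tabular(description = "A small demo table from **pointblank**.") %>%
  info_columns(columns = "date_time", info = "A timestamp for each record.") %>%
  info_section(section_name = "notes", origin = "Included with the package.")

# Customize the report, then export it as standalone HTML
report <- get_informant_report(informant, title = ":tbl_name:")
export_report(report, filename = "informant-small_table.html")
```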
240 | -------------------------------------------------------------------------------- /06-getting-deeper-into-documenting-data.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Getting Deeper Into Documenting Data" 3 | output: html_document 4 | --- 5 | 6 | ```{r setup, include=FALSE} 7 | knitr::opts_chunk$set(echo = TRUE) 8 | library(pointblank) 9 | library(tidyverse) 10 | ``` 11 | 12 | ## Intro 13 | 14 | We now know how to make a useful data dictionary that can be published and widely shared. We used a **pointblank** `informant` with a set of information functions to generate *info text* and put that text into the appropriate report sections. We're going to take this a few steps further, look into some more functionality that makes *info text* more dynamic, and also include a finalizing step in this workflow that accounts for evolving data. 15 | 16 | ### Creating snippets of useful text with `info_snippet()` 17 | 18 | A great source of information about the table can be the table itself. Suppose you want to show: 19 | 20 | - some categorical values from a particular column 21 | - a range of values in an important numeric column 22 | - KPI values that can be calculated using data in the table 23 | 24 | This can all be done with the `info_snippet()` function. You give the snippet a name and you give it a function call. Let's do this for the `small_table` dataset available in **pointblank**. 
Again, this is what that table looks like: 25 | 26 | ```{r paged.print=FALSE} 27 | pointblank::small_table 28 | ``` 29 | 30 | If you wanted the mean value of data in column `d` rounded to one decimal place, one way we could do it is with this expression: 31 | 32 | ```{r} 33 | small_table %>% .$d %>% mean() %>% round(1) 34 | ``` 35 | 36 | Inside of an `info_snippet()` call, which is used after creating the informant object, the expression would look like this: 37 | 38 | ```{r} 39 | informant <- 40 | create_informant( 41 | tbl = small_table, 42 | tbl_name = "small_table", 43 | label = "Metadata for the `small_table` dataset." 44 | ) %>% 45 | info_snippet( 46 | snippet_name = "mean_d", 47 | fn = ~ . %>% .$d %>% mean() %>% round(1) 48 | ) 49 | ``` 50 | 51 | The `small_table` dataset is associated with the `informant` as the target table, so it's represented as the leading `.` in the functional sequence given to `fn` inside of `info_snippet()`. It's important to note that there's a leading `~`, making this expression a formula (i.e., we don't want to execute anything here, at this time). 52 | 53 | Lastly, the snippet has been given the name `"mean_d"`. We know that this snippet will produce the value `2304.7`, so what can we do with that? We should put that value into some info text and use the `snippet_name` as the key. It works similarly to how the **glue** package does text interpolation, and here's the continuation of the above example: 54 | 55 | ```{r} 56 | informant <- 57 | informant %>% 58 | info_columns( 59 | columns = vars(d), 60 | info = "This column contains fairly large numbers (much larger than 61 | those numbers in column `a`). The mean value is {mean_d}, which is 62 | far greater than any number in that other column." 63 | ) 64 | ``` 65 | 66 | Within the text, there's the use of curly braces and the name of the snippet (`{mean_d}`). That's where the `2304.7` value will be inserted. 
This methodology for inserting the computed values of snippets can be used wherever info text is provided (in any of the `info_tabular()`, `info_columns()`, and `info_section()` functions). 67 | 68 | There's one last step. We have to finalize everything with the `incorporate()` function. Using this instructs **pointblank** to query the data (this is similar to using `interrogate()` when doing data validation). 69 | 70 | Let's write the whole thing again and finish it off with a call to `incorporate()`. 71 | 72 | ```{r} 73 | informant <- 74 | create_informant( 75 | tbl = small_table, 76 | tbl_name = "small_table", 77 | label = "Metadata for the `small_table` dataset." 78 | ) %>% 79 | info_snippet( 80 | snippet_name = "mean_d", 81 | fn = ~ . %>% .$d %>% mean() %>% round(1) 82 | ) %>% 83 | info_columns( 84 | columns = vars(d), 85 | info = "This column contains fairly large numbers (much larger than 86 | those numbers in column `a`). The mean value is {mean_d}, which is 87 | far greater than any number in that other column." 88 | ) %>% 89 | incorporate() 90 | ``` 91 | 92 | Now let's print the `informant` to get the information report for the table. 93 | 94 | ```{r} 95 | informant 96 | ``` 97 | 98 | ### Using `snip_*()` functions with `info_snippet()` 99 | 100 | There are a few functions available in **pointblank** that make it much easier to get commonly used text snippets. All of them begin with the `snip_` prefix and they are: 101 | 102 | - `snip_list()`: Get a list of column categories 103 | - `snip_lowest()`: Get the lowest value from a column 104 | - `snip_highest()`: Get the highest value from a column 105 | - `snip_stats()`: Get an inline statistical summary 106 | 107 | Each of these functions can be used directly as a `fn` value in `info_snippet()` and we don't have to specify the table since it's assumed that the target table is where we'll be snipping data from. Let's have a look at each of these in action. 
108 | 109 | #### `snip_list()` 110 | 111 | When describing some aspect of the target table, we may want to extract some values from a column and include them as a piece of info text. This can be efficiently done with `snip_list()`. 112 | 113 | ```{r} 114 | informant_pp <- 115 | create_informant( 116 | tbl = ~ palmerpenguins::penguins, 117 | tbl_name = "penguins", 118 | label = "The `penguins` dataset from the **palmerpenguins** pkg." 119 | ) %>% 120 | info_snippet( 121 | snippet_name = "species_snippet", 122 | fn = snip_list(column = "species") 123 | ) %>% 124 | info_snippet( 125 | snippet_name = "island_snippet", 126 | fn = snip_list(column = "island") 127 | ) %>% 128 | info_columns( 129 | columns = "species", 130 | info = "A factor denoting penguin species ({species_snippet})." 131 | ) %>% 132 | info_columns( 133 | columns = "island", 134 | info = "A factor denoting island in Palmer Archipelago, Antarctica 135 | ({island_snippet})." 136 | ) %>% 137 | incorporate() 138 | ``` 139 | 140 | ```{r} 141 | informant_pp 142 | ``` 143 | 144 | This also works for numeric values. Let’s use `snip_list()` to provide a text snippet based on values in the `year` column (which is an `integer` column): 145 | 146 | ```{r} 147 | informant_pp <- 148 | informant_pp %>% 149 | info_columns( 150 | columns = "year", 151 | info = "The study year ({year_snippet})." 152 | ) %>% 153 | info_snippet( 154 | snippet_name = "year_snippet", 155 | fn = snip_list(column = "year") 156 | ) %>% 157 | incorporate() 158 | ``` 159 | 160 | ```{r} 161 | informant_pp 162 | ``` 163 | 164 | #### `snip_lowest()` and `snip_highest()` 165 | 166 | We can get the lowest and highest values from a column and inject those formatted values into some info_text. Let’s do that for some of the measured values in the penguins dataset with `snip_lowest()` and `snip_highest()`. 
167 | 168 | ```{r} 169 | informant_pp <- 170 | informant_pp %>% 171 | info_columns( 172 | columns = "bill_length_mm", 173 | info = "A number denoting bill length" 174 | ) %>% 175 | info_columns( 176 | columns = "bill_depth_mm", 177 | info = "A number denoting bill depth (in the range of 178 | {min_depth} to {max_depth} millimeters)." 179 | ) %>% 180 | info_columns( 181 | columns = "flipper_length_mm", 182 | info = "An integer denoting flipper length" 183 | ) %>% 184 | info_columns( 185 | columns = matches("length"), 186 | info = "(in units of millimeters)." 187 | ) %>% 188 | info_columns( 189 | columns = "flipper_length_mm", 190 | info = "Largest observed is {largest_flipper_length} mm." 191 | ) %>% 192 | info_snippet( 193 | snippet_name = "min_depth", 194 | fn = snip_lowest(column = "bill_depth_mm") 195 | ) %>% 196 | info_snippet( 197 | snippet_name = "max_depth", 198 | fn = snip_highest(column = "bill_depth_mm") 199 | ) %>% 200 | info_snippet( 201 | snippet_name = "largest_flipper_length", 202 | fn = snip_highest(column = "flipper_length_mm") 203 | ) %>% 204 | incorporate() 205 | ``` 206 | 207 | ```{r} 208 | informant_pp 209 | ``` 210 | 211 | We can see from the report output that we can creatively use the lowest and highest values obtained by `snip_lowest()` and `snip_highest()` to specify a range or simply show some maximum value. 212 | 213 | Note that while the ordering of the `info_columns()` calls in the example affects the overall layout of the text (through the text appending behavior), the placement of `info_snippet()` calls does not matter. And, again, we must use `incorporate()` to update all of the text snippets and render them in their appropriate locations. 214 | 215 | ### Enhancements to text: *Text Tricks* 216 | 217 | You can use Markdown but there are a few extra tricks that can make the resulting text even better; we call them *Text Tricks*. Once you know about these text tricks you’ll be able to express information in many more interesting ways. 
218 | 219 | #### Links and Dates 220 | 221 | If you have links in your text, **pointblank** will try to identify them and style them nicely. This amounts to using a pleasing, light-blue color and underlines that appear on hover. It doesn't take much to style links but it does require *something*. So, Markdown links written as `< link url >` or `[ link text ]( link url )` will both get the transformation treatment. 222 | 223 | Sometimes you want dates to stand out from text. Try enclosing a date expressed in the ISO-8601 standard with parentheses, like this: `(2004-12-01)`. 224 | 225 | Here's how we might use these features while otherwise adding more information to the **palmerpenguins** reporting: 226 | 227 | ```{r} 228 | informant_pp <- 229 | informant_pp %>% 230 | info_tabular( 231 | `R dataset` = "The goal of `palmerpenguins` is to provide a great dataset 232 | for data exploration & visualization, as an alternative to `iris`. The 233 | latest CRAN release was published on (2020-07-25).", 234 | `data collection` = "Data were collected and made available by Dr. Kristen 235 | Gorman and the [Palmer Station, Antarctica LTER](https://pal.lternet.edu), 236 | a member of the [Long Term Ecological Research Network](https://lternet.edu).", 237 | citation = "Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer 238 | Archipelago (Antarctica) penguin data. R package version 0.1.0. 239 | 240 | doi: 10.5281/zenodo.3960218." 241 | ) %>% 242 | incorporate() 243 | ``` 244 | 245 | ```{r} 246 | informant_pp 247 | ``` 248 | 249 | #### Labels 250 | 251 | We can take portions of text and present them as labels. These will help you call out important attributes in short form and may eliminate the need for oft-repeated statements. You might apply labels to signify priority, category, or any other information you find useful. To do this we have two options: 252 | 253 | 1. Use double parentheses for a rectangular label: `((label text))` 254 | 2. 
Use triple parens for a rounded-rectangular label: `(((label text)))` 255 | 256 | ```{r} 257 | informant_pp <- 258 | informant_pp %>% 259 | info_columns( 260 | columns = vars(body_mass_g), 261 | info = "An integer denoting body mass." 262 | ) %>% 263 | info_columns( 264 | columns = c(ends_with("mm"), ends_with("g")), 265 | info = "((measured))" 266 | ) %>% 267 | info_section( 268 | section_name = "additional notes", 269 | `data types` = "(((factor))) (((numeric))) (((integer)))" 270 | ) %>% 271 | incorporate() 272 | ``` 273 | 274 | ```{r} 275 | informant_pp 276 | ``` 277 | 278 | #### Styled text 279 | 280 | If you want to use CSS styles on spans of info text, it's possible with the following construction: 281 | 282 | `[[ info text ]]<< CSS style rules >>` 283 | 284 | It's important to ensure that each CSS rule is concluded with a `;` character in this syntax. Styling the word `factor` inside a piece of *info text* might look like this: 285 | 286 | `This is a [[factor]]<<color: red;>> value.` 287 | 288 | There are many CSS style rules that can be used. Here's a sample of a few useful ones: 289 | 290 | - `color: <a color value>;` (text color) 291 | - `background-color: <a color value>;` (the text's background color) 292 | - `text-decoration: (overline | line-through | underline);` 293 | - `text-transform: (uppercase | lowercase | capitalize);` 294 | - `letter-spacing: <a length value>;` 295 | - `word-spacing: <a length value>;` 296 | - `font-style: (normal | italic | oblique);` 297 | - `font-weight: (normal | bold | 100-900);` 298 | - `font-variant: (normal | small-caps);` 299 | - `border: (solid | dashed | dotted);` 300 | 301 | Continuing with our palmerpenguins reporting, we'll add some more info text and take the opportunity to add CSS style rules using the `[[ ]]<< >>` syntax. 302 | 303 | ```{r} 304 | informant_pp <- 305 | informant_pp %>% 306 | info_columns( 307 | columns = vars(sex), 308 | info = "A [[factor]]<<text-decoration: underline;>> 309 | denoting penguin sex (female or male)." 
310 | ) %>% 311 | info_section( 312 | section_name = "additional notes", 313 | keywords = " 314 | [[((penguins))]]<<border-color: steelblue; background-color: aliceblue;>> 315 | [[((Antarctica))]]<<border-color: seagreen; background-color: honeydew;>> 316 | [[((measurements))]]<<border-color: firebrick; background-color: mistyrose;>> 317 | " 318 | ) %>% 319 | incorporate() 320 | ``` 321 | 322 | ```{r} 323 | informant_pp 324 | ``` 325 | 326 | With the above `info_columns()` and `info_section()` function calls, we are able to style a single word (with an underline) and even style labels (changing the border and background colors). The syntax here is somewhat forgiving, allowing you to put line breaks between `]]` and `<<` and between style rules so that lines of markup don't have to be overly long. 327 | 328 | ### SUMMARY 329 | 330 | 1. We can query the table being documented with an expression inside `info_snippet()`. This allows us to inject the expression output into *info text*. 331 | 2. There are several `snip_*()` functions included in **pointblank** that handle common use cases. They are used like this: `info_snippet(fn = snip_*(...))`. 332 | 3. We can create label-like text with `(( ))` or `((( )))`. 333 | 4. We can style text with `[[ ]]<< >>`. 334 | 5. Links will be styled automatically if you use Markdown links; dates in ISO 8601 notation can be autostyled if enclosed in parentheses. 335 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Creative Commons Legal Code 2 | 3 | CC0 1.0 Universal 4 | 5 | CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE 6 | LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN 7 | ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS 8 | INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES 9 | REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS 10 | PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM 11 | THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED 12 | HEREUNDER. 
13 | 14 | Statement of Purpose 15 | 16 | The laws of most jurisdictions throughout the world automatically confer 17 | exclusive Copyright and Related Rights (defined below) upon the creator 18 | and subsequent owner(s) (each and all, an "owner") of an original work of 19 | authorship and/or a database (each, a "Work"). 20 | 21 | Certain owners wish to permanently relinquish those rights to a Work for 22 | the purpose of contributing to a commons of creative, cultural and 23 | scientific works ("Commons") that the public can reliably and without fear 24 | of later claims of infringement build upon, modify, incorporate in other 25 | works, reuse and redistribute as freely as possible in any form whatsoever 26 | and for any purposes, including without limitation commercial purposes. 27 | These owners may contribute to the Commons to promote the ideal of a free 28 | culture and the further production of creative, cultural and scientific 29 | works, or to gain reputation or greater distribution for their Work in 30 | part through the use and efforts of others. 31 | 32 | For these and/or other purposes and motivations, and without any 33 | expectation of additional consideration or compensation, the person 34 | associating CC0 with a Work (the "Affirmer"), to the extent that he or she 35 | is an owner of Copyright and Related Rights in the Work, voluntarily 36 | elects to apply CC0 to the Work and publicly distribute the Work under its 37 | terms, with knowledge of his or her Copyright and Related Rights in the 38 | Work and the meaning and intended legal effect of CC0 on those rights. 39 | 40 | 1. Copyright and Related Rights. A Work made available under CC0 may be 41 | protected by copyright and related or neighboring rights ("Copyright and 42 | Related Rights"). Copyright and Related Rights include, but are not 43 | limited to, the following: 44 | 45 | i. the right to reproduce, adapt, distribute, perform, display, 46 | communicate, and translate a Work; 47 | ii. 
moral rights retained by the original author(s) and/or performer(s); 48 | iii. publicity and privacy rights pertaining to a person's image or 49 | likeness depicted in a Work; 50 | iv. rights protecting against unfair competition in regards to a Work, 51 | subject to the limitations in paragraph 4(a), below; 52 | v. rights protecting the extraction, dissemination, use and reuse of data 53 | in a Work; 54 | vi. database rights (such as those arising under Directive 96/9/EC of the 55 | European Parliament and of the Council of 11 March 1996 on the legal 56 | protection of databases, and under any national implementation 57 | thereof, including any amended or successor version of such 58 | directive); and 59 | vii. other similar, equivalent or corresponding rights throughout the 60 | world based on applicable law or treaty, and any national 61 | implementations thereof. 62 | 63 | 2. Waiver. To the greatest extent permitted by, but not in contravention 64 | of, applicable law, Affirmer hereby overtly, fully, permanently, 65 | irrevocably and unconditionally waives, abandons, and surrenders all of 66 | Affirmer's Copyright and Related Rights and associated claims and causes 67 | of action, whether now known or unknown (including existing as well as 68 | future claims and causes of action), in the Work (i) in all territories 69 | worldwide, (ii) for the maximum duration provided by applicable law or 70 | treaty (including future time extensions), (iii) in any current or future 71 | medium and for any number of copies, and (iv) for any purpose whatsoever, 72 | including without limitation commercial, advertising or promotional 73 | purposes (the "Waiver"). 
Affirmer makes the Waiver for the benefit of each 74 | member of the public at large and to the detriment of Affirmer's heirs and 75 | successors, fully intending that such Waiver shall not be subject to 76 | revocation, rescission, cancellation, termination, or any other legal or 77 | equitable action to disrupt the quiet enjoyment of the Work by the public 78 | as contemplated by Affirmer's express Statement of Purpose. 79 | 80 | 3. Public License Fallback. Should any part of the Waiver for any reason 81 | be judged legally invalid or ineffective under applicable law, then the 82 | Waiver shall be preserved to the maximum extent permitted taking into 83 | account Affirmer's express Statement of Purpose. In addition, to the 84 | extent the Waiver is so judged Affirmer hereby grants to each affected 85 | person a royalty-free, non transferable, non sublicensable, non exclusive, 86 | irrevocable and unconditional license to exercise Affirmer's Copyright and 87 | Related Rights in the Work (i) in all territories worldwide, (ii) for the 88 | maximum duration provided by applicable law or treaty (including future 89 | time extensions), (iii) in any current or future medium and for any number 90 | of copies, and (iv) for any purpose whatsoever, including without 91 | limitation commercial, advertising or promotional purposes (the 92 | "License"). The License shall be deemed effective as of the date CC0 was 93 | applied by Affirmer to the Work. 
Should any part of the License for any 94 | reason be judged legally invalid or ineffective under applicable law, such 95 | partial invalidity or ineffectiveness shall not invalidate the remainder 96 | of the License, and in such case Affirmer hereby affirms that he or she 97 | will not (i) exercise any of his or her remaining Copyright and Related 98 | Rights in the Work or (ii) assert any associated claims and causes of 99 | action with respect to the Work, in either case contrary to Affirmer's 100 | express Statement of Purpose. 101 | 102 | 4. Limitations and Disclaimers. 103 | 104 | a. No trademark or patent rights held by Affirmer are waived, abandoned, 105 | surrendered, licensed or otherwise affected by this document. 106 | b. Affirmer offers the Work as-is and makes no representations or 107 | warranties of any kind concerning the Work, express, implied, 108 | statutory or otherwise, including without limitation warranties of 109 | title, merchantability, fitness for a particular purpose, non 110 | infringement, or the absence of latent or other defects, accuracy, or 111 | the present or absence of errors, whether or not discoverable, all to 112 | the greatest extent permissible under applicable law. 113 | c. Affirmer disclaims responsibility for clearing rights of other persons 114 | that may apply to the Work or any use thereof, including without 115 | limitation any person's Copyright and Related Rights in the Work. 116 | Further, Affirmer disclaims responsibility for obtaining any necessary 117 | consents, permissions or other rights required for any use of the 118 | Work. 119 | d. Affirmer understands and acknowledges that Creative Commons is not a 120 | party to this document and has no duty or obligation with respect to 121 | this CC0 or use of the Work. 
122 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## The **pointblank** Workshop 2 | 3 | This **pointblank** workshop will teach you *a lot* about what **pointblank** can do, and it'll give you an opportunity to experiment with the package. All materials are also available as a Posit Cloud project, making it easy to get up and running. 4 | 5 | https://posit.cloud/content/4726872 6 | 7 | The goal of the workshop is to introduce you to a lot of examples and provide some time to use the functions of **pointblank** with some sample datasets, learning bit-by-bit as we go. 8 | 9 | Each module of the workshop focuses on a different subset of functions, and they are all presented here as **R Markdown** (.Rmd) files, with one file for each workshop module: 10 | 11 | - `"01-intro-to-data-validation.Rmd"` (The `agent`, validation fns, interrogation/reports) 12 | - `"02-scan-your-data.Rmd"` (Looking at your data with `scan_data()`) 13 | - `"03-expect-test-functions.Rmd"` (Using the `expect_*()` and `test_*()` functions) 14 | - `"04-scaling-up-data-validation.Rmd"` (The `multiagent` and its reporting structures) 15 | - `"05-intro-to-data-documentation.Rmd"` (The `informant` and describing your data) 16 | - `"06-getting-deeper-into-documenting-data.Rmd"` (Using snippets and text tricks) 17 | 18 | You can navigate to any of these and modify the code within the self-contained **R Markdown** code chunks. Entire **R Markdown** files can be knit to HTML, where a separate window will show the rendered document. 19 | 20 | ### Installation 21 | 22 | Install **pointblank** on your system by using `install.packages()`: 23 | 24 | ```{r eval=FALSE} 25 | # install.packages("pointblank") 26 | ``` 27 | 28 | You can optionally use the development version of **pointblank**, installing it from GitHub with `devtools::install_github()`.
29 | 30 | ```{r eval=FALSE} 31 | # devtools::install_github("rich-iannone/pointblank") 32 | ``` 33 | -------------------------------------------------------------------------------- /game_revenue-validation.R: -------------------------------------------------------------------------------- 1 | library(pointblank) 2 | 3 | agent <- 4 | create_agent( 5 | tbl = ~pointblank::game_revenue, 6 | actions = action_levels( 7 | warn_at = 0.05, 8 | stop_at = 0.10 9 | ), 10 | tbl_name = "~pointblank::game_revenue", 11 | label = "Validation plan generated by `draft_validation()`." 12 | ) %>% 13 | # Expect that column `player_id` is of type: character 14 | col_is_character( 15 | columns = vars(player_id) 16 | ) %>% 17 | # Expect that column `session_id` is of type: character 18 | col_is_character( 19 | columns = vars(session_id) 20 | ) %>% 21 | # Expect that column `item_type` is of type: character 22 | col_is_character( 23 | columns = vars(item_type) 24 | ) %>% 25 | # Expect that column `item_name` is of type: character 26 | col_is_character( 27 | columns = vars(item_name) 28 | ) %>% 29 | # Expect that column `item_revenue` is of type: numeric 30 | col_is_numeric( 31 | columns = vars(item_revenue) 32 | ) %>% 33 | # Expect that values in `item_revenue` should be between `0.004` and `142.989` 34 | col_vals_between( 35 | columns = vars(item_revenue), 36 | left = 0.004, 37 | right = 142.989 38 | ) %>% 39 | # Expect that column `session_duration` is of type: numeric 40 | col_is_numeric( 41 | columns = vars(session_duration) 42 | ) %>% 43 | # Expect that values in `session_duration` should be between `3.2` and `41` 44 | col_vals_between( 45 | columns = vars(session_duration), 46 | left = 3.2, 47 | right = 41 48 | ) %>% 49 | # Expect that column `acquisition` is of type: character 50 | col_is_character( 51 | columns = vars(acquisition) 52 | ) %>% 53 | # Expect that column `country` is of type: character 54 | col_is_character( 55 | columns = vars(country) 56 | ) %>% 57 | # Expect that 
values in `country` should be in the set of `Germany`, `Canada`, `South Korea` (and 20 more) 58 | col_vals_in_set( 59 | columns = vars(country), 60 | set = c("Germany", "Canada", "South Korea", "Sweden", "Austria", "Hong Kong", "United States", "Mexico", "Egypt", "Denmark", "Norway", "Japan", "Australia", "South Africa", "Spain", "France", "Portugal", "Russia", "India", "Switzerland", "China", "Philippines", "United Kingdom") 61 | ) %>% 62 | # Expect entirely distinct rows across all columns 63 | rows_distinct() %>% 64 | # Expect that column schemas match 65 | col_schema_match( 66 | schema = col_schema( 67 | player_id = "character", 68 | session_id = "character", 69 | session_start = c("POSIXct", "POSIXt"), 70 | time = c("POSIXct", "POSIXt"), 71 | item_type = "character", 72 | item_name = "character", 73 | item_revenue = "numeric", 74 | session_duration = "numeric", 75 | start_day = "Date", 76 | acquisition = "character", 77 | country = "character" 78 | ) 79 | ) %>% 80 | interrogate() 81 | 82 | agent 83 | 84 | write_testthat_file(agent = agent, name = "game_revenue", path = ".") 85 | -------------------------------------------------------------------------------- /informant-penguins.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
[Rendered HTML report omitted. The file presents the **pointblank** informant report for the `penguins` dataset from the palmerpenguins package: a tibble of 344 rows and 8 columns. It includes INFO descriptions for the `species`, `island`, `bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, `body_mass_g`, `sex`, and `year` columns; REFERENCES for the Adélie, Gentoo, and Chinstrap Palmer Station Antarctica LTER structural-size datasets (Environmental Data Initiative); and a NOTE citing the original publication, Gorman KB, Williams TD, Fraser WR (2014), PLoS ONE 9(3): e90081. Report generated 2022-10-13 15:52:30 EDT.]
587 | 588 | 589 | -------------------------------------------------------------------------------- /pointblank-workshop.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | -------------------------------------------------------------------------------- /save_multiple_agents_to_disk.R: -------------------------------------------------------------------------------- 1 | library(pointblank) 2 | 3 | al <- 4 | action_levels( 5 | warn_at = 0.05, 6 | stop_at = 0.25, 7 | notify_at = 0.35 8 | ) 9 | 10 | agent_1 <- 11 | create_agent( 12 | tbl = ~ small_table, 13 | tbl_name = "small_table", 14 | label = "Daily check of `small_table`.", 15 | actions = al 16 | ) %>% 17 | col_vals_gt(vars(date_time), vars(date), na_pass = TRUE) %>% 18 | col_vals_gt(vars(b), vars(g), na_pass = TRUE) %>% 19 | rows_distinct() %>% 20 | col_vals_gt(vars(d), 100) %>% 21 | col_vals_equal(vars(d), vars(d), na_pass = TRUE) %>% 22 | col_vals_between(vars(c), left = vars(a), right = vars(d), na_pass = TRUE) %>% 23 | col_vals_not_between(vars(c), left = 10, right = 20, na_pass = TRUE) %>% 24 | rows_distinct(vars(d, e, f)) %>% 25 | col_is_integer(vars(a)) %>% 26 | interrogate() 27 | 28 | x_write_disk( 29 | agent_1, 30 | filename = "agent-small_table_2022-10-14", 31 | path = "small_table_tests" 32 | ) 33 | 34 | agent_2 <- 35 | create_agent( 36 | tbl = ~ small_table, 37 | tbl_name = "small_table", 38 | label = "Daily check of `small_table`.", 39 | actions = al 40 | ) %>% 41 | col_exists(vars(date, date_time)) %>% 42 | col_vals_regex( 43 | vars(b), "[0-9]-[a-z]{3}-[0-9]{3}", 44 | active = FALSE 45 | ) %>% 46 | rows_distinct() %>% 47 | interrogate() 48 
| 49 | x_write_disk( 50 | agent_2, 51 | filename = "agent-small_table_2022-10-15", 52 | path = "small_table_tests" 53 | ) 54 | 55 | agent_3 <- 56 | create_agent( 57 | tbl = ~ small_table, 58 | tbl_name = "small_table", 59 | label = "Daily check of `small_table`.", 60 | actions = al 61 | ) %>% 62 | rows_distinct() %>% 63 | col_vals_gt(vars(d), 100) %>% 64 | col_vals_lte(vars(c), 5) %>% 65 | col_vals_equal( 66 | vars(d), vars(d), 67 | na_pass = TRUE 68 | ) %>% 69 | col_vals_in_set( 70 | vars(f), 71 | set = c("low", "mid", "high") 72 | ) %>% 73 | col_vals_between( 74 | vars(c), 75 | left = vars(a), right = vars(d), 76 | na_pass = TRUE 77 | ) %>% 78 | interrogate() 79 | 80 | x_write_disk( 81 | agent_3, 82 | filename = "agent-small_table_2022-10-16", 83 | path = "small_table_tests" 84 | ) 85 | 86 | agent_4 <- 87 | create_agent( 88 | tbl = ~ small_table, 89 | tbl_name = "small_table", 90 | label = "Daily check of `small_table`.", 91 | actions = al 92 | ) %>% 93 | col_vals_gt(vars(date_time), vars(date), na_pass = TRUE) %>% 94 | interrogate() 95 | 96 | x_write_disk( 97 | agent_4, 98 | filename = "agent-small_table_2022-10-17", 99 | path = "small_table_tests" 100 | ) 101 | -------------------------------------------------------------------------------- /small_table_tests/agent-small_table_2022-10-13: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rich-iannone/pointblank-workshop/989c61db0e2915c9b1a39de2b2a137b24d9bf27c/small_table_tests/agent-small_table_2022-10-13 -------------------------------------------------------------------------------- /small_table_tests/agent-small_table_2022-10-14: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rich-iannone/pointblank-workshop/989c61db0e2915c9b1a39de2b2a137b24d9bf27c/small_table_tests/agent-small_table_2022-10-14 -------------------------------------------------------------------------------- 
/small_table_tests/agent-small_table_2022-10-15: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rich-iannone/pointblank-workshop/989c61db0e2915c9b1a39de2b2a137b24d9bf27c/small_table_tests/agent-small_table_2022-10-15 -------------------------------------------------------------------------------- /small_table_tests/agent-small_table_2022-10-16: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rich-iannone/pointblank-workshop/989c61db0e2915c9b1a39de2b2a137b24d9bf27c/small_table_tests/agent-small_table_2022-10-16 -------------------------------------------------------------------------------- /small_table_tests/agent-small_table_2022-10-17: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rich-iannone/pointblank-workshop/989c61db0e2915c9b1a39de2b2a137b24d9bf27c/small_table_tests/agent-small_table_2022-10-17 -------------------------------------------------------------------------------- /storms-validation.R: -------------------------------------------------------------------------------- 1 | library(pointblank) 2 | 3 | agent <- 4 | create_agent( 5 | tbl = ~dplyr::storms, 6 | actions = action_levels( 7 | warn_at = 0.05, 8 | stop_at = 0.10 9 | ), 10 | tbl_name = "storms", 11 | label = "Validation plan generated by `draft_validation()`." 
12 | ) %>% 13 | # Expect that column `name` is of type: character 14 | col_is_character( 15 | columns = vars(name) 16 | ) %>% 17 | # Expect that column `year` is of type: numeric 18 | col_is_numeric( 19 | columns = vars(year) 20 | ) %>% 21 | # Expect that values in `year` should be between `1975` and `2020` 22 | col_vals_between( 23 | columns = vars(year), 24 | left = 1975, 25 | right = 2020 26 | ) %>% 27 | # Expect that column `month` is of type: numeric 28 | col_is_numeric( 29 | columns = vars(month) 30 | ) %>% 31 | # Expect that values in `month` should be between `1` and `12` 32 | col_vals_between( 33 | columns = vars(month), 34 | left = 1, 35 | right = 12 36 | ) %>% 37 | # Expect that column `day` is of type: integer 38 | col_is_integer( 39 | columns = vars(day) 40 | ) %>% 41 | # Expect that values in `day` should be between `1` and `31` 42 | col_vals_between( 43 | columns = vars(day), 44 | left = 1, 45 | right = 31 46 | ) %>% 47 | # Expect that column `hour` is of type: numeric 48 | col_is_numeric( 49 | columns = vars(hour) 50 | ) %>% 51 | # Expect that values in `hour` should be between `0` and `23` 52 | col_vals_between( 53 | columns = vars(hour), 54 | left = 0, 55 | right = 23 56 | ) %>% 57 | # Expect that column `lat` is of type: numeric 58 | col_is_numeric( 59 | columns = vars(lat) 60 | ) %>% 61 | # Expect that values in `lat` should be between `-90` and `90` 62 | col_vals_between( 63 | columns = vars(lat), 64 | left = -90, 65 | right = 90 66 | ) %>% 67 | # Expect that column `long` is of type: numeric 68 | col_is_numeric( 69 | columns = vars(long) 70 | ) %>% 71 | # Expect that values in `long` should be between `-180` and `180` 72 | col_vals_between( 73 | columns = vars(long), 74 | left = -180, 75 | right = 180 76 | ) %>% 77 | # Expect that column `status` is of type: character 78 | col_is_character( 79 | columns = vars(status) 80 | ) %>% 81 | # Expect that column `category` is of type: factor 82 | col_is_factor( 83 | columns = vars(category) 84 | ) %>% 
85 | # Expect that column `wind` is of type: integer 86 | col_is_integer( 87 | columns = vars(wind) 88 | ) %>% 89 | # Expect that values in `wind` should be between `10` and `160` 90 | col_vals_between( 91 | columns = vars(wind), 92 | left = 10, 93 | right = 160 94 | ) %>% 95 | # Expect that column `pressure` is of type: integer 96 | col_is_integer( 97 | columns = vars(pressure) 98 | ) %>% 99 | # Expect that values in `pressure` should be between `882` and `1022` 100 | col_vals_between( 101 | columns = vars(pressure), 102 | left = 882, 103 | right = 1022 104 | ) %>% 105 | # Expect that column `tropicalstorm_force_diameter` is of type: integer 106 | col_is_integer( 107 | columns = vars(tropicalstorm_force_diameter) 108 | ) %>% 109 | # Expect that values in `tropicalstorm_force_diameter` should be between `0` and `870` 110 | col_vals_between( 111 | columns = vars(tropicalstorm_force_diameter), 112 | left = 0, 113 | right = 870, 114 | na_pass = TRUE 115 | ) %>% 116 | # Expect that column `hurricane_force_diameter` is of type: integer 117 | col_is_integer( 118 | columns = vars(hurricane_force_diameter) 119 | ) %>% 120 | # Expect that values in `hurricane_force_diameter` should be between `0` and `300` 121 | col_vals_between( 122 | columns = vars(hurricane_force_diameter), 123 | left = 0, 124 | right = 300, 125 | na_pass = TRUE 126 | ) %>% 127 | # Expect entirely distinct rows across all columns 128 | rows_distinct() %>% 129 | # Expect that column schemas match 130 | col_schema_match( 131 | schema = col_schema( 132 | name = "character", 133 | year = "numeric", 134 | month = "numeric", 135 | day = "integer", 136 | hour = "numeric", 137 | lat = "numeric", 138 | long = "numeric", 139 | status = "character", 140 | category = c("ordered", "factor"), 141 | wind = "integer", 142 | pressure = "integer", 143 | tropicalstorm_force_diameter = "integer", 144 | hurricane_force_diameter = "integer" 145 | ) 146 | ) %>% 147 | interrogate() 148 | 149 | agent 150 | 
-------------------------------------------------------------------------------- /test-game_revenue.R: -------------------------------------------------------------------------------- 1 | # Generated by pointblank 2 | 3 | library(pointblank) 4 | 5 | tbl <- pointblank::game_revenue 6 | 7 | test_that("column `player_id` is of type: character", { 8 | 9 | expect_col_is_character( 10 | tbl, 11 | columns = vars(player_id), 12 | threshold = 1 13 | ) 14 | }) 15 | 16 | test_that("column `session_id` is of type: character", { 17 | 18 | expect_col_is_character( 19 | tbl, 20 | columns = vars(session_id), 21 | threshold = 1 22 | ) 23 | }) 24 | 25 | test_that("column `item_type` is of type: character", { 26 | 27 | expect_col_is_character( 28 | tbl, 29 | columns = vars(item_type), 30 | threshold = 1 31 | ) 32 | }) 33 | 34 | test_that("column `item_name` is of type: character", { 35 | 36 | expect_col_is_character( 37 | tbl, 38 | columns = vars(item_name), 39 | threshold = 1 40 | ) 41 | }) 42 | 43 | test_that("column `item_revenue` is of type: numeric", { 44 | 45 | expect_col_is_numeric( 46 | tbl, 47 | columns = vars(item_revenue), 48 | threshold = 1 49 | ) 50 | }) 51 | 52 | test_that("values in `item_revenue` should be between `0.004` and `142.989`", { 53 | 54 | expect_col_vals_between( 55 | tbl, 56 | columns = vars(item_revenue), 57 | left = 0.004, 58 | right = 142.989, 59 | threshold = 0.1 60 | ) 61 | }) 62 | 63 | test_that("column `session_duration` is of type: numeric", { 64 | 65 | expect_col_is_numeric( 66 | tbl, 67 | columns = vars(session_duration), 68 | threshold = 1 69 | ) 70 | }) 71 | 72 | test_that("values in `session_duration` should be between `3.2` and `41`", { 73 | 74 | expect_col_vals_between( 75 | tbl, 76 | columns = vars(session_duration), 77 | left = 3.2, 78 | right = 41, 79 | threshold = 0.1 80 | ) 81 | }) 82 | 83 | test_that("column `acquisition` is of type: character", { 84 | 85 | expect_col_is_character( 86 | tbl, 87 | columns = vars(acquisition), 88 | 
threshold = 1 89 | ) 90 | }) 91 | 92 | test_that("column `country` is of type: character", { 93 | 94 | expect_col_is_character( 95 | tbl, 96 | columns = vars(country), 97 | threshold = 1 98 | ) 99 | }) 100 | 101 | test_that("values in `country` should be in the set of `Germany`, `Canada`, `South Korea` (and 20 more)", { 102 | 103 | expect_col_vals_in_set( 104 | tbl, 105 | columns = vars(country), 106 | set = c("Germany", "Canada", "South Korea", "Sweden", "Austria", "Hong Kong", "United States", "Mexico", "Egypt", "Denmark", "Norway", "Japan", "Australia", "South Africa", "Spain", "France", "Portugal", "Russia", "India", "Switzerland", "China", "Philippines", "United Kingdom"), 107 | threshold = 0.1 108 | ) 109 | }) 110 | 111 | test_that("entirely distinct rows across all columns", { 112 | 113 | expect_rows_distinct(tbl) 114 | }) 115 | 116 | test_that("column schemas match", { 117 | 118 | expect_col_schema_match( 119 | tbl, 120 | schema = col_schema( 121 | player_id = "character", 122 | session_id = "character", 123 | session_start = c("POSIXct", "POSIXt"), 124 | time = c("POSIXct", "POSIXt"), 125 | item_type = "character", 126 | item_name = "character", 127 | item_revenue = "numeric", 128 | session_duration = "numeric", 129 | start_day = "Date", 130 | acquisition = "character", 131 | country = "character" 132 | ), 133 | threshold = 1 134 | ) 135 | }) 136 | --------------------------------------------------------------------------------
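The `test-game_revenue.R` file above, produced by `write_testthat_file()`, is meant to be run with **testthat**. As a minimal sketch (assuming the file sits in the current working directory and both **pointblank** and **testthat** are installed), it can be executed interactively like this:

```r
# Run the pointblank-generated testthat file interactively.
# Each test_that() block in the file wraps one pointblank expect_*()
# function, so a failing validation step surfaces as an ordinary
# testthat failure in the results.
library(testthat)

test_file("test-game_revenue.R")
```

In a package or project context the same file would typically live under `tests/testthat/` and be picked up automatically by `testthat::test_dir()` or `devtools::test()`.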