├── .github
    ├── CODE-OF-CONDUCT.md
    └── CONTRIBUTING.md
├── .vscode
    └── settings.json
├── README.md
└── images
    ├── banner.png
    └── banner.sh


/.github/CODE-OF-CONDUCT.md:
--------------------------------------------------------------------------------
# Contributor Code of Conduct

As contributors and maintainers of this project, we pledge to respect all people who
contribute through reporting issues, posting feature requests, updating documentation,
submitting pull requests or patches, and other activities.

We are committed to making participation in this project a harassment-free experience for
everyone, regardless of level of experience, gender, gender identity and expression,
sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.

Examples of unacceptable behavior by participants include the use of sexual language or
imagery, derogatory comments or personal attacks, trolling, public or private harassment,
insults, or other unprofessional conduct.

Project maintainers have the right and responsibility to remove, edit, or reject comments,
commits, code, wiki edits, issues, and other contributions that are not aligned to this
Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed
from the project team.

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by
opening an issue or contacting one or more of the project maintainers.

This Code of Conduct is adapted from the Contributor Covenant
(https://contributor-covenant.org), version 1.0.0, available at
https://contributor-covenant.org/version/1/0/0/.

--------------------------------------------------------------------------------
/.github/CONTRIBUTING.md:
--------------------------------------------------------------------------------
# Contributing to r-base-shortcuts

:+1::tada: First off, thanks for taking the time to contribute! :tada::+1:

You can contribute to this project by:

1. Filing a report or request in an issue.
2. Suggesting a change via a pull request.

## Pull requests

To suggest a change via a pull request, please:

1. Fork the repository into your GitHub account.
2. Clone the forked repository to your local machine and make the changes.
3. Commit and push the changes to GitHub, then create a pull request.

## Table of contents

To contribute new shortcuts or edit the existing headings, install the
VSCode extension [Markdown All in One](https://marketplace.visualstudio.com/items?itemName=yzhang.markdown-all-in-one) first.
It helps update the table of contents automatically.

## R code style

All R code should be formatted using [styler](https://styler.r-lib.org/).

--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
{
  "markdown.extension.toc.levels": "2..6",
  "markdown.extension.toc.omittedFromToc": {
    "README.md": [
      "## Contents"
    ]
  }
}

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# r-base-shortcuts



A collection of lesser-known but powerful base R idioms and shortcuts
for writing concise and fast base R code, useful for beginner- to
intermediate-level R developers.

Please help me improve and extend this list.
See the [contributing guide](.github/CONTRIBUTING.md)
and [code of conduct](.github/CODE-OF-CONDUCT.md).

> Why?
>
> From 2012 to 2022, I answered thousands of R questions in the
> online community [Capital of Statistics](https://d.cosx.org/).
> These recipes are observed and digested from the recurring patterns
> I learned from the frequently asked questions with less common answers.

## Contents

- [Object creation](#object-creation)
  - [Create sequences with `seq_len()` and `seq_along()`](#create-sequences-with-seq_len-and-seq_along)
  - [Repeat character strings with `strrep()`](#repeat-character-strings-with-strrep)
  - [Create an empty list of a given length](#create-an-empty-list-of-a-given-length)
  - [Create and assigning S3 classes in one step](#create-and-assigning-s3-classes-in-one-step)
  - [Assign names to vector elements or data frame columns at creation](#assign-names-to-vector-elements-or-data-frame-columns-at-creation)
  - [Use `I()` to include objects as is in data frames](#use-i-to-include-objects-as-is-in-data-frames)
  - [Generate factors using `gl()`](#generate-factors-using-gl)
- [Object transformation](#object-transformation)
  - [Insert elements into a vector with `append()`](#insert-elements-into-a-vector-with-append)
  - [Modify data frames with `transform()`](#modify-data-frames-with-transform)
  - [Modify data frames with `within()`](#modify-data-frames-with-within)
  - [Use `[` and `[[` as functions in apply calls](#use--and--as-functions-in-apply-calls)
  - [Sum all components in a list](#sum-all-components-in-a-list)
  - [Bind multiple data frames in a
    list](#bind-multiple-data-frames-in-a-list)
  - [Use `modifyList()` to update a list](#use-modifylist-to-update-a-list)
  - [Use `aperm()` and `asplit()` to permute and split arrays](#use-aperm-and-asplit-to-permute-and-split-arrays)
  - [Run-length encoding](#run-length-encoding)
- [Conditions](#conditions)
  - [Use `inherits()` for class checking](#use-inherits-for-class-checking)
  - [Replace multiple `ifelse()` with `cut()`](#replace-multiple-ifelse-with-cut)
  - [Simplify recoding categorical values with `factor()`](#simplify-recoding-categorical-values-with-factor)
  - [Save the number of `if` conditions with upcasting](#save-the-number-of-if-conditions-with-upcasting)
  - [Use `findInterval()` for many breakpoints](#use-findinterval-for-many-breakpoints)
- [Vectorization](#vectorization)
  - [Use `match()` for fast lookups](#use-match-for-fast-lookups)
  - [Use environments as fast key-value stores for fast lookups](#use-environments-as-fast-key-value-stores-for-fast-lookups)
  - [Use `mapply()` for element-wise operations on multiple lists](#use-mapply-for-element-wise-operations-on-multiple-lists)
  - [Simplify element-wise min and max operations with `pmin()` and `pmax()`](#simplify-element-wise-min-and-max-operations-with-pmin-and-pmax)
  - [Apply a function to all combinations of parameters](#apply-a-function-to-all-combinations-of-parameters)
  - [Generate all possible combinations of given characters](#generate-all-possible-combinations-of-given-characters)
  - [Vectorize a function with `Vectorize()`](#vectorize-a-function-with-vectorize)
  - [Pairwise computations using `outer()`](#pairwise-computations-using-outer)
  - [Subtract column means from non-zero elements in a sparse matrix](#subtract-column-means-from-non-zero-elements-in-a-sparse-matrix)
- [Functions](#functions)
  - [Specify formal argument lists with `alist()`](#specify-formal-argument-lists-with-alist)
  - [Use
    internal functions without `:::`](#use-internal-functions-without-)
- [Side-effects](#side-effects)
  - [Return invisibly with `invisible()` for side-effect functions](#return-invisibly-with-invisible-for-side-effect-functions)
  - [Use `on.exit()` for cleanup](#use-onexit-for-cleanup)
- [Numerical computations](#numerical-computations)
  - [Create step functions with `stepfun()`](#create-step-functions-with-stepfun)
- [Further reading](#further-reading)

## Object creation

### Create sequences with `seq_len()` and `seq_along()`

`seq_len()` and `seq_along()` are safer than `1:length(x)` or `1:nrow(x)`
because they avoid the unexpected result when `x` is of length `0`,
where `1:0` yields `c(1, 0)` instead of an empty sequence:

```r
# Safe version of 1:length(x)
seq_len(length(x))
# Safe version of 1:length(x)
seq_along(x)
```

### Repeat character strings with `strrep()`

When you need to repeat a string a certain number of times, instead of using
the tedious pattern of `paste(rep("foo", 10), collapse = "")`, you can use
the `strrep()` function:

```r
strrep("foo", 10)
```

`strrep()` is vectorized: both arguments can be vectors, the shorter one is
recycled, and the result is as long as the longer argument:

```r
fruits <- c("apple", "banana", "orange")
strrep(c("*"), nchar(fruits))
strrep(c("-", "=", "**"), nchar(fruits))
```

### Create an empty list of a given length

Use the `vector()` function to create an empty list of a specific length:

```r
n <- 5 # Desired length
x <- vector("list", n)
```

### Create and assigning S3 classes in one step

Avoid creating an object and assigning its class separately.
Instead, use the `structure()` function to do both at once:

```r
x <- structure(list(), class = "my_class")
```

Instead of:

```r
x <- list()
class(x) <- "my_class"
```

This makes the code more concise when returning an object of a specific class.

### Assign names to vector elements or data frame columns at creation

The `setNames()` function allows you to assign names to vector elements or
data frame columns during creation:

```r
x <- setNames(1:3, c("one", "two", "three"))
x <- setNames(data.frame(...), c("names", "of", "columns"))
```

### Use `I()` to include objects as is in data frames

The `I()` function allows you to include objects as is when creating data frames:

```r
df <- data.frame(x = I(list(1:10, letters)))
df$x
#> [[1]]
#>  [1]  1  2  3  4  5  6  7  8  9 10
#>
#> [[2]]
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
#> [14] "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
```

This creates a data frame with one column `x` that is a list of vectors.

### Generate factors using `gl()`

Create a factor with `gl()` by specifying the number of levels
and the number of replications of each level:

```r
gl(n = 2, k = 5, labels = c("Low", "High"))
#>  [1] Low  Low  Low  Low  Low  High High High High High
#> Levels: Low High
```

The `gl()` function is particularly useful when setting up experiments
or simulations that involve categorical variables.

## Object transformation

### Insert elements into a vector with `append()`

When you need to insert elements into a vector at a specific position,
use `append()`.
It has an argument `after` that specifies the position after
which the new elements should be inserted, defaulting to the length of the
vector being appended to.

For example, to insert the numbers 4, 5, 6 between 1, 2, 3 and 7, 8, 9:

```r
x <- c(1, 2, 3, 7, 8, 9)
append(x, 4:6, after = 3)
#> [1] 1 2 3 4 5 6 7 8 9
```

Without `append()`, the solution would be more verbose and less readable:

```r
c(x[1:3], 4:6, x[4:length(x)])
#> [1] 1 2 3 4 5 6 7 8 9
```

When `after` is set to `0`, the new values are "appended" to the beginning of
the input vector:

```r
append(x, 4:6, after = 0)
#> [1] 4 5 6 1 2 3 7 8 9
```

### Modify data frames with `transform()`

When adding new columns or modifying existing columns in a data frame,
instead of assigning each column individually, use `transform()` to perform
multiple transformations in a single step:

```r
df <- data.frame(x = 1:5, y = 6:10)
transform(df, z = x + y, y = y * 2, w = sqrt(x))
```

This is more concise and readable compared to the alternative of
multiple assignments and repeating `df$`:

```r
df$z <- df$x + df$y
df$y <- df$y * 2
df$w <- sqrt(df$x)
```

### Modify data frames with `within()`

For more complex data transformations that involve multiple steps or
intermediate variables, consider using the `within()` function
(not to be confused with `with()`):

```r
df <- data.frame(x = 1:5, y = 6:10)

within(df, {
  y <- x / sum(x)
  z <- log(y)
  category <- ifelse(z > -2, "High", "Low")
})
```

Note that both `transform()` and `within()` return a modified copy of the
original data frame and do not change the original data frame,
unless you assign the
result back.

### Use `[` and `[[` as functions in apply calls

When you need to extract the same element from each item in a list or
list-like object, you can leverage `[` and `[[` as functions
(they actually are!) within `lapply()` and `sapply()` calls.

Consider a list of named vectors:

```r
lst <- list(
  item1 = c(a = 1, b = 2, c = 3),
  item2 = c(a = 4, b = 5, c = 6),
  item3 = c(a = 7, b = 8, c = 9)
)

# Extract named element "a" using `[[`
element_a <- sapply(lst, `[[`, "a")

lst <- list(
  item1 = c(1, 2, 3),
  item2 = c(4, 5, 6),
  item3 = c(7, 8, 9)
)

# Extract first element using `[`
first_element <- sapply(lst, `[`, 1)
```

### Sum all components in a list

Use the `Reduce()` function with the infix function `+` to sum up all components
in a list:

```r
x <- Reduce("+", lst)
```

### Bind multiple data frames in a list

The `do.call()` function with the `rbind` argument allows you to bind
multiple data frames in a list into one data frame:

```r
df_combined <- do.call("rbind", list_of_dfs)
```

Alternatively, more performant solutions for such operations are offered in
`data.table::rbindlist()` and `dplyr::bind_rows()`. See
[this article](https://rpubs.com/jimhester/rbind) for details.

### Use `modifyList()` to update a list

The `modifyList()` function allows you to easily update values in a list
without a verbose syntax:

```r
old_list <- list(a = 1, b = 2, c = 3)
new_vals <- list(a = 10, c = 30)
new_list <- modifyList(old_list, new_vals)
```

This can be very useful for maintaining and updating a set of
configuration parameters.
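
As a minimal sketch of the configuration use case (the `base_config` and
`overrides` names and their contents are made up for illustration),
`modifyList()` also updates nested lists recursively and, with its default
`keep.null = FALSE`, drops any element whose replacement is `NULL`:

```r
# Hypothetical default settings (illustrative names only)
base_config <- list(
  width = 80,
  theme = list(font = "mono", size = 10),
  debug = TRUE
)

# Overrides: change one nested value and remove `debug` by setting it to NULL
overrides <- list(theme = list(size = 12), debug = NULL)

config <- modifyList(base_config, overrides)

# `theme$font` is kept, `theme$size` is updated, and `debug` is gone
str(config)
```

The untouched `base_config` can then be reused as the starting point for
other sets of overrides.
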

### Use `aperm()` and `asplit()` to permute and split arrays

Use `aperm()` and `asplit()` to avoid nested for-loops in array manipulation.
`aperm()` is the generalization of matrix transpose. `asplit()` can split along
any array dimension.

```r
arr <- array(
  1:24,
  dim = c(2, 3, 4),
  dimnames = list(
    row = paste0("R", 1:2),
    col = paste0("C", 1:3),
    slice = paste0("S", 1:4)
  )
)

# Rearrange dimensions from (2 x 3 x 4) to (4 x 3 x 2)
aperm(arr, perm = c(3, 2, 1))

# Split into a length-4 list of (2 x 3) matrices
asplit(arr, MARGIN = 3)
```

### Run-length encoding

Run-length encoding is a simple form of data compression in which sequences
of the same element are replaced by a single instance of the element followed
by the number of times it appears in the sequence.

Suppose you have a vector with many repeating elements:

```r
x <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 2, 2, 2, 1, 1)
```

You can use `rle()` to compress this vector and decompress the result back
into the original vector with `inverse.rle()`:

```r
x <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 2, 2, 2, 1, 1)

(y <- rle(x))
#> Run Length Encoding
#>   lengths: int [1:5] 3 2 4 3 2
#>   values : num [1:5] 1 2 3 2 1

inverse.rle(y)
#>  [1] 1 1 1 2 2 3 3 3 3 2 2 2 1 1
```

## Conditions

### Use `inherits()` for class checking

Instead of using the `class()` function in conjunction with `==`, `!=`,
or `%in%` operators to check if an object belongs to a certain class,
use the `inherits()` function.

```r
if (inherits(x, "class"))
```

This will return `TRUE` if "class" is one of the classes from which `x` inherits.
This replaces the following more verbose forms:

```r
if (class(x) == "class")
```

or

```r
if (class(x) %in% c("class1", "class2"))
```

It is also more reliable because it checks for class inheritance,
not just the first class name (R supports multiple classes for S3 and S4 objects).

### Replace multiple `ifelse()` with `cut()`

For a series of range-based conditions, use `cut()` instead of chaining
multiple `if-else` conditions or `ifelse()` calls:

```r
categories <- cut(
  x,
  breaks = c(-Inf, 0, 10, Inf),
  labels = c("negative", "small", "large")
)
```

This assigns each element in `x` to the category that corresponds to the
range it falls in.

### Simplify recoding categorical values with `factor()`

When dealing with categorical variables, you might need to replace or
recode certain levels. This can be achieved using chained `ifelse()` statements,
but a more efficient and readable approach is to use the `factor()` function:

```r
x <- c("M", "F", "F", NA)

factor(
  x,
  levels = c("F", "M", NA),
  labels = c("Female", "Male", "Missing"),
  exclude = NULL # Include missing values in the levels
)
```

### Save the number of `if` conditions with upcasting

Sometimes, the number of conditions checked in multiple `if` statements
can be reduced by cleverly using the fact that in R,
`TRUE` is upcasted to `1` and `FALSE` to `0` in numeric contexts.
This can be useful for selecting an index based on a set of conditions:

```r
i <- (width >= 960) + (width >= 1140) + 1
p <- p + facet_wrap(vars(class), ncol = c(1, 2, 4)[i])
```

This does the same thing as the following code, but in a much more concise way:

```r
if (width >= 1140) p <- p + facet_wrap(vars(class), ncol = 4)
if (width >= 960 && width < 1140) p <- p + facet_wrap(vars(class), ncol = 2)
if (width < 960) p <- p + facet_wrap(vars(class), ncol = 1)
```

This works because the condition checks in the parentheses result in a
`TRUE` or `FALSE`, and when they are added together, they are
upcasted to `1` or `0`.

### Use `findInterval()` for many breakpoints

If you want to assign a variable to many different groups or intervals,
instead of using a series of `if` statements, you can use the
`findInterval()` function. Using the same example above:

```r
breakpoints <- c(960, 1140)
ncols <- c(1, 2, 4)
i <- findInterval(width, breakpoints) + 1
p <- p + facet_wrap(vars(class), ncol = ncols[i])
```

The `findInterval()` function finds which interval each number in a
given vector falls into and returns a vector of interval indices.
It's a faster alternative when there are many breakpoints.

## Vectorization

### Use `match()` for fast lookups

The `match()` function can be faster than `which()` for looking up
values in a vector:

```r
index <- match(value, my_vector)
```

This code sets `index` to the position of the first occurrence of `value`
in `my_vector`, or `NA` if there is no match.

### Use environments as fast key-value stores for fast lookups

Hashed environments created by `new.env(hash = TRUE)` can be used as fast
key-value stores (hash tables).
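
As a minimal sketch of the basic operations on such a store (the `store`
name and the keys are made up for illustration), inserting, checking,
retrieving, and deleting keys can look like this:

```r
# A small key-value store backed by a hashed environment
store <- new.env(hash = TRUE)

# Insert or update values, either with `[[<-` or with assign()
store[["apple"]] <- 3
assign("banana", 5, envir = store)

# Membership test; inherits = FALSE keeps the search inside `store`
# instead of also looking in enclosing environments
has_apple <- exists("apple", envir = store, inherits = FALSE)
has_cherry <- exists("cherry", envir = store, inherits = FALSE)

# Retrieve a value and delete a key
banana_value <- get("banana", envir = store, inherits = FALSE)
rm("apple", envir = store)

# List the remaining keys
ls(store)
```

Passing `inherits = FALSE` matters here: without it, `exists()` and `get()`
would also find objects of the same name in the parent environments.
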

Lookups (to check if a key exists) are effectively O(1) in a hashed environment
versus O(N) when using a regular list with `names()`.
This makes it a much faster and more memory-friendly choice than lists or
named vectors for determining if "something already exists".

```r
# Generate keys
set.seed(42)

n_keys <- 100000
keys <- replicate(n_keys, paste0(sample(letters, 10, replace = TRUE), collapse = ""))

# Store in a hashed environment
hash_env <- new.env(hash = TRUE, size = n_keys)
for (k in keys) hash_env[[k]] <- TRUE

# Store in a named list
my_list <- vector("list", length(keys))
names(my_list) <- keys

# Benchmark
n_tests <- 50000
test_keys <- sample(keys, n_tests, replace = TRUE)

system.time(for (k in test_keys) invisible(exists(k, envir = hash_env, inherits = FALSE)))
#    user  system elapsed
#   0.044   0.000   0.045
system.time(for (k in test_keys) invisible(k %in% names(my_list)))
#    user  system elapsed
#  29.518   2.026  32.129
```

### Use `mapply()` for element-wise operations on multiple lists

`mapply()` applies a function over a set of lists in an element-wise fashion:

```r
mapply(sum, list1, list2, list3)
```

### Simplify element-wise min and max operations with `pmin()` and `pmax()`

To compare two or more vectors element-wise and get the
minimum or maximum of each set of elements, use `pmin()` and `pmax()`.

```r
vec1 <- c(1, 5, 3, 9, 5)
vec2 <- c(4, 2, 8, 1, 7)

# Instead of using sapply() or a loop:
sapply(1:length(vec1), function(i) min(vec1[i], vec2[i]))
sapply(1:length(vec1), function(i) max(vec1[i], vec2[i]))

# Use pmin() and pmax() for a more concise and efficient solution:
pmin(vec1, vec2)
pmax(vec1, vec2)
```

`pmin()` and `pmax()` perform these operations much more efficiently than
alternatives such as applying `min()` and `max()` in a loop or using `sapply()`.
This can lead to a noticeable performance improvement when working with large vectors.

### Apply a function to all combinations of parameters

Sometimes we need to run a function on every combination of a set of
parameter values, for example, in grid search. We can use the combination of
`expand.grid()`, `mapply()`, and `do.call()` + `rbind()` to accomplish this.

Suppose we have a simple function that takes two parameters, `a` and `b`:

```r
f <- function(a, b) {
  result <- a * b
  data.frame(a = a, b = b, result = result)
}
```

Create a grid of `a` and `b` parameter values to evaluate:

```r
params <- expand.grid(a = 1:3, b = 4:6)
```

We use `mapply()` to apply `f` to each row of our parameter grid.
We will use `SIMPLIFY = FALSE` to keep the results as a list of data frames:

```r
lst <- mapply(f, a = params$a, b = params$b, SIMPLIFY = FALSE)
```

Finally, we bind all the result data frames together into one final data frame:

```r
do.call(rbind, lst)
```

### Generate all possible combinations of given characters

To generate all possible combinations of a given set of characters,
`expand.grid()` and `do.call()` with `paste0()` can help.
The following snippet produces all possible three-character
strings consisting of lowercase letters and digits:

```r
x <- c(letters, 0:9)
do.call(paste0, expand.grid(x, x, x))
```

Here, `expand.grid()` generates a data frame where each row is a unique
combination of three elements from `x`. Then, `do.call(paste0, ...)`
concatenates each combination together into a string.

### Vectorize a function with `Vectorize()`

If a function is not natively vectorized (it has arguments that only take
one value at a time), you can use `Vectorize()` to create a new function
that accepts vector inputs:

```r
f <- function(x) x^2
lower <- c(1, 2, 3)
upper <- c(4, 5, 6)

integrate_vec <- Vectorize(integrate, vectorize.args = c("lower", "upper"))

result <- integrate_vec(f, lower, upper)
unlist(result["value", ])
```

The `Vectorize()` function works internally by leveraging the `mapply()`
function, which applies a function over two or more vectors or lists.

### Pairwise computations using `outer()`

The `outer()` function is useful for applying a function to every pair of
elements from two vectors. This can be particularly useful for U-statistics
and other situations requiring pairwise computations.

Consider two vectors of numeric values for which we wish to compute a
custom function for each pair:

```r
x <- rnorm(5)
y <- rnorm(5)

outer(x, y, FUN = function(x, y) x + x^2 - y)
```

### Subtract column means from non-zero elements in a sparse matrix

Here are three methods to achieve this, with increasing levels of optimization.

```r
library(Matrix)

set.seed(42)
mat <- rsparsematrix(nrow = 1000, ncol = 500, density = 0.01)
```

**Method 1**. Loop over columns and subtract the mean for non-zero elements:

```r
f1 <- function(mat) {
  col_means <- colSums(mat) / colSums(mat != 0)
  for (i in seq_len(ncol(mat))) {
    mat[mat[, i] != 0, i] <- mat[mat[, i] != 0, i] - col_means[i]
  }
  mat
}
```

**Method 2**. Use a helper matrix to subtract column means with matrix multiplication:

```r
f2 <- function(mat) {
  mat_copy <- mat
  mat_copy@x <- rep(1, length(mat_copy@x))
  col_means <- colSums(mat) / colSums(mat_copy)
  mat - mat_copy %*% Diagonal(x = col_means)
}
```

**Method 3**. Modify sparse matrix non-zero values directly:

```r
f3 <- function(mat) {
  col_means <- colSums(mat) / colSums(mat != 0)
  mat@x <- mat@x - rep(col_means, diff(mat@p))
  mat
}
```

```r
microbenchmark::microbenchmark(f1(mat), f2(mat), f3(mat), times = 100)
#> Unit: microseconds
#>     expr        min          lq        mean      median         uq        max
#>  f1(mat) 110731.242 113995.1290 133040.4843 115918.4595 119605.159 263641.562
#>  f2(mat)    473.509    504.6280    680.0215    571.9705    602.659   4543.620
#>  f3(mat)    172.446    192.5155    278.6069    238.0460    259.448   3965.356
```

The speedup is achieved by avoiding making redundant copies (from R's
copy-on-modify semantics) and making in-place modifications as much as possible.

## Functions

### Specify formal argument lists with `alist()`

The `alist()` function can create lists where some elements are intentionally
left blank (or are "missing"), which can be helpful when we want to specify
formal arguments of a function, especially in conjunction with `formals()`.

Consider this scenario.
Suppose we are writing a function that wraps another
function, and we want our wrapper function to have the same formal arguments
as the original function, even if it does not use all of them.
Here is how we can use `alist()` to achieve that:

```r
original_function <- function(a, b, c = 3, d = "something") a + b

wrapper_function <- function() {
  # Forward every argument to the original function
  original_function(a = a, b = b, c = c, d = d)
}

# Set the formals using `alist()`; `a = ` and `b = ` are left empty,
# which marks them as required arguments without default values
formals(wrapper_function) <- alist(a = , b = , c = 3, d = "something")

wrapper_function(1, 2)
#> [1] 3
```

Now, `wrapper_function()` has the same formal arguments as
`original_function()`, and any arguments passed to `wrapper_function()`
are forwarded to `original_function()`. This way, even if `wrapper_function()`
does not use all the arguments, it can still accept them, and code that uses
`wrapper_function()` can be more consistent with code that uses
`original_function()`.

The `alist()` function is used here to create a list of formals where
some elements are missing, which represents the fact that some arguments
are required and have no default values. This would not be possible
with `list()`, which cannot create lists with missing elements.

### Use internal functions without `:::`

To use internal functions from packages without using `:::`, you can use

```r
f <- utils::getFromNamespace("f", ns = "package")
f(...)
```

## Side-effects

### Return invisibly with `invisible()` for side-effect functions

R functions always return a value. However, some functions are primarily
designed for their side effects. To suppress the automatic printing
of the returned value, use `invisible()`.

```r
f <- function(x) {
  print(x^2)
  invisible(x)
}
```

The value of `x` can be used later when the result is assigned to a variable
or piped into the next function.

### Use `on.exit()` for cleanup

`on.exit()` is a useful function for cleaning up side effects, such as
deleting temporary files or closing opened connections, even if a function
exits early due to an error:

```r
f <- function() {
  temp_file <- tempfile()
  on.exit(unlink(temp_file))

  # Do stuff with temp_file
}

f <- function(file) {
  con <- file(file, "r")
  on.exit(close(con))
  readLines(con)
}
```

The first function creates a temporary file and then ensures it gets deleted
when the function exits, regardless of why it exits; the second does the same
for an open connection. Note that the arguments
`add` and `after` in `on.exit()` are important for controlling the overwriting
and ordering behavior of the expressions.

## Numerical computations

### Create step functions with `stepfun()`

The `stepfun()` function is an effective tool for creating step functions,
which can be particularly handy in survival analysis.
For instance, say we have two survival curves generated from Kaplan-Meier
estimators, and we want to determine the difference in survival probabilities
at a given time.
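
Before the survival example, here is a minimal sketch of how `stepfun()`
itself behaves. The jump points and values below are made up for
illustration and are not taken from any survival data:

```r
# Hypothetical jump points and function values
knots <- c(1, 2, 3)
# One more value than knots: the first is the level left of the first knot
vals <- c(1, 0.8, 0.5, 0.2)

sf <- stepfun(knots, vals)

# With the default `right = FALSE`, the function is right-continuous,
# so sf(1) already takes the value that applies from the first knot onward
sf(c(0.5, 1, 2.5, 10))
#> [1] 1.0 0.8 0.5 0.2
```

The returned object is an ordinary R function, so it can be evaluated at
arbitrary time points, which is what makes the subtraction of two curves
straightforward.
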

Create the survival curves using `survfit()`:

```r
library("survival")

fit_km <- survfit(Surv(stop, event == "pcm") ~ 1, data = mgus1, subset = (start == 0))
fit_cr <- survfit(Surv(stop, event == "death") ~ 1, data = mgus1, subset = (start == 0))
```

Convert these survival curves into step functions:

```r
step_km <- stepfun(fit_km$time, c(1, fit_km$surv))
step_cr <- stepfun(fit_cr$time, c(1, fit_cr$surv))
```

With these step functions, it becomes straightforward to compute the
difference in survival probabilities at specific times:

```r
t <- 1:3 * 1000
step_km(t) - step_cr(t)
```

## Further reading

- [Data Manipulation with R](https://doi.org/10.1007/978-0-387-74731-6)
- [The R Inferno](https://www.burns-stat.com/documents/books/the-r-inferno/)
- [stackoverflow: Stack Overflow's Greatest Hits](https://cran.r-project.org/package=stackoverflow)

--------------------------------------------------------------------------------
/images/banner.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nanxstats/r-base-shortcuts/e3faab1a59c2481d7dcd7ff1d70023631c92a256/images/banner.png

--------------------------------------------------------------------------------
/images/banner.sh:
--------------------------------------------------------------------------------
# brew install imagemagick
# brew install --cask font-cascadia-code
magick -size 2048x734 \
  -define gradient:angle=330 \gradient:#03448c-#17ffc6 \
  -gravity center \
  -pointsize 105 \
  -font 'JetBrains-Mono-Bold' \
  -fill white \
  -annotate +0-100 'r-base-shortcuts' \
  -pointsize 28 \
  -font 'JetBrains-Mono-Regular' \
  -annotate +0+50 'Object Generation · Object Transformation · Vectorization' \
  -annotate +0+100 'List
Operations · Conditional Logic · Argument Handling' \
  -annotate +0+150 'Side-Effect Management · Numerical Computations' \
  png:- | pngquant - --force --output images/banner.png

--------------------------------------------------------------------------------