├── .github ├── .gitignore ├── pics │ └── lay_history.png ├── ISSUE_TEMPLATE │ └── issue_template.md ├── workflows │ ├── pkgdown.yaml │ ├── R-CMD-check.yaml │ ├── test-coverage.yaml │ └── pr-commands.yaml ├── SUPPORT.md ├── CONTRIBUTING.md └── CODE_OF_CONDUCT.md ├── vignettes └── articles │ ├── .gitignore │ ├── alternatives.Rmd │ └── benchmarks.Rmd ├── LICENSE ├── data ├── drugs.rda └── drugs_full.rda ├── man ├── figures │ └── logo.png ├── reexports.Rd ├── drugs.Rd └── lay.Rd ├── .gitignore ├── source_hexsticker ├── lay.png ├── lay.xcf └── prepare_hexsticker.R ├── pkgdown └── favicon │ ├── favicon.ico │ ├── favicon-16x16.png │ ├── favicon-32x32.png │ ├── apple-touch-icon.png │ ├── apple-touch-icon-60x60.png │ ├── apple-touch-icon-76x76.png │ ├── apple-touch-icon-120x120.png │ ├── apple-touch-icon-152x152.png │ └── apple-touch-icon-180x180.png ├── tests ├── testthat.R ├── spelling.R └── testthat │ └── test-lay.R ├── NEWS.md ├── R ├── reexports.R ├── data.R └── lay.R ├── .Rbuildignore ├── NAMESPACE ├── _pkgdown.yml ├── inst ├── WORDLIST └── CITATION ├── lay.Rproj ├── cran-comments.md ├── DESCRIPTION ├── LICENSE.md ├── README.Rmd └── README.md /.github/.gitignore: -------------------------------------------------------------------------------- 1 | *.html 2 | -------------------------------------------------------------------------------- /vignettes/articles/.gitignore: -------------------------------------------------------------------------------- 1 | *.html 2 | *.R 3 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | YEAR: 2023 2 | COPYRIGHT HOLDER: Alexandre Courtiol 3 | -------------------------------------------------------------------------------- /data/drugs.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/data/drugs.rda 
-------------------------------------------------------------------------------- /data/drugs_full.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/data/drugs_full.rda -------------------------------------------------------------------------------- /man/figures/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/man/figures/logo.png -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | inst/doc 5 | source_data 6 | docs 7 | -------------------------------------------------------------------------------- /source_hexsticker/lay.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/source_hexsticker/lay.png -------------------------------------------------------------------------------- /source_hexsticker/lay.xcf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/source_hexsticker/lay.xcf -------------------------------------------------------------------------------- /pkgdown/favicon/favicon.ico: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/favicon.ico -------------------------------------------------------------------------------- /.github/pics/lay_history.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/.github/pics/lay_history.png -------------------------------------------------------------------------------- /pkgdown/favicon/favicon-16x16.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/favicon-16x16.png -------------------------------------------------------------------------------- /pkgdown/favicon/favicon-32x32.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/favicon-32x32.png -------------------------------------------------------------------------------- /pkgdown/favicon/apple-touch-icon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/apple-touch-icon.png -------------------------------------------------------------------------------- /tests/testthat.R: -------------------------------------------------------------------------------- 1 | library(testthat) 2 | library(lay) 3 | library(dplyr, warn.conflicts = FALSE) 4 | 5 | test_check("lay") 6 | -------------------------------------------------------------------------------- /pkgdown/favicon/apple-touch-icon-60x60.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/apple-touch-icon-60x60.png -------------------------------------------------------------------------------- /pkgdown/favicon/apple-touch-icon-76x76.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/apple-touch-icon-76x76.png -------------------------------------------------------------------------------- /pkgdown/favicon/apple-touch-icon-120x120.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/apple-touch-icon-120x120.png 
-------------------------------------------------------------------------------- /pkgdown/favicon/apple-touch-icon-152x152.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/apple-touch-icon-152x152.png -------------------------------------------------------------------------------- /pkgdown/favicon/apple-touch-icon-180x180.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/apple-touch-icon-180x180.png -------------------------------------------------------------------------------- /NEWS.md: -------------------------------------------------------------------------------- 1 | # lay 0.1.3 2 | 3 | * Add value field in documentation to meet CRAN requirements. 4 | 5 | # lay 0.1.2 6 | 7 | * Initial CRAN submission. 8 | -------------------------------------------------------------------------------- /R/reexports.R: -------------------------------------------------------------------------------- 1 | #' @importFrom tibble tibble 2 | #' @export 3 | tibble::tibble 4 | 5 | #' @importFrom tibble as_tibble_row 6 | #' @export 7 | tibble::as_tibble_row 8 | -------------------------------------------------------------------------------- /tests/spelling.R: -------------------------------------------------------------------------------- 1 | if(requireNamespace('spelling', quietly = TRUE)) 2 | spelling::spell_check_test(vignettes = TRUE, error = FALSE, 3 | skip_on_cran = TRUE) 4 | -------------------------------------------------------------------------------- /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^lay\.Rproj$ 2 | ^\.Rproj\.user$ 3 | ^LICENSE\.md$ 4 | ^README\.Rmd$ 5 | ^cran-comments\.md$ 6 | ^\.github$ 7 | ^\.github/workflows/R-CMD-check\.yaml$ 8 | ^\.github/workflows/pr-commands\.yaml$ 9 | ^source\_data$ 10 | 
^source\_hexsticker$ 11 | ^_pkgdown\.yml$ 12 | ^docs$ 13 | ^pkgdown$ 14 | ^vignettes$ 15 | -------------------------------------------------------------------------------- /source_hexsticker/prepare_hexsticker.R: -------------------------------------------------------------------------------- 1 | library(hexSticker) 2 | sticker("source_hexsticker/lay.png", 3 | h_color = "#00ff00", 4 | package = NULL, 5 | s_width = 1, 6 | s_x = 1.075, 7 | s_y = 1, 8 | white_around_sticker = TRUE, 9 | filename = "inst/figures/logo.png") 10 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | export(as_tibble_row) 4 | export(lay) 5 | export(tibble) 6 | importFrom(purrr,pmap) 7 | importFrom(rlang,as_function) 8 | importFrom(rlang,exec) 9 | importFrom(rlang,list2) 10 | importFrom(tibble,as_tibble_row) 11 | importFrom(tibble,tibble) 12 | importFrom(vctrs,vec_c) 13 | -------------------------------------------------------------------------------- /_pkgdown.yml: -------------------------------------------------------------------------------- 1 | url: https://courtiol.github.io/lay/ 2 | template: 3 | bootstrap: 5 4 | 5 | reference: 6 | - title: Verb 7 | desc: > 8 | The main verb of the package: 9 | contents: 10 | - lay 11 | 12 | - title: Data 13 | desc: > 14 | Some datasets to illustrate functionalities: 15 | contents: 16 | - drugs 17 | - drugs_full 18 | 19 | -------------------------------------------------------------------------------- /inst/WORDLIST: -------------------------------------------------------------------------------- 1 | Benchmarking 2 | CMD 3 | Lifecycle 4 | Rowwise 5 | caseid 6 | dplyr 7 | everused 8 | hydrocodone 9 | lorcert 10 | lortab 11 | nonmedically 12 | oxycontin 13 | percocet 14 | percodan 15 | purrr 16 | recoded 17 | rlang 18 | rowwise 19 | tibble 20 | tibbles 21 | tidyr 22 | tidyselect 
23 | tramadol 24 | tylox 25 | vctrs 26 | vectorized 27 | vicodin 28 | -------------------------------------------------------------------------------- /inst/CITATION: -------------------------------------------------------------------------------- 1 | bibentry( 2 | bibtype = "Manual", 3 | title = "Simple but Efficient Rowwise Jobs", 4 | author = c(person("Alexandre", "Courtiol", email = "alexandre.courtiol@gmail.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0003-0637-2959")), 5 | person("Romain", "François", role = c("aut"), comment = c(ORCID = "0000-0002-2444-4226")) 6 | ), 7 | year = 2023, 8 | url = "https://github.com/courtiol/lay" 9 | ) 10 | -------------------------------------------------------------------------------- /lay.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: No 4 | SaveWorkspace: No 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | 18 | BuildType: Package 19 | PackageUseDevtools: Yes 20 | PackageInstallArgs: --no-multiarch --with-keep.source 21 | PackageRoxygenize: rd,collate,namespace 22 | -------------------------------------------------------------------------------- /man/reexports.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/reexports.R 3 | \docType{import} 4 | \name{reexports} 5 | \alias{reexports} 6 | \alias{tibble} 7 | \alias{as_tibble_row} 8 | \title{Objects exported from other packages} 9 | \keyword{internal} 10 | \description{ 11 | These objects are imported from other packages. Follow the links 12 | below to see their documentation. 
13 | 14 | \describe{ 15 | \item{tibble}{\code{\link[tibble:as_tibble]{as_tibble_row}}, \code{\link[tibble]{tibble}}} 16 | }} 17 | 18 | -------------------------------------------------------------------------------- /cran-comments.md: -------------------------------------------------------------------------------- 1 | ## Test environments 2 | 3 | * local R installation, R 4.3.1 4 | * GitHub Actions (usethis::use_github_action("check-standard")) 5 | - {os: macos-latest, r: 'release'} 6 | - {os: windows-latest, r: 'release'} 7 | - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'} 8 | - {os: ubuntu-latest, r: 'release'} 9 | - {os: ubuntu-latest, r: 'oldrel-1'} 10 | * win-builder (devel) 11 | 12 | ## R CMD check results 13 | 14 | 2 false positives in terms of misspellings in DESCRIPTION: 15 | "Rowwise" and "tibble". 16 | 17 | 0 errors | 0 warnings | 1 note 18 | 19 | * This is a new release. 20 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/issue_template.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report or feature request 3 | about: Describe a bug you've seen or make a case for a new feature 4 | --- 5 | 6 | Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on or . 7 | 8 | Please include a minimal reproducible example (AKA a reprex). If you've never heard of a [reprex](http://reprex.tidyverse.org/) before, start by reading . 
9 | 10 | Brief description of the problem 11 | 12 | ```r 13 | # insert reprex here 14 | ``` 15 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: lay 2 | Title: Simple but Efficient Rowwise Jobs 3 | Version: 0.1.3 4 | Authors@R: c( 5 | person("Alexandre", "Courtiol", email = "alexandre.courtiol@gmail.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0003-0637-2959")), 6 | person("Romain", "François", role = c("aut"), comment = c(ORCID = "0000-0002-2444-4226")) 7 | ) 8 | Description: Efficiently creates new column(s) in a data frame (including tibbles) by applying a function one row at a time. 9 | License: MIT + file LICENSE 10 | URL: https://courtiol.github.io/lay/, https://github.com/courtiol/lay/ 11 | BugReports: https://github.com/courtiol/lay/issues/ 12 | Encoding: UTF-8 13 | LazyData: true 14 | Depends: 15 | R (>= 2.10) 16 | Imports: 17 | rlang, 18 | purrr, 19 | vctrs, 20 | tibble 21 | Suggests: 22 | bench, 23 | covr, 24 | data.table, 25 | dplyr (>= 1.0), 26 | forcats, 27 | ggplot2, 28 | ggbeeswarm, 29 | knitr, 30 | rmarkdown, 31 | slider, 32 | testthat (>= 2.1.0), 33 | tidyr, 34 | spelling 35 | Roxygen: list(markdown = TRUE) 36 | RoxygenNote: 7.2.3 37 | Language: en-US 38 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | # MIT License 2 | 3 | Copyright (c) 2023 Alexandre Courtiol 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished 
to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /tests/testthat/test-lay.R: -------------------------------------------------------------------------------- 1 | test_that("lay works", { 2 | 3 | ## data for tests 4 | df <- tibble(x = 1:10, y = 11:20, z = 21:30) 5 | df_na <- df 6 | df_na[1, 1] <- NA 7 | 8 | ## simple calls 9 | expect_identical( 10 | lay(df, mean), 11 | rowMeans(df) 12 | ) 13 | 14 | expect_identical( 15 | lay(df, min), 16 | df$x 17 | ) 18 | 19 | ## call with fn arguments 20 | expect_identical( 21 | lay(df_na, mean, na.rm = TRUE), 22 | rowMeans(df_na, na.rm = TRUE) 23 | ) 24 | 25 | ## auto spliced output 26 | expect_identical( 27 | lay(df, ~ tibble(min = min(.x), max = max(.x))), 28 | tibble(min = df$x, max = df$z) 29 | ) 30 | 31 | ## both methods should lead to same results 32 | expect_identical( 33 | lay(df, mean, .method = "tidy"), 34 | lay(df, mean, .method = "apply") 35 | ) 36 | 37 | expect_identical( 38 | lay(df, ~ mean(.x), .method = "tidy"), 39 | lay(df, ~ mean(.x), .method = "apply") 40 | ) 41 | 42 | expect_identical( 43 | lay(df, ~ tibble(min = min(.x), max = max(.x)), .method = "tidy"), 44 | lay(df, ~ tibble(min = min(.x), max = max(.x)), .method = "apply") 45 | ) 46 | 47 | ## handle error properly 48 | expect_error( 49 | lay(df, 
mean, .method = "nonesense") 50 | ) 51 | 52 | }) 53 | -------------------------------------------------------------------------------- /.github/workflows/pkgdown.yaml: -------------------------------------------------------------------------------- 1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples 2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help 3 | on: 4 | push: 5 | branches: [main, master] 6 | pull_request: 7 | branches: [main, master] 8 | release: 9 | types: [published] 10 | workflow_dispatch: 11 | 12 | name: pkgdown 13 | 14 | jobs: 15 | pkgdown: 16 | runs-on: ubuntu-latest 17 | # Only restrict concurrency for non-PR jobs 18 | concurrency: 19 | group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }} 20 | env: 21 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 22 | permissions: 23 | contents: write 24 | steps: 25 | - uses: actions/checkout@v3 26 | 27 | - uses: r-lib/actions/setup-pandoc@v2 28 | 29 | - uses: r-lib/actions/setup-r@v2 30 | with: 31 | use-public-rspm: true 32 | 33 | - uses: r-lib/actions/setup-r-dependencies@v2 34 | with: 35 | extra-packages: any::pkgdown, local::. 36 | needs: website 37 | 38 | - name: Build site 39 | run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE) 40 | shell: Rscript {0} 41 | 42 | - name: Deploy to GitHub pages 🚀 43 | if: github.event_name != 'pull_request' 44 | uses: JamesIves/github-pages-deploy-action@v4.4.1 45 | with: 46 | clean: false 47 | branch: gh-pages 48 | folder: docs 49 | -------------------------------------------------------------------------------- /.github/workflows/R-CMD-check.yaml: -------------------------------------------------------------------------------- 1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples 2 | # Need help debugging build failures? 
Start at https://github.com/r-lib/actions#where-to-find-help 3 | on: 4 | push: 5 | branches: [main, master] 6 | pull_request: 7 | branches: [main, master] 8 | 9 | name: R-CMD-check 10 | 11 | jobs: 12 | R-CMD-check: 13 | runs-on: ${{ matrix.config.os }} 14 | 15 | name: ${{ matrix.config.os }} (${{ matrix.config.r }}) 16 | 17 | strategy: 18 | fail-fast: false 19 | matrix: 20 | config: 21 | - {os: macos-latest, r: 'release'} 22 | - {os: windows-latest, r: 'release'} 23 | - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'} 24 | - {os: ubuntu-latest, r: 'release'} 25 | - {os: ubuntu-latest, r: 'oldrel-1'} 26 | 27 | env: 28 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 29 | R_KEEP_PKG_SOURCE: yes 30 | 31 | steps: 32 | - uses: actions/checkout@v3 33 | 34 | - uses: r-lib/actions/setup-pandoc@v2 35 | 36 | - uses: r-lib/actions/setup-r@v2 37 | with: 38 | r-version: ${{ matrix.config.r }} 39 | http-user-agent: ${{ matrix.config.http-user-agent }} 40 | use-public-rspm: true 41 | 42 | - uses: r-lib/actions/setup-r-dependencies@v2 43 | with: 44 | extra-packages: any::rcmdcheck 45 | needs: check 46 | 47 | - uses: r-lib/actions/check-r-package@v2 48 | with: 49 | upload-snapshots: true 50 | -------------------------------------------------------------------------------- /.github/workflows/test-coverage.yaml: -------------------------------------------------------------------------------- 1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples 2 | # Need help debugging build failures? 
Start at https://github.com/r-lib/actions#where-to-find-help 3 | on: 4 | push: 5 | branches: [main, master] 6 | pull_request: 7 | branches: [main, master] 8 | 9 | name: test-coverage 10 | 11 | jobs: 12 | test-coverage: 13 | runs-on: ubuntu-latest 14 | env: 15 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 16 | 17 | steps: 18 | - uses: actions/checkout@v3 19 | 20 | - uses: r-lib/actions/setup-r@v2 21 | with: 22 | use-public-rspm: true 23 | 24 | - uses: r-lib/actions/setup-r-dependencies@v2 25 | with: 26 | extra-packages: any::covr 27 | needs: coverage 28 | 29 | - name: Test coverage 30 | run: | 31 | covr::codecov( 32 | quiet = FALSE, 33 | clean = FALSE, 34 | install_path = file.path(Sys.getenv("RUNNER_TEMP"), "package") 35 | ) 36 | shell: Rscript {0} 37 | 38 | - name: Show testthat output 39 | if: always() 40 | run: | 41 | ## -------------------------------------------------------------------- 42 | find ${{ runner.temp }}/package -name 'testthat.Rout*' -exec cat '{}' \; || true 43 | shell: bash 44 | 45 | - name: Upload test results 46 | if: failure() 47 | uses: actions/upload-artifact@v3 48 | with: 49 | name: coverage-test-failures 50 | path: ${{ runner.temp }}/package 51 | -------------------------------------------------------------------------------- /R/data.R: -------------------------------------------------------------------------------- 1 | #' Pain relievers misuse in the US 2 | #' 3 | #' Datasets containing information about the use of pain relievers for non-medical purposes. 4 | #' 5 | #' These datasets are a small subset from the "National Survey on Drug Use and Health, 2014". 6 | #' All variables related to drug use have been recoded into vectors of integers taking value 0 for 7 | #' "No/Unknown" and value 1 for "Yes". The original variable names were the same as those defined 8 | #' here but in upper case and ending with the number 2. The dataset called `drugs` contains the first 9 | #' 100 rows of the one called `drugs_full`. 
10 | #' 11 | #' @format A tibble with either 100 or 55271 rows, and 8 variables: 12 | #' \describe{ 13 | #' \item{caseid}{The identifier code of the respondent} 14 | #' \item{hydrocd}{Ever used hydrocodone nonmedically?} 15 | #' \item{oxycodp}{Ever used percocet, percodan, tylox, oxycontin... nonmedically?} 16 | #' \item{codeine}{Ever used codeine nonmedically?} 17 | #' \item{tramadl}{Ever used tramadol nonmedically?} 18 | #' \item{morphin}{Ever used morphine nonmedically?} 19 | #' \item{methdon}{Ever used methadone nonmedically?} 20 | #' \item{vicolor}{Ever used vicodin, lortab or lorcert nonmedically?} 21 | #' } 22 | #' @source \url{https://www.icpsr.umich.edu/web/NAHDAP/studies/36361} 23 | #' @references United States Department of Health and Human Services. 24 | #' Substance Abuse and Mental Health Services Administration. 25 | #' Center for Behavioral Health Statistics and Quality. 26 | #' National Survey on Drug Use and Health, 2014. 27 | #' Ann Arbor, MI: Inter-university Consortium for Political and Social Research (distributor), 2016-03-22. 28 | #' \doi{10.3886/ICPSR36361.v1} 29 | #' 30 | #' @aliases drugs drugs_full 31 | #' @name drugs 32 | #' @examples 33 | #' drugs 34 | #' drugs_full 35 | NULL 36 | -------------------------------------------------------------------------------- /man/drugs.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/data.R 3 | \name{drugs} 4 | \alias{drugs} 5 | \alias{drugs_full} 6 | \title{Pain relievers misuse in the US} 7 | \format{ 8 | A tibble with either 100 or 55271 rows, and 8 variables: 9 | \describe{ 10 | \item{caseid}{The identifier code of the respondent} 11 | \item{hydrocd}{Ever used hydrocodone nonmedically?} 12 | \item{oxycodp}{Ever used percocet, percodan, tylox, oxycontin... 
nonmedically?} 13 | \item{codeine}{Ever used codeine nonmedically?} 14 | \item{tramadl}{Ever used tramadol nonmedically?} 15 | \item{morphin}{Ever used morphine nonmedically?} 16 | \item{methdon}{Ever used methadone nonmedically?} 17 | \item{vicolor}{Ever used vicodin, lortab or lorcert nonmedically?} 18 | } 19 | } 20 | \source{ 21 | \url{https://www.icpsr.umich.edu/web/NAHDAP/studies/36361} 22 | } 23 | \description{ 24 | Datasets containing information about the use of pain relievers for non-medical purposes. 25 | } 26 | \details{ 27 | These datasets are a small subset from the "National Survey on Drug Use and Health, 2014". 28 | All variables related to drug use have been recoded into vectors of integers taking value 0 for 29 | "No/Unknown" and value 1 for "Yes". The original variable names were the same as those defined 30 | here but in upper case and ending with the number 2. The dataset called \code{drugs} contains the first 31 | 100 rows of the one called \code{drugs_full}. 32 | } 33 | \examples{ 34 | drugs 35 | drugs_full 36 | } 37 | \references{ 38 | United States Department of Health and Human Services. 39 | Substance Abuse and Mental Health Services Administration. 40 | Center for Behavioral Health Statistics and Quality. 41 | National Survey on Drug Use and Health, 2014. 42 | Ann Arbor, MI: Inter-university Consortium for Political and Social Research (distributor), 2016-03-22. 43 | \doi{10.3886/ICPSR36361.v1} 44 | } 45 | -------------------------------------------------------------------------------- /.github/SUPPORT.md: -------------------------------------------------------------------------------- 1 | # Getting help with lay 2 | 3 | Thanks for using lay! 4 | Before filing an issue, there are a few places to explore and pieces to put together to make the process as smooth as possible. 5 | 6 | ## Make a reprex 7 | 8 | Start by making a minimal **repr**oducible **ex**ample using the [reprex](https://reprex.tidyverse.org/) package. 
9 | If you haven't heard of or used reprex before, you're in for a treat! 10 | Seriously, reprex will make all of your R-question-asking endeavors easier (which is a pretty insane ROI for the five to ten minutes it'll take you to learn what it's all about). 11 | For additional reprex pointers, check out the [Get help!](https://www.tidyverse.org/help/) section of the tidyverse site. 12 | 13 | ## Where to ask? 14 | 15 | Armed with your reprex, the next step is to figure out [where to ask](https://www.tidyverse.org/help/#where-to-ask). 16 | 17 | * If it's a question: start with [community.rstudio.com](https://community.rstudio.com/), and/or StackOverflow. There are more people there to answer questions. 18 | 19 | * If it's a bug: you're in the right place, [file an issue](https://github.com//issues/new). 20 | 21 | * If you're not sure: let the community help you figure it out! 22 | If your problem _is_ a bug or a feature request, you can easily return here and report it. 23 | 24 | Before opening a new issue, be sure to [search issues and pull requests](https://github.com//issues) to make sure the bug hasn't been reported and/or already fixed in the development version. 25 | By default, the search will be pre-populated with `is:issue is:open`. 26 | You can [edit the qualifiers](https://help.github.com/articles/searching-issues-and-pull-requests/) (e.g. `is:pr`, `is:closed`) as needed. 27 | For example, you'd simply remove `is:open` to search _all_ issues in the repo, open or closed. 28 | 29 | ## What happens next? 30 | 31 | To be as efficient as possible, development of tidyverse packages tends to be very bursty, so you shouldn't worry if you don't get an immediate response. 32 | Typically we don't look at a repo until a sufficient quantity of issues accumulates, then there's a burst of intense activity as we focus our efforts. 
33 | That makes development more efficient because it avoids expensive context switching between problems, at the cost of taking longer to get back to you. 34 | This process makes a good reprex particularly important because it might be multiple months between your initial report and when we start working on it. 35 | If we can't reproduce the bug, we can't fix it! 36 | -------------------------------------------------------------------------------- /.github/workflows/pr-commands.yaml: -------------------------------------------------------------------------------- 1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples 2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help 3 | on: 4 | issue_comment: 5 | types: [created] 6 | 7 | name: Commands 8 | 9 | jobs: 10 | document: 11 | if: ${{ github.event.issue.pull_request && (github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'OWNER') && startsWith(github.event.comment.body, '/document') }} 12 | name: document 13 | runs-on: ubuntu-latest 14 | env: 15 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 16 | steps: 17 | - uses: actions/checkout@v3 18 | 19 | - uses: r-lib/actions/pr-fetch@v2 20 | with: 21 | repo-token: ${{ secrets.GITHUB_TOKEN }} 22 | 23 | - uses: r-lib/actions/setup-r@v2 24 | with: 25 | use-public-rspm: true 26 | 27 | - uses: r-lib/actions/setup-r-dependencies@v2 28 | with: 29 | extra-packages: any::roxygen2 30 | needs: pr-document 31 | 32 | - name: Document 33 | run: roxygen2::roxygenise() 34 | shell: Rscript {0} 35 | 36 | - name: commit 37 | run: | 38 | git config --local user.name "$GITHUB_ACTOR" 39 | git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com" 40 | git add man/\* NAMESPACE 41 | git commit -m 'Document' 42 | 43 | - uses: r-lib/actions/pr-push@v2 44 | with: 45 | repo-token: ${{ secrets.GITHUB_TOKEN }} 46 | 47 | style: 48 | if: ${{ github.event.issue.pull_request && 
(github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'OWNER') && startsWith(github.event.comment.body, '/style') }} 49 | name: style 50 | runs-on: ubuntu-latest 51 | env: 52 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 53 | steps: 54 | - uses: actions/checkout@v3 55 | 56 | - uses: r-lib/actions/pr-fetch@v2 57 | with: 58 | repo-token: ${{ secrets.GITHUB_TOKEN }} 59 | 60 | - uses: r-lib/actions/setup-r@v2 61 | 62 | - name: Install dependencies 63 | run: install.packages("styler") 64 | shell: Rscript {0} 65 | 66 | - name: Style 67 | run: styler::style_pkg() 68 | shell: Rscript {0} 69 | 70 | - name: commit 71 | run: | 72 | git config --local user.name "$GITHUB_ACTOR" 73 | git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com" 74 | git add \*.R 75 | git commit -m 'Style' 76 | 77 | - uses: r-lib/actions/pr-push@v2 78 | with: 79 | repo-token: ${{ secrets.GITHUB_TOKEN }} 80 | -------------------------------------------------------------------------------- /.github/CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to lay 2 | 3 | This outlines how to propose a change to lay. 4 | For more detailed info about contributing to this, and other tidyverse packages, please see the 5 | [**development contributing guide**](https://rstd.io/tidy-contrib). 6 | 7 | ## Fixing typos 8 | 9 | You can fix typos, spelling mistakes, or grammatical errors in the documentation directly using the GitHub web interface, as long as the changes are made in the _source_ file. 10 | This generally means you'll need to edit [roxygen2 comments](https://roxygen2.r-lib.org/articles/roxygen2.html) in an `.R`, not a `.Rd` file. 11 | You can find the `.R` file that generates the `.Rd` by reading the comment in the first line. 
12 | 13 | ## Bigger changes 14 | 15 | If you want to make a bigger change, it's a good idea to first file an issue and make sure someone from the team agrees that it's needed. 16 | If you've found a bug, please file an issue that illustrates the bug with a minimal 17 | [reprex](https://www.tidyverse.org/help/#reprex) (this will also help you write a unit test, if needed). 18 | 19 | ### Pull request process 20 | 21 | * Fork the package and clone onto your computer. If you haven't done this before, we recommend using `usethis::create_from_github("courtiol/lay", fork = TRUE)`. 22 | 23 | * Install all development dependencies with `devtools::install_dev_deps()`, and then make sure the package passes R CMD check by running `devtools::check()`. 24 | If R CMD check doesn't pass cleanly, it's a good idea to ask for help before continuing. 25 | 26 | * Create a Git branch for your pull request (PR). We recommend using `usethis::pr_init("brief-description-of-change")`. 27 | 28 | * Make your changes, commit to Git, and then create a PR by running `usethis::pr_push()`, and following the prompts in your browser. 29 | The title of your PR should briefly describe the change. 30 | The body of your PR should contain `Fixes #issue-number`. 31 | 32 | * For user-facing changes, add a bullet to the top of `NEWS.md` (i.e. just below the first header). Follow the style described in . 33 | 34 | ### Code style 35 | 36 | * New code should follow the tidyverse [style guide](https://style.tidyverse.org). 37 | You can use the [styler](https://CRAN.R-project.org/package=styler) package to apply these styles, but please don't restyle code that has nothing to do with your PR. 38 | 39 | * We use [roxygen2](https://cran.r-project.org/package=roxygen2), with [Markdown syntax](https://cran.r-project.org/web/packages/roxygen2/vignettes/markdown.html), for documentation. 40 | 41 | * We use [testthat](https://cran.r-project.org/package=testthat) for unit tests.
42 | Contributions with test cases included are easier to accept. 43 | 44 | ## Code of Conduct 45 | 46 | Please note that the lay project is released with a 47 | [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this 48 | project you agree to abide by its terms. 49 | -------------------------------------------------------------------------------- /.github/CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | We as members, contributors, and leaders pledge to make participation in our 6 | community a harassment-free experience for everyone, regardless of age, body 7 | size, visible or invisible disability, ethnicity, sex characteristics, gender 8 | identity and expression, level of experience, education, socio-economic status, 9 | nationality, personal appearance, race, religion, or sexual identity and 10 | orientation. 11 | 12 | We pledge to act and interact in ways that contribute to an open, welcoming, 13 | diverse, inclusive, and healthy community. 
14 | 15 | ## Our Standards 16 | 17 | Examples of behavior that contributes to a positive environment for our 18 | community include: 19 | 20 | * Demonstrating empathy and kindness toward other people 21 | * Being respectful of differing opinions, viewpoints, and experiences 22 | * Giving and gracefully accepting constructive feedback 23 | * Accepting responsibility and apologizing to those affected by our mistakes, 24 | and learning from the experience 25 | * Focusing on what is best not just for us as individuals, but for the overall 26 | community 27 | 28 | Examples of unacceptable behavior include: 29 | 30 | * The use of sexualized language or imagery, and sexual attention or 31 | advances of any kind 32 | * Trolling, insulting or derogatory comments, and personal or political attacks 33 | * Public or private harassment 34 | * Publishing others' private information, such as a physical or email 35 | address, without their explicit permission 36 | * Other conduct which could reasonably be considered inappropriate in a 37 | professional setting 38 | 39 | ## Enforcement Responsibilities 40 | 41 | Community leaders are responsible for clarifying and enforcing our standards 42 | of acceptable behavior and will take appropriate and fair corrective action in 43 | response to any behavior that they deem inappropriate, threatening, offensive, 44 | or harmful. 45 | 46 | Community leaders have the right and responsibility to remove, edit, or reject 47 | comments, commits, code, wiki edits, issues, and other contributions that are 48 | not aligned to this Code of Conduct, and will communicate reasons for moderation 49 | decisions when appropriate. 50 | 51 | ## Scope 52 | 53 | This Code of Conduct applies within all community spaces, and also applies 54 | when an individual is officially representing the community in public spaces. 
55 | Examples of representing our community include using an official e-mail 56 | address, posting via an official social media account, or acting as an appointed 57 | representative at an online or offline event. 58 | 59 | ## Enforcement 60 | 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 62 | reported to the community leaders responsible for enforcement at [INSERT CONTACT 63 | METHOD]. All complaints will be reviewed and investigated promptly and fairly. 64 | 65 | All community leaders are obligated to respect the privacy and security of the 66 | reporter of any incident. 67 | 68 | ## Enforcement Guidelines 69 | 70 | Community leaders will follow these Community Impact Guidelines in determining 71 | the consequences for any action they deem in violation of this Code of Conduct: 72 | 73 | ### 1. Correction 74 | 75 | **Community Impact**: Use of inappropriate language or other behavior deemed 76 | unprofessional or unwelcome in the community. 77 | 78 | **Consequence**: A private, written warning from community leaders, providing 79 | clarity around the nature of the violation and an explanation of why the 80 | behavior was inappropriate. A public apology may be requested. 81 | 82 | ### 2. Warning 83 | 84 | **Community Impact**: A violation through a single incident or series of 85 | actions. 86 | 87 | **Consequence**: A warning with consequences for continued behavior. No 88 | interaction with the people involved, including unsolicited interaction with 89 | those enforcing the Code of Conduct, for a specified period of time. This 90 | includes avoiding interactions in community spaces as well as external channels 91 | like social media. Violating these terms may lead to a temporary or permanent 92 | ban. 93 | 94 | ### 3. Temporary Ban 95 | 96 | **Community Impact**: A serious violation of community standards, including 97 | sustained inappropriate behavior. 
98 | 99 | **Consequence**: A temporary ban from any sort of interaction or public 100 | communication with the community for a specified period of time. No public or 101 | private interaction with the people involved, including unsolicited interaction 102 | with those enforcing the Code of Conduct, is allowed during this period. 103 | Violating these terms may lead to a permanent ban. 104 | 105 | ### 4. Permanent Ban 106 | 107 | **Community Impact**: Demonstrating a pattern of violation of community 108 | standards, including sustained inappropriate behavior, harassment of an 109 | individual, or aggression toward or disparagement of classes of individuals. 110 | 111 | **Consequence**: A permanent ban from any sort of public interaction within the 112 | community. 113 | 114 | ## Attribution 115 | 116 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], 117 | version 2.0, 118 | available at https://www.contributor-covenant.org/version/2/0/ 119 | code_of_conduct.html. 120 | 121 | Community Impact Guidelines were inspired by [Mozilla's code of conduct 122 | enforcement ladder](https://github.com/mozilla/diversity). 123 | 124 | [homepage]: https://www.contributor-covenant.org 125 | 126 | For answers to common questions about this code of conduct, see the FAQ at 127 | https://www.contributor-covenant.org/faq. Translations are available at https:// 128 | www.contributor-covenant.org/translations. 129 | -------------------------------------------------------------------------------- /vignettes/articles/alternatives.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Alternatives for rowwise jobs" 3 | --- 4 | 5 | ```{r, include = FALSE} 6 | knitr::opts_chunk$set( 7 | collapse = TRUE, 8 | comment = "#>" 9 | ) 10 | ``` 11 | 12 | ## Article overview 13 | 14 | There are many alternatives to perform rowwise jobs in R. 15 | In this Article, we consider, in turns, these alternatives. 
16 | We will stick to our example about drug usage shown in the [introduction](https://courtiol.github.io/lay/). 17 | The idea is to compare alternative ways to create a new variable named `everused` which indicates whether each respondent has used any of the considered pain relievers for non-medical purposes. 18 | 19 | ## Loading packages 20 | 21 | This Article requires you to load the following packages: 22 | 23 | ```{r load_pkg, message=FALSE} 24 | library(lay) ## for lay() and the data 25 | library(dplyr) ## for many things 26 | library(tidyr) ## for pivot_longer() and pivot_wider() 27 | library(purrr) ## for pmap_lgl() 28 | library(slider) ## for slide_vec() 29 | library(data.table) ## for an alternative to base and dplyr 30 | ``` 31 | Please install them if they are not present on your system. 32 | 33 | ## Alternative 1: vectorized solution 34 | 35 | One solution is to simply do the following: 36 | ```{r vector} 37 | drugs_full |> 38 | mutate(everused = codeine | hydrocd | methdon | morphin | oxycodp | tramadl | vicolor) 39 | ``` 40 | It is certainly very efficient from a computational point of view, but coding this way presents two main limitations: 41 | 42 | - you need to name all columns explicitly, which can be problematic when dealing with many columns 43 | - you are limited to expressing your task with logical and arithmetic operators, which is not always sufficient 44 | 45 | 46 | ## Alternative 2: 100% [**{dplyr}**](https://dplyr.tidyverse.org/) 47 | 48 | ```{r dplyr} 49 | drugs |> 50 | rowwise() |> 51 | mutate(everused = any(c_across(-caseid))) |> 52 | ungroup() 53 | ``` 54 | It is easy to use, as `c_across()` turns its input into a vector and `rowwise()` implies that the 55 | vector only represents one row at a time. Yet, for now it remains quite slow on large datasets (see **Efficiency** below).
56 | 57 | 58 | ## Alternative 3: [**{tidyr}**](https://tidyr.tidyverse.org/) 59 | 60 | ```{r tidyr} 61 | library(tidyr) ## requires {tidyr} to be installed 62 | 63 | drugs |> 64 | pivot_longer(-caseid) |> 65 | group_by(caseid) |> 66 | mutate(everused = any(value)) |> 67 | ungroup() |> 68 | pivot_wider() |> 69 | relocate(everused, .after = last_col()) 70 | ``` 71 | Here the trick is to turn the rowwise problem into a column problem by pivoting the values and then 72 | pivoting the results back. Many find that this involves a little too much intellectual gymnastics. It 73 | is also not particularly efficient on large datasets, both in terms of computation time and memory required 74 | to pivot the tables. 75 | 76 | 77 | ## Alternative 4: [**{purrr}**](https://purrr.tidyverse.org/) 78 | 79 | ```{r purrr} 80 | library(purrr) ## requires {purrr} to be installed 81 | 82 | drugs |> 83 | mutate(everused = pmap_lgl(pick(-caseid), ~ any(...))) 84 | ``` 85 | This is a perfectly fine solution and actually part of what one implementation of `lay()` relies on 86 | (if `.method = "tidy"`), but from a user perspective it is a little too geeky-scary. 87 | 88 | 89 | ## Alternative 5: [**{slider}**](https://slider.r-lib.org/) 90 | 91 | ```{r slider} 92 | library(slider) ## requires {slider} to be installed 93 | 94 | drugs |> 95 | mutate(everused = slide_vec(pick(-caseid), any)) 96 | ``` 97 | The package [**{slider}**](https://slider.r-lib.org/) is a powerful package which provides several *sliding window* functions. 98 | It can be used to perform rowwise operations and is quite similar to **{lay}** in terms of syntax. 99 | It is however not as efficient as **{lay}** and I am not sure it supports the automatic splicing demonstrated above.
100 | 101 | 102 | ## Alternative 6: [**{data.table}**](https://rdatatable.gitlab.io/data.table/) 103 | 104 | ```{r data.table, message=FALSE} 105 | library(data.table) ## requires {data.table} to be installed 106 | 107 | drugs_dt <- data.table(drugs) 108 | 109 | drugs_dt[, ..I := .I] 110 | drugs_dt[, everused := any(.SD), by = ..I, .SDcols = -"caseid"] 111 | drugs_dt[, ..I := NULL] 112 | as_tibble(drugs_dt) 113 | ``` 114 | This is a solution for those using [**{data.table}**](https://rdatatable.gitlab.io/data.table/). 115 | It is not particularly efficient, nor particularly easy to remember for those who do not program frequently using [**{data.table}**](https://rdatatable.gitlab.io/data.table/). 116 | 117 | 118 | ## Alternative 7: `apply()` 119 | 120 | ```{r apply} 121 | drugs |> 122 | mutate(everused = apply(pick(-caseid), 1L, any)) 123 | ``` 124 | This is the base R solution. It is very efficient and actually part of the default method used in `lay()`. 125 | Our implementation of `lay()` removes the need to define the margin (the `1L` above) and benefits from 126 | the automatic splicing and the lambda syntax as shown above. 127 | 128 | 129 | ## Alternative 8: `for (i in ...) {...}` 130 | 131 | ```{r for} 132 | drugs$everused <- NA 133 | 134 | columns_in <- !colnames(drugs) %in% c("caseid", "everused") 135 | 136 | for (i in seq_len(nrow(drugs))) { 137 | drugs$everused[i] <- any(drugs[i, columns_in]) 138 | } 139 | 140 | drugs 141 | ``` 142 | This is another base R solution, which does not involve any external package. It is not very pretty, 143 | nor particularly efficient. 144 | 145 | 146 | ## Other alternatives? 147 | 148 | There are probably other ways. If you think of a nice one, please open an issue and we will add it here! 149 | 150 | 151 | ## Efficiency 152 | 153 | The results of benchmarks comparing alternative implementations for our simple rowwise job are shown in another Article (see [benchmarks](https://courtiol.github.io/lay/articles/benchmarks.html)).
154 | As you will see, `lay()` is not just simple and powerful, it is also quite efficient! 155 | 156 | -------------------------------------------------------------------------------- /man/lay.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/lay.R 3 | \name{lay} 4 | \alias{lay} 5 | \title{Apply a function within each row} 6 | \usage{ 7 | lay(.data, .fn, ..., .method = c("apply", "tidy")) 8 | } 9 | \arguments{ 10 | \item{.data}{A data frame or tibble (or other data frame extensions).} 11 | 12 | \item{.fn}{A function to apply to each row of \code{.data}. 13 | Possible values are: 14 | \itemize{ 15 | \item A function, e.g. \code{mean} 16 | \item An anonymous function, e.g. \code{function(x) mean(x, na.rm = TRUE)} 17 | \item An anonymous function with shorthand, e.g. \verb{\\(x) mean(x, na.rm = TRUE)} 18 | \item A purrr-style lambda, e.g. \code{~ mean(.x, na.rm = TRUE)} 19 | 20 | (wrap the output in a data frame to apply several functions at once, e.g. 21 | \code{~ tibble(min = min(.x), max = max(.x))}) 22 | }} 23 | 24 | \item{...}{Additional arguments for the function calls in \code{.fn} (must be named!).} 25 | 26 | \item{.method}{This is an experimental argument that allows you to control which internal method 27 | is used to apply the rowwise job: 28 | \itemize{ 29 | \item "apply", the default, internally uses the function \code{\link[=apply]{apply()}}. 30 | \item "tidy", internally uses \code{\link[purrr:pmap]{purrr::pmap()}} and is stricter with respect to class coercion 31 | across columns. 32 | } 33 | 34 | The default has been chosen based on these \href{https://courtiol.github.io/lay/articles/benchmarks.html}{\strong{benchmarks}}.} 35 | } 36 | \value{ 37 | A vector with one element per row of \code{.data}, or a data frame (or tibble) with one row per row of \code{.data}. The class of the output is determined by \code{.fn}.
38 | } 39 | \description{ 40 | Efficiently create new column(s) in a data frame (including tibble) by applying a function one row 41 | at a time. 42 | } 43 | \details{ 44 | \code{lay()} creates a vector or a data frame (or tibble), by considering in turn each row of a data 45 | frame (\code{.data}) as the vector input of some function(s) \code{.fn}. 46 | 47 | This makes the creation of new columns based on a rowwise operation both simple (see 48 | \strong{Examples} below) and efficient (see the Article \href{https://courtiol.github.io/lay/articles/benchmarks.html}{\strong{benchmarks}}). 49 | 50 | The function should be fully compatible with \code{{dplyr}}-based workflows and follows a syntax close 51 | to \code{\link[dplyr:across]{dplyr::across()}}. 52 | 53 | Yet, it takes \code{.data} instead of \code{.cols} as a main argument, which makes it possible to also use 54 | \code{lay()} outside \code{dplyr} verbs (see \strong{Examples}). 55 | 56 | The function \code{lay()} should work in a wide range of situations, provided that: 57 | \itemize{ 58 | \item The input \code{.data} should be a data frame (including tibble) with columns of the same class, or of 59 | classes similar enough to be easily coerced into a single class. Note that \code{.method = "apply"} 60 | also allows for the input to be a matrix and is more permissive in terms of data coercion. 61 | \item The output of \code{.fn} should be a scalar (i.e., vector of length 1) or a 1 row data frame (or 62 | tibble). 63 | } 64 | 65 | If you use \code{lay()} within \code{\link[dplyr:mutate]{dplyr::mutate()}}, make sure that the data used by \code{\link[dplyr:mutate]{dplyr::mutate()}} 66 | contain no row-grouping, i.e., what is passed to \code{.data} in \code{\link[dplyr:mutate]{dplyr::mutate()}} should not be of 67 | class \code{grouped_df} or \code{rowwise_df}. If it is, \code{lay()} will be called multiple times, which will 68 | slow down the computation despite not influencing the output.
69 | } 70 | \examples{ 71 | 72 | # usage without dplyr ------------------------------------------------------------------------- 73 | 74 | # lay can return a vector 75 | lay(drugs[1:10, -1], any) 76 | 77 | # lay can return a data frame 78 | ## using the shorthand function syntax \(x) .fn(x) 79 | lay(drugs[1:10, -1], 80 | \(x) data.frame(drugs_taken = sum(x), drugs_not_taken = sum(x == 0))) 81 | 82 | ## using the rlang lambda syntax ~ fn(.x) 83 | lay(drugs[1:10, -1], 84 | ~ data.frame(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0))) 85 | 86 | # lay can be used to augment a data frame 87 | cbind(drugs[1:10, ], 88 | lay(drugs[1:10, -1], 89 | ~ data.frame(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))) 90 | 91 | 92 | # usage with dplyr ---------------------------------------------------------------------------- 93 | 94 | if (require("dplyr")) { 95 | 96 | # apply any() to each row 97 | drugs |> 98 | mutate(everused = lay(pick(-caseid), any)) 99 | 100 | # apply any() to each row using all columns 101 | drugs |> 102 | select(-caseid) |> 103 | mutate(everused = lay(pick(everything()), any)) 104 | 105 | # a workaround would be to use `rowSums` 106 | drugs |> 107 | mutate(everused = rowSums(pick(-caseid)) > 0) 108 | 109 | # but we can lay any function taking a vector as input, e.g. 
median 110 | drugs |> 111 | mutate(used_median = lay(pick(-caseid), median)) 112 | 113 | # you can pass arguments to the function 114 | drugs_with_NA <- drugs 115 | drugs_with_NA[1, 2] <- NA 116 | 117 | drugs_with_NA |> 118 | mutate(everused = lay(pick(-caseid), any)) 119 | drugs_with_NA |> 120 | mutate(everused = lay(pick(-caseid), any, na.rm = TRUE)) 121 | 122 | # you can lay the output into a 1-row tibble (or data.frame) 123 | # if you want to apply multiple functions 124 | drugs |> 125 | mutate(lay(pick(-caseid), 126 | ~ tibble(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))) 127 | 128 | # note that naming the output prevents the automatic splicing and you obtain a df-column 129 | drugs |> 130 | mutate(usage = lay(pick(-caseid), 131 | ~ tibble(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))) 132 | 133 | # if your function returns a vector longer than a scalar, you should turn the output 134 | # into a tibble, which is the job of as_tibble_row() 135 | drugs |> 136 | mutate(lay(pick(-caseid), ~ as_tibble_row(quantile(.x)))) 137 | 138 | # note that you could also wrap the output in a list and name it to obtain a list-column 139 | drugs |> 140 | mutate(usage_quantiles = lay(pick(-caseid), ~ list(quantile(.x)))) 141 | } 142 | 143 | } 144 | -------------------------------------------------------------------------------- /R/lay.R: -------------------------------------------------------------------------------- 1 | #' Apply a function within each row 2 | #' 3 | #' Efficiently create new column(s) in a data frame (including tibble) by applying a function one row 4 | #' at a time. 5 | #' 6 | #' `lay()` creates a vector or a data frame (or tibble), by considering in turn each row of a data 7 | #' frame (`.data`) as the vector input of some function(s) `.fn`.
8 | #' 9 | #' This makes the creation of new columns based on a rowwise operation both simple (see 10 | #' **Examples** below) and efficient (see the Article [**benchmarks**](https://courtiol.github.io/lay/articles/benchmarks.html)). 11 | #' 12 | #' The function should be fully compatible with `{dplyr}`-based workflows and follows a syntax close 13 | #' to [dplyr::across()]. 14 | #' 15 | #' Yet, it takes `.data` instead of `.cols` as a main argument, which makes it possible to also use 16 | #' `lay()` outside `dplyr` verbs (see **Examples**). 17 | #' 18 | #' The function `lay()` should work in a wide range of situations, provided that: 19 | #' 20 | #' - The input `.data` should be a data frame (including tibble) with columns of the same class, or of 21 | #' classes similar enough to be easily coerced into a single class. Note that `.method = "apply"` 22 | #' also allows for the input to be a matrix and is more permissive in terms of data coercion. 23 | #' 24 | #' - The output of `.fn` should be a scalar (i.e., vector of length 1) or a 1 row data frame (or 25 | #' tibble). 26 | #' 27 | #' If you use `lay()` within [dplyr::mutate()], make sure that the data used by [dplyr::mutate()] 28 | #' contain no row-grouping, i.e., what is passed to `.data` in [dplyr::mutate()] should not be of 29 | #' class `grouped_df` or `rowwise_df`. If it is, `lay()` will be called multiple times, which will 30 | #' slow down the computation despite not influencing the output. 31 | #' 32 | #' 33 | #' @param .data A data frame or tibble (or other data frame extensions). 34 | #' @param .fn A function to apply to each row of `.data`. 35 | #' Possible values are: 36 | #' 37 | #' - A function, e.g. `mean` 38 | #' - An anonymous function, e.g. `function(x) mean(x, na.rm = TRUE)` 39 | #' - An anonymous function with shorthand, e.g. `\(x) mean(x, na.rm = TRUE)` 40 | #' - A purrr-style lambda, e.g. 
`~ mean(.x, na.rm = TRUE)` 41 | #' 42 | #' (wrap the output in a data frame to apply several functions at once, e.g. 43 | #' `~ tibble(min = min(.x), max = max(.x))`) 44 | #' 45 | #' @param ... Additional arguments for the function calls in `.fn` (must be named!). 46 | #' @param .method This is an experimental argument that allows you to control which internal method 47 | #' is used to apply the rowwise job: 48 | #' - "apply", the default internally uses the function [apply()]. 49 | #' - "tidy", internally uses [purrr::pmap()] and is stricter with respect to class coercion 50 | #' across columns. 51 | #' 52 | #' The default has been chosen based on these [**benchmarks**](https://courtiol.github.io/lay/articles/benchmarks.html). 53 | #' 54 | #' @return A vector with one element per row of `.data`, or a data frame (or tibble) with one row per row of `.data`. The class of the output is determined by `.fn`. 55 | #' 56 | #' @importFrom vctrs vec_c 57 | #' @importFrom rlang list2 exec as_function 58 | #' @importFrom purrr pmap 59 | #' 60 | #' @examples 61 | #' 62 | #' # usage without dplyr ------------------------------------------------------------------------- 63 | #' 64 | #' # lay can return a vector 65 | #' lay(drugs[1:10, -1], any) 66 | #' 67 | #' # lay can return a data frame 68 | #' ## using the shorthand function syntax \(x) .fn(x) 69 | #' lay(drugs[1:10, -1], 70 | #' \(x) data.frame(drugs_taken = sum(x), drugs_not_taken = sum(x == 0))) 71 | #' 72 | #' ## using the rlang lambda syntax ~ fn(.x) 73 | #' lay(drugs[1:10, -1], 74 | #' ~ data.frame(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0))) 75 | #' 76 | #' # lay can be used to augment a data frame 77 | #' cbind(drugs[1:10, ], 78 | #' lay(drugs[1:10, -1], 79 | #' ~ data.frame(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))) 80 | #' 81 | #' 82 | #' # usage with dplyr ---------------------------------------------------------------------------- 83 | #' 84 | #' if (require("dplyr")) { 85 | #' 86 | #' # 
apply any() to each row 87 | #' drugs |> 88 | #' mutate(everused = lay(pick(-caseid), any)) 89 | #' 90 | #' # apply any() to each row using all columns 91 | #' drugs |> 92 | #' select(-caseid) |> 93 | #' mutate(everused = lay(pick(everything()), any)) 94 | #' 95 | #' # a workaround would be to use `rowSums` 96 | #' drugs |> 97 | #' mutate(everused = rowSums(pick(-caseid)) > 0) 98 | #' 99 | #' # but we can lay any function taking a vector as input, e.g. median 100 | #' drugs |> 101 | #' mutate(used_median = lay(pick(-caseid), median)) 102 | #' 103 | #' # you can pass arguments to the function 104 | #' drugs_with_NA <- drugs 105 | #' drugs_with_NA[1, 2] <- NA 106 | #' 107 | #' drugs_with_NA |> 108 | #' mutate(everused = lay(pick(-caseid), any)) 109 | #' drugs_with_NA |> 110 | #' mutate(everused = lay(pick(-caseid), any, na.rm = TRUE)) 111 | #' 112 | #' # you can lay the output into a 1-row tibble (or data.frame) 113 | #' # if you want to apply multiple functions 114 | #' drugs |> 115 | #' mutate(lay(pick(-caseid), 116 | #' ~ tibble(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))) 117 | #' 118 | #' # note that naming the output prevents the automatic splicing and you obtain a df-column 119 | #' drugs |> 120 | #' mutate(usage = lay(pick(-caseid), 121 | #' ~ tibble(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))) 122 | #' 123 | #' # if your function returns a vector longer than a scalar, you should turn the output 124 | #' # into a tibble, which is the job of as_tibble_row() 125 | #' drugs |> 126 | #' mutate(lay(pick(-caseid), ~ as_tibble_row(quantile(.x)))) 127 | #' 128 | #' # note that you could also wrap the output in a list and name it to obtain a list-column 129 | #' drugs |> 130 | #' mutate(usage_quantiles = lay(pick(-caseid), ~ list(quantile(.x)))) 131 | #' } 132 | #' 133 | #' @export 134 | lay <- function(.data, .fn, ..., .method = c("apply", "tidy")) { 135 | 136 | method <- match.arg(.method)[1] 137 | 138 | fn <- as_function(.fn) 139 | 140 | if 
(method == "tidy") { 141 | 142 | args <- list2(...) 143 | bits <- pmap(.data, function(...) exec(fn, vec_c(...), !!!args)) 144 | 145 | } else if (method == "apply") { 146 | 147 | bits <- apply(.data, 1L, fn, ...) 148 | 149 | } else stop(".method input unknown") 150 | 151 | vec_c(!!!bits) 152 | } 153 | -------------------------------------------------------------------------------- /vignettes/articles/benchmarks.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Benchmarks" 3 | --- 4 | 5 | ```{r, include = FALSE} 6 | knitr::opts_chunk$set( 7 | collapse = TRUE, 8 | comment = "#>" 9 | ) 10 | ``` 11 | 12 | ## Article overview 13 | 14 | The goal of this Article is to compare the performance of `lay()` with that of the alternatives described [here](https://courtiol.github.io/lay/articles/alternatives.html). 15 | As you will see, the code using `lay()` is quite efficient. 16 | The only alternative that is clearly more efficient is the one labeled below "*vectorized*". 17 | Unfortunately, such a vectorized approach requires referring explicitly to every column whose data are used. 18 | Furthermore, such a vectorized approach is not generally applicable, as it can only deal with logical 19 | and arithmetic operators and does not allow the use of other types of functions. 20 | 21 | 22 | ## Loading packages 23 | 24 | This Article requires you to load the following packages: 25 | 26 | ```{r load_pkg, message=FALSE} 27 | library(lay) ## for lay() and the data 28 | library(dplyr) ## for many things 29 | library(tidyr) ## for pivot_longer() and pivot_wider() 30 | library(purrr) ## for pmap_lgl() 31 | library(slider) ## for slide_vec() 32 | library(data.table) ## for an alternative to base and dplyr 33 | library(bench) ## for running the benchmarks 34 | library(forcats) ## for sorting levels in plot with fct_reorder() 35 | ``` 36 | 37 | Please install them if they are not present on your system.
38 | 39 | 40 | ## An example of a rowwise task 41 | 42 | Consider the dataset `drugs_full` from our package {lay}: 43 | ```{r drugs_full} 44 | drugs_full 45 | ``` 46 | 47 | In this dataset, all columns but `caseid` record the use of pain relievers for non-medical purposes. 48 | 49 | For each drug there is a certain number of users and non-users: 50 | ```{r drugs_full_summary} 51 | drugs_full |> 52 | pivot_longer(-caseid, names_to = "drug", values_to = "used") |> 53 | count(drug, used) |> 54 | mutate(used = if_else(used == 1, "have_used", "have_not_used")) |> 55 | pivot_wider(names_from = used, values_from = n) 56 | ``` 57 | 58 | In this Article, we compare the efficiency of alternative ways to create a new variable named `everused` which indicates whether each respondent has used any of the considered pain relievers for non-medical purposes. 59 | 60 | We will run benchmarks on the dataset `drugs_full` and its `r nrow(drugs_full)` rows, as well as on a subset of these data called `drugs` that only contains `r nrow(drugs)` rows.
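
Because the benchmark calls in this Article are run with `check = FALSE`, `bench::mark()` does not verify that the alternatives return the same result. A minimal base R sketch of such a sanity check is given here; the `toy` data frame and its values are made up for illustration and are not the package data:

```r
# Hypothetical toy data standing in for the drug-use columns (not the package data)
toy <- data.frame(
  codeine = c(FALSE, TRUE, FALSE),
  morphin = c(FALSE, FALSE, FALSE),
  tramadl = c(FALSE, TRUE, TRUE)
)

# "vectorized" approach: every column is named explicitly
everused_vectorized <- toy$codeine | toy$morphin | toy$tramadl

# rowwise approach: any() is applied to each row in turn
everused_rowwise <- unname(apply(toy, 1L, any))

# both approaches should agree before their timings are compared
stopifnot(identical(everused_vectorized, everused_rowwise))
```

The same kind of comparison could be run on the real data with any pair of the alternatives before timing them.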
61 | 62 | ## Benchmarks on the full dataset (`r nrow(drugs_full)` rows) 63 | 64 | Let's compare the running time of different methods to do this job on the full dataset: 65 | 66 | ```{r bench_run1} 67 | drugs_full_dt <- data.table(drugs_full) ## coercion to data.table 68 | 69 | benchmark1 <- mark( 70 | vectorized = { 71 | drugs_full |> 72 | mutate(everused = codeine | hydrocd | methdon | morphin | oxycodp | tramadl | vicolor)}, 73 | lay = { 74 | drugs_full |> 75 | select(-caseid) |> 76 | mutate(everused = lay(pick(everything()), any))}, 77 | lay_alternative = { 78 | drugs_full |> 79 | mutate(everused = lay(pick(-caseid), any, .method = "tidy"))}, 80 | c_across = { 81 | drugs_full |> 82 | rowwise() |> 83 | mutate(everused = any(c_across(-caseid))) |> 84 | ungroup()}, 85 | pivot_pivot = { 86 | drugs_full |> 87 | pivot_longer(-caseid) |> 88 | group_by(caseid) |> 89 | mutate(everused = any(value)) |> 90 | ungroup() |> 91 | pivot_wider()}, 92 | pmap = { 93 | drugs_full |> 94 | mutate(everused = pmap_lgl(pick(-caseid), ~ any(...)))}, 95 | slider = { 96 | drugs_full |> 97 | mutate(everused = slide_vec(pick(-caseid), any))}, 98 | data.table = { 99 | drugs_full_dt[, ..I := .I] 100 | drugs_full_dt[, everused := any(.SD), by = ..I, .SDcols = -"caseid"]}, 101 | apply = { 102 | drugs_full |> 103 | mutate(everused = apply(pick(-caseid), 1, any))}, 104 | 'for' = { 105 | everused <- logical(nrow(drugs_full)) 106 | columns_in <- colnames(drugs_full) != "caseid" 107 | for (i in seq_len(nrow(drugs_full))) everused[i] <- any(drugs_full[i, columns_in])}, 108 | iterations = 5, 109 | time_unit = "ms", 110 | check = FALSE 111 | ) 112 | ``` 113 | 114 | Here are the results of this first series of benchmarks: 115 | ```{r bench_results1} 116 | benchmark1 |> 117 | arrange(median) 118 | ``` 119 | 120 | ```{r bench_plot1, fig.width=7} 121 | benchmark1 |> 122 | mutate(expression = fct_reorder(as.character(expression), median, .desc = TRUE)) |> 123 | plot() 124 | ``` 125 | 126 | Note that the 
x-axis of the plot is on a logarithmic scale. 127 | 128 | ## Benchmarks on a subset of the data (`r nrow(drugs)` rows) 129 | 130 | Let's repeat our benchmarks using only a subset of the original dataset: 131 | 132 | ```{r bench_run2} 133 | drugs_dt <- data.table(drugs) ## coercion to data.table 134 | 135 | benchmark2 <- mark( 136 | vectorized = { 137 | drugs |> 138 | mutate(everused = codeine | hydrocd | methdon | morphin | oxycodp | tramadl | vicolor)}, 139 | lay = { 140 | drugs |> 141 | select(-caseid) |> 142 | mutate(everused = lay(pick(everything()), any))}, 143 | lay_alternative = { 144 | drugs |> 145 | mutate(everused = lay(pick(-caseid), any, .method = "tidy"))}, 146 | c_across = { 147 | drugs |> 148 | rowwise() |> 149 | mutate(everused = any(c_across(-caseid))) |> 150 | ungroup()}, 151 | pivot_pivot = { 152 | drugs |> 153 | pivot_longer(-caseid) |> 154 | group_by(caseid) |> 155 | mutate(everused = any(value)) |> 156 | ungroup() |> 157 | pivot_wider()}, 158 | pmap = { 159 | drugs |> 160 | mutate(everused = pmap_lgl(pick(-caseid), ~ any(...)))}, 161 | slider = { 162 | drugs |> 163 | mutate(everused = slide_vec(pick(-caseid), any))}, 164 | data.table = { 165 | drugs_dt[, ..I := .I] 166 | drugs_dt[, everused := any(.SD), by = ..I, .SDcols = -"caseid"]}, 167 | apply = { 168 | drugs |> 169 | mutate(everused = apply(pick(-caseid), 1, any))}, 170 | 'for' = { 171 | everused <- logical(nrow(drugs)) 172 | columns_in <- colnames(drugs) != "caseid" 173 | for (i in seq_len(nrow(drugs))) everused[i] <- any(drugs[i, columns_in])}, 174 | iterations = 30, 175 | time_unit = "ms", 176 | check = FALSE 177 | ) 178 | ``` 179 | 180 | Here are the results of this second series of benchmarks: 181 | ```{r bench_results2} 182 | benchmark2 |> 183 | arrange(median) 184 | ``` 185 | 186 | ```{r bench_plot2, fig.width=7} 187 | benchmark2 |> 188 | mutate(expression = fct_reorder(as.character(expression), median, .desc = TRUE)) |> 189 | plot(type = "violin") 190 | ``` 191 | 192 | Note
again that the x-axis of the plot is on a logarithmic scale. 193 | 194 | ## Benchmarks' environment 195 | 196 | ```{r session} 197 | sessionInfo() 198 | ``` 199 | 200 | -------------------------------------------------------------------------------- /README.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | output: github_document 3 | editor_options: 4 | chunk_output_type: console 5 | --- 6 | 7 | 8 | 9 | ```{r, include = FALSE} 10 | knitr::opts_chunk$set( 11 | collapse = TRUE, 12 | comment = "#>", 13 | fig.path = "man/figures/", 14 | out.width = "70%", 15 | fig.align = "center" 16 | ) 17 | ``` 18 | 19 | # **{lay}** 20 | 21 | 22 | [![CRAN status](https://www.r-pkg.org/badges/version/lay)](https://CRAN.R-project.org/package=lay) 23 | [![R-CMD-check](https://github.com/courtiol/lay/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/courtiol/lay/actions/workflows/R-CMD-check.yaml) 24 | [![test-coverage](https://github.com/courtiol/lay/actions/workflows/test-coverage.yaml/badge.svg)](https://github.com/courtiol/lay/actions/workflows/test-coverage.yaml) 25 | [![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) 26 | 27 | 28 | ## An R package for simple but efficient rowwise jobs 29 | 30 | The function `lay()` -- the only function of the package **{lay}** -- is intended to be used to apply a function on each row of a data frame or tibble, independently, and across multiple columns containing values of the same class (e.g. all numeric). 31 | 32 | Implementing rowwise operations for tabular data is notoriously awkward in R. 33 | Many options have been proposed, but they tend to be complicated, inefficient, or both. 34 | Instead `lay()` aims at reaching a sweet spot between simplicity and efficiency. 
35 | 36 | The function has been specifically designed to be combined with functions from [**{dplyr}**](https://dplyr.tidyverse.org/) and to feel as if 37 | it was part of it (but you can use `lay()` without [**{dplyr}**](https://dplyr.tidyverse.org/)). 38 | 39 | There is hardly any code behind `lay()` (it can be coded in 3 lines), so this package may just be an interim solution before an established package fulfills the need... Time will tell. 40 | 41 | ### Installation 42 | 43 | 44 | You can install the current CRAN version of **{lay}** with: 45 | 46 | ``` r 47 | install.packages("lay") 48 | ``` 49 | 50 | Alternatively, you can install the development version of **{lay}** using [**{remotes}**](https://remotes.r-lib.org/): 51 | 52 | ``` r 53 | remotes::install_github("courtiol/lay") ## requires to have installed {remotes} 54 | ``` 55 | 56 | ### Motivation 57 | 58 | Consider the following dataset, which contains information about the use of pain relievers for non medical purpose. 59 | ```{r motivation} 60 | library(lay) ## requires to have installed {lay} 61 | drugs 62 | ``` 63 | 64 | The dataset is [tidy](https://vita.had.co.nz/papers/tidy-data.pdf): each row represents one individual and each variable forms a column. 65 | 66 | Imagine now that you would like to know if each individual did use any of these pain relievers. 67 | 68 | How would you proceed? 69 | 70 | 71 | ### Our solution: `lay()` 72 | 73 | This is how you would achieve our goal using `lay()`: 74 | ```{r lay} 75 | library(dplyr, warn.conflicts = FALSE) ## requires to have installed {dplyr} 76 | 77 | drugs_full |> 78 | mutate(everused = lay(pick(-caseid), any)) 79 | ``` 80 | 81 | We used `mutate()` from [**{dplyr}**](https://dplyr.tidyverse.org/) to create a new column called *everused*, and we used `pick()` from that same package to remove the column *caseid* when laying down each row of the data and applying the function `any()`. 
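If "laying down each row" sounds abstract, the following base R sketch may help. The helper `apply_by_row()` and the toy data are hypothetical and only illustrate the idea; this is not the code behind `lay()`, and it only handles functions returning a single logical value.

```r
## Hypothetical helper for illustration only; this is NOT the code behind lay().
apply_by_row <- function(data, fn, ...) {
  vapply(
    seq_len(nrow(data)),
    ## extract row i as a plain vector, then apply fn to it
    function(i) fn(unlist(data[i, , drop = FALSE], use.names = FALSE), ...),
    FUN.VALUE = logical(1)  ## this sketch only handles logical outputs
  )
}

## Made-up toy data with the same layout as `drugs`
toy <- data.frame(
  caseid  = 1:3,
  hydrocd = c(0L, 1L, 0L),
  oxycodp = c(0L, 0L, 1L)
)

## Drop `caseid`, then apply any() to each remaining row
toy$everused <- apply_by_row(toy[, c("hydrocd", "oxycodp")], any)
toy$everused
```

Compared with this sketch, the `lay()` call above leaves the column selection to `pick()` and does not require you to declare the output type in advance.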
82 | 83 | When combining `lay()` and [**{dplyr}**](https://dplyr.tidyverse.org/), you should always use `pick()` or `across()`. The functions `pick()` and `across()` let you pick among many [selection helpers](https://tidyselect.r-lib.org/reference/language.html) from the package [**{tidyselect}**](https://tidyselect.r-lib.org/), which makes it easy to specify which columns to consider. 84 | 85 | Our function `lay()` is quite flexible! For example, you can pass argument(s) of the function you wish to apply rowwise (here `any()`): 86 | 87 | ```{r NA} 88 | drugs_with_NA <- drugs ## create a copy of the dataset 89 | drugs_with_NA[1, 2] <- NA ## introduce a missing value 90 | 91 | drugs_with_NA |> 92 | mutate(everused = lay(pick(-caseid), any)) |> ## without additional argument 93 | slice(1) ## keep first row only 94 | 95 | drugs_with_NA |> 96 | mutate(everused = lay(pick(-caseid), any, na.rm = TRUE)) |> ## with additional argument 97 | slice(1) 98 | ``` 99 | 100 | Since one of the backbones of `lay()` is [**{rlang}**](https://rlang.r-lib.org), you can use the so-called [*lambda* syntax](https://rlang.r-lib.org/reference/as_function.html) to define anonymous functions on the fly: 101 | 102 | ```{r lambda} 103 | drugs_with_NA |> 104 | mutate(everused = lay(pick(-caseid), ~ any(.x, na.rm = TRUE))) ## same as above, different syntax 105 | ``` 106 | 107 | We can also apply many functions at once, as exemplified with another dataset: 108 | 109 | ```{r worldbank} 110 | data("world_bank_pop", package = "tidyr") ## requires to have installed {tidyr} 111 | 112 | world_bank_pop |> 113 | filter(indicator == "SP.POP.TOTL") |> 114 | mutate(lay(pick(matches("\\d")), 115 | ~ tibble(min = min(.x), mean = mean(.x), max = max(.x))), .after = indicator) 116 | ``` 117 | 118 | Since the other backbone of `lay()` is [**{vctrs}**](https://vctrs.r-lib.org), the splicing happens automatically (unless the output of the call is used to create a named column). 
This is why, in the last chunk of code, three different columns (*min*, *mean* and *max*) are directly created. 119 | 120 | **Important:** when using `lay()` the function you want to use for the rowwise job must output a scalar (vector of length 1), or a tibble or data frame with a single row. 121 | 122 | We can apply a function that returns a vector of length > 1 by turning such a vector into a tibble using `as_tibble_row()` from [**{tibble}**](https://tibble.tidyverse.org/): 123 | 124 | ```{r worldbank2} 125 | world_bank_pop |> 126 | filter(indicator == "SP.POP.TOTL") |> 127 | mutate(lay(pick(matches("\\d")), 128 | ~ as_tibble_row(quantile(.x, na.rm = TRUE))), .after = indicator) 129 | ``` 130 | 131 | ### History 132 | 133 | lay_history 134 | 135 | The first draft of this package was created by **@romainfrancois** as a reply to a tweet I (Alexandre Courtiol) posted under **@rdataberlin** in February 2020. 136 | At the time I was exploring different ways to perform rowwise jobs in R and I was experimenting with various ideas on how to exploit the fact that the newly introduced function `across()` from [**{dplyr}**](https://dplyr.tidyverse.org/) creates tibbles on which one can easily apply a function. 137 | Romain came up with `lay()` as the better solution, making good use of [**{rlang}**](https://rlang.r-lib.org/) & [**{vctrs}**](https://vctrs.r-lib.org/). 138 | 139 | The verb `lay()` was never integrated into [**{dplyr}**](https://dplyr.tidyverse.org/), but, so far, I still find `lay()` superior to most alternatives, which is why I decided to document and maintain this package.
140 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | # **{lay}** 5 | 6 | 7 | 8 | [![CRAN 9 | status](https://www.r-pkg.org/badges/version/lay)](https://CRAN.R-project.org/package=lay) 10 | [![R-CMD-check](https://github.com/courtiol/lay/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/courtiol/lay/actions/workflows/R-CMD-check.yaml) 11 | [![test-coverage](https://github.com/courtiol/lay/actions/workflows/test-coverage.yaml/badge.svg)](https://github.com/courtiol/lay/actions/workflows/test-coverage.yaml) 12 | [![Lifecycle: 13 | experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) 14 | 15 | 16 | ## An R package for simple but efficient rowwise jobs 17 | 18 | The function `lay()` – the only function of the package **{lay}** – is 19 | intended to be used to apply a function on each row of a data frame or 20 | tibble, independently, and across multiple columns containing values of 21 | the same class (e.g. all numeric). 22 | 23 | Implementing rowwise operations for tabular data is notoriously awkward 24 | in R. Many options have been proposed, but they tend to be complicated, 25 | inefficient, or both. Instead `lay()` aims at reaching a sweet spot 26 | between simplicity and efficiency. 27 | 28 | The function has been specifically designed to be combined with 29 | functions from [**{dplyr}**](https://dplyr.tidyverse.org/) and to feel 30 | as if it was part of it (but you can use `lay()` without 31 | [**{dplyr}**](https://dplyr.tidyverse.org/)). 32 | 33 | There is hardly any code behind `lay()` (it can be coded in 3 lines), so 34 | this package may just be an interim solution before an established 35 | package fulfills the need… Time will tell. 
36 | 37 | ### Installation 38 | 39 | You can install the current CRAN version of **{lay}** with: 40 | 41 | ``` r 42 | install.packages("lay") 43 | ``` 44 | 45 | Alternatively, you can install the development version of **{lay}** 46 | using [**{remotes}**](https://remotes.r-lib.org/): 47 | 48 | ``` r 49 | remotes::install_github("courtiol/lay") ## requires to have installed {remotes} 50 | ``` 51 | 52 | ### Motivation 53 | 54 | Consider the following dataset, which contains information about the use 55 | of pain relievers for non medical purpose. 56 | 57 | ``` r 58 | library(lay) ## requires to have installed {lay} 59 | drugs 60 | #> # A tibble: 100 × 8 61 | #> caseid hydrocd oxycodp codeine tramadl morphin methdon vicolor 62 | #> 63 | #> 1 1 0 0 0 0 0 0 0 64 | #> 2 2 0 0 0 0 0 0 0 65 | #> 3 3 0 0 0 0 0 0 0 66 | #> 4 4 0 0 0 0 0 0 0 67 | #> 5 5 0 0 0 0 0 0 0 68 | #> 6 6 0 0 0 0 0 0 0 69 | #> 7 7 0 0 0 0 0 0 0 70 | #> 8 8 0 0 0 0 0 0 0 71 | #> 9 9 0 0 0 0 0 0 1 72 | #> 10 10 0 0 0 0 0 0 0 73 | #> # ℹ 90 more rows 74 | ``` 75 | 76 | The dataset is [tidy](https://vita.had.co.nz/papers/tidy-data.pdf): each 77 | row represents one individual and each variable forms a column. 78 | 79 | Imagine now that you would like to know if each individual did use any 80 | of these pain relievers. 81 | 82 | How would you proceed? 
83 | 84 | ### Our solution: `lay()` 85 | 86 | This is how you would achieve our goal using `lay()`: 87 | 88 | ``` r 89 | library(dplyr, warn.conflicts = FALSE) ## requires to have installed {dplyr} 90 | 91 | drugs_full |> 92 | mutate(everused = lay(pick(-caseid), any)) 93 | #> # A tibble: 55,271 × 9 94 | #> caseid hydrocd oxycodp codeine tramadl morphin methdon vicolor everused 95 | #> 96 | #> 1 1 0 0 0 0 0 0 0 FALSE 97 | #> 2 2 0 0 0 0 0 0 0 FALSE 98 | #> 3 3 0 0 0 0 0 0 0 FALSE 99 | #> 4 4 0 0 0 0 0 0 0 FALSE 100 | #> 5 5 0 0 0 0 0 0 0 FALSE 101 | #> 6 6 0 0 0 0 0 0 0 FALSE 102 | #> 7 7 0 0 0 0 0 0 0 FALSE 103 | #> 8 8 0 0 0 0 0 0 0 FALSE 104 | #> 9 9 0 0 0 0 0 0 1 TRUE 105 | #> 10 10 0 0 0 0 0 0 0 FALSE 106 | #> # ℹ 55,261 more rows 107 | ``` 108 | 109 | We used `mutate()` from [**{dplyr}**](https://dplyr.tidyverse.org/) to 110 | create a new column called *everused*, and we used `pick()` from that 111 | same package to remove the column *caseid* when laying down each row of 112 | the data and applying the function `any()`. 113 | 114 | When combining `lay()` and [**{dplyr}**](https://dplyr.tidyverse.org/), 115 | you should always use `pick()` or `across()`. The functions `pick()` and 116 | `across()` let you pick among many [selection 117 | helpers](https://tidyselect.r-lib.org/reference/language.html) from the 118 | package [**{tidyselect}**](https://tidyselect.r-lib.org/), which makes 119 | it easy to specify which columns to consider. 120 | 121 | Our function `lay()` is quite flexible! 
For example, you can pass 122 | argument(s) of the function you wish to apply rowwise (here `any()`): 123 | 124 | ``` r 125 | drugs_with_NA <- drugs ## create a copy of the dataset 126 | drugs_with_NA[1, 2] <- NA ## introduce a missing value 127 | 128 | drugs_with_NA |> 129 | mutate(everused = lay(pick(-caseid), any)) |> ## without additional argument 130 | slice(1) ## keep first row only 131 | #> # A tibble: 1 × 9 132 | #> caseid hydrocd oxycodp codeine tramadl morphin methdon vicolor everused 133 | #> 134 | #> 1 1 NA 0 0 0 0 0 0 NA 135 | 136 | drugs_with_NA |> 137 | mutate(everused = lay(pick(-caseid), any, na.rm = TRUE)) |> ## with additional argument 138 | slice(1) 139 | #> # A tibble: 1 × 9 140 | #> caseid hydrocd oxycodp codeine tramadl morphin methdon vicolor everused 141 | #> 142 | #> 1 1 NA 0 0 0 0 0 0 FALSE 143 | ``` 144 | 145 | Since one of the backbones of `lay()` is 146 | [**{rlang}**](https://rlang.r-lib.org), you can use the so-called 147 | [*lambda* syntax](https://rlang.r-lib.org/reference/as_function.html) to 148 | define anonymous functions on the fly: 149 | 150 | ``` r 151 | drugs_with_NA |> 152 | mutate(everused = lay(pick(-caseid), ~ any(.x, na.rm = TRUE))) ## same as above, different syntax 153 | #> # A tibble: 100 × 9 154 | #> caseid hydrocd oxycodp codeine tramadl morphin methdon vicolor everused 155 | #> 156 | #> 1 1 NA 0 0 0 0 0 0 FALSE 157 | #> 2 2 0 0 0 0 0 0 0 FALSE 158 | #> 3 3 0 0 0 0 0 0 0 FALSE 159 | #> 4 4 0 0 0 0 0 0 0 FALSE 160 | #> 5 5 0 0 0 0 0 0 0 FALSE 161 | #> 6 6 0 0 0 0 0 0 0 FALSE 162 | #> 7 7 0 0 0 0 0 0 0 FALSE 163 | #> 8 8 0 0 0 0 0 0 0 FALSE 164 | #> 9 9 0 0 0 0 0 0 1 TRUE 165 | #> 10 10 0 0 0 0 0 0 0 FALSE 166 | #> # ℹ 90 more rows 167 | ``` 168 | 169 | We can also apply many functions at once, as exemplified with another 170 | dataset: 171 | 172 | ``` r 173 | data("world_bank_pop", package = "tidyr") ## requires to have installed {tidyr} 174 | 175 | world_bank_pop |> 176 | filter(indicator == "SP.POP.TOTL") |> 177 
| mutate(lay(pick(matches("\\d")), 178 | ~ tibble(min = min(.x), mean = mean(.x), max = max(.x))), .after = indicator) 179 | #> # A tibble: 266 × 23 180 | #> country indicator min mean max `2000` `2001` `2002` `2003` `2004` 181 | #> 182 | #> 1 ABW SP.POP.TOTL 8.91e4 9.81e4 1.05e5 8.91e4 9.07e4 9.18e4 9.27e4 9.35e4 183 | #> 2 AFE SP.POP.TOTL 4.02e8 5.08e8 6.33e8 4.02e8 4.12e8 4.23e8 4.34e8 4.45e8 184 | #> 3 AFG SP.POP.TOTL 1.95e7 2.73e7 3.56e7 1.95e7 1.97e7 2.10e7 2.26e7 2.36e7 185 | #> 4 AFW SP.POP.TOTL 2.70e8 3.45e8 4.31e8 2.70e8 2.77e8 2.85e8 2.93e8 3.01e8 186 | #> 5 AGO SP.POP.TOTL 1.64e7 2.26e7 3.02e7 1.64e7 1.69e7 1.75e7 1.81e7 1.88e7 187 | #> 6 ALB SP.POP.TOTL 2.87e6 2.96e6 3.09e6 3.09e6 3.06e6 3.05e6 3.04e6 3.03e6 188 | #> 7 AND SP.POP.TOTL 6.61e4 7.32e4 8.02e4 6.61e4 6.78e4 7.08e4 7.39e4 7.69e4 189 | #> 8 ARB SP.POP.TOTL 2.87e8 3.52e8 4.24e8 2.87e8 2.94e8 3.00e8 3.07e8 3.13e8 190 | #> 9 ARE SP.POP.TOTL 3.28e6 6.58e6 9.07e6 3.28e6 3.45e6 3.63e6 3.81e6 3.99e6 191 | #> 10 ARG SP.POP.TOTL 3.71e7 4.05e7 4.40e7 3.71e7 3.75e7 3.79e7 3.83e7 3.87e7 192 | #> # ℹ 256 more rows 193 | #> # ℹ 13 more variables: `2005` , `2006` , `2007` , `2008` , 194 | #> # `2009` , `2010` , `2011` , `2012` , `2013` , 195 | #> # `2014` , `2015` , `2016` , `2017` 196 | ``` 197 | 198 | Since the other backbone of `lay()` is 199 | [**{vctrs}**](https://vctrs.r-lib.org), the splicing happens 200 | automatically (unless the output of the call is used to create a named 201 | column). This is why, in the last chunk of code, three different columns 202 | (*min*, *mean* and *max*) are directly created. 203 | 204 | **Important:** when using `lay()` the function you want to use for the 205 | rowwise job must output a scalar (vector of length 1), or a tibble or 206 | data frame with a single row. 
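As a rough illustration of the rule above, the sketch below checks which outputs have an acceptable shape. The function `is_valid_row_output()` is hypothetical (it is not part of **{lay}**) and simply encodes the rule as stated; the values in `x` are made up.

```r
x <- c(2, 7, 4)  ## made-up values standing in for one row

## Hypothetical checker: does an output have a shape usable as a rowwise result?
is_valid_row_output <- function(out) {
  if (is.data.frame(out)) nrow(out) == 1 else length(out) == 1
}

is_valid_row_output(max(x))    ## scalar
is_valid_row_output(range(x))  ## vector of length 2
is_valid_row_output(data.frame(min = min(x), max = max(x)))  ## one-row data frame
```

Only the second call fails the check: `range()` returns two values, so its output would need to be reshaped into a single row first.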
207 | 208 | We can apply a function that returns a vector of length \> 1 by turning 209 | such a vector into a tibble using `as_tibble_row()` from 210 | [**{tibble}**](https://tibble.tidyverse.org/): 211 | 212 | ``` r 213 | world_bank_pop |> 214 | filter(indicator == "SP.POP.TOTL") |> 215 | mutate(lay(pick(matches("\\d")), 216 | ~ as_tibble_row(quantile(.x, na.rm = TRUE))), .after = indicator) 217 | #> # A tibble: 266 × 25 218 | #> country indicator `0%` `25%` `50%` `75%` `100%` `2000` `2001` `2002` 219 | #> 220 | #> 1 ABW SP.POP.TOTL 8.91e4 9.38e4 9.86e4 1.03e5 1.05e5 8.91e4 9.07e4 9.18e4 221 | #> 2 AFE SP.POP.TOTL 4.02e8 4.48e8 5.03e8 5.64e8 6.33e8 4.02e8 4.12e8 4.23e8 222 | #> 3 AFG SP.POP.TOTL 1.95e7 2.38e7 2.69e7 3.13e7 3.56e7 1.95e7 1.97e7 2.10e7 223 | #> 4 AFW SP.POP.TOTL 2.70e8 3.03e8 3.42e8 3.85e8 4.31e8 2.70e8 2.77e8 2.85e8 224 | #> 5 AGO SP.POP.TOTL 1.64e7 1.89e7 2.21e7 2.59e7 3.02e7 1.64e7 1.69e7 1.75e7 225 | #> 6 ALB SP.POP.TOTL 2.87e6 2.90e6 2.94e6 3.02e6 3.09e6 3.09e6 3.06e6 3.05e6 226 | #> 7 AND SP.POP.TOTL 6.61e4 7.11e4 7.21e4 7.55e4 8.02e4 6.61e4 6.78e4 7.08e4 227 | #> 8 ARB SP.POP.TOTL 2.87e8 3.15e8 3.51e8 3.87e8 4.24e8 2.87e8 2.94e8 3.00e8 228 | #> 9 ARE SP.POP.TOTL 3.28e6 4.07e6 7.49e6 8.73e6 9.07e6 3.28e6 3.45e6 3.63e6 229 | #> 10 ARG SP.POP.TOTL 3.71e7 3.88e7 4.05e7 4.21e7 4.40e7 3.71e7 3.75e7 3.79e7 230 | #> # ℹ 256 more rows 231 | #> # ℹ 15 more variables: `2003` , `2004` , `2005` , `2006` , 232 | #> # `2007` , `2008` , `2009` , `2010` , `2011` , 233 | #> # `2012` , `2013` , `2014` , `2015` , `2016` , 234 | #> # `2017` 235 | ``` 236 | 237 | ### History 238 | 239 | lay_history 240 | 241 | The first draft of this package has been created by **@romainfrancois** 242 | as a reply to a tweet I (Alexandre Courtiol) posted under 243 | **@rdataberlin** in February 2020. 
At the time I was exploring different 244 | ways to perform rowwise jobs in R and I was experimenting with various 245 | ideas on how to exploit the fact that the newly introduced function 246 | `across()` from [**{dplyr}**](https://dplyr.tidyverse.org/) creates 247 | tibbles on which one can easily apply a function. Romain came up with 248 | `lay()` as the better solution, making good use of 249 | [**{rlang}**](https://rlang.r-lib.org/) & 250 | [**{vctrs}**](https://vctrs.r-lib.org/). 251 | 252 | The verb `lay()` was never integrated into 253 | [**{dplyr}**](https://dplyr.tidyverse.org/), but, so far, I still find 254 | `lay()` superior to most alternatives, which is why I decided to 255 | document and maintain this package. 256 | --------------------------------------------------------------------------------