├── .github
│   ├── .gitignore
│   ├── pics
│   │   └── lay_history.png
│   ├── ISSUE_TEMPLATE
│   │   └── issue_template.md
│   ├── workflows
│   │   ├── pkgdown.yaml
│   │   ├── R-CMD-check.yaml
│   │   ├── test-coverage.yaml
│   │   └── pr-commands.yaml
│   ├── SUPPORT.md
│   ├── CONTRIBUTING.md
│   └── CODE_OF_CONDUCT.md
├── vignettes
│   └── articles
│       ├── .gitignore
│       ├── alternatives.Rmd
│       └── benchmarks.Rmd
├── LICENSE
├── data
│   ├── drugs.rda
│   └── drugs_full.rda
├── man
│   ├── figures
│   │   └── logo.png
│   ├── reexports.Rd
│   ├── drugs.Rd
│   └── lay.Rd
├── .gitignore
├── source_hexsticker
│   ├── lay.png
│   ├── lay.xcf
│   └── prepare_hexsticker.R
├── pkgdown
│   └── favicon
│       ├── favicon.ico
│       ├── favicon-16x16.png
│       ├── favicon-32x32.png
│       ├── apple-touch-icon.png
│       ├── apple-touch-icon-60x60.png
│       ├── apple-touch-icon-76x76.png
│       ├── apple-touch-icon-120x120.png
│       ├── apple-touch-icon-152x152.png
│       └── apple-touch-icon-180x180.png
├── tests
│   ├── testthat.R
│   ├── spelling.R
│   └── testthat
│       └── test-lay.R
├── NEWS.md
├── R
│   ├── reexports.R
│   ├── data.R
│   └── lay.R
├── .Rbuildignore
├── NAMESPACE
├── _pkgdown.yml
├── inst
│   ├── WORDLIST
│   └── CITATION
├── lay.Rproj
├── cran-comments.md
├── DESCRIPTION
├── LICENSE.md
├── README.Rmd
└── README.md
/.github/.gitignore:
--------------------------------------------------------------------------------
1 | *.html
2 |
--------------------------------------------------------------------------------
/vignettes/articles/.gitignore:
--------------------------------------------------------------------------------
1 | *.html
2 | *.R
3 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | YEAR: 2023
2 | COPYRIGHT HOLDER: Alexandre Courtiol
3 |
--------------------------------------------------------------------------------
/data/drugs.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/data/drugs.rda
--------------------------------------------------------------------------------
/data/drugs_full.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/data/drugs_full.rda
--------------------------------------------------------------------------------
/man/figures/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/man/figures/logo.png
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | inst/doc
5 | source_data
6 | docs
7 |
--------------------------------------------------------------------------------
/source_hexsticker/lay.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/source_hexsticker/lay.png
--------------------------------------------------------------------------------
/source_hexsticker/lay.xcf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/source_hexsticker/lay.xcf
--------------------------------------------------------------------------------
/pkgdown/favicon/favicon.ico:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/favicon.ico
--------------------------------------------------------------------------------
/.github/pics/lay_history.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/.github/pics/lay_history.png
--------------------------------------------------------------------------------
/pkgdown/favicon/favicon-16x16.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/favicon-16x16.png
--------------------------------------------------------------------------------
/pkgdown/favicon/favicon-32x32.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/favicon-32x32.png
--------------------------------------------------------------------------------
/pkgdown/favicon/apple-touch-icon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/apple-touch-icon.png
--------------------------------------------------------------------------------
/tests/testthat.R:
--------------------------------------------------------------------------------
1 | library(testthat)
2 | library(lay)
3 | library(dplyr, warn.conflicts = FALSE)
4 |
5 | test_check("lay")
6 |
--------------------------------------------------------------------------------
/pkgdown/favicon/apple-touch-icon-60x60.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/apple-touch-icon-60x60.png
--------------------------------------------------------------------------------
/pkgdown/favicon/apple-touch-icon-76x76.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/apple-touch-icon-76x76.png
--------------------------------------------------------------------------------
/pkgdown/favicon/apple-touch-icon-120x120.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/apple-touch-icon-120x120.png
--------------------------------------------------------------------------------
/pkgdown/favicon/apple-touch-icon-152x152.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/apple-touch-icon-152x152.png
--------------------------------------------------------------------------------
/pkgdown/favicon/apple-touch-icon-180x180.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/courtiol/lay/HEAD/pkgdown/favicon/apple-touch-icon-180x180.png
--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
1 | # lay 0.1.3
2 |
3 | * Add value field in documentation to meet CRAN requirements.
4 |
5 | # lay 0.1.2
6 |
7 | * Initial CRAN submission.
8 |
--------------------------------------------------------------------------------
/R/reexports.R:
--------------------------------------------------------------------------------
1 | #' @importFrom tibble tibble
2 | #' @export
3 | tibble::tibble
4 |
5 | #' @importFrom tibble as_tibble_row
6 | #' @export
7 | tibble::as_tibble_row
8 |
--------------------------------------------------------------------------------
/tests/spelling.R:
--------------------------------------------------------------------------------
1 | if(requireNamespace('spelling', quietly = TRUE))
2 |   spelling::spell_check_test(vignettes = TRUE, error = FALSE,
3 |                              skip_on_cran = TRUE)
4 |
--------------------------------------------------------------------------------
/.Rbuildignore:
--------------------------------------------------------------------------------
1 | ^lay\.Rproj$
2 | ^\.Rproj\.user$
3 | ^LICENSE\.md$
4 | ^README\.Rmd$
5 | ^cran-comments\.md$
6 | ^\.github$
7 | ^\.github/workflows/R-CMD-check\.yaml$
8 | ^\.github/workflows/pr-commands\.yaml$
9 | ^source\_data$
10 | ^source\_hexsticker$
11 | ^_pkgdown\.yml$
12 | ^docs$
13 | ^pkgdown$
14 | ^vignettes$
15 |
--------------------------------------------------------------------------------
/source_hexsticker/prepare_hexsticker.R:
--------------------------------------------------------------------------------
1 | library(hexSticker)
2 | sticker("source_hexsticker/lay.png",
3 |         h_color = "#00ff00",
4 |         package = NULL,
5 |         s_width = 1,
6 |         s_x = 1.075,
7 |         s_y = 1,
8 |         white_around_sticker = TRUE,
9 |         filename = "inst/figures/logo.png")
10 |
--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
1 | # Generated by roxygen2: do not edit by hand
2 |
3 | export(as_tibble_row)
4 | export(lay)
5 | export(tibble)
6 | importFrom(purrr,pmap)
7 | importFrom(rlang,as_function)
8 | importFrom(rlang,exec)
9 | importFrom(rlang,list2)
10 | importFrom(tibble,as_tibble_row)
11 | importFrom(tibble,tibble)
12 | importFrom(vctrs,vec_c)
13 |
--------------------------------------------------------------------------------
/_pkgdown.yml:
--------------------------------------------------------------------------------
1 | url: https://courtiol.github.io/lay/
2 | template:
3 |   bootstrap: 5
4 |
5 | reference:
6 |   - title: Verb
7 |     desc: >
8 |       The main verb of the package:
9 |     contents:
10 |       - lay
11 |
12 |   - title: Data
13 |     desc: >
14 |       Some datasets to illustrate functionalities:
15 |     contents:
16 |       - drugs
17 |       - drugs_full
18 |
19 |
--------------------------------------------------------------------------------
/inst/WORDLIST:
--------------------------------------------------------------------------------
1 | Benchmarking
2 | CMD
3 | Lifecycle
4 | Rowwise
5 | caseid
6 | dplyr
7 | everused
8 | hydrocodone
9 | lorcert
10 | lortab
11 | nonmedically
12 | oxycontin
13 | percocet
14 | percodan
15 | purrr
16 | recoded
17 | rlang
18 | rowwise
19 | tibble
20 | tibbles
21 | tidyr
22 | tidyselect
23 | tramadol
24 | tylox
25 | vctrs
26 | vectorized
27 | vicodin
28 |
--------------------------------------------------------------------------------
/inst/CITATION:
--------------------------------------------------------------------------------
1 | bibentry(
2 | bibtype = "Manual",
3 | title = "Simple but Efficient Rowwise Jobs",
4 | author = c(person("Alexandre", "Courtiol", email = "alexandre.courtiol@gmail.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0003-0637-2959")),
5 | person("Romain", "François", role = c("aut"), comment = c(ORCID = "0000-0002-2444-4226"))
6 | ),
7 | year = 2023,
8 | url = "https://github.com/courtiol/lay"
9 | )
10 |
--------------------------------------------------------------------------------
/lay.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 |
3 | RestoreWorkspace: No
4 | SaveWorkspace: No
5 | AlwaysSaveHistory: Default
6 |
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 |
15 | AutoAppendNewline: Yes
16 | StripTrailingWhitespace: Yes
17 |
18 | BuildType: Package
19 | PackageUseDevtools: Yes
20 | PackageInstallArgs: --no-multiarch --with-keep.source
21 | PackageRoxygenize: rd,collate,namespace
22 |
--------------------------------------------------------------------------------
/man/reexports.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/reexports.R
3 | \docType{import}
4 | \name{reexports}
5 | \alias{reexports}
6 | \alias{tibble}
7 | \alias{as_tibble_row}
8 | \title{Objects exported from other packages}
9 | \keyword{internal}
10 | \description{
11 | These objects are imported from other packages. Follow the links
12 | below to see their documentation.
13 |
14 | \describe{
15 | \item{tibble}{\code{\link[tibble:as_tibble]{as_tibble_row}}, \code{\link[tibble]{tibble}}}
16 | }}
17 |
18 |
--------------------------------------------------------------------------------
/cran-comments.md:
--------------------------------------------------------------------------------
1 | ## Test environments
2 |
3 | * local R installation, R 4.3.1
4 | * GitHub Actions (usethis::use_github_action("check-standard"))
5 |   - {os: macos-latest, r: 'release'}
6 |   - {os: windows-latest, r: 'release'}
7 |   - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
8 |   - {os: ubuntu-latest, r: 'release'}
9 |   - {os: ubuntu-latest, r: 'oldrel-1'}
10 | * win-builder (devel)
11 |
12 | ## R CMD check results
13 |
14 | 2 false positives in terms of misspellings in DESCRIPTION:
15 | "Rowwise" and "tibble".
16 |
17 | 0 errors | 0 warnings | 1 note
18 |
19 | * This is a new release.
20 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/issue_template.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Bug report or feature request
3 | about: Describe a bug you've seen or make a case for a new feature
4 | ---
5 |
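6 | Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on [community.rstudio.com](https://community.rstudio.com/) or StackOverflow.
7 |
8 | Please include a minimal reproducible example (AKA a reprex). If you've never heard of a [reprex](http://reprex.tidyverse.org/) before, start by reading the [reprex section](https://www.tidyverse.org/help/#reprex) of the tidyverse help page.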
9 |
10 | Brief description of the problem
11 |
12 | ```r
13 | # insert reprex here
14 | ```
15 |
--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
1 | Package: lay
2 | Title: Simple but Efficient Rowwise Jobs
3 | Version: 0.1.3
4 | Authors@R: c(
5 |     person("Alexandre", "Courtiol", email = "alexandre.courtiol@gmail.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0003-0637-2959")),
6 |     person("Romain", "François", role = c("aut"), comment = c(ORCID = "0000-0002-2444-4226"))
7 |   )
8 | Description: Efficiently creating new column(s) in a data frame (including tibble) by applying a function one row at a time.
9 | License: MIT + file LICENSE
10 | URL: https://courtiol.github.io/lay/, https://github.com/courtiol/lay/
11 | BugReports: https://github.com/courtiol/lay/issues/
12 | Encoding: UTF-8
13 | LazyData: true
14 | Depends:
15 |     R (>= 2.10)
16 | Imports:
17 |     rlang,
18 |     purrr,
19 |     vctrs,
20 |     tibble
21 | Suggests:
22 |     bench,
23 |     covr,
24 |     data.table,
25 |     dplyr (>= 1.0),
26 |     forcats,
27 |     ggplot2,
28 |     ggbeeswarm,
29 |     knitr,
30 |     rmarkdown,
31 |     slider,
32 |     testthat (>= 2.1.0),
33 |     tidyr,
34 |     spelling
35 | Roxygen: list(markdown = TRUE)
36 | RoxygenNote: 7.2.3
37 | Language: en-US
38 |
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | # MIT License
2 |
3 | Copyright (c) 2023 Alexandre Courtiol
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/tests/testthat/test-lay.R:
--------------------------------------------------------------------------------
1 | test_that("lay works", {
2 |
3 |   ## data for tests
4 |   df <- tibble(x = 1:10, y = 11:20, z = 21:30)
5 |   df_na <- df
6 |   df_na[1, 1] <- NA
7 |
8 |   ## simple calls
9 |   expect_identical(
10 |     lay(df, mean),
11 |     rowMeans(df)
12 |   )
13 |
14 |   expect_identical(
15 |     lay(df, min),
16 |     df$x
17 |   )
18 |
19 |   ## call with fn arguments
20 |   expect_identical(
21 |     lay(df_na, mean, na.rm = TRUE),
22 |     rowMeans(df_na, na.rm = TRUE)
23 |   )
24 |
25 |   ## auto spliced output
26 |   expect_identical(
27 |     lay(df, ~ tibble(min = min(.x), max = max(.x))),
28 |     tibble(min = df$x, max = df$z)
29 |   )
30 |
31 |   ## both methods should lead to same results
32 |   expect_identical(
33 |     lay(df, mean, .method = "tidy"),
34 |     lay(df, mean, .method = "apply")
35 |   )
36 |
37 |   expect_identical(
38 |     lay(df, ~ mean(.x), .method = "tidy"),
39 |     lay(df, ~ mean(.x), .method = "apply")
40 |   )
41 |
42 |   expect_identical(
43 |     lay(df, ~ tibble(min = min(.x), max = max(.x)), .method = "tidy"),
44 |     lay(df, ~ tibble(min = min(.x), max = max(.x)), .method = "apply")
45 |   )
46 |
47 |   ## handle error properly
48 |   expect_error(
49 |     lay(df, mean, .method = "nonsense")
50 |   )
51 |
52 | })
53 |
--------------------------------------------------------------------------------
/.github/workflows/pkgdown.yaml:
--------------------------------------------------------------------------------
1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
3 | on:
4 |   push:
5 |     branches: [main, master]
6 |   pull_request:
7 |     branches: [main, master]
8 |   release:
9 |     types: [published]
10 |   workflow_dispatch:
11 |
12 | name: pkgdown
13 |
14 | jobs:
15 |   pkgdown:
16 |     runs-on: ubuntu-latest
17 |     # Only restrict concurrency for non-PR jobs
18 |     concurrency:
19 |       group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
20 |     env:
21 |       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
22 |     permissions:
23 |       contents: write
24 |     steps:
25 |       - uses: actions/checkout@v3
26 |
27 |       - uses: r-lib/actions/setup-pandoc@v2
28 |
29 |       - uses: r-lib/actions/setup-r@v2
30 |         with:
31 |           use-public-rspm: true
32 |
33 |       - uses: r-lib/actions/setup-r-dependencies@v2
34 |         with:
35 |           extra-packages: any::pkgdown, local::.
36 |           needs: website
37 |
38 |       - name: Build site
39 |         run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)
40 |         shell: Rscript {0}
41 |
42 |       - name: Deploy to GitHub pages 🚀
43 |         if: github.event_name != 'pull_request'
44 |         uses: JamesIves/github-pages-deploy-action@v4.4.1
45 |         with:
46 |           clean: false
47 |           branch: gh-pages
48 |           folder: docs
49 |
--------------------------------------------------------------------------------
/.github/workflows/R-CMD-check.yaml:
--------------------------------------------------------------------------------
1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
3 | on:
4 |   push:
5 |     branches: [main, master]
6 |   pull_request:
7 |     branches: [main, master]
8 |
9 | name: R-CMD-check
10 |
11 | jobs:
12 |   R-CMD-check:
13 |     runs-on: ${{ matrix.config.os }}
14 |
15 |     name: ${{ matrix.config.os }} (${{ matrix.config.r }})
16 |
17 |     strategy:
18 |       fail-fast: false
19 |       matrix:
20 |         config:
21 |           - {os: macos-latest, r: 'release'}
22 |           - {os: windows-latest, r: 'release'}
23 |           - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
24 |           - {os: ubuntu-latest, r: 'release'}
25 |           - {os: ubuntu-latest, r: 'oldrel-1'}
26 |
27 |     env:
28 |       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
29 |       R_KEEP_PKG_SOURCE: yes
30 |
31 |     steps:
32 |       - uses: actions/checkout@v3
33 |
34 |       - uses: r-lib/actions/setup-pandoc@v2
35 |
36 |       - uses: r-lib/actions/setup-r@v2
37 |         with:
38 |           r-version: ${{ matrix.config.r }}
39 |           http-user-agent: ${{ matrix.config.http-user-agent }}
40 |           use-public-rspm: true
41 |
42 |       - uses: r-lib/actions/setup-r-dependencies@v2
43 |         with:
44 |           extra-packages: any::rcmdcheck
45 |           needs: check
46 |
47 |       - uses: r-lib/actions/check-r-package@v2
48 |         with:
49 |           upload-snapshots: true
50 |
--------------------------------------------------------------------------------
/.github/workflows/test-coverage.yaml:
--------------------------------------------------------------------------------
1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
3 | on:
4 |   push:
5 |     branches: [main, master]
6 |   pull_request:
7 |     branches: [main, master]
8 |
9 | name: test-coverage
10 |
11 | jobs:
12 |   test-coverage:
13 |     runs-on: ubuntu-latest
14 |     env:
15 |       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
16 |
17 |     steps:
18 |       - uses: actions/checkout@v3
19 |
20 |       - uses: r-lib/actions/setup-r@v2
21 |         with:
22 |           use-public-rspm: true
23 |
24 |       - uses: r-lib/actions/setup-r-dependencies@v2
25 |         with:
26 |           extra-packages: any::covr
27 |           needs: coverage
28 |
29 |       - name: Test coverage
30 |         run: |
31 |           covr::codecov(
32 |             quiet = FALSE,
33 |             clean = FALSE,
34 |             install_path = file.path(Sys.getenv("RUNNER_TEMP"), "package")
35 |           )
36 |         shell: Rscript {0}
37 |
38 |       - name: Show testthat output
39 |         if: always()
40 |         run: |
41 |           ## --------------------------------------------------------------------
42 |           find ${{ runner.temp }}/package -name 'testthat.Rout*' -exec cat '{}' \; || true
43 |         shell: bash
44 |
45 |       - name: Upload test results
46 |         if: failure()
47 |         uses: actions/upload-artifact@v3
48 |         with:
49 |           name: coverage-test-failures
50 |           path: ${{ runner.temp }}/package
51 |
--------------------------------------------------------------------------------
/R/data.R:
--------------------------------------------------------------------------------
1 | #' Pain relievers misuse in the US
2 | #'
3 | #' Datasets containing information about the use of pain relievers for nonmedical purposes.
4 | #'
5 | #' These datasets are a small subset of the "National Survey on Drug Use and Health, 2014".
6 | #' All variables related to drug use have been recoded into vectors of integers taking value 0 for
7 | #' "No/Unknown" and value 1 for "Yes". The original variable names were the same as those defined
8 | #' here but in upper case and ending with the number 2. The dataset called `drugs` contains the first
9 | #' 100 rows of the one called `drugs_full`.
10 | #'
11 | #' @format A tibble with either 100 or 55271 rows, and 8 variables:
12 | #' \describe{
13 | #' \item{caseid}{The identifier code of the respondent}
14 | #' \item{hydrocd}{Ever used hydrocodone nonmedically?}
15 | #' \item{oxycodp}{Ever used percocet, percodan, tylox, oxycontin... nonmedically?}
16 | #' \item{codeine}{Ever used codeine nonmedically?}
17 | #' \item{tramadl}{Ever used tramadol nonmedically?}
18 | #' \item{morphin}{Ever used morphine nonmedically?}
19 | #' \item{methdon}{Ever used methadone nonmedically?}
20 | #' \item{vicolor}{Ever used vicodin, lortab or lorcert nonmedically?}
21 | #' }
22 | #' @source \url{https://www.icpsr.umich.edu/web/NAHDAP/studies/36361}
23 | #' @references United States Department of Health and Human Services.
24 | #' Substance Abuse and Mental Health Services Administration.
25 | #' Center for Behavioral Health Statistics and Quality.
26 | #' National Survey on Drug Use and Health, 2014.
27 | #' Ann Arbor, MI: Inter-university Consortium for Political and Social Research (distributor), 2016-03-22.
28 | #' \doi{10.3886/ICPSR36361.v1}
29 | #'
30 | #' @aliases drugs drugs_full
31 | #' @name drugs
32 | #' @examples
33 | #' drugs
34 | #' drugs_full
35 | NULL
36 |
--------------------------------------------------------------------------------
/man/drugs.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/data.R
3 | \name{drugs}
4 | \alias{drugs}
5 | \alias{drugs_full}
6 | \title{Pain relievers misuse in the US}
7 | \format{
8 | A tibble with either 100 or 55271 rows, and 8 variables:
9 | \describe{
10 | \item{caseid}{The identifier code of the respondent}
11 | \item{hydrocd}{Ever used hydrocodone nonmedically?}
12 | \item{oxycodp}{Ever used percocet, percodan, tylox, oxycontin... nonmedically?}
13 | \item{codeine}{Ever used codeine nonmedically?}
14 | \item{tramadl}{Ever used tramadol nonmedically?}
15 | \item{morphin}{Ever used morphine nonmedically?}
16 | \item{methdon}{Ever used methadone nonmedically?}
17 | \item{vicolor}{Ever used vicodin, lortab or lorcert nonmedically?}
18 | }
19 | }
20 | \source{
21 | \url{https://www.icpsr.umich.edu/web/NAHDAP/studies/36361}
22 | }
23 | \description{
24 | Datasets containing information about the use of pain relievers for nonmedical purposes.
25 | }
26 | \details{
27 | These datasets are a small subset of the "National Survey on Drug Use and Health, 2014".
28 | All variables related to drug use have been recoded into vectors of integers taking value 0 for
29 | "No/Unknown" and value 1 for "Yes". The original variable names were the same as those defined
30 | here but in upper case and ending with the number 2. The dataset called \code{drugs} contains the first
31 | 100 rows of the one called \code{drugs_full}.
32 | }
33 | \examples{
34 | drugs
35 | drugs_full
36 | }
37 | \references{
38 | United States Department of Health and Human Services.
39 | Substance Abuse and Mental Health Services Administration.
40 | Center for Behavioral Health Statistics and Quality.
41 | National Survey on Drug Use and Health, 2014.
42 | Ann Arbor, MI: Inter-university Consortium for Political and Social Research (distributor), 2016-03-22.
43 | \doi{10.3886/ICPSR36361.v1}
44 | }
45 |
--------------------------------------------------------------------------------
/.github/SUPPORT.md:
--------------------------------------------------------------------------------
1 | # Getting help with lay
2 |
3 | Thanks for using lay!
4 | Before filing an issue, there are a few places to explore and pieces to put together to make the process as smooth as possible.
5 |
6 | ## Make a reprex
7 |
8 | Start by making a minimal **repr**oducible **ex**ample using the [reprex](https://reprex.tidyverse.org/) package.
9 | If you haven't heard of or used reprex before, you're in for a treat!
10 | Seriously, reprex will make all of your R-question-asking endeavors easier (which is a pretty insane ROI for the five to ten minutes it'll take you to learn what it's all about).
11 | For additional reprex pointers, check out the [Get help!](https://www.tidyverse.org/help/) section of the tidyverse site.
12 |
13 | ## Where to ask?
14 |
15 | Armed with your reprex, the next step is to figure out [where to ask](https://www.tidyverse.org/help/#where-to-ask).
16 |
17 | * If it's a question: start with [community.rstudio.com](https://community.rstudio.com/), and/or StackOverflow. There are more people there to answer questions.
18 |
19 | * If it's a bug: you're in the right place, [file an issue](https://github.com/courtiol/lay/issues/new).
20 |
21 | * If you're not sure: let the community help you figure it out!
22 | If your problem _is_ a bug or a feature request, you can easily return here and report it.
23 |
24 | Before opening a new issue, be sure to [search issues and pull requests](https://github.com/courtiol/lay/issues) to make sure the bug hasn't been reported and/or already fixed in the development version.
25 | By default, the search will be pre-populated with `is:issue is:open`.
26 | You can [edit the qualifiers](https://help.github.com/articles/searching-issues-and-pull-requests/) (e.g. `is:pr`, `is:closed`) as needed.
27 | For example, you'd simply remove `is:open` to search _all_ issues in the repo, open or closed.
28 |
29 | ## What happens next?
30 |
31 | To be as efficient as possible, development of tidyverse packages tends to be very bursty, so you shouldn't worry if you don't get an immediate response.
32 | Typically we don't look at a repo until a sufficient quantity of issues accumulates, then there's a burst of intense activity as we focus our efforts.
33 | That makes development more efficient because it avoids expensive context switching between problems, at the cost of taking longer to get back to you.
34 | This process makes a good reprex particularly important because it might be multiple months between your initial report and when we start working on it.
35 | If we can't reproduce the bug, we can't fix it!
36 |
--------------------------------------------------------------------------------
/.github/workflows/pr-commands.yaml:
--------------------------------------------------------------------------------
1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
3 | on:
4 |   issue_comment:
5 |     types: [created]
6 |
7 | name: Commands
8 |
9 | jobs:
10 |   document:
11 |     if: ${{ github.event.issue.pull_request && (github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'OWNER') && startsWith(github.event.comment.body, '/document') }}
12 |     name: document
13 |     runs-on: ubuntu-latest
14 |     env:
15 |       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
16 |     steps:
17 |       - uses: actions/checkout@v3
18 |
19 |       - uses: r-lib/actions/pr-fetch@v2
20 |         with:
21 |           repo-token: ${{ secrets.GITHUB_TOKEN }}
22 |
23 |       - uses: r-lib/actions/setup-r@v2
24 |         with:
25 |           use-public-rspm: true
26 |
27 |       - uses: r-lib/actions/setup-r-dependencies@v2
28 |         with:
29 |           extra-packages: any::roxygen2
30 |           needs: pr-document
31 |
32 |       - name: Document
33 |         run: roxygen2::roxygenise()
34 |         shell: Rscript {0}
35 |
36 |       - name: commit
37 |         run: |
38 |           git config --local user.name "$GITHUB_ACTOR"
39 |           git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com"
40 |           git add man/\* NAMESPACE
41 |           git commit -m 'Document'
42 |
43 |       - uses: r-lib/actions/pr-push@v2
44 |         with:
45 |           repo-token: ${{ secrets.GITHUB_TOKEN }}
46 |
47 |   style:
48 |     if: ${{ github.event.issue.pull_request && (github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'OWNER') && startsWith(github.event.comment.body, '/style') }}
49 |     name: style
50 |     runs-on: ubuntu-latest
51 |     env:
52 |       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
53 |     steps:
54 |       - uses: actions/checkout@v3
55 |
56 |       - uses: r-lib/actions/pr-fetch@v2
57 |         with:
58 |           repo-token: ${{ secrets.GITHUB_TOKEN }}
59 |
60 |       - uses: r-lib/actions/setup-r@v2
61 |
62 |       - name: Install dependencies
63 |         run: install.packages("styler")
64 |         shell: Rscript {0}
65 |
66 |       - name: Style
67 |         run: styler::style_pkg()
68 |         shell: Rscript {0}
69 |
70 |       - name: commit
71 |         run: |
72 |           git config --local user.name "$GITHUB_ACTOR"
73 |           git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com"
74 |           git add \*.R
75 |           git commit -m 'Style'
76 |
77 |       - uses: r-lib/actions/pr-push@v2
78 |         with:
79 |           repo-token: ${{ secrets.GITHUB_TOKEN }}
80 |
--------------------------------------------------------------------------------
/.github/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing to lay
2 |
3 | This outlines how to propose a change to lay.
4 | For more detailed info about contributing to this, and other tidyverse packages, please see the
5 | [**development contributing guide**](https://rstd.io/tidy-contrib).
6 |
7 | ## Fixing typos
8 |
9 | You can fix typos, spelling mistakes, or grammatical errors in the documentation directly using the GitHub web interface, as long as the changes are made in the _source_ file.
10 | This generally means you'll need to edit [roxygen2 comments](https://roxygen2.r-lib.org/articles/roxygen2.html) in an `.R`, not a `.Rd` file.
11 | You can find the `.R` file that generates the `.Rd` by reading the comment in the first line.
12 |
13 | ## Bigger changes
14 |
15 | If you want to make a bigger change, it's a good idea to first file an issue and make sure someone from the team agrees that it's needed.
16 | If you've found a bug, please file an issue that illustrates the bug with a minimal
17 | [reprex](https://www.tidyverse.org/help/#reprex) (this will also help you write a unit test, if needed).
18 |
19 | ### Pull request process
20 |
21 | * Fork the package and clone onto your computer. If you haven't done this before, we recommend using `usethis::create_from_github("courtiol/lay", fork = TRUE)`.
22 |
23 | * Install all development dependencies with `devtools::install_dev_deps()`, and then make sure the package passes R CMD check by running `devtools::check()`.
24 | If R CMD check doesn't pass cleanly, it's a good idea to ask for help before continuing.
25 |
26 | * Create a Git branch for your pull request (PR). We recommend using `usethis::pr_init("brief-description-of-change")`.
27 |
28 | * Make your changes, commit to git, and then create a PR by running `usethis::pr_push()`, and following the prompts in your browser.
29 | The title of your PR should briefly describe the change.
30 | The body of your PR should contain `Fixes #issue-number`.
31 |
32 | * For user-facing changes, add a bullet to the top of `NEWS.md` (i.e. just below the first header). Follow the style described in <https://style.tidyverse.org/news.html>.
33 |
34 | ### Code style
35 |
36 | * New code should follow the tidyverse [style guide](https://style.tidyverse.org).
37 | You can use the [styler](https://CRAN.R-project.org/package=styler) package to apply these styles, but please don't restyle code that has nothing to do with your PR.
38 |
39 | * We use [roxygen2](https://cran.r-project.org/package=roxygen2), with [Markdown syntax](https://cran.r-project.org/web/packages/roxygen2/vignettes/markdown.html), for documentation.
40 |
41 | * We use [testthat](https://cran.r-project.org/package=testthat) for unit tests.
42 | Contributions with test cases included are easier to accept.
43 |
44 | ## Code of Conduct
45 |
46 | Please note that the lay project is released with a
47 | [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this
48 | project you agree to abide by its terms.
49 |
--------------------------------------------------------------------------------
/.github/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Contributor Covenant Code of Conduct
2 |
3 | ## Our Pledge
4 |
5 | We as members, contributors, and leaders pledge to make participation in our
6 | community a harassment-free experience for everyone, regardless of age, body
7 | size, visible or invisible disability, ethnicity, sex characteristics, gender
8 | identity and expression, level of experience, education, socio-economic status,
9 | nationality, personal appearance, race, religion, or sexual identity and
10 | orientation.
11 |
12 | We pledge to act and interact in ways that contribute to an open, welcoming,
13 | diverse, inclusive, and healthy community.
14 |
15 | ## Our Standards
16 |
17 | Examples of behavior that contributes to a positive environment for our
18 | community include:
19 |
20 | * Demonstrating empathy and kindness toward other people
21 | * Being respectful of differing opinions, viewpoints, and experiences
22 | * Giving and gracefully accepting constructive feedback
23 | * Accepting responsibility and apologizing to those affected by our mistakes,
24 | and learning from the experience
25 | * Focusing on what is best not just for us as individuals, but for the overall
26 | community
27 |
28 | Examples of unacceptable behavior include:
29 |
30 | * The use of sexualized language or imagery, and sexual attention or
31 | advances of any kind
32 | * Trolling, insulting or derogatory comments, and personal or political attacks
33 | * Public or private harassment
34 | * Publishing others' private information, such as a physical or email
35 | address, without their explicit permission
36 | * Other conduct which could reasonably be considered inappropriate in a
37 | professional setting
38 |
39 | ## Enforcement Responsibilities
40 |
41 | Community leaders are responsible for clarifying and enforcing our standards
42 | of acceptable behavior and will take appropriate and fair corrective action in
43 | response to any behavior that they deem inappropriate, threatening, offensive,
44 | or harmful.
45 |
46 | Community leaders have the right and responsibility to remove, edit, or reject
47 | comments, commits, code, wiki edits, issues, and other contributions that are
48 | not aligned to this Code of Conduct, and will communicate reasons for moderation
49 | decisions when appropriate.
50 |
51 | ## Scope
52 |
53 | This Code of Conduct applies within all community spaces, and also applies
54 | when an individual is officially representing the community in public spaces.
55 | Examples of representing our community include using an official e-mail
56 | address, posting via an official social media account, or acting as an appointed
57 | representative at an online or offline event.
58 |
59 | ## Enforcement
60 |
61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be
62 | reported to the community leaders responsible for enforcement at [INSERT CONTACT
63 | METHOD]. All complaints will be reviewed and investigated promptly and fairly.
64 |
65 | All community leaders are obligated to respect the privacy and security of the
66 | reporter of any incident.
67 |
68 | ## Enforcement Guidelines
69 |
70 | Community leaders will follow these Community Impact Guidelines in determining
71 | the consequences for any action they deem in violation of this Code of Conduct:
72 |
73 | ### 1. Correction
74 |
75 | **Community Impact**: Use of inappropriate language or other behavior deemed
76 | unprofessional or unwelcome in the community.
77 |
78 | **Consequence**: A private, written warning from community leaders, providing
79 | clarity around the nature of the violation and an explanation of why the
80 | behavior was inappropriate. A public apology may be requested.
81 |
82 | ### 2. Warning
83 |
84 | **Community Impact**: A violation through a single incident or series of
85 | actions.
86 |
87 | **Consequence**: A warning with consequences for continued behavior. No
88 | interaction with the people involved, including unsolicited interaction with
89 | those enforcing the Code of Conduct, for a specified period of time. This
90 | includes avoiding interactions in community spaces as well as external channels
91 | like social media. Violating these terms may lead to a temporary or permanent
92 | ban.
93 |
94 | ### 3. Temporary Ban
95 |
96 | **Community Impact**: A serious violation of community standards, including
97 | sustained inappropriate behavior.
98 |
99 | **Consequence**: A temporary ban from any sort of interaction or public
100 | communication with the community for a specified period of time. No public or
101 | private interaction with the people involved, including unsolicited interaction
102 | with those enforcing the Code of Conduct, is allowed during this period.
103 | Violating these terms may lead to a permanent ban.
104 |
105 | ### 4. Permanent Ban
106 |
107 | **Community Impact**: Demonstrating a pattern of violation of community
108 | standards, including sustained inappropriate behavior, harassment of an
109 | individual, or aggression toward or disparagement of classes of individuals.
110 |
111 | **Consequence**: A permanent ban from any sort of public interaction within the
112 | community.
113 |
114 | ## Attribution
115 |
116 | This Code of Conduct is adapted from the [Contributor Covenant][homepage],
117 | version 2.0,
118 | available at
119 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
120 |
121 | Community Impact Guidelines were inspired by [Mozilla's code of conduct
122 | enforcement ladder](https://github.com/mozilla/diversity).
123 |
124 | [homepage]: https://www.contributor-covenant.org
125 |
126 | For answers to common questions about this code of conduct, see the FAQ at
127 | https://www.contributor-covenant.org/faq. Translations are available at
128 | https://www.contributor-covenant.org/translations.
129 |
--------------------------------------------------------------------------------
/vignettes/articles/alternatives.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Alternatives for rowwise jobs"
3 | ---
4 |
5 | ```{r, include = FALSE}
6 | knitr::opts_chunk$set(
7 | collapse = TRUE,
8 | comment = "#>"
9 | )
10 | ```
11 |
12 | ## Article overview
13 |
14 | There are many ways to perform rowwise jobs in R.
15 | In this Article, we consider each of these alternatives in turn.
16 | We will stick to our example about drug usage shown in the [introduction](https://courtiol.github.io/lay/).
17 | The idea is to compare alternative ways to create a new variable named `everused`, which indicates whether each respondent has used any of the considered pain relievers for non-medical purposes.
18 |
19 | ## Loading packages
20 |
21 | This Article requires you to load the following packages:
22 |
23 | ```{r load_pkg, message=FALSE}
24 | library(lay) ## for lay() and the data
25 | library(dplyr) ## for many things
26 | library(tidyr) ## for pivot_longer() and pivot_wider()
27 | library(purrr) ## for pmap_lgl()
28 | library(slider) ## for slide_vec()
29 | library(data.table) ## for an alternative to base and dplyr
30 | ```
31 | Please install them if they are not present on your system.
32 |
33 | ## Alternative 1: vectorized solution
34 |
35 | One solution is to simply do the following:
36 | ```{r vector}
37 | drugs_full |>
38 | mutate(everused = codeine | hydrocd | methdon | morphin | oxycodp | tramadl | vicolor)
39 | ```
40 | It is certainly very efficient from a computational point of view, but coding this way presents two main limitations:
41 |
42 | - you need to name all columns explicitly, which can be problematic when dealing with many columns
43 | - you are stuck with expressing your task with logical and arithmetic operators, which is not always sufficient
44 |
45 |
46 | ## Alternative 2: 100% [**{dplyr}**](https://dplyr.tidyverse.org/)
47 |
48 | ```{r dplyr}
49 | drugs |>
50 | rowwise() |>
51 | mutate(everused = any(c_across(-caseid))) |>
52 | ungroup()
53 | ```
54 | It is easy to use as `c_across()` turns its input into a vector and `rowwise()` implies that the
55 | vector only represents one row at a time. Yet, for now it remains quite slow on large datasets (see **Efficiency** below).
56 |
57 |
58 | ## Alternative 3: [**{tidyr}**](https://tidyr.tidyverse.org/)
59 |
60 | ```{r tidyr}
61 | library(tidyr) ## requires {tidyr} to be installed
62 |
63 | drugs |>
64 | pivot_longer(-caseid) |>
65 | group_by(caseid) |>
66 | mutate(everused = any(value)) |>
67 | ungroup() |>
68 | pivot_wider() |>
69 | relocate(everused, .after = last_col())
70 | ```
71 | Here the trick is to turn the rowwise problem into a column problem by pivoting the values and then
72 | pivoting the results back. Many find that this involves a little too much intellectual gymnastics. It
73 | is also not particularly efficient on large datasets, both in terms of computation time and of the memory required
74 | to pivot the tables.
75 |
76 |
77 | ## Alternative 4: [**{purrr}**](https://purrr.tidyverse.org/)
78 |
79 | ```{r purrr}
80 | library(purrr) ## requires {purrr} to be installed
81 |
82 | drugs |>
83 | mutate(everused = pmap_lgl(pick(-caseid), ~ any(...)))
84 | ```
85 | This is a perfectly fine solution and actually part of what one implementation of `lay()` relies on
86 | (if `.method = "tidy"`), but from a user perspective it is a little too geeky-scary.
87 |
88 |
89 | ## Alternative 5: [**{slider}**](https://slider.r-lib.org/)
90 |
91 | ```{r slider}
92 | library(slider) ## requires {slider} to be installed
93 |
94 | drugs |>
95 | mutate(everused = slide_vec(pick(-caseid), any))
96 | ```
97 | The package [**{slider}**](https://slider.r-lib.org/) is a powerful package which provides several *sliding window* functions.
98 | It can be used to perform rowwise operations and is quite similar to **{lay}** in terms of syntax.
99 | It is however not as efficient as **{lay}** and I am not sure it supports the automatic splicing that `lay()` allows.
100 |
101 |
102 | ## Alternative 6: [**{data.table}**](https://rdatatable.gitlab.io/data.table/)
103 |
104 | ```{r data.table, message=FALSE}
105 | library(data.table) ## requires {data.table} to be installed
106 |
107 | drugs_dt <- data.table(drugs)
108 |
109 | drugs_dt[, ..I := .I]
110 | drugs_dt[, everused := any(.SD), by = ..I, .SDcols = -"caseid"]
111 | drugs_dt[, ..I := NULL]
112 | as_tibble(drugs_dt)
113 | ```
114 | This is a solution for those using [**{data.table}**](https://rdatatable.gitlab.io/data.table/).
115 | It is not particularly efficient, nor particularly easy to remember for those who do not program frequently using [**{data.table}**](https://rdatatable.gitlab.io/data.table/).
116 |
117 |
118 | ## Alternative 7: `apply()`
119 |
120 | ```{r apply}
121 | drugs |>
122 | mutate(everused = apply(pick(-caseid), 1L, any))
123 | ```
124 | This is the base R solution. It is very efficient and is actually part of the default method used in `lay()`.
125 | Our implementation of `lay()` removes the need to define the margin (the `1L` above) and benefits from
126 | automatic splicing and the lambda syntax.
127 |
128 |
129 | ## Alternative 8: `for (i in ...) {...}`
130 |
131 | ```{r for}
132 | drugs$everused <- NA
133 |
134 | columns_in <- !colnames(drugs) %in% c("caseid", "everused")
135 |
136 | for (i in seq_len(nrow(drugs))) {
137 | drugs$everused[i] <- any(drugs[i, columns_in])
138 | }
139 |
140 | drugs
141 | ```
142 | This is another base R solution, which does not involve any external package. It is not very pretty,
143 | nor particularly efficient.
144 |
145 |
146 | ## Other alternatives?
147 |
148 | There are probably other ways. If you think of a nice one, please leave an issue and we will add it here!
149 |
150 |
151 | ## Efficiency
152 |
153 | The results of benchmarks comparing alternative implementations for our simple rowwise job are shown in another Article (see [benchmarks](https://courtiol.github.io/lay/articles/benchmarks.html)).
154 | As you will see, `lay()` is not just simple and powerful, it is also quite efficient!
155 |
156 |
--------------------------------------------------------------------------------
/man/lay.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/lay.R
3 | \name{lay}
4 | \alias{lay}
5 | \title{Apply a function within each row}
6 | \usage{
7 | lay(.data, .fn, ..., .method = c("apply", "tidy"))
8 | }
9 | \arguments{
10 | \item{.data}{A data frame or tibble (or other data frame extensions).}
11 |
12 | \item{.fn}{A function to apply to each row of \code{.data}.
13 | Possible values are:
14 | \itemize{
15 | \item A function, e.g. \code{mean}
16 | \item An anonymous function, e.g. \code{function(x) mean(x, na.rm = TRUE)}
17 | \item An anonymous function with shorthand, e.g. \verb{\\(x) mean(x, na.rm = TRUE)}
18 | \item A purrr-style lambda, e.g. \code{~ mean(.x, na.rm = TRUE)}
19 |
20 | (wrap the output in a data frame to apply several functions at once, e.g.
21 | \code{~ tibble(min = min(.x), max = max(.x))})
22 | }}
23 |
24 | \item{...}{Additional arguments for the function calls in \code{.fn} (must be named!).}
25 |
26 | \item{.method}{This is an experimental argument that allows you to control which internal method
27 | is used to apply the rowwise job:
28 | \itemize{
29 | \item "apply", the default, internally uses the function \code{\link[=apply]{apply()}}.
30 | \item "tidy", internally uses \code{\link[purrr:pmap]{purrr::pmap()}} and is stricter with respect to class coercion
31 | across columns.
32 | }
33 |
34 | The default has been chosen based on these \href{https://courtiol.github.io/lay/articles/benchmarks.html}{\strong{benchmarks}}.}
35 | }
36 | \value{
37 | A vector with one element per row of \code{.data}, or a data frame (or tibble) with one row per row of \code{.data}. The class of the output is determined by \code{.fn}.
38 | }
39 | \description{
40 | Efficiently create new column(s) in a data frame (including tibble) by applying a function one row
41 | at a time.
42 | }
43 | \details{
44 | \code{lay()} creates a vector or a data frame (or tibble) by considering, in turn, each row of a data
45 | frame (\code{.data}) as the vector input of some function(s) \code{.fn}.
46 |
47 | This makes the creation of new columns based on a rowwise operation both simple (see
48 | \strong{Examples} below) and efficient (see the Article \href{https://courtiol.github.io/lay/articles/benchmarks.html}{\strong{benchmarks}}).
49 |
50 | The function should be fully compatible with \code{{dplyr}}-based workflows and follows a syntax close
51 | to \code{\link[dplyr:across]{dplyr::across()}}.
52 |
53 | Yet, it takes \code{.data} instead of \code{.cols} as a main argument, which makes it possible to also use
54 | \code{lay()} outside \code{dplyr} verbs (see \strong{Examples}).
55 |
56 | The function \code{lay()} should work in a wide range of situations, provided that:
57 | \itemize{
58 | \item The input \code{.data} should be a data frame (including tibble) with columns of the same class, or of
59 | classes similar enough to be easily coerced into a single class. Note that \code{.method = "apply"}
60 | also allows for the input to be a matrix and is more permissive in terms of data coercion.
61 | \item The output of \code{.fn} should be a scalar (i.e., vector of length 1) or a 1 row data frame (or
62 | tibble).
63 | }
64 |
65 | If you use \code{lay()} within \code{\link[dplyr:mutate]{dplyr::mutate()}}, make sure that the data used by \code{\link[dplyr:mutate]{dplyr::mutate()}}
66 | contain no row-grouping, i.e., what is passed to \code{.data} in \code{\link[dplyr:mutate]{dplyr::mutate()}} should not be of
67 | class \code{grouped_df} or \code{rowwise_df}. If it is, \code{lay()} will be called multiple times, which will
68 | slow down the computation despite not influencing the output.
69 | }
70 | \examples{
71 |
72 | # usage without dplyr -------------------------------------------------------------------------
73 |
74 | # lay can return a vector
75 | lay(drugs[1:10, -1], any)
76 |
77 | # lay can return a data frame
78 | ## using the shorthand function syntax \(x) .fn(x)
79 | lay(drugs[1:10, -1],
80 | \(x) data.frame(drugs_taken = sum(x), drugs_not_taken = sum(x == 0)))
81 |
82 | ## using the rlang lambda syntax ~ fn(.x)
83 | lay(drugs[1:10, -1],
84 | ~ data.frame(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))
85 |
86 | # lay can be used to augment a data frame
87 | cbind(drugs[1:10, ],
88 | lay(drugs[1:10, -1],
89 | ~ data.frame(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0))))
90 |
91 |
92 | # usage with dplyr ----------------------------------------------------------------------------
93 |
94 | if (require("dplyr")) {
95 |
96 | # apply any() to each row
97 | drugs |>
98 | mutate(everused = lay(pick(-caseid), any))
99 |
100 | # apply any() to each row using all columns
101 | drugs |>
102 | select(-caseid) |>
103 | mutate(everused = lay(pick(everything()), any))
104 |
105 | # a workaround would be to use `rowSums`
106 | drugs |>
107 | mutate(everused = rowSums(pick(-caseid)) > 0)
108 |
109 | # but we can lay any function taking a vector as input, e.g. median
110 | drugs |>
111 | mutate(used_median = lay(pick(-caseid), median))
112 |
113 | # you can pass arguments to the function
114 | drugs_with_NA <- drugs
115 | drugs_with_NA[1, 2] <- NA
116 |
117 | drugs_with_NA |>
118 | mutate(everused = lay(pick(-caseid), any))
119 | drugs_with_NA |>
120 | mutate(everused = lay(pick(-caseid), any, na.rm = TRUE))
121 |
122 | # you can lay the output into a 1-row tibble (or data.frame)
123 | # if you want to apply multiple functions
124 | drugs |>
125 | mutate(lay(pick(-caseid),
126 | ~ tibble(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0))))
127 |
128 | # note that naming the output prevents the automatic splicing and you obtain a df-column
129 | drugs |>
130 | mutate(usage = lay(pick(-caseid),
131 | ~ tibble(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0))))
132 |
133 | # if your function returns a vector longer than a scalar, you should turn the output
134 | # into a tibble, which is the job of as_tibble_row()
135 | drugs |>
136 | mutate(lay(pick(-caseid), ~ as_tibble_row(quantile(.x))))
137 |
138 | # note that you could also wrap the output in a list and name it to obtain a list-column
139 | drugs |>
140 | mutate(usage_quantiles = lay(pick(-caseid), ~ list(quantile(.x))))
141 | }
142 |
143 | }
144 |
--------------------------------------------------------------------------------
/R/lay.R:
--------------------------------------------------------------------------------
1 | #' Apply a function within each row
2 | #'
3 | #' Efficiently create new column(s) in a data frame (including tibble) by applying a function one row
4 | #' at a time.
5 | #'
6 | #' `lay()` creates a vector or a data frame (or tibble) by considering, in turn, each row of a data
7 | #' frame (`.data`) as the vector input of some function(s) `.fn`.
8 | #'
9 | #' This makes the creation of new columns based on a rowwise operation both simple (see
10 | #' **Examples** below) and efficient (see the Article [**benchmarks**](https://courtiol.github.io/lay/articles/benchmarks.html)).
11 | #'
12 | #' The function should be fully compatible with `{dplyr}`-based workflows and follows a syntax close
13 | #' to [dplyr::across()].
14 | #'
15 | #' Yet, it takes `.data` instead of `.cols` as a main argument, which makes it possible to also use
16 | #' `lay()` outside `dplyr` verbs (see **Examples**).
17 | #'
18 | #' The function `lay()` should work in a wide range of situations, provided that:
19 | #'
20 | #' - The input `.data` should be a data frame (including tibble) with columns of the same class, or of
21 | #' classes similar enough to be easily coerced into a single class. Note that `.method = "apply"`
22 | #' also allows for the input to be a matrix and is more permissive in terms of data coercion.
23 | #'
24 | #' - The output of `.fn` should be a scalar (i.e., vector of length 1) or a 1 row data frame (or
25 | #' tibble).
26 | #'
27 | #' If you use `lay()` within [dplyr::mutate()], make sure that the data used by [dplyr::mutate()]
28 | #' contain no row-grouping, i.e., what is passed to `.data` in [dplyr::mutate()] should not be of
29 | #' class `grouped_df` or `rowwise_df`. If it is, `lay()` will be called multiple times, which will
30 | #' slow down the computation despite not influencing the output.
31 | #'
32 | #'
33 | #' @param .data A data frame or tibble (or other data frame extensions).
34 | #' @param .fn A function to apply to each row of `.data`.
35 | #' Possible values are:
36 | #'
37 | #' - A function, e.g. `mean`
38 | #' - An anonymous function, e.g. `function(x) mean(x, na.rm = TRUE)`
39 | #' - An anonymous function with shorthand, e.g. `\(x) mean(x, na.rm = TRUE)`
40 | #' - A purrr-style lambda, e.g. `~ mean(.x, na.rm = TRUE)`
41 | #'
42 | #' (wrap the output in a data frame to apply several functions at once, e.g.
43 | #' `~ tibble(min = min(.x), max = max(.x))`)
44 | #'
45 | #' @param ... Additional arguments for the function calls in `.fn` (must be named!).
46 | #' @param .method This is an experimental argument that allows you to control which internal method
47 | #' is used to apply the rowwise job:
48 | #' - "apply", the default, internally uses the function [apply()].
49 | #' - "tidy", internally uses [purrr::pmap()] and is stricter with respect to class coercion
50 | #' across columns.
51 | #'
52 | #' The default has been chosen based on these [**benchmarks**](https://courtiol.github.io/lay/articles/benchmarks.html).
53 | #'
54 | #' @return A vector with one element per row of `.data`, or a data frame (or tibble) with one row per row of `.data`. The class of the output is determined by `.fn`.
55 | #'
56 | #' @importFrom vctrs vec_c
57 | #' @importFrom rlang list2 exec as_function
58 | #' @importFrom purrr pmap
59 | #'
60 | #' @examples
61 | #'
62 | #' # usage without dplyr -------------------------------------------------------------------------
63 | #'
64 | #' # lay can return a vector
65 | #' lay(drugs[1:10, -1], any)
66 | #'
67 | #' # lay can return a data frame
68 | #' ## using the shorthand function syntax \(x) .fn(x)
69 | #' lay(drugs[1:10, -1],
70 | #' \(x) data.frame(drugs_taken = sum(x), drugs_not_taken = sum(x == 0)))
71 | #'
72 | #' ## using the rlang lambda syntax ~ fn(.x)
73 | #' lay(drugs[1:10, -1],
74 | #' ~ data.frame(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))
75 | #'
76 | #' # lay can be used to augment a data frame
77 | #' cbind(drugs[1:10, ],
78 | #' lay(drugs[1:10, -1],
79 | #' ~ data.frame(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0))))
80 | #'
81 | #'
82 | #' # usage with dplyr ----------------------------------------------------------------------------
83 | #'
84 | #' if (require("dplyr")) {
85 | #'
86 | #' # apply any() to each row
87 | #' drugs |>
88 | #' mutate(everused = lay(pick(-caseid), any))
89 | #'
90 | #' # apply any() to each row using all columns
91 | #' drugs |>
92 | #' select(-caseid) |>
93 | #' mutate(everused = lay(pick(everything()), any))
94 | #'
95 | #' # a workaround would be to use `rowSums`
96 | #' drugs |>
97 | #' mutate(everused = rowSums(pick(-caseid)) > 0)
98 | #'
99 | #' # but we can lay any function taking a vector as input, e.g. median
100 | #' drugs |>
101 | #' mutate(used_median = lay(pick(-caseid), median))
102 | #'
103 | #' # you can pass arguments to the function
104 | #' drugs_with_NA <- drugs
105 | #' drugs_with_NA[1, 2] <- NA
106 | #'
107 | #' drugs_with_NA |>
108 | #' mutate(everused = lay(pick(-caseid), any))
109 | #' drugs_with_NA |>
110 | #' mutate(everused = lay(pick(-caseid), any, na.rm = TRUE))
111 | #'
112 | #' # you can lay the output into a 1-row tibble (or data.frame)
113 | #' # if you want to apply multiple functions
114 | #' drugs |>
115 | #' mutate(lay(pick(-caseid),
116 | #' ~ tibble(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0))))
117 | #'
118 | #' # note that naming the output prevents the automatic splicing and you obtain a df-column
119 | #' drugs |>
120 | #' mutate(usage = lay(pick(-caseid),
121 | #' ~ tibble(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0))))
122 | #'
123 | #' # if your function returns a vector longer than a scalar, you should turn the output
124 | #' # into a tibble, which is the job of as_tibble_row()
125 | #' drugs |>
126 | #' mutate(lay(pick(-caseid), ~ as_tibble_row(quantile(.x))))
127 | #'
128 | #' # note that you could also wrap the output in a list and name it to obtain a list-column
129 | #' drugs |>
130 | #' mutate(usage_quantiles = lay(pick(-caseid), ~ list(quantile(.x))))
131 | #' }
132 | #'
133 | #' @export
134 | lay <- function(.data, .fn, ..., .method = c("apply", "tidy")) {
135 |
136 |   method <- match.arg(.method)[1]
137 |
138 |   fn <- as_function(.fn)
139 |
140 |   if (method == "tidy") {
141 |
142 |     args <- list2(...)
143 |     bits <- pmap(.data, function(...) exec(fn, vec_c(...), !!!args))
144 |
145 |   } else if (method == "apply") {
146 |
147 |     bits <- apply(.data, 1L, fn, ...)
148 |
149 |   } else stop(".method input unknown")
150 |
151 |   vec_c(!!!bits)
152 | }
153 |
--------------------------------------------------------------------------------
/vignettes/articles/benchmarks.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Benchmarks"
3 | ---
4 |
5 | ```{r, include = FALSE}
6 | knitr::opts_chunk$set(
7 | collapse = TRUE,
8 | comment = "#>"
9 | )
10 | ```
11 |
12 | ## Article overview
13 |
14 | The goal of this Article is to compare the performance of `lay()` to that of the alternatives described [here](https://courtiol.github.io/lay/articles/alternatives.html).
15 | As you will see, the code using `lay()` is quite efficient.
16 | The only alternative that is clearly more efficient is the one labeled "*vectorized*" below.
17 | Unfortunately, such a vectorized approach requires referring explicitly to every column whose data are used.
18 | Furthermore, such a vectorized approach is not generally applicable, as it can only deal with logical
19 | and arithmetic operators and does not allow the use of other types of functions.
20 |
21 |
22 | ## Loading packages
23 |
24 | This Article requires you to load the following packages:
25 |
26 | ```{r load_pkg, message=FALSE}
27 | library(lay) ## for lay() and the data
28 | library(dplyr) ## for many things
29 | library(tidyr) ## for pivot_longer() and pivot_wider()
30 | library(purrr) ## for pmap_lgl()
31 | library(slider) ## for slide_vec()
32 | library(data.table) ## for an alternative to base and dplyr
33 | library(bench) ## for running the benchmarks
34 | library(forcats) ## for sorting levels in plot with fct_reorder()
35 | ```
36 |
37 | Please install them if they are not present on your system.
38 |
39 |
40 | ## An example of a rowwise task
41 |
42 | Consider the dataset `drugs_full` from our package {lay}:
43 | ```{r drugs_full}
44 | drugs_full
45 | ```
46 |
47 | In this dataset, all columns but `caseid` record the use of pain relievers for non-medical purposes.
48 |
49 | For each drug there is a certain number of users and non-users:
50 | ```{r drugs_full_summary}
51 | drugs_full |>
52 | pivot_longer(-caseid, names_to = "drug", values_to = "used") |>
53 | count(drug, used) |>
54 | mutate(used = if_else(used == 1, "have_used", "have_not_used")) |>
55 | pivot_wider(names_from = used, values_from = n)
56 | ```
57 |
58 | In this Article, we compare the efficiency of alternative ways to create a new variable named `everused`, which indicates whether each respondent has used any of the considered pain relievers for non-medical purposes.
59 |
60 | We will run benchmarks on the dataset `drugs_full` and its `r nrow(drugs_full)` rows, as well as on a subset of this data called `drugs` that only contains `r nrow(drugs)` rows.
61 |
62 | ## Benchmarks on the full dataset (`r nrow(drugs_full)` rows)
63 |
64 | Let's compare the running time of different methods to do this job on the full dataset:
65 |
66 | ```{r bench_run1}
67 | drugs_full_dt <- data.table(drugs_full) ## coercion to data.table
68 |
69 | benchmark1 <- mark(
70 | vectorized = {
71 | drugs_full |>
72 | mutate(everused = codeine | hydrocd | methdon | morphin | oxycodp | tramadl | vicolor)},
73 | lay = {
74 | drugs_full |>
75 | select(-caseid) |>
76 | mutate(everused = lay(pick(everything()), any))},
77 | lay_alternative = {
78 | drugs_full |>
79 | mutate(everused = lay(pick(-caseid), any, .method = "tidy"))},
80 | c_across = {
81 | drugs_full |>
82 | rowwise() |>
83 | mutate(everused = any(c_across(-caseid))) |>
84 | ungroup()},
85 | pivot_pivot = {
86 | drugs_full |>
87 | pivot_longer(-caseid) |>
88 | group_by(caseid) |>
89 | mutate(everused = any(value)) |>
90 | ungroup() |>
91 | pivot_wider()},
92 | pmap = {
93 | drugs_full |>
94 | mutate(everused = pmap_lgl(pick(-caseid), ~ any(...)))},
95 | slider = {
96 | drugs_full |>
97 | mutate(everused = slide_vec(pick(-caseid), any))},
98 | data.table = {
99 | drugs_full_dt[, ..I := .I]
100 | drugs_full_dt[, everused := any(.SD), by = ..I, .SDcols = -"caseid"]},
101 | apply = {
102 | drugs_full |>
103 | mutate(everused = apply(pick(-caseid), 1, any))},
104 | 'for' = {
105 | everused <- logical(nrow(drugs_full))
106 | columns_in <- colnames(drugs_full) != "caseid"
107 | for (i in seq_len(nrow(drugs_full))) everused[i] <- any(drugs_full[i, columns_in])},
108 | iterations = 5,
109 | time_unit = "ms",
110 | check = FALSE
111 | )
112 | ```
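Because `check = FALSE` disables the comparison of results across expressions in `mark()`, the benchmark itself does not confirm that the methods agree. The following is a minimal sketch of such a check, assuming a small made-up data frame (`toy`, not part of the package data) and using base R only; it covers just three of the approaches benchmarked above.

```r
## Toy data (hypothetical; stands in for `drugs_full`)
toy <- data.frame(caseid  = 1:4,
                  codeine = c(0L, 1L, 0L, 0L),
                  morphin = c(0L, 0L, 0L, 1L))
columns_in <- colnames(toy) != "caseid"

## vectorized approach: combine the columns with `|`
res_vectorized <- toy$codeine | toy$morphin

## apply approach: evaluate any() on each row, ignoring caseid
res_apply <- unname(apply(toy[, columns_in], 1, any))

## for-loop approach: fill a pre-allocated logical vector row by row
res_for <- logical(nrow(toy))
for (i in seq_len(nrow(toy))) res_for[i] <- any(unlist(toy[i, columns_in]))

## all three approaches should flag the same respondents
stopifnot(identical(res_vectorized, res_apply),
          identical(res_vectorized, res_for))
```

The same pattern can be extended to the other methods by extracting the `everused` column from each result before comparing.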
113 |
114 | Here are the results of this first series of benchmarks:
115 | ```{r bench_results1}
116 | benchmark1 |>
117 | arrange(median)
118 | ```
119 |
120 | ```{r bench_plot1, fig.width=7}
121 | benchmark1 |>
122 | mutate(expression = fct_reorder(as.character(expression), median, .desc = TRUE)) |>
123 | plot()
124 | ```
125 |
126 | Note that the x-axis of the plot is on a logarithmic scale.
127 |
128 | ## Benchmarks on a subset of the data (`r nrow(drugs)` rows)
129 |
130 | Let's repeat our benchmarks using only a subset of the original dataset:
131 |
132 | ```{r bench_run2}
133 | drugs_dt <- data.table(drugs) ## coercion to data.table
134 |
135 | benchmark2 <- mark(
136 | vectorized = {
137 | drugs |>
138 | mutate(everused = codeine | hydrocd | methdon | morphin | oxycodp | tramadl | vicolor)},
139 | lay = {
140 | drugs |>
141 | select(-caseid) |>
142 | mutate(everused = lay(pick(everything()), any))},
143 | lay_alternative = {
144 | drugs |>
145 | mutate(everused = lay(pick(-caseid), any, .method = "tidy"))},
146 | c_across = {
147 | drugs |>
148 | rowwise() |>
149 | mutate(everused = any(c_across(-caseid))) |>
150 | ungroup()},
151 | pivot_pivot = {
152 | drugs |>
153 | pivot_longer(-caseid) |>
154 | group_by(caseid) |>
155 | mutate(everused = any(value)) |>
156 | ungroup() |>
157 | pivot_wider()},
158 | pmap = {
159 | drugs |>
160 | mutate(everused = pmap_lgl(pick(-caseid), ~ any(...)))},
161 | slider = {
162 | drugs |>
163 | mutate(everused = slide_vec(pick(-caseid), any))},
164 | data.table = {
165 | drugs_dt[, ..I := .I]
166 | drugs_dt[, everused := any(.SD), by = ..I, .SDcols = -"caseid"]},
167 | apply = {
168 | drugs |>
169 | mutate(everused = apply(pick(-caseid), 1, any))},
170 | 'for' = {
171 | everused <- logical(nrow(drugs))
172 | columns_in <- colnames(drugs) != "caseid"
173 | for (i in seq_len(nrow(drugs))) everused[i] <- any(drugs[i, columns_in])},
174 | iterations = 30,
175 | time_unit = "ms",
176 | check = FALSE
177 | )
178 | ```
179 |
180 | Here are the results of this second series of benchmarks:
181 | ```{r bench_results2}
182 | benchmark2 |>
183 | arrange(median)
184 | ```
185 |
186 | ```{r bench_plot2, fig.width=7}
187 | benchmark2 |>
188 | mutate(expression = fct_reorder(as.character(expression), median, .desc = TRUE)) |>
189 | plot(type = "violin")
190 | ```
191 |
192 | Note again that the x-axis of the plot is on a logarithmic scale.
193 |
194 | ## Benchmarks' environment
195 |
196 | ```{r session}
197 | sessionInfo()
198 | ```
199 |
200 |
--------------------------------------------------------------------------------
/README.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | output: github_document
3 | editor_options:
4 | chunk_output_type: console
5 | ---
6 |
7 |
8 |
9 | ```{r, include = FALSE}
10 | knitr::opts_chunk$set(
11 | collapse = TRUE,
12 | comment = "#>",
13 | fig.path = "man/figures/",
14 | out.width = "70%",
15 | fig.align = "center"
16 | )
17 | ```
18 |
19 | # **{lay}**
20 |
21 |
22 | [](https://CRAN.R-project.org/package=lay)
23 | [](https://github.com/courtiol/lay/actions/workflows/R-CMD-check.yaml)
24 | [](https://github.com/courtiol/lay/actions/workflows/test-coverage.yaml)
25 | [](https://lifecycle.r-lib.org/articles/stages.html#experimental)
26 |
27 |
28 | ## An R package for simple but efficient rowwise jobs
29 |
30 | The function `lay()` -- the only function of the package **{lay}** -- is intended to be used to apply a function on each row of a data frame or tibble, independently, and across multiple columns containing values of the same class (e.g. all numeric).
31 |
32 | Implementing rowwise operations for tabular data is notoriously awkward in R.
33 | Many options have been proposed, but they tend to be complicated, inefficient, or both.
34 | Instead `lay()` aims at reaching a sweet spot between simplicity and efficiency.
35 |
36 | The function has been specifically designed to be combined with functions from [**{dplyr}**](https://dplyr.tidyverse.org/) and to feel as if
37 | it was part of it (but you can use `lay()` without [**{dplyr}**](https://dplyr.tidyverse.org/)).
38 |
39 | There is hardly any code behind `lay()` (it can be coded in 3 lines), so this package may just be an interim solution before an established package fulfills the need... Time will tell.
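To give a feel for how little code the idea requires, here is a minimal, dependency-free sketch. It is not the implementation used in **{lay}** (which relies on {rlang} and {vctrs}): the name `lay_sketch()` and the data frame `toy` are hypothetical, and the sketch only handles functions that return a scalar.

```r
## Hypothetical helper (not part of {lay}): apply .fn to each row of .data
lay_sketch <- function(.data, .fn, ...) {
  ## turn each row into a plain named vector
  rows <- lapply(seq_len(nrow(.data)), function(i) unlist(.data[i, , drop = FALSE]))
  ## apply the function to each row, forwarding additional arguments
  results <- lapply(rows, .fn, ...)
  ## combine the scalar results into a single vector
  unlist(results)
}

## Toy data (hypothetical)
toy <- data.frame(id = 1:3, a = c(0L, 1L, 0L), b = c(0L, 0L, NA))
toy$any_used <- lay_sketch(toy[, c("a", "b")], any, na.rm = TRUE)
toy
```

Unlike this sketch, `lay()` also accepts lambda-style functions and functions returning a one-row data frame, as described further down.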
40 |
41 | ### Installation
42 |
43 |
44 | You can install the current CRAN version of **{lay}** with:
45 |
46 | ``` r
47 | install.packages("lay")
48 | ```
49 |
50 | Alternatively, you can install the development version of **{lay}** using [**{remotes}**](https://remotes.r-lib.org/):
51 |
52 | ``` r
53 | remotes::install_github("courtiol/lay") ## requires {remotes} to be installed
54 | ```
55 |
56 | ### Motivation
57 |
58 | Consider the following dataset, which contains information about the use of pain relievers for non-medical purposes.
59 | ```{r motivation}
60 | library(lay) ## requires {lay} to be installed
61 | drugs
62 | ```
63 |
64 | The dataset is [tidy](https://vita.had.co.nz/papers/tidy-data.pdf): each row represents one individual and each variable forms a column.
65 |
66 | Imagine now that you would like to know whether each individual has used any of these pain relievers.
67 |
68 | How would you proceed?
69 |
70 |
71 | ### Our solution: `lay()`
72 |
73 | This is how you would achieve this goal using `lay()`:
74 | ```{r lay}
75 | library(dplyr, warn.conflicts = FALSE) ## requires {dplyr} to be installed
76 |
77 | drugs_full |>
78 | mutate(everused = lay(pick(-caseid), any))
79 | ```
80 |
81 | We used `mutate()` from [**{dplyr}**](https://dplyr.tidyverse.org/) to create a new column called *everused*, and we used `pick()` from that same package to remove the column *caseid* when laying down each row of the data and applying the function `any()`.
82 |
83 | When combining `lay()` and [**{dplyr}**](https://dplyr.tidyverse.org/), you should always use `pick()` or `across()`. The functions `pick()` and `across()` let you pick among many [selection helpers](https://tidyselect.r-lib.org/reference/language.html) from the package [**{tidyselect}**](https://tidyselect.r-lib.org/), which makes it easy to specify which columns to consider.
84 |
85 | Our function `lay()` is quite flexible! For example, you can pass additional argument(s) to the function you wish to apply rowwise (here `any()`):
86 |
87 | ```{r NA}
88 | drugs_with_NA <- drugs ## create a copy of the dataset
89 | drugs_with_NA[1, 2] <- NA ## introduce a missing value
90 |
91 | drugs_with_NA |>
92 | mutate(everused = lay(pick(-caseid), any)) |> ## without additional argument
93 | slice(1) ## keep first row only
94 |
95 | drugs_with_NA |>
96 | mutate(everused = lay(pick(-caseid), any, na.rm = TRUE)) |> ## with additional argument
97 | slice(1)
98 | ```
99 |
100 | Since one of the backbones of `lay()` is [**{rlang}**](https://rlang.r-lib.org), you can use the so-called [*lambda* syntax](https://rlang.r-lib.org/reference/as_function.html) to define anonymous functions on the fly:
101 |
102 | ```{r lambda}
103 | drugs_with_NA |>
104 | mutate(everused = lay(pick(-caseid), ~ any(.x, na.rm = TRUE))) ## same as above, different syntax
105 | ```
106 |
107 | We can also apply many functions at once, as exemplified with another dataset:
108 |
109 | ```{r worldbank}
110 | data("world_bank_pop", package = "tidyr") ## requires {tidyr} to be installed
111 |
112 | world_bank_pop |>
113 | filter(indicator == "SP.POP.TOTL") |>
114 | mutate(lay(pick(matches("\\d")),
115 | ~ tibble(min = min(.x), mean = mean(.x), max = max(.x))), .after = indicator)
116 | ```
117 |
118 | Since the other backbone of `lay()` is [**{vctrs}**](https://vctrs.r-lib.org), the splicing happens automatically (unless the output of the call is used to create a named column). This is why, in the last chunk of code, three different columns (*min*, *mean* and *max*) are directly created.
119 |
120 | **Important:** when using `lay()` the function you want to use for the rowwise job must output a scalar (vector of length 1), or a tibble or data frame with a single row.
121 |
122 | We can apply a function that returns a vector of length > 1 by turning such a vector into a tibble using `as_tibble_row()` from [**{tibble}**](https://tibble.tidyverse.org/):
123 |
124 | ```{r worldbank2}
125 | world_bank_pop |>
126 | filter(indicator == "SP.POP.TOTL") |>
127 | mutate(lay(pick(matches("\\d")),
128 | ~ as_tibble_row(quantile(.x, na.rm = TRUE))), .after = indicator)
129 | ```
130 |
131 | ### History
132 |
133 |
134 |
135 | The first draft of this package was created by **@romainfrancois** as a reply to a tweet I (Alexandre Courtiol) posted under **@rdataberlin** in February 2020.
136 | At the time I was exploring different ways to perform rowwise jobs in R and I was experimenting with various ideas on how to exploit the fact that the newly introduced function `across()` from [**{dplyr}**](https://dplyr.tidyverse.org/) creates tibbles on which one can easily apply a function.
137 | Romain came up with `lay()` as the better solution, making good use of [**{rlang}**](https://rlang.r-lib.org/) & [**{vctrs}**](https://vctrs.r-lib.org/).
138 |
139 | The verb `lay()` was never integrated into [**{dplyr}**](https://dplyr.tidyverse.org/), but, so far, I still find `lay()` superior to most alternatives, which is why I decided to document and maintain this package.
140 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | # **{lay}**
5 |
6 |
7 |
8 | [](https://CRAN.R-project.org/package=lay)
10 | [](https://github.com/courtiol/lay/actions/workflows/R-CMD-check.yaml)
11 | [](https://github.com/courtiol/lay/actions/workflows/test-coverage.yaml)
12 | [](https://lifecycle.r-lib.org/articles/stages.html#experimental)
14 |
15 |
16 | ## An R package for simple but efficient rowwise jobs
17 |
18 | The function `lay()` – the only function of the package **{lay}** – is
19 | intended to be used to apply a function on each row of a data frame or
20 | tibble, independently, and across multiple columns containing values of
21 | the same class (e.g. all numeric).
22 |
23 | Implementing rowwise operations for tabular data is notoriously awkward
24 | in R. Many options have been proposed, but they tend to be complicated,
25 | inefficient, or both. Instead `lay()` aims at reaching a sweet spot
26 | between simplicity and efficiency.
27 |
28 | The function has been specifically designed to be combined with
29 | functions from [**{dplyr}**](https://dplyr.tidyverse.org/) and to feel
30 | as if it was part of it (but you can use `lay()` without
31 | [**{dplyr}**](https://dplyr.tidyverse.org/)).
32 |
33 | There is hardly any code behind `lay()` (it can be coded in 3 lines), so
34 | this package may just be an interim solution before an established
35 | package fulfills the need… Time will tell.
36 |
37 | ### Installation
38 |
39 | You can install the current CRAN version of **{lay}** with:
40 |
41 | ``` r
42 | install.packages("lay")
43 | ```
44 |
45 | Alternatively, you can install the development version of **{lay}**
46 | using [**{remotes}**](https://remotes.r-lib.org/):
47 |
48 | ``` r
49 | remotes::install_github("courtiol/lay") ## requires {remotes} to be installed
50 | ```
51 |
52 | ### Motivation
53 |
54 | Consider the following dataset, which contains information about the use
55 | of pain relievers for non-medical purposes.
56 |
57 | ``` r
58 | library(lay) ## requires {lay} to be installed
59 | drugs
60 | #> # A tibble: 100 × 8
61 | #> caseid hydrocd oxycodp codeine tramadl morphin methdon vicolor
62 | #>
63 | #> 1 1 0 0 0 0 0 0 0
64 | #> 2 2 0 0 0 0 0 0 0
65 | #> 3 3 0 0 0 0 0 0 0
66 | #> 4 4 0 0 0 0 0 0 0
67 | #> 5 5 0 0 0 0 0 0 0
68 | #> 6 6 0 0 0 0 0 0 0
69 | #> 7 7 0 0 0 0 0 0 0
70 | #> 8 8 0 0 0 0 0 0 0
71 | #> 9 9 0 0 0 0 0 0 1
72 | #> 10 10 0 0 0 0 0 0 0
73 | #> # ℹ 90 more rows
74 | ```
75 |
76 | The dataset is [tidy](https://vita.had.co.nz/papers/tidy-data.pdf): each
77 | row represents one individual and each variable forms a column.
78 |
79 | Imagine now that you would like to know whether each individual has used any
80 | of these pain relievers.
81 |
82 | How would you proceed?
83 |
84 | ### Our solution: `lay()`
85 |
86 | This is how you would achieve this goal using `lay()`:
87 |
88 | ``` r
89 | library(dplyr, warn.conflicts = FALSE) ## requires {dplyr} to be installed
90 |
91 | drugs_full |>
92 | mutate(everused = lay(pick(-caseid), any))
93 | #> # A tibble: 55,271 × 9
94 | #> caseid hydrocd oxycodp codeine tramadl morphin methdon vicolor everused
95 | #>
96 | #> 1 1 0 0 0 0 0 0 0 FALSE
97 | #> 2 2 0 0 0 0 0 0 0 FALSE
98 | #> 3 3 0 0 0 0 0 0 0 FALSE
99 | #> 4 4 0 0 0 0 0 0 0 FALSE
100 | #> 5 5 0 0 0 0 0 0 0 FALSE
101 | #> 6 6 0 0 0 0 0 0 0 FALSE
102 | #> 7 7 0 0 0 0 0 0 0 FALSE
103 | #> 8 8 0 0 0 0 0 0 0 FALSE
104 | #> 9 9 0 0 0 0 0 0 1 TRUE
105 | #> 10 10 0 0 0 0 0 0 0 FALSE
106 | #> # ℹ 55,261 more rows
107 | ```
108 |
109 | We used `mutate()` from [**{dplyr}**](https://dplyr.tidyverse.org/) to
110 | create a new column called *everused*, and we used `pick()` from that
111 | same package to remove the column *caseid* when laying down each row of
112 | the data and applying the function `any()`.
113 |
114 | When combining `lay()` and [**{dplyr}**](https://dplyr.tidyverse.org/),
115 | you should always use `pick()` or `across()`. The functions `pick()` and
116 | `across()` let you pick among many [selection
117 | helpers](https://tidyselect.r-lib.org/reference/language.html) from the
118 | package [**{tidyselect}**](https://tidyselect.r-lib.org/), which makes
119 | it easy to specify which columns to consider.
120 |
121 | Our function `lay()` is quite flexible! For example, you can pass additional
122 | argument(s) to the function you wish to apply rowwise (here `any()`):
123 |
124 | ``` r
125 | drugs_with_NA <- drugs ## create a copy of the dataset
126 | drugs_with_NA[1, 2] <- NA ## introduce a missing value
127 |
128 | drugs_with_NA |>
129 | mutate(everused = lay(pick(-caseid), any)) |> ## without additional argument
130 | slice(1) ## keep first row only
131 | #> # A tibble: 1 × 9
132 | #> caseid hydrocd oxycodp codeine tramadl morphin methdon vicolor everused
133 | #>
134 | #> 1 1 NA 0 0 0 0 0 0 NA
135 |
136 | drugs_with_NA |>
137 | mutate(everused = lay(pick(-caseid), any, na.rm = TRUE)) |> ## with additional argument
138 | slice(1)
139 | #> # A tibble: 1 × 9
140 | #> caseid hydrocd oxycodp codeine tramadl morphin methdon vicolor everused
141 | #>
142 | #> 1 1 NA 0 0 0 0 0 0 FALSE
143 | ```
144 |
145 | Since one of the backbones of `lay()` is
146 | [**{rlang}**](https://rlang.r-lib.org), you can use the so-called
147 | [*lambda* syntax](https://rlang.r-lib.org/reference/as_function.html) to
148 | define anonymous functions on the fly:
149 |
150 | ``` r
151 | drugs_with_NA |>
152 | mutate(everused = lay(pick(-caseid), ~ any(.x, na.rm = TRUE))) ## same as above, different syntax
153 | #> # A tibble: 100 × 9
154 | #> caseid hydrocd oxycodp codeine tramadl morphin methdon vicolor everused
155 | #>
156 | #> 1 1 NA 0 0 0 0 0 0 FALSE
157 | #> 2 2 0 0 0 0 0 0 0 FALSE
158 | #> 3 3 0 0 0 0 0 0 0 FALSE
159 | #> 4 4 0 0 0 0 0 0 0 FALSE
160 | #> 5 5 0 0 0 0 0 0 0 FALSE
161 | #> 6 6 0 0 0 0 0 0 0 FALSE
162 | #> 7 7 0 0 0 0 0 0 0 FALSE
163 | #> 8 8 0 0 0 0 0 0 0 FALSE
164 | #> 9 9 0 0 0 0 0 0 1 TRUE
165 | #> 10 10 0 0 0 0 0 0 0 FALSE
166 | #> # ℹ 90 more rows
167 | ```
168 |
169 | We can also apply many functions at once, as exemplified with another
170 | dataset:
171 |
172 | ``` r
173 | data("world_bank_pop", package = "tidyr") ## requires {tidyr} to be installed
174 |
175 | world_bank_pop |>
176 | filter(indicator == "SP.POP.TOTL") |>
177 | mutate(lay(pick(matches("\\d")),
178 | ~ tibble(min = min(.x), mean = mean(.x), max = max(.x))), .after = indicator)
179 | #> # A tibble: 266 × 23
180 | #> country indicator min mean max `2000` `2001` `2002` `2003` `2004`
181 | #>
182 | #> 1 ABW SP.POP.TOTL 8.91e4 9.81e4 1.05e5 8.91e4 9.07e4 9.18e4 9.27e4 9.35e4
183 | #> 2 AFE SP.POP.TOTL 4.02e8 5.08e8 6.33e8 4.02e8 4.12e8 4.23e8 4.34e8 4.45e8
184 | #> 3 AFG SP.POP.TOTL 1.95e7 2.73e7 3.56e7 1.95e7 1.97e7 2.10e7 2.26e7 2.36e7
185 | #> 4 AFW SP.POP.TOTL 2.70e8 3.45e8 4.31e8 2.70e8 2.77e8 2.85e8 2.93e8 3.01e8
186 | #> 5 AGO SP.POP.TOTL 1.64e7 2.26e7 3.02e7 1.64e7 1.69e7 1.75e7 1.81e7 1.88e7
187 | #> 6 ALB SP.POP.TOTL 2.87e6 2.96e6 3.09e6 3.09e6 3.06e6 3.05e6 3.04e6 3.03e6
188 | #> 7 AND SP.POP.TOTL 6.61e4 7.32e4 8.02e4 6.61e4 6.78e4 7.08e4 7.39e4 7.69e4
189 | #> 8 ARB SP.POP.TOTL 2.87e8 3.52e8 4.24e8 2.87e8 2.94e8 3.00e8 3.07e8 3.13e8
190 | #> 9 ARE SP.POP.TOTL 3.28e6 6.58e6 9.07e6 3.28e6 3.45e6 3.63e6 3.81e6 3.99e6
191 | #> 10 ARG SP.POP.TOTL 3.71e7 4.05e7 4.40e7 3.71e7 3.75e7 3.79e7 3.83e7 3.87e7
192 | #> # ℹ 256 more rows
193 | #> # ℹ 13 more variables: `2005` , `2006` , `2007` , `2008` ,
194 | #> # `2009` , `2010` , `2011` , `2012` , `2013` ,
195 | #> # `2014` , `2015` , `2016` , `2017`
196 | ```
197 |
198 | Since the other backbone of `lay()` is
199 | [**{vctrs}**](https://vctrs.r-lib.org), the splicing happens
200 | automatically (unless the output of the call is used to create a named
201 | column). This is why, in the last chunk of code, three different columns
202 | (*min*, *mean* and *max*) are directly created.
203 |
204 | **Important:** when using `lay()` the function you want to use for the
205 | rowwise job must output a scalar (vector of length 1), or a tibble or
206 | data frame with a single row.
207 |
208 | We can apply a function that returns a vector of length \> 1 by turning
209 | such a vector into a tibble using `as_tibble_row()` from
210 | [**{tibble}**](https://tibble.tidyverse.org/):
211 |
212 | ``` r
213 | world_bank_pop |>
214 | filter(indicator == "SP.POP.TOTL") |>
215 | mutate(lay(pick(matches("\\d")),
216 | ~ as_tibble_row(quantile(.x, na.rm = TRUE))), .after = indicator)
217 | #> # A tibble: 266 × 25
218 | #> country indicator `0%` `25%` `50%` `75%` `100%` `2000` `2001` `2002`
219 | #>
220 | #> 1 ABW SP.POP.TOTL 8.91e4 9.38e4 9.86e4 1.03e5 1.05e5 8.91e4 9.07e4 9.18e4
221 | #> 2 AFE SP.POP.TOTL 4.02e8 4.48e8 5.03e8 5.64e8 6.33e8 4.02e8 4.12e8 4.23e8
222 | #> 3 AFG SP.POP.TOTL 1.95e7 2.38e7 2.69e7 3.13e7 3.56e7 1.95e7 1.97e7 2.10e7
223 | #> 4 AFW SP.POP.TOTL 2.70e8 3.03e8 3.42e8 3.85e8 4.31e8 2.70e8 2.77e8 2.85e8
224 | #> 5 AGO SP.POP.TOTL 1.64e7 1.89e7 2.21e7 2.59e7 3.02e7 1.64e7 1.69e7 1.75e7
225 | #> 6 ALB SP.POP.TOTL 2.87e6 2.90e6 2.94e6 3.02e6 3.09e6 3.09e6 3.06e6 3.05e6
226 | #> 7 AND SP.POP.TOTL 6.61e4 7.11e4 7.21e4 7.55e4 8.02e4 6.61e4 6.78e4 7.08e4
227 | #> 8 ARB SP.POP.TOTL 2.87e8 3.15e8 3.51e8 3.87e8 4.24e8 2.87e8 2.94e8 3.00e8
228 | #> 9 ARE SP.POP.TOTL 3.28e6 4.07e6 7.49e6 8.73e6 9.07e6 3.28e6 3.45e6 3.63e6
229 | #> 10 ARG SP.POP.TOTL 3.71e7 3.88e7 4.05e7 4.21e7 4.40e7 3.71e7 3.75e7 3.79e7
230 | #> # ℹ 256 more rows
231 | #> # ℹ 15 more variables: `2003` , `2004` , `2005` , `2006` ,
232 | #> # `2007` , `2008` , `2009` , `2010` , `2011` ,
233 | #> # `2012` , `2013` , `2014` , `2015` , `2016` ,
234 | #> # `2017`
235 | ```
236 |
237 | ### History
238 |
239 |
240 |
241 | The first draft of this package was created by **@romainfrancois**
242 | as a reply to a tweet I (Alexandre Courtiol) posted under
243 | **@rdataberlin** in February 2020. At the time I was exploring different
244 | ways to perform rowwise jobs in R and I was experimenting with various
245 | ideas on how to exploit the fact that the newly introduced function
246 | `across()` from [**{dplyr}**](https://dplyr.tidyverse.org/) creates
247 | tibbles on which one can easily apply a function. Romain came up with
248 | `lay()` as the better solution, making good use of
249 | [**{rlang}**](https://rlang.r-lib.org/) &
250 | [**{vctrs}**](https://vctrs.r-lib.org/).
251 |
252 | The verb `lay()` was never integrated into
253 | [**{dplyr}**](https://dplyr.tidyverse.org/), but, so far, I still find
254 | `lay()` superior to most alternatives, which is why I decided to
255 | document and maintain this package.
256 |
--------------------------------------------------------------------------------