├── .Rbuildignore
├── .github
│   ├── .gitignore
│   ├── CODE_OF_CONDUCT.md
│   ├── ISSUE_TEMPLATE.md
│   └── workflows
│       ├── R-CMD-check-hard.yaml
│       ├── R-CMD-check.yaml
│       ├── lock.yaml
│       ├── pkgdown.yaml
│       ├── pr-commands.yaml
│       └── test-coverage.yaml
├── .gitignore
├── .vscode
│   ├── extensions.json
│   └── settings.json
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── NEWS.md
├── R
│   ├── aaa.R
│   ├── contrasts.R
│   ├── data.R
│   ├── dplyr-compat.R
│   ├── misc.R
│   ├── perf_mod.R
│   ├── posterior_diff-compat.R
│   ├── posteriors-compat.R
│   ├── posteriors.R
│   ├── reexports.R
│   ├── tidyposterior-package.R
│   ├── transformations.R
│   └── zzz.R
├── README.Rmd
├── README.md
├── _pkgdown.yml
├── air.toml
├── codecov.yml
├── data
│   ├── concrete_example.rda
│   ├── ex_objects.RData
│   ├── noisy_example.rda
│   ├── precise_example.rda
│   └── ts_example.rda
├── inst
│   └── examples
│       └── glm_v_nnet.RData
├── man
│   ├── autoplot.posterior.Rd
│   ├── autoplot.posterior_diff.Rd
│   ├── contrast_models.Rd
│   ├── figures
│   │   ├── README-post-1.png
│   │   ├── lifecycle-archived.svg
│   │   ├── lifecycle-defunct.svg
│   │   ├── lifecycle-deprecated.svg
│   │   ├── lifecycle-experimental.svg
│   │   ├── lifecycle-maturing.svg
│   │   ├── lifecycle-questioning.svg
│   │   ├── lifecycle-soft-deprecated.svg
│   │   ├── lifecycle-stable.svg
│   │   ├── lifecycle-superseded.svg
│   │   └── logo.png
│   ├── perf_mod.Rd
│   ├── precise_example.Rd
│   ├── reexports.Rd
│   ├── rmd
│   │   ├── example-interfaces.Rmd
│   │   └── example-interfaces.md
│   ├── summary.posterior.Rd
│   ├── summary.posterior_diff.Rd
│   ├── tidy.perf_mod.Rd
│   ├── tidyposterior-package.Rd
│   ├── transformations.Rd
│   ├── vctrs_methods_posterior.Rd
│   └── vctrs_methods_posterior_diff.Rd
├── revdep
│   ├── README.md
│   ├── failures.md
│   └── problems.md
├── tests
│   ├── testthat.R
│   └── testthat
│       ├── _snaps
│       │   ├── contrast.md
│       │   └── perf_mod.md
│       ├── test_contrast.R
│       ├── test_dplyr_new.R
│       ├── test_perf_mod.R
│       └── test_tidy.R
├── tidyposterior.Rproj
└── vignettes
    ├── articles
    │   └── Getting_Started.Rmd
    └── tidyposterior.Rmd

/.Rbuildignore:
--------------------------------------------------------------------------------
1 | ^.*\.Rproj$
2 | ^\.Rproj\.user$
3 | ^docs$
4 | ^_pkgdown\.yml$
5 | vignettes/.build.timestamp
6 | ^\.travis\.yml$
7 | ^codecov\.yml$
8 | README.*
9 | ^pkgdown$
10 | tidyposterior_hex.png
11 | ^.github$
12 | ^revdep$
13 | ^inst/examples$
14 | ^\.github$
15 | ^CODE_OF_CONDUCT\.md$
16 | ^LICENSE\.md$
17 | ^README\.Rmd$
18 | ^[\.]?air\.toml$
19 | ^\.vscode$
20 | 
--------------------------------------------------------------------------------
/.github/.gitignore:
--------------------------------------------------------------------------------
1 | *.html
2 | 
--------------------------------------------------------------------------------
/.github/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Contributor Covenant Code of Conduct
2 | 
3 | ## Our Pledge
4 | 
5 | We as members, contributors, and leaders pledge to make participation in our
6 | community a harassment-free experience for everyone, regardless of age, body
7 | size, visible or invisible disability, ethnicity, sex characteristics, gender
8 | identity and expression, level of experience, education, socio-economic status,
9 | nationality, personal appearance, race, caste, color, religion, or sexual
10 | identity and orientation.
11 | 
12 | We pledge to act and interact in ways that contribute to an open, welcoming,
13 | diverse, inclusive, and healthy community.
14 | 15 | ## Our Standards 16 | 17 | Examples of behavior that contributes to a positive environment for our 18 | community include: 19 | 20 | * Demonstrating empathy and kindness toward other people 21 | * Being respectful of differing opinions, viewpoints, and experiences 22 | * Giving and gracefully accepting constructive feedback 23 | * Accepting responsibility and apologizing to those affected by our mistakes, 24 | and learning from the experience 25 | * Focusing on what is best not just for us as individuals, but for the overall 26 | community 27 | 28 | Examples of unacceptable behavior include: 29 | 30 | * The use of sexualized language or imagery, and sexual attention or advances of 31 | any kind 32 | * Trolling, insulting or derogatory comments, and personal or political attacks 33 | * Public or private harassment 34 | * Publishing others' private information, such as a physical or email address, 35 | without their explicit permission 36 | * Other conduct which could reasonably be considered inappropriate in a 37 | professional setting 38 | 39 | ## Enforcement Responsibilities 40 | 41 | Community leaders are responsible for clarifying and enforcing our standards of 42 | acceptable behavior and will take appropriate and fair corrective action in 43 | response to any behavior that they deem inappropriate, threatening, offensive, 44 | or harmful. 45 | 46 | Community leaders have the right and responsibility to remove, edit, or reject 47 | comments, commits, code, wiki edits, issues, and other contributions that are 48 | not aligned to this Code of Conduct, and will communicate reasons for moderation 49 | decisions when appropriate. 50 | 51 | ## Scope 52 | 53 | This Code of Conduct applies within all community spaces, and also applies when 54 | an individual is officially representing the community in public spaces. 55 | Examples of representing our community include using an official e-mail address, 56 | posting via an official social media account, or acting as an appointed 57 | representative at an online or offline event. 58 | 59 | ## Enforcement 60 | 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 62 | reported to the community leaders responsible for enforcement at codeofconduct@posit.co. 63 | All complaints will be reviewed and investigated promptly and fairly. 64 | 65 | All community leaders are obligated to respect the privacy and security of the 66 | reporter of any incident. 67 | 68 | ## Enforcement Guidelines 69 | 70 | Community leaders will follow these Community Impact Guidelines in determining 71 | the consequences for any action they deem in violation of this Code of Conduct: 72 | 73 | ### 1. Correction 74 | 75 | **Community Impact**: Use of inappropriate language or other behavior deemed 76 | unprofessional or unwelcome in the community. 77 | 78 | **Consequence**: A private, written warning from community leaders, providing 79 | clarity around the nature of the violation and an explanation of why the 80 | behavior was inappropriate. A public apology may be requested. 81 | 82 | ### 2. Warning 83 | 84 | **Community Impact**: A violation through a single incident or series of 85 | actions. 86 | 87 | **Consequence**: A warning with consequences for continued behavior. No 88 | interaction with the people involved, including unsolicited interaction with 89 | those enforcing the Code of Conduct, for a specified period of time. This 90 | includes avoiding interactions in community spaces as well as external channels 91 | like social media. 
Violating these terms may lead to a temporary or permanent
92 | ban.
93 | 
94 | ### 3. Temporary Ban
95 | 
96 | **Community Impact**: A serious violation of community standards, including
97 | sustained inappropriate behavior.
98 | 
99 | **Consequence**: A temporary ban from any sort of interaction or public
100 | communication with the community for a specified period of time. No public or
101 | private interaction with the people involved, including unsolicited interaction
102 | with those enforcing the Code of Conduct, is allowed during this period.
103 | Violating these terms may lead to a permanent ban.
104 | 
105 | ### 4. Permanent Ban
106 | 
107 | **Community Impact**: Demonstrating a pattern of violation of community
108 | standards, including sustained inappropriate behavior, harassment of an
109 | individual, or aggression toward or disparagement of classes of individuals.
110 | 
111 | **Consequence**: A permanent ban from any sort of public interaction within the
112 | community.
113 | 
114 | ## Attribution
115 | 
116 | This Code of Conduct is adapted from the [Contributor Covenant][homepage],
117 | version 2.1, available at
118 | <https://www.contributor-covenant.org/version/2/1/code_of_conduct.html>.
119 | 
120 | Community Impact Guidelines were inspired by
121 | [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/inclusion).
122 | 
123 | For answers to common questions about this code of conduct, see the FAQ at
124 | <https://www.contributor-covenant.org/faq>. Translations are available at <https://www.contributor-covenant.org/translations>.
125 | 
126 | [homepage]: https://www.contributor-covenant.org
127 | 
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | If you are filing a bug, make sure these boxes are checked before submitting your issue. Thank you!
2 | 
3 | - [ ] Start a new R session
4 | - [ ] Install the latest version of the package: `update.packages(oldPkgs="tidyposterior", ask=FALSE)`
5 | - [ ] [Write a minimal reproducible example](http://stackoverflow.com/a/5963610)
6 | - [ ] Run `sessionInfo()` and add the results to the issue. Even better would be to use the [`sessioninfo`](https://github.com/r-lib/sessioninfo) package's `session_info()`.
7 | 
8 | ### Minimal, reproducible example:
9 | 
10 | __Please read this page__: [reprex = {repr}oducible {ex}ample](https://github.com/jennybc/reprex#what-is-a-reprex)
11 | 
12 | Text and example code modified from [the R FAQ on stackoverflow](http://stackoverflow.com/a/5963610)
13 | 
14 | #### _Minimal_ Reproducible Dataset:
15 | 
16 | If you are not using a data set from a package, use e.g. `dput()` to give us something that can be put into R immediately, e.g.
17 | 
18 | ```r
19 | dput(head(iris, 4))
20 | ```
21 | 
22 | Without a dataset, there usually isn't much that we can do to help.
23 | 
24 | If your data frame has a factor with many levels, the `dput` output can be unwieldy because it will still list all the possible factor levels even if they aren't present in the subset of your data. To solve this issue, you can use the `droplevels()` function. Notice below how species is a factor with only one level: `dput(droplevels(head(iris, 4)))`.
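For example, here is the difference between the two calls mentioned above (using `iris`, which ships with R, so nothing here is specific to tidyposterior):

```r
# carries along all factor levels, even for a 4-row subset
dput(head(iris, 4))

# drops the unused levels first, so the output stays compact
dput(droplevels(head(iris, 4)))
```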
25 | 26 | #### Minimal, runnable code: 27 | 28 | ```r 29 | library(tidyposterior) 30 | library(dplyr) 31 | 32 | data(precise_example) 33 | 34 | accuracy <- precise_example |> select(id, contains("Accuracy")) 35 | accuracy <- setNames(accuracy, tolower(gsub("_Accuracy$", "", names(accuracy)))) 36 | accuracy 37 | 38 | acc_model <- perf_mod(accuracy, seed = 13311, verbose = FALSE) 39 | ``` 40 | 41 | ### Session Info: 42 | 43 | ```r 44 | sessionInfo() 45 | 46 | # or sessioninfo::session_info() 47 | 48 | ``` 49 | 50 | Be sure to test your chunks of code in an empty R session before submitting your issue! 51 | -------------------------------------------------------------------------------- /.github/workflows/R-CMD-check-hard.yaml: -------------------------------------------------------------------------------- 1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples 2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help 3 | # 4 | # NOTE: This workflow only directly installs "hard" dependencies, i.e. Depends, 5 | # Imports, and LinkingTo dependencies. Notably, Suggests dependencies are never 6 | # installed, with the exception of testthat, knitr, and rmarkdown. The cache is 7 | # never used to avoid accidentally restoring a cache containing a suggested 8 | # dependency. 9 | on: 10 | push: 11 | branches: [main, master] 12 | pull_request: 13 | 14 | name: R-CMD-check-hard.yaml 15 | 16 | permissions: read-all 17 | 18 | jobs: 19 | check-no-suggests: 20 | runs-on: ${{ matrix.config.os }} 21 | 22 | name: ${{ matrix.config.os }} (${{ matrix.config.r }}) 23 | 24 | strategy: 25 | fail-fast: false 26 | matrix: 27 | config: 28 | - {os: ubuntu-latest, r: 'release'} 29 | 30 | env: 31 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 32 | R_KEEP_PKG_SOURCE: yes 33 | 34 | steps: 35 | - uses: actions/checkout@v4 36 | 37 | - uses: r-lib/actions/setup-pandoc@v2 38 | 39 | - uses: r-lib/actions/setup-r@v2 40 | with: 41 | r-version: ${{ matrix.config.r }} 42 | http-user-agent: ${{ matrix.config.http-user-agent }} 43 | use-public-rspm: true 44 | 45 | - uses: r-lib/actions/setup-r-dependencies@v2 46 | with: 47 | dependencies: '"hard"' 48 | cache: false 49 | extra-packages: | 50 | any::rcmdcheck 51 | any::testthat 52 | any::knitr 53 | any::rmarkdown 54 | needs: check 55 | 56 | - uses: r-lib/actions/check-r-package@v2 57 | with: 58 | upload-snapshots: true 59 | build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")' 60 | -------------------------------------------------------------------------------- /.github/workflows/R-CMD-check.yaml: -------------------------------------------------------------------------------- 1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples 2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help 3 | # 4 | # NOTE: This workflow is overkill for most R packages and 5 | # check-standard.yaml is likely a better choice. 6 | # usethis::use_github_action("check-standard") will install it. 
7 | on:
8 |   push:
9 |     branches: [main, master]
10 |   pull_request:
11 | 
12 | name: R-CMD-check.yaml
13 | 
14 | permissions: read-all
15 | 
16 | jobs:
17 |   R-CMD-check:
18 |     runs-on: ${{ matrix.config.os }}
19 | 
20 |     name: ${{ matrix.config.os }} (${{ matrix.config.r }})
21 | 
22 |     strategy:
23 |       fail-fast: false
24 |       matrix:
25 |         config:
26 |           - {os: macos-latest, r: 'release'}
27 | 
28 |           - {os: windows-latest, r: 'release'}
29 |           # use 4.0 or 4.1 to check with rtools40's older compiler
30 |           - {os: windows-latest, r: 'oldrel-4'}
31 | 
32 |           - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
33 |           - {os: ubuntu-latest, r: 'release'}
34 |           - {os: ubuntu-latest, r: 'oldrel-1'}
35 |           - {os: ubuntu-latest, r: 'oldrel-2'}
36 |           - {os: ubuntu-latest, r: 'oldrel-3'}
37 |           - {os: ubuntu-latest, r: 'oldrel-4'}
38 | 
39 |     env:
40 |       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
41 |       R_KEEP_PKG_SOURCE: yes
42 | 
43 |     steps:
44 |       - uses: actions/checkout@v4
45 | 
46 |       - uses: r-lib/actions/setup-pandoc@v2
47 | 
48 |       - uses: r-lib/actions/setup-r@v2
49 |         with:
50 |           r-version: ${{ matrix.config.r }}
51 |           http-user-agent: ${{ matrix.config.http-user-agent }}
52 |           use-public-rspm: true
53 | 
54 |       - uses: r-lib/actions/setup-r-dependencies@v2
55 |         with:
56 |           extra-packages: any::rcmdcheck
57 |           needs: check
58 | 
59 |       - uses: r-lib/actions/check-r-package@v2
60 |         with:
61 |           upload-snapshots: true
62 |           build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")'
63 | 
--------------------------------------------------------------------------------
/.github/workflows/lock.yaml:
--------------------------------------------------------------------------------
1 | name: 'Lock Threads'
2 | 
3 | on:
4 |   schedule:
5 |     - cron: '0 0 * * *'
6 | 
7 | jobs:
8 |   lock:
9 |     runs-on: ubuntu-latest
10 |     steps:
11 |       - uses: dessant/lock-threads@v2
12 |         with:
13 |           github-token: ${{ github.token }}
14 |           issue-lock-inactive-days: '14'
15 |           # issue-exclude-labels: ''
16 |           # issue-lock-labels: 'outdated'
17 |           issue-lock-comment: >
18 |             This issue has been automatically locked. If you believe you have
19 |             found a related problem, please file a new issue (with a reprex:
20 |             <https://reprex.tidyverse.org>) and link to this issue.
21 |           issue-lock-reason: ''
22 |           pr-lock-inactive-days: '14'
23 |           # pr-exclude-labels: 'wip'
24 |           pr-lock-labels: ''
25 |           pr-lock-comment: >
26 |             This pull request has been automatically locked. If you believe you
27 |             have found a related problem, please file a new issue (with a reprex:
28 |             <https://reprex.tidyverse.org>) and link to this issue.
29 |           pr-lock-reason: ''
30 |           # process-only: 'issues'
31 | 
--------------------------------------------------------------------------------
/.github/workflows/pkgdown.yaml:
--------------------------------------------------------------------------------
1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
2 | # Need help debugging build failures?
Start at https://github.com/r-lib/actions#where-to-find-help 3 | on: 4 | push: 5 | branches: [main, master] 6 | pull_request: 7 | release: 8 | types: [published] 9 | workflow_dispatch: 10 | 11 | name: pkgdown.yaml 12 | 13 | permissions: read-all 14 | 15 | jobs: 16 | pkgdown: 17 | runs-on: ubuntu-latest 18 | # Only restrict concurrency for non-PR jobs 19 | concurrency: 20 | group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }} 21 | env: 22 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 23 | permissions: 24 | contents: write 25 | steps: 26 | - uses: actions/checkout@v4 27 | 28 | - uses: r-lib/actions/setup-pandoc@v2 29 | 30 | - uses: r-lib/actions/setup-r@v2 31 | with: 32 | use-public-rspm: true 33 | 34 | - uses: r-lib/actions/setup-r-dependencies@v2 35 | with: 36 | extra-packages: any::pkgdown, local::. 37 | needs: website 38 | 39 | - name: Build site 40 | run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE) 41 | shell: Rscript {0} 42 | 43 | - name: Deploy to GitHub pages 🚀 44 | if: github.event_name != 'pull_request' 45 | uses: JamesIves/github-pages-deploy-action@v4.5.0 46 | with: 47 | clean: false 48 | branch: gh-pages 49 | folder: docs 50 | -------------------------------------------------------------------------------- /.github/workflows/pr-commands.yaml: -------------------------------------------------------------------------------- 1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples 2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help 3 | on: 4 | issue_comment: 5 | types: [created] 6 | 7 | name: pr-commands.yaml 8 | 9 | permissions: read-all 10 | 11 | jobs: 12 | document: 13 | if: ${{ github.event.issue.pull_request && (github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'OWNER') && startsWith(github.event.comment.body, '/document') }} 14 | name: document 15 | runs-on: ubuntu-latest 16 | env: 17 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 18 | permissions: 19 | contents: write 20 | steps: 21 | - uses: actions/checkout@v4 22 | 23 | - uses: r-lib/actions/pr-fetch@v2 24 | with: 25 | repo-token: ${{ secrets.GITHUB_TOKEN }} 26 | 27 | - uses: r-lib/actions/setup-r@v2 28 | with: 29 | use-public-rspm: true 30 | 31 | - uses: r-lib/actions/setup-r-dependencies@v2 32 | with: 33 | extra-packages: any::roxygen2 34 | needs: pr-document 35 | 36 | - name: Document 37 | run: roxygen2::roxygenise() 38 | shell: Rscript {0} 39 | 40 | - name: commit 41 | run: | 42 | git config --local user.name "$GITHUB_ACTOR" 43 | git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com" 44 | git add man/\* NAMESPACE 45 | git commit -m 'Document' 46 | 47 | - uses: r-lib/actions/pr-push@v2 48 | with: 49 | repo-token: ${{ secrets.GITHUB_TOKEN }} 50 | 51 | style: 52 | if: ${{ github.event.issue.pull_request && (github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'OWNER') && startsWith(github.event.comment.body, '/style') }} 53 | name: style 54 | runs-on: ubuntu-latest 55 | env: 56 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 57 | permissions: 58 | contents: write 59 | steps: 60 | - uses: actions/checkout@v4 61 | 62 | - uses: r-lib/actions/pr-fetch@v2 63 | with: 64 | repo-token: ${{ secrets.GITHUB_TOKEN }} 65 | 66 | - uses: r-lib/actions/setup-r@v2 67 | 68 | - name: Install dependencies 69 | run: install.packages("styler") 70 | shell: Rscript {0} 71 | 72 | - name: Style 73 | run: styler::style_pkg() 74 | shell: 
Rscript {0} 75 | 76 | - name: commit 77 | run: | 78 | git config --local user.name "$GITHUB_ACTOR" 79 | git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com" 80 | git add \*.R 81 | git commit -m 'Style' 82 | 83 | - uses: r-lib/actions/pr-push@v2 84 | with: 85 | repo-token: ${{ secrets.GITHUB_TOKEN }} 86 | -------------------------------------------------------------------------------- /.github/workflows/test-coverage.yaml: -------------------------------------------------------------------------------- 1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples 2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help 3 | on: 4 | push: 5 | branches: [main, master] 6 | pull_request: 7 | 8 | name: test-coverage.yaml 9 | 10 | permissions: read-all 11 | 12 | jobs: 13 | test-coverage: 14 | runs-on: ubuntu-latest 15 | env: 16 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 17 | 18 | steps: 19 | - uses: actions/checkout@v4 20 | 21 | - uses: r-lib/actions/setup-r@v2 22 | with: 23 | use-public-rspm: true 24 | 25 | - uses: r-lib/actions/setup-r-dependencies@v2 26 | with: 27 | extra-packages: any::covr, any::xml2 28 | needs: coverage 29 | 30 | - name: Test coverage 31 | run: | 32 | cov <- covr::package_coverage( 33 | quiet = FALSE, 34 | clean = FALSE, 35 | install_path = file.path(normalizePath(Sys.getenv("RUNNER_TEMP"), winslash = "/"), "package") 36 | ) 37 | print(cov) 38 | covr::to_cobertura(cov) 39 | shell: Rscript {0} 40 | 41 | - uses: codecov/codecov-action@v5 42 | with: 43 | # Fail if error if not on PR, or if on PR and token is given 44 | fail_ci_if_error: ${{ github.event_name != 'pull_request' || secrets.CODECOV_TOKEN }} 45 | files: ./cobertura.xml 46 | plugins: noop 47 | disable_search: true 48 | token: ${{ secrets.CODECOV_TOKEN }} 49 | 50 | - name: Show testthat output 51 | if: always() 52 | run: | 53 | ## -------------------------------------------------------------------- 54 | find '${{ runner.temp }}/package' -name 'testthat.Rout*' -exec cat '{}' \; || true 55 | shell: bash 56 | 57 | - name: Upload test results 58 | if: failure() 59 | uses: actions/upload-artifact@v4 60 | with: 61 | name: coverage-test-failures 62 | path: ${{ runner.temp }}/package 63 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | docs/articles/.DS_Store 4 | docs/.DS_Store 5 | .DS_Store 6 | revdep/library.noindex/* 7 | revdep/data.sqlite 8 | revdep/checks.noindex/* 9 | docs 10 | -------------------------------------------------------------------------------- /.vscode/extensions.json: -------------------------------------------------------------------------------- 1 | { 2 | "recommendations": [ 3 | "Posit.air-vscode" 4 | ] 5 | } 6 | -------------------------------------------------------------------------------- /.vscode/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "[r]": { 3 | "editor.formatOnSave": true, 4 | "editor.defaultFormatter": "Posit.air-vscode" 5 | } 6 | } 7 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: tidyposterior 2 | Title: Bayesian Analysis to Compare Models using Resampling Statistics 3 | Version: 1.0.1.9000 4 | Authors@R: c( 5 | person("Max", "Kuhn", , 
"max@posit.co", role = c("aut", "cre"), 6 | comment = c(ORCID = "0000-0003-2402-136X")), 7 | person("Posit Software, PBC", role = c("cph", "fnd"), 8 | comment = c(ROR = "03wc8by49")) 9 | ) 10 | Description: Bayesian analysis used here to answer the question: "when 11 | looking at resampling results, are the differences between models 12 | 'real'?" To answer this, a model can be created were the performance 13 | statistic is the resampling statistics (e.g. accuracy or RMSE). These 14 | values are explained by the model types. In doing this, we can get 15 | parameter estimates for each model's affect on performance and make 16 | statistical (and practical) comparisons between models. The methods 17 | included here are similar to Benavoli et al (2017) 18 | . 19 | License: MIT + file LICENSE 20 | URL: https://tidyposterior.tidymodels.org, 21 | https://github.com/tidymodels/tidyposterior 22 | BugReports: https://github.com/tidymodels/tidyposterior/issues 23 | Depends: 24 | R (>= 4.1) 25 | Imports: 26 | dplyr (> 1.0.0), 27 | generics, 28 | ggplot2, 29 | purrr, 30 | rlang, 31 | rsample (>= 0.0.2), 32 | rstanarm (>= 2.21.1), 33 | stats, 34 | tibble, 35 | tidyr (>= 0.7.1), 36 | tune (>= 0.2.0), 37 | utils, 38 | vctrs (>= 0.3.0), 39 | workflowsets 40 | Suggests: 41 | covr, 42 | knitr, 43 | parsnip, 44 | rmarkdown, 45 | splines2, 46 | testthat (>= 3.0.0), 47 | yardstick 48 | VignetteBuilder: 49 | knitr 50 | ByteCompile: true 51 | Config/Needs/website: tidymodels, tidyverse/tidytemplate 52 | Config/testthat/edition: 3 53 | Config/usethis/last-upkeep: 2025-04-27 54 | Encoding: UTF-8 55 | LazyData: true 56 | Roxygen: list(markdown = TRUE) 57 | RoxygenNote: 7.3.2 58 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | YEAR: 2025 2 | COPYRIGHT HOLDER: tidyposterior authors 3 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | # MIT License 2 | 3 | Copyright (c) 2025 tidyposterior authors 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | 
--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
1 | # Generated by roxygen2: do not edit by hand
2 | 
3 | S3method("[",posterior)
4 | S3method("[",posterior_diff)
5 | S3method("names<-",posterior)
6 | S3method("names<-",posterior_diff)
7 | S3method(autoplot,perf_mod)
8 | S3method(autoplot,perf_mod_workflow_set)
9 | S3method(autoplot,posterior)
10 | S3method(autoplot,posterior_diff)
11 | S3method(perf_mod,data.frame)
12 | S3method(perf_mod,default)
13 | S3method(perf_mod,resamples)
14 | S3method(perf_mod,rset)
15 | S3method(perf_mod,tune_results)
16 | S3method(perf_mod,workflow_set)
17 | S3method(postint,data.frame)
18 | S3method(postint,numeric)
19 | S3method(print,perf_mod)
20 | S3method(print,posterior)
21 | S3method(print,posterior_diff)
22 | S3method(summary,perf_mod)
23 | S3method(summary,posterior)
24 | S3method(summary,posterior_diff)
25 | S3method(tidy,perf_mod)
26 | export(Fisher_trans)
27 | export(autoplot)
28 | export(contrast_models)
29 | export(inv_trans)
30 | export(ln_trans)
31 | export(logit_trans)
32 | export(no_trans)
33 | export(perf_mod)
34 | export(tidy)
35 | export(tidy.perf_mod)
36 | export(vec_cast.data.frame.posterior)
37 | export(vec_cast.data.frame.posterior_diff)
38 | export(vec_cast.posterior.data.frame)
39 | export(vec_cast.posterior.posterior)
40 | export(vec_cast.posterior.tbl_df)
41 | export(vec_cast.posterior_diff.data.frame)
42 | export(vec_cast.posterior_diff.posterior_diff)
43 | export(vec_cast.posterior_diff.tbl_df)
44 | export(vec_cast.tbl_df.posterior)
45 | export(vec_cast.tbl_df.posterior_diff)
46 | export(vec_proxy.posterior)
47 | export(vec_proxy.posterior_diff)
48 | export(vec_ptype2.data.frame.posterior)
49 | export(vec_ptype2.data.frame.posterior_diff)
50 | export(vec_ptype2.posterior.data.frame)
51 | export(vec_ptype2.posterior.posterior)
52 | export(vec_ptype2.posterior.tbl_df)
53 | export(vec_ptype2.posterior_diff.data.frame)
54 | export(vec_ptype2.posterior_diff.posterior_diff)
55 | export(vec_ptype2.posterior_diff.tbl_df)
56 | export(vec_ptype2.tbl_df.posterior)
57 | export(vec_ptype2.tbl_df.posterior_diff)
58 | export(vec_restore.posterior)
59 | export(vec_restore.posterior_diff)
60 | importFrom(dplyr,bind_cols)
61 | importFrom(generics,tidy)
62 | importFrom(ggplot2,autoplot)
63 | importFrom(purrr,map)
64 | importFrom(purrr,map_chr)
65 | importFrom(rlang,"!!")
66 | importFrom(rlang,is_string)
67 | importFrom(rlang,na_dbl)
68 | importFrom(rsample,vfold_cv)
69 | importFrom(rstanarm,posterior_epred)
70 | importFrom(rstanarm,stan_glmer)
71 | importFrom(stats,as.formula)
72 | importFrom(stats,median)
73 | importFrom(stats,quantile)
74 | importFrom(stats,setNames)
75 | importFrom(tibble,as_tibble)
76 | importFrom(utils,combn)
77 | 
--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
1 | # tidyposterior (development version)
2 | 
3 | * Transition from the magrittr pipe to the base R pipe.
4 | 
5 | # tidyposterior 1.0.1
6 | 
7 | Maintenance release for a broken test in current R-devel.
8 | 
9 | Maintainer email change.
10 | 
11 | # tidyposterior 1.0.0
12 | 
13 | * Transition from `gather()` to `pivot_longer()`.
14 | 
15 | * Update to testthat 3e.
16 | 
17 | # tidyposterior 0.1.0
18 | 
19 | * The `ggplot()` methods are now removed in favor of `autoplot()` methods.
20 | 
21 | * `perf_mod()` methods added for tuning parameter objects from `tune`, `finetune`, and `workflowsets`.
22 | 
23 | * `rstanarm` version 2.21.1 or higher is now required due to changes to their APIs.
24 | 
25 | * Re-licensed package from GPL-2 to MIT. All copyright holders are RStudio employees and give consent.
26 | 
27 | # `tidyposterior` 0.0.3
28 | 
29 | * `contrast_models()` now returns a tibble and has an extra column called `contrast`.
30 | 
31 | * The plot methods are now deprecated and will be removed in the next version. They are not very good and can be replaced with simple `ggplot` code.
32 | 
33 | * An optional formula argument was added to `perf_mod.rset()` and `perf_mod.data.frame()`. When the resampling method has multiple ID columns, a nested data structure is assumed (with a warning). The new `formula` argument can be used to override the nesting.
34 | 
35 | 
36 | 
37 | ## Breaking Changes
38 | 
39 | * Methods were added for compatibility with `dplyr` 1.0.0. For the newer `dplyr` version, if critical columns for `posterior` or `posterior_diff` objects are removed, the object is down-graded to a tibble. For earlier versions of `dplyr`, the object is not down-cast.
40 | 
41 | 
42 | # `tidyposterior` 0.0.2
43 | 
44 | A small, maintenance release.
45 | 
46 | ## Minor bug fixes and improvements
47 | 
48 | * Moved from the `broom` package to the `generics` package to get the `tidy` generic.
49 | 
50 | * `ggplot2` was moved to Suggests.
51 | 
52 | * The sole `tidy` method was more explicitly exported so that the `generics` man files show the method.
53 | 
54 | * The large RData objects containing the examples have been removed from the package and are accessible via a link to the GitHub repo.
55 | 
56 | # `tidyposterior` 0.0.1
57 | 
58 | * First CRAN submission
59 | 
--------------------------------------------------------------------------------
/R/aaa.R:
--------------------------------------------------------------------------------
1 | # nocov start
2 | 
3 | # Global vars ------------------------------------------------------------------
4 | 
5 | utils::globalVariables(c(
6 |   "contrast",
7 |   "difference",
8 |   "model_1",
9 |   "model_2",
10 |   ".",
11 |   "aes",
12 |   "posterior",
13 |   ".metric",
14 |   "id",
15 |   "model",
16 |   "splits",
17 |   "statistic",
18 |   "Resample",
19 |   ".config",
20 |   ".estimate",
21 |   ".lower",
22 |   ".upper",
23 |   "pract_equiv",
24 |   "sub_model",
25 |   "wflow_id",
26 |   "workflow"
27 | ))
28 | 
29 | # nocov end
30 | 
--------------------------------------------------------------------------------
/R/contrasts.R:
--------------------------------------------------------------------------------
1 | #' Estimate the Difference Between Models
2 | #'
3 | #' The posterior distributions created by [perf_mod()] can be used to obtain
4 | #' the posterior distribution of the difference(s) between models. One or more
5 | #' comparisons can be computed at the same time.
6 | #'
7 | #' @param x An object produced by [perf_mod()].
8 | #' @param list_1,list_2 Character vectors of equal length that specify the
9 | #'  specific pairwise contrasts. The contrast is parameterized as
10 | #'  `list_1[i] - list_2[i]`. If the defaults are left as `NULL`, all
11 | #'  combinations are evaluated.
12 | #' @param seed A single integer for sampling from the posterior.
13 | #' @return A data frame of the posterior distribution(s) of the difference(s).
14 | #'  The object has an extra class of `"posterior_diff"`.
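#'
#' As a minimal sketch of a typical call (the object and model names here are
#' hypothetical), assuming `mod` is a `perf_mod()` result whose resamples
#' contained models named "logistic" and "mars":
#'
#' ```r
#' diffs <- contrast_models(mod, list_1 = "logistic", list_2 = "mars")
#' summary(diffs, size = 0.02)
#' ```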
15 | #' @details If a transformation was used when `x` was created, the inverse is 16 | #' applied _before_ the difference is computed. 17 | #' @export 18 | contrast_models <- function( 19 | x, 20 | list_1 = NULL, 21 | list_2 = NULL, 22 | seed = sample.int(10000, 1) 23 | ) { 24 | if (is.null(list_1) & is.null(list_2)) { 25 | combos <- combn(x$names, 2) 26 | list_1 <- combos[1, ] 27 | list_2 <- combos[2, ] 28 | } else { 29 | if (length(list_1) != length(list_2)) { 30 | rlang::abort("`list_1` and `list_2` should be the same length.") 31 | } 32 | } 33 | 34 | models <- purrr::map2(list_1, list_2, make_df, id_vals = x$ids) 35 | diffs <- 36 | purrr::map_df( 37 | models, 38 | make_diffs, 39 | obj = x$stan, 40 | trans = x$transform, 41 | seed = seed 42 | ) |> 43 | dplyr::mutate(contrast = paste(model_1, model_2, sep = " vs. ")) 44 | diffs <- tibble::as_tibble(diffs) 45 | class(diffs) <- c("posterior_diff", class(diffs)) 46 | diffs 47 | } 48 | 49 | 50 | #' @export 51 | print.posterior_diff <- function(x, ...) { 52 | cat("# Posterior samples of performance differences\n") 53 | print(tibble::as_tibble(x), ...) 54 | } 55 | 56 | 57 | #' Summarize Posterior Distributions of Model Differences 58 | #' 59 | #' Credible intervals are created for the differences. Also, 60 | #' region of practical equivalence (ROPE) statistics are computed 61 | #' when the effective size of a difference is given. 62 | #' 63 | #' @param object An object produced by [contrast_models()]. 64 | #' @param prob A number p (0 < p < 1) indicating the desired 65 | #' probability mass to include in the intervals. 66 | #' @param size The size of an effective difference in the units of the chosen 67 | #' metric. For example, a 5 percent increase in accuracy (`size = 0.05`) 68 | #' between two models might be considered a "real" difference. 69 | #' @param ... Not currently used 70 | #' @return A data frame with interval and ROPE statistics for each 71 | #' comparison. 72 | #' @details The ROPE estimates included in the results are the 73 | #' columns `pract_neg`, `pract_equiv`, and `pract_pos`. `pract_neg` 74 | #' integrates the portion of the posterior below `-size` (and 75 | #' `pract_pos` is the upper integral starting at `size`). The 76 | #' interpretation depends on whether the metric being analyzed is 77 | #' better when larger or smaller. `pract_equiv` integrates between 78 | #' `[-size, size]`. If this is close to one, the two models are 79 | #' unlikely to be practically different relative to `size`. 80 | #' @export 81 | #' @examples 82 | #' data("ex_objects") 83 | #' 84 | #' summary(contrast_samples) 85 | #' summary(contrast_samples, size = 0.025) 86 | summary.posterior_diff <- function(object, prob = 0.90, size = 0, ...) 
{ 87 | object <- object |> 88 | dplyr::mutate(contrast = paste(model_1, model_2, sep = " vs ")) |> 89 | dplyr::rename(posterior = difference) 90 | post_int <- object |> 91 | dplyr::group_by(contrast) |> 92 | dplyr::do(postint.data.frame(., prob = prob)) 93 | post_stats <- object |> 94 | dplyr::group_by(contrast) |> 95 | dplyr::summarise( 96 | probability = mean(posterior > 0), 97 | mean = mean(posterior) 98 | ) |> 99 | dplyr::full_join(post_int, by = c("contrast")) 100 | if (size != 0) { 101 | rope_stats <- object |> 102 | dplyr::group_by(contrast) |> 103 | dplyr::summarise( 104 | size = size, 105 | pract_neg = mean(posterior < -size), 106 | pract_equiv = mean(posterior >= -size & posterior <= size), 107 | pract_pos = mean(posterior > size) 108 | ) 109 | } else { 110 | rope_stats <- object |> 111 | dplyr::group_by(contrast) |> 112 | dplyr::summarise( 113 | size = size, 114 | pract_neg = na_dbl, 115 | pract_equiv = na_dbl, 116 | pract_pos = na_dbl 117 | ) 118 | } 119 | dplyr::full_join(post_stats, rope_stats, by = c("contrast")) 120 | } 121 | 122 | 123 | #' Visualize the Posterior Distributions of Model Differences 124 | #' 125 | #' A density is created for each contrast in a faceted grid. 126 | #' 127 | #' @param object An object produced by [contrast_models()]. 128 | #' @param size The size of an effective difference. For example, a 129 | #' 5\% increase in accuracy between two models might be considered a 130 | #' "real" difference. 131 | #' @param ... Options passed to `geom_line(stat = "density", ...)`. 132 | #' @return A [ggplot2::ggplot()] object using `geom_density` 133 | #' faceted by the models being contrasted (when there are 2 or 134 | #' more contrasts). 135 | #' @examples 136 | #' data(ex_objects) 137 | #' library(ggplot2) 138 | #' autoplot(contrast_samples) 139 | #' @export 140 | autoplot.posterior_diff <- 141 | function(object, size = 0, ...) { 142 | object <- as.data.frame(object) 143 | out <- 144 | ggplot2::ggplot(object, ggplot2::aes(x = difference)) + 145 | ggplot2::geom_line(stat = "density", ...) + 146 | ggplot2::ylab("Posterior Probability") 147 | if (length(unique(paste0(object$model_1, object$model_2))) > 1) { 148 | out <- out + ggplot2::facet_grid(model_2 ~ model_1) 149 | } 150 | if (size != 0) { 151 | out <- out + 152 | ggplot2::geom_vline(xintercept = c(-size, size), lty = 2, alpha = .5) 153 | } 154 | out 155 | } 156 | 157 | 158 | make_df <- function(a, b, id_vals = NULL) { 159 | new_dat <- data.frame(model = c(a, b)) 160 | as.data.frame(lapply(id_vals, function(x) rep(x[1], nrow(new_dat)))) |> 161 | dplyr::bind_cols(new_dat) 162 | } 163 | 164 | make_diffs <- function(spec, obj, trans, seed) { 165 | res_1 <- posterior_epred( 166 | obj, 167 | newdata = spec[1, ], 168 | seed = seed, 169 | re.form = NA 170 | ) 171 | res_1 <- trans$inv(res_1[, 1]) 172 | res_2 <- posterior_epred( 173 | obj, 174 | newdata = spec[2, ], 175 | seed = seed, 176 | re.form = NA 177 | ) 178 | res_2 <- trans$inv(res_2[, 1]) 179 | res <- data.frame( 180 | difference = res_1 - res_2, 181 | model_1 = as.character(spec$model[1]), 182 | model_2 = as.character(spec$model[2]), 183 | stringsAsFactors = FALSE 184 | ) 185 | res 186 | } 187 | -------------------------------------------------------------------------------- /R/data.R: -------------------------------------------------------------------------------- 1 | #' Example Data Sets 2 | #' 3 | #' @details Several data sets are contained in the package 4 | #' as examples. 
Each _simulates_ an `rset` object, but the `splits`
5 | #'  columns are not included to save space.
6 | #'
7 | #' \itemize{
8 | #'   \item{`precise_example` contains the results of the classification
9 | #'   analysis of a real data set using 10-fold CV. The holdout data
10 | #'   sets contained thousands of examples and have precise
11 | #'   performance estimates. Three models were fit to the original
12 | #'   data and several performance metrics are included.}
13 | #'
14 | #'   \item{`noisy_example` was also generated from a regression data
15 | #'   simulation. The original data set was small (50 samples) and
16 | #'   10 repeats of 10-fold CV were used with four models. There is
17 | #'   an excessive amount of variability in the results (probably more than
18 | #'   the resample-to-resample variability). The RMSE distributions
19 | #'   show fairly right-skewed distributions.}
20 | #'
21 | #'   \item{`concrete_example` contains the results of the regression case
22 | #'   study from the book _Applied Predictive Modeling_. The original
23 | #'   data set contained 745 samples in the training set. 10 repeats
24 | #'   of 10-fold CV were also used and 13 models were fit to the data.}
25 | #'
26 | #'   \item{`ts_example` is from a data set where rolling-origin forecast
27 | #'   resampling was used. Each assessment set is the summary of 14
28 | #'   observations (i.e. 2 weeks). The analysis set consisted of a
29 | #'   base of about 5,500 samples plus the previous assessment sets.
30 | #'   Four regression models were applied to these data.}
31 | #'
32 | #'   \item{`ex_object` objects were generated from the `two_class_dat` data in
33 | #'   the `modeldata` package. Basic 10-fold cross validation was used to evaluate
34 | #'   the models. The `posterior_samples` object contains samples of the posterior
35 | #'   distribution of the model ROC values while `contrast_samples` contains posterior
36 | #'   probabilities from the differences in ROC values.}
37 | #' }
38 | #'
39 | #' @name precise_example
40 | #' @aliases precise_example
41 | #' @docType data
42 | #' @return Tibbles with the additional class `rset`
43 | #' @keywords datasets
44 | #' @examples
45 | #' data(precise_example)
46 | #' precise_example
47 | NULL
48 | 
49 | #' @name noisy_example
50 | #' @rdname precise_example
51 | #' @aliases noisy_example
52 | #' @docType data
53 | #' @keywords datasets
54 | NULL
55 | 
56 | #' @name concrete_example
57 | #' @rdname precise_example
58 | #' @aliases concrete_example
59 | #' @docType data
60 | #' @keywords datasets
61 | NULL
62 | 
63 | #' @name ts_example
64 | #' @rdname precise_example
65 | #' @aliases ts_example
66 | #' @docType data
67 | #' @keywords datasets
68 | NULL
69 | 
70 | #' @name ex_object
71 | #' @rdname precise_example
72 | #' @aliases posterior_samples
73 | #' @aliases contrast_samples
74 | #' @docType data
75 | #' @keywords datasets
76 | NULL
77 | 
--------------------------------------------------------------------------------
/R/dplyr-compat.R:
--------------------------------------------------------------------------------
1 | # nocov
2 | ## adapted from
3 | ## https://github.com/hadley/dtplyr/blob/2308ff25e88bb81fe84f9051e37ddd9d572189ee/R/compat-dplyr-0.6.0.R
4 | ## and based on
5 | ## https://github.com/tidyverse/googledrive/commit/95455812d2e0d6bdf92b5f6728e3265bf65d8467#diff-ba61d4f2ccd992868e27305a9ab68a3c
6 | 
7 | ## function is called in .onLoad()
8 | register_s3_method <- function(pkg, generic, class, fun = NULL) {
9 |   # nocov start
10 |   stopifnot(is_string(pkg))
11 |   envir <- asNamespace(pkg)
12 | 
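  # Below: validate the generic/class names and, if no method function was
  # passed, look it up by its conventional `generic.class` name in the
  # caller's frame; registration is also re-run whenever `pkg` is re-loaded.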
13 |   stopifnot(is_string(generic))
14 |   stopifnot(is_string(class))
15 |   if (is.null(fun)) {
16 |     fun <- get(paste0(generic, ".", class), envir = parent.frame())
17 |   }
18 |   stopifnot(is.function(fun))
19 | 
20 |   if (pkg %in% loadedNamespaces()) {
21 |     registerS3method(generic, class, fun, envir = envir)
22 |   }
23 | 
24 |   # Always register hook in case package is later unloaded & reloaded
25 |   setHook(
26 |     packageEvent(pkg, "onLoad"),
27 |     function(...) {
28 |       registerS3method(generic, class, fun, envir = envir)
29 |     }
30 |   )
31 | } # nocov end
32 | 
--------------------------------------------------------------------------------
/R/misc.R:
--------------------------------------------------------------------------------
1 | check_trans <- function(x) {
2 |   trans_msg <- "`transform` should have two functions: 'func' and 'inv'"
3 |   if (length(x) != 2) {
4 |     rlang::abort(trans_msg)
5 |   } else {
6 |     if (!all(sort(names(x)) == c("func", "inv"))) {
7 |       rlang::abort(trans_msg)
8 |     }
9 |     if (!all(vapply(x, is.function, logical(1)))) {
10 |       rlang::abort(trans_msg)
11 |     }
12 |   }
13 |   invisible(x)
14 | }
15 | 
16 | 
17 | is_repeated_cv <- function(x) {
18 |   all(grepl("^Fold", x$values$Resample) & grepl("\\.Rep", x$values$Resample))
19 | }
20 | 
21 | 
22 | get_id_vals <- function(x) {
23 |   id_vars <- grep("(^id$)|(^id[1-9]$)", names(x), value = TRUE)
24 |   map(x[, id_vars, drop = FALSE], function(x) unique(as.character(x)))
25 | }
26 | 
--------------------------------------------------------------------------------
/R/perf_mod.R:
--------------------------------------------------------------------------------
1 | #' Bayesian Analysis of Resampling Statistics
2 | #'
3 | #' Bayesian analysis used here to answer the question: "when looking at
4 | #' resampling results, are the differences between models 'real'?" To answer
5 | #' this, a model can be created where the _outcome_ is the resampling statistics
6 | #' (e.g. accuracy or RMSE). These values are explained by the model types. In
7 | #' doing this, we can get parameter estimates for each model's effect on
8 | #' performance and make statistical (and practical) comparisons between models.
9 | #'
10 | #' @param object Depending on the context (see Details below):
11 | #'
12 | #'  * A data frame with `id` columns for the resampling groups and metric
13 | #'    results in all of the other columns.
14 | #'  * An `rset` object (such as [rsample::vfold_cv()]) containing the `id`
15 | #'    column(s) and at least two numeric columns of model performance
16 | #'    statistics (e.g. accuracy).
17 | #'  * An object from `caret::resamples`.
18 | #'  * An object with class `tune_results`, which could be produced by
19 | #'    `tune::tune_grid()`, `tune::tune_bayes()` or similar.
20 | #'  * A workflow set where all results contain the metric value given in the
21 | #'    `metric` argument value.
22 | #'
23 | #' @param formula An optional model formula to use for the Bayesian hierarchical model
24 | #'  (see Details below).
25 | #' @param ... Additional arguments to pass to [rstanarm::stan_glmer()] such as
26 | #'  `verbose`, `prior`, `seed`, `refresh`, `family`, etc.
27 | #' @param metric A single character string for the metric used in the
28 | #'  `tune_results` that should be used in the Bayesian analysis. If none is given,
29 | #'  the first metric value is used.
30 | #' @param filter A conditional logic statement that can be used to filter the
31 | #'  statistics generated by `tune_results` using the tuning parameter values or
32 | #'  the `.config` column.
33 | #' @return An object of class `perf_mod`. If a workflow set is given in
34 | #'  `object`, there is an extra class of `"perf_mod_workflow_set"`.
35 | #' @details These functions can be used to process and analyze matched
36 | #'  resampling statistics from different models using a Bayesian generalized
37 | #'  linear model with effects for the model and the resamples.
38 | #'
39 | #' ## Bayesian Model formula
40 | #'
41 | #' By default, a generalized linear model with Gaussian error and an identity
42 | #' link is fit to the data and has terms for the predictive model grouping
43 | #' variable. In this way, the performance metrics can be compared between
44 | #' models.
45 | #'
46 | #' Additionally, random effect terms are also used. For most resampling
47 | #' methods (except repeated _V_-fold cross-validation), a simple random
48 | #' intercept model is used with an exchangeable (i.e. compound-symmetric)
49 | #' variance structure. In the case of repeated cross-validation, two random
50 | #' intercept terms are used; one for the repeat and another for the fold within
51 | #' repeat. These also have exchangeable correlation structures.
52 | #'
53 | #' The above model specification assumes that the variance in the performance
54 | #' metrics is the same across models. However, this is unlikely to be true in
55 | #' some cases. For example, for simple binomial accuracy, it is well known that
56 | #' the variance is highest when the accuracy is near 50 percent. When the argument
57 | #' `hetero_var = TRUE`, the variance structure uses random intercepts for each
58 | #' model term. This may produce more realistic posterior distributions but may
59 | #' take more time to converge.
60 | #'
61 | #' Examples of the default formulas are:
62 | #'
63 | #' \preformatted{
64 | #'   # One ID field and common variance:
65 | #'     statistic ~ model + (model | id)
66 | #'
67 | #'   # One ID field and heterogeneous variance:
68 | #'     statistic ~ model + (model + 0 | id)
69 | #'
70 | #'   # Repeated CV (id = repeat, id2 = fold within repeat)
71 | #'   # with a common variance:
72 | #'     statistic ~ model + (model | id2/id)
73 | #'
74 | #'   # Repeated CV (id = repeat, id2 = fold within repeat)
75 | #'   # with a heterogeneous variance:
76 | #'     statistic ~ model + (model + 0 | id2/id)
77 | #'
78 | #'   # Default for unknown resampling method and
79 | #'   # multiple ID fields:
80 | #'     statistic ~ model + (model | idN/../id)
81 | #' }
82 | #'
83 | #' Custom formulas should use `statistic` as the outcome variable and `model`
84 | #' as the factor variable with the model names.
85 | #'
86 | #' Also, as shown in the package vignettes, the Gaussian assumption may be
87 | #' unrealistic. In this case, there are at least two approaches that can be
88 | #' used. First, the outcome statistics can be transformed prior to fitting the
89 | #' model. For example, for accuracy, the logit transformation can be used to
90 | #' convert the outcome values to be on the real line and a model is fit to
91 | #' these data. Once the posterior distributions are computed, the inverse
92 | #' transformation can be used to put them back into the original units. The
93 | #' `transform` argument can be used to do this.
94 | #'
95 | #' The second approach would be to use a different error distribution from the
96 | #' exponential family. For RMSE values, the Gamma distribution may produce
97 | #' better results at the expense of model computational complexity. This can be
98 | #' achieved by passing the `family` argument to `perf_mod` as one might with
99 | #' the `glm` function.
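#'
#' As a minimal sketch of that second approach (`rmse_resamples` is a
#' hypothetical data frame of resampled RMSE values, not an object from this
#' package):
#'
#' ```r
#' # RMSE is strictly positive, so a Gamma error distribution can be a more
#' # faithful choice; `family` is passed through to rstanarm::stan_glmer()
#' perf_mod(rmse_resamples, family = Gamma(link = "log"), refresh = 0)
#' ```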
100 | #'
101 | #' ## Input formats
102 | #'
103 | #' There are several ways to give resampling results to the `perf_mod()` function. To
104 | #' illustrate, here are some example objects using 10-fold cross-validation for a
105 | #' simple two-class problem:
106 | #'
107 | #'
108 | #' ```r
109 | #' library(tidymodels)
110 | #' library(tidyposterior)
111 | #' library(workflowsets)
112 | #'
113 | #' data(two_class_dat, package = "modeldata")
114 | #'
115 | #' set.seed(100)
116 | #' folds <- vfold_cv(two_class_dat)
117 | #' ```
118 | #'
119 | #' We can define two different models (for simplicity, with no tuning parameters).
120 | #'
121 | #'
122 | #' ```r
123 | #' logistic_reg_glm_spec <-
124 | #'   logistic_reg() |>
125 | #'   set_engine('glm')
126 | #'
127 | #' mars_earth_spec <-
128 | #'   mars(prod_degree = 1) |>
129 | #'   set_engine('earth') |>
130 | #'   set_mode('classification')
131 | #' ```
132 | #'
133 | #' For tidymodels, the [tune::fit_resamples()] function can be used to estimate
134 | #' performance for each model/resample:
135 | #'
136 | #'
137 | #' ```r
138 | #' rs_ctrl <- control_resamples(save_workflow = TRUE)
139 | #'
140 | #' logistic_reg_glm_res <-
141 | #'   logistic_reg_glm_spec |>
142 | #'   fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl)
143 | #'
144 | #' mars_earth_res <-
145 | #'   mars_earth_spec |>
146 | #'   fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl)
147 | #' ```
148 | #'
149 | #' From these, there are several ways to pass the results to `perf_mod()`.
150 | #'
151 | #' ### Data Frame as Input
152 | #'
153 | #' The most general approach is to have a data frame with the resampling labels (i.e.,
154 | #' one or more id columns) as well as columns for each model that you would like to
155 | #' compare.
156 | #'
157 | #' For the model results above, [tune::collect_metrics()] can be used along with some
158 | #' basic data manipulation steps:
159 | #'
160 | #'
161 | #' ```r
162 | #' logistic_roc <-
163 | #'   collect_metrics(logistic_reg_glm_res, summarize = FALSE) |>
164 | #'   dplyr::filter(.metric == "roc_auc") |>
165 | #'   dplyr::select(id, logistic = .estimate)
166 | #'
167 | #' mars_roc <-
168 | #'   collect_metrics(mars_earth_res, summarize = FALSE) |>
169 | #'   dplyr::filter(.metric == "roc_auc") |>
170 | #'   dplyr::select(id, mars = .estimate)
171 | #'
172 | #' resamples_df <- full_join(logistic_roc, mars_roc, by = "id")
173 | #' resamples_df
174 | #' ```
175 | #'
176 | #' ```
177 | #' ## # A tibble: 10 x 3
178 | #' ##    id     logistic  mars
179 | #' ##    <chr>     <dbl> <dbl>
180 | #' ##  1 Fold01    0.908 0.875
181 | #' ##  2 Fold02    0.904 0.917
182 | #' ##  3 Fold03    0.924 0.938
183 | #' ##  4 Fold04    0.881 0.881
184 | #' ##  5 Fold05    0.863 0.864
185 | #' ##  6 Fold06    0.893 0.889
186 | #' ## # … with 4 more rows
187 | #' ```
188 | #'
189 | #' We can then give this directly to `perf_mod()`:
190 | #'
191 | #'
192 | #' ```r
193 | #' set.seed(101)
194 | #' roc_model_via_df <- perf_mod(resamples_df, refresh = 0)
195 | #' tidy(roc_model_via_df) |> summary()
196 | #' ```
197 | #'
198 | #' ```
199 | #' ## # A tibble: 2 x 4
200 | #' ##   model     mean lower upper
201 | #' ##   <chr>    <dbl> <dbl> <dbl>
202 | #' ## 1 logistic 0.892 0.879 0.906
203 | #' ## 2 mars     0.888 0.875 0.902
204 | #' ```
205 | #'
206 | #' ### rsample Object as Input
207 | #'
208 | #' Alternatively, the result columns can be merged back into the original `rsample`
209 | #' object. The upside to using this method is that `perf_mod()` will know exactly
210 | #' which model formula to use for the Bayesian model:
211 | #'
212 | #'
213 | #' ```r
214 | #' resamples_rset <-
215 | #'   full_join(folds, logistic_roc, by = "id") |>
216 | #'   full_join(mars_roc, by = "id")
217 | #'
218 | #' set.seed(101)
219 | #' roc_model_via_rset <- perf_mod(resamples_rset, refresh = 0)
220 | #' tidy(roc_model_via_rset) |> summary()
221 | #' ```
222 | #'
223 | #' ```
224 | #' ## # A tibble: 2 x 4
225 | #' ##   model     mean lower upper
226 | #' ##   <chr>    <dbl> <dbl> <dbl>
227 | #' ## 1 logistic 0.892 0.879 0.906
228 | #' ## 2 mars     0.888 0.875 0.902
229 | #' ```
230 | #'
231 | #' ### Workflow Set Object as Input
232 | #'
233 | #' Finally, for tidymodels, a workflow set object can be used. This is a collection of
234 | #' models/preprocessing combinations in one object. We can emulate a workflow set using
235 | #' the existing example results then pass that to `perf_mod()`:
236 | #'
237 | #'
238 | #' ```r
239 | #' example_wset <-
240 | #'   as_workflow_set(logistic = logistic_reg_glm_res, mars = mars_earth_res)
241 | #'
242 | #' set.seed(101)
243 | #' roc_model_via_wflowset <- perf_mod(example_wset, refresh = 0)
244 | #' tidy(roc_model_via_wflowset) |> summary()
245 | #' ```
246 | #'
247 | #' ```
248 | #' ## # A tibble: 2 x 4
249 | #' ##   model     mean lower upper
250 | #' ##   <chr>    <dbl> <dbl> <dbl>
251 | #' ## 1 logistic 0.892 0.879 0.906
252 | #' ## 2 mars     0.888 0.875 0.902
253 | #' ```
254 | #'
255 | #' ### caret resamples object
256 | #'
257 | #' The `caret` package can also be used. An equivalent set of models is created:
258 | #'
259 | #'
260 | #'
261 | #' ```r
262 | #' library(caret)
263 | #'
264 | #' set.seed(102)
265 | #' logistic_caret <- train(Class ~ ., data = two_class_dat, method = "glm",
266 | #'                         trControl = trainControl(method = "cv"))
267 | #'
268 | #' set.seed(102)
269 | #' mars_caret <- train(Class ~ ., data = two_class_dat, method = "gcvEarth",
270 | #'                     tuneGrid = data.frame(degree = 1),
271 | #'                     trControl = trainControl(method = "cv"))
272 | #' ```
273 | #'
274 | #' Note that these two models use the same resamples as one another due to setting the
275 | #' seed prior to calling `train()`. However, these are different from the tidymodels
276 | #' results used above (so the final results will be different).
277 | #'
278 | #' `caret` has a `resamples()` function that can collect and collate the resamples.
279 | #' This can also be given to `perf_mod()`:
280 | #'
281 | #'
282 | #' ```r
283 | #' caret_resamples <- resamples(list(logistic = logistic_caret, mars = mars_caret))
284 | #'
285 | #' set.seed(101)
286 | #' roc_model_via_caret <- perf_mod(caret_resamples, refresh = 0)
287 | #' tidy(roc_model_via_caret) |> summary()
288 | #' ```
289 | #'
290 | #' ```
291 | #' ## # A tibble: 2 x 4
292 | #' ##   model     mean lower upper
293 | #' ##   <chr>    <dbl> <dbl> <dbl>
294 | #' ## 1 logistic 0.821 0.801 0.842
295 | #' ## 2 mars     0.822 0.802 0.842
296 | #' ```
297 | #' @references
298 | #' Kuhn and Silge (2021) _Tidy Models with R_, Chapter 11,
299 | #' \url{https://www.tmwr.org/compare.html}
300 | #' @seealso [tidy.perf_mod()], [tidyposterior::contrast_models()]
301 | #' @export
302 | perf_mod <- function(object, ...) {
303 |   UseMethod("perf_mod")
304 | }
305 | 
306 | 
307 | #' @export
308 | perf_mod.default <- function(object, ...) {
309 |   rlang::abort(paste0(
310 |     "`object` should have at least one of these classes: ",
311 |     "'rset', 'workflow_set', 'data.frame', 'resamples', or 'vfold_cv'. ",
312 |     "See ?perf_mod"
313 |   ))
314 | }
315 | 
316 | #' @rdname perf_mod
317 | #' @param transform A named list of transformation and inverse
318 | #'  transformation functions. See [logit_trans()] as an example.
319 | #' @param hetero_var A logical; if `TRUE`, then different
320 | #'  variances are estimated for each model group. Otherwise, the
321 | #'  same variance is used for each group. Estimating heterogeneous
322 | #'  variances may slow or prevent convergence.
323 | #' @export
324 | 
325 | perf_mod.rset <-
326 |   function(
327 |     object,
328 |     transform = no_trans,
329 |     hetero_var = FALSE,
330 |     formula = NULL,
331 |     ...
332 |   ) {
333 |     check_trans(transform)
334 |     rset_type <- try(pretty(object), silent = TRUE)
335 |     if (inherits(rset_type, "try-error")) {
336 |       rset_type <- NA
337 |     }
338 | 
339 |     ## dplyr::filter (and `[` !) drops the other classes =[
340 |     if (inherits(object, "bootstraps")) {
341 |       oc <- class(object)
342 |       object <- object |> dplyr::filter(id != "Apparent")
343 |       class(object) <- oc
344 |     }
345 | 
346 |     if (any(names(object) == "splits")) {
347 |       object$splits <- NULL
348 |     }
349 |     resamples <-
350 |       tidyr::pivot_longer(
351 |         object,
352 |         cols = c(-dplyr::matches("(^id$)|(^id[0-9])")),
353 |         names_to = "model",
354 |         values_to = "statistic"
355 |       ) |>
356 |       dplyr::mutate(statistic = transform$func(statistic))
357 | 
358 |     ## Make a formula based on the resampling type (e.g. repeated CV, rolling
359 |     ## origin); this could be done with more specific classes.
360 |     id_cols <- grep("(^id$)|(^id[1-9]$)", names(object), value = TRUE)
361 |     formula <- make_formula(id_cols, hetero_var, formula)
362 | 
363 |     model_names <- unique(as.character(resamples$model))
364 | 
365 |     mod <- stan_glmer(formula, data = resamples, ...)
366 | 
367 |     res <- list(
368 |       stan = mod,
369 |       hetero_var = hetero_var,
370 |       names = model_names,
371 |       rset_type = rset_type,
372 |       ids = get_id_vals(resamples),
373 |       transform = transform,
374 |       metric = list(name = NA_character_, direction = NA_character_)
375 |     )
376 |     class(res) <- "perf_mod"
377 |     res
378 |   }
379 | 
380 | make_formula <- function(ids, hetero_var, formula) {
381 |   if (is.null(formula)) {
382 |     ids <- sort(ids)
383 |     p <- length(ids)
384 |     if (p > 1) {
385 |       msg <-
386 |         paste0(
387 |           "There were multiple resample ID columns in the data. It is ",
388 |           "unclear what the model formula should be for the hierarchical ",
389 |           "model. This analysis used the formula: "
390 |         )
391 |       nested <- paste0(rev(ids), collapse = "/")
392 |       if (hetero_var) {
393 |         f_chr <- paste0("statistic ~ model + (model + 0 |", nested, ")")
394 |         f <- as.formula(f_chr)
395 |       } else {
396 |         f_chr <- paste0("statistic ~ model + (1 |", nested, ")")
397 |         f <- as.formula(f_chr)
398 |       }
399 |       msg <- paste0(
400 |         msg,
401 |         rlang::expr_label(f),
402 |         " The `formula` arg can be used to change this value."
403 |       )
404 |       rlang::warn(msg)
405 |     } else {
406 |       if (hetero_var) {
407 |         f <- statistic ~ model + (model + 0 | id)
408 |       } else {
409 |         f <- statistic ~ model + (1 | id)
410 |       }
411 |     }
412 |   } else {
413 |     f <- formula
414 |   }
415 |   attr(f, ".Environment") <- rlang::base_env()
416 |   f
417 | }
418 | 
419 | #' @export
420 | print.perf_mod <- function(x, ...) {
421 |   cat("Bayesian Analysis of Resampling Results\n")
422 |   if (!is.na(x$rset_type)) {
423 |     cat("Original data: ")
424 |     cat(x$rset_type, sep = "\n")
425 |   }
426 |   cat("\n")
427 |   invisible(x)
428 | }
429 | 
430 | #' @export
431 | summary.perf_mod <- function(object, ...)
{ 432 | summary(object$stan) 433 | } 434 | 435 | 436 | #' @export 437 | #' @rdname perf_mod 438 | #' @param metric A single character value for the statistic from 439 | #' the `resamples` object that should be analyzed. 440 | perf_mod.resamples <- 441 | function( 442 | object, 443 | transform = no_trans, 444 | hetero_var = FALSE, 445 | metric = object$metrics[1], 446 | ... 447 | ) { 448 | suffix <- paste0("~", metric, "$") 449 | metric_cols <- grep(suffix, names(object$values), value = TRUE) 450 | object$values <- object$values |> 451 | dplyr::select(Resample, !!metric_cols) 452 | 453 | object$values <- 454 | setNames(object$values, gsub(suffix, "", names(object$values))) 455 | 456 | if (is_repeated_cv(object)) { 457 | split_up <- strsplit(as.character(object$values$Resample), "\\.") 458 | object$values <- object$values |> 459 | dplyr::mutate( 460 | id = map_chr(split_up, function(x) x[2]), 461 | id2 = map_chr(split_up, function(x) x[1]) 462 | ) |> 463 | dplyr::select(-Resample) 464 | class(object$values) <- c("vfold_cv", "rset", class(object$values)) 465 | cv_att <- list( 466 | v = length(unique(object$values$id2)), 467 | repeats = length(unique(object$values$id)), 468 | strata = FALSE 469 | ) 470 | for (i in names(cv_att)) attr(object$values, i) <- cv_att[[i]] 471 | } else { 472 | object$values <- object$values |> 473 | dplyr::rename(id = Resample) 474 | class(object$values) <- c("rset", class(object$values)) 475 | } 476 | 477 | res <- perf_mod( 478 | object$values, 479 | transform = transform, 480 | hetero_var = hetero_var, 481 | ... 482 | ) 483 | res$metric <- list(name = metric_cols[1], direction = NA_character_) 484 | res 485 | } 486 | 487 | #' @export 488 | #' @rdname perf_mod 489 | perf_mod.data.frame <- 490 | function( 491 | object, 492 | transform = no_trans, 493 | hetero_var = FALSE, 494 | formula = NULL, 495 | ... 496 | ) { 497 | id_cols <- grep("(^id)|(^id[1-9]$)", names(object), value = TRUE) 498 | if (length(id_cols) == 0) { 499 | rlang::abort("One or more `id` columns are required.") 500 | } 501 | 502 | class(object) <- c("rset", class(object)) 503 | 504 | res <- perf_mod( 505 | object, 506 | transform = transform, 507 | hetero_var = hetero_var, 508 | formula = formula, 509 | ... 510 | ) 511 | res$metric <- list(name = NA_character_, direction = NA_character_) 512 | res 513 | } 514 | 515 | 516 | #' @export 517 | #' @rdname perf_mod 518 | perf_mod.tune_results <- 519 | function( 520 | object, 521 | metric = NULL, 522 | transform = no_trans, 523 | hetero_var = FALSE, 524 | formula = NULL, 525 | filter = NULL, 526 | ... 
527 | ) {
528 | metric_info <- tune::.get_tune_metrics(object)
529 | metric_info <- tune::metrics_info(metric_info)
530 | if (!is.null(metric)) {
531 | if (all(metric != metric_info$.metric)) {
532 | rlang::abort(
533 | paste0(
534 | "`metric` should be one of: ",
535 | paste0("'", metric_info$.metric, "'", collapse = ", ")
536 | )
537 | )
538 | }
539 | metric <- metric[1]
540 | } else {
541 | metric <- metric_info$.metric[1]
542 | }
543 | metric_dir <- metric_info$direction[metric_info$.metric == metric]
544 |
545 | dat <- tune::collect_metrics(object, summarize = FALSE)
546 | dat <- dplyr::filter(dat, .metric == metric)
547 |
548 | filters <- rlang::enexpr(filter)
549 | if (!is.null(filters)) {
550 | dat <- dplyr::filter(dat, !!filters)
551 | }
552 |
553 | id_vars <- grep("(^id$)|(^id[0-9])", names(dat), value = TRUE)
554 | keep_vars <- c(id_vars, ".estimate", ".config")
555 | if (any(names(dat) == ".iter")) {
556 | keep_vars <- c(keep_vars, ".iter")
557 | }
558 | dat <- dplyr::select(dat, dplyr::all_of(keep_vars))
559 |
560 | dat <- tidyr::pivot_wider(
561 | dat,
562 | id_cols = dplyr::all_of(id_vars),
563 | names_from = ".config",
564 | values_from = ".estimate"
565 | )
566 |
567 | rset_info <- attributes(object)$rset_info$att
568 | rset_info$class <- c(rset_info$class, class(dplyr::tibble()))
569 | dat <- rlang::exec("structure", .Data = dat, !!!rset_info)
570 | res <- perf_mod(
571 | dat,
572 | transform = transform,
573 | hetero_var = hetero_var,
574 | formula = formula,
575 | ...
576 | )
577 | res$metric <- list(name = metric, direction = metric_dir)
578 | res
579 | }
580 |
581 | #' @export
582 | #' @rdname perf_mod
583 | perf_mod.workflow_set <-
584 | function(
585 | object,
586 | metric = NULL,
587 | transform = no_trans,
588 | hetero_var = FALSE,
589 | formula = NULL,
590 | ...
591 | ) {
592 | check_trans(transform)
593 | metric_info <- tune::.get_tune_metrics(object$result[[1]])
594 | metric_info <- tune::metrics_info(metric_info)
595 | if (!is.null(metric)) {
596 | if (all(metric != metric_info$.metric)) {
597 | rlang::abort(
598 | paste0(
599 | "`metric` should be one of: ",
600 | paste0("'", metric_info$.metric, "'", collapse = ", ")
601 | )
602 | )
603 | }
604 | metric <- metric[1]
605 | } else {
606 | metric <- metric_info$.metric[1]
607 | }
608 | metric_dir <- metric_info$direction[metric_info$.metric == metric]
609 |
610 | resamples <-
611 | tune::collect_metrics(object, summarize = FALSE) |>
612 | dplyr::filter(.metric == metric & id != "Apparent")
613 |
614 | ranked <-
615 | workflowsets::rank_results(
616 | object,
617 | rank_metric = metric,
618 | select_best = TRUE
619 | ) |>
620 | dplyr::select(wflow_id, .config)
621 | resamples <- dplyr::inner_join(
622 | resamples,
623 | ranked,
624 | by = c("wflow_id", ".config")
625 | )
626 |
627 | if (any(names(resamples) == ".iter")) {
628 | resamples$sub_model <- paste(
629 | resamples$.config,
630 | resamples$.iter,
631 | sep = "_"
632 | )
633 | } else {
634 | resamples$sub_model <- resamples$.config
635 | }
636 |
637 | resamples <-
638 | resamples |>
639 | dplyr::select(
640 | model = wflow_id,
641 | sub_model,
642 | dplyr::starts_with("id"),
643 | statistic = .estimate
644 | )
645 |
646 | ## Make a formula based on resampling type (repeatedcv, rof),
647 | ## This could be done with more specific classes
648 | id_cols <- grep("(^id)|(^id[1-9]$)", names(resamples), value = TRUE)
649 | formula <- make_formula(id_cols, hetero_var, formula)
650 |
651 | model_names <- unique(as.character(resamples$model))
652 |
653 | mod <- rstanarm::stan_glmer(formula, data = resamples, ...)
654 |
655 | res <- list(
656 | stan = mod,
657 | hetero_var = hetero_var,
658 | names = model_names,
659 | rset_type = attributes(object$result[[1]])$rset_info$label,
660 | metric = list(name = metric, direction = metric_dir),
661 | ids = get_id_vals(resamples),
662 | transform = transform
663 | )
664 | structure(res, class = c("perf_mod_workflow_set", "perf_mod"))
665 | }
666 | -------------------------------------------------------------------------------- /R/posterior_diff-compat.R: --------------------------------------------------------------------------------
1 | # ------------------------------------------------------------------------------
2 | # posterior_diff
3 |
4 | posterior_diff_reconstruct <- function(x, to) {
5 | if (posterior_diff_reconstructable(x, to)) {
6 | df_reconstruct(x, to)
7 | } else {
8 | tib_upcast(x)
9 | }
10 | }
11 |
12 | posterior_diff_reconstructable <- function(x, to) {
13 | x_names <- names(x)
14 | to_names <- names(to)
15 | req_names <- c("difference", "model_1", "model_2", "contrast")
16 | if (!all(req_names %in% x_names)) {
17 | return(FALSE)
18 | } else {
19 | if (!is.numeric(x[["difference"]])) {
20 | return(FALSE)
21 | }
22 | if (!is.character(x[["model_1"]]) & !is.factor(x[["model_1"]])) {
23 | return(FALSE)
24 | }
25 | if (!is.character(x[["model_2"]]) & !is.factor(x[["model_2"]])) {
26 | return(FALSE)
27 | }
28 | if (!is.character(x[["contrast"]]) & !is.factor(x[["contrast"]])) {
29 | return(FALSE)
30 | }
31 | }
32 | TRUE
33 | }
34 |
35 | # ------------------------------------------------------------------------------
36 |
37 | #' @export
38 | `[.posterior_diff` <- function(x, i, j, ...)
{ 39 | out <- NextMethod() 40 | posterior_diff_reconstruct(out, x) 41 | } 42 | 43 | #' @export 44 | `names<-.posterior_diff` <- function(x, value) { 45 | out <- NextMethod() 46 | posterior_diff_reconstruct(out, x) 47 | } 48 | 49 | # ------------------------------------------------------------------------------ 50 | # posterior_diff vctrs functions 51 | 52 | #' Extra methods for the `posterior_diff` class to work with dplyr verbs 53 | #' 54 | #' Objects with class `posterior_diff` are defined to be tibbles with required 55 | #' columns `difference` (numeric) and character columns `model_1`, `model_2`, 56 | #' and `contrast`. If operations on these objects break those rules, they are 57 | #' down-cast to basic tibbles. 58 | #' @export 59 | #' @rdname vctrs_methods_posterior_diff 60 | #' @keywords internal 61 | vec_restore.posterior_diff <- function(x, to, ...) { 62 | posterior_diff_reconstruct(x, to) 63 | } 64 | 65 | 66 | #' @export 67 | #' @rdname vctrs_methods_posterior_diff 68 | #' @keywords internal 69 | vec_proxy.posterior_diff <- function(x, ...) { 70 | vctrs::new_data_frame(x) 71 | } 72 | 73 | 74 | #' @export 75 | #' @rdname vctrs_methods_posterior_diff 76 | #' @keywords internal 77 | vec_ptype2.posterior_diff.posterior_diff <- function( 78 | x, 79 | y, 80 | ..., 81 | x_arg = "", 82 | y_arg = "" 83 | ) { 84 | tibble::tibble( 85 | difference = numeric(0), 86 | model_1 = character(0), 87 | model_2 = character(0), 88 | contrast = character(0) 89 | ) 90 | } 91 | #' @export 92 | #' @rdname vctrs_methods_posterior_diff 93 | #' @keywords internal 94 | vec_ptype2.posterior_diff.tbl_df <- function( 95 | x, 96 | y, 97 | ..., 98 | x_arg = "", 99 | y_arg = "" 100 | ) { 101 | vctrs::tib_ptype2(x, y, ..., x_arg = x_arg, y_arg = y_arg) 102 | } 103 | #' @export 104 | #' @rdname vctrs_methods_posterior_diff 105 | #' @keywords internal 106 | vec_ptype2.tbl_df.posterior_diff <- function( 107 | x, 108 | y, 109 | ..., 110 | x_arg = "", 111 | y_arg = "" 112 | ) { 113 | vctrs::tib_ptype2(x, y, ..., x_arg = x_arg, y_arg = y_arg) 114 | } 115 | #' @export 116 | #' @rdname vctrs_methods_posterior_diff 117 | #' @keywords internal 118 | vec_ptype2.posterior_diff.data.frame <- function( 119 | x, 120 | y, 121 | ..., 122 | x_arg = "", 123 | y_arg = "" 124 | ) { 125 | vctrs::tib_ptype2(x, y, ..., x_arg = x_arg, y_arg = y_arg) 126 | } 127 | #' @export 128 | #' @rdname vctrs_methods_posterior_diff 129 | #' @keywords internal 130 | vec_ptype2.data.frame.posterior_diff <- function( 131 | x, 132 | y, 133 | ..., 134 | x_arg = "", 135 | y_arg = "" 136 | ) { 137 | vctrs::tib_ptype2(x, y, ..., x_arg = x_arg, y_arg = y_arg) 138 | } 139 | 140 | #' @export 141 | #' @rdname vctrs_methods_posterior_diff 142 | #' @keywords internal 143 | vec_cast.posterior_diff.posterior_diff <- function( 144 | x, 145 | to, 146 | ..., 147 | x_arg = "", 148 | to_arg = "" 149 | ) { 150 | x 151 | } 152 | #' @export 153 | #' @rdname vctrs_methods_posterior_diff 154 | #' @keywords internal 155 | vec_cast.posterior_diff.tbl_df <- function( 156 | x, 157 | to, 158 | ..., 159 | x_arg = "", 160 | to_arg = "" 161 | ) { 162 | stop_incompatible_cast_posterior_diff( 163 | x, 164 | to, 165 | ..., 166 | x_arg = x_arg, 167 | to_arg = to_arg 168 | ) 169 | } 170 | #' @export 171 | #' @rdname vctrs_methods_posterior_diff 172 | #' @keywords internal 173 | vec_cast.tbl_df.posterior_diff <- function( 174 | x, 175 | to, 176 | ..., 177 | x_arg = "", 178 | to_arg = "" 179 | ) { 180 | vctrs::tib_cast(x, to, ..., x_arg = x_arg, to_arg = to_arg) 181 | } 182 | #' @export 183 | 
#' @rdname vctrs_methods_posterior_diff
184 | #' @keywords internal
185 | vec_cast.posterior_diff.data.frame <- function(
186 | x,
187 | to,
188 | ...,
189 | x_arg = "",
190 | to_arg = ""
191 | ) {
192 | stop_incompatible_cast_posterior_diff(
193 | x,
194 | to,
195 | ...,
196 | x_arg = x_arg,
197 | to_arg = to_arg
198 | )
199 | }
200 | #' @export
201 | #' @rdname vctrs_methods_posterior_diff
202 | #' @keywords internal
203 | vec_cast.data.frame.posterior_diff <- function(
204 | x,
205 | to,
206 | ...,
207 | x_arg = "",
208 | to_arg = ""
209 | ) {
210 | vctrs::df_cast(x, to, ..., x_arg = x_arg, to_arg = to_arg)
211 | }
212 |
213 | stop_incompatible_cast_posterior_diff <- function(x, to, ..., x_arg, to_arg) {
214 | details <- "Can't cast to a <posterior_diff> because column names and types are likely incompatible."
215 | vctrs::stop_incompatible_cast(
216 | x,
217 | to,
218 | x_arg = x_arg,
219 | to_arg = to_arg,
220 | details = details
221 | )
222 | }
223 | -------------------------------------------------------------------------------- /R/posteriors-compat.R: --------------------------------------------------------------------------------
1 | dplyr_pre_1.0.0 <- function() {
2 | utils::packageVersion("dplyr") <= "0.8.5"
3 | }
4 |
5 | # ------------------------------------------------------------------------------
6 | # posterior
7 |
8 | posterior_reconstruct <- function(x, to) {
9 | if (posterior_reconstructable(x, to)) {
10 | df_reconstruct(x, to)
11 | } else {
12 | tib_upcast(x)
13 | }
14 | }
15 |
16 | posterior_reconstructable <- function(x, to) {
17 | x_names <- names(x)
18 | to_names <- names(to)
19 | req_names <- c("model", "posterior")
20 | if (!all(req_names %in% x_names)) {
21 | return(FALSE)
22 | } else {
23 | if (!is.numeric(x[["posterior"]])) {
24 | return(FALSE)
25 | }
26 | if (!is.character(x[["model"]]) & !is.factor(x[["model"]])) {
27 | return(FALSE)
28 | }
29 | }
30 | TRUE
31 | }
32 |
33 | # ------------------------------------------------------------------------------
34 |
35 | #' @export
36 | `[.posterior` <- function(x, i, j, ...) {
37 | out <- NextMethod()
38 | posterior_reconstruct(out, x)
39 | }
40 |
41 | #' @export
42 | `names<-.posterior` <- function(x, value) {
43 | out <- NextMethod()
44 | posterior_reconstruct(out, x)
45 | }
46 |
47 | # ------------------------------------------------------------------------------
48 |
49 | # Maybe this should live in vctrs?
50 | # Fallback to a tibble from the current data frame subclass.
51 | # Removes subclass specific attributes and additional ones added by the user.
52 | tib_upcast <- function(x) {
53 | size <- df_size(x)
54 |
55 | # Strip all attributes except names to construct
56 | # a bare list to build the tibble back up from.
57 | attributes(x) <- list(names = names(x))
58 |
59 | tibble::new_tibble(x, nrow = size)
60 | }
61 |
62 | df_size <- function(x) {
63 | if (!is.list(x)) {
64 | rlang::abort("Cannot get the df size of a non-list.")
65 | }
66 |
67 | if (length(x) == 0L) {
68 | return(0L)
69 | }
70 |
71 | col <- x[[1L]]
72 |
73 | vctrs::vec_size(col)
74 | }
75 |
76 | # ------------------------------------------------------------------------------
77 |
78 | # Maybe this should live in vctrs?
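# df_reconstruct() copies the attributes of `to` onto `x` while keeping the
# names and row count of `x` itself. This is what lets a filtered or renamed
# posterior/posterior_diff tibble keep its subclass rather than decay to a
# bare data frame.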
79 | df_reconstruct <- function(x, to) { 80 | attrs <- attributes(to) 81 | attrs$names <- names(x) 82 | attrs$row.names <- .row_names_info(x, type = 0L) 83 | attributes(x) <- attrs 84 | x 85 | } 86 | 87 | # ------------------------------------------------------------------------------ 88 | # posterior vctrs functions 89 | 90 | #' Extra methods for the posterior class to work with dplyr verbs 91 | #' 92 | #' Objects with class `posterior` are defined to be tibbles with required 93 | #' columns `model` (character) and `posterior` (numeric). If operations on these 94 | #' objects break those rules, they are down-cast to basic tibbles. 95 | #' @export 96 | #' @rdname vctrs_methods_posterior 97 | #' @keywords internal 98 | vec_restore.posterior <- function(x, to, ...) { 99 | posterior_reconstruct(x, to) 100 | } 101 | 102 | 103 | #' @export 104 | #' @rdname vctrs_methods_posterior 105 | #' @keywords internal 106 | vec_proxy.posterior <- function(x, ...) { 107 | vctrs::new_data_frame(x) 108 | } 109 | 110 | 111 | #' @export 112 | #' @rdname vctrs_methods_posterior 113 | #' @keywords internal 114 | vec_ptype2.posterior.posterior <- function(x, y, ..., x_arg = "", y_arg = "") { 115 | tibble::tibble(model = character(0), posterior = numeric(0)) 116 | } 117 | #' @export 118 | #' @rdname vctrs_methods_posterior 119 | #' @keywords internal 120 | vec_ptype2.posterior.tbl_df <- function(x, y, ..., x_arg = "", y_arg = "") { 121 | vctrs::tib_ptype2(x, y, ..., x_arg = x_arg, y_arg = y_arg) 122 | } 123 | #' @export 124 | #' @rdname vctrs_methods_posterior 125 | #' @keywords internal 126 | vec_ptype2.tbl_df.posterior <- function(x, y, ..., x_arg = "", y_arg = "") { 127 | vctrs::tib_ptype2(x, y, ..., x_arg = x_arg, y_arg = y_arg) 128 | } 129 | #' @export 130 | #' @rdname vctrs_methods_posterior 131 | #' @keywords internal 132 | vec_ptype2.posterior.data.frame <- function(x, y, ..., x_arg = "", y_arg = "") { 133 | vctrs::tib_ptype2(x, y, ..., x_arg = x_arg, y_arg = y_arg) 134 | } 135 | #' @export 136 | #' @rdname vctrs_methods_posterior 137 | #' @keywords internal 138 | vec_ptype2.data.frame.posterior <- function(x, y, ..., x_arg = "", y_arg = "") { 139 | vctrs::tib_ptype2(x, y, ..., x_arg = x_arg, y_arg = y_arg) 140 | } 141 | 142 | #' @export 143 | #' @rdname vctrs_methods_posterior 144 | #' @keywords internal 145 | vec_cast.posterior.posterior <- function(x, to, ..., x_arg = "", to_arg = "") { 146 | x 147 | } 148 | #' @export 149 | #' @rdname vctrs_methods_posterior 150 | #' @keywords internal 151 | vec_cast.posterior.tbl_df <- function(x, to, ..., x_arg = "", to_arg = "") { 152 | stop_incompatible_cast_posterior(x, to, ..., x_arg = x_arg, to_arg = to_arg) 153 | } 154 | #' @export 155 | #' @rdname vctrs_methods_posterior 156 | #' @keywords internal 157 | vec_cast.tbl_df.posterior <- function(x, to, ..., x_arg = "", to_arg = "") { 158 | vctrs::tib_cast(x, to, ..., x_arg = x_arg, to_arg = to_arg) 159 | } 160 | #' @export 161 | #' @rdname vctrs_methods_posterior 162 | #' @keywords internal 163 | vec_cast.posterior.data.frame <- function(x, to, ..., x_arg = "", to_arg = "") { 164 | stop_incompatible_cast_posterior(x, to, ..., x_arg = x_arg, to_arg = to_arg) 165 | } 166 | #' @export 167 | #' @rdname vctrs_methods_posterior 168 | #' @keywords internal 169 | vec_cast.data.frame.posterior <- function(x, to, ..., x_arg = "", to_arg = "") { 170 | vctrs::df_cast(x, to, ..., x_arg = x_arg, to_arg = to_arg) 171 | } 172 | 173 | stop_incompatible_cast_posterior <- function(x, to, ..., x_arg, to_arg) { 174 | details <- "Can't 
cast to a <posterior> because column names and types are likely incompatible."
175 | vctrs::stop_incompatible_cast(
176 | x,
177 | to,
178 | x_arg = x_arg,
179 | to_arg = to_arg,
180 | details = details
181 | )
182 | }
183 | -------------------------------------------------------------------------------- /R/posteriors.R: --------------------------------------------------------------------------------
1 | #' Extract Posterior Distributions for Models
2 | #'
3 | #' `tidy` can be used on an object produced by [perf_mod()]
4 | #' to create a data frame with a column for the model name and
5 | #' the posterior predictive distribution values.
6 | #'
7 | #' @param x An object from [perf_mod()]
8 | #' @param seed A single integer for sampling from the posterior.
9 | #' @param ... Not currently used
10 | #' @return A data frame with the additional class `"posterior"`.
11 | #' @details Note that this posterior only reflects the variability
12 | #' of the groups (i.e., the fixed effects). This helps answer the
13 | #' question of which model is best _for this data set_. It does not
14 | #' answer the question of which model would be best on a new
15 | #' resample of the data (which would have greater variability).
16 | #' @export
17 | #' @export tidy.perf_mod
18 |
19 | tidy.perf_mod <- function(x, seed = sample.int(10000, 1), ...) {
20 | post_dat <- get_post(x, seed = seed)
21 | post_dat <-
22 | post_dat |>
23 | tidyr::pivot_longer(
24 | c(dplyr::everything()),
25 | names_to = "model",
26 | values_to = "posterior"
27 | ) |>
28 | dplyr::mutate(posterior = x$transform$inv(posterior))
29 | post_dat <- as_tibble(post_dat)
30 | class(post_dat) <- c("posterior", class(post_dat))
31 | post_dat
32 | }
33 |
34 | #' @export
35 | print.posterior <- function(x, ...) {
36 | cat("# Posterior samples of performance\n")
37 | print(tibble::as_tibble(x), ...)
38 | }
39 |
40 | #' Summarize the Posterior Distributions of Model Statistics
41 | #'
42 | #' Numerical summaries are created for each model including the
43 | #' posterior mean and upper and lower credible intervals (aka
44 | #' uncertainty intervals).
45 | #'
46 | #' @param object An object produced by [tidy.perf_mod()].
47 | #' @param prob A number p (0 < p < 1) indicating the desired
48 | #' probability mass to include in the intervals.
49 | #' @param seed A single integer for sampling from the posterior.
50 | #' @param ... Not currently used
51 | #' @return A data frame with summary statistics and a row for
52 | #' each model.
53 | #' @examples
54 | #' data("ex_objects")
55 | #'
56 | #' summary(posterior_samples)
57 | #' @export
58 | summary.posterior <- function(
59 | object,
60 | prob = 0.90,
61 | seed = sample.int(10000, 1),
62 | ...
63 | ) {
64 | post_int <- object |>
65 | dplyr::group_by(model) |>
66 | dplyr::do(postint.data.frame(., prob = prob, seed = seed))
67 | post_stats <- object |>
68 | dplyr::group_by(model) |>
69 | dplyr::summarise(mean = mean(posterior)) |>
70 | dplyr::full_join(post_int, by = "model")
71 | post_stats
72 | }
73 |
74 |
75 | get_post <- function(x, seed = sample.int(10000, 1)) {
76 | new_dat <- data.frame(model = unique(x$names))
77 | new_dat <-
78 | as.data.frame(lapply(x$ids, function(x) rep(x[1], nrow(new_dat)))) |>
79 | bind_cols(new_dat)
80 | post_data <-
81 | rstanarm::posterior_epred(
82 | x$stan,
83 | newdata = new_dat,
84 | seed = seed,
85 | re.form = NA
86 | )
87 | post_data <- as.data.frame(post_data)
88 | names(post_data) <- x$names
89 | post_data
90 | }
91 |
92 | postint <- function(object, ...)
UseMethod("postint") 93 | 94 | 95 | #' @export 96 | postint.numeric <- function( 97 | object, 98 | prob = 0.90, 99 | seed = sample.int(10000, 1), 100 | ... 101 | ) { 102 | object <- matrix(object, ncol = 1) 103 | res <- rstanarm::posterior_interval(object, prob = prob, seed = seed) 104 | res <- as.data.frame(res) 105 | names(res) <- c("lower", "upper") 106 | res 107 | } 108 | 109 | #' @export 110 | postint.data.frame <- function( 111 | object, 112 | prob = 0.90, 113 | seed = sample.int(10000, 1), 114 | ... 115 | ) { 116 | postint(getElement(object, "posterior"), prob = prob, seed = seed) 117 | } 118 | 119 | 120 | #' Visualize the Posterior Distributions of Model Statistics 121 | #' 122 | #' For objects of classes `posterior` and `perf_mod`, `autoplot()` produces a 123 | #' simple plot of posterior distributions. For workflow set objects, there are 124 | #' several types of plots that can be produced. 125 | #' 126 | #' @inheritParams summary.posterior_diff 127 | #' @param object An object produced by [perf_mod()], [tidy.perf_mod()], or a 128 | #' workflow set with computed results. 129 | #' @return A [ggplot2::ggplot()] object. 130 | #' @param ... Options passed to `geom_line(stat = "density", ...)`. 131 | #' @param type A value of one of: `"intervals"` (for model rank versus posterior 132 | #' probability using interval estimation), `"posteriors"` (density plots for 133 | #' each model), or `"ROPE"` (for practical equivalence probabilities versus 134 | #' workflow rank). 135 | #' @examples 136 | #' data(ex_objects) 137 | #' autoplot(posterior_samples) 138 | #' @export 139 | autoplot.posterior <- 140 | function(object, ...) { 141 | ggplot2::ggplot( 142 | as.data.frame(object), 143 | ggplot2::aes(x = posterior, col = model) 144 | ) + 145 | ggplot2::geom_line(stat = "density", ...) 146 | } 147 | 148 | 149 | #' @rdname autoplot.posterior 150 | #' @export 151 | autoplot.perf_mod <- function(object, ...) { 152 | samples <- tidy(object) 153 | res <- autoplot(samples, ...) 154 | if (any(names(object) == "metric") && !is.na(object$metric$name)) { 155 | res <- res + ggplot2::xlab(object$metric$name) 156 | } 157 | res 158 | } 159 | 160 | #' @rdname autoplot.posterior 161 | #' @export 162 | autoplot.perf_mod_workflow_set <- function( 163 | object, 164 | type = "intervals", 165 | prob = 0.9, 166 | size = NULL, 167 | ... 168 | ) { 169 | type <- match.arg(type, c("intervals", "posteriors", "ROPE")) 170 | if (type == "intervals") { 171 | res <- plot_wset_intervals(object, prob, ...) 172 | } else if (type == "posteriors") { 173 | res <- autoplot.perf_mod(object, ...) 174 | } else { 175 | res <- plot_rope_probs(object, size, ...) 176 | } 177 | res 178 | } 179 | 180 | plot_wset_intervals <- function(object, prob, ...) 
{
181 | plot_data <-
182 | tidy(object) |>
183 | dplyr::group_by(model) |>
184 | dplyr::summarize(
185 | .lower = quantile(posterior, probs = 1 - prob[1]),
186 | .estimate = median(posterior),
187 | .upper = quantile(posterior, probs = prob[1]),
188 | .groups = "drop"
189 | ) |>
190 | dplyr::ungroup() |>
191 | dplyr::rename(workflow = model)
192 | if (object$metric$direction == "maximize") {
193 | plot_data$rank <- rank(-plot_data$.estimate, ties.method = "random")
194 | } else if (object$metric$direction == "minimize") {
195 | plot_data$rank <- rank(plot_data$.estimate, ties.method = "random")
196 | } else {
197 | rlang::abort("Don't know how to rank metric")
198 | }
199 | ggplot2::ggplot(
200 | plot_data,
201 | ggplot2::aes(x = rank, y = .estimate, col = workflow)
202 | ) +
203 | ggplot2::geom_point() +
204 | ggplot2::geom_errorbar(
205 | ggplot2::aes(ymin = .lower, ymax = .upper),
206 | width = diff(range(plot_data$rank)) / 75
207 | ) +
208 | ggplot2::labs(x = "Workflow Rank", y = object$metric$name)
209 | }
210 |
211 | plot_rope_probs <- function(object, size, ...) {
212 | if (is.null(size)) {
213 | rlang::abort(
214 | "Please supply a practical effect size via the `size` argument."
215 | )
216 | }
217 | posteriors <-
218 | tidy(object) |>
219 | dplyr::group_by(model) |>
220 | dplyr::summarize(.estimate = median(posterior), .groups = "drop") |>
221 | dplyr::ungroup()
222 |
223 | if (object$metric$direction == "maximize") {
224 | posteriors <- dplyr::arrange(posteriors, dplyr::desc(.estimate))
225 | posteriors$rank <- rank(-posteriors$.estimate, ties.method = "random")
226 | worse_dir <- "pract_pos"
227 | } else if (object$metric$direction == "minimize") {
228 | posteriors <- dplyr::arrange(posteriors, .estimate)
229 | posteriors$rank <- rank(posteriors$.estimate, ties.method = "random")
230 | worse_dir <- "pract_neg"
231 | } else {
232 | rlang::abort("Don't know how to rank metric")
233 | }
234 | l1 <- rep(posteriors$model[1], nrow(posteriors))
235 | l2 <- posteriors$model
236 |
237 | model_diffs <- contrast_models(object, l1, l2, seed = sample.int(10000, 1))
238 | plot_data <- summary(model_diffs, size = size)
239 | rm_text <- paste0(posteriors$model[1], " vs ")
240 | plot_data$model <- gsub(rm_text, "", plot_data$contrast, fixed = TRUE)
241 |
242 | plot_data <- dplyr::full_join(
243 | plot_data[, c("model", "pract_equiv")],
244 | posteriors[, c("model", "rank")],
245 | by = "model"
246 | )
247 | plot_data$workflow <- plot_data$model
248 | ggplot2::ggplot(plot_data, ggplot2::aes(x = rank, y = pract_equiv)) +
249 | ggplot2::geom_line(alpha = .2) +
250 | ggplot2::geom_point(ggplot2::aes(col = workflow)) +
251 | ggplot2::labs(
252 | x = "Workflow Rank",
253 | y = "Probability of Practical Equivalence"
254 | ) +
255 | ggplot2::ylim(0:1)
256 | }
257 | -------------------------------------------------------------------------------- /R/reexports.R: --------------------------------------------------------------------------------
1 | #' @importFrom generics tidy
2 | #' @export
3 | generics::tidy
4 |
5 | #' @importFrom ggplot2 autoplot
6 | #' @export
7 | ggplot2::autoplot
8 | -------------------------------------------------------------------------------- /R/tidyposterior-package.R: --------------------------------------------------------------------------------
1 | #' @keywords internal
2 | "_PACKAGE"
3 |
4 | ## usethis namespace: start
5 | #' @importFrom dplyr bind_cols
6 | #' @importFrom purrr map
7 | #' @importFrom purrr map_chr
8 | #' @importFrom rlang !!
9 | #' @importFrom rlang is_string
10 | #' @importFrom rlang na_dbl
11 | #' @importFrom rsample vfold_cv
12 | #' @importFrom rstanarm posterior_epred
13 | #' @importFrom rstanarm stan_glmer
14 | #' @importFrom stats as.formula
15 | #' @importFrom stats median
16 | #' @importFrom stats quantile
17 | #' @importFrom stats setNames
18 | #' @importFrom tibble as_tibble
19 | #' @importFrom utils combn
20 | ## usethis namespace: end
21 | NULL
22 | -------------------------------------------------------------------------------- /R/transformations.R: --------------------------------------------------------------------------------
1 | #' Simple Transformation Functions
2 | #'
3 | #' A set of objects is contained here to easily facilitate the
4 | #' use of outcome transformations for modeling. For example, if
5 | #' there is a large amount of variability in the resampling results
6 | #' for the Kappa statistics, which lie between -1 and 1, assuming
7 | #' normality may produce posterior estimates outside of the natural
8 | #' bound. One way to solve this is to use a link function or assume
9 | #' a prior that is appropriately bounded. Another approach is to
10 | #' transform the outcome values prior to modeling, use a Gaussian
11 | #' prior, and reverse-transform the posterior estimates prior to
12 | #' visualization and summarization. These objects can help
13 | #' facilitate this last approach.
14 | #'
15 | #' @details The `logit_trans` object is useful for model
16 | #' performance statistics bounded between zero and one, such as accuracy
17 | #' or the area under the ROC curve.
18 | #'
19 | #' `ln_trans` and `inv_trans` can be useful when the statistics
20 | #' are right-skewed and strictly positive.
21 | #'
22 | #' `Fisher_trans` was originally used for correlation statistics
23 | #' but can be used here for any metric falling between -1 and 1,
24 | #' such as Kappa.
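#'
#' As a sketch of the intended workflow (with `kappa_df` standing in as a
#' hypothetical data frame of resample `id` columns plus one column of Kappa
#' estimates per model), the list is passed to [perf_mod()], which then models
#' `Fisher_trans$func(statistic)` and back-transforms the posteriors with
#' `Fisher_trans$inv()` in its `tidy()` and summary output:
#'
#' ```r
#' perf_mod(kappa_df, transform = Fisher_trans)
#' ```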
25 | #'
26 | #' @examples
27 | #' logit_trans$func(.5)
28 | #' logit_trans$inv(0)
29 | #' @rdname transformations
30 | #' @export
31 | no_trans <- list(
32 | func = function(x) x,
33 | inv = function(x) x
34 | )
35 | #' @rdname transformations
36 | #' @export
37 | logit_trans <- list(
38 | func = binomial()$linkfun,
39 | inv = binomial()$linkinv
40 | )
41 | #' @rdname transformations
42 | #' @export
43 | Fisher_trans <- list(func = atanh, inv = tanh)
44 | #' @rdname transformations
45 | #' @export
46 | ln_trans <- list(func = log, inv = exp)
47 | #' @rdname transformations
48 | #' @export
49 | inv_trans <- list(
50 | func = function(x) 1 / x,
51 | inv = function(x) 1 / x
52 | )
53 | -------------------------------------------------------------------------------- /R/zzz.R: --------------------------------------------------------------------------------
1 | # nocov start
2 | .onLoad <- function(libname, pkgname) {
3 | vctrs::s3_register(
4 | "dplyr::dplyr_reconstruct",
5 | "posterior",
6 | method = posterior_reconstruct
7 | )
8 | vctrs::s3_register(
9 | "dplyr::dplyr_reconstruct",
10 | "posterior_diff",
11 | method = posterior_diff_reconstruct
12 | )
13 |
14 | vctrs::s3_register("ggplot2::autoplot", "perf_mod")
15 | vctrs::s3_register("ggplot2::autoplot", "perf_mod_workflow_set")
16 | vctrs::s3_register("ggplot2::autoplot", "posterior")
17 | vctrs::s3_register("ggplot2::autoplot", "posterior_diff")
18 | }
19 |
20 | # nocov end
21 | -------------------------------------------------------------------------------- /README.Rmd: --------------------------------------------------------------------------------
1 | ---
2 | output: github_document
3 | ---
4 |
5 |
6 |
7 | ```{r}
8 | #| include: false
9 | knitr::opts_chunk$set(
10 | collapse = TRUE,
11 | comment = "#>",
12 | fig.path = "man/figures/README-",
13 | out.width = "100%"
14 | )
15 | ```
16 |
17 | # tidyposterior
18 |
19 |
20 | [![R-CMD-check](https://github.com/tidymodels/tidyposterior/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/tidyposterior/actions/workflows/R-CMD-check.yaml)
21 | [![Codecov test coverage](https://codecov.io/gh/tidymodels/tidyposterior/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/tidyposterior?branch=main)
22 | [![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/tidyposterior)](https://CRAN.r-project.org/package=tidyposterior)
23 | [![Downloads](http://cranlogs.r-pkg.org/badges/tidyposterior)](https://CRAN.r-project.org/package=tidyposterior)
24 | ![](https://img.shields.io/badge/lifecycle-maturing-blue.svg)
25 |
26 |
27 |
28 | This package can be used to conduct _post hoc_ analyses of resampling results generated by models.
29 |
30 | For example, if two models are evaluated with the root mean squared error (RMSE) using 10-fold cross-validation, there are 10 paired statistics. These can be used to make comparisons between models without involving a test set.
31 |
32 | There is a rich literature on the analysis of model resampling results such as McLachlan's [_Discriminant Analysis and Statistical Pattern Recognition_](https://books.google.com/books?id=O_qHDLaWpDUC&lpg=PR7&ots=6GJnIREXZM&dq=%22Discriminant%20Analysis%20and%20Statistical%20Pattern%20Recognition%22&lr&pg=PR7#v=onepage&q=%22Discriminant%20Analysis%20and%20Statistical%20Pattern%20Recognition%22&f=false) and the references therein. This package follows _the spirit_ of [Benavoli _et al_ (2017)](https://people.idsia.ch//~marco/papers/2017jmlr-tests.pdf).
33 | 34 | tidyposterior uses Bayesian generalized linear models for this purpose and can be considered an upgraded version of the [`caret::resamples()`](https://topepo.github.io/caret/model-training-and-tuning.html#exploring-and-comparing-resampling-distributions) function. The package works with [rsample](https://rsample.tidymodels.org/) objects natively but any results in a data frame can be used. 35 | 36 | See [Chapter 11](https://www.tmwr.org/compare.html) of [_Tidy Models with R_](https://www.tmwr.org) for examples and more details. 37 | 38 | ## Installation 39 | 40 | You can install the released version of tidyposterior from [CRAN](https://CRAN.R-project.org) with: 41 | 42 | ``` r 43 | install.packages("tidyposterior") 44 | ``` 45 | 46 | And the development version from [GitHub](https://github.com/) with: 47 | 48 | ``` r 49 | # install.packages("pak") 50 | pak::pak("tidymodels/tidyposterior") 51 | ``` 52 | ## Example 53 | 54 | To illustrate, here are some example objects using 10-fold cross-validation for a simple two-class problem: 55 | 56 | ```{r} 57 | #| label: setup 58 | #| results: hide 59 | library(tidymodels) 60 | library(tidyposterior) 61 | 62 | data(two_class_dat, package = "modeldata") 63 | 64 | set.seed(100) 65 | folds <- vfold_cv(two_class_dat) 66 | ``` 67 | 68 | We can define two different models (for simplicity, with no tuning parameters). 69 | 70 | ```{r} 71 | #| label: model-specs 72 | logistic_reg_glm_spec <- 73 | logistic_reg() |> 74 | set_engine('glm') 75 | 76 | mars_earth_spec <- 77 | mars(prod_degree = 1) |> 78 | set_engine('earth') |> 79 | set_mode('classification') 80 | ``` 81 | 82 | For tidymodels, the [tune::fit_resamples()] function can be used to estimate performance for each model/resample: 83 | 84 | ```{r} 85 | #| label: tm-resamples 86 | rs_ctrl <- control_resamples(save_workflow = TRUE) 87 | 88 | logistic_reg_glm_res <- 89 | logistic_reg_glm_spec |> 90 | fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl) 91 | 92 | mars_earth_res <- 93 | mars_earth_spec |> 94 | fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl) 95 | ``` 96 | 97 | From these, there are several ways to pass the results to the `perf_mod()` function. The most general approach is to have a data frame with the resampling labels (i.e., one or more id columns) as well as columns for each model that you would like to compare. 98 | 99 | For the model results above, [tune::collect_metrics()] can be used along with some basic data manipulation steps: 100 | 101 | ```{r} 102 | #| label: df-results 103 | logistic_roc <- 104 | collect_metrics(logistic_reg_glm_res, summarize = FALSE) |> 105 | dplyr::filter(.metric == "roc_auc") |> 106 | dplyr::select(id, logistic = .estimate) 107 | 108 | mars_roc <- 109 | collect_metrics(mars_earth_res, summarize = FALSE) |> 110 | dplyr::filter(.metric == "roc_auc") |> 111 | dplyr::select(id, mars = .estimate) 112 | 113 | resamples_df <- full_join(logistic_roc, mars_roc, by = "id") 114 | resamples_df 115 | ``` 116 | 117 | We can then give this directly to `perf_mod()`: 118 | 119 | ```{r} 120 | #| label: df-mod 121 | set.seed(101) 122 | roc_model_via_df <- perf_mod(resamples_df, iter = 2000) 123 | ``` 124 | 125 | From this, the posterior distributions for each model can be obtained from the `tidy()` method: 126 | 127 | ```{r} 128 | #| label: post 129 | #| fig-alt: "Faceted histogram chart. Area Under the ROC Curve along the x-axis, count along the y-axis. The two facets are logistic and mars. 
Both histograms look fairly normally distributed, with a mean of 0.89 for logistic and 0.88 for mars. The full range is 0.84 to 0.93."
130 | roc_model_via_df |>
131 | tidy() |>
132 | ggplot(aes(x = posterior)) +
133 | geom_histogram(bins = 40, col = "blue", fill = "blue", alpha = .4) +
134 | facet_wrap(~ model, ncol = 1) +
135 | xlab("Area Under the ROC Curve")
136 | ```
137 |
138 |
139 | See `contrast_models()` for how to analyze these distributions.
140 |
141 |
142 | ## Contributing
143 |
144 | This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
145 |
146 | - For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on RStudio Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
147 |
148 | - If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/tidyposterior/issues).
149 |
150 | - Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.
151 |
152 | - Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).
153 |
154 | -------------------------------------------------------------------------------- /README.md: --------------------------------------------------------------------------------
1 |
2 |
3 |
4 | # tidyposterior
5 |
6 |
7 |
8 | [![R-CMD-check](https://github.com/tidymodels/tidyposterior/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/tidyposterior/actions/workflows/R-CMD-check.yaml)
9 | [![Codecov test
10 | coverage](https://codecov.io/gh/tidymodels/tidyposterior/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/tidyposterior?branch=main)
11 | [![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/tidyposterior)](https://CRAN.r-project.org/package=tidyposterior)
12 | [![Downloads](http://cranlogs.r-pkg.org/badges/tidyposterior)](https://CRAN.r-project.org/package=tidyposterior)
13 | ![](https://img.shields.io/badge/lifecycle-maturing-blue.svg)
14 |
15 |
16 |
17 | This package can be used to conduct *post hoc* analyses of resampling
18 | results generated by models.
19 |
20 | For example, if two models are evaluated with the root mean squared
21 | error (RMSE) using 10-fold cross-validation, there are 10 paired
22 | statistics. These can be used to make comparisons between models without
23 | involving a test set.
24 |
25 | There is a rich literature on the analysis of model resampling results
26 | such as McLachlan's [*Discriminant Analysis and Statistical Pattern
27 | Recognition*](https://books.google.com/books?id=O_qHDLaWpDUC&lpg=PR7&ots=6GJnIREXZM&dq=%22Discriminant%20Analysis%20and%20Statistical%20Pattern%20Recognition%22&lr&pg=PR7#v=onepage&q=%22Discriminant%20Analysis%20and%20Statistical%20Pattern%20Recognition%22&f=false)
28 | and the references therein. This package follows *the spirit* of
29 | [Benavoli *et al*
30 | (2017)](https://people.idsia.ch//~marco/papers/2017jmlr-tests.pdf).
31 | 32 | tidyposterior uses Bayesian generalized linear models for this purpose 33 | and can be considered an upgraded version of the 34 | [`caret::resamples()`](https://topepo.github.io/caret/model-training-and-tuning.html#exploring-and-comparing-resampling-distributions) 35 | function. The package works with 36 | [rsample](https://rsample.tidymodels.org/) objects natively but any 37 | results in a data frame can be used. 38 | 39 | See [Chapter 11](https://www.tmwr.org/compare.html) of [*Tidy Models 40 | with R*](https://www.tmwr.org) for examples and more details. 41 | 42 | ## Installation 43 | 44 | You can install the released version of tidyposterior from 45 | [CRAN](https://CRAN.R-project.org) with: 46 | 47 | ``` r 48 | install.packages("tidyposterior") 49 | ``` 50 | 51 | And the development version from [GitHub](https://github.com/) with: 52 | 53 | ``` r 54 | # install.packages("pak") 55 | pak::pak("tidymodels/tidyposterior") 56 | ``` 57 | 58 | ## Example 59 | 60 | To illustrate, here are some example objects using 10-fold 61 | cross-validation for a simple two-class problem: 62 | 63 | ``` r 64 | library(tidymodels) 65 | #> ── Attaching packages ────────────────────────────────────── tidymodels 1.3.0 ── 66 | #> ✔ broom 1.0.8 ✔ recipes 1.3.0.9000 67 | #> ✔ dials 1.4.0.9000 ✔ rsample 1.3.0.9000 68 | #> ✔ dplyr 1.1.4 ✔ tibble 3.2.1 69 | #> ✔ ggplot2 3.5.2 ✔ tidyr 1.3.1 70 | #> ✔ infer 1.0.8 ✔ tune 1.3.0 71 | #> ✔ modeldata 1.4.0 ✔ workflows 1.2.0.9000 72 | #> ✔ parsnip 1.3.1.9000 ✔ workflowsets 1.1.0 73 | #> ✔ purrr 1.0.4 ✔ yardstick 1.3.2.9000 74 | #> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ── 75 | #> ✖ purrr::discard() masks scales::discard() 76 | #> ✖ dplyr::filter() masks stats::filter() 77 | #> ✖ dplyr::lag() masks stats::lag() 78 | #> ✖ recipes::step() masks stats::step() 79 | library(tidyposterior) 80 | 81 | data(two_class_dat, package = "modeldata") 82 | 83 | set.seed(100) 84 | folds <- vfold_cv(two_class_dat) 85 | ``` 86 | 87 | We can define two different models (for simplicity, with no tuning 88 | parameters). 89 | 90 | ``` r 91 | logistic_reg_glm_spec <- 92 | logistic_reg() |> 93 | set_engine('glm') 94 | 95 | mars_earth_spec <- 96 | mars(prod_degree = 1) |> 97 | set_engine('earth') |> 98 | set_mode('classification') 99 | ``` 100 | 101 | For tidymodels, the \[tune::fit_resamples()\] function can be used to 102 | estimate performance for each model/resample: 103 | 104 | ``` r 105 | rs_ctrl <- control_resamples(save_workflow = TRUE) 106 | 107 | logistic_reg_glm_res <- 108 | logistic_reg_glm_spec |> 109 | fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl) 110 | 111 | mars_earth_res <- 112 | mars_earth_spec |> 113 | fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl) 114 | ``` 115 | 116 | From these, there are several ways to pass the results to the 117 | `perf_mod()` function. The most general approach is to have a data frame 118 | with the resampling labels (i.e., one or more id columns) as well as 119 | columns for each model that you would like to compare. 
120 |
121 | For the model results above, \[tune::collect_metrics()\] can be used
122 | along with some basic data manipulation steps:
123 |
124 | ``` r
125 | logistic_roc <-
126 | collect_metrics(logistic_reg_glm_res, summarize = FALSE) |>
127 | dplyr::filter(.metric == "roc_auc") |>
128 | dplyr::select(id, logistic = .estimate)
129 |
130 | mars_roc <-
131 | collect_metrics(mars_earth_res, summarize = FALSE) |>
132 | dplyr::filter(.metric == "roc_auc") |>
133 | dplyr::select(id, mars = .estimate)
134 |
135 | resamples_df <- full_join(logistic_roc, mars_roc, by = "id")
136 | resamples_df
137 | #> # A tibble: 10 × 3
138 | #> id logistic mars
139 | #> <chr> <dbl> <dbl>
140 | #> 1 Fold01 0.856 0.845
141 | #> 2 Fold02 0.933 0.951
142 | #> 3 Fold03 0.934 0.937
143 | #> 4 Fold04 0.864 0.858
144 | #> 5 Fold05 0.847 0.854
145 | #> 6 Fold06 0.911 0.840
146 | #> 7 Fold07 0.867 0.858
147 | #> 8 Fold08 0.886 0.876
148 | #> 9 Fold09 0.898 0.898
149 | #> 10 Fold10 0.906 0.894
150 | ```
151 |
152 | We can then give this directly to `perf_mod()`:
153 |
154 | ``` r
155 | set.seed(101)
156 | roc_model_via_df <- perf_mod(resamples_df, iter = 2000)
157 | #>
158 | #> SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
159 | #> Chain 1:
160 | #> Chain 1: Gradient evaluation took 4.9e-05 seconds
161 | #> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.49 seconds.
162 | #> Chain 1: Adjust your expectations accordingly!
163 | #> Chain 1:
164 | #> Chain 1:
165 | #> Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
166 | #> Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
167 | #> Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
168 | #> Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
169 | #> Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
170 | #> Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
171 | #> Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
172 | #> Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
173 | #> Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
174 | #> Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
175 | #> Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
176 | #> Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
177 | #> Chain 1:
178 | #> Chain 1: Elapsed Time: 0.277 seconds (Warm-up)
179 | #> Chain 1: 0.089 seconds (Sampling)
180 | #> Chain 1: 0.366 seconds (Total)
181 | #> Chain 1:
182 | #>
183 | #> SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
184 | #> Chain 2:
185 | #> Chain 2: Gradient evaluation took 6e-06 seconds
186 | #> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
187 | #> Chain 2: Adjust your expectations accordingly!
188 | #> Chain 2: 189 | #> Chain 2: 190 | #> Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup) 191 | #> Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup) 192 | #> Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup) 193 | #> Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup) 194 | #> Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup) 195 | #> Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup) 196 | #> Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling) 197 | #> Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling) 198 | #> Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling) 199 | #> Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling) 200 | #> Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling) 201 | #> Chain 2: Iteration: 2000 / 2000 [100%] (Sampling) 202 | #> Chain 2: 203 | #> Chain 2: Elapsed Time: 0.275 seconds (Warm-up) 204 | #> Chain 2: 0.105 seconds (Sampling) 205 | #> Chain 2: 0.38 seconds (Total) 206 | #> Chain 2: 207 | #> 208 | #> SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3). 209 | #> Chain 3: 210 | #> Chain 3: Gradient evaluation took 6e-06 seconds 211 | #> Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds. 212 | #> Chain 3: Adjust your expectations accordingly! 213 | #> Chain 3: 214 | #> Chain 3: 215 | #> Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup) 216 | #> Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup) 217 | #> Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup) 218 | #> Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup) 219 | #> Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup) 220 | #> Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup) 221 | #> Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling) 222 | #> Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling) 223 | #> Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling) 224 | #> Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling) 225 | #> Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling) 226 | #> Chain 3: Iteration: 2000 / 2000 [100%] (Sampling) 227 | #> Chain 3: 228 | #> Chain 3: Elapsed Time: 0.209 seconds (Warm-up) 229 | #> Chain 3: 0.111 seconds (Sampling) 230 | #> Chain 3: 0.32 seconds (Total) 231 | #> Chain 3: 232 | #> 233 | #> SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4). 234 | #> Chain 4: 235 | #> Chain 4: Gradient evaluation took 6e-06 seconds 236 | #> Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds. 237 | #> Chain 4: Adjust your expectations accordingly! 
238 | #> Chain 4:
239 | #> Chain 4:
240 | #> Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
241 | #> Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
242 | #> Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
243 | #> Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
244 | #> Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
245 | #> Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
246 | #> Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
247 | #> Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
248 | #> Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
249 | #> Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
250 | #> Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
251 | #> Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
252 | #> Chain 4:
253 | #> Chain 4: Elapsed Time: 0.23 seconds (Warm-up)
254 | #> Chain 4: 0.082 seconds (Sampling)
255 | #> Chain 4: 0.312 seconds (Total)
256 | #> Chain 4:
257 | ```
258 |
259 | From this, the posterior distributions for each model can be obtained
260 | from the `tidy()` method:
261 |
262 | ``` r
263 | roc_model_via_df |>
264 | tidy() |>
265 | ggplot(aes(x = posterior)) +
266 | geom_histogram(bins = 40, col = "blue", fill = "blue", alpha = .4) +
267 | facet_wrap(~ model, ncol = 1) +
268 | xlab("Area Under the ROC Curve")
269 | ```
270 |
271 | <img src="man/figures/README-post-1.png" alt="Faceted histogram chart. Area Under the ROC Curve along the x-axis, count along the y-axis. The two facets are logistic and mars. Both histograms look fairly normally distributed, with a mean of 0.89 for logistic and 0.88 for mars. The full range is 0.84 to 0.93." width="100%" />
272 |
273 | See `contrast_models()` for how to analyze these distributions.
274 |
275 | ## Contributing
276 |
277 | This project is released with a [Contributor Code of
278 | Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html).
279 | By contributing to this project, you agree to abide by its terms.
280 |
281 | - For questions and discussions about tidymodels packages, modeling, and
282 | machine learning, please [post on RStudio
283 | Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
284 |
285 | - If you think you have encountered a bug, please [submit an
286 | issue](https://github.com/tidymodels/tidyposterior/issues).
287 |
288 | - Either way, learn how to create and share a
289 | [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html)
290 | (a minimal, reproducible example), to clearly communicate about your
291 | code.
292 |
293 | - Check out further details on [contributing guidelines for tidymodels
294 | packages](https://www.tidymodels.org/contribute/) and [how to get
295 | help](https://www.tidymodels.org/help/).
296 | -------------------------------------------------------------------------------- /_pkgdown.yml: -------------------------------------------------------------------------------- 1 | url: https://tidyposterior.tidymodels.org 2 | 3 | template: 4 | package: tidytemplate 5 | bootstrap: 5 6 | bslib: 7 | danger: "#CA225E" 8 | primary: "#CA225E" 9 | includes: 10 | in_header: | 11 | 12 | 13 | development: 14 | mode: auto 15 | 16 | figures: 17 | fig.width: 8 18 | fig.height: 5.75 19 | 20 | reference: 21 | - title: Bayesian analysis of resampling results 22 | contents: 23 | - perf_mod 24 | - tidy.perf_mod 25 | - contrast_models 26 | - matches("summary") 27 | - matches("autoplot") 28 | - no_trans 29 | - logit_trans 30 | - ln_trans 31 | - inv_trans 32 | - Fisher_trans 33 | 34 | - title: Data 35 | contents: 36 | - precise_example 37 | - noisy_example 38 | - concrete_example 39 | -------------------------------------------------------------------------------- /air.toml: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tidymodels/tidyposterior/cd1f51a59fdd38ce38ea0d28e8041b112782d8db/air.toml -------------------------------------------------------------------------------- /codecov.yml: -------------------------------------------------------------------------------- 1 | comment: false 2 | 3 | coverage: 4 | status: 5 | project: 6 | default: 7 | target: auto 8 | threshold: 1% 9 | informational: true 10 | patch: 11 | default: 12 | target: auto 13 | threshold: 1% 14 | informational: true 15 | -------------------------------------------------------------------------------- /data/concrete_example.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tidymodels/tidyposterior/cd1f51a59fdd38ce38ea0d28e8041b112782d8db/data/concrete_example.rda -------------------------------------------------------------------------------- /data/ex_objects.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tidymodels/tidyposterior/cd1f51a59fdd38ce38ea0d28e8041b112782d8db/data/ex_objects.RData -------------------------------------------------------------------------------- /data/noisy_example.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tidymodels/tidyposterior/cd1f51a59fdd38ce38ea0d28e8041b112782d8db/data/noisy_example.rda -------------------------------------------------------------------------------- /data/precise_example.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tidymodels/tidyposterior/cd1f51a59fdd38ce38ea0d28e8041b112782d8db/data/precise_example.rda -------------------------------------------------------------------------------- /data/ts_example.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tidymodels/tidyposterior/cd1f51a59fdd38ce38ea0d28e8041b112782d8db/data/ts_example.rda -------------------------------------------------------------------------------- /inst/examples/glm_v_nnet.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tidymodels/tidyposterior/cd1f51a59fdd38ce38ea0d28e8041b112782d8db/inst/examples/glm_v_nnet.RData -------------------------------------------------------------------------------- 
/man/autoplot.posterior.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/posteriors.R 3 | \name{autoplot.posterior} 4 | \alias{autoplot.posterior} 5 | \alias{autoplot.perf_mod} 6 | \alias{autoplot.perf_mod_workflow_set} 7 | \title{Visualize the Posterior Distributions of Model Statistics} 8 | \usage{ 9 | \method{autoplot}{posterior}(object, ...) 10 | 11 | \method{autoplot}{perf_mod}(object, ...) 12 | 13 | \method{autoplot}{perf_mod_workflow_set}(object, type = "intervals", prob = 0.9, size = NULL, ...) 14 | } 15 | \arguments{ 16 | \item{object}{An object produced by \code{\link[=perf_mod]{perf_mod()}}, \code{\link[=tidy.perf_mod]{tidy.perf_mod()}}, or a 17 | workflow set with computed results.} 18 | 19 | \item{...}{Options passed to \code{geom_line(stat = "density", ...)}.} 20 | 21 | \item{type}{A value of one of: \code{"intervals"} (for model rank versus posterior 22 | probability using interval estimation), \code{"posteriors"} (density plots for 23 | each model), or \code{"ROPE"} (for practical equivalence probabilities versus 24 | workflow rank).} 25 | 26 | \item{prob}{A number p (0 < p < 1) indicating the desired 27 | probability mass to include in the intervals.} 28 | 29 | \item{size}{The size of an effective difference in the units of the chosen 30 | metric. For example, a 5 percent increase in accuracy (\code{size = 0.05}) 31 | between two models might be considered a "real" difference.} 32 | } 33 | \value{ 34 | A \code{\link[ggplot2:ggplot]{ggplot2::ggplot()}} object. 35 | } 36 | \description{ 37 | For objects of classes \code{posterior} and \code{perf_mod}, \code{autoplot()} produces a 38 | simple plot of posterior distributions. For workflow set objects, there are 39 | several types of plots that can be produced. 40 | } 41 | \examples{ 42 | data(ex_objects) 43 | autoplot(posterior_samples) 44 | } 45 | -------------------------------------------------------------------------------- /man/autoplot.posterior_diff.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/contrasts.R 3 | \name{autoplot.posterior_diff} 4 | \alias{autoplot.posterior_diff} 5 | \title{Visualize the Posterior Distributions of Model Differences} 6 | \usage{ 7 | \method{autoplot}{posterior_diff}(object, size = 0, ...) 8 | } 9 | \arguments{ 10 | \item{object}{An object produced by \code{\link[=contrast_models]{contrast_models()}}.} 11 | 12 | \item{size}{The size of an effective difference. For example, a 13 | 5\\% increase in accuracy between two models might be considered a 14 | "real" difference.} 15 | 16 | \item{...}{Options passed to \code{geom_line(stat = "density", ...)}.} 17 | } 18 | \value{ 19 | A \code{\link[ggplot2:ggplot]{ggplot2::ggplot()}} object using \code{geom_density} 20 | faceted by the models being contrasted (when there are 2 or 21 | more contrasts). 22 | } 23 | \description{ 24 | A density is created for each contrast in a faceted grid. 
25 | } 26 | \examples{ 27 | data(ex_objects) 28 | library(ggplot2) 29 | autoplot(contrast_samples) 30 | } 31 | -------------------------------------------------------------------------------- /man/contrast_models.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/contrasts.R 3 | \name{contrast_models} 4 | \alias{contrast_models} 5 | \title{Estimate the Difference Between Models} 6 | \usage{ 7 | contrast_models(x, list_1 = NULL, list_2 = NULL, seed = sample.int(10000, 1)) 8 | } 9 | \arguments{ 10 | \item{x}{An object produced by \code{\link[=perf_mod]{perf_mod()}}.} 11 | 12 | \item{list_1, list_2}{Character vectors of equal length that specify the 13 | specific pairwise contrasts. The contrast is parameterized as 14 | \code{list_1[i] - list_2[i]}. If the defaults are left to \code{NULL}, all 15 | combinations are evaluated.} 16 | 17 | \item{seed}{A single integer for sampling from the posterior.} 18 | } 19 | \value{ 20 | A data frame of the posterior distribution(s) of the difference(s). 21 | The object has an extra class of \code{"posterior_diff"}. 22 | } 23 | \description{ 24 | The posterior distributions created by \code{\link[=perf_mod]{perf_mod()}} can be used to obtain 25 | the posterior distribution of the difference(s) between models. One or more 26 | comparisons can be computed at the same time. 27 | } 28 | \details{ 29 | If a transformation was used when \code{x} was created, the inverse is 30 | applied \emph{before} the difference is computed. 31 | } 32 | -------------------------------------------------------------------------------- /man/figures/README-post-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tidymodels/tidyposterior/cd1f51a59fdd38ce38ea0d28e8041b112782d8db/man/figures/README-post-1.png -------------------------------------------------------------------------------- /man/figures/lifecycle-archived.svg: -------------------------------------------------------------------------------- 1 | lifecyclelifecyclearchivedarchived -------------------------------------------------------------------------------- /man/figures/lifecycle-defunct.svg: -------------------------------------------------------------------------------- 1 | lifecyclelifecycledefunctdefunct -------------------------------------------------------------------------------- /man/figures/lifecycle-deprecated.svg: -------------------------------------------------------------------------------- 1 | lifecyclelifecycledeprecateddeprecated -------------------------------------------------------------------------------- /man/figures/lifecycle-experimental.svg: -------------------------------------------------------------------------------- 1 | lifecyclelifecycleexperimentalexperimental -------------------------------------------------------------------------------- /man/figures/lifecycle-maturing.svg: -------------------------------------------------------------------------------- 1 | lifecyclelifecyclematuringmaturing -------------------------------------------------------------------------------- /man/figures/lifecycle-questioning.svg: -------------------------------------------------------------------------------- 1 | lifecyclelifecyclequestioningquestioning -------------------------------------------------------------------------------- /man/figures/lifecycle-soft-deprecated.svg: 
-------------------------------------------------------------------------------- 1 | lifecyclelifecyclesoft-deprecatedsoft-deprecated -------------------------------------------------------------------------------- /man/figures/lifecycle-stable.svg: -------------------------------------------------------------------------------- 1 | lifecyclelifecyclestablestable -------------------------------------------------------------------------------- /man/figures/lifecycle-superseded.svg: -------------------------------------------------------------------------------- 1 | lifecyclelifecyclesupersededsuperseded -------------------------------------------------------------------------------- /man/figures/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tidymodels/tidyposterior/cd1f51a59fdd38ce38ea0d28e8041b112782d8db/man/figures/logo.png -------------------------------------------------------------------------------- /man/perf_mod.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/perf_mod.R 3 | \name{perf_mod} 4 | \alias{perf_mod} 5 | \alias{perf_mod.rset} 6 | \alias{perf_mod.resamples} 7 | \alias{perf_mod.data.frame} 8 | \alias{perf_mod.tune_results} 9 | \alias{perf_mod.workflow_set} 10 | \title{Bayesian Analysis of Resampling Statistics} 11 | \usage{ 12 | perf_mod(object, ...) 13 | 14 | \method{perf_mod}{rset}(object, transform = no_trans, hetero_var = FALSE, formula = NULL, ...) 15 | 16 | \method{perf_mod}{resamples}( 17 | object, 18 | transform = no_trans, 19 | hetero_var = FALSE, 20 | metric = object$metrics[1], 21 | ... 22 | ) 23 | 24 | \method{perf_mod}{data.frame}(object, transform = no_trans, hetero_var = FALSE, formula = NULL, ...) 25 | 26 | \method{perf_mod}{tune_results}( 27 | object, 28 | metric = NULL, 29 | transform = no_trans, 30 | hetero_var = FALSE, 31 | formula = NULL, 32 | filter = NULL, 33 | ... 34 | ) 35 | 36 | \method{perf_mod}{workflow_set}( 37 | object, 38 | metric = NULL, 39 | transform = no_trans, 40 | hetero_var = FALSE, 41 | formula = NULL, 42 | ... 43 | ) 44 | } 45 | \arguments{ 46 | \item{object}{Depending on the context (see Details below): 47 | \itemize{ 48 | \item A data frame with \code{id} columns for the resampling groups and metric 49 | results in all of the other columns. 50 | \item An \code{rset} object (such as \code{\link[rsample:vfold_cv]{rsample::vfold_cv()}}) containing the \code{id} 51 | column(s) and at least two numeric columns of model performance 52 | statistics (e.g. accuracy). 53 | \item An object from \code{caret::resamples}. 54 | \item An object with class \code{tune_results}, which could be produced by 55 | \code{tune::tune_grid()}, \code{tune::tune_bayes()} or similar. 56 | \item A workflow set where all results contain the metric value given in the 57 | \code{metric} argument value. 58 | }} 59 | 60 | \item{...}{Additional arguments to pass to \code{\link[rstanarm:stan_glmer]{rstanarm::stan_glmer()}} such as 61 | \code{verbose}, \code{prior}, \code{seed}, \code{refresh}, \code{family}, etc.} 62 | 63 | \item{transform}{A named list of transformation and inverse 64 | transformation functions. See \code{\link[=logit_trans]{logit_trans()}} as an example.} 65 | 66 | \item{hetero_var}{A logical; if \code{TRUE}, then different 67 | variances are estimated for each model group. Otherwise, the 68 | same variance is used for each group. 
Estimating heterogeneous 69 | variances may slow or prevent convergence.} 70 | 71 | \item{formula}{An optional model formula to use for the Bayesian hierarchical model 72 | (see Details below).} 73 | 74 | \item{metric}{A single character value for the statistic from 75 | the \code{resamples} object that should be analyzed.} 76 | 77 | \item{filter}{A conditional logic statement that can be used to filter the 78 | statistics generated by \code{tune_results} using the tuning parameter values or 79 | the \code{.config} column.} 80 | } 81 | \value{ 82 | An object of class \code{perf_mod}. If a workflow set is given in 83 | \code{object}, there is an extra class of \code{"perf_mod_workflow_set"}. 84 | } 85 | \description{ 86 | Bayesian analysis is used here to answer the question: "when looking at 87 | resampling results, are the differences between models 'real?'" To answer 88 | this, a model can be created where the \emph{outcome} is the resampling statistics 89 | (e.g. accuracy or RMSE). These values are explained by the model types. In 90 | doing this, we can get parameter estimates for each model's effect on 91 | performance and make statistical (and practical) comparisons between models. 92 | } 93 | \details{ 94 | These functions can be used to process and analyze matched 95 | resampling statistics from different models using a Bayesian generalized 96 | linear model with effects for the model and the resamples. 97 | \subsection{Bayesian Model formula}{ 98 | 99 | By default, a generalized linear model with Gaussian error and an identity 100 | link is fit to the data and has terms for the predictive model grouping 101 | variable. In this way, the performance metrics can be compared between 102 | models. 103 | 104 | Additionally, random effect terms are also used. For most resampling 105 | methods (except repeated \emph{V}-fold cross-validation), a simple random 106 | intercept model is used with an exchangeable (i.e. compound-symmetric) 107 | variance structure. In the case of repeated cross-validation, two random 108 | intercept terms are used; one for the repeat and another for the fold within 109 | repeat. These also have exchangeable correlation structures. 110 | 111 | The above model specification assumes that the variance in the performance 112 | metrics is the same across models. However, this is unlikely to be true in 113 | some cases. For example, for simple binomial accuracy, it is well known that the 114 | variance is highest when the accuracy is near 50 percent. When the argument 115 | \code{hetero_var = TRUE}, the variance structure uses random intercepts for each 116 | model term. This may produce more realistic posterior distributions but may 117 | take more time to converge. 
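As a minimal, hedged sketch (the \code{resamples_df} name below is a
placeholder for any of the accepted input formats described later),
heterogeneous variances are requested via:

\preformatted{
# assumed: `resamples_df` holds matched resampling statistics
mod_hetero <- perf_mod(resamples_df, hetero_var = TRUE, seed = 101)
}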
118 | 119 | Examples of the default formulas are: 120 | 121 | \preformatted{ 122 | # One ID field and common variance: 123 | statistic ~ model + (model | id) 124 | 125 | # One ID field and heterogeneous variance: 126 | statistic ~ model + (model + 0 | id) 127 | 128 | # Repeated CV (id = repeat, id2 = fold within repeat) 129 | # with a common variance: 130 | statistic ~ model + (model | id2/id) 131 | 132 | # Repeated CV (id = repeat, id2 = fold within repeat) 133 | # with a heterogeneous variance: 134 | statistic ~ model + (model + 0 | id2/id) 135 | 136 | # Default for unknown resampling method and 137 | # multiple ID fields: 138 | statistic ~ model + (model | idN/../id) 139 | } 140 | 141 | Custom formulas should use \code{statistic} as the outcome variable and \code{model} 142 | as the factor variable with the model names. 143 | 144 | Also, as shown in the package vignettes, the Gaussian assumption may be 145 | unrealistic. In this case, there are at least two approaches that can be 146 | used. First, the outcome statistics can be transformed prior to fitting the 147 | model. For example, for accuracy, the logit transformation can be used to 148 | convert the outcome values to be on the real line and a model is fit to 149 | these data. Once the posterior distributions are computed, the inverse 150 | transformation can be used to put them back into the original units. The 151 | \code{transform} argument can be used to do this. 152 | 153 | The second approach would be to use a different error distribution from the 154 | exponential family. For RMSE values, the Gamma distribution may produce 155 | better results at the expense of model computational complexity. This can be 156 | achieved by passing the \code{family} argument to \code{perf_mod} as one might with 157 | the \code{glm} function. 158 | } 159 | 160 | \subsection{Input formats}{ 161 | 162 | There are several ways to give resampling results to the \code{perf_mod()} function. To 163 | illustrate, here are some example objects using 10-fold cross-validation for a 164 | simple two-class problem: 165 | 166 | \if{html}{\out{
}}\preformatted{ library(tidymodels) 167 | library(tidyposterior) 168 | library(workflowsets) 169 | 170 | data(two_class_dat, package = "modeldata") 171 | 172 | set.seed(100) 173 | folds <- vfold_cv(two_class_dat) 174 | }\if{html}{\out{
}} 175 | 176 | We can define two different models (for simplicity, with no tuning parameters). 177 | 178 | \if{html}{\out{
}}\preformatted{ logistic_reg_glm_spec <- 179 | logistic_reg() |> 180 | set_engine('glm') 181 | 182 | mars_earth_spec <- 183 | mars(prod_degree = 1) |> 184 | set_engine('earth') |> 185 | set_mode('classification') 186 | }\if{html}{\out{
}} 187 | 188 | For tidymodels, the \code{\link[tune:fit_resamples]{tune::fit_resamples()}} function can be used to estimate 189 | performance for each model/resample: 190 | 191 | \if{html}{\out{
}}\preformatted{ rs_ctrl <- control_resamples(save_workflow = TRUE) 192 | 193 | logistic_reg_glm_res <- 194 | logistic_reg_glm_spec |> 195 | fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl) 196 | 197 | mars_earth_res <- 198 | mars_earth_spec |> 199 | fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl) 200 | }\if{html}{\out{
}} 201 | 202 | From these, there are several ways to pass the results to \code{perf_mod()}. 203 | \subsection{Data Frame as Input}{ 204 | 205 | The most general approach is to have a data frame with the resampling labels (i.e., 206 | one or more id columns) as well as columns for each model that you would like to 207 | compare. 208 | 209 | For the model results above, \code{\link[tune:collect_predictions]{tune::collect_metrics()}} can be used along with some 210 | basic data manipulation steps: 211 | 212 | \if{html}{\out{
}}\preformatted{ logistic_roc <- 213 | collect_metrics(logistic_reg_glm_res, summarize = FALSE) |> 214 | dplyr::filter(.metric == "roc_auc") |> 215 | dplyr::select(id, logistic = .estimate) 216 | 217 | mars_roc <- 218 | collect_metrics(mars_earth_res, summarize = FALSE) |> 219 | dplyr::filter(.metric == "roc_auc") |> 220 | dplyr::select(id, mars = .estimate) 221 | 222 | resamples_df <- full_join(logistic_roc, mars_roc, by = "id") 223 | resamples_df 224 | }\if{html}{\out{
}} 225 | 226 | \if{html}{\out{
}}\preformatted{ ## # A tibble: 10 x 3 227 | ## id logistic mars 228 | ## <chr> <dbl> <dbl> 229 | ## 1 Fold01 0.908 0.875 230 | ## 2 Fold02 0.904 0.917 231 | ## 3 Fold03 0.924 0.938 232 | ## 4 Fold04 0.881 0.881 233 | ## 5 Fold05 0.863 0.864 234 | ## 6 Fold06 0.893 0.889 235 | ## # … with 4 more rows 236 | }\if{html}{\out{
}} 237 | 238 | We can then give this directly to \code{perf_mod()}: 239 | 240 | \if{html}{\out{
}}\preformatted{ set.seed(101) 241 | roc_model_via_df <- perf_mod(resamples_df, refresh = 0) 242 | tidy(roc_model_via_df) |> summary() 243 | }\if{html}{\out{
}} 244 | 245 | \if{html}{\out{
}}\preformatted{ ## # A tibble: 2 x 4 246 | ## model mean lower upper 247 | ## <chr> <dbl> <dbl> <dbl> 248 | ## 1 logistic 0.892 0.879 0.906 249 | ## 2 mars 0.888 0.875 0.902 250 | }\if{html}{\out{
}} 251 | } 252 | 253 | \subsection{rsample Object as Input}{ 254 | 255 | Alternatively, the result columns can be merged back into the original \code{rsample} 256 | object. The upside to using this method is that \code{perf_mod()} will know exactly 257 | which model formula to use for the Bayesian model: 258 | 259 | \if{html}{\out{
}}\preformatted{ resamples_rset <- 260 | full_join(folds, logistic_roc, by = "id") |> 261 | full_join(mars_roc, by = "id") 262 | 263 | set.seed(101) 264 | roc_model_via_rset <- perf_mod(resamples_rset, refresh = 0) 265 | tidy(roc_model_via_rset) |> summary() 266 | }\if{html}{\out{
}} 267 | 268 | \if{html}{\out{
}}\preformatted{ ## # A tibble: 2 x 4 269 | ## model mean lower upper 270 | ## <chr> <dbl> <dbl> <dbl> 271 | ## 1 logistic 0.892 0.879 0.906 272 | ## 2 mars 0.888 0.875 0.902 273 | }\if{html}{\out{
}} 274 | } 275 | 276 | \subsection{Workflow Set Object as Input}{ 277 | 278 | Finally, for tidymodels, a workflow set object can be used. This is a collection of 279 | models/preprocessing combinations in one object. We can emulate a workflow set using 280 | the existing example results and then pass that to \code{perf_mod()}: 281 | 282 | \if{html}{\out{
}}\preformatted{ example_wset <- 283 | as_workflow_set(logistic = logistic_reg_glm_res, mars = mars_earth_res) 284 | 285 | set.seed(101) 286 | roc_model_via_wflowset <- perf_mod(example_wset, refresh = 0) 287 | tidy(roc_model_via_wflowset) |> summary() 288 | }\if{html}{\out{
}} 289 | 290 | \if{html}{\out{
}}\preformatted{ ## # A tibble: 2 x 4 291 | ## model mean lower upper 292 | ## <chr> <dbl> <dbl> <dbl> 293 | ## 1 logistic 0.892 0.879 0.906 294 | ## 2 mars 0.888 0.875 0.902 295 | }\if{html}{\out{
}} 296 | } 297 | 298 | \subsection{caret resamples object}{ 299 | 300 | The \code{caret} package can also be used. An equivalent set of models is created: 301 | 302 | \if{html}{\out{
}}\preformatted{ library(caret) 303 | 304 | set.seed(102) 305 | logistic_caret <- train(Class ~ ., data = two_class_dat, method = "glm", 306 | trControl = trainControl(method = "cv")) 307 | 308 | set.seed(102) 309 | mars_caret <- train(Class ~ ., data = two_class_dat, method = "gcvEarth", 310 | tuneGrid = data.frame(degree = 1), 311 | trControl = trainControl(method = "cv")) 312 | }\if{html}{\out{
}} 313 | 314 | Note that these two models use the same resamples as one another due to setting the 315 | seed prior to calling \code{train()}. However, these are different from the tidymodels 316 | results used above (so the final results will be different). 317 | 318 | \code{caret} has a \code{resamples()} function that can collect and collate the resamples. 319 | This can also be given to \code{perf_mod()}: 320 | 321 | \if{html}{\out{
}}\preformatted{ caret_resamples <- resamples(list(logistic = logistic_caret, mars = mars_caret)) 322 | 323 | set.seed(101) 324 | roc_model_via_caret <- perf_mod(caret_resamples, refresh = 0) 325 | tidy(roc_model_via_caret) |> summary() 326 | }\if{html}{\out{
}} 327 | 328 | \if{html}{\out{
}}\preformatted{ ## # A tibble: 2 x 4 329 | ## model mean lower upper 330 | ## <chr> <dbl> <dbl> <dbl> 331 | ## 1 logistic 0.821 0.801 0.842 332 | ## 2 mars 0.822 0.802 0.842 333 | }\if{html}{\out{
}} 334 | } 335 | 336 | } 337 | } 338 | \references{ 339 | Kuhn and Silge (2021) \emph{Tidy Models with R}, Chapter 11, 340 | \url{https://www.tmwr.org/compare.html} 341 | } 342 | \seealso{ 343 | \code{\link[=tidy.perf_mod]{tidy.perf_mod()}}, \code{\link[=contrast_models]{contrast_models()}} 344 | } 345 | -------------------------------------------------------------------------------- /man/precise_example.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/data.R 3 | \docType{data} 4 | \name{precise_example} 5 | \alias{precise_example} 6 | \alias{noisy_example} 7 | \alias{concrete_example} 8 | \alias{ts_example} 9 | \alias{ex_object} 10 | \alias{posterior_samples} 11 | \alias{contrast_samples} 12 | \title{Example Data Sets} 13 | \value{ 14 | Tibbles with the additional class \code{rset} 15 | } 16 | \description{ 17 | Example Data Sets 18 | } 19 | \details{ 20 | Several data sets are contained in the package 21 | as examples. Each \emph{simulates} an \code{rset} object but the \code{splits} 22 | columns are not included to save space. 23 | 24 | \itemize{ 25 | \item{\code{precise_example} contains the results of the classification 26 | analysis of a real data set using 10-fold CV. The holdout data 27 | sets contained thousands of examples and have precise 28 | performance estimates. Three models were fit to the original 29 | data and several performance metrics are included.} 30 | 31 | \item{\code{noisy_example} was also generated from a regression data 32 | simulation. The original data set was small (50 samples) and 33 | 10 repeats of 10-fold CV were used with four models. There is 34 | an excessive amount of variability in the results (probably more than 35 | the resample-to-resample variability). The RMSE distributions 36 | are fairly right-skewed.} 37 | 38 | \item{\code{concrete_example} contains the results of the regression case 39 | study from the book \emph{Applied Predictive Modeling}. The original 40 | data set contained 745 samples in the training set. 10 repeats 41 | of 10-fold CV were also used and 13 models were fit to the data.} 42 | 43 | \item{\code{ts_example} is from a data set where rolling-origin forecast 44 | resampling was used. Each assessment set is the summary of 14 45 | observations (i.e. 2 weeks). The analysis set consisted of a 46 | base of about 5,500 samples plus the previous assessment sets. 47 | Four regression models were applied to these data.} 48 | 49 | \item{\code{ex_object} objects were generated from the \code{two_class_dat} data in 50 | the \code{modeldata} package. Basic 10-fold cross-validation was used to evaluate 51 | the models. 
The \code{posterior_samples} object contains samples of the posterior 52 | distribution of the model ROC values, while \code{contrast_samples} contains posterior 53 | probabilities from the differences in ROC values.} 54 | } 55 | } 56 | \examples{ 57 | data(precise_example) 58 | precise_example 59 | } 60 | \keyword{datasets} 61 | -------------------------------------------------------------------------------- /man/reexports.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/reexports.R 3 | \docType{import} 4 | \name{reexports} 5 | \alias{reexports} 6 | \alias{tidy} 7 | \alias{autoplot} 8 | \title{Objects exported from other packages} 9 | \keyword{internal} 10 | \description{ 11 | These objects are imported from other packages. Follow the links 12 | below to see their documentation. 13 | 14 | \describe{ 15 | \item{generics}{\code{\link[generics]{tidy}}} 16 | 17 | \item{ggplot2}{\code{\link[ggplot2]{autoplot}}} 18 | }} 19 | 20 | -------------------------------------------------------------------------------- /man/rmd/example-interfaces.Rmd: -------------------------------------------------------------------------------- 1 | ```{r} 2 | #| label: quiet-startup 3 | #| include: false 4 | library(tidymodels) 5 | library(tidyposterior) 6 | library(workflowsets) 7 | # library(knitr) 8 | 9 | options(digits = 4, width = 84) 10 | options(dplyr.print_min = 6, dplyr.print_max = 6) 11 | options(cli.width = 85) 12 | options(crayon.enabled = FALSE) 13 | # Inset the code and results with some extra space 14 | options(prompt = " ", continue = " ") 15 | knitr::opts_chunk$set(comment = " ##", prompt = TRUE) 16 | ``` 17 | 18 | ## Input formats 19 | 20 | There are several ways to give resampling results to the `perf_mod()` function. To illustrate, here are some example objects using 10-fold cross-validation for a simple two-class problem: 21 | 22 | ```{r} 23 | #| label: setup 24 | library(tidymodels) 25 | library(tidyposterior) 26 | library(workflowsets) 27 | 28 | data(two_class_dat, package = "modeldata") 29 | 30 | set.seed(100) 31 | folds <- vfold_cv(two_class_dat) 32 | ``` 33 | 34 | We can define two different models (for simplicity, with no tuning parameters). 35 | 36 | ```{r} 37 | #| label: model-specs 38 | logistic_reg_glm_spec <- 39 | logistic_reg() |> 40 | set_engine('glm') 41 | 42 | mars_earth_spec <- 43 | mars(prod_degree = 1) |> 44 | set_engine('earth') |> 45 | set_mode('classification') 46 | ``` 47 | 48 | For tidymodels, the [tune::fit_resamples()] function can be used to estimate performance for each model/resample: 49 | 50 | ```{r} 51 | #| label: tm-resamples 52 | rs_ctrl <- control_resamples(save_workflow = TRUE) 53 | 54 | logistic_reg_glm_res <- 55 | logistic_reg_glm_spec |> 56 | fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl) 57 | 58 | mars_earth_res <- 59 | mars_earth_spec |> 60 | fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl) 61 | ``` 62 | 63 | From these, there are several ways to pass the results to `perf_mod()`. 64 | 65 | ### Data Frame as Input 66 | 67 | The most general approach is to have a data frame with the resampling labels (i.e., one or more id columns) as well as columns for each model that you would like to compare. 
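In schematic form, the expected layout is one row per resample with an id column and one numeric column per model; the names and values below are illustrative placeholders only:

```{r}
#| label: df-shape-sketch
#| eval: false
# Hypothetical layout (not real results): `id` labels the resample and
# each remaining column holds one model's performance statistic.
tibble::tribble(
  ~id,      ~model_a, ~model_b,
  "Fold01", 0.90,     0.88,
  "Fold02", 0.92,     0.91
)
```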
68 | 69 | For the model results above, [tune::collect_metrics()] can be used along with some basic data manipulation steps: 70 | 71 | ```{r} 72 | #| label: df-results 73 | logistic_roc <- 74 | collect_metrics(logistic_reg_glm_res, summarize = FALSE) |> 75 | dplyr::filter(.metric == "roc_auc") |> 76 | dplyr::select(id, logistic = .estimate) 77 | 78 | mars_roc <- 79 | collect_metrics(mars_earth_res, summarize = FALSE) |> 80 | dplyr::filter(.metric == "roc_auc") |> 81 | dplyr::select(id, mars = .estimate) 82 | 83 | resamples_df <- full_join(logistic_roc, mars_roc, by = "id") 84 | resamples_df 85 | ``` 86 | 87 | We can then give this directly to `perf_mod()`: 88 | 89 | ```{r} 90 | #| label: df-mod 91 | set.seed(101) 92 | roc_model_via_df <- perf_mod(resamples_df, refresh = 0) 93 | tidy(roc_model_via_df) |> summary() 94 | ``` 95 | 96 | ### rsample Object as Input 97 | 98 | Alternatively, the result columns can be merged back into the original `rsample` object. The upside to using this method is that `perf_mod()` will know exactly which model formula to use for the Bayesian model: 99 | 100 | ```{r} 101 | #| label: rset-results 102 | resamples_rset <- 103 | full_join(folds, logistic_roc, by = "id") |> 104 | full_join(mars_roc, by = "id") 105 | 106 | set.seed(101) 107 | roc_model_via_rset <- perf_mod(resamples_rset, refresh = 0) 108 | tidy(roc_model_via_rset) |> summary() 109 | ``` 110 | 111 | ### Workflow Set Object as Input 112 | 113 | Finally, for tidymodels, a workflow set object can be used. This is a collection of models/preprocessing combinations in one object. We can emulate a workflow set using the existing example results and then pass that to `perf_mod()`: 114 | 115 | ```{r} 116 | #| label: wflowset-results 117 | example_wset <- 118 | as_workflow_set(logistic = logistic_reg_glm_res, mars = mars_earth_res) 119 | 120 | set.seed(101) 121 | roc_model_via_wflowset <- perf_mod(example_wset, refresh = 0) 122 | tidy(roc_model_via_wflowset) |> summary() 123 | ``` 124 | 125 | ### caret resamples object 126 | 127 | The `caret` package can also be used. An equivalent set of models is created: 128 | 129 | ```{r} 130 | #| label: quiet-caret 131 | #| include: false 132 | library(caret) 133 | ``` 134 | ```{r} 135 | #| label: caret 136 | library(caret) 137 | 138 | set.seed(102) 139 | logistic_caret <- train(Class ~ ., data = two_class_dat, method = "glm", 140 | trControl = trainControl(method = "cv")) 141 | 142 | set.seed(102) 143 | mars_caret <- train(Class ~ ., data = two_class_dat, method = "gcvEarth", 144 | tuneGrid = data.frame(degree = 1), 145 | trControl = trainControl(method = "cv")) 146 | ``` 147 | 148 | Note that these two models use the same resamples as one another due to setting the seed prior to calling `train()`. However, these are different from the tidymodels results used above (so the final results will be different). 149 | 150 | `caret` has a `resamples()` function that can collect and collate the resamples. 
This can also be given to `perf_mod()`: 151 | 152 | ```{r} 153 | #| label: caret-resamples 154 | caret_resamples <- resamples(list(logistic = logistic_caret, mars = mars_caret)) 155 | 156 | set.seed(101) 157 | roc_model_via_caret <- perf_mod(caret_resamples, refresh = 0) 158 | tidy(roc_model_via_caret) |> summary() 159 | ``` 160 | -------------------------------------------------------------------------------- /man/rmd/example-interfaces.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | #' ## Input formats 4 | #' 5 | #' There are several ways to give resampling results to the `perf_mod()` function. To #' illustrate, here are some example objects using 10-fold cross-validation for a #' simple two-class problem: 6 | #' 7 | #' 8 | #' ```r 9 | #' library(tidymodels) 10 | #' library(tidyposterior) 11 | #' library(workflowsets) 12 | #' 13 | #' data(two_class_dat, package = "modeldata") 14 | #' 15 | #' set.seed(100) 16 | #' folds <- vfold_cv(two_class_dat) 17 | #' ``` 18 | #' 19 | #' We can define two different models (for simplicity, with no tuning parameters). 20 | #' 21 | #' 22 | #' ```r 23 | #' logistic_reg_glm_spec <- 24 | #' logistic_reg() |> 25 | #' set_engine('glm') 26 | #' 27 | #' mars_earth_spec <- 28 | #' mars(prod_degree = 1) |> 29 | #' set_engine('earth') |> 30 | #' set_mode('classification') 31 | #' ``` 32 | #' 33 | #' For tidymodels, the [tune::fit_resamples()] function can be used to estimate #' performance for each model/resample: 34 | #' 35 | #' 36 | #' ```r 37 | #' rs_ctrl <- control_resamples(save_workflow = TRUE) 38 | #' 39 | #' logistic_reg_glm_res <- 40 | #' logistic_reg_glm_spec |> 41 | #' fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl) 42 | #' 43 | #' mars_earth_res <- 44 | #' mars_earth_spec |> 45 | #' fit_resamples(Class ~ ., resamples = folds, control = rs_ctrl) 46 | #' ``` 47 | #' 48 | #' From these, there are several ways to pass the results to `perf_mod()`. 49 | #' 50 | #' ### Data Frame as Input 51 | #' 52 | #' The most general approach is to have a data frame with the resampling labels (i.e., #' one or more id columns) as well as columns for each model that you would like to #' compare. 
53 | #' 54 | #' For the model results above, [tune::collect_metrics()] can be used along with some #' basic data manipulation steps: 55 | #' 56 | #' 57 | #' ```r 58 | #' logistic_roc <- 59 | #' collect_metrics(logistic_reg_glm_res, summarize = FALSE) |> 60 | #' dplyr::filter(.metric == "roc_auc") |> 61 | #' dplyr::select(id, logistic = .estimate) 62 | #' 63 | #' mars_roc <- 64 | #' collect_metrics(mars_earth_res, summarize = FALSE) |> 65 | #' dplyr::filter(.metric == "roc_auc") |> 66 | #' dplyr::select(id, mars = .estimate) 67 | #' 68 | #' resamples_df <- full_join(logistic_roc, mars_roc, by = "id") 69 | #' resamples_df 70 | #' ``` 71 | #' 72 | #' ``` 73 | #' ## # A tibble: 10 x 3 74 | #' ## id logistic mars 75 | #' ## <chr> <dbl> <dbl> 76 | #' ## 1 Fold01 0.908 0.875 77 | #' ## 2 Fold02 0.904 0.917 78 | #' ## 3 Fold03 0.924 0.938 79 | #' ## 4 Fold04 0.881 0.881 80 | #' ## 5 Fold05 0.863 0.864 81 | #' ## 6 Fold06 0.893 0.889 82 | #' ## # … with 4 more rows 83 | #' ``` 84 | #' 85 | #' We can then give this directly to `perf_mod()`: 86 | #' 87 | #' 88 | #' ```r 89 | #' set.seed(101) 90 | #' roc_model_via_df <- perf_mod(resamples_df, refresh = 0) 91 | #' tidy(roc_model_via_df) |> summary() 92 | #' ``` 93 | #' 94 | #' ``` 95 | #' ## # A tibble: 2 x 4 96 | #' ## model mean lower upper 97 | #' ## <chr> <dbl> <dbl> <dbl> 98 | #' ## 1 logistic 0.892 0.879 0.906 99 | #' ## 2 mars 0.888 0.875 0.902 100 | #' ``` 101 | #' 102 | #' ### rsample Object as Input 103 | #' 104 | #' Alternatively, the result columns can be merged back into the original `rsample` #' object. The upside to using this method is that `perf_mod()` will know exactly #' which model formula to use for the Bayesian model: 105 | #' 106 | #' 107 | #' ```r 108 | #' resamples_rset <- 109 | #' full_join(folds, logistic_roc, by = "id") |> 110 | #' full_join(mars_roc, by = "id") 111 | #' 112 | #' set.seed(101) 113 | #' roc_model_via_rset <- perf_mod(resamples_rset, refresh = 0) 114 | #' tidy(roc_model_via_rset) |> summary() 115 | #' ``` 116 | #' 117 | #' ``` 118 | #' ## # A tibble: 2 x 4 119 | #' ## model mean lower upper 120 | #' ## <chr> <dbl> <dbl> <dbl> 121 | #' ## 1 logistic 0.892 0.879 0.906 122 | #' ## 2 mars 0.888 0.875 0.902 123 | #' ``` 124 | #' 125 | #' ### Workflow Set Object as Input 126 | #' 127 | #' Finally, for tidymodels, a workflow set object can be used. This is a collection of #' models/preprocessing combinations in one object. We can emulate a workflow set using #' the existing example results and then pass that to `perf_mod()`: 128 | #' 129 | #' 130 | #' ```r 131 | #' example_wset <- 132 | #' as_workflow_set(logistic = logistic_reg_glm_res, mars = mars_earth_res) 133 | #' 134 | #' set.seed(101) 135 | #' roc_model_via_wflowset <- perf_mod(example_wset, refresh = 0) 136 | #' tidy(roc_model_via_wflowset) |> summary() 137 | #' ``` 138 | #' 139 | #' ``` 140 | #' ## # A tibble: 2 x 4 141 | #' ## model mean lower upper 142 | #' ## <chr> <dbl> <dbl> <dbl> 143 | #' ## 1 logistic 0.892 0.879 0.906 144 | #' ## 2 mars 0.888 0.875 0.902 145 | #' ``` 146 | #' 147 | #' ### caret resamples object 148 | #' 149 | #' The `caret` package can also be used. 
An equivalent set of models is created: 150 | #' 151 | #' 152 | #' 153 | #' ```r 154 | #' library(caret) 155 | #' 156 | #' set.seed(102) 157 | #' logistic_caret <- train(Class ~ ., data = two_class_dat, method = "glm", 158 | #' trControl = trainControl(method = "cv")) 159 | #' 160 | #' set.seed(102) 161 | #' mars_caret <- train(Class ~ ., data = two_class_dat, method = "gcvEarth", 162 | #' tuneGrid = data.frame(degree = 1), 163 | #' trControl = trainControl(method = "cv")) 164 | #' ``` 165 | #' 166 | #' Note that these two models use the same resamples as one another due to setting the #' seed prior to calling `train()`. However, these are different from the tidymodels #' results used above (so the final results will be different). 167 | #' 168 | #' `caret` has a `resamples()` function that can collect and collate the resamples. #' This can also be given to `perf_mod()`: 169 | #' 170 | #' 171 | #' ```r 172 | #' caret_resamples <- resamples(list(logistic = logistic_caret, mars = mars_caret)) 173 | #' 174 | #' set.seed(101) 175 | #' roc_model_via_caret <- perf_mod(caret_resamples, refresh = 0) 176 | #' tidy(roc_model_via_caret) |> summary() 177 | #' ``` 178 | #' 179 | #' ``` 180 | #' ## # A tibble: 2 x 4 181 | #' ## model mean lower upper 182 | #' ## <chr> <dbl> <dbl> <dbl> 183 | #' ## 1 logistic 0.821 0.801 0.842 184 | #' ## 2 mars 0.822 0.802 0.842 185 | #' ``` 186 | #' 187 | -------------------------------------------------------------------------------- /man/summary.posterior.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/posteriors.R 3 | \name{summary.posterior} 4 | \alias{summary.posterior} 5 | \title{Summarize the Posterior Distributions of Model Statistics} 6 | \usage{ 7 | \method{summary}{posterior}(object, prob = 0.9, seed = sample.int(10000, 1), ...) 8 | } 9 | \arguments{ 10 | \item{object}{An object produced by \code{\link[=tidy.perf_mod]{tidy.perf_mod()}}.} 11 | 12 | \item{prob}{A number p (0 < p < 1) indicating the desired 13 | probability mass to include in the intervals.} 14 | 15 | \item{seed}{A single integer for sampling from the posterior.} 16 | 17 | \item{...}{Not currently used} 18 | } 19 | \value{ 20 | A data frame with summary statistics and a row for 21 | each model. 22 | } 23 | \description{ 24 | Numerical summaries are created for each model including the 25 | posterior mean and upper and lower credible intervals (aka 26 | uncertainty intervals). 27 | } 28 | \examples{ 29 | data("ex_objects") 30 | 31 | summary(posterior_samples) 32 | } 33 | -------------------------------------------------------------------------------- /man/summary.posterior_diff.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/contrasts.R 3 | \name{summary.posterior_diff} 4 | \alias{summary.posterior_diff} 5 | \title{Summarize Posterior Distributions of Model Differences} 6 | \usage{ 7 | \method{summary}{posterior_diff}(object, prob = 0.9, size = 0, ...) 8 | } 9 | \arguments{ 10 | \item{object}{An object produced by \code{\link[=contrast_models]{contrast_models()}}.} 11 | 12 | \item{prob}{A number p (0 < p < 1) indicating the desired 13 | probability mass to include in the intervals.} 14 | 15 | \item{size}{The size of an effective difference in the units of the chosen 16 | metric. 
For example, a 5 percent increase in accuracy (\code{size = 0.05}) 17 | between two models might be considered a "real" difference.} 18 | 19 | \item{...}{Not currently used} 20 | } 21 | \value{ 22 | A data frame with interval and ROPE statistics for each 23 | comparison. 24 | } 25 | \description{ 26 | Credible intervals are created for the differences. Also, 27 | region of practical equivalence (ROPE) statistics are computed 28 | when the effective size of a difference is given. 29 | } 30 | \details{ 31 | The ROPE estimates included in the results are the 32 | columns \code{pract_neg}, \code{pract_equiv}, and \code{pract_pos}. \code{pract_neg} 33 | integrates the portion of the posterior below \code{-size} (and 34 | \code{pract_pos} is the upper integral starting at \code{size}). The 35 | interpretation depends on whether the metric being analyzed is 36 | better when larger or smaller. \code{pract_equiv} integrates between 37 | \verb{[-size, size]}. If this is close to one, the two models are 38 | unlikely to be practically different relative to \code{size}. 39 | } 40 | \examples{ 41 | data("ex_objects") 42 | 43 | summary(contrast_samples) 44 | summary(contrast_samples, size = 0.025) 45 | } 46 | -------------------------------------------------------------------------------- /man/tidy.perf_mod.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/posteriors.R 3 | \name{tidy.perf_mod} 4 | \alias{tidy.perf_mod} 5 | \title{Extract Posterior Distributions for Models} 6 | \usage{ 7 | \method{tidy}{perf_mod}(x, seed = sample.int(10000, 1), ...) 8 | } 9 | \arguments{ 10 | \item{x}{An object from \code{\link[=perf_mod]{perf_mod()}}} 11 | 12 | \item{seed}{A single integer for sampling from the posterior.} 13 | 14 | \item{...}{Not currently used} 15 | } 16 | \value{ 17 | A data frame with the additional class \code{"posterior"} 18 | } 19 | \description{ 20 | \code{tidy} can be used on an object produced by \code{\link[=perf_mod]{perf_mod()}} 21 | to create a data frame with a column for the model name and 22 | the posterior predictive distribution values. 23 | } 24 | \details{ 25 | Note that this posterior only reflects the variability 26 | of the groups (i.e. the fixed effects). This helps answer the 27 | question of which model is best \emph{for this data set}. It does not 28 | answer the question of which model would be best on a new 29 | resample of the data (which would have greater variability). 30 | } 31 | -------------------------------------------------------------------------------- /man/tidyposterior-package.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/tidyposterior-package.R 3 | \docType{package} 4 | \name{tidyposterior-package} 5 | \alias{tidyposterior} 6 | \alias{tidyposterior-package} 7 | \title{tidyposterior: Bayesian Analysis to Compare Models using Resampling Statistics} 8 | \description{ 9 | \if{html}{\figure{logo.png}{options: style='float: right' alt='logo' width='120'}} 10 | 11 | Bayesian analysis is used here to answer the question: "when looking at resampling results, are the differences between models 'real'?" To answer this, a model can be created where the performance statistic is the resampling statistics (e.g. accuracy or RMSE). These values are explained by the model types. 
In doing this, we can get parameter estimates for each model's effect on performance and make statistical (and practical) comparisons between models. The methods included here are similar to Benavoli et al (2017) \url{https://jmlr.org/papers/v18/16-305.html}. 12 | } 13 | \seealso{ 14 | Useful links: 15 | \itemize{ 16 | \item \url{https://tidyposterior.tidymodels.org} 17 | \item \url{https://github.com/tidymodels/tidyposterior} 18 | \item Report bugs at \url{https://github.com/tidymodels/tidyposterior/issues} 19 | } 20 | 21 | } 22 | \author{ 23 | \strong{Maintainer}: Max Kuhn \email{max@posit.co} (\href{https://orcid.org/0000-0003-2402-136X}{ORCID}) 24 | 25 | Other contributors: 26 | \itemize{ 27 | \item Posit Software, PBC (03wc8by49) [copyright holder, funder] 28 | } 29 | 30 | } 31 | \keyword{internal} 32 | -------------------------------------------------------------------------------- /man/transformations.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/transformations.R 3 | \docType{data} 4 | \name{no_trans} 5 | \alias{no_trans} 6 | \alias{logit_trans} 7 | \alias{Fisher_trans} 8 | \alias{ln_trans} 9 | \alias{inv_trans} 10 | \title{Simple Transformation Functions} 11 | \format{ 12 | An object of class \code{list} of length 2. 13 | 14 | An object of class \code{list} of length 2. 15 | 16 | An object of class \code{list} of length 2. 17 | 18 | An object of class \code{list} of length 2. 19 | 20 | An object of class \code{list} of length 2. 21 | } 22 | \usage{ 23 | no_trans 24 | 25 | logit_trans 26 | 27 | Fisher_trans 28 | 29 | ln_trans 30 | 31 | inv_trans 32 | } 33 | \description{ 34 | A set of objects are contained here to easily facilitate the 35 | use of outcome transformations for modeling. For example, if 36 | there is a large amount of variability in the resampling results 37 | for the Kappa statistics, which lies between -1 and 1, assuming 38 | normality may produce posterior estimates outside of the natural 39 | bound. One way to solve this is to use a link function or assume 40 | a prior that is appropriately bounded. Another approach is to 41 | transform the outcome values prior to modeling using a Gaussian 42 | prior and reverse-transforming the posterior estimates prior to 43 | visualization and summarization. These objects can help 44 | facilitate this last approach. 45 | } 46 | \details{ 47 | The \code{logit_trans} object is useful for model 48 | performance statistics bounded in zero and one, such as accuracy 49 | or the area under the ROC curve. 50 | 51 | \code{ln_trans} and \code{inv_trans} can be useful when the statistics 52 | are right-skewed and strictly positive. 53 | 54 | \code{Fisher_trans} was originally used for correlation statistics 55 | but can be used here for any metrics falling between -1 and 1, 56 | such as Kappa. 
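As a small, hedged sketch (the input object name is a placeholder, not part
of the package examples), a transformation is supplied through the
\code{transform} argument of \code{\link[=perf_mod]{perf_mod()}}, and the
posterior estimates are inverse-transformed before summarization:

\preformatted{
# assumed: `resamples_df` holds matched Kappa statistics
perf_mod(resamples_df, transform = Fisher_trans, seed = 101)
}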
57 | } 58 | \examples{ 59 | logit_trans$func(.5) 60 | logit_trans$inv(0) 61 | } 62 | \keyword{datasets} 63 | -------------------------------------------------------------------------------- /man/vctrs_methods_posterior.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/posteriors-compat.R 3 | \name{vec_restore.posterior} 4 | \alias{vec_restore.posterior} 5 | \alias{vec_proxy.posterior} 6 | \alias{vec_ptype2.posterior.posterior} 7 | \alias{vec_ptype2.posterior.tbl_df} 8 | \alias{vec_ptype2.tbl_df.posterior} 9 | \alias{vec_ptype2.posterior.data.frame} 10 | \alias{vec_ptype2.data.frame.posterior} 11 | \alias{vec_cast.posterior.posterior} 12 | \alias{vec_cast.posterior.tbl_df} 13 | \alias{vec_cast.tbl_df.posterior} 14 | \alias{vec_cast.posterior.data.frame} 15 | \alias{vec_cast.data.frame.posterior} 16 | \title{Extra methods for the posterior class to work with dplyr verbs} 17 | \usage{ 18 | vec_restore.posterior(x, to, ...) 19 | 20 | vec_proxy.posterior(x, ...) 21 | 22 | vec_ptype2.posterior.posterior(x, y, ..., x_arg = "", y_arg = "") 23 | 24 | vec_ptype2.posterior.tbl_df(x, y, ..., x_arg = "", y_arg = "") 25 | 26 | vec_ptype2.tbl_df.posterior(x, y, ..., x_arg = "", y_arg = "") 27 | 28 | vec_ptype2.posterior.data.frame(x, y, ..., x_arg = "", y_arg = "") 29 | 30 | vec_ptype2.data.frame.posterior(x, y, ..., x_arg = "", y_arg = "") 31 | 32 | vec_cast.posterior.posterior(x, to, ..., x_arg = "", to_arg = "") 33 | 34 | vec_cast.posterior.tbl_df(x, to, ..., x_arg = "", to_arg = "") 35 | 36 | vec_cast.tbl_df.posterior(x, to, ..., x_arg = "", to_arg = "") 37 | 38 | vec_cast.posterior.data.frame(x, to, ..., x_arg = "", to_arg = "") 39 | 40 | vec_cast.data.frame.posterior(x, to, ..., x_arg = "", to_arg = "") 41 | } 42 | \description{ 43 | Objects with class \code{posterior} are defined to be tibbles with required 44 | columns \code{model} (character) and \code{posterior} (numeric). If operations on these 45 | objects break those rules, they are down-cast to basic tibbles. 46 | } 47 | \keyword{internal} 48 | -------------------------------------------------------------------------------- /man/vctrs_methods_posterior_diff.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/posterior_diff-compat.R 3 | \name{vec_restore.posterior_diff} 4 | \alias{vec_restore.posterior_diff} 5 | \alias{vec_proxy.posterior_diff} 6 | \alias{vec_ptype2.posterior_diff.posterior_diff} 7 | \alias{vec_ptype2.posterior_diff.tbl_df} 8 | \alias{vec_ptype2.tbl_df.posterior_diff} 9 | \alias{vec_ptype2.posterior_diff.data.frame} 10 | \alias{vec_ptype2.data.frame.posterior_diff} 11 | \alias{vec_cast.posterior_diff.posterior_diff} 12 | \alias{vec_cast.posterior_diff.tbl_df} 13 | \alias{vec_cast.tbl_df.posterior_diff} 14 | \alias{vec_cast.posterior_diff.data.frame} 15 | \alias{vec_cast.data.frame.posterior_diff} 16 | \title{Extra methods for the \code{posterior_diff} class to work with dplyr verbs} 17 | \usage{ 18 | vec_restore.posterior_diff(x, to, ...) 19 | 20 | vec_proxy.posterior_diff(x, ...) 
21 | 22 | vec_ptype2.posterior_diff.posterior_diff(x, y, ..., x_arg = "", y_arg = "") 23 | 24 | vec_ptype2.posterior_diff.tbl_df(x, y, ..., x_arg = "", y_arg = "") 25 | 26 | vec_ptype2.tbl_df.posterior_diff(x, y, ..., x_arg = "", y_arg = "") 27 | 28 | vec_ptype2.posterior_diff.data.frame(x, y, ..., x_arg = "", y_arg = "") 29 | 30 | vec_ptype2.data.frame.posterior_diff(x, y, ..., x_arg = "", y_arg = "") 31 | 32 | vec_cast.posterior_diff.posterior_diff(x, to, ..., x_arg = "", to_arg = "") 33 | 34 | vec_cast.posterior_diff.tbl_df(x, to, ..., x_arg = "", to_arg = "") 35 | 36 | vec_cast.tbl_df.posterior_diff(x, to, ..., x_arg = "", to_arg = "") 37 | 38 | vec_cast.posterior_diff.data.frame(x, to, ..., x_arg = "", to_arg = "") 39 | 40 | vec_cast.data.frame.posterior_diff(x, to, ..., x_arg = "", to_arg = "") 41 | } 42 | \description{ 43 | Objects with class \code{posterior_diff} are defined to be tibbles with required 44 | columns \code{difference} (numeric) and character columns \code{model_1}, \code{model_2}, 45 | and \code{contrast}. If operations on these objects break those rules, they are 46 | down-cast to basic tibbles. 47 | } 48 | \keyword{internal} 49 | -------------------------------------------------------------------------------- /revdep/README.md: -------------------------------------------------------------------------------- 1 | # Platform 2 | 3 | |field |value | 4 | |:--------|:----------------------------| 5 | |version |R version 4.0.0 (2020-04-24) | 6 | |os |macOS Mojave 10.14.6 | 7 | |system |x86_64, darwin17.0 | 8 | |ui |RStudio | 9 | |language |(EN) | 10 | |collate |en_US.UTF-8 | 11 | |ctype |en_US.UTF-8 | 12 | |tz |America/New_York | 13 | |date |2020-06-03 | 14 | 15 | # Dependencies 16 | 17 | |package |old |new |Δ | 18 | |:-------------|:---------|:----------|:--| 19 | |tidyposterior |0.0.2 |0.0.2.9000 |* | 20 | |askpass |1.1 |1.1 | | 21 | |assertthat |0.2.1 |0.2.1 | | 22 | |backports |1.1.7 |1.1.7 | | 23 | |base64enc |0.1-3 |0.1-3 | | 24 | |bayesplot |1.7.2 |1.7.2 | | 25 | |BH |1.72.0-3 |1.72.0-3 | | 26 | |callr |3.4.3 |3.4.3 | | 27 | |checkmate |2.0.0 |2.0.0 | | 28 | |cli |2.0.2 |2.0.2 | | 29 | |colorspace |1.4-1 |1.4-1 | | 30 | |colourpicker |1.0 |1.0 | | 31 | |crayon |1.3.4 |1.3.4 | | 32 | |crosstalk |1.1.0.1 |1.1.0.1 | | 33 | |curl |4.3 |4.3 | | 34 | |desc |1.2.0 |1.2.0 | | 35 | |digest |0.6.25 |0.6.25 | | 36 | |dplyr |1.0.0 |1.0.0 | | 37 | |DT |0.13 |0.13 | | 38 | |dygraphs |1.1.1.6 |1.1.1.6 | | 39 | |ellipsis |0.3.1 |0.3.1 | | 40 | |evaluate |0.14 |0.14 | | 41 | |fansi |0.4.1 |0.4.1 | | 42 | |farver |2.0.3 |2.0.3 | | 43 | |fastmap |1.0.1 |1.0.1 | | 44 | |furrr |0.1.0 |0.1.0 | | 45 | |future |1.17.0 |1.17.0 | | 46 | |generics |0.0.2 |0.0.2 | | 47 | |ggplot2 |3.3.1 |3.3.1 | | 48 | |ggridges |0.5.2 |0.5.2 | | 49 | |globals |0.12.5 |0.12.5 | | 50 | |glue |1.4.1 |1.4.1 | | 51 | |gridExtra |2.3 |2.3 | | 52 | |gtable |0.3.0 |0.3.0 | | 53 | |gtools |3.8.2 |3.8.2 | | 54 | |htmltools |0.4.0 |0.4.0 | | 55 | |htmlwidgets |1.5.1 |1.5.1 | | 56 | |httpuv |1.5.3.1 |1.5.3.1 | | 57 | |igraph |1.2.5 |1.2.5 | | 58 | |inline |0.3.15 |0.3.15 | | 59 | |isoband |0.2.1 |0.2.1 | | 60 | |jsonlite |1.6.1 |1.6.1 | | 61 | |labeling |0.3 |0.3 | | 62 | |later |1.0.0 |1.0.0 | | 63 | |lazyeval |0.2.2 |0.2.2 | | 64 | |lifecycle |0.2.0 |0.2.0 | | 65 | |listenv |0.8.0 |0.8.0 | | 66 | |lme4 |1.1-23 |1.1-23 | | 67 | |loo |2.2.0 |2.2.0 | | 68 | |magrittr |1.5 |1.5 | | 69 | |markdown |1.1 |1.1 | | 70 | |matrixStats |0.56.0 |0.56.0 | | 71 | |mime |0.9 |0.9 | | 72 | |miniUI |0.1.1.1 |0.1.1.1 | | 73 | |minqa 
|1.2.4 |1.2.4 | | 74 | |munsell |0.5.0 |0.5.0 | | 75 | |nloptr |1.2.2.1 |1.2.2.1 | | 76 | |openssl |1.4.1 |1.4.1 | | 77 | |packrat |0.5.0 |0.5.0 | | 78 | |pillar |1.4.4 |1.4.4 | | 79 | |pkgbuild |1.0.8 |1.0.8 | | 80 | |pkgconfig |2.0.3 |2.0.3 | | 81 | |pkgload |1.1.0 |1.1.0 | | 82 | |plyr |1.8.6 |1.8.6 | | 83 | |praise |1.0.0 |1.0.0 | | 84 | |prettyunits |1.1.1 |1.1.1 | | 85 | |processx |3.4.2 |3.4.2 | | 86 | |promises |1.1.0 |1.1.0 | | 87 | |ps |1.3.3 |1.3.3 | | 88 | |purrr |0.3.4 |0.3.4 | | 89 | |R6 |2.4.1 |2.4.1 | | 90 | |RColorBrewer |1.1-2 |1.1-2 | | 91 | |Rcpp |1.0.4.6 |1.0.4.6 | | 92 | |RcppEigen |0.3.3.7.0 |0.3.3.7.0 | | 93 | |RcppParallel |5.0.1 |5.0.1 | | 94 | |reshape2 |1.4.4 |1.4.4 | | 95 | |rlang |0.4.6 |0.4.6 | | 96 | |rprojroot |1.3-2 |1.3-2 | | 97 | |rsample |0.0.6 |0.0.6 | | 98 | |rsconnect |0.8.16 |0.8.16 | | 99 | |rstan |2.19.3 |2.19.3 | | 100 | |rstanarm |2.19.3 |2.19.3 | | 101 | |rstantools |2.1.0 |2.1.0 | | 102 | |rstudioapi |0.11 |0.11 | | 103 | |scales |1.1.1 |1.1.1 | | 104 | |shiny |1.4.0.2 |1.4.0.2 | | 105 | |shinyjs |1.1 |1.1 | | 106 | |shinystan |2.5.0 |2.5.0 | | 107 | |shinythemes |1.1.2 |1.1.2 | | 108 | |sourcetools |0.1.7 |0.1.7 | | 109 | |StanHeaders |2.21.0-3 |2.21.0-3 | | 110 | |statmod |1.4.34 |1.4.34 | | 111 | |stringi |1.4.6 |1.4.6 | | 112 | |stringr |1.4.0 |1.4.0 | | 113 | |sys |3.3 |3.3 | | 114 | |testthat |2.3.2 |2.3.2 | | 115 | |threejs |0.3.3 |0.3.3 | | 116 | |tibble |3.0.1 |3.0.1 | | 117 | |tidyr |1.1.0 |1.1.0 | | 118 | |tidyselect |1.1.0 |1.1.0 | | 119 | |utf8 |1.1.4 |1.1.4 | | 120 | |vctrs |0.3.0 |0.3.0 | | 121 | |viridisLite |0.3.0 |0.3.0 | | 122 | |withr |2.2.0 |2.2.0 | | 123 | |xfun |0.14 |0.14 | | 124 | |xtable |1.8-4 |1.8-4 | | 125 | |xts |0.12-0 |0.12-0 | | 126 | |yaml |2.2.1 |2.2.1 | | 127 | |zoo |1.8-8 |1.8-8 | | 128 | 129 | # Revdeps 130 | 131 | -------------------------------------------------------------------------------- /revdep/failures.md: -------------------------------------------------------------------------------- 1 | *Wow, no problems at all. :)* -------------------------------------------------------------------------------- /revdep/problems.md: -------------------------------------------------------------------------------- 1 | *Wow, no problems at all. :)* -------------------------------------------------------------------------------- /tests/testthat.R: -------------------------------------------------------------------------------- 1 | library(testthat) 2 | library(tidyposterior) 3 | 4 | test_check("tidyposterior") 5 | -------------------------------------------------------------------------------- /tests/testthat/_snaps/contrast.md: -------------------------------------------------------------------------------- 1 | # bad args 2 | 3 | Code 4 | contrast_models(fit_bt, "one", c("two", "three")) 5 | Condition 6 | Error in `contrast_models()`: 7 | ! `list_1` and `list_2` should be the same length. 8 | 9 | -------------------------------------------------------------------------------- /tests/testthat/_snaps/perf_mod.md: -------------------------------------------------------------------------------- 1 | # bad arguments 2 | 3 | Code 4 | perf_mod(test_bt, transform = NULL) 5 | Condition 6 | Error in `check_trans()`: 7 | ! `transform` should have two functions: 'func' and 'inv' 8 | 9 | --- 10 | 11 | Code 12 | perf_mod(test_bt, transform = no_trans[1]) 13 | Condition 14 | Error in `check_trans()`: 15 | ! 
`transform` should have two functions: 'func' and 'inv' 16 | 17 | --- 18 | 19 | Code 20 | perf_mod(test_bt, transform = list(not = 1, right = 2)) 21 | Condition 22 | Error in `check_trans()`: 23 | ! `transform` should have two functions: 'func' and 'inv' 24 | 25 | --- 26 | 27 | Code 28 | perf_mod(test_bt, transform = list(func = 1, inc = 2)) 29 | Condition 30 | Error in `check_trans()`: 31 | ! `transform` should have two functions: 'func' and 'inv' 32 | 33 | --- 34 | 35 | Code 36 | perf_mod(1:10) 37 | Condition 38 | Error in `error_cnd()`: 39 | ! Conditions must have named data fields 40 | 41 | # basic usage 42 | 43 | Code 44 | print(obj_1) 45 | Output 46 | Bayesian Analysis of Resampling Results 47 | Original data: Bootstrap sampling 48 | 49 | 50 | # data frame method 51 | 52 | Code 53 | print(obj_2) 54 | Output 55 | Bayesian Analysis of Resampling Results 56 | 57 | 58 | # rsample method 59 | 60 | Code 61 | print(obj_4) 62 | Output 63 | Bayesian Analysis of Resampling Results 64 | 65 | 66 | # rsample method with repeated cv 67 | 68 | Code 69 | print(obj_5) 70 | Output 71 | Bayesian Analysis of Resampling Results 72 | Original data: 5-fold cross-validation repeated 2 times 73 | 74 | 75 | # repeated v_fold method 76 | 77 | Code 78 | print(obj_6) 79 | Output 80 | Bayesian Analysis of Resampling Results 81 | Original data: 5-fold cross-validation repeated 2 times 82 | 83 | 84 | -------------------------------------------------------------------------------- /tests/testthat/test_contrast.R: -------------------------------------------------------------------------------- 1 | library(tidyposterior) 2 | library(rsample) 3 | library(testthat) 4 | 5 | # ------------------------------------------------------------------------------ 6 | 7 | set.seed(4633) 8 | test_bt <- bootstraps(mtcars, times = 10) 9 | test_bt$one <- rnorm(nrow(test_bt), mean = 10) 10 | test_bt$two <- rnorm(nrow(test_bt), mean = 12) 11 | test_bt$three <- rnorm(nrow(test_bt), mean = 14) 12 | 13 | fit_bt <- perf_mod(test_bt, seed = 781, chains = 2, iter = 100, verbose = FALSE) 14 | 15 | contr_obj <- contrast_models(fit_bt, seed = 3666) 16 | 17 | # ------------------------------------------------------------------------------ 18 | 19 | test_that("bad args", { 20 | expect_snapshot( 21 | error = TRUE, 22 | contrast_models(fit_bt, "one", c("two", "three")) 23 | ) 24 | }) 25 | 26 | test_that("basics", { 27 | expect_equal(nrow(contr_obj), 3 * 100) 28 | expect_true(inherits(contr_obj, "tbl_df")) 29 | expect_true(inherits(contr_obj, "posterior_diff")) 30 | expect_equal( 31 | names(contr_obj), 32 | c("difference", "model_1", "model_2", "contrast") 33 | ) 34 | expect_true(is.character(contr_obj$model_1)) 35 | expect_true(is.character(contr_obj$model_2)) 36 | expect_true(is.character(contr_obj$contrast)) 37 | expect_true(is.numeric(contr_obj$difference)) 38 | }) 39 | 40 | # ------------------------------------------------------------------------------ 41 | 42 | test_that("reproducibility", { 43 | expect_equal(contrast_models(fit_bt, seed = 3666), contr_obj) 44 | }) 45 | 46 | # ------------------------------------------------------------------------------ 47 | 48 | # TODO test for dplyr compatibility 49 | 50 | # ------------------------------------------------------------------------------ 51 | 52 | test_that("autoplot for contrasts", { 53 | p_1 <- autoplot(contrast_models(fit_bt, seed = 3666)) 54 | expect_s3_class(p_1, "ggplot") 55 | expect_equal( 56 | names(p_1$data), 57 | c("difference", "model_1", "model_2", "contrast") 58 | ) 59 | 
expect_equal(rlang::get_expr(p_1$mapping$x), rlang::expr(difference)) 60 | expect_equal(rlang::get_expr(p_1$mapping$y), NULL) 61 | expect_true("model_1" %in% names(as.list(p_1$facet)$params$cols)) 62 | expect_true("model_2" %in% names(as.list(p_1$facet)$params$rows)) 63 | expect_equal(as.character(p_1$labels$y), "Posterior Probability") 64 | expect_equal(as.character(p_1$labels$x), "difference") 65 | }) 66 | -------------------------------------------------------------------------------- /tests/testthat/test_dplyr_new.R: -------------------------------------------------------------------------------- 1 | library(dplyr) 2 | 3 | data("ex_objects", package = "tidyposterior") 4 | 5 | keep_post <- function(x) inherits(x, "posterior") 6 | keep_diff <- function(x) inherits(x, "posterior_diff") 7 | 8 | # ------------------------------------------------------------------------------ 9 | 10 | test_that("keep posterior class", { 11 | expect_true(keep_post(posterior_samples |> dplyr::select(posterior, model))) 12 | expect_true(keep_post( 13 | posterior_samples |> 14 | dplyr::filter(posterior > median(posterior_samples$posterior)) 15 | )) 16 | expect_true(keep_post(posterior_samples |> mutate(hey = "you"))) 17 | expect_true(keep_post( 18 | posterior_samples |> mutate(hey = "you") |> select(-hey) 19 | )) 20 | expect_true(keep_post(posterior_samples |> arrange(model))) 21 | expect_true(keep_post( 22 | posterior_samples |> inner_join(tibble(model = "cart"), by = "model") 23 | )) 24 | expect_true(keep_post( 25 | posterior_samples |> full_join(tibble(model = "cart"), by = "model") 26 | )) 27 | expect_true(keep_post( 28 | posterior_samples |> left_join(tibble(model = "cart"), by = "model") 29 | )) 30 | expect_true(keep_post( 31 | posterior_samples |> right_join(tibble(model = "cart"), by = "model") 32 | )) 33 | expect_true(keep_post( 34 | posterior_samples |> anti_join(tibble(model = "cart"), by = "model") 35 | )) 36 | }) 37 | 38 | test_that("drop posterior class", { 39 | skip_if(tidyposterior:::dplyr_pre_1.0.0()) 40 | expect_false(keep_post(posterior_samples |> select(-model))) 41 | expect_false(keep_post(posterior_samples[, -2])) 42 | }) 43 | 44 | # ------------------------------------------------------------------------------ 45 | 46 | test_that("keep posterior_diff class", { 47 | expect_true(keep_diff( 48 | contrast_samples |> dplyr::select(difference, model_1, model_2, contrast) 49 | )) 50 | expect_true(keep_diff( 51 | contrast_samples |> 52 | dplyr::filter(difference > median(contrast_samples$difference)) 53 | )) 54 | expect_true(keep_diff(contrast_samples |> mutate(hey = "you"))) 55 | expect_true(keep_diff( 56 | contrast_samples |> mutate(hey = "you") |> select(-hey) 57 | )) 58 | expect_true(keep_diff(contrast_samples |> arrange(model_2))) 59 | expect_true(keep_diff( 60 | contrast_samples |> inner_join(tibble(model_2 = "cart"), by = "model_2") 61 | )) 62 | expect_true(keep_diff( 63 | contrast_samples |> full_join(tibble(model_2 = "cart"), by = "model_2") 64 | )) 65 | expect_true(keep_diff( 66 | contrast_samples |> left_join(tibble(model_2 = "cart"), by = "model_2") 67 | )) 68 | expect_true(keep_diff( 69 | contrast_samples |> right_join(tibble(model_2 = "cart"), by = "model_2") 70 | )) 71 | expect_true(keep_diff( 72 | contrast_samples |> anti_join(tibble(model_2 = "cart"), by = "model_2") 73 | )) 74 | }) 75 | 76 | test_that("drop posterior_diff class", { 77 | skip_if(tidyposterior:::dplyr_pre_1.0.0()) 78 | expect_false(keep_diff(contrast_samples |> select(-model_2))) 79 | 
expect_false(keep_diff(contrast_samples[, -2])) 80 | }) 81 | -------------------------------------------------------------------------------- /tests/testthat/test_perf_mod.R: -------------------------------------------------------------------------------- 1 | ## run fits outside of test functions 2 | ## https://github.com/stan-dev/rstanarm/issues/202 3 | if (rlang::is_installed(c("parsnip", "yardstick"))) { 4 | library(rsample) 5 | library(parsnip) 6 | library(workflowsets) 7 | library(testthat) 8 | library(yardstick) 9 | 10 | set.seed(4633) 11 | test_bt <- bootstraps(mtcars, times = 10) 12 | test_bt$one <- rnorm(nrow(test_bt), mean = 10) 13 | test_bt$two <- rnorm(nrow(test_bt), mean = 12) 14 | 15 | set.seed(4633) 16 | test_rcv <- vfold_cv(mtcars, v = 5, repeats = 2) 17 | test_rcv$one <- rnorm(nrow(test_rcv), mean = 10) 18 | test_rcv$two <- rnorm(nrow(test_rcv), mean = 12) 19 | 20 | ## emulate caret::resamples object from 10-fold 21 | 22 | rs_obj <- list( 23 | methods = c(one = "lm", two = "rpart"), 24 | values = as.data.frame(test_bt[, -1]), 25 | metrics = "blah" 26 | ) 27 | colnames(rs_obj$values) <- c("Resample", "one~blah", "two~blah") 28 | rs_obj$values$Resample <- vfold_cv(mtcars)$id 29 | class(rs_obj) <- "resamples" 30 | 31 | rs_rcv <- rs_obj 32 | rs_rcv$values$Resample <- 33 | paste0("Fold", rep(1:5, 2), ".", "Rep", rep(1:2, each = 5)) 34 | 35 | obj_1 <- perf_mod( 36 | test_bt, 37 | seed = 781, 38 | chains = 2, 39 | iter = 1000, 40 | refresh = 0, 41 | verbose = FALSE 42 | ) 43 | 44 | test_df <- as.data.frame(test_bt[, -1]) 45 | obj_2 <- perf_mod( 46 | test_df, 47 | seed = 781, 48 | refresh = 0, 49 | chains = 2, 50 | iter = 1000, 51 | verbose = FALSE 52 | ) 53 | 54 | obj_3 <- perf_mod( 55 | test_bt, 56 | seed = 781, 57 | chains = 2, 58 | iter = 1000, 59 | refresh = 0, 60 | verbose = FALSE, 61 | hetero_var = TRUE 62 | ) 63 | 64 | obj_4 <- perf_mod( 65 | rs_obj, 66 | seed = 781, 67 | chains = 2, 68 | iter = 1000, 69 | refresh = 0, 70 | verbose = FALSE 71 | ) 72 | 73 | obj_5 <- perf_mod( 74 | rs_rcv, 75 | seed = 781, 76 | chains = 2, 77 | iter = 1000, 78 | verbose = FALSE 79 | ) 80 | 81 | obj_6 <- perf_mod( 82 | test_rcv, 83 | seed = 781, 84 | chains = 2, 85 | iter = 1000, 86 | refresh = 0, 87 | verbose = FALSE 88 | ) 89 | } 90 | # ------------------------------------------------------------------------------ 91 | 92 | test_that("bad arguments", { 93 | expect_snapshot(error = TRUE, perf_mod(test_bt, transform = NULL)) 94 | expect_snapshot(error = TRUE, perf_mod(test_bt, transform = no_trans[1])) 95 | expect_snapshot( 96 | error = TRUE, 97 | perf_mod(test_bt, transform = list(not = 1, right = 2)) 98 | ) 99 | expect_snapshot( 100 | error = TRUE, 101 | perf_mod(test_bt, transform = list(func = 1, inc = 2)) 102 | ) 103 | expect_snapshot(error = TRUE, perf_mod(1:10)) 104 | }) 105 | 106 | # ------------------------------------------------------------------------------ 107 | 108 | test_that("basic usage", { 109 | skip_if_not_installed(c("parsnip")) 110 | skip_if_not_installed(c("yardstick")) 111 | 112 | expect_equal(obj_1$names, c("one", "two")) 113 | expect_equal( 114 | obj_1$ids, 115 | list(id = c(paste0("Bootstrap0", 1:9), "Bootstrap10")) 116 | ) 117 | expect_equal(obj_1$rset_type, "Bootstrap sampling") 118 | expect_equal(class(obj_1$stan), c("stanreg", "glm", "lm", "lmerMod")) 119 | expect_equal( 120 | formula(obj_1$stan), 121 | as.formula(statistic ~ model + (1 | id)), 122 | ignore_formula_env = TRUE 123 | ) 124 | expect_snapshot(print(obj_1)) 125 | expect_equal(summary(obj_1), 
summary(obj_1$stan)) 126 | }) 127 | 128 | # ------------------------------------------------------------------------------ 129 | 130 | test_that("data frame method", { 131 | skip_if_not_installed(c("parsnip")) 132 | skip_if_not_installed(c("yardstick")) 133 | 134 | expect_equal(obj_2$names, c("one", "two")) 135 | expect_equal( 136 | obj_2$ids, 137 | list(id = c(paste0("Bootstrap0", 1:9), "Bootstrap10")) 138 | ) 139 | expect_equal(obj_2$rset_type, NA) 140 | expect_equal(class(obj_2$stan), c("stanreg", "glm", "lm", "lmerMod")) 141 | expect_equal( 142 | formula(obj_2$stan), 143 | as.formula(statistic ~ model + (1 | id)), 144 | ignore_formula_env = TRUE 145 | ) 146 | expect_snapshot(print(obj_2)) 147 | expect_equal(summary(obj_2), summary(obj_2$stan)) 148 | }) 149 | 150 | # ------------------------------------------------------------------------------ 151 | 152 | test_that("model-specific variance", { 153 | skip_if_not_installed(c("parsnip")) 154 | skip_if_not_installed(c("yardstick")) 155 | 156 | expect_equal( 157 | formula(obj_3$stan), 158 | as.formula(statistic ~ model + (model + 0 | id)), 159 | ignore_formula_env = TRUE 160 | ) 161 | }) 162 | 163 | # ------------------------------------------------------------------------------ 164 | 165 | test_that("rsample method", { 166 | skip_if_not_installed(c("parsnip")) 167 | skip_if_not_installed(c("yardstick")) 168 | 169 | expect_equal(obj_4$names, c("one", "two")) 170 | expect_equal(obj_4$ids, list(id = c(paste0("Fold0", 1:9), "Fold10"))) 171 | expect_equal(obj_4$rset_type, NA) 172 | expect_equal(class(obj_4$stan), c("stanreg", "glm", "lm", "lmerMod")) 173 | expect_equal( 174 | formula(obj_4$stan), 175 | as.formula(statistic ~ model + (1 | id)), 176 | ignore_formula_env = TRUE 177 | ) 178 | expect_snapshot(print(obj_4)) 179 | expect_equal(summary(obj_4), summary(obj_4$stan)) 180 | }) 181 | 182 | 183 | test_that("rsample method with repeated cv", { 184 | skip_if_not_installed(c("parsnip")) 185 | skip_if_not_installed(c("yardstick")) 186 | 187 | expect_true(tidyposterior:::is_repeated_cv(rs_rcv$values)) 188 | expect_equal(obj_5$names, c("one", "two")) 189 | expect_equal(obj_5$rset_type, "5-fold cross-validation repeated 2 times") 190 | expect_equal(class(obj_5$stan), c("stanreg", "glm", "lm", "lmerMod")) 191 | expect_equal( 192 | formula(obj_5$stan), 193 | as.formula(statistic ~ model + (1 | id2 / id)), 194 | ignore_formula_env = TRUE 195 | ) 196 | expect_snapshot(print(obj_5)) 197 | expect_equal(summary(obj_5), summary(obj_5$stan)) 198 | }) 199 | 200 | # ------------------------------------------------------------------------------ 201 | 202 | test_that("repeated v_fold method", { 203 | skip_if_not_installed(c("parsnip")) 204 | skip_if_not_installed(c("yardstick")) 205 | 206 | expect_equal(obj_6$names, c("one", "two")) 207 | expect_equal( 208 | obj_6$ids, 209 | list( 210 | id = paste0("Repeat", 1:2), 211 | id2 = paste0("Fold", 1:5) 212 | ) 213 | ) 214 | expect_equal(obj_6$rset_type, "5-fold cross-validation repeated 2 times") 215 | expect_equal(class(obj_6$stan), c("stanreg", "glm", "lm", "lmerMod")) 216 | expect_equal( 217 | formula(obj_6$stan), 218 | as.formula(statistic ~ model + (1 | id2 / id)), 219 | ignore_formula_env = TRUE 220 | ) 221 | expect_snapshot(print(obj_6)) 222 | expect_equal(summary(obj_6), summary(obj_6$stan)) 223 | }) 224 | 225 | # ------------------------------------------------------------------------------ 226 | 227 | test_that("summary", { 228 | skip_if_not_installed(c("parsnip")) 229 | skip_if_not_installed(c("yardstick")) 
230 | 231 | expect_true(inherits(summary(obj_1), "summary.stanreg")) 232 | }) 233 | 234 | 235 | test_that("postint", { 236 | skip_if_not_installed(c("parsnip")) 237 | skip_if_not_installed(c("yardstick")) 238 | 239 | expect_equal( 240 | tidyposterior:::postint.numeric(2), 241 | data.frame(lower = 2, upper = 2) 242 | ) 243 | expect_equal( 244 | tidyposterior:::postint.data.frame(tidy(obj_1)), 245 | data.frame(lower = 9.52393870753461, upper = 12.3900327798712), 246 | tolerance = 0.01 247 | ) 248 | }) 249 | 250 | test_that("autoplots", { 251 | skip_if_not_installed(c("parsnip")) 252 | skip_if_not_installed(c("yardstick")) 253 | 254 | p_1 <- autoplot(obj_1) 255 | expect_s3_class(p_1, "ggplot") 256 | expect_equal( 257 | names(p_1$data), 258 | c("model", "posterior") 259 | ) 260 | expect_equal(rlang::get_expr(p_1$mapping$x), rlang::expr(posterior)) 261 | expect_equal(rlang::get_expr(p_1$mapping$y), NULL) 262 | expect_equal(rlang::get_expr(p_1$mapping$colour), rlang::expr(model)) 263 | expect_equal(as.list(p_1$facet)$params, list()) 264 | expect_equal(as.character(p_1$labels$y), "density") 265 | expect_equal(as.character(p_1$labels$x), "posterior") 266 | 267 | p_2 <- autoplot(tidy(obj_1)) 268 | expect_s3_class(p_2, "ggplot") 269 | expect_equal( 270 | names(p_2$data), 271 | c("model", "posterior") 272 | ) 273 | expect_equal(rlang::get_expr(p_2$mapping$x), rlang::expr(posterior)) 274 | expect_equal(rlang::get_expr(p_2$mapping$y), NULL) 275 | expect_equal(rlang::get_expr(p_2$mapping$colour), rlang::expr(model)) 276 | expect_equal(as.list(p_2$facet)$params, list()) 277 | expect_equal(as.character(p_2$labels$y), "density") 278 | expect_equal(as.character(p_2$labels$x), "posterior") 279 | }) 280 | 281 | # ------------------------------------------------------------------------------ 282 | 283 | test_that("workflow sets", { 284 | skip_if_not_installed(c("parsnip")) 285 | skip_if_not_installed(c("yardstick")) 286 | 287 | lm_spec <- linear_reg() |> set_engine("lm") 288 | set.seed(10) 289 | bt <- bootstraps(mtcars, times = 10) 290 | wset <- 291 | workflow_set( 292 | list( 293 | one = mpg ~ I(1 / sqrt(disp)), 294 | half = mpg ~ cyl + I(1 / sqrt(disp)) + hp + drat + wt, 295 | all = mpg ~ . 
296 | ), 297 | list(lm = lm_spec) 298 | ) |> 299 | workflow_map("fit_resamples", resamples = bt, seed = 1) 300 | 301 | expect_no_error( 302 | rsq_mod <- perf_mod(wset, seed = 3, refresh = 0, metric = "rsq") 303 | ) 304 | expect_equal( 305 | colnames(coef(rsq_mod$stan)$id), 306 | c("(Intercept)", "modelhalf_lm", "modelone_lm") 307 | ) 308 | expect_equal( 309 | unique(tidy(rsq_mod)$model), 310 | c("one_lm", "half_lm", "all_lm") 311 | ) 312 | 313 | p_tidy <- autoplot(rsq_mod, type = "posteriors") 314 | expect_s3_class(p_tidy, "ggplot") 315 | expect_equal( 316 | names(p_tidy$data), 317 | c("model", "posterior") 318 | ) 319 | expect_equal(rlang::get_expr(p_tidy$mapping$x), rlang::expr(posterior)) 320 | expect_equal(rlang::get_expr(p_tidy$mapping$colour), rlang::expr(model)) 321 | expect_equal(as.list(p_tidy$facet$params), list()) 322 | expect_equal(as.character(p_tidy$labels$x), "rsq") 323 | expect_equal(as.character(p_tidy$labels$colour), "model") 324 | expect_equal(as.character(p_tidy$labels$y), "density") 325 | expect_equal(as.character(p_tidy$labels$fill), "fill") 326 | 327 | p_int <- autoplot(rsq_mod, type = "intervals") 328 | expect_s3_class(p_int, "ggplot") 329 | expect_equal( 330 | names(p_int$data), 331 | c("workflow", ".lower", ".estimate", ".upper", "rank") 332 | ) 333 | expect_equal(rlang::get_expr(p_int$mapping$x), rlang::expr(rank)) 334 | expect_equal(rlang::get_expr(p_int$mapping$y), rlang::expr(.estimate)) 335 | expect_equal(rlang::get_expr(p_int$mapping$colour), rlang::expr(workflow)) 336 | expect_equal(as.list(p_tidy$facet$params), list()) 337 | expect_equal(as.character(p_int$labels$x), "Workflow Rank") 338 | expect_equal(as.character(p_int$labels$y), "rsq") 339 | expect_equal(as.character(p_int$labels$colour), "workflow") 340 | expect_equal(as.character(p_int$labels$ymin), ".lower") 341 | expect_equal(as.character(p_int$labels$ymax), ".upper") 342 | 343 | p_rope <- autoplot(rsq_mod, type = "ROPE", size = .1) 344 | expect_s3_class(p_rope, "ggplot") 345 | expect_equal( 346 | names(p_rope$data), 347 | c("model", "pract_equiv", "rank", "workflow") 348 | ) 349 | expect_equal(rlang::get_expr(p_rope$mapping$x), rlang::expr(rank)) 350 | expect_equal(rlang::get_expr(p_rope$mapping$y), rlang::expr(pract_equiv)) 351 | expect_equal(as.list(p_tidy$facet$params), list()) 352 | expect_equal(as.character(p_rope$labels$x), "Workflow Rank") 353 | expect_equal( 354 | as.character(p_rope$labels$y), 355 | "Probability of Practical Equivalence" 356 | ) 357 | expect_equal(as.character(p_rope$labels$colour), "workflow") 358 | }) 359 | -------------------------------------------------------------------------------- /tests/testthat/test_tidy.R: -------------------------------------------------------------------------------- 1 | library(rsample) 2 | library(testthat) 3 | library(ggplot2) 4 | 5 | ################################################################### 6 | 7 | set.seed(4633) 8 | test_bt <- bootstraps(mtcars, times = 10) 9 | test_bt$one <- rnorm(nrow(test_bt), mean = 10) 10 | test_bt$two <- rnorm(nrow(test_bt), mean = 12) 11 | 12 | set.seed(4633) 13 | test_rcv <- vfold_cv(mtcars, v = 5, repeats = 2) 14 | test_rcv$one <- rnorm(nrow(test_rcv), mean = 10) 15 | test_rcv$two <- rnorm(nrow(test_rcv), mean = 12) 16 | 17 | fit_bt <- perf_mod(test_bt, seed = 781, chains = 2, iter = 50, verbose = FALSE) 18 | 19 | tidy_bt <- tidy(fit_bt, seed = 647) 20 | 21 | ################################################################### 22 | 23 | test_that("reproducibility", { 24 | expect_equal(tidy(fit_bt, seed = 
647), tidy_bt) 25 | }) 26 | 27 | ################################################################### 28 | 29 | test_that("basic object", { 30 | expect_equal(sort(unique(tidy_bt$model)), c("one", "two")) 31 | expect_equal(nrow(tidy_bt), 100) 32 | }) 33 | 34 | ################################################################### 35 | 36 | test_that("ggplot object", { 37 | p <- ggplot(tidy_bt) 38 | expect_true(inherits(p, "ggplot")) 39 | expect_equal(nrow(p$data), 100) 40 | }) 41 | -------------------------------------------------------------------------------- /tidyposterior.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: No 4 | SaveWorkspace: No 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: knitr 13 | LaTeX: pdfLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | 18 | BuildType: Package 19 | PackageUseDevtools: Yes 20 | PackageInstallArgs: --no-multiarch --with-keep.source 21 | PackageRoxygenize: rd,collate,namespace 22 | -------------------------------------------------------------------------------- /vignettes/articles/Getting_Started.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Getting Started" 3 | vignette: > 4 | %\VignetteEngine{knitr::rmarkdown} 5 | %\VignetteIndexEntry{Getting Started} 6 | output: 7 | knitr:::html_vignette: 8 | toc: yes 9 | --- 10 | 11 | 12 | ```{r} 13 | #| label: setup 14 | #| include: false 15 | if (rlang::is_installed(c("tidymodels", "parsnip", "yardstick", "splines2"))) { 16 | run <- TRUE 17 | } else { 18 | run <- FALSE 19 | } 20 | 21 | knitr::opts_chunk$set(echo = TRUE, fig.align = "center", eval = run) 22 | library(tidymodels) 23 | library(tidyposterior) 24 | data(two_class_dat) 25 | theme_set(theme_bw() + theme(legend.position = "top")) 26 | ``` 27 | 28 | 29 | To show how `tidyposterior` compares models, let's look at a small data set. The `modeldata` package has a data set called `two_class_dat` that has `r nrow(two_class_dat)` data points on two predictors. The outcome is a two-level factor. There is some linear-ish separation between the classes, but there are hints that a nonlinear class boundary might do slightly better. 30 | 31 | ```{r} 32 | #| label: class-plot 33 | #| fig-alt: "Scatter chart. A along the x-axis, B along the y-axis. Two classes class1 and class2 are colored red and blue, respectively. The data show a linear-ish separation, with class2 having higher B values." 34 | library(tidymodels) 35 | library(tidyposterior) 36 | 37 | data(two_class_dat) 38 | 39 | ggplot(two_class_dat, aes(x = A, y = B, col = Class)) + 40 | geom_point(alpha = 0.3, cex = 2) + 41 | coord_fixed() 42 | ``` 43 | 44 | `tidyposterior` models the performance statistics produced by models, such as RMSE, accuracy, or the area under the ROC curve. It relies on resampling to produce replicates of these performance statistics so that they can be modeled. 45 | 46 | We'll use simple 10-fold cross-validation here. Any other resampling method from `rsample`, except a simple validation set, would also be appropriate. 47 | 48 | ```{r} 49 | #| label: resampling 50 | set.seed(1) 51 | cv_folds <- vfold_cv(two_class_dat) 52 | cv_folds 53 | ``` 54 | 55 | We'll use a logistic regression model for these data and initially consider two different preprocessing methods that might help the fit. 
Let's define a model specification: 56 | 57 | ```{r} 58 | #| label: logistic 59 | logistic_spec <- logistic_reg() |> set_engine("glm") 60 | ``` 61 | 62 | ## Comparing modeling/preprocessing methods 63 | 64 | One way to incorporate nonlinearity into the class boundary is to use a spline basis expansion for the predictors. A `recipe` step using `step_spline_natural()` will encode the predictors in this way. The degrees of freedom will be hard-coded to produce two additional feature columns per predictor: 65 | 66 | ```{r} 67 | #| label: splines 68 | spline_rec <- 69 | recipe(Class ~ ., data = two_class_dat) |> 70 | step_spline_natural(A, B, deg_free = 3) 71 | 72 | spline_wflow <- 73 | workflow() |> 74 | add_recipe(spline_rec) |> 75 | add_model(logistic_spec) 76 | ``` 77 | 78 | Binding the model and recipe into a workflow creates a simple interface when we `fit()` and `predict()` the data (but isn't required by `tidyposterior`). 79 | 80 | An alternate preprocessing method is to normalize the data using a spatial sign transformation. This projects the predictors onto a unit circle and can sometimes mitigate the effect of collinearity or outliers. A recipe is also used. Here is a visual representation of the data after the transformation: 81 | 82 | ```{r} 83 | #| label: class-plot-spatial-sign 84 | #| fig-alt: "Scatter chart. A along the x-axis, B along the y-axis. Two classes class1 and class2 are colored red and blue, respectively. All points are on the unit circle, with most of the blue points having a higher B value." 85 | spatial_sign_rec <- 86 | recipe(Class ~ ., data = two_class_dat) |> 87 | step_normalize(A, B) |> 88 | step_spatialsign(A, B) 89 | 90 | spatial_sign_rec |> 91 | prep() |> 92 | bake(new_data = NULL) |> 93 | ggplot(aes(x = A, y = B, col = Class)) + 94 | geom_point(alpha = 0.3, cex = 2) + 95 | coord_fixed() 96 | ``` 97 | Another workflow is created for this method: 98 | 99 | ```{r} 100 | #| label: spatial-sign 101 | spatial_sign_wflow <- 102 | workflow() |> 103 | add_recipe(spatial_sign_rec) |> 104 | add_model(logistic_spec) 105 | ``` 106 | 107 | `tidyposterior` does not require the user to create their models using tidymodels packages, `caret`, or any other method (although there are advantages to using those tools). In the end, a data frame with resample identifiers and columns of performance statistics is all that is needed. 108 | 
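For illustration, here is a hand-made sketch of that format; the column names and values below are invented for the example rather than computed from real models:

```{r}
#| label: minimal-format-sketch
#| eval: false
# A sketch of the minimal input for perf_mod()'s data frame method:
# an `id` column of resample identifiers plus one numeric column of
# performance statistics per model. These values are made up.
minimal_roc <- data.frame(
  id = paste0("Fold", sprintf("%02d", 1:10)),
  splines = runif(10, min = 0.85, max = 0.95),
  spatial_sign = runif(10, min = 0.80, max = 0.90)
)
```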
109 | To produce this format with our tidymodels objects, this small convenience function will fit a model on the 90% of the data allocated by cross-validation, predict the other 10%, then calculate the area under the ROC curve. If you use tidymodels, there are high-level interfaces (shown below) that don't require such a function. 110 | 111 | ```{r} 112 | #| label: compute-roc-values 113 | compute_roc <- function(split, wflow) { 114 | # Fit the model to 90% of the data 115 | mod <- fit(wflow, data = analysis(split)) 116 | # Predict the other 10% 117 | pred <- predict(mod, new_data = assessment(split), type = "prob") 118 | # Compute the area under the ROC curve 119 | pred |> 120 | bind_cols(assessment(split)) |> 121 | roc_auc(Class, .pred_Class1) |> 122 | pull(.estimate) 123 | } 124 | ``` 125 | 126 | For our `rsample` object `cv_folds`, let's create two columns of ROC statistics using this function in conjunction with `purrr::map` and `dplyr::mutate`: 127 | 128 | ```{r} 129 | #| label: roc-values 130 | roc_values <- 131 | cv_folds |> 132 | mutate( 133 | spatial_sign = map_dbl(splits, compute_roc, spatial_sign_wflow), 134 | splines = map_dbl(splits, compute_roc, spline_wflow) 135 | ) 136 | 137 | roc_values 138 | 139 | # Overall ROC statistics per workflow: 140 | summarize( 141 | roc_values, 142 | splines = mean(splines), 143 | spatial_sign = mean(spatial_sign) 144 | ) 145 | ``` 146 | There is the suggestion that using splines is better than the spatial sign transformation. It would be nice to have some inferential analysis that could tell us if the size of this difference is greater than the experimental noise in the data. 147 | 148 | `tidyposterior` uses a Bayesian ANOVA model to compute posterior distributions for the performance statistic of each modeling method. This tells us the probabilistic distribution of the model performance metrics and allows us to make more formal statements about the superiority (or equivalence) of different models. [_Tidy Models with R_](https://www.tmwr.org/compare.html#tidyposterior) has a good explanation of how the Bayesian ANOVA model works. 149 | 150 | The main function to conduct the analysis is `perf_mod()`. The main argument is for the object containing the resampling information and at least two numeric columns of performance statistics (measuring the same metric). As described in `?perf_mod`, there are a variety of other object types that can be used for this argument. 151 | 152 | There are also options for statistical parameters of the analysis, such as any transformation of the output statistics that should be used and so on. 153 | 154 | The main options in our analysis are passed through to the `rstanarm` function `stan_glmer()`. These include: 155 | 156 | * `seed`: An integer that controls the random numbers used in the Bayesian model. 157 | 158 | * `iter`: The total number of Monte Carlo iterations used (including the burn-in samples). 159 | 160 | * `chains`: The number of independent Markov Chain Monte Carlo analyses to compute. 161 | 162 | * `refresh`: How often to update the log (a value of zero means no output). 163 | 164 | Other options that can be helpful (but we'll use their defaults): 165 | 166 | * `prior_intercept`: The main argument in this analysis for specifying the prior distribution of the parameters. 167 | 168 | * `family`: The exponential family distribution for the performance statistics. 169 | 170 | * `cores`: The number of parallel workers to use to speed up computations. 171 | 172 | Our call to this function is: 173 | 174 | ```{r} 175 | #| label: preproc-bayes-analysis 176 | rset_mod <- perf_mod(roc_values, seed = 2, iter = 5000, chains = 5, refresh = 0) 177 | ``` 178 | 179 | The `summary()` function for this type of object shows the output from `stan_glmer()`. 
It's long, so we show some of the initial output: 180 | 181 | ```{r} 182 | #| label: preproc-bayes-summary 183 | #| eval: false 184 | print(summary(rset_mod), digits = 3) 185 | ``` 186 | 187 | ```{r} 188 | #| label: preproc-bayes-summary-print 189 | #| echo: false 190 | res <- capture.output(print(summary(rset_mod), digits = 3)) 191 | cat(c(res[1:16], "", "\n"), sep = "\n") 192 | ``` 193 | 194 | Provided that our assumptions are appropriate, one of the main things that we'd like to get out of the object is a set of samples from the posterior distributions for the performance metrics (per modeling method). The `tidy()` method will produce a data frame with such samples: 195 | 196 | ```{r} 197 | #| label: preproc-bayes-tidy 198 | tidy(rset_mod, seed = 3) 199 | ``` 200 | 201 | We need to supply a `seed` value since the posterior draws are random samples. 202 | 203 | There is a simple plotting method for the object too: 204 | 205 | ```{r} 206 | #| label: preproc-bayes-plot 207 | #| fig-alt: "Line chart. posterior along the x-axis, density along the y-axis. Two lines representing spatial_sign and splines are colored red and blue, respectively. The densities appear to have similar shape, with splines being shifted 0.02 to the right. Full range is from 0.80 to 0.95." 208 | autoplot(rset_mod) 209 | ``` 210 | 211 | There is some overlap but, again, it would be better if we could quantify this. 212 | 213 | To compare models, the `contrast_models()` function computes the posterior distributions of differences in performance statistics between models. For example, what does the posterior look like for the difference in performance for these two preprocessing methods? By default, the function computes all possible differences (a single contrast for this example). There are also `summary()` and plot methods: 214 | 215 | ```{r} 216 | #| label: preproc-contrast 217 | #| fig-alt: "Line chart. Differences in ROC (spatial sign - splines) along the x-axis, posterior along the y-axis. The high point of the density is at around -0.02. Full range is -0.05 to 0.02." 218 | preproc_diff <- contrast_models(rset_mod, seed = 4) 219 | summary(preproc_diff, seed = 5) 220 | autoplot(preproc_diff) + 221 | xlab("Difference in ROC (spatial sign - splines)") 222 | ``` 223 | Since the difference is negative, the spline model appears better than the spatial sign method. The summary output quantifies this by producing a simple credible interval for the difference. The probability column also reflects this since it is the probability that the spline ROC scores are greater than the analogous statistics from the spatial sign model. A value of 0.5 would indicate no difference. 224 | 225 | There is an additional analysis that can be used. The ROPE method, short for _Region of Practical Equivalence_, is a method for understanding the differences in models in a less subjective way. For this analysis, we would specify a practical effect size (usually before the analysis). This quantity reflects what difference in the metric is considered practically meaningful in the context of our problem. In our example, if we saw two models with a difference in their ROC statistics of 0.02, we might consider them effectively different (your beliefs may differ). 226 | 227 | Once we have settled on a value of this effect size (in the units of the statistic), we can compute how much of the difference is within this region of practical equivalence (in our example, this is `[-0.02, 0.02]`). 
If the difference is mostly within these bounds, the models might be significantly different but not practically different. Alternatively, if the differences are beyond this, they would be different in both senses. 228 | 229 | The summary and plot methods have an optional argument called `size`. The `summary()` function computes the probability of the posterior differences that fall inside and outside of this region. The plot method shows it visually: 230 | 231 | 232 | ```{r} 233 | #| label: preproc-rope 234 | #| fig-alt: "Line chart. Differences in ROC (spatial sign - splines) along the x-axis, posterior along the y-axis. The high point of the density is at around -0.02. Full range is -0.05 to 0.02. Dotted vertical lines have been placed at -0.02 and 0.02." 235 | summary(preproc_diff, size = 0.02) |> 236 | select(contrast, starts_with("pract")) 237 | autoplot(preproc_diff, size = 0.02) 238 | ``` 239 | 240 | For this analysis, there are about even odds that the difference between these models is not practically important (since the `pract_equiv` is near 0.5). 241 | 242 | ### About our assumptions 243 | 244 | Previously, the expression "provided that our assumptions are appropriate" was used. This is an inferential analysis and the validity of our assumptions matters a great deal. There are a few assumptions for this analysis. The main one is that we've specified the outcome distribution well. We've modeled the area under the ROC curve. This is a statistic bounded (effectively) between 0.5 and 1.0. The variance of the statistic is probably related to the mean; there is likely less variation in scores near 1.0 than those near 0.5. 245 | 246 | The default family for `stan_glmer()` is Gaussian. Given the characteristics of this metric, that assumption might seem problematic. 247 | 248 | However, Gaussian seems like a good first approach for this assumption. The rationale is based on the Central Limit Theorem. As the sample size increases, the sample mean statistic converges to normality regardless of the distribution of the individual data points. Our performance estimates are summary statistics and, if the training set size is "large enough", they will exhibit behavior consistent with normality. 249 | 250 | As a simple (and approximate) example/diagnostic, suppose we used a simple ANOVA for the ROC statistics using `lm()`. This is not the same analysis as the one used by `tidyposterior`, but the regression parameter estimates should be fairly similar. For that analysis, we can assess the normality of the residuals and see that they are pretty consistent with the normality assumption: 251 | 252 | ```{r} 253 | #| label: roc_dist 254 | #| fig-alt: "Scatter chart. x along the x-axis, y along the y-axis. A dashed line goes at a diagonal, with all the points located around it." 255 | roc_longer <- 256 | roc_values |> 257 | select(-splits) |> 258 | pivot_longer(cols = c(-id), names_to = "preprocessor", values_to = "roc") 259 | 260 | roc_fit <- lm(roc ~ preprocessor, roc_longer) 261 | 262 | roc_fit |> 263 | augment() |> 264 | ggplot(aes(sample = .resid)) + 265 | geom_qq() + 266 | geom_qq_line(lty = 2) + 267 | coord_fixed(ratio = 20) 268 | ``` 269 | 270 | If this were not the case, there are a few things that we can do. 271 | 272 | The easiest approach would be to use a variance-stabilizing transformation of the metrics and keep the Gaussian assumption. `perf_mod()` has a `transform` argument that will transform the outcome but still produce the posterior distributions in the original units. This will help if the variation within each model significantly changes over the range of the values. When transformed back to the original units, the posteriors will have different variances. 273 | 274 | Another option that can help with heterogeneous variances is `hetero_var`. This fits a different variance for each modeling method. However, this may make convergence of the model more difficult. 275 | 276 | Finally, a different distribution can be assumed using the `family` argument to `stan_glmer()`. Since our metrics are numeric, there are not many families to choose from. 277 | 
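As a rough sketch of how these options are specified (not evaluated here; `logit_trans` is one of the transformation objects exported by `tidyposterior`, and the other argument values simply repeat the ones used earlier):

```{r}
#| label: assumption-options-sketch
#| eval: false
# Analyze on the logit scale; posteriors are reported in the original units.
logit_mod <- perf_mod(roc_values, transform = logit_trans, seed = 2, refresh = 0)

# A custom transformation is a list of two functions named 'func' and 'inv':
sqrt_trans <- list(func = sqrt, inv = function(x) x^2)

# Estimate a separate variance for each modeling method:
hetero_mod <- perf_mod(roc_values, hetero_var = TRUE, seed = 2, refresh = 0)

# Arguments such as `family` are passed through to rstanarm::stan_glmer().
```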
278 | ## Evaluating sub-models 279 | 280 | The previous example was a between-model comparison (where "model" really means statistical model plus preprocessing method). If the model must be tuned, there is also the issue of within-model comparisons. 281 | 282 | For our spline analysis, we assumed that three degrees of freedom were appropriate. However, we might tune the model over that parameter to see what the best degrees of freedom should be. 283 | 284 | The previous spline recipe is altered so that the degrees of freedom parameter doesn't have an actual value. Instead, using a value of `tune()` will mark this parameter for optimization. There are a few different approaches for tuning this parameter; we'll use simple grid search. 285 | 286 | ```{r} 287 | #| label: tune-rec 288 | spline_rec <- 289 | recipe(Class ~ ., data = two_class_dat) |> 290 | step_ns(A, B, deg_free = tune()) 291 | ``` 292 | 293 | The `tune` package function `tune_grid()` is used to evaluate several values of the parameter. For each value, the resampled area under the ROC curve is computed. 294 | 295 | ```{r} 296 | #| label: tune-grid 297 | spline_tune <- 298 | logistic_spec |> 299 | tune_grid( 300 | spline_rec, 301 | resamples = cv_folds, 302 | grid = tibble(deg_free = c(1, 3, 5, 10)), 303 | metrics = metric_set(roc_auc), 304 | control = control_grid(save_workflow = TRUE) 305 | ) 306 | collect_metrics(spline_tune) |> 307 | arrange(desc(mean)) 308 | ``` 309 | 310 | There is a `perf_mod()` method for this type of object. The computations are conducted in the same manner but, in this instance, four sub-models are compared. 311 | 312 | ```{r} 313 | #| label: tune-grid-bayes 314 | #| fig-alt: "Line chart. roc_auc along the x-axis, density along the y-axis. The density lines are colored according to the preprocessor. There is a fair amount of overlap, with Preprocessor4 having the lowest values." 315 | grid_mod <- perf_mod(spline_tune, seed = 6, iter = 5000, chains = 5, refresh = 0) 316 | autoplot(grid_mod) 317 | ``` 318 | 319 | When the object given to `perf_mod()` is from a model tuning function, the `model` column corresponds to the `.config` column in the results. 320 | 321 | There is a lot of overlap. The results do call into question the overall utility of using splines. A single degree of freedom model corresponds to a linear effect. Let's compare the linear class boundaries to the other sub-models to see if splines are even improving the model. 322 | 323 | The `contrast_models()` function can take two lists of model identifiers and compute their differences. Again, for tuning objects, these should include values of `.config`. This specification computes the difference `{1 df - X df}`, so positive differences indicate that the linear model is better. 
324 | 325 | ```{r} 326 | #| label: tune-grid-bayes-contrast 327 | grid_diff <- 328 | contrast_models( 329 | grid_mod, 330 | list_1 = rep("Preprocessor1_Model1", 3), 331 | list_2 = c( 332 | "Preprocessor2_Model1", # <- 3 df spline 333 | "Preprocessor3_Model1", # <- 5 df spline 334 | "Preprocessor4_Model1" # <- 10 df spline 335 | ), 336 | seed = 7 337 | ) 338 | ``` 339 | 340 | 341 | ```{r} 342 | #| label: tune-grid-bayes-contrast-plot 343 | #| fig-alt: "Faceted line chart. difference along the x-axis, posterior probability along the y-axis. Each density corresponds to the difference between preprocessor1 and the other 3 preprocessors. preprocessor2 looks to have a mean around 0, preprocessor3 looks to have a mean around 0.0025, and preprocessor4 looks to have a mean around 0.005. Range is from -0.015 to 0.02." 344 | summary(grid_diff) 345 | autoplot(grid_diff) 346 | ``` 347 | 348 | The results indicate that a lot of degrees of freedom might make the model worse. At best, there is a limited difference in performance when more than one spline term is used. 349 | 350 | The ROPE analysis is more definitive; there is no sense of practical differences within the previously used effect size: 351 | 352 | ```{r} 353 | #| label: tune-grid-rope 354 | #| fig-alt: "Faceted line chart. difference along the x-axis, posterior probability along the y-axis. Each density corresponds to the difference between preprocessor1 and the other 3 preprocessors. preprocessor2 looks to have a mean around 0, preprocessor3 looks to have a mean around 0.0025, and preprocessor4 looks to have a mean around 0.005. Range is from -0.015 to 0.02. Vertical dashed lines are placed at -0.02 and 0.02." 355 | autoplot(grid_diff, size = 0.02) 356 | ``` 357 | 358 | ## Workflow sets 359 | 360 | Workflow sets are collections of workflows and their results. These can be made after existing workflows have been evaluated or by using `workflow_set()` to create and evaluate the models. 361 | 362 | Let's create an initial set that has different combinations of the two predictors for this data set. 363 | 364 | ```{r} 365 | #| label: wflow-set-create 366 | library(workflowsets) 367 | 368 | logistic_set <- 369 | workflow_set( 370 | list(A = Class ~ A, B = Class ~ B, ratio = Class ~ I(log(A/B)), 371 | spatial_sign = spatial_sign_rec), 372 | list(logistic = logistic_spec) 373 | ) 374 | logistic_set 375 | ``` 376 | 377 | The `object` column contains the workflows that are created by the combinations of preprocessors and the model (multiple models could have been used). Rather than calling the same functions from the `tune` package repeatedly, we can evaluate these with a single function call. 
Notice that none of these workflows require tuning, so `tune::fit_resamples()` can be used: 378 | 379 | ```{r} 380 | #| label: wflow-set-resample 381 | logistic_res <- 382 | logistic_set |> 383 | workflow_map("fit_resamples", seed = 3, resamples = cv_folds, 384 | metrics = metric_set(roc_auc)) 385 | logistic_res 386 | 387 | collect_metrics(logistic_res) |> 388 | filter(.metric == "roc_auc") 389 | ``` 390 | 391 | We can also add the previously tuned spline results by first converting them to a workflow set and then appending their rows to the results: 392 | 393 | ```{r} 394 | #| label: wflow-set-append 395 | logistic_res <- 396 | logistic_res |> 397 | bind_rows( 398 | as_workflow_set(splines = spline_tune) 399 | ) 400 | logistic_res 401 | ``` 402 | 403 | There are some convenience functions to take an initial look at the results: 404 | 405 | ```{r} 406 | #| label: wflow-res 407 | #| fig-alt: "Error-bar chart. The x-axis is the workflow rank in the set (a value of one being the best) versus the performance metric(s) on the y-axis. All but two of the workflows hover around a little under 0.9, with the last two being at 0.75 and 0.70." 408 | rank_results(logistic_res, rank_metric = "roc_auc") |> 409 | filter(.metric == "roc_auc") 410 | autoplot(logistic_res, metric = "roc_auc") 411 | ``` 412 | 413 | The `perf_mod()` method for workflow sets takes the best submodel from each workflow and then uses the standard `tidyposterior` analysis: 414 | 415 | ```{r} 416 | #| label: wflow-mod 417 | roc_mod <- perf_mod(logistic_res, metric = "roc_auc", seed = 1, refresh = 0) 418 | ``` 419 | 420 | This call produces an object with an additional class to enable some `autoplot()` methods specific to workflow sets. For example, the default plot shows 90% credible intervals for the best results in each workflow: 421 | 422 | ```{r} 423 | #| label: wflow-intervals 424 | #| fig-alt: "Error-bar chart. The x-axis is the workflow rank in the set (a value of one being the best) versus the performance metric(s) on the y-axis. splines is doing the best with a mean around 0.885; spatial_sign_logistic and B_logistic both have a mean around 0.87. The error bars for these 3 models overlap. ratio_logistic has a mean of 0.75 and A_logistic has a mean around 0.70." 425 | autoplot(roc_mod) 426 | ``` 427 | 428 | Alternatively, the ROPE estimates for a given size can be computed to compare the numerically best workflow to the others. The probability of practical equivalence is shown for all results: 429 | 430 | ```{r} 431 | #| label: wflow-rope 432 | #| fig-alt: "Connected line chart. Workflow Rank along the x-axis, Probability of Practical Equivalence along the y axis. Splines has a value of 1. spatial_sign_logistic has a value of 0.6, B_logistic has a value of 0.575, ratio_logistic and A_logistic both have a value of 0." 433 | autoplot(roc_mod, type = "ROPE", size = 0.025) 434 | ``` 435 | -------------------------------------------------------------------------------- /vignettes/tidyposterior.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Introduction to tidyposterior" 3 | vignette: > 4 | %\VignetteEngine{knitr::rmarkdown} 5 | %\VignetteIndexEntry{Introduction to tidyposterior} 6 | output: 7 | knitr:::html_vignette: 8 | toc: yes 9 | --- 10 | 11 | A new version of this vignette can be found at 12 | --------------------------------------------------------------------------------