├── .github └── workflows │ ├── README.md │ ├── pr-close-signal.yaml │ ├── pr-comment.yaml │ ├── pr-post-remove-branch.yaml │ ├── pr-preflight.yaml │ ├── pr-receive.yaml │ ├── sandpaper-main.yaml │ ├── sandpaper-version.txt │ ├── update-cache.yaml │ └── update-workflows.yaml ├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE.md ├── README.md ├── config.yaml ├── episodes ├── basic-targets.Rmd ├── branch.Rmd ├── cache.Rmd ├── fig │ ├── 03-qmd-workflow.png │ ├── basic-rstudio-project.png │ ├── basic-rstudio-wizard.png │ └── lifecycle-visnetwork.png ├── files.Rmd ├── files │ ├── lesson_functions.R │ ├── packages.R │ ├── plans │ │ ├── README.md │ │ ├── plan_0.R │ │ ├── plan_1.R │ │ ├── plan_10.R │ │ ├── plan_11.R │ │ ├── plan_2.R │ │ ├── plan_2b.R │ │ ├── plan_3.R │ │ ├── plan_4.R │ │ ├── plan_5.R │ │ ├── plan_6.R │ │ ├── plan_6b.R │ │ ├── plan_7.R │ │ ├── plan_8.R │ │ └── plan_9.R │ └── tar_functions │ │ ├── README.md │ │ ├── augment_with_mod_name.R │ │ ├── augment_with_mod_name_slow.R │ │ ├── clean_penguin_data.R │ │ ├── glance_with_mod_name.R │ │ ├── glance_with_mod_name_slow.R │ │ ├── model_augment.R │ │ ├── model_augment_slow.R │ │ ├── model_glance.R │ │ ├── model_glance_orig.R │ │ ├── model_glance_slow.R │ │ └── write_lines_file.R ├── functions.Rmd ├── introduction.Rmd ├── lifecycle.Rmd ├── organization.Rmd ├── packages.Rmd ├── parallel.Rmd └── quarto.Rmd ├── index.md ├── instructors └── instructor-notes.md ├── learners ├── reference.md └── setup.md ├── links.md ├── profiles └── learner-profiles.md ├── renv ├── activate.R ├── profile └── profiles │ └── lesson-requirements │ ├── renv.lock │ └── renv │ ├── .gitignore │ └── settings.json ├── site └── README.md └── targets-workshop.Rproj /.github/workflows/README.md: -------------------------------------------------------------------------------- 1 | # Carpentries Workflows 2 | 3 | This directory contains workflows to be used for Lessons using the {sandpaper} 4 | lesson infrastructure. 
Two of these workflows require R (`sandpaper-main.yaml` 5 | and `pr-receive.yaml`) and the rest are bots to handle pull request management. 6 | 7 | These workflows will likely change as {sandpaper} evolves, so it is important to 8 | keep them up-to-date. To do this in your lesson, you can run the following in your 9 | R console: 10 | 11 | ```r 12 | # Install/Update sandpaper 13 | options(repos = c(carpentries = "https://carpentries.r-universe.dev/", 14 | CRAN = "https://cloud.r-project.org")) 15 | install.packages("sandpaper") 16 | 17 | # update the workflows in your lesson 18 | library("sandpaper") 19 | update_github_workflows() 20 | ``` 21 | 22 | Inside this folder, you will find a file called `sandpaper-version.txt`, which 23 | will contain a version number for sandpaper. This will be used in the future to 24 | alert you if a workflow update is needed. 25 | 26 | What follows are the descriptions of the workflow files: 27 | 28 | ## Deployment 29 | 30 | ### 01 Build and Deploy (sandpaper-main.yaml) 31 | 32 | This is the main driver that will only act on the main branch of the repository. 33 | This workflow does the following: 34 | 35 | 1. checks out the lesson 36 | 2. provisions the following resources 37 | - R 38 | - pandoc 39 | - lesson infrastructure (stored in a cache) 40 | - lesson dependencies if needed (stored in a cache) 41 | 3. builds the lesson via `sandpaper:::ci_deploy()` 42 | 43 | #### Caching 44 | 45 | This workflow has two caches; one cache is for the lesson infrastructure and 46 | the other is for the lesson dependencies if the lesson contains rendered 47 | content. These caches are invalidated by new versions of the infrastructure and 48 | the `renv.lock` file, respectively. If there is a problem with the cache, 49 | manual invalidation is necessary. 
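For example, manual invalidation can be scripted from the command line. This is a sketch only: it computes a fresh date-stamped value for the `CACHE_VERSION` secret; the `gh secret set` call (shown as a comment) assumes an installed and authenticated GitHub CLI and is not part of the official workflow files:

```shell
# Compute today's date in ISO format (YYYY-MM-DD) to use as the new
# CACHE_VERSION value; changing this secret invalidates all of the caches.
new_version="$(date +%F)"
echo "CACHE_VERSION will be set to: ${new_version}"
# With the GitHub CLI installed and authenticated, the secret could then be
# updated with (not run here):
#   gh secret set CACHE_VERSION --body "${new_version}"
```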
You will need maintain access to the repository, 50 | and you can either go to the Actions tab and [click on the caches button to find 51 | and invalidate the failing cache](https://github.blog/changelog/2022-10-20-manage-caches-in-your-actions-workflows-from-web-interface/) 52 | or set the `CACHE_VERSION` secret to the current date (which will 53 | invalidate all of the caches). 54 | 55 | ## Updates 56 | 57 | ### Setup Information 58 | 59 | These workflows run on a schedule and at the maintainer's request. Because they 60 | create pull requests that update workflows and require the downstream actions to run, 61 | they need a special repository/organization secret token called 62 | `SANDPAPER_WORKFLOW`, and it must have the `public_repo` and `workflow` scopes. 63 | 64 | This can be an individual user's token, OR it can come from a trusted bot account. If you 65 | have a repository in one of the official Carpentries accounts, then you do not 66 | need to worry about this token being present because the Carpentries Core Team 67 | will take care of supplying this token. 68 | 69 | If you want to use your personal account, you can go to 70 | 71 | to create a token. Once you have created your token, you should copy it to your 72 | clipboard and then go to your repository's settings > secrets > actions and 73 | create or edit the `SANDPAPER_WORKFLOW` secret, pasting in the generated token. 74 | 75 | If you do not specify your token correctly, the runs will not fail; instead, they will 76 | give you instructions for providing the token for your repository. 77 | 78 | ### 02 Maintain: Update Workflow Files (update-workflows.yaml) 79 | 80 | The {sandpaper} repository was designed to do as much as possible to separate 81 | the tools from the content. For local builds, this is absolutely true, but 82 | there is a minor issue when it comes to workflow files: they must live inside 83 | the repository. 84 | 85 | This workflow ensures that the workflow files are up-to-date. 
The way it works is 86 | to download the `update-workflows.sh` script from GitHub and run it. The script 87 | will do the following: 88 | 89 | 1. check the recorded version of sandpaper against the current version on GitHub 90 | 2. update the files if there is a difference in versions 91 | 92 | After the files are updated, if there are any changes, they are pushed to a 93 | branch called `update/workflows` and a pull request is created. Maintainers are 94 | encouraged to review the changes and accept the pull request if the outputs 95 | are okay. 96 | 97 | This update is run weekly or on demand. 98 | 99 | ### 03 Maintain: Update Package Cache (update-cache.yaml) 100 | 101 | For lessons that have generated content, we use {renv} to ensure that the output 102 | is stable. This is controlled by a single lockfile, which documents the packages 103 | needed for the lesson and their version numbers. This workflow is skipped in 104 | lessons that do not have generated content. 105 | 106 | Because the lessons need to remain current with the package ecosystem, it's a 107 | good idea to make sure these packages can be updated periodically. The 108 | update cache workflow will do this by checking for updates, applying them in a 109 | branch called `update/packages`, and creating a pull request with _only the 110 | lockfile changed_. 111 | 112 | From here, the markdown documents will be rebuilt and you can inspect what has 113 | changed based on how the packages have updated. 114 | 115 | ## Pull Request and Review Management 116 | 117 | Because our lessons execute code, pull requests are a security risk for any 118 | lesson and thus have security measures associated with them. 
**Do not merge any 119 | pull request that does not pass checks or that the bots have not commented on.** 120 | 121 | These workflows all go together and are described in the following 122 | diagram and the sections below: 123 | 124 | ![Graph representation of a pull request](https://carpentries.github.io/sandpaper/articles/img/pr-flow.dot.svg) 125 | 126 | ### Pre Flight Pull Request Validation (pr-preflight.yaml) 127 | 128 | This workflow runs every time a pull request is created, and its purpose is to 129 | validate that the pull request is okay to run. This means checking the following: 130 | 131 | 1. The pull request does not contain modified workflow files 132 | 2. If the pull request contains modified workflow files, it does not contain 133 | modified content files (such as a situation where @carpentries-bot will 134 | make an automated pull request) 135 | 3. The pull request does not contain an invalid commit hash (e.g. from a fork 136 | that was made before a lesson was transitioned from styles to use the 137 | workbench). 138 | 139 | Once the checks are finished, a comment is issued to the pull request, which 140 | will allow maintainers to determine if it is safe to run the 141 | "Receive Pull Request" workflow from new contributors. 142 | 143 | ### Receive Pull Request (pr-receive.yaml) 144 | 145 | **Note of caution:** This workflow runs arbitrary code submitted by anyone who creates a 146 | pull request. GitHub has safeguarded the token used in this workflow to have no 147 | privileges in the repository, but we have taken precautions to protect against 148 | spoofing. 149 | 150 | This workflow is triggered with every push to a pull request. If this workflow 151 | is already running and a new push is sent to the pull request, the workflow 152 | running from the previous push will be cancelled and a new workflow run will be 153 | started. 154 | 155 | The first step of this workflow is to check if it is valid (e.g. 
that no 156 | workflow files have been modified). If any workflow files have been 157 | modified, a comment is made indicating that the workflow will not run. If 158 | both a workflow file and lesson content are modified, an error will occur. 159 | 160 | The second step (if valid) is to build the generated content from the pull 161 | request. This builds the content and uploads three artifacts: 162 | 163 | 1. The pull request number (pr) 164 | 2. A summary of changes after the rendering process (diff) 165 | 3. The rendered files (built) 166 | 167 | Because this workflow builds generated content, it follows the same general 168 | process as the `sandpaper-main` workflow with the same caching mechanisms. 169 | 170 | The artifacts produced are used by the next workflow. 171 | 172 | ### Comment on Pull Request (pr-comment.yaml) 173 | 174 | This workflow is triggered if the `pr-receive.yaml` workflow is successful. 175 | The steps in this workflow are: 176 | 177 | 1. Test if the workflow run is valid and comment its validity on the 178 | pull request. 179 | 2. If it is valid: create an orphan branch with two commits: the current state 180 | of the repository and the proposed changes. 181 | 3. If it is valid: update the pull request comment with the summary of changes. 182 | 183 | Importantly: if the pull request is invalid, the branch is not created, so any 184 | malicious code is not published. 185 | 186 | From here, the maintainer can request changes from the author and eventually 187 | either merge or reject the PR. When this happens, if the PR was valid, the 188 | preview branch needs to be deleted. 189 | 190 | ### Send Close PR Signal (pr-close-signal.yaml) 191 | 192 | Triggered any time a pull request is closed. This emits an artifact that is the 193 | pull request number, for use by the next workflow. 194 | 195 | ### Remove Pull Request Branch (pr-post-remove-branch.yaml) 196 | 197 | Triggered by `pr-close-signal.yaml`. 
This removes the temporary branch associated with 198 | the pull request (if it was created). 199 | -------------------------------------------------------------------------------- /.github/workflows/pr-close-signal.yaml: -------------------------------------------------------------------------------- 1 | name: "Bot: Send Close Pull Request Signal" 2 | 3 | on: 4 | pull_request: 5 | types: 6 | [closed] 7 | 8 | jobs: 9 | send-close-signal: 10 | name: "Send closing signal" 11 | runs-on: ubuntu-22.04 12 | if: ${{ github.event.action == 'closed' }} 13 | steps: 14 | - name: "Create PRtifact" 15 | run: | 16 | mkdir -p ./pr 17 | printf ${{ github.event.number }} > ./pr/NUM 18 | - name: Upload Diff 19 | uses: actions/upload-artifact@v4 20 | with: 21 | name: pr 22 | path: ./pr 23 | -------------------------------------------------------------------------------- /.github/workflows/pr-comment.yaml: -------------------------------------------------------------------------------- 1 | name: "Bot: Comment on the Pull Request" 2 | 3 | # read-write repo token 4 | # access to secrets 5 | on: 6 | workflow_run: 7 | workflows: ["Receive Pull Request"] 8 | types: 9 | - completed 10 | 11 | concurrency: 12 | group: pr-${{ github.event.workflow_run.pull_requests[0].number }} 13 | cancel-in-progress: true 14 | 15 | 16 | jobs: 17 | # Pull requests are valid if: 18 | # - they match the sha of the workflow run head commit 19 | # - they are open 20 | # - no .github files were committed 21 | test-pr: 22 | name: "Test if pull request is valid" 23 | runs-on: ubuntu-22.04 24 | if: > 25 | github.event.workflow_run.event == 'pull_request' && 26 | github.event.workflow_run.conclusion == 'success' 27 | outputs: 28 | is_valid: ${{ steps.check-pr.outputs.VALID }} 29 | payload: ${{ steps.check-pr.outputs.payload }} 30 | number: ${{ steps.get-pr.outputs.NUM }} 31 | msg: ${{ steps.check-pr.outputs.MSG }} 32 | steps: 33 | - name: 'Download PR artifact' 34 | id: dl 35 | uses: 
carpentries/actions/download-workflow-artifact@main 36 | with: 37 | run: ${{ github.event.workflow_run.id }} 38 | name: 'pr' 39 | 40 | - name: "Get PR Number" 41 | if: ${{ steps.dl.outputs.success == 'true' }} 42 | id: get-pr 43 | run: | 44 | unzip pr.zip 45 | echo "NUM=$(<./NR)" >> $GITHUB_OUTPUT 46 | 47 | - name: "Fail if PR number was not present" 48 | id: bad-pr 49 | if: ${{ steps.dl.outputs.success != 'true' }} 50 | run: | 51 | echo '::error::A pull request number was not recorded. The pull request that triggered this workflow is likely malicious.' 52 | exit 1 53 | - name: "Get Invalid Hashes File" 54 | id: hash 55 | run: | 56 | echo "json<> $GITHUB_OUTPUT 59 | - name: "Check PR" 60 | id: check-pr 61 | if: ${{ steps.dl.outputs.success == 'true' }} 62 | uses: carpentries/actions/check-valid-pr@main 63 | with: 64 | pr: ${{ steps.get-pr.outputs.NUM }} 65 | sha: ${{ github.event.workflow_run.head_sha }} 66 | headroom: 3 # if it's within the last three commits, we can keep going, because it's likely rapid-fire 67 | invalid: ${{ fromJSON(steps.hash.outputs.json)[github.repository] }} 68 | fail_on_error: true 69 | 70 | # Create an orphan branch on this repository with two commits 71 | # - the current HEAD of the md-outputs branch 72 | # - the output from running the current HEAD of the pull request through 73 | # the md generator 74 | create-branch: 75 | name: "Create Git Branch" 76 | needs: test-pr 77 | runs-on: ubuntu-22.04 78 | if: ${{ needs.test-pr.outputs.is_valid == 'true' }} 79 | env: 80 | NR: ${{ needs.test-pr.outputs.number }} 81 | permissions: 82 | contents: write 83 | steps: 84 | - name: 'Checkout md outputs' 85 | uses: actions/checkout@v4 86 | with: 87 | ref: md-outputs 88 | path: built 89 | fetch-depth: 1 90 | 91 | - name: 'Download built markdown' 92 | id: dl 93 | uses: carpentries/actions/download-workflow-artifact@main 94 | with: 95 | run: ${{ github.event.workflow_run.id }} 96 | name: 'built' 97 | 98 | - if: ${{ steps.dl.outputs.success == 'true' }} 
99 | run: unzip built.zip 100 | 101 | - name: "Create orphan and push" 102 | if: ${{ steps.dl.outputs.success == 'true' }} 103 | run: | 104 | cd built/ 105 | git config --local user.email "actions@github.com" 106 | git config --local user.name "GitHub Actions" 107 | CURR_HEAD=$(git rev-parse HEAD) 108 | git checkout --orphan md-outputs-PR-${NR} 109 | git add -A 110 | git commit -m "source commit: ${CURR_HEAD}" 111 | ls -A | grep -v '^.git$' | xargs -I _ rm -r '_' 112 | cd .. 113 | unzip -o -d built built.zip 114 | cd built 115 | git add -A 116 | git commit --allow-empty -m "differences for PR #${NR}" 117 | git push -u --force --set-upstream origin md-outputs-PR-${NR} 118 | 119 | # Comment on the Pull Request with a link to the branch and the diff 120 | comment-pr: 121 | name: "Comment on Pull Request" 122 | needs: [test-pr, create-branch] 123 | runs-on: ubuntu-22.04 124 | if: ${{ needs.test-pr.outputs.is_valid == 'true' }} 125 | env: 126 | NR: ${{ needs.test-pr.outputs.number }} 127 | permissions: 128 | pull-requests: write 129 | steps: 130 | - name: 'Download comment artifact' 131 | id: dl 132 | uses: carpentries/actions/download-workflow-artifact@main 133 | with: 134 | run: ${{ github.event.workflow_run.id }} 135 | name: 'diff' 136 | 137 | - if: ${{ steps.dl.outputs.success == 'true' }} 138 | run: unzip ${{ github.workspace }}/diff.zip 139 | 140 | - name: "Comment on PR" 141 | id: comment-diff 142 | if: ${{ steps.dl.outputs.success == 'true' }} 143 | uses: carpentries/actions/comment-diff@main 144 | with: 145 | pr: ${{ env.NR }} 146 | path: ${{ github.workspace }}/diff.md 147 | 148 | # Comment if the PR is open and matches the SHA, but the workflow files have 149 | # changed 150 | comment-changed-workflow: 151 | name: "Comment if workflow files have changed" 152 | needs: test-pr 153 | runs-on: ubuntu-22.04 154 | if: ${{ always() && needs.test-pr.outputs.is_valid == 'false' }} 155 | env: 156 | NR: ${{ github.event.workflow_run.pull_requests[0].number }} 157 | 
body: ${{ needs.test-pr.outputs.msg }} 158 | permissions: 159 | pull-requests: write 160 | steps: 161 | - name: 'Check for spoofing' 162 | id: dl 163 | uses: carpentries/actions/download-workflow-artifact@main 164 | with: 165 | run: ${{ github.event.workflow_run.id }} 166 | name: 'built' 167 | 168 | - name: 'Alert if spoofed' 169 | id: spoof 170 | if: ${{ steps.dl.outputs.success == 'true' }} 171 | run: | 172 | echo 'body<> $GITHUB_ENV 173 | echo '' >> $GITHUB_ENV 174 | echo '## :x: DANGER :x:' >> $GITHUB_ENV 175 | echo 'This pull request has modified workflows that created output. Close this now.' >> $GITHUB_ENV 176 | echo '' >> $GITHUB_ENV 177 | echo 'EOF' >> $GITHUB_ENV 178 | 179 | - name: "Comment on PR" 180 | id: comment-diff 181 | uses: carpentries/actions/comment-diff@main 182 | with: 183 | pr: ${{ env.NR }} 184 | body: ${{ env.body }} 185 | -------------------------------------------------------------------------------- /.github/workflows/pr-post-remove-branch.yaml: -------------------------------------------------------------------------------- 1 | name: "Bot: Remove Temporary PR Branch" 2 | 3 | on: 4 | workflow_run: 5 | workflows: ["Bot: Send Close Pull Request Signal"] 6 | types: 7 | - completed 8 | 9 | jobs: 10 | delete: 11 | name: "Delete branch from Pull Request" 12 | runs-on: ubuntu-22.04 13 | if: > 14 | github.event.workflow_run.event == 'pull_request' && 15 | github.event.workflow_run.conclusion == 'success' 16 | permissions: 17 | contents: write 18 | steps: 19 | - name: 'Download artifact' 20 | uses: carpentries/actions/download-workflow-artifact@main 21 | with: 22 | run: ${{ github.event.workflow_run.id }} 23 | name: pr 24 | - name: "Get PR Number" 25 | id: get-pr 26 | run: | 27 | unzip pr.zip 28 | echo "NUM=$(<./NUM)" >> $GITHUB_OUTPUT 29 | - name: 'Remove branch' 30 | uses: carpentries/actions/remove-branch@main 31 | with: 32 | pr: ${{ steps.get-pr.outputs.NUM }} 33 | 
-------------------------------------------------------------------------------- /.github/workflows/pr-preflight.yaml: -------------------------------------------------------------------------------- 1 | name: "Pull Request Preflight Check" 2 | 3 | on: 4 | pull_request_target: 5 | branches: 6 | ["main"] 7 | types: 8 | ["opened", "synchronize", "reopened"] 9 | 10 | jobs: 11 | test-pr: 12 | name: "Test if pull request is valid" 13 | if: ${{ github.event.action != 'closed' }} 14 | runs-on: ubuntu-22.04 15 | outputs: 16 | is_valid: ${{ steps.check-pr.outputs.VALID }} 17 | permissions: 18 | pull-requests: write 19 | steps: 20 | - name: "Get Invalid Hashes File" 21 | id: hash 22 | run: | 23 | echo "json<> $GITHUB_OUTPUT 26 | - name: "Check PR" 27 | id: check-pr 28 | uses: carpentries/actions/check-valid-pr@main 29 | with: 30 | pr: ${{ github.event.number }} 31 | invalid: ${{ fromJSON(steps.hash.outputs.json)[github.repository] }} 32 | fail_on_error: true 33 | - name: "Comment result of validation" 34 | id: comment-diff 35 | if: ${{ always() }} 36 | uses: carpentries/actions/comment-diff@main 37 | with: 38 | pr: ${{ github.event.number }} 39 | body: ${{ steps.check-pr.outputs.MSG }} 40 | -------------------------------------------------------------------------------- /.github/workflows/pr-receive.yaml: -------------------------------------------------------------------------------- 1 | name: "Receive Pull Request" 2 | 3 | on: 4 | pull_request: 5 | types: 6 | [opened, synchronize, reopened] 7 | 8 | concurrency: 9 | group: ${{ github.ref }} 10 | cancel-in-progress: true 11 | 12 | jobs: 13 | test-pr: 14 | name: "Record PR number" 15 | if: ${{ github.event.action != 'closed' }} 16 | runs-on: ubuntu-22.04 17 | outputs: 18 | is_valid: ${{ steps.check-pr.outputs.VALID }} 19 | steps: 20 | - name: "Record PR number" 21 | id: record 22 | if: ${{ always() }} 23 | run: | 24 | echo ${{ github.event.number }} > ${{ github.workspace }}/NR # 2022-03-02: artifact name fixed to be NR 25 | 
- name: "Upload PR number" 26 | id: upload 27 | if: ${{ always() }} 28 | uses: actions/upload-artifact@v4 29 | with: 30 | name: pr 31 | path: ${{ github.workspace }}/NR 32 | - name: "Get Invalid Hashes File" 33 | id: hash 34 | run: | 35 | echo "json<> $GITHUB_OUTPUT 38 | - name: "echo output" 39 | run: | 40 | echo "${{ steps.hash.outputs.json }}" 41 | - name: "Check PR" 42 | id: check-pr 43 | uses: carpentries/actions/check-valid-pr@main 44 | with: 45 | pr: ${{ github.event.number }} 46 | invalid: ${{ fromJSON(steps.hash.outputs.json)[github.repository] }} 47 | 48 | build-md-source: 49 | name: "Build markdown source files if valid" 50 | needs: test-pr 51 | runs-on: ubuntu-22.04 52 | if: ${{ needs.test-pr.outputs.is_valid == 'true' }} 53 | env: 54 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 55 | RENV_PATHS_ROOT: ~/.local/share/renv/ 56 | CHIVE: ${{ github.workspace }}/site/chive 57 | PR: ${{ github.workspace }}/site/pr 58 | MD: ${{ github.workspace }}/site/built 59 | steps: 60 | - name: "Check Out Main Branch" 61 | uses: actions/checkout@v4 62 | 63 | - name: "Check Out Staging Branch" 64 | uses: actions/checkout@v4 65 | with: 66 | ref: md-outputs 67 | path: ${{ env.MD }} 68 | 69 | - name: "Set up R" 70 | uses: r-lib/actions/setup-r@v2 71 | with: 72 | use-public-rspm: true 73 | install-r: false 74 | 75 | - name: "Set up Pandoc" 76 | uses: r-lib/actions/setup-pandoc@v2 77 | 78 | - name: "Setup Lesson Engine" 79 | uses: carpentries/actions/setup-sandpaper@main 80 | with: 81 | cache-version: ${{ secrets.CACHE_VERSION }} 82 | 83 | - name: "Setup Package Cache" 84 | uses: carpentries/actions/setup-lesson-deps@main 85 | with: 86 | cache-version: ${{ secrets.CACHE_VERSION }} 87 | 88 | - name: "Validate and Build Markdown" 89 | id: build-site 90 | run: | 91 | sandpaper::package_cache_trigger(TRUE) 92 | sandpaper::validate_lesson(path = '${{ github.workspace }}') 93 | sandpaper:::build_markdown(path = '${{ github.workspace }}', quiet = FALSE) 94 | shell: Rscript {0} 95 | 96 | - 
name: "Generate Artifacts" 97 | id: generate-artifacts 98 | run: | 99 | sandpaper:::ci_bundle_pr_artifacts( 100 | repo = '${{ github.repository }}', 101 | pr_number = '${{ github.event.number }}', 102 | path_md = '${{ env.MD }}', 103 | path_pr = '${{ env.PR }}', 104 | path_archive = '${{ env.CHIVE }}', 105 | branch = 'md-outputs' 106 | ) 107 | shell: Rscript {0} 108 | 109 | - name: "Upload PR" 110 | uses: actions/upload-artifact@v4 111 | with: 112 | name: pr 113 | path: ${{ env.PR }} 114 | overwrite: true 115 | 116 | - name: "Upload Diff" 117 | uses: actions/upload-artifact@v4 118 | with: 119 | name: diff 120 | path: ${{ env.CHIVE }} 121 | retention-days: 1 122 | 123 | - name: "Upload Build" 124 | uses: actions/upload-artifact@v4 125 | with: 126 | name: built 127 | path: ${{ env.MD }} 128 | retention-days: 1 129 | 130 | - name: "Teardown" 131 | run: sandpaper::reset_site() 132 | shell: Rscript {0} 133 | -------------------------------------------------------------------------------- /.github/workflows/sandpaper-main.yaml: -------------------------------------------------------------------------------- 1 | name: "01 Build and Deploy Site" 2 | 3 | on: 4 | push: 5 | branches: 6 | - main 7 | - master 8 | schedule: 9 | - cron: '0 0 * * 2' 10 | workflow_dispatch: 11 | inputs: 12 | name: 13 | description: 'Who triggered this build?' 
14 | required: true 15 | default: 'Maintainer (via GitHub)' 16 | reset: 17 | description: 'Reset cached markdown files' 18 | required: false 19 | default: false 20 | type: boolean 21 | jobs: 22 | full-build: 23 | name: "Build Full Site" 24 | 25 | # 2024-10-01: ubuntu-latest is now 24.04 and R is not installed by default in the runner image 26 | # pin to 22.04 for now 27 | runs-on: ubuntu-22.04 28 | permissions: 29 | checks: write 30 | contents: write 31 | pages: write 32 | env: 33 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 34 | RENV_PATHS_ROOT: ~/.local/share/renv/ 35 | steps: 36 | 37 | - name: "Checkout Lesson" 38 | uses: actions/checkout@v4 39 | 40 | - name: "Set up R" 41 | uses: r-lib/actions/setup-r@v2 42 | with: 43 | use-public-rspm: true 44 | install-r: false 45 | 46 | - name: "Set up Pandoc" 47 | uses: r-lib/actions/setup-pandoc@v2 48 | 49 | - name: "Setup Lesson Engine" 50 | uses: carpentries/actions/setup-sandpaper@main 51 | with: 52 | cache-version: ${{ secrets.CACHE_VERSION }} 53 | 54 | - name: "Setup Package Cache" 55 | uses: carpentries/actions/setup-lesson-deps@main 56 | with: 57 | cache-version: ${{ secrets.CACHE_VERSION }} 58 | 59 | - name: "Deploy Site" 60 | run: | 61 | reset <- "${{ github.event.inputs.reset }}" == "true" 62 | sandpaper::package_cache_trigger(TRUE) 63 | sandpaper:::ci_deploy(reset = reset) 64 | shell: Rscript {0} 65 | -------------------------------------------------------------------------------- /.github/workflows/sandpaper-version.txt: -------------------------------------------------------------------------------- 1 | 0.16.11 2 | -------------------------------------------------------------------------------- /.github/workflows/update-cache.yaml: -------------------------------------------------------------------------------- 1 | name: "03 Maintain: Update Package Cache" 2 | 3 | on: 4 | workflow_dispatch: 5 | inputs: 6 | name: 7 | description: 'Who triggered this build (enter github username to tag yourself)?' 
8 | required: true 9 | default: 'monthly run' 10 | schedule: 11 | # Run every tuesday 12 | - cron: '0 0 * * 2' 13 | 14 | jobs: 15 | preflight: 16 | name: "Preflight Check" 17 | runs-on: ubuntu-22.04 18 | outputs: 19 | ok: ${{ steps.check.outputs.ok }} 20 | steps: 21 | - id: check 22 | run: | 23 | if [[ ${{ github.event_name }} == 'workflow_dispatch' ]]; then 24 | echo "ok=true" >> $GITHUB_OUTPUT 25 | echo "Running on request" 26 | # using single brackets here to avoid 08 being interpreted as octal 27 | # https://github.com/carpentries/sandpaper/issues/250 28 | elif [ `date +%d` -le 7 ]; then 29 | # If the Tuesday lands in the first week of the month, run it 30 | echo "ok=true" >> $GITHUB_OUTPUT 31 | echo "Running on schedule" 32 | else 33 | echo "ok=false" >> $GITHUB_OUTPUT 34 | echo "Not Running Today" 35 | fi 36 | 37 | check_renv: 38 | name: "Check if We Need {renv}" 39 | runs-on: ubuntu-22.04 40 | needs: preflight 41 | if: ${{ needs.preflight.outputs.ok == 'true'}} 42 | outputs: 43 | needed: ${{ steps.renv.outputs.exists }} 44 | steps: 45 | - name: "Checkout Lesson" 46 | uses: actions/checkout@v4 47 | - id: renv 48 | run: | 49 | if [[ -d renv ]]; then 50 | echo "exists=true" >> $GITHUB_OUTPUT 51 | fi 52 | 53 | check_token: 54 | name: "Check SANDPAPER_WORKFLOW token" 55 | runs-on: ubuntu-22.04 56 | needs: check_renv 57 | if: ${{ needs.check_renv.outputs.needed == 'true' }} 58 | outputs: 59 | workflow: ${{ steps.validate.outputs.wf }} 60 | repo: ${{ steps.validate.outputs.repo }} 61 | steps: 62 | - name: "validate token" 63 | id: validate 64 | uses: carpentries/actions/check-valid-credentials@main 65 | with: 66 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 67 | 68 | update_cache: 69 | name: "Update Package Cache" 70 | needs: check_token 71 | if: ${{ needs.check_token.outputs.repo== 'true' }} 72 | runs-on: ubuntu-22.04 73 | env: 74 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 75 | RENV_PATHS_ROOT: ~/.local/share/renv/ 76 | steps: 77 | 78 | - name: "Checkout Lesson" 79 | 
uses: actions/checkout@v4 80 | 81 | - name: "Set up R" 82 | uses: r-lib/actions/setup-r@v2 83 | with: 84 | use-public-rspm: true 85 | install-r: false 86 | 87 | - name: "Update {renv} deps and determine if a PR is needed" 88 | id: update 89 | uses: carpentries/actions/update-lockfile@main 90 | with: 91 | cache-version: ${{ secrets.CACHE_VERSION }} 92 | 93 | - name: Create Pull Request 94 | id: cpr 95 | if: ${{ steps.update.outputs.n > 0 }} 96 | uses: carpentries/create-pull-request@main 97 | with: 98 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 99 | delete-branch: true 100 | branch: "update/packages" 101 | commit-message: "[actions] update ${{ steps.update.outputs.n }} packages" 102 | title: "Update ${{ steps.update.outputs.n }} packages" 103 | body: | 104 | :robot: This is an automated build 105 | 106 | This will update ${{ steps.update.outputs.n }} packages in your lesson with the following versions: 107 | 108 | ``` 109 | ${{ steps.update.outputs.report }} 110 | ``` 111 | 112 | :stopwatch: In a few minutes, a comment will appear that will show you how the output has changed based on these updates. 113 | 114 | If you want to inspect these changes locally, you can use the following code to check out a new branch: 115 | 116 | ```bash 117 | git fetch origin update/packages 118 | git checkout update/packages 119 | ``` 120 | 121 | - Auto-generated by [create-pull-request][1] on ${{ steps.update.outputs.date }} 122 | 123 | [1]: https://github.com/carpentries/create-pull-request/tree/main 124 | labels: "type: package cache" 125 | draft: false 126 | -------------------------------------------------------------------------------- /.github/workflows/update-workflows.yaml: -------------------------------------------------------------------------------- 1 | name: "02 Maintain: Update Workflow Files" 2 | 3 | on: 4 | workflow_dispatch: 5 | inputs: 6 | name: 7 | description: 'Who triggered this build (enter github username to tag yourself)?' 
8 | required: true 9 | default: 'weekly run' 10 | clean: 11 | description: 'Workflow files/file extensions to clean (no wildcards, enter "" for none)' 12 | required: false 13 | default: '.yaml' 14 | schedule: 15 | # Run every Tuesday 16 | - cron: '0 0 * * 2' 17 | 18 | jobs: 19 | check_token: 20 | name: "Check SANDPAPER_WORKFLOW token" 21 | runs-on: ubuntu-22.04 22 | outputs: 23 | workflow: ${{ steps.validate.outputs.wf }} 24 | repo: ${{ steps.validate.outputs.repo }} 25 | steps: 26 | - name: "validate token" 27 | id: validate 28 | uses: carpentries/actions/check-valid-credentials@main 29 | with: 30 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 31 | 32 | update_workflow: 33 | name: "Update Workflow" 34 | runs-on: ubuntu-22.04 35 | needs: check_token 36 | if: ${{ needs.check_token.outputs.workflow == 'true' }} 37 | steps: 38 | - name: "Checkout Repository" 39 | uses: actions/checkout@v4 40 | 41 | - name: Update Workflows 42 | id: update 43 | uses: carpentries/actions/update-workflows@main 44 | with: 45 | clean: ${{ github.event.inputs.clean }} 46 | 47 | - name: Create Pull Request 48 | id: cpr 49 | if: "${{ steps.update.outputs.new }}" 50 | uses: carpentries/create-pull-request@main 51 | with: 52 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 53 | delete-branch: true 54 | branch: "update/workflows" 55 | commit-message: "[actions] update sandpaper workflow to version ${{ steps.update.outputs.new }}" 56 | title: "Update Workflows to Version ${{ steps.update.outputs.new }}" 57 | body: | 58 | :robot: This is an automated build 59 | 60 | Update Workflows from sandpaper version ${{ steps.update.outputs.old }} -> ${{ steps.update.outputs.new }} 61 | 62 | - Auto-generated by [create-pull-request][1] on ${{ steps.update.outputs.date }} 63 | 64 | [1]: https://github.com/carpentries/create-pull-request/tree/main 65 | labels: "type: template and tools" 66 | draft: false 67 | -------------------------------------------------------------------------------- /.gitignore: 
-------------------------------------------------------------------------------- 1 | # sandpaper files 2 | episodes/*html 3 | site/* 4 | !site/README.md 5 | 6 | # History files 7 | .Rhistory 8 | .Rapp.history 9 | 10 | # Session Data files 11 | .RData 12 | 13 | # User-specific files 14 | .Ruserdata 15 | 16 | # Example code in package build process 17 | *-Ex.R 18 | 19 | # Output files from R CMD build 20 | /*.tar.gz 21 | 22 | # Output files from R CMD check 23 | /*.Rcheck/ 24 | 25 | # RStudio files 26 | .Rproj.user/ 27 | 28 | # produced vignettes 29 | vignettes/*.html 30 | vignettes/*.pdf 31 | 32 | # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3 33 | .httr-oauth 34 | 35 | # knitr and R markdown default cache directories 36 | *_cache/ 37 | /cache/ 38 | 39 | # Temporary files created by R markdown 40 | *.utf8.md 41 | *.knit.md 42 | 43 | # R Environment Variables 44 | .Renviron 45 | 46 | # pkgdown site 47 | docs/ 48 | 49 | # translation temp files 50 | po/*~ 51 | 52 | # renv detritus 53 | renv/sandbox/ 54 | 55 | # vscode settings 56 | .vscode 57 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Contributor Code of Conduct" 3 | --- 4 | 5 | As contributors and maintainers of this project, 6 | we pledge to follow the [The Carpentries Code of Conduct][coc]. 7 | 8 | Instances of abusive, harassing, or otherwise unacceptable behavior 9 | may be reported by following our [reporting guidelines][coc-reporting]. 
10 | 11 | 12 | [coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html 13 | [coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html 14 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | ## Contributing 2 | 3 | [The Carpentries][cp-site] ([Software Carpentry][swc-site], [Data 4 | Carpentry][dc-site], and [Library Carpentry][lc-site]) are open source 5 | projects, and we welcome contributions of all kinds: new lessons, fixes to 6 | existing material, bug reports, and reviews of proposed changes are all 7 | welcome. 8 | 9 | ### Contributor Agreement 10 | 11 | By contributing, you agree that we may redistribute your work under [our 12 | license](LICENSE.md). In exchange, we will address your issues and/or assess 13 | your change proposal as promptly as we can, and help you become a member of our 14 | community. Everyone involved in [The Carpentries][cp-site] agrees to abide by 15 | our [code of conduct](CODE_OF_CONDUCT.md). 16 | 17 | ### How to Contribute 18 | 19 | The easiest way to get started is to file an issue to tell us about a spelling 20 | mistake, some awkward wording, or a factual error. This is a good way to 21 | introduce yourself and to meet some of our community members. 22 | 23 | 1. If you do not have a [GitHub][github] account, you can [send us comments by 24 | email][contact]. However, we will be able to respond more quickly if you use 25 | one of the other methods described below. 26 | 27 | 2. If you have a [GitHub][github] account, or are willing to [create 28 | one][github-join], but do not know how to use Git, you can report problems 29 | or suggest improvements by [creating an issue][issues]. This allows us to 30 | assign the item to someone and to respond to it in a threaded discussion. 31 | 32 | 3. 
If you are comfortable with Git, and would like to add or change material, 33 | you can submit a pull request (PR). Instructions for doing this are 34 | [included below](#using-github). 35 | 36 | Note: if you want to build the website locally, please refer to [The Workbench 37 | documentation][template-doc]. 38 | 39 | ### Where to Contribute 40 | 41 | 1. If you wish to change this lesson, add issues and pull requests here. 42 | 2. If you wish to change the template used for workshop websites, please refer 43 | to [The Workbench documentation][template-doc]. 44 | 45 | 46 | ### What to Contribute 47 | 48 | There are many ways to contribute, from writing new exercises and improving 49 | existing ones to updating or filling in the documentation and submitting [bug 50 | reports][issues] about things that do not work, are not clear, or are missing. 51 | If you are looking for ideas, please see [the list of issues for this 52 | repository][repo], or the issues for [Data Carpentry][dc-issues], [Library 53 | Carpentry][lc-issues], and [Software Carpentry][swc-issues] projects. 54 | 55 | Comments on issues and reviews of pull requests are just as welcome: we are 56 | smarter together than we are on our own. **Reviews from novices and newcomers 57 | are particularly valuable**: it's easy for people who have been using these 58 | lessons for a while to forget how impenetrable some of this material can be, so 59 | fresh eyes are always welcome. 60 | 61 | ### What *Not* to Contribute 62 | 63 | Our lessons already contain more material than we can cover in a typical 64 | workshop, so we are usually *not* looking for more concepts or tools to add to 65 | them. As a rule, if you want to introduce a new idea, you must (a) estimate how 66 | long it will take to teach and (b) explain what you would take out to make room 67 | for it. The first encourages contributors to be honest about requirements; the 68 | second, to think hard about priorities. 
69 | 70 | We are also not looking for exercises or other material that only run on one 71 | platform. Our workshops typically contain a mixture of Windows, macOS, and 72 | Linux users; in order to be usable, our lessons must run equally well on all 73 | three. 74 | 75 | ### Using GitHub 76 | 77 | If you choose to contribute via GitHub, you may want to look at [How to 78 | Contribute to an Open Source Project on GitHub][how-contribute]. In brief, we 79 | use [GitHub flow][github-flow] to manage changes: 80 | 81 | 1. Create a new branch in your desktop copy of this repository for each 82 | significant change. 83 | 2. Commit the change in that branch. 84 | 3. Push that branch to your fork of this repository on GitHub. 85 | 4. Submit a pull request from that branch to the [upstream repository][repo]. 86 | 5. If you receive feedback, make changes on your desktop and push to your 87 | branch on GitHub: the pull request will update automatically. 88 | 89 | NB: The published copy of the lesson is usually in the `main` branch. 90 | 91 | Each lesson has a team of maintainers who review issues and pull requests or 92 | encourage others to do so. The maintainers are community volunteers, and have 93 | final say over what gets merged into the lesson. 94 | 95 | ### Other Resources 96 | 97 | The Carpentries is a global organisation with volunteers and learners all over 98 | the world. We share values of inclusivity and a passion for sharing knowledge, 99 | teaching and learning. There are several ways to connect with The Carpentries 100 | community listed on the [Carpentries website][cp-site], including via social 101 | media, Slack, newsletters, and email lists. You can also [reach us by 102 | email][contact].
103 | 104 | [repo]: https://github.com/joelnitta/targets-workshop 105 | [contact]: mailto:team@carpentries.org 106 | [cp-site]: https://carpentries.org/ 107 | [dc-issues]: https://github.com/issues?q=user%3Adatacarpentry 108 | [dc-lessons]: https://datacarpentry.org/lessons/ 109 | [dc-site]: https://datacarpentry.org/ 110 | [discuss-list]: https://lists.software-carpentry.org/listinfo/discuss 111 | [github]: https://github.com 112 | [github-flow]: https://guides.github.com/introduction/flow/ 113 | [github-join]: https://github.com/join 114 | [how-contribute]: https://egghead.io/series/how-to-contribute-to-an-open-source-project-on-github 115 | [issues]: https://carpentries.org/help-wanted-issues/ 116 | [lc-issues]: https://github.com/issues?q=user%3ALibraryCarpentry 117 | [swc-issues]: https://github.com/issues?q=user%3Aswcarpentry 118 | [swc-lessons]: https://software-carpentry.org/lessons/ 119 | [swc-site]: https://software-carpentry.org/ 120 | [lc-site]: https://librarycarpentry.org/ 121 | [template-doc]: https://carpentries.github.io/workbench/ 122 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Licenses" 3 | --- 4 | 5 | ## Instructional Material 6 | 7 | All Carpentries (Software Carpentry, Data Carpentry, and Library Carpentry) 8 | instructional material is made available under the [Creative Commons 9 | Attribution license][cc-by-human]. The following is a human-readable summary of 10 | (and not a substitute for) the [full legal text of the CC BY 4.0 11 | license][cc-by-legal]. 12 | 13 | You are free: 14 | 15 | - to **Share**---copy and redistribute the material in any medium or format 16 | - to **Adapt**---remix, transform, and build upon the material 17 | 18 | for any purpose, even commercially. 19 | 20 | The licensor cannot revoke these freedoms as long as you follow the license 21 | terms. 
22 | 23 | Under the following terms: 24 | 25 | - **Attribution**---You must give appropriate credit (mentioning that your work 26 | is derived from work that is Copyright (c) The Carpentries and, where 27 | practical, linking to ), provide a [link to the 28 | license][cc-by-human], and indicate if changes were made. You may do so in 29 | any reasonable manner, but not in any way that suggests the licensor endorses 30 | you or your use. 31 | 32 | - **No additional restrictions**---You may not apply legal terms or 33 | technological measures that legally restrict others from doing anything the 34 | license permits. With the understanding that: 35 | 36 | Notices: 37 | 38 | * You do not have to comply with the license for elements of the material in 39 | the public domain or where your use is permitted by an applicable exception 40 | or limitation. 41 | * No warranties are given. The license may not give you all of the permissions 42 | necessary for your intended use. For example, other rights such as publicity, 43 | privacy, or moral rights may limit how you use the material. 44 | 45 | ## Software 46 | 47 | Except where otherwise noted, the example programs and other software provided 48 | by The Carpentries are made available under the [OSI][osi]-approved [MIT 49 | license][mit-license]. 50 | 51 | Permission is hereby granted, free of charge, to any person obtaining a copy of 52 | this software and associated documentation files (the "Software"), to deal in 53 | the Software without restriction, including without limitation the rights to 54 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 55 | of the Software, and to permit persons to whom the Software is furnished to do 56 | so, subject to the following conditions: 57 | 58 | The above copyright notice and this permission notice shall be included in all 59 | copies or substantial portions of the Software. 
60 | 61 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 62 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 63 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 64 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 65 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 66 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 67 | SOFTWARE. 68 | 69 | ## Trademark 70 | 71 | "The Carpentries", "Software Carpentry", "Data Carpentry", and "Library 72 | Carpentry" and their respective logos are registered trademarks of [Community 73 | Initiatives][ci]. 74 | 75 | [cc-by-human]: https://creativecommons.org/licenses/by/4.0/ 76 | [cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode 77 | [mit-license]: https://opensource.org/licenses/mit-license.html 78 | [ci]: https://communityin.org/ 79 | [osi]: https://opensource.org 80 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Introduction to the "targets" Package for Reproducible Data Analysis in R 2 | 3 | This is a pre-alpha lesson about the [targets](https://github.com/ropensci/targets) R package built using [The Carpentries Workbench][workbench]. 4 | 5 | The lesson website is here: https://carpentries-incubator.github.io/targets-workshop/ 6 | 7 | [workbench]: https://carpentries.github.io/sandpaper-docs/ 8 | 9 | Materials licensed under [CC-BY 4.0](LICENSE.md) by the authors 10 | -------------------------------------------------------------------------------- /config.yaml: -------------------------------------------------------------------------------- 1 | #------------------------------------------------------------ 2 | # Values for this lesson. 
3 | #------------------------------------------------------------ 4 | 5 | # Which carpentry is this (swc, dc, lc, or cp)? 6 | # swc: Software Carpentry 7 | # dc: Data Carpentry 8 | # lc: Library Carpentry 9 | # cp: Carpentries (to use for instructor training for instance) 10 | # incubator: The Carpentries Incubator 11 | carpentry: 'incubator' 12 | 13 | # Overall title for pages. 14 | title: 'Introduction to targets' 15 | 16 | # Date the lesson was created (YYYY-MM-DD, this is empty by default) 17 | created: ~ 18 | 19 | # Comma-separated list of keywords for the lesson 20 | keywords: 'reproducibility, data, targets, R' 21 | 22 | # Life cycle stage of the lesson 23 | # possible values: pre-alpha, alpha, beta, stable 24 | life_cycle: 'pre-alpha' 25 | 26 | # License of the lesson 27 | license: 'CC-BY 4.0' 28 | 29 | # Link to the source repository for this lesson 30 | source: 'https://github.com/carpentries-incubator/targets-workshop' 31 | 32 | # Default branch of your lesson 33 | branch: 'main' 34 | 35 | # Who to contact if there are any issues 36 | contact: 'joelnitta@gmail.com' 37 | 38 | # Navigation ------------------------------------------------ 39 | # 40 | # Use the following menu items to specify the order of 41 | # individual pages in each dropdown section. Leave blank to 42 | # include all pages in the folder. 
43 | # 44 | # Example ------------- 45 | # 46 | # episodes: 47 | # - introduction.md 48 | # - first-steps.md 49 | # 50 | # learners: 51 | # - setup.md 52 | # 53 | # instructors: 54 | # - instructor-notes.md 55 | # 56 | # profiles: 57 | # - one-learner.md 58 | # - another-learner.md 59 | 60 | # Order of episodes in your lesson 61 | episodes: 62 | - introduction.Rmd 63 | - basic-targets.Rmd 64 | - functions.Rmd 65 | - cache.Rmd 66 | - lifecycle.Rmd 67 | - organization.Rmd 68 | - packages.Rmd 69 | - files.Rmd 70 | - branch.Rmd 71 | - parallel.Rmd 72 | - quarto.Rmd 73 | 74 | # Information for Learners 75 | learners: 76 | 77 | # Information for Instructors 78 | instructors: 79 | 80 | # Learner Profiles 81 | profiles: 82 | 83 | # Customisation --------------------------------------------- 84 | # 85 | # This space below is where custom yaml items (e.g. pinning 86 | # sandpaper and varnish versions) should live 87 | 88 | 89 | -------------------------------------------------------------------------------- /episodes/basic-targets.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'First targets Workflow' 3 | teaching: 30 4 | exercises: 10 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | 9 | - What are best practices for organizing analyses? 10 | - What is a `_targets.R` file for? 11 | - What is the content of the `_targets.R` file? 12 | - How do you run a workflow? 
13 | 14 | :::::::::::::::::::::::::::::::::::::::::::::::: 15 | 16 | ::::::::::::::::::::::::::::::::::::: objectives 17 | 18 | - Create a project in RStudio 19 | - Explain the purpose of the `_targets.R` file 20 | - Write a basic `_targets.R` file 21 | - Use a `_targets.R` file to run a workflow 22 | 23 | :::::::::::::::::::::::::::::::::::::::::::::::: 24 | 25 | ::::::::::::::::::::::::::::::::::::: {.instructor} 26 | 27 | Episode summary: First chance to get hands dirty by writing a very simple workflow 28 | 29 | ::::::::::::::::::::::::::::::::::::: 30 | 31 | ```{r} 32 | #| label: setup 33 | #| echo: FALSE 34 | #| message: FALSE 35 | #| warning: FALSE 36 | library(targets) 37 | 38 | if (interactive()) { 39 | setwd("episodes") 40 | } 41 | 42 | source("files/lesson_functions.R") 43 | ``` 44 | 45 | ## Create a project 46 | 47 | ### About projects 48 | 49 | `targets` uses the "project" concept for organizing analyses: all of the files needed for a given project are put in a single folder, the project folder. 50 | The project folder has additional subfolders for organization, such as folders for data, code, and results. 51 | 52 | By using projects, it makes it straightforward to re-orient yourself if you return to an analysis after time spent elsewhere. 53 | This wouldn't be a problem if we only ever work on one thing at a time until completion, but that is almost never the case. 54 | It is hard to remember what you were doing when you come back to a project after working on something else (a phenomenon called "context switching"). 55 | By using a standardized organization system, you will reduce confusion and lost time... in other words, you are increasing reproducibility! 56 | 57 | This workshop will use RStudio, since it also works well with the project organization concept. 58 | 59 | ### Create a project in RStudio 60 | 61 | Let's start a new project using RStudio. 62 | 63 | Click "File", then select "New Project". 
64 | 65 | This will open the New Project Wizard, a set of menus to help you set up the project. 66 | 67 | ![The New Project Wizard](fig/basic-rstudio-wizard.png){alt="Screenshot of RStudio New Project Wizard menu"} 68 | 69 | In the Wizard, click the first option, "New Directory", since we are making a brand-new project from scratch. 70 | Click "New Project" in the next menu. 71 | In "Directory name", enter a name that helps you remember the purpose of the project, such as "targets-demo" (follow best practices for naming files and folders). 72 | Under "Create project as a subdirectory of...", click the "Browse" button to select a directory to put the project. 73 | We recommend putting it on your Desktop so you can easily find it. 74 | 75 | You can leave "Create a git repository" and "Use renv with this project" unchecked, but these are both excellent tools to improve reproducibility, and you should consider learning them and using them in the future, if you don't already. 76 | They can be enabled at any later time, so you don't need to worry about trying to use them immediately. 77 | 78 | Once you work through these steps, your RStudio session should look like this: 79 | 80 | ![Your newly created project](fig/basic-rstudio-project.png){alt="Screenshot of RStudio with a newly created project called 'targets-demo' open containing a single file, 'targets-demo.Rproj'"} 81 | 82 | Our project now contains a single file, created by RStudio: `targets-demo.Rproj`. You should not edit this file by hand. Its purpose is to tell RStudio that this is a project folder and to store some RStudio settings (if you use version-control software, it is OK to commit this file). Also, you can open the project by double clicking on the `.Rproj` file in your file explorer (try it by quitting RStudio then navigating in your file browser to your Desktop, opening the "targets-demo" folder, and double clicking `targets-demo.Rproj`). 
83 | 84 | OK, now that our project is set up, we are (almost) ready to start using `targets`! 85 | 86 | ## Background: non-`targets` version 87 | 88 | First though, to get familiar with the functions and packages we'll use, let's run the code like you would in a "normal" R script without using `targets`. 89 | 90 | Recall that we are using the `palmerpenguins` R package to obtain the data. 91 | This package actually includes two variations of the dataset: one is an external CSV file with the raw data, and another is the cleaned data loaded into R. 92 | In real life you probably have externally stored raw data, so **let's use the raw penguin data** as the starting point for our analysis too. 93 | 94 | The `path_to_file()` function in `palmerpenguins` provides the path to the raw data CSV file (it is inside the `palmerpenguins` R package source code that you downloaded to your computer when you installed the package). 95 | 96 | ```{r} 97 | #| label: normal-r-path 98 | library(palmerpenguins) 99 | 100 | # Get path to CSV file 101 | penguins_csv_file <- path_to_file("penguins_raw.csv") 102 | 103 | penguins_csv_file 104 | ``` 105 | 106 | We will use the `tidyverse` set of packages for loading and manipulating the data. We don't have time to cover all the details about using `tidyverse` now, but if you want to learn more about it, please see the ["Manipulating, analyzing and exporting data with tidyverse" lesson](https://datacarpentry.org/R-ecology-lesson/03-dplyr.html), or the Carpentries Incubator lesson [R and the tidyverse for working with datasets](https://carpentries-incubator.github.io/r-tidyverse-4-datasets/). 107 | 108 | Let's load the data with `read_csv()`.
109 | 110 | ```{r} 111 | #| label: normal-r-load-show 112 | #| eval: false 113 | library(tidyverse) 114 | 115 | # Read CSV file into R 116 | penguins_data_raw <- read_csv(penguins_csv_file) 117 | 118 | penguins_data_raw 119 | ``` 120 | 121 | ```{r} 122 | #| label: normal-r-load-hide 123 | #| echo: false 124 | suppressPackageStartupMessages(library(tidyverse)) 125 | 126 | # Read CSV file into R 127 | penguins_data_raw <- read_csv(penguins_csv_file) 128 | 129 | penguins_data_raw 130 | ``` 131 | 132 | We see the raw data has some awkward column names with spaces (these are hard to type out and can easily lead to mistakes in the code), and far more columns than we need. 133 | For the purposes of this analysis, we only need species name, bill length, and bill depth. 134 | In the raw data, the rather technical term "culmen" is used to refer to the bill. 135 | 136 | ![Illustration of bill (culmen) length and depth. Artwork by @allison_horst.](https://allisonhorst.github.io/palmerpenguins/reference/figures/culmen_depth.png) 137 | 138 | Let's clean up the data to make it easier to use for downstream analyses. 139 | We will also remove any rows with missing data, because this could cause errors for some functions later. 140 | 141 | ```{r} 142 | #| label: normal-r-clean 143 | 144 | # Clean up raw data 145 | penguins_data <- penguins_data_raw |> 146 | # Rename columns for easier typing and 147 | # subset to only the columns needed for analysis 148 | select( 149 | species = Species, 150 | bill_length_mm = `Culmen Length (mm)`, 151 | bill_depth_mm = `Culmen Depth (mm)` 152 | ) |> 153 | # Delete rows with missing data 154 | drop_na() 155 | 156 | penguins_data 157 | ``` 158 | 159 | We have not run the full analysis yet, but this is enough to get us started with the transition to using `targets`. 
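To recap, here is the non-`targets` version gathered into a single script. Nothing in it is new; it is just the code from the chunks above collected in one place, which will make it easier to compare with the `targets` version we write next:

```r
library(palmerpenguins)
library(tidyverse)

# Get path to CSV file
penguins_csv_file <- path_to_file("penguins_raw.csv")

# Read CSV file into R
penguins_data_raw <- read_csv(penguins_csv_file)

# Clean up raw data
penguins_data <- penguins_data_raw |>
  # Rename columns for easier typing and
  # subset to only the columns needed for analysis
  select(
    species = Species,
    bill_length_mm = `Culmen Length (mm)`,
    bill_depth_mm = `Culmen Depth (mm)`
  ) |>
  # Delete rows with missing data
  drop_na()
```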
160 | 161 | ## `targets` version 162 | 163 | ### About the `_targets.R` file 164 | 165 | One major difference between a typical R data analysis and a `targets` project is that the latter must include a special file, called `_targets.R`, in the main project folder (the "project root"). 166 | 167 | The `_targets.R` file includes the specification of the workflow: these are the directions for R to run your analysis, kind of like a recipe. 168 | By using the `_targets.R` file, **you won't have to remember to run specific scripts in a certain order**; instead, R will do it for you! 169 | This is a **huge win**, both for your future self and anybody else trying to reproduce your analysis. 170 | 171 | ### Writing the initial `_targets.R` file 172 | 173 | We will now start to write a `_targets.R` file. Fortunately, `targets` comes with a function to help us do this. 174 | 175 | In the R console, first load the `targets` package with `library(targets)`, then run the command `tar_script()`. 176 | 177 | ```{r} 178 | #| label: start-targets-show 179 | #| eval: FALSE 180 | library(targets) 181 | tar_script() 182 | ``` 183 | 184 | Nothing will happen in the console, but in the file viewer, you should see a new file, `_targets.R`, appear. Open it using the File menu or by clicking on it. 185 | 186 | ```{r} 187 | #| label: start-targets-hide 188 | #| eval: true 189 | #| echo: false 190 | #| results: "asis" 191 | plan_0_dir <- make_tempdir() 192 | pushd(plan_0_dir) 193 | tar_script() 194 | default_script <- readr::read_lines("_targets.R") 195 | popd() 196 | 197 | cat("```{.r}\n") 198 | cat(default_script, sep = "\n") 199 | cat("```") 200 | ``` 201 | 202 | Don't worry about the details of this file. 203 | Instead, notice that it includes three main parts: 204 | 205 | - Loading packages with `library()` 206 | - Defining a custom function with `function()` 207 | - Defining a list with `list()`. 208 | 209 | You may not have used `function()` before.
If not, that's OK; we will cover this in more detail in the [next episode](episodes/functions.Rmd), so we will ignore it for now. 211 | 212 | The last part, the list, is the **most important part** of the `_targets.R` file. 213 | It defines the steps in the workflow. 214 | The `_targets.R` file **must always end with this list**. 215 | 216 | Furthermore, each item in the list is a call to the `tar_target()` function. 217 | The first argument of `tar_target()` is the name of the target to build, and the second argument is the command used to build it. 218 | Note that the name of the target is **unquoted**, that is, it is written without any surrounding quotation marks. 219 | 220 | ## Modifying `_targets.R` to run the example analysis 221 | 222 | First, let's load all of the packages we need for our workflow. 223 | Add `library(tidyverse)` and `library(palmerpenguins)` to the top of `_targets.R` after `library(targets)`. 224 | 225 | Next, we can delete the `function()` statement since we won't be using that just yet (we will come back to custom functions soon!). 226 | 227 | The last, and trickiest, part is correctly defining the workflow in the list at the end of the file. 228 | 229 | From [the non-`targets` version](#background-non-targets-version), you can see we have three steps so far: 230 | 231 | 1. Define the path to the CSV file with the raw penguins data. 232 | 2. Read the CSV file. 233 | 3. Clean the raw data. 234 | 235 | Each of these will be one item in the list. 236 | Furthermore, we need to write each item using the `tar_target()` function. 237 | Recall that we write the `tar_target()` function by writing the **name of the target to build** first and the **command to build it** second. 238 | 239 | ::::::::::::::::::::::::::::::::::::: {.callout} 240 | 241 | ## Choosing good target names 242 | 243 | The name of each target could be anything you like, but it is strongly recommended to **choose names that reflect what the target actually contains**.
244 | 245 | For example, `penguins_data_raw` for the raw data loaded from the CSV file and not `x`. 246 | 247 | Your future self will thank you! 248 | 249 | :::::::::::::::::::::::::::::::::::::::::: 250 | 251 | ::::::::::::::::::::::::::::::::::::: {.challenge} 252 | 253 | ## Challenge: Use `tar_target()` 254 | 255 | Can you use `tar_target()` to define the first step in the workflow (setting the path to the CSV file with the penguins data)? 256 | 257 | :::::::::::::::::::::::::::::::::: {.solution} 258 | 259 | ```{r} 260 | #| label: challenge-solution-1 261 | #| eval: false 262 | tar_target(name = penguins_csv_file, command = path_to_file("penguins_raw.csv")) 263 | ``` 264 | 265 | The first two arguments of `tar_target()` are the **name** of the target, followed by the **command** to build it. 266 | 267 | These arguments are used so frequently we will typically omit the argument names, instead writing it like this: 268 | 269 | ```{r} 270 | #| label: challenge-solution-2 271 | #| eval: false 272 | tar_target(penguins_csv_file, path_to_file("penguins_raw.csv")) 273 | ``` 274 | 275 | :::::::::::::::::::::::::::::::::: 276 | 277 | :::::::::::::::::::::::::::::::::::::::::: 278 | 279 | Now that we've seen how to define the first target, let's continue and add the rest. 280 | 281 | Once you've done that, this is how `_targets.R` should look: 282 | 283 | ```{r} 284 | #| label = "targets-show-workflow", 285 | #| eval = FALSE, 286 | #| code = readLines("files/plans/plan_0.R")[2:22] 287 | ``` 288 | 289 | I have set `show_col_types = FALSE` in `read_csv()` because we know from the earlier code that the column types were set correctly by default (character for species and numeric for bill length and depth), so we don't need to see the warning it would otherwise issue. 290 | 291 | ## Run the workflow 292 | 293 | Now that we have a workflow, we can run it with the `tar_make()` function. 
294 | Try running it, and you should see something like this: 295 | 296 | ```{r} 297 | #| label: targets-run 298 | #| eval: true 299 | #| echo: [3] 300 | pushd(make_tempdir()) 301 | write_example_plan("plan_0.R") 302 | tar_make() 303 | popd() 304 | ``` 305 | 306 | Congratulations, you've run your first workflow with `targets`! 307 | 308 | ::::::::::::::::::::::::::::::::::::: {.callout} 309 | 310 | ## The workflow cannot be run interactively 311 | 312 | You may be used to running R code interactively by selecting lines and pressing the "Run" button (or using the keyboard shortcut) in RStudio or your IDE of choice. 313 | 314 | You *could* run the list at the end of `_targets.R` this way, but it will not execute the workflow (it will return a list instead). 315 | 316 | **The only way to run the workflow is with `tar_make()`.** 317 | 318 | You do not need to select and run anything interactively in `_targets.R`. 319 | In fact, you do not even need to have the `_targets.R` file open to run the workflow with `tar_make()`---try it for yourself! 320 | 321 | Similarly, **you must not write `tar_make()` in the `_targets.R` file**; you should only use `tar_make()` as a direct command at the R console. 322 | 323 | :::::::::::::::::::::::::::::::::::::::::: 324 | 325 | Remember, now that we are using `targets`, **the only thing you need to do to replicate your analysis is run `tar_make()`**. 326 | 327 | This is true no matter how long or complicated your analysis becomes.
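By the way, where did the results go? `targets` saves the value of each target in a special `_targets/` folder inside the project (we will look at this storage in a later episode). To take a quick look at a finished target from the R console, use `tar_read()` or `tar_load()`:

```r
library(targets)

# Print the value of a single target (here, the cleaned data)
# without creating an object in your environment
tar_read(penguins_data)

# Or load a target into the global environment as an object
# with the same name as the target
tar_load(penguins_data_raw)
```

Like `tar_make()`, these are meant to be run at the console, not written into `_targets.R`.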
328 | 329 | ::::::::::::::::::::::::::::::::::::: keypoints 330 | 331 | - Projects help keep our analyses organized so we can easily re-run them later 332 | - Use the RStudio Project Wizard to create projects 333 | - The `_targets.R` file is a special file that must be included in all `targets` projects, and defines the workflow 334 | - Use `tar_script()` to create a default `_targets.R` file 335 | - Use `tar_make()` to run the workflow 336 | 337 | :::::::::::::::::::::::::::::::::::::::::::::::: 338 | -------------------------------------------------------------------------------- /episodes/branch.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Branching' 3 | teaching: 30 4 | exercises: 2 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | 9 | - How can we specify many targets without typing everything out? 10 | 11 | :::::::::::::::::::::::::::::::::::::::::::::::: 12 | 13 | ::::::::::::::::::::::::::::::::::::: objectives 14 | 15 | - Be able to specify targets using branching 16 | 17 | :::::::::::::::::::::::::::::::::::::::::::::::: 18 | 19 | ::::::::::::::::::::::::::::::::::::: instructor 20 | 21 | Episode summary: Show how to use branching 22 | 23 | ::::::::::::::::::::::::::::::::::::: 24 | 25 | ```{r} 26 | #| label: setup 27 | #| echo: FALSE 28 | #| message: FALSE 29 | #| warning: FALSE 30 | library(targets) 31 | library(tarchetypes) 32 | library(broom) 33 | 34 | # sandpaper renders this lesson from episodes/ 35 | # need to emulate this behavior during interactive development 36 | # would be preferable to use here::here() but it doesn't work for some reason 37 | if (interactive()) { 38 | setwd("episodes") 39 | } 40 | 41 | source("files/lesson_functions.R") 42 | 43 | # Increase width for printing tibbles 44 | options(width = 140) 45 | ``` 46 | 47 | ## Why branching?
48 | 49 | One of the major strengths of `targets` is the ability to define many targets from a single line of code ("branching"). 50 | This not only saves you typing, but also **reduces the risk of errors**, since there is less chance of making a typo. 51 | 52 | ## Types of branching 53 | 54 | There are two types of branching: **dynamic branching** and **static branching**. 55 | "Branching" refers to the idea that you can provide a single specification for how to make targets (the "pattern"), and `targets` generates multiple targets from it ("branches"). 56 | "Dynamic" means that the branches that result from the pattern do not have to be defined ahead of time---they are a dynamic result of the code. 57 | 58 | In this workshop, we will only cover dynamic branching since it is generally easier to write (static branching requires use of [meta-programming](https://books.ropensci.org/targets/static.html#metaprogramming), an advanced topic). For more information about each and when you might want to use one or the other (or some combination of the two), [see the `targets` package manual](https://books.ropensci.org/targets/dynamic.html). 59 | 60 | ## Example without branching 61 | 62 | To see how this works, let's continue our analysis of the `palmerpenguins` dataset. 63 | 64 | **Our hypothesis is that bill depth decreases with bill length.** 65 | We will test this hypothesis with a linear model. 66 | 67 | For example, this is a model of bill depth dependent on bill length: 68 | 69 | ```{r} 70 | #| label: example-lm 71 | #| eval: FALSE 72 | lm(bill_depth_mm ~ bill_length_mm, data = penguins_data) 73 | ``` 74 | 75 | We can add this to our pipeline. 
We will call it the `combined_model` because it combines all the species together without distinction: 76 | 77 | ```{r} 78 | #| label = "example-lm-pipeline-show", 79 | #| eval = FALSE, 80 | #| code = readLines("files/plans/plan_4.R")[2:19] 81 | ``` 82 | 83 | ```{r} 84 | #| label: example-lm-pipeline-hide 85 | #| echo: false 86 | plan_4_dir <- make_tempdir() 87 | pushd(plan_4_dir) 88 | write_example_plan("plan_3.R") 89 | tar_make(reporter = "silent") 90 | write_example_plan("plan_4.R") 91 | tar_make() 92 | popd() 93 | ``` 94 | 95 | Let's have a look at the model. We will use the `glance()` function from the `broom` package. Unlike base R `summary()`, this function returns output as a tibble (the tidyverse equivalent of a dataframe), which as we will see later is quite useful for downstream analyses. 96 | 97 | ```{r} 98 | #| label: example-lm-pipeline-inspect-show 99 | #| eval: true 100 | #| echo: [2, 3, 4] 101 | pushd(plan_4_dir) 102 | library(broom) 103 | tar_load(combined_model) 104 | glance(combined_model) 105 | popd() 106 | ``` 107 | 108 | Notice the small *P*-value. 109 | This seems to indicate that the model is highly significant. 110 | 111 | But wait a moment... is this really an appropriate model? Recall that there are three species of penguins in the dataset. It is possible that the relationship between bill depth and length **varies by species**. 112 | 113 | Let's try making one model *per* species (three models total) to see how that does (this is technically not the correct statistical approach, but our focus here is to learn `targets`, not statistics). 114 | 115 | Now our workflow is getting more complicated. 
This is what a workflow for such an analysis might look like **without branching** (make sure to add `library(broom)` to `packages.R`): 116 | 117 | ```{r} 118 | #| label = "example-model-show-1", 119 | #| eval = FALSE, 120 | #| code = readLines("files/plans/plan_5.R")[2:36] 121 | ``` 122 | 123 | ```{r} 124 | #| label: example-model-hide-1 125 | #| echo: false 126 | plan_5_dir <- make_tempdir() 127 | pushd(plan_5_dir) 128 | # simulate already running the plan once 129 | write_example_plan("plan_4.R") 130 | tar_make(reporter = "silent") 131 | write_example_plan("plan_5.R") 132 | tar_make() 133 | popd() 134 | ``` 135 | 136 | Let's look at the summary of one of the models: 137 | 138 | ```{r} 139 | #| label: example-model-show-2 140 | #| eval: true 141 | #| echo: [2] 142 | pushd(plan_5_dir) 143 | tar_read(adelie_summary) 144 | popd() 145 | ``` 146 | 147 | So this way of writing the pipeline works, but is repetitive: we have to call `glance()` each time we want to obtain summary statistics for each model. 148 | Furthermore, each summary target (`adelie_summary`, etc.) is explicitly named and typed out manually. 149 | It would be fairly easy to make a typo and end up with the wrong model being summarized. 150 | 151 | Before moving on, let's define another **custom function**: `model_glance()`. 152 | You will need to write custom functions frequently when using `targets`, so it's good to get used to it! 153 | 154 | As the name `model_glance()` suggests (it is good to write functions with names that indicate their purpose), this will build a model then immediately run `glance()` on it. 155 | The reason for doing so is that we get a **dataframe as a result**, which is very helpful for branching, as we will see in the next section. 
156 | Save this in `R/functions.R`: 157 | 158 | ```{r} 159 | #| label = "model-glance", 160 | #| eval = FALSE, 161 | #| code = readLines("files/tar_functions/model_glance_orig.R") 162 | ``` 163 | 164 | ## Example with branching 165 | 166 | ### First attempt 167 | 168 | Let's see how to write the same plan using **dynamic branching** (after running it, we will go through the new version in detail to understand each step): 169 | 170 | ```{r} 171 | #| label = "example-model-show-3", 172 | #| eval = FALSE, 173 | #| code = readLines("files/plans/plan_6.R")[2:28] 174 | ``` 175 | 176 | What is going on here? 177 | 178 | First, let's look at the messages provided by `tar_make()`. 179 | 180 | ```{r} 181 | #| label: example-model-hide-3 182 | #| echo: false 183 | plan_6_dir <- make_tempdir() 184 | pushd(plan_6_dir) 185 | # simulate already running the plan once 186 | write_example_plan("plan_5.R") 187 | tar_make(reporter = "silent") 188 | # run version of plan that uses `model_glance_orig()` (doesn't include species 189 | # names in output) 190 | write_example_plan("plan_6b.R") 191 | tar_make() 192 | example_branch_name <- tar_branch_names(species_summary, 1) 193 | popd() 194 | ``` 195 | 196 | There is a series of smaller targets (branches) that are each named like `r example_branch_name`, then one overall `species_summary` target. 197 | That is the result of specifying targets using branching: the smaller targets are the "branches" that comprise the overall target. 198 | Since `targets` has no way of knowing ahead of time how many branches there will be or what they represent, it names each one using this series of numbers and letters (the "hash"). 199 | `targets` builds each branch one at a time, then combines them into the overall target. 
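If you want to match a hash back to the branch it represents, `targets` includes helper functions for inspecting branches. Here is a brief sketch, assuming the plan above has already been run with `tar_make()`:

```r
library(targets)

# Show the automatically generated names of the first two branches
tar_branch_names(species_summary, 1:2)

# Read the result of just the first branch, rather than the
# combined `species_summary` target
tar_read(species_summary, branches = 1)
```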
200 | 201 | Next, let's look in more detail at how the workflow is set up, starting with how we set up the data: 202 | 203 | ```{r} 204 | #| label = "model-def", 205 | #| code = readLines("files/plans/plan_6.R")[14:19], 206 | #| eval = FALSE 207 | ``` 208 | 209 | Unlike the non-branching version, we added a step that **groups the data**. 210 | This is because dynamic branching is similar to the [`tidyverse` approach](https://dplyr.tidyverse.org/articles/grouping.html) of applying the same function to a grouped dataframe. 211 | So we use the `tar_group_by()` function to specify the groups in our input data: one group per species. 212 | 213 | Next, take a look at the command to build the target `species_summary`. 214 | 215 | ```{r} 216 | #| label = "model-summaries", 217 | #| code = readLines("files/plans/plan_6.R")[22:27], 218 | #| eval = FALSE 219 | ``` 220 | 221 | As before, the first argument to `tar_target()` is the name of the target to build, and the second is the command to build it. 222 | 223 | Here, we apply our custom `model_glance()` function to each group (in other words, each species) in `penguins_data_grouped`. 224 | 225 | Finally, there is an argument we haven't seen before, `pattern`, which indicates that this target should be built using dynamic branching. 226 | `map` means to apply the function to each group of the input data (`penguins_data_grouped`) sequentially. 227 | 228 | Now that we understand how the branching workflow is constructed, let's inspect the output: 229 | 230 | ```{r} 231 | #| label: example-model-show-4 232 | #| eval: FALSE 233 | tar_read(species_summary) 234 | ``` 235 | 236 | ```{r} 237 | #| label: example-model-hide-4 238 | #| echo: FALSE 239 | pushd(plan_6_dir) 240 | tar_read(species_summary) 241 | popd() 242 | ``` 243 | 244 | The model summary statistics are all included in a single dataframe. 
245 | 246 | But there's one problem: **we can't tell which row came from which species!** It would be unwise to assume that they are in the same order as the input data. 247 | 248 | This is due to the way dynamic branching works: by default, there is no information about the provenance of each target preserved in the output. 249 | 250 | How can we fix this? 251 | 252 | ### Second attempt 253 | 254 | The key to obtaining useful output from branching pipelines is to include the necessary information in the output of each individual branch. 255 | Here, we want to know the species that corresponds to each row of the model summaries. 256 | 257 | We can achieve this by modifying our `model_glance()` function. Be sure to save it after modifying it to include a column for species: 258 | 259 | ```{r} 260 | #| label: example-model-show-5 261 | #| eval: FALSE 262 | #| file: files/tar_functions/model_glance.R 263 | ``` 264 | 265 | Our new pipeline looks exactly the same as before; we have made a modification, but to a **function**, not the pipeline. 266 | 267 | Since `targets` tracks the contents of each custom function, it realizes that it needs to recompute `species_summary` and runs this target again with the newly modified function. 268 | 269 | ```{r} 270 | #| label: example-model-hide-6 271 | #| echo: FALSE 272 | pushd(plan_6_dir) 273 | write_example_plan("plan_6.R") 274 | tar_make() 275 | popd() 276 | ``` 277 | 278 | And this time, when we load `species_summary`, we can tell which model corresponds to which row (the `.before = 1` in `mutate()` ensures that it shows up before the other columns). 279 | 280 | ```{r} 281 | #| label: example-model-7 282 | #| echo: [2] 283 | #| warning: false 284 | pushd(plan_6_dir) 285 | tar_read(species_summary) 286 | popd() 287 | ``` 288 | 289 | Next we will add one more target, a prediction of bill depth based on each model. These will be needed for plotting the models in the report. 
290 | Such a prediction can be obtained with the `augment()` function of the `broom` package, so we will create a custom function that outputs predicted points as a dataframe, much like we did for the model summaries. 291 | 292 | 293 | ::::::::::::::::::::::::::::::::::::: {.challenge} 294 | 295 | ## Challenge: Add model predictions to the workflow 296 | 297 | Can you add the model predictions using `augment()`? You will need to define a custom function just like we did for `glance()`. 298 | 299 | :::::::::::::::::::::::::::::::::: {.solution} 300 | 301 | Define the new function as `model_augment()`. It is the same as `model_glance()`, but uses `augment()` instead of `glance()`: 302 | 303 | ```{r} 304 | #| label: example-model-augment-func 305 | #| eval: FALSE 306 | #| file: files/tar_functions/model_augment.R 307 | ``` 308 | 309 | Add the step to the workflow: 310 | 311 | ```{r} 312 | #| label = "example-model-augment-show", 313 | #| code = readLines("files/plans/plan_7.R")[2:36], 314 | #| eval = FALSE 315 | ``` 316 | 317 | :::::::::::::::::::::::::::::::::: 318 | 319 | ::::::::::::::::::::::::::::::::::::: 320 | 321 | ### Further simplify the workflow 322 | 323 | You may have noticed that we can further simplify the workflow: there is no need to have separate `penguins_data` and `penguins_data_grouped` dataframes. 324 | In general it is best to keep the number of named objects as small as possible to make it easier to reason about your code. 
325 | Let's combine the cleaning and grouping steps into a single command: 326 | 327 | ```{r} 328 | #| label = "example-model-show-8", 329 | #| eval = FALSE, 330 | #| code = readLines("files/plans/plan_8.R")[2:34] 331 | ``` 332 | 333 | And run it once more: 334 | 335 | ```{r} 336 | #| label: example-model-hide-8 337 | #| echo: false 338 | pushd(plan_6_dir) 339 | # simulate already running the plan once 340 | write_example_plan("plan_7.R") 341 | tar_make(reporter = "silent") 342 | # run the version of the plan that combines the cleaning and grouping 343 | # steps into a single target 344 | write_example_plan("plan_8.R") 345 | tar_make() 346 | popd() 347 | ``` 348 | 349 | ::::::::::::::::::::::::::::::::::::: {.callout} 350 | 351 | ## Best practices for branching 352 | 353 | Dynamic branching is designed to work well with **dataframes** (it can also use [lists](https://books.ropensci.org/targets/dynamic.html#list-iteration), but that is more advanced, so we recommend using dataframes when possible). 354 | 355 | It is recommended to write your custom functions to accept dataframes as input and return them as output, and always include any necessary metadata as a column or columns. 356 | 357 | ::::::::::::::::::::::::::::::::::::: 358 | 359 | ::::::::::::::::::::::::::::::::::::: {.challenge} 360 | 361 | ## Challenge: What other kinds of patterns are there? 362 | 363 | So far, we have only used a single function in conjunction with the `pattern` argument, `map()`, which applies the function to each element of its input in sequence. 364 | 365 | Can you think of any other ways you might want to apply a branching pattern? 
366 | 367 | :::::::::::::::::::::::::::::::::: {.solution} 368 | 369 | Some other ways of applying branching patterns include: 370 | 371 | - crossing: one branch per combination of elements (`cross()` function) 372 | - slicing: one branch for each of a manually selected set of elements (`slice()` function) 373 | - sampling: one branch for each of a randomly selected set of elements (`sample()` function) 374 | 375 | You can [find out more about different branching patterns in the `targets` manual](https://books.ropensci.org/targets/dynamic.html#patterns). 376 | 377 | :::::::::::::::::::::::::::::::::: 378 | 379 | ::::::::::::::::::::::::::::::::::::: 380 | 381 | ::::::::::::::::::::::::::::::::::::: keypoints 382 | 383 | - Dynamic branching creates multiple targets with a single command 384 | - You usually need to write custom functions so that the output of the branches includes necessary metadata 385 | 386 | :::::::::::::::::::::::::::::::::::::::::::::::: 387 | -------------------------------------------------------------------------------- /episodes/cache.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Loading Workflow Objects' 3 | teaching: 10 4 | exercises: 2 5 | --- 6 | 7 | ```{r} 8 | #| label: setup 9 | #| echo: FALSE 10 | #| message: FALSE 11 | #| warning: FALSE 12 | library(targets) 13 | source("files/lesson_functions.R") 14 | ``` 15 | 16 | :::::::::::::::::::::::::::::::::::::: questions 17 | 18 | - Where does the workflow happen? 19 | - How can we inspect the objects built by the workflow? 
20 | 21 | :::::::::::::::::::::::::::::::::::::::::::::::: 22 | 23 | ::::::::::::::::::::::::::::::::::::: objectives 24 | 25 | - Explain where `targets` runs the workflow and why 26 | - Be able to load objects built by the workflow into your R session 27 | 28 | :::::::::::::::::::::::::::::::::::::::::::::::: 29 | 30 | ::::::::::::::::::::::::::::::::::::: instructor 31 | 32 | Episode summary: Show how to get at the objects that we built 33 | 34 | ::::::::::::::::::::::::::::::::::::: 35 | 36 | ## Where does the workflow happen? 37 | 38 | So we just finished running our workflow. 39 | Now you probably want to look at its output. 40 | But, if we just call the name of the object (for example, `penguins_data`), we get an error. 41 | ```{r} 42 | #| label: error 43 | penguins_data 44 | ``` 45 | 46 | Where are the results of our workflow? 47 | 48 | ::::::::::::::::::::::::::::::::::::: instructor 49 | 50 | - To reinforce the concept of `targets` running in a separate R session, you may want to pretend trying to run `penguins_data`, then feigning surprise when it doesn't work and using it as a teaching moment (errors are pedagogy!). 51 | 52 | :::::::::::::::::::::::::::::::::::::::::::::::: 53 | 54 | We don't see the workflow results because `targets` **runs the workflow in a separate R session** that we can't interact with. 55 | This is for reproducibility---the objects built by the workflow should only depend on the code in your project, not any commands you may have interactively given to R. 56 | 57 | Fortunately, `targets` has two functions that can be used to load objects built by the workflow into our current session, `tar_load()` and `tar_read()`. 58 | Let's see how these work. 59 | 60 | ## tar_load() 61 | 62 | `tar_load()` loads an object built by the workflow into the current session. 63 | Its first argument is the name of the object you want to load. 64 | Let's use this to load `penguins_data` and get an overview of the data with `summary()`. 
65 | 66 | ```{r} 67 | #| label: targets-run-hide 68 | #| echo: FALSE 69 | # When building the Rmd, each instance of the workflow is isolated, so need 70 | # to re-run 71 | plan_1_dir <- make_tempdir() 72 | pushd(plan_1_dir) 73 | write_example_plan("plan_1.R") 74 | tar_make(reporter = "silent") 75 | penguins_csv_file_hide <- tar_read(penguins_csv_file) 76 | penguins_data_hide <- tar_read(penguins_data) 77 | popd() 78 | ``` 79 | 80 | ```{r} 81 | #| label: targets-load 82 | #| echo: [2, 3] 83 | pushd(plan_1_dir) 84 | tar_load(penguins_data) 85 | summary(penguins_data) 86 | popd() 87 | ``` 88 | 89 | Note that `tar_load()` is used for its **side-effect**---loading the desired object into the current R session. 90 | It doesn't actually return a value. 91 | 92 | ## tar_read() 93 | 94 | `tar_read()` is similar to `tar_load()` in that it is used to retrieve objects built by the workflow, but unlike `tar_load()`, it returns them directly as output. 95 | 96 | Let's try it with `penguins_csv_file`. 97 | 98 | ```{r} 99 | #| label: targets-read-show 100 | #| echo: [2] 101 | pushd(plan_1_dir) 102 | tar_read(penguins_csv_file) 103 | popd() 104 | ``` 105 | 106 | We immediately see the contents of `penguins_csv_file`. 107 | But it has not been loaded into the environment. 108 | If you try to run `penguins_csv_file` now, you will get an error: 109 | 110 | ```{r} 111 | #| label: error-2 112 | penguins_csv_file 113 | ``` 114 | 115 | ## When to use which function 116 | 117 | `tar_load()` tends to be more useful when you want to load objects and do things with them. 118 | `tar_read()` is more useful when you just want to immediately inspect an object. 119 | 120 | ## The targets cache 121 | 122 | If you close your R session, then re-start it and use `tar_load()` or `tar_read()`, you will notice that it can still load the workflow objects. 123 | In other words, the workflow output is **saved across R sessions**. 124 | How is this possible? 
125 | 126 | You may have noticed a new folder has appeared in your project, called `_targets`. 127 | This is the **targets cache**. 128 | It contains all of the workflow output; that is how we can load the targets built by the workflow even after quitting then restarting R. 129 | 130 | **You should not edit the contents of the cache by hand** (with one exception). 131 | Doing so would make your analysis non-reproducible. 132 | 133 | The one exception to this rule is a special subfolder called `_targets/user`. 134 | This folder does not exist by default. 135 | You can create it if you want, and put whatever you want inside. 136 | 137 | Generally, `_targets/user` is a good place to store files that are not code, like data and output. 138 | 139 | Note that if you don't have anything in `_targets/user` that you need to keep around, it is possible to "reset" your workflow by simply deleting the entire `_targets` folder. Of course, this means you will need to run everything over again, so don't do this lightly! 
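Rather than deleting the folder by hand, you can also use the `tar_destroy()` function from `targets`, which by default removes the entire `_targets` folder (see `?tar_destroy` for more selective options). A quick sketch:

```r
library(targets)

# Remove the entire _targets cache; prompts for confirmation
# when run interactively
tar_destroy()

# Be more selective, e.g. delete only the stored target objects:
tar_destroy(destroy = "objects")
```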
140 | 141 | ::::::::::::::::::::::::::::::::::::: keypoints 142 | 143 | - `targets` workflows are run in a separate, non-interactive R session 144 | - `tar_load()` loads a workflow object into the current R session 145 | - `tar_read()` reads a workflow object and returns its value 146 | - The `_targets` folder is the cache and generally should not be edited by hand 147 | 148 | :::::::::::::::::::::::::::::::::::::::::::::::: 149 | -------------------------------------------------------------------------------- /episodes/fig/03-qmd-workflow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/carpentries-incubator/targets-workshop/10b9a63c208b2802fe0855c4a2d2ad9d6104276f/episodes/fig/03-qmd-workflow.png -------------------------------------------------------------------------------- /episodes/fig/basic-rstudio-project.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/carpentries-incubator/targets-workshop/10b9a63c208b2802fe0855c4a2d2ad9d6104276f/episodes/fig/basic-rstudio-project.png -------------------------------------------------------------------------------- /episodes/fig/basic-rstudio-wizard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/carpentries-incubator/targets-workshop/10b9a63c208b2802fe0855c4a2d2ad9d6104276f/episodes/fig/basic-rstudio-wizard.png -------------------------------------------------------------------------------- /episodes/fig/lifecycle-visnetwork.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/carpentries-incubator/targets-workshop/10b9a63c208b2802fe0855c4a2d2ad9d6104276f/episodes/fig/lifecycle-visnetwork.png -------------------------------------------------------------------------------- /episodes/files.Rmd: 
-------------------------------------------------------------------------------- 1 | --- 2 | title: 'Working with External Files' 3 | teaching: 10 4 | exercises: 2 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | 9 | - How can we load external data? 10 | 11 | :::::::::::::::::::::::::::::::::::::::::::::::: 12 | 13 | ::::::::::::::::::::::::::::::::::::: objectives 14 | 15 | - Be able to load external data into a workflow 16 | - Configure the workflow to rerun if the contents of the external data change 17 | 18 | :::::::::::::::::::::::::::::::::::::::::::::::: 19 | 20 | ::::::::::::::::::::::::::::::::::::: instructor 21 | 22 | Episode summary: Show how to read and write external files 23 | 24 | ::::::::::::::::::::::::::::::::::::: 25 | 26 | ```{r} 27 | #| label: setup 28 | #| echo: FALSE 29 | #| message: FALSE 30 | #| warning: FALSE 31 | library(targets) 32 | library(tarchetypes) 33 | source("files/lesson_functions.R") 34 | ``` 35 | 36 | ## Treating external files as a dependency 37 | 38 | Almost all workflows will start by importing data, which is typically stored as an external file. 39 | 40 | As a simple example, let's create an external data file in RStudio with the "New File" menu option. Enter a single line of text, "Hello World", and save it as a text file called "hello.txt" in `_targets/user/data/`. 41 | 42 | We will read in the contents of this file and store it as `some_data` in the workflow by writing the following plan and running `tar_make()`: 43 | 44 | ::::::::::::::::::::::::::::::::::::: {.callout} 45 | 46 | ## Save your progress 47 | 48 | You can only have one active `_targets.R` file at a time in a given project. 49 | 50 | We are about to create a new `_targets.R` file, but you probably don't want to lose your progress in the one we have been working on so far (the penguins bill analysis). You can temporarily rename that one to something like `_targets_old.R` so that you don't overwrite it with the new example `_targets.R` file below. 
Then, rename it back when you are ready to work on it again. 51 | 52 | ::::::::::::::::::::::::::::::::::::: 53 | 54 | ```{r} 55 | #| label: example-file-show-1 56 | #| eval: FALSE 57 | library(targets) 58 | library(tarchetypes) 59 | 60 | tar_plan( 61 | some_data = readLines("_targets/user/data/hello.txt") 62 | ) 63 | ``` 64 | 65 | ```{r} 66 | #| label: example-file-hide-1 67 | #| echo: FALSE 68 | tar_dir({ 69 | fs::dir_create("_targets/user/data") 70 | writeLines("Hello World", "_targets/user/data/hello.txt") 71 | write_example_plan(chunk = "example-file-show-1") 72 | tar_make() 73 | }) 74 | ``` 75 | 76 | If we inspect the contents of `some_data` with `tar_read(some_data)`, it will contain the string `"Hello World"` as expected. 77 | 78 | Now say we edit "hello.txt", perhaps add some text: "Hello World. How are you?". Edit this in the RStudio text editor and save it. Now run the pipeline again. 79 | 80 | ```{r} 81 | #| label = "example-file-show-2", 82 | #| eval = FALSE, 83 | #| code = knitr::knit_code$get("example-file-show-1") 84 | ``` 85 | 86 | ```{r} 87 | #| label: example-file-hide-2 88 | #| echo: FALSE 89 | tar_dir({ 90 | fs::dir_create("_targets/user/data") 91 | writeLines("Hello World", "_targets/user/data/hello.txt") 92 | write_example_plan(chunk = "example-file-show-1") 93 | tar_make(reporter = "silent") 94 | writeLines("Hello World. How are you?", "_targets/user/data/hello.txt") 95 | tar_make() 96 | }) 97 | ``` 98 | 99 | The target `some_data` was skipped, even though the contents of the file changed. 100 | 101 | That is because right now, `targets` is only tracking the **name** of the file, not its contents. We need to use a special function for that, `tar_file()` from the `tarchetypes` package. `tar_file()` will calculate the "hash" of a file---a unique digital signature that is determined by the file's contents. If the contents change, the hash will change, and this will be detected by `targets`. 
102 | 103 | ```{r} 104 | #| label: example-file-show-3 105 | #| eval: FALSE 106 | library(targets) 107 | library(tarchetypes) 108 | 109 | tar_plan( 110 | tar_file(data_file, "_targets/user/data/hello.txt"), 111 | some_data = readLines(data_file) 112 | ) 113 | ``` 114 | 115 | ```{r} 116 | #| label: example-file-hide-3 117 | #| echo: FALSE 118 | tar_dir({ 119 | fs::dir_create("_targets/user/data") 120 | writeLines("Hello World", "_targets/user/data/hello.txt") 121 | write_example_plan(chunk = "example-file-show-3") 122 | tar_make(reporter = "silent") 123 | writeLines("Hello World. How are you?", "_targets/user/data/hello.txt") 124 | tar_make() 125 | }) 126 | ``` 127 | 128 | This time we see that `targets` does successfully re-build `some_data` as expected. 129 | 130 | ## A shortcut (or, About target factories) 131 | 132 | However, also notice that this means we need to write two targets instead of one: one target to track the contents of the file (`data_file`), and one target to store what we load from the file (`some_data`). 133 | 134 | It turns out that this is a common pattern in `targets` workflows, so `tarchetypes` provides a shortcut to express this more concisely, `tar_file_read()`. 135 | 136 | ```{r} 137 | #| label: example-file-show-4 138 | #| eval: FALSE 139 | library(targets) 140 | library(tarchetypes) 141 | 142 | tar_plan( 143 | tar_file_read( 144 | hello, 145 | "_targets/user/data/hello.txt", 146 | readLines(!!.x) 147 | ) 148 | ) 149 | ``` 150 | 151 | Let's inspect this pipeline with `tar_manifest()`: 152 | 153 | ```{r} 154 | #| label: example-file-show-5 155 | #| eval: FALSE 156 | tar_manifest() 157 | ``` 158 | 159 | ```{r} 160 | #| label: example-file-hide-5 161 | #| echo: FALSE 162 | tar_dir({ 163 | # Emulate what the learner is doing 164 | fs::dir_create("_targets/user/data") 165 | # Old (longer) version: 166 | writeLines("Hello World. 
How are you?", "_targets/user/data/hello.txt") 167 | # Make it again with the shorter version 168 | write_example_plan(chunk = "example-file-show-4") 169 | tar_manifest() 170 | }) 171 | ``` 172 | 173 | Notice that even though we only specified one target in the pipeline (`hello`, with `tar_file_read()`), the pipeline actually includes **two** targets, `hello_file` and `hello`. 174 | 175 | That is because `tar_file_read()` is a special function called a **target factory**, so-called because it makes **multiple** targets at once. One of the main purposes of the `tarchetypes` package is to provide target factories to make writing pipelines easier and less error-prone. 176 | 177 | ## Non-standard evaluation 178 | 179 | What is the deal with the `!!.x`? That may look unfamiliar even if you are used to using R. It is known as "non-standard evaluation," and gets used in some special contexts. We don't have time to go into the details now, but just remember that you will need to use this special notation with `tar_file_read()`. If you forget how to write it (this happens frequently!) look at the examples in the help file by running `?tar_file_read`. 180 | 181 | ## Other data loading functions 182 | 183 | Although we used `readLines()` as an example here, you can use the same pattern for other functions that load data from external files, such as `readr::read_csv()`, `readxl::read_excel()`, and others (for example, `read_csv(!!.x)`, `read_excel(!!.x)`, etc.). 184 | 185 | This is generally recommended so that your pipeline stays up to date with your input data. 186 | 187 | ::::::::::::::::::::::::::::::::::::: {.challenge} 188 | 189 | ## Challenge: Use `tar_file_read()` with the penguins example 190 | 191 | We didn't know about `tar_file_read()` yet when we started on the penguins bill analysis. 192 | 193 | How can you use `tar_file_read()` to load the CSV file while tracking its contents? 
194 | 195 | :::::::::::::::::::::::::::::::::: {.solution} 196 | 197 | ```{r} 198 | #| label = "tar-file-read-answer-show", 199 | #| eval = FALSE, 200 | #| code = readLines("files/plans/plan_3.R")[2:12] 201 | ``` 202 | 203 | ```{r} 204 | #| label: tar-file-read-answer-hide 205 | #| echo: FALSE 206 | tar_dir({ 207 | # New workflow 208 | write_example_plan("plan_3.R") 209 | # Run it 210 | tar_make() 211 | }) 212 | ``` 213 | 214 | :::::::::::::::::::::::::::::::::: 215 | 216 | ::::::::::::::::::::::::::::::::::::: 217 | 218 | ## Writing out data 219 | 220 | Writing to files is similar to loading in files: we will use the `tar_file()` function. There is one important caveat: in this case, the second argument of `tar_file()` (the command to build the target) **must return the path to the file**. Not all functions that write files do this (some return nothing; these treat the output file as a side-effect of running the function), so you may need to define a custom function that writes out the file and then returns its path. 221 | 222 | Let's do this for `writeLines()`, the R function that writes character data to a file. 
Normally, its output would be `NULL` (nothing), as we can see here: 223 | 224 | ```{r} 225 | #| label: write-data-show-1 226 | #| eval: false 227 | x <- writeLines("some text", "test.txt") 228 | x 229 | ``` 230 | 231 | ```{r} 232 | #| label: write-data-hide-1 233 | #| echo: false 234 | x <- writeLines("some text", "test.txt") 235 | x 236 | fs::file_delete("test.txt") 237 | ``` 238 | 239 | Here is our modified function that writes character data to a file and returns the name of the file (the `...` means "pass the rest of these arguments to `writeLines()`"): 240 | 241 | ```{r} 242 | #| label: write-data-func 243 | #| file: files/tar_functions/write_lines_file.R 244 | ``` 245 | 246 | Let's try it out: 247 | 248 | ```{r} 249 | #| label: write-data-show-2 250 | #| eval: false 251 | x <- write_lines_file("some text", "test.txt") 252 | x 253 | ``` 254 | 255 | ```{r} 256 | #| label: write-data-hide-2 257 | #| echo: false 258 | x <- write_lines_file("some text", "test.txt") 259 | x 260 | fs::file_delete("test.txt") 261 | ``` 262 | 263 | We can now use this in a pipeline. For example let's change the text to upper case then write it out again: 264 | 265 | ```{r} 266 | #| label: example-file-show-6 267 | #| eval: false 268 | library(targets) 269 | library(tarchetypes) 270 | 271 | source("R/functions.R") 272 | 273 | tar_plan( 274 | tar_file_read( 275 | hello, 276 | "_targets/user/data/hello.txt", 277 | readLines(!!.x) 278 | ), 279 | hello_caps = toupper(hello), 280 | tar_file( 281 | hello_caps_out, 282 | write_lines_file(hello_caps, "_targets/user/results/hello_caps.txt") 283 | ) 284 | ) 285 | ``` 286 | 287 | ```{r} 288 | #| label: example-file-hide-6 289 | #| echo: false 290 | tar_dir({ 291 | fs::dir_create("_targets/user/data") 292 | fs::dir_create("_targets/user/results") 293 | writeLines("Hello World. 
How are you?", "_targets/user/data/hello.txt") 294 | write_example_plan(chunk = "example-file-show-6") 295 | tar_make() 296 | }) 297 | ``` 298 | 299 | Take a look at `hello_caps.txt` in the `results` folder and verify it is as you expect. 300 | 301 | ::::::::::::::::::::::::::::::::::::: {.challenge} 302 | 303 | ## Challenge: What happens to file output if it's modified? 304 | 305 | Delete or change the contents of `hello_caps.txt` in the `results` folder. 306 | What do you think will happen when you run `tar_make()` again? 307 | Try it and see. 308 | 309 | :::::::::::::::::::::::::::::::::: {.solution} 310 | 311 | `targets` detects that `hello_caps_out` has changed (is "invalidated"), and re-runs the code to make it, thus writing out `hello_caps.txt` to `results` again. 312 | 313 | So this way of writing out results makes your pipeline more robust: we have a guarantee that the contents of the file in `results` are generated solely by the code in your plan. 314 | 315 | :::::::::::::::::::::::::::::::::: 316 | 317 | ::::::::::::::::::::::::::::::::::::: 318 | 319 | ::::::::::::::::::::::::::::::::::::: keypoints 320 | 321 | - `tarchetypes::tar_file()` tracks the contents of a file 322 | - Use `tarchetypes::tar_file_read()` in combination with data loading functions like `read_csv()` to keep the pipeline in sync with your input data 323 | - Use `tarchetypes::tar_file()` in combination with a function that writes to a file and returns its path to write out data 324 | 325 | :::::::::::::::::::::::::::::::::::::::::::::::: 326 | -------------------------------------------------------------------------------- /episodes/files/lesson_functions.R: -------------------------------------------------------------------------------- 1 | # Functions used in the lesson `.Rmd` files, but that learners 2 | # aren't exposed to, and aren't used inside the `targets` pipelines 3 | 4 | make_tempdir <- function() { 5 | x <- tempfile() 6 | dir.create(x, showWarnings = FALSE) 7 | x 8 | } 9 | 10
| files_root <- normalizePath("files") 11 | plan_root <- file.path(files_root, "plans") 12 | utility_funcs <- file.path(files_root, "tar_functions") |> 13 | list.files(full.names = TRUE, pattern = "\\.R$") |> 14 | lapply(readLines) |> 15 | unlist() 16 | package_script <- file.path(files_root, "packages.R") 17 | 18 | #' @param file The path to another file to use as a workflow 19 | #' @param chunk The chunk name to use as a targets workflow 20 | write_example_plan <- function(file = NULL, chunk = NULL) { 21 | # Write the utility functions into the R/ directory 22 | 23 | if (!dir.exists("R")) { 24 | dir.create("R") 25 | 26 | # Write the functions.R script 27 | file.path("R", "functions.R") |> 28 | writeLines(utility_funcs, con = _) 29 | 30 | # Copy the packages.R script 31 | file.path("R", "packages.R") |> 32 | file.copy(from = package_script, to = _) 33 | } 34 | 35 | # Write the workflow 36 | if (!is.null(file)) { 37 | file.path(plan_root, file) |> 38 | file.copy(from = _, to = "_targets.R", overwrite = TRUE) 39 | } 40 | if (!is.null(chunk)) { 41 | writeLines(text = knitr::knit_code$get(chunk), con = "_targets.R") 42 | } 43 | 44 | invisible() 45 | } 46 | 47 | directory_stack <- getwd() 48 | 49 | pushd <- function(dir) { 50 | directory_stack <<- c(dir, directory_stack) 51 | setwd(directory_stack[1]) 52 | invisible() 53 | } 54 | 55 | popd <- function() { 56 | directory_stack <<- directory_stack[-1] 57 | setwd(directory_stack[1]) 58 | invisible() 59 | } 60 | -------------------------------------------------------------------------------- /episodes/files/packages.R: -------------------------------------------------------------------------------- 1 | library(targets) 2 | library(tarchetypes) 3 | library(palmerpenguins) 4 | library(tidyverse) 5 | library(broom) 6 | library(htmlwidgets) 7 | -------------------------------------------------------------------------------- /episodes/files/plans/README.md: 
-------------------------------------------------------------------------------- 1 | Plans that are re-used between multiple episodes are placed here -------------------------------------------------------------------------------- /episodes/files/plans/plan_0.R: -------------------------------------------------------------------------------- 1 | options(tidyverse.quiet = TRUE) 2 | library(targets) 3 | library(tidyverse) 4 | library(palmerpenguins) 5 | 6 | list( 7 | tar_target(penguins_csv_file, path_to_file("penguins_raw.csv")), 8 | tar_target( 9 | penguins_data_raw, 10 | read_csv(penguins_csv_file, show_col_types = FALSE) 11 | ), 12 | tar_target( 13 | penguins_data, 14 | penguins_data_raw |> 15 | select( 16 | species = Species, 17 | bill_length_mm = `Culmen Length (mm)`, 18 | bill_depth_mm = `Culmen Depth (mm)` 19 | ) |> 20 | drop_na() 21 | ) 22 | ) 23 | -------------------------------------------------------------------------------- /episodes/files/plans/plan_1.R: -------------------------------------------------------------------------------- 1 | options(tidyverse.quiet = TRUE) 2 | library(targets) 3 | library(tidyverse) 4 | library(palmerpenguins) 5 | 6 | clean_penguin_data <- function(penguins_data_raw) { 7 | penguins_data_raw |> 8 | select( 9 | species = Species, 10 | bill_length_mm = `Culmen Length (mm)`, 11 | bill_depth_mm = `Culmen Depth (mm)` 12 | ) |> 13 | drop_na() 14 | } 15 | 16 | list( 17 | tar_target(penguins_csv_file, path_to_file("penguins_raw.csv")), 18 | tar_target(penguins_data_raw, read_csv( 19 | penguins_csv_file, show_col_types = FALSE)), 20 | tar_target(penguins_data, clean_penguin_data(penguins_data_raw)) 21 | ) 22 | -------------------------------------------------------------------------------- /episodes/files/plans/plan_10.R: -------------------------------------------------------------------------------- 1 | options(tidyverse.quiet = TRUE) 2 | suppressPackageStartupMessages(library(crew)) 3 | source("R/functions.R") 4 | 
source("R/packages.R") 5 | 6 | # Set up parallelization 7 | library(crew) 8 | tar_option_set( 9 | controller = crew_controller_local(workers = 2) 10 | ) 11 | 12 | tar_plan( 13 | # Load raw data 14 | tar_file_read( 15 | penguins_data_raw, 16 | path_to_file("penguins_raw.csv"), 17 | read_csv(!!.x, show_col_types = FALSE) 18 | ), 19 | # Clean and group data 20 | tar_group_by( 21 | penguins_data, 22 | clean_penguin_data(penguins_data_raw), 23 | species 24 | ), 25 | # Get summary of combined model with all species together 26 | combined_summary = model_glance_slow(penguins_data), 27 | # Get summary of one model per species 28 | tar_target( 29 | species_summary, 30 | model_glance_slow(penguins_data), 31 | pattern = map(penguins_data) 32 | ), 33 | # Get predictions of combined model with all species together 34 | combined_predictions = model_augment_slow(penguins_data), 35 | # Get predictions of one model per species 36 | tar_target( 37 | species_predictions, 38 | model_augment_slow(penguins_data), 39 | pattern = map(penguins_data) 40 | ) 41 | ) 42 | -------------------------------------------------------------------------------- /episodes/files/plans/plan_11.R: -------------------------------------------------------------------------------- 1 | options(tidyverse.quiet = TRUE) 2 | source("R/functions.R") 3 | source("R/packages.R") 4 | 5 | tar_plan( 6 | # Load raw data 7 | tar_file_read( 8 | penguins_data_raw, 9 | path_to_file("penguins_raw.csv"), 10 | read_csv(!!.x, show_col_types = FALSE) 11 | ), 12 | # Clean and group data 13 | tar_group_by( 14 | penguins_data, 15 | clean_penguin_data(penguins_data_raw), 16 | species 17 | ), 18 | # Get summary of combined model with all species together 19 | combined_summary = model_glance(penguins_data), 20 | # Get summary of one model per species 21 | tar_target( 22 | species_summary, 23 | model_glance(penguins_data), 24 | pattern = map(penguins_data) 25 | ), 26 | # Get predictions of combined model with all species together 27 | 
combined_predictions = model_augment(penguins_data), 28 | # Get predictions of one model per species 29 | tar_target( 30 | species_predictions, 31 | model_augment(penguins_data), 32 | pattern = map(penguins_data) 33 | ), 34 | # Generate report 35 | tar_quarto( 36 | penguin_report, 37 | path = "penguin_report.qmd", 38 | quiet = FALSE 39 | ) 40 | ) 41 | -------------------------------------------------------------------------------- /episodes/files/plans/plan_2.R: -------------------------------------------------------------------------------- 1 | options(tidyverse.quiet = TRUE) 2 | source("R/packages.R") 3 | source("R/functions.R") 4 | 5 | list( 6 | tar_target(penguins_csv_file, path_to_file('penguins_raw.csv')), 7 | tar_target(penguins_data_raw, read_csv( 8 | penguins_csv_file, show_col_types = FALSE)), 9 | tar_target(penguins_data, clean_penguin_data(penguins_data_raw)) 10 | ) 11 | -------------------------------------------------------------------------------- /episodes/files/plans/plan_2b.R: -------------------------------------------------------------------------------- 1 | options(tidyverse.quiet = TRUE) 2 | source("R/packages.R") 3 | source("R/functions.R") 4 | 5 | tar_plan( 6 | penguins_csv_file = path_to_file("penguins_raw.csv"), 7 | penguins_data_raw = read_csv(penguins_csv_file, show_col_types = FALSE), 8 | penguins_data = clean_penguin_data(penguins_data_raw) 9 | ) 10 | -------------------------------------------------------------------------------- /episodes/files/plans/plan_3.R: -------------------------------------------------------------------------------- 1 | options(tidyverse.quiet = TRUE) 2 | source("R/packages.R") 3 | source("R/functions.R") 4 | 5 | tar_plan( 6 | tar_file_read( 7 | penguins_data_raw, 8 | path_to_file("penguins_raw.csv"), 9 | read_csv(!!.x, show_col_types = FALSE) 10 | ), 11 | penguins_data = clean_penguin_data(penguins_data_raw) 12 | ) 13 | -------------------------------------------------------------------------------- 
/episodes/files/plans/plan_4.R: -------------------------------------------------------------------------------- 1 | options(tidyverse.quiet = TRUE) 2 | source("R/packages.R") 3 | source("R/functions.R") 4 | 5 | tar_plan( 6 | # Load raw data 7 | tar_file_read( 8 | penguins_data_raw, 9 | path_to_file("penguins_raw.csv"), 10 | read_csv(!!.x, show_col_types = FALSE) 11 | ), 12 | # Clean data 13 | penguins_data = clean_penguin_data(penguins_data_raw), 14 | # Build model 15 | combined_model = lm( 16 | bill_depth_mm ~ bill_length_mm, 17 | data = penguins_data 18 | ) 19 | ) 20 | -------------------------------------------------------------------------------- /episodes/files/plans/plan_5.R: -------------------------------------------------------------------------------- 1 | options(tidyverse.quiet = TRUE) 2 | source("R/packages.R") 3 | source("R/functions.R") 4 | 5 | tar_plan( 6 | # Load raw data 7 | tar_file_read( 8 | penguins_data_raw, 9 | path_to_file("penguins_raw.csv"), 10 | read_csv(!!.x, show_col_types = FALSE) 11 | ), 12 | # Clean data 13 | penguins_data = clean_penguin_data(penguins_data_raw), 14 | # Build models 15 | combined_model = lm( 16 | bill_depth_mm ~ bill_length_mm, 17 | data = penguins_data 18 | ), 19 | adelie_model = lm( 20 | bill_depth_mm ~ bill_length_mm, 21 | data = filter(penguins_data, species == "Adelie") 22 | ), 23 | chinstrap_model = lm( 24 | bill_depth_mm ~ bill_length_mm, 25 | data = filter(penguins_data, species == "Chinstrap") 26 | ), 27 | gentoo_model = lm( 28 | bill_depth_mm ~ bill_length_mm, 29 | data = filter(penguins_data, species == "Gentoo") 30 | ), 31 | # Get model summaries 32 | combined_summary = glance(combined_model), 33 | adelie_summary = glance(adelie_model), 34 | chinstrap_summary = glance(chinstrap_model), 35 | gentoo_summary = glance(gentoo_model) 36 | ) 37 | -------------------------------------------------------------------------------- /episodes/files/plans/plan_6.R: 
-------------------------------------------------------------------------------- 1 | options(tidyverse.quiet = TRUE) 2 | source("R/packages.R") 3 | source("R/functions.R") 4 | 5 | tar_plan( 6 | # Load raw data 7 | tar_file_read( 8 | penguins_data_raw, 9 | path_to_file("penguins_raw.csv"), 10 | read_csv(!!.x, show_col_types = FALSE) 11 | ), 12 | # Clean data 13 | penguins_data = clean_penguin_data(penguins_data_raw), 14 | # Group data 15 | tar_group_by( 16 | penguins_data_grouped, 17 | penguins_data, 18 | species 19 | ), 20 | # Build combined model with all species together 21 | combined_summary = model_glance(penguins_data), 22 | # Build one model per species 23 | tar_target( 24 | species_summary, 25 | model_glance(penguins_data_grouped), 26 | pattern = map(penguins_data_grouped) 27 | ) 28 | ) 29 | -------------------------------------------------------------------------------- /episodes/files/plans/plan_6b.R: -------------------------------------------------------------------------------- 1 | options(tidyverse.quiet = TRUE) 2 | source("R/packages.R") 3 | source("R/functions.R") 4 | 5 | tar_plan( 6 | # Load raw data 7 | tar_file_read( 8 | penguins_data_raw, 9 | path_to_file("penguins_raw.csv"), 10 | read_csv(!!.x, show_col_types = FALSE) 11 | ), 12 | # Clean data 13 | penguins_data = clean_penguin_data(penguins_data_raw), 14 | # Group data 15 | tar_group_by( 16 | penguins_data_grouped, 17 | penguins_data, 18 | species 19 | ), 20 | # Build combined model with all species together 21 | combined_summary = model_glance_orig(penguins_data), 22 | # Build one model per species 23 | tar_target( 24 | species_summary, 25 | model_glance_orig(penguins_data_grouped), 26 | pattern = map(penguins_data_grouped) 27 | ) 28 | ) 29 | -------------------------------------------------------------------------------- /episodes/files/plans/plan_7.R: -------------------------------------------------------------------------------- 1 | options(tidyverse.quiet = TRUE) 2 | 
source("R/functions.R") 3 | source("R/packages.R") 4 | 5 | tar_plan( 6 | # Load raw data 7 | tar_file_read( 8 | penguins_data_raw, 9 | path_to_file("penguins_raw.csv"), 10 | read_csv(!!.x, show_col_types = FALSE) 11 | ), 12 | # Clean data 13 | penguins_data = clean_penguin_data(penguins_data_raw), 14 | # Group data 15 | tar_group_by( 16 | penguins_data_grouped, 17 | penguins_data, 18 | species 19 | ), 20 | # Get summary of combined model with all species together 21 | combined_summary = model_glance(penguins_data), 22 | # Get summary of one model per species 23 | tar_target( 24 | species_summary, 25 | model_glance(penguins_data_grouped), 26 | pattern = map(penguins_data_grouped) 27 | ), 28 | # Get predictions of combined model with all species together 29 | combined_predictions = model_augment(penguins_data_grouped), 30 | # Get predictions of one model per species 31 | tar_target( 32 | species_predictions, 33 | model_augment(penguins_data_grouped), 34 | pattern = map(penguins_data_grouped) 35 | ) 36 | ) 37 | -------------------------------------------------------------------------------- /episodes/files/plans/plan_8.R: -------------------------------------------------------------------------------- 1 | options(tidyverse.quiet = TRUE) 2 | source("R/functions.R") 3 | source("R/packages.R") 4 | 5 | tar_plan( 6 | # Load raw data 7 | tar_file_read( 8 | penguins_data_raw, 9 | path_to_file("penguins_raw.csv"), 10 | read_csv(!!.x, show_col_types = FALSE) 11 | ), 12 | # Clean and group data 13 | tar_group_by( 14 | penguins_data, 15 | clean_penguin_data(penguins_data_raw), 16 | species 17 | ), 18 | # Get summary of combined model with all species together 19 | combined_summary = model_glance(penguins_data), 20 | # Get summary of one model per species 21 | tar_target( 22 | species_summary, 23 | model_glance(penguins_data), 24 | pattern = map(penguins_data) 25 | ), 26 | # Get predictions of combined model with all species together 27 | combined_predictions = 
model_augment(penguins_data), 28 | # Get predictions of one model per species 29 | tar_target( 30 | species_predictions, 31 | model_augment(penguins_data), 32 | pattern = map(penguins_data) 33 | ) 34 | ) 35 | -------------------------------------------------------------------------------- /episodes/files/plans/plan_9.R: -------------------------------------------------------------------------------- 1 | options(tidyverse.quiet = TRUE) 2 | suppressPackageStartupMessages(library(crew)) 3 | source("R/functions.R") 4 | source("R/packages.R") 5 | 6 | # Set up parallelization 7 | library(crew) 8 | tar_option_set( 9 | controller = crew_controller_local(workers = 2) 10 | ) 11 | 12 | tar_plan( 13 | # Load raw data 14 | tar_file_read( 15 | penguins_data_raw, 16 | path_to_file("penguins_raw.csv"), 17 | read_csv(!!.x, show_col_types = FALSE) 18 | ), 19 | # Clean and group data 20 | tar_group_by( 21 | penguins_data, 22 | clean_penguin_data(penguins_data_raw), 23 | species 24 | ), 25 | # Get summary of combined model with all species together 26 | combined_summary = model_glance(penguins_data), 27 | # Get summary of one model per species 28 | tar_target( 29 | species_summary, 30 | model_glance(penguins_data), 31 | pattern = map(penguins_data) 32 | ), 33 | # Get predictions of combined model with all species together 34 | combined_predictions = model_augment(penguins_data), 35 | # Get predictions of one model per species 36 | tar_target( 37 | species_predictions, 38 | model_augment(penguins_data), 39 | pattern = map(penguins_data) 40 | ) 41 | ) 42 | -------------------------------------------------------------------------------- /episodes/files/tar_functions/README.md: -------------------------------------------------------------------------------- 1 | These are functions that are used inside the targets pipelines. 2 | All of them are automatically included in every plan written by `write_example_plan`.
3 | However, they are split into separate files so they can be included as code chunks and thereby shown to the learners. 4 | -------------------------------------------------------------------------------- /episodes/files/tar_functions/augment_with_mod_name.R: -------------------------------------------------------------------------------- 1 | augment_with_mod_name <- function(model_in_list) { 2 | model_name <- names(model_in_list) 3 | model <- model_in_list[[1]] 4 | augment(model) |> 5 | mutate(model_name = model_name) 6 | } 7 | -------------------------------------------------------------------------------- /episodes/files/tar_functions/augment_with_mod_name_slow.R: -------------------------------------------------------------------------------- 1 | augment_with_mod_name_slow <- function(model_in_list) { 2 | Sys.sleep(4) 3 | model_name <- names(model_in_list) 4 | model <- model_in_list[[1]] 5 | broom::augment(model) |> 6 | mutate(model_name = model_name) 7 | } 8 | -------------------------------------------------------------------------------- /episodes/files/tar_functions/clean_penguin_data.R: -------------------------------------------------------------------------------- 1 | clean_penguin_data <- function(penguins_data_raw) { 2 | penguins_data_raw |> 3 | select( 4 | species = Species, 5 | bill_length_mm = `Culmen Length (mm)`, 6 | bill_depth_mm = `Culmen Depth (mm)` 7 | ) |> 8 | drop_na() |> 9 | # Split "species" apart on spaces, and only keep the first word 10 | separate(species, into = "species", extra = "drop") 11 | } 12 | -------------------------------------------------------------------------------- /episodes/files/tar_functions/glance_with_mod_name.R: -------------------------------------------------------------------------------- 1 | glance_with_mod_name <- function(model_in_list) { 2 | model_name <- names(model_in_list) 3 | model <- model_in_list[[1]] 4 | glance(model) |> 5 | mutate(model_name = model_name) 6 | } 7 |
-------------------------------------------------------------------------------- /episodes/files/tar_functions/glance_with_mod_name_slow.R: -------------------------------------------------------------------------------- 1 | glance_with_mod_name_slow <- function(model_in_list) { 2 | Sys.sleep(4) 3 | model_name <- names(model_in_list) 4 | model <- model_in_list[[1]] 5 | broom::glance(model) |> 6 | mutate(model_name = model_name) 7 | } 8 | -------------------------------------------------------------------------------- /episodes/files/tar_functions/model_augment.R: -------------------------------------------------------------------------------- 1 | model_augment <- function(penguins_data) { 2 | # Make model 3 | model <- lm( 4 | bill_depth_mm ~ bill_length_mm, 5 | data = penguins_data) 6 | # Get species name 7 | species_name <- unique(penguins_data$species) 8 | # If this is the combined dataset with multiple 9 | # species, change name to 'combined' 10 | if (length(species_name) > 1) { 11 | species_name <- "combined" 12 | } 13 | # Get model summary and add species name 14 | augment(model) |> 15 | mutate(species = species_name, .before = 1) 16 | } 17 | -------------------------------------------------------------------------------- /episodes/files/tar_functions/model_augment_slow.R: -------------------------------------------------------------------------------- 1 | model_augment_slow <- function(penguins_data) { 2 | Sys.sleep(4) 3 | # Make model 4 | model <- lm( 5 | bill_depth_mm ~ bill_length_mm, 6 | data = penguins_data) 7 | # Get species name 8 | species_name <- unique(penguins_data$species) 9 | # If this is the combined dataset with multiple 10 | # species, change name to 'combined' 11 | if (length(species_name) > 1) { 12 | species_name <- "combined" 13 | } 14 | # Get model summary and add species name 15 | augment(model) |> 16 | mutate(species = species_name, .before = 1) 17 | } 18 |
-------------------------------------------------------------------------------- /episodes/files/tar_functions/model_glance.R: -------------------------------------------------------------------------------- 1 | model_glance <- function(penguins_data) { 2 | # Make model 3 | model <- lm( 4 | bill_depth_mm ~ bill_length_mm, 5 | data = penguins_data) 6 | # Get species name 7 | species_name <- unique(penguins_data$species) 8 | # If this is the combined dataset with multiple 9 | # species, change name to 'combined' 10 | if (length(species_name) > 1) { 11 | species_name <- "combined" 12 | } 13 | # Get model summary and add species name 14 | glance(model) |> 15 | mutate(species = species_name, .before = 1) 16 | } 17 | -------------------------------------------------------------------------------- /episodes/files/tar_functions/model_glance_orig.R: -------------------------------------------------------------------------------- 1 | model_glance_orig <- function(penguins_data) { 2 | model <- lm( 3 | bill_depth_mm ~ bill_length_mm, 4 | data = penguins_data) 5 | broom::glance(model) 6 | } 7 | -------------------------------------------------------------------------------- /episodes/files/tar_functions/model_glance_slow.R: -------------------------------------------------------------------------------- 1 | model_glance_slow <- function(penguins_data) { 2 | Sys.sleep(4) 3 | # Make model 4 | model <- lm( 5 | bill_depth_mm ~ bill_length_mm, 6 | data = penguins_data) 7 | # Get species name 8 | species_name <- unique(penguins_data$species) 9 | # If this is the combined dataset with multiple 10 | # species, change name to 'combined' 11 | if (length(species_name) > 1) { 12 | species_name <- "combined" 13 | } 14 | # Get model summary and add species name 15 | glance(model) |> 16 | mutate(species = species_name, .before = 1) 17 | } 18 | -------------------------------------------------------------------------------- /episodes/files/tar_functions/write_lines_file.R:
-------------------------------------------------------------------------------- 1 | write_lines_file <- function(text, file, ...) { 2 | writeLines(text = text, con = file, ...) 3 | file 4 | } 5 | -------------------------------------------------------------------------------- /episodes/functions.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'A Brief Introduction to Functions' 3 | teaching: 30 4 | exercises: 10 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | 9 | - What are functions? 10 | - Why should we know how to write them? 11 | - What are the main components of a function? 12 | 13 | :::::::::::::::::::::::::::::::::::::::::::::::: 14 | 15 | ::::::::::::::::::::::::::::::::::::: objectives 16 | 17 | - Understand the usefulness of custom functions 18 | - Understand the basic concepts around writing functions 19 | 20 | :::::::::::::::::::::::::::::::::::::::::::::::: 21 | 22 | ::::::::::::::::::::::::::::::::::::: {.instructor} 23 | 24 | Episode summary: A very brief introduction to functions, when you have learners who have no experience with them. 25 | 26 | :::::::::::::::::::::::::::::::::::::::::::::::: 27 | 28 | ```{r} 29 | #| label: setup 30 | #| echo: FALSE 31 | #| message: FALSE 32 | #| warning: FALSE 33 | library(targets) 34 | 35 | if (interactive()) { 36 | setwd("episodes") 37 | } 38 | 39 | source("files/lesson_functions.R") 40 | ``` 41 | 42 | ## About functions 43 | 44 | We are used to thinking of functions in R as things that come from packages. You find, install, and use specialized functions from packages to get your work done. 45 | 46 | But you can, and arguably should, be writing your own functions too! 47 | Functions are a great way of making it easy to repeat the same operation but with different settings. 48 | How many times have you copy-pasted the exact same code in your script, only to change a couple of things (a variable, an input, etc.)
before running it again? 49 | Only to later discover that there was an error in the code, which you then had to remember to fix in all the places where you copied that code. 50 | 51 | By writing functions, you can reduce this back-and-forth and create a more efficient workflow for yourself. 52 | When you find the bug, you fix it in a single place (the function you made), and each subsequent call of that function will now be fixed. 53 | 54 | Furthermore, `targets` makes extensive use of custom functions, so a basic understanding of how they work is very important for using it successfully. 55 | 56 | ### Writing a function 57 | 58 | There is not much difference between writing your own function and writing other code in R; you are still coding in R! 59 | Let's imagine we want to convert the millimeter measurements in the penguins data to centimeters. 60 | 61 | ```{r} 62 | #| label: targets-functions-problem 63 | #| message: FALSE 64 | library(palmerpenguins) 65 | library(tidyverse) 66 | 67 | penguins |> 68 | mutate( 69 | bill_length_cm = bill_length_mm / 10, 70 | bill_depth_cm = bill_depth_mm / 10 71 | ) 72 | 73 | ``` 74 | 75 | This is not a complicated operation, but we might want to make a convenient custom function that can do this conversion for us anyway. 76 | 77 | To write a function, you need to use the `function()` function. 78 | Inside its parentheses, we provide the function's input arguments; in the curly braces `{}` that follow, we write what the function will do with those arguments. 79 | The object name we assign this to will become the function's name.
80 | 81 | ```{r} 82 | #| label: targets-functions-skeleton 83 | #| eval: false 84 | my_function <- function(argument1, argument2) { 85 | # the things the function will do 86 | } 87 | # call the function 88 | my_function(1, "something") 89 | ``` 90 | 91 | For our mm to cm conversion the function would look like so: 92 | 93 | ```{r} 94 | #| label: targets-functions-cm 95 | mm2cm <- function(x) { 96 | x / 10 97 | } 98 | ``` 99 | 100 | Our custom function will now transform any numerical input by dividing it by 10. 101 | 102 | Let's try it out: 103 | 104 | ```{r} 105 | #| label: targets-functions-cm-use 106 | penguins |> 107 | mutate( 108 | bill_length_cm = mm2cm(bill_length_mm), 109 | bill_depth_cm = mm2cm(bill_depth_mm) 110 | ) 111 | ``` 112 | 113 | Congratulations, you've created and used your first custom function! 114 | 115 | ### Make a function from existing code 116 | 117 | Many times, we might already have a piece of code that we'd like to use to create a function. 118 | For instance, we've copy-pasted a section of code several times and realize that this piece of code is repetitive, so a function is in order. 119 | Or, you are converting your workflow to `targets`, and need to change your script into a series of functions that `targets` will call. 120 | 121 | Recall the code snippet we had to clean our penguins data: 122 | 123 | ```{r} 124 | #| label: code-to-convert-to-function 125 | #| eval: false 126 | penguins_data_raw |> 127 | select( 128 | species = Species, 129 | bill_length_mm = `Culmen Length (mm)`, 130 | bill_depth_mm = `Culmen Depth (mm)` 131 | ) |> 132 | drop_na() 133 | ``` 134 | 135 | We need to adapt this code to become a function, and this function needs a single argument, which is the dataset it should clean. 
136 | 137 | It should look like this: 138 | ```{r} 139 | #| label: clean-data-function 140 | clean_penguin_data <- function(penguins_data_raw) { 141 | penguins_data_raw |> 142 | select( 143 | species = Species, 144 | bill_length_mm = `Culmen Length (mm)`, 145 | bill_depth_mm = `Culmen Depth (mm)` 146 | ) |> 147 | drop_na() 148 | } 149 | ``` 150 | 151 | Add this function to `_targets.R` after the part where you load packages with `library()` and before the list at the end. 152 | 153 | ::::::::::::::::: callout 154 | 155 | # RStudio function extraction 156 | 157 | RStudio also has a handy helper to extract a function from a piece of code. 158 | Once you have basic familiarity with functions, it may help you figure out the necessary input when turning code into a function. 159 | 160 | To use it, highlight the piece of code you want to make into a function. 161 | In our case, that is the entire pipeline from `penguins_data_raw` to the `drop_na()` statement. 162 | Once you have done this, in RStudio go to the "Code" section in the top bar, and select "Extract function" from the list. 163 | A prompt will open asking you to hit enter, and the extracted function will appear in your script where the cursor was. 164 | 165 | The extracted function will not work as-is, however, because it treats more things as arguments than it needs. 166 | This is because the tidyverse uses non-standard evaluation, which lets us write unquoted column names inside `select()`. 167 | The function extractor thinks that all unquoted (or back-ticked) text in the code is a reference to an object. 168 | You will need to do some manual cleaning to get the function working, which is why it's more convenient if you already have a little experience with functions. 169 | 170 | :::::::::::::::::: 171 | 172 | ::::::::::::::::::::::::::::::::::::: {.challenge} 173 | 174 | ## Challenge: Write a function that takes a numerical vector and returns its mean divided by 10.
175 | 176 | :::::::::::::::::::::::::::::::::: {.solution} 177 | 178 | ```{r} 179 | #| label: write-function-answer 180 | vecmean <- function(x) { 181 | mean(x) / 10 182 | } 183 | ``` 184 | 185 | :::::::::::::::::::::::::::::::::: 186 | 187 | ::::::::::::::::::::::::::::::::::::: 188 | 189 | ## Using functions in the workflow 190 | 191 | Now that we've defined our custom data cleaning function, we can put it to use in the workflow. 192 | 193 | Can you see how this might be done? 194 | 195 | We need to delete the corresponding code from the last `tar_target()` and replace it with a call to the new function. 196 | 197 | Modify the workflow to look like this: 198 | 199 | ```{r} 200 | #| label = "targets-show-fun-add", 201 | #| eval = FALSE, 202 | #| code = readLines("files/plans/plan_1.R")[2:21] 203 | ``` 204 | 205 | We should run the workflow again with `tar_make()` to make sure it is up-to-date: 206 | 207 | ```{r} 208 | #| label: targets-run-fun 209 | #| eval: true 210 | #| echo: [5] 211 | pushd(make_tempdir()) 212 | write_example_plan("plan_0.R") 213 | tar_make(reporter = "silent") 214 | write_example_plan("plan_1.R") 215 | tar_make() 216 | popd() 217 | ``` 218 | 219 | We will learn more soon about the messages that `targets` prints out. 220 | 221 | ## Functions make it easier to reason about code 222 | 223 | Notice that now the list of targets at the end is starting to look like a high-level summary of your analysis. 224 | 225 | This is another advantage of using custom functions: **functions allow us to separate the details of each workflow step from the overall workflow**. 226 | 227 | To understand the overall workflow, you don't need to know all of the details about how the data were cleaned; you just need to know that there was a cleaning step. 228 | On the other hand, if you do need to go back and delve into the specifics of the data cleaning, you only need to pay attention to what happens inside that function, and you can ignore the rest of the workflow.
229 | **This makes it easier to reason about the code**, and will lead to fewer bugs and ultimately save you time and mental energy. 230 | 231 | Here we have only scratched the surface of functions, and you will likely need to get more help in learning about them. 232 | For more information, we recommend reading this episode in the R Novice lesson from Carpentries that is [all about functions](https://swcarpentry.github.io/r-novice-gapminder/10-functions.html). 233 | 234 | ::::::::::::::::::::::::::::::::::::: keypoints 235 | 236 | - Functions are crucial when repeating the same code many times with minor differences 237 | - RStudio's "Extract function" tool can help you get started with converting code into functions 238 | - Functions are an essential part of how `targets` works. 239 | 240 | :::::::::::::::::::::::::::::::::::::::::::::::: 241 | -------------------------------------------------------------------------------- /episodes/introduction.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Introduction" 3 | teaching: 10 4 | exercises: 2 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | 9 | - Why should we care about reproducibility? 10 | - How can `targets` help us achieve reproducibility? 11 | 12 | :::::::::::::::::::::::::::::::::::::::::::::::: 13 | 14 | ::::::::::::::::::::::::::::::::::::: objectives 15 | 16 | - Explain why reproducibility is important for science 17 | - Describe the features of `targets` that enhance reproducibility 18 | 19 | :::::::::::::::::::::::::::::::::::::::::::::::: 20 | 21 | ::::::::::::::::::::::::::::::::::::: {.instructor} 22 | 23 | Episode summary: Introduce the idea of reproducibility and why / who would want to use `targets` 24 | 25 | ::::::::::::::::::::::::::::::::::::: 26 | 27 | ## What is reproducibility? 28 | 29 | Reproducibility is the ability for others (including your future self) to reproduce your analysis. 
30 | 31 | We can only have confidence in the results of scientific analyses if they can be reproduced. 32 | 33 | However, reproducibility is not a binary concept (not reproducible vs. reproducible); rather, there is a scale from **less** reproducible to **more** reproducible. 34 | 35 | `targets` goes a long way toward making your analyses **more reproducible**. 36 | 37 | Other practices you can use to further enhance reproducibility include controlling your computing environment with tools like Docker, conda, or renv, but we don't have time to cover those in this workshop. 38 | 39 | ## What is `targets`? 40 | 41 | `targets` is a workflow management package for the R programming language developed and maintained by Will Landau. 42 | 43 | The major features of `targets` include: 44 | 45 | - **Automation** of workflow 46 | - **Caching** of workflow steps 47 | - **Batch creation** of workflow steps 48 | - **Parallelization** at the level of the workflow 49 | 50 | This allows you to do the following: 51 | 52 | - return to a project after working on something else and immediately pick up where you left off without confusion or trying to remember what you were doing 53 | - change the workflow, then only re-run the parts that are affected by the change 54 | - massively scale up the workflow without changing individual functions 55 | 56 | ... and of course, it will help others reproduce your analysis. 57 | 58 | ## Who should use `targets`? 59 | 60 | `targets` is by no means the only workflow management software. 61 | There is a large number of similar tools, each with varying features and use-cases. 62 | For example, [snakemake](https://snakemake.readthedocs.io/en/stable/) is a popular workflow tool for Python, and [`make`](https://www.gnu.org/software/make/) is a tool that has been around for a very long time for automating bash scripts. 63 | `targets` is designed to work specifically with R, so it makes the most sense to use it if you primarily use R, or intend to.
64 | If you mostly code with other tools, you may want to consider an alternative. 65 | 66 | The **goal** of this workshop is to **learn how to use `targets` to carry out reproducible data analysis in R**. 67 | 68 | ## Where to get more information 69 | 70 | `targets` is a sophisticated package and there is a lot more to learn than we can cover in this workshop. 71 | 72 | Here are some recommended resources for continuing on your `targets` journey: 73 | 74 | - [The `targets` R package user manual](https://books.ropensci.org/targets/) by the author of `targets`, Will Landau, should be considered required reading for anyone seriously interested in `targets`. 75 | - [The `targets` discussion board](https://github.com/ropensci/targets/discussions) is a great place for asking questions and getting help. Before you ask a question though, be sure to [read the policy on asking for help](https://books.ropensci.org/targets/help.html). 76 | - [The `targets` package webpage](https://docs.ropensci.org/targets/) includes documentation of all `targets` functions. 77 | - [The `tarchetypes` package webpage](https://docs.ropensci.org/tarchetypes/) includes documentation of all `tarchetypes` functions. You will almost certainly use `tarchetypes` along with `targets`, so it's good to consult both. 78 | - [Reproducible computation at scale in R with `targets`](https://github.com/wlandau/targets-tutorial) is a tutorial by Will Landau analyzing customer churn with Keras. 79 | - [Recorded talks](https://github.com/ropensci/targets#recorded-talks) and [example projects](https://github.com/ropensci/targets#example-projects) listed on the `targets` README. 80 | 81 | ## About the example dataset 82 | 83 | For this workshop, we will analyze an example dataset of measurements taken on adult foraging Adélie, Chinstrap, and Gentoo penguins observed on islands in the Palmer Archipelago, Antarctica. 84 | 85 | The data are available from the `palmerpenguins` R package.
You can get more information about the data by running `?palmerpenguins`. 86 | 87 | ![The three species of penguins in the `palmerpenguins` dataset. Artwork by @allison_horst.](https://allisonhorst.github.io/palmerpenguins/reference/figures/lter_penguins.png) 88 | 89 | The goal of the analysis is to determine the relationship between bill length and depth by using linear models. 90 | 91 | We will gradually build up the analysis through this lesson, but you can see the final version at . 92 | 93 | ::::::::::::::::::::::::::::::::::::: keypoints 94 | 95 | - We can only have confidence in the results of scientific analyses if they can be reproduced by others (including your future self) 96 | - `targets` helps achieve reproducibility by automating workflow 97 | - `targets` is designed for use with the R programming language 98 | - The example dataset for this workshop includes measurements taken on penguins in Antarctica 99 | 100 | :::::::::::::::::::::::::::::::::::::::::::::::: 101 | -------------------------------------------------------------------------------- /episodes/lifecycle.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'The Workflow Lifecycle' 3 | teaching: 10 4 | exercises: 2 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | 9 | - What happens if we re-run a workflow? 10 | - How does `targets` know what steps to re-run? 11 | - How can we inspect the state of the workflow? 12 | 13 | :::::::::::::::::::::::::::::::::::::::::::::::: 14 | 15 | ::::::::::::::::::::::::::::::::::::: objectives 16 | 17 | - Explain how `targets` helps increase efficiency 18 | - Be able to inspect a workflow to see what parts are outdated 19 | 20 | :::::::::::::::::::::::::::::::::::::::::::::::: 21 | 22 | ::::::::::::::::::::::::::::::::::::: {.instructor} 23 | 24 | Episode summary: Demonstrate typical cycle of running `targets`: make, inspect, adjust, make... 
25 | 26 | ::::::::::::::::::::::::::::::::::::: 27 | 28 | ```{r} 29 | #| label: setup 30 | #| echo: FALSE 31 | #| message: FALSE 32 | #| warning: FALSE 33 | library(targets) 34 | library(visNetwork) 35 | source("files/lesson_functions.R") 36 | ``` 37 | 38 | ## Re-running the workflow 39 | 40 | One of the features of `targets` is that it maximizes efficiency by only running the parts of the workflow that need to be run. 41 | 42 | This is easiest to understand by trying it yourself. Let's try running the workflow again: 43 | 44 | ```{r} 45 | #| label: targets-run 46 | #| echo: [5] 47 | # Each tar_script is fresh, so need to run once to catch up to learners 48 | pushd(make_tempdir()) 49 | write_example_plan("plan_1.R") 50 | tar_make(reporter = "silent") 51 | tar_make() 52 | popd() 53 | ``` 54 | 55 | Remember how the first time we ran the pipeline, `targets` printed out a list of each target as it was being built? 56 | 57 | This time, it tells us it is skipping those targets; they have already been built, so there's no need to run that code again. 58 | 59 | Remember, the fastest code is the code you don't have to run! 60 | 61 | ## Re-running the workflow after modification 62 | 63 | What happens when we change one part of the workflow then run it again? 64 | 65 | Say that we decide the species names should be shorter. 66 | Right now they include the common name and the scientific name, but we really only need the first part of the common name to distinguish them. 67 | 68 | Edit `_targets.R` so that the `clean_penguin_data()` function looks like this: 69 | 70 | ```{r} 71 | #| label: new-func 72 | #| eval: FALSE 73 | #| file: files/tar_functions/clean_penguin_data.R 74 | ``` 75 | 76 | Then run it again. 
77 | 78 | ```{r} 79 | #| label: targets-run-2 80 | #| echo: [6] 81 | plan_2_dir <- make_tempdir() 82 | pushd(plan_2_dir) 83 | write_example_plan("plan_1.R") 84 | tar_make(reporter = "silent") 85 | write_example_plan("plan_2.R") 86 | tar_make() 87 | popd() 88 | ``` 89 | 90 | What happened? 91 | 92 | This time, it skipped `penguins_csv_file` and `penguins_data_raw` and only ran `penguins_data`. 93 | 94 | Of course, since our example workflow is so short we don't even notice the amount of time saved. 95 | But imagine using this in a series of computationally intensive analysis steps. 96 | The ability to automatically skip steps results in a massive increase in efficiency. 97 | 98 | ::::::::::::::::::::::::::::::::::::: challenge 99 | 100 | ## Challenge 1: Inspect the output 101 | 102 | How can you inspect the contents of `penguins_data`? 103 | 104 | :::::::::::::::::::::::::::::::::: solution 105 | 106 | With `tar_read(penguins_data)` or by running `tar_load(penguins_data)` followed by `penguins_data`. 107 | 108 | :::::::::::::::::::::::::::::::::::::::::::: 109 | 110 | ::::::::::::::::::::::::::::::::::::::::::::::: 111 | 112 | ## Under the hood 113 | 114 | How does `targets` keep track of which targets are up-to-date vs. outdated? 115 | 116 | For each target in the workflow (items in the list at the end of the `_targets.R` file) and any custom functions used in the workflow, `targets` calculates a **hash value**, or a unique combination of letters and digits that represents an object in the computer's memory. 117 | You can think of the hash value (or "hash" for short) as **a unique fingerprint** for a target or function. 118 | 119 | The first time you run `tar_make()`, `targets` calculates the hashes for each target and function as it runs the code and stores them in the targets cache (the `_targets` folder). 120 | Then, for each subsequent call of `tar_make()`, it calculates the hashes again and compares them to the stored values.
121 | It detects which have changed, and this is how it knows which targets are out of date. 122 | 123 | :::::::::::::::::::::::::::::::::::::::: callout 124 | 125 | ## Where the hashes live 126 | 127 | If you are curious about what the hashes look like, you can see them in the file `_targets/meta/meta`, but **do not edit this file by hand**---that would ruin your workflow! 128 | 129 | :::::::::::::::::::::::::::::::::::::::: 130 | 131 | This information is used in combination with the dependency relationships (in other words, how each target depends on the others) to re-run the workflow in the most efficient way possible: code is only run for targets that need to be re-built, and others are skipped. 132 | 133 | ## Visualizing the workflow 134 | 135 | Typically, you will be making edits to various places in your code, adding new targets, and running the workflow periodically. 136 | It is good to be able to visualize the state of the workflow. 137 | 138 | This can be done with `tar_visnetwork()`: 139 | 140 | ```{r} 141 | #| label: targets-run-hide-3 142 | #| echo: [5] 143 | #| results: "asis" 144 | #| eval: FALSE 145 | # TODO: Change #| eval to TRUE when 146 | # https://github.com/carpentries/sandpaper/issues/443 147 | # is resolved 148 | pushd(plan_2_dir) 149 | tar_visnetwork() 150 | popd() 151 | ``` 152 | 153 | ![](fig/lifecycle-visnetwork.png){alt="Visualization of the targets workflow, showing 'penguins_data' connected by lines to 'penguins_data_raw', 'penguins_csv_file' and 'clean_penguin_data'"} 154 | 155 | You should see the network show up in the plot area of RStudio. 156 | 157 | It is an HTML widget, so you can zoom in and out (this isn't important for the current example since it is so small, but is useful for larger, "real-life" workflows). 158 | 159 | Here, we see that all of the targets are dark green, indicating that they are up-to-date and would be skipped if we were to run the workflow again.
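A closely related function is `tar_glimpse()`, which draws only the dependency graph without checking whether each target is up to date, so it can render faster for large pipelines:

```r
library(targets)

# Dependency graph only; does not check target status,
# so it is quicker than tar_visnetwork() for big workflows
tar_glimpse()
```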
160 | 161 | ::::::::::::::::::::::::::::::::::::: prereq 162 | 163 | ## Installing visNetwork 164 | 165 | You may encounter an error message `The package "visNetwork" is required.` 166 | 167 | In this case, install it first with `install.packages("visNetwork")`. 168 | 169 | :::::::::::::::::::::::::::::::::::::::::::::::: 170 | 171 | ::::::::::::::::::::::::::::::::::::: challenge 172 | 173 | ## Challenge 2: What else can the visualization tell us? 174 | 175 | Modify the workflow in `_targets.R`, then run `tar_visnetwork()` again **without** running `tar_make()`. 176 | What color indicates that a target is out of date? 177 | 178 | :::::::::::::::::::::::::::::::::: solution 179 | 180 | Light blue indicates the target is out of date. 181 | 182 | Depending on how you modified the code, any or all of the targets may now be light blue. 183 | 184 | :::::::::::::::::::::::::::::::::::::::::::: 185 | 186 | ::::::::::::::::::::::::::::::::::::::::::::::: 187 | 188 | ::::::::::::::::::::::::::::::::::::: callout 189 | 190 | ## 'Outdated' does not always mean 'will be run' 191 | 192 | A target appearing as light blue ("outdated") in the network visualization does not guarantee that it will be re-built during the next run. Rather, it means that **at least one of the targets that it depends on has changed**. 193 | 194 | For example, if the workflow state looked like this: 195 | 196 | `A -> B* -> C -> D` 197 | 198 | where the `*` indicates that `B` has changed compared to the last time the workflow was run, the network visualization will show `B`, `C`, and `D` all as light blue. 199 | 200 | But if re-running the workflow results in the exact same value for `C` as before, `D` will not be re-run (will be "skipped"). 201 | 202 | Most of the time, a single change will cascade to the rest of the downstream targets and cause them to be re-built, but this is not always the case.
`targets` has no way of knowing ahead of time what the actual output will be, so it cannot provide a network visualization that completely predicts the future! 203 | 204 | ::::::::::::::::::::::::::::::::::::::::::::::: 205 | 206 | ## Other ways to check workflow status 207 | 208 | The visualization is very useful, but sometimes you may be working on a server that doesn't provide graphical output, or you just want a quick textual summary of the workflow. 209 | There are some other useful functions that can do that. 210 | 211 | `tar_outdated()` lists only the outdated targets; that is, targets that will be built during the next run, or that depend on such a target. 212 | If everything is up to date, it will return a zero-length character vector (`character(0)`). 213 | 214 | ```{r} 215 | #| label: targets-outdated 216 | #| echo: [2] 217 | pushd(plan_2_dir) 218 | tar_outdated() 219 | popd() 220 | ``` 221 | 222 | `tar_progress()` shows the current status of the workflow as a dataframe. 223 | You may find it helpful to further manipulate the dataframe to obtain useful summaries of the workflow, for example using `dplyr` (such data manipulation is beyond the scope of this lesson but the instructor may demonstrate its use). 224 | 225 | ```{r} 226 | #| label: targets-progress 227 | #| echo: [2] 228 | pushd(plan_2_dir) 229 | tar_progress() 230 | popd() 231 | ``` 232 | 233 | ## Granular control of targets 234 | 235 | It is possible to only make a particular target instead of running the entire workflow. 236 | 237 | To do this, type the name of the target you wish to build inside the parentheses of `tar_make()` (note that any targets required by the one you specify will also be built). 238 | For example, `tar_make(penguins_data_raw)` would **only** build `penguins_data_raw`, not `penguins_data`. 239 | 240 | Furthermore, if you want to manually "reset" a target and make it appear out-of-date, you can do so with `tar_invalidate()`.
This means that target (and any that depend on it) will be re-run next time. 241 | 242 | Let's give this a try. Remember that our pipeline is currently up to date, so `tar_make()` will skip everything: 243 | 244 | ```{r} 245 | #| label: targets-progress-show-2 246 | #| eval: true 247 | #| echo: [2] 248 | pushd(plan_2_dir) 249 | tar_make() 250 | popd() 251 | ``` 252 | 253 | Let's invalidate `penguins_data` and run it again: 254 | 255 | ```{r} 256 | #| label: targets-progress-show-3 257 | #| eval: true 258 | #| echo: [2, 3] 259 | pushd(plan_2_dir) 260 | tar_invalidate(penguins_data) 261 | tar_make() 262 | popd() 263 | ``` 264 | 265 | If you want to reset **everything** and start fresh, you can use `tar_invalidate(everything())` (`tar_invalidate()` [accepts `tidyselect` expressions](https://docs.ropensci.org/targets/reference/tar_invalidate.html) to specify target names). 266 | 267 | **Caution should be exercised** when using granular methods like this, though, since you may end up with your workflow in an unexpected state. The surest way to maintain an up-to-date workflow is to run `tar_make()` frequently. 268 | 269 | ## How this all works in practice 270 | 271 | In practice, you will likely be switching between running the workflow with `tar_make()`, loading the targets you built with `tar_load()`, and editing your custom functions by running code in an interactive R session. It takes some time to get used to it, but soon you will feel that your code isn't "real" until it is embedded in a `targets` workflow. 
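As a rough sketch of that cycle (assuming the penguins pipeline from this episode is the active `_targets.R` in your project), a typical session might look like:

```r
library(targets)

tar_make()               # run the workflow; only outdated targets rebuild
tar_load(penguins_data)  # load a built target into your interactive session
summary(penguins_data)   # explore it

# ...edit clean_penguin_data() in your functions file, then:
tar_make()               # re-run; unaffected targets are skipped
```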
272 | 273 | ::::::::::::::::::::::::::::::::::::: keypoints 274 | 275 | - `targets` only runs the steps that have been affected by a change to the code 276 | - `tar_visnetwork()` shows the current state of the workflow as a network 277 | - `tar_progress()` shows the current state of the workflow as a data frame 278 | - `tar_outdated()` lists outdated targets 279 | - `tar_invalidate()` can be used to invalidate (re-run) specific targets 280 | 281 | :::::::::::::::::::::::::::::::::::::::::::::::: 282 | -------------------------------------------------------------------------------- /episodes/organization.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Best Practices for targets Project Organization' 3 | teaching: 10 4 | exercises: 2 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | 9 | - What are best practices for organizing `targets` projects? 10 | - How does the organization of a `targets` workflow differ from a script-based analysis? 11 | 12 | :::::::::::::::::::::::::::::::::::::::::::::::: 13 | 14 | ::::::::::::::::::::::::::::::::::::: objectives 15 | 16 | - Explain how to organize `targets` projects for maximal reproducibility 17 | - Understand how to use functions in the context of `targets` 18 | 19 | :::::::::::::::::::::::::::::::::::::::::::::::: 20 | 21 | ::::::::::::::::::::::::::::::::::::: instructor 22 | 23 | Episode summary: Demonstrate best-practices for project organization 24 | 25 | ::::::::::::::::::::::::::::::::::::: 26 | 27 | ```{r} 28 | #| label: setup 29 | #| echo: FALSE 30 | #| message: FALSE 31 | #| warning: FALSE 32 | library(targets) 33 | library(tarchetypes) 34 | source("files/lesson_functions.R") 35 | ``` 36 | 37 | ## A simpler way to write workflow plans 38 | 39 | The default way to specify targets in the plan is with the `tar_target()` function. 40 | But this way of writing plans can be a bit verbose. 
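As a reminder of what that verbosity looks like, here is a schematic plan written entirely with `tar_target()` (the file and target names are hypothetical, not the penguins plan):

```r
library(targets)

# Every target needs a full tar_target() call, wrapped in list()
list(
  tar_target(raw_data, read.csv("data.csv")),  # hypothetical input file
  tar_target(clean_data, na.omit(raw_data))
)
```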
41 | 42 | There is an alternative provided by the `tarchetypes` package, also written by the creator of `targets`, Will Landau. 43 | 44 | ::::::::::::::::::::::::::::::::::::: prereq 45 | 46 | ## Install `tarchetypes` 47 | 48 | If you haven't done so yet, install `tarchetypes` with `install.packages("tarchetypes")`. 49 | 50 | ::::::::::::::::::::::::::::::::::::: 51 | 52 | The purpose of `tarchetypes` is to provide various shortcuts that make writing `targets` pipelines easier. 53 | We will introduce just one for now, `tar_plan()`. This is used in place of `list()` at the end of the `_targets.R` script. 54 | By using `tar_plan()`, instead of specifying targets with `tar_target()`, we can use a syntax like this: `target_name = target_command`. 55 | 56 | Let's edit the penguins workflow to use the `tar_plan()` syntax: 57 | 58 | 59 | ```{r} 60 | #| label = "tar-plan-show-1", 61 | #| eval = FALSE, 62 | #| code = c(readLines("files/packages.R")[1:4], "\n", readLines("files/tar_functions/clean_penguin_data.R"), "\n", readLines("files/plans/plan_2b.R")[5:9]) 63 | ``` 64 | 65 | I think it is easier to read, do you? 66 | 67 | Notice that `tar_plan()` does not mean you have to write *all* targets this way; you can still use the `tar_target()` format within `tar_plan()`. 68 | That is because `=`, while short and easy to read, does not provide all of the customization that `targets` is capable of. 69 | This doesn't matter so much for now, but it will become important when you start to create more advanced `targets` workflows. 70 | 71 | ## Organizing files and folders 72 | 73 | So far, we have been doing everything with a single `_targets.R` file. 74 | This is OK for a small workflow, but does not work very well when the workflow gets bigger. 75 | There are better ways to organize your code.
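To preview where this is headed, the layout we will build in this section looks roughly like this (a sketch; only the name and location of `_targets.R` are fixed requirements):

```
.
├── _targets.R        # workflow plan (must stay in the project root)
├── R/
│   ├── functions.R   # custom functions
│   └── packages.R    # library() calls
└── _targets/         # targets cache (usually not version controlled)
    └── user/
        ├── data/     # input data files
        └── results/  # output files
```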
76 | 77 | First, let's create a directory called `R` to store R code *other than* `_targets.R` (remember, `_targets.R` must be placed in the overall project directory, not in a subdirectory). 78 | Create a new R file in `R/` called `functions.R`. 79 | This is where we will put our custom functions. 80 | Let's go ahead and put `clean_penguin_data()` in there now and save it. 81 | 82 | Similarly, let's put the `library()` calls in their own script in `R/` called `packages.R` (this isn't the only way to do it though; see the ["Managing Packages" episode](https://joelnitta.github.io/targets-workshop/packages.html) for alternative approaches). 83 | 84 | We will also need to modify our `_targets.R` script to call these scripts with `source`: 85 | 86 | ```{r} 87 | #| label = "tar-plan-show-2", 88 | #| eval = FALSE, 89 | #| code = readLines("files/plans/plan_2b.R")[2:9] 90 | ``` 91 | 92 | Now `_targets.R` is much more streamlined: it is focused just on the workflow and immediately tells us what happens in each step. 93 | 94 | Finally, let's make some directories for storing data and output---files that are not code. 95 | Create a new directory inside the targets cache called `user`: `_targets/user`. 96 | Within `user`, create two more directories, `data` and `results`. 97 | (If you use version control, you will probably want to ignore the `_targets` directory). 98 | 99 | ## A word about functions 100 | 101 | We mentioned custom functions earlier in the lesson, but this is an important topic that deserves further clarification. 102 | If you are used to analyzing data in R with a series of scripts instead of a single workflow like `targets`, you may not write many functions (using the `function()` function). 103 | 104 | This is a major difference from `targets`. 105 | It would be quite difficult to write an efficient `targets` pipeline without the use of custom functions, because each target you build has to be the output of a single command. 
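For example (a hypothetical illustration, not part of the penguins analysis), wrapping several steps in a custom function means that building the target is still a single command:

```r
# Hypothetical: three cleaning/summary steps collapsed into one function
summarize_weights <- function(raw) {
  cleaned <- raw[!is.na(raw$weight_g), ]        # drop missing weights
  cleaned$weight_kg <- cleaned$weight_g / 1000  # convert g to kg
  aggregate(weight_kg ~ species, data = cleaned, FUN = mean)
}

# In _targets.R, the whole step is then one command:
# tar_target(weight_summary, summarize_weights(raw_data))
```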
106 | 107 | We don't have time in this curriculum to cover how to write functions in R, but the [Software Carpentry lesson](https://swcarpentry.github.io/r-novice-gapminder/10-functions) is recommended for reviewing this topic. 108 | 109 | Another major difference is that **each target must have a unique name**. 110 | You may be used to writing code that looks like this: 111 | 112 | ```{r} 113 | #| eval: FALSE 114 | #| label: example-script 115 | 116 | # Store a person's height in cm, then convert to inches 117 | height <- 160 118 | height <- height / 2.54 119 | ``` 120 | 121 | You would get an error if you tried to run the equivalent targets pipeline: 122 | 123 | ```{r} 124 | #| eval: FALSE 125 | #| label: example-bad-pipeline-show 126 | #| echo: [-1, -2] 127 | library(targets) 128 | library(tarchetypes) 129 | tar_plan( 130 | height = 160, 131 | height = height / 2.54 132 | ) 133 | ``` 134 | 135 | ```{r} 136 | #| echo: FALSE 137 | #| label: example-bad-pipeline-hide 138 | #| error: true 139 | tar_dir({ 140 | write_example_plan(chunk = "example-bad-pipeline-show") 141 | tar_make() 142 | }) 143 | ``` 144 | 145 | **A major part of working with `targets` pipelines is writing custom functions that are the right size.** 146 | They should not be so small that each is just a single line of code; this would make your pipeline difficult to understand and maintain. 147 | On the other hand, they should not be so big that each has large numbers of inputs and is thus overly sensitive to changes. 148 | 149 | Striking this balance is more of an art than a science, and only comes with practice. I find a good rule of thumb is no more than three inputs per target.
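As a hypothetical illustration of that balance, using the bill measurements from our example analysis:

```r
# Probably too small: a one-line wrapper that adds little clarity
drop_na_bills <- function(d) d[!is.na(d$bill_length_mm), ]

# A more useful size: one meaningful analysis step with a few clear inputs
model_bills_by_species <- function(penguin_data, species_name) {
  species_data <- penguin_data[penguin_data$species == species_name, ]
  lm(bill_depth_mm ~ bill_length_mm, data = species_data)
}
```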
150 | 151 | ::::::::::::::::::::::::::::::::::::: keypoints 152 | 153 | - Put code in the `R/` folder 154 | - Put functions in `R/functions.R` 155 | - Specify packages in `R/packages.R` 156 | - Put other miscellaneous files in `_targets/user` 157 | - Writing functions is a key skill for `targets` pipelines 158 | 159 | :::::::::::::::::::::::::::::::::::::::::::::::: 160 | 161 | -------------------------------------------------------------------------------- /episodes/packages.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Managing Packages' 3 | teaching: 10 4 | exercises: 2 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | 9 | - How should I manage packages for my `targets` project? 10 | 11 | :::::::::::::::::::::::::::::::::::::::::::::::: 12 | 13 | ::::::::::::::::::::::::::::::::::::: objectives 14 | 15 | - Demonstrate best practices for managing packages 16 | 17 | :::::::::::::::::::::::::::::::::::::::::::::::: 18 | 19 | ::::::::::::::::::::::::::::::::::::: instructor 20 | 21 | Episode summary: Show how to load packages and maintain package versions 22 | 23 | ::::::::::::::::::::::::::::::::::::: 24 | 25 | ```{r} 26 | #| label: setup 27 | #| echo: FALSE 28 | #| message: FALSE 29 | #| warning: FALSE 30 | library(targets) 31 | library(tarchetypes) 32 | source("files/lesson_functions.R") 33 | ``` 34 | 35 | ## Loading packages 36 | 37 | Almost every R analysis relies on packages for functions beyond those available in base R. 38 | 39 | There are three main ways to load packages in `targets` workflows. 40 | 41 | ### Method 1: `library()` {#method-1} 42 | 43 | This is the method you are almost certainly most familiar with, and is the method we have been using by default so far. 44 | 45 | Like any other R script, include `library()` calls near the top of the `_targets.R` script.
Alternatively (and as the recommended best practice for project organization), you can put all of the `library()` calls in a separate script---this is typically called `packages.R` and stored in the `R/` directory of your project. 46 | 47 | The potential downside to this approach is that if you have a long list of packages to load, certain functions like `tar_visnetwork()`, `tar_outdated()`, etc., may take an unnecessarily long time to run because they have to load all the packages, even though they don't necessarily use them. 48 | 49 | ### Method 2: `tar_option_set()` {#method-2} 50 | 51 | In this method, use the `tar_option_set()` function in `_targets.R` to specify the packages to load when running the workflow. 52 | 53 | This will be demonstrated using the pre-cleaned dataset from the `palmerpenguins` package. Let's say we want to filter it down to just data for the Adelie penguin. 54 | 55 | ::::::::::::::::::::::::::::::::::::: {.callout} 56 | 57 | ## Save your progress 58 | 59 | You can only have one active `_targets.R` file at a time in a given project. 60 | 61 | We are about to create a new `_targets.R` file, but you probably don't want to lose your progress in the one we have been working on so far (the penguins bill analysis). You can temporarily rename that one to something like `_targets_old.R` so that you don't overwrite it with the new example `_targets.R` file below. Then, rename them when you are ready to work on it again. 
62 | 63 | ::::::::::::::::::::::::::::::::::::: 64 | 65 | This is what using the `tar_option_set()` method looks like: 66 | 67 | ```{r} 68 | #| eval: FALSE 69 | #| label: load-pkg-show 70 | library(targets) 71 | library(tarchetypes) 72 | 73 | tar_option_set(packages = c("dplyr", "palmerpenguins")) 74 | 75 | tar_plan( 76 | adelie_data = filter(penguins, species == "Adelie") 77 | ) 78 | ``` 79 | 80 | ```{r} 81 | #| echo: FALSE 82 | #| label: load-pkg-hide 83 | tar_dir({ 84 | write_example_plan(chunk = "load-pkg-show") 85 | tar_make() 86 | }) 87 | ``` 88 | 89 | This method gets around the slow-downs that may sometimes be experienced with Method 1. 90 | 91 | ### Method 3: `packages` argument of `tar_target()` {#method-3} 92 | 93 | The main function for defining targets, `tar_target()`, includes a `packages` argument that will load the specified packages **only for that target**. 94 | 95 | Here is how we could use this method, modified from the same example as above. 96 | 97 | ```{r} 98 | #| eval: FALSE 99 | #| label: load-pkg-show-2 100 | library(targets) 101 | library(tarchetypes) 102 | 103 | tar_plan( 104 | tar_target( 105 | adelie_data, 106 | filter(penguins, species == "Adelie"), 107 | packages = c("dplyr", "palmerpenguins") 108 | ) 109 | ) 110 | ``` 111 | 112 | ```{r} 113 | #| echo: FALSE 114 | #| label: load-pkg-hide-2 115 | tar_dir({ 116 | write_example_plan(chunk="load-pkg-show-2") 117 | tar_make() 118 | }) 119 | ``` 120 | 121 | This can be more memory efficient in some cases than loading all packages, since not every target is always made during a typical run of the workflow. 122 | But it can be tedious to remember and specify packages needed on a per-target basis. 123 | 124 | ### One more option 125 | 126 | Another alternative that does not actually involve loading packages is to specify the package associated with each function by using the `::` notation, for example, `dplyr::mutate()`. 127 | This means you can **avoid loading packages altogether**.
128 | 129 | Here is how to write the plan using this method: 130 | 131 | ```{r} 132 | #| eval: FALSE 133 | #| label: load-pkg-show-3 134 | library(targets) 135 | library(tarchetypes) 136 | 137 | tar_plan( 138 | adelie_data = dplyr::filter(palmerpenguins::penguins, species == "Adelie") 139 | ) 140 | ``` 141 | 142 | ```{r} 143 | #| echo: FALSE 144 | #| label: load-pkg-hide-3 145 | tar_dir({ 146 | write_example_plan(chunk = "load-pkg-show-3") 147 | tar_make() 148 | }) 149 | ``` 150 | 151 | The benefit of this approach is that the origins of all functions are explicit, so you could browse your code (for example, by looking at its source on GitHub) and immediately know where all the functions come from. 152 | The downside is that it is rather verbose because you need to type the package name every time you use one of its functions. 153 | 154 | ### Which is the right way? 155 | 156 | **There is no "right" answer about how to load packages**---it is a matter of what works best for your particular situation. 157 | 158 | Often a reasonable approach is to load your most commonly used packages with `library()` (such as `tidyverse`) in `packages.R`, then use `::` notation for less frequently used functions whose origins you may otherwise forget. 159 | 160 | ## Maintaining package versions 161 | 162 | ### Tracking of custom functions vs. functions from packages 163 | 164 | A critical thing to understand about `targets` is that **it only tracks custom functions and targets**, not functions provided by packages. 165 | 166 | However, the content of packages can change, and packages typically get updated on a regular basis. **The output of your workflow may depend not only on the packages you use, but also on their versions**. 167 | 168 | Therefore, it is a good idea to track package versions. 169 | 170 | ### About `renv` 171 | 172 | Fortunately, you don't have to do this by hand: there are R packages available that can help automate this process.
We recommend [renv](https://rstudio.github.io/renv/index.html), but there are others available as well (e.g., [groundhog](https://groundhogr.com/)). We don't have the time to cover detailed usage of `renv` in this lesson. To get started with `renv`, see the ["Introduction to renv" vignette](https://rstudio.github.io/renv/articles/renv.html). 173 | 174 | You can generally use `renv` with a `targets` project the same way you would with any other R project. However, there is one exception: if you load packages using `tar_option_set()` or the `packages` argument of `tar_target()` ([Method 2](#method-2) or [Method 3](#method-3), respectively), `renv` will not detect them (because it expects packages to be loaded with `library()`, `require()`, etc.). 175 | 176 | The solution in this case is to use the [`tar_renv()` function](https://docs.ropensci.org/targets/reference/tar_renv.html). This will write a separate file with `library()` calls for each package used in the workflow so that `renv` will properly detect them. 177 | 178 | ### Selective tracking of functions from packages 179 | 180 | Because `targets` doesn't track functions from packages, if you update a package and the contents of one of its functions change, `targets` **will not re-build the target that was generated by that function**. 181 | 182 | However, it is possible to change this behavior on a per-package basis. 183 | This is best done only for a small number of packages, since tracking too many would add excessive computational overhead when `targets` calculates dependencies. 184 | For example, you may want to do this if you are using your own custom package that you update frequently. 185 | 186 | The way to do so is by using `tar_option_set()`, specifying the **same** package name in both `packages` and `imports`. Here is a modified version of the earlier code that demonstrates this for `dplyr` and `palmerpenguins`.
187 | 188 | ```{r} 189 | #| eval: FALSE 190 | #| label: load-pkg-show-4 191 | library(targets) 192 | library(tarchetypes) 193 | 194 | tar_option_set( 195 |   packages = c("dplyr", "palmerpenguins"), 196 |   imports = c("dplyr", "palmerpenguins") 197 | ) 198 | 199 | tar_plan( 200 |   adelie_data = filter(penguins, species == "Adelie") 201 | ) 202 | ``` 203 | 204 | If we were to re-install either `dplyr` or `palmerpenguins` and one of the functions used in the pipeline changed (for example, `filter()`), any target depending on that function would be rebuilt. 205 | 206 | ## Resolving namespace conflicts 207 | 208 | There is one final best practice to mention related to packages: resolving namespace conflicts. 209 | 210 | "Namespace" refers to the idea that a set of names only needs to be unique **within a particular context**. 211 | For example, all the function names of a package have to be unique, but only within that package. 212 | Function names can be duplicated across packages. 213 | 214 | As you may imagine, this can cause confusion. 215 | For example, the `filter()` function appears in both the `stats` package and the `dplyr` package, but does completely different things in each. 216 | This is a **namespace conflict**: how do we know which `filter()` we are talking about? 217 | 218 | The `conflicted` package can help prevent such confusion by stopping you if you try to use an ambiguous function, and helping you be explicit about which package to use. 219 | We don't have time to cover the details here, but you can read more about how to use `conflicted` at its [website](https://conflicted.r-lib.org/). 220 | 221 | When you use `conflicted`, you will typically run a series of commands to explicitly resolve namespace conflicts, like `conflicts_prefer(dplyr::filter)` (this tells R that we want to use `filter` from `dplyr`, not `stats`).
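To make the conflict concrete, here is a quick sketch comparing the two functions directly (this example is ours and is not part of the lesson's pipeline):

```r
# dplyr::filter() subsets rows of a data frame:
dplyr::filter(palmerpenguins::penguins, species == "Adelie")

# stats::filter() applies a linear filter to a time series,
# for example a 3-point moving average:
stats::filter(1:10, rep(1 / 3, 3))

# With both packages attached, a bare filter() call silently uses
# whichever package was attached last; conflicted turns that
# ambiguity into an explicit error until you state a preference.
```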
222 | 223 | To use this in a `targets` workflow, you should put all calls to `conflicts_prefer` in a special file called `.Rprofile` that is located in the main folder of your project. This will ensure that the conflicts are always resolved for each target. 224 | 225 | The recommended way to edit your `.Rprofile` is to use `usethis::edit_r_profile("project")`. 226 | This will open `.Rprofile` in your editor, where you can edit it and save it. 227 | 228 | For example, your `.Rprofile` could include this: 229 | 230 | ```{r} 231 | #| eval: false 232 | library(conflicted) 233 | conflicts_prefer(dplyr::filter) 234 | ``` 235 | 236 | Note that you don't need to run `source()` to run the code in `.Rprofile`. 237 | It will always get run at the start of each R session automatically. 238 | 239 | ::::::::::::::::::::::::::::::::::::: keypoints 240 | 241 | - There are multiple ways to load packages with `targets` 242 | - `targets` only tracks user-defined functions, not packages 243 | - Use `renv` to manage package versions 244 | - Use the `conflicted` package to manage namespace conflicts 245 | 246 | :::::::::::::::::::::::::::::::::::::::::::::::: 247 | -------------------------------------------------------------------------------- /episodes/parallel.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Parallel Processing' 3 | teaching: 15 4 | exercises: 2 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | 9 | - How can we build targets in parallel? 
10 | 11 | :::::::::::::::::::::::::::::::::::::::::::::::: 12 | 13 | ::::::::::::::::::::::::::::::::::::: objectives 14 | 15 | - Be able to build targets in parallel 16 | 17 | :::::::::::::::::::::::::::::::::::::::::::::::: 18 | 19 | ::::::::::::::::::::::::::::::::::::: instructor 20 | 21 | Episode summary: Show how to use parallel processing 22 | 23 | ::::::::::::::::::::::::::::::::::::: 24 | 25 | ```{r} 26 | #| label: setup 27 | #| echo: FALSE 28 | #| message: FALSE 29 | #| warning: FALSE 30 | library(targets) 31 | library(tarchetypes) 32 | library(broom) 33 | 34 | if (interactive()) { 35 |   setwd("episodes") 36 | } 37 | 38 | source("files/lesson_functions.R") 39 | 40 | # Increase width for printing tibbles 41 | options(width = 140) 42 | ``` 43 | 44 | Once a pipeline starts to include many targets, you may want to think about parallel processing. 45 | This takes advantage of multiple processors in your computer to build multiple targets at the same time. 46 | 47 | ::::::::::::::::::::::::::::::::::::: {.callout} 48 | 49 | ## When to use parallel processing 50 | 51 | Parallel processing should only be used if your workflow has independent tasks---if your workflow only consists of a linear sequence of targets, then there is nothing to parallelize. 52 | Most workflows that use branching can benefit from parallelism. 53 | 54 | ::::::::::::::::::::::::::::::::::::: 55 | 56 | `targets` includes support for high-performance computing, cloud computing, and various parallel backends. 57 | Here, we assume you are running this analysis on a laptop and so will use a relatively simple backend. 58 | If you are interested in high-performance computing, [see the `targets` manual](https://books.ropensci.org/targets/hpc.html). 59 | 60 | ### Set up workflow 61 | 62 | To enable parallel processing with `crew`, you only need to load the `crew` package, then tell `targets` to use it with `tar_option_set()`.
63 | Specifically, the following lines enable `crew` and tell it to use 2 parallel workers. 64 | You can increase this number on more powerful machines: 65 | 66 | ```r 67 | library(crew) 68 | tar_option_set( 69 |   controller = crew_controller_local(workers = 2) 70 | ) 71 | ``` 72 | 73 | Make these changes to the penguins analysis. 74 | It should now look like this: 75 | 76 | ```{r} 77 | #| label = "example-model-show-setup", 78 | #| eval = FALSE, 79 | #| code = readLines("files/plans/plan_9.R")[3:41] 80 | ``` 81 | 82 | There is still one more thing we need to modify, only for the purposes of this demo: if we ran the analysis in parallel now, we wouldn't notice any difference in compute time because the functions are so fast. 83 | 84 | So let's make "slow" versions of `model_glance()` and `model_augment()` using the `Sys.sleep()` function, which just tells the computer to wait some number of seconds. 85 | This will simulate a long-running computation and enable us to see the difference between running sequentially and in parallel. 86 | 87 | Add these functions to `functions.R` (you can copy-paste the original ones, then modify them): 88 | 89 | ```{r} 90 | #| label: slow-funcs 91 | #| eval: false 92 | #| file: 93 | #|   - files/tar_functions/model_glance_slow.R 94 | #|   - files/tar_functions/model_augment_slow.R 95 | ``` 96 | 97 | Then, change the plan to use the "slow" versions of the functions: 98 | 99 | ```{r} 100 | #| label = "example-model-show-9", 101 | #| eval = FALSE, 102 | #| code = readLines("files/plans/plan_10.R")[3:41] 103 | ``` 104 | 105 | Finally, run the pipeline with `tar_make()` as normal.
106 | 107 | ```{r} 108 | #| label: example-model-hide-9 109 | #| warning: false 110 | #| message: false 111 | #| echo: false 112 | 113 | plan_10_dir <- make_tempdir() 114 | pushd(plan_10_dir) 115 | write_example_plan("plan_9.R") 116 | tar_make(reporter = "silent") 117 | write_example_plan("plan_10.R") 118 | tar_make() 119 | popd() 120 | ``` 121 | 122 | Notice that although the time required to build each individual target is about 4 seconds, the total time to run the entire workflow is less than the sum of the individual target times! That is proof that processes are running in parallel **and saving you time**. 123 | 124 | The unique and powerful thing about targets is that **we did not need to change our custom function to run it in parallel**. We only adjusted *the workflow*. This means it is relatively easy to refactor (modify) a workflow for running sequentially locally or running in parallel in a high-performance context. 125 | 126 | Now that we have demonstrated how this works, you can change your analysis plan back to the original versions of the functions you wrote. 127 | 128 | ::::::::::::::::::::::::::::::::::::: keypoints 129 | 130 | - Dynamic branching creates multiple targets with a single command 131 | - You usually need to write custom functions so that the output of the branches includes necessary metadata 132 | - Parallel computing works at the level of the workflow, not the function 133 | 134 | :::::::::::::::::::::::::::::::::::::::::::::::: 135 | -------------------------------------------------------------------------------- /episodes/quarto.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Reproducible Reports with Quarto' 3 | teaching: 10 4 | exercises: 2 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | 9 | - How can we create reproducible reports? 
10 | 11 | :::::::::::::::::::::::::::::::::::::::::::::::: 12 | 13 | ::::::::::::::::::::::::::::::::::::: objectives 14 | 15 | - Be able to generate a report using `targets` 16 | 17 | :::::::::::::::::::::::::::::::::::::::::::::::: 18 | 19 | ::::::::::::::::::::::::::::::::::::: instructor 20 | 21 | Episode summary: Show how to write reports with Quarto 22 | 23 | ::::::::::::::::::::::::::::::::::::: 24 | 25 | ```{r} 26 | #| label: setup 27 | #| echo: FALSE 28 | #| message: FALSE 29 | #| warning: FALSE 30 | library(targets) 31 | library(tarchetypes) 32 | library(quarto) # don't actually need to load, but put here so renv catches it 33 | 34 | if (interactive()) { 35 | setwd("episodes") 36 | } 37 | 38 | source("files/lesson_functions.R") 39 | 40 | # Increase width for printing tibbles 41 | options(width = 140) 42 | ``` 43 | 44 | ## Copy-paste vs. dynamic documents 45 | 46 | Typically, you will want to communicate the results of a data analysis to a broader audience. 47 | 48 | You may have done this before by copying and pasting statistics, plots, and other results into a text document or presentation. 49 | This may be fine if you only ever do the analysis once. 50 | But that is rarely the case---it is much more likely that you will tweak parts of the analysis or add new data and re-run your pipeline. 51 | With the copy-paste method, you'd have to remember what results changed and manually make sure everything is up-to-date. 52 | This is a perilous exercise! 53 | 54 | Fortunately, `targets` provides functions for keeping a document in sync with pipeline results, so you can avoid such pitfalls. 55 | The main tool we will use to generate documents is **Quarto**. 56 | Quarto can be used separately from `targets` (and is a large topic on its own), but it also happens to be an excellent way to dynamically generate reports with `targets`. 
57 | 58 | Quarto allows you to insert the results of R code directly into your documents, so that there is no danger of copy-and-paste mistakes. 59 | Furthermore, it can generate output from the same underlying script in multiple formats, including PDF, HTML, and Microsoft Word. 60 | 61 | ::::::::::::::::::::::::::::::::::::: {.prereq} 62 | 63 | ## Installing Quarto 64 | 65 | As of v2022.07.1, [RStudio comes with Quarto](https://docs.posit.co/ide/user/ide/guide/documents/quarto-project.html), so you don't need to install it separately. If you can't run Quarto from RStudio, we recommend installing the latest version of RStudio. 66 | 67 | ::::::::::::::::::::::::::::::::::::: 68 | 69 | ## About Quarto files 70 | 71 | `.qmd` (or `.Qmd`) is the extension for Quarto files, and stands for "Quarto markdown". 72 | Quarto files invert the normal way of writing code and comments: in a typical R script, all text is assumed to be R code, unless you preface it with a `#` to show that it is a comment. 73 | In Quarto, all text is assumed to be prose, and you use special notation to indicate which lines are R code to be evaluated. 74 | Once the code is evaluated, the results get inserted into a final, rendered document, which can take one of various formats. 75 | 76 | ![Quarto workflow](fig/03-qmd-workflow.png) 77 | 78 | We don't have the time to go into the details of Quarto during this lesson, but recommend the ["Introduction to Reproducible Publications with RStudio" incubator (in-development) lesson](https://ucsbcarpentry.github.io/Reproducible-Publications-with-RStudio-Quarto/) for more on this topic. 79 | 80 | ## Recommended workflow 81 | 82 | Dynamic documents like Quarto (or R Markdown, the predecessor to Quarto) can actually be used to manage data analysis pipelines. 83 | But that is not recommended, because it doesn't scale well and lacks the sophisticated dependency tracking offered by `targets`.
84 | 85 | Our suggested approach is to conduct the vast majority of data analysis (in other words, the "heavy lifting") in the `targets` pipeline, then use the Quarto document to **summarize** and **plot** the results. 86 | 87 | ## Report on bill size in penguins 88 | 89 | Continuing our penguin bill size analysis, let's write a report evaluating each model. 90 | 91 | To save time, the report is already available at . 92 | 93 | Copy the [raw code from here](https://raw.githubusercontent.com/joelnitta/penguins-targets/main/penguin_report.qmd) and save it as a new file `penguin_report.qmd` in your project folder (you may also be able to right click in your browser and select "Save As"). 94 | 95 | Then, add one more target to the pipeline using the `tar_quarto()` function like this: 96 | 97 | ```{r} 98 | #| label = "example-penguins-show-1", 99 | #| eval = FALSE, 100 | #| code = readLines("files/plans/plan_11.R")[2:40] 101 | ``` 102 | 103 | ```{r} 104 | #| label: example-penguins-hide-1 105 | #| echo: FALSE 106 | #| eval: FALSE 107 | 108 | # FIXME 109 | # Skip eval until can figure out how to install quarto CLI in whatever is 110 | # compiling the lesson 111 | 112 | tar_dir({ 113 | library(quarto) 114 | readr::read_lines("https://raw.githubusercontent.com/joelnitta/penguins-targets/main/penguin_report.qmd") |> 115 | readr::write_lines("penguin_report.qmd") 116 | # Run it 117 | write_example_plan("plan_8.R") 118 | tar_make(reporter = "silent") 119 | write_example_plan("plan_11.R") 120 | tar_make() 121 | }) 122 | ``` 123 | 124 | The function to generate the report is `tar_quarto()`, from the `tarchetypes` package. 125 | 126 | As you can see, the "heavy" analysis of running the models is done in the workflow, then there is a single call to render the report at the end with `tar_quarto()`. 127 | 128 | ## How does `targets` know when to render the report? 
129 | 130 | It is not immediately apparent just from this how `targets` knows to generate the report **at the end of the workflow** (recall that build order is not determined by the order of how targets are written in the workflow, but rather by their dependencies). 131 | `penguin_report` does not appear to depend on any of the other targets, since they do not show up in the `tar_quarto()` call. 132 | 133 | How does this work? 134 | 135 | The answer lies **inside** the `penguin_report.qmd` file. Let's look at the start of the file: 136 | 137 | ```{r} 138 | #| label: show-penguin-report-qmd 139 | #| echo: FALSE 140 | #| results: 'asis' 141 | 142 | penguin_qmd <- readr::read_lines("https://raw.githubusercontent.com/joelnitta/penguins-targets/main/penguin_report.qmd") 143 | 144 | cat("````{.markdown}\n") 145 | cat(penguin_qmd[1:24], sep = "\n") 146 | cat("\n````") 147 | ``` 148 | 149 | The lines in between `---` and `---` at the very beginning are called the "YAML header", and contain directions about how to render the document. 150 | 151 | The R code to be executed is specified by the lines between `` ```{r} `` and `` ``` ``. This is called a "code chunk", since it is a portion of code interspersed within prose text. 152 | 153 | Take a closer look at the R code chunk. Notice the use of `targets::tar_load()`. Do you remember what that function does? It loads the targets built during the workflow. 154 | 155 | Now things should make a bit more sense: `targets` knows that the report depends on the targets built during the workflow like `combined_summary` and `species_summary` **because they are loaded in the report with `tar_load()`.** 156 | 157 | ## Generating dynamic content 158 | 159 | The call to `tar_load()` at the start of `penguin_report.qmd` is really the key to generating an up-to-date report---once those are loaded from the workflow, we know that they are in sync with the data, and can use them to produce "polished" text and plots. 
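The general pattern inside such a report can be sketched as follows (the target name `my_summary` and its `mean_value` column are hypothetical; `penguin_report.qmd` applies the same idea to its own targets):

````{.markdown}
```{r}
#| label: load-targets
# Load a target built by the workflow (hypothetical target name)
targets::tar_load(my_summary)
```

The mean value across all samples was `r round(my_summary$mean_value, 2)`.
````

Because the report reads `my_summary` with `tar_load()`, `targets` treats the report as depending on that target, and the rendered text and plots stay in sync with the pipeline.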
160 | 161 | ::::::::::::::::::::::::::::::::::::: {.challenge} 162 | 163 | ## Challenge: Spot the dynamic contents 164 | 165 | Read through `penguin_report.qmd` and try to find instances where the targets built during the workflow (`combined_summary`, etc.) are used to dynamically produce text and plots. 166 | 167 | :::::::::::::::::::::::::::::::::: {.solution} 168 | 169 | - In the code chunk labeled `results-stats`, statistics from the models like *R* squared are extracted, then inserted into the text with in-line code like `` `r knitr::inline_expr("combined_r2")` ``. 170 | 171 | - There are two figures, one for the combined model and one for the separate models (code chunks labeled `fig-combined-plot` and `fig-separate-plot`, respectively). These are built using the points predicted from the model in `combined_predictions` and `species_predictions`. 172 | 173 | :::::::::::::::::::::::::::::::::: 174 | 175 | ::::::::::::::::::::::::::::::::::::: 176 | 177 | You should also interactively run the code in `penguin_report.qmd` to better understand what is going on, starting with `tar_load()`. In fact, that is how this report was written: the code was run in an interactive session, and saved to the report as it was gradually tweaked to obtain the desired results. 178 | 179 | The best way to learn this approach to generating reports is to **try it yourself**. 180 | 181 | So your final Challenge is to construct a `targets` workflow using your own data and generate a report. Good luck! 
182 | 183 | ::::::::::::::::::::::::::::::::::::: keypoints 184 | 185 | - `tarchetypes::tar_quarto()` is used to render Quarto documents 186 | - You should load targets within the Quarto document using `tar_load()` and `tar_read()` 187 | - It is recommended to do heavy computations in the main targets workflow, and lighter formatting and plot generation in the Quarto document 188 | 189 | :::::::::::::::::::::::::::::::::::::::::::::::: 190 | -------------------------------------------------------------------------------- /index.md: -------------------------------------------------------------------------------- 1 | --- 2 | site: sandpaper::sandpaper_site 3 | --- 4 | 5 | This is a lesson about how to use the [targets](https://docs.ropensci.org/targets/) R package for maintaining efficient data analysis workflows. 6 | -------------------------------------------------------------------------------- /instructors/instructor-notes.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Instructor Notes' 3 | --- 4 | 5 | ## General notes 6 | 7 | The examples gradually build up to a [full analysis](https://github.com/joelnitta/penguins-targets) of the [Palmer Penguins dataset](https://allisonhorst.github.io/palmerpenguins/). However, there are a few places where completely different code is demonstrated to explain certain concepts. Since a given `targets` project can only have one `_targets.R` file, this means the participants may have to delete their existing `_targets.R` file and write a new one to follow along with the examples. This may cause frustration if they can't keep a record of what they have done so far. One solution would be to save the old `_targets.R` file as `_targets_old.R` or similar, then rename it when it should be run again. 8 | 9 | ## Optional episodes: 10 | The "Function" episode is an optional episode and will depend on the learners coming to your workshop. 
11 | We would recommend asking for a show of hands (or stickies) to see who has experience with functions, and if you have learners who do not, run this episode. 12 | 13 | `targets` relies so heavily on functions that we believe the topic is worth a little extra time: learners inexperienced with functions will quickly fall behind and will not feel empowered to use `targets` at the end of the workshop if they don't get a short introduction. 14 | 15 | -------------------------------------------------------------------------------- /learners/reference.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Reference' 3 | --- 4 | 5 | ## Glossary 6 | 7 | branch 8 | : A set of targets that are programmatically defined in the `targets` workflow 9 | 10 | reproducibility 11 | : The ability for others (including your future self) to be able to re-run an analysis and obtain the same results 12 | 13 | target 14 | : An object built by the `targets` workflow 15 | 16 | -------------------------------------------------------------------------------- /learners/setup.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Setup 3 | --- 4 | 5 | ## Local setup 6 | 7 | Follow these instructions to install the required software on your computer. 8 | 9 | - [Download and install the latest version of R](https://www.r-project.org/). 10 | - [Download and install RStudio](https://www.rstudio.com/products/rstudio/download/#download). RStudio is an application (an integrated development environment or IDE) that facilitates the use of R and offers a number of nice additional features, including the [Quarto](https://quarto.org/) publishing system. You will need the free Desktop version for your computer.
11 | - Install the necessary R packages with the following command: 12 | 13 | ```r 14 | install.packages( 15 | c( 16 | "conflicted", 17 | "crew", 18 | "palmerpenguins", 19 | "quarto", 20 | "tarchetypes", 21 | "targets", 22 | "tidyverse", 23 | "visNetwork" 24 | ) 25 | ) 26 | ``` 27 | 28 | ## Alternative: In the cloud 29 | 30 | There is a [Posit Cloud](https://posit.cloud/) instance with RStudio and all necessary packages pre-installed available, so you don't need to install anything on your own computer. You may need to create an account (free). 31 | 32 | Click this link to open: 33 | -------------------------------------------------------------------------------- /links.md: -------------------------------------------------------------------------------- 1 | 5 | 6 | [pandoc]: https://pandoc.org/MANUAL.html 7 | [r-markdown]: https://rmarkdown.rstudio.com/ 8 | [rstudio]: https://www.rstudio.com/ 9 | [carpentries-workbench]: https://carpentries.github.io/sandpaper-docs/ 10 | 11 | -------------------------------------------------------------------------------- /profiles/learner-profiles.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Learner Profiles 3 | --- 4 | 5 | These are fictional examples of the sort of learner expected to take this workshop. 6 | 7 | **Dayja** is a graduate student in evolutionary biology. 8 | She is familiar with R and writes many R scripts to conduct her analyses, but she often finds that it is difficult to remember which scripts to run in what order when she updates her data. 9 | 10 | **Jessie** is an undergraduate who is using Quarto to write their graduate thesis. 11 | They want to make sure all the results presented in the thesis come directly from code to avoid any errors and to make it easier for submission to a journal later. 12 | 13 | **Vincent** is a post-doc in bioinformatics. 14 | He has to orchestrate large workflows that run the same set of steps over many samples. 
15 | He wants to simplify his code to avoid repetition. 16 | -------------------------------------------------------------------------------- /renv/activate.R: -------------------------------------------------------------------------------- 1 | 2 | local({ 3 | 4 | # the requested version of renv 5 | version <- "1.1.2" 6 | attr(version, "sha") <- NULL 7 | 8 | # the project directory 9 | project <- Sys.getenv("RENV_PROJECT") 10 | if (!nzchar(project)) 11 | project <- getwd() 12 | 13 | # use start-up diagnostics if enabled 14 | diagnostics <- Sys.getenv("RENV_STARTUP_DIAGNOSTICS", unset = "FALSE") 15 | if (diagnostics) { 16 | start <- Sys.time() 17 | profile <- tempfile("renv-startup-", fileext = ".Rprof") 18 | utils::Rprof(profile) 19 | on.exit({ 20 | utils::Rprof(NULL) 21 | elapsed <- signif(difftime(Sys.time(), start, units = "auto"), digits = 2L) 22 | writeLines(sprintf("- renv took %s to run the autoloader.", format(elapsed))) 23 | writeLines(sprintf("- Profile: %s", profile)) 24 | print(utils::summaryRprof(profile)) 25 | }, add = TRUE) 26 | } 27 | 28 | # figure out whether the autoloader is enabled 29 | enabled <- local({ 30 | 31 | # first, check config option 32 | override <- getOption("renv.config.autoloader.enabled") 33 | if (!is.null(override)) 34 | return(override) 35 | 36 | # if we're being run in a context where R_LIBS is already set, 37 | # don't load -- presumably we're being run as a sub-process and 38 | # the parent process has already set up library paths for us 39 | rcmd <- Sys.getenv("R_CMD", unset = NA) 40 | rlibs <- Sys.getenv("R_LIBS", unset = NA) 41 | if (!is.na(rlibs) && !is.na(rcmd)) 42 | return(FALSE) 43 | 44 | # next, check environment variables 45 | # prefer using the configuration one in the future 46 | envvars <- c( 47 | "RENV_CONFIG_AUTOLOADER_ENABLED", 48 | "RENV_AUTOLOADER_ENABLED", 49 | "RENV_ACTIVATE_PROJECT" 50 | ) 51 | 52 | for (envvar in envvars) { 53 | envval <- Sys.getenv(envvar, unset = NA) 54 | if (!is.na(envval)) 55 | 
return(tolower(envval) %in% c("true", "t", "1")) 56 | } 57 | 58 | # enable by default 59 | TRUE 60 | 61 | }) 62 | 63 | # bail if we're not enabled 64 | if (!enabled) { 65 | 66 | # if we're not enabled, we might still need to manually load 67 | # the user profile here 68 | profile <- Sys.getenv("R_PROFILE_USER", unset = "~/.Rprofile") 69 | if (file.exists(profile)) { 70 | cfg <- Sys.getenv("RENV_CONFIG_USER_PROFILE", unset = "TRUE") 71 | if (tolower(cfg) %in% c("true", "t", "1")) 72 | sys.source(profile, envir = globalenv()) 73 | } 74 | 75 | return(FALSE) 76 | 77 | } 78 | 79 | # avoid recursion 80 | if (identical(getOption("renv.autoloader.running"), TRUE)) { 81 | warning("ignoring recursive attempt to run renv autoloader") 82 | return(invisible(TRUE)) 83 | } 84 | 85 | # signal that we're loading renv during R startup 86 | options(renv.autoloader.running = TRUE) 87 | on.exit(options(renv.autoloader.running = NULL), add = TRUE) 88 | 89 | # signal that we've consented to use renv 90 | options(renv.consent = TRUE) 91 | 92 | # load the 'utils' package eagerly -- this ensures that renv shims, which 93 | # mask 'utils' packages, will come first on the search path 94 | library(utils, lib.loc = .Library) 95 | 96 | # unload renv if it's already been loaded 97 | if ("renv" %in% loadedNamespaces()) 98 | unloadNamespace("renv") 99 | 100 | # load bootstrap tools 101 | ansify <- function(text) { 102 | if (renv_ansify_enabled()) 103 | renv_ansify_enhanced(text) 104 | else 105 | renv_ansify_default(text) 106 | } 107 | 108 | renv_ansify_enabled <- function() { 109 | 110 | override <- Sys.getenv("RENV_ANSIFY_ENABLED", unset = NA) 111 | if (!is.na(override)) 112 | return(as.logical(override)) 113 | 114 | pane <- Sys.getenv("RSTUDIO_CHILD_PROCESS_PANE", unset = NA) 115 | if (identical(pane, "build")) 116 | return(FALSE) 117 | 118 | testthat <- Sys.getenv("TESTTHAT", unset = "false") 119 | if (tolower(testthat) %in% "true") 120 | return(FALSE) 121 | 122 | iderun <- 
Sys.getenv("R_CLI_HAS_HYPERLINK_IDE_RUN", unset = "false") 123 | if (tolower(iderun) %in% "false") 124 | return(FALSE) 125 | 126 | TRUE 127 | 128 | } 129 | 130 | renv_ansify_default <- function(text) { 131 | text 132 | } 133 | 134 | renv_ansify_enhanced <- function(text) { 135 | 136 | # R help links 137 | pattern <- "`\\?(renv::(?:[^`])+)`" 138 | replacement <- "`\033]8;;x-r-help:\\1\a?\\1\033]8;;\a`" 139 | text <- gsub(pattern, replacement, text, perl = TRUE) 140 | 141 | # runnable code 142 | pattern <- "`(renv::(?:[^`])+)`" 143 | replacement <- "`\033]8;;x-r-run:\\1\a\\1\033]8;;\a`" 144 | text <- gsub(pattern, replacement, text, perl = TRUE) 145 | 146 | # return ansified text 147 | text 148 | 149 | } 150 | 151 | renv_ansify_init <- function() { 152 | 153 | envir <- renv_envir_self() 154 | if (renv_ansify_enabled()) 155 | assign("ansify", renv_ansify_enhanced, envir = envir) 156 | else 157 | assign("ansify", renv_ansify_default, envir = envir) 158 | 159 | } 160 | 161 | `%||%` <- function(x, y) { 162 | if (is.null(x)) y else x 163 | } 164 | 165 | catf <- function(fmt, ..., appendLF = TRUE) { 166 | 167 | quiet <- getOption("renv.bootstrap.quiet", default = FALSE) 168 | if (quiet) 169 | return(invisible()) 170 | 171 | msg <- sprintf(fmt, ...) 172 | cat(msg, file = stdout(), sep = if (appendLF) "\n" else "") 173 | 174 | invisible(msg) 175 | 176 | } 177 | 178 | header <- function(label, 179 | ..., 180 | prefix = "#", 181 | suffix = "-", 182 | n = min(getOption("width"), 78)) 183 | { 184 | label <- sprintf(label, ...) 
185 | n <- max(n - nchar(label) - nchar(prefix) - 2L, 8L) 186 | if (n <= 0) 187 | return(paste(prefix, label)) 188 | 189 | tail <- paste(rep.int(suffix, n), collapse = "") 190 | paste0(prefix, " ", label, " ", tail) 191 | 192 | } 193 | 194 | heredoc <- function(text, leave = 0) { 195 | 196 | # remove leading, trailing whitespace 197 | trimmed <- gsub("^\\s*\\n|\\n\\s*$", "", text) 198 | 199 | # split into lines 200 | lines <- strsplit(trimmed, "\n", fixed = TRUE)[[1L]] 201 | 202 | # compute common indent 203 | indent <- regexpr("[^[:space:]]", lines) 204 | common <- min(setdiff(indent, -1L)) - leave 205 | text <- paste(substring(lines, common), collapse = "\n") 206 | 207 | # substitute in ANSI links for executable renv code 208 | ansify(text) 209 | 210 | } 211 | 212 | bootstrap <- function(version, library) { 213 | 214 | friendly <- renv_bootstrap_version_friendly(version) 215 | section <- header(sprintf("Bootstrapping renv %s", friendly)) 216 | catf(section) 217 | 218 | # attempt to download renv 219 | catf("- Downloading renv ... ", appendLF = FALSE) 220 | withCallingHandlers( 221 | tarball <- renv_bootstrap_download(version), 222 | error = function(err) { 223 | catf("FAILED") 224 | stop("failed to download:\n", conditionMessage(err)) 225 | } 226 | ) 227 | catf("OK") 228 | on.exit(unlink(tarball), add = TRUE) 229 | 230 | # now attempt to install 231 | catf("- Installing renv ... 
", appendLF = FALSE) 232 | withCallingHandlers( 233 | status <- renv_bootstrap_install(version, tarball, library), 234 | error = function(err) { 235 | catf("FAILED") 236 | stop("failed to install:\n", conditionMessage(err)) 237 | } 238 | ) 239 | catf("OK") 240 | 241 | # add empty line to break up bootstrapping from normal output 242 | catf("") 243 | 244 | return(invisible()) 245 | } 246 | 247 | renv_bootstrap_tests_running <- function() { 248 | getOption("renv.tests.running", default = FALSE) 249 | } 250 | 251 | renv_bootstrap_repos <- function() { 252 | 253 | # get CRAN repository 254 | cran <- getOption("renv.repos.cran", "https://cloud.r-project.org") 255 | 256 | # check for repos override 257 | repos <- Sys.getenv("RENV_CONFIG_REPOS_OVERRIDE", unset = NA) 258 | if (!is.na(repos)) { 259 | 260 | # check for RSPM; if set, use a fallback repository for renv 261 | rspm <- Sys.getenv("RSPM", unset = NA) 262 | if (identical(rspm, repos)) 263 | repos <- c(RSPM = rspm, CRAN = cran) 264 | 265 | return(repos) 266 | 267 | } 268 | 269 | # check for lockfile repositories 270 | repos <- tryCatch(renv_bootstrap_repos_lockfile(), error = identity) 271 | if (!inherits(repos, "error") && length(repos)) 272 | return(repos) 273 | 274 | # retrieve current repos 275 | repos <- getOption("repos") 276 | 277 | # ensure @CRAN@ entries are resolved 278 | repos[repos == "@CRAN@"] <- cran 279 | 280 | # add in renv.bootstrap.repos if set 281 | default <- c(FALLBACK = "https://cloud.r-project.org") 282 | extra <- getOption("renv.bootstrap.repos", default = default) 283 | repos <- c(repos, extra) 284 | 285 | # remove duplicates that might've snuck in 286 | dupes <- duplicated(repos) | duplicated(names(repos)) 287 | repos[!dupes] 288 | 289 | } 290 | 291 | renv_bootstrap_repos_lockfile <- function() { 292 | 293 | lockpath <- Sys.getenv("RENV_PATHS_LOCKFILE", unset = "renv.lock") 294 | if (!file.exists(lockpath)) 295 | return(NULL) 296 | 297 | lockfile <- tryCatch(renv_json_read(lockpath), error 
= identity) 298 | if (inherits(lockfile, "error")) { 299 | warning(lockfile) 300 | return(NULL) 301 | } 302 | 303 | repos <- lockfile$R$Repositories 304 | if (length(repos) == 0) 305 | return(NULL) 306 | 307 | keys <- vapply(repos, `[[`, "Name", FUN.VALUE = character(1)) 308 | vals <- vapply(repos, `[[`, "URL", FUN.VALUE = character(1)) 309 | names(vals) <- keys 310 | 311 | return(vals) 312 | 313 | } 314 | 315 | renv_bootstrap_download <- function(version) { 316 | 317 | sha <- attr(version, "sha", exact = TRUE) 318 | 319 | methods <- if (!is.null(sha)) { 320 | 321 | # attempting to bootstrap a development version of renv 322 | c( 323 | function() renv_bootstrap_download_tarball(sha), 324 | function() renv_bootstrap_download_github(sha) 325 | ) 326 | 327 | } else { 328 | 329 | # attempting to bootstrap a release version of renv 330 | c( 331 | function() renv_bootstrap_download_tarball(version), 332 | function() renv_bootstrap_download_cran_latest(version), 333 | function() renv_bootstrap_download_cran_archive(version) 334 | ) 335 | 336 | } 337 | 338 | for (method in methods) { 339 | path <- tryCatch(method(), error = identity) 340 | if (is.character(path) && file.exists(path)) 341 | return(path) 342 | } 343 | 344 | stop("All download methods failed") 345 | 346 | } 347 | 348 | renv_bootstrap_download_impl <- function(url, destfile) { 349 | 350 | mode <- "wb" 351 | 352 | # https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17715 353 | fixup <- 354 | Sys.info()[["sysname"]] == "Windows" && 355 | substring(url, 1L, 5L) == "file:" 356 | 357 | if (fixup) 358 | mode <- "w+b" 359 | 360 | args <- list( 361 | url = url, 362 | destfile = destfile, 363 | mode = mode, 364 | quiet = TRUE 365 | ) 366 | 367 | if ("headers" %in% names(formals(utils::download.file))) { 368 | headers <- renv_bootstrap_download_custom_headers(url) 369 | if (length(headers) && is.character(headers)) 370 | args$headers <- headers 371 | } 372 | 373 | do.call(utils::download.file, args) 374 | 375 | } 376 
| 377 | renv_bootstrap_download_custom_headers <- function(url) { 378 | 379 | headers <- getOption("renv.download.headers") 380 | if (is.null(headers)) 381 | return(character()) 382 | 383 | if (!is.function(headers)) 384 | stopf("'renv.download.headers' is not a function") 385 | 386 | headers <- headers(url) 387 | if (length(headers) == 0L) 388 | return(character()) 389 | 390 | if (is.list(headers)) 391 | headers <- unlist(headers, recursive = FALSE, use.names = TRUE) 392 | 393 | ok <- 394 | is.character(headers) && 395 | is.character(names(headers)) && 396 | all(nzchar(names(headers))) 397 | 398 | if (!ok) 399 | stop("invocation of 'renv.download.headers' did not return a named character vector") 400 | 401 | headers 402 | 403 | } 404 | 405 | renv_bootstrap_download_cran_latest <- function(version) { 406 | 407 | spec <- renv_bootstrap_download_cran_latest_find(version) 408 | type <- spec$type 409 | repos <- spec$repos 410 | 411 | baseurl <- utils::contrib.url(repos = repos, type = type) 412 | ext <- if (identical(type, "source")) 413 | ".tar.gz" 414 | else if (Sys.info()[["sysname"]] == "Windows") 415 | ".zip" 416 | else 417 | ".tgz" 418 | name <- sprintf("renv_%s%s", version, ext) 419 | url <- paste(baseurl, name, sep = "/") 420 | 421 | destfile <- file.path(tempdir(), name) 422 | status <- tryCatch( 423 | renv_bootstrap_download_impl(url, destfile), 424 | condition = identity 425 | ) 426 | 427 | if (inherits(status, "condition")) 428 | return(FALSE) 429 | 430 | # report success and return 431 | destfile 432 | 433 | } 434 | 435 | renv_bootstrap_download_cran_latest_find <- function(version) { 436 | 437 | # check whether binaries are supported on this system 438 | binary <- 439 | getOption("renv.bootstrap.binary", default = TRUE) && 440 | !identical(.Platform$pkgType, "source") && 441 | !identical(getOption("pkgType"), "source") && 442 | Sys.info()[["sysname"]] %in% c("Darwin", "Windows") 443 | 444 | types <- c(if (binary) "binary", "source") 445 | 446 | # iterate 
over types + repositories 447 | for (type in types) { 448 | for (repos in renv_bootstrap_repos()) { 449 | 450 | # build arguments for utils::available.packages() call 451 | args <- list(type = type, repos = repos) 452 | 453 | # add custom headers if available -- note that 454 | # utils::available.packages() will pass this to download.file() 455 | if ("headers" %in% names(formals(utils::download.file))) { 456 | headers <- renv_bootstrap_download_custom_headers(repos) 457 | if (length(headers) && is.character(headers)) 458 | args$headers <- headers 459 | } 460 | 461 | # retrieve package database 462 | db <- tryCatch( 463 | as.data.frame( 464 | do.call(utils::available.packages, args), 465 | stringsAsFactors = FALSE 466 | ), 467 | error = identity 468 | ) 469 | 470 | if (inherits(db, "error")) 471 | next 472 | 473 | # check for compatible entry 474 | entry <- db[db$Package %in% "renv" & db$Version %in% version, ] 475 | if (nrow(entry) == 0) 476 | next 477 | 478 | # found it; return spec to caller 479 | spec <- list(entry = entry, type = type, repos = repos) 480 | return(spec) 481 | 482 | } 483 | } 484 | 485 | # if we got here, we failed to find renv 486 | fmt <- "renv %s is not available from your declared package repositories" 487 | stop(sprintf(fmt, version)) 488 | 489 | } 490 | 491 | renv_bootstrap_download_cran_archive <- function(version) { 492 | 493 | name <- sprintf("renv_%s.tar.gz", version) 494 | repos <- renv_bootstrap_repos() 495 | urls <- file.path(repos, "src/contrib/Archive/renv", name) 496 | destfile <- file.path(tempdir(), name) 497 | 498 | for (url in urls) { 499 | 500 | status <- tryCatch( 501 | renv_bootstrap_download_impl(url, destfile), 502 | condition = identity 503 | ) 504 | 505 | if (identical(status, 0L)) 506 | return(destfile) 507 | 508 | } 509 | 510 | return(FALSE) 511 | 512 | } 513 | 514 | renv_bootstrap_download_tarball <- function(version) { 515 | 516 | # if the user has provided the path to a tarball via 517 | # an environment variable, 
then use it 518 | tarball <- Sys.getenv("RENV_BOOTSTRAP_TARBALL", unset = NA) 519 | if (is.na(tarball)) 520 | return() 521 | 522 | # allow directories 523 | if (dir.exists(tarball)) { 524 | name <- sprintf("renv_%s.tar.gz", version) 525 | tarball <- file.path(tarball, name) 526 | } 527 | 528 | # bail if it doesn't exist 529 | if (!file.exists(tarball)) { 530 | 531 | # let the user know we weren't able to honour their request 532 | fmt <- "- RENV_BOOTSTRAP_TARBALL is set (%s) but does not exist." 533 | msg <- sprintf(fmt, tarball) 534 | warning(msg) 535 | 536 | # bail 537 | return() 538 | 539 | } 540 | 541 | catf("- Using local tarball '%s'.", tarball) 542 | tarball 543 | 544 | } 545 | 546 | renv_bootstrap_github_token <- function() { 547 | for (envvar in c("GITHUB_TOKEN", "GITHUB_PAT", "GH_TOKEN")) { 548 | envval <- Sys.getenv(envvar, unset = NA) 549 | if (!is.na(envval)) 550 | return(envval) 551 | } 552 | } 553 | 554 | renv_bootstrap_download_github <- function(version) { 555 | 556 | enabled <- Sys.getenv("RENV_BOOTSTRAP_FROM_GITHUB", unset = "TRUE") 557 | if (!identical(enabled, "TRUE")) 558 | return(FALSE) 559 | 560 | # prepare download options 561 | token <- renv_bootstrap_github_token() 562 | if (is.null(token)) 563 | token <- "" 564 | 565 | if (nzchar(Sys.which("curl")) && nzchar(token)) { 566 | fmt <- "--location --fail --header \"Authorization: token %s\"" 567 | extra <- sprintf(fmt, token) 568 | saved <- options("download.file.method", "download.file.extra") 569 | options(download.file.method = "curl", download.file.extra = extra) 570 | on.exit(do.call(base::options, saved), add = TRUE) 571 | } else if (nzchar(Sys.which("wget")) && nzchar(token)) { 572 | fmt <- "--header=\"Authorization: token %s\"" 573 | extra <- sprintf(fmt, token) 574 | saved <- options("download.file.method", "download.file.extra") 575 | options(download.file.method = "wget", download.file.extra = extra) 576 | on.exit(do.call(base::options, saved), add = TRUE) 577 | } 578 | 579 | url 
<- file.path("https://api.github.com/repos/rstudio/renv/tarball", version) 580 | name <- sprintf("renv_%s.tar.gz", version) 581 | destfile <- file.path(tempdir(), name) 582 | 583 | status <- tryCatch( 584 | renv_bootstrap_download_impl(url, destfile), 585 | condition = identity 586 | ) 587 | 588 | if (!identical(status, 0L)) 589 | return(FALSE) 590 | 591 | renv_bootstrap_download_augment(destfile) 592 | 593 | return(destfile) 594 | 595 | } 596 | 597 | # Add Sha to DESCRIPTION. This is stop gap until #890, after which we 598 | # can use renv::install() to fully capture metadata. 599 | renv_bootstrap_download_augment <- function(destfile) { 600 | sha <- renv_bootstrap_git_extract_sha1_tar(destfile) 601 | if (is.null(sha)) { 602 | return() 603 | } 604 | 605 | # Untar 606 | tempdir <- tempfile("renv-github-") 607 | on.exit(unlink(tempdir, recursive = TRUE), add = TRUE) 608 | untar(destfile, exdir = tempdir) 609 | pkgdir <- dir(tempdir, full.names = TRUE)[[1]] 610 | 611 | # Modify description 612 | desc_path <- file.path(pkgdir, "DESCRIPTION") 613 | desc_lines <- readLines(desc_path) 614 | remotes_fields <- c( 615 | "RemoteType: github", 616 | "RemoteHost: api.github.com", 617 | "RemoteRepo: renv", 618 | "RemoteUsername: rstudio", 619 | "RemotePkgRef: rstudio/renv", 620 | paste("RemoteRef: ", sha), 621 | paste("RemoteSha: ", sha) 622 | ) 623 | writeLines(c(desc_lines[desc_lines != ""], remotes_fields), con = desc_path) 624 | 625 | # Re-tar 626 | local({ 627 | old <- setwd(tempdir) 628 | on.exit(setwd(old), add = TRUE) 629 | 630 | tar(destfile, compression = "gzip") 631 | }) 632 | invisible() 633 | } 634 | 635 | # Extract the commit hash from a git archive. Git archives include the SHA1 636 | # hash as the comment field of the tarball pax extended header 637 | # (see https://www.kernel.org/pub/software/scm/git/docs/git-archive.html) 638 | # For GitHub archives this should be the first header after the default one 639 | # (512 byte) header. 
640 | renv_bootstrap_git_extract_sha1_tar <- function(bundle) { 641 | 642 | # open the bundle for reading 643 | # We use gzcon for everything because (from ?gzcon) 644 | # > Reading from a connection which does not supply a 'gzip' magic 645 | # > header is equivalent to reading from the original connection 646 | conn <- gzcon(file(bundle, open = "rb", raw = TRUE)) 647 | on.exit(close(conn)) 648 | 649 | # The default pax header is 512 bytes long and the first pax extended header 650 | # with the comment should be 51 bytes long 651 | # `52 comment=` (11 chars) + 40 byte SHA1 hash 652 | len <- 0x200 + 0x33 653 | res <- rawToChar(readBin(conn, "raw", n = len)[0x201:len]) 654 | 655 | if (grepl("^52 comment=", res)) { 656 | sub("52 comment=", "", res) 657 | } else { 658 | NULL 659 | } 660 | } 661 | 662 | renv_bootstrap_install <- function(version, tarball, library) { 663 | 664 | # attempt to install it into project library 665 | dir.create(library, showWarnings = FALSE, recursive = TRUE) 666 | output <- renv_bootstrap_install_impl(library, tarball) 667 | 668 | # check for successful install 669 | status <- attr(output, "status") 670 | if (is.null(status) || identical(status, 0L)) 671 | return(status) 672 | 673 | # an error occurred; report it 674 | header <- "installation of renv failed" 675 | lines <- paste(rep.int("=", nchar(header)), collapse = "") 676 | text <- paste(c(header, lines, output), collapse = "\n") 677 | stop(text) 678 | 679 | } 680 | 681 | renv_bootstrap_install_impl <- function(library, tarball) { 682 | 683 | # invoke using system2 so we can capture and report output 684 | bin <- R.home("bin") 685 | exe <- if (Sys.info()[["sysname"]] == "Windows") "R.exe" else "R" 686 | R <- file.path(bin, exe) 687 | 688 | args <- c( 689 | "--vanilla", "CMD", "INSTALL", "--no-multiarch", 690 | "-l", shQuote(path.expand(library)), 691 | shQuote(path.expand(tarball)) 692 | ) 693 | 694 | system2(R, args, stdout = TRUE, stderr = TRUE) 695 | 696 | } 697 | 698 | 
renv_bootstrap_platform_prefix <- function() { 699 | 700 | # construct version prefix 701 | version <- paste(R.version$major, R.version$minor, sep = ".") 702 | prefix <- paste("R", numeric_version(version)[1, 1:2], sep = "-") 703 | 704 | # include SVN revision for development versions of R 705 | # (to avoid sharing platform-specific artefacts with released versions of R) 706 | devel <- 707 | identical(R.version[["status"]], "Under development (unstable)") || 708 | identical(R.version[["nickname"]], "Unsuffered Consequences") 709 | 710 | if (devel) 711 | prefix <- paste(prefix, R.version[["svn rev"]], sep = "-r") 712 | 713 | # build list of path components 714 | components <- c(prefix, R.version$platform) 715 | 716 | # include prefix if provided by user 717 | prefix <- renv_bootstrap_platform_prefix_impl() 718 | if (!is.na(prefix) && nzchar(prefix)) 719 | components <- c(prefix, components) 720 | 721 | # build prefix 722 | paste(components, collapse = "/") 723 | 724 | } 725 | 726 | renv_bootstrap_platform_prefix_impl <- function() { 727 | 728 | # if an explicit prefix has been supplied, use it 729 | prefix <- Sys.getenv("RENV_PATHS_PREFIX", unset = NA) 730 | if (!is.na(prefix)) 731 | return(prefix) 732 | 733 | # if the user has requested an automatic prefix, generate it 734 | auto <- Sys.getenv("RENV_PATHS_PREFIX_AUTO", unset = NA) 735 | if (is.na(auto) && getRversion() >= "4.4.0") 736 | auto <- "TRUE" 737 | 738 | if (auto %in% c("TRUE", "True", "true", "1")) 739 | return(renv_bootstrap_platform_prefix_auto()) 740 | 741 | # empty string on failure 742 | "" 743 | 744 | } 745 | 746 | renv_bootstrap_platform_prefix_auto <- function() { 747 | 748 | prefix <- tryCatch(renv_bootstrap_platform_os(), error = identity) 749 | if (inherits(prefix, "error") || prefix %in% "unknown") { 750 | 751 | msg <- paste( 752 | "failed to infer current operating system", 753 | "please file a bug report at https://github.com/rstudio/renv/issues", 754 | sep = "; " 755 | ) 756 | 757 | 
warning(msg) 758 | 759 | } 760 | 761 | prefix 762 | 763 | } 764 | 765 | renv_bootstrap_platform_os <- function() { 766 | 767 | sysinfo <- Sys.info() 768 | sysname <- sysinfo[["sysname"]] 769 | 770 | # handle Windows + macOS up front 771 | if (sysname == "Windows") 772 | return("windows") 773 | else if (sysname == "Darwin") 774 | return("macos") 775 | 776 | # check for os-release files 777 | for (file in c("/etc/os-release", "/usr/lib/os-release")) 778 | if (file.exists(file)) 779 | return(renv_bootstrap_platform_os_via_os_release(file, sysinfo)) 780 | 781 | # check for redhat-release files 782 | if (file.exists("/etc/redhat-release")) 783 | return(renv_bootstrap_platform_os_via_redhat_release()) 784 | 785 | "unknown" 786 | 787 | } 788 | 789 | renv_bootstrap_platform_os_via_os_release <- function(file, sysinfo) { 790 | 791 | # read /etc/os-release 792 | release <- utils::read.table( 793 | file = file, 794 | sep = "=", 795 | quote = c("\"", "'"), 796 | col.names = c("Key", "Value"), 797 | comment.char = "#", 798 | stringsAsFactors = FALSE 799 | ) 800 | 801 | vars <- as.list(release$Value) 802 | names(vars) <- release$Key 803 | 804 | # get os name 805 | os <- tolower(sysinfo[["sysname"]]) 806 | 807 | # read id 808 | id <- "unknown" 809 | for (field in c("ID", "ID_LIKE")) { 810 | if (field %in% names(vars) && nzchar(vars[[field]])) { 811 | id <- vars[[field]] 812 | break 813 | } 814 | } 815 | 816 | # read version 817 | version <- "unknown" 818 | for (field in c("UBUNTU_CODENAME", "VERSION_CODENAME", "VERSION_ID", "BUILD_ID")) { 819 | if (field %in% names(vars) && nzchar(vars[[field]])) { 820 | version <- vars[[field]] 821 | break 822 | } 823 | } 824 | 825 | # join together 826 | paste(c(os, id, version), collapse = "-") 827 | 828 | } 829 | 830 | renv_bootstrap_platform_os_via_redhat_release <- function() { 831 | 832 | # read /etc/redhat-release 833 | contents <- readLines("/etc/redhat-release", warn = FALSE) 834 | 835 | # infer id 836 | id <- if (grepl("centos", 
contents, ignore.case = TRUE)) 837 | "centos" 838 | else if (grepl("redhat", contents, ignore.case = TRUE)) 839 | "redhat" 840 | else 841 | "unknown" 842 | 843 | # try to find a version component (very hacky) 844 | version <- "unknown" 845 | 846 | parts <- strsplit(contents, "[[:space:]]")[[1L]] 847 | for (part in parts) { 848 | 849 | nv <- tryCatch(numeric_version(part), error = identity) 850 | if (inherits(nv, "error")) 851 | next 852 | 853 | version <- nv[1, 1] 854 | break 855 | 856 | } 857 | 858 | paste(c("linux", id, version), collapse = "-") 859 | 860 | } 861 | 862 | renv_bootstrap_library_root_name <- function(project) { 863 | 864 | # use project name as-is if requested 865 | asis <- Sys.getenv("RENV_PATHS_LIBRARY_ROOT_ASIS", unset = "FALSE") 866 | if (asis) 867 | return(basename(project)) 868 | 869 | # otherwise, disambiguate based on project's path 870 | id <- substring(renv_bootstrap_hash_text(project), 1L, 8L) 871 | paste(basename(project), id, sep = "-") 872 | 873 | } 874 | 875 | renv_bootstrap_library_root <- function(project) { 876 | 877 | prefix <- renv_bootstrap_profile_prefix() 878 | 879 | path <- Sys.getenv("RENV_PATHS_LIBRARY", unset = NA) 880 | if (!is.na(path)) 881 | return(paste(c(path, prefix), collapse = "/")) 882 | 883 | path <- renv_bootstrap_library_root_impl(project) 884 | if (!is.null(path)) { 885 | name <- renv_bootstrap_library_root_name(project) 886 | return(paste(c(path, prefix, name), collapse = "/")) 887 | } 888 | 889 | renv_bootstrap_paths_renv("library", project = project) 890 | 891 | } 892 | 893 | renv_bootstrap_library_root_impl <- function(project) { 894 | 895 | root <- Sys.getenv("RENV_PATHS_LIBRARY_ROOT", unset = NA) 896 | if (!is.na(root)) 897 | return(root) 898 | 899 | type <- renv_bootstrap_project_type(project) 900 | if (identical(type, "package")) { 901 | userdir <- renv_bootstrap_user_dir() 902 | return(file.path(userdir, "library")) 903 | } 904 | 905 | } 906 | 907 | renv_bootstrap_validate_version <- 
function(version, description = NULL) { 908 | 909 | # resolve description file 910 | # 911 | # avoid passing lib.loc to `packageDescription()` below, since R will 912 | # use the loaded version of the package by default anyhow. note that 913 | # this function should only be called after 'renv' is loaded 914 | # https://github.com/rstudio/renv/issues/1625 915 | description <- description %||% packageDescription("renv") 916 | 917 | # check whether requested version 'version' matches loaded version of renv 918 | sha <- attr(version, "sha", exact = TRUE) 919 | valid <- if (!is.null(sha)) 920 | renv_bootstrap_validate_version_dev(sha, description) 921 | else 922 | renv_bootstrap_validate_version_release(version, description) 923 | 924 | if (valid) 925 | return(TRUE) 926 | 927 | # the loaded version of renv doesn't match the requested version; 928 | # give the user instructions on how to proceed 929 | dev <- identical(description[["RemoteType"]], "github") 930 | remote <- if (dev) 931 | paste("rstudio/renv", description[["RemoteSha"]], sep = "@") 932 | else 933 | paste("renv", description[["Version"]], sep = "@") 934 | 935 | # display both loaded version + sha if available 936 | friendly <- renv_bootstrap_version_friendly( 937 | version = description[["Version"]], 938 | sha = if (dev) description[["RemoteSha"]] 939 | ) 940 | 941 | fmt <- heredoc(" 942 | renv %1$s was loaded from project library, but this project is configured to use renv %2$s. 943 | - Use `renv::record(\"%3$s\")` to record renv %1$s in the lockfile. 944 | - Use `renv::restore(packages = \"renv\")` to install renv %2$s into the project library. 
945 | ") 946 | catf(fmt, friendly, renv_bootstrap_version_friendly(version), remote) 947 | 948 | FALSE 949 | 950 | } 951 | 952 | renv_bootstrap_validate_version_dev <- function(version, description) { 953 | 954 | expected <- description[["RemoteSha"]] 955 | if (!is.character(expected)) 956 | return(FALSE) 957 | 958 | pattern <- sprintf("^\\Q%s\\E", version) 959 | grepl(pattern, expected, perl = TRUE) 960 | 961 | } 962 | 963 | renv_bootstrap_validate_version_release <- function(version, description) { 964 | expected <- description[["Version"]] 965 | is.character(expected) && identical(expected, version) 966 | } 967 | 968 | renv_bootstrap_hash_text <- function(text) { 969 | 970 | hashfile <- tempfile("renv-hash-") 971 | on.exit(unlink(hashfile), add = TRUE) 972 | 973 | writeLines(text, con = hashfile) 974 | tools::md5sum(hashfile) 975 | 976 | } 977 | 978 | renv_bootstrap_load <- function(project, libpath, version) { 979 | 980 | # try to load renv from the project library 981 | if (!requireNamespace("renv", lib.loc = libpath, quietly = TRUE)) 982 | return(FALSE) 983 | 984 | # warn if the version of renv loaded does not match 985 | renv_bootstrap_validate_version(version) 986 | 987 | # execute renv load hooks, if any 988 | hooks <- getHook("renv::autoload") 989 | for (hook in hooks) 990 | if (is.function(hook)) 991 | tryCatch(hook(), error = warnify) 992 | 993 | # load the project 994 | renv::load(project) 995 | 996 | TRUE 997 | 998 | } 999 | 1000 | renv_bootstrap_profile_load <- function(project) { 1001 | 1002 | # if RENV_PROFILE is already set, just use that 1003 | profile <- Sys.getenv("RENV_PROFILE", unset = NA) 1004 | if (!is.na(profile) && nzchar(profile)) 1005 | return(profile) 1006 | 1007 | # check for a profile file (nothing to do if it doesn't exist) 1008 | path <- renv_bootstrap_paths_renv("profile", profile = FALSE, project = project) 1009 | if (!file.exists(path)) 1010 | return(NULL) 1011 | 1012 | # read the profile, and set it if it exists 1013 | contents 
<- readLines(path, warn = FALSE) 1014 | if (length(contents) == 0L) 1015 | return(NULL) 1016 | 1017 | # set RENV_PROFILE 1018 | profile <- contents[[1L]] 1019 | if (!profile %in% c("", "default")) 1020 | Sys.setenv(RENV_PROFILE = profile) 1021 | 1022 | profile 1023 | 1024 | } 1025 | 1026 | renv_bootstrap_profile_prefix <- function() { 1027 | profile <- renv_bootstrap_profile_get() 1028 | if (!is.null(profile)) 1029 | return(file.path("profiles", profile, "renv")) 1030 | } 1031 | 1032 | renv_bootstrap_profile_get <- function() { 1033 | profile <- Sys.getenv("RENV_PROFILE", unset = "") 1034 | renv_bootstrap_profile_normalize(profile) 1035 | } 1036 | 1037 | renv_bootstrap_profile_set <- function(profile) { 1038 | profile <- renv_bootstrap_profile_normalize(profile) 1039 | if (is.null(profile)) 1040 | Sys.unsetenv("RENV_PROFILE") 1041 | else 1042 | Sys.setenv(RENV_PROFILE = profile) 1043 | } 1044 | 1045 | renv_bootstrap_profile_normalize <- function(profile) { 1046 | 1047 | if (is.null(profile) || profile %in% c("", "default")) 1048 | return(NULL) 1049 | 1050 | profile 1051 | 1052 | } 1053 | 1054 | renv_bootstrap_path_absolute <- function(path) { 1055 | 1056 | substr(path, 1L, 1L) %in% c("~", "/", "\\") || ( 1057 | substr(path, 1L, 1L) %in% c(letters, LETTERS) && 1058 | substr(path, 2L, 3L) %in% c(":/", ":\\") 1059 | ) 1060 | 1061 | } 1062 | 1063 | renv_bootstrap_paths_renv <- function(..., profile = TRUE, project = NULL) { 1064 | renv <- Sys.getenv("RENV_PATHS_RENV", unset = "renv") 1065 | root <- if (renv_bootstrap_path_absolute(renv)) NULL else project 1066 | prefix <- if (profile) renv_bootstrap_profile_prefix() 1067 | components <- c(root, renv, prefix, ...) 
1068 | paste(components, collapse = "/") 1069 | } 1070 | 1071 | renv_bootstrap_project_type <- function(path) { 1072 | 1073 | descpath <- file.path(path, "DESCRIPTION") 1074 | if (!file.exists(descpath)) 1075 | return("unknown") 1076 | 1077 | desc <- tryCatch( 1078 | read.dcf(descpath, all = TRUE), 1079 | error = identity 1080 | ) 1081 | 1082 | if (inherits(desc, "error")) 1083 | return("unknown") 1084 | 1085 | type <- desc$Type 1086 | if (!is.null(type)) 1087 | return(tolower(type)) 1088 | 1089 | package <- desc$Package 1090 | if (!is.null(package)) 1091 | return("package") 1092 | 1093 | "unknown" 1094 | 1095 | } 1096 | 1097 | renv_bootstrap_user_dir <- function() { 1098 | dir <- renv_bootstrap_user_dir_impl() 1099 | path.expand(chartr("\\", "/", dir)) 1100 | } 1101 | 1102 | renv_bootstrap_user_dir_impl <- function() { 1103 | 1104 | # use local override if set 1105 | override <- getOption("renv.userdir.override") 1106 | if (!is.null(override)) 1107 | return(override) 1108 | 1109 | # use R_user_dir if available 1110 | tools <- asNamespace("tools") 1111 | if (is.function(tools$R_user_dir)) 1112 | return(tools$R_user_dir("renv", "cache")) 1113 | 1114 | # try using our own backfill for older versions of R 1115 | envvars <- c("R_USER_CACHE_DIR", "XDG_CACHE_HOME") 1116 | for (envvar in envvars) { 1117 | root <- Sys.getenv(envvar, unset = NA) 1118 | if (!is.na(root)) 1119 | return(file.path(root, "R/renv")) 1120 | } 1121 | 1122 | # use platform-specific default fallbacks 1123 | if (Sys.info()[["sysname"]] == "Windows") 1124 | file.path(Sys.getenv("LOCALAPPDATA"), "R/cache/R/renv") 1125 | else if (Sys.info()[["sysname"]] == "Darwin") 1126 | "~/Library/Caches/org.R-project.R/R/renv" 1127 | else 1128 | "~/.cache/R/renv" 1129 | 1130 | } 1131 | 1132 | renv_bootstrap_version_friendly <- function(version, shafmt = NULL, sha = NULL) { 1133 | sha <- sha %||% attr(version, "sha", exact = TRUE) 1134 | parts <- c(version, sprintf(shafmt %||% " [sha: %s]", substring(sha, 1L, 7L))) 
1135 | paste(parts, collapse = "") 1136 | } 1137 | 1138 | renv_bootstrap_exec <- function(project, libpath, version) { 1139 | if (!renv_bootstrap_load(project, libpath, version)) 1140 | renv_bootstrap_run(project, libpath, version) 1141 | } 1142 | 1143 | renv_bootstrap_run <- function(project, libpath, version) { 1144 | 1145 | # perform bootstrap 1146 | bootstrap(version, libpath) 1147 | 1148 | # exit early if we're just testing bootstrap 1149 | if (!is.na(Sys.getenv("RENV_BOOTSTRAP_INSTALL_ONLY", unset = NA))) 1150 | return(TRUE) 1151 | 1152 | # try again to load 1153 | if (requireNamespace("renv", lib.loc = libpath, quietly = TRUE)) { 1154 | return(renv::load(project = project)) 1155 | } 1156 | 1157 | # failed to download or load renv; warn the user 1158 | msg <- c( 1159 | "Failed to find an renv installation: the project will not be loaded.", 1160 | "Use `renv::activate()` to re-initialize the project." 1161 | ) 1162 | 1163 | warning(paste(msg, collapse = "\n"), call. = FALSE) 1164 | 1165 | } 1166 | 1167 | renv_json_read <- function(file = NULL, text = NULL) { 1168 | 1169 | jlerr <- NULL 1170 | 1171 | # if jsonlite is loaded, use that instead 1172 | if ("jsonlite" %in% loadedNamespaces()) { 1173 | 1174 | json <- tryCatch(renv_json_read_jsonlite(file, text), error = identity) 1175 | if (!inherits(json, "error")) 1176 | return(json) 1177 | 1178 | jlerr <- json 1179 | 1180 | } 1181 | 1182 | # otherwise, fall back to the default JSON reader 1183 | json <- tryCatch(renv_json_read_default(file, text), error = identity) 1184 | if (!inherits(json, "error")) 1185 | return(json) 1186 | 1187 | # report an error 1188 | if (!is.null(jlerr)) 1189 | stop(jlerr) 1190 | else 1191 | stop(json) 1192 | 1193 | } 1194 | 1195 | renv_json_read_jsonlite <- function(file = NULL, text = NULL) { 1196 | text <- paste(text %||% readLines(file, warn = FALSE), collapse = "\n") 1197 | jsonlite::fromJSON(txt = text, simplifyVector = FALSE) 1198 | } 1199 | 1200 | renv_json_read_patterns <- 
function() { 1201 | 1202 | list( 1203 | 1204 | # objects 1205 | list("{", "\t\n\tobject(\t\n\t"), 1206 | list("}", "\t\n\t)\t\n\t"), 1207 | 1208 | # arrays 1209 | list("[", "\t\n\tarray(\t\n\t"), 1210 | list("]", "\n\t\n)\n\t\n"), 1211 | 1212 | # maps 1213 | list(":", "\t\n\t=\t\n\t") 1214 | 1215 | ) 1216 | 1217 | } 1218 | 1219 | renv_json_read_envir <- function() { 1220 | 1221 | envir <- new.env(parent = emptyenv()) 1222 | 1223 | envir[["+"]] <- `+` 1224 | envir[["-"]] <- `-` 1225 | 1226 | envir[["object"]] <- function(...) { 1227 | result <- list(...) 1228 | names(result) <- as.character(names(result)) 1229 | result 1230 | } 1231 | 1232 | envir[["array"]] <- list 1233 | 1234 | envir[["true"]] <- TRUE 1235 | envir[["false"]] <- FALSE 1236 | envir[["null"]] <- NULL 1237 | 1238 | envir 1239 | 1240 | } 1241 | 1242 | renv_json_read_remap <- function(object, patterns) { 1243 | 1244 | # repair names if necessary 1245 | if (!is.null(names(object))) { 1246 | 1247 | nms <- names(object) 1248 | for (pattern in patterns) 1249 | nms <- gsub(pattern[[2L]], pattern[[1L]], nms, fixed = TRUE) 1250 | names(object) <- nms 1251 | 1252 | } 1253 | 1254 | # repair strings if necessary 1255 | if (is.character(object)) { 1256 | for (pattern in patterns) 1257 | object <- gsub(pattern[[2L]], pattern[[1L]], object, fixed = TRUE) 1258 | } 1259 | 1260 | # recurse for other objects 1261 | if (is.recursive(object)) 1262 | for (i in seq_along(object)) 1263 | object[i] <- list(renv_json_read_remap(object[[i]], patterns)) 1264 | 1265 | # return remapped object 1266 | object 1267 | 1268 | } 1269 | 1270 | renv_json_read_default <- function(file = NULL, text = NULL) { 1271 | 1272 | # read json text 1273 | text <- paste(text %||% readLines(file, warn = FALSE), collapse = "\n") 1274 | 1275 | # convert into something the R parser will understand 1276 | patterns <- renv_json_read_patterns() 1277 | transformed <- text 1278 | for (pattern in patterns) 1279 | transformed <- gsub(pattern[[1L]], 
pattern[[2L]], transformed, fixed = TRUE) 1280 | 1281 | # parse it 1282 | rfile <- tempfile("renv-json-", fileext = ".R") 1283 | on.exit(unlink(rfile), add = TRUE) 1284 | writeLines(transformed, con = rfile) 1285 | json <- parse(rfile, keep.source = FALSE, srcfile = NULL)[[1L]] 1286 | 1287 | # evaluate in safe environment 1288 | result <- eval(json, envir = renv_json_read_envir()) 1289 | 1290 | # fix up strings if necessary 1291 | renv_json_read_remap(result, patterns) 1292 | 1293 | } 1294 | 1295 | 1296 | # load the renv profile, if any 1297 | renv_bootstrap_profile_load(project) 1298 | 1299 | # construct path to library root 1300 | root <- renv_bootstrap_library_root(project) 1301 | 1302 | # construct library prefix for platform 1303 | prefix <- renv_bootstrap_platform_prefix() 1304 | 1305 | # construct full libpath 1306 | libpath <- file.path(root, prefix) 1307 | 1308 | # run bootstrap code 1309 | renv_bootstrap_exec(project, libpath, version) 1310 | 1311 | invisible() 1312 | 1313 | }) 1314 | -------------------------------------------------------------------------------- /renv/profile: -------------------------------------------------------------------------------- 1 | lesson-requirements 2 | -------------------------------------------------------------------------------- /renv/profiles/lesson-requirements/renv/.gitignore: -------------------------------------------------------------------------------- 1 | library/ 2 | local/ 3 | cellar/ 4 | lock/ 5 | python/ 6 | sandbox/ 7 | staging/ 8 | -------------------------------------------------------------------------------- /renv/profiles/lesson-requirements/renv/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "bioconductor.version": null, 3 | "external.libraries": [], 4 | "ignored.packages": [], 5 | "package.dependency.fields": [ 6 | "Imports", 7 | "Depends", 8 | "LinkingTo" 9 | ], 10 | "ppm.enabled": null, 11 | "ppm.ignored.urls": [], 12 | "r.version": null, 
13 | "snapshot.type": "implicit", 14 | "use.cache": true, 15 | "vcs.ignore.cellar": true, 16 | "vcs.ignore.library": true, 17 | "vcs.ignore.local": true, 18 | "vcs.manage.ignores": true 19 | } 20 | -------------------------------------------------------------------------------- /site/README.md: -------------------------------------------------------------------------------- 1 | This directory contains rendered lesson materials. Please do not edit files 2 | here. 3 | -------------------------------------------------------------------------------- /targets-workshop.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: No 4 | SaveWorkspace: No 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | LineEndingConversion: Posix 18 | 19 | BuildType: Website 20 | --------------------------------------------------------------------------------