├── .Rbuildignore ├── .editorconfig ├── .github └── workflows │ ├── README.md │ ├── pr-close-signal.yaml │ ├── pr-comment.yaml │ ├── pr-post-remove-branch.yaml │ ├── pr-preflight.yaml │ ├── pr-receive.yaml │ ├── sandpaper-main.yaml │ ├── sandpaper-version.txt │ ├── update-cache.yaml │ ├── update-workflows.yaml │ └── workbench-beta-phase.yml ├── .gitignore ├── .zenodo.json ├── AUTHORS ├── CITATION ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE.md ├── README.md ├── bin └── download_data.R ├── config.yaml ├── episodes ├── .Rhistory ├── .here ├── .ignore-05-databases.Rmd ├── 00-intro.Rmd ├── 01-intro-to-r.Rmd ├── 02-starting-with-data.Rmd ├── 03-dplyr.Rmd ├── 04-tidyr.Rmd ├── 05-ggplot2.Rmd ├── 06-rmarkdown.Rmd ├── 07-json.Rmd ├── data │ ├── SAFI.json │ ├── SAFI_clean.csv │ ├── SAFI_from_JSON.csv │ ├── download_data.R │ └── interviews_plotting.csv └── fig │ ├── R_00_Rstudio_01.png │ ├── R_00_Rstudio_02.png │ ├── R_00_Rstudio_03.png │ ├── R_02_Import_Dataset_01.png │ ├── data-frame.svg │ ├── here_horst.png │ ├── long_to_wide.png │ ├── new-rmd.png │ ├── packages_pane.png │ ├── pivot_long_to_wide.png │ ├── pivot_wide_to_long.png │ ├── pivot_wider.png │ ├── r+rstudio-analogy.jpg │ ├── r-automatic.jpeg │ ├── r-manual.jpeg │ ├── rmarkdown_wizards.png │ ├── rmd-rmd_to_html.png │ ├── rstudio_project_files.jpeg │ ├── separate_longer.png │ ├── tidy-data-wickham.png │ ├── tidyr-pivot_wider_longer.gif │ ├── wide_to_long.png │ └── working-directory-setup.png ├── index.md ├── instructors └── instructor-notes.md ├── learners ├── reference.md └── setup.md ├── profiles └── learner-profiles.md ├── renv ├── activate.R ├── profile └── profiles │ └── lesson-requirements │ ├── renv.lock │ └── renv │ └── .gitignore └── site └── README.md /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^renv$ 2 | ^renv\.lock$ 3 | ^\.travis\.yml$ 4 | ^tic\.R$ 5 | -------------------------------------------------------------------------------- 
/.editorconfig: -------------------------------------------------------------------------------- 1 | root = true 2 | 3 | [*] 4 | charset = utf-8 5 | insert_final_newline = true 6 | trim_trailing_whitespace = true 7 | 8 | [*.md] 9 | indent_size = 2 10 | indent_style = space 11 | max_line_length = 100 # Please keep this in sync with bin/lesson_check.py! 12 | trim_trailing_whitespace = false # keep trailing spaces in markdown - 2+ spaces are translated to a hard break (<br/>) 13 | 14 | [*.r] 15 | max_line_length = 80 16 | 17 | [*.py] 18 | indent_size = 4 19 | indent_style = space 20 | max_line_length = 79 21 | 22 | [*.sh] 23 | end_of_line = lf 24 | 25 | [Makefile] 26 | indent_style = tab 27 | -------------------------------------------------------------------------------- /.github/workflows/README.md: -------------------------------------------------------------------------------- 1 | # Carpentries Workflows 2 | 3 | This directory contains workflows to be used for lessons that use the {sandpaper} 4 | lesson infrastructure. Two of these workflows require R (`sandpaper-main.yaml` 5 | and `pr-receive.yaml`) and the rest are bots to handle pull request management. 6 | 7 | These workflows will likely change as {sandpaper} evolves, so it is important to 8 | keep them up-to-date. To do this in your lesson you can do the following in your 9 | R console: 10 | 11 | ```r 12 | # Install/Update sandpaper 13 | options(repos = c(carpentries = "https://carpentries.r-universe.dev/", 14 | CRAN = "https://cloud.r-project.org")) 15 | install.packages("sandpaper") 16 | 17 | # update the workflows in your lesson 18 | library("sandpaper") 19 | update_github_workflows() 20 | ``` 21 | 22 | Inside this folder, you will find a file called `sandpaper-version.txt`, which 23 | will contain a version number for sandpaper. This will be used in the future to 24 | alert you if a workflow update is needed. 25 | 26 | What follows are the descriptions of the workflow files: 27 | 28 | ## Deployment 29 | 30 | ### 01 Build and Deploy (sandpaper-main.yaml) 31 | 32 | This is the main driver that will only act on the main branch of the repository. 33 | This workflow does the following: 34 | 35 | 1. checks out the lesson 36 | 2. provisions the following resources 37 | - R 38 | - pandoc 39 | - lesson infrastructure (stored in a cache) 40 | - lesson dependencies if needed (stored in a cache) 41 | 3.
builds the lesson via `sandpaper:::ci_deploy()` 42 | 43 | #### Caching 44 | 45 | This workflow has two caches; one cache is for the lesson infrastructure and 46 | the other is for the lesson dependencies if the lesson contains rendered 47 | content. These caches are invalidated by new versions of the infrastructure and 48 | the `renv.lock` file, respectively. If there is a problem with the cache, 49 | manual invalidation is necessary. You will need maintainer access to the repository, 50 | and you can either go to the Actions tab and [click on the caches button to find 51 | and invalidate the failing cache](https://github.blog/changelog/2022-10-20-manage-caches-in-your-actions-workflows-from-web-interface/) 52 | or set the `CACHE_VERSION` secret to the current date (which will 53 | invalidate all of the caches). 54 | 55 | ## Updates 56 | 57 | ### Setup Information 58 | 59 | These workflows run on a schedule and at the maintainer's request. Because they 60 | create pull requests that update workflows or require the downstream actions to run, 61 | they need a special repository/organization secret token called 62 | `SANDPAPER_WORKFLOW`, and it must have the `public_repo` and `workflow` scopes. 63 | 64 | This can be an individual user token, OR it can be a trusted bot account. If you 65 | have a repository in one of the official Carpentries accounts, then you do not 66 | need to worry about this token being present because the Carpentries Core Team 67 | will take care of supplying this token. 68 | 69 | If you want to use your personal account: you can go to 70 | 71 | to create a token. Once you have created your token, you should copy it to your 72 | clipboard and then go to your repository's settings > secrets > actions and 73 | create or edit the `SANDPAPER_WORKFLOW` secret, pasting in the generated token. 74 | 75 | If you do not specify your token correctly, the runs will not fail; instead, they will 76 | give you instructions to provide the token for your repository.
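A token's scopes can be checked before it is pasted into the `SANDPAPER_WORKFLOW` secret. The sketch below is a hypothetical helper, `has_sandpaper_scopes` (not part of sandpaper or `carpentries/actions/check-valid-credentials`). It tests a comma-separated scope list like the one GitHub returns in the `X-OAuth-Scopes` header for classic personal access tokens, and it accepts the broader `repo` scope as a substitute for `public_repo`. Fine-grained tokens may not report scopes this way, so treat this only as a rough pre-check.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sandpaper or carpentries/actions): succeed
# only if a comma-separated scope list grants what SANDPAPER_WORKFLOW needs.
has_sandpaper_scopes() {
  local scopes="$1" repo_ok=false wf_ok=false scope
  local -a list=()
  # Split the header value, e.g. "public_repo, workflow", on commas.
  IFS=',' read -r -a list <<< "$scopes"
  for scope in "${list[@]}"; do
    scope="${scope// /}"  # drop the spaces GitHub puts after each comma
    case "$scope" in
      repo|public_repo) repo_ok=true ;;  # "repo" is a superset of "public_repo"
      workflow) wf_ok=true ;;
    esac
  done
  [[ $repo_ok == true && $wf_ok == true ]]
}

# Example with a live classic token (needs network access):
#   scopes=$(curl -sI -H "Authorization: Bearer $TOKEN" https://api.github.com/user |
#     tr -d '\r' | awk -F': ' 'tolower($1) == "x-oauth-scopes" {print $2}')
#   has_sandpaper_scopes "$scopes" && echo "token has the required scopes"
```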
77 | 78 | ### 02 Maintain: Update Workflow Files (update-workflows.yaml) 79 | 80 | The {sandpaper} repository was designed to do as much as possible to separate 81 | the tools from the content. For local builds, this is absolutely true, but 82 | there is a minor issue when it comes to workflow files: they must live inside 83 | the repository. 84 | 85 | This workflow ensures that the workflow files are up-to-date. The way it works is 86 | to download the update-workflows.sh script from GitHub and run it. The script 87 | will do the following: 88 | 89 | 1. check the recorded version of sandpaper against the current version on GitHub 90 | 2. update the files if there is a difference in versions 91 | 92 | After the files are updated, if there are any changes, they are pushed to a 93 | branch called `update/workflows` and a pull request is created. Maintainers are 94 | encouraged to review the changes and accept the pull request if the outputs 95 | are okay. 96 | 97 | This update is run weekly or on demand. 98 | 99 | ### 03 Maintain: Update Package Cache (update-cache.yaml) 100 | 101 | For lessons that have generated content, we use {renv} to ensure that the output 102 | is stable. This is controlled by a single lockfile which documents the packages 103 | needed for the lesson and their version numbers. This workflow is skipped in 104 | lessons that do not have generated content. 105 | 106 | Because the lessons need to remain current with the package ecosystem, it's a 107 | good idea to make sure these packages can be updated periodically. The 108 | update cache workflow will do this by checking for updates, applying them in a 109 | branch called `update/packages` and creating a pull request with _only the 110 | lockfile changed_. 111 | 112 | From here, the markdown documents will be rebuilt and you can inspect what has 113 | changed based on how the packages have updated.
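The periodic run is scheduled in `update-cache.yaml` as a weekly Tuesday cron job. Its Preflight Check only proceeds when that Tuesday falls in the first seven days of the month, or when the workflow is triggered by hand. The workflow uses single brackets so that a day such as `08` is not read as octal. The hypothetical function `should_update_cache` below is a sketch that makes the same decision with explicit base-10 arithmetic (`10#`) instead. It is not code from the workflow itself.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "Preflight Check" gate in update-cache.yaml:
# run on manual dispatch, otherwise only during the first week of the month.
should_update_cache() {
  local event="$1" day="$2"
  if [[ $event == workflow_dispatch ]]; then
    return 0
  fi
  # Force base 10: inside (( )) a zero-padded day such as "08" would
  # otherwise be parsed as an invalid octal number.
  (( 10#$day <= 7 ))
}

# In the workflow the day comes from `date +%d`, which is zero-padded (01-31).
if should_update_cache "schedule" "$(date +%d)"; then
  echo "ok=true"
else
  echo "ok=false"
fi
```

Using `10#` keeps the comparison correct even if it is later moved into `[[ ]]` or `(( ))`, where the octal problem described in the workflow comment would otherwise appear.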
114 | 115 | ## Pull Request and Review Management 116 | 117 | Because our lessons execute code, pull requests are a security risk for any 118 | lesson and thus have security measures associated with them. **Do not merge any 119 | pull requests that do not pass checks and do not have bots commented on them.** 120 | 121 | These workflows all go together and are described in the following 122 | diagram and the sections below: 123 | 124 | ![Graph representation of a pull request](https://carpentries.github.io/sandpaper/articles/img/pr-flow.dot.svg) 125 | 126 | ### Pre Flight Pull Request Validation (pr-preflight.yaml) 127 | 128 | This workflow runs every time a pull request is opened, reopened, or updated, and its purpose is to 129 | validate that the pull request is okay to run. This means the following things: 130 | 131 | 1. The pull request does not contain modified workflow files 132 | 2. If the pull request contains modified workflow files, it does not contain 133 | modified content files (such as a situation where @carpentries-bot will 134 | make an automated pull request) 135 | 3. The pull request does not contain an invalid commit hash (e.g. from a fork 136 | that was made before a lesson was transitioned from styles to use the 137 | workbench). 138 | 139 | Once the checks are finished, a comment is issued to the pull request, which 140 | will allow maintainers to determine if it is safe to run the 141 | "Receive Pull Request" workflow from new contributors. 142 | 143 | ### Receive Pull Request (pr-receive.yaml) 144 | 145 | **Note of caution:** This workflow runs arbitrary code from anyone who creates a 146 | pull request. GitHub has safeguarded the token used in this workflow to have no 147 | privileges in the repository, but we have taken precautions to protect against 148 | spoofing. 149 | 150 | This workflow is triggered with every push to a pull request.
If this workflow 151 | is already running and a new push is sent to the pull request, the workflow 152 | running from the previous push will be cancelled and a new workflow run will be 153 | started. 154 | 155 | The first step of this workflow is to check if it is valid (e.g. that no 156 | workflow files have been modified). If there are workflow files that have been 157 | modified, a comment is made that indicates that the workflow is not run. If 158 | both a workflow file and lesson content are modified, an error will occur. 159 | 160 | The second step (if valid) is to build the generated content from the pull 161 | request. This builds the content and uploads three artifacts: 162 | 163 | 1. The pull request number (pr) 164 | 2. A summary of changes after the rendering process (diff) 165 | 3. The rendered files (built) 166 | 167 | Because this workflow builds generated content, it follows the same general 168 | process as the `sandpaper-main` workflow with the same caching mechanisms. 169 | 170 | The artifacts produced are used by the next workflow. 171 | 172 | ### Comment on Pull Request (pr-comment.yaml) 173 | 174 | This workflow is triggered if the `pr-receive.yaml` workflow is successful. 175 | The steps in this workflow are: 176 | 177 | 1. Test if the workflow is valid and comment on the validity of the workflow in the 178 | pull request. 179 | 2. If it is valid: create an orphan branch with two commits: the current state 180 | of the built lesson (`md-outputs` branch) and the proposed changes. 181 | 3. If it is valid: update the pull request comment with the summary of changes 182 | 183 | Importantly: if the pull request is invalid, the branch is not created, so any 184 | malicious code is not published. 185 | 186 | From here, the maintainer can request changes from the author and eventually 187 | either merge or reject the PR. When this happens, if the PR was valid, the 188 | preview branch needs to be deleted.
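The file-based validity rules that the preflight and receive workflows apply can be sketched as a small classifier. This is a hypothetical illustration, not the logic of `carpentries/actions/check-valid-pr`. It assumes that "workflow files" means anything under `.github/`, following the "no .github files were committed" comment in `pr-comment.yaml`, and it does not model the invalid-commit-hash rule.

```shell
#!/usr/bin/env bash
# Hypothetical classifier: reads changed paths (one per line) on stdin and prints
#   valid          - no .github files touched; safe to build
#   workflow-only  - only .github files touched (e.g. an automated bot PR);
#                    the build is skipped and a comment is left instead
#   invalid        - .github files AND lesson content touched; treat as an error
classify_pr_files() {
  local path workflows=0 content=0
  while IFS= read -r path; do
    [[ -z $path ]] && continue          # ignore blank lines
    if [[ $path == .github/* ]]; then
      workflows=$((workflows + 1))
    else
      content=$((content + 1))
    fi
  done
  if (( workflows == 0 )); then
    echo valid
  elif (( content == 0 )); then
    echo workflow-only
  else
    echo invalid
  fi
}

# Example against a local checkout of the PR branch (assumes a local `main`):
#   git diff --name-only main...HEAD | classify_pr_files
```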
189 | 190 | ### Send Close PR Signal (pr-close-signal.yaml) 191 | 192 | Triggered any time a pull request is closed. This emits an artifact that is the 193 | pull request number for the next action. 194 | 195 | ### Remove Pull Request Branch (pr-post-remove-branch.yaml) 196 | 197 | Triggered by `pr-close-signal.yaml`. This removes the temporary branch associated with 198 | the pull request (if it was created). 199 | -------------------------------------------------------------------------------- /.github/workflows/pr-close-signal.yaml: -------------------------------------------------------------------------------- 1 | name: "Bot: Send Close Pull Request Signal" 2 | 3 | on: 4 | pull_request: 5 | types: 6 | [closed] 7 | 8 | jobs: 9 | send-close-signal: 10 | name: "Send closing signal" 11 | runs-on: ubuntu-22.04 12 | if: ${{ github.event.action == 'closed' }} 13 | steps: 14 | - name: "Create PRtifact" 15 | run: | 16 | mkdir -p ./pr 17 | printf ${{ github.event.number }} > ./pr/NUM 18 | - name: Upload Diff 19 | uses: actions/upload-artifact@v4 20 | with: 21 | name: pr 22 | path: ./pr 23 | -------------------------------------------------------------------------------- /.github/workflows/pr-comment.yaml: -------------------------------------------------------------------------------- 1 | name: "Bot: Comment on the Pull Request" 2 | 3 | # read-write repo token 4 | # access to secrets 5 | on: 6 | workflow_run: 7 | workflows: ["Receive Pull Request"] 8 | types: 9 | - completed 10 | 11 | concurrency: 12 | group: pr-${{ github.event.workflow_run.pull_requests[0].number }} 13 | cancel-in-progress: true 14 | 15 | 16 | jobs: 17 | # Pull requests are valid if: 18 | # - they match the sha of the workflow run head commit 19 | # - they are open 20 | # - no .github files were committed 21 | test-pr: 22 | name: "Test if pull request is valid" 23 | runs-on: ubuntu-22.04 24 | if: > 25 | github.event.workflow_run.event == 'pull_request' && 26 | github.event.workflow_run.conclusion ==
'success' 27 | outputs: 28 | is_valid: ${{ steps.check-pr.outputs.VALID }} 29 | payload: ${{ steps.check-pr.outputs.payload }} 30 | number: ${{ steps.get-pr.outputs.NUM }} 31 | msg: ${{ steps.check-pr.outputs.MSG }} 32 | steps: 33 | - name: 'Download PR artifact' 34 | id: dl 35 | uses: carpentries/actions/download-workflow-artifact@main 36 | with: 37 | run: ${{ github.event.workflow_run.id }} 38 | name: 'pr' 39 | 40 | - name: "Get PR Number" 41 | if: ${{ steps.dl.outputs.success == 'true' }} 42 | id: get-pr 43 | run: | 44 | unzip pr.zip 45 | echo "NUM=$(<./NR)" >> $GITHUB_OUTPUT 46 | 47 | - name: "Fail if PR number was not present" 48 | id: bad-pr 49 | if: ${{ steps.dl.outputs.success != 'true' }} 50 | run: | 51 | echo '::error::A pull request number was not recorded. The pull request that triggered this workflow is likely malicious.' 52 | exit 1 53 | - name: "Get Invalid Hashes File" 54 | id: hash 55 | run: | 56 | echo "json<> $GITHUB_OUTPUT 59 | - name: "Check PR" 60 | id: check-pr 61 | if: ${{ steps.dl.outputs.success == 'true' }} 62 | uses: carpentries/actions/check-valid-pr@main 63 | with: 64 | pr: ${{ steps.get-pr.outputs.NUM }} 65 | sha: ${{ github.event.workflow_run.head_sha }} 66 | headroom: 3 # if it's within the last three commits, we can keep going, because it's likely rapid-fire 67 | invalid: ${{ fromJSON(steps.hash.outputs.json)[github.repository] }} 68 | fail_on_error: true 69 | 70 | # Create an orphan branch on this repository with two commits 71 | # - the current HEAD of the md-outputs branch 72 | # - the output from running the current HEAD of the pull request through 73 | # the md generator 74 | create-branch: 75 | name: "Create Git Branch" 76 | needs: test-pr 77 | runs-on: ubuntu-22.04 78 | if: ${{ needs.test-pr.outputs.is_valid == 'true' }} 79 | env: 80 | NR: ${{ needs.test-pr.outputs.number }} 81 | permissions: 82 | contents: write 83 | steps: 84 | - name: 'Checkout md outputs' 85 | uses: actions/checkout@v4 86 | with: 87 | ref: md-outputs 88 
| path: built 89 | fetch-depth: 1 90 | 91 | - name: 'Download built markdown' 92 | id: dl 93 | uses: carpentries/actions/download-workflow-artifact@main 94 | with: 95 | run: ${{ github.event.workflow_run.id }} 96 | name: 'built' 97 | 98 | - if: ${{ steps.dl.outputs.success == 'true' }} 99 | run: unzip built.zip 100 | 101 | - name: "Create orphan and push" 102 | if: ${{ steps.dl.outputs.success == 'true' }} 103 | run: | 104 | cd built/ 105 | git config --local user.email "actions@github.com" 106 | git config --local user.name "GitHub Actions" 107 | CURR_HEAD=$(git rev-parse HEAD) 108 | git checkout --orphan md-outputs-PR-${NR} 109 | git add -A 110 | git commit -m "source commit: ${CURR_HEAD}" 111 | ls -A | grep -v '^.git$' | xargs -I _ rm -r '_' 112 | cd .. 113 | unzip -o -d built built.zip 114 | cd built 115 | git add -A 116 | git commit --allow-empty -m "differences for PR #${NR}" 117 | git push -u --force --set-upstream origin md-outputs-PR-${NR} 118 | 119 | # Comment on the Pull Request with a link to the branch and the diff 120 | comment-pr: 121 | name: "Comment on Pull Request" 122 | needs: [test-pr, create-branch] 123 | runs-on: ubuntu-22.04 124 | if: ${{ needs.test-pr.outputs.is_valid == 'true' }} 125 | env: 126 | NR: ${{ needs.test-pr.outputs.number }} 127 | permissions: 128 | pull-requests: write 129 | steps: 130 | - name: 'Download comment artifact' 131 | id: dl 132 | uses: carpentries/actions/download-workflow-artifact@main 133 | with: 134 | run: ${{ github.event.workflow_run.id }} 135 | name: 'diff' 136 | 137 | - if: ${{ steps.dl.outputs.success == 'true' }} 138 | run: unzip ${{ github.workspace }}/diff.zip 139 | 140 | - name: "Comment on PR" 141 | id: comment-diff 142 | if: ${{ steps.dl.outputs.success == 'true' }} 143 | uses: carpentries/actions/comment-diff@main 144 | with: 145 | pr: ${{ env.NR }} 146 | path: ${{ github.workspace }}/diff.md 147 | 148 | # Comment if the PR is open and matches the SHA, but the workflow files have 149 | # changed 150 | 
comment-changed-workflow: 151 | name: "Comment if workflow files have changed" 152 | needs: test-pr 153 | runs-on: ubuntu-22.04 154 | if: ${{ always() && needs.test-pr.outputs.is_valid == 'false' }} 155 | env: 156 | NR: ${{ github.event.workflow_run.pull_requests[0].number }} 157 | body: ${{ needs.test-pr.outputs.msg }} 158 | permissions: 159 | pull-requests: write 160 | steps: 161 | - name: 'Check for spoofing' 162 | id: dl 163 | uses: carpentries/actions/download-workflow-artifact@main 164 | with: 165 | run: ${{ github.event.workflow_run.id }} 166 | name: 'built' 167 | 168 | - name: 'Alert if spoofed' 169 | id: spoof 170 | if: ${{ steps.dl.outputs.success == 'true' }} 171 | run: | 172 | echo 'body<> $GITHUB_ENV 173 | echo '' >> $GITHUB_ENV 174 | echo '## :x: DANGER :x:' >> $GITHUB_ENV 175 | echo 'This pull request has modified workflows that created output. Close this now.' >> $GITHUB_ENV 176 | echo '' >> $GITHUB_ENV 177 | echo 'EOF' >> $GITHUB_ENV 178 | 179 | - name: "Comment on PR" 180 | id: comment-diff 181 | uses: carpentries/actions/comment-diff@main 182 | with: 183 | pr: ${{ env.NR }} 184 | body: ${{ env.body }} 185 | -------------------------------------------------------------------------------- /.github/workflows/pr-post-remove-branch.yaml: -------------------------------------------------------------------------------- 1 | name: "Bot: Remove Temporary PR Branch" 2 | 3 | on: 4 | workflow_run: 5 | workflows: ["Bot: Send Close Pull Request Signal"] 6 | types: 7 | - completed 8 | 9 | jobs: 10 | delete: 11 | name: "Delete branch from Pull Request" 12 | runs-on: ubuntu-22.04 13 | if: > 14 | github.event.workflow_run.event == 'pull_request' && 15 | github.event.workflow_run.conclusion == 'success' 16 | permissions: 17 | contents: write 18 | steps: 19 | - name: 'Download artifact' 20 | uses: carpentries/actions/download-workflow-artifact@main 21 | with: 22 | run: ${{ github.event.workflow_run.id }} 23 | name: pr 24 | - name: "Get PR Number" 25 | id: get-pr 26 | 
run: | 27 | unzip pr.zip 28 | echo "NUM=$(<./NUM)" >> $GITHUB_OUTPUT 29 | - name: 'Remove branch' 30 | uses: carpentries/actions/remove-branch@main 31 | with: 32 | pr: ${{ steps.get-pr.outputs.NUM }} 33 | -------------------------------------------------------------------------------- /.github/workflows/pr-preflight.yaml: -------------------------------------------------------------------------------- 1 | name: "Pull Request Preflight Check" 2 | 3 | on: 4 | pull_request_target: 5 | branches: 6 | ["main"] 7 | types: 8 | ["opened", "synchronize", "reopened"] 9 | 10 | jobs: 11 | test-pr: 12 | name: "Test if pull request is valid" 13 | if: ${{ github.event.action != 'closed' }} 14 | runs-on: ubuntu-22.04 15 | outputs: 16 | is_valid: ${{ steps.check-pr.outputs.VALID }} 17 | permissions: 18 | pull-requests: write 19 | steps: 20 | - name: "Get Invalid Hashes File" 21 | id: hash 22 | run: | 23 | echo "json<> $GITHUB_OUTPUT 26 | - name: "Check PR" 27 | id: check-pr 28 | uses: carpentries/actions/check-valid-pr@main 29 | with: 30 | pr: ${{ github.event.number }} 31 | invalid: ${{ fromJSON(steps.hash.outputs.json)[github.repository] }} 32 | fail_on_error: true 33 | - name: "Comment result of validation" 34 | id: comment-diff 35 | if: ${{ always() }} 36 | uses: carpentries/actions/comment-diff@main 37 | with: 38 | pr: ${{ github.event.number }} 39 | body: ${{ steps.check-pr.outputs.MSG }} 40 | -------------------------------------------------------------------------------- /.github/workflows/pr-receive.yaml: -------------------------------------------------------------------------------- 1 | name: "Receive Pull Request" 2 | 3 | on: 4 | pull_request: 5 | types: 6 | [opened, synchronize, reopened] 7 | 8 | concurrency: 9 | group: ${{ github.ref }} 10 | cancel-in-progress: true 11 | 12 | jobs: 13 | test-pr: 14 | name: "Record PR number" 15 | if: ${{ github.event.action != 'closed' }} 16 | runs-on: ubuntu-22.04 17 | outputs: 18 | is_valid: ${{ steps.check-pr.outputs.VALID }} 19 | 
steps: 20 | - name: "Record PR number" 21 | id: record 22 | if: ${{ always() }} 23 | run: | 24 | echo ${{ github.event.number }} > ${{ github.workspace }}/NR # 2022-03-02: artifact name fixed to be NR 25 | - name: "Upload PR number" 26 | id: upload 27 | if: ${{ always() }} 28 | uses: actions/upload-artifact@v4 29 | with: 30 | name: pr 31 | path: ${{ github.workspace }}/NR 32 | - name: "Get Invalid Hashes File" 33 | id: hash 34 | run: | 35 | echo "json<> $GITHUB_OUTPUT 38 | - name: "echo output" 39 | run: | 40 | echo "${{ steps.hash.outputs.json }}" 41 | - name: "Check PR" 42 | id: check-pr 43 | uses: carpentries/actions/check-valid-pr@main 44 | with: 45 | pr: ${{ github.event.number }} 46 | invalid: ${{ fromJSON(steps.hash.outputs.json)[github.repository] }} 47 | 48 | build-md-source: 49 | name: "Build markdown source files if valid" 50 | needs: test-pr 51 | runs-on: ubuntu-22.04 52 | if: ${{ needs.test-pr.outputs.is_valid == 'true' }} 53 | env: 54 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 55 | RENV_PATHS_ROOT: ~/.local/share/renv/ 56 | CHIVE: ${{ github.workspace }}/site/chive 57 | PR: ${{ github.workspace }}/site/pr 58 | MD: ${{ github.workspace }}/site/built 59 | steps: 60 | - name: "Check Out Main Branch" 61 | uses: actions/checkout@v4 62 | 63 | - name: "Check Out Staging Branch" 64 | uses: actions/checkout@v4 65 | with: 66 | ref: md-outputs 67 | path: ${{ env.MD }} 68 | 69 | - name: "Set up R" 70 | uses: r-lib/actions/setup-r@v2 71 | with: 72 | use-public-rspm: true 73 | install-r: false 74 | 75 | - name: "Set up Pandoc" 76 | uses: r-lib/actions/setup-pandoc@v2 77 | 78 | - name: "Setup Lesson Engine" 79 | uses: carpentries/actions/setup-sandpaper@main 80 | with: 81 | cache-version: ${{ secrets.CACHE_VERSION }} 82 | 83 | - name: "Setup Package Cache" 84 | uses: carpentries/actions/setup-lesson-deps@main 85 | with: 86 | cache-version: ${{ secrets.CACHE_VERSION }} 87 | 88 | - name: "Validate and Build Markdown" 89 | id: build-site 90 | run: | 91 | 
sandpaper::package_cache_trigger(TRUE) 92 | sandpaper::validate_lesson(path = '${{ github.workspace }}') 93 | sandpaper:::build_markdown(path = '${{ github.workspace }}', quiet = FALSE) 94 | shell: Rscript {0} 95 | 96 | - name: "Generate Artifacts" 97 | id: generate-artifacts 98 | run: | 99 | sandpaper:::ci_bundle_pr_artifacts( 100 | repo = '${{ github.repository }}', 101 | pr_number = '${{ github.event.number }}', 102 | path_md = '${{ env.MD }}', 103 | path_pr = '${{ env.PR }}', 104 | path_archive = '${{ env.CHIVE }}', 105 | branch = 'md-outputs' 106 | ) 107 | shell: Rscript {0} 108 | 109 | - name: "Upload PR" 110 | uses: actions/upload-artifact@v4 111 | with: 112 | name: pr 113 | path: ${{ env.PR }} 114 | overwrite: true 115 | 116 | - name: "Upload Diff" 117 | uses: actions/upload-artifact@v4 118 | with: 119 | name: diff 120 | path: ${{ env.CHIVE }} 121 | retention-days: 1 122 | 123 | - name: "Upload Build" 124 | uses: actions/upload-artifact@v4 125 | with: 126 | name: built 127 | path: ${{ env.MD }} 128 | retention-days: 1 129 | 130 | - name: "Teardown" 131 | run: sandpaper::reset_site() 132 | shell: Rscript {0} 133 | -------------------------------------------------------------------------------- /.github/workflows/sandpaper-main.yaml: -------------------------------------------------------------------------------- 1 | name: "01 Build and Deploy Site" 2 | 3 | on: 4 | push: 5 | branches: 6 | - main 7 | - master 8 | schedule: 9 | - cron: '0 0 * * 2' 10 | workflow_dispatch: 11 | inputs: 12 | name: 13 | description: 'Who triggered this build?' 
14 | required: true 15 | default: 'Maintainer (via GitHub)' 16 | reset: 17 | description: 'Reset cached markdown files' 18 | required: false 19 | default: false 20 | type: boolean 21 | jobs: 22 | full-build: 23 | name: "Build Full Site" 24 | 25 | # 2024-10-01: ubuntu-latest is now 24.04 and R is not installed by default in the runner image 26 | # pin to 22.04 for now 27 | runs-on: ubuntu-22.04 28 | permissions: 29 | checks: write 30 | contents: write 31 | pages: write 32 | env: 33 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 34 | RENV_PATHS_ROOT: ~/.local/share/renv/ 35 | steps: 36 | 37 | - name: "Checkout Lesson" 38 | uses: actions/checkout@v4 39 | 40 | - name: "Set up R" 41 | uses: r-lib/actions/setup-r@v2 42 | with: 43 | use-public-rspm: true 44 | install-r: false 45 | 46 | - name: "Set up Pandoc" 47 | uses: r-lib/actions/setup-pandoc@v2 48 | 49 | - name: "Setup Lesson Engine" 50 | uses: carpentries/actions/setup-sandpaper@main 51 | with: 52 | cache-version: ${{ secrets.CACHE_VERSION }} 53 | 54 | - name: "Setup Package Cache" 55 | uses: carpentries/actions/setup-lesson-deps@main 56 | with: 57 | cache-version: ${{ secrets.CACHE_VERSION }} 58 | 59 | - name: "Deploy Site" 60 | run: | 61 | reset <- "${{ github.event.inputs.reset }}" == "true" 62 | sandpaper::package_cache_trigger(TRUE) 63 | sandpaper:::ci_deploy(reset = reset) 64 | shell: Rscript {0} 65 | -------------------------------------------------------------------------------- /.github/workflows/sandpaper-version.txt: -------------------------------------------------------------------------------- 1 | 0.16.12 2 | -------------------------------------------------------------------------------- /.github/workflows/update-cache.yaml: -------------------------------------------------------------------------------- 1 | name: "03 Maintain: Update Package Cache" 2 | 3 | on: 4 | workflow_dispatch: 5 | inputs: 6 | name: 7 | description: 'Who triggered this build (enter github username to tag yourself)?' 
8 | required: true 9 | default: 'monthly run' 10 | schedule: 11 | # Run every tuesday 12 | - cron: '0 0 * * 2' 13 | 14 | jobs: 15 | preflight: 16 | name: "Preflight Check" 17 | runs-on: ubuntu-22.04 18 | outputs: 19 | ok: ${{ steps.check.outputs.ok }} 20 | steps: 21 | - id: check 22 | run: | 23 | if [[ ${{ github.event_name }} == 'workflow_dispatch' ]]; then 24 | echo "ok=true" >> $GITHUB_OUTPUT 25 | echo "Running on request" 26 | # using single brackets here to avoid 08 being interpreted as octal 27 | # https://github.com/carpentries/sandpaper/issues/250 28 | elif [ `date +%d` -le 7 ]; then 29 | # If the Tuesday lands in the first week of the month, run it 30 | echo "ok=true" >> $GITHUB_OUTPUT 31 | echo "Running on schedule" 32 | else 33 | echo "ok=false" >> $GITHUB_OUTPUT 34 | echo "Not Running Today" 35 | fi 36 | 37 | check_renv: 38 | name: "Check if We Need {renv}" 39 | runs-on: ubuntu-22.04 40 | needs: preflight 41 | if: ${{ needs.preflight.outputs.ok == 'true'}} 42 | outputs: 43 | needed: ${{ steps.renv.outputs.exists }} 44 | steps: 45 | - name: "Checkout Lesson" 46 | uses: actions/checkout@v4 47 | - id: renv 48 | run: | 49 | if [[ -d renv ]]; then 50 | echo "exists=true" >> $GITHUB_OUTPUT 51 | fi 52 | 53 | check_token: 54 | name: "Check SANDPAPER_WORKFLOW token" 55 | runs-on: ubuntu-22.04 56 | needs: check_renv 57 | if: ${{ needs.check_renv.outputs.needed == 'true' }} 58 | outputs: 59 | workflow: ${{ steps.validate.outputs.wf }} 60 | repo: ${{ steps.validate.outputs.repo }} 61 | steps: 62 | - name: "validate token" 63 | id: validate 64 | uses: carpentries/actions/check-valid-credentials@main 65 | with: 66 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 67 | 68 | update_cache: 69 | name: "Update Package Cache" 70 | needs: check_token 71 | if: ${{ needs.check_token.outputs.repo== 'true' }} 72 | runs-on: ubuntu-22.04 73 | env: 74 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 75 | RENV_PATHS_ROOT: ~/.local/share/renv/ 76 | steps: 77 | 78 | - name: "Checkout Lesson" 79 | 
uses: actions/checkout@v4 80 | 81 | - name: "Set up R" 82 | uses: r-lib/actions/setup-r@v2 83 | with: 84 | use-public-rspm: true 85 | install-r: false 86 | 87 | - name: "Update {renv} deps and determine if a PR is needed" 88 | id: update 89 | uses: carpentries/actions/update-lockfile@main 90 | with: 91 | cache-version: ${{ secrets.CACHE_VERSION }} 92 | 93 | - name: Create Pull Request 94 | id: cpr 95 | if: ${{ steps.update.outputs.n > 0 }} 96 | uses: carpentries/create-pull-request@main 97 | with: 98 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 99 | delete-branch: true 100 | branch: "update/packages" 101 | commit-message: "[actions] update ${{ steps.update.outputs.n }} packages" 102 | title: "Update ${{ steps.update.outputs.n }} packages" 103 | body: | 104 | :robot: This is an automated build 105 | 106 | This will update ${{ steps.update.outputs.n }} packages in your lesson with the following versions: 107 | 108 | ``` 109 | ${{ steps.update.outputs.report }} 110 | ``` 111 | 112 | :stopwatch: In a few minutes, a comment will appear that will show you how the output has changed based on these updates. 113 | 114 | If you want to inspect these changes locally, you can use the following code to check out a new branch: 115 | 116 | ```bash 117 | git fetch origin update/packages 118 | git checkout update/packages 119 | ``` 120 | 121 | - Auto-generated by [create-pull-request][1] on ${{ steps.update.outputs.date }} 122 | 123 | [1]: https://github.com/carpentries/create-pull-request/tree/main 124 | labels: "type: package cache" 125 | draft: false 126 | -------------------------------------------------------------------------------- /.github/workflows/update-workflows.yaml: -------------------------------------------------------------------------------- 1 | name: "02 Maintain: Update Workflow Files" 2 | 3 | on: 4 | workflow_dispatch: 5 | inputs: 6 | name: 7 | description: 'Who triggered this build (enter github username to tag yourself)?' 
8 | required: true 9 | default: 'weekly run' 10 | clean: 11 | description: 'Workflow files/file extensions to clean (no wildcards, enter "" for none)' 12 | required: false 13 | default: '.yaml' 14 | schedule: 15 | # Run every Tuesday 16 | - cron: '0 0 * * 2' 17 | 18 | jobs: 19 | check_token: 20 | name: "Check SANDPAPER_WORKFLOW token" 21 | runs-on: ubuntu-22.04 22 | outputs: 23 | workflow: ${{ steps.validate.outputs.wf }} 24 | repo: ${{ steps.validate.outputs.repo }} 25 | steps: 26 | - name: "validate token" 27 | id: validate 28 | uses: carpentries/actions/check-valid-credentials@main 29 | with: 30 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 31 | 32 | update_workflow: 33 | name: "Update Workflow" 34 | runs-on: ubuntu-22.04 35 | needs: check_token 36 | if: ${{ needs.check_token.outputs.workflow == 'true' }} 37 | steps: 38 | - name: "Checkout Repository" 39 | uses: actions/checkout@v4 40 | 41 | - name: Update Workflows 42 | id: update 43 | uses: carpentries/actions/update-workflows@main 44 | with: 45 | clean: ${{ github.event.inputs.clean }} 46 | 47 | - name: Create Pull Request 48 | id: cpr 49 | if: "${{ steps.update.outputs.new }}" 50 | uses: carpentries/create-pull-request@main 51 | with: 52 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 53 | delete-branch: true 54 | branch: "update/workflows" 55 | commit-message: "[actions] update sandpaper workflow to version ${{ steps.update.outputs.new }}" 56 | title: "Update Workflows to Version ${{ steps.update.outputs.new }}" 57 | body: | 58 | :robot: This is an automated build 59 | 60 | Update Workflows from sandpaper version ${{ steps.update.outputs.old }} -> ${{ steps.update.outputs.new }} 61 | 62 | - Auto-generated by [create-pull-request][1] on ${{ steps.update.outputs.date }} 63 | 64 | [1]: https://github.com/carpentries/create-pull-request/tree/main 65 | labels: "type: template and tools" 66 | draft: false 67 | -------------------------------------------------------------------------------- 
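The README describes the version check for "02 Maintain: Update Workflow Files" in two steps: compare the recorded sandpaper version against the current one, and update only when they differ. That reduces to a string comparison against `sandpaper-version.txt`. The function `workflows_need_update` below is a hypothetical sketch of that comparison. How `carpentries/actions/update-workflows` actually obtains the upstream version is not shown in this repository, so the upstream string is simply passed in as an argument.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: succeed (exit 0) when the version recorded in a
# sandpaper-version.txt file differs from a given upstream version string.
workflows_need_update() {
  local version_file="$1" upstream="$2" recorded
  # Remove whitespace and the trailing newline so "0.16.12\n" equals "0.16.12".
  recorded=$(tr -d '[:space:]' < "$version_file")
  [[ $recorded != "$upstream" ]]
}

# Example (the upstream version here is a placeholder, not a real release check):
#   workflows_need_update .github/workflows/sandpaper-version.txt "0.16.12" \
#     && echo "workflow files would be refreshed"
```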
/.github/workflows/workbench-beta-phase.yml: -------------------------------------------------------------------------------- 1 | name: "Deploy to AWS" 2 | 3 | on: 4 | workflow_run: 5 | workflows: ["01 Build and Deploy Site"] 6 | types: 7 | - completed 8 | workflow_dispatch: 9 | 10 | jobs: 11 | preflight: 12 | name: "Preflight Check" 13 | runs-on: ubuntu-latest 14 | outputs: 15 | ok: ${{ steps.check.outputs.ok }} 16 | folder: ${{ steps.check.outputs.folder }} 17 | steps: 18 | - id: check 19 | run: | 20 | if [[ -z "${{ secrets.DISTRIBUTION }}" || -z "${{ secrets.AWS_ACCESS_KEY_ID }}" || -z "${{ secrets.AWS_SECRET_ACCESS_KEY }}" ]]; then 21 | echo ":information_source: No site configured" >> $GITHUB_STEP_SUMMARY 22 | echo "" >> $GITHUB_STEP_SUMMARY 23 | echo 'To deploy the preview on AWS, you need the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `DISTRIBUTION` secrets set up' >> $GITHUB_STEP_SUMMARY 24 | else 25 | echo "folder=$(sed -E 's^.+/(.+)^\1^' <<< ${{ github.repository }})" >> $GITHUB_OUTPUT 26 | echo "ok=true" >> $GITHUB_OUTPUT 27 | fi 28 | 29 | full-build: 30 | name: "Deploy to AWS" 31 | needs: [preflight] 32 | if: ${{ needs.preflight.outputs.ok }} 33 | runs-on: ubuntu-latest 34 | steps: 35 | 36 | - name: "Checkout site folder" 37 | uses: actions/checkout@v3 38 | with: 39 | ref: 'gh-pages' 40 | path: 'source' 41 | 42 | - name: "Deploy to Bucket" 43 | uses: jakejarvis/s3-sync-action@v0.5.1 44 | with: 45 | args: --acl public-read --follow-symlinks --delete --exclude '.git/*' 46 | env: 47 | AWS_S3_BUCKET: preview.carpentries.org 48 | AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} 49 | AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} 50 | SOURCE_DIR: 'source' 51 | DEST_DIR: ${{ needs.preflight.outputs.folder }} 52 | 53 | - name: "Invalidate CloudFront" 54 | uses: chetan/invalidate-cloudfront-action@master 55 | env: 56 | PATHS: /* 57 | AWS_REGION: 'us-east-1' 58 | DISTRIBUTION: ${{ secrets.DISTRIBUTION }} 59 | AWS_ACCESS_KEY_ID: ${{
secrets.AWS_ACCESS_KEY_ID }} 60 | AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} 61 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # sandpaper files 2 | episodes/*html 3 | site/* 4 | !site/README.md 5 | 6 | # History files 7 | .Rhistory 8 | .Rapp.history 9 | # Session Data files 10 | .RData 11 | # User-specific files 12 | .Ruserdata 13 | # Example code in package build process 14 | *-Ex.R 15 | # Output files from R CMD build 16 | /*.tar.gz 17 | # Output files from R CMD check 18 | /*.Rcheck/ 19 | # RStudio files 20 | .Rproj.user/ 21 | # produced vignettes 22 | vignettes/*.html 23 | vignettes/*.pdf 24 | # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3 25 | .httr-oauth 26 | # knitr and R markdown default cache directories 27 | *_cache/ 28 | /cache/ 29 | # Temporary files created by R markdown 30 | *.utf8.md 31 | *.knit.md 32 | # R Environment Variables 33 | .Renviron 34 | # pkgdown site 35 | docs/ 36 | # translation temp files 37 | po/*~ 38 | # renv detritus 39 | renv/sandbox/ 40 | *.pyc 41 | *~ 42 | .DS_Store 43 | .ipynb_checkpoints 44 | .sass-cache 45 | .jekyll-cache/ 46 | .jekyll-metadata 47 | __pycache__ 48 | _site 49 | .Rproj.user 50 | .bundle/ 51 | .vendor/ 52 | vendor/ 53 | .docker-vendor/ 54 | Gemfile.lock 55 | .*history 56 | -------------------------------------------------------------------------------- /.zenodo.json: -------------------------------------------------------------------------------- 1 | { 2 | "contributors": [ 3 | { 4 | "type": "Editor", 5 | "name": "Juan Fung", 6 | "orcid": "0000-0002-0820-787X" 7 | }, 8 | { 9 | "type": "Editor", 10 | "name": "Eirini Zormpa", 11 | "orcid": "0000-0002-6902-7768" 12 | }, 13 | { 14 | "type": "Editor", 15 | "name": "Jesse Sadler" 16 | } 17 | ], 18 | "creators": [ 19 | { 20 | "name": "Juan Fung", 21 | "orcid": "0000-0002-0820-787X" 22 | }, 23 | { 24 
| "name": "Allison Shay Theobold", 25 | "orcid": "0000-0002-8092-6182" 26 | }, 27 | { 28 | "name": "Kelsey Gonzalez", 29 | "orcid": "0000-0002-6592-8075" 30 | }, 31 | { 32 | "name": "Angela Li", 33 | "orcid": "0000-0002-8956-419X" 34 | }, 35 | { 36 | "name": "Eirini Zormpa", 37 | "orcid": "0000-0002-6902-7768" 38 | }, 39 | { 40 | "name": "Maneesha Sane" 41 | }, 42 | { 43 | "name": "Jesse Sadler" 44 | }, 45 | { 46 | "name": "brynnelliott" 47 | }, 48 | { 49 | "name": "bkmgit" 50 | }, 51 | { 52 | "name": "bbartholdy" 53 | }, 54 | { 55 | "name": "Eirini Zormpa" 56 | }, 57 | { 58 | "name": "aranganath24" 59 | }, 60 | { 61 | "name": "Erin Alison Becker", 62 | "orcid": "0000-0002-6832-0233" 63 | }, 64 | { 65 | "name": "Martin Monkman" 66 | }, 67 | { 68 | "name": "agully1" 69 | }, 70 | { 71 | "name": "Claudiu Forgaci", 72 | "orcid": "0000-0003-3218-5102" 73 | }, 74 | { 75 | "name": "Elizabeth Wickes", 76 | "orcid": "0000-0003-0487-4437" 77 | }, 78 | { 79 | "name": "Kristian Kjelmann", 80 | "orcid": "0000-0001-7994-735X" 81 | }, 82 | { 83 | "name": "Peter Kiraly" 84 | }, 85 | { 86 | "name": "Ahobert" 87 | }, 88 | { 89 | "name": "Claudia Engel", 90 | "orcid": "0000-0002-5234-3924" 91 | }, 92 | { 93 | "name": "mcnanton" 94 | }, 95 | { 96 | "name": "Peter Verhaar", 97 | "orcid": "0000-0002-8469-6804" 98 | }, 99 | { 100 | "name": "Serah Njambi", 101 | "orcid": "0000-0002-7834-1038" 102 | }, 103 | { 104 | "name": "Wladimir Labeikovsky", 105 | "orcid": "0000-0001-6074-3269" 106 | }, 107 | { 108 | "name": "Angela Li" 109 | }, 110 | { 111 | "name": "Christopher Prener" 112 | }, 113 | { 114 | "name": "Craig Gross" 115 | }, 116 | { 117 | "name": "Eduard Klapwijk", 118 | "orcid": "0000-0002-8936-0365" 119 | }, 120 | { 121 | "name": "Elif Dede Yildirim" 122 | }, 123 | { 124 | "name": "Erwin Lares" 125 | }, 126 | { 127 | "name": "Ezra Herman" 128 | }, 129 | { 130 | "name": "Hao Ye", 131 | "orcid": "0000-0002-8630-1458" 132 | }, 133 | { 134 | "name": "Isaac Jennings" 135 | }, 136 | { 137 
| "name": "Jonathan Stoneman" 138 | }, 139 | { 140 | "name": "maczokni" 141 | }, 142 | { 143 | "name": "Murray Cadzow", 144 | "orcid": "0000-0002-2299-4136" 145 | }, 146 | { 147 | "name": "noestetie", 148 | "orcid": "0000-0001-7602-6942" 149 | }, 150 | { 151 | "name": "Utrilla Guerrero" 152 | }, 153 | { 154 | "name": "Emma Rand", 155 | "orcid": "0000-0002-1358-8275" 156 | }, 157 | { 158 | "name": "Alan O'Callaghan", 159 | "orcid": "0000-0003-4817-6171" 160 | }, 161 | { 162 | "name": "Aleksandra Wilczynska" 163 | }, 164 | { 165 | "name": "Andrew Stewart" 166 | }, 167 | { 168 | "name": "Aolin Gong" 169 | }, 170 | { 171 | "name": "Amy Yarnell" 172 | }, 173 | { 174 | "name": "Bernard (Barney) Ricca" 175 | }, 176 | { 177 | "name": "Bjørn Peare Bartholdy", 178 | "orcid": "0000-0003-3985-1016" 179 | }, 180 | { 181 | "name": "bmillerlab" 182 | }, 183 | { 184 | "name": "bricakeld" 185 | }, 186 | { 187 | "name": "Carlos Ramirez-Reyes" 188 | }, 189 | { 190 | "name": "Carolyn McNabb" 191 | }, 192 | { 193 | "name": "Chi Gao" 194 | }, 195 | { 196 | "name": "Christian Knudsen" 197 | }, 198 | { 199 | "name": "Daniela Gawehns" 200 | }, 201 | { 202 | "name": "herenya7" 203 | }, 204 | { 205 | "name": "jborycz" 206 | }, 207 | { 208 | "name": "Jennifer Blanc" 209 | }, 210 | { 211 | "name": "Jesica Formoso" 212 | }, 213 | { 214 | "name": "Jon Jablonski" 215 | }, 216 | { 217 | "name": "Karl W Broman", 218 | "orcid": "0000-0002-4914-6671" 219 | }, 220 | { 221 | "name": "Katrin Leinweber", 222 | "orcid": "0000-0001-5135-5758" 223 | }, 224 | { 225 | "name": "Laura Josephine Botzet" 226 | }, 227 | { 228 | "name": "Lilian Huang" 229 | }, 230 | { 231 | "name": "Liza Wood" 232 | }, 233 | { 234 | "name": "Macarena Sol Quiroga" 235 | }, 236 | { 237 | "name": "Moksha Menghaney" 238 | }, 239 | { 240 | "name": "Natalia Maciel Block", 241 | "orcid": "0000-0003-1040-4159" 242 | }, 243 | { 244 | "name": "Nicholas Marchio" 245 | }, 246 | { 247 | "name": "Renato Alves", 248 | "orcid": 
"0000-0002-7212-0234" 249 | }, 250 | { 251 | "name": "Rochelle Lundy" 252 | }, 253 | { 254 | "name": "Sefa Ozalp", 255 | "orcid": "0000-0002-4104-1541" 256 | }, 257 | { 258 | "name": "Seo-young Silvia Kim", 259 | "orcid": "0000-0002-8801-9210" 260 | }, 261 | { 262 | "name": "Vishal Lala" 263 | }, 264 | { 265 | "name": "William X. Q. Ngiam", 266 | "orcid": "0000-0003-3567-3881" 267 | } 268 | ], 269 | "license": { 270 | "id": "CC-BY-4.0" 271 | } 272 | } -------------------------------------------------------------------------------- /AUTHORS: -------------------------------------------------------------------------------- 1 | Peter Smyth (peter.smyth@manchester.ac.uk) 2 | -------------------------------------------------------------------------------- /CITATION: -------------------------------------------------------------------------------- 1 | FIXME: describe how to cite this lesson. 2 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Contributor Code of Conduct" 3 | --- 4 | 5 | As contributors and maintainers of this project, 6 | we pledge to follow [The Carpentries Code of Conduct][coc]. 7 | 8 | Instances of abusive, harassing, or otherwise unacceptable behavior 9 | may be reported by following our [reporting guidelines][coc-reporting].
10 | 11 | 12 | [coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html 13 | [coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html 14 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | ## Contributing 2 | 3 | [The Carpentries][cp-site] ([Software Carpentry][swc-site], [Data 4 | Carpentry][dc-site], and [Library Carpentry][lc-site]) are open source 5 | projects, and we welcome contributions of all kinds: new lessons, fixes to 6 | existing material, bug reports, and reviews of proposed changes are all 7 | welcome. 8 | 9 | ### Contributor Agreement 10 | 11 | By contributing, you agree that we may redistribute your work under [our 12 | license](LICENSE.md). In exchange, we will address your issues and/or assess 13 | your change proposal as promptly as we can, and help you become a member of our 14 | community. Everyone involved in [The Carpentries][cp-site] agrees to abide by 15 | our [code of conduct](CODE_OF_CONDUCT.md). 16 | 17 | ### How to Contribute 18 | 19 | The easiest way to get started is to file an issue to tell us about a spelling 20 | mistake, some awkward wording, or a factual error. This is a good way to 21 | introduce yourself and to meet some of our community members. 22 | 23 | 1. If you do not have a [GitHub][github] account, you can [send us comments by 24 | email][contact]. However, we will be able to respond more quickly if you use 25 | one of the other methods described below. 26 | 27 | 2. If you have a [GitHub][github] account, or are willing to [create 28 | one][github-join], but do not know how to use Git, you can report problems 29 | or suggest improvements by [creating an issue][repo-issues]. This allows us 30 | to assign the item to someone and to respond to it in a threaded discussion. 31 | 32 | 3. 
If you are comfortable with Git, and would like to add or change material, 33 | you can submit a pull request (PR). Instructions for doing this are 34 | [included below](#using-github). For inspiration about changes that need to 35 | be made, check out the [list of open issues][issues] across the Carpentries. 36 | 37 | Note: if you want to build the website locally, please refer to [The Workbench 38 | documentation][template-doc]. 39 | 40 | ### Where to Contribute 41 | 42 | 1. If you wish to change this lesson, add issues and pull requests here. 43 | 2. If you wish to change the template used for workshop websites, please refer 44 | to [The Workbench documentation][template-doc]. 45 | 46 | 47 | ### What to Contribute 48 | 49 | There are many ways to contribute, from writing new exercises and improving 50 | existing ones to updating or filling in the documentation and submitting [bug 51 | reports][issues] about things that do not work, are not clear, or are missing. 52 | If you are looking for ideas, please see [the list of issues for this 53 | repository][repo-issues], or the issues for [Data Carpentry][dc-issues], 54 | [Library Carpentry][lc-issues], and [Software Carpentry][swc-issues] projects. 55 | 56 | Comments on issues and reviews of pull requests are just as welcome: we are 57 | smarter together than we are on our own. **Reviews from novices and newcomers 58 | are particularly valuable**: it's easy for people who have been using these 59 | lessons for a while to forget how impenetrable some of this material can be, so 60 | fresh eyes are always welcome. 61 | 62 | ### What *Not* to Contribute 63 | 64 | Our lessons already contain more material than we can cover in a typical 65 | workshop, so we are usually *not* looking for more concepts or tools to add to 66 | them. As a rule, if you want to introduce a new idea, you must (a) estimate how 67 | long it will take to teach and (b) explain what you would take out to make room 68 | for it. 
The first encourages contributors to be honest about requirements; the 69 | second, to think hard about priorities. 70 | 71 | We are also not looking for exercises or other material that only run on one 72 | platform. Our workshops typically contain a mixture of Windows, macOS, and 73 | Linux users; in order to be usable, our lessons must run equally well on all 74 | three. 75 | 76 | ### Using GitHub 77 | 78 | If you choose to contribute via GitHub, you may want to look at [How to 79 | Contribute to an Open Source Project on GitHub][how-contribute]. In brief, we 80 | use [GitHub flow][github-flow] to manage changes: 81 | 82 | 1. Create a new branch in your desktop copy of this repository for each 83 | significant change. 84 | 2. Commit the change in that branch. 85 | 3. Push that branch to your fork of this repository on GitHub. 86 | 4. Submit a pull request from that branch to the [upstream repository][repo]. 87 | 5. If you receive feedback, make changes on your desktop and push to your 88 | branch on GitHub: the pull request will update automatically. 89 | 90 | NB: The published copy of the lesson is usually in the `main` branch. 91 | 92 | Each lesson has a team of maintainers who review issues and pull requests or 93 | encourage others to do so. The maintainers are community volunteers, and have 94 | final say over what gets merged into the lesson. 95 | 96 | ### Other Resources 97 | 98 | The Carpentries is a global organisation with volunteers and learners all over 99 | the world. We share values of inclusivity and a passion for sharing knowledge, 100 | teaching and learning. There are several ways to connect with The Carpentries 101 | community listed at including via social 102 | media, slack, newsletters, and email lists. You can also [reach us by 103 | email][contact]. 
104 | 105 | [repo]: https://github.com/datacarpentry/r-socialsci/ 106 | [repo-issues]: https://github.com/datacarpentry/r-socialsci/issues 107 | [contact]: mailto:team@carpentries.org 108 | [cp-site]: https://carpentries.org/ 109 | [dc-issues]: https://github.com/issues?q=user%3Adatacarpentry 110 | [dc-lessons]: https://datacarpentry.org/lessons/ 111 | [dc-site]: https://datacarpentry.org/ 112 | [discuss-list]: https://carpentries.topicbox.com/groups/discuss 113 | [github]: https://github.com 114 | [github-flow]: https://guides.github.com/introduction/flow/ 115 | [github-join]: https://github.com/join 116 | [how-contribute]: https://egghead.io/courses/how-to-contribute-to-an-open-source-project-on-github 117 | [issues]: https://carpentries.org/help-wanted-issues/ 118 | [lc-issues]: https://github.com/issues?q=user%3ALibraryCarpentry 119 | [swc-issues]: https://github.com/issues?q=user%3Aswcarpentry 120 | [swc-lessons]: https://software-carpentry.org/lessons/ 121 | [swc-site]: https://software-carpentry.org/ 122 | [lc-site]: https://librarycarpentry.org/ 123 | [template-doc]: https://carpentries.github.io/workbench/ 124 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Licenses" 3 | --- 4 | 5 | ## Instructional Material 6 | 7 | All Software Carpentry, Data Carpentry, and Library Carpentry instructional material is 8 | made available under the [Creative Commons Attribution 9 | license][cc-by-human]. The following is a human-readable summary of 10 | (and not a substitute for) the [full legal text of the CC BY 4.0 11 | license][cc-by-legal]. 12 | 13 | You are free: 14 | 15 | * to **Share**---copy and redistribute the material in any medium or format 16 | * to **Adapt**---remix, transform, and build upon the material 17 | 18 | for any purpose, even commercially. 
19 | 20 | The licensor cannot revoke these freedoms as long as you follow the 21 | license terms. 22 | 23 | Under the following terms: 24 | 25 | * **Attribution**---You must give appropriate credit (mentioning that 26 | your work is derived from work that is Copyright © Software 27 | Carpentry and, where practical, linking to 28 | http://software-carpentry.org/), provide a [link to the 29 | license][cc-by-human], and indicate if changes were made. You may do 30 | so in any reasonable manner, but not in any way that suggests the 31 | licensor endorses you or your use. 32 | 33 | **No additional restrictions**---You may not apply legal terms or 34 | technological measures that legally restrict others from doing 35 | anything the license permits. With the understanding that: 36 | 37 | Notices: 38 | 39 | * You do not have to comply with the license for elements of the 40 | material in the public domain or where your use is permitted by an 41 | applicable exception or limitation. 42 | * No warranties are given. The license may not give you all of the 43 | permissions necessary for your intended use. For example, other 44 | rights such as publicity, privacy, or moral rights may limit how you 45 | use the material. 46 | 47 | ## Software 48 | 49 | Except where otherwise noted, the example programs and other software 50 | provided by Software Carpentry and Data Carpentry are made available under the 51 | [OSI][osi]-approved 52 | [MIT license][mit-license]. 
53 | 54 | Permission is hereby granted, free of charge, to any person obtaining 55 | a copy of this software and associated documentation files (the 56 | "Software"), to deal in the Software without restriction, including 57 | without limitation the rights to use, copy, modify, merge, publish, 58 | distribute, sublicense, and/or sell copies of the Software, and to 59 | permit persons to whom the Software is furnished to do so, subject to 60 | the following conditions: 61 | 62 | The above copyright notice and this permission notice shall be 63 | included in all copies or substantial portions of the Software. 64 | 65 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 66 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 67 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 68 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE 69 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 70 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION 71 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 72 | 73 | ## Trademark 74 | 75 | "The Carpentries", "Software Carpentry", "Data Carpentry", and "Library 76 | Carpentry" and their respective logos are registered trademarks of 77 | [The Carpentries, Inc.][carpentries]. 
78 | 79 | [cc-by-human]: https://creativecommons.org/licenses/by/4.0/ 80 | [cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode 81 | [mit-license]: https://opensource.org/licenses/mit-license.html 82 | [carpentries]: https://carpentries.org 83 | [osi]: https://opensource.org 84 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![Build Status](https://travis-ci.org/datacarpentry/r-socialsci.svg?branch=master)](https://travis-ci.com/github/datacarpentry/r-socialsci) 2 | [![Create a Slack Account with us](https://img.shields.io/badge/Create_Slack_Account-The_Carpentries-071159.svg)](https://slack-invite.carpentries.org/) 3 | [![Slack Status](https://img.shields.io/badge/Slack_Channel-dc--socsci--r-E01563.svg)](https://carpentries.slack.com/messages/C9X9JDTSR) 4 | [![DOI](https://zenodo.org/badge/92420906.svg)](https://zenodo.org/badge/latestdoi/92420906) 5 | 6 | # r-socialsci 7 | 8 | Lesson on R for social scientists. Please see [https://datacarpentry.org/r-socialsci/](https://datacarpentry.org/r-socialsci/) for a rendered version of this lesson. 9 | 10 | This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). The lessons cover basic R syntax and the RStudio interface, then move on to importing CSV files, the structure of data frames, working with factors, adding and removing rows and columns, calculating summary statistics from a data frame, and a brief introduction to plotting. 11 | 12 | The [instructor notes page](https://datacarpentry.org/r-socialsci/guide/index.html) has some tips about how best to teach this workshop.
13 | 14 | Maintainers: 15 | 16 | - [Juan Fung](https://github.com/juanfung) 17 | - [Eirini Zormpa](https://github.com/eirini-zormpa) 18 | - [Jesse Sadler](https://github.com/jessesadler) 19 | 20 | 21 | -------------------------------------------------------------------------------- /bin/download_data.R: -------------------------------------------------------------------------------- 1 | if (!dir.exists("data")) 2 | dir.create("data") 3 | 4 | if (! file.exists("data/SAFI_clean.csv")) { 5 | download.file("https://ndownloader.figshare.com/files/11492171", 6 | "data/SAFI_clean.csv", mode = "wb") 7 | } 8 | -------------------------------------------------------------------------------- /config.yaml: -------------------------------------------------------------------------------- 1 | #------------------------------------------------------------ 2 | # Values for this lesson. 3 | #------------------------------------------------------------ 4 | 5 | # Which carpentry is this (swc, dc, lc, or cp)? 6 | # swc: Software Carpentry 7 | # dc: Data Carpentry 8 | # lc: Library Carpentry 9 | # cp: Carpentries (to use for instructor training for instance) 10 | # incubator: The Carpentries Incubator 11 | carpentry: 'dc' 12 | 13 | # Overall title for pages. 
14 | title: 'R for Social Scientists' 15 | 16 | # Date the lesson was created (YYYY-MM-DD, this is empty by default) 17 | created: '2017-05-25' 18 | 19 | # Comma-separated list of keywords for the lesson 20 | keywords: 'software, data, lesson, The Carpentries' 21 | 22 | # Life cycle stage of the lesson 23 | # possible values: pre-alpha, alpha, beta, stable 24 | life_cycle: 'stable' 25 | 26 | # License of the lesson 27 | license: 'CC-BY 4.0' 28 | 29 | # Link to the source repository for this lesson 30 | source: 'https://github.com/datacarpentry/r-socialsci/' 31 | 32 | # Default branch of your lesson 33 | branch: 'main' 34 | 35 | # Who to contact if there are any issues 36 | contact: 'team@carpentries.org' 37 | 38 | # Navigation ------------------------------------------------ 39 | # 40 | # Use the following menu items to specify the order of 41 | # individual pages in each dropdown section. Leave blank to 42 | # include all pages in the folder. 43 | # 44 | # Example ------------- 45 | # 46 | # episodes: 47 | # - introduction.md 48 | # - first-steps.md 49 | # 50 | # learners: 51 | # - setup.md 52 | # 53 | # instructors: 54 | # - instructor-notes.md 55 | # 56 | # profiles: 57 | # - one-learner.md 58 | # - another-learner.md 59 | 60 | # Order of episodes in your lesson 61 | episodes: 62 | - 00-intro.Rmd 63 | - 01-intro-to-r.Rmd 64 | - 02-starting-with-data.Rmd 65 | - 03-dplyr.Rmd 66 | - 04-tidyr.Rmd 67 | - 05-ggplot2.Rmd 68 | - 06-rmarkdown.Rmd 69 | - 07-json.Rmd 70 | 71 | # Information for Learners 72 | learners: 73 | - reference.md 74 | 75 | # Information for Instructors 76 | instructors: 77 | - instructor-notes.md 78 | 79 | # Learner Profiles 80 | profiles: 81 | 82 | # Customisation --------------------------------------------- 83 | # 84 | # This space below is where custom yaml items (e.g. 
pinning 85 | # sandpaper and varnish versions) should live 86 | 87 | 88 | url: 'https://datacarpentry.org/r-socialsci' 89 | -------------------------------------------------------------------------------- /episodes/.Rhistory: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/.Rhistory -------------------------------------------------------------------------------- /episodes/.here: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/.here -------------------------------------------------------------------------------- /episodes/.ignore-05-databases.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Using a relational database with R" 3 | teaching: 0 4 | exercises: 0 5 | questions: 6 | - "How can I import data held in an SQLite database into an R data frame?" 7 | - "How can I write data from a data frame to an SQLite table?" 8 | - "How can I create an SQLite database from CSV files?" 9 | 10 | objectives: 11 | - "Install the RSQLite package" 12 | - "Create a connection to an SQLite database" 13 | - "Query the database" 14 | - "Create a new database and populate it" 15 | - "Use dplyr functions to access and query an SQLite database" 16 | 17 | keypoints: 18 | - "First key point." 19 | --- 20 | 21 | ```{r, include=FALSE} 22 | source("../bin/chunk-options.R") 23 | knitr_fig_path("06-") 24 | source("../bin/download_data.R") 25 | ``` 26 | 27 | ## Introduction 28 | 29 | A common limitation of R is that all operations are conducted in memory. Here, "memory" means the computer hardware 30 | called Random Access Memory (RAM for short), which is physically installed in your computer.
RAM 31 | allows data to be read in nearly the same amount of time regardless of its physical location 32 | inside the memory. This is in stark contrast to the speed at which data can be read from CDs, DVDs, or other 33 | storage media, where the speed of transfer depends on how quickly the drive can rotate or the 34 | arm can move. Unfortunately, your computer has a limited amount of RAM, so the amount of data you 35 | can work with is limited by the available memory. So far, we have used small datasets that can 36 | easily fit into your computer's memory. But what about datasets that are too large for your 37 | computer to handle as a whole? 38 | 39 | In this case, it is helpful to organize the data into a database stored outside of R and then create 40 | a connection to that database. This largely removes the memory limitation, 41 | because SQL queries are sent directly from R to the database, and only the results that you 42 | have identified as necessary for your analysis are returned to R. 43 | 44 | Once we have made the connection to the database, much of what we do will look familiar because the code we will be using is very similar to what we saw in the SQL lesson and earlier episodes of this R lesson. 45 | 46 | In this lesson, we will connect to an SQLite database, which allows us to send strings containing SQL statements directly from R to the database and receive the results. In addition, we will connect to the database in such a way that we can use `dplyr` functions to operate directly on the database tables. 47 | 48 | ## Preliminaries 49 | 50 | First, install and load the necessary packages. You can install the `RSQLite` package, together with `dbplyr` (the package `dplyr` relies on to work with databases), with 51 | 52 | ```{r, eval=FALSE} 53 | install.packages(c("RSQLite", "dbplyr")) 54 | ``` 55 | 56 | Load the packages with 57 | 58 | ```{r} 59 | library(RSQLite) 60 | library(dplyr) 61 | ``` 62 | 63 | Next, create a variable that contains the location of the SQLite database we are going to use.
Here, we are assuming that it is in the `data` subfolder of the current working directory. 64 | 65 | ```{r} 66 | dbfile <- "data/SN7577.sqlite" 67 | ``` 68 | 69 | ## Connecting to an SQLite Database 70 | 71 | Connect to the SQLite database specified by `dbfile`, above, using the `dbConnect` function. 72 | 73 | ```{r} 74 | mydb <- dbConnect(SQLite(), dbfile) 75 | ``` 76 | 77 | Here, `mydb` represents the connection to the database. We will pass it to every function that needs to access the database. 78 | 79 | Now that we have a connection, we can get a list of the tables in the database. 80 | 81 | ```{r} 82 | dbListTables(mydb) 83 | ``` 84 | 85 | Our objective here is to bring data from the database into R by sending a query to the database and then asking for the results of that query. 86 | 87 | ```{r} 88 | # Send a SQL query to the database; this returns an SQLiteResult object 89 | results <- dbSendQuery(mydb, "SELECT * FROM Question1") 90 | 91 | # Retrieve the rows of the result set as a data frame 92 | data <- dbFetch(results) 93 | ``` 94 | 95 | `data` is a standard R data frame that can be explored and manipulated. 96 | 97 | ```{r} 98 | # Return column names 99 | names(data) 100 | 101 | # Return description of data frame structure 102 | str(data) 103 | 104 | # Return the second column 105 | data[, 2] 106 | 107 | # Return the value of the second column, fourth row 108 | data[4, 2] 109 | 110 | # Return the second column for rows where the column 'key' is greater than 7 111 | data[data$key > 7, 2] 112 | ``` 113 | 114 | Once you have retrieved the data, you should clear the result set to free the resources it uses (the connection itself stays open until you call `dbDisconnect()`). 115 | 116 | ```{r} 117 | dbClearResult(results) 118 | ``` 119 | 120 | In addition to sending simple queries, we can send complex ones, such as a join. 121 | You may want to set up the query as a concatenated string first for readability.
122 | 123 | ```{r} 124 | SQL_query <- paste("SELECT q.value,", 125 | "count(*) as how_many", 126 | "FROM SN7577 s", 127 | "JOIN Question1 q", 128 | "ON q.key = s.Q1", 129 | "GROUP BY s.Q1") 130 | 131 | results <- dbSendQuery(mydb, SQL_query) 132 | 133 | data <- fetch(results) 134 | 135 | data 136 | 137 | dbClearResult(results) 138 | ``` 139 | 140 | ::::::::::::::::::::::::::::::::::::::::: callout 141 | 142 | ## Exercise 143 | 144 | What happens if you send invalid SQL syntax? 145 | 146 | ::::::::::::::: solution 147 | 148 | ## Solution 149 | 150 | An error message is returned from SQLite. 151 | Notice that R is just the conduit; it cannot check the SQL syntax. 152 | 153 | ::::::::::::::::::::::::: 154 | 155 | :::::::::::::::::::::::::::::::::::::::::::::::::: 156 | 157 | We can also create a new database and add tables to it. Let's base this new dataframe on the Question1 table that can be found in our existing database. 158 | 159 | ```{r} 160 | # First, use a SQL query to extract the Question1 table from the existing database 161 | results = dbSendQuery(mydb, "SELECT * from Question1") 162 | 163 | # Then, store it as a dataframe 164 | Q1 <- fetch(results) 165 | ``` 166 | 167 | Now, we can create the new database and add data to it, either from an external file or a local dataframe. 168 | 169 | ```{r} 170 | dbfile_new = "data/a_newdb.sqlite" 171 | mydb_new = dbConnect(dbDriver("SQLite"), dbfile_new) 172 | 173 | dbWriteTable(conn = mydb_new , name = "SN7577", value = "data/SN7577.csv", 174 | row.names = FALSE, header = TRUE) 175 | 176 | dbWriteTable(conn = mydb_new , name = "Q1", value = Q1, 177 | row.names = FALSE) 178 | 179 | dbListTables(mydb_new) 180 | ``` 181 | 182 | ## Connecting to a Database for `dplyr` Use 183 | 184 | When we want to use `dplyr` functions to operate directly on the database tables, 185 | a different connection method is used. 
186 | 187 | ```{r} 188 | mydb_dplyr <- src_sqlite(path = "data/SN7577.sqlite") 189 | ``` 190 | 191 | The method for running queries also differs: we pass the `tbl()` function either a table name or a valid SQL string wrapped in `sql()`. 192 | 193 | ```{r} 194 | tbl(mydb_dplyr, sql("SELECT count(*) FROM SN7577")) 195 | ``` 196 | 197 | The real advantage of using `dplyr` is that once we have stored the table as an object 198 | (here, `SN7577_d`), we can use `dplyr` functions instead of SQL statements. 199 | 200 | ```{r} 201 | # Store the table as an object 202 | SN7577_d <- tbl(mydb_dplyr, sql("SELECT * FROM SN7577")) 203 | 204 | # Explore the object 205 | head(SN7577_d, n = 10) 206 | nrow(SN7577_d) 207 | 208 | # Apply dplyr functions to the object 209 | SN7577_d %>% 210 | filter(numage > 60) %>% 211 | select(sex, age, numage) %>% 212 | group_by(sex, age) %>% 213 | summarize(avg_age = mean(numage)) 214 | ``` 215 | 216 | Notice that the `nrow()` command returns NA rather than a count of rows. This is because `dplyr` evaluates database queries lazily and does not pull the full table into R, even after the `SELECT * ...` query. 217 | 218 | If you need the row count, you can use: 219 | 220 | ```{r} 221 | SN7577_d %>% 222 | tally() 223 | ``` 224 | 225 | ::::::::::::::::::::::::::::::::::::::::: callout 226 | 227 | ## Exercise 228 | 229 | Store the SN7577 table as an object for `dplyr` use. 230 | 231 | Write a query using `dplyr` functions that will return the average age (`numage`) by sex for all records where 232 | the response for Q2 is missing (missing values are indicated by a value of -1).
233 | 234 | ::::::::::::::: solution 235 | 236 | ## Solution 237 | 238 | ```{r} 239 | SN7577_d <- tbl(mydb_dplyr, sql("SELECT * FROM SN7577")) 240 | 241 | SN7577_d %>% 242 | filter(Q2 == -1) %>% 243 | group_by(sex) %>% 244 | summarize(avg_age = mean(numage)) 245 | ``` 246 | 247 | ::::::::::::::::::::::::: 248 | 249 | :::::::::::::::::::::::::::::::::::::::::::::::::: 250 | 251 | {% include links.md %} 252 | 253 | 254 | -------------------------------------------------------------------------------- /episodes/00-intro.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Before we Start 3 | teaching: 25 4 | exercises: 15 5 | source: Rmd 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | source("data/download_data.R") 10 | ``` 11 | 12 | ::: instructor 13 | 14 | - The main goal here is to help the learners be comfortable with the RStudio 15 | interface. 16 | - Go very slowly in the "Getting set up" section. Make sure everyone is following 17 | along (remind learners to use the stickies). Plan with the helpers at this 18 | point to go around the room, and be available to help. It's important to make 19 | sure that learners are in the correct working directory, and that they create 20 | a `data` (all lowercase) subfolder. 21 | 22 | :::::::::::: 23 | 24 | ::::::::::::::::::::::::::::::::::::::: objectives 25 | 26 | - Install latest version of R. 27 | - Install latest version of RStudio. 28 | - Navigate the RStudio GUI. 29 | - Install additional packages using the packages tab. 30 | - Install additional packages using R code. 31 | 32 | :::::::::::::::::::::::::::::::::::::::::::::::::: 33 | 34 | :::::::::::::::::::::::::::::::::::::::: questions 35 | 36 | - How to find your way around RStudio? 37 | - How to interact with R? 38 | - How to manage your environment? 39 | - How to install packages? 40 | 41 | :::::::::::::::::::::::::::::::::::::::::::::::::: 42 | 43 | ## What is R? What is RStudio? 
44 | 45 | The term "`R`" is used to refer to both the programming language and the 46 | software that interprets the scripts written using it. 47 | 48 | [RStudio](https://rstudio.com) is currently a very popular way to not only write 49 | your R scripts but also to interact with the R software. To function correctly, 50 | RStudio needs R and therefore both need to be installed on your computer. 51 | 52 | To make it easier to interact with R, we will use RStudio. RStudio is the most 53 | popular IDE (Integrated Development Environment) for R. An IDE is a piece of 54 | software that provides 55 | tools to make programming easier. 56 | 57 | You can also use the R Presentations feature to present your work in an HTML5 58 | presentation mixing Markdown and R code. You can display these within RStudio 59 | or your browser. There are many options for customising your presentation slides, 60 | including an option for showing LaTeX equations. This can help you collaborate 61 | with others and also has an application in teaching and classroom use. 62 | 63 | ## Why learn R? 64 | 65 | ### R does not involve lots of pointing and clicking, and that's a good thing 66 | 67 | The learning curve might be steeper than with other software, but with R, the 68 | results of your analysis do not rely on remembering a succession of pointing 69 | and clicking, but instead on a series of written commands, and that's a good 70 | thing! So, if you want to redo your analysis because you collected more data, 71 | you don't have to remember which button you clicked in which order to obtain 72 | your results; you just have to run your script again. 73 | 74 | Working with scripts makes the steps you used in your analysis clear, and the 75 | code you write can be inspected by someone else who can give you feedback and 76 | spot mistakes.
77 | 78 | Working with scripts forces you to have a deeper understanding of what you are 79 | doing, and facilitates your learning and comprehension of the methods you use. 80 | 81 | ### R code is great for reproducibility 82 | 83 | Reproducibility is when someone else (including your future self) can obtain the 84 | same results from the same dataset when using the same analysis. 85 | 86 | R integrates with other tools to generate manuscripts from your code. If you 87 | collect more data, or fix a mistake in your dataset, the figures and the 88 | statistical tests in your manuscript are updated when you re-run your code. 89 | 90 | An increasing number of journals and funding agencies expect analyses to be 91 | reproducible, so knowing R will give you an edge with these requirements. 92 | 93 | To further support reproducibility and transparency, there are also packages 94 | that help you with dependency management: keeping track of which packages you 95 | are loading and which versions of them your code depends on. 96 | This helps you make sure existing workflows work consistently and continue 97 | doing what they did before. 98 | 99 | Packages like **`renv`** let you “save” and “load” the state of your project library, 100 | also keeping track of the package versions you use and the sources they can be 101 | retrieved from. 102 | 103 | ### R is interdisciplinary and extensible 104 | 105 | With 10,000+ packages that can be installed to extend its capabilities, R 106 | provides a framework that allows you to combine statistical approaches from many 107 | scientific disciplines to best suit the analytical framework you need to analyze 108 | your data. For instance, R has packages for image analysis, GIS, time series, 109 | population genetics, and a lot more. 110 | 111 | ### R works on data of all shapes and sizes 112 | 113 | The skills you learn with R scale easily with the size of your dataset.
Whether 114 | your dataset has hundreds or millions of lines, it won't make much difference to 115 | you. 116 | 117 | R is designed for data analysis. It comes with special data structures and data 118 | types that make handling missing data and statistical factors convenient. 119 | 120 | R can connect to spreadsheets, databases, and many other data formats, on your 121 | computer or on the web. 122 | 123 | ### R produces high-quality graphics 124 | 125 | R's plotting functionality is extensive and allows you to adjust almost any 126 | aspect of your graph to convey the message from your data most effectively. 127 | 128 | ### R has a large and welcoming community 129 | 130 | Thousands of people use R daily. Many of them are willing to help you through 131 | mailing lists and websites such as [Stack Overflow](https://stackoverflow.com/), 132 | or on the [RStudio community](https://community.rstudio.com/). Questions which 133 | are backed up with [short, reproducible code 134 | snippets](https://www.tidyverse.org/help/) are more likely to attract 135 | knowledgeable responses. 136 | 137 | ### Not only is R free, but it is also open-source and cross-platform 138 | 139 | Anyone can inspect the source code to see how R works. Because of this 140 | transparency, there is less chance for mistakes, and if you (or someone else) 141 | find some, you can report and fix bugs. 142 | 143 | Because R is open source and is supported by a large community of developers and 144 | users, there is a very large selection of third-party add-on packages which are 145 | freely available to extend R's native capabilities. 146 | 147 |

148 | 149 |

150 | 151 |

152 | 153 | ```{r rstudio-analogy, echo=FALSE, fig.show="hold", out.width="100%", fig.alt="RStudio extends what R can do, and makes it easier to write R code and interact with R."} 154 | knitr::include_graphics("fig/r-manual.jpeg") 155 | ``` 156 | 157 |

158 | 159 |

160 | 161 | ```{r rstudio-analogy-2, echo=FALSE, fig.show="hold", fig.alt="automatic car gear shift representing the ease of RStudio", out.width="100%"} 162 | knitr::include_graphics("fig/r-automatic.jpeg") 163 | ``` 164 | 165 |

166 | 167 |

168 | 169 |

170 | 171 | RStudio extends what R can do, and makes it easier to write R code and interact 172 | with R. Left photo credit; Right photo credit. 173 | 174 |

175 | 176 | 177 | 178 | ## A tour of RStudio 179 | 180 | ## Knowing your way around RStudio 181 | 182 | Let's start by learning about [RStudio](https://www.rstudio.com/), which is an 183 | Integrated Development Environment (IDE) for working with R. 184 | 185 | The RStudio IDE open-source product is free under the 186 | [Affero General Public License (AGPL) v3](https://www.gnu.org/licenses/agpl-3.0.en.html). 187 | The RStudio IDE is also available with a commercial license and priority email 188 | support from RStudio, Inc. 189 | 190 | We will use the RStudio IDE to write code, navigate the files on our computer, 191 | inspect the variables we create, and visualize the plots we generate. RStudio 192 | can also be used for other things (e.g., version control, developing packages, 193 | writing Shiny apps) that we will not cover during the workshop. 194 | 195 | One of the advantages of using RStudio is that all the information 196 | you need to write code is available in a single window. Additionally, RStudio 197 | provides many shortcuts, autocompletion, and highlighting for the major file 198 | types you use while developing in R. RStudio makes typing easier and less 199 | error-prone. 200 | 201 | ## Getting set up 202 | 203 | It is good practice to keep a set of related data, analyses, and text 204 | self-contained in a single folder called the **working directory**. All of the 205 | scripts within this folder can then use *relative paths* to files. Relative 206 | paths indicate where inside the project a file is located (as opposed to 207 | absolute paths, which point to where a file is on a specific computer). Working 208 | this way makes it a lot easier to move your project around on your computer and 209 | share it with others without having to directly modify file paths in the 210 | individual scripts. 
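To see the difference between the two kinds of path, here is a small sketch using base R's `file.path()`, which joins folder and file names with `/`. The file name `SAFI_clean.csv` is the dataset downloaded later in this episode; the file does not need to exist for the paths to be built, and the variable names are illustrative only.

```{r}
# Relative path: interpreted from the working directory, so it works on
# any computer that has the same project folder layout
relative_path <- file.path("data", "SAFI_clean.csv")
relative_path  # "data/SAFI_clean.csv"

# Absolute path: includes the full location of the working directory,
# so its value differs from one computer (and folder) to another
absolute_path <- file.path(getwd(), "data", "SAFI_clean.csv")
absolute_path
```

Only the relative version can be shared unchanged with a collaborator whose copy of the project lives in a different folder.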
211 | 212 | RStudio provides a helpful set of tools to do this through its "Projects" 213 | interface, which not only creates a working directory for you but also remembers 214 | its location (allowing you to quickly navigate to it). The interface also 215 | (optionally) preserves custom settings and open files to make it easier to 216 | resume work after a break. 217 | 218 | ### Create a new project 219 | 220 | - Under the `File` menu, click on `New project`, choose `New directory`, then 221 | `New project` 222 | - Enter a name for this new folder (or "directory") and choose a convenient 223 | location for it. This will be your **working directory** for the rest of the 224 | day (e.g., `~/data-carpentry`) 225 | - Click on `Create project` 226 | - Create a new file where we will type our scripts. Go to File > New File > R 227 | script. Click the save icon on your toolbar and save your script as 228 | "`script.R`". 229 | 230 | The simplest way to open an RStudio project once it has been created is to 231 | navigate through your files to where the project was saved and double 232 | click on the `.Rproj` (blue cube) file. This will open RStudio and start your R 233 | session in the **same** directory as the `.Rproj` file. All paths to your data, plots and 234 | scripts can now be written relative to the project directory. RStudio projects have the 235 | added benefit of allowing you to open multiple projects at the same time, each 236 | in its own project directory. This allows you to keep multiple projects 237 | open without them interfering with each other. 238 | 239 | ### The RStudio Interface 240 | 241 | Let's take a quick tour of RStudio. 242 | 243 | ![](fig/R_00_Rstudio_01.png){alt='Screenshot of the RStudio startup screen'} 244 | 245 | RStudio is divided into four "panes". The placement of these 246 | panes and their content can be customized (see menu, Tools -> Global Options -> 247 | Pane Layout).
248 | 249 | The default layout is: 250 | 251 | - Top Left - **Source**: your scripts and documents 252 | - Bottom Left - **Console**: what R would look and be like without RStudio 253 | - Top Right - **Environment/History**: look here to see what you have done 254 | - Bottom Right - **Files** and more: see the contents of the project/working 255 | directory here, like your `script.R` file 256 | 257 | ### Organizing your working directory 258 | 259 | Using a consistent folder structure across your projects will help keep things 260 | organized and make it easy to find/file things in the future. This 261 | can be especially helpful when you have multiple projects. In general, you might 262 | create directories (folders) for **scripts**, **data**, and **documents**. Here 263 | are some examples of suggested directories: 264 | 265 | - **`data/`** Use this folder to store your raw data and intermediate datasets. 266 | For the sake of transparency and 267 | [provenance](https://en.wikipedia.org/wiki/Provenance), you 268 | should *always* keep a copy of your raw data accessible and do as much of 269 | your data cleanup and preprocessing programmatically (i.e., with scripts, 270 | rather than manually) as possible. 271 | - **`data_output/`** When you need to modify your raw data, 272 | it might be useful to store the modified versions of the datasets in a 273 | different folder. 274 | - **`documents/`** Used for outlines, drafts, and other 275 | text. 276 | - **`fig_output/`** This folder can store the graphics that are generated 277 | by your scripts. 278 | - **`scripts/`** A place to keep your R scripts for 279 | different analyses or plotting. 280 | 281 | You may want additional directories or subdirectories depending on your project 282 | needs, but these should form the backbone of your working directory.
283 | 284 | ![](fig/rstudio_project_files.jpeg){alt='Example of a working directory structure'} 285 | 286 | ### The working directory 287 | 288 | The working directory is an important concept to understand. It is the place 289 | where R will look for and save files. When you write code for your project, your 290 | scripts should refer to files in relation to the root of your working directory 291 | and only to files within this structure. 292 | 293 | Using RStudio projects makes this easy and ensures that your working directory 294 | is set up properly. If you need to check it, you can use `getwd()`. If for some 295 | reason your working directory is not the same as the location of your RStudio 296 | project, it is likely that you opened an R script or R Markdown file rather than your 297 | `.Rproj` file. You should close RStudio and open the `.Rproj` file by 298 | double clicking on the blue cube! If you ever need to modify your working 299 | directory in a script, `setwd('my/path')` changes the working directory. This 300 | should be used with caution since it makes analyses hard to share across devices 301 | and with other users. 302 | 303 | ### Downloading the data and getting set up 304 | 305 | For this lesson we will use the following folders in our working directory: 306 | **`data/`**, **`data_output/`** and **`fig_output/`**. Let's write them all in 307 | lowercase to be consistent. We can create them using the RStudio interface by 308 | clicking on the "New Folder" button in the file pane (bottom right), or directly 309 | from R by typing at the console: 310 | 311 | ```{r create-dirs, eval=FALSE} 312 | dir.create("data") 313 | dir.create("data_output") 314 | dir.create("fig_output") 315 | ``` 316 | 317 | You can either download the data used for this lesson from GitHub or with R.
318 | You can copy the data from this [GitHub link](https://github.com/datacarpentry/r-socialsci/blob/main/episodes/data/SAFI_clean.csv) 319 | and paste it into a file called `SAFI_clean.csv` in the `data/` directory you just created. 320 | Or you can do this directly from R by copying and pasting this into your R console 321 | (your instructor can place this chunk of code in the Etherpad): 322 | 323 | ```{r download-data, eval=FALSE} 324 | download.file( 325 | "https://raw.githubusercontent.com/datacarpentry/r-socialsci/main/episodes/data/SAFI_clean.csv", 326 | "data/SAFI_clean.csv", mode = "wb" 327 | ) 328 | ``` 329 | 330 | ## Interacting with R 331 | 332 | The basis of programming is that we write down instructions for the computer to 333 | follow, and then we tell the computer to follow those instructions. We write, or 334 | *code*, instructions in R because it is a common language that both the computer 335 | and we can understand. We call the instructions *commands* and we tell the 336 | computer to follow the instructions by *executing* (also called *running*) those 337 | commands. 338 | 339 | There are two main ways of interacting with R: by using the console or by using 340 | script files (plain text files that contain your code). The console pane (in 341 | RStudio, the bottom left panel) is the place where commands written in the R 342 | language can be typed and executed immediately by the computer. It is also where 343 | the results will be shown for commands that have been executed. You can type 344 | commands directly into the console and press Enter to execute those 345 | commands, but they will be forgotten when you close the session. 346 | 347 | Because we want our code and workflow to be reproducible, it is better to type 348 | the commands we want in the script editor and save the script. This way, there 349 | is a complete record of what we did, and anyone (including our future selves!) 350 | can easily replicate the results on their computer.
351 | 352 | RStudio allows you to execute commands directly from the script editor by using 353 | the Ctrl + Enter shortcut (on Mac, Cmd + 354 | Return will work). The command on the current line in the 355 | script (indicated by the cursor) or all of the commands in the 356 | selected text will be sent to the console and executed when you press 357 | Ctrl + Enter. If there is information in the console 358 | you do not need anymore, you can clear it with Ctrl + L. 359 | You can find other keyboard shortcuts in this 360 | [RStudio cheatsheet about the RStudio IDE](https://raw.githubusercontent.com/rstudio/cheatsheets/main/rstudio-ide.pdf). 361 | 362 | At some point in your analysis, you may want to check the content of a variable 363 | or the structure of an object without necessarily keeping a record of it in 364 | your script. You can type these commands and execute them directly in the 365 | console. RStudio provides the Ctrl + 1 and 366 | Ctrl + 2 shortcuts to jump between the 367 | script and the console panes. 368 | 369 | If R is ready to accept commands, the R console shows a `>` prompt. If R 370 | receives a command (typed, copy-pasted, or sent from the script editor using 371 | Ctrl + Enter), R will try to execute it and, when 372 | ready, will show the results and come back with a new `>` prompt to wait for new 373 | commands. 374 | 375 | If R is still waiting for you to enter more text, 376 | the console will show a `+` prompt. It means that you haven't finished entering 377 | a complete command. This is likely because you have not 'closed' a parenthesis or 378 | quotation, i.e. you don't have the same number of left-parentheses as 379 | right-parentheses or the same number of opening and closing quotation marks. 380 | When this happens, and you thought you had finished typing your command, click 381 | inside the console window and press Esc; this will cancel the 382 | incomplete command and return you to the `>` prompt.
You can then proofread 383 | the command(s) you entered and correct the error. 384 | 385 | ## Installing additional packages using the packages tab 386 | 387 | In addition to the core R installation, there are in excess of 388 | 10,000 additional packages which can be used to extend the 389 | functionality of R. Many of these have been written by R users and 390 | have been made available in central repositories, like the one 391 | hosted at CRAN, for anyone to download and install into their own R 392 | environment. You should have already installed the packages 'ggplot2' 393 | and 'dplyr'. If you have not, please do so now using these instructions. 394 | 395 | You can see if you have a package installed by looking in the Packages tab 396 | (on the lower-right by default). You can also type the command 397 | `installed.packages()` into the console and examine the output. 398 | 399 | ![](fig/packages_pane.png){alt='Screenshot of Packages pane'} 400 | 401 | Additional packages can be installed from the 'Packages' tab. 402 | On the Packages tab, click the 'Install' icon and start typing the 403 | name of the package you want in the text box. As you type, packages 404 | matching your starting characters will be displayed in a drop-down 405 | list so that you can select them. 406 | 407 | ![](fig/R_00_Rstudio_03.png){alt='Screenshot of Install Packages Window'} 408 | 409 | At the bottom of the Install Packages window is a check box to 410 | 'Install dependencies'. This is ticked by default, which is usually 411 | what you want. Packages can (and do) make use of functionality 412 | built into other packages, so for the functionality contained in 413 | the package you are installing to work properly, there may be other 414 | packages which have to be installed with it. The 'Install 415 | dependencies' option makes sure that this happens.
416 | 417 | ::::::::::::::::::::::::::::::::::::::: challenge 418 | 419 | ## Exercise 420 | 421 | Use both the Console and the Packages tab to confirm that you have the tidyverse 422 | installed. 423 | 424 | ::::::::::::::: solution 425 | 426 | ## Solution 427 | 428 | Scroll down the Packages tab to 'tidyverse'. You can also type a few 429 | characters into the search box. In the console, look for 'tidyverse' in the output of `installed.packages()`. 430 | The 'tidyverse' package is really a package of packages, including 431 | 'ggplot2' and 'dplyr', both of which require other packages to run correctly. 432 | All of these packages will be installed automatically. Depending on what 433 | packages have previously been installed in your R environment, the install of 434 | 'tidyverse' could be very quick or could take several minutes. As the install 435 | proceeds, messages relating to its progress will be written to the console. 436 | You will be able to see all of the packages which are actually being 437 | installed. 438 | 439 | ::::::::::::::::::::::::: 440 | 441 | :::::::::::::::::::::::::::::::::::::::::::::::::: 442 | 443 | Because the install process accesses the CRAN repository, you 444 | will need an Internet connection to install packages. 445 | 446 | It is also possible to install packages from other repositories, as 447 | well as GitHub or the local file system, but we won't be looking at these options in this lesson. 448 | 449 | ## Installing additional packages using R code 450 | 451 | If you were watching the console window when you started the 452 | install of 'tidyverse', you may have noticed that the line 453 | 454 | ```{r, eval=FALSE} 455 | install.packages("tidyverse") 456 | ``` 457 | 458 | was written to the console before the start of the installation messages. 459 | 460 | You could also have installed the **`tidyverse`** packages by running this command directly in the R console. 461 | 462 | We will be using another package called **`here`** throughout the workshop to manage paths and directories.
We will discuss it in more detail in a later episode, but we will install it now in the console: 463 | 464 | ```{r, eval=FALSE} 465 | install.packages("here") 466 | ``` 467 | 468 | :::::::::::::::::::::::::::::::::::::::: keypoints 469 | 470 | - Use RStudio to write and run R programs. 471 | - Use `install.packages()` to install packages (libraries). 472 | 473 | :::::::::::::::::::::::::::::::::::::::::::::::::: 474 | 475 | 476 | -------------------------------------------------------------------------------- /episodes/03-dplyr.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Data Wrangling with dplyr 3 | teaching: 25 4 | exercises: 15 5 | source: Rmd 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | source("data/download_data.R") 10 | ``` 11 | :::: instructor 12 | 13 | - This lesson works better if you have graphics demonstrating dplyr commands. 14 | You can modify [this Google Slides deck](https://docs.google.com/presentation/d/1A9abypFdFp8urAe9z7GCMjFr4aPeIb8mZAtJA2F7H0w/edit#slide=id.g652714585f_0_114) and use it for your workshop. 15 | - For this lesson, make sure that learners are comfortable using pipes. 16 | - There is also sometimes some confusion on what the arguments of `group_by` 17 | should be, and when to use `filter()` and `select()`. 18 | 19 | :::::::::::: 20 | 21 | 22 | ::::::::::::::::::::::::::::::::::::::: objectives 23 | 24 | - Describe the purpose of an R package and the **`dplyr`** package. 25 | - Select certain columns in a dataframe with the **`dplyr`** function `select`. 26 | - Select certain rows in a dataframe according to filtering conditions with the **`dplyr`** function `filter`. 27 | - Link the output of one **`dplyr`** function to the input of another function with the 'pipe' operator `%>%`. 28 | - Add new columns to a dataframe that are functions of existing columns with `mutate`. 29 | - Use the split-apply-combine concept for data analysis.
30 | - Use `summarize`, `group_by`, and `count` to split a dataframe into groups of observations, apply summary statistics to each group, and then combine the results. 31 | 32 | :::::::::::::::::::::::::::::::::::::::::::::::::: 33 | 34 | :::::::::::::::::::::::::::::::::::::::: questions 35 | 36 | - How can I select specific rows and/or columns from a dataframe? 37 | - How can I combine multiple commands into a single command? 38 | - How can I create new columns or remove existing columns from a dataframe? 39 | 40 | :::::::::::::::::::::::::::::::::::::::::::::::::: 41 | 42 | **`dplyr`** is a package for making tabular data wrangling easier by using a 43 | limited set of functions that can be combined to extract and summarize insights 44 | from your data. 45 | 46 | Like **`readr`**, **`dplyr`** is a part of the tidyverse. These packages were loaded 47 | into R's memory when we called `library(tidyverse)` earlier. 48 | 49 | ::::::::::::::::::::::::::::::::::::::::: callout 50 | 51 | ## Note 52 | 53 | The packages in the tidyverse, such as **`dplyr`**, **`tidyr`** and **`ggplot2`**, 54 | accept both the British (e.g. *summarise*) and American (e.g. *summarize*) spelling 55 | variants of different function and option names. For this lesson, we use 56 | the American spellings of different functions; however, feel free to use 57 | the regional variant for where you are teaching. 58 | 59 | :::::::::::::::::::::::::::::::::::::::::::::::::: 60 | 61 | ## What is an R package? 62 | 63 | The package **`dplyr`** provides easy tools for the most common data 64 | wrangling tasks. It is built to work directly with dataframes, with many 65 | common tasks optimized by being written in a compiled language (C++); not all R 66 | packages are written in R!
67 | 68 | There are also packages available for a wide range of tasks, including building plots 69 | (**`ggplot2`**, which we'll see later), downloading data from the NCBI database, or 70 | performing statistical analysis on your data set. Many packages such as these are 71 | housed on, and downloadable from, the **C**omprehensive **R** **A**rchive **N**etwork 72 | (CRAN) using `install.packages()`. Once installed, a package can be loaded into your R 73 | session with the command `library()`, as you did with `tidyverse` earlier. 74 | 75 | To easily access the documentation for a package within R or RStudio, use 76 | `help(package = "package_name")`. 77 | 78 | To learn more about **`dplyr`** after the workshop, you may want to check out this 79 | [handy data transformation with **`dplyr`** cheatsheet](https://github.com/rstudio/cheatsheets/blob/main/data-transformation.pdf). 80 | 81 | ::::::::::::::::::::::::::::::::::::::::: callout 82 | 83 | ## Note 84 | 85 | There are alternatives to the `tidyverse` packages for data wrangling, including 86 | the package [`data.table`](https://rdatatable.gitlab.io/data.table/). See this 87 | [comparison](https://mgimond.github.io/rug_2019_12/Index.html) 88 | for an example of the differences between using `base`, `tidyverse`, and 89 | `data.table`. 90 | 91 | :::::::::::::::::::::::::::::::::::::::::::::::::: 92 | 93 | ## Learning **`dplyr`** 94 | 95 | To make sure everyone will use the same dataset for this lesson, we'll read in 96 | the SAFI dataset that we downloaded earlier once more.
97 | 98 | ```{r, results="hide", purl=FALSE, message=FALSE} 99 | 100 | ## load the tidyverse 101 | library(tidyverse) 102 | library(here) 103 | 104 | interviews <- read_csv(here("data", "SAFI_clean.csv"), na = "NULL") 105 | 106 | ## inspect the data 107 | interviews 108 | 109 | ## preview the data 110 | # view(interviews) 111 | ``` 112 | 113 | We're going to learn some of the most common **`dplyr`** functions: 114 | 115 | - `select()`: subset columns 116 | - `filter()`: subset rows on conditions 117 | - `mutate()`: create new columns by using information from other columns 118 | - `group_by()` and `summarize()`: create summary statistics on grouped data 119 | - `arrange()`: sort results 120 | - `count()`: count discrete values 121 | 122 | ## Selecting columns and filtering rows 123 | 124 | To select columns of a dataframe, use `select()`. The first argument to this 125 | function is the dataframe (`interviews`), and the subsequent arguments are the 126 | columns to keep, separated by commas. Alternatively, if you are selecting 127 | columns adjacent to each other, you can use a `:` to select a range of columns, 128 | read as "select columns from \_\_\_ to \_\_\_." You may have done something similar in 129 | the past using subsetting. `select()` is essentially doing the same thing as 130 | subsetting, using a package (`dplyr`) instead of R's base functions. 131 | 132 | ```{r, results="hide", purl=FALSE} 133 | # to select columns throughout the dataframe 134 | select(interviews, village, no_membrs, months_lack_food) 135 | # to do the same thing with subsetting 136 | interviews[c("village","no_membrs","months_lack_food")] 137 | # to select a series of connected columns 138 | select(interviews, village:respondent_wall_type) 139 | ``` 140 | 141 | To choose rows based on specific criteria, we can use the `filter()` function. 142 | The argument after the dataframe is the condition we want our final 143 | dataframe to adhere to (e.g. 
village name is Chirodzo): 144 | 145 | ```{r, purl=FALSE} 146 | # filters observations where village name is "Chirodzo" 147 | filter(interviews, village == "Chirodzo") 148 | ``` 149 | 150 | You may also have noticed that the output from these calls doesn't run off the 151 | screen anymore. It's one of the advantages of `tbl_df` (also called tibble), 152 | the central data class in the tidyverse, compared to normal dataframes in R. 153 | 154 | We can also specify multiple conditions within the `filter()` function. We can 155 | combine conditions using either "and" or "or" statements. In an "and" 156 | statement, an observation (row) must meet **every** criterion to be included 157 | in the resulting dataframe. To form "and" statements within dplyr, we can pass 158 | our desired conditions as arguments in the `filter()` function, separated by 159 | commas: 160 | 161 | ```{r, purl=FALSE} 162 | 163 | # filters observations with "and" operator (comma) 164 | # output dataframe satisfies ALL specified conditions 165 | filter(interviews, village == "Chirodzo", 166 | rooms > 1, 167 | no_meals > 2) 168 | ``` 169 | 170 | We can also form "and" statements with the `&` operator instead of commas: 171 | 172 | ```{r, purl=FALSE} 173 | # filters observations with "&" logical operator 174 | # output dataframe satisfies ALL specified conditions 175 | filter(interviews, village == "Chirodzo" & 176 | rooms > 1 & 177 | no_meals > 2) 178 | ``` 179 | 180 | In an "or" statement, observations must meet *at least one* of the specified conditions. 181 | To form "or" statements we use the logical operator for "or," which is the vertical bar (`|`): 182 | 183 | ```{r, purl=FALSE} 184 | # filters observations with "|" logical operator 185 | # output dataframe satisfies AT LEAST ONE of the specified conditions 186 | filter(interviews, village == "Chirodzo" | village == "Ruaca") 187 | ``` 188 | 189 | ## Pipes 190 | 191 | What if you want to select and filter at the same time?
There are three 192 | ways to do this: use intermediate steps, nested functions, or pipes. 193 | 194 | With intermediate steps, you create a temporary dataframe and use 195 | that as input to the next function, like this: 196 | 197 | ```{r, purl=FALSE} 198 | interviews2 <- filter(interviews, village == "Chirodzo") 199 | interviews_ch <- select(interviews2, village:respondent_wall_type) 200 | ``` 201 | 202 | This is readable, but can clutter up your workspace with lots of objects that 203 | you have to name individually. With multiple steps, that can be hard to keep 204 | track of. 205 | 206 | You can also nest functions (i.e. one function inside of another), like this: 207 | 208 | ```{r, purl=FALSE} 209 | interviews_ch <- select(filter(interviews, village == "Chirodzo"), 210 | village:respondent_wall_type) 211 | ``` 212 | 213 | This is handy, but can be difficult to read if too many functions are nested, as 214 | R evaluates the expression from the inside out (in this case, filtering, then 215 | selecting). 216 | 217 | The last option, *pipes*, is a relatively recent addition to R. Pipes let you take the 218 | output of one function and send it directly to the next, which is useful when 219 | you need to do many things to the same dataset. There are two pipes in R: 1) `%>%` (called the magrittr pipe; made available via the **`magrittr`** package, installed automatically with 220 | **`dplyr`**) and 2) `|>` (called the native R pipe, built into R from version 4.1.0 onwards). By and large, the two pipes work the same way, with a few differences (for more information, see https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/). You can choose which pipe RStudio inserts in its Global Options; once that is set, you can type the pipe with: 221 | 222 | - Ctrl + Shift + M if you have a PC or Cmd + 223 | Shift + M if you have a Mac.
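One of the differences between the two pipes is how each refers to its left-hand side when it should go somewhere other than the first argument. The sketch below assumes R 4.2.0 or later (the version that added the `_` placeholder to the native pipe) and pipes a string into the `x` argument of base R's `gsub()` with each pipe:

```{r, purl=FALSE}
library(magrittr)  # provides %>% (also available once dplyr/tidyverse is loaded)

# The magrittr pipe uses `.` as a placeholder for the left-hand side,
# so the string can be sent to an argument other than the first one
"a-b-c" %>% gsub("-", "_", x = .)

# The native pipe uses `_` instead, which must be passed to a named
# argument (requires R 4.2.0 or later)
"a-b-c" |> gsub("-", "_", x = _)
```

Both lines return `"a_b_c"`. In everyday **`dplyr`** code the dataframe is the first argument, so you rarely need either placeholder.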
224 | 225 | ```{r, purl=FALSE} 226 | # this example uses the magrittr pipe; the native pipe (commented out below) gives the same output 227 | interviews %>% 228 | filter(village == "Chirodzo") %>% 229 | select(village:respondent_wall_type) 230 | 231 | # interviews |> 232 | # filter(village == "Chirodzo") |> 233 | # select(village:respondent_wall_type) 234 | ``` 235 | 236 | In the above code, we use the pipe to send the `interviews` dataset first 237 | through `filter()` to keep rows where `village` is "Chirodzo", then through 238 | `select()` to keep only the columns from `village` to `respondent_wall_type`. Since the pipe 239 | takes the object on its left and passes it as the first argument to the function 240 | on its right, we don't need to explicitly include the dataframe as an argument 241 | to the `filter()` and `select()` functions anymore. 242 | 243 | Some may find it helpful to read the pipe like the word "then". For instance, 244 | in the above example, we take the dataframe `interviews`, *then* we `filter` 245 | for rows with `village == "Chirodzo"`, *then* we `select` columns `village:respondent_wall_type`. 246 | The **`dplyr`** functions by themselves are somewhat simple, 247 | but by combining them into linear workflows with the pipe, we can accomplish 248 | more complex data wrangling operations. 249 | 250 | If we want to create a new object with this smaller version of the data, we 251 | can assign it a new name: 252 | 253 | ```{r, purl=FALSE} 254 | interviews_ch <- interviews %>% 255 | filter(village == "Chirodzo") %>% 256 | select(village:respondent_wall_type) 257 | 258 | interviews_ch 259 | 260 | ``` 261 | 262 | Note that the final dataframe (`interviews_ch`) is the leftmost part of this 263 | expression.
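To check that intermediate steps, nested functions, and pipes really produce the same result, here is a small sketch on a made-up tibble (`toy` and its values are hypothetical stand-ins for `interviews`, not SAFI data):

```{r, purl=FALSE}
library(dplyr)

# Hypothetical toy data standing in for `interviews`
toy <- tibble(
  village = c("Chirodzo", "God", "Chirodzo", "Ruaca"),
  no_membrs = c(3, 7, 10, 5),
  rooms = c(1, 1, 2, 3)
)

# 1. intermediate steps: a temporary object feeds the next call
toy2 <- filter(toy, village == "Chirodzo")
res_steps <- select(toy2, village:no_membrs)

# 2. nested functions: evaluated from the inside out
res_nested <- select(filter(toy, village == "Chirodzo"), village:no_membrs)

# 3. pipes: take `toy`, then filter, then select
res_piped <- toy %>%
  filter(village == "Chirodzo") %>%
  select(village:no_membrs)

identical(res_steps, res_piped)
identical(res_nested, res_piped)
```

Both comparisons return `TRUE`; the three styles differ only in readability and in how many intermediate objects they leave in your workspace.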
264 | 265 | ::::::::::::::::::::::::::::::::::::::: challenge 266 | 267 | ## Exercise 268 | 269 | Using pipes, subset the `interviews` data to include interviews 270 | where respondents were members of an irrigation association 271 | (`memb_assoc`) and retain only the columns `affect_conflicts`, 272 | `liv_count`, and `no_meals`. 273 | 274 | ::::::::::::::: solution 275 | 276 | ## Solution 277 | 278 | ```{r} 279 | interviews %>% 280 | filter(memb_assoc == "yes") %>% 281 | select(affect_conflicts, liv_count, no_meals) 282 | ``` 283 | 284 | ::::::::::::::::::::::::: 285 | 286 | :::::::::::::::::::::::::::::::::::::::::::::::::: 287 | 288 | ## Mutate 289 | 290 | Frequently you'll want to create new columns based on the values in existing 291 | columns, for example to do unit conversions, or to find the ratio of values in 292 | two columns. For this we'll use `mutate()`. 293 | 294 | We might be interested in the ratio of the number of household members 295 | to rooms used for sleeping (i.e. the average number of people per room): 296 | 297 | ```{r, purl=FALSE} 298 | interviews %>% 299 | mutate(people_per_room = no_membrs / rooms) 300 | ``` 301 | 302 | We may be interested in investigating whether being a member of an 303 | irrigation association had any effect on the ratio of household members 304 | to rooms. To look at this relationship, we will first remove 305 | data from our dataset where the respondent didn't answer the 306 | question of whether they were a member of an irrigation association. 307 | These cases are recorded as "NULL" in the CSV file; because we read it with `na = "NULL"`, they appear as `NA` in R. 308 | 309 | To remove these cases, we could insert a `filter()` in the chain: 310 | 311 | ```{r, purl=FALSE} 312 | interviews %>% 313 | filter(!is.na(memb_assoc)) %>% 314 | mutate(people_per_room = no_membrs / rooms) 315 | ``` 316 | 317 | The `!` symbol negates the result of the `is.na()` function.
Thus, if `is.na()` 318 | returns `TRUE` (because `memb_assoc` is missing), the `!` symbol 319 | turns this into `FALSE`, so `filter()` keeps only the rows where `memb_assoc` **is 320 | not** missing. 321 | 322 | ::::::::::::::::::::::::::::::::::::::: challenge 323 | 324 | ## Exercise 325 | 326 | Create a new dataframe from the `interviews` data that meets the following 327 | criteria: contains only the `village` column and a new column called 328 | `total_meals` containing a value that is equal to the total number of meals 329 | served in the household per day on average (`no_membrs` times `no_meals`). 330 | Only the rows where `total_meals` is greater than 20 should be shown in the 331 | final dataframe. 332 | 333 | **Hint**: think about how the commands should be ordered to produce this data 334 | frame! 335 | 336 | ::::::::::::::: solution 337 | 338 | ## Solution 339 | 340 | ```{r} 341 | interviews_total_meals <- interviews %>% 342 | mutate(total_meals = no_membrs * no_meals) %>% 343 | filter(total_meals > 20) %>% 344 | select(village, total_meals) 345 | ``` 346 | 347 | ::::::::::::::::::::::::: 348 | 349 | :::::::::::::::::::::::::::::::::::::::::::::::::: 350 | 351 | ## Split-apply-combine data analysis and the summarize() function 352 | 353 | Many data analysis tasks can be approached using the *split-apply-combine* 354 | paradigm: split the data into groups, apply some analysis to each group, and 355 | then combine the results. **`dplyr`** makes this very easy through the use of 356 | the `group_by()` function. 357 | 358 | ### The `summarize()` function 359 | 360 | `group_by()` is often used together with `summarize()`, which collapses each 361 | group into a single-row summary of that group. `group_by()` takes as arguments 362 | the column names that contain the **categorical** variables for which you want 363 | to calculate the summary statistics.
So to compute the average household size by 364 | village: 365 | 366 | ```{r, purl=FALSE} 367 | interviews %>% 368 | group_by(village) %>% 369 | summarize(mean_no_membrs = mean(no_membrs)) 370 | ``` 371 | 372 | You can also group by multiple columns: 373 | 374 | ```{r, purl=FALSE} 375 | interviews %>% 376 | group_by(village, memb_assoc) %>% 377 | summarize(mean_no_membrs = mean(no_membrs)) 378 | ``` 379 | 380 | Note that the output is a grouped tibble of nine rows by three columns, 381 | as indicated by the first two lines starting with `#`. 382 | To obtain an ungrouped tibble, use the 383 | `ungroup()` function: 384 | 385 | ```{r, purl=FALSE} 386 | interviews %>% 387 | group_by(village, memb_assoc) %>% 388 | summarize(mean_no_membrs = mean(no_membrs)) %>% 389 | ungroup() 390 | ``` 391 | 392 | Notice that the second line starting with `#`, which previously indicated the grouping, has 393 | disappeared, and we now have a 9x3 tibble without grouping. 394 | When grouping by both `village` and `memb_assoc`, we see rows in our table for 395 | respondents who did not specify whether they were a member of an irrigation 396 | association. We can exclude those data from our table using a filter step. 397 | 398 | ```{r, purl=FALSE} 399 | interviews %>% 400 | filter(!is.na(memb_assoc)) %>% 401 | group_by(village, memb_assoc) %>% 402 | summarize(mean_no_membrs = mean(no_membrs)) 403 | ``` 404 | 405 | Once the data are grouped, you can also compute multiple summary statistics at the same 406 | time (and not necessarily on the same variable).
For instance, we could add a 407 | column indicating the minimum household size for each village for each group 408 | (members of an irrigation association vs not): 409 | 410 | ```{r, purl=FALSE} 411 | interviews %>% 412 | filter(!is.na(memb_assoc)) %>% 413 | group_by(village, memb_assoc) %>% 414 | summarize(mean_no_membrs = mean(no_membrs), 415 | min_membrs = min(no_membrs)) 416 | ``` 417 | 418 | It is sometimes useful to rearrange the result of a query to inspect the values. 419 | For instance, we can sort on `min_membrs` to put the group with the smallest 420 | household first: 421 | 422 | ```{r, purl=FALSE} 423 | interviews %>% 424 | filter(!is.na(memb_assoc)) %>% 425 | group_by(village, memb_assoc) %>% 426 | summarize(mean_no_membrs = mean(no_membrs), 427 | min_membrs = min(no_membrs)) %>% 428 | arrange(min_membrs) 429 | ``` 430 | 431 | To sort in descending order, we need to add the `desc()` function. If we want to 432 | sort the results by decreasing order of minimum household size: 433 | 434 | ```{r, purl=FALSE} 435 | interviews %>% 436 | filter(!is.na(memb_assoc)) %>% 437 | group_by(village, memb_assoc) %>% 438 | summarize(mean_no_membrs = mean(no_membrs), 439 | min_membrs = min(no_membrs)) %>% 440 | arrange(desc(min_membrs)) 441 | ``` 442 | 443 | ### Counting 444 | 445 | When working with data, we often want to know the number of observations found 446 | for each factor or combination of factors. For this task, **`dplyr`** provides 447 | `count()`. 
For example, if we wanted to count the number of rows of data for 448 | each village, we would do: 449 | 450 | ```{r, purl=FALSE} 451 | interviews %>% 452 | count(village) 453 | ``` 454 | 455 | For convenience, `count()` provides the `sort` argument to get results in 456 | decreasing order: 457 | 458 | ```{r, purl=FALSE} 459 | interviews %>% 460 | count(village, sort = TRUE) 461 | ``` 462 | 463 | ::::::::::::::::::::::::::::::::::::::: challenge 464 | 465 | ## Exercise 466 | 467 | How many households in the survey have an average of 468 | two meals per day? Three meals per day? Are there any other numbers 469 | of meals represented? 470 | 471 | ::::::::::::::: solution 472 | 473 | ## Solution 474 | 475 | ```{r} 476 | interviews %>% 477 | count(no_meals) 478 | ``` 479 | 480 | ::::::::::::::::::::::::: 481 | 482 | Use `group_by()` and `summarize()` to find the mean, min, and max 483 | number of household members for each village. Also add the number of 484 | observations (hint: see `?n`). 485 | 486 | ::::::::::::::: solution 487 | 488 | ## Solution 489 | 490 | ```{r} 491 | interviews %>% 492 | group_by(village) %>% 493 | summarize( 494 | mean_no_membrs = mean(no_membrs), 495 | min_no_membrs = min(no_membrs), 496 | max_no_membrs = max(no_membrs), 497 | n = n() 498 | ) 499 | ``` 500 | 501 | ::::::::::::::::::::::::: 502 | 503 | What was the largest household interviewed in each month? 
504 | 505 | ::::::::::::::: solution 506 | 507 | ## Solution 508 | 509 | ```{r} 510 | # if not already included, add month, year, and day columns 511 | library(lubridate) # load lubridate if not already loaded 512 | interviews %>% 513 | mutate(month = month(interview_date), 514 | day = day(interview_date), 515 | year = year(interview_date)) %>% 516 | group_by(year, month) %>% 517 | summarize(max_no_membrs = max(no_membrs)) 518 | ``` 519 | 520 | ::::::::::::::::::::::::: 521 | 522 | :::::::::::::::::::::::::::::::::::::::::::::::::: 523 | 524 | :::::::::::::::::::::::::::::::::::::::: keypoints 525 | 526 | - Use the `dplyr` package to manipulate dataframes. 527 | - Use `select()` to choose variables from a dataframe. 528 | - Use `filter()` to choose data based on values. 529 | - Use `group_by()` and `summarize()` to work with subsets of data. 530 | - Use `mutate()` to create new variables. 531 | 532 | :::::::::::::::::::::::::::::::::::::::::::::::::: 533 | 534 | 535 | -------------------------------------------------------------------------------- /episodes/04-tidyr.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Data Wrangling with tidyr 3 | teaching: 25 4 | exercises: 15 5 | source: Rmd 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | source("data/download_data.R") 10 | ``` 11 | 12 | ::::::::::::::::::::::::::::::::::::::: objectives 13 | 14 | - Describe the concept of a wide and a long table format and for which purpose those formats are useful. 15 | - Describe the roles of variable names and their associated values when a table is reshaped. 16 | - Reshape a dataframe from long to wide format and back with the `pivot_wider` and `pivot_longer` commands from the **`tidyr`** package. 17 | - Export a dataframe to a csv file. 18 | 19 | :::::::::::::::::::::::::::::::::::::::::::::::::: 20 | 21 | :::::::::::::::::::::::::::::::::::::::: questions 22 | 23 | - How can I reformat a data frame to meet my needs? 
24 | 25 | :::::::::::::::::::::::::::::::::::::::::::::::::: 26 | 27 | **`dplyr`** pairs nicely with **`tidyr`**, which enables you to swiftly 28 | convert between different data formats (long vs. wide) for plotting and analysis. 29 | To learn more about **`tidyr`** after the workshop, you may want to check out this 30 | [handy data tidying with **`tidyr`** 31 | cheatsheet](https://github.com/rstudio/cheatsheets/blob/main/tidyr.pdf). 32 | 33 | To make sure everyone is using the same dataset for this lesson, we'll read in 34 | the SAFI dataset that we downloaded earlier once again. 35 | 36 | ```{r, results="hide", purl=FALSE, message=FALSE} 37 | 38 | ## load the tidyverse 39 | library(tidyverse) 40 | library(here) 41 | 42 | interviews <- read_csv(here("data", "SAFI_clean.csv"), na = "NULL") 43 | 44 | ## inspect the data 45 | interviews 46 | 47 | ## preview the data 48 | # view(interviews) 49 | ``` 50 | 51 | ## Reshaping with pivot\_wider() and pivot\_longer() 52 | 53 | There are essentially three rules that define a "tidy" dataset: 54 | 55 | 1. Each variable has its own column 56 | 2. Each observation has its own row 57 | 3. Each value must have its own cell 58 | 59 | This graphic visually represents the three rules that define a "tidy" dataset: 60 | 61 | ![](fig/tidy-data-wickham.png) 62 | *R for Data Science*, Wickham H and Grolemund G ([https://r4ds.had.co.nz/index.html](https://r4ds.had.co.nz/index.html)) 63 | © Wickham, Grolemund 2017 64 | This image is licenced under Attribution-NonCommercial-NoDerivs 3.0 United States (CC-BY-NC-ND 3.0 US) 65 | 66 | In this section we will explore how these rules are linked to the different 67 | data formats researchers are often interested in: "wide" and "long". This 68 | tutorial will help you efficiently transform your data shape regardless of 69 | original format. First we will explore qualities of the `interviews` data and 70 | how they relate to these different types of data formats.
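As a small illustration of rules 2 and 3, the sketch below checks a made-up tibble (hypothetical values, not SAFI data) for unique interview IDs and for cells that pack several values separated by `;`:

```{r, purl=FALSE}
library(dplyr)
library(stringr)

# Hypothetical toy data: one row per interview, but `items_owned`
# packs several values into a single cell
toy <- tibble(
  key_ID = 1:3,
  village = c("God", "God", "Ruaca"),
  items_owned = c("bicycle;radio", "radio", NA)
)

tidy_check <- toy %>%
  summarize(
    # rule 2: each observation has its own row, so IDs should be unique
    one_row_per_interview = n_distinct(key_ID) == n(),
    # rule 3: each value has its own cell, so count cells holding several values
    cells_with_several_values = sum(str_detect(items_owned, ";"), na.rm = TRUE)
  )
tidy_check
```

Here `one_row_per_interview` is `TRUE`, but one cell breaks rule 3, which is the same situation as the `items_owned` column in `interviews`.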
71 | 72 | ### Long and wide data formats 73 | 74 | In the `interviews` data, each row contains the values of variables associated 75 | with each record collected (each interview in the villages). It is stated 76 | that the `key_ID` was "added to provide a unique Id for each observation" 77 | and the `instanceID` "does this as well but it is not as convenient to use." 78 | 79 | Once we have established that `key_ID` and `instanceID` are both unique, we can use 80 | either variable as an identifier corresponding to the 131 interview records. 81 | 82 | ```{r, purl=FALSE} 83 | interviews %>% 84 | select(key_ID) %>% 85 | distinct() %>% 86 | nrow() 87 | ``` 88 | 89 | As seen in the code below, for each interview date in each village no 90 | `instanceID`s are the same. Thus, this format is what is called a "long" data 91 | format, where each observation occupies only one row in the dataframe. 92 | 93 | ```{r, purl=FALSE} 94 | interviews %>% 95 | filter(village == "Chirodzo") %>% 96 | select(key_ID, village, interview_date, instanceID) %>% 97 | sample_n(size = 10) 98 | ``` 99 | 100 | We notice that the `interviews` data 101 | adheres to rules 1-3, where 102 | 103 | - each column is a variable 104 | - each row is an observation 105 | - each value has its own cell 106 | 107 | This is called a "long" data format. However, notice that each column represents 108 | a different variable. In the "longest" data format there would only be three 109 | columns, one for the id variable, one for the observed variable, and one for the 110 | observed value (of that variable). This data format is quite unsightly 111 | and difficult to work with, so you will rarely see it in use. 112 | 113 | Alternatively, in a "wide" data format we see modifications to rule 1, where 114 | each column no longer represents a single variable. Instead, columns can 115 | represent different levels/values of a variable.
For instance, in some data you 116 | encounter, the researchers may have chosen to make every survey date a 117 | different column. 118 | 119 | These may sound like dramatically different data layouts, but there are some 120 | tools that make transitions between these layouts much simpler than you might 121 | think! The gif below shows how these two formats relate to each other, and 122 | gives you an idea of how we can use R to shift from one format to the other. 123 | 124 | ![](fig/tidyr-pivot_wider_longer.gif) 125 | Long and wide dataframe layouts mainly affect readability. You may find that 126 | you visually prefer the "wide" format, since you can see more of the data on 127 | the screen. However, all of the R functions we have used thus far expect 128 | your data to be in a "long" data format. This is because the long format is more 129 | machine readable and is closer to the formatting of databases. 130 | 131 | ### Questions which warrant different data formats 132 | 133 | In `interviews`, each row contains the values of variables associated with each 134 | record (the unit), values such as the village of the respondent, the number 135 | of household members, or the type of wall their house had. This format allows 136 | us to make comparisons across individual surveys, but what if we wanted to 137 | look at differences in households grouped by different types of items owned? 138 | 139 | To facilitate this comparison we would need to create a new table where each row 140 | (the unit) consisted of values of variables associated with items owned 141 | (i.e., `items_owned`). In practical terms this means the values of 142 | the items in `items_owned` (e.g. bicycle, 143 | radio, table, etc.) would become the names of column variables and 144 | the cells would contain values of `TRUE` or `FALSE`, for whether that household 145 | had that item.
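The reshaping just described can be sketched on a made-up three-household tibble (hypothetical values, not SAFI data). The steps mirror the ones this episode applies to `interviews`; `separate_longer_delim()` assumes **`tidyr`** 1.3.0 or later:

```{r, purl=FALSE}
library(dplyr)
library(tidyr)

# Hypothetical toy data: three households and the items each one listed
toy <- tibble(
  key_ID = 1:3,
  items_owned = c("bicycle;radio", "radio", "table")
)

toy_items <- toy %>%
  separate_longer_delim(items_owned, delim = ";") %>%  # one row per listed item
  mutate(owned = TRUE) %>%                              # every listed item is owned
  pivot_wider(names_from = items_owned,                 # items become column names
              values_from = owned,
              values_fill = FALSE)                       # unlisted items get FALSE
toy_items
```

Each listed item becomes a column, and households that did not list an item get `FALSE` rather than `NA` because of `values_fill`.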
146 | 147 | Once we've created this new table, we can explore the relationship within and 148 | between villages. The key point here is that we are still following a tidy data 149 | structure, but we have **reshaped** the data according to the observations of 150 | interest. 151 | 152 | Alternatively, suppose the interview dates were spread across multiple columns, and 153 | we were interested in visualizing, within each village, how irrigation 154 | conflicts have changed over time. This would require the interview date to 155 | be included in a single column rather than spread across multiple columns. Thus, 156 | we would need to transform the column names into values of a variable. 157 | 158 | We can do both of these transformations with two `tidyr` functions, 159 | `pivot_wider()` and `pivot_longer()`. 160 | 161 | ## Pivoting wider 162 | 163 | `pivot_wider()` takes three principal arguments: 164 | 165 | 1. the data 166 | 2. the *names\_from* column variable whose values will become new column names. 167 | 3. the *values\_from* column variable whose values will fill the new column 168 | variables. 169 | 170 | Further arguments include `values_fill`, which, if set, fills in missing values 171 | with the value provided. 172 | 173 | Let's use `pivot_wider()` to transform `interviews` to create new columns for each 174 | item owned by a household. 175 | There are a couple of new concepts in this transformation, so let's walk through 176 | it line by line. First we create a new object (`interviews_items_owned`) based on 177 | the `interviews` data frame. 178 | 179 | ```{r, eval=FALSE} 180 | interviews_items_owned <- interviews %>% 181 | ``` 182 | 183 | Then we will actually need to make our data frame longer, because we have 184 | multiple items in a single cell. 185 | We will use a new function, `separate_longer_delim()`, from the **`tidyr`** package 186 | to separate the values of `items_owned` based on the presence of semi-colons (`;`).
187 | The values of this variable were multiple items separated by semi-colons, so 188 | this action creates a row for each item listed in a household's possession. 189 | Thus, we end up with a long format version of the dataset, with multiple rows 190 | for each respondent. For example, if a respondent has a television and a solar 191 | panel, that respondent will now have two rows, one with "television" and the 192 | other with "solar_panel" in the `items_owned` column. 193 | 194 | ```{r, eval=FALSE} 195 | separate_longer_delim(items_owned, delim = ";") %>% 196 | ``` 197 | 198 | After this transformation, you may notice that the `items_owned` column contains 199 | `NA` values. This is because some of the respondents did not own any of the items 200 | in the interviewer's list. We can use the `replace_na()` function to 201 | change these `NA` values to something more meaningful. The `replace_na()` function 202 | expects a named `list()` in which each name is a column whose `NA` values you want 203 | to replace and each value is the replacement to use. This 204 | ends up looking like this: 205 | 206 | ```{r, eval=FALSE} 207 | replace_na(list(items_owned = "no_listed_items")) %>% 208 | ``` 209 | 210 | Next, we create a new variable named `items_owned_logical`, which has one value 211 | (`TRUE`) for every row. This makes sense, since each item in every row was owned 212 | by that household. We are constructing this variable so that when we spread the 213 | `items_owned` across multiple columns, we can fill the values of those columns 214 | with logical values describing whether the household did (`TRUE`) or did not 215 | (`FALSE`) own that particular item. 216 | 217 | ```{r, eval=FALSE} 218 | mutate(items_owned_logical = TRUE) %>% 219 | ``` 220 | 221 | ![](fig/separate_longer.png){alt="Two tables shown side-by-side.
The first row 222 | of the left table is highlighted in blue, and the first four rows of the right 223 | table are also highlighted in blue to show how each of the values of 'items 224 | owned' is given its own row with the separate longer delim function. The 225 | 'items owned logical' column is highlighted in yellow on the right table to show 226 | how the mutate function adds a new column."} 227 | 228 | At this point, we can also count the number of items owned by each household, 229 | which is equivalent to the number of rows per `key_ID`. We can do this with a 230 | `group_by()` and `mutate()` pipeline that works similarly to `group_by()` and 231 | `summarize()` discussed in the previous episode, but instead of creating a 232 | summary table, we will add another column called `number_items`. We use the 233 | `n()` function to count the number of rows within each group. However, there is 234 | one difficulty we need to take into account, namely those households that did 235 | not list any items. These households now have `"no_listed_items"` under 236 | `items_owned`. We do not want to count this as an item but instead show zero 237 | items. We can accomplish this using **`dplyr`'s** `if_else()` function, which 238 | evaluates a condition and returns one value if true and another if false. Here, 239 | if the `items_owned` column is `"no_listed_items"`, then a 0 is returned; 240 | otherwise, the number of rows per group is returned using `n()`. 241 | 242 | ```{r, eval=FALSE} 243 | group_by(key_ID) %>% 244 | mutate(number_items = if_else(items_owned == "no_listed_items", 0, n())) %>% 245 | 246 | ``` 247 | 248 | Lastly, we use `pivot_wider()` to switch from long format to wide format. This 249 | creates a new column for each of the unique values in the `items_owned` column, 250 | and fills those columns with the values of `items_owned_logical`.
We also 251 | declare that for items that are missing, we want to fill those cells with the 252 | value of `FALSE` instead of `NA`. 253 | 254 | ```{r, eval=FALSE} 255 | pivot_wider(names_from = items_owned, 256 | values_from = items_owned_logical, 257 | values_fill = list(items_owned_logical = FALSE)) 258 | 259 | ``` 260 | 261 | ![](fig/pivot_wider.png){alt="Two tables shown side-by-side. The 'items owned' 262 | column is highlighted in blue on the left table, and the column names are 263 | highlighted in blue on the right table to show how the values of the 'items 264 | owned' become the column names in the output of the pivot wider function. The 265 | 'items owned logical' column is highlighted in yellow on the left table, and the 266 | values of the bicycle, television, and solar panel columns are highlighted in 267 | yellow on the right table to show how the values of the 'items owned logical' 268 | column became the values of all three of the aforementioned columns."} 269 | 270 | Combining the above steps, the chunk looks like this. Note that two new columns 271 | are created within the same `mutate()` call. 272 | 273 | ```{r} 274 | interviews_items_owned <- interviews %>% 275 | separate_longer_delim(items_owned, delim = ";") %>% 276 | replace_na(list(items_owned = "no_listed_items")) %>% 277 | group_by(key_ID) %>% 278 | mutate(items_owned_logical = TRUE, 279 | number_items = if_else(items_owned == "no_listed_items", 0, n())) %>% 280 | pivot_wider(names_from = items_owned, 281 | values_from = items_owned_logical, 282 | values_fill = list(items_owned_logical = FALSE)) 283 | ``` 284 | 285 | View the `interviews_items_owned` data frame. It should have `r 286 | nrow(interviews)` rows (the same number of rows you had originally), but extra 287 | columns for each item. How many columns were added? Notice that there is no 288 | longer a column titled `items_owned`. This is because there is a default 289 | parameter in `pivot_wider()` that drops the original column. 
The values that 290 | were in that column have now become columns named `television`, `solar_panel`, 291 | `table`, etc. You can use `dim(interviews)` and `dim(interviews_items_owned)` to see 292 | how the number of columns has changed between the two datasets. 293 | 294 | This format of the data allows us to do interesting things, like make a table 295 | showing the number of respondents in each village who owned a particular item: 296 | 297 | ```{r, purl=FALSE} 298 | interviews_items_owned %>% 299 | filter(bicycle) %>% 300 | group_by(village) %>% 301 | count(bicycle) 302 | ``` 303 | 304 | Below, we also calculate the average number of items from the list owned by 305 | respondents in each village using the `number_items` column we created to 306 | count the items listed by each household. 307 | 308 | ```{r, purl=FALSE} 309 | interviews_items_owned %>% 310 | group_by(village) %>% 311 | summarize(mean_items = mean(number_items)) 312 | ``` 313 | 314 | ::::::::::::::::::::::::::::::::::::::: challenge 315 | 316 | ## Exercise 317 | 318 | We created `interviews_items_owned` by reshaping the data: first longer and then 319 | wider. Replicate this process with the `months_lack_food` column in the 320 | `interviews` dataframe. Create a new dataframe with columns for each of the 321 | months filled with logical vectors (`TRUE` or `FALSE`) and a summary column 322 | called `number_months_lack_food` that calculates the number of months each 323 | household reported a lack of food. 324 | 325 | Note that if the household did not lack food in the previous 12 months, the 326 | value input was "none".
327 | 328 | ::::::::::::::: solution 329 | 330 | ## Solution 331 | 332 | ```{r} 333 | months_lack_food <- interviews %>% 334 | separate_longer_delim(months_lack_food, delim = ";") %>% 335 | group_by(key_ID) %>% 336 | mutate(months_lack_food_logical = TRUE, 337 | number_months_lack_food = if_else(months_lack_food == "none", 0, n())) %>% 338 | pivot_wider(names_from = months_lack_food, 339 | values_from = months_lack_food_logical, 340 | values_fill = list(months_lack_food_logical = FALSE)) 341 | ``` 342 | 343 | ::::::::::::::::::::::::: 344 | 345 | 346 | :::::::::::::::::::::::::::::::::::::::::::::::::: 347 | 348 | ## Pivoting longer 349 | 350 | The opposite situation could occur if we had been provided with data in the form 351 | of `interviews_items_owned`, where the items owned are column names, but we 352 | wish to treat them as values of an `items_owned` variable instead. 353 | 354 | In this situation we are gathering these columns and turning them into a pair 355 | of new variables. One variable includes the column names as values, and the 356 | other variable contains the values in each cell previously associated with the 357 | column names. 358 | 359 | `pivot_longer()` takes four principal arguments: 360 | 361 | 1. the data 362 | 2. the *cols* we want to gather into the new variables (or, prefixed with `-`, 363 | the columns we want to leave out). 364 | 3. the *names\_to* column variable we wish to create from the *cols* provided. 365 | 4. the *values\_to* column variable we wish to create and fill with values 366 | associated with the *cols* provided. 367 | 368 | 369 | ```{r, purl=FALSE} 370 | interviews_long <- interviews_items_owned %>% 371 | pivot_longer(cols = bicycle:car, 372 | names_to = "items_owned", 373 | values_to = "items_owned_logical") 374 | ``` 375 | 376 | View both `interviews_long` and `interviews_items_owned` and compare their structure.
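To see that `pivot_longer()` undoes `pivot_wider()`, here is a round trip on a made-up wide tibble (`toy_wide` and its values are hypothetical, not SAFI data):

```{r, purl=FALSE}
library(dplyr)
library(tidyr)

# Hypothetical toy data in wide form: one logical column per item
toy_wide <- tibble(
  key_ID = 1:2,
  village = c("God", "Ruaca"),
  bicycle = c(TRUE, FALSE),
  radio = c(TRUE, TRUE)
)

toy_long <- toy_wide %>%
  pivot_longer(cols = bicycle:radio,                   # the columns to gather
               names_to = "items_owned",                # old column names go here
               values_to = "items_owned_logical")       # old cell values go here
toy_long

# Pivoting wider again recovers the original values
toy_back <- toy_long %>%
  pivot_wider(names_from = items_owned, values_from = items_owned_logical)
all(toy_back == toy_wide)
```

`toy_long` has one row per household and item (four rows here), and the final comparison returns `TRUE`, so no information is lost in either direction.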
377 | 378 | ::::::::::::::::::::::::::::::::::::::: challenge 379 | 380 | ## Exercise 381 | 382 | We created some summary tables from `interviews_items_owned` using `count` and 383 | `summarise`. We can create the same tables from `interviews_long`, but this will 384 | require a different process. 385 | 386 | Make a table showing the number of respondents in each village who owned 387 | a particular item, and include all items. The difference between this format and 388 | the wide format is that you can now `count` all the items using the 389 | `items_owned` variable. 390 | 391 | ::::::::::::::: solution 392 | 393 | ## Solution 394 | 395 | ```{r} 396 | interviews_long %>% 397 | filter(items_owned_logical) %>% 398 | group_by(village) %>% 399 | count(items_owned) 400 | ``` 401 | 402 | ::::::::::::::::::::::::: 403 | 404 | 405 | :::::::::::::::::::::::::::::::::::::::::::::::::: 406 | 407 | 408 | ## Applying what we learned to clean our data 409 | 410 | We have now learned about `pivot_longer()` and `pivot_wider()` and, at the same time, 411 | fixed a problem in the way our data is structured. In this dataset, we have 412 | another column that stores multiple values in a single cell. Some of the cells 413 | in the `months_lack_food` column contain multiple months which, as before, are 414 | separated by semicolons (`;`). 415 | 416 | To create a data frame where each column contains only one value per cell, 417 | we can apply the steps we used on `items_owned` to 418 | `months_lack_food` as well. Since we will be using this data frame for the next episode, 419 | we will call it `interviews_plotting`.
420 | 421 | ```{r, purl=FALSE} 422 | ## Plotting data ## 423 | interviews_plotting <- interviews %>% 424 | ## pivot wider by items_owned 425 | separate_longer_delim(items_owned, delim = ";") %>% 426 | replace_na(list(items_owned = "no_listed_items")) %>% 427 | ## Use of grouped mutate to find number of rows 428 | group_by(key_ID) %>% 429 | mutate(items_owned_logical = TRUE, 430 | number_items = if_else(items_owned == "no_listed_items", 0, n())) %>% 431 | pivot_wider(names_from = items_owned, 432 | values_from = items_owned_logical, 433 | values_fill = list(items_owned_logical = FALSE)) %>% 434 | ## pivot wider by months_lack_food 435 | separate_longer_delim(months_lack_food, delim = ";") %>% 436 | mutate(months_lack_food_logical = TRUE, 437 | number_months_lack_food = if_else(months_lack_food == "none", 0, n())) %>% 438 | pivot_wider(names_from = months_lack_food, 439 | values_from = months_lack_food_logical, 440 | values_fill = list(months_lack_food_logical = FALSE)) 441 | 442 | ``` 443 | 444 | 445 | ## Exporting data 446 | 447 | Now that you have learned how to use **`dplyr`** and **`tidyr`** to wrangle your 448 | raw data, you may want to export these new datasets to share them with your 449 | collaborators or for archival purposes. 450 | 451 | Similar to the `read_csv()` function used for reading CSV files into R, there is 452 | a `write_csv()` function that generates CSV files from data frames. 453 | 454 | Before using `write_csv()`, we are going to create a new folder, `data_output`, 455 | in our working directory that will store this generated dataset. We don't want 456 | to write generated datasets in the same directory as our raw data. It's good 457 | practice to keep them separate. The `data` folder should only contain the raw, 458 | unaltered data, and should be left alone to make sure we don't delete or modify 459 | it. 
In contrast, our script will generate the contents of the `data_output` 460 | directory, so even if the files it contains are deleted, we can always 461 | regenerate them. 462 | 463 | In preparation for our next lesson on plotting, we created a version of the 464 | dataset where each column contains only one value per cell. Now we can save 465 | this data frame to our `data_output` directory. 466 | 467 | ```{r, purl=FALSE, eval=FALSE} 468 | write_csv(interviews_plotting, file = "data_output/interviews_plotting.csv") 469 | ``` 470 | 471 | ```{r, purl=FALSE, eval=TRUE, echo=FALSE} 472 | if (!dir.exists("data_output")) dir.create("data_output") 473 | write_csv(interviews_plotting, "data_output/interviews_plotting.csv") 474 | ``` 475 | 476 | :::::::::::::::::::::::::::::::::::::::: keypoints 477 | 478 | - Use the `tidyr` package to change the layout of data frames. 479 | - Use `pivot_wider()` to go from long to wide format. 480 | - Use `pivot_longer()` to go from wide to long format. 481 | 482 | :::::::::::::::::::::::::::::::::::::::::::::::::: 483 | 484 | -------------------------------------------------------------------------------- /episodes/07-json.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Processing JSON data (Optional) 3 | teaching: 30 4 | exercises: 15 5 | source: Rmd 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | source("data/download_data.R") 10 | library(tidyverse) 11 | ``` 12 | 13 | :::: instructor 14 | 15 | - This is an optional lesson intended to introduce learners to JSON data, as well as how to 16 | read JSON data into R and how to convert the data into a data frame or array. 17 | - Note that this lesson was community-contributed and remains a work in progress. As such, it could 18 | benefit from feedback from instructors and/or workshop participants.
19 | 20 | :::::::::::: 21 | 22 | ::::::::::::::::::::::::::::::::::::::: objectives 23 | 24 | - Describe the JSON data format 25 | - Understand where JSON is typically used 26 | - Appreciate some advantages of using JSON over tabular data 27 | - Appreciate some disadvantages of processing JSON documents 28 | - Use the jsonlite package to read a JSON file 29 | - Display formatted JSON as a dataframe 30 | - Select and display nested dataframe fields from a JSON document 31 | - Write tabular data from selected elements of a JSON document to a csv file 32 | 33 | :::::::::::::::::::::::::::::::::::::::::::::::::: 34 | 35 | :::::::::::::::::::::::::::::::::::::::: questions 36 | 37 | - What is the JSON format? 38 | - How can I convert JSON to an R dataframe? 39 | - How can I convert an array of JSON records into a table? 40 | 41 | :::::::::::::::::::::::::::::::::::::::::::::::::: 42 | 43 | ## The JSON data format 44 | 45 | The JSON data format was designed as a way of allowing different machines, or processes within machines, to communicate with each other by sending messages constructed in a well-defined format. JSON is now the preferred data format for many APIs (Application Programming Interfaces). 46 | 47 | The JSON format, although somewhat verbose, is not only human-readable but can also be mapped very easily to an R dataframe. 48 | 49 | We are going to read a file of data formatted as JSON, convert it into a dataframe in R, and then selectively create a csv file from the extracted data. 50 | 51 | The JSON file we are going to use is the [SAFI.json](data/SAFI.json) file. This is the output file from an electronic survey system called ODK. The JSON represents the answers to a series of survey questions. The questions themselves have been replaced with unique keys, and the values are the answers.
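As a small illustration of how JSON records map onto a dataframe, the sketch below parses a short JSON string with jsonlite's `fromJSON()`. The two records are hypothetical, not taken from `SAFI.json`. Each record is self-contained and the second one has a key the first lacks; `fromJSON()` should still return a single dataframe, filling the missing value with `NA`.

```r
library(jsonlite)

# Two hypothetical survey records; only the second has the key memb_assoc
json_text <- '[
  {"key_ID": 1, "village": "God", "rooms": 1},
  {"key_ID": 2, "village": "Ruaca", "rooms": 2, "memb_assoc": "yes"}
]'

# With its default settings, fromJSON() simplifies an array of records
# into a data frame: one row per record, one column per key
records <- fromJSON(json_text)

str(records)
records$memb_assoc  # NA for the record that did not include this key
```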
52 | 53 | Detailed surveys are by nature nested structures: they make it possible to record different levels of detail, or to selectively ask a set of specific questions based on the answer given to a previous question. As a result, the structure of the survey answers can be not only complex and convoluted but also different from one respondent's set of answers to another's. 54 | 55 | ### Advantages of JSON 56 | 57 | - Very popular data format for APIs (e.g. results from an Internet search) 58 | - Human-readable 59 | - Each record (or document, as they are called) is self-contained: every record carries the equivalent of both the column names and the column values. 60 | - Documents within the same file do not all need to have the same structure 61 | - Document structures can be complex and nested 62 | 63 | ### Disadvantages of JSON 64 | 65 | - It is more verbose than the equivalent data in csv format 66 | - It can be more difficult to process and display than csv-formatted data 67 | 68 | ## Use the jsonlite package to read a JSON file 69 | 70 | ```{r, message=FALSE} 71 | library(jsonlite) 72 | ``` 73 | 74 | As with reading in a CSV, you have a couple of options for how to access the JSON file. 75 | 76 | You can read the JSON file into R directly from its URL with `read_json()` or the comparable `fromJSON()` 77 | function, though this does not save a copy of the file locally. 78 | 79 | ```{r eval=FALSE} 80 | json_data <- read_json( 81 | "https://raw.githubusercontent.com/datacarpentry/r-socialsci/main/episodes/data/SAFI.json" 82 | ) 83 | ``` 84 | 85 | To download the file, you can copy and paste the contents of the file on 86 | [GitHub](https://github.com/datacarpentry/r-socialsci/blob/main/episodes/data/SAFI.json), 87 | creating a `SAFI.json` file in your `data` directory, or you can download the file with R.
88 | 89 | ```{r download-data, eval=FALSE} 90 | download.file( 91 | "https://raw.githubusercontent.com/datacarpentry/r-socialsci/main/episodes/data/SAFI.json", 92 | "data/SAFI.json", mode = "wb") 93 | ``` 94 | 95 | Once you have downloaded the data, you can read it into R with `read_json()`: 96 | 97 | ```{r eval=FALSE} 98 | json_data <- read_json("data/SAFI.json") 99 | ``` 100 | 101 | We can see that a new object called `json_data` has appeared in our Environment. It is described as a Large list (131 elements). In its current form, our data is messy. You can have a glimpse of it with the `head()` or `view()` functions. It will not look much more structured than if you were to open the JSON file with a text editor. 102 | 103 | This is because, by default, the `read_json()` function's parameter `simplifyVector`, which specifies whether or not to simplify vectors, is set to `FALSE`. This means that the default setting does not simplify nested lists into vectors and data frames. However, we can set this to `TRUE`, and our data will be read directly as a dataframe: 104 | 105 | ```{r} 106 | json_data <- read_json("data/SAFI.json", simplifyVector = TRUE) 107 | ``` 108 | 109 | Now we have the JSON data in a dataframe format. For consistency with the rest of 110 | the lesson, let's coerce it to a tibble and use `glimpse` to take a peek 111 | inside (these functions were loaded by `library(tidyverse)`): 112 | 113 | ```{r} 114 | json_data <- json_data %>% as_tibble() 115 | glimpse(json_data) 116 | ``` 117 | 118 | Looking good, but you might notice that we actually have a variable, *F\_liv*, that is a list of dataframes! It is very important to know what to expect from your data so that you can look for things like this. For example, if you are getting your JSON from an API, have a look at the API documentation so you know what to look for.
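To see why `simplifyVector` matters, and where a list column like *F\_liv* comes from, here is a minimal sketch on a hypothetical JSON string in which each record holds a nested array of objects. The field names (`F_liv`, `animal`, `count`) are made up for illustration and are not the exact keys used in `SAFI.json`. The sketch uses `fromJSON()`, which takes the same `simplifyVector` argument as `read_json()`.

```r
library(jsonlite)
library(dplyr)

# Hypothetical records: each household has an array of livestock entries
json_text <- '[
  {"key_ID": 1, "F_liv": [{"animal": "goats", "count": 3}]},
  {"key_ID": 2, "F_liv": [{"animal": "cows", "count": 2},
                          {"animal": "pigs", "count": 1}]}
]'

# simplifyVector = FALSE: a plain nested list, one element per record
raw <- fromJSON(json_text, simplifyVector = FALSE)
length(raw)  # 2

# simplifyVector = TRUE: a data frame whose F_liv column is a list of
# data frames (one per household), i.e. a list column
households <- fromJSON(json_text, simplifyVector = TRUE) %>% as_tibble()
households$F_liv[[2]]  # the nested data frame for the second household

# where(is.list) picks out the list columns, as in the lesson
households %>% select(where(is.list))
```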
119 | 120 | Often, when we have a very large number of columns, it can become difficult to identify all the variables that may require special attention, such as list columns. Fortunately, we can use selection helpers like `where()` to quickly select all the list columns. 121 | 122 | ```{r} 123 | json_data %>% 124 | select(where(is.list)) %>% 125 | glimpse() 126 | ``` 127 | 128 | So what can we do about *F\_liv*, the column of dataframes? First things first: we can access each one. For example, to access the dataframe in the first row, we can use bracket (`[`) subsetting. Here we use a single bracket, which returns a list of length one containing that dataframe; a double bracket (`[[`) would return the dataframe itself. The `[[` form allows only a single element to be selected using integer or character indices, whereas `[` allows indexing by vectors. 129 | 130 | ```{r} 131 | json_data$F_liv[1] 132 | ``` 133 | 134 | We can also choose to view the nested dataframes at all the rows of our main dataframe where a particular condition is met (for example, where the value of the variable *C06\_rooms* is equal to 4): 135 | 136 | ```{r} 137 | json_data$F_liv[which(json_data$C06_rooms == 4)] 138 | ``` 139 | 140 | ## Write the JSON data to csv 141 | 142 | If we try to write our `json_data` dataframe to a csv as we usually would with a regular dataframe, we won't get the desired result. Using the `write_csv` function from the `{readr}` package won't give you an error for list columns, but you'll only see missing (i.e. `NA`) values in these columns. Let's try it out to confirm: 143 | 144 | ```{r, eval=FALSE} 145 | write_csv(json_data, "json_data_with_list_columns.csv") 146 | read_csv("json_data_with_list_columns.csv") 147 | ``` 148 | 149 | To write out a csv while keeping the data within the list columns, we will need to "flatten" these columns. One way to do this is to convert the list columns into character types. (However, we don't want to change the data types of any of the other columns.) Here's how to do this using the tidyverse.
This command only applies the `as.character` function to those columns where `is.list()` returns `TRUE`. 150 | 151 | ```{r} 152 | flattened_json_data <- json_data %>% 153 | mutate(across(where(is.list), as.character)) 154 | flattened_json_data 155 | ``` 156 | 157 | Now you can write this to a csv file: 158 | 159 | ```{r, eval=FALSE} 160 | write_csv(flattened_json_data, "data_output/json_data_with_flattened_list_columns.csv") 161 | ``` 162 | 163 | Note: this means that when you read this csv back into R, the column of nested dataframes will be read in as a character vector. Converting it back to a list to extract elements might be complicated, so if you will need to do this, it is probably better to keep storing these data in JSON format. 164 | 165 | You can also write out the individual nested dataframes to a csv. For example: 166 | 167 | ```{r, eval=FALSE} 168 | write_csv(json_data$F_liv[[1]], "data_output/F_liv_row1.csv") 169 | ``` 170 | 171 | :::::::::::::::::::::::::::::::::::::::: keypoints 172 | 173 | - JSON is a popular data format for transferring data, used by a great many Web-based APIs 174 | - The complex structure of a JSON document means that it cannot easily be 'flattened' into tabular data 175 | - We can use R code to extract values of interest and place them in a csv file 176 | 177 | :::::::::::::::::::::::::::::::::::::::::::::::::: 178 | 179 | 180 | -------------------------------------------------------------------------------- /episodes/data/SAFI_clean.csv: -------------------------------------------------------------------------------- 1 | "key_ID","village","interview_date","no_membrs","years_liv","respondent_wall_type","rooms","memb_assoc","affect_conflicts","liv_count","items_owned","no_meals","months_lack_food","instanceID" 2 | 1,"God","2016-11-17T00:00:00Z",3,4,"muddaub",1,"NULL","NULL",1,"bicycle;television;solar_panel;table",2,"Jan","uuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef" 3 |
2,"God","2016-11-17T00:00:00Z",7,9,"muddaub",1,"yes","once",3,"cow_cart;bicycle;radio;cow_plough;solar_panel;solar_torch;table;mobile_phone",2,"Jan;Sept;Oct;Nov;Dec","uuid:099de9c9-3e5e-427b-8452-26250e840d6e" 4 | 3,"God","2016-11-17T00:00:00Z",10,15,"burntbricks",1,"NULL","NULL",1,"solar_torch",2,"Jan;Feb;Mar;Oct;Nov;Dec","uuid:193d7daf-9582-409b-bf09-027dd36f9007" 5 | 4,"God","2016-11-17T00:00:00Z",7,6,"burntbricks",1,"NULL","NULL",2,"bicycle;radio;cow_plough;solar_panel;mobile_phone",2,"Sept;Oct;Nov;Dec","uuid:148d1105-778a-4755-aa71-281eadd4a973" 6 | 5,"God","2016-11-17T00:00:00Z",7,40,"burntbricks",1,"NULL","NULL",4,"motorcyle;radio;cow_plough;mobile_phone",2,"Aug;Sept;Oct;Nov","uuid:2c867811-9696-4966-9866-f35c3e97d02d" 7 | 6,"God","2016-11-17T00:00:00Z",3,3,"muddaub",1,"NULL","NULL",1,"NULL",2,"Aug;Sept;Oct","uuid:daa56c91-c8e3-44c3-a663-af6a49a2ca70" 8 | 7,"God","2016-11-17T00:00:00Z",6,38,"muddaub",1,"no","never",1,"motorcyle;cow_plough",3,"Nov","uuid:ae20a58d-56f4-43d7-bafa-e7963d850844" 9 | 8,"Chirodzo","2016-11-16T00:00:00Z",12,70,"burntbricks",3,"yes","never",2,"motorcyle;bicycle;television;radio;cow_plough;solar_panel;solar_torch;table;fridge",2,"Jan","uuid:d6cee930-7be1-4fd9-88c0-82a08f90fb5a" 10 | 9,"Chirodzo","2016-11-16T00:00:00Z",8,6,"burntbricks",1,"no","never",3,"television;solar_panel;solar_torch",3,"Jan;Dec","uuid:846103d2-b1db-4055-b502-9cd510bb7b37" 11 | 10,"Chirodzo","2016-12-16T00:00:00Z",12,23,"burntbricks",5,"no","never",2,"cow_cart;motorcyle;bicycle;television;radio;cow_plough;solar_panel;solar_torch;table",3,"Jan;Oct;Nov;Dec","uuid:8f4e49bc-da81-4356-ae34-e0d794a23721" 12 | 11,"God","2016-11-21T00:00:00Z",6,20,"sunbricks",1,"NULL","NULL",2,"radio;cow_plough",2,"Oct;Nov","uuid:d29b44e3-3348-4afc-aa4d-9eb34c89d483" 13 | 12,"God","2016-11-21T00:00:00Z",7,20,"burntbricks",3,"yes","never",2,"cow_cart;bicycle;radio;cow_plough;table",3,"Sept;Oct","uuid:e6ee6269-b467-4e37-91fc-5e9eaf934557" 14 | 
13,"God","2016-11-21T00:00:00Z",6,8,"burntbricks",1,"no","never",3,"bicycle;radio;cow_plough;mobile_phone",2,"Sept;Oct;Nov","uuid:6c00c145-ee3b-409c-8c02-2c8d743b6918" 15 | 14,"God","2016-11-21T00:00:00Z",10,20,"burntbricks",3,"NULL","NULL",3,"bicycle;radio;cow_plough;solar_panel;table;mobile_phone",3,"June;July;Aug;Sept;Oct;Nov","uuid:9b21467f-1116-4340-a3b1-1ab64f13c87d" 16 | 15,"God","2016-11-21T00:00:00Z",5,30,"sunbricks",2,"yes","once",3,"bicycle;radio;cow_plough;solar_panel;table",2,"Jan;Feb;Mar;Apr;May;June;July;Aug;Sept;Oct;Nov","uuid:a837e545-ff86-4a1c-a1a5-6186804b985f" 17 | 16,"God","2016-11-24T00:00:00Z",6,47,"muddaub",1,"NULL","NULL",4,"radio;cow_plough;solar_panel;solar_torch",3,"Jan;Feb","uuid:d17db52f-4b87-4768-b534-ea8f9704c565" 18 | 17,"God","2016-11-21T00:00:00Z",8,20,"sunbricks",1,"NULL","NULL",1,"mobile_phone",2,"Nov;Dec","uuid:4707f3dc-df18-4348-9c2c-eec651e89b6b" 19 | 18,"God","2016-11-21T00:00:00Z",4,20,"muddaub",1,"NULL","NULL",3,"bicycle;mobile_phone",2,"Oct;Nov","uuid:7ffe7bd1-a15c-420c-a137-e1f006c317a3" 20 | 19,"God","2016-11-21T00:00:00Z",9,23,"burntbricks",2,"NULL","NULL",2,"bicycle;radio;cow_plough;solar_panel;solar_torch;mobile_phone",3,"Oct;Nov;Dec","uuid:e32f2dc0-0d05-42fb-8e21-605757ddf07d" 21 | 20,"God","2016-11-21T00:00:00Z",6,1,"burntbricks",1,"NULL","NULL",1,"bicycle;cow_plough;solar_torch",2,"Oct;Nov","uuid:d1005274-bf52-4e79-8380-3350dd7c2bac" 22 | 21,"God","2016-11-21T00:00:00Z",8,20,"burntbricks",1,"no","never",3,"NULL",2,"Jan;Feb;Mar;Oct;Nov;Dec","uuid:6570a7d0-6a0b-452c-aa2e-922500e35749" 23 | 22,"God","2016-11-21T00:00:00Z",4,20,"muddaub",1,"NULL","NULL",1,"radio",2,"Jan;Feb;Mar;Apr;Aug;Sept;Oct;Nov;Dec","uuid:a51c3006-8847-46ff-9d4e-d29919b8ecf9" 24 | 23,"Ruaca","2016-11-21T00:00:00Z",10,20,"burntbricks",4,"NULL","NULL",3,"cow_cart;bicycle;television;radio;cow_plough;solar_panel;electricity;mobile_phone",3,"none","uuid:58b37b6d-d6cd-4414-8790-b9c68bca98de" 25 | 
24,"Ruaca","2016-11-21T00:00:00Z",6,4,"burntbricks",2,"no","never",3,"radio;table;sofa_set;mobile_phone",2,"Nov;Dec","uuid:661457d3-7e61-45e8-a238-7415e7548f82" 26 | 25,"Ruaca","2016-11-21T00:00:00Z",11,6,"burntbricks",3,"no","never",2,"cow_cart;motorcyle;television;radio;cow_plough;solar_panel;solar_torch;table;sofa_set;mobile_phone",2,"Jan;Feb;Oct","uuid:45ed84c4-114e-4df0-9f5d-c800806c2bee" 27 | 26,"Ruaca","2016-11-21T00:00:00Z",3,20,"burntbricks",2,"no","never",2,"radio;cow_plough;table;mobile_phone",2,"none","uuid:1c54ee24-22c4-4ee9-b1ad-42d483c08e2e" 28 | 27,"Ruaca","2016-11-21T00:00:00Z",7,36,"burntbricks",2,"NULL","NULL",3,"bicycle;radio;cow_plough;solar_panel;solar_torch;mobile_phone",3,"none","uuid:3197cded-1fdc-4c0c-9b10-cfcc0bf49c4d" 29 | 28,"Ruaca","2016-11-21T00:00:00Z",2,2,"muddaub",1,"no","more_once",1,"NULL",3,"Aug;Sept;Oct","uuid:1de53318-a8cf-4736-99b1-8239f8822473" 30 | 29,"Ruaca","2016-11-21T00:00:00Z",7,10,"burntbricks",2,"yes","frequently",1,"motorcyle;bicycle;radio;table;mobile_phone",3,"Jan;Feb","uuid:adcd7463-8943-4c67-b25f-f72311409476" 31 | 30,"Ruaca","2016-11-21T00:00:00Z",7,22,"muddaub",2,"NULL","NULL",1,"bicycle;radio;mobile_phone",2,"Jan;Feb","uuid:59341ead-92be-45a9-8545-6edf9f94fdc6" 32 | 31,"Ruaca","2016-11-21T00:00:00Z",3,2,"muddaub",1,"NULL","NULL",1,"NULL",3,"none","uuid:cb06eb49-dd39-4150-8bbe-a599e074afe8" 33 | 32,"Ruaca","2016-11-21T00:00:00Z",19,69,"muddaub",2,"yes","more_once",5,"cow_cart;motorcyle;radio;cow_plough;solar_panel;mobile_phone",2,"none","uuid:25597af3-cd79-449c-a48a-fb9aea6c48bf" 34 | 33,"Ruaca","2016-11-21T00:00:00Z",8,34,"muddaub",1,"no","more_once",2,"cow_cart;lorry;motorcyle;sterio;cow_plough;solar_panel;mobile_phone",2,"none","uuid:0fbd2df1-2640-4550-9fbd-7317feaa4758" 35 | 34,"Chirodzo","2016-11-17T00:00:00Z",8,18,"burntbricks",3,"yes","more_once",3,"television;radio;cow_plough;solar_panel;solar_torch;table;mobile_phone",2,"Jan;Dec","uuid:14c78c45-a7cc-4b2a-b765-17c82b43feb4" 36 | 
35,"Chirodzo","2016-11-17T00:00:00Z",5,45,"muddaub",1,"yes","more_once",2,"bicycle;cow_plough",3,"Jan;Sept;Oct;Nov;Dec","uuid:ff7496e7-984a-47d3-a8a1-13618b5683ce" 37 | 36,"Chirodzo","2016-11-17T00:00:00Z",6,23,"sunbricks",1,"yes","once",3,"cow_cart;bicycle;radio;cow_plough;solar_panel;mobile_phone",3,"none","uuid:c90eade0-1148-4a12-8c0e-6387a36f45b1" 38 | 37,"Chirodzo","2016-11-17T00:00:00Z",3,8,"burntbricks",1,"NULL","NULL",2,"bicycle;television;radio;cow_plough;solar_panel;solar_torch;mobile_phone",3,"Jan;Nov;Dec","uuid:408c6c93-d723-45ef-8dee-1b1bd3fe20cd" 39 | 38,"God","2016-11-17T00:00:00Z",10,19,"muddaub",1,"yes","never",3,"bicycle;radio;cow_plough;solar_panel;table;mobile_phone",3,"Nov","uuid:81309594-ff58-4dc1-83a7-72af5952ee08" 40 | 39,"God","2016-11-17T00:00:00Z",6,22,"muddaub",1,"NULL","NULL",1,"NULL",3,"Nov","uuid:c0fb6310-55af-4831-ae3d-2729556c3285" 41 | 40,"God","2016-11-17T00:00:00Z",9,23,"burntbricks",1,"yes","never",1,"bicycle;radio;cow_plough;solar_panel;table;mobile_phone",3,"Sept;Oct;Nov","uuid:c0b34854-eede-4e81-b183-ef58a45bfc34" 42 | 41,"God","2016-11-17T00:00:00Z",7,22,"muddaub",1,"NULL","NULL",2,"motorcyle;bicycle;radio;cow_plough;table",3,"Oct;Nov","uuid:b3ba34d8-eea1-453d-bc73-c141bcbbc5e5" 43 | 42,"God","2016-11-17T00:00:00Z",8,8,"sunbricks",1,"no","never",3,"mobile_phone",3,"Jan;Nov;Dec","uuid:e3a1dd8a-1bda-428c-a014-2b527f11ae64" 44 | 43,"Chirodzo","2016-11-17T00:00:00Z",7,29,"muddaub",1,"no","never",2,"cow_plough;mobile_phone",2,"Jan;Feb;Oct;Nov;Dec","uuid:b4dff49f-ef27-40e5-a9d1-acf287b47358" 45 | 44,"Chirodzo","2016-11-17T00:00:00Z",2,6,"muddaub",1,"NULL","NULL",3,"radio;solar_torch",2,"Jan;Dec","uuid:f9fadf44-d040-4fca-86c1-2835f79c4952" 46 | 45,"Chirodzo","2016-11-17T00:00:00Z",9,7,"muddaub",1,"no","never",4,"motorcyle;bicycle;television;radio;cow_plough;solar_panel;solar_torch;table;mobile_phone",3,"none","uuid:e3554d22-35b1-4fb9-b386-dd5866ad5792" 47 | 
46,"Chirodzo","2016-11-17T00:00:00Z",10,42,"burntbricks",2,"no","once",2,"motorcyle;computer;television;sterio;solar_panel;solar_torch;table;mobile_phone",2,"Sept;Oct;Nov","uuid:35f297e0-aa5d-4149-9b7b-4965004cfc37" 48 | 47,"Chirodzo","2016-11-17T00:00:00Z",2,2,"muddaub",1,"yes","once",1,"solar_torch;mobile_phone",3,"none","uuid:2d0b1936-4f82-4ec3-a3b5-7c3c8cd6cc2b" 49 | 48,"Chirodzo","2016-11-16T00:00:00Z",7,58,"muddaub",1,"NULL","NULL",3,"radio",3,"June;July;Aug;Sept;Oct;Nov","uuid:e180899c-7614-49eb-a97c-40ed013a38a2" 50 | 49,"Chirodzo","2016-11-16T00:00:00Z",6,26,"burntbricks",2,"NULL","NULL",2,"bicycle;radio;cow_plough;solar_panel;solar_torch;table;mobile_phone",3,"Jan;Nov;Dec","uuid:2303ebc1-2b3c-475a-8916-b322ebf18440" 51 | 50,"Chirodzo","2016-11-16T00:00:00Z",6,7,"muddaub",1,"yes","never",1,"solar_torch",2,"June;July;Aug;Sept;Oct;Nov;Dec","uuid:4267c33c-53a7-46d9-8bd6-b96f58a4f92c" 52 | 51,"Chirodzo","2016-11-16T00:00:00Z",5,30,"muddaub",1,"NULL","NULL",1,"radio",3,"Oct;Nov","uuid:18ac8e77-bdaf-47ab-85a2-e4c947c9d3ce" 53 | 52,"Chirodzo","2016-11-16T00:00:00Z",11,15,"burntbricks",3,"no","never",3,"motorcyle;television;radio;cow_plough;solar_panel;mobile_phone",3,"Aug;Sept;Oct;Nov","uuid:6db55cb4-a853-4000-9555-757b7fae2bcf" 54 | 53,"Chirodzo","2016-11-16T00:00:00Z",8,16,"burntbricks",3,"yes","frequently",2,"bicycle;radio;mobile_phone",2,"Nov","uuid:cc7f75c5-d13e-43f3-97e5-4f4c03cb4b12" 55 | 54,"Chirodzo","2016-11-16T00:00:00Z",7,15,"muddaub",1,"no","never",1,"NULL",2,"Sept;Oct;Nov","uuid:273ab27f-9be3-4f3b-83c9-d3e1592de919" 56 | 55,"Chirodzo","2016-11-16T00:00:00Z",9,23,"muddaub",2,"NULL","NULL",1,"television;cow_plough;mobile_phone",2,"Oct;Nov","uuid:883c0433-9891-4121-bc63-744f082c1fa0" 57 | 56,"Chirodzo","2016-11-16T00:00:00Z",12,23,"burntbricks",2,"yes","never",2,"motorcyle;bicycle;mobile_phone",3,"none","uuid:973c4ac6-f887-48e7-aeaf-4476f2cfab76" 58 | 
57,"Chirodzo","2016-11-16T00:00:00Z",4,27,"burntbricks",1,"no","never",1,"radio",2,"none","uuid:a7184e55-0615-492d-9835-8f44f3b03a71" 59 | 58,"Chirodzo","2016-11-16T00:00:00Z",11,45,"burntbricks",3,"no","never",3,"motorcyle;bicycle;television;radio;cow_plough;solar_panel;mobile_phone",2,"none","uuid:a7a3451f-cd0d-4027-82d9-8dcd1234fcca" 60 | 59,"Chirodzo","2016-11-16T00:00:00Z",2,60,"muddaub",3,"NULL","NULL",3,"NULL",2,"none","uuid:1936db62-5732-45dc-98ff-9b3ac7a22518" 61 | 60,"Chirodzo","2016-11-16T00:00:00Z",8,15,"burntbricks",2,"no","never",4,"cow_plough",2,"none","uuid:85465caf-23e4-4283-bb72-a0ef30e30176" 62 | 61,"Chirodzo","2016-11-16T00:00:00Z",10,14,"muddaub",1,"yes","more_once",3,"cow_cart;motorcyle;bicycle;television;radio;cow_plough;solar_panel;table;mobile_phone",3,"Jan;Feb;Dec","uuid:2401cf50-8859-44d9-bd14-1bf9128766f2" 63 | 62,"Chirodzo","2016-11-16T00:00:00Z",5,5,"muddaub",1,"NULL","NULL",1,"bicycle;radio;mobile_phone",3,"Aug;Sept;Oct;Nov","uuid:c6597ecc-cc2a-4c35-a6dc-e62c71b345d6" 64 | 63,"Chirodzo","2016-11-16T00:00:00Z",4,10,"muddaub",1,"NULL","NULL",1,"NULL",3,"Jan;Oct;Nov;Dec","uuid:86ed4328-7688-462f-aac7-d6518414526a" 65 | 64,"Chirodzo","2016-11-16T00:00:00Z",6,1,"muddaub",1,"NULL","NULL",1,"bicycle;solar_torch;table;sofa_set;mobile_phone",3,"Jan;Feb;Dec","uuid:28cfd718-bf62-4d90-8100-55fafbe45d06" 66 | 65,"Chirodzo","2016-11-16T00:00:00Z",8,20,"burntbricks",3,"no","once",3,"motorcyle;radio;cow_plough;table",3,"Jan;Feb;Mar","uuid:143f7478-0126-4fbc-86e0-5d324339206b" 67 | 66,"Chirodzo","2016-11-16T00:00:00Z",10,37,"burntbricks",3,"yes","frequently",4,"cow_cart;motorcyle;bicycle;television;radio;cow_plough;solar_panel;solar_torch;mobile_phone",3,"none","uuid:a457eab8-971b-4417-a971-2e55b8702816" 68 | 67,"Chirodzo","2016-11-16T00:00:00Z",5,31,"burntbricks",2,"no","more_once",4,"motorcyle;radio;cow_plough;solar_panel;mobile_phone",3,"none","uuid:6c15d667-2860-47e3-a5e7-7f679271e419" 69 | 
68,"Chirodzo","2016-11-16T00:00:00Z",8,52,"burntbricks",3,"no","more_once",3,"motorcyle;television;sterio;solar_panel;mobile_phone",3,"none","uuid:ef04b3eb-b47d-412e-9b09-4f5e08fc66f9" 70 | 69,"Chirodzo","2016-11-16T00:00:00Z",4,12,"muddaub",1,"no","more_once",1,"bicycle;radio;solar_torch;mobile_phone",3,"none","uuid:f86933a5-12b8-4427-b821-43c5b039401d" 71 | 70,"Chirodzo","2016-11-16T00:00:00Z",8,25,"burntbricks",2,"no","more_once",4,"cow_cart;bicycle;radio;cow_plough;solar_panel;mobile_phone",2,"none","uuid:1feb0108-4599-4bf9-8a07-1f5e66a50a0a" 72 | 71,"Ruaca","2016-11-18T00:00:00Z",6,14,"burntbricks",1,"yes","more_once",3,"radio;cow_plough;mobile_phone",2,"Aug;Sept;Oct;Nov","uuid:761f9c49-ec93-4932-ba4c-cc7b78dfcef1" 73 | 127,"Chirodzo","2016-11-16T00:00:00Z",4,18,"burntbricks",8,"NULL","NULL",1,"mobile_phone",2,"Aug;Sept;Oct","uuid:f6d04b41-b539-4e00-868a-0f62b427587d" 74 | 133,"Ruaca","2016-11-23T00:00:00Z",5,25,"burntbricks",2,"no","never",5,"cow_cart;car;lorry;motorcyle;bicycle;television;sterio;cow_plough;solar_panel;solar_torch;electricity;table;sofa_set;mobile_phone;fridge",3,"Jan;Oct;Nov","uuid:429d279a-a519-4dcc-9f64-4673b0fd5d53" 75 | 152,"Ruaca","2016-11-24T00:00:00Z",10,16,"burntbricks",1,"yes","once",3,"motorcyle;bicycle;radio;sterio;cow_plough;solar_panel;mobile_phone",3,"none","uuid:59738c17-1cda-49ee-a563-acd76f6bc487" 76 | 153,"Ruaca","2016-11-24T00:00:00Z",5,41,"burntbricks",1,"NULL","NULL",1,"NULL",2,"Oct;Nov","uuid:7e7961ca-fa1c-4567-9bfa-a02f876e4e03" 77 | 155,"God","2016-11-24T00:00:00Z",4,4,"burntbricks",1,"NULL","NULL",1,"electricity",2,"Jan;Sept;Oct;Nov;Dec","uuid:77b3021b-a9d6-4276-aaeb-5bfcfd413852" 78 | 178,"Ruaca","2016-11-25T00:00:00Z",5,79,"burntbricks",2,"yes","frequently",3,"radio;cow_plough;solar_panel;mobile_phone",3,"none","uuid:2186e2ec-f65a-47cc-9bc1-a0f36dd9591c" 79 | 
177,"God","2016-11-25T00:00:00Z",10,13,"sunbricks",1,"no","more_once",2,"motorcyle;television;cow_plough;solar_panel;mobile_phone",3,"Nov","uuid:87998c33-c8d2-49ec-9dae-c123735957ec" 80 | 180,"Ruaca","2016-11-25T00:00:00Z",7,50,"muddaub",1,"no","never",3,"cow_plough;solar_panel",3,"Oct;Nov","uuid:ece89122-ea99-4378-b67e-a170127ec4e6" 81 | 181,"God","2016-11-25T00:00:00Z",11,25,"sunbricks",2,"yes","more_once",3,"cow_cart;motorcyle;bicycle;television;radio;cow_plough;solar_panel;mobile_phone",3,"none","uuid:bf373763-dca5-4906-901b-d1bacb4f0286" 82 | 182,"God","2016-11-25T00:00:00Z",7,21,"muddaub",3,"no","more_once",2,"solar_panel",3,"Jan;Feb;Nov;Dec","uuid:394033e8-a6e2-4e39-bfac-458753a1ed78" 83 | 186,"God","2016-11-28T00:00:00Z",7,24,"muddaub",1,"no","more_once",2,"cow_plough;mobile_phone",3,"none","uuid:268bfd97-991c-473f-bd51-bc80676c65c6" 84 | 187,"God","2016-11-28T00:00:00Z",5,43,"muddaub",2,"yes","more_once",4,"cow_cart;motorcyle;bicycle;television;radio;cow_plough;solar_panel;solar_torch;mobile_phone",3,"none","uuid:0a42c9ee-a840-4dda-8123-15c1bede5dfc" 85 | 195,"God","2016-11-28T00:00:00Z",5,48,"burntbricks",1,"no","never",3,"cow_cart;bicycle;radio;cow_plough;solar_torch",2,"Sept;Oct;Nov","uuid:2c132929-9c8f-450a-81ff-367360ce2c19" 86 | 196,"God","2016-11-28T00:00:00Z",7,49,"burntbricks",2,"yes","more_once",3,"radio;cow_plough;mobile_phone",3,"none","uuid:44e427d1-a448-4bf2-b529-7d67b2266c06" 87 | 197,"God","2016-11-28T00:00:00Z",5,19,"burntbricks",2,"no","more_once",3,"bicycle;television;radio;cow_plough;solar_torch;table;mobile_phone",2,"Nov","uuid:85c99fd2-775f-40c9-8654-68223f59d091" 88 | 198,"God","2016-11-28T00:00:00Z",3,49,"burntbricks",1,"no","never",1,"NULL",3,"Nov","uuid:28c64954-739c-444c-a6e0-355878e471c8" 89 | 201,"God","2016-11-21T00:00:00Z",4,6,"muddaub",2,"NULL","NULL",2,"bicycle;radio;solar_torch;mobile_phone",2,"Oct;Nov;Dec","uuid:9e79a31c-3ea5-44f0-80f9-a32db49422e3" 90 | 
202,"God","2016-11-17T00:00:00Z",12,12,"burntbricks",4,"yes","more_once",3,"cow_cart;radio;cow_plough;solar_panel;solar_torch;table;mobile_phone",3,"Jan;Feb;Mar;Oct;Nov;Dec","uuid:06d39051-38ef-4757-b68b-3327b1f16b9d" 91 | 72,"Ruaca","2017-04-26T00:00:00Z",6,24,"muddaub",1,"yes","more_once",3,"bicycle;radio;cow_plough",2,"Jan;Aug;Sept;Oct;Nov;Dec","uuid:c4a2c982-244e-45a5-aa4b-71fa53f99e18" 92 | 73,"Ruaca","2017-04-26T00:00:00Z",7,9,"burntbricks",2,"yes","more_once",3,"cow_cart;motorcyle;bicycle;television;radio;cow_plough;solar_panel;table;mobile_phone",3,"Jan;Sept;Oct","uuid:ac3da862-9e6c-4962-94b6-f4c31624f207" 93 | 76,"Ruaca","2017-04-26T00:00:00Z",17,48,"burntbricks",2,"yes","more_once",4,"bicycle;radio;cow_plough;solar_panel;mobile_phone",3,"none","uuid:4178a296-903a-4a8e-9cfa-0cd6143476e8" 94 | 83,"Ruaca","2017-04-27T00:00:00Z",5,22,"burntbricks",1,"yes","never",2,"radio;cow_plough;solar_torch",2,"Aug;Sept;Oct","uuid:a1e9df00-c8ae-411c-931c-c7df898c68d0" 95 | 85,"Ruaca","2017-04-27T00:00:00Z",7,40,"sunbricks",1,"no","never",2,"radio;cow_plough",2,"Oct;Nov","uuid:4d0f472b-f8ae-4026-87c9-6b5be14b0a70" 96 | 89,"God","2017-04-27T00:00:00Z",5,10,"burntbricks",2,"no","never",3,"bicycle;radio;cow_plough;solar_panel;solar_torch;table;mobile_phone",3,"Oct;Nov","uuid:b3b309c6-f234-4830-8b30-87d26a17ee1d" 97 | 101,"God","2017-04-27T00:00:00Z",3,4,"muddaub",1,"no","never",1,"bicycle;solar_torch",3,"Sept;Oct;Nov","uuid:3c174acd-e431-4523-9ad6-eb14cddca805" 98 | 103,"Ruaca","2017-04-27T00:00:00Z",6,96,"sunbricks",1,"no","never",5,"cow_cart;cow_plough;solar_panel;sofa_set;mobile_phone",3,"Jan;Feb;Dec","uuid:e9d79844-ef14-493b-bbd6-d13691cc660e" 99 | 102,"Ruaca","2017-04-28T00:00:00Z",12,15,"burntbricks",2,"yes","frequently",2,"cow_plough;table;sofa_set;mobile_phone",3,"Jan;Feb","uuid:76206b0b-af74-4344-b24f-81e839f0d7b0" 100 | 78,"Ruaca","2017-04-28T00:00:00Z",6,48,"burntbricks",1,"no","more_once",2,"cow_plough",2,"Aug;Sept;Oct","uuid:da3fa7cc-5ce9-44fd-9a78-b8982b607515" 
101 | 80,"Ruaca","2017-04-28T00:00:00Z",5,12,"muddaub",1,"no","more_once",1,"cow_cart;bicycle;radio;cow_plough;solar_panel;solar_torch",3,"none","uuid:a85df6df-0336-46fa-a9f4-522bf6f8b438" 102 | 104,"Ruaca","2017-04-28T00:00:00Z",14,52,"sunbricks",1,"yes","never",4,"cow_cart;bicycle;cow_plough",3,"Jan;Feb;Dec","uuid:bb2bb365-7d7d-4fe9-9353-b21269676119" 103 | 105,"Ruaca","2017-04-28T00:00:00Z",6,40,"sunbricks",1,"yes","frequently",2,"motorcyle;radio;cow_plough;solar_panel;mobile_phone",3,"Jan;Feb;Dec","uuid:af0904ee-4fdb-4090-973f-599c81ddf022" 104 | 106,"God","2017-04-30T00:00:00Z",15,22,"sunbricks",5,"no","never",2,"cow_cart;motorcyle;bicycle;radio;sterio;cow_plough;solar_panel;solar_torch;table;mobile_phone",3,"Oct;Nov;Dec","uuid:468797c1-4a65-4f35-9c83-e28ce46972a2" 105 | 109,"God","2017-05-03T00:00:00Z",4,12,"sunbricks",1,"NULL","NULL",3,"cow_cart;bicycle;radio;cow_plough;table",3,"July;Aug;Sept;Oct;Nov","uuid:602cd3f6-4a97-49c6-80e3-bcfd5c78dfa4" 106 | 110,"Ruaca","2017-05-03T00:00:00Z",6,22,"sunbricks",3,"no","never",3,"bicycle;radio;cow_plough;table;mobile_phone",2,"none","uuid:e7c51ac4-24e4-475e-88e7-f85e896945e3" 107 | 113,"Ruaca","2017-05-03T00:00:00Z",11,26,"burntbricks",3,"no","never",4,"cow_cart;motorcyle;bicycle;radio;cow_plough;solar_panel;solar_torch;table;mobile_phone",3,"none","uuid:01210861-aba1-4268-98d0-0260e05f5155" 108 | 118,"Ruaca","2017-05-04T00:00:00Z",5,25,"muddaub",1,"NULL","NULL",1,"radio;solar_torch;mobile_phone",3,"Oct;Nov;Dec","uuid:77335b2e-8812-4a35-b1e5-ca9ab626dfea" 109 | 125,"Ruaca","2017-05-04T00:00:00Z",5,14,"burntbricks",1,"no","more_once",2,"bicycle;radio;cow_plough;solar_panel;solar_torch;mobile_phone",3,"Jan;Sept;Oct;Nov;Dec","uuid:02b05c68-302e-4e7a-b229-81cb1377fd29" 110 | 119,"Ruaca","2017-05-04T00:00:00Z",3,14,"muddaub",1,"no","never",4,"bicycle;cow_plough;solar_panel;mobile_phone",3,"none","uuid:fa201fce-4e94-44b8-b435-c558c2e1ed55" 111 | 
115,"Ruaca","2017-05-11T00:00:00Z",4,16,"sunbricks",2,"NULL","NULL",3,"cow_cart;motorcyle;bicycle;television;radio;cow_plough;solar_panel;solar_torch;table;mobile_phone",3,"none","uuid:628fe23d-188f-43e4-a203-a4bf3257d461" 112 | 108,"God","2017-05-11T00:00:00Z",15,22,"burntbricks",2,"no","never",4,"cow_cart;bicycle;radio;cow_plough;solar_panel;table;mobile_phone",3,"Aug;Sept;Oct;Nov","uuid:e4f4d6ba-e698-45a5-947f-ba6da88cc22b" 113 | 116,"Ruaca","2017-05-11T00:00:00Z",5,25,"burntbricks",3,"NULL","NULL",3,"motorcyle;bicycle;television;radio;cow_plough;solar_panel;solar_torch;table;mobile_phone",3,"Jan;Nov;Dec","uuid:cfee6297-2c0e-4f8a-94cc-9aaee0bd64cb" 114 | 117,"Ruaca","2017-05-11T00:00:00Z",10,28,"muddaub",4,"NULL","NULL",1,"motorcyle;television;radio;solar_panel;solar_torch;table;mobile_phone",3,"Jan;Feb;Nov;Dec","uuid:3fe626b3-c794-48e1-a80f-5bfe440c507b" 115 | 144,"Ruaca","2017-05-18T00:00:00Z",7,5,"burntbricks",4,"no","frequently",4,"cow_cart;television;radio;cow_plough;solar_panel;solar_torch;table;mobile_phone",2,"none","uuid:0670cef6-d233-4852-89d8-36955261b0a3" 116 | 143,"Ruaca","2017-05-18T00:00:00Z",10,24,"burntbricks",2,"no","frequently",3,"cow_cart;motorcyle;television;radio;cow_plough;solar_torch;table;mobile_phone",3,"Jan;Dec","uuid:9a096a12-b335-468c-b3cc-1191180d62de" 117 | 150,"Ruaca","2017-05-18T00:00:00Z",7,8,"muddaub",1,"no","never",1,"mobile_phone",3,"Sept;Oct;Nov","uuid:92613d0d-e7b1-4d62-8ea4-451d7cd0a982" 118 | 159,"God","2017-05-18T00:00:00Z",4,24,"sunbricks",1,"no","never",1,"radio;solar_panel;solar_torch",3,"Sept;Oct;Nov","uuid:37577f91-d665-443e-8d70-b914954cef4b" 119 | 160,"God","2017-06-03T00:00:00Z",7,13,"burntbricks",2,"yes","frequently",2,"cow_cart;cow_plough;solar_torch;mobile_phone",2,"Nov","uuid:f22831ec-6bc3-4b73-9197-4b01e01abb66" 120 | 
165,"Ruaca","2017-06-03T00:00:00Z",9,14,"burntbricks",1,"no","never",3,"cow_cart;motorcyle;bicycle;television;radio;cow_plough;solar_torch;electricity;table;sofa_set;mobile_phone;fridge",3,"none","uuid:62f3f7af-f0f3-4f88-b9e0-acf8baa49ae4" 121 | 166,"Ruaca","2017-06-03T00:00:00Z",11,16,"muddaub",1,"no","never",1,"bicycle;solar_torch;mobile_phone",2,"Feb;Mar","uuid:40aac732-94df-496c-97ba-5b67f59bcc7a" 122 | 167,"Ruaca","2017-06-03T00:00:00Z",8,24,"muddaub",1,"no","never",3,"motorcyle;radio;cow_plough;solar_panel;solar_torch;table;mobile_phone",2,"Jan;Nov;Dec","uuid:a9d1a013-043b-475d-a71b-77ed80abe970" 123 | 174,"Ruaca","2017-06-03T00:00:00Z",12,25,"burntbricks",2,"no","never",3,"car;lorry;motorcyle;radio;sterio;cow_plough;solar_panel;solar_torch;table;sofa_set;mobile_phone;fridge",3,"Jan;Feb;Dec","uuid:43ec6132-478c-4f87-878d-fb3c0c4d0c74" 124 | 175,"Ruaca","2017-06-03T00:00:00Z",7,36,"burntbricks",1,"no","never",4,"motorcyle;bicycle;radio;sterio;cow_plough;solar_panel;table;mobile_phone",2,"Jan;Oct;Nov;Dec","uuid:64fc743e-8176-40f6-8ae4-36ae97fac1d9" 125 | 189,"Ruaca","2017-06-03T00:00:00Z",15,16,"sunbricks",1,"no","never",3,"motorcyle;radio;sterio;cow_plough;solar_panel;table;mobile_phone",3,"Nov","uuid:c17e374c-280b-4e78-bf21-74a7c1c73492" 126 | 191,"Ruaca","2017-06-03T00:00:00Z",10,5,"burntbricks",4,"no","never",1,"radio;cow_plough;solar_panel;solar_torch;mobile_phone",2,"Oct;Nov;Dec","uuid:dad53aff-b520-4015-a9e3-f5fdf9168fe1" 127 | 192,"Chirodzo","2017-06-03T00:00:00Z",9,20,"burntbricks",1,"no","once",1,"bicycle;television;radio;sterio;solar_panel;solar_torch;table;mobile_phone",3,"Jan;Nov;Dec","uuid:f94409a6-e461-4e4c-a6fb-0072d3d58b00" 128 | 126,"Ruaca","2017-05-18T00:00:00Z",3,7,"burntbricks",1,"no","more_once",3,"motorcyle;radio;solar_panel",3,"Oct;Nov;Dec","uuid:69caea81-a4e5-4e8d-83cd-9c18d8e8d965" 129 | 
193,"Ruaca","2017-06-04T00:00:00Z",7,10,"cement",3,"no","more_once",3,"car;lorry;television;radio;sterio;cow_plough;solar_torch;electricity;table;sofa_set;mobile_phone;fridge",3,"none","uuid:5ccc2e5a-ea90-48b5-8542-69400d5334df" 130 | 194,"Ruaca","2017-06-04T00:00:00Z",4,5,"muddaub",1,"no","more_once",1,"radio;solar_panel;solar_torch;mobile_phone",3,"Sept;Oct;Nov","uuid:95c11a30-d44f-40c4-8ea8-ec34fca6bbbf" 131 | 199,"Chirodzo","2017-06-04T00:00:00Z",7,17,"burntbricks",2,"yes","more_once",2,"cow_cart;lorry;motorcyle;computer;television;radio;sterio;cow_plough;solar_panel;solar_torch;electricity;mobile_phone",3,"Nov;Dec","uuid:ffc83162-ff24-4a87-8709-eff17abc0b3b" 132 | 200,"Chirodzo","2017-06-04T00:00:00Z",8,20,"burntbricks",2,"NULL","NULL",3,"radio;cow_plough;solar_panel;solar_torch;table;mobile_phone",3,"Oct;Nov","uuid:aa77a0d7-7142-41c8-b494-483a5b68d8a7" 133 | -------------------------------------------------------------------------------- /episodes/data/download_data.R: -------------------------------------------------------------------------------- 1 | if (!dir.exists("data")) 2 | dir.create("data") 3 | 4 | # Get and clean the data 5 | # There are leading spaces in some respondent_wall_type 6 | # key_id 1 and 21 are duplicated. The 2nd row should be 2 and the 53rd row should be 53 7 | 8 | if (! 
file.exists("data/SAFI_clean.csv")) { 9 | download.file("https://ndownloader.figshare.com/files/11492171", 10 | "data/SAFI_clean.csv", mode = "wb") 11 | 12 | # Clean data 13 | df <- read.csv("data/SAFI_clean.csv", 14 | stringsAsFactors = FALSE) 15 | 16 | # Remove white space 17 | df$respondent_wall_type <- trimws(df$respondent_wall_type, which = "both") 18 | # Replace duplicate IDs 19 | df[[2, 1]] <- 2 20 | df[[53, 1]] <- 53 21 | 22 | write.csv(df, "data/SAFI_clean.csv", row.names = FALSE) 23 | } 24 | 25 | # Plotting data ----------------------------------------------------------- 26 | 27 | # Create plotting data for ggplot episode 28 | library(tidyr) 29 | library(dplyr) 30 | 31 | if (! file.exists("data/interviews_plotting.csv")) { 32 | # Code copied from the ggplot episode; re-read the cleaned CSV so this also works when the download block above was skipped 33 | interviews_plotting <- read.csv("data/SAFI_clean.csv", stringsAsFactors = FALSE) %>% 34 | # Turn the string "NULL" into NA 35 | mutate(memb_assoc = na_if(memb_assoc, "NULL"), 36 | affect_conflicts = na_if(affect_conflicts, "NULL"), 37 | items_owned = na_if(items_owned, "NULL")) %>% 38 | ## pivot wider by items_owned 39 | separate_longer_delim(items_owned, delim = ";") %>% 40 | replace_na(list(items_owned = "no_listed_items")) %>% 41 | ## Grouped mutate: n() counts the rows (items) per household 42 | group_by(key_ID) %>% 43 | mutate(items_owned_logical = TRUE, 44 | number_items = if_else(items_owned == "no_listed_items", 0, n())) %>% 45 | pivot_wider(names_from = items_owned, 46 | values_from = items_owned_logical, 47 | values_fill = list(items_owned_logical = FALSE)) %>% 48 | ## pivot wider by months_lack_food 49 | separate_longer_delim(months_lack_food, delim = ";") %>% 50 | mutate(months_lack_food_logical = TRUE, 51 | number_months_lack_food = if_else(months_lack_food == "none", 0, n())) %>% 52 | pivot_wider(names_from = months_lack_food, 53 | values_from = months_lack_food_logical, 54 | values_fill = list(months_lack_food_logical = FALSE)) 55 | 56 | write.csv(interviews_plotting, "data/interviews_plotting.csv", row.names = FALSE) 57 | } 58 | 59 | 60 |
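The `items_owned` reshaping in the script above (split the semicolon-separated list, count the items, then pivot to one logical column per item) can be illustrated without the tidyverse. The following is a minimal base-R sketch on a small, made-up data frame: the column names mirror `SAFI_clean.csv`, but the values are invented for illustration, and unlike the script it does not create a `no_listed_items` column.

```r
# Made-up toy data shaped like the key_ID and items_owned columns of SAFI_clean.csv
toy <- data.frame(
  key_ID = c(1, 2, 3),
  items_owned = c("radio;bicycle", "NULL", "radio"),
  stringsAsFactors = FALSE
)

# Treat the literal string "NULL" as missing, as na_if() does in the script
toy$items_owned[toy$items_owned == "NULL"] <- NA

# Split each semicolon-separated list into a character vector (NA stays NA)
items <- strsplit(toy$items_owned, ";", fixed = TRUE)

# Number of items per household; a missing list counts as 0 items
toy$number_items <- vapply(items, function(x) sum(!is.na(x)), integer(1))

# Distinct item names across all households (sort() drops NA by default)
all_items <- sort(unique(unlist(items)))

# One logical column per item: TRUE if that household listed the item
for (item in all_items) {
  toy[[item]] <- vapply(items, function(x) item %in% x, logical(1))
}

print(toy)
```

This sketch is only meant to make the intermediate steps visible; the script's `separate_longer_delim()` plus `pivot_wider()` pipeline remains the approach used in the lesson.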
-------------------------------------------------------------------------------- /episodes/fig/R_00_Rstudio_01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/R_00_Rstudio_01.png -------------------------------------------------------------------------------- /episodes/fig/R_00_Rstudio_02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/R_00_Rstudio_02.png -------------------------------------------------------------------------------- /episodes/fig/R_00_Rstudio_03.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/R_00_Rstudio_03.png -------------------------------------------------------------------------------- /episodes/fig/R_02_Import_Dataset_01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/R_02_Import_Dataset_01.png -------------------------------------------------------------------------------- /episodes/fig/data-frame.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 270 | -------------------------------------------------------------------------------- /episodes/fig/here_horst.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/here_horst.png -------------------------------------------------------------------------------- /episodes/fig/long_to_wide.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/long_to_wide.png -------------------------------------------------------------------------------- /episodes/fig/new-rmd.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/new-rmd.png -------------------------------------------------------------------------------- /episodes/fig/packages_pane.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/packages_pane.png -------------------------------------------------------------------------------- /episodes/fig/pivot_long_to_wide.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/pivot_long_to_wide.png -------------------------------------------------------------------------------- /episodes/fig/pivot_wide_to_long.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/pivot_wide_to_long.png -------------------------------------------------------------------------------- /episodes/fig/pivot_wider.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/pivot_wider.png -------------------------------------------------------------------------------- /episodes/fig/r+rstudio-analogy.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/r+rstudio-analogy.jpg -------------------------------------------------------------------------------- /episodes/fig/r-automatic.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/r-automatic.jpeg -------------------------------------------------------------------------------- /episodes/fig/r-manual.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/r-manual.jpeg -------------------------------------------------------------------------------- /episodes/fig/rmarkdown_wizards.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/rmarkdown_wizards.png -------------------------------------------------------------------------------- /episodes/fig/rmd-rmd_to_html.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/rmd-rmd_to_html.png -------------------------------------------------------------------------------- /episodes/fig/rstudio_project_files.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/rstudio_project_files.jpeg -------------------------------------------------------------------------------- /episodes/fig/separate_longer.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/separate_longer.png -------------------------------------------------------------------------------- /episodes/fig/tidy-data-wickham.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/tidy-data-wickham.png -------------------------------------------------------------------------------- /episodes/fig/tidyr-pivot_wider_longer.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/tidyr-pivot_wider_longer.gif -------------------------------------------------------------------------------- /episodes/fig/wide_to_long.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/wide_to_long.png -------------------------------------------------------------------------------- /episodes/fig/working-directory-setup.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/r-socialsci/16768652bd72ba4f074499f12c27ffd6cac3db94/episodes/fig/working-directory-setup.png -------------------------------------------------------------------------------- /index.md: -------------------------------------------------------------------------------- 1 | --- 2 | site: sandpaper::sandpaper_site 3 | --- 4 | 5 | Data Carpentry's aim is to teach researchers basic concepts, skills, 6 | and tools for working with data so that they can get more done in 7 | less time, and with less pain. 
The lessons below were designed for 8 | those interested in working with social sciences data in R. 9 | 10 | This is an introduction to R designed for participants with no 11 | programming experience. These lessons can be taught in a half-day, 12 | full-day, or over a two-day workshop (see 13 | [Instructor Notes](https://datacarpentry.org/r-socialsci/instructor/instructor-notes.html) 14 | for suggested lesson plans). 15 | They start with some basic information about R syntax and the 16 | RStudio interface, then move through how to import CSV files, the 17 | structure of data frames, how to deal with factors, how to add/remove 18 | rows and columns, how to calculate summary statistics from a data 19 | frame, and a brief introduction to plotting. 20 | 21 | :::::::::::::::::::::::::::::::::::::::::: prereq 22 | 23 | ## Getting Started 24 | 25 | Data Carpentry's teaching is hands-on, so participants are encouraged to use 26 | their own computers to ensure the proper setup of tools for an efficient 27 | workflow. 28 | 29 | **These lessons assume no prior knowledge of the skills or tools.** 30 | 31 | To get started, follow the directions in the "[Setup](setup.html)" tab to 32 | download data to your computer and follow any installation instructions. 33 | 34 | #### Prerequisites 35 | 36 | This lesson requires a working copy of **R** and **RStudio**. 37 |
To most effectively use these materials, please make sure to install 38 | everything *before* working through this lesson. 39 | 40 | 41 | :::::::::::::::::::::::::::::::::::::::::::::::::: 42 | 43 | :::::::::::::::::::::::::::::::::::::::::: instructor 44 | 45 | ## For Instructors 46 | 47 | If you are teaching this lesson in a workshop, please see the 48 | [Instructor notes](https://datacarpentry.org/r-socialsci/instructor/instructor-notes.html) 49 | for helpful tips. 50 | 51 | 52 | :::::::::::::::::::::::::::::::::::::::::::::::::: 53 | 54 | 55 | -------------------------------------------------------------------------------- /instructors/instructor-notes.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Instructor Notes 3 | --- 4 | 5 | ## Dataset 6 | 7 | The data used for this lesson are a slightly cleaned-up version of the 8 | SAFI Survey Results, available on GitHub. The original data are on 9 | [figshare](https://figshare.com/articles/dataset/SAFI_Survey_Results/6262019). 10 | 11 | This lesson uses `SAFI_clean.csv`. The direct download link for the data file is: 12 | [https://raw.githubusercontent.com/datacarpentry/r-socialsci/main/episodes/data/SAFI_clean.csv](https://raw.githubusercontent.com/datacarpentry/r-socialsci/main/episodes/data/SAFI_clean.csv). 13 | 14 | ## Lesson Plans 15 | 16 | The lesson contains much more material than can be taught in a day. Instructors will 17 | need to pick an appropriate subset of episodes to use in a standard one-day course. 18 | 19 | Suggested path for half-day course: 20 | 21 | - Before we Start 22 | - Introduction to R 23 | - Starting with Data 24 | 25 | Suggested path for full-day course: 26 | 27 | - Before we Start 28 | - Introduction to R 29 | - Starting with Data 30 | - Data Wrangling with dplyr 31 | - (OPTIONAL) Data Wrangling with tidyr 32 | - Data Visualisation with ggplot2 33 | 34 | For a two-day workshop, it may be possible to cover all of the episodes.
Feedback from 35 | the community on successful lesson plans is always appreciated! 36 | 37 | ## Technical Tips and Tricks 38 | 39 | Show how to use the 'Zoom' button to enlarge plots without constantly resizing 40 | windows. 41 | 42 | Sometimes a package will not install. You can try a different CRAN mirror: 43 | 44 | - Tools > Global Options > Packages > CRAN Mirror 45 | 46 | Alternatively, you can download the package from CRAN and install it from the ZIP 47 | file: 48 | 49 | - Tools > Install Packages > set to 'from Zip/TAR' 50 | 51 | It's often easier to make sure learners have all the needed packages installed at one 52 | time, rather than deal with these issues over and over. See the "Setup instructions" 53 | section on the Setup page of the course website for package installation instructions. 54 | 55 | **`|` character on Spanish keyboards:** The Spanish Mac keyboard does not have a `|` key. 56 | This character can be created using: 57 | 58 | ``` 59 | `alt` + `1` 60 | ``` 61 | 62 | ## Other Resources 63 | 64 | If you encounter a problem during a workshop, feel free to contact the 65 | maintainers by email or [open an 66 | issue](https://github.com/datacarpentry/r-socialsci/issues/new). 67 | 68 | For more in-depth coverage of the workshop topics, you may want to read "[R for Data Science](http://r4ds.had.co.nz/)" by Hadley Wickham and Garrett Grolemund.
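The tip about making sure learners have all the needed packages installed at one time can be scripted. Below is a minimal sketch; `missing_packages()` is a hypothetical helper (not part of any package) that uses base R's `requireNamespace()` to report which packages from a list cannot be loaded. The package list is an assumption taken from the Setup page (`tidyverse` and `here`).

```r
# Hypothetical helper: return the names of packages that are not installed
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE instead of erroring when a package is absent
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in this lesson (assumed from the Setup page)
lesson_pkgs <- c("tidyverse", "here")
to_install <- missing_packages(lesson_pkgs)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install them all in one go:
  # install.packages(to_install)
}
```

Checking with `requireNamespace()` rather than scanning `installed.packages()` avoids building the full table of installed packages, which can be slow on machines with large libraries.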
69 | 70 | 71 | -------------------------------------------------------------------------------- /learners/reference.md: -------------------------------------------------------------------------------- 1 | --- 2 | {} 3 | --- 4 | 5 | ## Glossary 6 | 7 | Cheat sheet of functions used in the lessons 8 | 9 | ### Lesson 1 -- Introduction to R 10 | 11 | - `sqrt()` # calculate the square root 12 | - `round()` # round a number 13 | - `args()` # find what arguments a function takes 14 | - `length()` # how many elements are in a particular vector 15 | - `class() ` # the class (the type of element) of an object 16 | - `str() ` # an overview of the object and the elements it contains 17 | - `typeof()` # determines the (R internal) type or storage mode of any object 18 | - `c() ` # create vector; add elements to vector 19 | - `[ ]` # extract and subset vector 20 | - `%in% ` # to test if a value is found in a vector 21 | - `is.na()` # test if there are missing values 22 | - `na.omit()` # returns the object with incomplete cases removed 23 | - `complete.cases()` # logical vector marking which cases (rows) have no missing values 24 | 25 | ### Lesson 2 -- Starting with Data 26 | 27 | - `download.file() ` # download files from the internet to your computer 28 | - `read_csv() ` # load CSV file into R memory 29 | - `head() ` # shows the first 6 rows 30 | - `view()` # invoke a spreadsheet-style data viewer 31 | - `read_delim()` # load a file in table format into R memory 32 | - `str() ` # check structure of the object and information about the class, length and content of each column 33 | - `dim() ` # check dimension of data frame 34 | - `nrow() ` # returns the number of rows 35 | - `ncol() ` # returns the number of columns 36 | - `tail() ` # shows the last 6 rows 37 | - `names() ` # returns the column names (synonym of colnames() for data frame objects) 38 | - `rownames() ` # returns the row names 39 | - `summary() ` # summary statistics for each column 40 | - `glimpse()` # like `str()` applied to a data frame but tries
to show as much data as possible 41 | - `factor() ` # create factors 42 | - `levels() ` # check levels of a factor 43 | - `nlevels() ` # check number of levels of a factor 44 | - `as.character()` # convert an object to a character vector 45 | - `as.numeric()` # convert an object to a numeric vector 46 | - `as.numeric(as.character(x))` # convert a factor whose levels are numbers to a numeric vector 47 | - `as.numeric(levels(x))[x]` # the same conversion, done more efficiently by converting only the levels 48 | - `plot()` # plot an object 49 | - `addNA()` # convert NA into a factor level 50 | - `data.frame()` # create a data.frame object 51 | - `ymd()` # convert a vector representing year, month, and day to a Date vector 52 | - `paste()` # concatenate vectors after converting to character 53 | 54 | ### Lesson 3 -- Data Wrangling with dplyr and tidyr 55 | 56 | - `str()` # check structure of the object and information about the class, length and content of each column 57 | - `view()` # invoke a spreadsheet-style data viewer 58 | - `select() ` # select columns of a data frame 59 | - `filter() ` # allows you to select a subset of rows in a data frame 60 | - `%>% ` # pipe the output of one function into the next (e.g., to select and filter in one step) 61 | - `mutate() ` # create new columns based on the values in existing columns 62 | - `head() ` # shows the first 6 rows 63 | - `group_by() ` # group the data by one or more variables so that later steps (e.g., `summarize()`) are applied to each group separately
64 | - `summarize() ` # collapses each group into a single-row summary of that group 65 | - `mean()` # calculate the mean value of a vector 66 | - `!is.na()` # test if there are no missing values 67 | - `print()` # print values to the console 68 | - `min()` # return the minimum value of a vector 69 | - `arrange()` # arrange rows by variables 70 | - `desc()` # transform a vector into a format that will be sorted in descending order 71 | - `count()` # counts the total number of records for each category 72 | - `pivot_wider()` # reshape from long to wide: spread a name-value pair across multiple columns 73 | - `pivot_longer()` # reshape from wide to long: collapse multiple columns into a name-value pair 74 | - `replace_na()` # replace NAs with specified values 75 | - `n_distinct()` # get a count of unique values 76 | - `write_csv()` # save to a CSV file 77 | 78 | ### Lesson 4 -- Data Visualization with ggplot2 79 | 80 | - `read_csv()` # load a CSV file into R memory 81 | - `ggplot(data = , aes(x = , y = )) + geom_point() + facet_wrap() + theme_bw() + theme()` # skeleton for creating plot layers 82 | - `aes()` # map variables to the plot: the variables to be plotted and the variables that 83 | define the presentation, such as size, shape, color, etc. 84 | - `geom_` # graphical representation of the data in the plot (points, lines, bars).
To add a geom to the plot, use the `+` operator 85 | - `facet_wrap()` # split one plot into multiple panels based on a factor included in the dataset 86 | - `labs()` # set plot labels (title, axes, legend) 87 | - `theme_bw()` # set the background to white 88 | - `theme()` # used to locally modify one or more theme elements in a specific ggplot object 89 | - `+` # arrange ggplots horizontally (patchwork) 90 | - `/` # arrange ggplots vertically (patchwork) 91 | - `plot_layout()` # set width and height of individual plots in a patchwork of plots 92 | - `ggsave()` # save a ggplot 93 | 94 | ### Lesson 5 -- Processing JSON data 95 | 96 | - `read_json()` # read a JSON file into an R object 97 | 98 | 99 | -------------------------------------------------------------------------------- /learners/setup.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Setup 3 | --- 4 | 5 | ## Setup instructions 6 | 7 | **R** and **RStudio** are separate downloads and installations. R is the 8 | underlying statistical computing environment, but using R alone is no 9 | fun. RStudio is a graphical integrated development environment (IDE) that makes 10 | using R much easier and more interactive. You need to install R before you 11 | install RStudio. Once both are installed, RStudio runs R in 12 | the background, so you do not need to run R separately. 13 | 14 | After installing both programs, 15 | you will need to install the **`tidyverse`** package from within RStudio. The 16 | **`tidyverse`** package is a powerful collection of data science tools within **R**; 17 | see the [**`tidyverse`** website](https://tidyverse.tidyverse.org) for more details. 18 | Follow the instructions below for your operating system, and then follow the 19 | instructions to install **`tidyverse`**. 20 | 21 | ### Windows 22 | 23 | #### If you already have R and RStudio installed 24 | 25 | - Open RStudio and click on "Help" > "Check for updates". If a new version is
If a new version is 26 | available, quit RStudio, and download the latest version for RStudio. 27 | - To check which version of R you are using, start RStudio and the first thing 28 | that appears in the console indicates the version of R you are 29 | running. Alternatively, you can type `sessionInfo()`, which will also display 30 | which version of R you are running. Go on 31 | the [CRAN website](https://cran.r-project.org/bin/windows/base/) and check 32 | whether a more recent version is available. If so, you can update R using 33 | the `installr` package, by running: 34 | 35 | ```r 36 | if( !("installr" %in% installed.packages()) ){install.packages("installr")} 37 | installr::updateR(TRUE) 38 | ``` 39 | 40 | #### If you don't have R and RStudio installed 41 | 42 | - Download R from 43 | the [CRAN website](http://cran.r-project.org/bin/windows/base/release.htm). 44 | - Run the `.exe` file that was just downloaded. 45 | - Go to the [RStudio download page](https://posit.co/download/rstudio-desktop/). 46 | - Under *Installers* select **RStudio x.yy.zzz - Windows. 47 | Vista/7/8/10** (where x, y, and z represent version numbers). 48 | - Double click the file to install it. 49 | - Once it's installed, open RStudio to make sure it works and you don't get any 50 | error messages. 51 | 52 | ### macOS 53 | 54 | #### If you already have R and RStudio installed 55 | 56 | - Open RStudio, and click on "Help" > "Check for updates". If a new version is 57 | available, quit RStudio, and download the latest version for RStudio. 58 | - To check the version of R you are using, start RStudio and the first thing 59 | that appears on the terminal indicates the version of R you are running. Alternatively, you can type `sessionInfo()`, which will also display which version of R you are running. Go on 60 | the [CRAN website](https://cran.r-project.org/bin/macosx/) and check 61 | whether a more recent version is available. If so, please download and install 62 | it. 
In any case, make sure you have at least R 3.2. 63 | 64 | #### If you don't have R and RStudio installed 65 | 66 | - Download R from 67 | the [CRAN website](http://cran.r-project.org/bin/macosx/). 68 | - Select the `.pkg` file for the latest R version. 69 | - Double-click the downloaded file to install R. 70 | - It is also a good idea to install [XQuartz](https://www.xquartz.org/) (needed 71 | by some packages). 72 | - Go to the [RStudio download page](https://posit.co/download/rstudio-desktop/). 73 | - Under *Installers* select **RStudio x.yy.zzz - Mac OS X 10.6+ (64-bit)** 74 | (where x, y, and z represent version numbers). 75 | - Double-click the file to install RStudio. 76 | - Once it's installed, open RStudio to make sure it works and you don't get any 77 | error messages. 78 | 79 | ### Linux 80 | 81 | - Follow the instructions for your distribution 82 | from [CRAN](https://cloud.r-project.org/bin/linux), which provides information 83 | on getting the most recent version of R for common distributions. For most 84 | distributions, you could use your package manager (e.g., for Debian/Ubuntu run 85 | `sudo apt-get install r-base`, and for Fedora `sudo yum install R`), but we 86 | don't recommend this approach as the versions it provides are 87 | usually out of date. In any case, make sure you have at least R 3.2. 88 | - Go to the 89 | [RStudio download page](https://posit.co/download/rstudio-desktop/). 90 | - Under *Installers* select the version that matches your distribution and 91 | install it with your preferred method (e.g., with Debian/Ubuntu `sudo dpkg -i rstudio-x.yy.zzz-amd64.deb` at the terminal). 92 | - Once it's installed, open RStudio to make sure it works and you don't get any 93 | error messages. 94 | - Before installing the `tidyverse` package, **Ubuntu** (and related) users may 95 | need to install the following dependencies: `libcurl4-openssl-dev libssl-dev libxml2-dev` 96 | (e.g.
`sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev`). 97 | 98 | ### For everyone 99 | 100 | **After installing R and RStudio, you need to install the `tidyverse` and `here` packages.** 101 | 102 | - After starting RStudio, type the following at the console and press Enter: 103 | `install.packages("tidyverse")`. Once it has finished installing, type 104 | `install.packages("here")` and press Enter. Both packages should now be installed. 105 | 106 | - For reference, the lesson uses `SAFI_clean.csv`. The direct download link for 107 | this file is: [https://github.com/datacarpentry/r-socialsci/blob/main/episodes/data/SAFI_clean.csv](https://github.com/datacarpentry/r-socialsci/blob/main/episodes/data/SAFI_clean.csv). 108 | This data is a slightly cleaned-up version of the SAFI Survey Results available on 109 | [figshare](https://figshare.com/articles/dataset/SAFI_Survey_Results/6262019). 110 | Instructions for downloading the data with R are provided in the 111 | [Before we start episode](https://datacarpentry.org/r-socialsci/00-intro.html). 112 | 113 | - The [JSON episode](https://datacarpentry.org/r-socialsci/07-json.html) uses 114 | `SAFI.json`, which is available on GitHub 115 | [in the lesson repository](https://github.com/datacarpentry/r-socialsci/blob/main/episodes/data/SAFI.json). 116 | 117 | 118 | -------------------------------------------------------------------------------- /profiles/learner-profiles.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: FIXME 3 | --- 4 | 5 | This is a placeholder file. Please add content here.
6 | -------------------------------------------------------------------------------- /renv/profile: -------------------------------------------------------------------------------- 1 | lesson-requirements 2 | -------------------------------------------------------------------------------- /renv/profiles/lesson-requirements/renv/.gitignore: -------------------------------------------------------------------------------- 1 | library/ 2 | local/ 3 | cellar/ 4 | lock/ 5 | python/ 6 | sandbox/ 7 | staging/ 8 | -------------------------------------------------------------------------------- /site/README.md: -------------------------------------------------------------------------------- 1 | This directory contains rendered lesson materials. Please do not edit files 2 | here. 3 | --------------------------------------------------------------------------------
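The checks described in `learners/setup.md` (confirm the R version, then install `tidyverse` and `here`) can also be run as one short R script from the RStudio console. The following is a minimal sketch, not part of the lesson's official setup instructions: the helper names `r_version_ok()` and `missing_packages()` are hypothetical, the minimum version of 3.2 is taken from the setup page, and the install step runs only in an interactive session such as the RStudio console.

```r
# Sketch of a setup check based on learners/setup.md.
# Assumption: the minimum R version (3.2) is the one stated on the setup page.
min_r_version <- "3.2"

# Hypothetical helper: TRUE if the running R is at least `min_version`.
r_version_ok <- function(min_version = min_r_version) {
  getRversion() >= min_version
}

# Hypothetical helper: return the subset of `pkgs` whose namespace cannot be
# loaded (typically because the package is not installed).
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

lesson_packages <- c("tidyverse", "here")

message("Running ", R.version.string)
if (!r_version_ok()) {
  warning("This lesson needs at least R ", min_r_version, "; please update R.")
}

to_install <- missing_packages(lesson_packages)
# Only install when run from an interactive console such as RStudio's.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

After installation, `missing_packages(c("tidyverse", "here"))` should return an empty character vector if both packages can be loaded.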