├── .editorconfig ├── .github └── workflows │ ├── README.md │ ├── pr-close-signal.yaml │ ├── pr-comment.yaml │ ├── pr-post-remove-branch.yaml │ ├── pr-preflight.yaml │ ├── pr-receive.yaml │ ├── sandpaper-main.yaml │ ├── sandpaper-version.txt │ ├── update-cache.yaml │ └── update-workflows.yaml ├── .gitignore ├── .zenodo.json ├── AUTHORS ├── CITATION ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE.md ├── README.md ├── config.yaml ├── episodes ├── 00-intro.md ├── 01-format-data.md ├── 02-common-mistakes.md ├── 03-dates-as-data.md ├── 04-quality-assurance.md ├── 05-exporting-data.md └── fig │ ├── bad-formatting.png │ ├── better-formatting.png │ ├── comments-in-cells.png │ ├── data-validation-numbers-LibreOffice-new.png │ ├── data-validation-numbers-new.png │ ├── data-validation-tab-LibreOffice-new.png │ ├── data-validation-tab-new.png │ ├── error-invalid-data-LibreOffice-new.png │ ├── error-invalid-data-new.png │ ├── error_alert-new.png │ ├── error_alert_LibreOffice-new.png │ ├── excel-to-csv.png │ ├── excel_dates_1.jpg │ ├── filled-range-of-values-LibreOffice-new.png │ ├── input_message-new.png │ ├── input_message_LibreOffice-new.png │ ├── load-csv-calc.png │ ├── load-csv-excel.png │ ├── multiple-info.png │ ├── multiple-tables-example.png │ ├── multiple-tables-example2.png │ ├── select-range-of-values-LibreOffice-new.png │ ├── select-range-of-values-new.png │ ├── single-info.png │ ├── solution_exercise_1_dates.png │ ├── spreadsheet_simple_data_01.png │ ├── spreadsheets_Data_validation_05.png │ ├── white_table_1.jpg │ └── zeros-example.png ├── index.md ├── instructors └── instructor-notes.md ├── learners ├── reference.md └── setup.md ├── profiles └── learner-profiles.md └── site └── README.md /.editorconfig: -------------------------------------------------------------------------------- 1 | root = true 2 | 3 | [*] 4 | charset = utf-8 5 | insert_final_newline = true 6 | trim_trailing_whitespace = true 7 | 8 | [*.md] 9 | indent_size = 2 10 | indent_style = space 
11 | max_line_length = 100 # Please keep this in sync with bin/lesson_check.py! 12 | trim_trailing_whitespace = false # keep trailing spaces in markdown - 2+ spaces are translated to a hard break (<br>) 13 | 14 | [*.r] 15 | max_line_length = 80 16 | 17 | [*.py] 18 | indent_size = 4 19 | indent_style = space 20 | max_line_length = 79 21 | 22 | [*.sh] 23 | end_of_line = lf 24 | 25 | [Makefile] 26 | indent_style = tab 27 | -------------------------------------------------------------------------------- /.github/workflows/README.md: -------------------------------------------------------------------------------- 1 | # Carpentries Workflows 2 | 3 | This directory contains workflows to be used for Lessons using the {sandpaper} 4 | lesson infrastructure. Two of these workflows require R (`sandpaper-main.yaml` 5 | and `pr-receive.yaml`) and the rest are bots to handle pull request management. 6 | 7 | These workflows will likely change as {sandpaper} evolves, so it is important to 8 | keep them up-to-date. To do this in your lesson you can do the following in your 9 | R console: 10 | 11 | ```r 12 | # Install/Update sandpaper 13 | options(repos = c(carpentries = "https://carpentries.r-universe.dev/", 14 | CRAN = "https://cloud.r-project.org")) 15 | install.packages("sandpaper") 16 | 17 | # update the workflows in your lesson 18 | library("sandpaper") 19 | update_github_workflows() 20 | ``` 21 | 22 | Inside this folder, you will find a file called `sandpaper-version.txt`, which 23 | will contain a version number for sandpaper. This will be used in the future to 24 | alert you if a workflow update is needed. 25 | 26 | What follows are the descriptions of the workflow files: 27 | 28 | ## Deployment 29 | 30 | ### 01 Build and Deploy (sandpaper-main.yaml) 31 | 32 | This is the main driver that will only act on the main branch of the repository. 33 | This workflow does the following: 34 | 35 | 1. checks out the lesson 36 | 2. provisions the following resources 37 | - R 38 | - pandoc 39 | - lesson infrastructure (stored in a cache) 40 | - lesson dependencies if needed (stored in a cache) 41 | 3. 
builds the lesson via `sandpaper:::ci_deploy()` 42 | 43 | #### Caching 44 | 45 | This workflow has two caches; one cache is for the lesson infrastructure and 46 | the other is for the lesson dependencies if the lesson contains rendered 47 | content. These caches are invalidated by new versions of the infrastructure and 48 | the `renv.lock` file, respectively. If there is a problem with the cache, 49 | manual invalidation is necessary. You will need maintain access to the repository 50 | and you can either go to the actions tab and [click on the caches button to find 51 | and invalidate the failing cache](https://github.blog/changelog/2022-10-20-manage-caches-in-your-actions-workflows-from-web-interface/) 52 | or set the `CACHE_VERSION` secret to the current date (which will 53 | invalidate all of the caches). 54 | 55 | ## Updates 56 | 57 | ### Setup Information 58 | 59 | These workflows run on a schedule and at the maintainer's request. Because they 60 | create pull requests that update workflows/require the downstream actions to run, 61 | they need a special repository/organization secret token called 62 | `SANDPAPER_WORKFLOW` and it must have the `public_repo` and `workflow` scope. 63 | 64 | This can be an individual user token, OR it can be a trusted bot account. If you 65 | have a repository in one of the official Carpentries accounts, then you do not 66 | need to worry about this token being present because the Carpentries Core Team 67 | will take care of supplying this token. 68 | 69 | If you want to use your personal account: you can go to 70 | 71 | to create a token. Once you have created your token, you should copy it to your 72 | clipboard and then go to your repository's settings > secrets > actions and 73 | create or edit the `SANDPAPER_WORKFLOW` secret, pasting in the generated token. 74 | 75 | If you do not specify your token correctly, the runs will not fail and they will 76 | give you instructions to provide the token for your repository. 
77 | 78 | ### 02 Maintain: Update Workflow Files (update-workflows.yaml) 79 | 80 | The {sandpaper} repository was designed to do as much as possible to separate 81 | the tools from the content. For local builds, this is absolutely true, but 82 | there is a minor issue when it comes to workflow files: they must live inside 83 | the repository. 84 | 85 | This workflow ensures that the workflow files are up-to-date. The way it works is 86 | to download the update-workflows.sh script from GitHub and run it. The script 87 | will do the following: 88 | 89 | 1. check the recorded version of sandpaper against the current version on GitHub 90 | 2. update the files if there is a difference in versions 91 | 92 | After the files are updated, if there are any changes, they are pushed to a 93 | branch called `update/workflows` and a pull request is created. Maintainers are 94 | encouraged to review the changes and accept the pull request if the outputs 95 | are okay. 96 | 97 | This update is run weekly or on demand. 98 | 99 | ### 03 Maintain: Update Package Cache (update-cache.yaml) 100 | 101 | For lessons that have generated content, we use {renv} to ensure that the output 102 | is stable. This is controlled by a single lockfile which documents the packages 103 | needed for the lesson and the version numbers. This workflow is skipped in 104 | lessons that do not have generated content. 105 | 106 | Because the lessons need to remain current with the package ecosystem, it's a 107 | good idea to make sure these packages can be updated periodically. The 108 | update cache workflow will do this by checking for updates, applying them in a 109 | branch called `update/packages` and creating a pull request with _only the 110 | lockfile changed_. 111 | 112 | From here, the markdown documents will be rebuilt and you can inspect what has 113 | changed based on how the packages have updated. 
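
The `update-cache.yaml` workflow is triggered by a weekly cron entry, but its preflight job only lets a scheduled run proceed when the day of the month is 7 or less, so in practice the cache update runs on the first Tuesday of each month. Below is a minimal sketch of that gate as a standalone shell function. The function name `is_first_week` is hypothetical and not part of the workflow; the single-bracket test mirrors the workflow's own comment about zero-padded days such as `08` being misread as octal.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the workflow): succeed when a day of the month
# falls in the first week. The day is passed zero-padded, as printed by
# `date +%d` (for example "07").
is_first_week() {
  local day="$1"
  # Single brackets compare the operands as decimal integers, so "08" and
  # "09" are accepted here; inside [[ ]] they would be evaluated
  # arithmetically and rejected as invalid octal numbers.
  [ "$day" -le 7 ]
}

if is_first_week "$(date +%d)"; then
  echo "Running on schedule"
else
  echo "Not Running Today"
fi
```

A manual `workflow_dispatch` run bypasses this check entirely in the workflow, so the gate only affects scheduled runs.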
114 | 115 | ## Pull Request and Review Management 116 | 117 | Because our lessons execute code, pull requests are a security risk for any 118 | lesson and thus have security measures associated with them. **Do not merge any 119 | pull requests that do not pass checks and do not have bots commented on them.** 120 | 121 | This series of workflows all go together and are described in the following 122 | diagram and the below sections: 123 | 124 | ![Graph representation of a pull request](https://carpentries.github.io/sandpaper/articles/img/pr-flow.dot.svg) 125 | 126 | ### Pre Flight Pull Request Validation (pr-preflight.yaml) 127 | 128 | This workflow runs every time a pull request is created and its purpose is to 129 | validate that the pull request is okay to run. This means the following things: 130 | 131 | 1. The pull request does not contain modified workflow files 132 | 2. If the pull request contains modified workflow files, it does not contain 133 | modified content files (such as a situation where @carpentries-bot will 134 | make an automated pull request) 135 | 3. The pull request does not contain an invalid commit hash (e.g. from a fork 136 | that was made before a lesson was transitioned from styles to use the 137 | workbench). 138 | 139 | Once the checks are finished, a comment is issued to the pull request, which 140 | will allow maintainers to determine if it is safe to run the 141 | "Receive Pull Request" workflow from new contributors. 142 | 143 | ### Receive Pull Request (pr-receive.yaml) 144 | 145 | **Note of caution:** This workflow runs arbitrary code by anyone who creates a 146 | pull request. GitHub has safeguarded the token used in this workflow to have no 147 | privileges in the repository, but we have taken precautions to protect against 148 | spoofing. 149 | 150 | This workflow is triggered with every push to a pull request. 
If this workflow 151 | is already running and a new push is sent to the pull request, the workflow 152 | running from the previous push will be cancelled and a new workflow run will be 153 | started. 154 | 155 | The first step of this workflow is to check if it is valid (e.g. that no 156 | workflow files have been modified). If there are workflow files that have been 157 | modified, a comment is made that indicates that the workflow is not run. If 158 | both a workflow file and lesson content are modified, an error will occur. 159 | 160 | The second step (if valid) is to build the generated content from the pull 161 | request. This builds the content and uploads three artifacts: 162 | 163 | 1. The pull request number (pr) 164 | 2. A summary of changes after the rendering process (diff) 165 | 3. The rendered files (build) 166 | 167 | Because this workflow builds generated content, it follows the same general 168 | process as the `sandpaper-main` workflow with the same caching mechanisms. 169 | 170 | The artifacts produced are used by the next workflow. 171 | 172 | ### Comment on Pull Request (pr-comment.yaml) 173 | 174 | This workflow is triggered if the `pr-receive.yaml` workflow is successful. 175 | The steps in this workflow are: 176 | 177 | 1. Test if the workflow is valid and comment the validity of the workflow to the 178 | pull request. 179 | 2. If it is valid: create an orphan branch with two commits: the current state 180 | of the repository and the proposed changes. 181 | 3. If it is valid: update the pull request comment with the summary of changes 182 | 183 | Importantly: if the pull request is invalid, the branch is not created so any 184 | malicious code is not published. 185 | 186 | From here, the maintainer can request changes from the author and eventually 187 | either merge or reject the PR. When this happens, if the PR was valid, the 188 | preview branch needs to be deleted. 
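
The two-commit orphan branch created by `pr-comment.yaml` can be reproduced locally. The following is a minimal sketch in a throwaway repository, assuming `git` is installed. The branch name `md-outputs-PR-0`, the file name `episode.md`, and the file contents are placeholders for illustration; the real workflow starts from the `md-outputs` branch and unpacks a downloaded artifact instead of editing a file by hand.

```bash
#!/usr/bin/env bash
set -eu

# Work in a scratch repository so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.email "actions@github.com"
git config user.name "GitHub Actions"

# Stand-in for the current state of the built markdown.
echo "current output" > episode.md
git add -A
git commit --quiet -m "initial build"
curr_head="$(git rev-parse HEAD)"

# Commit 1: an orphan branch (no parent history) holding the current state.
git checkout --quiet --orphan md-outputs-PR-0
git add -A
git commit --quiet -m "source commit: ${curr_head}"

# Commit 2: the proposed changes on top. --allow-empty keeps the commit
# even when the pull request does not change the built files.
echo "proposed output" > episode.md
git add -A
git commit --quiet --allow-empty -m "differences for PR #0"

# Count the commits reachable from the new branch.
n_commits="$(git rev-list --count HEAD)"
echo "commits on branch: ${n_commits}"
```

Because the branch is an orphan, comparing its two commits shows only what the pull request changes in the rendered output, without any of the history from the source branch.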
189 | 190 | ### Send Close PR Signal (pr-close-signal.yaml) 191 | 192 | Triggered any time a pull request is closed. This emits an artifact that is the 193 | pull request number for the next action. 194 | 195 | ### Remove Pull Request Branch (pr-post-remove-branch.yaml) 196 | 197 | Triggered by `pr-close-signal.yaml`. This removes the temporary branch associated with 198 | the pull request (if it was created). 199 | -------------------------------------------------------------------------------- /.github/workflows/pr-close-signal.yaml: -------------------------------------------------------------------------------- 1 | name: "Bot: Send Close Pull Request Signal" 2 | 3 | on: 4 | pull_request: 5 | types: 6 | [closed] 7 | 8 | jobs: 9 | send-close-signal: 10 | name: "Send closing signal" 11 | runs-on: ubuntu-22.04 12 | if: ${{ github.event.action == 'closed' }} 13 | steps: 14 | - name: "Create PRtifact" 15 | run: | 16 | mkdir -p ./pr 17 | printf ${{ github.event.number }} > ./pr/NUM 18 | - name: Upload Diff 19 | uses: actions/upload-artifact@v4 20 | with: 21 | name: pr 22 | path: ./pr 23 | -------------------------------------------------------------------------------- /.github/workflows/pr-comment.yaml: -------------------------------------------------------------------------------- 1 | name: "Bot: Comment on the Pull Request" 2 | 3 | # read-write repo token 4 | # access to secrets 5 | on: 6 | workflow_run: 7 | workflows: ["Receive Pull Request"] 8 | types: 9 | - completed 10 | 11 | concurrency: 12 | group: pr-${{ github.event.workflow_run.pull_requests[0].number }} 13 | cancel-in-progress: true 14 | 15 | 16 | jobs: 17 | # Pull requests are valid if: 18 | # - they match the sha of the workflow run head commit 19 | # - they are open 20 | # - no .github files were committed 21 | test-pr: 22 | name: "Test if pull request is valid" 23 | runs-on: ubuntu-22.04 24 | if: > 25 | github.event.workflow_run.event == 'pull_request' && 26 | github.event.workflow_run.conclusion == 
'success' 27 | outputs: 28 | is_valid: ${{ steps.check-pr.outputs.VALID }} 29 | payload: ${{ steps.check-pr.outputs.payload }} 30 | number: ${{ steps.get-pr.outputs.NUM }} 31 | msg: ${{ steps.check-pr.outputs.MSG }} 32 | steps: 33 | - name: 'Download PR artifact' 34 | id: dl 35 | uses: carpentries/actions/download-workflow-artifact@main 36 | with: 37 | run: ${{ github.event.workflow_run.id }} 38 | name: 'pr' 39 | 40 | - name: "Get PR Number" 41 | if: ${{ steps.dl.outputs.success == 'true' }} 42 | id: get-pr 43 | run: | 44 | unzip pr.zip 45 | echo "NUM=$(<./NR)" >> $GITHUB_OUTPUT 46 | 47 | - name: "Fail if PR number was not present" 48 | id: bad-pr 49 | if: ${{ steps.dl.outputs.success != 'true' }} 50 | run: | 51 | echo '::error::A pull request number was not recorded. The pull request that triggered this workflow is likely malicious.' 52 | exit 1 53 | - name: "Get Invalid Hashes File" 54 | id: hash 55 | run: | 56 | echo "json<> $GITHUB_OUTPUT 59 | - name: "Check PR" 60 | id: check-pr 61 | if: ${{ steps.dl.outputs.success == 'true' }} 62 | uses: carpentries/actions/check-valid-pr@main 63 | with: 64 | pr: ${{ steps.get-pr.outputs.NUM }} 65 | sha: ${{ github.event.workflow_run.head_sha }} 66 | headroom: 3 # if it's within the last three commits, we can keep going, because it's likely rapid-fire 67 | invalid: ${{ fromJSON(steps.hash.outputs.json)[github.repository] }} 68 | fail_on_error: true 69 | 70 | # Create an orphan branch on this repository with two commits 71 | # - the current HEAD of the md-outputs branch 72 | # - the output from running the current HEAD of the pull request through 73 | # the md generator 74 | create-branch: 75 | name: "Create Git Branch" 76 | needs: test-pr 77 | runs-on: ubuntu-22.04 78 | if: ${{ needs.test-pr.outputs.is_valid == 'true' }} 79 | env: 80 | NR: ${{ needs.test-pr.outputs.number }} 81 | permissions: 82 | contents: write 83 | steps: 84 | - name: 'Checkout md outputs' 85 | uses: actions/checkout@v4 86 | with: 87 | ref: md-outputs 88 
| path: built 89 | fetch-depth: 1 90 | 91 | - name: 'Download built markdown' 92 | id: dl 93 | uses: carpentries/actions/download-workflow-artifact@main 94 | with: 95 | run: ${{ github.event.workflow_run.id }} 96 | name: 'built' 97 | 98 | - if: ${{ steps.dl.outputs.success == 'true' }} 99 | run: unzip built.zip 100 | 101 | - name: "Create orphan and push" 102 | if: ${{ steps.dl.outputs.success == 'true' }} 103 | run: | 104 | cd built/ 105 | git config --local user.email "actions@github.com" 106 | git config --local user.name "GitHub Actions" 107 | CURR_HEAD=$(git rev-parse HEAD) 108 | git checkout --orphan md-outputs-PR-${NR} 109 | git add -A 110 | git commit -m "source commit: ${CURR_HEAD}" 111 | ls -A | grep -v '^.git$' | xargs -I _ rm -r '_' 112 | cd .. 113 | unzip -o -d built built.zip 114 | cd built 115 | git add -A 116 | git commit --allow-empty -m "differences for PR #${NR}" 117 | git push -u --force --set-upstream origin md-outputs-PR-${NR} 118 | 119 | # Comment on the Pull Request with a link to the branch and the diff 120 | comment-pr: 121 | name: "Comment on Pull Request" 122 | needs: [test-pr, create-branch] 123 | runs-on: ubuntu-22.04 124 | if: ${{ needs.test-pr.outputs.is_valid == 'true' }} 125 | env: 126 | NR: ${{ needs.test-pr.outputs.number }} 127 | permissions: 128 | pull-requests: write 129 | steps: 130 | - name: 'Download comment artifact' 131 | id: dl 132 | uses: carpentries/actions/download-workflow-artifact@main 133 | with: 134 | run: ${{ github.event.workflow_run.id }} 135 | name: 'diff' 136 | 137 | - if: ${{ steps.dl.outputs.success == 'true' }} 138 | run: unzip ${{ github.workspace }}/diff.zip 139 | 140 | - name: "Comment on PR" 141 | id: comment-diff 142 | if: ${{ steps.dl.outputs.success == 'true' }} 143 | uses: carpentries/actions/comment-diff@main 144 | with: 145 | pr: ${{ env.NR }} 146 | path: ${{ github.workspace }}/diff.md 147 | 148 | # Comment if the PR is open and matches the SHA, but the workflow files have 149 | # changed 150 | 
comment-changed-workflow: 151 | name: "Comment if workflow files have changed" 152 | needs: test-pr 153 | runs-on: ubuntu-22.04 154 | if: ${{ always() && needs.test-pr.outputs.is_valid == 'false' }} 155 | env: 156 | NR: ${{ github.event.workflow_run.pull_requests[0].number }} 157 | body: ${{ needs.test-pr.outputs.msg }} 158 | permissions: 159 | pull-requests: write 160 | steps: 161 | - name: 'Check for spoofing' 162 | id: dl 163 | uses: carpentries/actions/download-workflow-artifact@main 164 | with: 165 | run: ${{ github.event.workflow_run.id }} 166 | name: 'built' 167 | 168 | - name: 'Alert if spoofed' 169 | id: spoof 170 | if: ${{ steps.dl.outputs.success == 'true' }} 171 | run: | 172 | echo 'body<> $GITHUB_ENV 173 | echo '' >> $GITHUB_ENV 174 | echo '## :x: DANGER :x:' >> $GITHUB_ENV 175 | echo 'This pull request has modified workflows that created output. Close this now.' >> $GITHUB_ENV 176 | echo '' >> $GITHUB_ENV 177 | echo 'EOF' >> $GITHUB_ENV 178 | 179 | - name: "Comment on PR" 180 | id: comment-diff 181 | uses: carpentries/actions/comment-diff@main 182 | with: 183 | pr: ${{ env.NR }} 184 | body: ${{ env.body }} 185 | -------------------------------------------------------------------------------- /.github/workflows/pr-post-remove-branch.yaml: -------------------------------------------------------------------------------- 1 | name: "Bot: Remove Temporary PR Branch" 2 | 3 | on: 4 | workflow_run: 5 | workflows: ["Bot: Send Close Pull Request Signal"] 6 | types: 7 | - completed 8 | 9 | jobs: 10 | delete: 11 | name: "Delete branch from Pull Request" 12 | runs-on: ubuntu-22.04 13 | if: > 14 | github.event.workflow_run.event == 'pull_request' && 15 | github.event.workflow_run.conclusion == 'success' 16 | permissions: 17 | contents: write 18 | steps: 19 | - name: 'Download artifact' 20 | uses: carpentries/actions/download-workflow-artifact@main 21 | with: 22 | run: ${{ github.event.workflow_run.id }} 23 | name: pr 24 | - name: "Get PR Number" 25 | id: get-pr 26 | 
run: | 27 | unzip pr.zip 28 | echo "NUM=$(<./NUM)" >> $GITHUB_OUTPUT 29 | - name: 'Remove branch' 30 | uses: carpentries/actions/remove-branch@main 31 | with: 32 | pr: ${{ steps.get-pr.outputs.NUM }} 33 | -------------------------------------------------------------------------------- /.github/workflows/pr-preflight.yaml: -------------------------------------------------------------------------------- 1 | name: "Pull Request Preflight Check" 2 | 3 | on: 4 | pull_request_target: 5 | branches: 6 | ["main"] 7 | types: 8 | ["opened", "synchronize", "reopened"] 9 | 10 | jobs: 11 | test-pr: 12 | name: "Test if pull request is valid" 13 | if: ${{ github.event.action != 'closed' }} 14 | runs-on: ubuntu-22.04 15 | outputs: 16 | is_valid: ${{ steps.check-pr.outputs.VALID }} 17 | permissions: 18 | pull-requests: write 19 | steps: 20 | - name: "Get Invalid Hashes File" 21 | id: hash 22 | run: | 23 | echo "json<> $GITHUB_OUTPUT 26 | - name: "Check PR" 27 | id: check-pr 28 | uses: carpentries/actions/check-valid-pr@main 29 | with: 30 | pr: ${{ github.event.number }} 31 | invalid: ${{ fromJSON(steps.hash.outputs.json)[github.repository] }} 32 | fail_on_error: true 33 | - name: "Comment result of validation" 34 | id: comment-diff 35 | if: ${{ always() }} 36 | uses: carpentries/actions/comment-diff@main 37 | with: 38 | pr: ${{ github.event.number }} 39 | body: ${{ steps.check-pr.outputs.MSG }} 40 | -------------------------------------------------------------------------------- /.github/workflows/pr-receive.yaml: -------------------------------------------------------------------------------- 1 | name: "Receive Pull Request" 2 | 3 | on: 4 | pull_request: 5 | types: 6 | [opened, synchronize, reopened] 7 | 8 | concurrency: 9 | group: ${{ github.ref }} 10 | cancel-in-progress: true 11 | 12 | jobs: 13 | test-pr: 14 | name: "Record PR number" 15 | if: ${{ github.event.action != 'closed' }} 16 | runs-on: ubuntu-22.04 17 | outputs: 18 | is_valid: ${{ steps.check-pr.outputs.VALID }} 19 | 
steps: 20 | - name: "Record PR number" 21 | id: record 22 | if: ${{ always() }} 23 | run: | 24 | echo ${{ github.event.number }} > ${{ github.workspace }}/NR # 2022-03-02: artifact name fixed to be NR 25 | - name: "Upload PR number" 26 | id: upload 27 | if: ${{ always() }} 28 | uses: actions/upload-artifact@v4 29 | with: 30 | name: pr 31 | path: ${{ github.workspace }}/NR 32 | - name: "Get Invalid Hashes File" 33 | id: hash 34 | run: | 35 | echo "json<> $GITHUB_OUTPUT 38 | - name: "echo output" 39 | run: | 40 | echo "${{ steps.hash.outputs.json }}" 41 | - name: "Check PR" 42 | id: check-pr 43 | uses: carpentries/actions/check-valid-pr@main 44 | with: 45 | pr: ${{ github.event.number }} 46 | invalid: ${{ fromJSON(steps.hash.outputs.json)[github.repository] }} 47 | 48 | build-md-source: 49 | name: "Build markdown source files if valid" 50 | needs: test-pr 51 | runs-on: ubuntu-22.04 52 | if: ${{ needs.test-pr.outputs.is_valid == 'true' }} 53 | env: 54 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 55 | RENV_PATHS_ROOT: ~/.local/share/renv/ 56 | CHIVE: ${{ github.workspace }}/site/chive 57 | PR: ${{ github.workspace }}/site/pr 58 | MD: ${{ github.workspace }}/site/built 59 | steps: 60 | - name: "Check Out Main Branch" 61 | uses: actions/checkout@v4 62 | 63 | - name: "Check Out Staging Branch" 64 | uses: actions/checkout@v4 65 | with: 66 | ref: md-outputs 67 | path: ${{ env.MD }} 68 | 69 | - name: "Set up R" 70 | uses: r-lib/actions/setup-r@v2 71 | with: 72 | use-public-rspm: true 73 | install-r: false 74 | 75 | - name: "Set up Pandoc" 76 | uses: r-lib/actions/setup-pandoc@v2 77 | 78 | - name: "Setup Lesson Engine" 79 | uses: carpentries/actions/setup-sandpaper@main 80 | with: 81 | cache-version: ${{ secrets.CACHE_VERSION }} 82 | 83 | - name: "Setup Package Cache" 84 | uses: carpentries/actions/setup-lesson-deps@main 85 | with: 86 | cache-version: ${{ secrets.CACHE_VERSION }} 87 | 88 | - name: "Validate and Build Markdown" 89 | id: build-site 90 | run: | 91 | 
sandpaper::package_cache_trigger(TRUE) 92 | sandpaper::validate_lesson(path = '${{ github.workspace }}') 93 | sandpaper:::build_markdown(path = '${{ github.workspace }}', quiet = FALSE) 94 | shell: Rscript {0} 95 | 96 | - name: "Generate Artifacts" 97 | id: generate-artifacts 98 | run: | 99 | sandpaper:::ci_bundle_pr_artifacts( 100 | repo = '${{ github.repository }}', 101 | pr_number = '${{ github.event.number }}', 102 | path_md = '${{ env.MD }}', 103 | path_pr = '${{ env.PR }}', 104 | path_archive = '${{ env.CHIVE }}', 105 | branch = 'md-outputs' 106 | ) 107 | shell: Rscript {0} 108 | 109 | - name: "Upload PR" 110 | uses: actions/upload-artifact@v4 111 | with: 112 | name: pr 113 | path: ${{ env.PR }} 114 | overwrite: true 115 | 116 | - name: "Upload Diff" 117 | uses: actions/upload-artifact@v4 118 | with: 119 | name: diff 120 | path: ${{ env.CHIVE }} 121 | retention-days: 1 122 | 123 | - name: "Upload Build" 124 | uses: actions/upload-artifact@v4 125 | with: 126 | name: built 127 | path: ${{ env.MD }} 128 | retention-days: 1 129 | 130 | - name: "Teardown" 131 | run: sandpaper::reset_site() 132 | shell: Rscript {0} 133 | -------------------------------------------------------------------------------- /.github/workflows/sandpaper-main.yaml: -------------------------------------------------------------------------------- 1 | name: "01 Build and Deploy Site" 2 | 3 | on: 4 | push: 5 | branches: 6 | - main 7 | - master 8 | schedule: 9 | - cron: '0 0 * * 2' 10 | workflow_dispatch: 11 | inputs: 12 | name: 13 | description: 'Who triggered this build?' 
14 | required: true 15 | default: 'Maintainer (via GitHub)' 16 | reset: 17 | description: 'Reset cached markdown files' 18 | required: false 19 | default: false 20 | type: boolean 21 | jobs: 22 | full-build: 23 | name: "Build Full Site" 24 | 25 | # 2024-10-01: ubuntu-latest is now 24.04 and R is not installed by default in the runner image 26 | # pin to 22.04 for now 27 | runs-on: ubuntu-22.04 28 | permissions: 29 | checks: write 30 | contents: write 31 | pages: write 32 | env: 33 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 34 | RENV_PATHS_ROOT: ~/.local/share/renv/ 35 | steps: 36 | 37 | - name: "Checkout Lesson" 38 | uses: actions/checkout@v4 39 | 40 | - name: "Set up R" 41 | uses: r-lib/actions/setup-r@v2 42 | with: 43 | use-public-rspm: true 44 | install-r: false 45 | 46 | - name: "Set up Pandoc" 47 | uses: r-lib/actions/setup-pandoc@v2 48 | 49 | - name: "Setup Lesson Engine" 50 | uses: carpentries/actions/setup-sandpaper@main 51 | with: 52 | cache-version: ${{ secrets.CACHE_VERSION }} 53 | 54 | - name: "Setup Package Cache" 55 | uses: carpentries/actions/setup-lesson-deps@main 56 | with: 57 | cache-version: ${{ secrets.CACHE_VERSION }} 58 | 59 | - name: "Deploy Site" 60 | run: | 61 | reset <- "${{ github.event.inputs.reset }}" == "true" 62 | sandpaper::package_cache_trigger(TRUE) 63 | sandpaper:::ci_deploy(reset = reset) 64 | shell: Rscript {0} 65 | -------------------------------------------------------------------------------- /.github/workflows/sandpaper-version.txt: -------------------------------------------------------------------------------- 1 | 0.16.12 2 | -------------------------------------------------------------------------------- /.github/workflows/update-cache.yaml: -------------------------------------------------------------------------------- 1 | name: "03 Maintain: Update Package Cache" 2 | 3 | on: 4 | workflow_dispatch: 5 | inputs: 6 | name: 7 | description: 'Who triggered this build (enter github username to tag yourself)?' 
8 | required: true 9 | default: 'monthly run' 10 | schedule: 11 | # Run every tuesday 12 | - cron: '0 0 * * 2' 13 | 14 | jobs: 15 | preflight: 16 | name: "Preflight Check" 17 | runs-on: ubuntu-22.04 18 | outputs: 19 | ok: ${{ steps.check.outputs.ok }} 20 | steps: 21 | - id: check 22 | run: | 23 | if [[ ${{ github.event_name }} == 'workflow_dispatch' ]]; then 24 | echo "ok=true" >> $GITHUB_OUTPUT 25 | echo "Running on request" 26 | # using single brackets here to avoid 08 being interpreted as octal 27 | # https://github.com/carpentries/sandpaper/issues/250 28 | elif [ `date +%d` -le 7 ]; then 29 | # If the Tuesday lands in the first week of the month, run it 30 | echo "ok=true" >> $GITHUB_OUTPUT 31 | echo "Running on schedule" 32 | else 33 | echo "ok=false" >> $GITHUB_OUTPUT 34 | echo "Not Running Today" 35 | fi 36 | 37 | check_renv: 38 | name: "Check if We Need {renv}" 39 | runs-on: ubuntu-22.04 40 | needs: preflight 41 | if: ${{ needs.preflight.outputs.ok == 'true'}} 42 | outputs: 43 | needed: ${{ steps.renv.outputs.exists }} 44 | steps: 45 | - name: "Checkout Lesson" 46 | uses: actions/checkout@v4 47 | - id: renv 48 | run: | 49 | if [[ -d renv ]]; then 50 | echo "exists=true" >> $GITHUB_OUTPUT 51 | fi 52 | 53 | check_token: 54 | name: "Check SANDPAPER_WORKFLOW token" 55 | runs-on: ubuntu-22.04 56 | needs: check_renv 57 | if: ${{ needs.check_renv.outputs.needed == 'true' }} 58 | outputs: 59 | workflow: ${{ steps.validate.outputs.wf }} 60 | repo: ${{ steps.validate.outputs.repo }} 61 | steps: 62 | - name: "validate token" 63 | id: validate 64 | uses: carpentries/actions/check-valid-credentials@main 65 | with: 66 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 67 | 68 | update_cache: 69 | name: "Update Package Cache" 70 | needs: check_token 71 | if: ${{ needs.check_token.outputs.repo== 'true' }} 72 | runs-on: ubuntu-22.04 73 | env: 74 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 75 | RENV_PATHS_ROOT: ~/.local/share/renv/ 76 | steps: 77 | 78 | - name: "Checkout Lesson" 79 | 
uses: actions/checkout@v4 80 | 81 | - name: "Set up R" 82 | uses: r-lib/actions/setup-r@v2 83 | with: 84 | use-public-rspm: true 85 | install-r: false 86 | 87 | - name: "Update {renv} deps and determine if a PR is needed" 88 | id: update 89 | uses: carpentries/actions/update-lockfile@main 90 | with: 91 | cache-version: ${{ secrets.CACHE_VERSION }} 92 | 93 | - name: Create Pull Request 94 | id: cpr 95 | if: ${{ steps.update.outputs.n > 0 }} 96 | uses: carpentries/create-pull-request@main 97 | with: 98 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 99 | delete-branch: true 100 | branch: "update/packages" 101 | commit-message: "[actions] update ${{ steps.update.outputs.n }} packages" 102 | title: "Update ${{ steps.update.outputs.n }} packages" 103 | body: | 104 | :robot: This is an automated build 105 | 106 | This will update ${{ steps.update.outputs.n }} packages in your lesson with the following versions: 107 | 108 | ``` 109 | ${{ steps.update.outputs.report }} 110 | ``` 111 | 112 | :stopwatch: In a few minutes, a comment will appear that will show you how the output has changed based on these updates. 113 | 114 | If you want to inspect these changes locally, you can use the following code to check out a new branch: 115 | 116 | ```bash 117 | git fetch origin update/packages 118 | git checkout update/packages 119 | ``` 120 | 121 | - Auto-generated by [create-pull-request][1] on ${{ steps.update.outputs.date }} 122 | 123 | [1]: https://github.com/carpentries/create-pull-request/tree/main 124 | labels: "type: package cache" 125 | draft: false 126 | -------------------------------------------------------------------------------- /.github/workflows/update-workflows.yaml: -------------------------------------------------------------------------------- 1 | name: "02 Maintain: Update Workflow Files" 2 | 3 | on: 4 | workflow_dispatch: 5 | inputs: 6 | name: 7 | description: 'Who triggered this build (enter github username to tag yourself)?' 
8 | required: true 9 | default: 'weekly run' 10 | clean: 11 | description: 'Workflow files/file extensions to clean (no wildcards, enter "" for none)' 12 | required: false 13 | default: '.yaml' 14 | schedule: 15 | # Run every Tuesday 16 | - cron: '0 0 * * 2' 17 | 18 | jobs: 19 | check_token: 20 | name: "Check SANDPAPER_WORKFLOW token" 21 | runs-on: ubuntu-22.04 22 | outputs: 23 | workflow: ${{ steps.validate.outputs.wf }} 24 | repo: ${{ steps.validate.outputs.repo }} 25 | steps: 26 | - name: "validate token" 27 | id: validate 28 | uses: carpentries/actions/check-valid-credentials@main 29 | with: 30 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 31 | 32 | update_workflow: 33 | name: "Update Workflow" 34 | runs-on: ubuntu-22.04 35 | needs: check_token 36 | if: ${{ needs.check_token.outputs.workflow == 'true' }} 37 | steps: 38 | - name: "Checkout Repository" 39 | uses: actions/checkout@v4 40 | 41 | - name: Update Workflows 42 | id: update 43 | uses: carpentries/actions/update-workflows@main 44 | with: 45 | clean: ${{ github.event.inputs.clean }} 46 | 47 | - name: Create Pull Request 48 | id: cpr 49 | if: "${{ steps.update.outputs.new }}" 50 | uses: carpentries/create-pull-request@main 51 | with: 52 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 53 | delete-branch: true 54 | branch: "update/workflows" 55 | commit-message: "[actions] update sandpaper workflow to version ${{ steps.update.outputs.new }}" 56 | title: "Update Workflows to Version ${{ steps.update.outputs.new }}" 57 | body: | 58 | :robot: This is an automated build 59 | 60 | Update Workflows from sandpaper version ${{ steps.update.outputs.old }} -> ${{ steps.update.outputs.new }} 61 | 62 | - Auto-generated by [create-pull-request][1] on ${{ steps.update.outputs.date }} 63 | 64 | [1]: https://github.com/carpentries/create-pull-request/tree/main 65 | labels: "type: template and tools" 66 | draft: false 67 | -------------------------------------------------------------------------------- /.gitignore: 
-------------------------------------------------------------------------------- 1 | # sandpaper files 2 | episodes/*html 3 | site/* 4 | !site/README.md 5 | *.Rproj 6 | 7 | # History files 8 | .Rhistory 9 | .Rapp.history 10 | # Session Data files 11 | .RData 12 | # User-specific files 13 | .Ruserdata 14 | # Example code in package build process 15 | *-Ex.R 16 | # Output files from R CMD build 17 | /*.tar.gz 18 | # Output files from R CMD check 19 | /*.Rcheck/ 20 | # RStudio files 21 | .Rproj.user/ 22 | # produced vignettes 23 | vignettes/*.html 24 | vignettes/*.pdf 25 | # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3 26 | .httr-oauth 27 | # knitr and R markdown default cache directories 28 | *_cache/ 29 | /cache/ 30 | # Temporary files created by R markdown 31 | *.utf8.md 32 | *.knit.md 33 | # R Environment Variables 34 | .Renviron 35 | # pkgdown site 36 | docs/ 37 | # translation temp files 38 | po/*~ 39 | # renv detritus 40 | renv/sandbox/ 41 | *.pyc 42 | *~ 43 | .DS_Store 44 | .ipynb_checkpoints 45 | .sass-cache 46 | .jekyll-cache/ 47 | .jekyll-metadata 48 | __pycache__ 49 | _site 50 | .Rproj.user 51 | .bundle/ 52 | .vendor/ 53 | vendor/ 54 | .docker-vendor/ 55 | Gemfile.lock 56 | .*history 57 | -------------------------------------------------------------------------------- /.zenodo.json: -------------------------------------------------------------------------------- 1 | { 2 | "contributors": [ 3 | { 4 | "type": "Editor", 5 | "name": "Trevor Burrows" 6 | } 7 | ], 8 | "creators": [ 9 | { 10 | "name": "Christopher Prener" 11 | }, 12 | { 13 | "name": "Trevor Burrows" 14 | }, 15 | { 16 | "name": "bkmgit" 17 | }, 18 | { 19 | "name": "Mara Sedlins" 20 | }, 21 | { 22 | "name": "AliNite" 23 | }, 24 | { 25 | "name": "Scott Carl Peterson", 26 | "orcid": "0000-0002-1920-616X" 27 | }, 28 | { 29 | "name": "Angelique Trusler", 30 | "orcid": "0000-0003-2340-8538" 31 | }, 32 | { 33 | "name": "Annajiat Alim Rasel", 34 | "orcid": "0000-0003-0198-3734" 35 | 
}, 36 | { 37 | "name": "Maneesha Sane" 38 | }, 39 | { 40 | "name": "Peter Bugeia" 41 | }, 42 | { 43 | "name": "Phil Reed", 44 | "orcid": "0000-0002-4479-715X" 45 | }, 46 | { 47 | "name": "Angela Li", 48 | "orcid": "0000-0002-8956-419X" 49 | }, 50 | { 51 | "name": "Claudiu Forgaci", 52 | "orcid": "0000-0003-3218-5102" 53 | }, 54 | { 55 | "name": "Dafne Erica van Kuppevelt", 56 | "orcid": "0000-0002-2662-1994" 57 | }, 58 | { 59 | "name": "Emily Ferrier" 60 | }, 61 | { 62 | "name": "Fran Baseby" 63 | }, 64 | { 65 | "name": "Katherine E. Koziar", 66 | "orcid": "0000-0003-0505-7973" 67 | }, 68 | { 69 | "name": "Kunal Marwaha", 70 | "orcid": "0000-0001-9084-6971" 71 | }, 72 | { 73 | "name": "Naoe Tatara", 74 | "orcid": "0000-0002-0049-1634" 75 | }, 76 | { 77 | "name": "Nathaniel Porter" 78 | }, 79 | { 80 | "name": "Sarah M Brown", 81 | "orcid": "0000-0001-5728-0822" 82 | }, 83 | { 84 | "name": "Shiobhan Smith", 85 | "orcid": "0000-0003-1738-9836" 86 | }, 87 | { 88 | "name": "Tugba Ozturk" 89 | }, 90 | { 91 | "name": "ecparke-utm" 92 | }, 93 | { 94 | "name": "geyslein" 95 | }, 96 | { 97 | "name": "marksnyders" 98 | }, 99 | { 100 | "name": "mlbecher" 101 | } 102 | ], 103 | "license": { 104 | "id": "CC-BY-4.0" 105 | } 106 | } 107 | -------------------------------------------------------------------------------- /AUTHORS: -------------------------------------------------------------------------------- 1 | FIXME: list authors' names and email addresses. 2 | -------------------------------------------------------------------------------- /CITATION: -------------------------------------------------------------------------------- 1 | FIXME: describe how to cite this lesson.
2 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Contributor Code of Conduct" 3 | --- 4 | 5 | As contributors and maintainers of this project, 6 | we pledge to follow [The Carpentries Code of Conduct][coc]. 7 | 8 | Instances of abusive, harassing, or otherwise unacceptable behavior 9 | may be reported by following our [reporting guidelines][coc-reporting]. 10 | 11 | 12 | [coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html 13 | [coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html 14 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | ## Contributing 2 | 3 | [The Carpentries][cp-site] ([Software Carpentry][swc-site], [Data 4 | Carpentry][dc-site], and [Library Carpentry][lc-site]) are open source 5 | projects, and we welcome contributions of all kinds: new lessons, fixes to 6 | existing material, bug reports, and reviews of proposed changes are all 7 | welcome. 8 | 9 | ### Contributor Agreement 10 | 11 | By contributing, you agree that we may redistribute your work under [our 12 | license](LICENSE.md). In exchange, we will address your issues and/or assess 13 | your change proposal as promptly as we can, and help you become a member of our 14 | community. Everyone involved in [The Carpentries][cp-site] agrees to abide by 15 | our [code of conduct](CODE_OF_CONDUCT.md). 16 | 17 | ### How to Contribute 18 | 19 | The easiest way to get started is to file an issue to tell us about a spelling 20 | mistake, some awkward wording, or a factual error. This is a good way to 21 | introduce yourself and to meet some of our community members. 22 | 23 | 1.
If you do not have a [GitHub][github] account, you can [send us comments by 24 | email][contact]. However, we will be able to respond more quickly if you use 25 | one of the other methods described below. 26 | 27 | 2. If you have a [GitHub][github] account, or are willing to [create 28 | one][github-join], but do not know how to use Git, you can report problems 29 | or suggest improvements by [creating an issue][issues]. This allows us to 30 | assign the item to someone and to respond to it in a threaded discussion. 31 | 32 | 3. If you are comfortable with Git, and would like to add or change material, 33 | you can submit a pull request (PR). Instructions for doing this are 34 | [included below](#using-github). 35 | 36 | Note: if you want to build the website locally, please refer to [The Workbench 37 | documentation][template-doc]. 38 | 39 | ### Where to Contribute 40 | 41 | 1. If you wish to change this lesson, add issues and pull requests here. 42 | 2. If you wish to change the template used for workshop websites, please refer 43 | to [The Workbench documentation][template-doc]. 44 | 45 | 46 | ### What to Contribute 47 | 48 | There are many ways to contribute, from writing new exercises and improving 49 | existing ones to updating or filling in the documentation and submitting [bug 50 | reports][issues] about things that do not work, are not clear, or are missing. 51 | If you are looking for ideas, please see [the list of issues for this 52 | repository][repo], or the issues for [Data Carpentry][dc-issues], [Library 53 | Carpentry][lc-issues], and [Software Carpentry][swc-issues] projects. 54 | 55 | Comments on issues and reviews of pull requests are just as welcome: we are 56 | smarter together than we are on our own. **Reviews from novices and newcomers 57 | are particularly valuable**: it's easy for people who have been using these 58 | lessons for a while to forget how impenetrable some of this material can be, so 59 | fresh eyes are always welcome. 
60 | 61 | ### What *Not* to Contribute 62 | 63 | Our lessons already contain more material than we can cover in a typical 64 | workshop, so we are usually *not* looking for more concepts or tools to add to 65 | them. As a rule, if you want to introduce a new idea, you must (a) estimate how 66 | long it will take to teach and (b) explain what you would take out to make room 67 | for it. The first encourages contributors to be honest about requirements; the 68 | second, to think hard about priorities. 69 | 70 | We are also not looking for exercises or other material that only run on one 71 | platform. Our workshops typically contain a mixture of Windows, macOS, and 72 | Linux users; in order to be usable, our lessons must run equally well on all 73 | three. 74 | 75 | ### Using GitHub 76 | 77 | If you choose to contribute via GitHub, you may want to look at [How to 78 | Contribute to an Open Source Project on GitHub](https://egghead.io/courses/how-to-contribute-to-an-open-source-project-on-github). In brief, we 79 | use [GitHub flow][github-flow] to manage changes: 80 | 81 | 1. Create a new branch in your desktop copy of this repository for each 82 | significant change. 83 | 2. Commit the change in that branch. 84 | 3. Push that branch to your fork of this repository on GitHub. 85 | 4. Submit a pull request from that branch to the [upstream repository][repo]. 86 | 5. If you receive feedback, make changes on your desktop and push to your 87 | branch on GitHub: the pull request will update automatically. 88 | 89 | NB: The published copy of the lesson is usually in the `main` branch. 90 | 91 | Each lesson has a team of maintainers who review issues and pull requests or 92 | encourage others to do so. The maintainers are community volunteers, and have 93 | final say over what gets merged into the lesson. 94 | 95 | ### Other Resources 96 | 97 | The Carpentries is a global organisation with volunteers and learners all over 98 | the world. 
We share values of inclusivity and a passion for sharing knowledge, 99 | teaching and learning. There are several ways to connect with The Carpentries 100 | community listed at including via social 101 | media, Slack, newsletters, and email lists. You can also [reach us by 102 | email][contact]. 103 | 104 | [repo]: https://github.com/datacarpentry/spreadsheets-socialsci 105 | [contact]: mailto:team@carpentries.org 106 | [cp-site]: https://carpentries.org/ 107 | [dc-issues]: https://github.com/issues?q=user%3Adatacarpentry 108 | [dc-lessons]: https://datacarpentry.org/lessons/ 109 | [dc-site]: https://datacarpentry.org/ 110 | [discuss-list]: https://lists.software-carpentry.org/listinfo/discuss 111 | [github]: https://github.com 112 | [github-flow]: https://guides.github.com/introduction/flow/ 113 | [github-join]: https://github.com/join 114 | [how-contribute]: https://egghead.io/series/how-to-contribute-to-an-open-source-project-on-github 115 | [issues]: https://carpentries.org/help-wanted-issues/ 116 | [lc-issues]: https://github.com/issues?q=user%3ALibraryCarpentry 117 | [swc-issues]: https://github.com/issues?q=user%3Aswcarpentry 118 | [swc-lessons]: https://software-carpentry.org/lessons/ 119 | [swc-site]: https://software-carpentry.org/ 120 | [lc-site]: https://librarycarpentry.org/ 121 | [template-doc]: https://carpentries.github.io/workbench/ 122 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Licenses" 3 | --- 4 | 5 | ## Instructional Material 6 | 7 | All Carpentries (Software Carpentry, Data Carpentry, and Library Carpentry) 8 | instructional material is made available under the [Creative Commons 9 | Attribution license][cc-by-human]. The following is a human-readable summary of 10 | (and not a substitute for) the [full legal text of the CC BY 4.0 11 | license][cc-by-legal].
12 | 13 | You are free: 14 | 15 | - to **Share**---copy and redistribute the material in any medium or format 16 | - to **Adapt**---remix, transform, and build upon the material 17 | 18 | for any purpose, even commercially. 19 | 20 | The licensor cannot revoke these freedoms as long as you follow the license 21 | terms. 22 | 23 | Under the following terms: 24 | 25 | - **Attribution**---You must give appropriate credit (mentioning that your work 26 | is derived from work that is Copyright (c) The Carpentries and, where 27 | practical, linking to ), provide a [link to the 28 | license][cc-by-human], and indicate if changes were made. You may do so in 29 | any reasonable manner, but not in any way that suggests the licensor endorses 30 | you or your use. 31 | 32 | - **No additional restrictions**---You may not apply legal terms or 33 | technological measures that legally restrict others from doing anything the 34 | license permits. With the understanding that: 35 | 36 | Notices: 37 | 38 | * You do not have to comply with the license for elements of the material in 39 | the public domain or where your use is permitted by an applicable exception 40 | or limitation. 41 | * No warranties are given. The license may not give you all of the permissions 42 | necessary for your intended use. For example, other rights such as publicity, 43 | privacy, or moral rights may limit how you use the material. 44 | 45 | ## Software 46 | 47 | Except where otherwise noted, the example programs and other software provided 48 | by The Carpentries are made available under the [OSI][osi]-approved [MIT 49 | license][mit-license]. 
50 | 51 | Permission is hereby granted, free of charge, to any person obtaining a copy of 52 | this software and associated documentation files (the "Software"), to deal in 53 | the Software without restriction, including without limitation the rights to 54 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 55 | of the Software, and to permit persons to whom the Software is furnished to do 56 | so, subject to the following conditions: 57 | 58 | The above copyright notice and this permission notice shall be included in all 59 | copies or substantial portions of the Software. 60 | 61 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 62 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 63 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 64 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 65 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 66 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 67 | SOFTWARE. 68 | 69 | ## Trademark 70 | 71 | "The Carpentries", "Software Carpentry", "Data Carpentry", and "Library 72 | Carpentry" and their respective logos are registered trademarks of 73 | [The Carpentries, Inc.][carpentries]. 
74 | 75 | [cc-by-human]: https://creativecommons.org/licenses/by/4.0/ 76 | [cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode 77 | [mit-license]: https://opensource.org/licenses/mit-license.html 78 | [carpentries]: https://carpentries.org 79 | [osi]: https://opensource.org 80 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![Create a Slack Account with us](https://img.shields.io/badge/Create_Slack_Account-The_Carpentries-071159.svg)](https://slack-invite.carpentries.org/) 2 | [![Slack Status](https://img.shields.io/badge/Slack_Channel-dc--socsci--data--org-E01563.svg)](https://carpentries.slack.com/messages/C9X34DJ9Z) 3 | [![DOI](https://zenodo.org/badge/92422634.svg)](https://zenodo.org/badge/latestdoi/92422634) 4 | 5 | # spreadsheets-socialsci 6 | 7 | Lesson on spreadsheets and data organization for social scientists. 8 | 9 | Readme file for The SAFI Teaching Database 10 | Generated on 2019-09-19 for teaching purposes. 11 | 12 | Recommended citation for the dataset: Woodhouse, Philip; Veldwisch, Gert Jan; Brockington, Daniel; Komakech, 13 | Hans C.; Manjichi, Angela; Venot, Jean-Philippe (2018): SAFI Survey Results. doi:10.6084/m9.figshare.6262019.v1 14 | 15 | *** 16 | 17 | PROJECT INFORMATION 18 | 19 | *** 20 | 21 | 1. Title of dataset: The SAFI (Studying African Farmer-led Irrigation) Teaching Database 22 | 23 | 2. Author information: 24 | 25 | Principal Investigator 26 | Name: Philip Woodhouse 27 | Address: University of Manchester 28 | Email: [phil.woodhouse@manchester.ac.uk](mailto:phil.woodhouse@manchester.ac.uk) 29 | 30 | Co-Investigators: 31 | Name: Gert Jan Veldwisch 32 | Name: Daniel Brockington 33 | Name: Hans C Komakech 34 | Name: Angela Manjichi 35 | Name: Jean-Philippe Venot 36 | 37 | 3. Date of data collection: November 2016 - June 2017 38 | 39 | 4.
Funder Name: DFID-ESRC Growth Research Programme (DEGRP) grant ES/L01239/1 40 | 41 | 5. Publications: 42 | Farmer-led irrigation development and investment strategies for food security, growth and employment in 43 | Africa. Policy Brief. [www.safi-research.org/resources](https://www.safi-research.org/resources) 44 | 45 | *** 46 | 47 | DATA ACCESS INFORMATION 48 | 49 | *** 50 | 51 | 1. Licences / restrictions placed on access to the dataset: CC0 52 | 2. Access through figshare: doi:10.6084/m9.figshare.6262019.v1 53 | 54 | *** 55 | 56 | METHODS OF DATA COLLECTION 57 | 58 | *** 59 | 60 | 1. Describe the methods for data collection and / or provide links to papers describing data collection methods: 61 | This is survey data relating to households and agriculture in Tanzania and Mozambique. The survey data was collected 62 | through interviews conducted between November 2016 and June 2017. This is a teaching version of the dataset, 63 | not the full version. 64 | 65 | 2. Instrument information: 66 | The survey was split into several sections: 67 | A - General questions about when and where the survey was conducted; 68 | B - Information about the household and how long they have been living in the area; 69 | C - Details about the accommodation and other buildings on the farm; 70 | D - Details about different plots of land they grow crops on; 71 | E - Details about how they irrigate the land and availability of water; 72 | F - Financial details including assets owned and sources of income; 73 | G - Details of financial hardships; 74 | X - Information collected directly from the smartphone (GPS) or automatically included in the form (InstanceID). 75 | 76 | 3. Data processing: 77 | The survey data was collected through interviews using forms downloaded to Android Smartphones. The survey forms were 78 | created using the ODK (Open Data Kit) software via an Excel spreadsheet. The collected data were then sent back to the server.
79 | The server can be used to download the collected data in both JSON and CSV formats. 80 | 81 | 4. Analysis methods: 82 | Descriptive and summary statistics were calculated using SPSS. 83 | 84 | *** 85 | 86 | SUMMARY OF DATA FILES 87 | 88 | *** 89 | 90 | 1. List of data files: 91 | Filename: SAFI\_clean.csv 92 | Short description: CSV file containing the combined teaching data on one worksheet. 93 | 94 | Filename: SAFI\_messy.xlsx 95 | Short description: Excel file containing data for Tanzania and Mozambique recorded on separate worksheets 96 | and requiring data cleaning prior to analysis. 97 | 98 | Filename: SAFI\_dates.xlsx 99 | Short description: Excel file containing date data for understanding how to format dates in spreadsheets. 100 | 101 | 2. Relationships between files: 102 | No official linkages between files. 103 | 104 | *** 105 | 106 | DATA-SPECIFIC INFORMATION FOR SAFI\_clean.csv 107 | 108 | *** 109 | 110 | 1. Number of variables: 14 111 | 112 | 2. Number of cases: 131 113 | 114 | 3. Missing data codes: NULL 115 | 116 | 4. Variable list 117 | Variable name: key\_ID 118 | Variable description: Added to provide a unique ID for each observation (the InstanceID field does this as well) 119 | Variable coding/values: Numeric values 120 | Range of values: 1-202 121 | 122 | Variable name: village 123 | Variable description: Village name 124 | Variable coding / values: Text 125 | Range of values: God, Chirodzo, Ruaca 126 | 127 | Variable name: interview\_date 128 | Variable description: Date of interview 129 | Variable coding: Date YYYY-MM-DDTime 130 | Range of values: 2016-11-16 - 2017-06-04 131 | 132 | Variable name: no\_membrs 133 | Variable description: How many members live in the household? 134 | Variable coding: Numeric value (continuous) 135 | Range of values: 2 - 19 136 | 137 | Variable name: years\_liv 138 | Variable description: How many years have you lived in this, or a neighbouring village?
139 | Variable coding: Numeric value (years, continuous) 140 | Range of values: 1-96 141 | 142 | Variable name: respondent\_wall\_type 143 | Variable description: What type of walls are in the house? 144 | Variable coding: Text (categories) 145 | Range of values: burntbricks, muddaub, sunbricks, cement 146 | 147 | Variable name: rooms 148 | Variable description: How many rooms in the house are used for sleeping? 149 | Variable coding: Numeric value (continuous) 150 | Range of values: 1-8 151 | 152 | Variable name: memb\_assoc 153 | Variable description: Is the participant a member of an irrigation association? 154 | Variable coding: Yes / No / NULL 155 | 156 | Variable name: affect\_conflicts 157 | Variable description: Has the person been affected by conflicts with other irrigators in the area? 158 | Variable coding: Text (category) 159 | Range of values: once, more\_once, frequently, never, NULL 160 | 161 | Variable name: liv\_count 162 | Variable description: Livestock count 163 | Variable coding: Numeric value (continuous) 164 | Range of values: 1-5 165 | 166 | Variable name: items\_owned 167 | Variable description: Which of the following items are owned by the household (list provided) 168 | Variable coding: Text (string separated by semicolon) 169 | 170 | Variable name: no\_meals 171 | Variable description: How many meals do people in your household normally eat in a day? 172 | Variable coding: Numeric value (continuous) 173 | Range of values: 2-3 174 | 175 | Variable name: months\_lack\_food 176 | Variable description: Indicate which months, in the last 12 months where you have faced a situation when you did not have enough food to feed the household? 
177 | Variable coding: Text (string separated by semicolon) 178 | Range of values: Month given in abbreviation or none 179 | 180 | Variable name: InstanceID 181 | Variable description: Unique identifier for the form data submission 182 | Variable coding: unique ID alpha-numeric string 183 | 184 | *** 185 | 186 | DATA-SPECIFIC INFORMATION FOR SAFI\_messy.xlsx 187 | 188 | *** 189 | 190 | 1. Number of variables: 14 across two worksheets (Tanzania and Mozambique) 191 | 192 | 2. Variable list 193 | Variable name: key\_ID 194 | Variable description: Added to provide a unique ID for each observation (the InstanceID field does this as well) 195 | Variable coding/values: Numeric values 196 | Range of values: 1-202 197 | 198 | Variable name: roof\_type 199 | Variable description: Type of roof on accommodation 200 | Variable coding / values: Text (categories) 201 | Range of values: grass, mabatisloping 202 | 203 | Variable name: wall\_type 204 | Variable description: Type of wall in accommodation 205 | Variable coding: Text (categories) 206 | Range of values: muddaub, burntbricks 207 | 208 | Variable name: floor\_type 209 | Variable description: Type of floor in accommodation 210 | Variable coding: Text (categories) 211 | Range of values: earth, cement 212 | 213 | Variable name: live\_stock\_owned\_and\_numbers 214 | Variable description: Type of livestock owned and total number owned 215 | Variable coding: Alpha numeric 216 | Range of values: 1-4, poultry, oxen, cows, goats 217 | 218 | Variable name: plots 219 | Variable description: Number of plots cultivated in the last 12 months 220 | Variable coding: Numeric (categories) 221 | Range of values: 1-4 and -999 222 | 223 | Variable name: water use 224 | Variable description: Do you bring water to your fields, stop water leaving your fields or drain water out of any of your fields?
225 | Variable coding: text (categories) 226 | Range of values: no, yes, Y, N, 1, 1, no (only in summer) 227 | 228 | Variable name: rooms 229 | Variable description: Number of rooms in the house used for sleeping 230 | Variable coding: Numeric 231 | Range of values: 1 - 4 232 | 233 | Variable name: oxen 234 | Variable description: Do you own oxen? 235 | Variable coding: Numeric binary 236 | Range of values: 0, 1 237 | 238 | Variable name: poultry 239 | Variable description: Do you own poultry? 240 | Variable coding: Text 241 | Range of values: 1,2 Yes 242 | 243 | Variable name: goats 244 | Variable description: Do you own goats? 245 | Variable coding: Text 246 | Range of values: 1, 0, No 247 | 248 | Variable name: cows 249 | Variable description: Do you own cows? 250 | Variable coding: Text 251 | Range of values: 1,0, Yes 252 | 253 | Variable name: total 254 | Variable description: Total number of livestock owned 255 | Variable coding: Numeric (continuous) 256 | Range of values: 1-4 257 | 258 | Variable name: look after cows 259 | Variable description: Does the participant look after cows? 260 | Variable coding: Yes / No 261 | 262 | *** 263 | 264 | DATA-SPECIFIC INFORMATION FOR SAFI\_dates.xlsx 265 | 266 | *** 267 | 268 | 1. Number of variables: 14 across two worksheets (DD\_MM\_YEAR and MM\_DD\_YEAR) 269 | 270 | 2. Variable list 271 | Variable name: Interview dates 272 | Variable description: Date that interview took place 273 | Variable coding/values: Date 274 | Range of values: DD-MM-YYYY or MM\_DD\_YYYY depending on spreadsheet 275 | 276 | Variable name: years\_farm 277 | Variable description: Number of years the household have been farming in this area 278 | Variable coding / values: 279 | Range of values: 280 | 281 | Variable name: parents\_live 282 | Variable description: Did your parents live in this village or neighbouring village?
283 | Variable coding: Yes / No 284 | Range of values: Yes / No 285 | 286 | Variable name: no\_membrs 287 | Variable description: How many members live in your household? 288 | Variable coding: Numeric value 289 | Range of values: 2 - 19 290 | 291 | Variable name: roof\_type 292 | Variable description: Type of roof on the accommodation 293 | Variable coding: Text (categories) 294 | Range of values: grass, mabatisloping 295 | 296 | Variable name: respondent\_wall\_type 297 | Variable description: Type of wall in the accommodation 298 | Variable coding: Text (categories) 299 | Range of values: burntbricks, muddaub, sunbricks, cement 300 | 301 | Variable name: floor\_type 302 | Variable description: Type of floor in the accommodation 303 | Variable coding: Text (categories) 304 | Range of values: earth, cement 305 | 306 | 307 | -------------------------------------------------------------------------------- /config.yaml: -------------------------------------------------------------------------------- 1 | #------------------------------------------------------------ 2 | # Values for this lesson. 3 | #------------------------------------------------------------ 4 | 5 | # Which carpentry is this (swc, dc, lc, or cp)? 6 | # swc: Software Carpentry 7 | # dc: Data Carpentry 8 | # lc: Library Carpentry 9 | # cp: Carpentries (to use for instructor training for instance) 10 | # incubator: The Carpentries Incubator 11 | carpentry: 'dc' 12 | 13 | # Overall title for pages.
14 | title: 'Data Organization in Spreadsheets for Social Scientists' 15 | 16 | # Date the lesson was created (YYYY-MM-DD, this is empty by default) 17 | created: '2017-05-25' 18 | 19 | # Comma-separated list of keywords for the lesson 20 | keywords: 'software, data, lesson, The Carpentries' 21 | 22 | # Life cycle stage of the lesson 23 | # possible values: pre-alpha, alpha, beta, stable 24 | life_cycle: 'stable' 25 | 26 | # License of the lesson materials (recommended CC-BY 4.0) 27 | license: 'CC-BY 4.0' 28 | 29 | # Link to the source repository for this lesson 30 | source: 'https://github.com/datacarpentry/spreadsheets-socialsci' 31 | 32 | # Default branch of your lesson 33 | branch: 'main' 34 | 35 | # Who to contact if there are any issues 36 | contact: 'team@carpentries.org' 37 | 38 | # Navigation ------------------------------------------------ 39 | # 40 | # Use the following menu items to specify the order of 41 | # individual pages in each dropdown section. Leave blank to 42 | # include all pages in the folder. 43 | # 44 | # Example ------------- 45 | # 46 | # episodes: 47 | # - introduction.md 48 | # - first-steps.md 49 | # 50 | # learners: 51 | # - setup.md 52 | # 53 | # instructors: 54 | # - instructor-notes.md 55 | # 56 | # profiles: 57 | # - one-learner.md 58 | # - another-learner.md 59 | 60 | # Order of episodes in your lesson 61 | episodes: 62 | - 00-intro.md 63 | - 01-format-data.md 64 | - 02-common-mistakes.md 65 | - 03-dates-as-data.md 66 | - 04-quality-assurance.md 67 | - 05-exporting-data.md 68 | 69 | # Information for Learners 70 | learners: 71 | 72 | # Information for Instructors 73 | instructors: 74 | 75 | # Learner Profiles 76 | profiles: 77 | 78 | # Customisation --------------------------------------------- 79 | # 80 | # This space below is where custom yaml items (e.g. 
pinning 81 | # sandpaper and varnish versions) should live 82 | 83 | 84 | url: 'https://datacarpentry.github.io/spreadsheets-socialsci' 85 | analytics: carpentries 86 | lang: en 87 | -------------------------------------------------------------------------------- /episodes/00-intro.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Introduction 3 | teaching: 15 4 | exercises: 3 5 | --- 6 | 7 | ::::::::::::::::::::::::::::::::::::::: objectives 8 | 9 | - Define the scope of this lesson 10 | - Describe some drawbacks and advantages of using spreadsheet programs 11 | 12 | :::::::::::::::::::::::::::::::::::::::::::::::::: 13 | 14 | :::::::::::::::::::::::::::::::::::::::: questions 15 | 16 | - What are spreadsheets useful for in a research project? 17 | 18 | :::::::::::::::::::::::::::::::::::::::::::::::::: 19 | 20 | 21 | Good data organization is the foundation of your research 22 | project. Most researchers have data or do data entry in 23 | spreadsheets. Spreadsheet programs are very useful graphical 24 | interfaces for designing data tables and handling very basic data 25 | quality control functions. 26 | 27 | ### Spreadsheet outline 28 | 29 | In this lesson, we're going to talk about: 30 | 31 | - Good data entry practices - formatting data tables in spreadsheets 32 | - How to avoid common formatting mistakes 33 | - Recognising and reformatting dates in spreadsheets 34 | - Basic quality control and data manipulation in spreadsheets 35 | - Exporting data from spreadsheets 36 | 37 | ### Spreadsheet programs 38 | 39 | Many spreadsheet programs are available. We will use Microsoft Excel in our examples. 40 | Although it is not open source software it is very widely available and used. 41 | 42 | Free spreadsheet programs such as LibreOffice are available. 43 | The functionality of these may differ from Excel, but in general they can be used to perform similar tasks. 
44 | 45 | ## Problems with Spreadsheets 46 | 47 | Spreadsheets are good for data entry, 48 | but in reality we tend to use spreadsheet programs for much more than data entry. 49 | We use them to create data tables for publications, 50 | to generate summary statistics, 51 | and make figures. 52 | Laying out spreadsheets in this way often adds some difficulty when we want 53 | to take our data from the spreadsheet and use it in another program. 54 | Additional white space, merged cells, colour and grids 55 | may aid readability but are not easily handled by other programs 56 | that take our spreadsheet as an input to further analysis. 57 | 58 | Generating statistics and figures in spreadsheets should be done with caution. 59 | The graphical, drag and drop nature of spreadsheet programs means that it can be very difficult, if not impossible, to replicate your steps (much less retrace anyone else's). 60 | This is particularly true if your stats or figures require complex calculations. 61 | Furthermore, when performing calculations in a spreadsheet, it's easy to accidentally apply a slightly different formula to multiple adjacent cells. 62 | This often makes it difficult to demonstrate data quality and consistency in our analysis. 63 | 64 | Even when we are aware of some of the limitations that data in spreadsheets presents, 65 | often we have inherited spreadsheets from another colleague or data provider. 66 | In these situations we cannot exercise any control in its construction 67 | or entry of the data within it. 68 | Nevertheless it is important to be aware of the limitations these data may present, and know how to assess if any problems are present and how to overcome them. 
69 | 70 | ::::::::::::::::::::::::::::::::::::::::: callout 71 | 72 | ## What this lesson will not teach you 73 | 74 | - How to do *statistics* in a spreadsheet 75 | - How to do *plotting* in a spreadsheet 76 | - How to *write code* in spreadsheet programs 77 | 78 | If you're looking to do this, a couple of good references are the 79 | [Excel Cookbook](https://search.worldcat.org/title/1419271899), published by O'Reilly, and the [Microsoft Excel 365 bible](https://search.worldcat.org/en/title/1263023438). 80 | 81 | 82 | :::::::::::::::::::::::::::::::::::::::::::::::::: 83 | 84 | ::::::::::::::::::::::::::::::::::::::: challenge 85 | 86 | ## Exercise 87 | 88 | - How many people have used spreadsheets in their research? 89 | - How many people have accidentally done something that made them 90 | frustrated or sad? 91 | 92 | 93 | :::::::::::::::::::::::::::::::::::::::::::::::::: 94 | 95 | ### Using Spreadsheets for Data Entry and Cleaning 96 | 97 | However, there are circumstances where you might want to use a spreadsheet 98 | program to produce "quick and dirty" calculations or figures, and some of 99 | these features can be used in data cleaning, prior to importation into a 100 | statistical analysis program. We will show you how to use some features of 101 | spreadsheet programs to check your data quality along the way and produce 102 | preliminary summary statistics. 103 | 104 | In this lesson, we will assume that you are most likely using Excel as 105 | your primary spreadsheet program - there are other programs with similar functionality but Excel seems 106 | to be the most commonly used. 107 | 108 | 109 | 110 | :::::::::::::::::::::::::::::::::::::::: keypoints 111 | 112 | - Good data organization is the foundation of any research project. 113 | - Spreadsheets are good for data entry, but when doing data cleaning or analysis, it's not easy to show or replicate what you did. 
114 | 115 | :::::::::::::::::::::::::::::::::::::::::::::::::: 116 | 117 | 118 | -------------------------------------------------------------------------------- /episodes/01-format-data.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Formatting Data Tables in Spreadsheets 3 | teaching: 15 4 | exercises: 15 5 | --- 6 | 7 | ::::::::::::::::::::::::::::::::::::::: objectives 8 | 9 | - Recognise and resolve common spreadsheet formatting problems. 10 | - Describe the importance of metadata. 11 | - Identify metadata that should be included with a dataset. 12 | 13 | :::::::::::::::::::::::::::::::::::::::::::::::::: 14 | 15 | :::::::::::::::::::::::::::::::::::::::: questions 16 | 17 | - How do we format data in spreadsheets for effective data use? 18 | 19 | :::::::::::::::::::::::::::::::::::::::::::::::::: 20 | 21 | ## Data formatting problems 22 | 23 | The most common mistake made is treating spreadsheet programs like lab notebooks, that is, 24 | relying on context, notes in the margin, 25 | spatial layout of data and fields to convey information. As humans, we 26 | can (usually) interpret these things, but computers don't view information the same way, and 27 | unless we explain to the computer what every single thing means (and 28 | that can be hard!), it will not be able to see how our data fit 29 | together. 30 | 31 | Using the power of computers, we can manage and analyze data in much more 32 | effective and faster ways, but to use that power, we have to set up 33 | our data for the computer to be able to understand it (and computers are very 34 | literal). 35 | 36 | This is why it's extremely important to set up well-formatted 37 | tables from the outset - before you even start entering data from 38 | your very first preliminary experiment. Data organization is the 39 | foundation of your research project. 
It can make it easier or harder 40 | to work with your data throughout your analysis, so it's worth 41 | thinking about when you're doing your data entry or setting up your 42 | experiment. You can set things up in different ways in spreadsheets, 43 | but some of these choices can limit your ability to work with the data in other programs or 44 | have the you-of-6-months-from-now or your collaborator work with the 45 | data. 46 | 47 | ::::::::::::::::::::::::::::::::::::::::: callout 48 | 49 | ## Tip 50 | 51 | The best layouts/formats (as well as software and 52 | interfaces) for data entry and data analysis might be 53 | different. It is important to take this into account, and ideally 54 | automate the conversion from one to another. 55 | 56 | 57 | :::::::::::::::::::::::::::::::::::::::::::::::::: 58 | 59 | ### Keeping track of your analyses 60 | 61 | When you're working with spreadsheets, during data clean up or analyses, it's 62 | very easy to end up with a spreadsheet that looks very different from the one 63 | you started with. In order to be able to reproduce your analyses or figure out 64 | what you did when Reviewer #3 asks for a different analysis, you should 65 | 66 | - create a new file or tab with your cleaned or analyzed data. Don't modify 67 | the original dataset, or you will never know where you started! 68 | - keep track of the steps you took in your clean up or analysis. You should track 69 | these steps as you would any step in an experiment. You can 70 | do this in another text file, or a good option is to create a new tab in your spreadsheet 71 | with your notes. This way the notes and data stay together. 72 | 73 | Put these principles in to practice today during the exercises. 74 | 75 | ### Tidy data in spreadsheets 76 | 77 | The tidy data principles when structuring data in spreadsheets are: 78 | 79 | 1. Put all your variables in columns - the thing you're measuring, 80 | like 'weight' or 'temperature'. 81 | 2. 
Put each observation in its own row. 82 | 3. Don't combine multiple pieces of information in one 83 | cell. Sometimes it just seems like one thing, but think if that's 84 | the only way you'll want to be able to use or sort that data. 85 | 4. Leave the raw data raw - don't change it! 86 | 5. Export the cleaned data to a text-based format like CSV (comma-separated values). This 87 | ensures that anyone can use the data, and is required by 88 | most data repositories. 89 | 90 | You can understand these principles more easily with the illustrations in the [Tidy Data Series by Lowndes & Horst](https://allisonhorst.com/other-r-fun). 91 | 92 | For instance, we're going to be working with data from a study of 93 | agricultural practices among farmers in two countries in eastern 94 | sub-Saharan Africa (Mozambique and Tanzania). Researchers conducted 95 | interviews with farmers in these countries to collect data on 96 | household statistics (e.g., number of household members, 97 | number of meals eaten per day, availability of water), 98 | farming practices (e.g., water usage), and assets (e.g., number of farm plots, 99 | number of livestock). They also recorded the dates and locations of 100 | each interview. 101 | 102 | If they were to keep track of the data like this: 103 | 104 | ![](fig/multiple-info.png){alt='multiple-info example'} 105 | 106 | the problem is that the number of livestock and the type of livestock are in 107 | the same field. So, if they wanted to 108 | look at the average number of livestock owned, or the average number of each type 109 | of livestock, 110 | it would be hard to do this using this data setup. If instead we put the count 111 | of each type of livestock in its own column, this would make analysis 112 | much easier. 
The rule of thumb, when setting up a datasheet, is that each 113 | variable (in this case, each type of livestock) should have its own column, 114 | each observation should have its own row, and each cell should contain only a 115 | single value. Thus, the example above should look like this: 116 | 117 | ![](fig/single-info.png){alt='single-info example'} 118 | 119 | Notice that this now allows us to make statements about the number of each type of 120 | animal that a farmer owns, while still allowing us to say things about the 121 | total number of livestock. All we need to do is sum the values in each row to 122 | find a total. We'll be learning how to do this computationally and reproducibly 123 | later in this workshop. 124 | 125 | ::::::::::::::::::::::::::::::::::::::::: callout 126 | 127 | ## Workshop Data 128 | 129 | > The data used in these lessons are taken from interviews of farmers in two 130 | > countries in eastern sub-Saharan Africa (Mozambique and Tanzania). These 131 | > interviews were conducted between November 2016 and June 2017 and probed 132 | > household features (e.g., construction materials used, number of household 133 | > members), agricultural practices (e.g., water usage), and assets (e.g., number 134 | > and types of livestock). 135 | 136 | This is a real dataset, however, it has been simplified for this workshop. If 137 | you're interested in exploring the full dataset further, you can download 138 | it from Figshare and work with it using exactly the same tools we'll learn 139 | about today. 140 | 141 | For more information about the dataset and to download it from Figshare, check 142 | out the [Social Sciences workshop data 143 | page](https://www.datacarpentry.org/socialsci-workshop/data). 
144 | 145 | 146 | :::::::::::::::::::::::::::::::::::::::::::::::::: 147 | 148 | ::::::::::::::::::::::::::::::::::::::::: callout 149 | 150 | ## LibreOffice Users 151 | 152 | The default for LibreOffice is to treat tabs, commas, and semicolons as delimiters. 153 | This behavior can cause problems with both the data for this lesson and other data 154 | you might want to use. This can be fixed when opening LibreOffice by deselecting 155 | the "semicolons" and "tabs" checkboxes. 156 | 157 | 158 | :::::::::::::::::::::::::::::::::::::::::::::::::: 159 | 160 | ::::::::::::::::::::::::::::::::::::::: challenge 161 | 162 | ## Exercise 163 | 164 | We're going to take a messy version of the SAFI data and describe how we would clean it up. 165 | 166 | 1. Download the [messy data](https://ndownloader.figshare.com/files/11502824). 167 | 2. Open up the data in a spreadsheet program. 168 | 3. Notice that there are two tabs. Two researchers conducted the interviews, 169 | one in Mozambique and the other in Tanzania. They both structured their 170 | data tables in a different way. Now, you're the person in charge of this 171 | project and you want to be able to start analyzing the data. 172 | 4. With the person next to you, identify what is wrong with this spreadsheet. 173 | Discuss the steps you would need to take to clean up the two tabs, and to 174 | put them all together in one spreadsheet. 175 | 176 | **Important** Do not forget our first piece of advice, to create a new file 177 | (or tab) for the cleaned data, never modify your original (raw) data. 178 | 179 | After you go through this exercise, we'll discuss as a group what was wrong 180 | with this data and how you would fix it. 181 | 182 | ::::::::::::::: solution 183 | 184 | ## Solution 185 | 186 | - Take about 10 minutes to work on this exercise. 187 | - All the mistakes listed in [the next episode](02-common-mistakes.md) are 188 | present in the messy dataset. 
If this 189 | exercise is done during a workshop, ask people what they saw as wrong with 190 | the data. As they bring up different points, you can refer to [the next episode](02-common-mistakes.md) 191 | or expand a bit on the point they brought up. 192 | 193 | 194 | 195 | ::::::::::::::::::::::::: 196 | 197 | :::::::::::::::::::::::::::::::::::::::::::::::::: 198 | 199 | ::::::::::::::::::::::::::::::::::::::::: callout 200 | 201 | ## Handy References 202 | 203 | Three excellent references on spreadsheet organization are: 204 | 205 | - Hadley Wickham, *Tidy Data*, Vol. 59, Issue 10, Sep 2014, Journal of 206 | Statistical Software. [http://www.jstatsoft.org/v59/i10](https://www.jstatsoft.org/v59/i10) 207 | 208 | - Julia Lowndes \& Allison Horst, *Tidy Data Series by Lowndes & Horst*. [https://allisonhorst.com/other-r-fun](https://allisonhorst.com/other-r-fun) 209 | 210 | - Karl W. Broman \& Kara H. Woo, *Data Organization in Spreadsheets*, Vol. 72, 211 | Issue 1, 2018, The American Statistician. 212 | [https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989](https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989) 213 | 214 | 215 | :::::::::::::::::::::::::::::::::::::::::::::::::: 216 | 217 | ### Metadata 218 | 219 | Recording data about your data ("metadata") is essential. You may be on intimate 220 | terms with your dataset while you are 221 | collecting and analysing it, but the chances that you will still remember 222 | the exact wording of the question you asked about your 223 | informants' water use (the data recorded in the column `water use`), for 224 | example, are slim. 225 | 226 | As well, there are many reasons other people may want to examine or use your data - to understand your findings, to verify your findings, 227 | to review your submitted publication, to replicate your results, to design a 228 | similar study, or even to archive your data for access and 229 | re-use by others. 
While digital data by definition are machine-readable, 230 | understanding their meaning is a job for human beings. The 231 | importance of documenting your data during the collection and analysis phase of 232 | your research cannot be overestimated, especially if your 233 | research is going to be part of the scholarly record. 234 | 235 | However, metadata should not be contained in the data file itself. Unlike a table 236 | in a paper or a supplemental file, metadata (in the 237 | form of legends) should not be included in a data file since this information is 238 | not data, and including it can disrupt how computer 239 | programs interpret your data file. Rather, metadata should be stored as a 240 | separate file in the same directory as your data file, 241 | preferably in plain text format with a name that clearly associates it with your 242 | data file. Because metadata files are free text format, 243 | they also allow you to encode comments, units, information about how null values 244 | are encoded, etc. that are important to document but can 245 | disrupt the formatting of your data file. 246 | 247 | Some of this information may be familiar to learners who conduct analyses on 248 | survey data or other data sets that come with codebooks. Codebooks will often 249 | describe the way a variable has been constructed, what prompt was associated with 250 | it in a survey or interview, and what the meaning of various values are. For example, 251 | the [General Social Survey](https://gss.norc.org) maintains their entire codebook online. 252 | Looking at an entry for a particular variable, such as 253 | [the variable `SEX`](https://gssdataexplorer.norc.org/variables/81/vshow), provides 254 | valuable information about what survey waves the variable covers, and the meaning 255 | of particular values. 
256 | 257 | Additionally, file- or database-level metadata describe how files that make up 258 | the dataset relate to each other; what format they are 259 | in; and whether they supersede or are superseded by previous files. A 260 | folder-level readme.txt file is the classic way of accounting for 261 | all the files and folders in a project. 262 | 263 | Metadata are most useful when they follow a standard. For example, the 264 | [Data Documentation Initiative (DDI)](https://www.ddialliance.org) provides a 265 | standardized way to document metadata at various points in the research cycle. 266 | Research librarians may have specific expertise in this area, and can be 267 | helpful resources for thinking about ways to purposefully document metadata 268 | as part of your research. 269 | 270 | (Text on metadata adapted from the online course [MANTRA - Research Data Management Training](https://mantra.ed.ac.uk/) by Research Data Service and the Institute for Academic Development, University of Edinburgh. MANTRA is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).) 271 | 272 | ::::::::::::::::::::::::::::::::::::: instructor 273 | For the next exercise, learners will open a CSV file. 274 | Instructions on how to load the file correctly (ensuring the delimiter is 275 | interpreted properly regardless of the learner's computer settings) 276 | are provided in the 277 | [Quality Assurance episode](04-quality-assurance.md#restricting-data-to-a-numeric-range). 278 | 279 | 280 | ::::::::::::::::::::::::::::::::::::::::::::::::: 281 | 282 | ::::::::::::::::::::::::::::::::::::::: challenge 283 | 284 | ## Exercise 285 | 286 | Download a [clean version of this 287 | dataset](https://ndownloader.figshare.com/files/11492171) and open the file 288 | with your spreadsheet program. 
This data has many more variables that were not 289 | included in the messy spreadsheet and is formatted according to tidy data 290 | principles. 291 | 292 | Discuss this data with a partner and make a list of some of the types of 293 | metadata that should be recorded about this dataset. It may be helpful to 294 | start by asking yourself, "What is not immediately obvious to me about this 295 | data? What questions would I need to know the answers to in order to analyze 296 | and interpret this data?" 297 | 298 | ::::::::::::::: solution 299 | 300 | ## Solution 301 | 302 | Some types of metadata that should be recorded and made available with the 303 | data are: 304 | 305 | - the exact wording of questions used in the interviews (if interviews were 306 | structured) or general prompts used (if interviews were semi-structured) 307 | - a description of the type of data allowed in each column (e.g., the allowed 308 | range for numerical data with a restricted range, a list of allowed options 309 | for categorical variables, whether data in a numerical column should be 310 | continuous or discrete) 311 | - definitions of any categorical variables (e.g., definitions of 312 | "burntbricks" and "sunbricks") 313 | - definitions of what was counted as a "room", a "plot", etc. (e.g., was 314 | there a minimum size) 315 | - learners may come up with additional questions to add to this list 316 | 317 | 318 | 319 | ::::::::::::::::::::::::: 320 | 321 | :::::::::::::::::::::::::::::::::::::::::::::::::: 322 | 323 | 324 | 325 | :::::::::::::::::::::::::::::::::::::::: keypoints 326 | 327 | - Never modify your raw data. Always make a copy before making any changes. 328 | - Keep track of all of the steps you take to clean your data. 329 | - Organize your data according to tidy data principles. 330 | - Record metadata in a separate plain text file. 
331 | 332 | :::::::::::::::::::::::::::::::::::::::::::::::::: 333 | 334 | 335 | -------------------------------------------------------------------------------- /episodes/02-common-mistakes.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Formatting Problems 3 | teaching: 20 4 | exercises: 0 5 | --- 6 | 7 | ::::::::::::::::::::::::::::::::::::::: objectives 8 | 9 | - Recognize and resolve common spreadsheet formatting problems. 10 | 11 | :::::::::::::::::::::::::::::::::::::::::::::::::: 12 | 13 | :::::::::::::::::::::::::::::::::::::::: questions 14 | 15 | - What common mistakes are made when formatting spreadsheets? 16 | 17 | :::::::::::::::::::::::::::::::::::::::::::::::::: 18 | 19 | ## Common Spreadsheet Errors 20 | 21 | This lesson is meant to be used as a reference for discussion as learners identify issues with the messy dataset discussed in the 22 | previous lesson. Instructors: don't go through this lesson except to refer to responses to the exercise in the previous lesson. 23 | 24 | There are a few potential errors to be on the lookout for in your own data as well as data from collaborators or the Internet. If you are aware of the errors and their possible negative effects on downstream data analysis and result interpretation, it might motivate you and your project members to try to avoid them. Making small changes to the way you format your data in spreadsheets can have a great impact on efficiency and reliability when it comes to data cleaning and analysis. 
25 | 26 | - [Using multiple tables](#tables) 27 | - [Using multiple tabs](#tabs) 28 | - [Not filling in zeros](#zeros) 29 | - [Using problematic null values](#null) 30 | - [Using formatting to convey information](#formatting) 31 | - [Using formatting to make the data sheet look pretty](#formatting-pretty) 32 | - [Placing comments or units in cells](#units) 33 | - [Entering more than one piece of information in a cell](#info) 34 | - [Using problematic field names](#field-name) 35 | - [Using special characters in data](#special) 36 | 37 | ## Using multiple tables {#tables} 38 | 39 | A common strategy is creating multiple data tables within 40 | one spreadsheet. This confuses the computer, so try to avoid doing this! 41 | When you create multiple tables within one 42 | spreadsheet, you're drawing false associations between things for the computer, 43 | which sees each row as an observation. You're also potentially using the same 44 | field name in multiple places, which will make it harder to clean your data up 45 | into a usable form. The example below depicts the problem: 46 | 47 | ![](fig/multiple-tables-example2.png){alt='multiple tables'} 48 | 49 | In the example above, the computer will see row 24 and assume that all columns A-J 50 | refer to the same sample. This row actually represents two distinct samples 51 | (information about livestock for informant 1 and information about plots for informant 2). Other rows are similarly problematic. 52 | 53 | ## Using multiple tabs {#tabs} 54 | 55 | But what about workbook tabs? That seems like an easy way to organize data, right? Well, yes and no. When you create extra tabs, you fail 56 | to allow the computer to see connections in the data that are there (you have to introduce spreadsheet application-specific functions or 57 | scripting to ensure this connection). 58 | 59 | Say you make a separate tab for each day you take a measurement. 
This isn't good practice for two reasons: 60 | 61 | 1) you are more likely to accidentally add inconsistencies to your data if each time you take a measurement, you start recording data in a new tab, and 62 | 2) even if you manage to prevent all inconsistencies from creeping in, you will add an extra step for yourself before you analyze the 63 | data because you will have to combine these data into a single datatable. You will have to explicitly tell the computer how to combine 64 | tabs - and if the tabs are inconsistently formatted, you might even have to do it manually. 65 | 66 | For these and other reasons, it is good practice to avoid creating new tabs to organize your spreadsheet data. The next time you're entering data, and you go to create another tab or table, ask yourself if you could avoid adding this tab by adding another column to your original spreadsheet. You may, however, use a new tab to store notes about your data, such as steps you've taken to clean or manipulate your data. 67 | 68 | Your data sheet might get very long over the course of the experiment. This makes it harder to enter data if you can't see your headers 69 | at the top of the spreadsheet. But don't repeat your header row. These can easily get mixed into the data, 70 | leading to problems down the road. 71 | 72 | Instead you can freeze the column headers so that they remain visible even when you have a spreadsheet with many rows. 73 | 74 | [Documentation on how to freeze column headers](https://support.office.com/en-ca/article/Freeze-column-headings-for-easy-scrolling-57ccce0c-cf85-4725-9579-c5d13106ca6a) 75 | 76 | ## Not filling in zeros {#zeros} 77 | 78 | It might be that when you're measuring something, it's 79 | usually a zero, say the number of cows that an informant has, in a 80 | region where most farmers have goats and no cows. Why bother 81 | writing in the number zero in that column, when it's mostly zeros? 
82 | 83 | ![](fig/zeros-example.png){alt='filling in zeros'} 84 | 85 | However, there's a difference between a zero and a blank cell in a spreadsheet. To the computer, a zero is actually data. You measured 86 | or counted it. A blank cell means that it wasn't measured and the computer will interpret it as an unknown value (otherwise known as a 87 | null value). 88 | 89 | Spreadsheet or statistical programs will likely misinterpret blank cells that you intend to be zeros. By not entering the value of 90 | your observation, you are telling your computer to represent that data as unknown or missing (null). This can cause problems with 91 | subsequent calculations or analyses. For example, in many statistical programs the average of a set of numbers that includes even a single null value is itself null 92 | (because the computer can't guess the value of the missing observations). Because of this, it's very important to record zeros as zeros and truly missing data as nulls. 93 | 94 | ## Using problematic null values {#null} 95 | 96 | **Example**: using -999 or other numerical values (or zero) to represent missing data. 97 | 98 | **Solution**: One common practice is to record unknown or missing data as -999, 999, or 0. Many statistical programs will not recognize 99 | that these are intended to represent missing (null) values. How these values are interpreted will depend on the software you use to 100 | analyze your data. It is essential to use a clearly defined and consistent null indicator. 101 | Blanks (most applications) and NA (for R) are good choices. White et al., 2013, explain good choices for indicating null values for different software applications in their article: 102 | [Nine simple ways to make it easier to (re)use your data.](https://ojs.library.queensu.ca/index.php/IEE/article/view/4608) Ideas in Ecology and Evolution. 
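
The difference between a recorded zero, a blank cell, and a numeric placeholder such as -999 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cow counts are made up (they are not taken from the SAFI data), the function names `mean_strict` and `mean_skipping_missing` are hypothetical helpers written for this example, and `None` stands in for a blank cell. Real analysis software may treat blanks differently.

```python
def mean_strict(values):
    """Return None as soon as any value is missing (a null-propagating average)."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


def mean_skipping_missing(values):
    """Average only the values that were actually recorded."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# Hypothetical cow counts for five households; None stands in for a blank cell.
zeros_recorded = [0, 0, 3, None, 0]             # one household truly not surveyed
blanks_for_zeros = [None, None, 3, None, None]  # zeros left blank by mistake
sentinel_for_missing = [0, 0, 3, -999, 0]       # -999 typed in for the missing value

print(mean_strict(zeros_recorded))                  # None
print(mean_skipping_missing(zeros_recorded))        # 0.75
print(mean_skipping_missing(blanks_for_zeros))      # 3.0
print(mean_skipping_missing(sentinel_for_missing))  # -199.2
```

Only the first list gives a sensible answer once missing values are skipped. Leaving zeros blank inflates the average to 3.0, and the -999 placeholder is silently averaged in as if it were a real count.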
103 | 104 | | Null Values | Problems | Compatibility | Recommendation | 105 | | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------- | -------------- | 106 | | 0 | Indistinguishable from a true zero | | NEVER use | 107 | | Blank | Hard to distinguish values that are missing from those overlooked on entry. Hard to distinguish blanks from spaces, which behave differently. | R, Python, SQL, Excel | Best option | 108 | | \-999, 999 | Not recognized as null by many programs without user input. Can be inadvertently entered into calculations. | | Avoid | 109 | | NA, na | Can also be an abbreviation (e.g., North America), can cause problems with data type (turn a numerical column into a text column). NA is more commonly recognized than na. | R | Good option | 110 | | N/A | An alternate form of NA, but often not compatible with software. | | Avoid | 111 | | NULL | Can cause problems with data type. | SQL | Good option | 112 | | None | Uncommon. Can cause problems with data type. | Python | Avoid | 113 | | No data | Uncommon. Can cause problems with data type, contains a space. | | Avoid | 114 | | Missing | Uncommon. Can cause problems with data type. | | Avoid | 115 | | \-, +, . | Uncommon. Can cause problems with data type. | | Avoid | 116 | 117 | ## Using formatting to convey information {#formatting} 118 | 119 | **Example**: highlighting cells, rows or columns that should be excluded from an analysis, and leaving blank rows to indicate separations in data. 120 | 121 | ![](fig/bad-formatting.png){alt='formatting'} 122 | 123 | **Solution**: create a new field to encode which data should be excluded. 124 | 125 | ![](fig/better-formatting.png){alt='good formatting'} 126 | 127 | ## Using formatting to make the data sheet look pretty {#formatting-pretty} 128 | 129 | **Example**: merging cells. 
130 | 131 | **Solution**: If you're not careful, formatting a worksheet to be more aesthetically pleasing can compromise your computer's ability to 132 | see associations in the data. Merged cells will make your data unreadable by statistics software. Consider restructuring your data in 133 | such a way that you will not need to merge cells to organize your data. 134 | 135 | ## Placing comments or units in cells {#units} 136 | 137 | **Example**: Some of your informants only irrigate their plots at certain times of the year. You've added this information as notes directly into the cell with the data. 138 | 139 | **Solution**: Most analysis software can't see Excel or LibreOffice comments, and would be confused by comments placed within your data 140 | cells. As described above for formatting, create another field if you need to add notes to cells. Similarly, don't include units in 141 | cells: ideally, all the measurements you place in one column should be in the same unit, but if for some reason they aren't, create 142 | another field and specify the units the cell is in. 143 | 144 | ![](fig/comments-in-cells.png){alt='comments in cells'} 145 | 146 | ## Entering more than one piece of information in a cell {#info} 147 | 148 | **Example**: Your informant has multiple livestock of different types. You record this information as "3, (oxen , cows)" to indicate that there are three total livestock, which is a mixture of oxen and cows. 149 | 150 | **Solution**: Don't include more than one piece of information in a cell. This will limit the ways in which you can analyze your data. 151 | If you need both these types of information (the total number of animals and the types), design your data sheet to include this information. For example, include a separate column for each type of livestock. 152 | 153 | ## Using problematic field names {#field-name} 154 | 155 | Choose descriptive field names, but be careful not to include spaces, numbers, or special characters of any kind. 
Spaces can be 156 | misinterpreted by parsers that use whitespace as delimiters and some programs don't like field names that are text strings that start 157 | with numbers. 158 | 159 | Underscores (`_`) are a good alternative to spaces. Consider writing names in camel case (like this: ExampleFileName) to improve 160 | readability. Remember that abbreviations that make sense at the moment may not be so obvious in 6 months, but don't overdo it with names 161 | that are excessively long. Including the units in the field names avoids confusion and enables others to readily interpret your variable names. Avoid starting variable names with numbers, as this may cause problems with some analysis software. 162 | 163 | **Examples** 164 | 165 | | Good Name | Good Alternative | Avoid | 166 | |--------------|-------------------|----------------| 167 | | wall_type | WallType | wall type | 168 | | longitude | GpsLongitude | gps:Longitude | 169 | | gender | gender | M/F | 170 | | Informant_01 | first_informant | 1st Inf | 171 | | age_18 | years18 | 18years | 172 | 173 | ## Using special characters in data {#special} 174 | 175 | **Example**: You treat your spreadsheet program as a word processor when writing notes, for example copying data directly from Word or 176 | other applications. 177 | 178 | **Solution**: This is a common strategy. For example, when writing longer text in a cell, people often include line breaks, em-dashes, 179 | etc in their spreadsheet. Also, when copying data in from applications such as Word, formatting and fancy non-standard characters (such 180 | as left- and right-aligned quotation marks) are included. When exporting this data into a coding/statistical environment or into a 181 | relational database, dangerous things may occur, such as lines being cut in half and encoding errors being thrown. 182 | 183 | General best practice is to avoid adding characters such as newlines, tabs, and vertical tabs. 
In other words, treat a text cell as if 184 | it were a simple web form that can only contain text and spaces. 185 | 186 | 187 | 188 | :::::::::::::::::::::::::::::::::::::::: keypoints 189 | 190 | - Avoid using multiple tables within one spreadsheet. 191 | - Avoid spreading data across multiple tabs (but do use a new tab to record data cleaning or manipulations). 192 | - Record zeros as zeros. 193 | - Use an appropriate null value to record missing data. 194 | - Don't use formatting to convey information or to make your spreadsheet look pretty. 195 | - Place comments in a separate column. 196 | - Record units in column headers. 197 | - Include only one piece of information in a cell. 198 | - Avoid spaces, numbers and special characters in column headers. 199 | - Avoid special characters in your data. 200 | 201 | :::::::::::::::::::::::::::::::::::::::::::::::::: 202 | 203 | 204 | -------------------------------------------------------------------------------- /episodes/03-dates-as-data.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Dates as Data 3 | teaching: 10 4 | exercises: 10 5 | --- 6 | 7 | ::::::::::::::::::::::::::::::::::::::: objectives 8 | 9 | - Recognise problematic or suspicious date formats. 10 | - Use formulas to separate dates into their component values (e.g., Year, Month, Day). 11 | 12 | :::::::::::::::::::::::::::::::::::::::::::::::::: 13 | 14 | :::::::::::::::::::::::::::::::::::::::: questions 15 | 16 | - What are good approaches for handling dates in spreadsheets? 17 | 18 | :::::::::::::::::::::::::::::::::::::::::::::::::: 19 | 20 | ## Date formats in spreadsheets 21 | 22 | Dates in spreadsheets are often stored in a single column. 23 | 24 | While this seems like a logical way to record dates when you are entering them, or visually reviewing data, it's not actually a best practice for preparing data for analysis. 
25 | 26 | When working with data, your goal is to have as little ambiguity as possible. Ambiguity can creep into your data when working with dates when there are regional variations either in your observations or when you or your team might be working with different versions or suites of software products (e.g., LibreOffice, Microsoft Excel, Gnumeric). 27 | 28 | To avoid ambiguity between regional differences in date formatting and compatibility across spreadsheet software programs, a good practice is to divide dates into components in different columns - YEAR, MONTH, and DAY. 29 | 30 | When working with dates it's also important to remember that functions are guaranteed to be compatible only within the same family of software products (e.g., LibreOffice, Microsoft Excel, Gnumeric). If you need to export your data and conserve the timestamps, you are better off handling dates using one of the solutions discussed below than the single column method. 31 | 32 | One of the other reasons dates can be tricky is that most spreadsheet programs have "useful features" which can change the way dates are displayed - but not stored. The image below demonstrates some of the many date formatting options in Excel. 33 | 34 | ![](fig/excel_dates_1.jpg){alt='Many formats, many ambiguities'} 35 | 36 | ## Dates stored as integers 37 | 38 | The first thing you need to know is that Excel stores dates as numbers - see the last column in the above figure. This serial number represents the number of days from December 31, 1899. In the example, July 2, 2014 is stored as the serial number 41822. 39 | 40 | Using functions we can add days, months or years to a given date. 41 | Say you had a research plan where you needed to conduct interviews with a 42 | set of informants every ninety days for a year. 
43 | 44 | In our example above, in a new cell you can type: 45 | 46 | \=B2+90 47 | 48 | And it would return 49 | 50 | 30-Sep 51 | 52 | because it understands the date as a number `41822`, and `41822 + 90 = 41912` 53 | which Excel interprets as the 30th day of September, 2014. In most cases, it retains the format of the cell that is being operated upon. Month and year rollovers are internally tracked and applied. 54 | 55 | ## Regional date formatting 56 | 57 | When you enter a date into a spreadsheet it looks like a date although the spreadsheet program may 58 | display different text from what you input. It does this to be 'helpful' but it often is not. 59 | 60 | For example if you enter '7/12/88' into your 61 | Excel spreadsheet it may display as '07/12/1988' (depending on your version of Excel). These 62 | are different ways of formatting the same date. 63 | 64 | Different countries also write dates differently. If you are in the UK, for example, you will interpret 65 | the date above as the 7th day of December, however a researcher from the US will interpret the same entry as the 12th day of July. This regional variation is handled automatically by your 66 | spreadsheet program so that when you are typing in dates they appear as you would expect. If you 67 | try to type in a US format date into a UK version of Excel, it may or may not be treated as a 68 | date. 69 | 70 | This regional variation is one good reason to treat dates, not as a single data point, but as 71 | three distinct pieces of data (year, month, and day). Separating dates into their component parts 72 | will avoid this confusion, while also giving the added benefit of allowing you to compare, for 73 | example data collected in January of multiple years with data collected in February of multiple years. 
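The serial-number arithmetic and the year/month/day split described above can also be sketched outside the spreadsheet. The following is a minimal Python illustration, not part of the lesson's Excel workflow. It assumes Excel's default 1900 date system, where modern serial numbers line up with a day count from 1899-12-30 (the one-day shift relative to December 31, 1899 comes from Excel treating 1900 as a leap year). The helper names `serial_to_date` and `split_date` are hypothetical and exist only for this example.

```python
from datetime import date, datetime, timedelta

# Assumption: Excel's 1900 date system. For serials after 28 Feb 1900,
# counting days from 1899-12-30 reproduces the date Excel displays.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_date(serial):
    """Convert an Excel serial number (hypothetical helper) to a date."""
    return EXCEL_EPOCH + timedelta(days=serial)


def split_date(d):
    """Return (year, month, day), mirroring =YEAR(), =MONTH() and =DAY()."""
    return d.year, d.month, d.day


# The lesson's example: 41822 is July 2, 2014; adding 90 days gives 41912.
interview = serial_to_date(41822)
follow_up = serial_to_date(41822 + 90)
print(interview.isoformat(), split_date(interview))  # 2014-07-02 (2014, 7, 2)
print(follow_up.isoformat(), split_date(follow_up))  # 2014-09-30 (2014, 9, 30)

# The same typed entry read with day-first and month-first conventions.
entry = "7/12/88"
as_day_first = datetime.strptime(entry, "%d/%m/%y").date()    # 7 December 1988
as_month_first = datetime.strptime(entry, "%m/%d/%y").date()  # 12 July 1988
print(as_day_first.isoformat(), as_month_first.isoformat())
```

Once the year, month, and day are stored as three separate numbers, the day-first versus month-first question no longer arises when the data is read back in.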
74 | 75 | ::::::::::::::::::::::::::::::::::::::: challenge 76 | 77 | ## Separating dates into components 78 | 79 | Download and open the [SAFI\_dates.xlsx](https://ndownloader.figshare.com/files/11502827) file. This file 80 | contains a subset of the data from the SAFI interviews, including the dates on which the 81 | interviews were conducted. 82 | 83 | Choose the tab of the spreadsheet that corresponds to the way you format dates in your 84 | location (either day first `DD_MM_YEAR`, or month first `MM_DD_YEAR`). 85 | 86 | Extract the components of the date to new columns. For this we 87 | can use the built-in Excel functions: 88 | 89 | `=YEAR()` 90 | `=MONTH()` 91 | `=DAY()` 92 | 93 | Apply each of these formulas to its entire column. 94 | Make sure the new column is formatted as a number and not as a date. 95 | 96 | We now have each component of our date isolated in its own column. This will allow us 97 | to group our data with respect to year, month, or day of month for our analyses and will 98 | also prevent problems when passing data between different versions of spreadsheet 99 | software (as for example when sharing data with collaborators in different countries). 100 | 101 | ::::::::::::::: solution 102 | 103 | ## Solution 104 | 105 | ![](fig/solution_exercise_1_dates.png){alt='dates exercise 1'} 106 | 107 | Note that this solution shows the dates in `MM_DD_YEAR` format. 108 | 109 | 110 | 111 | ::::::::::::::::::::::::: 112 | 113 | :::::::::::::::::::::::::::::::::::::::::::::::::: 114 | 115 | ::::::::::::::::::::::::::::::::::::::: challenge 116 | 117 | ## Default year 118 | 119 | Using the same spreadsheet you used for the previous exercise, add another data point 120 | in the `interview_date` column by typing either `11/17` (if your location uses `MM/DD` formatting) 121 | or `17/11` (if your location uses `DD/MM` formatting). The `Year`, `Month`, and `Day` columns 122 | should populate for this new data point. What year is shown in the `Year` column? 
123 | 124 | ::::::::::::::: solution 125 | 126 | ## Solution 127 | 128 | If no year is specified, the spreadsheet program will assume you mean the current year 129 | and will insert that value. This may be incorrect if you are working with historical data so 130 | be very cautious when working with data that does not have a year specified within its date 131 | variable. 132 | 133 | 134 | 135 | ::::::::::::::::::::::::: 136 | 137 | :::::::::::::::::::::::::::::::::::::::::::::::::: 138 | 139 | ## Historical data 140 | 141 | Excel is unable to parse dates from before 1899-12-31, and will thus leave these untouched. If you're mixing historic data 142 | from before and after this date, Excel will translate only the post-1900 dates into its internal format, thus resulting in mixed data. If you're working with historic data, be extremely careful with your dates! 143 | 144 | 145 | 146 | :::::::::::::::::::::::::::::::::::::::: keypoints 147 | 148 | - Use extreme caution when working with date data. 149 | - Splitting dates into their component values can make them easier to handle. 150 | 151 | :::::::::::::::::::::::::::::::::::::::::::::::::: 152 | 153 | 154 | -------------------------------------------------------------------------------- /episodes/04-quality-assurance.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Quality Assurance 3 | teaching: 15 4 | exercises: 10 5 | --- 6 | 7 | ::::::::::::::::::::::::::::::::::::::: objectives 8 | 9 | - Apply quality assurance techniques to limit incorrect data entry. 10 | 11 | :::::::::::::::::::::::::::::::::::::::::::::::::: 12 | 13 | :::::::::::::::::::::::::::::::::::::::: questions 14 | 15 | - How can we carry out basic quality assurance in spreadsheets? 
16 | 17 | :::::::::::::::::::::::::::::::::::::::::::::::::: 18 | 19 | When you have a well-structured data table, you can use several simple 20 | techniques within your spreadsheet to ensure the data you enter is 21 | free of errors. 22 | 23 | ## Validating data on input 24 | 25 | When we input data into a cell of a spreadsheet we are typically not constrained in the type of data we enter. 26 | In any one column, the spreadsheet software will not warn us if we start to enter a mix of text, numbers or dates in different rows. 27 | Even if we are not facing constraints from the software, as researchers we often anticipate that all data in one column will be of a certain type. 28 | It is also possible that the nature of the data contained in the table allows us to place additional restrictions on the acceptable values for cells in a column. 29 | For example, a column recording age in years should be numeric, greater than 0 and is unlikely to be greater than 120. 30 | 31 | Excel allows us to specify a variety of data validations to be applied to cell contents. 32 | If the validation fails, an error is raised and the data we entered does not go into the particular cell. 33 | 34 | We will be working with a couple of examples of data validation 35 | rules but many others exist. For an overview of data validation rules 36 | available, check out the [Excel support page on data validation](https://support.office.com/en-us/article/Apply-data-validation-to-cells-29FECBCC-D1B9-42C1-9D76-EFF3CE5F7249) or the [Validating cell contents section of the LibreOffice Calc Guide](https://books.libreoffice.org/en/CG24/CG2402-EnteringandEditingData.html#toc28). 37 | 38 | We will look at two examples: 39 | 40 | 1. Restricting data to a numeric range 41 | 2. Restricting data to entries from a list 42 | 43 | ### Restricting data to a numeric range 44 | 45 | First, we'll open the [clean version of the SAFI dataset](https://ndownloader.figshare.com/files/11492171), 46 | which is a CSV file. 
CSV files are plain text files where the columns are separated 47 | by commas, hence 'comma separated values' or CSV. CSV is a format commonly used for tabular data, 48 | which we will discuss further in the next episode. 49 | 50 | To open this CSV file, one option is to double-click the file once it's in your Downloads folder. 51 | However, doing this can lead to different results depending on your computer's configuration. To avoid this, 52 | the following box shows a more reliable method to load the data in Excel or Calc. 53 | 54 | ::: group-tab 55 | 56 | ### Excel 57 | 58 | 1\. Open Excel and start a blank workbook. 59 | 60 | 2\. Select the `Data` tab. In the `Get & Transform Data` group, choose `Get Data` > `From File` > `From Text/CSV` 61 | 62 | 3\. In the pop-up window, navigate to the folder that contains your file, select the file, and click `Open`. 63 | 64 | 4\. In the new window, make sure the Delimiter is set to `Comma` at the top. Review the data preview. If everything looks correct, click `Load`. 65 | 66 | ![](fig/load-csv-excel.png){alt='Load CSV in Excel'} 67 | 68 | ### Calc 69 | 70 | 1\. Open LibreOffice Calc. 71 | 72 | 2\. Click `File` > `Open...` 73 | 74 | 3\. In the pop-up window, navigate to the folder that contains your file, select the file, and click `Open`. 75 | 76 | 4\. In the new window, you'll see several options. For now, make sure that under `Separator Options`, only `Separated by` > `Comma` is selected. Review the data preview. If everything looks correct, click `OK`. 77 | 78 | ![](fig/load-csv-calc.png){alt='Load CSV in Calc'} 79 | 80 | ::: 81 | 82 | When we open the file, we see that there are 83 | several columns with numeric data. One example of this is the column `no_membrs` 84 | representing the number of people in the household. We would expect this always 85 | to be a positive integer, and so we should reject values like `1.5` and `-8` as 86 | entry errors. 
We would also reject values over a certain maximum - for example 87 | an entry like `90` is probably the result of the researcher inputting `9` and 88 | their finger slipping and also hitting the `0` key. It is up to you as the 89 | researcher to decide what a reasonable maximum value would be for your data, 90 | here we will assume that there are no families with greater than 30 members. 91 | 92 | Let's start by opening the data validation feature using the `no_membrs` column. 93 | 94 | ::: group-tab 95 | 96 | ### Excel 97 | 98 | 1\. Select the `no_membrs` column. 99 | 100 | 2\. Select the `Data` tab, and in the `Data Tools` group, select `Data Validation` or `Validation Tools` (depending on your version of Excel). The following pop-up will appear: 101 | 102 | ![](fig/data-validation-tab-new.png){alt='Image of data validation tab in Excel'} 103 | 104 | 3\. Select 'Whole number' from the `Allow` drop down options. 105 | 106 | 4\. The window content will change. 107 | In the `Data` drop down box, check that 'between' is selected. `Minimum` and `Maximum` boxes will be provided for you to specify an allowed range. You will see this: 108 | 109 | ![](fig/data-validation-numbers-new.png){alt='Image of data validation tab for number rules in Excel'} 110 | 111 | 5\. Fill in the minimum and maximum values that make sense for your data and click `OK`. Here we will choose a minimum of 1 and a maximum of 30. 112 | 113 | ### Calc 114 | 115 | 1\. Select the `no_membrs` column. 116 | 117 | 2\. On the `Data` tab select `Validity...`. The following pop-up will appear: 118 | 119 | ![](fig/data-validation-tab-LibreOffice-new.png){alt='Image of data validation tab in LibreOffice'} 120 | 121 | 3\. Select 'Whole Numbers' from the `Allow` drop down options. 122 | 123 | 4\. The window content will change. 124 | In the `Data` drop down box, check that 'valid range' is selected. `Minimum` and `Maximum` boxes will be provided for you to specify an allowed range. 
You will see this: 125 | 126 | ![](fig/data-validation-numbers-LibreOffice-new.png){alt='Image of data validation tab in LibreOffice'} 127 | 128 | 5\. Fill in the minimum and maximum values that make sense for your data and click `OK`. Here we will choose a minimum of 1 and a maximum of 30. 129 | 130 | ::: 131 | 132 | 133 | Now your data table will not allow you to enter a value that violates 134 | the data validation rule you have created. To test this out, try 135 | to enter a new value into the `no_membrs` column that is not valid. 136 | The following error box will appear: 137 | 138 | ::: group-tab 139 | 140 | ### Excel 141 | 142 | ![](fig/error-invalid-data-new.png){alt='Image of error message for inputting invalid data in Excel'} 143 | 144 | ### Calc 145 | 146 | ![](fig/error-invalid-data-LibreOffice-new.png){alt='Image of error message for inputting invalid data in LibreOffice'} 147 | 148 | ::: 149 | 150 | 151 | You can also customize the resulting message to be more informative by entering 152 | your own message in the `Error Alert` tab when creating a data validation rule. 153 | 154 | ::: group-tab 155 | 156 | ### Excel 157 | 158 | Check that the `Style` is 'Stop'. You can write 'Invalid number' as the `Title` 159 | and 'Number of household members must be a whole number between 1 and 30' as the `Error Message`. 160 | 161 | ![](fig/error_alert-new.png){alt='Image of Error Alert tab in Excel'} 162 | 163 | Now check what happens if you try to enter an invalid value. 164 | 165 | ### Calc 166 | 167 | Check that the `Action` is 'Stop'. You can write 'Invalid number' as the `Title` 168 | and 'Number of household members must be a whole number between 1 and 30' as the `Error Message`. 169 | 170 | ![](fig/error_alert_LibreOffice-new.png){alt='Image of Error Alert tab in LibreOffice'} 171 | 172 | 173 | Now check what happens if you try to enter an invalid value. 
174 | 175 | ::: 176 | 177 | 178 | You can also have an `Input message` that tells users of the spreadsheet what values 179 | are accepted in a cell that has data validation. 180 | 181 | ::: group-tab 182 | 183 | ### Excel 184 | 185 | Select the `Input Message` tab. Add the title and input message that is convenient for 186 | your task. In this example, we will write 'Household members' and 'Please enter a whole number between 1 and 30'. 187 | 188 | ![](fig/input_message-new.png){alt='Image of Input Message tab in Excel'} 189 | 190 | Now check what happens when you select a cell that has data validation. 191 | 192 | ### Calc 193 | 194 | Select the `Input Help` tab. Add the title and input message that is convenient for 195 | your task. In this example, we will write 'Household members' and 'Please enter a whole number between 1 and 30'. 196 | 197 | ![](fig/input_message_LibreOffice-new.png){alt='Image of Input Message tab in LibreOffice'} 198 | 199 | Now check what happens when you select a cell that has data validation. 200 | 201 | ::: 202 | 203 | 204 | ::::::::::::::::::::::::::::::::::::::: challenge 205 | 206 | ## Exercise 207 | 208 | Apply a new data validation rule to one of the other numeric 209 | columns in this data table. Discuss with the person sitting next 210 | to you what a reasonable rule would be for the column you've selected. Be sure to create an informative error alert and input message. 211 | 212 | 213 | :::::::::::::::::::::::::::::::::::::::::::::::::: 214 | 215 | ### Restricting data to entries from a list 216 | 217 | Quality assurance can make data entry easier as well as more robust. For 218 | example, if you use a list of options to restrict data entry, the spreadsheet 219 | will provide you with a drop-down list of the available items. So, instead of 220 | trying to remember how to spell "mabatisloping", or whether or not you capitalized "cement", you can select the 221 | right option from the list. 
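The two kinds of rule used in this episode (a whole number within a range, and membership in a fixed list) can also be written as small checks outside the spreadsheet. The following is a minimal Python sketch under stated assumptions, not a replacement for the built-in validation dialogs. The function names are hypothetical; the limits (1 to 30) and the wall-type values are the ones this episode uses.

```python
# Allowed categories, taken from the list rule used in this episode.
ALLOWED_WALL_TYPES = {"grass", "muddaub", "burntbricks", "sunbricks", "cement"}


def valid_member_count(value, minimum=1, maximum=30):
    """Whole-number range rule: an integer between minimum and maximum."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return minimum <= value <= maximum


def valid_wall_type(value):
    """List rule: the entry must match one of the allowed values exactly."""
    return value in ALLOWED_WALL_TYPES


for candidate in (9, 90, 1.5, -8):
    print(candidate, valid_member_count(candidate))

for candidate in ("cement", "Cement"):
    print(candidate, valid_wall_type(candidate))
```

In this sketch the list check is case-sensitive, so `Cement` is rejected. A drop-down list in the spreadsheet avoids that class of typing error altogether.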
222 | 223 | ::: group-tab 224 | 225 | ### Excel 226 | 227 | 1\. Select the `respondent_wall_type` column. 228 | 229 | 2\. Select the `Data` tab, and in the `Data Tools` group, select `Data Validation` or `Validation Tools` (depending on your version of Excel). 230 | 231 | 3\. Select `List` from the `Allow` drop-down menu. 232 | 233 | 4\. The window will change to include a `Source` box, you will see: 234 | 235 | ![](fig/select-range-of-values-new.png){alt='Image of selecting a range of values to allow in Excel'} 236 | 237 | 5\. Type a list of all the values that you want to be accepted in this column, separated by a comma (with no spaces). For us this will be "grass,muddaub,burntbricks,sunbricks,cement". 238 | 239 | 6\. Create a meaningful error alert and input message, then click 'OK'. In LibreOffice, there is no need to create an input message. 240 | 241 | 242 | ### Calc 243 | 244 | 1\. Select the `respondent_wall_type` column. 245 | 246 | 2\. On the `Data` tab select `Validity...`. 247 | 248 | 3\. Select `List` from the `Allow` drop-down menu. 249 | 250 | 4\. The window will change to include an `Entries` box, you will see: 251 | 252 | ![](fig/select-range-of-values-LibreOffice-new.png){alt='Image of selecting a range of values to allow in LibreOffice'} 253 | 254 | 5\. Type a list of all the values that you want to be accepted in this column, and insert a new line by clicking enter after each value. Make sure not to include spaces before or after the values. Your entries of grass, muddaub, burntbricks, sunbricks and cement should look like this: 255 | 256 | ![](fig/filled-range-of-values-LibreOffice-new.png){alt='Image of filled in range of values to allow in LibreOffice'} 257 | 258 | 6\. Create a meaningful error alert and input message, then click 'OK'. 259 | 260 | ::: 261 | 262 | We have now provided a restriction that will be validated each time we try and 263 | enter data into the selected cells. 
When a cell in this column is selected, a drop-down arrow will appear. 264 | When you click the arrow, you will be able to select a value from your list. 265 | If you type a value which is not on the list, you will get an error message. This not only prevents data input errors, but also makes it easier and faster to enter data. 266 | 267 | ::::::::::::::::::::::::::::::::::::::: challenge 268 | 269 | ## Exercise 270 | 271 | Apply a new data validation rule to one of the other categorical 272 | columns in this data table. Discuss with the person sitting next 273 | to you what a reasonable rule would be for the column you've selected. Be sure to create an informative input message. 274 | 275 | 276 | :::::::::::::::::::::::::::::::::::::::::::::::::: 277 | 278 | ::::::::::::::::::::::::::::::::::::::::: callout 279 | 280 | ## Tip 281 | 282 | Typing a list of values where only a few possible values exist (like "grass, muddaub, burntbricks, sunbricks, cement") might be convenient, but if the list is longer it makes sense to create it as a small table (in a separate tab of the workbook). 283 | We can give the table a name and then reference the table name as the source of acceptable inputs when the source box appears in the Data Validation pop-out. 284 | 285 | Using a table in this way makes the data entry process more flexible. 286 | If you add or remove contents from the table, then these are immediately reflected in any new cell entries based on this source. 287 | You can also have different cells refer to the same table of acceptable inputs. 288 | 289 | 290 | :::::::::::::::::::::::::::::::::::::::::::::::::: 291 | 292 | ::::::::::::::::::::::::::::::::::::::::: callout 293 | 294 | ## Tip 295 | 296 | In the examples above we have applied data validation rules to 297 | an existing spreadsheet to demonstrate how they work, however, 298 | you may have noticed that data validation rules are not applied 299 | retroactively to data that is already present in the cell. 
300 | This means, for example, that if we had already entered `150` 301 | in the `no_membrs` column before applying our data validation 302 | rule, that cell would not be flagged with a warning. 303 | 304 | In some versions of Excel, you can go to the `Data` tab, and in the `Data Tools` group, 305 | click the small drop-down arrow next to `Data Validation`, and then select `Circle invalid data`. This will put red circles around invalid data entries. Note that it can be a bit slow with large data files. You can do the same in LibreOffice Calc by going to the `Tools` menu, then `Detective`, and 306 | selecting `Mark invalid data`. 307 | 308 | When using spreadsheets for data entry, it is a good idea to set up 309 | data validation rules for each column when you set up your 310 | spreadsheet (i.e. before you enter any data). 311 | 312 | 313 | :::::::::::::::::::::::::::::::::::::::::::::::::: 314 | 315 | 316 | 317 | :::::::::::::::::::::::::::::::::::::::: keypoints 318 | 319 | - Always copy your original spreadsheet file and work with a copy so you don't affect the raw data. 320 | - Use data validation to prevent accidentally entering invalid data. 321 | 322 | :::::::::::::::::::::::::::::::::::::::::::::::::: 323 | 324 | 325 | -------------------------------------------------------------------------------- /episodes/05-exporting-data.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Exporting Data 3 | teaching: 10 4 | exercises: 5 5 | --- 6 | 7 | ::::::::::::::::::::::::::::::::::::::: objectives 8 | 9 | - Store spreadsheet data in universal file formats. 10 | - Export data from a spreadsheet to a CSV file. 11 | 12 | :::::::::::::::::::::::::::::::::::::::::::::::::: 13 | 14 | :::::::::::::::::::::::::::::::::::::::: questions 15 | 16 | - How can we export data from spreadsheets in a way that is useful for downstream applications? 
17 | 18 | :::::::::::::::::::::::::::::::::::::::::::::::::: 19 | 20 | Storing the data you're going to work with for your analyses in Excel's 21 | default file format (`*.xls` or `*.xlsx` - depending on the Excel 22 | version) isn't a good idea. Why? 23 | 24 | - Because it is a proprietary format, and it is possible that in 25 | the future the technology needed to open the file won't exist, or will become so 26 | rare that opening it is inconvenient, if not impossible. 27 | 28 | - Other spreadsheet software may not be able to open files 29 | saved in a proprietary Excel format. 30 | 31 | - Different versions of Excel may handle data 32 | differently, leading to inconsistencies. 33 | 34 | - Finally, more journals and grant agencies are requiring you 35 | to deposit your data in a data repository, and most of them don't 36 | accept Excel format. It needs to be in one of the formats 37 | discussed below. 38 | 39 | - The above points also apply to other formats such as open data formats used by LibreOffice. These formats are not static and do not get parsed the same way by different software packages. 40 | 41 | As an example of inconsistencies in data storage, do you remember our earlier discussion about how Excel stores dates? It turns out that 42 | there are multiple defaults for different versions of the software, and you can switch between them all. So, say you're 43 | compiling Excel-stored data from multiple sources. There are dates in each file, and Excel interprets them as their own internally consistent 44 | serial numbers. When you combine the data, Excel will take the serial number from the place you're importing it from, and interpret it 45 | using the rule set for the version of Excel you're using. Essentially, you could be adding errors to your data, and it wouldn't 46 | necessarily be flagged by any data cleaning methods if your ranges overlap. 47 | 48 | Storing data in a universal, open, and static format will help deal with this problem. 
Try tab-delimited (tab separated values 49 | or TSV) or comma-delimited (comma separated values or CSV). The advantage of a CSV file over an Excel/SPSS/etc. file is that we can open and read a CSV file 50 | using just about any software, including plain text editors like TextEdit or NotePad. 51 | Data in a CSV file can also be easily imported into other formats and 52 | environments, such as SQLite and R. We're not tied to a certain version of a certain expensive program when we work with CSV files, so 53 | it's a 54 | good format to work with for maximum portability and endurance. Most spreadsheet programs can save to delimited text formats like CSV 55 | easily, although they may give you a warning during the file export. 56 | 57 | To save a file you have opened in Excel in CSV format: 58 | 59 | 1. From the top menu select `File` and `Save as`. 60 | 2. In the `Format` field, from the list, select `Comma Separated Values` (`*.csv`). 61 | 3. Double check the file name and the location where you want to save it and hit `Save`. 62 | 63 | An important note for backwards compatibility: you can open CSV files in Excel! 64 | 65 | ![](fig/excel-to-csv.png){alt='Saving an Excel file to CSV'} 66 | 67 | ::::::::::::::::::::::::::::::::::::::::: callout 68 | 69 | ## A note on R and `xls` 70 | 71 | There are R packages that can read `xls` files (as well as 72 | Google spreadsheets). It is even possible to access different 73 | worksheets in the `xls` documents. However, because these 74 | packages parse data tables from proprietary and non-static 75 | software, there is no guarantee that they will continue to 76 | work on new versions of Excel. Exporting your data to CSV or TSV 77 | format is much safer and more reproducible. 
78 | 79 | 80 | :::::::::::::::::::::::::::::::::::::::::::::::::: 81 | 82 | ::::::::::::::::::::::::::::::::::::::::: callout 83 | 84 | ## What to do when your data contain commas 85 | 86 | In some datasets, the data values themselves may include commas (,). In that 87 | case, you need to make sure that the commas are properly escaped when saving 88 | the file. Otherwise, the software which you use (including Excel) will most 89 | likely incorrectly display the data in columns. This is because the commas 90 | which are a part of the data values will be interpreted as delimiters. 91 | 92 | If you are working with data that contains commas, the fields should be 93 | enclosed with double quotes. The spreadsheet software should do the right 94 | thing ([LibreOffice](https://www.libreoffice.org/download/download/) provides 95 | comprehensive options to import and export CSV files). However, it is always a 96 | good idea to double check that the file you are exporting can be read in 97 | correctly. For more of a discussion on data formats and potential issues with 98 | commas within datasets see [the Ecology Spreadsheets lesson discussion 99 | page](https://www.datacarpentry.org/spreadsheet-ecology-lesson/discuss). 100 | 101 | 102 | :::::::::::::::::::::::::::::::::::::::::::::::::: 103 | 104 | 105 | 106 | :::::::::::::::::::::::::::::::::::::::: keypoints 107 | 108 | - Data stored in common spreadsheet formats will often not be read correctly into data analysis software, introducing errors into your data. 109 | - Exporting data from spreadsheets to formats like CSV or TSV puts it in a format that can be used consistently by most programs. 
110 | 111 | :::::::::::::::::::::::::::::::::::::::::::::::::: 112 | 113 | 114 | -------------------------------------------------------------------------------- /episodes/fig/bad-formatting.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/bad-formatting.png -------------------------------------------------------------------------------- /episodes/fig/better-formatting.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/better-formatting.png -------------------------------------------------------------------------------- /episodes/fig/comments-in-cells.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/comments-in-cells.png -------------------------------------------------------------------------------- /episodes/fig/data-validation-numbers-LibreOffice-new.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/data-validation-numbers-LibreOffice-new.png -------------------------------------------------------------------------------- /episodes/fig/data-validation-numbers-new.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/data-validation-numbers-new.png -------------------------------------------------------------------------------- 
/episodes/fig/data-validation-tab-LibreOffice-new.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/data-validation-tab-LibreOffice-new.png -------------------------------------------------------------------------------- /episodes/fig/data-validation-tab-new.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/data-validation-tab-new.png -------------------------------------------------------------------------------- /episodes/fig/error-invalid-data-LibreOffice-new.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/error-invalid-data-LibreOffice-new.png -------------------------------------------------------------------------------- /episodes/fig/error-invalid-data-new.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/error-invalid-data-new.png -------------------------------------------------------------------------------- /episodes/fig/error_alert-new.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/error_alert-new.png -------------------------------------------------------------------------------- /episodes/fig/error_alert_LibreOffice-new.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/error_alert_LibreOffice-new.png -------------------------------------------------------------------------------- /episodes/fig/excel-to-csv.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/excel-to-csv.png -------------------------------------------------------------------------------- /episodes/fig/excel_dates_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/excel_dates_1.jpg -------------------------------------------------------------------------------- /episodes/fig/filled-range-of-values-LibreOffice-new.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/filled-range-of-values-LibreOffice-new.png -------------------------------------------------------------------------------- /episodes/fig/input_message-new.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/input_message-new.png -------------------------------------------------------------------------------- /episodes/fig/input_message_LibreOffice-new.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/input_message_LibreOffice-new.png 
-------------------------------------------------------------------------------- /episodes/fig/load-csv-calc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/load-csv-calc.png -------------------------------------------------------------------------------- /episodes/fig/load-csv-excel.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/load-csv-excel.png -------------------------------------------------------------------------------- /episodes/fig/multiple-info.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/multiple-info.png -------------------------------------------------------------------------------- /episodes/fig/multiple-tables-example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/multiple-tables-example.png -------------------------------------------------------------------------------- /episodes/fig/multiple-tables-example2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/multiple-tables-example2.png -------------------------------------------------------------------------------- /episodes/fig/select-range-of-values-LibreOffice-new.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/select-range-of-values-LibreOffice-new.png -------------------------------------------------------------------------------- /episodes/fig/select-range-of-values-new.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/select-range-of-values-new.png -------------------------------------------------------------------------------- /episodes/fig/single-info.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/single-info.png -------------------------------------------------------------------------------- /episodes/fig/solution_exercise_1_dates.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/solution_exercise_1_dates.png -------------------------------------------------------------------------------- /episodes/fig/spreadsheet_simple_data_01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/spreadsheet_simple_data_01.png -------------------------------------------------------------------------------- /episodes/fig/spreadsheets_Data_validation_05.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/spreadsheets_Data_validation_05.png 
-------------------------------------------------------------------------------- /episodes/fig/white_table_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/white_table_1.jpg -------------------------------------------------------------------------------- /episodes/fig/zeros-example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/datacarpentry/spreadsheets-socialsci/264dbfaacdbefbe72f9d62db393e730813aeef52/episodes/fig/zeros-example.png -------------------------------------------------------------------------------- /index.md: -------------------------------------------------------------------------------- 1 | --- 2 | permalink: index.html 3 | site: sandpaper::sandpaper_site 4 | --- 5 | 6 | Good data organization is the foundation of any research project. Most 7 | researchers have data in spreadsheets, so it's the place that many research 8 | projects start. 9 | 10 | Typically we organize data in spreadsheets in ways that we as humans want to work with the data. However 11 | computers require data to be organized in particular ways. In order 12 | to use tools that make computation more efficient, such as programming 13 | languages like R or Python, we need to structure our data the way that 14 | computers need the data. Since this is where most research projects start, 15 | this is where we want to start too! 16 | 17 | In this lesson, you will learn: 18 | 19 | - Good data entry practices - formatting data tables in spreadsheets 20 | - How to avoid common formatting mistakes 21 | - Approaches for handling dates in spreadsheets 22 | - Basic quality control and data manipulation in spreadsheets 23 | - Exporting data from spreadsheets 24 | 25 | In this lesson, however, you will *not* learn about data analysis with spreadsheets. 
26 | Much of your time as a researcher will be spent in the initial 'data wrangling' 27 | stage, where you need to organize the data to perform a proper analysis later. 28 | It's not the most fun, but it is necessary. In this lesson you will 29 | learn how to think about data organization and some practices for more 30 | effective data wrangling. With this approach you can better format current data 31 | and plan new data collection so less data wrangling is needed. 32 | 33 | :::::::::::::::::::::::::::::::::::::::::: prereq 34 | 35 | ## Getting Started 36 | 37 | Data Carpentry's teaching is hands-on, so participants are encouraged to use 38 | their own computers to ensure the proper setup of tools for an efficient 39 | workflow.
**These lessons assume no prior knowledge of the skills or tools.** 40 | 41 | To get started, follow the directions in the "[Setup](learners/setup.md)" tab to 42 | download data to your computer and follow any installation instructions. 43 | 44 | #### Prerequisites 45 | 46 | This lesson requires a working copy of spreadsheet software, such as Microsoft 47 | Excel or LibreOffice or OpenOffice.org (see more details in "[Setup](learners/setup.md)"). 48 |
To most effectively use these materials, please make sure to install 49 | everything *before* working through this lesson. 50 | 51 | 52 | :::::::::::::::::::::::::::::::::::::::::::::::::: 53 | 54 | :::::::::::::::::::::::::::::::::::::::::: prereq 55 | 56 | ## For Instructors 57 | 58 | If you are teaching this lesson in a workshop, please see the 59 | [Instructor notes](instructors/instructor-notes.md). 60 | 61 | 62 | :::::::::::::::::::::::::::::::::::::::::::::::::: 63 | 64 | 65 | -------------------------------------------------------------------------------- /instructors/instructor-notes.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Instructor Notes 3 | --- 4 | 5 | ## Instructor notes 6 | 7 | ## Lesson motivation and learning objectives 8 | 9 | The purpose of this lesson is not to teach how to do data analysis in spreadsheets, 10 | but to teach good data organization and how to do some data cleaning and 11 | quality control in a spreadsheet program. 12 | 13 | ## Lesson design 14 | 15 | #### [Introduction](../episodes/00-intro.md) 16 | 17 | - Introduce that we're teaching data organization, and that we're using 18 | spreadsheets, because most people do data entry in spreadsheets or 19 | have data in spreadsheets. 20 | - Emphasize that we are teaching good practice in data organization and that 21 | this is the foundation of their research practice. Without organized and clean 22 | data, it will be difficult for them to apply the things we're teaching in the 23 | rest of the workshop to their data. 24 | - Much of their lives as researchers will be spent on this 'data wrangling' stage, but 25 | some of it can be prevented with good strategies for data collection up front. 26 | - Tell them that we're not teaching data analysis or plotting in spreadsheets, because it's 27 | very manual and also not reproducible. That's why we're teaching SQL, R, Python!
28 | - Now let's talk about spreadsheets, and when we say spreadsheets, we mean any program that 29 | does spreadsheets like Excel or LibreOffice. Most learners are probably using Excel. 30 | - Ask the audience about things they've accidentally done in spreadsheets. Talk about an example of your own, such as accidentally sorting only a single column and not the rest 31 | of the data in the spreadsheet. What are the pain points? 32 | - As people answer, highlight some of these issues with spreadsheets. 33 | 34 | #### [Formatting data](../episodes/01-format-data.md) 35 | 36 | - Introduce the dataset that will be used in this lesson, and in the other Social Sciences lessons, the [Studying African Farmer-led Irrigation (SAFI) Dataset](https://www.datacarpentry.org/socialsci-workshop/data). 37 | - Go through the point about keeping track of your steps and keeping raw data raw 38 | - Go through the cardinal rule of spreadsheets about columns, rows and cells 39 | - Hand them a messy data file and have them pair up and work together to clean up the data. 40 | *Give them 15 minutes to do this.* 41 | - Learners who are using LibreOffice for the workshop will have problems with the dataset 42 | as the default for LibreOffice is to treat tabs, commas, and semicolons as delimiters. This 43 | can be fixed when opening LibreOffice by deselecting the "semicolons" and "tabs" checkboxes. 44 | - Ask what people did to clean the data. As they bring up different points you can 45 | refer to them in the [Common formatting problems](../episodes/02-common-mistakes.md) file, or expand a bit on the point they brought up. 46 | All these mistakes are present in the messy 47 | dataset. 48 | - If you get a response where they've fixed the date, you can pause and go to the 49 | [dates](../episodes/03-dates-as-data.md) lesson. Or you can say you'll come back to dates at the end.
50 | There's an exercise in that file about how to change the 51 | date into three columns using Excel's built-in MONTH, DAY, and YEAR functions. Have them 52 | run through that exercise. 53 | 54 | #### [Common formatting problems](../episodes/02-common-mistakes.md) 55 | 56 | - **Don't go through this chapter** except to refer to as responses to the exercise in 57 | the previous chapter. 58 | 59 | #### [Dates as data](../episodes/03-dates-as-data.md) 60 | 61 | - Do the exercise and make the point about dates either in response to a learner bringing 62 | up date as an issue during the responses, or at the end of the response time. 63 | - If learners are using a non-English language version of Excel, the `=MONTH()`, `=DAY()`, and other date 64 | functions won't work for them. They will need to type in their language's equivalent of that word in the formula. 65 | - The spreadsheet for this episode has two tabs. The first tab is data stored as `DD-MM-YYYY`, 66 | the second is `MM-DD-YYYY`. If learners use the wrong tab for their location, they will get a `#VALUE!` error. 67 | - When using LibreOffice, it is helpful to first save the file in ods format. Then be sure to convert 68 | the date column to type date by right clicking on the cell, choosing "Format Cells...", then choosing Date and 69 | picking a date type that uses `DD/MM/YYYY`, such as English (Botswana). Once you click OK, you will find that 70 | the date has been prepended with an apostrophe. For example 21/11/2016 becomes '21/11/2016. Edit the cell to 71 | remove the apostrophe. You will then find that the day(), month() and year() functions work. 72 | 73 | #### [Quality assurance](../episodes/04-quality-assurance.md) 74 | 75 | The challenge with this lesson is that the instructor's version of the spreadsheet software is going to look different than about half the room's. It makes 76 | it challenging to show where you can find menu options and navigate through.
77 | 78 | Instead, discuss the concepts of quality control, and how things like sorting can help you find outliers in your data. 79 | 80 | #### [Exporting data](../episodes/05-exporting-data.md) 81 | 82 | - Have the students export their cleaned data as CSV. Reiterate the need for 83 | data in this format for the other tools we'll be using. 84 | 85 | #### Concluding points 86 | 87 | - Now your data is organized so that a computer can read and understand it. This 88 | lets you use the full power of the computer for your analyses as we'll see in the 89 | rest of the workshop. 90 | - While your data is now neatly organized, it still might have errors or missing data 91 | or other problems. It's like you put all your data in the right drawers, but the 92 | drawers might still be messy. The next lesson is going to teach you OpenRefine, which 93 | is great for data cleaning and for some of the quality control that we touched on 94 | in this lesson. It also has the advantage that it automatically keeps track of the 95 | steps you take. 96 | 97 | ## Technical tips and tricks 98 | 99 | Provide information on setting up your environment for learners to view your 100 | live coding (increasing text size, changing text color, etc.), as well as 101 | general recommendations for working with coding tools to best suit the 102 | learning environment. 103 | 104 | ## Common problems 105 | 106 | #### Excel looks and acts different on different operating systems 107 | 108 | The main challenge with this lesson is that Excel looks very different, and how you 109 | do things differs between Mac and PC, and between different versions of 110 | Excel. So, the presenter's environment will match only some of the learners'. 111 | 112 | We need better notes and screenshots of how things work on both Mac and PC. But we 113 | likely won't be able to cover all the different versions of Excel.
114 | 115 | If you have a helper who has experience with a different OS than yours, it would be good 116 | to prep them to help with this lesson and tell people how to do things in the other OS. 117 | 118 | #### Apple Numbers 119 | 120 | Apple Numbers does not have data validation, which is needed for part of this lesson. A note 121 | is included in the setup instructions pointing Numbers users to either Microsoft Excel 122 | or LibreOffice. 123 | 124 | #### People are not interactive or responsive on the Exercise 125 | 126 | This lesson depends on people working on the exercise and responding with things 127 | that they fixed. If your audience is reluctant to participate, start out with 128 | some things on your own, or ask a helper for their answers. This generally gets 129 | even a reluctant audience started. 130 | 131 | ## Common questions raised by participants 132 | 133 | ### How do you extract date components from the interview\_date field in SAFI\_clean.csv? 134 | 135 | The interview\_date field in SAFI\_clean.csv when saved to SAFI\_clean.xlsx is difficult to 136 | manage because there isn't a way to format the column as a date field, even using the 137 | custom field formats. The easiest solution to this question is to show the student how to 138 | extract the date information from the field. Make a new column and format it as a date. 139 | In the first cell of the new column type =LEFT(C2,10) and then apply this to the column. 140 | This function extracts the first 10 characters from the left side of the interview\_date 141 | field and inserts them into a new column. 142 | 143 | ### How would you automatically transform the items\_owned field into a usable format? 144 | 145 | If you are not following the course immediately with the OpenRefine lesson it is important 146 | to make it clear that in the current format SAFI\_clean.csv is not ready for analysis. 147 | The items\_owned column ideally needs to be split into separate yes / no / null columns.
148 | Example: set up a new column 'bicycle' and format it as a number. You then need to extract 149 | information from the items\_owned column about whether the word 'bicycle' is in the column. 150 | One way of doing this is to use an IF statement: =IF(ISNUMBER(SEARCH("bicycle",K2)),1,0). 151 | The search text can include a wildcard character, e.g. "bicy\*". 152 | 153 | 154 | -------------------------------------------------------------------------------- /learners/reference.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Glossary' 3 | --- 4 | 5 | ## Glossary 6 | 7 | {:auto\_ids} 8 | cleaned data 9 | : data that has been manipulated post-collection to remove errors or inaccuracies, introduce desired formatting changes, or otherwise prepare the data for analysis 10 | 11 | conditional formatting 12 | : formatting that is applied to a specific cell or range of cells depending on a set of criteria 13 | 14 | CSV (comma separated values) format 15 | : a plain text file format in which values are separated by commas 16 | 17 | factor 18 | : a variable that takes on a limited number of possible values (i.e. categorical data) 19 | 20 | metadata 21 | : data which describes other data 22 | 23 | null value 24 | : a value used to record observations missing from a dataset 25 | 26 | observation 27 | : a single measurement or record of the object being recorded (e.g. the weight of a particular mouse) 28 | 29 | plain text 30 | : unformatted text 31 | 32 | quality assurance 33 | : any process which checks data for validity during entry 34 | 35 | quality control 36 | : any process which removes problematic data from a dataset 37 | 38 | raw data 39 | : data that has not been manipulated and represents actual recorded values 40 | 41 | rich text 42 | : formatted text (e.g. text that appears bolded, colored or italicized) 43 | 44 | string 45 | : a collection of characters (e.g.
"thisisastring") 46 | 47 | TSV (tab separated values) format 48 | : a plain text file format in which values are separated by tabs 49 | 50 | variable 51 | : a category of data being collected on the object being recorded (e.g. a mouse's weight) 52 | 53 | 54 | -------------------------------------------------------------------------------- /learners/setup.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Setup 3 | --- 4 | 5 | :::::::::::::::::::::::::::::::::::::::::: prereq 6 | 7 | ## Data 8 | 9 | You need to download some files to follow this lesson: 10 | 11 | 1. Download the following three files: 12 | 13 | - [SAFI\_clean.csv](https://ndownloader.figshare.com/files/11492171) 14 | - [SAFI\_messy.xlsx](https://ndownloader.figshare.com/files/11502824) 15 | - [SAFI\_dates.xlsx](https://ndownloader.figshare.com/files/11502827) 16 | 17 | 2. Place these 3 files in a folder you can easily find and access on your 18 | computer (for instance in a `datacarpentry-spreadsheets` folder on your 19 | Desktop or within your Home folder). 20 | 21 | #### About the data 22 | 23 | For more information about the dataset and to 24 | download it from Figshare, check out the [Social Sciences workshop data 25 | page](https://www.datacarpentry.org/socialsci-workshop/data). 26 | 27 | 28 | :::::::::::::::::::::::::::::::::::::::::::::::::: 29 | 30 | :::::::::::::::::::::::::::::::::::::::::: prereq 31 | 32 | ## Software 33 | 34 | To work through this tutorial you will need access to a spreadsheet program. For this you have many options: [Microsoft Excel](https://www.microsoft.com/en-us/microsoft-365/excel), [LibreOffice](https://www.libreoffice.org/), [Apple Numbers](https://support.apple.com/numbers), [Gnumeric](http://www.gnumeric.org/), [Onlyoffice](https://www.onlyoffice.com/), [WPS office](https://www.wps.com/), among others. 
Commands may differ a bit between programs, but 35 | the general ideas for thinking about spreadsheets are the same. 36 | 37 | For this lesson, we encourage you to use LibreOffice or Microsoft Excel, as the tasks we will 38 | be doing have been tested in these programs. If you don't have Microsoft Excel, you can use 39 | LibreOffice. It's a free, open source spreadsheet program. Here are the instructions to install it: 40 | 41 | 42 | 43 | #### Windows 44 | 45 | - **Download the Installer** 46 | Install LibreOffice by going to the [installation 47 | page](https://www.libreoffice.org/download/download-libreoffice/). The 48 | version for Windows should automatically be selected. Click 49 | **Download**. You will go to a page that asks about a 50 | donation, but you don't need to make one. Your download should begin 51 | automatically. 52 | - **Install LibreOffice** 53 | Once the installer is downloaded, double click on it and it should 54 | install. 55 | 56 | #### Mac OS X 57 | 58 | - **Download the Installer** 59 | Install LibreOffice by going to the [installation 60 | page](https://www.libreoffice.org/download/download-libreoffice/). The 61 | version for macOS should automatically be selected. Click 62 | **Download**. You will go to a page that asks about a 63 | donation, but you don't need to make one. Your download should begin 64 | automatically. 65 | - **Install LibreOffice** 66 | The file *LibreOffice\_X.X.X\_MacOS\_x86-64* (whichever version of LibreOffice you have selected) should have been 67 | downloaded. Double click on this file, and LibreOffice will be 68 | installed. 69 | 70 | #### Linux 71 | 72 | - **Download the Installer** 73 | Install LibreOffice by going to the [installation 74 | page](https://www.libreoffice.org/download/download-libreoffice/). The 75 | version for Linux should automatically be selected. Click **Download**. You will go to a page that asks about a donation, 76 | but you don't need to make one. 
Your download should begin 77 | automatically. 78 | - **Install LibreOffice** 79 | Once the installer is downloaded, double click on it and it should 80 | install. 81 | 82 | 83 | :::::::::::::::::::::::::::::::::::::::::::::::::: 84 | 85 | 86 | -------------------------------------------------------------------------------- /profiles/learner-profiles.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: FIXME 3 | --- 4 | 5 | This is a placeholder file. Please add content here. 6 | -------------------------------------------------------------------------------- /site/README.md: -------------------------------------------------------------------------------- 1 | This directory contains rendered lesson materials. Please do not edit files 2 | here. 3 | --------------------------------------------------------------------------------