├── .Rbuildignore
├── .dockerignore
├── .github
    ├── CODE_OF_CONDUCT.md
    ├── CONTRIBUTING.md
    ├── ISSUE_TEMPLATE.md
    └── workflows
    │   └── bookdown.yaml
├── .gitignore
├── .htmlhintrc
├── .remarkrc
├── DESCRIPTION
├── Dockerfile
├── EDA.Rmd
├── LICENSE
├── NAMESPACE
├── README.md
├── WORDLIST
├── _bookdown.yml
├── _common.R
├── _config.yml
├── _output.yaml
├── appendixes.Rmd
├── bin
    ├── add-r4ds-links.R
    ├── build.sh
    ├── check-r4ds-sections.R
    ├── check-spelling.R
    ├── create-sitemap.R
    ├── create-toc.R
    ├── deploy.sh
    ├── hypothesis.r
    ├── is_primary_key.R
    ├── r4ds-toc.R
    ├── render.R
    ├── serve.R
    └── style.R
├── communicate.Rmd
├── contributions.Rmd
├── data
    ├── README.md
    ├── file1.csv
    ├── file2.csv
    └── file3.csv
├── datetimes.Rmd
├── diagrams
    ├── Lahman1.graffle
    ├── Lahman1.png
    ├── Lahman2.graffle
    ├── Lahman2.png
    ├── Lahman3.graffle
    ├── Lahman3.png
    ├── master-batting-salaries.graffle
    ├── master-batting-salaries.png
    ├── nested_set_1.dot
    ├── nested_set_2.dot
    ├── nycflights.graffle
    └── nycflights.png
├── docker-compose.yml
├── explore.Rmd
├── factors.Rmd
├── functions.Rmd
├── graphics-for-communication.Rmd
├── img
    ├── cover.png
    ├── r4ds-exercise-solutions-cover.key
    ├── r4ds-exercise-solutions-cover.png
    ├── rmarkdown-file.png
    ├── rmarkdown-knit-button.png
    ├── rmarkdown-notebook.png
    └── visualize
    │   ├── unnamed-chunk-29-1.png
    │   ├── unnamed-chunk-29-2.png
    │   ├── unnamed-chunk-29-3.png
    │   ├── unnamed-chunk-29-4.png
    │   ├── unnamed-chunk-29-5.png
    │   └── unnamed-chunk-29-6.png
├── import.Rmd
├── includes
    ├── hypothesis.html
    ├── ort.css
    ├── r4ds-solutions.css
    └── r4ds.css
├── index.Rmd
├── intro.Rmd
├── iteration.Rmd
├── many-models.Rmd
├── model-basics.Rmd
├── model-building.Rmd
├── model.Rmd
├── pipes.Rmd
├── program.Rmd
├── r4ds-exercise-solutions.Rproj
├── r4ds-toc.csv
├── r4ds.bib
├── relational-data.Rmd
├── rmarkdown-formats.Rmd
├── rmarkdown-workflow.Rmd
├── rmarkdown.Rmd
├── rmarkdown
    ├── caching.Rmd
    ├── cv.Rmd
    ├── diamond-sizes.Rmd
    └── example.Rmd
├── strings.Rmd
├── tibble.Rmd
├── tidy.Rmd
├── transform.Rmd
├── vectors.Rmd
├── visualize.Rmd
├── workflow-basics.Rmd
├── workflow-projects.Rmd
├── workflow-scripts.Rmd
└── wrangle.Rmd


/.Rbuildignore:
--------------------------------------------------------------------------------
 1 | ^\.github$
 2 | ^.*\.Rproj$
 3 | ^\.Rproj\.user$
 4 | ^.*\.R?md$
 5 | ^.*\.R$
 6 | ^.*\.ya?ml$
 7 | ^.*\.json$
 8 | ^_bookdown_files$
 9 | ^WORDLIST$
10 | ^.*\.html$
11 | ^.*\.css$
12 | ^bookdown[0-9a-f]+$
 13 | ^docs$
14 | ^node_modules$
15 | ^bin/$
16 | ^diagrams/$
17 | ^.*\.rds$
18 | ^r4ds\.(tex|bib)$
19 | ^\.dockerignore$
20 | ^\.remarkrc$
21 | ^_build/$
22 | ^.*\.utf8\.md$
23 | ^_build$
24 | ^img$
25 | ^includes$
26 | ^Dockerfile$
27 | ^diagrams$
28 | ^bin$
29 | ^Makefile$
30 | 
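
Note on the patterns above: per the R documentation, each line of `.Rbuildignore` is a Perl-style regular expression matched case-insensitively against paths relative to the package root. Directory paths are assumed here to carry no trailing slash. Under those assumptions, the minimal R sketch below checks a few example paths against a subset of the patterns. The helper `is_ignored()` and the sample paths are illustrative only and are not part of the repository.

```r
# A subset of the patterns listed above (backslashes doubled for R strings).
patterns <- c("^\\.github$", "^.*\\.R?md$", "^bin/$", "^bin$")

# Example paths, written relative to the package root.
paths <- c(".github", "EDA.Rmd", "bin", "DESCRIPTION")

# Hypothetical helper: TRUE if any pattern matches the path.
is_ignored <- function(path, patterns) {
  any(vapply(patterns, grepl, logical(1),
             x = path, perl = TRUE, ignore.case = TRUE))
}

ignored <- vapply(paths, is_ignored, logical(1), patterns = patterns)
print(ignored)

# A pattern that requires a trailing slash does not match a bare directory name.
slash_only <- grepl("^bin/$", "bin", perl = TRUE)
print(slash_only)
```

If the no-trailing-slash assumption holds, only the slash-free form of a directory pattern has any effect, which may be why both forms of `bin` appear in the file.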


--------------------------------------------------------------------------------
/.dockerignore:
--------------------------------------------------------------------------------
1 | *
2 | !DESCRIPTION
3 | 


--------------------------------------------------------------------------------
/.github/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
 1 | # Contributor Code of Conduct
 2 | 
 3 | As contributors and maintainers of this project, we pledge to respect all people who 
 4 | contribute through reporting issues, posting feature requests, updating documentation,
 5 | submitting pull requests or patches, and other activities.
 6 | 
 7 | We are committed to making participation in this project a harassment-free experience for
 8 | everyone, regardless of level of experience, gender, gender identity and expression,
 9 | sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
10 | 
11 | Examples of unacceptable behavior by participants include the use of sexual language or
12 | imagery, derogatory comments or personal attacks, trolling, public or private harassment,
13 | insults, or other unprofessional conduct.
14 | 
15 | Project maintainers have the right and responsibility to remove, edit, or reject comments,
16 | commits, code, wiki edits, issues, and other contributions that are not aligned to this 
17 | Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed 
18 | from the project team.
19 | 
20 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by 
21 | opening an issue or contacting one or more of the project maintainers.
22 | 
23 | This Code of Conduct is adapted from the [Contributor Covenant](http://contributor-covenant.org), version 1.0.0, available at
24 | <http://contributor-covenant.org/version/1/0/0>.
25 | 


--------------------------------------------------------------------------------
/.github/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | # Contributing to R4DS Exercise Solutions
 2 | 
 3 | This outlines how to propose a change to *R for Data Science Exercise Solutions*.
 4 | 
 5 | ## Fixing typos
 6 | 
 7 | Small typos or grammatical errors in the book may be edited directly using
 8 | the GitHub web interface, so long as the changes are made in the _source_ file.
 9 | 
10 | ## Prerequisites
11 | 
12 | Before you make a substantial pull request, you should always file an issue and
13 | make sure someone from the team agrees that it’s a problem. If you’ve found a
14 | bug, create an associated issue and illustrate the bug with a minimal
15 | [reprex](https://www.tidyverse.org/help/#reprex).
16 | 
17 | ## Pull request process
18 | 
19 | -   We recommend that you create a Git branch for each pull request (PR).  
20 | 
 21 | -   Look at the GitHub Actions build status before and after making changes.
22 |     The `README` should contain badges for any continuous integration services used
 23 |     by the package.
24 | 
25 | -   New code should follow the tidyverse [style guide](http://style.tidyverse.org).
26 |     You can use the [styler](https://CRAN.R-project.org/package=styler) package to
27 |     apply these styles, but please don't restyle code that has nothing to do with
28 |     your PR.
29 | 
30 | ## Code of Conduct
31 | 
32 | Please note that the R4DSSolutions project is released with a
33 | [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this
34 | project you agree to abide by its terms.
35 | 
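
Note on the styler recommendation in the pull request process above: a minimal sketch of how styling might be limited to the code touched by a PR, assuming the styler package is installed. The snippet and the file name `transform.Rmd` are examples only. The repository's own `bin/style.R` may work differently.

```r
library(styler)

# Preview the tidyverse style on a snippet without touching any file.
styled <- style_text("x=c(1,2)")
print(styled)

# To restyle a single file changed in a PR, style_file() rewrites it in place.
# "transform.Rmd" is only an example path; uncomment to run it.
# style_file("transform.Rmd")
```

Styling one file at a time keeps the diff limited to the change under review, in line with the request not to restyle unrelated code.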


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | Please briefly describe your problem and what output you expect.
2 | 
3 | -   Include a link to the page, and reference the exercise number (if applicable).
4 | 
5 | -   If it is a typo, include the quoted text where it occurs.
6 | 
7 | -   If code is producing an error, include both the code and the error message, as well as the
8 |     output of `sessionInfo()`.
9 | 
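
Note on the last item of the template above: a minimal sketch of how the error message and session details could be collected for an issue report. The failing expression is a made-up stand-in for the code from an exercise.

```r
# A deliberately failing call standing in for the code being reported.
result <- tryCatch(
  -1:1 + "a",
  error = function(e) conditionMessage(e)
)

# The captured error message, ready to paste into the issue.
print(result)

# R version, platform, and attached package versions.
info <- sessionInfo()
print(info)
```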


--------------------------------------------------------------------------------
/.github/workflows/bookdown.yaml:
--------------------------------------------------------------------------------
 1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
 2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
 3 | on:
 4 |   push:
 5 |     branches: [main, master]
 6 |   pull_request:
 7 |     branches: [main, master]
 8 |   workflow_dispatch:
 9 | 
10 | name: bookdown
11 | 
12 | jobs:
13 |   bookdown:
14 |     runs-on: ubuntu-latest
15 |     # Only restrict concurrency for non-PR jobs
16 |     concurrency:
17 |       group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
18 |     env:
19 |       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
20 |     steps:
21 |       - uses: actions/checkout@v4
22 | 
23 |       - uses: r-lib/actions/setup-pandoc@v2
24 | 
25 |       - uses: r-lib/actions/setup-r@v2
26 |         with:
27 |           use-public-rspm: true
28 | 
29 |       - uses: r-lib/actions/setup-renv@v2
30 | 
31 |       - name: Cache bookdown results
32 |         uses: actions/cache@v4
33 |         with:
34 |           path: _bookdown_files
35 |           key: bookdown-${{ hashFiles('**/*Rmd') }}
36 |           restore-keys: bookdown-
37 | 
38 |       - name: Build site
39 |         run: bookdown::render_book("index.Rmd", quiet = TRUE)
40 |         shell: Rscript {0}
41 | 
42 |       - name: Deploy to GitHub pages 🚀
43 |         if: github.event_name != 'pull_request'
44 |         uses: JamesIves/github-pages-deploy-action@v4.5.0
45 |         with:
46 |           branch: gh-pages
47 |           folder: _book
48 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | .Rhistory
 2 | .Rproj.user
 3 | .RData
 4 | _bookdown_files/*
 5 | 
 6 | # Don't ignore all _cache, _files folders because they are needed for output in the docs folder
 7 | /*_cache/
 8 | /*_files/
 9 | /rmarkdown/*_cache/
10 | /rmarkdown/*_files/
11 | /rmarkdown/*.html
12 | 
13 | figures
14 | 
15 | # Ignore intermediate
16 | *.knit.md
17 | *.utf8.md
18 | /*.md
19 | !README.md
20 | !NEWS.md
21 | /libs
22 | 
23 | # ignore bookdown cache
24 | /bookdown[0-9a-f]*
25 | 
26 | *.aux
27 | *.out
28 | *.fls
29 | *.toc
30 | *.log
31 | *.fdb_latexmk
32 | _bookdown_files
33 | .ipynb_checkpoints
34 | *.rds
35 | node_modules
36 | Rplots.pdf
37 | *.bib.sav
38 | _build
39 | *.bak
40 | .drake
41 | /cache
42 | /figure
43 | 


--------------------------------------------------------------------------------
/.htmlhintrc:
--------------------------------------------------------------------------------
1 | {
2 |   "doctype-html5": false,
3 |   "tag-pair": false
4 | }
5 | 


--------------------------------------------------------------------------------
/.remarkrc:
--------------------------------------------------------------------------------
 1 | {
 2 |   "plugins": [
 3 |     "remark-preset-lint-recommended",
 4 |     "remark-preset-lint-consistent",
 5 |     "remark-preset-lint-markdown-style-guide",
 6 |     "remark-frontmatter",
 7 |     "remark-math",
 8 |     ["remark-lint-file-extension", false],
 9 |     ["remark-lint-maximum-line-length", 500],
10 |     ["remark-lint-no-shortcut-reference-link", false],
11 |     ["remark-lint-list-item-indent", "tab-size"],
12 |     ["remark-lint-no-undefined-references", false],
13 |     ["remark-lint-emphasis-marker", false],
14 |     ["remark-lint-fenced-code-flag", false],
15 |     ["remark-lint-no-duplicate-headings", false],
16 |     ["remark-lint-maximum-heading-length", false],
17 |     ["remark-lint-no-multiple-toplevel-headings", false],
18 |     ["remark-lint-no-file-name-irregular-characters", false]
19 |   ]
20 | }
21 | 


--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
 1 | Package: r4ds.exercise.solutions
 2 | Title: Exercise Solutions to "R for Data Science"
 3 | Version: 0.1
 4 | Authors@R: c(person("Jeffrey", "Arnold", , "jeffrey.arnold@gmail.com", c("aut", "cre")))
 5 | Description: Solutions to Wickham and Grolemund, "R for Data Science".
 6 |   This package installs the packages necessary to build that.
 7 | License: file LICENSE
 8 | URL: https://github.com/jrnold/r4ds-exercise-solutions
 9 | Depends:
10 |     R (>= 3.1.0)
11 | Imports:
12 |     babynames,
13 |     datamodelr,
14 |     DiagrammeR,
15 |     fueleconomy,
16 |     gapminder,
17 |     ggbeeswarm,
18 |     ggplot2 (>= 3.3.0),
19 |     ggstance,
20 |     hexbin,
21 |     here,
22 |     lvplot,
23 |     Lahman,
24 |     MASS,
25 |     maps,
26 |     microbenchmark,
27 |     nasaweather,
28 |     nycflights13,
29 |     stringi,
 30 |     tidyr (>= 1.0.0),
31 |     tidyverse,
32 |     viridis
33 | Suggests:
34 |     bookdown (>= 0.7.17),
35 |     devtools,
36 |     fs,
37 |     gh,
38 |     git2r,
39 |     glue,
40 |     hypothesisr,
41 |     jsonlite,
42 |     lintr,
43 |     magrittr,
44 |     optparse,
45 |     rmarkdown (>= 1.10.11),
46 |     rvest,
47 |     spelling,
48 |     styler,
49 |     urltools,
50 |     webshot,
51 |     xml2,
52 |     yaml
53 | Remotes:
54 |     github::bergant/datamodelr,
55 |     github::mdlincoln/hypothesisr,
56 |     github::rstudio/bookdown
57 | RoxygenNote: 6.1.1
58 | 


--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
 1 | FROM rocker/verse:latest
 2 | 
 3 | ENV PROJ_DIR /home/rstudio/r4ds-exercise-solutions
 4 | ENV PANDOC_VERSION 2.5
 5 | ENV PANDOC_FILENAME pandoc-${PANDOC_VERSION}-1-amd64.deb
 6 | ENV APT_KEY_DONT_WARN_ON_DANGEROUS_USAGE 1
 7 | 
 8 | # Install pandoc and nodejs
 9 | RUN apt-get update && apt-get install -y \
10 |   gnupg \
11 |   curl
12 | 
13 | RUN curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash - && \
14 |   apt-get install -y nodejs
15 | 
16 | RUN curl -sL -o ${PANDOC_FILENAME} https://github.com/jgm/pandoc/releases/download/${PANDOC_VERSION}/${PANDOC_FILENAME} && \
17 |   dpkg -i ${PANDOC_FILENAME} && \
18 |   rm ${PANDOC_FILENAME} && \
19 |   rm /usr/local/bin/pandoc /usr/local/bin/pandoc-citeproc
20 | 
21 | # Install dependencies needed to run code and build package
22 | RUN mkdir install
23 | COPY DESCRIPTION install
24 | RUN Rscript -e "devtools::install('install', dependencies=TRUE)"
25 | RUN rm -rf install
26 | 
27 | RUN mkdir ${PROJ_DIR}
28 | WORKDIR ${PROJ_DIR}
29 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 | Attribution 4.0 International
  2 | 
  3 | =======================================================================
  4 | 
  5 | Creative Commons Corporation ("Creative Commons") is not a law firm and
  6 | does not provide legal services or legal advice. Distribution of
  7 | Creative Commons public licenses does not create a lawyer-client or
  8 | other relationship. Creative Commons makes its licenses and related
  9 | information available on an "as-is" basis. Creative Commons gives no
 10 | warranties regarding its licenses, any material licensed under their
 11 | terms and conditions, or any related information. Creative Commons
 12 | disclaims all liability for damages resulting from their use to the
 13 | fullest extent possible.
 14 | 
 15 | Using Creative Commons Public Licenses
 16 | 
 17 | Creative Commons public licenses provide a standard set of terms and
 18 | conditions that creators and other rights holders may use to share
 19 | original works of authorship and other material subject to copyright
 20 | and certain other rights specified in the public license below. The
 21 | following considerations are for informational purposes only, are not
 22 | exhaustive, and do not form part of our licenses.
 23 | 
 24 |      Considerations for licensors: Our public licenses are
 25 |      intended for use by those authorized to give the public
 26 |      permission to use material in ways otherwise restricted by
 27 |      copyright and certain other rights. Our licenses are
 28 |      irrevocable. Licensors should read and understand the terms
 29 |      and conditions of the license they choose before applying it.
 30 |      Licensors should also secure all rights necessary before
 31 |      applying our licenses so that the public can reuse the
 32 |      material as expected. Licensors should clearly mark any
 33 |      material not subject to the license. This includes other CC-
 34 |      licensed material, or material used under an exception or
 35 |      limitation to copyright. More considerations for licensors:
 36 | 	wiki.creativecommons.org/Considerations_for_licensors
 37 | 
 38 |      Considerations for the public: By using one of our public
 39 |      licenses, a licensor grants the public permission to use the
 40 |      licensed material under specified terms and conditions. If
 41 |      the licensor's permission is not necessary for any reason--for
 42 |      example, because of any applicable exception or limitation to
 43 |      copyright--then that use is not regulated by the license. Our
 44 |      licenses grant only permissions under copyright and certain
 45 |      other rights that a licensor has authority to grant. Use of
 46 |      the licensed material may still be restricted for other
 47 |      reasons, including because others have copyright or other
 48 |      rights in the material. A licensor may make special requests,
 49 |      such as asking that all changes be marked or described.
 50 |      Although not required by our licenses, you are encouraged to
 51 |      respect those requests where reasonable. More considerations
 52 |      for the public:
 53 | 	wiki.creativecommons.org/Considerations_for_licensees
 54 | 
 55 | =======================================================================
 56 | 
 57 | Creative Commons Attribution 4.0 International Public License
 58 | 
 59 | By exercising the Licensed Rights (defined below), You accept and agree
 60 | to be bound by the terms and conditions of this Creative Commons
 61 | Attribution 4.0 International Public License ("Public License"). To the
 62 | extent this Public License may be interpreted as a contract, You are
 63 | granted the Licensed Rights in consideration of Your acceptance of
 64 | these terms and conditions, and the Licensor grants You such rights in
 65 | consideration of benefits the Licensor receives from making the
 66 | Licensed Material available under these terms and conditions.
 67 | 
 68 | 
 69 | Section 1 -- Definitions.
 70 | 
 71 |   a. Adapted Material means material subject to Copyright and Similar
 72 |      Rights that is derived from or based upon the Licensed Material
 73 |      and in which the Licensed Material is translated, altered,
 74 |      arranged, transformed, or otherwise modified in a manner requiring
 75 |      permission under the Copyright and Similar Rights held by the
 76 |      Licensor. For purposes of this Public License, where the Licensed
 77 |      Material is a musical work, performance, or sound recording,
 78 |      Adapted Material is always produced where the Licensed Material is
 79 |      synched in timed relation with a moving image.
 80 | 
 81 |   b. Adapter's License means the license You apply to Your Copyright
 82 |      and Similar Rights in Your contributions to Adapted Material in
 83 |      accordance with the terms and conditions of this Public License.
 84 | 
 85 |   c. Copyright and Similar Rights means copyright and/or similar rights
 86 |      closely related to copyright including, without limitation,
 87 |      performance, broadcast, sound recording, and Sui Generis Database
 88 |      Rights, without regard to how the rights are labeled or
 89 |      categorized. For purposes of this Public License, the rights
 90 |      specified in Section 2(b)(1)-(2) are not Copyright and Similar
 91 |      Rights.
 92 | 
 93 |   d. Effective Technological Measures means those measures that, in the
 94 |      absence of proper authority, may not be circumvented under laws
 95 |      fulfilling obligations under Article 11 of the WIPO Copyright
 96 |      Treaty adopted on December 20, 1996, and/or similar international
 97 |      agreements.
 98 | 
 99 |   e. Exceptions and Limitations means fair use, fair dealing, and/or
100 |      any other exception or limitation to Copyright and Similar Rights
101 |      that applies to Your use of the Licensed Material.
102 | 
103 |   f. Licensed Material means the artistic or literary work, database,
104 |      or other material to which the Licensor applied this Public
105 |      License.
106 | 
107 |   g. Licensed Rights means the rights granted to You subject to the
108 |      terms and conditions of this Public License, which are limited to
109 |      all Copyright and Similar Rights that apply to Your use of the
110 |      Licensed Material and that the Licensor has authority to license.
111 | 
112 |   h. Licensor means the individual(s) or entity(ies) granting rights
113 |      under this Public License.
114 | 
115 |   i. Share means to provide material to the public by any means or
116 |      process that requires permission under the Licensed Rights, such
117 |      as reproduction, public display, public performance, distribution,
118 |      dissemination, communication, or importation, and to make material
119 |      available to the public including in ways that members of the
120 |      public may access the material from a place and at a time
121 |      individually chosen by them.
122 | 
123 |   j. Sui Generis Database Rights means rights other than copyright
124 |      resulting from Directive 96/9/EC of the European Parliament and of
125 |      the Council of 11 March 1996 on the legal protection of databases,
126 |      as amended and/or succeeded, as well as other essentially
127 |      equivalent rights anywhere in the world.
128 | 
129 |   k. You means the individual or entity exercising the Licensed Rights
130 |      under this Public License. Your has a corresponding meaning.
131 | 
132 | 
133 | Section 2 -- Scope.
134 | 
135 |   a. License grant.
136 | 
137 |        1. Subject to the terms and conditions of this Public License,
138 |           the Licensor hereby grants You a worldwide, royalty-free,
139 |           non-sublicensable, non-exclusive, irrevocable license to
140 |           exercise the Licensed Rights in the Licensed Material to:
141 | 
142 |             a. reproduce and Share the Licensed Material, in whole or
143 |                in part; and
144 | 
145 |             b. produce, reproduce, and Share Adapted Material.
146 | 
147 |        2. Exceptions and Limitations. For the avoidance of doubt, where
148 |           Exceptions and Limitations apply to Your use, this Public
149 |           License does not apply, and You do not need to comply with
150 |           its terms and conditions.
151 | 
152 |        3. Term. The term of this Public License is specified in Section
153 |           6(a).
154 | 
155 |        4. Media and formats; technical modifications allowed. The
156 |           Licensor authorizes You to exercise the Licensed Rights in
157 |           all media and formats whether now known or hereafter created,
158 |           and to make technical modifications necessary to do so. The
159 |           Licensor waives and/or agrees not to assert any right or
160 |           authority to forbid You from making technical modifications
161 |           necessary to exercise the Licensed Rights, including
162 |           technical modifications necessary to circumvent Effective
163 |           Technological Measures. For purposes of this Public License,
164 |           simply making modifications authorized by this Section 2(a)
165 |           (4) never produces Adapted Material.
166 | 
167 |        5. Downstream recipients.
168 | 
169 |             a. Offer from the Licensor -- Licensed Material. Every
170 |                recipient of the Licensed Material automatically
171 |                receives an offer from the Licensor to exercise the
172 |                Licensed Rights under the terms and conditions of this
173 |                Public License.
174 | 
175 |             b. No downstream restrictions. You may not offer or impose
176 |                any additional or different terms or conditions on, or
177 |                apply any Effective Technological Measures to, the
178 |                Licensed Material if doing so restricts exercise of the
179 |                Licensed Rights by any recipient of the Licensed
180 |                Material.
181 | 
182 |        6. No endorsement. Nothing in this Public License constitutes or
183 |           may be construed as permission to assert or imply that You
184 |           are, or that Your use of the Licensed Material is, connected
185 |           with, or sponsored, endorsed, or granted official status by,
186 |           the Licensor or others designated to receive attribution as
187 |           provided in Section 3(a)(1)(A)(i).
188 | 
189 |   b. Other rights.
190 | 
191 |        1. Moral rights, such as the right of integrity, are not
192 |           licensed under this Public License, nor are publicity,
193 |           privacy, and/or other similar personality rights; however, to
194 |           the extent possible, the Licensor waives and/or agrees not to
195 |           assert any such rights held by the Licensor to the limited
196 |           extent necessary to allow You to exercise the Licensed
197 |           Rights, but not otherwise.
198 | 
199 |        2. Patent and trademark rights are not licensed under this
200 |           Public License.
201 | 
202 |        3. To the extent possible, the Licensor waives any right to
203 |           collect royalties from You for the exercise of the Licensed
204 |           Rights, whether directly or through a collecting society
205 |           under any voluntary or waivable statutory or compulsory
206 |           licensing scheme. In all other cases the Licensor expressly
207 |           reserves any right to collect such royalties.
208 | 
209 | 
210 | Section 3 -- License Conditions.
211 | 
212 | Your exercise of the Licensed Rights is expressly made subject to the
213 | following conditions.
214 | 
215 |   a. Attribution.
216 | 
217 |        1. If You Share the Licensed Material (including in modified
218 |           form), You must:
219 | 
220 |             a. retain the following if it is supplied by the Licensor
221 |                with the Licensed Material:
222 | 
223 |                  i. identification of the creator(s) of the Licensed
224 |                     Material and any others designated to receive
225 |                     attribution, in any reasonable manner requested by
226 |                     the Licensor (including by pseudonym if
227 |                     designated);
228 | 
229 |                 ii. a copyright notice;
230 | 
231 |                iii. a notice that refers to this Public License;
232 | 
233 |                 iv. a notice that refers to the disclaimer of
234 |                     warranties;
235 | 
236 |                  v. a URI or hyperlink to the Licensed Material to the
237 |                     extent reasonably practicable;
238 | 
239 |             b. indicate if You modified the Licensed Material and
240 |                retain an indication of any previous modifications; and
241 | 
242 |             c. indicate the Licensed Material is licensed under this
243 |                Public License, and include the text of, or the URI or
244 |                hyperlink to, this Public License.
245 | 
246 |        2. You may satisfy the conditions in Section 3(a)(1) in any
247 |           reasonable manner based on the medium, means, and context in
248 |           which You Share the Licensed Material. For example, it may be
249 |           reasonable to satisfy the conditions by providing a URI or
250 |           hyperlink to a resource that includes the required
251 |           information.
252 | 
253 |        3. If requested by the Licensor, You must remove any of the
254 |           information required by Section 3(a)(1)(A) to the extent
255 |           reasonably practicable.
256 | 
257 |        4. If You Share Adapted Material You produce, the Adapter's
258 |           License You apply must not prevent recipients of the Adapted
259 |           Material from complying with this Public License.
260 | 
261 | 
262 | Section 4 -- Sui Generis Database Rights.
263 | 
264 | Where the Licensed Rights include Sui Generis Database Rights that
265 | apply to Your use of the Licensed Material:
266 | 
267 |   a. for the avoidance of doubt, Section 2(a)(1) grants You the right
268 |      to extract, reuse, reproduce, and Share all or a substantial
269 |      portion of the contents of the database;
270 | 
271 |   b. if You include all or a substantial portion of the database
272 |      contents in a database in which You have Sui Generis Database
273 |      Rights, then the database in which You have Sui Generis Database
274 |      Rights (but not its individual contents) is Adapted Material; and
275 | 
276 |   c. You must comply with the conditions in Section 3(a) if You Share
277 |      all or a substantial portion of the contents of the database.
278 | 
279 | For the avoidance of doubt, this Section 4 supplements and does not
280 | replace Your obligations under this Public License where the Licensed
281 | Rights include other Copyright and Similar Rights.
282 | 
283 | 
284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability.
285 | 
286 |   a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
287 |      EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
288 |      AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
289 |      ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
290 |      IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
291 |      WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
292 |      PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
293 |      ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
294 |      KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
295 |      ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
296 | 
297 |   b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
298 |      TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
299 |      NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
300 |      INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
301 |      COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
302 |      USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
303 |      ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
304 |      DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
305 |      IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
306 | 
307 |   c. The disclaimer of warranties and limitation of liability provided
308 |      above shall be interpreted in a manner that, to the extent
309 |      possible, most closely approximates an absolute disclaimer and
310 |      waiver of all liability.
311 | 
312 | 
313 | Section 6 -- Term and Termination.
314 | 
315 |   a. This Public License applies for the term of the Copyright and
316 |      Similar Rights licensed here. However, if You fail to comply with
317 |      this Public License, then Your rights under this Public License
318 |      terminate automatically.
319 | 
320 |   b. Where Your right to use the Licensed Material has terminated under
321 |      Section 6(a), it reinstates:
322 | 
323 |        1. automatically as of the date the violation is cured, provided
324 |           it is cured within 30 days of Your discovery of the
325 |           violation; or
326 | 
327 |        2. upon express reinstatement by the Licensor.
328 | 
329 |      For the avoidance of doubt, this Section 6(b) does not affect any
330 |      right the Licensor may have to seek remedies for Your violations
331 |      of this Public License.
332 | 
333 |   c. For the avoidance of doubt, the Licensor may also offer the
334 |      Licensed Material under separate terms or conditions or stop
335 |      distributing the Licensed Material at any time; however, doing so
336 |      will not terminate this Public License.
337 | 
338 |   d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
339 |      License.
340 | 
341 | 
342 | Section 7 -- Other Terms and Conditions.
343 | 
344 |   a. The Licensor shall not be bound by any additional or different
345 |      terms or conditions communicated by You unless expressly agreed.
346 | 
347 |   b. Any arrangements, understandings, or agreements regarding the
348 |      Licensed Material not stated herein are separate from and
349 |      independent of the terms and conditions of this Public License.
350 | 
351 | 
352 | Section 8 -- Interpretation.
353 | 
354 |   a. For the avoidance of doubt, this Public License does not, and
355 |      shall not be interpreted to, reduce, limit, restrict, or impose
356 |      conditions on any use of the Licensed Material that could lawfully
357 |      be made without permission under this Public License.
358 | 
359 |   b. To the extent possible, if any provision of this Public License is
360 |      deemed unenforceable, it shall be automatically reformed to the
361 |      minimum extent necessary to make it enforceable. If the provision
362 |      cannot be reformed, it shall be severed from this Public License
363 |      without affecting the enforceability of the remaining terms and
364 |      conditions.
365 | 
366 |   c. No term or condition of this Public License will be waived and no
367 |      failure to comply consented to unless expressly agreed to by the
368 |      Licensor.
369 | 
370 |   d. Nothing in this Public License constitutes or may be interpreted
371 |      as a limitation upon, or waiver of, any privileges and immunities
372 |      that apply to the Licensor or You, including from the legal
373 |      processes of any jurisdiction or authority.
374 | 
375 | 
376 | =======================================================================
377 | 
378 | Creative Commons is not a party to its public
379 | licenses. Notwithstanding, Creative Commons may elect to apply one of
380 | its public licenses to material it publishes and in those instances
381 | will be considered the “Licensor.” The text of the Creative Commons
382 | public licenses is dedicated to the public domain under the CC0 Public
383 | Domain Dedication. Except for the limited purpose of indicating that
384 | material is shared under a Creative Commons public license or as
385 | otherwise permitted by the Creative Commons policies published at
386 | creativecommons.org/policies, Creative Commons does not authorize the
387 | use of the trademark "Creative Commons" or any other trademark or logo
388 | of Creative Commons without its prior written consent including,
389 | without limitation, in connection with any unauthorized modifications
390 | to any of its public licenses or any other arrangements,
391 | understandings, or agreements concerning use of licensed material. For
392 | the avoidance of doubt, this paragraph does not form part of the
393 | public licenses.
394 | 
395 | Creative Commons may be contacted at creativecommons.org.
396 | 


--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
1 | # Generated by roxygen2: do not edit by hand
2 | 
3 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | [![Lifecycle: superseded](https://img.shields.io/badge/lifecycle-superseded-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html#superseded)
 2 | 
 3 | # Exercise Solutions to R for Data Science
 4 | 
 5 | These are solutions to the **1st edition** of R for Data Science. The solutions to the 2nd edition of [R for Data Science](https://r4ds.hadley.nz/) are available at [R for Data Science (2e) - Solutions to Exercises](https://mine-cetinkaya-rundel.github.io/r4ds-solutions/).
 6 | 
 7 | This repository contains the code and text behind the [Solutions for R for Data Science](https://jrnold.github.io/r4ds-exercise-solutions/), which, as its name suggests, has solutions to the exercises in [R for Data Science](https://r4ds.had.co.nz/) by Garrett Grolemund and Hadley Wickham.
 8 | 
 9 | The R packages used in this book can be installed with:
10 | ```r
11 | devtools::install_github("jrnold/r4ds-exercise-solutions")
12 | ```
13 | 
14 | ## Contributing
15 | 
16 | Work on this repo has effectively stopped now that the 2nd edition of R for Data Science has been published. Please direct your contributions to [R for Data Science (2e) - Solutions to Exercises](https://mine-cetinkaya-rundel.github.io/r4ds-solutions/).
17 | 
18 | ## Build
19 | 
20 | The site is built using the [bookdown](https://bookdown.org/yihui/bookdown/) package and pandoc.
21 | 
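As a rough sketch, the build amounts to running the steps listed in `bin/build.sh` from the repository root. The `build_site` function and the `RSCRIPT` variable below are illustrative names that do not exist in the repository; `RSCRIPT` is only there so the steps can be dry-run without R installed.

```sh
#!/bin/sh
# Sketch of the build pipeline, mirroring the four steps in bin/build.sh.
# RSCRIPT is a hypothetical override (not part of the repo); it defaults to Rscript.
RSCRIPT="${RSCRIPT:-Rscript}"

build_site() {
  # Render the book, then post-process the generated HTML.
  # Each step runs only if the previous one succeeded.
  "$RSCRIPT" bin/render.R --force --quiet &&
    "$RSCRIPT" bin/create-sitemap.R &&
    "$RSCRIPT" bin/create-toc.R &&
    "$RSCRIPT" bin/add-r4ds-links.R
}

# Run from the repository root:
#   build_site
# Dry run that only prints each step's arguments:
#   RSCRIPT=echo build_site
```

The rendered site is written to the `output_dir` set in `_bookdown.yml` (`_build`).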


--------------------------------------------------------------------------------
/WORDLIST:
--------------------------------------------------------------------------------
  1 | aaa
  2 | abaca
  3 | abba
  4 | abc
  5 | abcsgasgddsadgsdgcba
  6 | abline
  7 | accba
  8 | ae
  9 | Ames
 10 | ATL
 11 | BDL
 12 | benchmarking
 13 | Berkson's
 14 | bimodal
 15 | bimodality
 16 | binwidth
 17 | BNA
 18 | bookdown
 19 | BOS
 20 | BQN
 21 | bts
 22 | burqa
 23 | cancelled
 24 | carajoos
 25 | chardet
 26 | Chua
 27 | Chuan
 28 | cinq
 29 | CLDR
 30 | ClevelandMcGillMcGill
 31 | CN
 32 | colonophon
 33 | colour
 34 | Computerphile
 35 | Consolas
 36 | coord
 37 | counterintuitive
 38 | covariation
 39 | Covariation
 40 | csv
 41 | cyclicality
 42 | datamodelr
 43 | datasets
 44 | datetimes
 45 | De
 46 | Deja
 47 | denormal
 48 | derecho
 49 | derechos
 50 | dest
 51 | DiagrammeR
 52 | dialr
 53 | disaggregate
 54 | DL
 55 | DoaneSeward
 56 | DOCX
 57 | dongzhuoer
 58 | DOTLESS
 59 | dplyr
 60 | ds
 61 | DS
 62 | DSSolutions
 63 | duplicative
 64 | Durations
 65 | dx
 66 | dy
 67 | eed
 68 | EPUB
 69 | EUC
 70 | EV
 71 | EWR
 72 | ExpressJet
 73 | Fabien
 74 | facetted
 75 | facetting
 76 | FiraCode
 77 | FiveThirtyEight
 78 | fizzbuzz
 79 | FizzBuzz
 80 | forcats
 81 | frac
 82 | gapminder
 83 | Gapminder
 84 | GBK
 85 | GeeksforGeeks
 86 | geoms
 87 | ggbeeswarm
 88 | ggplot
 89 | ggrepel
 90 | ggstance
 91 | github
 92 | Gliffy
 93 | glyphs
 94 | Graffle
 95 | Graphviz
 96 | Grolemund
 97 | GSP
 98 | gss
 99 | gvwilson
100 | hadley
101 | Hadley
102 | hc
103 | Hecker
104 | Hecker
105 | HeerAgrawala
106 | Heike
107 | heterogeneous
108 | heteroskedasticity
109 | Hintze
110 | HintzeNelson
111 | HNL
112 | Hofman
113 | Hofmann
114 | HofmannWickhamKafadar
115 | honours
116 | HOU
117 | hpdiamonds
118 | htm
119 | http
120 | hypothes
121 | ı
122 | IAH
123 | iconv
124 | IEC
125 | infty
126 | Inkscape
127 | interpretable
128 | io
129 | ipsum
130 | ise
131 | ized
132 | JamesCuster
133 | JIS
134 | jitter
135 | JL
136 | jrnold
137 | Kafadar
138 | Karlton
139 | kididdles
140 | KleinGeard
141 | Krywinski
142 | Krywinsky
143 | kunststube
144 | Lahman
145 | lceil
146 | Leste
147 | LGA
148 | lifecycle
149 | lintr
150 | lm
151 | loess
152 | lorem
153 | lubridate
154 | lvplot
155 | MarkDown
156 | mathrm
157 | mathsf
158 | md
159 | Menlo
160 | merely
161 | microbenchmark
162 | modelr
163 | MQ
164 | mvhone
165 | nd
166 | nicercode
167 | normals
168 | NoteBook
169 | NSE
170 | nycflights
171 | nz
172 | nzxwang
173 | O'Reilly
174 | oe
175 | OmniGraffle
176 | openflights
177 | ou
178 | pandoc
179 | Ph
180 | PhD
181 | PHI
182 | PHL
183 | pointrange
184 | postfixed
185 | PowerPoint
186 | pre
187 | preallocate
188 | preprocess
189 | preprocessed
190 | programmatically
191 | PSE
192 | purrr
193 | purrr's
194 | quartile
195 | quartiles
196 | Quora
197 | radix
198 | rceil
199 | readr
200 | regexcrossword
201 | reimplementing
202 | representable
203 | rescale
204 | rmarkdown
205 | rmarkdown
206 | Rmd
207 | rstudio
208 | RStudio
209 | rstudiotips
210 | Sanglard
211 | sd
212 | SelectFields
213 | Shalloway
214 | SJU
215 | StackOverflow
216 | stringi
217 | stringr
218 | STT
219 | suboptimal
220 | summarise
221 | th
222 | tibble
223 | tibbles
224 | Tibbles
225 | tidyr
226 | tidyverse
227 | Timezones
228 | transtats
229 | un
230 | undercarat
231 | underpredicting
232 | unicode
233 | unintuitively
234 | unpivots
235 | vectorized
236 | vectorizes
237 | viridis
238 | visualisation
239 | visualising
240 | Vu
241 | VVS
242 | WD
243 | Wickham
244 | wikipedia
245 | WordNet
246 | Wut
247 | www
248 | yaml
249 | YAML
250 | yyyy
251 | YYYY
252 | 


--------------------------------------------------------------------------------
/_bookdown.yml:
--------------------------------------------------------------------------------
 1 | output_dir: "_build"
 2 | new_session: yes
 3 | 
 4 | rmd_files:
 5 |   - "index.Rmd"
 6 |   - "intro.Rmd"
 7 | 
 8 |   - "explore.Rmd"
 9 |   - "visualize.Rmd"
10 |   - "workflow-basics.Rmd"
11 |   - "transform.Rmd"
12 |   - "workflow-scripts.Rmd"
13 |   - "EDA.Rmd"
14 |   - "workflow-projects.Rmd"
15 | 
16 |   - "wrangle.Rmd"
17 |   - "tibble.Rmd"
18 |   - "import.Rmd"
19 |   - "tidy.Rmd"
20 |   - "relational-data.Rmd"
21 |   - "strings.Rmd"
22 |   - "factors.Rmd"
23 |   - "datetimes.Rmd"
24 | 
25 |   - "program.Rmd"
26 |   - "pipes.Rmd"
27 |   - "functions.Rmd"
28 |   - "vectors.Rmd"
29 |   - "iteration.Rmd"
30 | 
31 |   - "model.Rmd"
32 |   - "model-basics.Rmd"
33 |   - "model-building.Rmd"
34 |   - "many-models.Rmd"
35 | 
36 |   - "communicate.Rmd"
37 |   - "rmarkdown.Rmd"
38 |   - "graphics-for-communication.Rmd"
39 |   - "rmarkdown-formats.Rmd"
40 |   - "rmarkdown-workflow.Rmd"
41 | 
42 |   - "appendixes.Rmd"
43 | 
44 | before_chapter_script: "_common.R"
45 | book_filename: "r4ds-solutions"
46 | delete_merged_file: true
47 | edit: "https://github.com/jrnold/r4ds-exercise-solutions/edit/master/%s"
48 | history: "https://github.com/jrnold/r4ds-exercise-solutions/commits/master/%s"
49 | 


--------------------------------------------------------------------------------
/_common.R:
--------------------------------------------------------------------------------
 1 | set.seed(1014)
 2 | options(digits = 3)
 3 | 
 4 | knitr::opts_chunk$set(
 5 |   comment = "#>",
 6 |   collapse = TRUE,
 7 |   cache = TRUE,
 8 |   autodep = TRUE,
 9 |   # need to save cache
10 |   cache.extra = knitr::rand_seed,
11 |   out.width = "70%",
12 |   fig.align = "center",
13 |   fig.width = 6,
14 |   fig.asp = 0.618, # 1 / phi
15 |   fig.show = "hold",
16 |   # styler
17 |   tidy = 'styler'
18 | )
19 | 
20 | options(dplyr.print_min = 6, dplyr.print_max = 6)
21 | 
22 | is_html <- identical(knitr::opts_knit$get("rmarkdown.pandoc.to"), "html")
23 | 
24 | # Info and useful links
25 | SOURCE_URL <- stringr::str_c("https:/", "github.com", "jrnold",
26 |   "r4ds-exercise-solutions",
27 |   sep = "/"
28 | )
29 | PUB_URL <- stringr::str_c("https:/", "jrnold.github.io",
30 |   "r4ds-exercise-solutions",
31 |   sep = "/"
32 | )
33 | 
34 | R4DS_URL <- "https://r4ds.had.co.nz"
35 | 
36 | r4ds_url <- function(...) {
37 |   stringr::str_c(R4DS_URL, ..., sep = "/")
38 | }
39 | 
40 | comma_int <- function(x) {
41 |   prettyNum(x, big.interval = 3, big.mark = ",")
42 | }
43 | 
44 | no_exercises <- function() {
45 |   tags <- htmltools::tags
46 |   tags$div(
47 |     class = 'alert alert-warning hints-alert',
48 |     tags$div(
49 |       class = "hints-icon",
50 |       tags$i(
51 |         class = "fa fa-exclamation-circle"
52 |       )
53 |     ),
54 |     tags$div(
55 |       class = "hints-container",
56 |       "No exercises"
57 |     )
58 |   )
59 | }
60 | 


--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
  1 | github_repo: "jrnold/r4ds-exercise-solutions"
  2 | deploy_url: "https://jrnold.github.io/r4ds-exercise-solutions/"
  3 | r4ds:
  4 |   github_repo: "hadley/r4ds"
  5 |   url: "https://r4ds.had.co.nz"
  6 | chapters:
  7 |   - {"rmd": "intro.Rmd", "html": "introduction.html"}
  8 |   - {"rmd": "explore-intro.Rmd", "html": "explore-intro.html"}
  9 |   - {"rmd": "visualize.Rmd", "html": "data-visualisation.html"}
 10 |   - {"rmd": "workflow-basics.Rmd", "html": "workflow-basics.html"}
 11 |   - {"rmd": "transform.Rmd", "html": "transform.html"}
 12 |   - {"rmd": "workflow-scripts.Rmd", "html": "workflow-scripts.html"}
 13 |   - {"rmd": "EDA.Rmd", "html": "exploratory-data-analysis.html"}
 14 |   - {"rmd": "workflow-projects.Rmd", "html": "workflow-projects.html"}
 15 |   - {"rmd": "wrangle-intro.Rmd", "html": "wrangle-intro.html"}
 16 |   - {"rmd": "tibble.Rmd", "html": "tibbles.html"}
 17 |   - {"rmd": "import.Rmd", "html": "data-import.html"}
 18 |   - {"rmd": "tidy.Rmd", "html": "tidy-data.html"}
 19 |   - {"rmd": "relational-data.Rmd", "html": "relational-data.html"}
 20 |   - {"rmd": "strings.Rmd", "html": "strings.html"}
 21 |   - {"rmd": "factors.Rmd", "html": "factors.html"}
 22 |   - {"rmd": "datetimes.Rmd", "html": "dates-and-times.html"}
 23 |   - {"rmd": "program-intro.Rmd", "html": "program-intro.html"}
 24 |   - {"rmd": "pipes.Rmd", "html": "pipes.html"}
 25 |   - {"rmd": "functions.Rmd", "html": "functions.html"}
 26 |   - {"rmd": "vectors.Rmd", "html": "vectors.html"}
 27 |   - {"rmd": "iteration.Rmd", "html": "iteration.html"}
 28 |   - {"rmd": "model-intro.Rmd", "html": "model-intro.html"}
 29 |   - {"rmd": "model-basics.Rmd", "html": "model-basics.html"}
 30 |   - {"rmd": "model-building.Rmd", "html": "model-building.html"}
 31 |   - {"rmd": "many-models.Rmd", "html": "many-models.html"}
 32 |   - {"rmd": "communicate-intro.Rmd", "html": "communicate-intro.html"}
 33 |   - {"rmd": "rmarkdown.Rmd", "html": "r-markdown.html"}
 34 |   - {"rmd": "graphics-for-communication.Rmd", "html": "graphics-for-communication.html"}
 35 |   - {"rmd": "rmarkdown-formats.Rmd", "html": "r-markdown-formats.html"}
 36 |   - {"rmd": "rmarkdown-workflow.Rmd", "html": "r-markdown-workflow.html"}
 37 | exercise_sections:
 38 |   # Mapping from subsections to the Exercise subsubsection in R4DS
 39 |   3.1: 1
 40 |   3.2: 4
 41 |   3.3: 1
 42 |   3.5: 1
 43 |   3.6: 1
 44 |   3.8: 1
 45 |   3.9: 1
 46 |   5.2: 4
 47 |   5.3: 1
 48 |   5.4: 1
 49 |   5.5: 2
 50 |   5.6: 7
 51 |   5.7: 1
 52 |   7.3: 4
 53 |   7.4: 1
 54 |   7.5.1: 1
 55 |   7.5.2: 1
 56 |   7.5.3: 1
 57 |   11.2: 2
 58 |   11.3: 5
 59 |   11.4: 1
 60 |   12.2: 1
 61 |   12.3: 3
 62 |   12.4: 3
 63 |   12.5: 1
 64 |   12.6: 1
 65 |   13.2: 1
 66 |   13.3: 1
 67 |   13.4: 6
 68 |   13.5: 1
 69 |   14.2: 5
 70 |   14.3.1: 1
 71 |   14.3.2: 1
 72 |   14.3.3: 1
 73 |   14.3.4: 1
 74 |   14.3.5: 1
 75 |   14.4.3: 1
 76 |   14.4.4: 1
 77 |   14.4.5: 1
 78 |   14.5: 1
 79 |   14.7: 1
 80 |   15.3: 1
 81 |   15.4: 1
 82 |   15.5: 1
 83 |   16.2: 4
 84 |   16.3: 4
 85 |   16.4: 5
 86 |   19.2: 1
 87 |   19.3: 1
 88 |   19.4: 4
 89 |   19.5: 5
 90 |   20.3: 5
 91 |   20.4: 6
 92 |   20.5: 4
 93 |   20.7: 4
 94 |   21.2: 1
 95 |   21.3: 5
 96 |   21.4: 1
 97 |   21.5: 3
 98 |   21.9: 3
 99 |   23.2: 1
100 |   23.3: 3
101 |   23.4: 5
102 |   24.2: 3
103 |   24.3: 5
104 |   25.2: 5
105 |   25.4: 5
106 |   25.5: 3
107 |   27.2: 1
108 |   27.3: 1
109 |   27.4: 7
110 |   28.2: 1
111 |   28.3: 1
112 |   28.4: 4
113 | 


--------------------------------------------------------------------------------
/_output.yaml:
--------------------------------------------------------------------------------
 1 | bookdown::gitbook:
 2 |   config:
 3 |     toc:
 4 |       collapse: section
 5 |       before: |
 6 |         <li><strong><a href="./">R for Data Science:<br>Exercise Solutions</a></strong></li>
 7 | 
 8 |       after: |
 9 |         <li><a href="https://github.com/rstudio/bookdown">Proudly published with bookdown</a></li>
10 | 
11 |     edit:
12 |       link: https://github.com/jrnold/r4ds-exercise-solutions/edit/master/%s
13 |       text: "Edit"
14 |     sharing:
15 |       facebook: no
16 |       twitter: yes
17 |       google: no
18 |       linkedin: yes
19 |       weibo: no
20 |       instapaper: no
21 |       vk: no
22 |   css:
23 |   - "includes/r4ds.css"
24 |   - "includes/r4ds-solutions.css"
25 |   - "includes/ort.css"
26 |   md_extensions: "+native_divs+native_spans+escaped_line_breaks+smart"
27 |   download: no
28 |   includes:
29 |     in_header:
30 |     - "includes/hypothesis.html"
31 | 


--------------------------------------------------------------------------------
/appendixes.Rmd:
--------------------------------------------------------------------------------
1 | # (PART) Appendixes {-}
2 | 
3 | # References {-}
4 | 


--------------------------------------------------------------------------------
/bin/add-r4ds-links.R:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env Rscript
 2 | #' For all sections, add anchor links and links to the relevant R4DS section
 3 | suppressPackageStartupMessages({
 4 |   library("rvest")
 5 |   library("fs")
 6 |   library("rlang")
 7 |   library("jsonlite")
 8 |   library("purrr")
 9 |   library("glue")
10 | })
11 | 
12 | # Add link to r4ds after each section
13 | add_r4ds_link_section <- function(x, r4ds_url) {
14 |   # first child should be the heading
15 |   header <- html_node(x, "h1,h2,h3,h4")
16 |   href <- httr::modify_url(r4ds_url, frag = xml_attr(x, "id"))
17 |   a <- xml_add_child(header, "a", href = href,
18 |                      class = "r4ds-section-link section-link",
19 |                      `aria-hidden` = "true")
20 |   icon <- xml_add_child(a, "i",
21 |                         class = "fa fa-external-link",
22 |                         `aria-hidden` = "true")
23 | }
24 | 
25 | add_r4ds_links <- function(doc, path, r4ds_url) {
26 |   # r4ds-section used to indicate R4DS
27 |   r4ds_sections <- html_nodes(doc, ".r4ds-section")
28 |   r4ds_path_url <- httr::modify_url(r4ds_url,
29 |                                     path = as.character(path))
30 |   walk(r4ds_sections, add_r4ds_link_section, r4ds_url = r4ds_path_url)
31 | }
32 | 
33 | # Add an anchor link to the heading of a section
34 | add_anchor_section <- function(x) {
35 |   # first child should be the heading
36 |   header <- html_node(x, "h1,h2,h3,h4")
37 |   href <- paste0("#", xml_attr(x, "id"))
38 |   a <- xml_add_child(header, "a", href = href, class = "anchor section-link",
39 |                      `aria-hidden` = "true", .where = 0)
40 |   icon <- xml_add_child(a, "i",
41 |                         class = "fa fa-link",
42 |                         `aria-hidden` = "true")
43 | }
44 | 
45 | # Add anchor links to every section in the document
46 | add_anchors <- function(doc) {
47 |   walk(html_nodes(doc, "div.section"), add_anchor_section)
48 | }
49 | 
50 | handle_page <- function(path, r4ds_url, output_dir) {
51 |   # read HTML for a file
52 |   filename <- fs::path(output_dir, path)
53 |   doc <- read_html(filename)
54 |   add_r4ds_links(doc, path = path, r4ds_url = r4ds_url)
55 |   add_anchors(doc)
56 |   cat(glue("Adding links to {filename}"), "\n\n")
57 |   write_html(doc, filename, options = c("as_html", "format"))
58 | }
59 | 
60 | main <- function() {
61 |   # read table of contents to get list of HTML files
62 |   toc_filename <- "toc.json"
63 |   output_dir <- bookdown:::load_config()[["output_dir"]]
64 |   toc <- read_json(fs::path(output_dir, toc_filename))
65 | 
66 |   # Get URL of r4ds
67 |   config <- yaml::read_yaml(fs::path("_config.yml"))
68 |   r4ds_url <- config$r4ds$url
69 |   walk(unlist(map(toc, "path")), handle_page, r4ds_url, output_dir)
70 | }
71 | 
72 | main()
73 | 


--------------------------------------------------------------------------------
/bin/build.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/sh
 2 | # build website
 3 | # exit when any command fails
 4 | set -e
 5 | set +x
 6 | 
 7 | Rscript bin/render.R --force --quiet
 8 | Rscript bin/create-sitemap.R
 9 | Rscript bin/create-toc.R
10 | Rscript bin/add-r4ds-links.R
11 | 


--------------------------------------------------------------------------------
/bin/check-r4ds-sections.R:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env Rscript
 2 | #' Check that sections in Solutions match those in R4DS
 3 | #'
 4 | #' For all sections in the solutions, check that
 5 | #'
 6 | #' -   It appears in R4DS
 7 | #' -   It has the same title as in R4DS
 8 | #' -   It has the same heading level as R4DS
 9 | #'
10 | #' It does not check whether all sections in R4DS appear in the Solutions
11 | #'
12 | suppressPackageStartupMessages({
13 |   library("xml2")
14 |   library("rvest")
15 |   library("purrr")
16 |   library("tibble")
17 |   library("dplyr")
18 |   library("stringr")
19 |   library("glue")
20 | })
21 | 
22 | handle_section <- function(x) {
23 |   header <- html_node(x, "h1,h2,h3,h4,h5")
24 |   tibble(id = as.character(html_attr(x, "id")),
25 |          title = str_replace(str_trim(html_text(header)),
26 |                                       "^\\d+(\\.\\d+)*\\s+", ""),
27 |          tag = xml_name(header))
28 | }
29 | 
30 | handle_html_file <- function(path, output_dir, r4ds_url) {
31 |   doc <- read_html(fs::path(output_dir, path))
32 |   solution_sections <- map_dfr(html_nodes(doc, "div.section.r4ds-section"),
33 |                                handle_section)
34 |   if (nrow(solution_sections)) {
35 |     solution_sections$path <- path
36 | 
37 |     r4ds_url <- httr::modify_url(r4ds_url, path = path)
38 |     r4ds_doc <- read_html(r4ds_url)
39 |     r4ds_sections <- map_dfr(html_nodes(r4ds_doc, "div.section"),
40 |                              handle_section)
41 | 
42 |     left_join(solution_sections, r4ds_sections,
43 |               by = "id", suffix = c("_solutions", "_r4ds"))
44 |   }
45 | }
46 | 
47 | main <- function() {
48 |   # Find any sections in solutions not in R4DS
49 |   output_dir <- bookdown:::load_config()$output_dir
50 |   filenames <- map(jsonlite::read_json(fs::path(output_dir, "toc.json")),
51 |                    "path") %>%
52 |     map_chr(1) %>%
53 |     keep(~ !.x %in% c("references.html"))
54 | 
55 |   config <- yaml::read_yaml("_config.yml")
56 | 
57 |   sections <- map_dfr(filenames, handle_html_file, output_dir = output_dir,
58 |                       r4ds_url = config$r4ds$url)
59 |   status <- 0
60 | 
61 |   missing_ids <- filter(sections, is.na(title_r4ds)) %>%
62 |     select(path, id)
63 |   if (nrow(missing_ids) > 0) {
64 |     cat("These sections have IDs not in R4DS:\n", file = stderr())
65 |     cat(glue_data(missing_ids, "{path}#{id}"), sep = "\n", file = stderr())
66 |     status <- 1
67 |   }
68 | 
69 |   different_titles <- filter(sections, title_solutions != title_r4ds) %>%
70 |     select(path, id, title_solutions, title_r4ds)
71 |   if (nrow(different_titles) > 0) {
72 |     cat("These sections have different titles than R4DS:\n", file = stderr())
73 |     cat(glue_data(different_titles, "{path}#{id}: '{title_solutions}' (solutions) '{title_r4ds}' (R4DS)"),
74 |         sep = "\n", file = stderr())
75 |     status <- 1
76 |   }
77 | 
78 |   different_headings <- filter(sections, tag_solutions != tag_r4ds) %>%
79 |     select(path, id, tag_solutions, tag_r4ds)
80 |   if (nrow(different_headings) > 0) {
81 |     cat("These sections have different heading levels than R4DS:\n", file = stderr())
82 |     cat(glue_data(different_headings, "{path}#{id}: '{tag_solutions}' (solutions) '{tag_r4ds}' (R4DS)"),
83 |         sep = "\n", file = stderr())
84 |     status <- 1
85 |   }
86 | 
87 |   quit(save = "no", status = status)
88 | }
89 | 
90 | main()
91 | 


--------------------------------------------------------------------------------
/bin/check-spelling.R:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env Rscript
 2 | # Spell check
 3 | suppressPackageStartupMessages({
 4 |   library("spelling")
 5 |   library("magrittr")
 6 | })
 7 | wordlist_file <- "WORDLIST"
 8 | 
 9 | wordlist <- stringr::str_trim(readLines(wordlist_file))
10 | 
11 | files <- c(
12 |   list.files(here::here("."), pattern = "\\.(Rnw|Rmd)$", full.names = TRUE),
13 |   list.files(here::here("rmarkdown"),
14 |     pattern = "\\.(Rmd)$", full.names = TRUE
15 |   ),
16 |   here::here("README.md")
17 | ) %>%
18 |   normalizePath() %>%
19 |   unique()
20 | 
21 | misspelled_words <- spell_check_files(sort(files), ignore = wordlist)
22 | any_misspelled <- nrow(misspelled_words) > 0
23 | 
24 | if (any_misspelled) {
25 |   sink(file = stderr())
26 |   print(misspelled_words)
27 |   sink()
28 |   quit(save = "no", status = 1)
29 | }
30 | 


--------------------------------------------------------------------------------
/bin/create-sitemap.R:
--------------------------------------------------------------------------------
 1 | suppressPackageStartupMessages({
 2 |   library("xml2")
 3 |   library("optparse")
 4 |   library("glue")
 5 | })
 6 | 
 7 | sitemap_url_info <- function(path, base_url, priority = 0.5,
 8 |                         changefreq = "daily") {
 9 |   lastmod <- format(file.info(path)$mtime, format = "%Y-%m-%d",
10 |                     tz = "UTC")
11 |   loc <- paste0(stringr::str_replace(base_url, "/$", ""), "/", basename(path))
12 |   list(lastmod = lastmod, loc = loc, priority = priority,
13 |        changefreq = changefreq)
14 | }
15 | 
16 | create_sitemap <- function(output_dir, base_url,
17 |                            pattern = "^.*\\.html$",
18 |                            excludes = character(),
19 |                            changefreq = "daily", priority = 0.5) {
20 |   SITEMAP_XMLNS <- "http://www.sitemaps.org/schemas/sitemap/0.9"
21 |   filenames <- dir(output_dir, pattern = pattern)
22 |   filenames <- base::setdiff(filenames, excludes)
23 |   filenames <- file.path(output_dir, filenames)
24 |   sitemap <- xml_new_root("urlset", xmlns = SITEMAP_XMLNS)
25 |   for (file in filenames) {
26 |     info <- sitemap_url_info(file, base_url = base_url, priority = priority, changefreq = changefreq)
27 |     url <- xml_add_child(sitemap, "url")
28 |     xml_add_child(url, "loc", info$loc)
29 |     xml_add_child(url, "lastmod", info$lastmod)
30 |     xml_add_child(url, "priority", info$priority)
31 |     xml_add_child(url, "changefreq", info$changefreq)
32 |   }
33 |   sitemap_loc <- file.path(output_dir, "sitemap.xml")
34 |   cat(glue("Writing to {sitemap_loc}\n"))
35 |   write_xml(sitemap, sitemap_loc)
36 | }
37 | 
38 | main <- function() {
39 |   output_dir <- bookdown:::load_config()$output_dir
40 |   config <- yaml::read_yaml(here::here("_config.yml"))
41 |   create_sitemap(output_dir, config$deploy_url)
42 | }
43 | 
44 | main()
45 | 


--------------------------------------------------------------------------------
/bin/create-toc.R:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env Rscript
  2 | #' Write a table of contents
  3 | #'
  4 | #' This writes out a table of contents for R4DS Solutions to a JSON file.
  5 | #' This is useful for quickly navigating or looking up exercises.
  6 | #'
  7 | #'
  8 | suppressPackageStartupMessages({
  9 |   library("rvest")
 10 |   library("purrr")
 11 |   library("stringr")
 12 |   library("jsonlite")
 13 |   library("tibble")
 14 |   library("fs")
 15 |   library("dplyr")
 16 |   library("glue")
 17 |   library("optparse")
 18 | })
 19 | 
 20 | get_section_title <- function(x) {
 21 |   is_section_number <- function(x) {
 22 |     (xml_type(x) == "element") &&
 23 |       xml_name(x) == "span" &&
 24 |       html_has_class(x, "header-section-number")
 25 |   }
 26 |   header <- html_children(x)[[1]]
 27 |   title <- discard(xml_contents(header), is_section_number) %>%
 28 |     map_chr(html_text) %>%
 29 |     str_c(collapse = "") %>%
 30 |     str_trim()
 31 |   number <- html_text(html_node(header, "span.header-section-number"))
 32 |   list(title = title, number = number)
 33 | }
 34 | 
 35 | #' Return the classes of an HTML element
 36 | html_classes <- function(x) {
 37 |   klass <- html_attr(x, "class")
 38 |   if (!length(klass) || is.na(klass)) {
 39 |     character()
 40 |   } else {
 41 |     # Space separated - includes space, tab, and newlines
 42 |     # https://www.w3.org/TR/2011/WD-html5-20110525/elements.html#classes
 43 |     unique(str_split(str_trim(klass), "\\s")[[1]])
 44 |   }
 45 | }
 46 | 
 47 | #' Check whether an HTML element has a class
 48 | html_has_class <- function(x, kls) kls %in% html_classes(x)
 49 | 
 50 | # Get section level number
 51 | get_level <- function(x) {
 52 |   lvl <- str_extract(str_subset(html_classes(x), "^level\\d+$"),
 53 |                                  "\\d+")
 54 |   as.integer(lvl)
 55 | }
 56 | 
 57 | #' parse each section
 58 | process_section <- function(x, path = "/") {
 59 |   lvl <- get_level(x)
 60 |   current <- rlang::list2(id = html_attr(x, "id"),
 61 |                           !!!get_section_title(x),
 62 |                           path = path)
 63 |   # find next level of nodes
 64 |   sections <- map(html_nodes(x, glue("div.section.level{lvl + 1}")),
 65 |                              process_section, path = path)
 66 |   names(sections) <- map_chr(sections, "id")
 67 |   current[["sections"]] <- sections
 68 |   current
 69 | }
 70 | 
 71 | process_page <- function(path, output_dir) {
 72 |   doc <- read_html(fs::path(output_dir, path))
 73 |   # find top level heading
 74 |   process_section(html_node(doc, "div.section.level1"), path = path)
 75 | }
 76 | 
 77 | process_chapter <- function(x, output_dir) {
 78 |   a <- html_node(x, "a")
 79 |   data_level <- html_attr(x, "data-level")
 80 |   out <- list(chapter = if_else(data_level == "", NA_integer_,
 81 |                          as.integer(data_level)),
 82 |                path = html_attr(x, "data-path"),
 83 |                href = html_attr(a, "href"),
 84 |                name = str_replace(html_text(a), "^[\\d.]+ ", ""))
 85 |   out$sections <- process_page(out[["path"]], output_dir)
 86 |   out
 87 | }
 88 | 
 89 | #' create and write the table of contents
 90 | write_toc <- function(output_dir, path) {
 91 |   index <- read_html(file.path(output_dir, "index.html"))
 92 |   book_summary <- html_nodes(index, "div.book-summary>nav>ul")
 93 |   chapters <- map(html_nodes(book_summary, xpath = "./li[@class='chapter']"),
 94 |                   process_chapter, output_dir)
 95 |   outfile <- fs::path(output_dir, path)
 96 |   cat(glue("Writing to {outfile}\n"))
 97 |   write_json(chapters, outfile)
 98 | }
 99 | 
100 | main <- function() {
101 |   description <- str_c(
102 |     "Create a JSON table of contents for R4DS containing the sections ",
103 |     "for all HTML files"
104 |   )
105 |   parser <- OptionParser(description = description)
106 |   opts <- parse_args(parser, positional_arguments = TRUE)
107 |   path <- if (length(opts$args) < 1) {
108 |     "toc.json"
109 |   } else {
110 |     opts$args[[1]]
111 |   }
112 |   output_dir <- bookdown:::load_config()$output_dir
113 |   write_toc(output_dir, path)
114 | }
115 | 
116 | main()
117 | 


--------------------------------------------------------------------------------
/bin/deploy.sh:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env bash
  2 | # From https://github.com/X1011/git-directory-deploy
  3 | # License: BSD 3-Clause License
  4 | # Copyright: Daniel Smith
  5 | set -o errexit #abort if any command fails
  6 | me=$(basename "$0")
  7 | 
  8 | help_message="\
  9 | Usage: $me [-c FILE] [<options>]
 10 | Deploy generated files to a git branch.
 11 | 
 12 | Options:
 13 | 
 14 |   -h, --help               Show this help information.
 15 |   -v, --verbose            Increase verbosity. Useful for debugging.
 16 |   -e, --allow-empty        Allow deployment of an empty directory.
 17 |   -m, --message MESSAGE    Specify the message used when committing on the
 18 |                            deploy branch.
 19 |   -n, --no-hash            Don't append the source commit's hash to the deploy
 20 |                            commit's message.
 21 |   -c, --config-file PATH   Override default & environment variables' values
 22 |                            with those set in the file at 'PATH'. Must be the
 23 |                            first option specified.
 24 | 
 25 | Variables:
 26 | 
 27 |   GIT_DEPLOY_DIR      Folder path containing the files to deploy.
 28 |   GIT_DEPLOY_BRANCH   Commit deployable files to this branch.
 29 |   GIT_DEPLOY_REPO     Push the deploy branch to this repository.
 30 | 
 31 | These variables have default values defined in the script. The defaults can be
 32 | overridden by environment variables. Any environment variables are overridden
 33 | by values set in a '.env' file (if it exists), and in turn by those set in a
 34 | file specified by the '--config-file' option."
 35 | 
 36 | parse_args() {
 37 | 	# Set args from a local environment file.
 38 | 	if [ -e ".env" ]; then
 39 | 		source .env
 40 | 	fi
 41 | 
 42 | 	# Set args from file specified on the command-line.
 43 | 	if [[ $1 = "-c" || $1 = "--config-file" ]]; then
 44 | 		source "$2"
 45 | 		shift 2
 46 | 	fi
 47 | 
 48 | 	# Parse arg flags
 49 | 	# If something is exposed as an environment variable, set/overwrite it
 50 | 	# here. Otherwise, set/overwrite the internal variable instead.
 51 | 	while : ; do
 52 | 		if [[ $1 = "-h" || $1 = "--help" ]]; then
 53 | 			echo "$help_message"
 54 | 			return 0
 55 | 		elif [[ $1 = "-v" || $1 = "--verbose" ]]; then
 56 | 			verbose=true
 57 | 			shift
 58 | 		elif [[ $1 = "-e" || $1 = "--allow-empty" ]]; then
 59 | 			allow_empty=true
 60 | 			shift
 61 | 		elif [[ ( $1 = "-m" || $1 = "--message" ) && -n $2 ]]; then
 62 | 			commit_message=$2
 63 | 			shift 2
 64 | 		elif [[ $1 = "-n" || $1 = "--no-hash" ]]; then
 65 | 			GIT_DEPLOY_APPEND_HASH=false
 66 | 			shift
 67 | 		else
 68 | 			break
 69 | 		fi
 70 | 	done
 71 | 
 72 | 	# Set internal option vars from the environment and arg flags. All internal
 73 | 	# vars should be declared here, with sane defaults if applicable.
 74 | 
 75 | 	# Source directory & target branch.
 76 | 	deploy_directory=${GIT_DEPLOY_DIR:-dist}
 77 | 	deploy_branch=${GIT_DEPLOY_BRANCH:-gh-pages}
 78 | 
 79 | 	#if no user identity is already set in the current git environment, use this:
 80 | 	default_username=${GIT_DEPLOY_USERNAME:-deploy.sh}
 81 | 	default_email=${GIT_DEPLOY_EMAIL:-}
 82 | 
 83 | 	#repository to deploy to. must be readable and writable.
 84 | 	repo=${GIT_DEPLOY_REPO:-origin}
 85 | 
 86 | 	#append commit hash to the end of message by default
 87 | 	append_hash=${GIT_DEPLOY_APPEND_HASH:-true}
 88 | }
 89 | 
 90 | main() {
 91 | 	parse_args "$@"
 92 | 
 93 | 	enable_expanded_output
 94 | 
 95 | 	if ! git diff --exit-code --quiet --cached; then
 96 | 		echo Aborting due to uncommitted changes in the index >&2
 97 | 		return 1
 98 | 	fi
 99 | 
100 | 	commit_title=`git log -n 1 --format="%s" HEAD`
101 | 	commit_hash=` git log -n 1 --format="%H" HEAD`
102 | 
103 | 	#default commit message uses last title if a custom one is not supplied
104 | 	if [[ -z $commit_message ]]; then
105 | 		commit_message="publish: $commit_title"
106 | 	fi
107 | 
108 | 	#append hash to commit message unless no hash flag was found
109 | 	if [ $append_hash = true ]; then
110 | 		commit_message="$commit_message"$'\n\n'"generated from commit $commit_hash"
111 | 	fi
112 | 
113 | 	previous_branch=`git rev-parse --abbrev-ref HEAD`
114 | 
115 | 	if [ ! -d "$deploy_directory" ]; then
116 | 		echo "Deploy directory '$deploy_directory' does not exist. Aborting." >&2
117 | 		return 1
118 | 	fi
119 | 
120 | 	# must use short form of flag in ls for compatibility with OS X and BSD
121 | 	if [[ -z `ls -A "$deploy_directory" 2> /dev/null` && -z $allow_empty ]]; then
122 | 		echo "Deploy directory '$deploy_directory' is empty. Aborting. If you're sure you want to deploy an empty tree, use the --allow-empty / -e flag." >&2
123 | 		return 1
124 | 	fi
125 | 
126 | 	if git ls-remote --exit-code $repo "refs/heads/$deploy_branch" ; then
127 | 		# deploy_branch exists in $repo; make sure we have the latest version
128 | 
129 | 		disable_expanded_output
130 | 		git fetch --force $repo $deploy_branch:$deploy_branch
131 | 		enable_expanded_output
132 | 	fi
133 | 
134 | 	# check if deploy_branch exists locally
135 | 	if git show-ref --verify --quiet "refs/heads/$deploy_branch"
136 | 	then incremental_deploy
137 | 	else initial_deploy
138 | 	fi
139 | 
140 | 	restore_head
141 | }
142 | 
143 | initial_deploy() {
144 | 	git --work-tree "$deploy_directory" checkout --orphan $deploy_branch
145 | 	git --work-tree "$deploy_directory" add --all
146 | 	commit+push
147 | }
148 | 
149 | incremental_deploy() {
150 | 	#make deploy_branch the current branch
151 | 	git symbolic-ref HEAD refs/heads/$deploy_branch
152 | 	#put the previously committed contents of deploy_branch into the index
153 | 	git --work-tree "$deploy_directory" reset --mixed --quiet
154 | 	git --work-tree "$deploy_directory" add --all
155 | 
156 | 	set +o errexit
157 | 	diff=$(git --work-tree "$deploy_directory" diff --exit-code --quiet HEAD --)$?
158 | 	set -o errexit
159 | 	case $diff in
160 | 		0) echo No changes to files in $deploy_directory. Skipping commit.;;
161 | 		1) commit+push;;
162 | 		*)
163 | 			echo git diff exited with code $diff. Aborting. Staying on branch $deploy_branch so you can debug. To switch back to master, use: git symbolic-ref HEAD refs/heads/master && git reset --mixed >&2
164 | 			return $diff
165 | 			;;
166 | 	esac
167 | }
168 | 
169 | commit+push() {
170 | 	set_user_id
171 | 	git --work-tree "$deploy_directory" commit -m "$commit_message"
172 | 
173 | 	disable_expanded_output
174 | 	#--quiet is important here to avoid outputting the repo URL, which may contain a secret token
175 | 	git push --quiet $repo $deploy_branch
176 | 	enable_expanded_output
177 | }
178 | 
179 | #echo expanded commands as they are executed (for debugging)
180 | enable_expanded_output() {
181 | 	if [ $verbose ]; then
182 | 		set -o xtrace
183 | 		set +o verbose
184 | 	fi
185 | }
186 | 
187 | #this is used to avoid outputting the repo URL, which may contain a secret token
188 | disable_expanded_output() {
189 | 	if [ $verbose ]; then
190 | 		set +o xtrace
191 | 		set -o verbose
192 | 	fi
193 | }
194 | 
195 | set_user_id() {
196 | 	if [[ -z `git config user.name` ]]; then
197 | 		git config user.name "$default_username"
198 | 	fi
199 | 	if [[ -z `git config user.email` ]]; then
200 | 		git config user.email "$default_email"
201 | 	fi
202 | }
203 | 
204 | restore_head() {
205 | 	if [[ $previous_branch = "HEAD" ]]; then
206 | 		#we weren't on any branch before, so just set HEAD back to the commit it was on
207 | 		git update-ref --no-deref HEAD $commit_hash $deploy_branch
208 | 	else
209 | 		git symbolic-ref HEAD refs/heads/$previous_branch
210 | 	fi
211 | 
212 | 	git reset --mixed
213 | }
214 | 
215 | filter() {
216 | 	sed -e "s|$repo|\$repo|g"
217 | }
218 | 
219 | sanitize() {
220 | 	"$@" 2> >(filter 1>&2) | filter
221 | }
222 | 
223 | [[ $1 = --source-only ]] || main "$@"
224 | 


--------------------------------------------------------------------------------
/bin/hypothesis.r:
--------------------------------------------------------------------------------
  1 | # Create GitHub issues from hypothes.is annotations
  2 | suppressPackageStartupMessages({
  3 |   library("rapiclient")
  4 |   library("httr")
  5 |   library("ghql")
  6 |   library('purrr')
  7 |   library("stringr")
  8 |   library("glue")
  9 |   library("ghql")
 10 | })
 11 | 
 12 | 
 13 | 
 14 | HYPOTHESIS_USER <- "acct:jrnold@hypothes.is"
 15 | GITHUB_LABEL_INFO <- list(id="MDU6TGFiZWwxMzE5MTEyNTM4", name="hypothes.is")
 16 | GITHUB_REPO_ID  <- "MDEwOlJlcG9zaXRvcnk3NjgzMTMxMQ=="
 17 | 
 18 | 
 19 | get_annotations <- function(search_after, limit=200) {
 20 |   hypothesis_api <- suppressWarnings(get_api(url = "https://h.readthedocs.io/en/latest/api-reference/hypothesis-v1.yaml"))
 21 |   hypothesis_api$host <- "hypothes.is"
 22 |   hypothesis_api$basePath <- "/api"
 23 |   operations <- get_operations(hypothesis_api)
 24 |   res <- operations$Search_for_annotations(search_after=search_after, sort="created", order="asc",
 25 |                                            limit=limit,
 26 |                                            wildcard_uri = "https://jrnold.github.io/r4ds-exercise-solutions/*")
 27 |   content(res) %>%
 28 |     pluck("rows") %>%
 29 |     keep(function(x) {
 30 |       x$user != HYPOTHESIS_USER
 31 |     })
 32 | }
 33 | 
 34 | annotation_to_issue <- function(annotation, con) {
 35 |   x <- annotation[c("user", "id", "created", "updated", "uri", "text")]
 36 |   x[["link_incontext"]] <- annotation$links$incontext
 37 |   x[["link_html"]] <- annotation$links$html
 38 |   x[["text"]] <- str_c("> ", str_split(x$text, "\n")[[1]], collapse = "\n")
 39 |   title <- glue("Respond to {link_html}", .envir = x)
 40 |   body <- glue("Respond to hypothes.is annotation [{id}]({link_html}) on {uri} (user: {user}, created: {created}, updated: {updated})\n", "{text}",
 41 |                 "\n", "Link: {link_incontext}",
 42 |                 .envir = x, .sep = "\n")
 43 | 
 44 |   qry <- Query$new()
 45 |   qry$query('createNewIssue', "
 46 |   mutation createNewIssue($repositoryId: String!, $title: String!, $body: String!) {
 47 |     createIssue(input: {repositoryId: $repositoryId, title: $title, body: $body}) {
 48 |       issue{
 49 |         id
 50 |         title
 51 |       }
 52 |   }
 53 | }")
 54 | 
 55 |   qry$query('addLabelsToIssue', "
 56 |   mutation addLabelsToIssue($issueId: String!, $labels: [String!]!) {
 57 |     addLabelsToLabelable(input: {labelableId: $issueId, labelIds: $labels}) {
 58 |       labelable {
 59 |         ... on Issue {
 60 |           id
 61 |         }
 62 |       }
 63 |     }
 64 | }")
 65 | 
 66 |   res <- con$exec(qry$queries$createNewIssue, list(title = title, body = body, repositoryId = GITHUB_REPO_ID))
 67 |   data <- jsonlite::parse_json(res)$data
 68 |   issue_id <- data$createIssue$issue$id
 69 |   cat("Created issue ", data$createIssue$issue$title, "\n")
 70 |   res2 <- con$exec(qry$queries$addLabelsToIssue, list(issueId = issue_id, labels = list(GITHUB_LABEL_INFO$id)))
 71 |   cat("Added label to issue ", data$createIssue$issue$title, "\n")
 72 |   issue_id
 73 | }
 74 | 
 75 | connect_github <- function() {
 76 |   token <- Sys.getenv("GITHUB_PAT")
 77 |   con <- GraphqlClient$new(
 78 |     url = "https://api.github.com/graphql",
 79 |     headers = list(Authorization = paste0("Bearer ", token))
 80 |   )
 81 |   con$load_schema()
 82 |   con
 83 | }
 84 | 
 85 | main <- function() {
 86 |   search_after <- "2020-06-25T00:00:00"
 87 |   annotations <- get_annotations(search_after = search_after)
 88 |   con <- connect_github()
 89 |   map(annotations, annotation_to_issue, con)
 90 | }
 91 | 
 92 | annotations <- main()
 93 | 
 94 | # con$load_schema()
 95 | # qry <- Query$new()
 96 | # qry$query("getRepoId", "{
 97 | #   repository(owner: \"jrnold\", name: \"r4ds-exercise-solutions\") {
 98 | #     id
 99 | #     name
100 | #     labels {
101 | #       edges {
102 | #         node {
103 | #           id
104 | #           name
105 | #         }
106 | #       }
107 | #     }
108 | #   }
109 | # }")
110 | # (x <- con$exec(qry$queries$getRepoId))
111 | 
112 | #con <- connect_github()
113 | # #con$load_schema()
114 | # qry <- Query$new()
115 | # qry$query("getRepoIssues", "
116 | #   query getRepoIssues($owner: String!, $name: String!, $label: String!) {
117 | # 	  repository(name: $name, owner: $owner) {
118 | #     	issues (last:1, filterBy: {labels: [$label], createdBy: $owner}) {
119 | #           pageInfo {
120 | #             startCursor
121 | #             endCursor
122 | #             hasNextPage
123 | #           }
124 | #         	nodes {
125 | #             author {
126 | #               login
127 | #             }
128 | #             title
129 | #             createdAt
130 | #           }
131 | #         	totalCount
132 | #       }
133 | #   }
134 | # }")
135 | # (x <- con$exec(qry$queries$getRepoIssues, list(owner="jrnold", name = "r4ds-exercise-solutions", label = GITHUB_LABEL_INFO$name)))
136 | 


--------------------------------------------------------------------------------
/bin/is_primary_key.R:
--------------------------------------------------------------------------------
 1 | any_missing <- function(x) any(is.na(x))
 2 | 
 3 | #' Check whether variables are a primary key
 4 | #'
 5 | #' Check whether a set of variables is a primary key for a data frame.
 6 | #' Unlike SQL databases, R data frames do not enforce a primary key
 7 | #' constraint. This function checks whether a set of variables uniquely
 8 | #' identify a row.
 9 | #'
10 | #' @param tbl A tbl.
11 | #' @param ... One or more unquoted expressions separated by commas.
12 | #'   You can treat variable names like they are positions. This uses
13 | #'   the same semantics as [dplyr::select()].
14 | #' @return A logical vector of length one that is `TRUE` if the
15 | #'   variables are a primary key, and `FALSE` otherwise.
16 | is_primary_key <- function(tbl, ...) {
17 |   variables <- quos(...)
18 |   # no elements can be missing
19 |   has_nulls <- summarise_at(tbl, vars(!!!variables), any_missing)
20 |   if (any(as.logical(has_nulls))) {
21 |     return(FALSE)
22 |   }
23 |   nrow(distinct(tbl, !!!variables)) == nrow(tbl)
24 | }
25 | 
26 | foo <- tribble(
27 |   ~a, ~b, ~c,
28 |   1, NA, 1,
29 |   2, 2, 1,
30 |   3, 3, 3
31 | )
32 | 
33 | is_primary_key(foo, a)
34 | is_primary_key(foo, b)
35 | is_primary_key(foo, c)
36 | 
37 | is_primary_key(foo, 1:3)
38 | 
39 | # check that the `by` columns of x are a foreign key referencing y
40 | is_foreign_key <- function(x, y, by = NULL) {
41 |   # check that by is a primary key for y
42 |   if (!rlang::eval_tidy(quo(is_primary_key(y, !!!by)))) {
43 |     return(FALSE)
44 |   }
45 |   # check that all x are found in y
46 |   !nrow(anti_join(x, y, by = by))
47 | }
48 | 
49 | 


--------------------------------------------------------------------------------
/bin/r4ds-toc.R:
--------------------------------------------------------------------------------
 1 | #' extract the table of contents from "R for Data Science" and save to a csv file
 2 | library("rvest")
 3 | library("purrr")
 4 | library("stringr")
 5 | library("jsonlite")
 6 | library("tibble")
 7 | library("dplyr")
 8 | library("readr")
 9 | 
10 | R4DS_INDEX <- "https://r4ds.had.co.nz/"
11 | 
12 | r4ds_chapters <- function() {
13 |   index <- read_html(R4DS_INDEX)
14 |   book_summary <- html_nodes(index, "div.book-summary")
15 |   chapters <- html_nodes(book_summary, "li.chapter") %>%
16 |     keep(~ str_detect(html_attr(.x, "data-level"), "^\\d+$")) %>%
17 |     map_dfr(~tibble(path = html_attr(.x, "data-path"),
18 |                     chapter = html_attr(.x, "data-level")))
19 |   chapters
20 | }
21 | 
22 | process_section <- function(section) {
23 |   header <- html_children(section)[[1]]
24 |   lvl <- str_split(html_attr(section, "class"), " ")[[1]] %>%
25 |     str_subset("^level(\\d+)$") %>%
26 |     str_extract("\\d+")
27 |   tibble(
28 |     section_level = lvl,
29 |     section_id = html_attr(section, "id"),
30 |     section_number = html_text(html_node(header, "span.header-section-number")),
31 |     section_name = str_replace(html_text(header), "^[\\d.]+\\s+", "")
32 |   )
33 | }
34 | 
35 | process_path <- function(path) {
36 |   doc <- read_html(str_c(R4DS_INDEX, path))
37 |   map_dfr(html_nodes(doc, "div.section"), process_section) %>%
38 |     mutate(path = path)
39 | }
40 | 
41 | create_toc <- function() {
42 |   chapters <- r4ds_chapters()
43 |   map_dfr(chapters$path, process_path) %>%
44 |     mutate(url = str_c(R4DS_INDEX, path, "#", section_id))
45 | }
46 | 
47 | main <- function() {
48 |   toc <- create_toc()
49 |   write_csv(toc, "r4ds-toc.csv")
50 | }
51 | 
52 | main()
53 | 


--------------------------------------------------------------------------------
/bin/render.R:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env Rscript
 2 | # Script to render book
 3 | suppressPackageStartupMessages({
 4 |   library("optparse")
 5 |   library("xml2")
 6 | })
 7 | 
 8 | # From devtools:::git_uncommitted
 9 | git_uncommitted <- function(path = ".") {
10 |   r <- git2r::repository(path, discover = TRUE)
11 |   st <- vapply(git2r::status(r), length, integer(1))
12 |   any(st != 0)
13 | }
14 | 
15 | # Adapted from devtools:::git_uncommitted
16 | check_uncommitted <- function(path = ".") {
17 |   if (git_uncommitted(path)) {
18 |     stop(paste("Uncommitted files.",
19 |       "All files should be committed before release.",
20 |       "Please add and commit.",
21 |       sep = " "
22 |     ))
23 |   }
24 | }
25 | 
26 | create_outdir <- function(output_dir) {
27 |   # create nojekyll if it doesn't exist
28 |   dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
29 |   nojekyll <- file.path(output_dir, ".nojekyll")
30 |   if (!file.exists(nojekyll)) {
31 |     cat("Creating ", nojekyll, "\n")
32 |     con <- file(nojekyll, "w")
33 |     close(con)
34 |   }
35 | }
36 | 
37 | render <- function(input, output_format = "all", force = FALSE,
38 |                    config = "_config.yml", ...) {
39 |   if (rmarkdown::pandoc_version() < 2) {
40 |     stop("This book requires pandoc >= 2")
41 |   }
42 |   if (!force) {
43 |     check_uncommitted(dirname(input[[1]]))
44 |   }
45 |   output_dir <- yaml::read_yaml("_bookdown.yml")$output_dir
46 |   create_outdir(output_dir)
47 |   bookdown::render_book(input = "index.Rmd", output_format = output_format, ...,
48 |                         envir = new.env(), clean_envir = FALSE)
49 | }
50 | 
51 | main <- function(args = NULL) {
52 |   option_list <- list(
53 |     make_option(c("-f", "--force"),
54 |       action = "store_true", default = FALSE,
55 |       help = "Render even if there are uncommitted changes."
56 |     ),
57 |     make_option(c("-q", "--quiet"),
58 |       action = "store_true", default = FALSE,
59 |       help = "Do not use verbose output"
60 |     ),
61 |     make_option("--to", default = "all",
62 |                 help = "Bookdown output format to use"),
63 |     make_option("--config", default = "_config.yml",
64 |                 help = "Path to project config file. Needed for output URL.")
65 |   )
66 |   if (is.null(args)) {
67 |     args <- commandArgs(TRUE)
68 |   }
69 |   opts <- parse_args(OptionParser(
70 |       usage = "%prog [options] [output_format|all|html|pdf]",
71 |       option_list = option_list
72 |     ),
73 |     args = args,
74 |     positional_arguments = TRUE,
75 |     convert_hyphens_to_underscores = TRUE
76 |   )
77 |   output_format <- opts$options$to
78 |   if (output_format == "html") {
79 |     output_format <- "bookdown::gitbook"
80 |   } else if (output_format == "pdf") {
81 |     output_format <- "bookdown::pdf_book"
82 |   }
83 |   input <- if (!length(opts$args)) {
84 |     "index.Rmd"
85 |   } else {
86 |     opts$args
87 |   }
88 |   render(
89 |     input,
90 |     output_format = output_format,
91 |     force = opts$options$force,
92 |     quiet = opts$options$quiet
93 |   )
94 | }
95 | 
96 | main()
97 | 


--------------------------------------------------------------------------------
/bin/serve.R:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env Rscript
2 | bookdown::serve_book(".", preview = TRUE, daemon = TRUE, in_session = FALSE)
3 | 


--------------------------------------------------------------------------------
/bin/style.R:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env Rscript
2 | styler::style_dir(".", filetype = c("R", "Rmd"))
3 | 


--------------------------------------------------------------------------------
/communicate.Rmd:
--------------------------------------------------------------------------------
1 | # (PART) Communicate {-}
2 | 
3 | # Introduction {#communicate-intro .r4ds-section}
4 | 
5 | `r no_exercises()`
6 | 


--------------------------------------------------------------------------------
/contributions.Rmd:
--------------------------------------------------------------------------------
 1 | # Contributing
 2 | 
 3 | Like *R for Data Science*, the solutions to it have been developed in the open, and they can improve with your contributions. There are a number of ways you can help make the solutions even better:
 4 | 
 5 | -   If you don't understand something, please
 6 |     [let me know](mailto:jeffrey.arnold@gmail.com). Your feedback on what is confusing or hard to understand is valuable.
 7 | 
 8 | -   If you spot a typo, feel free to edit the underlying page and send a pull
 9 |     request. If you've never done this before, the process is very easy:
10 | 
11 |     -   Click the "edit this page" link on the sidebar.
12 | 
13 |     -   Make the changes using GitHub's in-page editor and save.
14 | 
15 |     -   Submit a pull request and include a brief description of your changes.
16 |         "Fixing typos" is perfectly adequate.
17 | 


--------------------------------------------------------------------------------
/data/README.md:
--------------------------------------------------------------------------------
1 | This directory contains dummy CSV files so that Exercise 21.3.1 can run code.
2 | 


--------------------------------------------------------------------------------
/data/file1.csv:
--------------------------------------------------------------------------------
1 | X1,X2
2 | 1,"a"
3 | 2,"b"
4 | 


--------------------------------------------------------------------------------
/data/file2.csv:
--------------------------------------------------------------------------------
1 | X1,X2
2 | 3,"c"
3 | 4,"d"
4 | 


--------------------------------------------------------------------------------
/data/file3.csv:
--------------------------------------------------------------------------------
1 | X1,X2
2 | 5,"e"
3 | 6,"f"
4 | 


--------------------------------------------------------------------------------
/datetimes.Rmd:
--------------------------------------------------------------------------------
  1 | # Dates and times {#dates-and-times .r4ds-section}
  2 | 
  3 | ## Introduction {#introduction-10 .r4ds-section}
  4 | 
  5 | ```{r message=FALSE,cache=FALSE}
  6 | library("tidyverse")
  7 | library("lubridate")
  8 | library("nycflights13")
  9 | ```
 10 | 
 11 | ## Creating date/times {#creating-datetimes .r4ds-section}
 12 | 
 13 | The exercises in this section use the following code.
 14 | ```{r}
 15 | make_datetime_100 <- function(year, month, day, time) {
 16 |   make_datetime(year, month, day, time %/% 100, time %% 100)
 17 | }
 18 | 
 19 | flights_dt <- flights %>%
 20 |   filter(!is.na(dep_time), !is.na(arr_time)) %>%
 21 |   mutate(
 22 |     dep_time = make_datetime_100(year, month, day, dep_time),
 23 |     arr_time = make_datetime_100(year, month, day, arr_time),
 24 |     sched_dep_time = make_datetime_100(year, month, day, sched_dep_time),
 25 |     sched_arr_time = make_datetime_100(year, month, day, sched_arr_time)
 26 |   ) %>%
 27 |   select(origin, dest, ends_with("delay"), ends_with("time"))
 28 | ```
 29 | 
 30 | ### Exercise 16.2.1 {.unnumbered .exercise data-number="16.2.1"}
 31 | 
 32 | <div class="question">
 33 | 
 34 | What happens if you parse a string that
 35 | contains invalid dates?
 36 | 
 37 | </div>
 38 | 
 39 | <div class="answer">
 40 | 
 41 | ```{r}
 42 | ret <- ymd(c("2010-10-10", "bananas"))
 43 | print(class(ret))
 44 | ret
 45 | ```
 46 | 
 47 | It produces an `NA` and a warning message.
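
As a minimal sketch of how this might be handled in practice, the elements that failed to parse can be located with `is.na()`, and the warning can be silenced with the `quiet` argument of `ymd()`. The input vector `x` is made up for illustration.

```r
library("lubridate")

# Hypothetical input with one unparseable element
x <- c("2010-10-10", "bananas")

# quiet = TRUE suppresses the "failed to parse" warning
parsed <- ymd(x, quiet = TRUE)

# Elements that could not be parsed are NA
x[is.na(parsed)]
```

The result is still a `Date` vector, so downstream code should check for `NA` values rather than rely on an error being raised.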
 48 | 
 49 | </div>
 50 | 
 51 | ### Exercise 16.2.2 {.unnumbered .exercise data-number="16.2.2"}
 52 | 
 53 | <div class="question">
 54 | 
 55 | What does the `tzone` argument to `today()` do? 
 56 | Why is it important?
 57 | 
 58 | </div>
 59 | 
 60 | <div class="answer">
 61 | 
 62 | It determines the time-zone of the date. 
 63 | Since different time-zones can have different dates, the value of `today()` can vary depending on the time-zone specified.
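
The following sketch illustrates the point with a fixed instant instead of the current time, so that the result does not depend on when the code is run. The instant and the two time zones are arbitrary choices for illustration.

```r
library("lubridate")

# A fixed instant late in the evening in UTC (chosen for illustration)
instant <- ymd_hms("2020-01-01 23:30:00", tz = "UTC")

# The calendar date of that same instant in two time zones
date_utc <- as_date(instant)
date_auckland <- as_date(with_tz(instant, "Pacific/Auckland"))

date_utc
date_auckland

# today() accepts the same kind of time zone name
today(tzone = "UTC")
today(tzone = "Pacific/Auckland")
```

The two dates computed from `instant` differ by one day even though they describe the same moment.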
 64 | 
 65 | </div>
 66 | 
 67 | ### Exercise 16.2.3 {.unnumbered .exercise data-number="16.2.3"}
 68 | 
 69 | <div class="question">
 70 | 
 71 | Use the appropriate lubridate function to parse each of the following dates:
 72 | 
 73 | ```{r}
 74 | d1 <- "January 1, 2010"
 75 | d2 <- "2015-Mar-07"
 76 | d3 <- "06-Jun-2017"
 77 | d4 <- c("August 19 (2015)", "July 1 (2015)")
 78 | d5 <- "12/30/14"
 79 | ```
 80 | 
 81 | </div>
 82 | 
 83 | <div class="answer">
 84 | 
 85 | ```{r}
 86 | mdy(d1)
 87 | ymd(d2)
 88 | dmy(d3)
 89 | mdy(d4)
 90 | mdy(d5)
 91 | ```
 92 | 
 93 | </div>
 94 | 
 95 | ## Date-time components {#date-time-components .r4ds-section}
 96 | 
 97 | The following code from the chapter is used:
 98 | 
 99 | ```{r}
100 | sched_dep <- flights_dt %>%
101 |   mutate(minute = minute(sched_dep_time)) %>%
102 |   group_by(minute) %>%
103 |   summarise(
104 |     avg_delay = mean(arr_delay, na.rm = TRUE),
105 |     n = n()
106 |   )
107 | ```
108 | In the previous code, the difference between rounded and un-rounded dates provides the within-period time.
109 | 
110 | ### Exercise 16.3.1 {.unnumbered .exercise data-number="16.3.1"}
111 | 
112 | <div class="question">
113 | 
114 | How does the distribution of flight times
115 | within a day change over the course of the year?
116 | 
117 | </div>
118 | 
119 | <div class="answer">
120 | 
121 | Let's try plotting this by month:
122 | ```{r}
123 | flights_dt %>%
124 |   filter(!is.na(dep_time)) %>%
125 |   mutate(dep_hour = update(dep_time, yday = 1)) %>%
126 |   mutate(month = factor(month(dep_time))) %>%
127 |   ggplot(aes(dep_hour, color = month)) +
128 |   geom_freqpoly(binwidth = 60 * 60)
129 | ```
130 | 
131 | This will look better if everything is normalized within groups. The reason
132 | that February is lower is that there are fewer days and thus fewer flights.
133 | ```{r}
134 | flights_dt %>%
135 |   filter(!is.na(dep_time)) %>%
136 |   mutate(dep_hour = update(dep_time, yday = 1)) %>%
137 |   mutate(month = factor(month(dep_time))) %>%
138 |   ggplot(aes(dep_hour, color = month)) +
139 |   geom_freqpoly(aes(y = ..density..), binwidth = 60 * 60)
140 | ```
141 | 
142 | At least to me there doesn't appear to be much difference in the within-day distribution over the year, but I may be thinking about it incorrectly.
143 | 
144 | </div>
145 | 
146 | ### Exercise 16.3.2 {.unnumbered .exercise data-number="16.3.2"}
147 | 
148 | <div class="question">
149 | 
150 | Compare `dep_time`, `sched_dep_time` and `dep_delay`. Are they consistent? Explain your findings.
151 | 
152 | </div>
153 | 
154 | <div class="answer">
155 | 
156 | If they are consistent, then `dep_time = sched_dep_time + dep_delay`.
157 | 
158 | ```{r}
159 | flights_dt %>%
160 |   mutate(dep_time_ = sched_dep_time + dep_delay * 60) %>%
161 |   filter(dep_time_ != dep_time) %>%
162 |   select(dep_time_, dep_time, sched_dep_time, dep_delay)
163 | ```
164 | 
165 | There are discrepancies. It looks like there are mistakes in the dates. These
166 | are flights in which the actual departure time is on the *next* day relative to
167 | the scheduled departure time. We forgot to account for this when creating the
168 | date-times using the `make_datetime_100()` function in [16.2.2 From individual components](https://r4ds.had.co.nz/dates-and-times.html#from-individual-components). The code would have had to check if the departure time is less than
169 | the scheduled departure time plus departure delay (in minutes). Alternatively, simply adding the departure delay to the scheduled departure time is a more robust way to construct the departure time because it will automatically account for crossing into the next day.
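
A small sketch of the second approach, using a made-up flight rather than rows from `flights`: adding the delay to the scheduled departure time as a period of minutes rolls the date over when the flight leaves after midnight.

```r
library("lubridate")

# Hypothetical flight: scheduled for 23:50 on 1 January, delayed by 20 minutes
sched_dep_time <- make_datetime(2013, 1, 1, 23, 50)
dep_delay <- 20

# Adding the delay as a period of minutes rolls over to 2 January
dep_time <- sched_dep_time + minutes(dep_delay)
dep_time
```

Building `dep_time` from the year, month, and day of the scheduled departure would instead have put this flight at 00:10 on 1 January.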
170 | 
171 | </div>
172 | 
173 | ### Exercise 16.3.3 {.unnumbered .exercise data-number="16.3.3"}
174 | 
175 | <div class="question">
176 | 
177 | Compare `air_time` with the duration between the departure and arrival. 
178 | Explain your findings.
179 | 
180 | </div>
181 | 
182 | <div class="answer">
183 | 
184 | ```{r}
185 | flights_dt %>%
186 |   mutate(
187 |     flight_duration = as.numeric(difftime(arr_time, dep_time, units = "mins")),
188 |     air_time_mins = air_time,
189 |     diff = flight_duration - air_time_mins
190 |   ) %>%
191 |   select(origin, dest, flight_duration, air_time_mins, diff)
192 | ```
193 | 
194 | </div>
195 | 
196 | ### Exercise 16.3.4 {.unnumbered .exercise data-number="16.3.4"}
197 | 
198 | <div class="question">
199 | 
200 | How does the average delay time change over the course of a day? Should you use `dep_time` or `sched_dep_time`? Why?
201 | 
202 | </div>
203 | 
204 | <div class="answer">
205 | 
206 | Use `sched_dep_time` because that is the relevant metric for someone scheduling a flight. Also, using `dep_time` will always bias delays to later in the day since delays will push flights later.
207 | 
208 | ```{r}
209 | flights_dt %>%
210 |   mutate(sched_dep_hour = hour(sched_dep_time)) %>%
211 |   group_by(sched_dep_hour) %>%
212 |   summarise(dep_delay = mean(dep_delay)) %>%
213 |   ggplot(aes(y = dep_delay, x = sched_dep_hour)) +
214 |   geom_point() +
215 |   geom_smooth()
216 | ```
217 | 
218 | </div>
219 | 
220 | ### Exercise 16.3.5 {.unnumbered .exercise data-number="16.3.5"}
221 | 
222 | <div class="question">
223 | 
224 | On what day of the week should you leave if you want to minimize the chance of a delay?
225 | 
226 | </div>
227 | 
228 | <div class="answer">
229 | 
230 | Saturday has the lowest average departure delay time and the lowest average arrival delay time.
231 | 
232 | ```{r 16.3.5.tbl}
233 | flights_dt %>%
234 |   mutate(dow = wday(sched_dep_time)) %>%
235 |   group_by(dow) %>%
236 |   summarise(
237 |     dep_delay = mean(dep_delay),
238 |     arr_delay = mean(arr_delay, na.rm = TRUE)
239 |   ) %>%
240 |   print(n = Inf)
241 | ```
242 | 
243 | ```{r 16.3.5.fig1}
244 | flights_dt %>%
245 |   mutate(wday = wday(dep_time, label = TRUE)) %>% 
246 |   group_by(wday) %>% 
247 |   summarize(ave_dep_delay = mean(dep_delay, na.rm = TRUE)) %>% 
248 |   ggplot(aes(x = wday, y = ave_dep_delay)) + 
249 |   geom_bar(stat = "identity")
250 | ```
251 | 
252 | ```{r 16.3.5.fig2}
253 | flights_dt %>% 
254 |   mutate(wday = wday(dep_time, label = TRUE)) %>% 
255 |   group_by(wday) %>% 
256 |   summarize(ave_arr_delay = mean(arr_delay, na.rm = TRUE)) %>% 
257 |   ggplot(aes(x = wday, y = ave_arr_delay)) + 
258 |   geom_bar(stat = "identity")
259 | ```
260 | 
261 | </div>
262 | 
263 | ### Exercise 16.3.6 {.unnumbered .exercise data-number="16.3.6"}
264 | 
265 | <div class="question">
266 | 
267 | What makes the distribution of `diamonds$carat` and `flights$sched_dep_time` similar?
268 | 
269 | </div>
270 | 
271 | <div class="answer">
272 | 
273 | ```{r}
274 | ggplot(diamonds, aes(x = carat)) +
275 |   geom_density()
276 | ```
277 | 
278 | In both `carat` and `sched_dep_time` there are abnormally large numbers of values at nice "human" numbers. In `sched_dep_time` it is at 00 and 30 minutes. In carats, it is at 0, 1/3, 1/2, and 2/3.
279 | 
280 | ```{r}
281 | ggplot(diamonds, aes(x = carat %% 1 * 100)) +
282 |   geom_histogram(binwidth = 1)
283 | ```
284 | 
285 | In scheduled departure times it is 00 and 30 minutes, and minutes
286 | ending in 0 and 5.
287 | 
288 | ```{r}
289 | ggplot(flights_dt, aes(x = minute(sched_dep_time))) +
290 |   geom_histogram(binwidth = 1)
291 | ```
292 | 
293 | </div>
294 | 
295 | ### Exercise 16.3.7 {.unnumbered .exercise data-number="16.3.7"}
296 | 
297 | <div class="question">
298 | 
299 | Confirm my hypothesis that the early departures of flights in minutes 20-30 and 50-60 are caused by scheduled flights that leave early. 
300 | Hint: create a binary variable that tells you whether or not a flight was delayed.
301 | 
302 | </div>
303 | 
304 | <div class="answer">
305 | 
306 | First, I create a binary variable `early` that is `TRUE` if a flight leaves early, and `FALSE` if it does not.
307 | Then, I group flights by the minute of departure.
308 | This shows that the proportion of flights that are early departures is highest between minutes 20--30 and 50--60.
309 | ```{r}
310 | flights_dt %>% 
311 |   mutate(minute = minute(dep_time),
312 |          early = dep_delay < 0) %>% 
313 |   group_by(minute) %>% 
314 |   summarise(
315 |     early = mean(early, na.rm = TRUE),
316 |     n = n()) %>% 
317 |   ggplot(aes(minute, early)) +
318 |     geom_line()
319 | ```
320 | 
321 | </div>
322 | 
323 | ## Time spans {#time-spans .r4ds-section}
324 | 
325 | ### Exercise 16.4.1 {.unnumbered .exercise data-number="16.4.1"}
326 | 
327 | <div class="question">
328 | 
329 | Why is there `months()` but no `dmonths()`?
330 | 
331 | </div>
332 | 
333 | <div class="answer">
334 | 
335 | There is no unambiguous value of months in terms of seconds since months have differing numbers of days.
336 | 
337 | -   31 days: January, March, May, July, August, October, December
338 | -   30 days: April, June, September, November
339 | -   28 or 29 days: February
340 | 
341 | The month is not a duration of time defined independently of when it occurs, but a special interval between two dates.
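
As a quick check of this point, the sketch below adds `months(1)` to two start dates (chosen here purely for illustration, not taken from the exercise) and measures the resulting span in days. The same period covers a different number of days depending on where it starts.

```r
library("lubridate")

# Illustrative start dates, not taken from the exercise
jan_start <- ymd("2015-01-01")
feb_start <- ymd("2015-02-01")

# Length in days of "one month" starting from each date
jan_days <- as.numeric((jan_start + months(1)) - jan_start)
feb_days <- as.numeric((feb_start + months(1)) - feb_start)

c(january = jan_days, february = feb_days)
```

Because the two spans differ, no single number of seconds can stand in for "one month".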
342 | 
343 | </div>
344 | 
345 | ### Exercise 16.4.2 {.unnumbered .exercise data-number="16.4.2"}
346 | 
347 | <div class="question">
348 | 
349 | Explain `days(overnight * 1)` to someone who has just started learning R. 
350 | How does it work?
351 | 
352 | </div>
353 | 
354 | <div class="answer">
355 | 
356 | The variable `overnight` is equal to `TRUE` or `FALSE`.
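357 | Multiplying by 1 converts it to a number: `TRUE` becomes 1 and `FALSE` becomes 0. If it is an overnight flight, `days(overnight * 1)` is a period of 1 day, and if not, it is a period of 0 days, so nothing is added to the date.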
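
A minimal sketch of this coercion, using a made-up logical vector and an arbitrary date rather than the flights data:

```r
library("lubridate")

# Hypothetical values: the first flight is overnight, the second is not
overnight <- c(TRUE, FALSE)

# Multiplying a logical by 1 converts it to numeric
as_number <- overnight * 1

# Only the overnight flight is shifted to the next day
shifted <- ymd("2013-01-01") + days(as_number)
shifted
```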
358 | 
359 | </div>
360 | 
361 | ### Exercise 16.4.3 {.unnumbered .exercise data-number="16.4.3"}
362 | 
363 | <div class="question">
364 | 
365 | Create a vector of dates giving the first day of every month in 2015. 
366 | Create a vector of dates giving the first day of every month in the current year.
367 | 
368 | </div>
369 | 
370 | <div class="answer">
371 | 
372 | A vector of the first day of the month for every month in 2015:
373 | ```{r}
374 | ymd("2015-01-01") + months(0:11)
375 | ```
376 | 
377 | To get the vector of the first day of the month for *this* year, we first need to figure out what this year is, and get January 1st of it.
378 | I can do that by taking `today()` and truncating it to the year using `floor_date()`:
379 | ```{r}
380 | floor_date(today(), unit = "year") + months(0:11)
381 | ```
382 | 
383 | </div>
384 | 
385 | ### Exercise 16.4.4 {.unnumbered .exercise data-number="16.4.4"}
386 | 
387 | <div class="question">
388 | 
389 | Write a function that given your birthday (as a date), returns how old you are in years.
390 | 
391 | </div>
392 | 
393 | <div class="answer">
394 | 
395 | ```{r}
396 | age <- function(bday) {
397 |   (bday %--% today()) %/% years(1)
398 | }
399 | age(ymd("1990-10-12"))
400 | ```
401 | 
402 | </div>
403 | 
404 | ### Exercise 16.4.5 {.unnumbered .exercise data-number="16.4.5"}
405 | 
406 | <div class="question">
407 | 
408 | Why can’t `(today() %--% (today() + years(1)) / months(1)` work?
409 | 
410 | </div>
411 | 
412 | <div class="answer">
413 | 
414 | The code in the question is missing a parenthesis.
415 | So, I will assume that the correct code is,
416 | ```{r}
417 | (today() %--% (today() + years(1))) / months(1)
418 | ```
419 | 
420 | While this code will not display a warning or message, it does not work exactly as
421 | expected. The problem is discussed in the [Intervals](https://r4ds.had.co.nz/dates-and-times.html#intervals) section.
422 | 
423 | The numerator of the expression, `(today() %--% (today() + years(1)))`, is an *interval*, which includes both a duration of time and a starting point. The interval has an exact number of seconds.
424 | The denominator of the expression, `months(1)`, is a period, which is meaningful to humans but not defined in terms of an exact number of seconds.
425 | Months can be 28, 29, 30, or 31 days, so it is not clear how long a span `months(1)` should represent when dividing by it.
426 | The code does not produce a warning message, but it will not always produce the correct result.
427 | 
428 | To find the number of months within an interval use `%/%` instead of `/`,
429 | ```{r}
430 | (today() %--% (today() + years(1))) %/% months(1)
431 | ```
432 | 
433 | Alternatively, we could define a "month" as 30 days, and run
434 | ```{r}
435 | (today() %--% (today() + years(1))) / days(30)
436 | ```
437 | 
438 | This approach will not work with `today() + years(1)`, which is not defined for February 29th on leap years:
439 | ```{r}
440 | as.Date("2016-02-29") + years(1)
441 | ```
442 | 
443 | </div>
444 | 
445 | ## Time zones {#time-zones .r4ds-section}
446 | 
447 | `r no_exercises()`
448 | 


--------------------------------------------------------------------------------
/diagrams/Lahman1.graffle:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/diagrams/Lahman1.graffle


--------------------------------------------------------------------------------
/diagrams/Lahman1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/diagrams/Lahman1.png


--------------------------------------------------------------------------------
/diagrams/Lahman2.graffle:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/diagrams/Lahman2.graffle


--------------------------------------------------------------------------------
/diagrams/Lahman2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/diagrams/Lahman2.png


--------------------------------------------------------------------------------
/diagrams/Lahman3.graffle:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/diagrams/Lahman3.graffle


--------------------------------------------------------------------------------
/diagrams/Lahman3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/diagrams/Lahman3.png


--------------------------------------------------------------------------------
/diagrams/master-batting-salaries.graffle:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/diagrams/master-batting-salaries.graffle


--------------------------------------------------------------------------------
/diagrams/master-batting-salaries.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/diagrams/master-batting-salaries.png


--------------------------------------------------------------------------------
/diagrams/nested_set_1.dot:
--------------------------------------------------------------------------------
 1 | digraph nested_set_1 {
 2 |   node[shape=box]
 3 |   graph[style=rounded]
 4 |   # subgraph for R information
 5 |   subgraph cluster_abcdef {
 6 |     node[style=filled,fillcolor=gray90]
 7 |     "b"
 8 |     "a"
 9 |     subgraph cluster_ef {
10 |       graph[fillcolor=gray90,style="rounded,filled"]
11 |       node[fillcolor=gray80]
12 |       "f"
13 |       "e"
14 |     }
15 |     subgraph cluster_cd {
16 |       graph[fillcolor=gray90,style="rounded,filled"]
17 |       node[fillcolor=gray80]
18 |       "d"
19 |       "c"
20 |     }
21 |   }
22 | }
23 | 


--------------------------------------------------------------------------------
/diagrams/nested_set_2.dot:
--------------------------------------------------------------------------------
 1 | digraph nested_set_2 {
 2 |   node[shape=box]
 3 |   graph[style=rounded]
 4 |   # subgraph for R information
 5 |   subgraph cluster_1 {
 6 |     subgraph cluster_2 {
 7 |       graph[fillcolor=gray90,style="rounded,filled"]
 8 |       subgraph cluster_3 {
 9 |         graph[fillcolor=gray80]
10 |         subgraph cluster_4 {
11 |           graph[fillcolor=gray70]
12 |           subgraph cluster_5 {
13 |             graph[fillcolor=gray60]
14 |             subgraph cluster_6 {
15 |               graph[fillcolor=gray50]
16 |               node[style=filled,fillcolor=gray40]
17 |               "a"
18 |             }
19 |           }
20 |         }
21 |       }
22 |     }
23 |   }
24 | }
25 | 


--------------------------------------------------------------------------------
/diagrams/nycflights.graffle:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/diagrams/nycflights.graffle


--------------------------------------------------------------------------------
/diagrams/nycflights.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/diagrams/nycflights.png


--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
 1 | version: '3.2'
 2 | services:
 3 |   nodejs:
 4 |     image: "node:latest"
 5 |     command: "npm test"
 6 |     volumes:
 7 |     - type: bind
 8 |       target: /app
 9 |       source: "."
10 |     working_dir: /app
11 | 
12 |   r:
13 |     build: "."
14 |     volumes:
15 |     - type: bind
16 |       target: /home/rstudio/r4ds-exercise-solutions
17 |       source: "."
18 |     ports:
19 |     - "8787:8787"
20 |     working_dir: /home/rstudio/r4ds-exercise-solutions
21 | 


--------------------------------------------------------------------------------
/explore.Rmd:
--------------------------------------------------------------------------------
1 | # (PART) Explore {-}
2 | 
3 | # Introduction {#explore-intro .r4ds-section}
4 | 
5 | `r no_exercises()`
6 | 


--------------------------------------------------------------------------------
/factors.Rmd:
--------------------------------------------------------------------------------
  1 | # Factors {#factors .r4ds-section}
  2 | 
  3 | ## Introduction {#introduction-9 .r4ds-section}
  4 | 
  5 | Functions and packages:
  6 | 
  7 | ```{r setup,message = FALSE,cache=FALSE}
  8 | library("tidyverse")
  9 | ```
 10 | The forcats package does not need to be explicitly loaded, since the recent versions of the tidyverse package now attach it.
 11 | 
 12 | ## Creating factors {#creating-factors .r4ds-section}
 13 | 
 14 | `r no_exercises()`
 15 | 
 16 | ## General Social Survey {#general-social-survey .r4ds-section}
 17 | 
 18 | ### Exercise 15.3.1 {.unnumbered .exercise data-number="15.3.1"}
 19 | 
 20 | <div class="question">
 21 | 
 22 | Explore the distribution of `rincome` (reported income).
 23 | What makes the default bar chart hard to understand?
 24 | How could you improve the plot?
 25 | 
 26 | </div>
 27 | 
 28 | <div class="answer">
 29 | 
 30 | My first attempt is to use `geom_bar()` with the default settings.
 31 | ```{r}
 32 | rincome_plot <-
 33 |   gss_cat %>%
 34 |   ggplot(aes(x = rincome)) +
 35 |   geom_bar()
 36 | rincome_plot
 37 | ```
 38 | 
 39 | The problem with the default bar chart settings is that the labels overlap and are impossible to read.
 40 | I'll try changing the angle of the x-axis labels to vertical so that they will not overlap.
 41 | ```{r}
 42 | rincome_plot +
 43 |   theme(axis.text.x = element_text(angle = 90, hjust = 1))
 44 | ```
 45 | 
 46 | This is better because the labels no longer overlap, but they are still difficult to read because they are vertical.
 47 | I could try angling the labels so that they are easier to read, but not overlapping.
 48 | ```{r}
 49 | rincome_plot +
 50 |   theme(axis.text.x = element_text(angle = 45, hjust = 1))
 51 | ```
 52 | 
 53 | But the solution I prefer for bar charts with long labels is to flip the axes, so that the bars are horizontal. 
 54 | Then the category labels are also horizontal, and easy to read.
 55 | ```{r}
 56 | rincome_plot +
 57 |   coord_flip()
 58 | ```
 59 | 
 60 | Though more than asked for in this question, I could further improve this plot by 
 61 | 
 62 | 1.  removing the "Not applicable" responses, 
 63 | 1.  renaming "Lt \$1000" to "Less than \$1000",
 64 | 1.  using color to distinguish non-response categories ("Refused", "Don't know", and "No answer") from income levels ("Lt \$1000", ...),
 65 | 1.  adding meaningful y- and x-axis titles, and
 66 | 1.  formatting the counts axis labels to use commas.
 67 | 
 68 | ```{r}
 69 | gss_cat %>%
 70 |   filter(!rincome %in% c("Not applicable")) %>%
 71 |   mutate(rincome = fct_recode(rincome,
 72 |     "Less than $1000" = "Lt $1000"
 73 |   )) %>%
 74 |   mutate(rincome_na = rincome %in% c("Refused", "Don't know", "No answer")) %>%
 75 |   ggplot(aes(x = rincome, fill = rincome_na)) +
 76 |   geom_bar() +
 77 |   coord_flip() +
 78 |   scale_y_continuous("Number of Respondents", labels = scales::comma) +
 79 |   scale_x_discrete("Respondent's Income") +
 80 |   scale_fill_manual(values = c("FALSE" = "black", "TRUE" = "gray")) +
 81 |   theme(legend.position = "None")
 82 | ```
 83 | 
 84 | If I were only interested in non-missing responses, then I could drop all respondents who answered "Not applicable", "Refused", "Don't know", or "No answer".
 85 | ```{r}
 86 | gss_cat %>%
 87 |   filter(!rincome %in% c("Not applicable", "Don't know", "No answer", "Refused")) %>%
 88 |   mutate(rincome = fct_recode(rincome,
 89 |     "Less than $1000" = "Lt $1000"
 90 |   )) %>%
 91 |   ggplot(aes(x = rincome)) +
 92 |   geom_bar() +
 93 |   coord_flip() +
 94 |   scale_y_continuous("Number of Respondents", labels = scales::comma) +
 95 |   scale_x_discrete("Respondent's Income")
 96 | ```
 97 | 
 98 | A side-effect of `coord_flip()` is that the label ordering on the x-axis, from lowest (top) to highest (bottom) is counterintuitive.
 99 | The next section introduces a function `fct_reorder()` which can help with this.
100 | 
101 | </div>
102 | 
103 | ### Exercise 15.3.2 {.unnumbered .exercise data-number="15.3.2"}
104 | 
105 | <div class="question">
106 | What is the most common `relig` in this survey?
107 | What’s the most common `partyid`?
108 | </div>
109 | 
110 | <div class="answer">
111 | 
112 | The most common `relig` is "Protestant"
113 | ```{r}
114 | gss_cat %>%
115 |   count(relig) %>%
116 |   arrange(desc(n)) %>%
117 |   head(1)
118 | ```
119 | 
120 | The most common `partyid` is "Independent"
121 | ```{r}
122 | gss_cat %>%
123 |   count(partyid) %>%
124 |   arrange(desc(n)) %>%
125 |   head(1)
126 | ```
127 | 
128 | </div>
129 | 
130 | ### Exercise 15.3.3 {.unnumbered .exercise data-number="15.3.3"}
131 | 
132 | <div class="question">
133 | Which `relig` does `denom` (denomination) apply to?
134 | How can you find out with a table?
135 | How can you find out with a visualization?
136 | </div>
137 | 
138 | <div class="answer">
139 | 
140 | ```{r}
141 | levels(gss_cat$denom)
142 | ```
143 | 
144 | From the context it is clear that `denom` refers to "Protestant" (which is unsurprising given that it is the largest category in `relig`).
145 | Let's filter out the non-responses, no answers, others, not-applicable, or
146 | no denomination, to leave only answers to denominations.
147 | After doing that, the only remaining responses are "Protestant".
148 | ```{r}
149 | gss_cat %>%
150 |   filter(!denom %in% c(
151 |     "No answer", "Other", "Don't know", "Not applicable",
152 |     "No denomination"
153 |   )) %>%
154 |   count(relig)
155 | ```
156 | 
157 | This is also clear in a scatter plot of `relig` vs. `denom` where the size of each point is
158 | proportional to the number of answers (since otherwise there would be overplotting).
159 | ```{r}
160 | gss_cat %>%
161 |   count(relig, denom) %>%
162 |   ggplot(aes(x = relig, y = denom, size = n)) +
163 |   geom_point() +
164 |   theme(axis.text.x = element_text(angle = 90))
165 | ```
166 | 
167 | </div>
168 | 
169 | ## Modifying factor order {#modifying-factor-order .r4ds-section}
170 | 
171 | ### Exercise 15.4.1 {.unnumbered .exercise data-number="15.4.1"}
172 | 
173 | <div class="question">
174 | There are some suspiciously high numbers in `tvhours`.
175 | Is the `mean` a good summary?
176 | </div>
177 | 
178 | <div class="answer">
179 | 
180 | ```{r}
181 | summary(gss_cat[["tvhours"]])
182 | ```
183 | 
184 | ```{r}
185 | gss_cat %>%
186 |   filter(!is.na(tvhours)) %>%
187 |   ggplot(aes(x = tvhours)) +
188 |   geom_histogram(binwidth = 1)
189 | ```
190 | 
191 | Whether the mean is the best summary depends on what you are using it for :-), i.e. your objective.
192 | But probably the median would be what most people prefer.
193 | And the hours of TV doesn't look that surprising to me.
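
To compare the two summaries directly, the sketch below computes both on `tvhours`, dropping missing values. Since the histogram has a long right tail, one would expect the mean to sit above the median.

```r
library("dplyr")
library("forcats") # provides the gss_cat data

# Mean and median of TV hours, ignoring missing responses
tv_summary <- gss_cat %>%
  summarise(
    mean_tvhours = mean(tvhours, na.rm = TRUE),
    median_tvhours = median(tvhours, na.rm = TRUE)
  )
tv_summary
```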
194 | 
195 | </div>
196 | 
197 | ### Exercise 15.4.2 {.unnumbered .exercise data-number="15.4.2"}
198 | 
199 | <div class="question">
200 | For each factor in `gss_cat` identify whether the order of the levels is arbitrary or principled.
201 | </div>
202 | 
203 | <div class="answer">
204 | 
205 | The following piece of code uses functions introduced in Ch 21, to print out the names of only the factors.
206 | ```{r}
207 | keep(gss_cat, is.factor) %>% names()
208 | ```
209 | 
210 | There are six categorical variables: `marital`, `race`, `rincome`, `partyid`, `relig`, and `denom`.
211 | 
212 | The ordering of marital is "somewhat principled". There is some sort of logic
213 | in that the levels are grouped "never married", married at some point
214 | (separated, divorced, widowed), and "married"; though it would seem that "Never
215 | Married", "Divorced", "Widowed", "Separated", "Married" might be more natural.
216 | I find that the question of ordering can be determined by the level of
217 | aggregation in a categorical variable, and there can be more "partially
218 | ordered" factors than one would expect.
219 | 
220 | ```{r}
221 | levels(gss_cat[["marital"]])
222 | ```
223 | ```{r}
224 | gss_cat %>%
225 |   ggplot(aes(x = marital)) +
226 |   geom_bar()
227 | ```
228 | 
229 | The ordering of race is principled in that the categories are ordered by count of observations in the data.
230 | ```{r}
231 | levels(gss_cat$race)
232 | ```
233 | ```{r}
234 | gss_cat %>%
235 |   ggplot(aes(race)) +
236 |   geom_bar() +
237 |   scale_x_discrete(drop = FALSE)
238 | ```
239 | 
240 | The levels of `rincome` are ordered in decreasing order of the income; however
241 | the placement of "No answer", "Don't know", and "Refused" before, and "Not
242 | applicable" after the income levels is arbitrary. It would be better to place
243 | all the missing income level categories either before or after all the known
244 | values.
245 | 
246 | ```{r}
247 | levels(gss_cat$rincome)
248 | ```
249 | 
250 | The ordering of the levels of `relig` is arbitrary: there is no natural ordering, and they don't appear to be ordered by any statistic within the dataset.
251 | ```{r}
252 | levels(gss_cat$relig)
253 | ```
254 | 
255 | ```{r}
256 | gss_cat %>%
257 |   ggplot(aes(relig)) +
258 |   geom_bar() +
259 |   coord_flip()
260 | ```
261 | 
262 | The same goes for `denom`.
263 | ```{r}
264 | levels(gss_cat$denom)
265 | ```
266 | 
267 | Ignoring "No answer", "Don't know", and "Other party", the levels of `partyid` are ordered from "Strong Republican" to "Strong Democrat".
268 | ```{r}
269 | levels(gss_cat$partyid)
270 | ```
271 | 
272 | </div>
273 | 
274 | ### Exercise 15.4.3 {.unnumbered .exercise data-number="15.4.3"}
275 | 
276 | <div class="question">
277 | Why did moving “Not applicable” to the front of the levels move it to the bottom of the plot?
278 | </div>
279 | 
280 | <div class="answer">
281 | 
282 | Because that gives the level "Not applicable" an integer value of 1.
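
The integer codes can be inspected directly. The following is a small sketch on a toy factor (the values `"a"`, `"b"`, `"c"` are made up for illustration): moving a level to the front with `fct_relevel()` makes it level 1. ggplot2 lays out discrete levels in that order starting from the origin, so after `coord_flip()` the first level ends up at the bottom of the plot.

```r
library("forcats")

# Toy factor, not from gss_cat
f <- factor(c("a", "b", "c"))

# Move "c" to the front of the levels
f_moved <- fct_relevel(f, "c")

levels(f_moved)
as.integer(f_moved)
```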
283 | 
284 | </div>
285 | 
286 | ## Modifying factor levels {#modifying-factor-levels .r4ds-section}
287 | 
288 | ### Exercise 15.5.1 {.unnumbered .exercise data-number="15.5.1"}
289 | 
290 | <div class="question">
291 | How have the proportions of people identifying as Democrat, Republican, and Independent changed over time?
292 | </div>
293 | 
294 | <div class="answer">
295 | 
296 | To answer that, we need to combine the multiple levels into Democrat, Republican, and Independent.
297 | ```{r}
298 | levels(gss_cat$partyid)
299 | ```
300 | 
301 | ```{r}
302 | gss_cat %>%
303 |   mutate(
304 |     partyid =
305 |       fct_collapse(partyid,
306 |         other = c("No answer", "Don't know", "Other party"),
307 |         rep = c("Strong republican", "Not str republican"),
308 |         ind = c("Ind,near rep", "Independent", "Ind,near dem"),
309 |         dem = c("Not str democrat", "Strong democrat")
310 |       )
311 |   ) %>%
312 |   count(year, partyid) %>%
313 |   group_by(year) %>%
314 |   mutate(p = n / sum(n)) %>%
315 |   ggplot(aes(
316 |     x = year, y = p,
317 |     colour = fct_reorder2(partyid, year, p)
318 |   )) +
319 |   geom_point() +
320 |   geom_line() +
321 |   labs(colour = "Party ID.")
322 | ```
323 | 
324 | </div>
325 | 
326 | ### Exercise 15.5.2 {.unnumbered .exercise data-number="15.5.2"}
327 | 
328 | <div class="question">
329 | How could you collapse `rincome` into a small set of categories?
330 | </div>
331 | 
332 | <div class="answer">
333 | 
334 | Group all the non-responses into one category, and then group other categories into a smaller number. Since there is a clear ordering, we would not use `fct_lump()`.
335 | ```{r}
336 | levels(gss_cat$rincome)
337 | ```
338 | 
339 | ```{r}
340 | library("stringr")
341 | gss_cat %>%
342 |   mutate(
343 |     rincome =
344 |       fct_collapse(
345 |         rincome,
346 |         `Unknown` = c("No answer", "Don't know", "Refused", "Not applicable"),
347 |         `Lt $5000` = c("Lt $1000", str_c(
348 |           "$", c("1000", "3000", "4000"),
349 |           " to ", c("2999", "3999", "4999")
350 |         )),
351 |         `$5000 to 10000` = str_c(
352 |           "$", c("5000", "6000", "7000", "8000"),
353 |           " to ", c("5999", "6999", "7999", "9999")
354 |         )
355 |       )
356 |   ) %>%
357 |   ggplot(aes(x = rincome)) +
358 |   geom_bar() +
359 |   coord_flip()
360 | ```
361 | </div>
362 | 


--------------------------------------------------------------------------------
/graphics-for-communication.Rmd:
--------------------------------------------------------------------------------
  1 | # Graphics for communication {#graphics-for-communication .r4ds-section}
  2 | 
  3 | ## Introduction {#introduction-19 .r4ds-section}
  4 | 
  5 | ```{r setup,message=FALSE,cache=FALSE}
  6 | library("tidyverse")
  7 | library("modelr")
  8 | library("lubridate")
  9 | ```
 10 | 
 11 | ## Label {#label .r4ds-section}
 12 | 
 13 | ### Exercise 28.2.1 {.unnumbered .exercise data-number="28.2.1"}
 14 | 
 15 | <div class="question">
 16 | Create one plot on the fuel economy data with customized `title`,
 17 | `subtitle`, `caption`, `x`, `y`, and `colour` labels.
 18 | </div>
 19 | 
 20 | <div class="answer">
 21 | 
 22 | ```{r}
 23 | ggplot(
 24 |   data = mpg,
 25 |   mapping = aes(x = fct_reorder(class, hwy), y = hwy)
 26 | ) +
 27 |   geom_boxplot() +
 28 |   coord_flip() +
 29 |   labs(
 30 |     title = "Compact Cars have > 10 Hwy MPG than Pickup Trucks",
 31 |     subtitle = "Comparing the median highway mpg in each class",
 32 |     caption = "Data from fueleconomy.gov",
 33 |     x = "Car Class",
 34 |     y = "Highway Miles per Gallon"
 35 |   )
 36 | ```
 37 | 
 38 | </div>
 39 | 
 40 | ### Exercise 28.2.2 {.unnumbered .exercise data-number="28.2.2"}
 41 | 
 42 | <div class="question">
 43 | The `geom_smooth()` is somewhat misleading because the `hwy` for large engines is skewed upwards due to the inclusion of lightweight sports cars with big engines.
 44 | Use your modeling tools to fit and display a better model.
 45 | </div>
 46 | 
 47 | <div class="answer">
 48 | 
 49 | First, I'll plot the relationship between fuel efficiency and engine size (displacement) using all cars.
 50 | The plot shows a strong negative relationship.
 51 | ```{r}
 52 | ggplot(mpg, aes(displ, hwy)) +
 53 |   geom_point() +
 54 |   geom_smooth(method = "lm", se = FALSE) +
 55 |   labs(
 56 |     title = "Fuel Efficiency Decreases with Engine Size",
 57 |     caption = "Data from fueleconomy.gov",
 58 |     y = "Highway Miles per Gallon",
 59 |     x = "Engine Displacement"
 60 |   )
 61 | ```
 62 | 
 63 | However, if I disaggregate by car class, and plot the relationship between 
 64 | fuel efficiency and engine displacement within each class, I see a different
 65 | relationship.
 66 | 
 67 | 1.  For all car classes except subcompact cars, there is no relationship or only
 68 |     a small negative relationship between fuel efficiency and engine size.
 69 | 
 70 | 1.  For subcompact cars, there is a strong negative relationship between fuel
 71 |     efficiency and engine size. As the question noted, this is because the 
 72 |     subcompact car class includes both small cheap cars, and sports cars with
 73 |     large engines.
 74 | 
 75 | ```{r}
 76 | ggplot(mpg, aes(displ, hwy, colour = class)) +
 77 |   geom_point() +
 78 |   geom_smooth(method = "lm", se = FALSE) +
 79 |   labs(
 80 |     title = "Fuel Efficiency Mostly Varies by Car Class",
 81 |     subtitle = "Subcompact car fuel efficiency varies by engine size",
 82 |     caption = "Data from fueleconomy.gov",
 83 |     y = "Highway Miles per Gallon",
 84 |     x = "Engine Displacement"
 85 |   )
 86 | ```
 87 | 
 88 | Another way to model and visualize the relationship between fuel efficiency
 89 | and engine displacement after accounting for car class is to regress 
 90 | fuel efficiency on car class, and plot the residuals of that regression against
 91 | engine displacement.
 92 | The residuals of the first regression are the variation in fuel efficiency
 93 | not explained by car class.
 94 | The relationship between fuel efficiency and engine displacement is attenuated
 95 | after accounting for car class.
 96 | 
 97 | ```{r}
 98 | mod <- lm(hwy ~ class, data = mpg)
 99 | mpg %>%
100 |   add_residuals(mod) %>%
101 |   ggplot(aes(x = displ, y = resid)) +
102 |   geom_point() +
103 |   geom_smooth(method = "lm", se = FALSE) +
104 |   labs(
105 |     title = "Engine size has little effect on fuel efficiency",
106 |     subtitle = "After accounting for car class",
107 |     caption = "Data from fueleconomy.gov",
108 |     y = "Highway MPG Relative to Class Average",
109 |     x = "Engine Displacement"
110 |   )
111 | ```
112 | 
113 | </div>
114 | 
115 | ### Exercise 28.2.3 {.unnumbered .exercise data-number="28.2.3"}
116 | 
117 | <div class="question">
118 | Take an exploratory graphic that you've created in the last month, and add informative titles to make it easier for others to understand.
119 | </div>
120 | 
121 | <div class="answer">
122 | 
123 | By its very nature, this exercise is left to readers.
124 | 
125 | </div>
126 | 
127 | ## Annotations {#annotations .r4ds-section}
128 | 
129 | ### Exercise 28.3.1 {.unnumbered .exercise data-number="28.3.1"}
130 | 
131 | <div class="question">
132 | Use `geom_text()` with infinite positions to place text at the four corners of the plot.
133 | </div>
134 | 
135 | <div class="answer">
136 | 
137 | I can use similar code as the example in the text.
138 | However, I need to use `vjust` and `hjust` in order for the text to appear in the plot, and these need to be different for each corner.
139 | Since `geom_text()` takes `hjust` and `vjust` as aesthetics, I can add them to the data and mappings, and use a single `geom_text()` call instead of four different `geom_text()` calls with four different data arguments and four different values of the `hjust` and `vjust` arguments.
140 | ```{r}
141 | label <- tribble(
142 |   ~displ, ~hwy, ~label, ~vjust, ~hjust,
143 |   Inf, Inf, "Top right", "top", "right",
144 |   Inf, -Inf, "Bottom right", "bottom", "right",
145 |   -Inf, Inf, "Top left", "top", "left",
146 |   -Inf, -Inf, "Bottom left", "bottom", "left"
147 | )
148 | 
149 | ggplot(mpg, aes(displ, hwy)) +
150 |   geom_point() +
151 |   geom_text(aes(label = label, vjust = vjust, hjust = hjust), data = label)
152 | ```
153 | 
154 | </div>
155 | 
156 | ### Exercise 28.3.2 {.unnumbered .exercise data-number="28.3.2"}
157 | 
158 | <div class="question">
159 | Read the documentation for `annotate()`. How can you use it to add a text label to a plot without having to create a tibble?
160 | </div>
161 | 
162 | <div class="answer">
163 | 
164 | With `annotate()`, you pass what would be aesthetic mappings directly as arguments:
165 | ```{r}
166 | ggplot(mpg, aes(displ, hwy)) +
167 |   geom_point() +
168 |   annotate("text",
169 |     x = Inf, y = Inf,
170 |     label = "Increasing engine size is \nrelated to decreasing fuel economy.", vjust = "top", hjust = "right"
171 |   )
172 | ```
173 | 
174 | </div>
175 | 
176 | ### Exercise 28.3.3 {.unnumbered .exercise data-number="28.3.3"}
177 | 
178 | <div class="question">
179 | How do labels with `geom_text()` interact with faceting?
180 | How can you add a label to a single facet?
181 | How can you put a different label in each facet?
182 | (Hint: think about the underlying data.)
183 | </div>
184 | 
185 | <div class="answer">
186 | 
187 | If the facet variable is not specified, the text is drawn in all facets.
188 | ```{r}
189 | label <- tibble(
190 |   displ = Inf,
191 |   hwy = Inf,
192 |   label = "Increasing engine size is \nrelated to decreasing fuel economy."
193 | )
194 | 
195 | ggplot(mpg, aes(displ, hwy)) +
196 |   geom_point() +
197 |   geom_text(aes(label = label),
198 |     data = label, vjust = "top", hjust = "right",
199 |     size = 2
200 |   ) +
201 |   facet_wrap(~class)
202 | ```
203 | 
204 | To draw the label in only one facet, add a column to the label data frame with the value of the faceting variable(s) in which to draw it.
205 | ```{r}
206 | label <- tibble(
207 |   displ = Inf,
208 |   hwy = Inf,
209 |   class = "2seater",
210 |   label = "Increasing engine size is \nrelated to decreasing fuel economy."
211 | )
212 | 
213 | ggplot(mpg, aes(displ, hwy)) +
214 |   geom_point() +
215 |   geom_text(aes(label = label),
216 |     data = label, vjust = "top", hjust = "right",
217 |     size = 2
218 |   ) +
219 |   facet_wrap(~class)
220 | ```
221 | 
222 | To draw a different label in each facet, include one row for each value of the faceting variable(s) in the label data frame:
223 | ```{r}
224 | label <- tibble(
225 |   displ = Inf,
226 |   hwy = Inf,
227 |   class = unique(mpg$class),
228 |   label = str_c("Label for ", class)
229 | )
230 | 
231 | ggplot(mpg, aes(displ, hwy)) +
232 |   geom_point() +
233 |   geom_text(aes(label = label),
234 |     data = label, vjust = "top", hjust = "right",
235 |     size = 3
236 |   ) +
237 |   facet_wrap(~class)
238 | ```
239 | 
240 | </div>
241 | 
242 | ### Exercise 28.3.4 {.unnumbered .exercise data-number="28.3.4"}
243 | 
244 | <div class="question">
245 | What arguments to `geom_label()` control the appearance of the background box?
246 | </div>
247 | 
248 | <div class="answer">
249 | 
250 | -   `label.padding`: padding around label
251 | -   `label.r`: amount of rounding in the corners
252 | -   `label.size`: size of label border
253 | 
254 | </div>
255 | 
256 | ### Exercise 28.3.5 {.unnumbered .exercise data-number="28.3.5"}
257 | 
258 | <div class="question">
259 | What are the four arguments to `arrow()`? How do they work?
260 | Create a series of plots that demonstrate the most important options.
261 | </div>
262 | 
263 | <div class="answer">
264 | 
265 | The four arguments are (from the help for `arrow()`):
266 | 
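267 | -   `angle`: angle of the arrow head in degrees (smaller values give a narrower head)
268 | -   `length`: length of the arrow head, given as a `unit()`
269 | -   `ends`: which ends of the line get an arrow head: `"first"`, `"last"`, or `"both"`
270 | -   `type`: `"open"` or `"closed"`, whether the arrow head is an open or a closed triangle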
271 | 
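The exercise also asks for plots that demonstrate these options. The following is a minimal sketch rather than a definitive answer: the helper `arrow_plot()` and the list `plots` are names invented here for illustration, and the code assumes ggplot2 is installed (it re-exports `arrow()` and `unit()` from grid). Each plot draws the same segment and changes one argument at a time.

```r
library(ggplot2)

# Hypothetical helper: draw one segment from (0, 0) to (1, 1) and
# vary only the arrow() specification passed to it.
arrow_plot <- function(arrow_spec, title) {
  ggplot() +
    annotate("segment",
      x = 0, y = 0, xend = 1, yend = 1,
      arrow = arrow_spec
    ) +
    ggtitle(title)
}

plots <- list(
  arrow_plot(arrow(), "Defaults: open head at the last end"),
  arrow_plot(arrow(angle = 60), "angle = 60: wider head"),
  arrow_plot(arrow(length = unit(0.5, "inches")), "length = 0.5 in: longer head"),
  arrow_plot(arrow(ends = "both"), "ends = 'both': a head at each end"),
  arrow_plot(arrow(type = "closed"), "type = 'closed': closed triangle")
)

# Print each plot in turn.
for (p in plots) print(p)
```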
272 | </div>
273 | 
274 | ## Scales {#scales .r4ds-section}
275 | 
276 | ### Exercise 28.4.1 {.unnumbered .exercise data-number="28.4.1"}
277 | 
278 | <div class="question">
279 | Why doesn’t the following code override the default scale?
280 | </div>
281 | 
282 | <div class="answer">
283 | 
284 | ```{r}
285 | df <- tibble(
286 |   x = rnorm(10000),
287 |   y = rnorm(10000)
288 | )
289 | ggplot(df, aes(x, y)) +
290 |   geom_hex() +
291 |   scale_colour_gradient(low = "white", high = "red") +
292 |   coord_fixed()
293 | ```
294 | 
295 | It does not override the default scale because the colors in `geom_hex()` are set by the `fill` aesthetic, not the `color` aesthetic.
296 | 
297 | ```{r}
298 | ggplot(df, aes(x, y)) +
299 |   geom_hex() +
300 |   scale_fill_gradient(low = "white", high = "red") +
301 |   coord_fixed()
302 | ```
303 | 
304 | </div>
305 | 
306 | ### Exercise 28.4.2 {.unnumbered .exercise data-number="28.4.2"}
307 | 
308 | <div class="question">
309 | What is the first argument to every scale? How does it compare to `labs()`?
310 | </div>
311 | 
312 | <div class="answer">
313 | 
314 | The first argument to every scale is the label for the scale. Setting it is equivalent to using the `labs()` function.
315 | ```{r}
316 | ggplot(mpg, aes(displ, hwy)) +
317 |   geom_point(aes(colour = class)) +
318 |   geom_smooth(se = FALSE) +
319 |   labs(
320 |     x = "Engine displacement (L)",
321 |     y = "Highway fuel economy (mpg)",
322 |     colour = "Car type"
323 |   )
324 | ```
325 | 
326 | ```{r}
327 | ggplot(mpg, aes(displ, hwy)) +
328 |   geom_point(aes(colour = class)) +
329 |   geom_smooth(se = FALSE) +
330 |   scale_x_continuous("Engine displacement (L)") +
331 |   scale_y_continuous("Highway fuel economy (mpg)") +
332 |   scale_colour_discrete("Car type")
333 | ```
334 | 
335 | </div>
336 | 
337 | ### Exercise 28.4.3 {.unnumbered .exercise data-number="28.4.3"}
338 | 
339 | <div class="question">
340 | Change the display of the presidential terms by:
341 | 
342 | 1.  Combining the two variants shown above.
343 | 1.  Improving the display of the y axis.
344 | 1.  Labeling each term with the name of the president.
345 | 1.  Adding informative plot labels.
346 | 1.  Placing breaks every 4 years (this is trickier than it seems!).
347 | 
348 | </div>
349 | 
350 | <div class="answer">
351 | 
352 | ```{r}
353 | fouryears <- lubridate::make_date(seq(year(min(presidential$start)),
354 |   year(max(presidential$end)),
355 |   by = 4
356 | ), 1, 1)
357 | 
358 | presidential %>%
359 |   mutate(
360 |     id = 33 + row_number(),
361 |     name_id = fct_inorder(str_c(name, " (", id, ")"))
362 |   ) %>%
363 |   ggplot(aes(start, name_id, colour = party)) +
364 |   geom_point() +
365 |   geom_segment(aes(xend = end, yend = name_id)) +
366 |   scale_colour_manual("Party", values = c(Republican = "red", Democratic = "blue")) +
367 |   scale_y_discrete(NULL) +
368 |   scale_x_date(NULL,
369 |     breaks = presidential$start, date_labels = "'%y",
370 |     minor_breaks = fouryears
371 |   ) +
372 |   ggtitle("Terms of US Presidents",
373 |     subtitle = "Eisenhower (34th) to Obama (44th)"
374 |   ) +
375 |   theme(
376 |     panel.grid.minor = element_blank(),
377 |     axis.ticks.y = element_blank()
378 |   )
379 | ```
380 | 
381 | To include both the start dates of presidential terms and every
382 | four years, I use different levels of emphasis. 
383 | The presidential term start years are used as major breaks with thicker lines and x-axis labels.
384 | Every fourth year is indicated with a minor break, which uses thinner lines to distinguish it from presidential term start years and to avoid cluttering the plot.
385 | 
386 | </div>
387 | 
388 | ### Exercise 28.4.4 {.unnumbered .exercise data-number="28.4.4"}
389 | 
390 | <div class="question">
391 | Use `override.aes` to make the legend on the following plot easier to see.
392 | </div>
393 | 
394 | <div class="answer">
395 | 
396 | ```{r}
397 | ggplot(diamonds, aes(carat, price)) +
398 |   geom_point(aes(colour = cut), alpha = 1 / 20)
399 | ```
400 | 
401 | The problem with the legend is that the low `alpha` value makes the colors hard to see. So I'll override the alpha value to make the points solid in the legend.
402 | ```{r}
403 | ggplot(diamonds, aes(carat, price)) +
404 |   geom_point(aes(colour = cut), alpha = 1 / 20) +
405 |   theme(legend.position = "bottom") +
406 |   guides(colour = guide_legend(nrow = 1, override.aes = list(alpha = 1)))
407 | ```
408 | 
409 | </div>
410 | 
411 | ## Zooming {#zooming .r4ds-section}
412 | 
413 | `r no_exercises()`
414 | 
415 | ## Themes {#themes .r4ds-section}
416 | 
417 | `r no_exercises()`
418 | 
419 | ## Saving your plots {#saving-your-plots .r4ds-section}
420 | 
421 | `r no_exercises()`
422 | 
423 | ## Learning more {#learning-more-4 .r4ds-section}
424 | 
425 | `r no_exercises()`
426 | 


--------------------------------------------------------------------------------
/img/cover.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/img/cover.png


--------------------------------------------------------------------------------
/img/r4ds-exercise-solutions-cover.key:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/img/r4ds-exercise-solutions-cover.key


--------------------------------------------------------------------------------
/img/r4ds-exercise-solutions-cover.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/img/r4ds-exercise-solutions-cover.png


--------------------------------------------------------------------------------
/img/rmarkdown-file.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/img/rmarkdown-file.png


--------------------------------------------------------------------------------
/img/rmarkdown-knit-button.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/img/rmarkdown-knit-button.png


--------------------------------------------------------------------------------
/img/rmarkdown-notebook.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/img/rmarkdown-notebook.png


--------------------------------------------------------------------------------
/img/visualize/unnamed-chunk-29-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/img/visualize/unnamed-chunk-29-1.png


--------------------------------------------------------------------------------
/img/visualize/unnamed-chunk-29-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/img/visualize/unnamed-chunk-29-2.png


--------------------------------------------------------------------------------
/img/visualize/unnamed-chunk-29-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/img/visualize/unnamed-chunk-29-3.png


--------------------------------------------------------------------------------
/img/visualize/unnamed-chunk-29-4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/img/visualize/unnamed-chunk-29-4.png


--------------------------------------------------------------------------------
/img/visualize/unnamed-chunk-29-5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/img/visualize/unnamed-chunk-29-5.png


--------------------------------------------------------------------------------
/img/visualize/unnamed-chunk-29-6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jrnold/r4ds-exercise-solutions/e5605c5d032bcab5780766e23cd230118ffb44ba/img/visualize/unnamed-chunk-29-6.png


--------------------------------------------------------------------------------
/import.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | output: html_document
  3 | editor_options:
  4 |   chunk_output_type: console
  5 | ---
  6 | # Data import {#data-import .r4ds-section}
  7 | 
  8 | ## Introduction {#introduction-5 .r4ds-section}
  9 | 
 10 | ```{r results='hide',message=FALSE,cache=FALSE}
 11 | library("tidyverse")
 12 | ```
 13 | 
 14 | ## Getting started {#getting-started .r4ds-section}
 15 | 
 16 | ### Exercise 11.2.1 {.unnumbered .exercise data-number="11.2.1"}
 17 | 
 18 | <div class="question">
 19 | What function would you use to read a file where fields were separated with “|”?
 20 | </div>
 21 | 
 22 | <div class="answer">
 23 | 
 24 | Use the `read_delim()` function with the argument `delim="|"`.
 25 | ```{r eval=FALSE}
 26 | read_delim(file, delim = "|")
 27 | ```
 28 | 
 29 | </div>
 30 | 
 31 | ### Exercise 11.2.2 {.unnumbered .exercise data-number="11.2.2"}
 32 | 
 33 | <div class="question">
 34 | Apart from `file`, `skip`, and `comment`, what other arguments do `read_csv()` and `read_tsv()` have in common?
 35 | </div>
 36 | 
 37 | <div class="answer">
 38 | 
 39 | They have the following arguments in common:
 40 | ```{r}
 41 | intersect(names(formals(read_csv)), names(formals(read_tsv)))
 42 | ```
 43 | 
 44 | -   `col_names` and `col_types` are used to specify the column names and how to parse the columns
 45 | -   `locale` is important for determining things like the encoding and whether "." or "," is used as a decimal mark.
 46 | -   `na` and `quoted_na` control which strings are treated as missing values when parsing vectors
 47 | -   `trim_ws` trims whitespace before and after cells before parsing
 48 | -   `n_max` sets how many rows to read
 49 | -   `guess_max` sets how many rows to use when guessing the column type
 50 | -   `progress` determines whether a progress bar is shown.
 51 | 
 52 | In fact, the two functions have the exact same arguments:
 53 | ```{r}
 54 | identical(names(formals(read_csv)), names(formals(read_tsv)))
 55 | ```
 56 | 
 57 | </div>
 58 | 
 59 | ### Exercise 11.2.3 {.unnumbered .exercise data-number="11.2.3"}
 60 | 
 61 | <div class="question">
 62 | What are the most important arguments to `read_fwf()`?
 63 | </div>
 64 | 
 65 | <div class="answer">
 66 | 
 67 | The most important argument to `read_fwf()`, which reads fixed-width formats, is `col_positions`, which tells the function where data columns begin and end.
 68 | 
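As a rough illustration of how `col_positions` is usually supplied, the sketch below builds it with `fwf_widths()`. The inline data and the column names are made up for this example, and wrapping the string in `I()` to mark it as literal data assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical fixed-width data: name (5 characters), state (2), code (3).
fwf_text <- "aliceWA418\nbob  CA319\n"

people <- read_fwf(
  I(fwf_text),
  col_positions = fwf_widths(c(5, 2, 3), col_names = c("name", "state", "code"))
)
people
```

The same layout could be described with `fwf_positions(start = c(1, 6, 8), end = c(5, 7, 10), col_names = c("name", "state", "code"))` when start and end positions are easier to state than widths.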
 69 | </div>
 70 | 
 71 | ### Exercise 11.2.4 {.unnumbered .exercise data-number="11.2.4"}
 72 | 
 73 | <div class="question">
 74 | Sometimes strings in a CSV file contain commas.
 75 | To prevent them from causing problems they need to be surrounded by a quoting character, like `"` or `'`.
 76 | By convention, `read_csv()` assumes that the quoting character will be `"`, and if you want to change it you’ll need to use `read_delim()` instead.
 77 | What arguments do you need to specify to read the following text into a data frame?
 78 | 
 79 | ```
 80 | "x,y\n1,'a,b'"
 81 | ```
 82 | 
 83 | </div>
 84 | 
 85 | <div class="answer">
 86 | 
 87 | For `read_delim()`, we will need to specify a delimiter, in this case `","`, and a quote argument.
 88 | ```{r}
 89 | x <- "x,y\n1,'a,b'"
 90 | read_delim(x, ",", quote = "'")
 91 | ```
 92 | 
 93 | However, this question is out of date. `read_csv()` now supports a quote argument, so the following code works.
 94 | ```{r}
 95 | read_csv(x, quote = "'")
 96 | ```
 97 | 
 98 | </div>
 99 | 
100 | ### Exercise 11.2.5 {.unnumbered .exercise data-number="11.2.5"}
101 | 
102 | <div class="question">
103 | Identify what is wrong with each of the following inline CSV files.
104 | What happens when you run the code?
105 | </div>
106 | 
107 | <div class="answer">
108 | 
109 | ```{r}
110 | read_csv("a,b\n1,2,3\n4,5,6")
111 | ```
112 | 
113 | Only two columns are specified in the header, `a` and `b`, but the rows have three columns, so the last column is dropped.
114 | 
115 | ```{r}
116 | read_csv("a,b,c\n1,2\n1,2,3,4")
117 | ```
118 | 
119 | The numbers of columns in the data do not match the number of columns in the header (three).
120 | In row one, there are only two values, so column `c` is set to missing.
121 | In row two, there is an extra value, and that value is dropped.
122 | 
123 | ```{r}
124 | read_csv("a,b\n\"1")
125 | ```
126 | It's not clear what the intent was here.
127 | The opening quote `"1` is dropped because it is not closed, and `a` is treated as an integer.
128 | 
129 | ```{r}
130 | read_csv("a,b\n1,2\na,b")
131 | ```
132 | Both "a" and "b" are treated as character vectors since they contain non-numeric strings.
133 | This may have been intentional, or the author may have intended the values of the columns to be "1,2" and "a,b".
134 | 
135 | ```{r}
136 | read_csv("a;b\n1;3")
137 | ```
138 | 
139 | The values are separated by ";" rather than ",". Use `read_csv2()` instead:
140 | ```{r}
141 | read_csv2("a;b\n1;3")
142 | ```
143 | 
144 | </div>
145 | 
146 | ## Parsing a vector {#parsing-a-vector .r4ds-section}
147 | 
148 | ### Exercise 11.3.1 {.unnumbered .exercise data-number="11.3.1"}
149 | 
150 | <div class="question">
151 | What are the most important arguments to `locale()`?
152 | </div>
153 | 
154 | <div class="answer">
155 | 
156 | The locale object has arguments to set the following:
157 | 
158 | -   date and time formats: `date_names`, `date_format`, and `time_format`
159 | -   time zone: `tz`
160 | -   numbers: `decimal_mark`, `grouping_mark`
161 | -   encoding: `encoding`
162 | 
163 | </div>
164 | 
165 | ### Exercise 11.3.2 {.unnumbered .exercise data-number="11.3.2"}
166 | 
167 | <div class="question">
168 | What happens if you try and set `decimal_mark` and `grouping_mark` to the same character?
169 | What happens to the default value of `grouping_mark` when you set `decimal_mark` to `","`?
170 | What happens to the default value of `decimal_mark` when you set the `grouping_mark` to `"."`?
171 | </div>
172 | 
173 | <div class="answer">
174 | 
175 | If the decimal and grouping marks are set to the same character, `locale` throws an error:
176 | ```{r error=TRUE}
177 | locale(decimal_mark = ".", grouping_mark = ".")
178 | ```
179 | 
180 | If the `decimal_mark` is set to the comma `","`, then the grouping mark is set to the period `"."`:
181 | ```{r}
182 | locale(decimal_mark = ",")
183 | ```
184 | 
185 | If the grouping mark is set to a period, then the decimal mark is set to a comma:
186 | ```{r}
187 | locale(grouping_mark = ".")
188 | ```
189 | 
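To see what these defaults mean in practice, here is a small sketch that parses numbers with each locale. The input strings and the variable names (`euro`, `small`, `large`, `grouped`) are chosen only for illustration.

```r
library(readr)

# With "," as the decimal mark, "." becomes the default grouping mark.
euro <- locale(decimal_mark = ",")
small <- parse_double("1,23", locale = euro)
large <- parse_number("1.234,56", locale = euro)

# Setting only the grouping mark to "." makes "," the default decimal mark,
# so the periods below are read as thousands separators.
grouped <- parse_number("123.456.789", locale = locale(grouping_mark = "."))

c(small, large, grouped)
```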
190 | </div>
191 | 
192 | ### Exercise 11.3.3 {.unnumbered .exercise data-number="11.3.3"}
193 | 
194 | <div class="question">
195 | I didn’t discuss the `date_format` and `time_format` options to `locale()`.
196 | What do they do?
197 | Construct an example that shows when they might be useful.
198 | </div>
199 | 
200 | <div class="answer">
201 | 
202 | They provide default date and time formats.
203 | The [readr vignette](https://cran.r-project.org/web/packages/readr/vignettes/locales.html) discusses using these to parse dates, since dates can include language-specific weekday and month names and different conventions for specifying AM/PM.
204 | ```{r}
205 | locale()
206 | ```
207 | 
208 | Examples from the readr vignette of parsing French dates:
209 | ```{r}
210 | parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
211 | parse_date("14 oct. 1979", "%d %b %Y", locale = locale("fr"))
212 | ```
213 | 
214 | Both the date format and time format are used for guessing column types.
215 | Thus if you were often parsing data that had non-standard formats for the date and time, you could specify custom values for `date_format` and `time_format`.
216 | ```{r}
217 | locale_custom <- locale(date_format = "Day %d Mon %m Year %y",
218 |                  time_format = "Sec %S Min %M Hour %H")
219 | date_custom <- c("Day 01 Mon 02 Year 03", "Day 03 Mon 01 Year 01")
220 | parse_date(date_custom)
221 | parse_date(date_custom, locale = locale_custom)
222 | time_custom <- c("Sec 01 Min 02 Hour 03", "Sec 03 Min 02 Hour 01")
223 | parse_time(time_custom)
224 | parse_time(time_custom, locale = locale_custom)
225 | ```
226 | 
227 | </div>
228 | 
229 | ### Exercise 11.3.4 {.unnumbered .exercise data-number="11.3.4"}
230 | 
231 | <div class="question">
232 | If you live outside the US, create a new locale object that encapsulates the settings for the types of file you read most commonly.
233 | </div>
234 | 
235 | <div class="answer">
236 | 
237 | Read the help page for `locale()` using `?locale` to learn about the different variables that can be set.
238 | 
239 | As an example, consider Australia.
240 | Most of the default values are valid, except that the date format is "(d)d/mm/yyyy", meaning that January 2, 2006 is written as `02/01/2006`.
241 | 
242 | However, the default locale will not parse that date as January 2, 2006, because its default date format (`%AD`) expects the year first.
243 | 
244 | ```{r}
245 | parse_date("02/01/2006")
246 | ```
247 | 
248 | To correctly parse Australian dates, define a new `locale` object.
249 | 
250 | ```{r}
251 | au_locale <- locale(date_format = "%d/%m/%Y")
252 | ```
253 | 
254 | Using `parse_date()` with the `au_locale` as its locale will correctly parse our example date.
255 | 
256 | ```{r}
257 | parse_date("02/01/2006", locale = au_locale)
258 | ```
259 | 
260 | </div>
261 | 
262 | ### Exercise 11.3.5 {.unnumbered .exercise data-number="11.3.5"}
263 | 
264 | <div class="question">
265 | What’s the difference between `read_csv()` and `read_csv2()`?
266 | </div>
267 | 
268 | <div class="answer">
269 | 
270 | The delimiter. The function `read_csv()` uses a comma, while `read_csv2()` uses a semicolon (`;`). Using a semicolon is useful when commas are used as the decimal mark, as in many European countries.
271 | 
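A short sketch of the difference, using made-up inline data (the name `measurements` is hypothetical, and `I()` assumes readr 2.0 or later): `read_csv2()` splits fields on semicolons and treats the comma as the decimal mark.

```r
library(readr)

# Hypothetical semicolon-delimited data with decimal commas.
csv2_text <- "x;y\n1,5;2,25\n3,0;4,75\n"

# read_csv2() splits on ";" and reads "1,5" as the number 1.5.
measurements <- read_csv2(I(csv2_text))
measurements
```

Passing the same text to `read_csv()` would instead split each line at the commas, which is why the two functions are not interchangeable.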
272 | </div>
273 | 
274 | ### Exercise 11.3.6 {.unnumbered .exercise data-number="11.3.6"}
275 | 
276 | <div class="question">
277 | What are the most common encodings used in Europe?
278 | What are the most common encodings used in Asia?
279 | Do some googling to find out.
280 | </div>
281 | 
282 | <div class="answer">
283 | 
284 | UTF-8 is standard now, and ASCII has been around forever.
285 | 
286 | For the European languages, there are separate encodings for Romance languages and Eastern European languages using Latin script, Cyrillic, Greek, Hebrew, Turkish: usually with separate ISO and Windows encoding standards.
287 | There is also Mac OS Roman.
288 | 
289 | For Asian languages, Arabic and Vietnamese have ISO and Windows standards. The other major Asian scripts have their own:
290 | 
291 | -   Japanese: JIS X 0208, Shift JIS, ISO-2022-JP
292 | -   Chinese: GB 2312, GBK, GB 18030
293 | -   Korean: KS X 1001, EUC-KR, ISO-2022-KR
294 | 
295 | The list in the documentation for `stringi::stri_enc_detect()` is a good list of encodings since it supports the most common encodings.
296 | 
297 | -   Western European Latin script languages: ISO-8859-1, Windows-1252 (also known as CP-1252, for "code page")
298 | -   Eastern European Latin script languages: ISO-8859-2, Windows-1250
299 | -   Greek: ISO-8859-7
300 | -   Turkish: ISO-8859-9, Windows-1254
301 | -   Hebrew: ISO-8859-8, IBM424, Windows-1255
302 | -   Russian: Windows-1251
303 | -   Japanese: Shift JIS, ISO-2022-JP, EUC-JP
304 | -   Korean: ISO-2022-KR, EUC-KR
305 | -   Chinese: GB18030, ISO-2022-CN (Simplified), Big5 (Traditional)
306 | -   Arabic: ISO-8859-6, IBM420, Windows-1256
307 | 
308 | For more information on character encodings see the following sources.
309 | 
310 | -   The Wikipedia page [Character encoding](https://en.wikipedia.org/wiki/Character_encoding) has a good list of encodings.
311 | -   Unicode [CLDR](http://cldr.unicode.org/) project
312 | -   [What is the most common encoding of each language](https://stackoverflow.com/questions/8509339/what-is-the-most-common-encoding-of-each-language) (Stack Overflow)
313 | -   "What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text", <http://kunststube.net/encoding/>.
314 | 
315 | Programs that identify the encoding of text include:
316 | 
317 | -   `readr::guess_encoding()`
318 | -   `stringi::stri_enc_detect()`
319 | -   [iconv](https://en.wikipedia.org/wiki/Iconv)
320 | -   [chardet](https://github.com/chardet/chardet) (Python)
321 | 
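As a minimal sketch of the first tool in that list, the code below guesses the encoding of a Latin-1 string (the same "El Niño" example used in *R for Data Science*) and then parses it with that encoding. The variable names are illustrative, and the guess should be treated as a hint rather than a guarantee, especially for short inputs.

```r
library(readr)

# A string whose bytes are Latin-1 encoded (\xf1 is "ñ" in Latin-1).
x1 <- "El Ni\xf1o was particularly bad this year"

# Returns a table of candidate encodings with a confidence score.
guesses <- guess_encoding(charToRaw(x1))
guesses

# Once an encoding is chosen, pass it through the locale when parsing.
fixed <- parse_character(x1, locale = locale(encoding = "Latin1"))
fixed
```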
322 | </div>
323 | 
324 | ### Exercise 11.3.7 {.unnumbered .exercise data-number="11.3.7"}
325 | 
326 | <div class="question">
327 | Generate the correct format string to parse each of the following dates and times:
328 | </div>
329 | 
330 | <div class="answer">
331 | 
332 | ```{r}
333 | d1 <- "January 1, 2010"
334 | d2 <- "2015-Mar-07"
335 | d3 <- "06-Jun-2017"
336 | d4 <- c("August 19 (2015)", "July 1 (2015)")
337 | d5 <- "12/30/14" # Dec 30, 2014
338 | t1 <- "1705"
339 | t2 <- "11:15:10.12 PM"
340 | ```
341 | 
342 | The correct formats are:
343 | ```{r}
344 | parse_date(d1, "%B %d, %Y")
345 | parse_date(d2, "%Y-%b-%d")
346 | parse_date(d3, "%d-%b-%Y")
347 | parse_date(d4, "%B %d (%Y)")
348 | parse_date(d5, "%m/%d/%y")
349 | parse_time(t1, "%H%M")
350 | ```
351 | The time `t2` uses real seconds, so the seconds are parsed with `%OS`:
352 | ```{r}
353 | parse_time(t2, "%H:%M:%OS %p")
354 | ```
355 | 
356 | </div>
357 | 
358 | ## Parsing a file {#parsing-a-file .r4ds-section}
359 | 
360 | `r no_exercises()`
361 | 
362 | ## Writing to a file {#writing-to-a-file .r4ds-section}
363 | 
364 | `r no_exercises()`
365 | 
366 | ## Other types of data {#other-types-of-data .r4ds-section}
367 | 
368 | `r no_exercises()`
369 | 


--------------------------------------------------------------------------------
/includes/hypothesis.html:
--------------------------------------------------------------------------------
 1 | <!--
 2 | License: MIT License
 3 | Copyright (c) 2016 Open Review Toolkit & Ben Marwick
 4 | 
 5 | See https://github.com/benmarwick/bookdown-ort/blob/master/hypothesis.html
 6 | -->
 7 | 
 8 | <script async defer src="https://hypothes.is/embed.js"></script>
 9 | <script>window.hypothesisConfig = function () { return {showHighlights: false}; };</script>
10 | 


--------------------------------------------------------------------------------
/includes/ort.css:
--------------------------------------------------------------------------------
 1 | /*
 2 | License: MIT
 3 | Copyright (c) 2016 Open Review Toolkit & Ben Marwick
 4 | 
 5 | Source: https://github.com/benmarwick/bookdown-ort/blob/a240344b05c75dc6f40012d5cc8bfab31e4dc7b3/style.css. With additional editing.
 6 | */
 7 | 
 8 | .h-icon-chevron-left {
 9 |   background: white;
10 |   padding: 3px;
11 |   border: #eee 1px solid;
12 |   color: #666;
13 | }
14 | 
15 | .fa-rotate-315 {
16 |   -webkit-transform: rotate(315deg);
17 |   -moz-transform: rotate(315deg);
18 |   -ms-transform: rotate(315deg);
19 |   -o-transform: rotate(315deg);
20 |   transform: rotate(315deg);
21 | }
22 | 
23 | .rmdreview {
24 |   padding: 1em 1em 1em 5em;
25 |   margin-bottom: 0px;
26 |   background: #f5f5f5 5px center/3em no-repeat;
27 |   position:relative;
28 | }
29 | 
30 | .rmdreview:before {
31 |   content: "\f0e6";
32 |   font-family: FontAwesome;
33 |   left:10px;
34 |   position:absolute;
35 |   top:10px;
36 |   bottom: 0px;
37 |   font-size: 60px;
38 |  }
39 | 


--------------------------------------------------------------------------------
/includes/r4ds-solutions.css:
--------------------------------------------------------------------------------
 1 | .hints-icon {
 2 |     display: table-cell;
 3 |     padding-right: 15px;
 4 |     padding-left: 5px;
 5 | }
 6 | 
 7 | .hints-container {
 8 |     display: table-cell;
 9 | }
10 | 
11 | .noexercises, .question {
12 |   margin: 20px 0;
13 |   padding: 15px 30px 15px 15px;
14 | }
15 | 
16 | .question {
17 |   border-left: 5px solid #eee;
18 | }
19 | 
20 | .question blockquote:first-child {
21 |   margin-top: 0
22 | }
23 | 
24 | .question blockquote:last-child {
25 |   margin-bottom: 0
26 | }
27 | 
28 | /* Anchors and external links for sections */
29 | /* Showing anchors for sections idea comes from the CSS GitHub uses to render READMEs
30 |    See https://github.com/sindresorhus/github-markdown-css */
31 | .section-link {
32 |   font-size: smaller;
33 |   vertical-align: middle;
34 |   /* yeah, important is bad, but I can't figure out how to override this one otherwise */
35 |   color: #a9a9a9 !important;
36 | }
37 | 
38 | .anchor {
39 |   float: left;
40 |   padding-right: 4px;
41 |   margin-left: -20px;
42 | }
43 | 
44 | .r4ds-section-link {
45 |   padding-left: 4px;
46 | }
47 | 
48 | h1 a.section-link,
49 | h2 a.section-link,
50 | h3 a.section-link,
51 | h4 a.section-link,
52 | h5 a.section-link,
53 | h6 a.section-link {
54 |   visibility: hidden;
55 | }
56 | 
57 | h1:hover a.section-link,
58 | h2:hover a.section-link,
59 | h3:hover a.section-link,
60 | h4:hover a.section-link,
61 | h5:hover a.section-link,
62 | h6:hover a.section-link {
63 |   visibility: visible;
64 | }
65 | 
66 | /* space for the anchor links icons on the left */
67 | .book .book-body .page-wrapper .page-inner section {
68 |   margin-left: 20px;
69 | }
70 | 


--------------------------------------------------------------------------------
/includes/r4ds.css:
--------------------------------------------------------------------------------
 1 | .book .book-header {
 2 |   opacity: 1;
 3 |   text-align: left;
 4 | }
 5 | #header .title {
 6 |   margin-bottom: 0;
 7 | }
 8 | #header .author {
 9 |   margin: 0;
10 |   color: #666;
11 | }
12 | #header .author em {
13 |   font-style: normal;
14 | }
15 | 


--------------------------------------------------------------------------------
/index.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | knit: "bookdown::render_book"
  3 | title: "R for Data Science: Exercise Solutions"
  4 | date: >-
  5 |   `r format(Sys.Date(), "%B %d, %Y")`
  6 | author: ["Jeffrey B. Arnold"]
  7 | description: >
  8 |   Solutions to the exercises in 
  9 |   "R for Data Science" by Garrett Grolemund and Hadley Wickham.
 10 | site: bookdown::bookdown_site
 11 | github-repo: "jrnold/r4ds-exercise-solutions"
 12 | url: 'http\://jrnold.github.io/r4ds-exercise-solutions'
 13 | twitter-handle: jrnld
 14 | documentclass: book
 15 | bibliography: ["r4ds.bib"]
 16 | link-citations: true
 17 | biblio-style: apalike
 18 | cover-image: /img/r4ds-exercise-solutions-cover.png
 19 | ---
 20 | 
 21 | ```{r include=FALSE,cache=FALSE,purl=FALSE}
 22 | # don't cache anything on this page
 23 | knitr::opts_chunk$set(cache = FALSE)
 24 | ```
 25 | 
 26 | # Welcome {-}
 27 | 
 28 | <img src="./img/r4ds-exercise-solutions-cover.png" width="250" height="375" alt="Cover image" align="right" style="margin: 0 1em 0 1em"/>
 29 | 
 30 | This book contains the **exercise solutions** for the book [*R for Data Science*](https://amzn.to/2aHLAQ1), by Hadley Wickham and Garrett Grolemund [@WickhamGrolemund2017].
 31 | 
 32 | *R for Data Science* itself is available online at [r4ds.had.co.nz](https://r4ds.had.co.nz/), and a physical copy is published by O'Reilly Media and available from [Amazon](https://amzn.to/2aHLAQ1).
 33 | 
 34 | ## Acknowledgments {-}
 35 | 
 36 | ```{r include=FALSE,purl=FALSE,cache=FALSE}
 37 | library("magrittr")
 38 | # adapted from usethis:::github_repo_spec
 39 | github_repo_spec <- function(path = here::here()) {
 40 |   stringr::str_c(gh::gh_tree_remote(path), collapse = "/")
 41 | }
 42 | 
 43 | # copied from usethis:::parse_repo_spec
 44 | parse_repo_spec <- function(repo_spec) {
 45 |   repo_split <- stringr::str_split(repo_spec, "/")[[1]]
 46 |   if (length(repo_split) != 2) {
 47 |     stop("`repo_spec` must be of the form 'owner/repo'")
 48 |   }
 49 |   list(owner = repo_split[[1]], repo = repo_split[[2]])
 50 | }
 51 | 
 52 | # copied from usethis:::spec_owner
 53 | spec_owner <- function(repo_spec) {
 54 |   parse_repo_spec(repo_spec)$owner
 55 | }
 56 | 
 57 | # copied from usethis:::spec_repo
 58 | spec_repo <- function(repo_spec) {
 59 |   parse_repo_spec(repo_spec)$repo
 60 | }
 61 | 
 62 | # Need to use the github API because this info isn't included in the
 63 | # commits for GitHub pulls: Github <noreply@github.com>
 64 | 
 65 | # adapted from usethis:::use_tidy_thanks
 66 | github_contribs <- function(repo_spec = github_repo_spec(),
 67 |                           excluded = NULL) {
 68 |   if (is.null(excluded)) {
 69 |     excluded <- spec_owner(repo_spec)
 70 |   }
 71 |   res <- gh::gh("/repos/:owner/:repo/issues",
 72 |     owner = spec_owner(repo_spec),
 73 |     repo = spec_repo(repo_spec), state = "all",
 74 |     filter = "all", .limit = Inf
 75 |   )
 76 |   if (identical(res[[1]], "")) {
 77 |     message("No matching issues/PRs found.")
 78 |     return(invisible())
 79 |   }
 80 |   contributors <- purrr::map_chr(res, c("user", "login")) %>%
 81 |     purrr::discard(~.x %in% excluded) %>%
 82 |     unique() %>%
 83 |     sort()
 84 |   glue::glue("[\\@{contributors}](https://github.com/{contributors})") %>%
 85 |     glue::glue_collapse(sep = ", ", width = Inf, last = ", and ")
 86 | }
 87 | 
 88 | hypothesis_contribs <- function() {
 89 |   hypothesis_user_url <- function(x) {
 90 |     username <- stringr::str_match(x, "acct:(.*)@")[1, 2]
 91 |     url <- stringr::str_c("https://hypothes.is/users/", username)
 92 |     stringr::str_c("[\\@", username, "](", url, ")")
 93 |   }
 94 | 
 95 |   hypothesis_url <- "https://hypothes.is/api/search"
 96 |   url_pattern <- "https://jrnold.github.io/r4ds-exercise-solutions/*"
 97 |   annotations <- httr::GET(hypothesis_url,
 98 |                            query = list(wildcard_uri = url_pattern)) %>%
 99 |     httr::content()
100 | 
101 |   annotations %>%
102 |     purrr::pluck("rows") %>%
103 |     purrr::keep(~ !.x$flagged) %>%
104 |     purrr::map_chr("user") %>%
105 |     unique() %>%
106 |     purrr::discard(~ .x == "acct:jrnold@hypothes.is") %>%
107 |     purrr::map_chr(hypothesis_user_url) %>%
108 |     sort() %>%
109 |     glue::glue_collapse(sep = ", ", width = Inf, last = ", and ")
110 | }
111 | ```
112 | 
113 | These solutions have benefited from many contributors.
114 | A special thanks to:
115 | 
116 | -   Garrett Grolemund and Hadley Wickham for writing the truly fantastic *R for Data Science*, without whom these solutions would not exist---literally.
117 | -   [\@dongzhuoer](https://github.com/dongzhuoer) and [\@cfgauss](https://hypothes.is/users/cfgauss) for carefully reading the book, noticing numerous issues, and proposing fixes.
118 | 
119 | Thank you to all of those who contributed issues or pull requests on
120 | [GitHub](https://github.com/jrnold/r4ds-exercise-solutions/graphs/contributors)
121 | (in alphabetical order): `r github_contribs()`.
122 | Thank you to all of you who contributed annotations on [hypothes.is](https://hypothes.is/search?q=url%3Ajrnold.github.io%2Fr4ds-exercise-solutions%2F*) (in alphabetical order): `r hypothesis_contribs()`.
123 | 
124 | For another set of solutions for and notes on *R for Data Science* see [Yet Another 'R for Data Science' Study Guide](https://brshallo.github.io/r4ds_solutions/) by [Bryan Shalloway](https://github.com/brshallo).
125 | 
126 | ## License {-}
127 | 
128 | This work is licensed under a <a rel="license" href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.
129 | 


--------------------------------------------------------------------------------
/intro.Rmd:
--------------------------------------------------------------------------------
 1 | # Introduction {#introduction .r4ds-section}
 2 | 
 3 | ```{r include=FALSE,cache=FALSE,purl=FALSE}
 4 | # don't cache anything on this page
 5 | knitr::opts_chunk$set(cache = FALSE)
 6 | ```
 7 | 
 8 | ## How this book is organized {-}
 9 | 
10 | The book is divided into sections with the same numbers and titles as those in *R for Data Science*.
11 | Not all sections have exercises.
12 | Those sections without exercises have placeholder text indicating that there are no exercises.
13 | The text for each exercise is followed by the solution. 
14 | 
15 | Like *R for Data Science*, packages used in each chapter are loaded in a code chunk at the start of the chapter in a section titled "Prerequisites".
16 | If exercises depend on code in a section of *R for Data Science*, that code is provided either before the exercises or within the exercise solution.
17 | 
18 | If a package is used infrequently in solutions it may not be loaded, and functions using it will be called using the package name followed by two colons, as in `dplyr::mutate()` (see the *R for Data Science* [Introduction](https://r4ds.had.co.nz/introduction.html#running-r-code)).
19 | The double colon may also be used to be explicit about the package from which a function comes.
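
As a small illustration of the double-colon syntax, the following sketch calls two functions from packages that ship with base R, without attaching either package. The particular functions are chosen only as examples and are not taken from the solutions.

```r
# Call a function by its fully qualified name, without library()
stats::median(c(1, 5, 10))
utils::head(letters, 3)

# The qualified name refers to the same function object as the bare name
# when the package is already attached
identical(stats::median, median)
```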
20 | 
21 | ## Prerequisites {-}
22 | 
23 | This book is a complement to, not a substitute for, *R for Data Science*.
24 | It only provides the exercise solutions for it. 
25 | See the [R for Data Science](https://r4ds.had.co.nz/introduction.html#prerequisites) prerequisites.
26 | 
27 | Additionally, the solutions use several packages that are not used in *R4DS*.
28 | You can install all packages required to run the code in this book with the following line of code.
29 | ```{r eval=FALSE}
30 | devtools::install_github("jrnold/r4ds-exercise-solutions")
31 | ```
32 | 
33 | ## Bugs/Contributing {-}
34 | 
35 | If you find any typos, errors in the solutions, have an alternative solution,
36 | or think the solution could be improved, I would love your contributions.
37 | The best way to contribute is through GitHub.
38 | Please open an issue at <https://github.com/jrnold/r4ds-exercise-solutions/issues> or a pull request at
39 | <https://github.com/jrnold/r4ds-exercise-solutions/pulls>.
40 | 
41 | ## Colophon {-}
42 | 
43 | ```{r include=FALSE, purl = FALSE}
44 | r_head <- git2r::repository_head()
45 | r_branch <- r_head$name
46 | r_sha <- git2r::sha(r_head)
47 | r_sha_short <- stringr::str_sub(r_sha, 1, 7)
48 | github_full_url <- stringr::str_c(SOURCE_URL, "tree", r_sha, sep = "/")
49 | ```
50 | 
51 | HTML and PDF versions of this book are available at <`r PUB_URL`>.
52 | The book is powered by [bookdown](https://bookdown.org/home), which makes it easy to turn R Markdown files into HTML, PDF, and EPUB.
53 | 
54 | The source of this book is available on GitHub at <`r SOURCE_URL`>.
55 | This book was built from commit [`r r_sha_short`](`r github_full_url`).
56 | 
57 | This book was built with these R packages.
58 | ```{r colonophon}
59 | devtools::session_info("r4ds.exercise.solutions")
60 | ```
61 | 


--------------------------------------------------------------------------------
/many-models.Rmd:
--------------------------------------------------------------------------------
  1 | # Many models {#many-models .r4ds-section}
  2 | 
  3 | ## Introduction {#introduction-17 .r4ds-section}
  4 | 
  5 | ```{r setup,message=FALSE,cache=FALSE}
  6 | library("modelr")
  7 | library("tidyverse")
  8 | library("gapminder")
  9 | ```
 10 | 
 11 | ## gapminder {#gapminder .r4ds-section}
 12 | 
 13 | ### Exercise 25.2.1 {.unnumbered .exercise data-number="25.2.1"}
 14 | 
 15 | <div class="question">
 16 | 
 17 | A linear trend seems to be slightly too simple for the overall trend. 
 18 | Can you do better with a quadratic polynomial? 
 19 | How can you interpret the coefficients of the quadratic? 
 20 | (Hint: you might want to transform year so that it has mean zero.)
 21 | 
 22 | </div>
 23 | 
 24 | <div class="answer">
 25 | 
 26 | The following code replicates the analysis in the chapter but replaces the function `country_model()` with a regression that includes the year squared.
 27 | ```{r, eval = FALSE}
 28 | lifeExp ~ poly(year, 2)
 29 | ```
 30 | 
 31 | ```{r}
 32 | country_model <- function(df) {
 33 |   lm(lifeExp ~ poly(year - median(year), 2), data = df)
 34 | }
 35 | 
 36 | by_country <- gapminder %>%
 37 |   group_by(country, continent) %>%
 38 |   nest()
 39 | 
 40 | by_country <- by_country %>%
 41 |   mutate(model = map(data, country_model))
 42 | ```
 43 | 
 44 | ```{r}
 45 | by_country <- by_country %>%
 46 |   mutate(
 47 |     resids = map2(data, model, add_residuals)
 48 |   )
 49 | by_country
 50 | ```
 51 | 
 52 | ```{r}
 53 | unnest(by_country, resids) %>%
 54 |   ggplot(aes(year, resid)) +
 55 |   geom_line(aes(group = country), alpha = 1 / 3) +
 56 |   geom_smooth(se = FALSE)
 57 | ```
 58 | 
 59 | ```{r}
 60 | by_country %>%
 61 |   mutate(glance = map(model, broom::glance)) %>%
 62 |   unnest(glance, .drop = TRUE) %>%
 63 |   ggplot(aes(continent, r.squared)) +
 64 |   geom_jitter(width = 0.5)
 65 | ```
 66 | 
 67 | </div>
 68 | 
 69 | ### Exercise 25.2.2 {.unnumbered .exercise data-number="25.2.2"}
 70 | 
 71 | <div class="question">
 72 | Explore other methods for visualizing the distribution of $R^2$ per continent. 
 73 | You might want to try the ggbeeswarm package, which provides similar methods for avoiding overlaps as jitter, but uses deterministic methods.
 74 | 
 75 | </div>
 76 | 
 77 | <div class="answer">
 78 | 
 79 | See exercise 7.5.1.1.6 for more on ggbeeswarm.
 80 | 
 81 | ```{r}
 82 | library("ggbeeswarm")
 83 | by_country %>%
 84 |   mutate(glance = map(model, broom::glance)) %>%
 85 |   unnest(glance, .drop = TRUE) %>%
 86 |   ggplot(aes(continent, r.squared)) +
 87 |   geom_beeswarm()
 88 | ```
 89 | 
 90 | </div>
 91 | 
 92 | ### Exercise 25.2.3 {.unnumbered .exercise data-number="25.2.3"}
 93 | 
 94 | <div class="question">
 95 | 
 96 | To create the last plot (showing the data for the countries with the worst model fits),
 97 | we needed two steps:
 98 | we created a data frame with one row per country 
 99 | and then semi-joined it to the original dataset. 
100 | It’s possible to avoid this join if we use `unnest()` instead of `unnest(.drop = TRUE)`.
101 | How?
102 | 
103 | </div>
104 | 
105 | <div class="answer">
106 | 
107 | ```{r}
108 | gapminder %>%
109 |   group_by(country, continent) %>%
110 |   nest() %>%
111 |   mutate(model = map(data, ~lm(lifeExp ~ year, .))) %>%
112 |   mutate(glance = map(model, broom::glance)) %>%
113 |   unnest(glance) %>%
114 |   unnest(data) %>%
115 |   filter(r.squared < 0.25) %>%
116 |   ggplot(aes(year, lifeExp)) +
117 |   geom_line(aes(color = country))
118 | ```
119 | 
120 | </div>
121 | 
122 | ## List-columns {#list-columns-1 .r4ds-section}
123 | 
124 | `r no_exercises()`
125 | 
126 | ## Creating list-columns {#creating-list-columns .r4ds-section}
127 | 
128 | ### Exercise 25.4.1 {.unnumbered .exercise data-number="25.4.1"}
129 | 
130 | <div class="question">
131 | 
132 | List all the functions that you can think of that take an atomic vector and return a list.
133 | 
134 | </div>
135 | 
136 | <div class="answer">
137 | 
138 | Many functions in the stringr package take a character vector as input and return a list.
139 | ```{r}
140 | str_split(sentences[1:3], " ")
141 | str_match_all(c("abc", "aa", "aabaa", "abbbc"), "a+")
142 | ```
143 | The `map()` function takes a vector and always returns a list.
144 | ```{r}
145 | map(1:3, runif)
146 | ```
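
Base R also has several functions with this behavior. The sketch below uses only base R and made-up input vectors; it is meant as a few additional examples, not an exhaustive list.

```r
x <- c("a b", "c d e")

# Each call takes an atomic vector and returns a list
strsplit(x, " ")                 # one character vector per input string
as.list(1:3)                     # one list element per vector element
split(1:6, rep(1:2, times = 3))  # one list element per group
lapply(1:3, function(i) i^2)     # base R analogue of map()
```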
147 | 
148 | </div>
149 | 
150 | ### Exercise 25.4.2 {.unnumbered .exercise data-number="25.4.2"}
151 | 
152 | <div class="question">
153 | 
154 | Brainstorm useful summary functions that, like `quantile()`, return multiple values.
155 | 
156 | </div>
157 | 
158 | <div class="answer">
159 | 
160 | Some examples of summary functions that return multiple values are the following.
161 | ```{r}
162 | range(mtcars$mpg)
163 | fivenum(mtcars$mpg)
164 | boxplot.stats(mtcars$mpg)
165 | ```
166 | 
167 | </div>
168 | 
169 | ### Exercise 25.4.3 {.unnumbered .exercise data-number="25.4.3"}
170 | 
171 | <div class="question">
172 | 
173 | What’s missing in the following data frame? 
174 | How does `quantile()` return that missing piece? 
175 | Why isn’t that helpful here?
176 | 
177 | ```{r}
178 | mtcars %>%
179 |   group_by(cyl) %>%
180 |   summarise(q = list(quantile(mpg))) %>%
181 |   unnest()
182 | ```
183 | 
184 | </div>
185 | 
186 | <div class="answer">
187 | 
188 | The particular quantiles of the values are missing, e.g. `0%`, `25%`, `50%`, `75%`, `100%`. `quantile()` returns these in the names of the vector.
189 | ```{r}
190 | quantile(mtcars$mpg)
191 | ```
192 | 
193 | Since the `unnest` function drops the names of the vector, they aren't useful here.
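
One way to keep the labels is to store the names in their own column before combining the groups. The sketch below does this in base R rather than with the tidyverse functions used above; the helper `quantile_df()` is a hypothetical name introduced only for this illustration.

```r
# Hypothetical helper: turn the named vector from quantile() into a data frame
# so that the quantile labels survive as a column
quantile_df <- function(x) {
  q <- quantile(x)
  data.frame(quantile = names(q), value = unname(q), stringsAsFactors = FALSE)
}

# Compute the quantiles of mpg separately for each value of cyl
by_cyl <- lapply(split(mtcars$mpg, mtcars$cyl), quantile_df)

# Stack the per-group results, adding the group label as a column
result <- do.call(rbind, lapply(names(by_cyl), function(cyl) {
  data.frame(cyl = cyl, by_cyl[[cyl]])
}))
result
```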
194 | 
195 | </div>
196 | 
197 | ### Exercise 25.4.4 {.unnumbered .exercise data-number="25.4.4"}
198 | 
199 | <div class="question">
200 | 
201 | What does this code do?
202 | Why might it be useful?
203 | 
204 | ```r
205 | mtcars %>%
206 |   group_by(cyl) %>%
207 |   summarise_each(funs(list))
208 | ```
209 | 
210 | </div>
211 | 
212 | <div class="answer">
213 | 
214 | ```{r}
215 | mtcars %>%
216 |   group_by(cyl) %>%
217 |   summarise_each(funs(list))
218 | ```
219 | 
220 | It creates a data frame in which each row corresponds to a value of `cyl`,
221 | and each observation for each column (other than `cyl`) is a vector of all the values of that column for that value of `cyl`.
222 | It seems like it should be useful to have all the observations of each variable for each group, but off the top of my head, I can't think of a specific use for this.
223 | But, it seems that it may do many things that `dplyr::do` does.
224 | 
225 | </div>
226 | 
227 | ## Simplifying list-columns {#simplifying-list-columns .r4ds-section}
228 | 
229 | ### Exercise 25.5.1 {.unnumbered .exercise data-number="25.5.1"}
230 | 
231 | <div class="question">
232 | 
233 | Why might the `lengths()` function be useful for creating atomic vector columns from list-columns?
234 | 
235 | </div>
236 | 
237 | <div class="answer">
238 | 
239 | The `lengths()` function returns the length of each element in a list.
240 | It could be useful for testing whether all elements in a list-column are the same length.
241 | You could get the maximum length to determine how many atomic vector columns to create.
242 | It is also a replacement for something like `map_int(x, length)` or `sapply(x, length)`.
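
A minimal base R sketch of these uses, with a small made-up list standing in for a list-column:

```r
x <- list(1:3, c("a", "b"), NULL)

# Length of each element
lengths(x)

# Same result as the longer idiom
identical(lengths(x), vapply(x, length, integer(1)))

# Are all elements the same length?
length(unique(lengths(x))) == 1

# Maximum length: how many columns would be needed to spread the elements out
max(lengths(x))
```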
243 | 
244 | </div>
245 | 
246 | ### Exercise 25.5.2 {.unnumbered .exercise data-number="25.5.2"}
247 | 
248 | <div class="question">
249 | 
250 | List the most common types of vector found in a data frame.
251 | What makes lists different?
252 | 
253 | </div>
254 | 
255 | <div class="answer">
256 | 
257 | The common types of vectors in data frames are:
258 | 
259 | -   `logical`
260 | -   `numeric`
261 | -   `integer`
262 | -   `character`
263 | -   `factor`
264 | 
265 | All of the common types of vectors in data frames are atomic. 
266 | Lists are not atomic since they can contain other lists and other vectors.
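
The distinction can be checked with `is.atomic()`. This is a small base R sketch; the data frame `df` and its columns are made up for the illustration.

```r
is.atomic(c(TRUE, FALSE))  # logical vector
is.atomic(factor("a"))     # factors are built on integer vectors
is.atomic(list(1, "a"))    # lists are not atomic

# A data frame with one atomic column and one list-column
df <- data.frame(id = 1:2)
df$items <- list(1:3, letters[1:2])
vapply(df, is.atomic, logical(1))
```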
267 | 
268 | </div>
269 | 
270 | ## Making tidy data with broom {#making-tidy-data-with-broom .r4ds-section}
271 | 
272 | `r no_exercises()`
273 | 


--------------------------------------------------------------------------------
/model-basics.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | output: html_document
  3 | editor_options:
  4 |   chunk_output_type: console
  5 | ---
  6 | 
  7 | # Model basics {#model-basics .r4ds-section}
  8 | 
  9 | ## Introduction {#introduction-15 .r4ds-section}
 10 | 
 11 | ```{r setup,message=FALSE,cache=FALSE}
 12 | library("tidyverse")
 13 | library("modelr")
 14 | ```
 15 | 
 16 | The option `na.action` determines how missing values are handled.
 17 | It is a function.
 18 | `na.warn` sets it so that there is a warning if there are any missing values.
 19 | If it is not set (the default), R will silently drop them.
 20 | 
 21 | ```{r}
 22 | options(na.action = na.warn)
 23 | ```
 24 | 
 25 | ## A simple model {#a-simple-model .r4ds-section}
 26 | 
 27 | ### Exercise 23.2.1 {.unnumbered .exercise data-number="23.2.1"}
 28 | 
 29 | <div class="question">
 30 | One downside of the linear model is that it is sensitive to unusual values because the distance incorporates a squared term. Fit a linear model to the simulated data below, and visualize the results. Rerun a few times to generate different simulated datasets. What do you notice about the model?
 31 | </div>
 32 | 
 33 | <div class="answer">
 34 | 
 35 | ```{r}
 36 | sim1a <- tibble(
 37 |   x = rep(1:10, each = 3),
 38 |   y = x * 1.5 + 6 + rt(length(x), df = 2)
 39 | )
 40 | ```
 41 | 
 42 | Let's run it once and plot the results:
 43 | ```{r}
 44 | ggplot(sim1a, aes(x = x, y = y)) +
 45 |   geom_point() +
 46 |   geom_smooth(method = "lm", se = FALSE)
 47 | ```
 48 | We can also do this more systematically, by generating several simulations
 49 | and plotting the line.
 50 | 
 51 | ```{r}
 52 | simt <- function(i) {
 53 |   tibble(
 54 |     x = rep(1:10, each = 3),
 55 |     y = x * 1.5 + 6 + rt(length(x), df = 2),
 56 |     .id = i
 57 |   )
 58 | }
 59 | 
 60 | sims <- map_df(1:12, simt)
 61 | 
 62 | ggplot(sims, aes(x = x, y = y)) +
 63 |   geom_point() +
 64 |   geom_smooth(method = "lm", colour = "red") +
 65 |   facet_wrap(~.id, ncol = 4)
 66 | ```
 67 | 
 68 | What if we did the same things with normal distributions?
 69 | ```{r}
 70 | sim_norm <- function(i) {
 71 |   tibble(
 72 |     x = rep(1:10, each = 3),
 73 |     y = x * 1.5 + 6 + rnorm(length(x)),
 74 |     .id = i
 75 |   )
 76 | }
 77 | 
 78 | simdf_norm <- map_df(1:12, sim_norm)
 79 | 
 80 | ggplot(simdf_norm, aes(x = x, y = y)) +
 81 |   geom_point() +
 82 |   geom_smooth(method = "lm", colour = "red") +
 83 |   facet_wrap(~.id, ncol = 4)
 84 | ```
 85 | There are not large outliers, and the slopes are more similar.
 86 | 
 87 | The reason for this is that the Student's $t$-distribution, from which we sample with `rt()`, has heavier tails than the normal distribution (`rnorm()`).
 88 | This means that the Student's $t$-distribution assigns a larger probability to values further from the center of the distribution.
 89 | ```{r}
 90 | tibble(
 91 |   x = seq(-5, 5, length.out = 100),
 92 |   normal = dnorm(x),
 93 |   student_t = dt(x, df = 2)
 94 | ) %>%
 95 |   pivot_longer(-x, names_to="distribution", values_to="density") %>%
 96 |   ggplot(aes(x = x, y = density, colour = distribution)) +
 97 |   geom_line()
 98 | ```
 99 | 
100 | For a normal distribution with mean zero and standard deviation one, the probability of being greater than 2 is,
101 | ```{r}
102 | pnorm(2, lower.tail = FALSE)
103 | ```
104 | For a Student's $t$-distribution with 2 degrees of freedom, it is about four times higher,
105 | ```{r}
106 | pt(2, df = 2, lower.tail = FALSE)
107 | ```
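
The ratio of the two tail probabilities can be computed directly. This short base R sketch only restates the two calls above as a single ratio.

```r
p_normal <- pnorm(2, lower.tail = FALSE)
p_t2 <- pt(2, df = 2, lower.tail = FALSE)

# How many times more likely is a value above 2 under the t-distribution?
p_t2 / p_normal  # roughly 4
```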
108 | 
109 | </div>
110 | 
111 | ### Exercise 23.2.2 {.unnumbered .exercise data-number="23.2.2"}
112 | 
113 | <div class="question">
114 | One way to make linear models more robust is to use a different distance measure. For example, instead of root-mean-squared distance, you could use mean-absolute distance:
115 | </div>
116 | 
117 | <div class="answer">
118 | 
119 | ```{r}
120 | measure_distance <- function(mod, data) {
121 |   diff <- data$y - make_prediction(mod, data)
122 |   mean(abs(diff))
123 | }
124 | ```
125 | 
126 | For the above function to work, we need to define a function, `make_prediction()`, that
127 | takes a numeric vector of length two (the intercept and slope) and returns the predictions,
128 | ```{r}
129 | make_prediction <- function(mod, data) {
130 |   mod[1] + mod[2] * data$x
131 | }
132 | ```
133 | 
134 | Using the `sim1a` data, the best parameters of the least absolute deviation are:
135 | ```{r}
136 | best <- optim(c(0, 0), measure_distance, data = sim1a)
137 | best$par
138 | ```
139 | For comparison, the parameters that minimize the least squares objective function on the `sim1a` data are:
140 | ```{r}
141 | measure_distance_ls <- function(mod, data) {
142 |   diff <- data$y - (mod[1] + mod[2] * data$x)
143 |   sqrt(mean(diff^2))
144 | }
145 | 
146 | best <- optim(c(0, 0), measure_distance_ls, data = sim1a)
147 | best$par
148 | ```
149 | 
150 | In practice, I suggest not using `optim()` to fit this model, and instead using an existing implementation.
151 | The `rlm()` and `lqs()` functions in the [MASS](https://CRAN.R-project.org/package=MASS) package fit robust and resistant linear models.
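
As a rough sketch of that suggestion, the code below fits both an ordinary least squares model and a robust model with `MASS::rlm()`. It assumes the MASS package is installed, and it simulates its own data with a fixed seed (the data frame `sim` is made up here, mirroring the structure of `sim1a`). Note that `rlm()` uses Huber M-estimation by default, which is robust but is not the same estimator as least absolute deviations.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Simulated data with heavy-tailed noise, in the style of sim1a
sim <- data.frame(x = rep(1:10, each = 3))
sim$y <- sim$x * 1.5 + 6 + rt(nrow(sim), df = 2)

fit_ls <- lm(y ~ x, data = sim)          # ordinary least squares
fit_rlm <- MASS::rlm(y ~ x, data = sim)  # robust regression (Huber M-estimator)

# Compare intercept and slope from the two fits
rbind(lm = coef(fit_ls), rlm = coef(fit_rlm))
```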
152 | 
153 | </div>
154 | 
155 | ### Exercise 23.2.3 {.unnumbered .exercise data-number="23.2.3"}
156 | 
157 | <div class="question">
158 | One challenge with performing numerical optimization is that it’s only guaranteed to find a local optimum. What’s the problem with optimizing a three parameter model like this?
159 | </div>
160 | 
161 | <div class="answer">
162 | 
163 | ```{r}
164 | model3 <- function(a, data) {
165 |   a[1] + data$x * a[2] + a[3]
166 | }
167 | ```
168 | 
169 | The problem is that for any values `a[1] = a1` and `a[3] = a3`, any other values of `a[1]` and `a[3]` where `a[1] + a[3] == (a1 + a3)` will have the same fit.
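
A quick check of this claim, using a small made-up data frame `d` and two parameter vectors chosen so that `a[1] + a[3]` is 4 in both cases:

```r
model3 <- function(a, data) {
  a[1] + data$x * a[2] + a[3]
}

d <- data.frame(x = 1:5)

# Different a[1] and a[3], but the same sum and the same slope
pred_a <- model3(c(1, 2, 3), d)
pred_b <- model3(c(4, 2, 0), d)

# The predictions, and therefore any distance measure, are identical
all.equal(pred_a, pred_b)
```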
170 | 
171 | ```{r}
172 | measure_distance_3 <- function(a, data) {
173 |   diff <- data$y - model3(a, data)
174 |   sqrt(mean(diff^2))
175 | }
176 | ```
177 | Depending on our starting points, we can find different optimal values:
178 | ```{r}
179 | best3a <- optim(c(0, 0, 0), measure_distance_3, data = sim1)
180 | best3a$par
181 | ```
182 | ```{r}
183 | best3b <- optim(c(0, 0, 1), measure_distance_3, data = sim1)
184 | best3b$par
185 | ```
186 | ```{r}
187 | best3c <- optim(c(0, 0, 5), measure_distance_3, data = sim1)
188 | best3c$par
189 | ```
190 | In fact there are an infinite number of optimal values for this model.
191 | 
192 | <!-- How to discuss this better ?
193 | 
194 | Problem is that due to finite iterations, numerically these converge:
195 | 
196 | > sum(best3a$par[c(1, 3)])
197 | [1] 4.220074
198 | > sum(best3b$par[c(1, 3)])
199 | [1] 4.220404
200 | > sum(best3c$par[c(1, 3)])
201 | [1] 4.22117
202 | 
203 | -->
204 | 
205 | </div>
206 | 
207 | ## Visualising models {#visualising-models .r4ds-section}
208 | 
209 | ### Exercise 23.3.1 {.unnumbered .exercise data-number="23.3.1"}
210 | 
211 | <div class="question">
212 | Instead of using `lm()` to fit a straight line, you can use `loess()` to fit a smooth curve. Repeat the process of model fitting, grid generation, predictions, and visualization on `sim1` using `loess()` instead of `lm()`. How does the result compare to `geom_smooth()`?
213 | </div>
214 | 
215 | <div class="answer">
216 | 
217 | I'll use `add_predictions()` and `add_residuals()` to add the predictions and residuals from a loess regression to the `sim1` data.
218 | 
219 | ```{r}
220 | sim1_loess <- loess(y ~ x, data = sim1)
221 | sim1_lm <- lm(y ~ x, data = sim1)
222 | 
223 | grid_loess <- sim1 %>%
224 |   add_predictions(sim1_loess)
225 | 
226 | sim1 <- sim1 %>%
227 |   add_residuals(sim1_lm) %>%
228 |   add_predictions(sim1_lm) %>%
229 |   add_residuals(sim1_loess, var = "resid_loess") %>%
230 |   add_predictions(sim1_loess, var = "pred_loess")
231 | ```
232 | 
233 | This plots the loess predictions.
234 | The loess produces a nonlinear, smooth line through the data.
235 | ```{r}
236 | plot_sim1_loess <-
237 |   ggplot(sim1, aes(x = x, y = y)) +
238 |   geom_point() +
239 |   geom_line(aes(x = x, y = pred), data = grid_loess, colour = "red")
240 | plot_sim1_loess
241 | ```
242 | 
243 | The predictions of loess are the same as the default method for `geom_smooth()` because `geom_smooth()` uses `loess()` by default; the message even tells us that.
244 | ```{r message=TRUE}
245 | plot_sim1_loess +
246 |   geom_smooth(method = "loess", colour = "blue", se = FALSE, alpha = 0.20)
247 | ```
248 | 
249 | We can plot the residuals (red), and compare them to the residuals from `lm()` (black).
250 | In general, the loess model has smaller residuals within the sample (out of sample is a different issue, and we haven't considered the uncertainty of these estimates).
251 | 
252 | ```{r}
253 | ggplot(sim1, aes(x = x)) +
254 |   geom_ref_line(h = 0) +
255 |   geom_point(aes(y = resid)) +
256 |   geom_point(aes(y = resid_loess), colour = "red")
257 | ```
258 | 
259 | </div>
260 | 
261 | ### Exercise 23.3.2 {.unnumbered .exercise data-number="23.3.2"}
262 | 
263 | <div class="question">
264 | `add_predictions()` is paired with `gather_predictions()` and `spread_predictions()`.
265 | How do these three functions differ?
266 | </div>
267 | 
268 | <div class="answer">
269 | 
270 | The functions `gather_predictions()` and `spread_predictions()` allow for adding predictions from multiple models at once.
271 | 
272 | Taking the `sim1_mod` example,
273 | ```{r}
274 | sim1_mod <- lm(y ~ x, data = sim1)
275 | grid <- sim1 %>%
276 |   data_grid(x)
277 | ```
278 | 
279 | The function `add_predictions()` adds only a single model at a time.
280 | To add two models:
281 | ```{r}
282 | grid %>%
283 |   add_predictions(sim1_mod, var = "pred_lm") %>%
284 |   add_predictions(sim1_loess, var = "pred_loess")
285 | ```
286 | The function `gather_predictions()` adds predictions from multiple models by
287 | stacking the results and adding a column with the model name,
288 | ```{r}
289 | grid %>%
290 |   gather_predictions(sim1_mod, sim1_loess)
291 | ```
292 | The function `spread_predictions()` adds predictions from multiple models by
293 | adding one column per model, each named after its model.
294 | ```{r}
295 | grid %>%
296 |   spread_predictions(sim1_mod, sim1_loess)
297 | ```
298 | The function `spread_predictions()` is similar to the example which runs `add_predictions()` for each model, and is equivalent to running `spread()` after
299 | running `gather_predictions()`:
300 | ```{r}
301 | grid %>%
302 |   gather_predictions(sim1_mod, sim1_loess) %>%
303 |   spread(model, pred)
304 | ```
305 | 
306 | </div>
307 | 
308 | ### Exercise 23.3.3 {.unnumbered .exercise data-number="23.3.3"}
309 | 
310 | <div class="question">
311 | What does `geom_ref_line()` do? What package does it come from?
312 | Why is displaying a reference line in plots showing residuals useful and important?
313 | </div>
314 | 
315 | <div class="answer">
316 | 
317 | The geom `geom_ref_line()` adds a reference line to a plot.
318 | It is equivalent to running `geom_hline()` or `geom_vline()` with default settings that are useful for visualizing models.
319 | Putting a reference line at zero for residuals is important because good models (generally) should have residuals centered at zero, with approximately the same variance (or distribution) over the support of x, and no correlation.
320 | A zero reference line makes it easier to judge these characteristics visually.
321 | 
322 | </div>
323 | 
324 | ### Exercise 23.3.4 {.unnumbered .exercise data-number="23.3.4"}
325 | 
326 | <div class="question">
327 | Why might you want to look at a frequency polygon of absolute residuals?
328 | What are the pros and cons compared to looking at the raw residuals?
329 | </div>
330 | 
331 | <div class="answer">
332 | 
333 | Showing the absolute values of the residuals makes it easier to view the spread of the residuals.
334 | The model assumes that the residuals have mean zero, so folding the negative residuals onto the positive axis effectively doubles the number of observations available at each distance from zero.
335 | ```{r}
336 | sim1_mod <- lm(y ~ x, data = sim1)
337 | 
338 | sim1 <- sim1 %>%
339 |   add_residuals(sim1_mod)
340 | 
341 | ggplot(sim1, aes(x = abs(resid))) +
342 |   geom_freqpoly(binwidth = 0.5)
343 | ```
344 | 
345 | However, using the absolute values of residuals throws away information about the sign, meaning that the
346 | frequency polygon cannot show whether the model systematically over- or under-estimates the response.
347 | 
348 | </div>
349 | 
350 | ## Formulas and model families {#formulas-and-model-families .r4ds-section}
351 | 
352 | ### Exercise 23.4.1 {.unnumbered .exercise data-number="23.4.1"}
353 | 
354 | <div class="question">
355 | What happens if you repeat the analysis of `sim2` using a model without an intercept? What happens to the model equation?
356 | What happens to the predictions?
357 | </div>
358 | 
359 | <div class="answer">
360 | 
361 | To run a model without an intercept, add `- 1` or `+ 0` to the right-hand side of the formula:
362 | ```{r}
363 | mod2a <- lm(y ~ x - 1, data = sim2)
364 | ```
365 | ```{r}
366 | mod2 <- lm(y ~ x, data = sim2)
367 | ```
368 | 
369 | The predictions are exactly the same in the models with and without an intercept:
370 | ```{r}
371 | grid <- sim2 %>%
372 |   data_grid(x) %>%
373 |   spread_predictions(mod2, mod2a)
374 | grid
375 | ```
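
The reason is that, with a single categorical predictor, both parameterizations estimate one mean per group; only the meaning of the coefficients changes. The base R sketch below illustrates this on simulated data (the data frame `d`, the group means, and the seed are made up for this example and are not `sim2`).

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# One categorical predictor with three groups
d <- data.frame(x = factor(rep(c("a", "b", "c"), each = 4)))
d$y <- c(1, 5, 2)[as.integer(d$x)] + rnorm(nrow(d))

with_intercept <- lm(y ~ x, data = d)
no_intercept <- lm(y ~ x - 1, data = d)

# Fitted values agree between the two models
all.equal(unname(fitted(with_intercept)), unname(fitted(no_intercept)))

# Without an intercept, each coefficient is simply a group mean
coef(no_intercept)
tapply(d$y, d$x, mean)
```

With an intercept, the first coefficient is the mean of the reference group and the remaining coefficients are differences from it.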
376 | 
377 | </div>
378 | 
379 | ### Exercise 23.4.2 {.unnumbered .exercise data-number="23.4.2"}
380 | 
381 | <div class="question">
382 | Use `model_matrix()` to explore the equations generated for the models I fit to `sim3` and `sim4`.
383 | Why is `*` a good shorthand for interaction?
384 | </div>
385 | 
386 | <div class="answer">
387 | 
388 | When `x2` is a categorical variable, `x1 * x2` produces indicator variables `x2b`, `x2c`, and `x2d`, and
389 | variables `x1:x2b`, `x1:x2c`, and `x1:x2d`, which are the products of `x1` and the `x2*` indicator variables:
390 | ```{r}
391 | x3 <- model_matrix(y ~ x1 * x2, data = sim3)
392 | x3
393 | ```
394 | We can confirm that the variable `x1:x2b` is the product of `x1` and `x2b`,
395 | ```{r}
396 | all(x3[["x1:x2b"]] == (x3[["x1"]] * x3[["x2b"]]))
397 | ```
398 | and similarly for `x1:x2c` and `x2c`, and `x1:x2d` and `x2d`:
399 | ```{r}
400 | all(x3[["x1:x2c"]] == (x3[["x1"]] * x3[["x2c"]]))
401 | all(x3[["x1:x2d"]] == (x3[["x1"]] * x3[["x2d"]]))
402 | ```
403 | 
404 | For `x1 * x2` where both `x1` and `x2` are continuous variables, `model_matrix()` creates variables
405 | `x1`, `x2`, and `x1:x2`:
406 | ```{r}
407 | x4 <- model_matrix(y ~ x1 * x2, data = sim4)
408 | x4
409 | ```
410 | Confirm that `x1:x2` is the product of `x1` and `x2`,
411 | ```{r}
412 | all(x4[["x1"]] * x4[["x2"]] == x4[["x1:x2"]])
413 | ```
414 | 
415 | The asterisk `*` is good shorthand for an interaction since an interaction between `x1` and `x2` includes
416 | terms for `x1`, `x2`, and the product of `x1` and `x2`.
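
The expansion can be verified with base R's `model.matrix()`, which `model_matrix()` wraps. In this sketch the data frame `d` is made up; the point is only that `x1 * x2` and `x1 + x2 + x1:x2` generate the same columns.

```r
d <- data.frame(
  x1 = c(1, 2, 3, 4),
  x2 = c(0.5, 1.5, 2.5, 4)
)

m_star <- model.matrix(~ x1 * x2, data = d)
m_long <- model.matrix(~ x1 + x2 + x1:x2, data = d)

# Same columns from the shorthand and the expanded formula
colnames(m_star)
identical(colnames(m_star), colnames(m_long))

# The interaction column is the elementwise product of x1 and x2
all.equal(unname(m_star[, "x1:x2"]), d$x1 * d$x2)
```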
417 | 
418 | </div>
419 | 
420 | ### Exercise 23.4.3 {.unnumbered .exercise data-number="23.4.3"}
421 | 
422 | <div class="question">
423 | Using the basic principles, convert the formulas in the following two models into functions.
424 | (Hint: start by converting the categorical variable into 0-1 variables.)
425 | </div>
426 | 
427 | ```{r}
428 | mod1 <- lm(y ~ x1 + x2, data = sim3)
429 | mod2 <- lm(y ~ x1 * x2, data = sim3)
430 | ```
431 | 
432 | <div class="answer">
433 | 
434 | The problem is to convert the formulas in the models into functions. 
435 | I will assume that the function is only handling the conversion of the right hand side of the formula into a model matrix.
436 | The functions will take one argument, a data frame with `x1` and `x2` columns,
437 | and it will return a data frame.
438 | In other words, the functions will be special cases of the `model_matrix()` function.
439 | 
440 | Consider the right hand side of the first formula, `~ x1 + x2`.
441 | In the `sim3` data frame, the column `x1` is an integer, and the variable `x2` is a factor with four levels.
442 | ```{r}
443 | levels(sim3$x2)
444 | ```
445 | 
446 | Since `x1` is numeric it is unchanged.
447 | Since `x2` is a factor it is replaced with columns of indicator variables for all but one of its levels.
448 | I will first consider the special case in which `x2` only takes the levels of `x2` in `sim3`.
449 | In this case, "a" is considered the reference level and omitted, and new columns are made for "b", "c", and "d".
450 | ```{r}
451 | model_matrix_mod1 <- function(.data) {
452 |   mutate(.data,
453 |     x2b = as.numeric(x2 == "b"),
454 |     x2c = as.numeric(x2 == "c"),
455 |     x2d = as.numeric(x2 == "d"),
456 |     `(Intercept)` = 1
457 |   ) %>%
458 |     select(`(Intercept)`, x1, x2b, x2c, x2d)
459 | }
460 | ```
461 | ```{r}
462 | model_matrix_mod1(sim3)
463 | ```
464 | 
465 | A more general function for `~ x1 + x2` would not hard-code the specific levels in `x2`.
466 | ```{r}
467 | model_matrix_mod1b <- function(.data) {
468 |   # the levels of x2
469 |   lvls <- levels(.data$x2)
470 |   # drop the first level
471 |   # this assumes that there are at least two levels
472 |   lvls <- lvls[2:length(lvls)]
473 |   # create an indicator variable for each level of x2
474 |   for (lvl in lvls) {
475 |     # new column name x2 + level name
476 |     varname <- str_c("x2", lvl)
477 |     # add indicator variable for lvl
478 |     .data[[varname]] <- as.numeric(.data$x2 == lvl)
479 |   }
480 |   # generate the list of variables to keep
481 |   x2_variables <- str_c("x2", lvls)
482 |   # Add an intercept
483 |   .data[["(Intercept)"]] <- 1
484 |   # keep x1 and x2 indicator variables
485 |   select(.data, `(Intercept)`, x1, all_of(x2_variables))
486 | }
487 | ```
488 | ```{r}
489 | model_matrix_mod1b(sim3)
490 | ```
491 | 
492 | Consider the right hand side of the second formula, `~ x1 * x2`.
493 | The output data frame will consist of `x1`, columns with indicator variables for each level (except the reference level) of `x2`,
494 | and columns with the `x2` indicator variables multiplied by `x1`.
495 | 
496 | As with the previous formula, first I'll write a function that hard-codes the levels of `x2`.
497 | ```{r}
498 | model_matrix_mod2 <- function(.data) {
499 |   mutate(.data,
500 |     `(Intercept)` = 1,
501 |     x2b = as.numeric(x2 == "b"),
502 |     x2c = as.numeric(x2 == "c"),
503 |     x2d = as.numeric(x2 == "d"),
504 |     `x1:x2b` = x1 * x2b,
505 |     `x1:x2c` = x1 * x2c,
506 |     `x1:x2d` = x1 * x2d
507 |   ) %>%
508 |     select(`(Intercept)`, x1, x2b, x2c, x2d, `x1:x2b`, `x1:x2c`, `x1:x2d`)
509 | }
510 | ```
511 | ```{r}
512 | model_matrix_mod2(sim3)
513 | ```
514 | 
515 | For a more general function which will handle arbitrary levels in `x2`, I will 
516 | extend the `model_matrix_mod1b()` function that I wrote earlier.
517 | ```{r}
518 | model_matrix_mod2b <- function(.data) {
519 |   # get dataset with x1 and x2 indicator variables
520 |   out <- model_matrix_mod1b(.data)
521 |   # get names of the x2 indicator columns
522 |   x2cols <- str_subset(colnames(out), "^x2")
523 |   # create interactions between x1 and the x2 indicator columns
524 |   for (varname in x2cols) {
525 |     # name of the interaction variable
526 |     newvar <- str_c("x1:", varname)
527 |     out[[newvar]] <- out$x1 * out[[varname]]
528 |   }
529 |   out
530 | }
531 | ```
532 | ```{r}
533 | model_matrix_mod2b(sim3)
534 | ```
535 | 
536 | These functions could be further generalized to allow for `x1` and `x2` to
537 | be either numeric or factors. However, if we generalize much more than that,
538 | we will soon be reimplementing all of the `model_matrix()` function.
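
As a rough cross-check of the column layout described above, the following sketch builds the design matrix for `~ x1 * x2` with base R's `model.matrix()`. The `toy` data frame is a made-up stand-in for `sim3`, used only for illustration.

```{r}
# Hypothetical toy data: one numeric column and one factor with levels a, b, c
toy <- data.frame(
  x1 = c(1, 2, 3, 4),
  x2 = factor(c("a", "b", "c", "b"))
)

# Design matrix with an intercept, x1, x2 indicators, and interactions
mm <- model.matrix(~ x1 * x2, data = toy)
colnames(mm)

# Each interaction column should equal x1 times the matching indicator
all.equal(unname(mm[, "x1:x2b"]), toy$x1 * as.numeric(toy$x2 == "b"))
```

The reference level (`a`) gets no indicator or interaction column, which is the same convention the hand-written functions follow.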
539 | 
540 | </div>
541 | 
542 | ### Exercise 23.4.4 {.unnumbered .exercise data-number="23.4.4"}
543 | 
544 | <div class="question">
545 | For `sim4`, which of `mod1` and `mod2` is better?
546 | I think `mod2` does a slightly better job at removing patterns, but it’s pretty subtle.
547 | Can you come up with a plot to support my claim?
548 | </div>
549 | 
550 | <div class="answer">
551 | 
552 | Estimate models `mod1` and `mod2` on `sim4`,
553 | ```{r}
554 | mod1 <- lm(y ~ x1 + x2, data = sim4)
555 | mod2 <- lm(y ~ x1 * x2, data = sim4)
556 | ```
557 | and add the residuals from these models to the `sim4` data,
558 | ```{r}
559 | sim4_mods <- gather_residuals(sim4, mod1, mod2)
560 | ```
561 | 
562 | Frequency plots of both the residuals,
563 | ```{r}
564 | 
565 | ggplot(sim4_mods, aes(x = resid, colour = model)) +
566 |   geom_freqpoly(binwidth = 0.5) +
567 |   geom_rug()
568 | ```
569 | and the absolute values of the residuals,
570 | ```{r}
571 | ggplot(sim4_mods, aes(x = abs(resid), colour = model)) +
572 |   geom_freqpoly(binwidth = 0.5) +
573 |   geom_rug()
574 | ```
575 | do not show much difference in the residuals between the models.
576 | However, `mod2` appears to have fewer residuals in the tails of the distribution between 2.5 and 5 (although the most extreme residuals are from `mod2`).
577 | 
578 | This is confirmed by checking the standard deviation of the residuals of these models,
579 | ```{r}
580 | sim4_mods %>%
581 |   group_by(model) %>%
582 |   summarise(resid = sd(resid))
583 | ```
584 | The standard deviation of the residuals of `mod2` is smaller than that of `mod1`.
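
The same comparison can be sketched in base R without reshaping the data. This is only a cross-check, and it assumes the `modelr` package is installed, since that is where `sim4` comes from; the name `resid_sd` is introduced here for illustration.

```{r}
library("modelr") # provides the sim4 dataset

mod1 <- lm(y ~ x1 + x2, data = sim4)
mod2 <- lm(y ~ x1 * x2, data = sim4)

# Standard deviation of the residuals of each model
resid_sd <- c(mod1 = sd(resid(mod1)), mod2 = sd(resid(mod2)))
resid_sd
```

Because `mod1` is nested within `mod2`, least squares cannot give `mod2` a larger residual sum of squares, so the size of the gap matters more than its direction.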
585 | 
586 | </div>
587 | 
588 | ## Missing values {#missing-values-5 .r4ds-section}
589 | 
590 | `r no_exercises()`
591 | 
592 | ## Other model families {#other-model-families .r4ds-section}
593 | 
594 | `r no_exercises()`
595 | 


--------------------------------------------------------------------------------
/model.Rmd:
--------------------------------------------------------------------------------
1 | # (PART) Model {-}
2 | 
3 | # Introduction {#model-intro .r4ds-section}
4 | 
5 | `r no_exercises()`
6 | 


--------------------------------------------------------------------------------
/pipes.Rmd:
--------------------------------------------------------------------------------
1 | # Pipes {#pipes .r4ds-section}
2 | 
3 | `r no_exercises()`
4 | 


--------------------------------------------------------------------------------
/program.Rmd:
--------------------------------------------------------------------------------
1 | # (PART) Program {-}
2 | 
3 | # Introduction {#program-intro .r4ds-section}
4 | 
5 | `r no_exercises()`
6 | 


--------------------------------------------------------------------------------
/r4ds-exercise-solutions.Rproj:
--------------------------------------------------------------------------------
 1 | Version: 1.0
 2 | 
 3 | RestoreWorkspace: No
 4 | SaveWorkspace: No
 5 | AlwaysSaveHistory: No
 6 | 
 7 | EnableCodeIndexing: Yes
 8 | UseSpacesForTab: Yes
 9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 | 
12 | RnwWeave: knitr
13 | LaTeX: XeLaTeX
14 | 
15 | AutoAppendNewline: Yes
16 | StripTrailingWhitespace: Yes
17 | LineEndingConversion: Posix
18 | 
19 | BuildType: Custom
20 | CustomScriptPath: bin/render.R
21 | 


--------------------------------------------------------------------------------
/r4ds.bib:
--------------------------------------------------------------------------------
  1 | % Encoding: UTF-8
  2 | 
  3 | @Article{ClevelandMcGillMcGill1988,
  4 |   author    = {William S. Cleveland and Marylyn E. McGill and Robert McGill},
  5 |   title     = {The shape parameter of a two-variable graph},
  6 |   journal   = {Journal of the American Statistical Association},
  7 |   year      = {1988},
  8 |   volume    = {83},
  9 |   number    = {402},
 10 |   pages     = {289--300},
 11 |   issn      = {01621459},
 12 |   url       = {https://www.jstor.org/stable/2288843},
 13 |   abstract  = {The shape parameter of a two-variable graph is the ratio of the horizontal and vertical distances spanned by the data. For at least 70 years this parameter has received much attention in writings on data display, because it is a critical factor on two-variable graphs that show how one variable depends on the other. But despite the attention, there has been little systematic study. In this article the shape parameter and its effect on the visual decoding of slope information are studied through historical, empirical, theoretical, and experimental investigations. These investigations lead to a method for choosing the shape that maximizes the accuracy of slope judgments.},
 14 |   publisher = {[American Statistical Association, Taylor \& Francis, Ltd.]},
 15 |   timestamp = {2018-08-03},
 16 | }
 17 | 
 18 | @Book{WickhamGrolemund2017,
 19 |   author    = {Wickham, Hadley and Grolemund, Garrett},
 20 |   title     = {{R} for data science: import, tidy, transform, visualize, and model data},
 21 |   date      = {2017-01-05},
 22 |   edition   = {1},
 23 |   publisher = {O'Reilly Media},
 24 |   isbn      = {978-1491910399},
 25 |   timestamp = {2018-08-03},
 26 | }
 27 | 
 28 | @Article{HeerAgrawala2006,
 29 |   author       = {Heer, Jeffrey and Agrawala, Maneesh},
 30 |   title        = {Multi-scale banking to 45°},
 31 |   journaltitle = {IEEE Transactions on Visualization and Computer Graphics},
 32 |   year         = {2006},
 33 |   volume       = {12},
 34 |   number       = {5},
 35 |   issue        = {September/October},
 36 |   doi          = {10.1109/TVCG.2006.163},
 37 |   url          = {https://dx.doi.org/10.1109/TVCG.2006.163},
 38 |   timestamp    = {2018-08-03},
 39 | }
 40 | 
 41 | @Book{Cleveland1993,
 42 |   author    = {Cleveland, William S.},
 43 |   title     = {Visualizing data},
 44 |   year      = {1993},
 45 |   publisher = {Hobart Press},
 46 |   timestamp = {2018-08-03},
 47 | }
 48 | 
 49 | @Book{Cleveland1994,
 50 |   author    = {Cleveland, William S.},
 51 |   title     = {The elements of graphing data},
 52 |   year      = {1994},
 53 |   publisher = {Hobart Press},
 54 |   timestamp = {2018-08-03},
 55 | }
 56 | 
 57 | @Article{Cleveland1993a,
 58 |   author    = {William S. Cleveland},
 59 |   title     = {A model for studying display methods of statistical graphics},
 60 |   journal   = {Journal of Computational and Graphical Statistics},
 61 |   year      = {1993},
 62 |   volume    = {2},
 63 |   number    = {4},
 64 |   pages     = {323-343},
 65 |   doi       = {10.1080/10618600.1993.10474616},
 66 |   url       = {
 67 |         https://dx.doi.org/10.1080/10618600.1993.10474616
 68 | 
 69 | },
 70 |   publisher = {Taylor \& Francis},
 71 |   timestamp = {2018-08-03},
 72 | }
 73 | 
 74 | @Article{DoaneSeward2011,
 75 |   author    = {David P. Doane and Lori E. Seward},
 76 |   title     = {Measuring skewness: a forgotten statistic?},
 77 |   journal   = {Journal of Statistics Education},
 78 |   year      = {2011},
 79 |   volume    = {19},
 80 |   number    = {2},
 81 |   pages     = {null},
 82 |   doi       = {10.1080/10691898.2011.11889611},
 83 |   eprint    = {https://doi.org/10.1080/10691898.2011.11889611},
 84 |   url       = { 
 85 |         https://doi.org/10.1080/10691898.2011.11889611
 86 |     
 87 | },
 88 |   publisher = {Taylor \& Francis},
 89 |   timestamp = {2018-08-03},
 90 | }
 91 | 
 92 | @Article{HintzeNelson1998,
 93 |   author    = {Jerry L. Hintze and Ray D. Nelson},
 94 |   title     = {Violin Plots: A Box Plot-Density Trace Synergism},
 95 |   journal   = {The American Statistician},
 96 |   year      = {1998},
 97 |   volume    = {52},
 98 |   number    = {2},
 99 |   pages     = {181-184},
100 |   doi       = {10.1080/00031305.1998.10480559},
101 |   eprint    = {https://amstat.tandfonline.com/doi/pdf/10.1080/00031305.1998.10480559},
102 |   url       = { 
103 |         https://amstat.tandfonline.com/doi/abs/10.1080/00031305.1998.10480559
104 |     
105 | },
106 |   publisher = {Taylor \& Francis},
107 |   timestamp = {2018-08-10},
108 | }
109 | 
110 | @Article{HofmannWickhamKafadar2017,
111 |   author    = {Heike Hofmann and Hadley Wickham and Karen Kafadar},
112 |   title     = {Letter-Value Plots: Boxplots for Large Data},
113 |   journal   = {Journal of Computational and Graphical Statistics},
114 |   year      = {2017},
115 |   volume    = {26},
116 |   number    = {3},
117 |   pages     = {469-477},
118 |   doi       = {10.1080/10618600.2017.1305277},
119 |   eprint    = {https://doi.org/10.1080/10618600.2017.1305277},
120 |   url       = { 
121 |         https://doi.org/10.1080/10618600.2017.1305277
122 |     
123 | },
124 |   publisher = {Taylor \& Francis},
125 |   timestamp = {2018-08-10},
126 | }
127 | 
128 | @Comment{jabref-meta: databaseType:biblatex;}
129 | 


--------------------------------------------------------------------------------
/rmarkdown-formats.Rmd:
--------------------------------------------------------------------------------
1 | # R Markdown formats {#r-markdown-formats .r4ds-section}
2 | 
3 | `r no_exercises()`
4 | 


--------------------------------------------------------------------------------
/rmarkdown-workflow.Rmd:
--------------------------------------------------------------------------------
1 | # R Markdown workflow {#r-markdown-workflow .r4ds-section}
2 | 
3 | `r no_exercises()`
4 | 


--------------------------------------------------------------------------------
/rmarkdown.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | output: html_document
  3 | editor_options:
  4 |   chunk_output_type: console
  5 | ---
  6 | 
  7 | # R Markdown {#r-markdown .r4ds-section}
  8 | 
  9 | ## Introduction {#introduction-18 .r4ds-section}
 10 | 
 11 | ## R Markdown basics {#r-markdown-basics .r4ds-section}
 12 | 
 13 | ### Exercise 27.2.1 {.unnumbered .exercise data-number="27.2.1"}
 14 | 
 15 | <div class="question">
 16 | 
 17 | Create a new notebook using *File > New File > R Notebook*. Read the instructions. Practice running the chunks. Verify that you can modify the code, re-run it, and see modified output.
 18 | 
 19 | </div>
 20 | 
 21 | <div class="answer">
 22 | 
 23 | This exercise is left to the reader.
 24 | 
 25 | </div>
 26 | 
 27 | ### Exercise 27.2.2 {.unnumbered .exercise data-number="27.2.2"}
 28 | 
 29 | <div class="question">
 30 | 
 31 | Create a new R Markdown document with *File > New File > R Markdown ...*.
 32 | Knit it by clicking the appropriate button.
 33 | Knit it by using the appropriate keyboard short cut.
 34 | Verify that you can modify the input and see the output update.
 35 | 
 36 | </div>
 37 | 
 38 | <div class="answer">
 39 | 
 40 | This exercise is mostly left to the reader.
 41 | Recall that the keyboard shortcut to knit a file is `Cmd/Ctrl + Alt + K`.
 42 | 
 43 | </div>
 44 | 
 45 | ### Exercise 27.2.3 {.unnumbered .exercise data-number="27.2.3"}
 46 | 
 47 | <div class="question">
 48 | Compare and contrast the R notebook and R markdown files you created above.
 49 | How are the outputs similar? How are they different?
 50 | How are the inputs similar? How are they different?
 51 | What happens if you copy the YAML header from one to the other?
 52 | </div>
 53 | 
 54 | <div class="answer">
 55 | 
 56 | R notebook files show the output of code chunks inside the editor, while hiding the console, when they are edited in RStudio.
 57 | This contrasts with R markdown files, which show their output inside the console, and do not show output inside the editor.
 58 | This makes R notebook documents appealing for interactive exploration.
 59 | In this R markdown file, the plot is displayed in the "Plot" tab, while the output of `summary()` is displayed in the tab.
 60 | ```{r echo=FALSE,purl=FALSE}
 61 | knitr::include_graphics("img/rmarkdown-file.png")
 62 | ```
 63 | However, when this same file is converted to a R notebook, the plot and `summary()` output are displayed in the "Editor" below the chunk of code which created them.
 64 | ```{r echo=FALSE,purl=FALSE}
 65 | knitr::include_graphics("img/rmarkdown-notebook.png")
 66 | ```
 67 | 
 68 | Both R notebooks and R markdown files and can be knit to produce HTML output.
 69 | R markdown files can be knit to a variety of formats including HTML, PDF, and DOCX.
 70 | However, R notebooks can only be knit to HTML files, which are given the extension `.nb.html`.
 71 | However, unlike R markdown files knit to HTML, the HTML output of an R notebook includes copy of the original `.Rmd` source.
 72 | If a `.nb.html` file is opened in RStudio, the source of the `.Rmd` file can be extracted and edited.
 73 | In contrast, there is no way to recover the original source of an R markdown file from its output, except through the parts that are displayed in the output itself.
 74 | 
 75 | R markdown files and R notebooks differ in the value of `output` in their YAML headers.
 76 | The YAML header for the R notebook will have the line,
 77 | ```
 78 | ---
 79 | output: html_notebook
 80 | ---
 81 | ```
 82 | For example, this is an R notebook,
 83 | ```
 84 | ---
 85 | title: "Diamond sizes"
 86 | date: 2016-08-25
 87 | output: html_notebook
 88 | ---
 89 | 
 90 | Text of the document.
 91 | ```
 92 | 
 93 | The YAML header for the R markdown file will have the line,
 94 | ```
 95 | output: html_document
 96 | ```
 97 | For example, this is an R markdown file,
 98 | ```
 99 | ---
100 | title: "Diamond sizes"
101 | date: 2016-08-25
102 | output: html_document
103 | ---
104 | 
105 | Text of the document.
106 | ```
107 | 
108 | Copying the YAML header from an R notebook to an R markdown file changes it to an R notebook, and vice versa.
109 | More specifically, an `.Rmd` file can be changed to an R markdown file or an R notebook by changing the value of the `output` key in the header.
110 | 
111 | The RStudio IDE and the rmarkdown package both use the YAML header of an `.Rmd` file to determine the document-type of the file.
112 | 
113 | For more information on R markdown notebooks see the following sources:
114 | 
115 | -   [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/), chapter [Notebook](https://bookdown.org/yihui/rmarkdown/notebook.html)
116 | -   [Difference between R MarkDown and R NoteBook](https://stackoverflow.com/questions/43820483/difference-between-r-markdown-and-r-notebook/43898504#43898504) StackOverflow thread.
117 | 
118 | </div>
119 | 
120 | ### Exercise 27.2.4 {.unnumbered .exercise data-number="27.2.4"}
121 | 
122 | <div class="question">
123 | 
124 | Create one new R Markdown document for each of the three built-in formats:
125 | HTML, PDF and Word.
126 | Knit each of the three documents.
127 | How does the output
128 | differ? How does the input differ?
129 | (You may need to install LaTeX in order to
130 | build the PDF output — RStudio will prompt you if this is necessary.)
131 | 
132 | </div>
133 | 
134 | <div class="answer">
135 | 
136 | They produce different outputs, both in the final documents and intermediate
137 | files (notably the type of plots produced). The only difference in the inputs
138 | is the value of `output` in the YAML header.
139 | The following `.Rmd` would be knit to HTML.
140 | ```
141 | ---
142 | title: "Diamond sizes"
143 | date: 2016-08-25
144 | output: html_document
145 | ---
146 | 
147 | Text of the document.
148 | ```
149 | If the value of the `output` key is changed to `word_document`, knitting the file will create a Word document (DOCX).
150 | ```
151 | ---
152 | title: "Diamond sizes"
153 | date: 2016-08-25
154 | output: word_document
155 | ---
156 | 
157 | Text of the document.
158 | ```
159 | Similarly, if the value of the `output` key is changed to `pdf_document`, knitting the file will create a PDF.
160 | ```
161 | ---
162 | title: "Diamond sizes"
163 | date: 2016-08-25
164 | output: pdf_document
165 | ---
166 | 
167 | Text of the document.
168 | ```
169 | 
170 | If you click on the *Knit* menu button and then on one of *Knit to HTML*, *Knit to PDF*, or *Knit to Word*,
171 | you will see that the value of the `output` key will change to `html_document`, `pdf_document`, or `word_document`, respectively.
172 | 
173 | ```{r echo=FALSE,purl=FALSE}
174 | knitr::include_graphics("img/rmarkdown-knit-button.png")
175 | ```
176 | 
177 | You will see that the value of `output` will look a little different than the previous examples.
178 | It will add a new line with a value like, `pdf_document: default`.
179 | 
180 | ```yaml
181 | ---
182 | title: "Diamond sizes"
183 | date: 2016-08-25
184 | output:
185 |   pdf_document: default
186 | ---
187 | 
188 | Text of the document.
189 | ```
190 | 
191 | This format is more general: it allows the document to have multiple output formats, as well as configuration settings that give more fine-grained control over the look of each output format.
192 | The chapter [R Markdown Formats](https://r4ds.had.co.nz/r-markdown-formats.html) discusses output formats for R markdown files in more detail.
193 | 
194 | </div>
195 | 
196 | ## Text formatting with Markdown {#text-formatting-with-markdown .r4ds-section}
197 | 
198 | ### Exercise 27.3.1 {.unnumbered .exercise data-number="27.3.1"}
199 | 
200 | <div class="question">
201 | Practice what you’ve learned by creating a brief CV.
202 | The title should be your name, and you should include headings for (at least) education or employment.
203 | Each of the sections should include a bulleted list of jobs/degrees.
204 | Highlight the year in bold.
205 | </div>
206 | 
207 | <div class="answer">
208 | 
209 | A minimal example is the following CV.
210 | ```{r cv,echo=FALSE,comment='',purl=FALSE}
211 | cat(readr::read_file(here::here("rmarkdown", "cv.Rmd")))
212 | ```
213 | 
214 | Your own example could be much more detailed.
215 | 
216 | </div>
217 | 
218 | ### Exercise 27.3.2 {.unnumbered .exercise data-number="27.3.2"}
219 | 
220 | <div class="question">
221 | 
222 | Using the R Markdown quick reference, figure out how to:
223 | 
224 | 1.  Add a footnote.
225 | 1.  Add a horizontal rule.
226 | 1.  Add a block quote.
227 | 
228 | </div>
229 | 
230 | <div class="answer">
231 | 
232 | ```{r example,echo=FALSE,comment='',purl=FALSE}
233 | cat(readr::read_file(here::here("rmarkdown", "example.Rmd")))
234 | ```
235 | 
236 | </div>
237 | 
238 | ### Exercise 27.3.3 {.unnumbered .exercise data-number="27.3.3"}
239 | 
240 | <div class="question">
241 | 
242 | Copy and paste the contents of `diamond-sizes.Rmd` from <https://github.com/hadley/r4ds/tree/master/rmarkdown> in to a local R markdown document.
243 | Check that you can run it, then add text after the frequency polygon that describes its most striking features.
244 | </div>
245 | 
246 | <div class="answer">
247 | 
248 | The following R markdown document answers this question as well as exercises [Exercise 27.4.1](#exercise-27.4.1), [Exercise 27.4.2](#exercise-27.4.2), and [Exercise 27.4.3](#exercise-27.4.3).
249 | 
250 | ```{r diamond-sizes,echo=FALSE,comment='',purl=FALSE}
251 | cat(readr::read_file(here::here("rmarkdown", "diamond-sizes.Rmd")))
252 | ```
253 | 
254 | </div>
255 | 
256 | ## Code chunks {#code-chunks .r4ds-section}
257 | 
258 | ### Exercise 27.4.1 {.unnumbered .exercise data-number="27.4.1"}
259 | 
260 | <div class="question">
261 | Add a section that explores how diamond sizes vary by cut, color, and clarity.
262 | Assume you’re writing a report for someone who doesn’t know R, and instead of setting `echo = FALSE` on each chunk, set a global option.
263 | </div>
264 | 
265 | <div class="answer">
266 | 
267 | See the answer to [Exercise 27.3.3](#exercise-27.3.3).
268 | 
269 | </div>
270 | 
271 | ### Exercise 27.4.2 {.unnumbered .exercise data-number="27.4.2"}
272 | 
273 | <div class="question">
274 | Download `diamond-sizes.Rmd` from <https://github.com/hadley/r4ds/tree/master/rmarkdown>.
275 | Add a section that describes the largest 20 diamonds, including a table that displays their most important attributes.
276 | </div>
277 | 
278 | <div class="answer">
279 | 
280 | See the answer to [Exercise 27.3.3](#exercise-27.3.3).
281 | I use `arrange()` and `slice()` to select the largest twenty diamonds, and
282 | `knitr::kable()` to produce a formatted table.
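
As an alternative sketch, `dplyr::slice_max()` (available in dplyr 1.0.0 and later) combines the sort-and-slice step into one call. The name `top20` and the caption text are placeholders chosen for this example.

```{r}
library("dplyr")

# Keep the 20 rows with the largest carat, breaking ties by row order
top20 <- ggplot2::diamonds %>%
  slice_max(carat, n = 20, with_ties = FALSE) %>%
  select(carat, cut, color, clarity)

knitr::kable(top20, caption = "The largest 20 diamonds by carat.")
```

Setting `with_ties = FALSE` ensures exactly twenty rows are returned even if several diamonds share the twentieth-largest carat value.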
283 | 
284 | </div>
285 | 
286 | ### Exercise 27.4.3 {.unnumbered .exercise data-number="27.4.3"}
287 | 
288 | <div class="question">
289 | Modify `diamonds-sizes.Rmd` to use `comma()` to produce nicely formatted output.
290 | Also include the percentage of diamonds that are larger than 2.5 carats.
291 | </div>
292 | 
293 | <div class="answer">
294 | 
295 | See the answer to [Exercise 27.3.3](#exercise-27.3.3).
296 | 
297 | I moved the computation of the number and percentage of diamonds larger than 2.5 carats into a code chunk.
298 | I find that it is best to keep inline R expressions simple, usually consisting of an object and a formatting function.
299 | This makes the R code easier to read and test, while also making the prose easier to read.
300 | It helps the readability of the code and document to keep the computation of objects used in prose close to their use.
301 | Calculating those objects in a code chunk with the `include = FALSE` option (as is done in `diamond-sizes.Rmd`) is useful in this regard.
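
The pattern can be sketched in isolation as follows. The `comma()` helper is the one defined in `diamond-sizes.Rmd`; the two counts are hypothetical stand-ins for `nrow(diamonds)` and `n_larger`, so that the example runs without loading any data.

```{r}
# Same helper as in diamond-sizes.Rmd
comma <- function(x) {
  format(x, digits = 2, big.mark = ",")
}

# Hypothetical counts, used here only for illustration
n_total <- 53940
n_larger <- 126
pct_larger <- n_larger / n_total * 100

# These are the kinds of simple expressions that would appear inline
comma(n_total)
round(pct_larger, 1)
```

With the values computed up front, the inline expressions in the prose reduce to an object name wrapped in a single formatting call.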
302 | 
303 | </div>
304 | 
305 | ### Exercise 27.4.4 {.unnumbered .exercise data-number="27.4.4"}
306 | 
307 | <div class="question">
308 | 
309 | Set up a network of chunks where `d` depends on `c` and `b`, and both `b` and `c` depend on `a`. Have each chunk print `lubridate::now()`, set `cache = TRUE`, then verify your understanding of caching.
310 | 
311 | </div>
312 | 
313 | <div class="answer">
314 | 
315 | ```{r caching,echo=FALSE,comment='',purl=FALSE}
316 | cat(readr::read_file(here::here("rmarkdown", "caching.Rmd")))
317 | ```
318 | 
319 | </div>
320 | 
321 | ## Troubleshooting {#troubleshooting .r4ds-section}
322 | 
323 | `r no_exercises()`
324 | 
325 | ## YAML header {#yaml-header .r4ds-section}
326 | 
327 | `r no_exercises()`
328 | 
329 | ## Learning more {#learning-more-3 .r4ds-section}
330 | 
331 | `r no_exercises()`
332 | 


--------------------------------------------------------------------------------
/rmarkdown/caching.Rmd:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: "Exercise 27.4.4"
 3 | author: "Jeffrey Arnold"
 4 | date: "2/1/2018"
 5 | output: html_document
 6 | ---
 7 | 
 8 | ```{r setup, include=FALSE}
 9 | knitr::opts_chunk$set(echo = TRUE, cache = TRUE)
10 | ```
11 | 
12 | The chunk `a` has no dependencies.
13 | ```{r a}
14 | print(lubridate::now())
15 | x <- 1
16 | ```
17 | 
18 | The chunk `b` depends on `a`.
19 | ```{r b, dependson = c("a")}
20 | print(lubridate::now())
21 | y <- x + 1
22 | ```
23 | 
24 | The chunk `c` depends on `a`.
25 | ```{r c, dependson = c("a")}
26 | print(lubridate::now())
27 | z <- x * 2
28 | ```
29 | 
30 | The chunk `d` depends on `c` and `b`:
31 | ```{r d, dependson = c("c", "b")}
32 | print(lubridate::now())
33 | w <- y + z
34 | ```
35 | 
36 | If this document is knit repeatedly, the values printed by `lubridate::now()`
37 | will not change between runs: each chunk will show the time at which it ran the
38 | first time the document was knit with caching.
39 | 


--------------------------------------------------------------------------------
/rmarkdown/cv.Rmd:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: "Hadley Wickham"
 3 | ---
 4 | 
 5 | ## Employment
 6 | 
 7 | -   Chief Scientist, RStudio, **2013--present**.
 8 | -   Adjunct Professor, Rice University, Houston, TX, **2013--present**.
 9 | -   Assistant Professor, Rice University, Houston, TX, **2008--12**.
10 | 
11 | ## Education
12 | 
13 | -   Ph.D. in Statistics, Iowa State University, Ames, IA, **2008**.
14 | 
15 | -   M.Sc. in Statistics, University of Auckland, New Zealand, **2004**.
16 | 
17 | -   B.Sc. in Statistics and Computer Science, First Class Honours, The 
18 |     University of Auckland, New Zealand, **2002**.
19 | 
20 | -   Bachelor of Human Biology, First Class Honours, The University of Auckland, 
21 |     Auckland, New Zealand, **1999**.
22 | 


--------------------------------------------------------------------------------
/rmarkdown/diamond-sizes.Rmd:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: "Diamond sizes"
 3 | output: html_document
 4 | date: '2018-07-15'
 5 | ---
 6 | 
 7 | ```{r knitr_opts, include = FALSE}
 8 | knitr::opts_chunk$set(echo = FALSE)
 9 | ```
10 | 
11 | ```{r setup, message = FALSE, cache=FALSE}
12 | library("ggplot2")
13 | library("dplyr")
14 | ```
15 | 
16 | ```{r}
17 | smaller <- diamonds %>%
18 |   filter(carat <= 2.5)
19 | ```
20 | 
21 | ```{r include = FALSE, purl = FALSE}
22 | # Hide objects and functions ONLY used inline
23 | n_larger <- nrow(diamonds) - nrow(smaller)
24 | pct_larger <- n_larger / nrow(diamonds) * 100
25 | 
26 | comma <- function(x) {
27 |   format(x, digits = 2, big.mark = ",")
28 | }
29 | ```
30 | 
31 | ## Size and Cut, Color, and Clarity
32 | 
33 | Diamonds with lower quality cuts (cuts are ranked from "Ideal" to "Fair") tend
34 | to be larger.
35 | ```{r}
36 | ggplot(diamonds, aes(y = carat, x = cut)) +
37 |   geom_boxplot()
38 | ```
39 | Likewise, diamonds with worse color (diamond colors are ranked from J (worst)
40 | to D (best)) tend to be larger:
41 | 
42 | ```{r}
43 | ggplot(diamonds, aes(y = carat, x = color)) +
44 |   geom_boxplot()
45 | ```
46 | 
47 | The pattern present in cut and color is also present in clarity. Diamonds with
48 | worse clarity (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best)) tend to
49 | be larger:
50 | 
51 | ```{r}
52 | ggplot(diamonds, aes(y = carat, x = clarity)) +
53 |   geom_boxplot()
54 | ```
55 | 
56 | These patterns are consistent with there being a profitability threshold for 
57 | retail diamonds that is a function of carat, clarity, color, cut and other 
58 | characteristics. A diamond may be profitable to sell if a poor value of one
59 | feature, for example, poor clarity, color, or cut, is offset by a good value
60 | of another feature, such as a large size. This can be considered an example
61 | of [Berkson's paradox](https://en.wikipedia.org/wiki/Berkson%27s_paradox).
62 | 
63 | ## Largest Diamonds
64 | 
65 | We have data about `r comma(nrow(diamonds))` diamonds. Only
66 | `r n_larger` (`r round(pct_larger, 1)`%) are larger
67 | than 2.5 carats. The distribution of the remainder is shown below:
68 | 
69 | ```{r}
70 | smaller %>%
71 |   ggplot(aes(carat)) +
72 |   geom_freqpoly(binwidth = 0.01)
73 | ```
74 | 
75 | The frequency distribution of diamond sizes is marked by spikes at
76 | whole-number and half-carat values, as well as several other carat values
77 | corresponding to fractions.
78 | 
79 | The largest twenty diamonds (by carat) in the dataset are,
80 | 
81 | ```{r results = "asis"}
82 | diamonds %>%
83 |   arrange(desc(carat)) %>%
84 |   slice(1:20) %>%
85 |   select(carat, cut, color, clarity) %>%
86 |   knitr::kable(
87 |     caption = "The largest 20 diamonds in the `diamonds` dataset."
88 |   )
89 | ```
90 | 
91 | Most of the twenty largest diamonds are in the lowest clarity category ("I1"),
92 | with one being in one of the best categories ("VVS2"). The top twenty diamonds
93 | have colors ranging from the worst category, "J", to the best, "D", though most
94 | are in the lower categories "J" and "I". The top twenty diamonds are more evenly
95 | distributed among the cut categories, from "Fair" to "Ideal", although the worst
96 | category ("Fair") is the most common.
97 | 


--------------------------------------------------------------------------------
/rmarkdown/example.Rmd:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Horizontal Rules, Block Quotes, and Footnotes
 3 | ---
 4 | 
 5 | The quick brown fox jumps over the lazy dog.[^quick-fox]
 6 | 
 7 | Use three or more `-` for a horizontal rule. For example,
 8 | 
 9 | ---
10 | 
11 | The horizontal rule uses the same syntax as a YAML block. So how does R markdown
12 | distinguish between the two? Three dashes ("---") are only treated as the start of
13 | a YAML block if they are at the start of the document.
14 | 
15 | > This would be a block quote. Generally, block quotes are used to indicate
16 | > quotes longer than three or four lines.
17 | 
18 | [^quick-fox]: This is an example of a footnote. The sentence this is footnoting
19 |   is often used for displaying fonts because it includes all 26 letters of the
20 |   English alphabet.
21 | 


--------------------------------------------------------------------------------
/tibble.Rmd:
--------------------------------------------------------------------------------
  1 | # Tibbles {#tibbles .r4ds-section}
  2 | 
  3 | ```{r setup,message=FALSE,cache=FALSE}
  4 | library("tidyverse")
  5 | ```
  6 | 
  7 | ## Exercise 10.1 {.unnumbered .exercise data-number="10.1"}
  8 | 
  9 | <div class="question">
 10 | How can you tell if an object is a tibble? (Hint: try printing `mtcars`, which is a regular data frame).
 11 | </div>
 12 | 
 13 | <div class="answer">
 14 | 
 15 | When we print `mtcars`, it prints all the columns.
 16 | ```{r}
 17 | mtcars
 18 | ```
 19 | 
 20 | But when we first convert `mtcars` to a tibble using `as_tibble()`, it prints only the first ten observations. 
 21 | There are also some other differences in formatting of the printed data frame.
 22 | It prints the number of rows and columns and the data type of each column.
 23 | ```{r}
 24 | as_tibble(mtcars)
 25 | ```
 26 | 
 27 | You can use the function `is_tibble()` to check whether a data frame is a tibble or not.
 28 | The `mtcars` data frame is not a tibble.
 29 | ```{r}
 30 | is_tibble(mtcars)
 31 | ```
 32 | But the `diamonds` and `flights` data are tibbles.
 33 | ```{r}
 34 | is_tibble(ggplot2::diamonds)
 35 | is_tibble(nycflights13::flights)
 36 | is_tibble(as_tibble(mtcars))
 37 | ```
 38 | 
 39 | More generally, you can use the `class()` function to find out the class of an
 40 | object. Tibbles have the classes `c("tbl_df", "tbl", "data.frame")`, while old
 41 | data frames will only have the class `"data.frame"`.
 42 | ```{r}
 43 | class(mtcars)
 44 | class(ggplot2::diamonds)
 45 | class(nycflights13::flights)
 46 | ```
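
Building on the class vector, a class-based check can be sketched with base R's `inherits()`. It tests for the `"tbl_df"` class, which tibbles carry and plain data frames do not.

```{r}
# FALSE for a plain data frame, TRUE once it is converted to a tibble
inherits(mtcars, "tbl_df")
inherits(tibble::as_tibble(mtcars), "tbl_df")
```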
 47 | 
 48 | If you are interested in reading more on R's classes, read the chapters on
 49 | object oriented programming in [Advanced R](http://adv-r.had.co.nz/S3.html).
 50 | 
 51 | </div>
 52 | 
 53 | ## Exercise 10.2 {.unnumbered .exercise data-number="10.2"}
 54 | 
 55 | <div class="question">
 56 | Compare and contrast the following operations on a `data.frame` and equivalent tibble. What is different? Why might the default data frame behaviors cause you frustration?
 57 | </div>
 58 | 
 59 | <div class="answer">
 60 | 
 61 | ```{r}
 62 | df <- data.frame(abc = 1, xyz = "a")
 63 | df$x
 64 | df[, "xyz"]
 65 | df[, c("abc", "xyz")]
 66 | ```
 67 | 
 68 | ```{r}
 69 | tbl <- as_tibble(df)
 70 | tbl$x
 71 | tbl[, "xyz"]
 72 | tbl[, c("abc", "xyz")]
 73 | ```
 74 | 
 75 | With a data frame, the `$` operator will match any column name that starts with the name following it.
 76 | Since there is a column named `xyz`, the expression `df$x` will be expanded to `df$xyz`.
 77 | This behavior of the `$` operator saves a few keystrokes, but it can result in accidentally using a different column than you thought you were using. Tibbles never do partial matching, so `tbl$x` returns `NULL` with a warning.
 78 | 
 79 | With a `data.frame`, the type of object that `[` returns depends on the
 80 | number of columns selected. If one column is selected, it returns a vector
 81 | rather than a `data.frame`; if more than one column is selected, it returns a
 82 | `data.frame`. This is fine if you know what you are passing in, but suppose you
 83 | wrote `df[, vars]` where `vars` is a variable. Then what that code returns
 84 | depends on `length(vars)`, and you would have to write code to account for both
 85 | situations or risk bugs. With a tibble, `[` always returns another tibble.
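
One way to guard against the shape-changing behavior of `[` is to pass `drop = FALSE`, or to use `[[` when a vector is what you want. The following is a minimal sketch, not part of the original exercise; `vars` is a hypothetical stand-in for a column selection computed elsewhere.

```{r}
library(tibble)

df <- data.frame(abc = 1, xyz = "a")
tbl <- as_tibble(df)

# `vars` is a hypothetical column selection that may hold one or many names
vars <- "xyz"

# Default data.frame behavior: a single column is simplified to a vector
is.data.frame(df[, vars])

# drop = FALSE keeps the result a data.frame regardless of length(vars)
is.data.frame(df[, vars, drop = FALSE])

# A tibble never simplifies with `[`; use `[[` to ask for the vector explicitly
is_tibble(tbl[, vars])
length(tbl[[vars]])
```

With `drop = FALSE`, code that follows can treat the result as a data frame whether `vars` names one column or several.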
 86 | 
 87 | </div>
 88 | 
 89 | ## Exercise 10.3 {.unnumbered .exercise data-number="10.3"}
 90 | 
 91 | <div class="question">
 92 | If you have the name of a variable stored in an object, e.g. `var <- "mpg"`, how can you extract the reference variable from a tibble?
 93 | </div>
 94 | 
 95 | <div class="answer">
 96 | 
 97 | You can use the double bracket, like `df[[var]]`. You cannot use the dollar sign, because `df$var` would look for a column named `var`.
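
As a small illustration of the difference, the sketch below assumes a tibble built from the built-in `mtcars` data (chosen here only because it has an `mpg` column) and compares the two operators.

```{r}
library(tibble)

tbl <- as_tibble(mtcars)
var <- "mpg"

# `[[` evaluates `var` and looks up the column named "mpg"
identical(tbl[[var]], tbl$mpg)

# `$` takes `var` literally; there is no column called "var",
# so the tibble returns NULL (with a warning, suppressed here)
suppressWarnings(is.null(tbl$var))
```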
 98 | 
 99 | </div>
100 | 
101 | ## Exercise 10.4 {.unnumbered .exercise data-number="10.4"}
102 | 
103 | <div class="question">
104 | 
105 | Practice referring to non-syntactic names in the following data frame by:
106 | 
107 | 1.  Extracting the variable called 1.
108 | 1.  Plotting a scatterplot of 1 vs 2.
109 | 1.  Creating a new column called 3 which is 2 divided by 1.
110 | 1.  Renaming the columns to one, two and three.
111 | 
112 | </div>
113 | 
114 | <div class="answer">
115 | 
116 | For this example, I'll create a dataset called `annoying` with
117 | columns named `1` and `2`.
118 | 
119 | ```{r}
120 | annoying <- tibble(
121 |   `1` = 1:10,
122 |   `2` = `1` * 2 + rnorm(length(`1`))
123 | )
124 | ```
125 | 
126 | 1.  To extract the variable named `1`:
127 | 
128 |     ```{r}
129 |     annoying[["1"]]
130 |     ```
131 | 
132 |     or
133 | 
134 |     ```{r}
135 |     annoying$`1`
136 |     ```
137 | 
138 | 1.  To create a scatter plot of `1` vs. `2`:
139 | 
140 |     ```{r}
141 |     ggplot(annoying, aes(x = `1`, y = `2`)) +
142 |       geom_point()
143 |     ```
144 | 
145 | 1.  To add a new column `3` which is `2` divided by `1`:
146 | 
147 |     ```{r}
148 |     mutate(annoying, `3` = `2` / `1`)
149 |     ```
150 |     
151 |     or 
152 |     
153 |     ```{r}
154 |     annoying[["3"]] <- annoying$`2` / annoying$`1`
155 |     ```
156 | 
157 |     or
158 | 
159 |     ```{r}
160 |     annoying[["3"]] <- annoying[["2"]] / annoying[["1"]]
161 |     ```
162 | 
163 | 1.  To rename the columns to `one`, `two`, and `three`, run:
164 | 
165 |     ```{r}
166 |     annoying <- rename(annoying, one = `1`, two = `2`, three = `3`)
167 |     glimpse(annoying)
168 |     ```
169 | 
170 | </div>
171 | 
172 | ## Exercise 10.5 {.unnumbered .exercise data-number="10.5"}
173 | 
174 | <div class="question">
175 | What does `tibble::enframe()` do? When might you use it?
176 | </div>
177 | 
178 | <div class="answer">
179 | 
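180 | The function `tibble::enframe()` converts a named vector to a data frame with one column for the names and one for the values. You might use it when you have a named vector, such as the result of `quantile()` or `sapply()`, and want to work with it using functions that expect a data frame.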
181 | 
182 | ```{r}
183 | enframe(c(a = 1, b = 2, c = 3))
184 | ```
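
A slightly fuller sketch of one possible use, assuming the named vector comes from `quantile()` on the built-in `mtcars` data; the column names passed to `name` and `value` are arbitrary choices for illustration.

```{r}
library(tibble)

# A named vector, here the quantiles of a numeric variable
q <- quantile(mtcars$mpg)

# enframe() turns names and values into two columns;
# `name` and `value` set the column names
q_tbl <- enframe(q, name = "quantile", value = "mpg")
q_tbl

# deframe() goes the other way, back to a named vector
deframe(q_tbl)
```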
185 | 
186 | </div>
187 | 
188 | ## Exercise 10.6 {.unnumbered .exercise data-number="10.6"}
189 | 
190 | <div class="question">
191 | What option controls how many additional column names are printed at the footer of a tibble?
192 | </div>
193 | 
194 | <div class="answer">
195 | 
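196 | The `print()` method for tibble objects is documented in `?print.tbl`.
197 | The `n_extra` argument determines the number of extra columns to print information for; if it is not supplied, the global option `tibble.max_extra_cols` is used. Recent versions of tibble deprecate `n_extra` in favor of an argument named `max_extra_cols`.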
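
The global option `tibble.max_extra_cols` can be set with `options()`. The sketch below is an illustration under assumptions: the wide tibble and its column names are made up, and the exact footer text depends on the installed tibble version.

```{r}
library(tibble)

# Remember the current setting so it can be restored afterwards
old <- options(tibble.max_extra_cols = 5)

# A hypothetical wide tibble: most columns will not fit in a narrow
# console and are only listed by name in the footer
m <- matrix(0, nrow = 3, ncol = 50,
            dimnames = list(NULL, paste0("col", 1:50)))
wide <- as_tibble(m)
print(wide, width = 40)

# Restore the previous value
options(old)
```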
198 | 
199 | </div>
200 | 


--------------------------------------------------------------------------------
/workflow-basics.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | output: html_document
  3 | editor_options:
  4 |   chunk_output_type: console
  5 | ---
  6 | # Workflow: basics {#workflow-basics .r4ds-section}
  7 | 
  8 | ```{r message=FALSE,cache=FALSE}
  9 | library("tidyverse")
 10 | ```
 11 | 
 12 | ## Exercise 4.1 {.unnumbered .exercise data-number="4.1"}
 13 | 
 14 | <div class="question">
 15 | Why does this code not work?
 16 | ```{r error=TRUE}
 17 | my_variable <- 10
 18 | my_varıable
 19 | ```
 20 | </div>
 21 | 
 22 | <div class="answer">
 23 | 
 24 | The variable being printed is `my_varıable`, not `my_variable`:
 25 | the seventh character is "ı" ("[LATIN SMALL LETTER DOTLESS I](https://en.wikipedia.org/wiki/Dotted_and_dotless_I)"), not "i".
 26 | 
 27 | While it wouldn't have helped much in this case, the importance of
 28 | distinguishing characters in code is one reason why fonts which clearly
 29 | distinguish similar characters are preferred in programming.
 30 | It is especially important to distinguish between two sets of similar looking characters:
 31 | 
 32 | -   the numeral zero (0), the Latin small letter O (o), and the Latin capital letter O (O),
 33 | -   the numeral one (1), the Latin small letter I (i), the Latin capital letter I (I), and Latin small letter L (l).
 34 | 
 35 | In these fonts, zero and the Latin letter O are often distinguished by using a glyph for zero that uses either a dot in the interior or a slash through it.
 36 | Some examples of fonts with dotted or slashed zero glyphs are Consolas, DejaVu Sans Mono, Monaco, Menlo, [Source Sans Pro](https://adobe-fonts.github.io/source-sans-pro/), and Fira Code.
 37 | 
 38 | Error messages of the form `"object '...' not found"` mean exactly what they say.
 39 | R cannot find an object with that name.
 40 | Unfortunately, the error does not tell you why that object cannot be found, because R does not know the reason that the object does not exist.
 41 | The most common scenarios in which I encounter this error message are:
 42 | 
 43 | 1.  I forgot to create the object, or an error prevented the object from being created.
 44 | 
 45 | 1.  I made a typo in the object's name, either when using it or when I created it (as in the example above), or I forgot what I had originally named it.
 46 |     If you find yourself often writing the wrong name for an object,
 47 |     it is a good indication that the original name was not a good one.
 48 | 
 49 | 1.  I forgot to load the package that contains the object using `library()`.
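
The first two scenarios can be checked from the console. The following is a minimal sketch using only base R; the misspelled name `my_varable` is a made-up example, not the one from the exercise.

```{r}
my_variable <- 10

# exists() reports whether a name is bound to an object
exists("my_variable")
exists("my_varable") # misspelled on purpose

# tryCatch() captures the "object not found" error instead of stopping
msg <- tryCatch(
  my_varable,
  error = function(e) conditionMessage(e)
)
msg
```

Listing the objects in the current environment with `ls()` is another quick way to spot a name that was typed differently from how it was created.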
 50 | 
 51 | </div>
 52 | 
 53 | ## Exercise 4.2 {.unnumbered .exercise data-number="4.2"}
 54 | 
 55 | <div class="question">
 56 | 
 57 | Tweak each of the following R commands so that they run correctly:
 58 | 
 59 | ```{r, eval = FALSE}
 60 | ggplot(dota = mpg) + 
 61 |   geom_point(mapping = aes(x = displ, y = hwy))
 62 | 
 63 | fliter(mpg, cyl = 8)
 64 | filter(diamond, carat > 3)
 65 | ```
 66 | 
 67 | </div>
 68 | 
 69 | <div class="answer">
 70 | 
 71 | ```{r error=TRUE}
 72 | ggplot(dota = mpg) +
 73 |   geom_point(mapping = aes(x = displ, y = hwy))
 74 | ```
 75 | The error message is `argument "data" is missing, with no default`.
 76 | This error is a result of a typo, `dota` instead of `data`.
 77 | ```{r error=TRUE}
 78 | ggplot(data = mpg) +
 79 |   geom_point(mapping = aes(x = displ, y = hwy))
 80 | ```
 81 | 
 82 | ```{r error=TRUE}
 83 | fliter(mpg, cyl = 8)
 84 | ```
 85 | 
 86 | R could not find the function `fliter()` because we made a typo: `fliter` instead of `filter`.
 87 | 
 88 | ```{r error=TRUE}
 89 | filter(mpg, cyl = 8)
 90 | ```
 91 | 
 92 | We aren't done yet. But the error message gives a suggestion. Let's follow it.
 93 | 
 94 | ```{r error=TRUE}
 95 | filter(mpg, cyl == 8)
 96 | ```
 97 | 
 98 | ```{r error=TRUE}
 99 | filter(diamond, carat > 3)
100 | ```
101 | 
102 | R says it can't find the object `diamond`.
103 | This is a typo; the data frame is named `diamonds`.
104 | ```{r error=TRUE}
105 | filter(diamonds, carat > 3)
106 | ```
107 | 
108 | How did I know? I started typing in `diamond` and RStudio completed it to `diamonds`.
109 | Since `diamonds` includes the variable `carat` and the code works, that appears to have been the problem.
110 | 
111 | </div>
112 | 
113 | ## Exercise 4.3 {.unnumbered .exercise data-number="4.3"}
114 | 
115 | <div class="question">
116 | Press *Alt + Shift + K*. What happens? How can you get to the same place using the menus?
117 | </div>
118 | 
119 | <div class="answer">
120 | 
121 | This opens an overlay listing RStudio's keyboard shortcuts. The same overlay can be reached from the menu under `Tools -> Keyboard Shortcuts Help`.
122 | 
123 | </div>
124 | 


--------------------------------------------------------------------------------
/workflow-projects.Rmd:
--------------------------------------------------------------------------------
1 | # Workflow: projects {#workflow-projects .r4ds-section}
2 | 
3 | `r no_exercises()`
4 | 


--------------------------------------------------------------------------------
/workflow-scripts.Rmd:
--------------------------------------------------------------------------------
 1 | ---
 2 | output: html_document
 3 | editor_options:
 4 |   chunk_output_type: console
 5 | ---
 6 | 
 7 | # Workflow: scripts {#workflow-scripts .r4ds-section}
 8 | 
 9 | ## Exercise 6.1 {.unnumbered .exercise data-number="6.1"}
10 | 
11 | <div class="question">
12 | 
 13 | Go to the RStudio Tips Twitter account, <https://twitter.com/rstudiotips>, and find one tip that looks interesting.
14 | Practice using it!
15 | 
16 | </div>
17 | 
18 | <div class="answer">
19 | 
20 | The current timeline of [\@rstudiotips](https://twitter.com/rstudiotips) is displayed below.
21 | 
22 | <a class="twitter-timeline" href="https://twitter.com/rstudiotips?ref_src=twsrc%5Etfw"> Tweets by rstudiotips</a> 
23 | <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
24 | 
25 | </div>
26 | 
27 | ## Exercise 6.2 {.unnumbered .exercise data-number="6.2"}
28 | 
29 | <div class="question">
30 | 
31 | What other common mistakes will RStudio diagnostics report?
32 | Read <https://support.rstudio.com/hc/en-us/articles/205753617-Code-Diagnostics> to find out.
33 | 
34 | </div>
35 | 
36 | <div class="answer">
37 | 
38 | You should read that page, but some other diagnostics for R code include the following.
39 | 
40 | 1.  Check for missing, unmatched, partially matched, and too many arguments to functions.
41 | 1.  Warn if a variable is not defined.
42 | 1.  Warn if a variable is defined but not used.
43 | 1.  Check that the code style conforms to the [tidyverse style guide](https://style.tidyverse.org/).
44 | 
45 | </div>
46 | 


--------------------------------------------------------------------------------
/wrangle.Rmd:
--------------------------------------------------------------------------------
1 | # (PART) Wrangle {-}
2 | 
3 | # Introduction {#wrangle-intro .r4ds-section}
4 | 
5 | `r no_exercises()`
6 | 


--------------------------------------------------------------------------------