├── .Rprofile
├── .gitignore
├── LICENSE.md
├── R
│   └── helpers.R
├── _extensions
│   ├── coatless
│   │   └── webr
│   │       ├── _extension.yml
│   │       ├── qwebr-cell-elements.js
│   │       ├── qwebr-cell-initialization.js
│   │       ├── qwebr-compute-engine.js
│   │       ├── qwebr-document-engine-initialization.js
│   │       ├── qwebr-document-settings.js
│   │       ├── qwebr-document-status.js
│   │       ├── qwebr-monaco-editor-element.js
│   │       ├── qwebr-monaco-editor-init.html
│   │       ├── qwebr-styling.css
│   │       ├── qwebr-theme-switch.js
│   │       ├── template.qmd
│   │       ├── webr-serviceworker.js
│   │       ├── webr-worker.js
│   │       └── webr.lua
│   └── quarto-ext
│       └── fontawesome
│           ├── _extension.yml
│           ├── assets
│           │   ├── css
│           │   │   ├── all.css
│           │   │   └── latex-fontsize.css
│           │   └── webfonts
│           │       ├── FontAwesome6Brands-Regular-400.ttf
│           │       ├── FontAwesome6Brands-Regular-400.woff2
│           │       ├── FontAwesome6Free-Regular-400.ttf
│           │       ├── FontAwesome6Free-Regular-400.woff2
│           │       ├── FontAwesome6Free-Solid-900.ttf
│           │       ├── FontAwesome6Free-Solid-900.woff2
│           │       ├── fa-brands-400.ttf
│           │       ├── fa-brands-400.woff2
│           │       ├── fa-regular-400.ttf
│           │       ├── fa-regular-400.woff2
│           │       ├── fa-solid-900.ttf
│           │       ├── fa-solid-900.woff2
│           │       ├── fa-v4compatibility.ttf
│           │       └── fa-v4compatibility.woff2
│           └── fontawesome.lua
├── _quarto.yml
├── about.qmd
├── basics
│   ├── 01-visualization-basics
│   │   ├── 01-code-template.qmd
│   │   ├── 02-aesthetic-mappings.qmd
│   │   ├── 03-geometric-objects.qmd
│   │   ├── 04-ggplot2-package.qmd
│   │   └── index.qmd
│   └── 02-programming-basics
│       ├── 01-functions.qmd
│       ├── 02-arguments.qmd
│       ├── 03-objects.qmd
│       ├── 04-vectors.qmd
│       ├── 05-types.qmd
│       ├── 06-lists.qmd
│       ├── 07-packages.qmd
│       └── index.qmd
├── deploy.sh
├── html
│   └── custom.scss
├── index.qmd
├── js
│   ├── bootstrapify.js
│   └── progressive-reveal.js
├── r-primers.Rproj
├── renv.lock
├── renv
│   ├── .gitignore
│   ├── activate.R
│   └── settings.json
├── tidy-data
│   └── 01-reshape-data
│       ├── 01-tidy-data.qmd
│       ├── 02-wide-to-long.qmd
│       ├── 03-long-to-wide.qmd
│       ├── img
│       │   ├── tidy.png
│       │   └── vectorized.png
│       └── index.qmd
├── transform-data
│   ├── 01-tibbles
│   │   ├── 01-babynames.qmd
│   │   ├── 02-tibbles.qmd
│   │   ├── 03-tidyverse.qmd
│   │   ├── img
│   │   │   └── tibble_display.png
│   │   └── index.qmd
│   ├── 02-isolating
│   │   ├── 01-your-name.qmd
│   │   ├── 02-select.qmd
│   │   ├── 03-filter.qmd
│   │   ├── 04-arrange.qmd
│   │   ├── 05-pipe.qmd
│   │   └── index.qmd
│   └── 03-deriving
│       ├── 01-most-popular-names.qmd
│       ├── 02-summarize.qmd
│       ├── 03-group_by.qmd
│       ├── 04-mutate.qmd
│       ├── 05-challenges.qmd
│       ├── index.qmd
│       └── video
│           ├── grp-mutate.mp4
│           ├── grp-summarize-00.mp4
│           ├── grp-summarize-01.mp4
│           ├── grp-summarize-02.mp4
│           ├── grp-summarize-03.mp4
│           └── mutate.mp4
└── visualize-data
    ├── 01-eda
    │   ├── 01-eda.qmd
    │   ├── 02-variation.qmd
    │   ├── 03-covariation.qmd
    │   ├── img
    │   │   └── plots-table.png
    │   └── index.qmd
    ├── 02-bar-charts
    │   ├── 01-bar-charts.qmd
    │   ├── 02-aesthetics.qmd
    │   ├── 03-position-adjustments.qmd
    │   ├── 04-facets.qmd
    │   ├── img
    │   │   └── positions.png
    │   └── index.qmd
    ├── 03-histograms
    │   ├── 01-histograms.qmd
    │   ├── 02-similar-geoms.qmd
    │   └── index.qmd
    ├── 04-boxplots
    │   ├── 01-boxplots.qmd
    │   ├── 02-similar-geoms.qmd
    │   ├── 03-counts.qmd
    │   ├── img
    │   │   └── box-png.png
    │   └── index.qmd
    ├── 05-scatterplots
    │   ├── 01-scatterplots.qmd
    │   ├── 02-layers.qmd
    │   ├── 03-coordinate-systems.qmd
    │   └── index.qmd
    ├── 06-line-graphs
    │   ├── 01-line-graphs.qmd
    │   ├── 02-similar-geoms.qmd
    │   ├── 03-maps.qmd
    │   └── index.qmd
    ├── 07-overplotting
    │   ├── 01-overplotting.qmd
    │   ├── 02-rounding.qmd
    │   ├── 03-large-data.qmd
    │   └── index.qmd
    └── 08-customize
        ├── 01-zooming.qmd
        ├── 02-labels.qmd
        ├── 03-themes.qmd
        ├── 04-scales.qmd
        ├── 05-legends.qmd
        ├── 06-quiz.qmd
        ├── img
        │   └── viridis.png
        └── index.qmd
/.Rprofile:
--------------------------------------------------------------------------------
1 | source("renv/activate.R")
2 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | .Ruserdata
5 |
6 | /.quarto/
7 | _site/
8 | _freeze/
9 | **_cache/
10 | **_files/
11 |
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | # License
2 |
3 | This is a human-readable summary of (and not a substitute for) the license.
4 | Please see
5 | for the full legal text.
6 |
7 | This work is licensed under the Creative Commons
8 | Attribution-ShareAlike 4.0 License (CC BY-SA 4.0).
9 |
10 | **You are free to:**
11 |
12 | - **Share**---copy and redistribute the material in any medium or format, even commercially.
13 | - **Adapt**---remix, transform, and build upon the material for any purpose, even commercially.
14 |
15 | The licensor cannot revoke these freedoms as long as you follow the license terms.
16 |
17 | **Under the following terms:**
18 |
19 | - **Attribution**---You must give appropriate credit, provide a link to the
20 | license, and indicate if changes were made. You may do so in any reasonable
21 | manner, but not in any way that suggests the licensor endorses you or your
22 | use.
23 |
24 | The primers are derived from the book _R for Data Science_. **For the purposes of this license, appropriate credit requires including the phrase, "R for Data Science from O'Reilly Media, Inc. Copyright © 2017 Garrett Grolemund, Hadley Wickham. Used with permission."**
25 |
26 | - **ShareAlike**---If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
27 |
28 | - **No additional restrictions**---You may not apply legal terms or
29 | technological measures that legally restrict others from doing
30 | anything the license permits.
31 |
32 | **Notices:**
33 |
34 | You do not have to comply with the license for elements of the material in the
35 | public domain or where your use is permitted by an applicable exception or
36 | limitation.
37 |
38 | No warranties are given. The license may not give you all of the permissions
39 | necessary for your intended use. For example, other rights such as publicity,
40 | privacy, or moral rights may limit how you use the material.
41 |
--------------------------------------------------------------------------------
/R/helpers.R:
--------------------------------------------------------------------------------
1 | create_buttons <- function(next_topic = "#") {
2 | if (is.null(next_topic)) {
3 | next_button <- ""
4 | } else {
5 | next_button <- glue::glue('Next topic ')
6 | }
7 | button_section <- glue::glue('
8 | ')
13 |
14 | cat(button_section)
15 | }
16 |
--------------------------------------------------------------------------------
/_extensions/coatless/webr/_extension.yml:
--------------------------------------------------------------------------------
1 | name: webr
2 | title: Embedded webr code cells
3 | author: James Joseph Balamuta
4 | version: 0.4.2-dev.6
5 | quarto-required: ">=1.4.554"
6 | contributes:
7 | filters:
8 | - webr.lua
9 |
--------------------------------------------------------------------------------
/_extensions/coatless/webr/qwebr-cell-initialization.js:
--------------------------------------------------------------------------------
1 | // Handle cell initialization
2 | qwebrCellDetails.map(
3 | (entry) => {
4 | // Handle the creation of the element
5 | qwebrCreateHTMLElement(entry);
6 | // In the event of interactive, initialize the monaco editor
7 | if (entry.options.context == EvalTypes.Interactive) {
8 | qwebrCreateMonacoEditorInstance(entry);
9 | }
10 | }
11 | );
12 |
13 | // Identify non-interactive cells (in order)
14 | const filteredEntries = qwebrCellDetails.filter(entry => {
15 | const contextOption = entry.options && entry.options.context;
16 | return ['output', 'setup'].includes(contextOption) || (contextOption == "interactive" && entry.options && entry.options.autorun === 'true');
17 | });
18 |
19 | // Condition non-interactive cells to only be run after webR finishes its initialization.
20 | qwebrInstance.then(
21 | async () => {
22 | const nHiddenCells = filteredEntries.length;
23 | var currentHiddenCell = 0;
24 |
25 |
26 | // Modify button state
27 | qwebrSetInteractiveButtonState(`🟡 Running hidden code cells ...`, false);
28 |
29 | // Begin processing non-interactive sections
30 | // Due to the iteration policy, we must use a for() loop.
31 | // Otherwise, we would need to switch to using reduce with an empty
32 | // starting promise
33 | for (const entry of filteredEntries) {
34 |
35 | // Determine cell being examined
36 | currentHiddenCell = currentHiddenCell + 1;
37 | const formattedMessage = `Evaluating hidden cell ${currentHiddenCell} out of ${nHiddenCells}`;
38 |
39 | // Update the document status header
40 | if (qwebrShowStartupMessage) {
41 | qwebrUpdateStatusHeader(formattedMessage);
42 | }
43 |
44 | // Display the update in non-active areas
45 | qwebrUpdateStatusMessage(formattedMessage);
46 |
47 | // Extract details on the active cell
48 | const evalType = entry.options.context;
49 | const cellCode = entry.code;
50 | const qwebrCounter = entry.id;
51 |
52 | if (['output', 'setup'].includes(evalType)) {
53 | // Disable further global status updates
54 | const activeContainer = document.getElementById(`qwebr-non-interactive-loading-container-${qwebrCounter}`);
55 | activeContainer.classList.remove('qwebr-cell-needs-evaluation');
56 | activeContainer.classList.add('qwebr-cell-evaluated');
57 |
58 | // Update status on the code cell
59 | const activeStatus = document.getElementById(`qwebr-status-text-${qwebrCounter}`);
60 | activeStatus.innerText = " Evaluating hidden code cell...";
61 | activeStatus.classList.remove('qwebr-cell-needs-evaluation');
62 | activeStatus.classList.add('qwebr-cell-evaluated');
63 | }
64 |
65 | switch (evalType) {
66 | case 'interactive':
67 | // TODO: Make this more standardized.
68 | // At the moment, we're overriding the interactive status update by pretending its
69 | // output-like.
70 | const tempOptions = entry.options;
71 | tempOptions["context"] = "output"
72 | // Run the code in a non-interactive state that is geared to displaying output
73 | await qwebrExecuteCode(`${cellCode}`, qwebrCounter, tempOptions);
74 | break;
75 | case 'output':
76 | // Run the code in a non-interactive state that is geared to displaying output
77 | await qwebrExecuteCode(`${cellCode}`, qwebrCounter, entry.options);
78 | break;
79 | case 'setup':
80 | const activeDiv = document.getElementById(`qwebr-noninteractive-setup-area-${qwebrCounter}`);
81 | // Run the code in a non-interactive state with all output thrown away
82 | await mainWebR.evalRVoid(`${cellCode}`);
83 | break;
84 | default:
85 | break;
86 | }
87 |
88 | if (['output', 'setup'].includes(evalType)) {
89 | // Disable further global status updates
90 | const activeContainer = document.getElementById(`qwebr-non-interactive-loading-container-${qwebrCounter}`);
91 | // Disable visibility
92 | activeContainer.style.visibility = 'hidden';
93 | activeContainer.style.display = 'none';
94 | }
95 | }
96 | }
97 | ).then(
98 | () => {
99 | // Release document status as ready
100 |
101 | if (qwebrShowStartupMessage) {
102 | qwebrStartupMessage.innerText = "🟢 Ready!"
103 | }
104 |
105 | qwebrSetInteractiveButtonState(
106 | ` Run Code `,
107 | true
108 | );
109 | }
110 | );
--------------------------------------------------------------------------------
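The comment inside the loop above says a `for` loop is required to evaluate hidden cells in order, and that the alternative would be `reduce` with an empty starting promise. A minimal sketch of both forms follows, with a concurrent variant for contrast. `runCell`, `runSequentially`, `runWithReduce`, and `runUnordered` are hypothetical names used only for illustration and are not part of the extension.

```javascript
// Hypothetical stand-in for an asynchronous cell evaluation.
// Lower ids wait longer, so concurrent execution would finish in reverse order.
function runCell(id, log) {
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(id);
      resolve();
    }, 30 - 10 * id);
  });
}

// Form 1: for...of with await. Each cell finishes before the next one starts.
async function runSequentially(ids, log) {
  for (const id of ids) {
    await runCell(id, log);
  }
  return log;
}

// Form 2: reduce seeded with an already-resolved promise.
// Each step chains onto the previous one, giving the same ordering.
function runWithReduce(ids, log) {
  return ids
    .reduce((chain, id) => chain.then(() => runCell(id, log)), Promise.resolve())
    .then(() => log);
}

// Contrast: starting every cell at once lets the faster cells finish first.
function runUnordered(ids, log) {
  return Promise.all(ids.map((id) => runCell(id, log))).then(() => log);
}
```

With ids `[0, 1, 2]`, the first two forms record the cells in document order, while the concurrent form records them in completion order. Order matters here because a later hidden cell may depend on objects created by an earlier `setup` cell.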
/_extensions/coatless/webr/qwebr-document-engine-initialization.js:
--------------------------------------------------------------------------------
1 | // Function to install a single package
2 | async function qwebrInstallRPackage(packageName) {
3 | await mainWebR.evalRVoid(`webr::install('${packageName}');`);
4 | }
5 |
6 | // Function to load a single package
7 | async function qwebrLoadRPackage(packageName) {
8 | await mainWebR.evalRVoid(`require('${packageName}', quietly = TRUE)`);
9 | }
10 |
11 | // Generic function to process R packages
12 | async function qwebrProcessRPackagesWithStatus(packages, processType, displayStatusMessageUpdate = true) {
13 | // Switch between contexts
14 | const messagePrefix = processType === 'install' ? 'Installing' : 'Loading';
15 |
16 | // Modify button state
17 | qwebrSetInteractiveButtonState(`🟡 ${messagePrefix} package ...`, false);
18 |
19 | // Iterate over packages
20 | for (let i = 0; i < packages.length; i++) {
21 | const activePackage = packages[i];
22 | const formattedMessage = `${messagePrefix} package ${i + 1} out of ${packages.length}: ${activePackage}`;
23 |
24 | // Display the update in header
25 | if (displayStatusMessageUpdate) {
26 | qwebrUpdateStatusHeader(formattedMessage);
27 | }
28 |
29 | // Display the update in non-active areas
30 | qwebrUpdateStatusMessage(formattedMessage);
31 |
32 | // Run package installation
33 | if (processType === 'install') {
34 | await qwebrInstallRPackage(activePackage);
35 | } else {
36 | await qwebrLoadRPackage(activePackage);
37 | }
38 | }
39 |
40 | // Clean slate
41 | if (processType === 'load') {
42 | await mainWebR.flush();
43 | }
44 | }
45 |
46 | // Start a timer
47 | const initializeWebRTimerStart = performance.now();
48 |
49 | // Encase with a dynamic import statement
50 | globalThis.qwebrInstance = import(qwebrCustomizedWebROptions.baseURL + "webr.mjs").then(
51 | async ({ WebR, ChannelType }) => {
52 | // Populate WebR options with defaults or new values based on `webr` meta
53 | globalThis.mainWebR = new WebR(qwebrCustomizedWebROptions);
54 |
55 | // Initialize WebR
56 | await mainWebR.init();
57 |
58 | // Setup a shelter
59 | globalThis.mainWebRCodeShelter = await new mainWebR.Shelter();
60 |
61 | // Setup a pager to allow processing help documentation
62 | await mainWebR.evalRVoid('webr::pager_install()');
63 |
64 | // Override the existing install.packages() to use webr::install()
65 | await mainWebR.evalRVoid('webr::shim_install()');
66 |
67 | // Specify the repositories to pull from
68 | // Note: webR does not use the `repos` option, but instead uses `webr_pkg_repos`
69 | // inside of `install()`. However, other R functions still pull from `repos`.
70 | await mainWebR.evalRVoid(`
71 | options(
72 | webr_pkg_repos = c(${qwebrPackageRepoURLS.map(repoURL => `'${repoURL}'`).join(',')}),
73 | repos = c(${qwebrPackageRepoURLS.map(repoURL => `'${repoURL}'`).join(',')})
74 | )
75 | `);
76 |
77 | // Check to see if any packages need to be installed
78 | if (qwebrSetupRPackages) {
79 | // Obtain only a unique list of packages
80 | const uniqueRPackageList = Array.from(new Set(qwebrInstallRPackagesList));
81 |
82 | // Install R packages one at a time (either silently or with a status update)
83 | await qwebrProcessRPackagesWithStatus(uniqueRPackageList, 'install', qwebrShowStartupMessage);
84 |
85 | if (qwebrAutoloadRPackages) {
86 | // Load R packages one at a time (either silently or with a status update)
87 | await qwebrProcessRPackagesWithStatus(uniqueRPackageList, 'load', qwebrShowStartupMessage);
88 | }
89 | }
90 | }
91 | );
92 |
93 | // Stop timer
94 | const initializeWebRTimerEnd = performance.now();
95 |
--------------------------------------------------------------------------------
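The initialization above de-duplicates the package list with a `Set` and interpolates the repository URLs into an R `options()` call. The sketch below isolates those two steps so they can be checked without a webR session. `uniquePackages` and `buildRepoOptions` are hypothetical helper names, and the repository URL is a placeholder.

```javascript
// Hypothetical helper: keep the first occurrence of each package name.
// A Set preserves insertion order, so the original ordering survives.
function uniquePackages(packages) {
  return Array.from(new Set(packages));
}

// Hypothetical helper: build the R snippet that sets both repository options.
// Assumes the URLs contain no single quotes, as the original interpolation does.
function buildRepoOptions(repoURLs) {
  const quoted = repoURLs.map((repoURL) => `'${repoURL}'`).join(",");
  return `options(webr_pkg_repos = c(${quoted}), repos = c(${quoted}))`;
}

const packages = uniquePackages(["ggplot2", "dplyr", "ggplot2"]);
const rCode = buildRepoOptions(["https://example.org/repo"]);

console.log(packages); // [ 'ggplot2', 'dplyr' ]
console.log(rCode);
```

Setting both `webr_pkg_repos` and `repos` mirrors the note in the source: `webr::install()` reads the former, while other R functions still read the latter.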
/_extensions/coatless/webr/qwebr-document-settings.js:
--------------------------------------------------------------------------------
1 | // Document level settings ----
2 |
3 | // Determine if we need to install R packages
4 | globalThis.qwebrInstallRPackagesList = [{{INSTALLRPACKAGESLIST}}];
5 |
6 | // Specify possible locations to search for the repository
7 | globalThis.qwebrPackageRepoURLS = [{{RPACKAGEREPOURLS}}];
8 |
9 | // Check to see if we have an empty array, if we do set to skip the installation.
10 | globalThis.qwebrSetupRPackages = !(qwebrInstallRPackagesList.indexOf("") !== -1);
11 | globalThis.qwebrAutoloadRPackages = {{AUTOLOADRPACKAGES}};
12 |
13 | // Display a startup message?
14 | globalThis.qwebrShowStartupMessage = {{SHOWSTARTUPMESSAGE}};
15 | globalThis.qwebrShowHeaderMessage = {{SHOWHEADERMESSAGE}};
16 |
17 | // Describe the webR settings that should be used
18 | globalThis.qwebrCustomizedWebROptions = {
19 | "baseURL": "{{BASEURL}}",
20 | "serviceWorkerUrl": "{{SERVICEWORKERURL}}",
21 | "homedir": "{{HOMEDIR}}",
22 | "channelType": "{{CHANNELTYPE}}"
23 | };
24 |
25 | // Store cell data
26 | globalThis.qwebrCellDetails = {{QWEBRCELLDETAILS}};
27 |
--------------------------------------------------------------------------------
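The `qwebrSetupRPackages` check above is described as testing for an empty array, but the expression tests whether the list contains an empty string. The sketch below shows what that expression returns for a few inputs. It assumes, without confirmation from the source, that the `{{INSTALLRPACKAGESLIST}}` placeholder expands to an empty string literal when no packages are requested. `shouldSetupPackages` is a hypothetical name.

```javascript
// Hypothetical helper equivalent to !(list.indexOf("") !== -1).
function shouldSetupPackages(packageList) {
  return !packageList.includes("");
}

console.log(shouldSetupPackages([""]));               // false: installation is skipped
console.log(shouldSetupPackages(["ggplot2", "dplyr"])); // true: packages get installed
console.log(shouldSetupPackages([]));                 // true: but nothing to iterate over
```

A truly empty array still yields `true`, which is harmless because the install loop then runs zero times.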
/_extensions/coatless/webr/qwebr-document-status.js:
--------------------------------------------------------------------------------
1 | // Declare qwebrStartupMessage globally
2 | globalThis.qwebrStartupMessage = document.createElement("p");
3 |
4 | // Verify if OffScreenCanvas is supported
5 | globalThis.qwebrOffScreenCanvasSupport = function() {
6 | return typeof OffscreenCanvas !== 'undefined'
7 | }
8 |
9 | // Function to set the button text
10 | globalThis.qwebrSetInteractiveButtonState = function(buttonText, enableCodeButton = true) {
11 | document.querySelectorAll(".qwebr-button-run").forEach((btn) => {
12 | btn.innerHTML = buttonText;
13 | btn.disabled = !enableCodeButton;
14 | });
15 | }
16 |
17 | // Function to update the status message in non-interactive cells
18 | globalThis.qwebrUpdateStatusMessage = function(message) {
19 | document.querySelectorAll(".qwebr-status-text.qwebr-cell-needs-evaluation").forEach((elem) => {
20 | elem.innerText = message;
21 | });
22 | }
23 |
24 | // Function to update the status message
25 | globalThis.qwebrUpdateStatusHeader = function(message) {
26 | qwebrStartupMessage.innerHTML = `
27 |
28 | ${message} `;
29 | }
30 |
31 | function qwebrPlaceMessageContents(content, html_location = "title-block-header", revealjs_location = "title-slide") {
32 |
33 | // Get references to header elements
34 | const headerHTML = document.getElementById(html_location);
35 | const headerRevealJS = document.getElementById(revealjs_location);
36 |
37 | // Determine where to insert the quartoTitleMeta element
38 | if (headerHTML || headerRevealJS) {
39 | // Append to the existing "title-block-header" element or "title-slide" div
40 | (headerHTML || headerRevealJS).appendChild(content);
41 | } else {
42 | // If neither headerHTML nor headerRevealJS is found, insert after "webr-monaco-editor-init" script
43 | const monacoScript = document.getElementById("qwebr-monaco-editor-init");
44 | const header = document.createElement("header");
45 | header.setAttribute("id", "title-block-header");
46 | header.appendChild(content);
47 | monacoScript.after(header);
48 | }
49 | }
50 |
51 |
52 | function qwebrOffScreenCanvasSupportWarningMessage() {
53 |
54 | // Verify canvas is supported.
55 | if(qwebrOffScreenCanvasSupport()) return;
56 |
57 | // Create the main container div
58 | var calloutContainer = document.createElement('div');
59 | calloutContainer.classList.add('callout', 'callout-style-default', 'callout-warning', 'callout-titled');
60 |
61 | // Create the header div
62 | var headerDiv = document.createElement('div');
63 | headerDiv.classList.add('callout-header', 'd-flex', 'align-content-center');
64 |
65 | // Create the icon container div
66 | var iconContainer = document.createElement('div');
67 | iconContainer.classList.add('callout-icon-container');
68 |
69 | // Create the icon element
70 | var iconElement = document.createElement('i');
71 | iconElement.classList.add('callout-icon');
72 |
73 | // Append the icon element to the icon container
74 | iconContainer.appendChild(iconElement);
75 |
76 | // Create the title container div
77 | var titleContainer = document.createElement('div');
78 | titleContainer.classList.add('callout-title-container', 'flex-fill');
79 | titleContainer.innerText = 'Warning: Web Browser Does Not Support Graphing!';
80 |
81 | // Append the icon container and title container to the header div
82 | headerDiv.appendChild(iconContainer);
83 | headerDiv.appendChild(titleContainer);
84 |
85 | // Create the body container div
86 | var bodyContainer = document.createElement('div');
87 | bodyContainer.classList.add('callout-body-container', 'callout-body');
88 |
89 | // Create the paragraph element for the body content
90 | var paragraphElement = document.createElement('p');
91 | paragraphElement.innerHTML = 'This web browser does not have support for displaying graphs through the quarto-webr extension since it lacks an OffScreenCanvas. Please upgrade your web browser to one that supports OffScreenCanvas.';
92 |
93 | // Append the paragraph element to the body container
94 | bodyContainer.appendChild(paragraphElement);
95 |
96 | // Append the header div and body container to the main container div
97 | calloutContainer.appendChild(headerDiv);
98 | calloutContainer.appendChild(bodyContainer);
99 |
100 | // Append the main container div to the document depending on format
101 | qwebrPlaceMessageContents(calloutContainer, "title-block-header");
102 |
103 | }
104 |
105 |
106 | // Function that attaches the document status message and diagnostics
107 | function displayStartupMessage(showStartupMessage, showHeaderMessage) {
108 | if (!showStartupMessage) {
109 | return;
110 | }
111 |
112 | // Create the outermost div element for metadata
113 | const quartoTitleMeta = document.createElement("div");
114 | quartoTitleMeta.classList.add("quarto-title-meta");
115 |
116 | // Create the first inner div element
117 | const firstInnerDiv = document.createElement("div");
118 | firstInnerDiv.setAttribute("id", "qwebr-status-message-area");
119 |
120 | // Create the second inner div element for "WebR Status" heading and contents
121 | const secondInnerDiv = document.createElement("div");
122 | secondInnerDiv.setAttribute("id", "qwebr-status-message-title");
123 | secondInnerDiv.classList.add("quarto-title-meta-heading");
124 | secondInnerDiv.innerText = "WebR Status";
125 |
126 | // Create another inner div for contents
127 | const secondInnerDivContents = document.createElement("div");
128 | secondInnerDivContents.setAttribute("id", "qwebr-status-message-body");
129 | secondInnerDivContents.classList.add("quarto-title-meta-contents");
130 |
131 | // Describe the WebR state
132 | qwebrStartupMessage.innerText = "🟡 Loading...";
133 | qwebrStartupMessage.setAttribute("id", "qwebr-status-message-text");
134 | // Add `aria-live` to auto-announce the startup status to screen readers
135 | qwebrStartupMessage.setAttribute("aria-live", "assertive");
136 |
137 | // Append the startup message to the contents
138 | secondInnerDivContents.appendChild(qwebrStartupMessage);
139 |
140 | // Add a status indicator for COOP and COEP Headers if needed
141 | if (showHeaderMessage) {
142 | const crossOriginMessage = document.createElement("p");
143 | crossOriginMessage.innerText = `${crossOriginIsolated ? '🟢' : '🟡'} COOP & COEP Headers`;
144 | crossOriginMessage.setAttribute("id", "qwebr-coop-coep-header");
145 | secondInnerDivContents.appendChild(crossOriginMessage);
146 | }
147 |
148 | // Combine the inner divs and contents
149 | firstInnerDiv.appendChild(secondInnerDiv);
150 | firstInnerDiv.appendChild(secondInnerDivContents);
151 | quartoTitleMeta.appendChild(firstInnerDiv);
152 |
153 | // Place message on webpage
154 | qwebrPlaceMessageContents(quartoTitleMeta);
155 | }
156 |
157 | displayStartupMessage(qwebrShowStartupMessage, qwebrShowHeaderMessage);
158 | qwebrOffScreenCanvasSupportWarningMessage();
--------------------------------------------------------------------------------
/_extensions/coatless/webr/qwebr-monaco-editor-init.html:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/_extensions/coatless/webr/qwebr-styling.css:
--------------------------------------------------------------------------------
1 | .monaco-editor pre {
2 | background-color: unset !important;
3 | }
4 |
5 | .qwebr-editor-toolbar {
6 | width: 100%;
7 | display: flex;
8 | justify-content: space-between;
9 | box-sizing: border-box;
10 | }
11 |
12 | .qwebr-editor-toolbar-left-buttons, .qwebr-editor-toolbar-right-buttons {
13 | display: flex;
14 | }
15 |
16 | .qwebr-non-interactive-loading-container.qwebr-cell-needs-evaluation, .qwebr-non-interactive-loading-container.qwebr-cell-evaluated {
17 | justify-content: center;
18 | display: flex;
19 | background-color: rgba(250, 250, 250, 0.65);
20 | border: 1px solid rgba(233, 236, 239, 0.65);
21 | border-radius: 0.5rem;
22 | margin-top: 15px;
23 | margin-bottom: 15px;
24 | }
25 |
26 | .qwebr-r-project-logo {
27 | color: #2767B0; /* R Project's blue color */
28 | }
29 |
30 | .qwebr-icon-status-spinner {
31 | color: #7894c4;
32 | }
33 |
34 | .qwebr-icon-run-code {
35 | color: #0d9c29
36 | }
37 |
38 | body.quarto-light .qwebr-output-code-stdout {
39 | color: #111;
40 | }
41 |
42 | body.quarto-dark .qwebr-output-code-stdout {
43 | color: #EEE;
44 | }
45 |
46 | .qwebr-output-code-stderr {
47 | color: #db4133;
48 | }
49 |
50 | body.quarto-light .qwebr-editor {
51 | border: 1px solid #EEEEEE;
52 | }
53 |
54 | body.quarto-light .qwebr-editor-toolbar {
55 | background-color: #EEEEEE;
56 | padding: 0.2rem 0.5rem;
57 | }
58 |
59 | body.quarto-dark .qwebr-editor {
60 | border: 1px solid #111;
61 | }
62 |
63 | body.quarto-dark .qwebr-editor-toolbar {
64 | background-color: #111;
65 | padding: 0.2rem 0.5rem;
66 | }
67 |
68 | .qwebr-button {
69 | display: inline-block;
70 | font-weight: 400;
71 | line-height: 1;
72 | text-decoration: none;
73 | text-align: center;
74 | padding: 0.375rem 0.75rem;
75 | font-size: .9rem;
76 | border-radius: 0.25rem;
77 | transition: color .15s ease-in-out,background-color .15s ease-in-out,border-color .15s ease-in-out,box-shadow .15s ease-in-out;
78 | }
79 |
80 | body.quarto-light .qwebr-button {
81 | background-color: #EEEEEE;
82 | color: #000;
83 | border-color: #dee2e6;
84 | border: 1px solid rgba(0,0,0,0);
85 | }
86 |
87 | body.quarto-dark .qwebr-button {
88 | background-color: #111;
89 | color: #EEE;
90 | border-color: #dee2e6;
91 | border: 1px solid rgba(0,0,0,0);
92 | }
93 |
94 | body.quarto-light .qwebr-button:hover {
95 | color: #000;
96 | background-color: #d9dce0;
97 | border-color: #c8ccd0;
98 | }
99 |
100 | body.quarto-dark .qwebr-button:hover {
101 | color: #d9dce0;
102 | background-color: #323232;
103 | border-color: #d9dce0;
104 | }
105 |
106 | .qwebr-button:disabled,.qwebr-button.disabled,fieldset:disabled .qwebr-button {
107 | pointer-events: none;
108 | opacity: .65
109 | }
110 |
111 | .qwebr-button-reset {
112 | color: #696969; /*#4682b4;*/
113 | }
114 |
115 | .qwebr-button-copy {
116 | color: #696969;
117 | }
118 |
119 |
120 | /* Custom styling for RevealJS Presentations*/
121 |
122 | /* Reset the style of the interactive area */
123 | .reveal div.qwebr-interactive-area {
124 | display: block;
125 | box-shadow: none;
126 | max-width: 100%;
127 | max-height: 100%;
128 | margin: 0;
129 | padding: 0;
130 | }
131 |
132 | /* Provide space to entries */
133 | .reveal div.qwebr-output-code-area pre div {
134 | margin: 1px 2px 1px 10px;
135 | }
136 |
137 | /* Collapse the inside code tags to avoid extra space between line outputs */
138 | .reveal pre div code.qwebr-output-code-stdout, .reveal pre div code.qwebr-output-code-stderr {
139 | padding: 0;
140 | display: contents;
141 | }
142 |
143 | body.reveal.quarto-light pre div code.qwebr-output-code-stdout {
144 | color: #111;
145 | }
146 |
147 | body.reveal.quarto-dark pre div code.qwebr-output-code-stdout {
148 | color: #EEEEEE;
149 | }
150 |
151 | .reveal pre div code.qwebr-output-code-stderr {
152 | color: #db4133;
153 | }
154 |
155 |
156 | /* Create a border around console and output (does not affect graphs) */
157 | body.reveal.quarto-light div.qwebr-console-area {
158 | border: 1px solid #EEEEEE;
159 | box-shadow: 2px 2px 10px #EEEEEE;
160 | }
161 |
162 | body.reveal.quarto-dark div.qwebr-console-area {
163 | border: 1px solid #111;
164 | box-shadow: 2px 2px 10px #111;
165 | }
166 |
167 |
168 | /* Cap output height and allow text to scroll */
169 | /* TODO: Is there a better way to fit contents/max it parallel to the monaco editor size? */
170 | .reveal div.qwebr-output-code-area pre {
171 | max-height: 400px;
172 | overflow: scroll;
173 | }
174 |
--------------------------------------------------------------------------------
/_extensions/coatless/webr/qwebr-theme-switch.js:
--------------------------------------------------------------------------------
1 | // Function to update Monaco Editors when body class changes
2 | function updateMonacoEditorsOnBodyClassChange() {
3 | // Select the body element
4 | const body = document.querySelector('body');
5 |
6 | // Options for the observer (which mutations to observe)
7 | const observerOptions = {
8 | attributes: true, // Observe changes to attributes
9 | attributeFilter: ['class'] // Only observe changes to the 'class' attribute
10 | };
11 |
12 | // Callback function to execute when mutations are observed
13 | const bodyClassChangeCallback = function(mutationsList, observer) {
14 | for(let mutation of mutationsList) {
15 | if (mutation.type === 'attributes' && mutation.attributeName === 'class') {
16 | // Class attribute has changed
17 | // Update all Monaco Editors on the page
18 | updateMonacoEditorTheme();
19 | }
20 | }
21 | };
22 |
23 | // Create an observer instance linked to the callback function
24 | const observer = new MutationObserver(bodyClassChangeCallback);
25 |
26 | // Start observing the target node for configured mutations
27 | observer.observe(body, observerOptions);
28 | }
29 |
30 | // Function to update all instances of Monaco Editors on the page
31 | function updateMonacoEditorTheme() {
32 | // Determine what VS Theme to use
33 | const vsThemeToUse = document.body.classList.contains("quarto-dark") ? 'vs-dark' : 'vs' ;
34 |
35 | // Iterate through all initialized Monaco Editors
36 | qwebrEditorInstances.forEach( function(editorInstance) {
37 | editorInstance.updateOptions({ theme: vsThemeToUse });
38 | });
39 | }
40 |
41 | // Call the function to start observing changes to body class
42 | updateMonacoEditorsOnBodyClassChange();
--------------------------------------------------------------------------------
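The theme switcher above picks `vs-dark` when the body carries the `quarto-dark` class and `vs` otherwise. The sketch below pulls that decision into a pure function so it can be exercised without a DOM. `monacoThemeFor` is a hypothetical name and is not defined by the extension.

```javascript
// Hypothetical pure helper: map a list of body class names to a Monaco theme id.
function monacoThemeFor(bodyClasses) {
  return bodyClasses.includes("quarto-dark") ? "vs-dark" : "vs";
}

console.log(monacoThemeFor(["quarto-dark", "nav-fixed"])); // vs-dark
console.log(monacoThemeFor(["quarto-light"]));             // vs
console.log(monacoThemeFor([]));                           // vs
```

In a browser, `Array.from(document.body.classList)` would supply the input, and the result could be passed as the `theme` option that the source gives to each editor's `updateOptions` call.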
/_extensions/coatless/webr/template.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "WebR-enabled code cell"
3 | format: html
4 | engine: knitr
5 | #webr:
6 | # show-startup-message: false # Disable display of webR initialization state
7 | # show-header-message: true # Display whether COOP&COEP headers are set for speed.
8 | # packages: ['ggplot2', 'dplyr'] # Pre-install dependencies
9 | # autoload-packages: false # Disable automatic library calls on R packages specified in packages.
10 | # repos: # Specify repositories to check for custom packages
11 | # - https://github-username.github.io/reponame
12 | # - https://username.r-universe.dev
13 | # channel-type: 'post-message' # Specify a specific communication channel type.
14 | # home-dir: "/home/rstudio" # Customize where the working directory is
15 | # base-url: '' # Base URL used for downloading R WebAssembly binaries
16 | # service-worker-url: '' # URL from where to load JavaScript worker scripts when loading webR with the ServiceWorker communication channel.
17 | filters:
18 | - webr
19 | ---
20 |
21 | ## Demo
22 |
23 | This is a webr-enabled code cell in a Quarto HTML document.
24 |
25 | ```{webr-r}
26 | 1 + 1
27 | ```
28 |
29 | ```{webr-r}
30 | fit = lm(mpg ~ am, data = mtcars)
31 | summary(fit)
32 | ```
33 |
34 | ```{webr-r}
35 | plot(pressure)
36 | ```
37 |
--------------------------------------------------------------------------------
/_extensions/coatless/webr/webr-serviceworker.js:
--------------------------------------------------------------------------------
1 | importScripts('https://webr.r-wasm.org/v0.3.3/webr-serviceworker.js');
2 |
--------------------------------------------------------------------------------
/_extensions/coatless/webr/webr-worker.js:
--------------------------------------------------------------------------------
1 | importScripts('https://webr.r-wasm.org/v0.3.3/webr-worker.js');
2 |
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/_extension.yml:
--------------------------------------------------------------------------------
1 | title: Font Awesome support
2 | author: Carlos Scheidegger
3 | version: 1.1.0
4 | quarto-required: ">=1.2.269"
5 | contributes:
6 | shortcodes:
7 | - fontawesome.lua
8 |
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/css/latex-fontsize.css:
--------------------------------------------------------------------------------
1 | .fa-tiny {
2 | font-size: 0.5em;
3 | }
4 | .fa-scriptsize {
5 | font-size: 0.7em;
6 | }
7 | .fa-footnotesize {
8 | font-size: 0.8em;
9 | }
10 | .fa-small {
11 | font-size: 0.9em;
12 | }
13 | .fa-normalsize {
14 | font-size: 1em;
15 | }
16 | .fa-large {
17 | font-size: 1.2em;
18 | }
19 | .fa-Large {
20 | font-size: 1.5em;
21 | }
22 | .fa-LARGE {
23 | font-size: 1.75em;
24 | }
25 | .fa-huge {
26 | font-size: 2em;
27 | }
28 | .fa-Huge {
29 | font-size: 2.5em;
30 | }
31 |
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Brands-Regular-400.ttf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Brands-Regular-400.ttf
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Brands-Regular-400.woff2:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Brands-Regular-400.woff2
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Regular-400.ttf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Regular-400.ttf
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Regular-400.woff2:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Regular-400.woff2
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Solid-900.ttf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Solid-900.ttf
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Solid-900.woff2:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Solid-900.woff2
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-brands-400.ttf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-brands-400.ttf
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-brands-400.woff2:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-brands-400.woff2
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-regular-400.ttf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-regular-400.ttf
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-regular-400.woff2:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-regular-400.woff2
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-solid-900.ttf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-solid-900.ttf
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-solid-900.woff2:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-solid-900.woff2
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-v4compatibility.ttf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-v4compatibility.ttf
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-v4compatibility.woff2:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-v4compatibility.woff2
--------------------------------------------------------------------------------
/_extensions/quarto-ext/fontawesome/fontawesome.lua:
--------------------------------------------------------------------------------
1 | local function ensureLatexDeps()
2 | quarto.doc.use_latex_package("fontawesome5")
3 | end
4 |
5 | local function ensureHtmlDeps()
6 | quarto.doc.add_html_dependency({
7 | name = 'fontawesome6',
8 | version = '0.1.0',
9 | stylesheets = {'assets/css/all.css', 'assets/css/latex-fontsize.css'}
10 | })
11 | end
12 |
13 | local function isEmpty(s)
14 | return s == nil or s == ''
15 | end
16 |
17 | local function isValidSize(size)
18 | local validSizes = {
19 | "tiny",
20 | "scriptsize",
21 | "footnotesize",
22 | "small",
23 | "normalsize",
24 | "large",
25 | "Large",
26 | "LARGE",
27 | "huge",
28 | "Huge"
29 | }
30 | for _, v in ipairs(validSizes) do
31 | if v == size then
32 | return size
33 | end
34 | end
35 | return ""
36 | end
37 |
38 | return {
39 | ["fa"] = function(args, kwargs)
40 |
41 | local group = "solid"
42 | local icon = pandoc.utils.stringify(args[1])
43 | if #args > 1 then
44 | group = icon
45 | icon = pandoc.utils.stringify(args[2])
46 | end
47 |
48 | local title = pandoc.utils.stringify(kwargs["title"])
49 | if not isEmpty(title) then
50 | title = " title=\"" .. title .. "\""
51 | end
52 |
53 | local label = pandoc.utils.stringify(kwargs["label"])
54 | if isEmpty(label) then
55 | label = " aria-label=\"" .. icon .. "\""
56 | else
57 | label = " aria-label=\"" .. label .. "\""
58 | end
59 |
60 | local size = pandoc.utils.stringify(kwargs["size"])
61 |
62 | -- detect html (excluding epub which won't handle fa)
63 | if quarto.doc.is_format("html:js") then
64 | ensureHtmlDeps()
65 | if not isEmpty(size) then
66 | size = " fa-" .. size
67 | end
68 | return pandoc.RawInline(
69 | 'html',
70 | "<i class=\"fa-" .. group .. " fa-" .. icon .. size .. "\"" .. title .. label .. "></i>"
71 | )
72 | -- detect pdf / beamer / latex / etc
73 | elseif quarto.doc.is_format("pdf") then
74 | ensureLatexDeps()
75 | if isEmpty(isValidSize(size)) then
76 | return pandoc.RawInline('tex', "\\faIcon{" .. icon .. "}")
77 | else
78 | return pandoc.RawInline('tex', "{\\" .. size .. "\\faIcon{" .. icon .. "}}")
79 | end
80 | else
81 | return pandoc.Null()
82 | end
83 | end
84 | }
85 |
--------------------------------------------------------------------------------
/_quarto.yml:
--------------------------------------------------------------------------------
1 | project:
2 | type: website
3 | preview:
4 | port: 5555
5 |
6 | execute:
7 | freeze: auto # Re-render only when source changes
8 |
9 | website:
10 | title: "R Primers"
11 | bread-crumbs: false
12 |
13 | repo-url: "https://github.com/andrewheiss/r-primers"
14 | repo-actions: [edit, issue]
15 |
16 | navbar:
17 | pinned: true
18 | left:
19 | - about.qmd
20 | right:
21 | - icon: github
22 | aria-label: github
23 | href: https://github.com/andrewheiss/r-primers
24 |
25 | sidebar:
26 | style: "docked"
27 | collapse-level: 2
28 | contents:
29 | - section: "Basics"
30 | contents:
31 | - auto: "basics/01-visualization-basics"
32 | - auto: "basics/02-programming-basics"
33 |
34 | - section: "Transform data"
35 | contents:
36 | - auto: "transform-data/01-tibbles"
37 | - auto: "transform-data/02-isolating"
38 | - auto: "transform-data/03-deriving"
39 |
40 | - section: "Visualize data"
41 | contents:
42 | - auto: "visualize-data/01-eda"
43 | - auto: "visualize-data/02-bar-charts"
44 | - auto: "visualize-data/03-histograms"
45 | - auto: "visualize-data/04-boxplots"
46 | - auto: "visualize-data/05-scatterplots"
47 | - auto: "visualize-data/06-line-graphs"
48 | - auto: "visualize-data/07-overplotting"
49 | - auto: "visualize-data/08-customize"
50 |
51 | - section: "Tidy data"
52 | contents:
53 | - auto: "tidy-data/01-reshape-data"
54 |
55 | format:
56 | html:
57 | theme:
58 | - zephyr
59 | - html/custom.scss
60 | toc: true
61 | toc-depth: 3
62 | knitr:
63 | opts_chunk:
64 | dev: "ragg_png"
65 | dpi: 300
66 |
--------------------------------------------------------------------------------
/about.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "About"
3 | sidebar: false
4 |
5 | engine: knitr
6 | filters:
7 | - webr
8 | webr:
9 | cell-options:
10 | editor-font-scale: 0.85
11 | fig-width: 6
12 | fig-height: 3.7
13 | out-width: "70%"
14 | ---
15 |
16 | ## A brief (and probably inaccurate) history of the RStudio/Posit Primers
17 |
18 | In 2018, Garrett Grolemund (co-author of [*R for Data Science*](https://r4ds.had.co.nz/)) created the RStudio Primers—a set of free interactive [{learnr}](https://rstudio.github.io/learnr/) apps to teach R to the public. These were hosted on a [Shiny](https://shiny.posit.co/) server maintained by RStudio and accessible through RStudio.cloud.
19 |
20 | With [RStudio's rebranding to Posit in 2022](https://posit.co/blog/rstudio-is-becoming-posit/), the lessons became the Posit Primers and remained accessible through [Posit.cloud](https://posit.cloud/).
21 |
22 | In December 2023, the Posit Primers were retired in favor of [Posit Recipes](https://posit.cloud/learn/recipes) and [Posit Cheatsheets](https://posit.co/resources/cheatsheets/). These newer resources have been updated to the latest versions of {tidyverse} packages, and they're no longer interactive (which is probably a lot easier for Posit's education team to maintain).
23 |
24 | ## How I've used the Primers in the past
25 |
26 | I've been relying on the RStudio/Posit Primers for teaching [my own R-focused classes](https://www.andrewheiss.com/teaching/) since 2020. In the first few weeks of every semester, I had students complete a bunch of the tutorials to get the hang of {dplyr} and {ggplot2}.
27 |
28 | With the sunsetting of the Primers at the beginning of 2024, though, I had to figure out a new solution.
29 |
30 | Fortunately, the RStudio/Posit Education team [posted the source for the Primers at GitHub](https://github.com/rstudio-education/primers) under a [Creative Commons license](https://github.com/rstudio-education/primers/blob/master/LICENSE.md), so for Spring 2024, I maintained a Shiny server with the tutorials I needed for my classes.
31 |
32 | ## The magic of webR
33 |
34 | Starting in 2023, [webR](https://docs.r-wasm.org/webr/latest/)—a version of R compiled to run with Javascript in a web browser—underwent rapid development, and a new Quarto extension ([{quarto-webr}](https://quarto-webr.thecoatlessprofessor.com/)) has since been developed to make it almost trivially easy to include Shiny-free interactive R chunks directly in the browser, like this:
35 |
36 | ```{webr-r}
37 | hist(faithful$waiting)
38 |
39 |
40 | ```
41 |
42 | That's ***so magical***!
43 |
44 | So for my Summer 2024 classes, I decided to take the plunge and convert the Shiny-based {learnr} tutorials that I've been using for so long into a webR-based website.
45 |
46 | ## How it works
47 |
48 | The tutorials aren't nearly as fully featured as {learnr}, but they get the job done.
49 |
50 | ### {learnr} hints and solutions
51 |
52 | To simulate {learnr}'s hint and solution functionality, I use Quarto's [Tabset Panels](https://quarto.org/docs/interactive/layout.html#tabset-panel):
53 |
54 | ````default
55 | ::: {.panel-tabset}
56 | ## {{{< fa code >}}} Interactive editor
57 |
58 | ```{webr-r}
59 | # Calculate 1 + 2
60 | ```
61 |
62 | ## {{{< fa lightbulb >}}} Hint
63 |
64 | **Hint:** Think about addition
65 |
66 | ## {{{< fa circle-check >}}} Solution
67 |
68 | ```r
69 | 1 + 2
70 | ```
71 |
72 | :::
73 | ````
74 |
75 | ::: {.panel-tabset}
76 | ## {{< fa code >}} Interactive editor
77 |
78 | ```{webr-r}
79 | # Calculate 1 + 2
80 |
81 |
82 |
83 | ```
84 |
85 | ## {{< fa lightbulb >}} Hint
86 |
87 | **Hint:** Think about addition
88 |
89 | ## {{< fa circle-check >}} Solution
90 |
91 | ```r
92 | 1 + 2
93 | ```
94 |
95 | :::
96 |
97 | ### {learnr} progressive reveal
98 |
99 | One great feature of {learnr} is its [progressive reveal](https://rstudio.github.io/learnr/articles/exercises.html#progressive-reveal), which unhides sections of a tutorial as you work through it. To simulate this with Quarto, I turned to Javascript. [My `progressive-reveal.js` script](https://github.com/andrewheiss/r-primers/blob/main/js/progressive-reveal.js) looks for all third-level headings on a page (similar to {learnr}) and makes them appear progressively using some buttons at the bottom of the page. It's clunky, but it works.
100 |
101 | ### Quizzes
102 |
103 | {learnr} also supports [interactive questions](https://rstudio.github.io/learnr/articles/questions.html), or inline quiz questions. To simulate this, I use [{checkdown}](https://agricolamz.github.io/checkdown/). It's not as great as {learnr}, but again, it gets the job done.^[I also played with [{webexercises}](https://psyteachr.github.io/webexercises/articles/webexercises.html), which is a little more polished, but doesn't let you give feedback messages for in/correct answers. I'm tempted to fork {checkdown} or submit a bunch of PRs to make it nicer. Someday.]
104 |
105 | ## Legal stuff
106 |
107 | The original primers were developed by the RStudio/Posit Education Team and made [open source on GitHub](https://github.com/rstudio-education/primers). Following the original license, these tutorials are licensed under the Creative Commons Attribution-ShareAlike 4.0 License (CC BY-SA 4.0).
108 |
109 | The primers are derived from the book [*R for Data Science*](https://r4ds.had.co.nz/) from O'Reilly Media, Inc. Copyright © 2017 Garrett Grolemund, Hadley Wickham. Used with permission.
110 |
111 | [See here for the full license.](https://github.com/andrewheiss/r-primers/blob/main/LICENSE.md)
112 |
--------------------------------------------------------------------------------
/basics/01-visualization-basics/03-geometric-objects.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Geometric objects"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 |
12 | engine: knitr
13 | filters:
14 | - webr
15 | webr:
16 | packages:
17 | - ggplot2
18 | cell-options:
19 | editor-font-scale: 0.85
20 | fig-width: 6
21 | fig-height: 3.7
22 | out-width: "70%"
23 | ---
24 |
25 | ```{r include=FALSE}
26 | knitr::opts_chunk$set(
27 | fig.width = 6,
28 | fig.height = 6 * 0.618,
29 | fig.retina = 3,
30 | dev = "ragg_png",
31 | fig.align = "center",
32 | out.width = "70%"
33 | )
34 |
35 | library(tidyverse)
36 | library(checkdown)
37 |
38 | source(here::here("R", "helpers.R"))
39 | ```
40 |
41 | How are these two plots similar?
42 |
43 | ```{r echo = FALSE, out.width="100%", message = FALSE}
44 | #| layout-ncol: 2
45 | ggplot(data = mpg) +
46 | geom_point(mapping = aes(x = displ, y = hwy))
47 |
48 | ggplot(data = mpg) +
49 | geom_smooth(mapping = aes(x = displ, y = hwy))
50 | ```
51 |
52 | Both plots contain the same x variable, the same y variable, and both describe the same data. But the plots are not identical. Each plot uses a different visual object to represent the data. In ggplot2 syntax, we say that they use different __geoms__.
53 |
54 | A __geom__ is the geometrical object that a plot uses to represent observations. People often describe plots by the type of geom that the plot uses. For example, bar charts use bar geoms, line charts use line geoms, boxplots use boxplot geoms, and so on. Scatterplots break the trend; they use the point geom.
55 |
56 | As we see above, you can use different geoms to plot the same data. The plot on the left uses the point geom, and the plot on the right uses the smooth geom, a smooth line fitted to the data.
57 |
58 | ### Geom functions
59 |
60 | To change the geom in your plot, change the geom function that you add to `ggplot()`. For example, take this code which makes the plot on the left (above), and change `geom_point()` to `geom_smooth()`. What do you get?
61 |
62 | ::: {.panel-tabset}
63 | ## {{< fa code >}} Interactive editor
64 |
65 | ```{webr-r}
66 | ggplot(data = mpg) +
67 | geom_point(mapping = aes(x = displ, y = hwy))
68 | ```
69 |
70 | ## {{< fa circle-check >}} Solution
71 |
72 | ```r
73 | ggplot(data = mpg) +
74 | geom_smooth(mapping = aes(x = displ, y = hwy))
75 | ```
76 |
77 | :::
78 |
79 | ###
80 |
81 | Good job! You get the plot on the right (above).
82 |
83 |
84 | ### More about geoms
85 |
86 | ggplot2 provides over 30 geom functions that you can use to make plots, and extension packages provide even more (see for a sampling). You'll learn how to use these geoms to explore data in the [Visualize Data]() primer.
87 |
88 | Until then, the best way to get a comprehensive overview of the available geoms is with the [ggplot2 cheatsheet](https://rstudio.github.io/cheatsheets/html/data-visualization.html). To learn more about any single geom, look at its help page, e.g. `?geom_smooth`.
89 |
90 | ### Exercise 1
91 |
92 | What geom would you use to draw a line chart? A boxplot? A histogram? An area chart?
93 |
94 | ### Exercise 2
95 |
96 | ::: {.callout-note appearance="simple" icon=false .question}
97 |
98 | **What does the `se` argument to `geom_smooth()` do?**
99 |
100 | ```{r predict, echo=FALSE}
101 | check_question(
102 | answer = "Adds or removes a standard error ribbon around the smooth line",
103 | options = c(
104 | "Nothing. `se` is not an argument of `geom_smooth()`",
105 | "Chooses a method for calculating the smooth line",
106 | "Controls whether or not to **s**how **e**rrors",
107 | "Adds or removes a standard error ribbon around the smooth line"
108 | ),
109 | type = "radio",
110 | button_label = "Submit answer",
111 | q_id = 1,
112 | right = c("Correct!")
113 | )
114 | ```
115 | :::
116 |
117 |
118 | ### Putting it all together
119 |
120 | The ideas that you've learned here (geoms, aesthetics, and the implied existence of a data space and a visual space) combine to form a system known as the Grammar of Graphics.
121 |
122 | The Grammar of Graphics provides a systematic way to build any graph, and it underlies the ggplot2 package. In fact, the first two letters of ggplot2 stand for "Grammar of Graphics".
123 |
124 | ### The Grammar of Graphics
125 |
126 | The best way to understand the Grammar of Graphics is to see it explained in action:
127 |
128 | ```{=html}
129 |
130 |
131 |
132 | ```
133 |
134 | ##
135 |
136 | ```{r}
137 | #| echo: false
138 | #| results: asis
139 | create_buttons("04-ggplot2-package.html")
140 | ```
141 |
--------------------------------------------------------------------------------
/basics/01-visualization-basics/04-ggplot2-package.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "The ggplot2 package"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | ---
12 |
13 | ```{r include=FALSE}
14 | knitr::opts_chunk$set(
15 | fig.width = 6,
16 | fig.height = 6 * 0.618,
17 | fig.retina = 3,
18 | dev = "ragg_png",
19 | fig.align = "center",
20 | out.width = "70%"
21 | )
22 |
23 | library(tidyverse)
24 |
25 | source(here::here("R", "helpers.R"))
26 | ```
27 |
28 | Throughout this tutorial, I've referred to ggplot2 as a package. What does that mean?
29 |
30 | The R language is subdivided into __packages__, small collections of data sets and functions that all focus on a single task. The functions that we used in this tutorial come from one of those packages, the ggplot2 package, which focuses on visualizing data.
31 |
32 | ### What should you know about packages?
33 |
34 | When you first install R, you get a small collection of core packages known as __base R__. The remaining packages---there are over 10,000 of them---are optional. You don't need to install them unless you want to use them.
35 |
36 | ggplot2 is one of these optional packages, and so are the other packages that we will look at in these tutorials. Some of the most popular and most modern parts of R come in the optional packages.
37 |
38 | You don't need to worry about installing packages in these tutorials. Each tutorial comes with all of the packages that you need pre-installed; this is how we make the tutorials easy to use.
39 |
40 | However, one day, you may want to use R outside of these tutorials. When that day comes, you'll want to remember which packages to download to acquire the functions you use here. Throughout the tutorials, I will try to make it as clear as possible where each function comes from!
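
As a rough sketch of what that looks like outside these tutorials, you typically install a package once and then load it in each new R session. The package names here (`ggplot2`, `stats`, `graphics`) are only chosen for illustration, and the install line is commented out because it needs an internet connection:

```r
# Run once per computer to download a package from CRAN:
# install.packages("ggplot2")

# installed.packages() lists the packages already on your computer;
# base R ships with packages such as stats and graphics
c("stats", "graphics") %in% rownames(installed.packages())

# library() loads an installed package for the current R session
library(stats)
```

After a package is loaded with `library()`, you can call its functions by name, just as you do in these tutorials.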
41 |
42 |
43 | ### Where to from here
44 |
45 | Congratulations! You can use the ggplot2 code template to plot any dataset in many different ways. As you begin exploring data, you should incorporate these tools into your workflow.
46 |
47 | There is much more to ggplot2 and Data Visualization than we have covered here. If you would like to learn more about visualizing data with ggplot2, check out RStudio's primer on [Data Visualization]().
48 |
49 | Your new data visualization skills will make it easier to learn other parts of R, because you can now visualize the results of any change that you make to data. You'll put these skills to immediate use in the next tutorial, which will show you how to extract values from datasets, as well as how to compute new variables and summary statistics from your data. See you there.
50 |
51 | ##
52 |
53 | ```{r}
54 | #| echo: false
55 | #| results: asis
56 | create_buttons(NULL)
57 | ```
58 |
--------------------------------------------------------------------------------
/basics/01-visualization-basics/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Data visualization basics"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | ---
12 |
13 | ```{r include=FALSE}
14 | knitr::opts_chunk$set(
15 | fig.width = 6,
16 | fig.height = 6 * 0.618,
17 | fig.retina = 3,
18 | dev = "ragg_png",
19 | fig.align = "center",
20 | out.width = "70%"
21 | )
22 |
23 | source(here::here("R", "helpers.R"))
24 | ```
25 |
26 | Visualization is one of the most important tools for data science.
27 |
28 | It is also a great way to start learning R; when you visualize data, you get an immediate payoff that will keep you motivated as you learn. After all, learning a new language can be hard!
29 |
30 | This tutorial will teach you how to visualize data with R's most popular visualization package, `ggplot2`.
31 |
32 | ###
33 |
34 | The tutorial focuses on three basic skills:
35 |
36 | 1. How to create graphs with a reusable **template**
37 | 1. How to add variables to a graph with **aesthetics**
38 | 1. How to make different "types" of graphs with **geoms**
39 |
40 | In this tutorial, we will use the [core tidyverse packages](http://tidyverse.org/), including `ggplot2`. I've already loaded the packages for you, so let's begin!
41 |
42 | ***
43 |
44 | These examples are excerpted from _R for Data Science_ by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do).
45 |
46 |
47 | ##
48 |
49 | ```{r}
50 | #| echo: false
51 | #| results: asis
52 | create_buttons("01-code-template.html")
53 | ```
54 |
--------------------------------------------------------------------------------
/basics/02-programming-basics/01-functions.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Functions"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | cell-options:
16 | editor-font-scale: 0.85
17 | fig-width: 6
18 | fig-height: 3.7
19 | out-width: "70%"
20 | ---
21 |
22 | ```{r include=FALSE}
23 | source(here::here("R", "helpers.R"))
24 | ```
25 |
26 | ### Functions {.no-hide}
27 |
28 | Watch [this video](https://vimeo.com/220490105):
29 |
30 | ```{=html}
31 |
32 |
33 |
34 | ```
35 |
36 | ### Run a function
37 |
38 | Can you use the `sqrt()` function in the chunk below to compute the square root of 962?
39 |
40 | ::: {.panel-tabset}
41 | ## {{< fa code >}} Interactive editor
42 |
43 | ```{webr-r}
44 |
45 |
46 |
47 | ```
48 |
49 | ## {{< fa circle-check >}} Solution
50 |
51 | ```r
52 | sqrt(962)
53 | ```
54 |
55 | :::
56 |
57 | ### Code
58 |
59 | Use the code chunk below to examine the code that `sqrt()` runs.
60 |
61 | ::: {.panel-tabset}
62 | ## {{< fa code >}} Interactive editor
63 |
64 | ```{webr-r}
65 |
66 |
67 |
68 | ```
69 |
70 | ## {{< fa circle-check >}} Solution
71 |
72 | ```r
73 | sqrt
74 | ```
75 |
76 | :::
77 |
78 | ###
79 |
80 | Good job! `sqrt` immediately triggers a low level algorithm optimized for performance, so there is not much code to see.
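
If you're curious, here is one way to check this for yourself. This is only a sketch: it uses base R's `is.primitive()`, which isn't covered in this tutorial, and `sd()` is used purely as a comparison function.

```r
# is.primitive() reports whether a function is implemented directly
# in R's internal compiled code instead of ordinary R code
is.primitive(sqrt)  # TRUE: sqrt() hands off to internal code
is.primitive(sd)    # FALSE: sd() is written in regular R code
```

Functions that return `FALSE` here have an R code body that you can read by typing their name without parentheses.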
81 |
82 | ### lm
83 |
84 | Compare the code in `sqrt()` to the code in another R function, `lm()`. Examine `lm()`'s code body in the chunk below.
85 |
86 | ::: {.panel-tabset}
87 | ## {{< fa code >}} Interactive editor
88 |
89 | ```{webr-r}
90 |
91 |
92 |
93 | ```
94 |
95 | ## {{< fa circle-check >}} Solution
96 |
97 | ```r
98 | lm
99 | ```
100 |
101 | :::
102 |
103 |
104 | ### Help pages
105 |
106 | Wow! `lm()` runs a lot of code. What does it do? Open the help page for `lm()` in the chunk below and find out.
107 |
108 | ::: {.panel-tabset}
109 | ## {{< fa code >}} Interactive editor
110 |
111 | ```{webr-r}
112 | ?lm
113 |
114 |
115 | ```
116 |
117 | ## {{< fa circle-check >}} Solution
118 |
119 | ```r
120 | ?lm
121 | ```
122 |
123 | :::
124 |
125 | ###
126 |
127 | Good job! `lm()` is R's function for fitting basic linear models. No wonder it runs so much code.
128 |
129 |
130 | ### Code comments
131 |
132 | What do you think the chunk below will return? Run it and see. The result should be nothing. R will not run anything on a line after a `#` symbol. This is useful because it lets you write human readable comments in your code: just place the comments after a `#`. Now delete the `#` and re-run the chunk. You should see a result.
133 |
134 | ::: {.panel-tabset}
135 | ## {{< fa code >}} Interactive editor
136 |
137 | ```{webr-r}
138 | # sqrt(962)
139 |
140 |
141 | ```
142 |
143 | ## {{< fa circle-check >}} Solution
144 |
145 | ```r
146 | sqrt(962)
147 | ```
148 |
149 | :::
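
Comments don't have to sit on their own line. As a small illustration (the second calculation, `sqrt(16)`, is made up for this example), R ignores everything from a `#` to the end of that line:

```r
# This whole line is a comment, so R skips it
sqrt(962)  # a comment can also follow code on the same line
# sqrt(16) is commented out, so R never runs it
```

Only the first `sqrt()` call produces a result, because the second one sits behind a `#`.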
150 |
151 | ##
152 |
153 | ```{r}
154 | #| echo: false
155 | #| results: asis
156 | create_buttons("02-arguments.html")
157 | ```
158 |
--------------------------------------------------------------------------------
/basics/02-programming-basics/02-arguments.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Arguments"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 |
12 | engine: knitr
13 | filters:
14 | - webr
15 | webr:
16 | cell-options:
17 | editor-font-scale: 0.85
18 | fig-width: 6
19 | fig-height: 3.7
20 | out-width: "70%"
21 | ---
22 |
23 | ```{r include=FALSE}
24 | library(tidyverse)
25 | library(checkdown)
26 |
27 | source(here::here("R", "helpers.R"))
28 | ```
29 |
30 | ### Arguments {.no-hide}
31 |
32 | Watch [this video](https://vimeo.com/220490157):
33 |
34 | ```{=html}
35 |
36 |
37 |
38 | ```
39 |
40 | ### `args()`
41 |
42 | `rnorm()` is a function that generates random variables from a normal distribution. Find the arguments of `rnorm()` using the `args()` function.
43 |
44 | ::: {.panel-tabset}
45 | ## {{< fa code >}} Interactive editor
46 |
47 | ```{webr-r}
48 |
49 |
50 |
51 | ```
52 |
53 | ## {{< fa circle-check >}} Solution
54 |
55 | ```r
56 | args(rnorm)
57 | ```
58 |
59 | :::
60 |
61 | ###
62 |
63 | Good job! `n` specifies the number of random normal variables to generate. `mean` and `sd` describe the distribution to generate the random values with.
64 |
65 | ### Optional arguments
66 |
67 | ::: {.callout-note appearance="simple" icon=false .question}
68 |
69 | **Which arguments of `rnorm()` are not optional?**
70 |
71 | ```{r predict, echo=FALSE}
72 | check_question(
73 | answer = "n",
74 | options = c(
75 | "n",
76 | "mean",
77 | "sd"
78 | ),
79 | type = "radio",
80 | button_label = "Submit answer",
81 | q_id = 1,
82 | right = c("Correct! `n` is not an optional argument because it does not have a default value.")
83 | )
84 | ```
85 | :::
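
One way to see which arguments have default values is to inspect the function directly. The sketch below uses base R's `formals()`, which isn't otherwise covered in this tutorial, so treat it as an optional illustration:

```r
# formals() returns a function's arguments along with any default values
names(formals(rnorm))  # the argument names: n, mean, and sd

formals(rnorm)$mean    # the default value for mean
formals(rnorm)$sd      # the default value for sd

# Because mean and sd have defaults, supplying only n is enough
rnorm(5)
```

Since `n` has no default, calling `rnorm()` with no arguments at all results in an error.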
86 |
87 | ### `rnorm()` 1
88 |
89 | Use `rnorm()` to generate 100 random normal values with a mean of 100 and a standard deviation of 15.
90 |
91 | ::: {.panel-tabset}
92 | ## {{< fa code >}} Interactive editor
93 |
94 | ```{webr-r}
95 |
96 |
97 |
98 | ```
99 |
100 | ## {{< fa circle-check >}} Solution
101 |
102 | ```r
103 | rnorm(100, mean = 100, sd = 15)
104 | ```
105 |
106 | :::
107 |
108 | ### `rnorm()` 2
109 |
110 | Can you spot the error in the code below? Fix the code and then re-run it.
111 |
112 | ::: {.panel-tabset}
113 | ## {{< fa code >}} Interactive editor
114 |
115 | ```{webr-r}
116 | rnorm(100, mu = 100, sd = 15)
117 |
118 |
119 | ```
120 |
121 | ## {{< fa lightbulb >}} Hint
122 |
123 | **Hint:** In math, $\mu$ (mu, pronounced "mew" or "moo") is a Greek letter that stands for the mean of a distribution.
124 |
125 | ## {{< fa circle-check >}} Solution
126 |
127 | ```r
128 | rnorm(100, mean = 100, sd = 15)
129 | ```
130 |
131 | :::
132 |
133 | ##
134 |
135 | ```{r}
136 | #| echo: false
137 | #| results: asis
138 | create_buttons("03-objects.html")
139 | ```
140 |
--------------------------------------------------------------------------------
/basics/02-programming-basics/03-objects.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Objects"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | cell-options:
16 | editor-font-scale: 0.85
17 | fig-width: 6
18 | fig-height: 3.7
19 | out-width: "70%"
20 | ---
21 |
22 | ```{r include=FALSE}
23 | source(here::here("R", "helpers.R"))
24 | ```
25 |
26 | ### Objects {.no-hide}
27 |
28 | Watch [this video](https://vimeo.com/220493412):
29 |
30 | ```{=html}
31 |
32 |
33 |
34 | ```
35 |
36 | ### Object names
37 |
38 | You can choose almost any name you like for an object, as long as the name does not begin with a number or a special character like `+`, `-`, `*`, `/`, `^`, `!`, `@`, or `&`.
39 |
40 | For instance, check out this list of some possible object names. Some are okay to use; some are invalid:
41 |
42 | - `today`: This is fine
43 | - `1st`: This is **bad**; it starts with a number
44 | - `+1`: This is **bad**; it starts with a special character
45 | - `vars`: This is fine
46 | - `\^_^`: This is **bad**; it starts with a special character
47 | - `foo`: This is fine
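
If you are unsure whether a name is valid, you can ask R. The sketch below is not part of the original primer; it assumes the base R function `make.names()`, which rewrites any string into a syntactically valid name. A string that comes back unchanged was already valid. The variable names `candidates` and `is_valid` are made up for this illustration.

```r
# Candidate names from the list above
candidates <- c("today", "1st", "+1", "vars", "^_^", "foo")

# make.names() rewrites a string into a syntactically valid name,
# so a string that comes back unchanged was already valid
is_valid <- candidates == make.names(candidates)

# Label each result with the name it belongs to
setNames(is_valid, candidates)
```

Note that this check also flags reserved words such as `if`, which are not usable as object names either.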
48 |
49 |
50 | ### Using objects
51 |
52 | In the code chunk below, save the results of `rnorm(100, mean = 100, sd = 15)` to an object named `data`. Then, on a new line, call the `hist()` function on `data` to plot a histogram of the random values.
53 |
54 | ::: {.panel-tabset}
55 | ## {{< fa code >}} Interactive editor
56 |
57 | ```{webr-r}
58 |
59 |
60 |
61 | ```
62 |
63 | ## {{< fa circle-check >}} Solution
64 |
65 | ```r
66 | data <- rnorm(100, mean = 100, sd = 15)
67 | hist(data)
68 | ```
69 |
70 | :::
71 |
72 | ### What if?
73 |
74 | What do you think would happen if you assigned `data` to a new object named `copy`, like this? Run the code and then inspect both `data` and `copy`.
75 |
76 | ::: {.panel-tabset}
77 | ## {{< fa code >}} Interactive editor
78 |
79 | ```{webr-r}
80 | data <- rnorm(100, mean = 100, sd = 15)
81 | copy <- data
82 |
83 |
84 | ```
85 |
86 | ## {{< fa circle-check >}} Solution
87 |
88 | ```r
89 | data <- rnorm(100, mean = 100, sd = 15)
90 | copy <- data
91 | data
92 | copy
93 | ```
94 |
95 | :::
96 |
97 | ###
98 |
99 | Good job! R saves a copy of the contents of `data` in `copy`.
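
One way to convince yourself that `copy` really is a separate copy is to change it and then look at the original again. This is a small sketch with a short, made-up vector rather than the random values from the exercise:

```r
data <- c(10, 20, 30)
copy <- data

# Change the first element of the copy only
copy[1] <- 99

data  # the original is unchanged
copy  # only the copy has the new value
```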
100 |
101 | ### Datasets
102 |
103 | Objects provide an easy way to store datasets in R. In fact, R comes with many toy datasets pre-loaded. Examine the contents of `mtcars` to see a classic toy dataset. Hint: how could you learn more about the `mtcars` object?
104 |
105 | ::: {.panel-tabset}
106 | ## {{< fa code >}} Interactive editor
107 |
108 | ```{webr-r}
109 |
110 |
111 |
112 | ```
113 |
114 | ## {{< fa circle-check >}} Solution
115 |
116 | ```r
117 | mtcars
118 | ```
119 |
120 | :::
121 |
122 | ###
123 |
124 | Good job! You can learn more about `mtcars` by examining its help page with `?mtcars`.
125 |
126 | ### `rm()`
127 |
128 | What if you accidentally overwrite an object? If that object came with R or one of its packages, you can restore the original version of the object by removing your version with `rm()`. Run `rm()` on `mtcars` below to restore the `mtcars` dataset.
129 |
130 | ::: {.panel-tabset}
131 | ## {{< fa code >}} Interactive editor
132 |
133 | ```{webr-r}
134 | mtcars <- 1
135 | mtcars
136 |
137 |
138 | ```
139 |
140 | ## {{< fa circle-check >}} Solution
141 |
142 | ```r
143 | mtcars <- 1
144 | mtcars
145 | rm(mtcars)
146 | mtcars
147 | ```
148 |
149 | :::
150 |
151 | ###
152 |
153 | Good job! Unfortunately, `rm()` cannot help you if you overwrite one of your own objects.
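
The difference is where the original lives. The original `mtcars` is stored in the {datasets} package, so removing your version simply uncovers it again. An object you created yourself has no such backup. A minimal sketch, using a hypothetical object named `my_scores`:

```r
my_scores <- c(90, 85, 77)  # hypothetical object of your own
my_scores <- "oops"         # accidentally overwritten

rm(my_scores)               # removes the only version you had

# Nothing is left to fall back on
exists("my_scores")
```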
154 |
155 | ##
156 |
157 | ```{r}
158 | #| echo: false
159 | #| results: asis
160 | create_buttons("04-vectors.html")
161 | ```
162 |
--------------------------------------------------------------------------------
/basics/02-programming-basics/04-vectors.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Vectors"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | cell-options:
16 | editor-font-scale: 0.85
17 | fig-width: 6
18 | fig-height: 3.7
19 | out-width: "70%"
20 | ---
21 |
22 | ```{r include=FALSE}
23 | source(here::here("R", "helpers.R"))
24 | ```
25 |
26 | ### Vectors {.no-hide}
27 |
28 | Watch [this video](https://vimeo.com/220490316):
29 |
30 | ```{=html}
31 |
32 |
33 |
34 | ```
35 |
36 | ### Create a vector
37 |
38 | In the chunk below, create a vector that contains the integers from one to ten. Use the `c()` function.
39 |
40 | ::: {.panel-tabset}
41 | ## {{< fa code >}} Interactive editor
42 |
43 | ```{webr-r}
44 |
45 |
46 |
47 | ```
48 |
49 | ## {{< fa circle-check >}} Solution
50 |
51 | ```r
52 | c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
53 | ```
54 |
55 | :::
56 |
57 |
58 | ### `:`
59 |
60 | If your vector contains a sequence of contiguous integers, you can create it with the `:` shortcut. Run `1:10` in the chunk below. What do you get? What do you suppose `1:20` would return?
61 |
62 | ::: {.panel-tabset}
63 | ## {{< fa code >}} Interactive editor
64 |
65 | ```{webr-r}
66 |
67 |
68 |
69 | ```
70 |
71 | ## {{< fa circle-check >}} Solution
72 |
73 | ```r
74 | 1:10
75 | 1:20
76 | ```
77 |
78 | :::
79 |
80 |
81 | ### `[]`
82 |
83 | You can extract any element of a vector by placing a pair of brackets behind the vector. Inside the brackets place the number of the element that you'd like to extract. For example, `vec[3]` would return the third element of the vector named `vec`.
84 |
85 | Use the chunk below to extract the fourth element of `vec`.
86 |
87 | ::: {.panel-tabset}
88 | ## {{< fa code >}} Interactive editor
89 |
90 | ```{webr-r}
91 | vec <- c(1, 2, 4, 8, 16)
92 |
93 |
94 | ```
95 |
96 | ## {{< fa circle-check >}} Solution
97 |
98 | ```r
99 | vec <- c(1, 2, 4, 8, 16)
100 | vec[4]
101 | ```
102 |
103 | :::
104 |
105 | ### More `[]`
106 |
107 | You can also use `[]` to extract multiple elements of a vector. Place the vector `c(1,2,5)` between the brackets below. What does R return?
108 |
109 | ::: {.panel-tabset}
110 | ## {{< fa code >}} Interactive editor
111 |
112 | ```{webr-r}
113 | vec <- c(1, 2, 4, 8, 16)
114 | vec[]
115 |
116 |
117 | ```
118 |
119 | ## {{< fa circle-check >}} Solution
120 |
121 | ```r
122 | vec <- c(1, 2, 4, 8, 16)
123 | vec[c(1,2,5)]
124 | ```
125 |
126 | :::
127 |
128 |
129 | ### Names
130 |
131 | If the elements of your vector have names, you can extract them by name. To do so place a name or vector of names in the brackets behind a vector. Surround each name with quotation marks, e.g. `vec2[c("alpha", "beta")]`.
132 |
133 | Extract the element named gamma from the vector below.
134 |
135 | ::: {.panel-tabset}
136 | ## {{< fa code >}} Interactive editor
137 |
138 | ```{webr-r}
139 | vec2 <- c(alpha = 1, beta = 2, gamma = 3)
140 |
141 |
142 | ```
143 |
144 | ## {{< fa circle-check >}} Solution
145 |
146 | ```r
147 | vec2 <- c(alpha = 1, beta = 2, gamma = 3)
148 | vec2["gamma"]
149 | ```
150 |
151 | :::
152 |
153 |
154 | ### Vectorised operations
155 |
156 | Predict what the code below will return. Then look at the result.
157 |
158 | ::: {.panel-tabset}
159 | ## {{< fa code >}} Interactive editor
160 |
161 | ```{webr-r}
162 | c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) + c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
163 |
164 |
165 | ```
166 |
167 | :::
168 |
169 | ###
170 |
171 | Good job! Like many R functions, R's math operators are vectorized: they're designed to work with vectors by repeating the operation for each pair of elements.
172 |
173 | ### Vector recycling
174 |
175 | Predict what the code below will return. Then look at the result.
176 |
177 | ::: {.panel-tabset}
178 | ## {{< fa code >}} Interactive editor
179 |
180 | ```{webr-r}
181 | 1 + c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
182 |
183 |
184 | ```
185 |
186 | :::
187 |
188 | ###
189 |
190 | Good job! Whenever you try to work with vectors of varying lengths (recall that `1` is a vector of length one), R will repeat the shorter vector as needed to compute the result.
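
Recycling is not limited to vectors of length one. In this sketch (the names `short`, `long`, and `recycled` are made up for illustration), a vector of length two is reused three times to match a vector of length six:

```r
short <- c(1, 2)
long <- c(10, 20, 30, 40, 50, 60)

# short is reused as 1, 2, 1, 2, 1, 2
recycled <- short + long
recycled
```

If the length of the longer vector is not a multiple of the length of the shorter one, R still recycles but also gives a warning.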
191 |
192 | ##
193 |
194 | ```{r}
195 | #| echo: false
196 | #| results: asis
197 | create_buttons("05-types.html")
198 | ```
199 |
--------------------------------------------------------------------------------
/basics/02-programming-basics/05-types.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Types"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 |
12 | engine: knitr
13 | filters:
14 | - webr
15 | webr:
16 | cell-options:
17 | editor-font-scale: 0.85
18 | fig-width: 6
19 | fig-height: 3.7
20 | out-width: "70%"
21 | ---
22 |
23 | ```{r include=FALSE}
24 | library(tidyverse)
25 | library(checkdown)
26 |
27 | source(here::here("R", "helpers.R"))
28 | ```
29 |
30 | ### Types {.no-hide}
31 |
32 | Watch [this video](https://vimeo.com/220490241):
33 |
34 | ```{=html}
35 |
36 |
37 |
38 | ```
39 |
40 | ### Atomic types
41 |
42 | ::: {.callout-note appearance="simple" icon=false .question}
43 |
44 | **Which of these is not an atomic data type?**
45 |
46 | ```{r types1, echo=FALSE}
47 | check_question(
48 | answer = "simple",
49 | options = c(
50 | "numeric/double",
51 | "integer",
52 | "character",
53 | "logical",
54 | "complex",
55 | "raw",
56 | "simple"
57 | ),
58 | type = "radio",
59 | button_label = "Submit answer",
60 | q_id = 1,
61 | right = c("Correct!")
62 | )
63 | ```
64 |
65 | :::
66 |
67 | ### What type?
68 |
69 | ::: {.callout-note appearance="simple" icon=false .question}
70 |
71 | **What type of data is `"1L"`?**
72 |
73 | ```{r types2, echo=FALSE}
74 | check_question(
75 | answer = "character",
76 | options = c(
77 | "numeric/double",
78 | "integer",
79 | "character",
80 | "logical"
81 | ),
82 | type = "radio",
83 | button_label = "Submit answer",
84 | q_id = 2,
85 | right = c("Correct! This was tricky because of the quotes. 1L by itself would be an integer, but values become characters when they're in quotes.")
86 | )
87 | ```
88 |
89 | :::
90 |
91 | ### Integers
92 |
93 | Create a vector of integers from one to five. Can you imagine why you might want to use integers instead of numbers/doubles?
94 |
95 | ::: {.panel-tabset}
96 | ## {{< fa code >}} Interactive editor
97 |
98 | ```{webr-r}
99 |
100 |
101 |
102 | ```
103 |
104 | ## {{< fa circle-check >}} Solution
105 |
106 | ```r
107 | c(1L, 2L, 3L, 4L, 5L)
108 | ```
109 |
110 | :::
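
To check which type you actually created, you can use the base R function `typeof()`. The sketch below compares a vector written with the `L` suffix to one written without it; the names `ints` and `dbls` are made up for this example.

```r
ints <- c(1L, 2L, 3L, 4L, 5L)
dbls <- c(1, 2, 3, 4, 5)

typeof(ints)  # the L suffix creates integers
typeof(dbls)  # plain numbers are stored as doubles

# The : shortcut also creates integers
identical(ints, 1:5)
```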
111 |
112 |
113 | ### Floating point arithmetic
114 |
115 | Computers must use a finite amount of memory to store decimal numbers (which can sometimes require infinite precision). As a result, some decimals can only be saved as very precise approximations. From time to time you'll notice side effects of this imprecision, like below.
116 |
117 | Compute the square root of two, square the answer (i.e. multiply the square root of two by the square root of two), and then subtract two from the result. What answer do you expect? What answer do you get?
118 |
119 | ::: {.panel-tabset}
120 | ## {{< fa code >}} Interactive editor
121 |
122 | ```{webr-r}
123 |
124 |
125 |
126 | ```
127 |
128 | ## {{< fa circle-check >}} Solution
129 |
130 | ```r
131 | sqrt(2) * sqrt(2) - 2
132 | sqrt(2)^2 - 2
133 | ```
134 |
135 | :::
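
Because of this imprecision, comparing decimal results with `==` can be misleading. One common workaround, sketched here as a suggestion rather than as part of the original exercise, is the base R function `all.equal()`, which compares numbers up to a small tolerance:

```r
difference <- sqrt(2)^2 - 2

# Exact comparison: the tiny leftover makes this FALSE
difference == 0

# Comparison with a tolerance treats the two values as equal
isTRUE(all.equal(sqrt(2)^2, 2))
```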
136 |
137 |
138 | ### Vectors
139 |
140 | ::: {.callout-note appearance="simple" icon=false .question}
141 |
142 | **How many types of data can you put into a single vector?**
143 |
144 | ```{r types3, echo=FALSE}
145 | check_question(
146 | answer = "1",
147 | options = c(
148 | "1",
149 | "6",
150 | "As many as you like"
151 | ),
152 | type = "radio",
153 | button_label = "Submit answer",
154 | q_id = 3,
155 | right = c("Correct!")
156 | )
157 | ```
158 |
159 | :::
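
If you try to combine several types in one vector anyway, R does not raise an error. Instead it converts every element to a single common type. A small sketch with made-up values:

```r
mixed <- c(1, "a", TRUE)

# Every element has been converted to a character string
typeof(mixed)
mixed
```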
160 |
161 | ### Character or object?
162 |
163 | One of the most common mistakes in R is to call an object when you mean to call a character string and vice versa.
164 |
165 | ::: {.callout-note appearance="simple" icon=false .question}
166 |
167 | **Which of these are object names? What is the difference between object names and character strings?**
168 |
169 | ```{r types4, echo=FALSE}
170 | check_question(
171 | answer = c("foo", "mu", "a"),
172 | options = c(
173 | "foo",
174 | '"num"',
175 | "mu",
176 | '"sigma"',
177 | '"data"',
178 | "a"
179 | ),
180 | type = "checkbox",
181 | button_label = "Submit answer",
182 | q_id = 4,
183 | right = c("Correct! Character strings are surrounded by quotation marks, object names are not.")
184 | )
185 | ```
186 |
187 | :::
188 |
189 |
190 | ##
191 |
192 | ```{r}
193 | #| echo: false
194 | #| results: asis
195 | create_buttons("06-lists.html")
196 | ```
197 |
--------------------------------------------------------------------------------
/basics/02-programming-basics/06-lists.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Lists"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 |
12 | engine: knitr
13 | filters:
14 | - webr
15 | webr:
16 | cell-options:
17 | editor-font-scale: 0.85
18 | fig-width: 6
19 | fig-height: 3.7
20 | out-width: "70%"
21 | ---
22 |
23 | ```{r include=FALSE}
24 | library(tidyverse)
25 | library(checkdown)
26 |
27 | source(here::here("R", "helpers.R"))
28 | ```
29 |
30 | ### Lists {.no-hide}
31 |
32 | Watch [this video](https://vimeo.com/220490360):
33 |
34 | ```{=html}
35 |
36 |
37 |
38 | ```
39 |
40 | ### Lists vs. vectors
41 |
42 | ::: {.callout-note appearance="simple" icon=false .question}
43 |
44 | **Which data structure(s) could you use to store these pieces of data in the same object? `1001`, `TRUE`, `"stories"`**
45 |
46 | ```{r lists1, echo=FALSE}
47 | check_question(
48 | answer = c("a list"),
49 | options = c(
50 | "a vector",
51 | "a list",
52 | "neither"
53 | ),
54 | type = "radio",
55 | button_label = "Submit answer",
56 | q_id = 1,
57 | right = c("Correct! Lists can contain elements that are different types.")
58 | )
59 | ```
60 |
61 | :::
62 |
63 |
64 | ### Make a list
65 |
66 | Make a list that contains the elements `1001`, `TRUE`, and `"stories"`. Give each element a name.
67 |
68 | ::: {.panel-tabset}
69 | ## {{< fa code >}} Interactive editor
70 |
71 | ```{webr-r}
72 |
73 |
74 |
75 | ```
76 |
77 | ## {{< fa circle-check >}} Solution
78 |
79 | ```r
80 | list(number = 1001, logical = TRUE, string = "stories")
81 | ```
82 |
83 | :::
84 |
85 |
86 | ### Extract an element
87 |
88 | Extract the number 1001 from the list below.
89 |
90 | ::: {.panel-tabset}
91 | ## {{< fa code >}} Interactive editor
92 |
93 | ```{webr-r}
94 | things <- list(number = 1001, logical = TRUE, string = "stories")
95 |
96 |
97 | ```
98 |
99 | ## {{< fa circle-check >}} Solution
100 |
101 | ```r
102 | things <- list(number = 1001, logical = TRUE, string = "stories")
103 | things$number
104 | ```
105 |
106 | :::
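
Besides `$`, base R offers two bracket forms for lists, and they return different things. The sketch below reuses the `things` list from the exercise to show the difference:

```r
things <- list(number = 1001, logical = TRUE, string = "stories")

# Double brackets return the element itself, just like things$number
things[["number"]]

# Single brackets return a list of length one that contains the element
things["number"]
```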
107 |
108 | ### Data Frames
109 |
110 | You can make a data frame with the `data.frame()` function, which works similarly to `c()` and `list()`. Assemble the vectors below into a data frame with the column names `numbers`, `logicals`, and `strings`.
111 |
112 | ::: {.panel-tabset}
113 | ## {{< fa code >}} Interactive editor
114 |
115 | ```{webr-r}
116 | nums <- c(1, 2, 3, 4)
117 | logs <- c(TRUE, TRUE, FALSE, TRUE)
118 | strs <- c("apple", "banana", "carrot", "duck")
119 |
120 |
121 | ```
122 |
123 | ## {{< fa circle-check >}} Solution
124 |
125 | ```r
126 | nums <- c(1, 2, 3, 4)
127 | logs <- c(TRUE, TRUE, FALSE, TRUE)
128 | strs <- c("apple", "banana", "carrot", "duck")
129 | data.frame(numbers = nums, logicals = logs, strings = strs)
130 | ```
131 |
132 | :::
133 |
134 | ###
135 |
136 | Good job! When you make a data frame, you must follow one rule: each column vector should be the same length.
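
Here is a sketch of what happens when the rule is broken. It wraps the call in `tryCatch()` so that the error message is captured as a string instead of stopping the code; the name `result` is made up for this example.

```r
# Three numbers but only two strings
result <- tryCatch(
  data.frame(numbers = c(1, 2, 3), strings = c("apple", "banana")),
  error = function(e) conditionMessage(e)
)

# result holds the error message rather than a data frame
result
```

One caveat: `data.frame()` recycles shorter columns when the lengths divide evenly, so a column of length one is repeated to fill every row.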
137 |
138 |
139 | ### Extract a column
140 |
141 | Given that a data frame is a type of list (with named elements), how could you extract the strings column of the `df` data frame below? Do it.
142 |
143 | ::: {.panel-tabset}
144 | ## {{< fa code >}} Interactive editor
145 |
146 | ```{webr-r}
147 | nums <- c(1, 2, 3, 4)
148 | logs <- c(TRUE, TRUE, FALSE, TRUE)
149 | strs <- c("apple", "banana", "carrot", "duck")
150 | df <- data.frame(numbers = nums, logicals = logs, strings = strs)
151 |
152 |
153 | ```
154 |
155 | ## {{< fa circle-check >}} Solution
156 |
157 | ```r
158 | df$strings
159 | ```
160 |
161 | :::
162 |
163 | ##
164 |
165 | ```{r}
166 | #| echo: false
167 | #| results: asis
168 | create_buttons("07-packages.html")
169 | ```
170 |
--------------------------------------------------------------------------------
/basics/02-programming-basics/07-packages.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Packages"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 |
12 | engine: knitr
13 | filters:
14 | - webr
15 | webr:
16 | packages:
17 | - tidyverse
18 | cell-options:
19 | editor-font-scale: 0.85
20 | fig-width: 6
21 | fig-height: 3.7
22 | out-width: "70%"
23 | ---
24 |
25 | ```{r include=FALSE}
26 | library(tidyverse)
27 | library(checkdown)
28 |
29 | source(here::here("R", "helpers.R"))
30 | ```
31 |
32 | ### Packages {.no-hide}
33 |
34 | Watch [this video](https://vimeo.com/220490447):
35 |
36 | ```{=html}
37 |
38 |
39 |
40 | ```
41 |
42 | ### A common error
43 |
44 | ::: {.callout-note appearance="simple" icon=false .question}
45 |
46 | **What does this common error message suggest? `object '_____' not found`**
47 |
48 | ```{r packages1, echo=FALSE}
49 | check_question(
50 | answer = c("Either"),
51 | options = c(
52 | "You misspelled your object name",
53 | "You forgot to load the package that ____ comes in",
54 | "Either"
55 | ),
56 | type = "radio",
57 | button_label = "Submit answer",
58 | q_id = 1,
59 | right = c("Correct!")
60 | )
61 | ```
62 |
63 | :::
64 |
65 |
66 | ### Load a package
67 |
68 | In the code chunk below, load the {tidyverse} package. Whenever you load a package, R will also load all of the packages that the first package depends on. {tidyverse} takes advantage of this to create a shortcut for loading several common packages at once. Whenever you load {tidyverse}, {tidyverse} also loads {ggplot2}, {dplyr}, {tibble}, {tidyr}, {readr}, {purrr}, {forcats}, {stringr}, and {lubridate}.
69 |
70 | ::: {.panel-tabset}
71 | ## {{< fa code >}} Interactive editor
72 |
73 | ```{webr-r}
74 |
75 |
76 |
77 | ```
78 |
79 | ## {{< fa circle-check >}} Solution
80 |
81 | ```r
82 | library(tidyverse)
83 | ```
84 |
85 | :::
86 |
87 | ###
88 |
89 | Good job! R will keep the packages loaded until you close your R session. When you re-open R, you'll need to reload your packages.
90 |
91 |
92 | ### Quotes
93 |
94 | Did you know `library()` is a special function in R? You can pass `library()` a package name in quotes, like `library("tidyverse")`, or not in quotes, like `library(tidyverse)`---both will work! That's often not the case with R functions.
95 |
96 | In general, you should always use quotes unless you are writing the _name_ of something that is already loaded into R's memory, like a function, vector, or data frame.
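
The special treatment has a side effect: if the package name is stored in an object, `library()` will look for a package with the name of the object. The sketch below uses the `character.only` argument of `library()` to tell R to use the value of the object instead. It loads {stats}, which ships with R, and the object name `pkg` is made up for this example.

```r
pkg <- "stats"

# library(pkg) would look for a package that is literally called "pkg"
# character.only = TRUE tells library() to use the string stored in pkg
library(pkg, character.only = TRUE)

# The package now appears on the search path
"package:stats" %in% search()
```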
97 |
98 | ### Install packages
99 |
100 | But what if the package that you want to load is not installed on your computer? How would you install the {dplyr} package on your own computer?
101 |
102 | ::: {.panel-tabset}
103 | ## {{< fa code >}} Interactive editor
104 |
105 | ```{webr-r}
106 |
107 |
108 |
109 | ```
110 |
111 | ## {{< fa circle-check >}} Solution
112 |
113 | ```r
114 | install.packages("dplyr")
115 | ```
116 |
117 | :::
118 |
119 | ###
120 |
121 | Good job! You only need to install a package once, unless you wish to update your local copy by reinstalling the package. Notice that `install.packages()` _always_ requires quotes around the package name.
122 |
123 |
124 | ### Congratulations!
125 |
126 | Congratulations! You now have a formal sense of how the basics of R work. Although you may think of yourself as a data scientist, this brief computer science background will help you as you analyze data. Whenever R does something unexpected, you can apply your knowledge of how R works to figure out what went wrong.
127 |
128 |
129 | ##
130 |
131 | ```{r}
132 | #| echo: false
133 | #| results: asis
134 | create_buttons(NULL)
135 | ```
136 |
--------------------------------------------------------------------------------
/basics/02-programming-basics/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Programming basics"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | source(here::here("R", "helpers.R"))
14 | ```
15 |
16 | ### Welcome to R! {.no-hide}
17 |
18 | R is easiest to use when you know how the R language works. This tutorial will teach you the implicit background knowledge that informs every piece of R code. You'll learn about:
19 |
20 | * **functions** and their **arguments**
21 | * **objects**
22 | * R's basic **data types**
23 | * R's basic data structures including **vectors** and **lists**
24 | * R's **package system**
25 |
26 | ##
27 |
28 | ```{r}
29 | #| echo: false
30 | #| results: asis
31 | create_buttons("01-functions.html")
32 | ```
33 |
--------------------------------------------------------------------------------
/deploy.sh:
--------------------------------------------------------------------------------
1 | REMOTE_HOST="ath-cloud"
2 | REMOTE_DIR="~/sites/r-primers.andrewheiss.com/public"
3 | REMOTE_DEST=$REMOTE_HOST:$REMOTE_DIR
4 |
5 | echo "Uploading new changes to remote server..."
6 | echo
7 | rsync -crvP --delete _site/ "$REMOTE_DEST"
8 |
--------------------------------------------------------------------------------
/html/custom.scss:
--------------------------------------------------------------------------------
1 | @import url('https://fonts.googleapis.com/css2?family=Inter:ital,wght@0,100..900;1,100..900&display=swap');
2 |
3 | /*-- scss:defaults --*/
4 | // Tiepolo colors
5 | // MetBrewer::met.brewer("Tiepolo")
6 | $white: #fff !default;
7 | $gray-100: #f8f9fa !default;
8 | $gray-200: #e9ecef !default;
9 | $gray-300: #dee2e6 !default;
10 | $gray-400: #ced4da !default;
11 | $gray-500: #adb5bd !default;
12 | $gray-600: #6c757d !default;
13 | $gray-700: #495057 !default;
14 | $gray-800: #343a40 !default;
15 | $gray-900: #212529 !default;
16 | $black: #000 !default;
17 |
18 | $blue: #17486f !default;
19 | $indigo: #6610f2 !default;
20 | $purple: #6f42c1 !default;
21 | $pink: #d63384 !default;
22 | $red: #802417 !default;
23 | $orange: #c06636 !default;
24 | $yellow: #e8b960 !default;
25 | $green: #646e3b !default;
26 | $teal: #2b5851 !default;
27 | $cyan: #508ea2 !default;
28 |
29 | $primary: $blue !default;
30 | $secondary: $white !default;
31 | $success: $green !default;
32 | $info: $cyan !default;
33 | $warning: $orange !default;
34 | $danger: $red !default;
35 | $light: $gray-100 !default;
36 | $dark: $gray-900 !default;
37 |
38 | $font-family-sans-serif: Inter, sans-serif !default;
39 | $font-family-serif: Inter, serif !default; /* Not actually a serif font but whatever */
40 |
41 | $font-size-base: 1rem !default;
42 | $headings-font-weight: 700 !default;
43 |
44 | // $h1-font-size: $font-size-base * 2.35;
45 | // $h2-font-size: $font-size-base * 1.8;
46 | // $h3-font-size: $font-size-base * 1.45;
47 | // $h4-font-size: $font-size-base * 1.1;
48 | // $h5-font-size: $font-size-base * 1;
49 | // $h6-font-size: $font-size-base * 0.8;
50 | //
51 | // $toc-font-size: 0.95rem;
52 | // $sidebar-font-size: 1.1rem;
53 | // $sidebar-font-size-section: 0.95rem;
54 | // $footer-font-size: 0.95rem;
55 | //
56 | // $link-color: $red;
57 | // $link-hover-color: $yellow;
58 |
59 | // Inline code
60 | $code-bg: $gray-200 !default;
61 | $code-color: $gray-900 !default;
62 | //
63 | // // Block code
64 | // $monokai-bg: #2e3440;
65 |
66 |
67 | /*-- scss:rules --*/
68 | .hidden {
69 | display: none;
70 | }
71 |
72 | #buttons {
73 | padding-bottom: 600px;
74 | }
75 |
76 | .question {
77 | input[type="submit"] {
78 | margin-top: 0.75em;
79 | }
80 |
81 | input[type="submit"] + div {
82 | margin-top: 0.5em;
83 | }
84 | }
85 |
86 | .navbar-dark .navbar-nav .show>.nav-link,
87 | .navbar-dark .navbar-nav .active>.nav-link,
88 | .navbar-dark .navbar-nav .nav-link.active,
89 | div.sidebar-item-container .active,
90 | div.sidebar-item-container .show>.nav-link,
91 | div.sidebar-item-container .sidebar-link>code{
92 | color: $orange;
93 | font-weight: bold;
94 | }
95 |
96 | thead th {
97 | text-transform: none;
98 | }
99 |
--------------------------------------------------------------------------------
/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "R Primers"
3 | toc: false
4 | ---
5 |
6 | A version of the old RStudio/Posit Primers, now with Quarto and webR.
7 |
8 | ## License {.appendix}
9 |
10 | The original primers were developed by the RStudio/Posit Education Team and made [open source on GitHub](https://github.com/rstudio-education/primers). Following the original license, these tutorials are licensed under the Creative Commons Attribution-ShareAlike 4.0 License (CC BY-SA 4.0).
11 |
12 | The primers are derived from the book [*R for Data Science*](https://r4ds.had.co.nz/) from O'Reilly Media, Inc. Copyright © 2017 Garrett Grolemund, Hadley Wickham. Used with permission.
13 |
14 | [See here for the full license.](https://github.com/andrewheiss/r-primers/blob/main/LICENSE.md)
15 |
--------------------------------------------------------------------------------
/js/bootstrapify.js:
--------------------------------------------------------------------------------
1 | document.addEventListener('DOMContentLoaded', function() {
2 | // Select all forms that don't have a class set
3 | var formsWithoutClass = document.querySelectorAll('form:not([class])');
4 | formsWithoutClass.forEach(addBootstrapClasses);
5 |
6 | // Select all forms that use `method="post"`
7 | // var postForms = document.querySelectorAll('form[method="post"]');
8 | // postForms.forEach(addBootstrapClasses);
9 | });
10 |
11 | function addBootstrapClasses(form) {
12 | // Add the Bootstrap class 'form-group' to the form
13 | form.classList.add('form-group');
14 |
15 | // Select the radio inputs within this form and add the Bootstrap class 'form-check-input'
16 | var radioInputs = form.querySelectorAll('input[type="radio"]');
17 | radioInputs.forEach(function(input) {
18 | input.classList.add('form-check-input');
19 | });
20 |
21 | // Select the labels within this form and add the Bootstrap class 'form-check-label'
22 | var labels = form.querySelectorAll('label');
23 | labels.forEach(function(label) {
24 | label.classList.add('form-check-label');
25 | });
26 |
27 | // Select the submit button within this form (if it has one) and add the Bootstrap classes 'btn', 'btn-primary', and 'btn-sm'
28 | var submitButton = form.querySelector('input[type="submit"]');
29 | if (submitButton) submitButton.classList.add('btn', 'btn-primary', 'btn-sm');
30 | }
31 |
--------------------------------------------------------------------------------
/js/progressive-reveal.js:
--------------------------------------------------------------------------------
1 | var key = 'currentSection' + window.location.pathname;
2 | var currentSection = localStorage.getItem(key) ? parseInt(localStorage.getItem(key), 10) : -1;
3 | var sections = Array.from(document.getElementsByClassName('level3'))
4 | .filter(section => !section.classList.contains('no-hide'));
5 |
6 | // Hide all sections initially
7 | sections.forEach(function (section) {
8 | section.classList.add('hidden');
9 | });
10 |
11 | function revealSection(sectionIndex) {
12 | sections[sectionIndex].classList.remove('hidden');
13 | }
14 |
15 | var continueButton = document.getElementById('continueButton');
16 | var nextTopicButton = document.getElementById('nextTopicButton');
17 |
18 | // Disable continue button if there are no sections
19 | if (sections.length === 0) {
20 | continueButton.disabled = true;
21 | nextTopicButton.classList.remove('disabled');
22 | // Otherwise progressively reveal sections
23 | } else {
24 | continueButton.addEventListener('click', function () {
25 | currentSection++;
26 | if (currentSection < sections.length) {
27 | revealSection(currentSection);
28 | localStorage.setItem(key, currentSection);
29 | // Jump to the id anchor for the current section
30 | window.location.hash = sections[currentSection].id;
31 | // Adjust scroll position to account for the height of the navbar
32 | window.scrollBy(0, 70);
33 | }
34 |
35 | if (currentSection >= sections.length - 1) {
36 | continueButton.disabled = true;
37 | nextTopicButton.classList.remove('disabled');
38 | }
39 | });
40 | }
41 |
42 | // On page load, reveal up to the current section
43 | window.onload = function () {
44 | for (var i = 0; i <= currentSection; i++) {
45 | revealSection(i);
46 | }
47 | };
48 |
49 | function clearProgress() {
50 | localStorage.removeItem(key);
51 | window.location.hash = '#'; // Remove the anchor from the URL
52 | }
53 |
54 | document.getElementById('resetButton').addEventListener('click', function () {
55 | clearProgress();
56 | // Reload the page to reflect the reset progress
57 | location.reload();
58 | });
59 |
--------------------------------------------------------------------------------
/r-primers.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 |
3 | RestoreWorkspace: Default
4 | SaveWorkspace: Default
5 | AlwaysSaveHistory: Default
6 |
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 |
15 | AutoAppendNewline: Yes
16 |
17 | ProjectName: R Primers
18 |
--------------------------------------------------------------------------------
/renv/.gitignore:
--------------------------------------------------------------------------------
1 | library/
2 | local/
3 | cellar/
4 | lock/
5 | python/
6 | sandbox/
7 | staging/
8 |
--------------------------------------------------------------------------------
/renv/settings.json:
--------------------------------------------------------------------------------
1 | {
2 | "bioconductor.version": null,
3 | "external.libraries": [],
4 | "ignored.packages": [],
5 | "package.dependency.fields": [
6 | "Imports",
7 | "Depends",
8 | "LinkingTo"
9 | ],
10 | "ppm.enabled": null,
11 | "ppm.ignored.urls": [],
12 | "r.version": null,
13 | "snapshot.type": "implicit",
14 | "use.cache": true,
15 | "vcs.ignore.cellar": true,
16 | "vcs.ignore.library": true,
17 | "vcs.ignore.local": true,
18 | "vcs.manage.ignores": true
19 | }
20 |
--------------------------------------------------------------------------------
/tidy-data/01-reshape-data/img/tidy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/tidy-data/01-reshape-data/img/tidy.png
--------------------------------------------------------------------------------
/tidy-data/01-reshape-data/img/vectorized.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/tidy-data/01-reshape-data/img/vectorized.png
--------------------------------------------------------------------------------
/tidy-data/01-reshape-data/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Reshape data"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | knitr::opts_chunk$set(
14 | fig.width = 6,
15 | fig.height = 6 * 0.618,
16 | fig.retina = 3,
17 | dev = "ragg_png",
18 | fig.align = "center",
19 | out.width = "70%"
20 | )
21 |
22 | source(here::here("R", "helpers.R"))
23 | ```
24 |
25 | The tools that you learned in the previous Primers work best when your data is organized in a specific way. This format is known as **tidy data** and it appears throughout the tidyverse. You will spend a lot of time as a data scientist wrangling your data into a usable format, so it is important to learn how to do this fast.
26 |
27 | This tutorial will teach you how to recognize tidy data, as well as how to reshape untidy data into a tidy format. In it, you will learn the core data wrangling functions for the tidyverse:
28 |
29 | * `pivot_longer()`, which reshapes wide data into long data, and
30 | * `pivot_wider()`, which reshapes long data into wide data
31 |
32 | This tutorial uses the [core tidyverse packages](http://tidyverse.org/), including {ggplot2}, {dplyr}, and {tidyr}, as well as the {babynames} package. All of these packages have been pre-installed and pre-loaded for your convenience.
33 |
34 |
35 | ##
36 |
37 | ```{r}
38 | #| echo: false
39 | #| results: asis
40 | create_buttons("01-tidy-data.html")
41 | ```
42 |
--------------------------------------------------------------------------------
/transform-data/01-tibbles/01-babynames.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "babynames"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | packages:
16 | - babynames
17 | autoload-packages: false
18 | cell-options:
19 | editor-font-scale: 0.85
20 | fig-width: 6
21 | fig-height: 3.7
22 | out-width: "70%"
23 | ---
24 |
25 | ```{r include=FALSE}
26 | knitr::opts_chunk$set(
27 | fig.width = 6,
28 | fig.height = 6 * 0.618,
29 | fig.retina = 3,
30 | dev = "ragg_png",
31 | fig.align = "center",
32 | out.width = "70%"
33 | )
34 |
35 | library(tidyverse)
36 |
37 | source(here::here("R", "helpers.R"))
38 | ```
39 |
40 | ### Loading babynames {.no-hide}
41 |
42 | Before we begin, let's learn a little about our data. The `babynames` dataset comes in the {babynames} package. The package is pre-installed for you, just as {ggplot2} was pre-installed in the last tutorial. But unlike in the last tutorial, I have not pre-_loaded_ {babynames}, or any other package.
43 |
44 | What does this mean? In R, whenever you want to use a package that is not part of base R, you need to load the package with the command `library()`. Until you load a package, R will not be able to find the datasets and functions contained in the package. For example, if we asked R to display the `babynames` dataset, which comes in the {babynames} package, right now, we'd get the message below. R cannot find the dataset because we haven't loaded the {babynames} package.
45 |
46 | ```{r error=TRUE}
47 | babynames
48 | ```
49 |
50 | To load the {babynames} package, you would run the command `library(babynames)`. After you load a package, R will be able to find its contents _until you close R_. The next time you open R, you will need to reload the package if you wish to use it again.
51 |
52 | This might sound like an inconvenience, but choosing which packages to load keeps your R experience simple and orderly.
53 |
54 | In the chunk below, load {babynames} (the package) and then open the help page for `babynames` (the dataset). Be sure to read the help page before going on.
55 |
56 | ::: {.panel-tabset}
57 | ## {{< fa code >}} Interactive editor
58 |
59 | ```{webr-r}
60 |
61 |
62 |
63 | ```
64 |
65 | ## {{< fa circle-check >}} Solution
66 |
67 | ```r
68 | library(babynames)
69 | ?babynames
70 | ```
71 |
72 | :::
73 |
74 | ```{r bnames, include=FALSE}
75 | library(babynames)
76 | ```
77 |
78 | ### The data
79 |
80 | Now that you know a little about the dataset, let's examine its contents. If you were to run `babynames` at your R console, you would get output that looks like this:
81 |
82 | ```{r echo=TRUE, eval=FALSE}
83 | babynames
84 |
85 | #> 187 1880 F Christina 65 6.659495e-04
86 | #> 188 1880 F Lelia 65 6.659495e-04
87 | #> 189 1880 F Nelle 65 6.659495e-04
88 | #> 190 1880 F Sue 65 6.659495e-04
89 | #> 191 1880 F Johanna 64 6.557041e-04
90 | #> 192 1880 F Lilly 64 6.557041e-04
91 | #> 193 1880 F Lucinda 63 6.454587e-04
92 | #> 194 1880 F Minerva 63 6.454587e-04
93 | #> 195 1880 F Lettie 62 6.352134e-04
94 | #> 196 1880 F Roxie 62 6.352134e-04
95 | #> 197 1880 F Cynthia 61 6.249680e-04
96 | #> 198 1880 F Helena 60 6.147226e-04
97 | #> 199 1880 F Hilda 60 6.147226e-04
98 | #> 200 1880 F Hulda 60 6.147226e-04
99 | #> [ reached getOption("max.print") -- omitted 1825233 rows ]
100 | ```
101 |
102 | Yikes. What is happening?
103 |
104 | ### Displaying large data
105 |
106 | `babynames` is a large data frame, and R is not well equipped to display the contents of large data frames. R prints rows until it reaches the limit set by the `max.print` option. At that point, R stops, leaving you to look at an arbitrary section of your data.
107 |
108 | You can avoid this behavior by transforming your data frame to a _tibble_.
109 |
110 |
111 | ##
112 |
113 | ```{r}
114 | #| echo: false
115 | #| results: asis
116 | create_buttons("02-tibbles.html")
117 | ```
118 |
--------------------------------------------------------------------------------
/transform-data/01-tibbles/02-tibbles.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "tibbles"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | packages:
16 | - tibble
17 | - babynames
18 | autoload-packages: false
19 | cell-options:
20 | editor-font-scale: 0.85
21 | fig-width: 6
22 | fig-height: 3.7
23 | out-width: "70%"
24 | ---
25 |
26 | ```{r include=FALSE}
27 | knitr::opts_chunk$set(
28 | fig.width = 6,
29 | fig.height = 6 * 0.618,
30 | fig.retina = 3,
31 | dev = "ragg_png",
32 | fig.align = "center",
33 | out.width = "70%"
34 | )
35 |
36 | library(tidyverse)
37 |
38 | source(here::here("R", "helpers.R"))
39 | ```
40 |
41 | ```{webr-r}
42 | #| context: setup
43 | library(babynames)
44 | ```
45 |
46 | ### What is a tibble? {.no-hide}
47 |
48 | A tibble is a special type of table. R displays tibbles in a refined way whenever you have the **tibble** package loaded: R will print only the first ten rows of a tibble as well as all of the columns that fit into your console window. R also adds useful summary information about the tibble, such as the data types of each column and the size of the data set.
49 |
50 | Whenever you do not have the tibble package loaded, R will display the tibble as if it were a data frame. In fact, tibbles _are_ data frames, just an enhanced type of data frame.
51 |
52 | You can think of the difference between the data frame display and the tibble display like this:
53 |
54 | ![](img/tibble_display.png){width=75%}
55 |
56 | ### `as_tibble()`
57 |
58 | You can transform a data frame to a tibble with the `as_tibble()` function in the tibble package, e.g. `as_tibble(cars)`. However, `babynames` is already a tibble. To display it nicely, you just need to load the {tibble} package.
59 |
60 | To see what I mean, use `library()` to load the tibble package in the chunk below and then call `babynames`.
61 |
62 | ::: {.panel-tabset}
63 | ## {{< fa code >}} Interactive editor
64 |
65 | ```{webr-r}
66 |
67 |
68 |
69 | ```
70 |
71 | ## {{< fa circle-check >}} Solution
72 |
73 | ```r
74 | library(tibble)
75 | library(babynames)
76 | babynames
77 | ```
78 |
79 | :::
80 |
81 | ###
82 |
83 | Excellent! If you want to check whether or not an object is a tibble, you can use the `is_tibble()` function that comes in the tibble package. For example, this would return TRUE: `is_tibble(babynames)`.
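
As a minimal sketch of the two functions together, the code below converts the built-in `cars` data frame to a tibble and checks the result. The object name `cars_tbl` is only an illustration.

```r
library(tibble)

# `cars` is a plain data frame that ships with base R
is_tibble(cars)          # FALSE

# as_tibble() returns a tibble with the same rows and columns
cars_tbl <- as_tibble(cars)
is_tibble(cars_tbl)      # TRUE

# A tibble is still a data frame
is.data.frame(cars_tbl)  # TRUE
```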
84 |
85 |
86 | ### `View()`
87 |
88 | What if you'd like to inspect the remaining portions of a tibble? To see the entire tibble, use the `View()` command. R will launch a window that shows a scrollable display of the entire data set. For example, the code below will launch a data viewer in RStudio.
89 |
90 | ```{r eval=FALSE}
91 | View(babynames)
92 | ```
93 |
94 | `View()` works in conjunction with the software that you run R from: `View()` opens the data editor provided by that software. Unfortunately, this tutorial doesn't come with a data editor, so you won't be able to use `View()` today (unless you open RStudio, for example, and run the code there).
95 |
96 |
97 | ##
98 |
99 | ```{r}
100 | #| echo: false
101 | #| results: asis
102 | create_buttons("03-tidyverse.html")
103 | ```
104 |
--------------------------------------------------------------------------------
/transform-data/01-tibbles/03-tidyverse.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "tidyverse"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 |
12 | engine: knitr
13 | filters:
14 | - webr
15 | webr:
16 | packages:
17 | - tidyverse
18 | autoload-packages: false
19 | cell-options:
20 | editor-font-scale: 0.85
21 | fig-width: 6
22 | fig-height: 3.7
23 | out-width: "70%"
24 | ---
25 |
26 | ```{r include=FALSE}
27 | knitr::opts_chunk$set(
28 | fig.width = 6,
29 | fig.height = 6 * 0.618,
30 | fig.retina = 3,
31 | dev = "ragg_png",
32 | fig.align = "center",
33 | out.width = "70%"
34 | )
35 |
36 | library(tidyverse)
37 | library(checkdown)
38 |
39 | source(here::here("R", "helpers.R"))
40 | ```
41 |
42 |
43 | ### The tidyverse {.no-hide}
44 |
45 | The {tibble} package is one of several packages that are known collectively as ["the tidyverse"](http://tidyverse.org). Tidyverse packages share a common philosophy and are designed to work well together. For example, in this tutorial you will use the {tibble} package, the {ggplot2} package, and the {dplyr} package, all of which belong to the tidyverse.
46 |
47 | ### The tidyverse package
48 |
49 | When you use tidyverse packages, you can make your life easier by using the {tidyverse} package. The {tidyverse} package provides a shortcut for installing and loading the entire suite of packages in "the tidyverse", e.g.
50 |
51 | ```{r eval = FALSE}
52 | install.packages("tidyverse")
53 | library(tidyverse)
54 | ```
55 |
56 | ### Installing the tidyverse
57 |
58 | Think of the {tidyverse} package as a placeholder for the packages that are in the "tidyverse". By itself, {tidyverse} does not do much, but when you install the {tidyverse} package it instructs R to install every other package in the tidyverse at the same time. In other words, when you run `install.packages("tidyverse")`, R installs the following packages for you in one simple step:
59 |
60 | * ggplot2
61 | * dplyr
62 | * tidyr
63 | * readr
64 | * purrr
65 | * tibble
66 | * hms
67 | * stringr
68 | * lubridate
69 | * forcats
70 | * DBI
71 | * haven
72 | * jsonlite
73 | * readxl
74 | * rvest
75 | * xml2
76 | * modelr
77 | * broom
78 |
79 | ### Loading the tidyverse
80 |
81 | When you load tidyverse with `library("tidyverse")`, it instructs R to load _the most commonly used_ tidyverse packages. These are:
82 |
83 | * ggplot2
84 | * dplyr
85 | * tidyr
86 | * readr
87 | * purrr
88 | * tibble
89 | * stringr
90 | * forcats
91 | * lubridate
92 |
93 | You can load the less commonly used tidyverse packages in the normal way, by running `library()` for each of them.
94 |
95 | Let's give this a try. We will use the ggplot2 and dplyr packages later in this tutorial. Let's use the tidyverse package to load them in the chunk below:
96 |
97 | ::: {.panel-tabset}
98 | ## {{< fa code >}} Interactive editor
99 |
100 | ```{webr-r}
101 |
102 |
103 |
104 | ```
105 |
106 | ## {{< fa circle-check >}} Solution
107 |
108 | ```r
109 | library(tidyverse)
110 | ```
111 |
112 | :::
113 |
114 | ### Quiz
115 |
116 | ::: {.callout-note appearance="simple" icon=false .question}
117 |
118 | **Which package is not loaded by `library("tidyverse")`?**
119 |
120 | ```{r tidyverse-check, echo=FALSE}
121 | check_question(
122 | answer = "babynames",
123 | options = c(
124 | "ggplot2",
125 | "dplyr",
126 | "tibble",
127 | "babynames"
128 | ),
129 | type = "radio",
130 | button_label = "Submit answer",
131 | q_id = 1,
132 | right = c("Correct!")
133 | )
134 | ```
135 | :::
136 |
137 | ### Recap
138 |
139 | Tibbles and the {tidyverse} package are two tools that make life with R easier. Ironically, you may not come to appreciate their value right away: these tutorials pre-load packages for you. However, you will want to use tibbles and the {tidyverse} package when you move out of the tutorials and begin doing your own work with R inside of RStudio.
140 |
141 | This tutorial also introduced the babynames dataset. In the next tutorial, you will use this data set to plot the popularity of _your_ name over time. Along the way, you will learn how to filter and subset data sets in R.
142 |
143 | ##
144 |
145 | ```{r}
146 | #| echo: false
147 | #| results: asis
148 | create_buttons(NULL)
149 | ```
150 |
--------------------------------------------------------------------------------
/transform-data/01-tibbles/img/tibble_display.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/transform-data/01-tibbles/img/tibble_display.png
--------------------------------------------------------------------------------
/transform-data/01-tibbles/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Working with tibbles"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | knitr::opts_chunk$set(
14 | fig.width = 6,
15 | fig.height = 6 * 0.618,
16 | fig.retina = 3,
17 | dev = "ragg_png",
18 | fig.align = "center",
19 | out.width = "70%"
20 | )
21 |
22 | source(here::here("R", "helpers.R"))
23 | ```
24 |
25 | In this primer, you will explore the popularity of different names over time. To succeed, you will need to master some common tools for manipulating data with R:
26 |
27 | * tibbles and `View()`, which let you inspect raw data
28 | * `select()` and `filter()`, which let you extract columns and rows from a data frame
29 | * `arrange()`, which lets you reorder the rows in your data
30 | * `|>`, which organizes your code into reader-friendly "pipes"
31 | * `mutate()`, `group_by()`, and `summarize()`, which help you use your data to compute new variables and summary statistics
32 |
33 | These are some of the most useful R functions for data science, and the tutorials that follow will provide you everything you need to learn them.
34 |
35 | In the tutorials, we'll use a dataset named `babynames`, which comes in a package that is also named `babynames`. Within `babynames`, you will find information about almost every name given to children in the United States since 1880.
36 |
37 | This tutorial introduces `babynames` as well as a new data structure that makes working with data in R easy: the tibble.
38 |
39 | In addition to `babynames`, this tutorial uses the [core tidyverse packages](http://tidyverse.org/), including {ggplot2}, {tibble}, and {dplyr}. All of these packages have been pre-installed for your convenience. But they haven't been pre-loaded---something you will soon learn more about!
40 |
41 |
42 | ##
43 |
44 | ```{r}
45 | #| echo: false
46 | #| results: asis
47 | create_buttons("01-babynames.html")
48 | ```
49 |
--------------------------------------------------------------------------------
/transform-data/02-isolating/01-your-name.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Your name"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | knitr::opts_chunk$set(
14 | fig.width = 6,
15 | fig.height = 6 * 0.618,
16 | fig.retina = 3,
17 | dev = "ragg_png",
18 | fig.align = "center",
19 | out.width = "70%"
20 | )
21 |
22 | library(tidyverse)
23 | library(babynames)
24 |
25 | source(here::here("R", "helpers.R"))
26 | ```
27 |
28 | ### The history of your name {.no-hide}
29 |
30 | You can use the data in `babynames` to make graphs like this, which reveal the history of a name, perhaps your name.
31 |
32 | ```{r echo=FALSE, message=FALSE, warning=FALSE, out.width="90%"}
33 | babynames |>
34 | filter(name == "Andrew", sex == "M") |>
35 | ggplot() +
36 | geom_line(aes(x = year, y = prop)) +
37 | labs(title = "Popularity of the name Andrew")
38 | ```
39 |
40 | But before you do, you will need to trim down `babynames`. At the moment, there are more rows in `babynames` than you need to build your plot.
41 |
42 | ### An example
43 |
44 | To see what I mean, consider how I made the plot above: I began with the entire dataset, which if plotted as a scatterplot would've looked like this.
45 |
46 | ```{r plot-all-names, out.width="60%", cache=TRUE}
47 | ggplot(babynames) +
48 | geom_point(aes(x = year, y = prop)) +
49 | labs(title = "Popularity of every name in the dataset")
50 | ```
51 |
52 | I then narrowed the data to just the rows that contain my name, before plotting the data with a line geom. Here's how the rows with just my name look as a scatterplot.
53 |
54 | ```{r out.width="60%"}
55 | babynames |>
56 | filter(name == "Andrew", sex == "M") |>
57 | ggplot() +
58 | geom_point(aes(x = year, y = prop)) +
59 | labs(title = "Popularity of the name Andrew")
60 | ```
61 |
62 | If I had skipped this step, my line graph would've connected all of the points in the large dataset, creating an uninformative graph.
63 |
64 | ```{r out.width="60%", cache=TRUE}
65 | ggplot(babynames) +
66 | geom_line(aes(x = year, y = prop)) +
67 | labs(title = "Popularity of every name in the dataset")
68 | ```
69 |
70 | Your goal in this section is to repeat this process for your own name (or a name that you choose). Along the way, you will learn a set of functions that isolate information within a dataset.
71 |
72 | ### Isolating data
73 |
74 | This type of task occurs often in data science: you need to extract data from a table before you can use it. You can do this task quickly with three functions that come in the {dplyr} package:
75 |
76 | 1. **`select()`**, which extracts columns from a data frame
77 | 1. **`filter()`**, which extracts rows from a data frame
78 | 1. **`arrange()`**, which moves important rows to the top of a data frame
79 |
80 | Each function takes a data frame or tibble as its first argument and returns a new data frame or tibble as its output.
81 |
82 |
83 | ##
84 |
85 | ```{r}
86 | #| echo: false
87 | #| results: asis
88 | create_buttons("02-select.html")
89 | ```
90 |
--------------------------------------------------------------------------------
/transform-data/02-isolating/02-select.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "`select()`"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 |
12 | engine: knitr
13 | filters:
14 | - webr
15 | webr:
16 | packages:
17 | - babynames
18 | - dplyr
19 | cell-options:
20 | editor-font-scale: 0.85
21 | fig-width: 6
22 | fig-height: 3.7
23 | out-width: "70%"
24 | ---
25 |
26 | ```{r include=FALSE}
27 | knitr::opts_chunk$set(
28 | fig.width = 6,
29 | fig.height = 6 * 0.618,
30 | fig.retina = 3,
31 | dev = "ragg_png",
32 | fig.align = "center",
33 | out.width = "70%"
34 | )
35 |
36 | library(tidyverse)
37 | library(checkdown)
38 |
39 | source(here::here("R", "helpers.R"))
40 | ```
41 |
42 | `select()` extracts columns of a data frame and returns the columns as a new data frame. To use `select()`, pass it the name of a data frame to extract columns from, and then the names of the columns to extract. The column names do not need to appear in quotation marks or be prefixed with a `$`; `select()` knows to find them in the data frame that you supply.
43 |
44 | ### Exercise: `select()`
45 |
46 | Use the example below to get a feel for `select()`. Can you extract just the `name` column? How about the `name` and `year` columns? How about all of the columns except `prop`?
47 |
48 | ::: {.panel-tabset}
49 | ## {{< fa code >}} Interactive editor
50 |
51 | ```{webr-r}
52 | select(babynames, name, sex)
53 |
54 |
55 | ```
56 |
57 | ## {{< fa circle-check >}} Solution
58 |
59 | ```r
60 | select(babynames, name)
61 | select(babynames, name, year)
62 | select(babynames, year, sex, name, n)
63 | ```
64 |
65 | :::
66 |
67 |
68 | ### `select()` helpers
69 |
70 | You can also use a series of helpers with `select()`. For example, if you place a minus sign before a column name, `select()` will return every column but that column. Can you predict how the minus sign will work here?
71 |
72 | ::: {.panel-tabset}
73 | ## {{< fa code >}} Interactive editor
74 |
75 | ```{webr-r}
76 | select(babynames, -c(n, prop))
77 |
78 |
79 | ```
80 |
81 | :::
82 |
83 | The table below summarizes the other `select()` helpers that are available in {dplyr}. Study it, and then click "Continue" to test your understanding.
84 |
85 | | Helper function | Use | Example |
86 | |-------------------|------------------------|------------------------------|
87 | | **`-`** | Columns except | `select(babynames, -prop)` |
88 | | **`:`** | Columns between (inclusive) | `select(babynames, year:n)` |
89 | | **`contains()`** | Columns whose names contain a string | `select(babynames, contains("n"))` |
90 | | **`ends_with()`** | Columns whose names end with a string | `select(babynames, ends_with("n"))` |
91 | | **`matches()`** | Columns whose names match a regex | `select(babynames, matches("n"))` |
92 | | **`num_range()`** | Columns with a numerical suffix in the range | Not applicable with `babynames` |
93 | | **`one_of()`** | Columns whose names appear in the given set | `select(babynames, one_of(c("sex", "gender")))` |
94 | | **`starts_with()`** | Columns whose names start with a string | `select(babynames, starts_with("n"))` |
95 |
96 | : {tbl-colwidths="[15, 35, 35]" .striped .hover .table-sm}
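
To see a few of these helpers side by side, here is a small sketch on a made-up tibble. The `scores` data and its column names are hypothetical and are not part of {babynames}. Wrapping each call in `names()` shows only which columns were kept.

```r
library(dplyr)

# Hypothetical toy data, used only to illustrate the helpers
scores <- tibble(
  id = 1:3,
  test_math = c(90, 85, 70),
  test_art = c(60, 75, 95),
  notes = c("a", "b", "c")
)

names(select(scores, -notes))               # id, test_math, test_art
names(select(scores, test_math:notes))      # test_math, test_art, notes
names(select(scores, starts_with("test")))  # test_math, test_art
names(select(scores, ends_with("s")))       # notes
names(select(scores, contains("t")))        # test_math, test_art, notes
```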
97 |
98 |
99 | ### `select()` quiz
100 |
101 | ::: {.callout-note appearance="simple" icon=false .question}
102 |
103 | **Which of these is not a way to select the `name` and `n` columns together?**
104 |
105 | ```{r predict, echo=FALSE}
106 | check_question(
107 | answer = 'select(babynames, ends_with("n"))',
108 | options = c(
109 | "select(babynames, -c(year, sex, prop))",
110 | "select(babynames, name:n)",
111 | 'select(babynames, starts_with("n"))',
112 | 'select(babynames, ends_with("n"))'
113 | ),
114 | type = "radio",
115 | button_label = "Submit answer",
116 | q_id = 1,
117 | right = c("Correct!")
118 | )
119 | ```
120 | :::
121 |
122 | ##
123 |
124 | ```{r}
125 | #| echo: false
126 | #| results: asis
127 | create_buttons("03-filter.html")
128 | ```
129 |
--------------------------------------------------------------------------------
/transform-data/02-isolating/04-arrange.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "`arrange()`"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | packages:
16 | - babynames
17 | - dplyr
18 | cell-options:
19 | editor-font-scale: 0.85
20 | fig-width: 6
21 | fig-height: 3.7
22 | out-width: "70%"
23 | ---
24 |
25 | ```{r include=FALSE}
26 | knitr::opts_chunk$set(
27 | fig.width = 6,
28 | fig.height = 6 * 0.618,
29 | fig.retina = 3,
30 | dev = "ragg_png",
31 | fig.align = "center",
32 | out.width = "70%"
33 | )
34 |
35 | library(tidyverse)
36 | library(babynames)
37 |
38 | source(here::here("R", "helpers.R"))
39 | ```
40 |
41 | `arrange()` returns all of the rows of a data frame reordered by the values of a column. As with `select()`, the first argument of `arrange()` should be a data frame and the remaining arguments should be the names of columns. If you give `arrange()` a single column name, it will return the rows of the data frame reordered so that the row with the lowest value in that column appears first, the row with the second lowest value appears second, and so on. If the column contains character strings, `arrange()` will place them in alphabetical order.
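
A minimal sketch of both behaviors, using a made-up `pets` tibble (hypothetical data, chosen so the two orderings differ):

```r
library(dplyr)

# Hypothetical toy data
pets <- tibble(
  name = c("Rex", "Ada", "Milo"),
  age = c(2, 7, 4)
)

# Numeric column: lowest value first
by_age <- arrange(pets, age)
by_age$name   # "Rex" "Milo" "Ada"

# Character column: alphabetical order
by_name <- arrange(pets, name)
by_name$name  # "Ada" "Milo" "Rex"
```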
42 |
43 | ### Exercise: `arrange()`
44 |
45 | Use the code chunk below to arrange babynames by `n`. Can you tell what the smallest value of `n` is?
46 |
47 | ::: {.panel-tabset}
48 | ## {{< fa code >}} Interactive editor
49 |
50 | ```{webr-r}
51 |
52 |
53 |
54 | ```
55 |
56 | ## {{< fa circle-check >}} Solution
57 |
58 | ```r
59 | arrange(babynames, n)
60 | ```
61 |
62 | :::
63 |
64 | ###
65 |
66 | Good job! The compiler of `babynames` used 5 as a cutoff; a name only made it into `babynames` for a given year and gender if it was used for five or more children.
67 |
68 | ### Tie breakers
69 |
70 | If you supply additional column names, `arrange()` will use them as tie breakers to order rows that have identical values in the earlier columns. Add to the code below to make `prop` a tie breaker. The result should first order rows by value of `n` and then reorder rows within each value of `n` by values of `prop`.
71 |
72 | ::: {.panel-tabset}
73 | ## {{< fa code >}} Interactive editor
74 |
75 | ```{webr-r}
76 | arrange(babynames, n)
77 |
78 |
79 | ```
80 |
81 | ## {{< fa circle-check >}} Solution
82 |
83 | ```r
84 | arrange(babynames, n, prop)
85 | ```
86 |
87 | :::
88 |
89 |
90 | ### `desc()`
91 |
92 | If you would rather arrange rows in the opposite order, i.e. from _large_ values to _small_ values, surround a column name with `desc()`. `arrange()` will reorder the rows based on the largest values to the smallest.
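
Here is a short sketch of `desc()` on its own and combined with an ascending column. The `sales` tibble is hypothetical toy data.

```r
library(dplyr)

# Hypothetical toy data
sales <- tibble(
  region = c("east", "west", "east", "west"),
  units = c(10, 30, 20, 5)
)

# Largest `units` first
arrange(sales, desc(units))$units          # 30 20 10 5

# Regions in alphabetical order, largest `units` first within each region
arrange(sales, region, desc(units))$units  # 20 10 30 5
```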
93 |
94 | Add a `desc()` to the code below to display the most popular name for 2017 (the largest year in the dataset) instead of 1880 (the smallest year in the dataset).
95 |
96 | ::: {.panel-tabset}
97 | ## {{< fa code >}} Interactive editor
98 |
99 | ```{webr-r}
100 | arrange(babynames, year, desc(prop))
101 |
102 |
103 | ```
104 |
105 | ## {{< fa circle-check >}} Solution
106 |
107 | ```r
108 | arrange(babynames, desc(year), desc(prop))
109 | ```
110 |
111 | :::
112 |
113 | Think you have it? Click Continue to test yourself.
114 |
115 | ### `arrange()` quiz
116 |
117 | Which name was the most popular for a single gender in a single year? In the code chunk below, use `arrange()` to make the row with the largest value of `prop` appear at the top of the data set.
118 |
119 | ::: {.panel-tabset}
120 | ## {{< fa code >}} Interactive editor
121 |
122 | ```{webr-r}
123 |
124 |
125 |
126 | ```
127 |
128 | ## {{< fa circle-check >}} Solution
129 |
130 | ```r
131 | arrange(babynames, desc(prop))
132 | ```
133 |
134 | :::
135 |
136 | Now arrange `babynames` so that the row with the largest value of `n` appears at the top of the data frame. Will this be the same row? Why or why not?
137 |
138 | ::: {.panel-tabset}
139 | ## {{< fa code >}} Interactive editor
140 |
141 | ```{webr-r}
142 |
143 |
144 |
145 | ```
146 |
147 | ## {{< fa circle-check >}} Solution
148 |
149 | ```r
150 | arrange(babynames, desc(n))
151 | ```
152 |
153 | :::
154 |
155 | ###
156 |
157 | The number of children represented by each proportion grew over time as the population grew.
158 |
159 | ##
160 |
161 | ```{r}
162 | #| echo: false
163 | #| results: asis
164 | create_buttons("05-pipe.html")
165 | ```
166 |
--------------------------------------------------------------------------------
/transform-data/02-isolating/05-pipe.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "`|>`"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | packages:
16 | - babynames
17 | - dplyr
18 | - ggplot2
19 | cell-options:
20 | editor-font-scale: 0.85
21 | fig-width: 6
22 | fig-height: 3.7
23 | out-width: "70%"
24 | ---
25 |
26 | ```{r include=FALSE}
27 | knitr::opts_chunk$set(
28 | fig.width = 6,
29 | fig.height = 6 * 0.618,
30 | fig.retina = 3,
31 | dev = "ragg_png",
32 | fig.align = "center",
33 | out.width = "70%"
34 | )
35 |
36 | library(tidyverse)
37 | library(babynames)
38 |
39 | source(here::here("R", "helpers.R"))
40 | ```
41 |
42 | ### Steps {.no-hide}
43 |
44 | Notice how each {dplyr} function takes a data frame as input and returns a data frame as output. This makes the functions easy to use in a step-by-step fashion. For example, you could:
45 |
46 | 1. Filter `babynames` to just boys born in 2017
47 | 2. Select the `name` and `n` columns from the result
48 | 3. Arrange those columns so that the most popular names appear near the top.
49 |
50 | ```{r}
51 | boys_2017 <- filter(babynames, year == 2017, sex == "M")
52 | boys_2017 <- select(boys_2017, name, n)
53 | boys_2017 <- arrange(boys_2017, desc(n))
54 | boys_2017
55 | ```
56 |
57 | ### Redundancy
58 |
59 | The result shows us the most popular boys' names from 2017, which is the most recent year in the data set. But take a look at the code. Do you notice how we re-create `boys_2017` at each step so we will have something to pass to the next step? This is an inefficient way to write R code.
60 |
61 | You could avoid creating `boys_2017` by nesting your functions inside of each other, but this creates code that is hard to read:
62 |
63 | ```{r eval=FALSE}
64 | arrange(select(filter(babynames, year == 2017, sex == "M"), name, n), desc(n))
65 | ```
66 |
67 | There is a third way to write sequences of functions: the pipe.
68 |
69 | ### |>
70 |
71 | The pipe operator `|>` performs an extremely simple task: it passes the result on its left into the first argument of the function on its right. Or put another way, `x |> f(y)` is the same as `f(x, y)`. This piece of code punctuation makes it easy to write and read series of functions that are applied in a step by step way. For example, we can use the pipe to rewrite our code above:
72 |
73 | ```{r}
74 | babynames |>
75 | filter(year == 2017, sex == "M") |>
76 | select(name, n) |>
77 | arrange(desc(n))
78 | ```
79 |
80 | As you read the code, pronounce `|>` as **"and then"**. You'll notice that {dplyr} makes it easy to read pipes. Each function name is a verb, so our code resembles the statement, "Take `babynames`, _and then_ filter it by `year` and `sex`, _and then_ select the `name` and `n` columns, _and then_ arrange the results by descending values of `n`."
81 |
82 | {dplyr} also makes it easy to write pipes. Each {dplyr} function returns a data frame that can be piped into another {dplyr} function, which will accept the data frame as its first argument. In fact, {dplyr} functions are written with pipes in mind: each function does one simple task. {dplyr} expects you to use pipes to combine these simple tasks to produce sophisticated results.
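
The rule that `x |> f(y)` is the same as `f(x, y)` can be checked directly. The sketch below uses only base R functions, so it does not depend on {dplyr}; it assumes R 4.1 or later, where the native pipe is available.

```r
x <- c(1.234, 5.678)

piped <- x |> round(1)  # the pipe inserts x as the first argument
nested <- round(x, 1)   # the same call written without a pipe

identical(piped, nested)  # TRUE

# Pipes read left to right; nested calls read inside out
chained <- c(16, 81) |> sqrt() |> sum()
chained                 # 13
sum(sqrt(c(16, 81)))    # 13
```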
83 |
84 | ### Exercise: Pipes
85 |
86 | I'll use pipes for the remainder of the tutorial, and I will expect you to as well. Let's practice a little by writing a new pipe in the chunk below. The pipe should:
87 |
88 | 1. Filter `babynames` to just the *girls* that were born in 2017
89 | 2. Select the `name` and `n` columns
90 | 3. Arrange the results so that the most popular names are near the top.
91 |
92 | Try to write your pipe without copying and pasting the code from above.
93 |
94 | ::: {.panel-tabset}
95 | ## {{< fa code >}} Interactive editor
96 |
97 | ```{webr-r}
98 |
99 |
100 |
101 | ```
102 |
103 | ## {{< fa circle-check >}} Solution
104 |
105 | ```r
106 | babynames |>
107 | filter(year == 2017, sex == "F") |>
108 | select(name, n) |>
109 | arrange(desc(n))
110 | ```
111 |
112 | :::
113 |
114 | ### Your name
115 |
116 | You've now mastered a set of skills that will let you easily plot the popularity of your name over time. In the code chunk below, use a combination of {dplyr} and {ggplot2} functions with `|>` to:
117 |
118 | 1. Trim `babynames` to just the rows that contain your name and your sex
119 | 2. Trim the result to just the columns that will appear in your graph (not strictly necessary, but useful practice)
120 | 3. Plot the results as a line graph with `year` on the x axis and `prop` on the y axis
121 |
122 | Note that the first argument of `ggplot()` takes a data frame, which means you can add `ggplot()` directly to the end of a pipe. However, you will need to switch from `|>` to `+` to finish adding layers to your plot.
123 |
124 | ::: {.panel-tabset}
125 | ## {{< fa code >}} Interactive editor
126 |
127 | ```{webr-r}
128 |
129 |
130 |
131 | ```
132 |
133 | ## {{< fa circle-check >}} Solution
134 |
135 | ```r
136 | babynames |>
137 | filter(name == "Andrew", sex == "M") |>
138 | select(year, prop) |>
139 | ggplot() +
140 | geom_line(aes(x = year, y = prop)) +
141 | labs(title = "Popularity of the name Andrew")
142 | ```
143 |
144 | :::
145 |
146 | ### Recap
147 |
148 | Together, `select()`, `filter()`, and `arrange()` let you quickly find information displayed within your data.
149 |
150 | The next tutorial will show you how to derive information that is implied by your data, but not displayed within your data set.
151 |
152 | In that tutorial, you will continue to use the `|>` operator, which is an essential part of programming with the {dplyr} package.
153 |
154 | Pipes help make R expressive, like a spoken language. Spoken languages consist of simple words that you combine into sentences to create sophisticated thoughts.
155 |
156 | In the tidyverse, functions are like words: each does one simple task well. You can combine these tasks into pipes with `|>` to perform complex, customized procedures.
157 |
158 |
159 | ##
160 |
161 | ```{r}
162 | #| echo: false
163 | #| results: asis
164 | create_buttons(NULL)
165 | ```
166 |
--------------------------------------------------------------------------------
/transform-data/02-isolating/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Isolating data with {dplyr}"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | knitr::opts_chunk$set(
14 | fig.width = 6,
15 | fig.height = 6 * 0.618,
16 | fig.retina = 3,
17 | dev = "ragg_png",
18 | fig.align = "center",
19 | out.width = "70%"
20 | )
21 |
22 | source(here::here("R", "helpers.R"))
23 | ```
24 |
25 | In this case study, you will explore the popularity of your own name over time. Along the way, you will master some of the most useful functions for isolating variables, cases, and values within a data frame:
26 |
27 | * `select()` and `filter()`, which let you extract columns and rows from a data frame
28 | * `arrange()`, which lets you reorder the rows in your data
29 | * `|>`, which organizes your code into reader-friendly "pipes"
30 |
31 | This tutorial uses the [core tidyverse packages](http://tidyverse.org/), including {ggplot2}, {tibble}, and {dplyr}, as well as the {babynames} package. All of these packages have been pre-installed and pre-loaded for your convenience.
32 |
33 |
34 | ##
35 |
36 | ```{r}
37 | #| echo: false
38 | #| results: asis
39 | create_buttons("01-your-name.html")
40 | ```
41 |
--------------------------------------------------------------------------------
/transform-data/03-deriving/01-most-popular-names.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "The most popular names"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | ---
12 |
13 | ```{r include=FALSE}
14 | knitr::opts_chunk$set(
15 | fig.width = 6,
16 | fig.height = 6 * 0.618,
17 | fig.retina = 3,
18 | dev = "ragg_png",
19 | fig.align = "center",
20 | out.width = "70%"
21 | )
22 |
23 | library(tidyverse)
24 | library(babynames)
25 | library(checkdown)
26 |
27 | source(here::here("R", "helpers.R"))
28 | ```
29 |
30 | ### What are the most popular names of all time? {.no-hide}
31 |
32 | Let's use `babynames` to answer a different question: what are the most popular names of all time?
33 |
34 | This question seems simple enough, but to answer it we need to be more precise: how do you define "the most popular" names? Try to think of several definitions and then click Continue. After the Continue button, I will suggest two definitions of my own.
35 |
36 | ### Two definitions of popular
37 |
38 | I suggest that we focus on two definitions of _popular_, one that uses sums and one that uses ranks:
39 |
40 | 1. **Sums** - A name is popular _if the total number of children that have the name is large when you sum across years_.
41 | 2. **Ranks** - A name is popular _if it consistently ranks among the top names from year to year_.
42 |
43 | This raises a question:
44 |
45 | ::: {.callout-note appearance="simple" icon=false .question}
46 |
47 | **Do we have enough information in `babynames` to compare the popularity of names?**
48 |
49 | ```{r predict, echo=FALSE}
50 | check_question(
51 | answer = "Yes. We can use the information in `babynames` to compute the values we want.",
52 | options = c(
53 | "No. No cell in `babynames` contains a rank value or a sum across years.",
54 | "Yes. We can use the information in `babynames` to compute the values we want."
55 | ),
56 | type = "radio",
57 | button_label = "Submit answer",
58 | q_id = 1,
59 | right = c("Correct!")
60 | )
61 | ```
62 | :::
63 |
64 | ### Deriving information
65 |
66 | Every data frame that you meet implies more information than it displays. For example, `babynames` does not display the total number of children who had your name, but `babynames` certainly implies what that number is. To discover the number, you only need to do a calculation:
67 |
68 | ```{r}
69 | babynames |>
70 | filter(name == "Andrew", sex == "M") |>
71 | summarize(total = sum(n))
72 | ```
73 |
74 | ### Useful functions
75 |
76 | {dplyr} provides three functions that can help you reveal the information implied by your data:
77 |
78 | * `summarize()`
79 | * `group_by()`
80 | * `mutate()`
81 |
82 | Like `select()`, `filter()` and `arrange()`, these functions all take a data frame as their first argument and return a new data frame as their output, which makes them easy to use in pipes.
83 |
84 | Let's master each function and use them to analyze popularity as we go.
85 |
86 | ##
87 |
88 | ```{r}
89 | #| echo: false
90 | #| results: asis
91 | create_buttons("02-summarize.html")
92 | ```
93 |
--------------------------------------------------------------------------------
/transform-data/03-deriving/02-summarize.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "`summarize()`"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | packages:
16 | - babynames
17 | - dplyr
18 | cell-options:
19 | editor-font-scale: 0.85
20 | fig-width: 6
21 | fig-height: 3.7
22 | out-width: "70%"
23 | ---
24 |
25 | ```{r include=FALSE}
26 | knitr::opts_chunk$set(
27 | fig.width = 6,
28 | fig.height = 6 * 0.618,
29 | fig.retina = 3,
30 | dev = "ragg_png",
31 | fig.align = "center",
32 | out.width = "70%"
33 | )
34 |
35 | library(tidyverse)
36 | library(babynames)
37 |
38 | source(here::here("R", "helpers.R"))
39 | ```
40 |
41 | `summarize()` takes a data frame and uses it to calculate a new data frame of summary statistics.
42 |
43 | ### Syntax
44 |
45 | To use `summarize()`, pass it a data frame and then one or more named arguments. Each named argument should be set to an R expression that generates a single value. `summarize()` will turn each named argument into a column in the new data frame. The name of each argument will become the column name, and the value returned by the argument will become the column contents.
46 |
47 | Importantly, the `summarize()` function is *destructive*. It collapses a dataset into a single row and throws away any columns that we don’t use when summarizing. Watch this little animation to see what it does:
48 |
49 | ```{=html}
50 |
51 |
52 |
53 | ```
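
As a rough sketch of that collapsing behavior, the code below runs `summarize()` on a tiny made-up data frame. The object `toy`, its column values, and the output column names `total` and `largest` are all invented for this illustration and are not taken from `babynames`.

```r
library(dplyr)

# Made-up data: three rows, three columns
toy <- tibble(
  name = c("A", "B", "C"),
  year = c(2015, 2016, 2017),
  n    = c(10, 20, 30)
)

# Two named arguments become two columns in a one-row result
result <- toy |>
  summarize(total = sum(n), largest = max(n))

result
dim(result)    # one row, two columns
names(result)  # `name` and `year` are gone because no summary used them
```

If you need a column to survive, you have to compute something from it inside `summarize()`.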
54 |
55 | ### Example
56 |
57 | I used `summarize()` earlier to calculate the total number of boys named "Andrew", but let's expand that code to also calculate
58 |
59 | * `max`: the maximum number of boys named "Andrew" in a single year
60 | * `mean`: the mean number of boys named "Andrew" per year
61 |
62 | ```{r}
63 | babynames |>
64 | filter(name == "Andrew", sex == "M") |>
65 | summarize(total = sum(n), max = max(n), mean = mean(n))
66 | ```
67 |
68 | Don't let the code above fool you. The first argument of `summarize()` is always a data frame, but when you use `summarize()` in a pipe, the first argument is provided by the pipe operator, `|>`. Here the first argument will be the data frame that is returned by `babynames |> filter(name == "Andrew", sex == "M")`.
69 |
70 | ### Exercise: `summarize()`
71 |
72 | Use the code chunk below to compute three statistics:
73 |
74 | 1. the total number of children who ever had your name
75 | 1. the maximum number of children given your name in a single year
76 | 1. the mean number of children given your name per year
77 |
78 | If you cannot think of an R function that would compute each statistic, click the Solution tab.
79 |
80 | ::: {.panel-tabset}
81 | ## {{< fa code >}} Interactive editor
82 |
83 | ```{webr-r}
84 |
85 |
86 |
87 | ```
88 |
89 | ## {{< fa circle-check >}} Solution
90 |
91 | ```r
92 | babynames |>
93 | filter(name == "Andrew", sex == "M") |>
94 | summarize(total = sum(n), max = max(n), mean = mean(n))
95 | ```
96 |
97 | :::
98 |
99 |
100 | ### Summary functions
101 |
102 | So far our `summarize()` examples have relied on `sum()`, `max()`, and `mean()`. But you can use any function in `summarize()` so long as it meets one criterion: the function must take a _vector_ of values as input and return a _single_ value as output. Functions that do this are known as **summary functions** and they are common in the field of descriptive statistics. Some of the most useful summary functions include:
103 |
104 | 1. **Measures of location**: `mean(x)`, `median(x)`, `quantile(x, 0.25)`, `min(x)`, and `max(x)`
105 | 1. **Measures of spread**: `sd(x)`, `var(x)`, `IQR(x)`, and `mad(x)`
106 | 1. **Measures of position**: `first(x)`, `nth(x, 2)`, and `last(x)`
107 | 1. **Counts**: `n_distinct(x)` and `n()`, which takes no arguments, and returns the size of the current group or data frame.
108 | 1. **Counts and proportions of logical values**: `sum(!is.na(x))`, which counts the number of `TRUE`s returned by a logical test; `mean(y == 0)`, which returns the proportion of `TRUE`s returned by a logical test.
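
The sketch below tries a few of the functions from the list above on a small made-up data frame. The object `toy`, its values, and the output column names are invented for illustration only; each function takes a whole column and returns a single value.

```r
library(dplyr)

# Made-up measurements, including one missing value and some zeros
toy <- tibble(
  x = c(3, NA, 8, 8, 1),
  y = c(0, 2, 0, 5, 0)
)

stats <- toy |>
  summarize(
    rows        = n(),                      # number of rows in the data frame
    distinct_x  = n_distinct(x),            # distinct values of x (NA counts as one)
    first_x     = first(x),                 # value of x in the first row
    median_x    = median(x, na.rm = TRUE),  # median of the non-missing x values
    non_missing = sum(!is.na(x)),           # how many x values are not NA
    prop_zero   = mean(y == 0)              # share of y values equal to 0
  )

stats
```

Note that `median()` needs `na.rm = TRUE` here; without it, the missing value in `x` would make the result `NA`.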
109 |
110 | Let's apply some of these summary functions. Click Continue to test your understanding.
111 |
112 | ### Khaleesi challenge
113 |
114 | "Khaleesi" is a very modern name that appears to be based on the _Game of Thrones_ TV series, which premiered on April 17, 2011. In the chunk below, filter `babynames` to just the rows where `name == "Khaleesi"`. Then use `summarize()` and a summary function to return the first value of `year` in the data set.
115 |
116 | ::: {.panel-tabset}
117 | ## {{< fa code >}} Interactive editor
118 |
119 | ```{webr-r}
120 |
121 |
122 |
123 | ```
124 |
125 | ## {{< fa circle-check >}} Solution
126 |
127 | ```r
128 | babynames |>
129 | filter(name == "Khaleesi") |>
130 | summarize(year = first(year))
131 | ```
132 |
133 | :::
134 |
135 |
136 | ### Distinct name challenge
137 |
138 | In the chunk below, use `summarize()` and a summary function to return a data frame with two columns:
139 |
140 | * A column named `n` that displays the total number of rows in `babynames`
141 | * A column named `distinct` that displays the number of distinct names in `babynames`
142 |
143 | Will these numbers be different? Why or why not?
144 |
145 | ::: {.panel-tabset}
146 | ## {{< fa code >}} Interactive editor
147 |
148 | ```{webr-r}
149 |
150 |
151 |
152 | ```
153 |
154 | ## {{< fa circle-check >}} Solution
155 |
156 | ```r
157 | babynames |>
158 | summarize(n = n(), distinct = n_distinct(name))
159 | ```
160 |
161 | :::
162 |
163 | ###
164 |
165 | Good job! The two numbers are different because most names appear in the data set more than once. They appear once for each year in which they were used.
166 |
167 | ### `summarize()` by groups?
168 |
169 | How can we apply `summarize()` to find the most popular names in `babynames`? You've seen how to calculate the total number of children that have your name, which provides one of our measures of popularity, i.e. the total number of children that have a name:
170 |
171 | ```{r eval=FALSE}
172 | babynames |>
173 | filter(name == "Andrew", sex == "M") |>
174 | summarize(total = sum(n))
175 | ```
176 |
177 | However, we had to isolate your name from the rest of your data to calculate this number. You could imagine writing a program that goes through each name one at a time and:
178 |
179 | 1. keeps just the rows with that name
180 | 2. applies `summarize()` to those rows
181 |
182 | Eventually, the program could combine all of the results back into a single data set. However, you don't need to write such a program; this is the job of {dplyr}'s `group_by()` function.
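
For the curious, such a hand-written program might look something like the sketch below. It runs on a tiny made-up data frame (`toy`, `per_name`, and `totals` are names invented for this illustration) and is only meant to show the filter, summarize, and combine steps described above, not how {dplyr} works internally.

```r
library(dplyr)

# Made-up data: three names spread over several rows
toy <- tibble(
  name = c("A", "B", "A", "B", "C"),
  n    = c(10, 5, 20, 15, 1)
)

# Go through each distinct name one at a time
per_name <- lapply(unique(toy$name), function(this_name) {
  toy |>
    filter(name == this_name) |>                  # 1. keep just the rows for that name
    summarize(name = this_name, total = sum(n))   # 2. summarize those rows
})

# Combine the one-row results back into a single data frame
totals <- bind_rows(per_name)
totals
```

This works, but it is a lot of bookkeeping for a very common task.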
183 |
184 | ##
185 |
186 | ```{r}
187 | #| echo: false
188 | #| results: asis
189 | create_buttons("03-group_by.html")
190 | ```
191 |
--------------------------------------------------------------------------------
/transform-data/03-deriving/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Deriving information with {dplyr}"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | knitr::opts_chunk$set(
14 | fig.width = 6,
15 | fig.height = 6 * 0.618,
16 | fig.retina = 3,
17 | dev = "ragg_png",
18 | fig.align = "center",
19 | out.width = "70%"
20 | )
21 |
22 | source(here::here("R", "helpers.R"))
23 | ```
24 |
25 | In this case study, you will identify the most popular American names from 1880 to 2017. While doing this, you will master three more {dplyr} functions:
26 |
27 | * `mutate()`, `group_by()`, and `summarize()`, which help you use your data to compute new variables and summary statistics
28 |
29 | These are some of the most useful R functions for data science, and this tutorial provides everything you need to learn them.
30 |
31 | This tutorial uses the [core tidyverse packages](http://tidyverse.org/), including {ggplot2}, {tibble}, and {dplyr}, as well as the {babynames} package. All of these packages have been pre-installed and pre-loaded for your convenience.
32 |
33 |
34 | ##
35 |
36 | ```{r}
37 | #| echo: false
38 | #| results: asis
39 | create_buttons("01-most-popular-names.html")
40 | ```
41 |
--------------------------------------------------------------------------------
/transform-data/03-deriving/video/grp-mutate.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/transform-data/03-deriving/video/grp-mutate.mp4
--------------------------------------------------------------------------------
/transform-data/03-deriving/video/grp-summarize-00.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/transform-data/03-deriving/video/grp-summarize-00.mp4
--------------------------------------------------------------------------------
/transform-data/03-deriving/video/grp-summarize-01.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/transform-data/03-deriving/video/grp-summarize-01.mp4
--------------------------------------------------------------------------------
/transform-data/03-deriving/video/grp-summarize-02.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/transform-data/03-deriving/video/grp-summarize-02.mp4
--------------------------------------------------------------------------------
/transform-data/03-deriving/video/grp-summarize-03.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/transform-data/03-deriving/video/grp-summarize-03.mp4
--------------------------------------------------------------------------------
/transform-data/03-deriving/video/mutate.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/transform-data/03-deriving/video/mutate.mp4
--------------------------------------------------------------------------------
/visualize-data/01-eda/03-covariation.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Covariation"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | ---
12 |
13 | ```{r include=FALSE}
14 | knitr::opts_chunk$set(
15 | fig.width = 6,
16 | fig.height = 6 * 0.618,
17 | fig.retina = 3,
18 | dev = "ragg_png",
19 | fig.align = "center",
20 | out.width = "70%"
21 | )
22 |
23 | library(tidyverse)
24 | library(checkdown)
25 |
26 | source(here::here("R", "helpers.R"))
27 | ```
28 |
29 |
30 | ### What is covariation? {.no-hide}
31 |
32 | If variation describes the behavior _within_ a variable, covariation describes the behavior _between_ variables. **Covariation** is the tendency for the values of two or more variables to vary together in a related way. The best way to spot covariation is to visualize the relationship between two or more variables. How you do that should again depend on whether your variables are categorical or continuous.
33 |
34 | ### Two categorical variables
35 |
36 | You can plot the relationship between two categorical variables with a heatmap or with `geom_count()`:
37 |
38 | ```{r echo=FALSE, out.width="100%"}
39 | #| layout-ncol: 2
40 | diamonds |>
41 | count(color, cut) |>
42 | ggplot(mapping = aes(x = color, y = cut)) +
43 | geom_tile(mapping = aes(fill = n)) +
44 | labs(title = "Color grade vs. cut quality for 53940 diamonds")
45 |
46 | ggplot(diamonds) +
47 | geom_count(aes(color, cut)) +
48 | labs(title = "Color grade vs. cut quality for 53940 diamonds")
49 | ```
50 |
51 | Again, don't be concerned if you do not know how to make these graphs. For now, let's focus on the strategy of how to use visualizations in EDA. You'll learn how to make different types of plots in the tutorials that follow.
52 |
53 | ### One continuous and one categorical variable
54 |
55 | You can plot the relationship between one continuous and one categorical variable with a boxplot:
56 |
57 | ```{r echo=FALSE, out.width="80%"}
58 | ggplot(mpg) +
59 | geom_boxplot(aes(reorder(class, hwy, median), hwy)) +
60 | labs(title = "Pickup trucks and SUVs display the lowest fuel efficiency") +
61 | labs(x = "class")
62 | ```
63 |
64 | ### Two continuous variables
65 |
66 | You can plot the relationship between two continuous variables with a scatterplot:
67 |
68 | ```{r echo=FALSE, message=FALSE, out.width="80%"}
69 | ggplot(data = faithful) +
70 | geom_point(aes(x = eruptions, y = waiting)) +
71 | labs(title = "Length of eruption vs wait time before eruption")
72 | ```
73 |
74 | ### Patterns
75 |
76 | Patterns in your data provide clues about relationships. If a systematic relationship exists between two variables it will appear as a pattern in the data. If you spot a pattern, ask yourself:
77 |
78 | + Could this pattern be due to coincidence (i.e. random chance)?
79 |
80 | + How can you describe the relationship implied by the pattern?
81 |
82 | + How strong is the relationship implied by the pattern?
83 |
84 | + What other variables might affect the relationship?
85 |
86 | + Does the relationship change if you look at individual subgroups of the data?
87 |
88 | Remember that clusters and outliers are also a type of pattern. Two-dimensional plots can reveal clusters and outliers that would not be visible in a one-dimensional plot. If you spot either, ask yourself what they imply.
89 |
90 | ### Review 6: Patterns
91 |
92 | The scatterplot below shows the relationship between the length of an eruption of Old Faithful and the wait time before the eruption (i.e. the amount of time that passed between it and the previous eruption).
93 |
94 | ```{r echo=FALSE, message=FALSE, out.width="80%"}
95 | ggplot(data = faithful) +
96 | geom_point(aes(x = eruptions, y = waiting)) +
97 | labs(title = "Length of eruption vs wait time before eruption")
98 | ```
99 |
100 | ::: {.callout-note appearance="simple" icon=false .question}
101 |
102 | **Does the scatterplot above reveal a pattern that helps to explain the variation in lengths of Old Faithful eruptions?**
103 |
104 | ```{r echo=FALSE}
105 | check_question(
106 | answer = "Yes. Long eruptions are associated with a _long_ wait before the eruption",
107 | options = c(
108 | "No. There is no pattern.",
109 | "Yes. Long eruptions are associated with a _short_ wait before the eruption",
110 | "Yes. Long eruptions are associated with a _long_ wait before the eruption"
111 | ),
112 | type = "radio",
113 | button_label = "Submit answer",
114 | q_id = 1,
115 | right = c("Correct! The data seems to suggest that a long build up before an eruption is associated with a long eruption. The plot also shows the two clusters that we saw before: there are long eruptions with a long build up and short eruptions with a short build up.")
116 | )
117 | ```
118 | :::
119 |
120 |
121 | ### Uncertainty
122 |
123 | Patterns provide a useful tool for data scientists because they reveal covariation. If you think of variation as a phenomenon that creates uncertainty, covariation is a phenomenon that reduces it. When two variables covary, you can use the values of one variable to make better predictions about the values of the second. If the covariation is due to a causal relationship (a special case), you can use the value of one variable to control the value of the second.
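
One rough way to see covariation reducing uncertainty is to compare how spread out a variable is before and after you use a second variable to predict it. The sketch below does this for the `faithful` data with a simple straight-line model fit by `lm()`. The linear model is chosen here purely for illustration (this tutorial does not cover modeling), and the object names are invented.

```r
# `faithful` ships with base R, so no packages are needed

# Spread of eruption lengths when we know nothing else
sd_overall <- sd(faithful$eruptions)

# Fit a straight line that predicts eruption length from waiting time
fit <- lm(eruptions ~ waiting, data = faithful)

# Spread that remains after using `waiting` to make the prediction
sd_remaining <- sd(residuals(fit))

c(overall = sd_overall, remaining = sd_remaining)

# TRUE when knowing the wait time narrows down the eruption length
sd_remaining < sd_overall
```

The final comparison should return `TRUE`: once you know the waiting time, the eruption lengths are less uncertain than they were on their own.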
124 |
125 |
126 | ### Recap
127 |
128 | You've learned a lot in this tutorial. Here's what you should keep with you:
129 |
130 | * EDA is an iterative cycle built around asking and refining questions.
131 | * These two questions are always useful:
132 | 1. What type of variation occurs _within_ my variables?
133 | 1. What type of covariation occurs _between_ my variables?
134 | * Remember the definitions of _variables_, _values_, _observations_, _variation_, _covariation_, _categorical_, and _continuous_. You'll see them again. Frequently.
135 |
136 | Throughout the tutorial, you also encountered several recommendations for plots that visualize variation and covariation for categorical and continuous variables. Plots are a bit like questions in EDA: you should make many quickly and try anything that strikes your fancy. You can refine your plots later to share with others. A lot of refinement will occur naturally as you iterate during EDA.
137 |
138 | The suggestions below can serve as a starting point for visualizing data. In the tutorials that follow, you will learn how to make each type of plot, as well as how to use best practices and advanced skills when visualizing data.
139 |
140 | ![](img/plots-table.png){width=80%}
141 |
142 | ##
143 |
144 | ```{r}
145 | #| echo: false
146 | #| results: asis
147 | create_buttons(NULL)
148 | ```
149 |
--------------------------------------------------------------------------------
/visualize-data/01-eda/img/plots-table.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/visualize-data/01-eda/img/plots-table.png
--------------------------------------------------------------------------------
/visualize-data/01-eda/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Exploratory data analysis"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | knitr::opts_chunk$set(
14 | fig.width = 6,
15 | fig.height = 6 * 0.618,
16 | fig.retina = 3,
17 | dev = "ragg_png",
18 | fig.align = "center",
19 | out.width = "70%"
20 | )
21 |
22 | source(here::here("R", "helpers.R"))
23 | ```
24 |
25 |
26 | This tutorial will show you how to explore your data in a systematic way, a task that statisticians call **exploratory data analysis**, or **EDA** for short. In the tutorial you will:
27 |
28 | * Learn a strategy for exploring data
29 | * Practice finding patterns in data
30 | * Get tips about how to use different types of plots to explore data
31 |
32 | The tutorial is excerpted from _R for Data Science_ by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do).
33 |
34 |
35 | ##
36 |
37 | ```{r}
38 | #| echo: false
39 | #| results: asis
40 | create_buttons("01-eda.html")
41 | ```
42 |
--------------------------------------------------------------------------------
/visualize-data/02-bar-charts/01-bar-charts.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Bar charts"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 |
12 | engine: knitr
13 | filters:
14 | - webr
15 | webr:
16 | packages:
17 | - ggplot2
18 | - dplyr
19 | cell-options:
20 | editor-font-scale: 0.85
21 | fig-width: 6
22 | fig-height: 3.7
23 | out-width: "70%"
24 | ---
25 |
26 | ```{r include=FALSE}
27 | knitr::opts_chunk$set(
28 | fig.width = 6,
29 | fig.height = 6 * 0.618,
30 | fig.retina = 3,
31 | dev = "ragg_png",
32 | fig.align = "center",
33 | out.width = "70%"
34 | )
35 |
36 | library(tidyverse)
37 | library(checkdown)
38 |
39 | source(here::here("R", "helpers.R"))
40 | ```
41 |
42 | ### How to make a bar chart {.no-hide}
43 |
44 | To make a bar chart with {ggplot2}, add `geom_bar()` to the [ggplot2 template](/basics/01-visualization-basics/01-code-template.qmd). For example, the code below plots a bar chart of the `cut` variable in the `diamonds` dataset, which comes with {ggplot2}.
45 |
46 | ```{r out.width="80%"}
47 | ggplot(data = diamonds) +
48 | geom_bar(mapping = aes(x = cut))
49 | ```
50 |
51 | ### The y axis
52 |
53 | You should not supply a $y$ aesthetic when you use `geom_bar()`; {ggplot2} will count how many times each $x$ value appears in the data, and then display the counts on the $y$ axis. So, for example, the plot above shows that over 20,000 diamonds in the data set had a `cut` value of `Ideal`.
54 |
55 | You can compute this information manually with the `count()` function from the {dplyr} package.
56 |
57 | ```{r}
58 | diamonds |>
59 | count(cut)
60 | ```
61 |
62 | ### `geom_col()`
63 |
64 | Sometimes, you may want to map the heights of the bars not to counts, but to a variable in the data set. To do this, use `geom_col()`, which is short for column.
65 |
66 | ```{r out.width="80%"}
67 | ggplot(data = pressure) +
68 | geom_col(mapping = aes(x = temperature, y = pressure))
69 | ```
70 |
71 | ### `geom_col()` data
72 |
73 | When you use `geom_col()`, your $x$ and $y$ values should have a one-to-one relationship, as they do in the `pressure` data set (i.e. each value of `temperature` is paired with a single value of `pressure`).
74 |
75 | ```{r}
76 | pressure
77 | ```
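
If you are unsure whether a data set meets this requirement, a quick check along the following lines can help. This is only a sketch using base R functions on the built-in `pressure` data; the object name `one_row_per_x` is invented for illustration.

```r
# `pressure` ships with base R, so no packages are needed

# TRUE when no temperature value is repeated, i.e. one row per bar
one_row_per_x <- anyDuplicated(pressure$temperature) == 0
one_row_per_x

# The same check as a table of counts: every temperature should appear once
all(table(pressure$temperature) == 1)
```

Both checks should return `TRUE` for `pressure`. If a check like this returns `FALSE` for your own data, you would typically summarize the data to one row per $x$ value before using `geom_col()`.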
78 |
79 | ### Exercise 1: Make a bar chart
80 |
81 | Use the code chunk below to plot the distribution of the `color` variable in the `diamonds` data set, which comes in the {ggplot2} package.
82 |
83 | ::: {.panel-tabset}
84 | ## {{< fa code >}} Interactive editor
85 |
86 | ```{webr-r}
87 |
88 |
89 |
90 | ```
91 |
92 | ## {{< fa circle-check >}} Solution
93 |
94 | ```r
95 | ggplot(data = diamonds) +
96 | geom_bar(mapping = aes(x = color))
97 | ```
98 |
99 | :::
100 |
101 |
102 | ### Exercise 2: Interpretation
103 |
104 | ```{r out.width="80%", echo=FALSE}
105 | ggplot(data = diamonds) +
106 | geom_bar(mapping = aes(x = cut)) +
107 | labs(title = "Distribution of diamond cuts")
108 | ```
109 |
110 | ::: {.callout-note appearance="simple" icon=false .question}
111 |
112 | **What is the most common type of cut in the `diamonds` dataset?**
113 |
114 | ```{r echo=FALSE}
115 | check_question(
116 | answer = "Ideal",
117 | options = c(
118 | "Fair",
119 | "Good",
120 | "Very Good",
121 | "Premium",
122 | "Ideal"
123 | ),
124 | type = "radio",
125 | button_label = "Submit answer",
126 | q_id = 1,
127 | right = c("Correct!")
128 | )
129 | ```
130 | :::
131 |
132 | ::: {.callout-note appearance="simple" icon=false .question}
133 |
134 | **How many diamonds in the dataset had a `Good` cut?**
135 |
136 | ```{r echo=FALSE}
137 | check_question(
138 | answer = "≈5000",
139 | options = c(
140 | "≈2000",
141 | "≈5000",
142 | "≈7000",
143 | "≈20000"
144 | ),
145 | type = "radio",
146 | button_label = "Submit answer",
147 | q_id = 2,
148 | right = c("Correct!")
149 | )
150 | ```
151 | :::
152 |
153 |
154 | ### Exercise 3: What went wrong?
155 |
156 | Diagnose the error below and then fix the code chunk to make a plot.
157 |
158 | ::: {.panel-tabset}
159 | ## {{< fa code >}} Interactive editor
160 |
161 | ```{webr-r}
162 | ggplot(data = pressure) +
163 | geom_bar(mapping = aes(x = temperature, y = pressure))
164 |
165 |
166 | ```
167 |
168 | ## {{< fa circle-check >}} Solution
169 |
170 | ```r
171 | ggplot(data = pressure) +
172 | geom_col(mapping = aes(x = temperature, y = pressure))
173 | ```
174 |
175 | :::
176 |
177 |
178 | ### Exercise 4: `count()` and `geom_col()`
179 |
180 | Recreate the bar graph of `color` from exercise one, but this time first use `count()` to manually compute the heights of the bars. Then use `geom_col()` to plot the results as a bar graph. Does your graph look the same as in exercise one?
181 |
182 | ::: {.panel-tabset}
183 | ## {{< fa code >}} Interactive editor
184 |
185 | ```{webr-r}
186 |
187 |
188 |
189 | ```
190 |
191 | ## {{< fa circle-check >}} Solution
192 |
193 | ```r
194 | diamonds |>
195 | count(color) |>
196 | ggplot() +
197 | geom_col(mapping = aes(x = color, y = n))
198 | ```
199 |
200 | :::
201 |
202 |
203 | ##
204 |
205 | ```{r}
206 | #| echo: false
207 | #| results: asis
208 | create_buttons("02-aesthetics.html")
209 | ```
210 |
--------------------------------------------------------------------------------
/visualize-data/02-bar-charts/02-aesthetics.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Aesthetics"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 |
12 | engine: knitr
13 | filters:
14 | - webr
15 | webr:
16 | packages:
17 | - ggplot2
18 | - dplyr
19 | cell-options:
20 | editor-font-scale: 0.85
21 | fig-width: 6
22 | fig-height: 3.7
23 | out-width: "70%"
24 | ---
25 |
26 | ```{r include=FALSE}
27 | knitr::opts_chunk$set(
28 | fig.width = 6,
29 | fig.height = 6 * 0.618,
30 | fig.retina = 3,
31 | dev = "ragg_png",
32 | fig.align = "center",
33 | out.width = "70%"
34 | )
35 |
36 | library(tidyverse)
37 | library(checkdown)
38 |
39 | source(here::here("R", "helpers.R"))
40 | ```
41 |
42 |
43 | ### Aesthetics for bars {.no-hide}
44 |
45 | `geom_bar()` and `geom_col()` can use several aesthetics:
46 |
47 | * `alpha`
48 | * `color`
49 | * `fill`
50 | * `linetype`
51 | * `linewidth`
52 |
53 | One of these, `color`, creates the most surprising results. Predict what the code below will return and then run it.
54 |
55 | ::: {.panel-tabset}
56 | ## {{< fa code >}} Interactive editor
57 |
58 | ```{webr-r}
59 | ggplot(data = diamonds) +
60 | geom_bar(mapping = aes(x = cut, color = cut))
61 |
62 |
63 | ```
64 |
65 | :::
66 |
67 | ### `fill`
68 |
69 | The `color` aesthetic controls the outline of each bar in your bar plot, which may not be what you want. To color the interior of each bar, use the `fill` aesthetic:
70 |
71 | ```{r echo=FALSE, out.width="100%"}
72 | #| layout-ncol: 2
73 | ggplot(data = diamonds) +
74 | geom_bar(mapping = aes(x = cut, color = cut), linewidth = 1) +
75 | labs(title = "color = cut")
76 |
77 | ggplot(data = diamonds) +
78 | geom_bar(mapping = aes(x = cut, fill = cut)) +
79 | labs(title = "fill = cut")
80 | ```
81 |
82 | Use the code chunk below to experiment with `fill`, along with other `geom_bar()` aesthetics, like `alpha`, `linetype`, and `linewidth`.
83 |
84 | ::: {.panel-tabset}
85 | ## {{< fa code >}} Interactive editor
86 |
87 | ```{webr-r}
88 | ggplot(data = diamonds) +
89 | geom_bar(mapping = aes(x = cut, color = cut))
90 |
91 |
92 | ```
93 |
94 | :::
95 |
96 |
97 | ### Width
98 |
99 | You can control the width of each bar in your bar chart with the `width` parameter. In the chunk below, set `width = 1`, then `width = 0.5`. Can you spot the difference?
100 |
101 | ::: {.panel-tabset}
102 | ## {{< fa code >}} Interactive editor
103 |
104 | ```{webr-r}
105 | ggplot(data = diamonds) +
106 | geom_bar(mapping = aes(x = cut, fill = cut), width = 0.9)
107 |
108 |
109 | ```
110 |
111 | :::
112 |
113 | Notice that width is a _parameter_, not an aesthetic mapping. Hence, you should set width _outside_ of the `aes()` function.
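
As a rough check of what the parameter does, the sketch below sets `width = 0.5` outside of `aes()` and then inspects the bar edges that {ggplot2} computes. It relies on `layer_data()`, which returns the data used to draw a layer; the object names are only illustrative.

```r
library(ggplot2)

# width is set as a parameter, outside of aes()
narrow <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut), width = 0.5)

# xmin and xmax hold the left and right edge of each bar
bars <- layer_data(narrow)
bar_widths <- as.numeric(bars$xmax - bars$xmin)
bar_widths
```

Every bar gets the same width because the value is applied to the whole layer rather than mapped to a variable.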
114 |
115 | ### Exercise 5: Aesthetics
116 |
117 | Create a colored bar chart of the `class` variable from the `mpg` data set, which comes with ggplot2. Map the interior color of each bar to `class`.
118 |
119 | ::: {.panel-tabset}
120 | ## {{< fa code >}} Interactive editor
121 |
122 | ```{webr-r}
123 |
124 |
125 |
126 | ```
127 |
128 | ## {{< fa circle-check >}} Solution
129 |
130 | ```r
131 | ggplot(data = mpg) +
132 | geom_bar(mapping = aes(x = class, fill = class))
133 | ```
134 |
135 | :::
136 |
137 |
138 | ##
139 |
140 | ```{r}
141 | #| echo: false
142 | #| results: asis
143 | create_buttons("03-position-adjustments.html")
144 | ```
145 |
--------------------------------------------------------------------------------
/visualize-data/02-bar-charts/04-facets.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Facets"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 |
12 | engine: knitr
13 | filters:
14 | - webr
15 | webr:
16 | packages:
17 | - ggplot2
18 | cell-options:
19 | editor-font-scale: 0.85
20 | fig-width: 6
21 | fig-height: 3.7
22 | out-width: "70%"
23 | ---
24 |
25 | ```{r include=FALSE}
26 | knitr::opts_chunk$set(
27 | fig.width = 6,
28 | fig.height = 6 * 0.618,
29 | fig.retina = 3,
30 | dev = "ragg_png",
31 | fig.align = "center",
32 | out.width = "70%"
33 | )
34 |
35 | library(tidyverse)
36 | library(checkdown)
37 |
38 | source(here::here("R", "helpers.R"))
39 | ```
40 |
41 | ### Facetting {.no-hide}
42 |
43 | You can more easily compare subgroups of data if you place each subgroup in its own subplot, a process known as **facetting.**
44 |
45 | ```{r echo=FALSE}
46 | ggplot(data = diamonds) +
47 | geom_bar(mapping = aes(x = color, fill = cut)) +
48 | facet_wrap(vars(cut))
49 | ```
50 |
51 | ### `facet_grid()`
52 |
53 | {ggplot2} provides two functions for facetting. `facet_grid()` divides the plot into a grid of subplots based on the values of one or two facetting variables. To use it, add `facet_grid()` to the end of your plot call.
54 |
55 | The code chunks below show three ways to facet with `facet_grid()`. Spot the differences between the chunks, then run the code to learn what the differences do.
56 |
57 | ::: {.panel-tabset}
58 | ## {{< fa code >}} Interactive editor
59 |
60 | ```{webr-r}
61 | ggplot(data = diamonds) +
62 | geom_bar(mapping = aes(x = color)) +
63 | facet_grid(rows = vars(clarity), cols = vars(cut))
64 |
65 |
66 | ```
67 |
68 | :::
69 |
70 | ::: {.panel-tabset}
71 | ## {{< fa code >}} Interactive editor
72 |
73 | ```{webr-r}
74 | ggplot(data = diamonds) +
75 | geom_bar(mapping = aes(x = color)) +
76 | facet_grid(cols = vars(cut))
77 |
78 |
79 | ```
80 |
81 | :::
82 |
83 | ::: {.panel-tabset}
84 | ## {{< fa code >}} Interactive editor
85 |
86 | ```{webr-r}
87 | ggplot(data = diamonds) +
88 | geom_bar(mapping = aes(x = color)) +
89 | facet_grid(rows = vars(clarity))
90 |
91 |
92 | ```
93 |
94 | :::
95 |
96 |
97 |
98 | ### `facet_grid()` recap
99 |
100 | As you saw in the code examples, you use `facet_grid()` by passing a `rows` and/or a `cols` argument, with the names of the variables inside a `vars()` function.
101 |
102 | * `facet_grid()` will split the plot into facets vertically by the values of the `rows` variable: each facet will contain the observations that have a common value of the variable.
103 | * `facet_grid()` will split the plot horizontally by values of the `cols` variable. The result is a grid of facets, where each specific subplot shows a specific combination of values.
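
To see the "grid of combinations" idea in numbers, the sketch below counts the facets in the first example. It uses `layer_data()` from {ggplot2}, whose `PANEL` column records the facet each bar belongs to; the object names are only illustrative.

```r
library(ggplot2)

grid_plot <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color)) +
  facet_grid(rows = vars(clarity), cols = vars(cut))

# Each drawn bar records the panel (facet) it belongs to
panels <- layer_data(grid_plot)$PANEL

# Number of facets that contain data
n_panels <- nlevels(droplevels(panels))

# Number of clarity levels times number of cut levels
n_combinations <- nlevels(diamonds$clarity) * nlevels(diamonds$cut)

n_panels == n_combinations
```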
104 |
105 |
106 | ### `facet_wrap()`
107 |
108 | `facet_wrap()` provides a more relaxed way to facet a plot on a _single_ variable. It will split the plot into subplots and then reorganize the subplots into multiple rows so that each plot has a more or less square aspect ratio. In short, `facet_wrap()` _wraps_ the single row of subplots that you would get with `facet_grid()` into multiple rows.
109 |
110 | To use `facet_wrap()` pass it a variable name inside `vars()`, e.g. `facet_wrap(vars(color))`.
111 |
112 | Add `facet_wrap()` to the code below to create the graph that appeared at the start of this section. Facet by `cut`.
113 |
114 | ::: {.panel-tabset}
115 | ## {{< fa code >}} Interactive editor
116 |
117 | ```{webr-r}
118 | ggplot(data = diamonds) +
119 | geom_bar(mapping = aes(x = color, fill = cut))
120 |
121 |
122 | ```
123 |
124 | ## {{< fa circle-check >}} Solution
125 |
126 | ```r
127 | ggplot(data = diamonds) +
128 | geom_bar(mapping = aes(x = color, fill = cut)) +
129 | facet_wrap(vars(cut))
130 | ```
131 |
132 | :::
133 |
134 |
135 | ### `scales`
136 |
137 | By default, each facet in your plot will share the same $x$ and $y$ ranges. You can change this by adding a `scales` argument to `facet_wrap()` or `facet_grid()`.
138 |
139 | * `scales = "free"` will let the $x$ and $y$ range of each facet vary
140 | * `scales = "free_x"` will let the $x$ range of each facet vary, but not the $y$ range
141 | * `scales = "free_y"` will let the $y$ range of each facet vary, but not the $x$ range. This is a convenient way to compare the shapes of different distributions
142 |
143 | Try changing the `scales` argument from `free` to `free_x` to `free_y` to see how it works:
144 |
145 | ::: {.panel-tabset}
146 | ## {{< fa code >}} Interactive editor
147 |
148 | ```{webr-r}
149 | ggplot(data = diamonds) +
150 | geom_bar(mapping = aes(x = color, fill = cut)) +
151 | facet_wrap(vars(cut), scales = "free")
152 |
153 |
154 | ```
155 |
156 | :::
157 |
158 |
159 |
160 | ### Recap
161 |
162 | In this tutorial, you learned how to make bar charts, but much of what you learned applies to other types of charts as well. Here's what you should know:
163 |
164 | * Bar charts are the basis for histograms, which means that you can interpret histograms in a similar way.
165 | * Bars are not the only geom in {ggplot2} that uses the `fill` aesthetic. You can use both the `fill` and `color` aesthetics with any geom that has an "interior" region.
166 | * You can use the same position adjustments with any {ggplot2} geom: `"identity"`, `"stack"`, `"dodge"`, `"fill"`, `"nudge"`, and `"jitter"` (we'll learn about `"nudge"` and `"jitter"` later). Each geom comes with its own sensible default.
167 | * You can facet any {ggplot2} plot by adding `facet_grid()` or `facet_wrap()` to the plot call.
168 |
169 | Bar charts are an excellent way to display the distribution of a categorical variable. In the next tutorial, we'll meet a set of geoms that display the distribution of a continuous variable.
170 |
171 |
172 | ##
173 |
174 | ```{r}
175 | #| echo: false
176 | #| results: asis
177 | create_buttons(NULL)
178 | ```
179 |
--------------------------------------------------------------------------------
/visualize-data/02-bar-charts/img/positions.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/visualize-data/02-bar-charts/img/positions.png
--------------------------------------------------------------------------------
/visualize-data/02-bar-charts/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Bar charts"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | knitr::opts_chunk$set(
14 | fig.width = 6,
15 | fig.height = 6 * 0.618,
16 | fig.retina = 3,
17 | dev = "ragg_png",
18 | fig.align = "center",
19 | out.width = "70%"
20 | )
21 |
22 | source(here::here("R", "helpers.R"))
23 | ```
24 |
25 |
26 | This tutorial will show you how to make and enhance **bar charts** with the {ggplot2} package. You will learn how to:
27 |
28 | * make and interpret bar charts
29 | * customize bar charts with **aesthetics** and **parameters**
30 | * use **position adjustments**
31 | * use **facets** to create subplots
32 |
33 | The tutorial is adapted from [_R for Data Science_](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do).
34 |
35 | The tutorial uses the {ggplot2} and {dplyr} packages, which have been pre-loaded for your convenience.
36 |
37 |
38 | ##
39 |
40 | ```{r}
41 | #| echo: false
42 | #| results: asis
43 | create_buttons("01-bar-charts.html")
44 | ```
45 |
--------------------------------------------------------------------------------
/visualize-data/03-histograms/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Histograms"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | knitr::opts_chunk$set(
14 | fig.width = 6,
15 | fig.height = 6 * 0.618,
16 | fig.retina = 3,
17 | dev = "ragg_png",
18 | fig.align = "center",
19 | out.width = "70%"
20 | )
21 |
22 | source(here::here("R", "helpers.R"))
23 | ```
24 |
25 | **Histograms** are the most popular way to visualize continuous distributions. Here we will look at them and their derivatives. You will learn how to:
26 |
27 | * Make and interpret histograms
28 | * Adjust the **binwidth** of a histogram to reveal new information
29 | * Use geoms that are similar to histograms, such as __dotplots__, __frequency polygons__, and __densities__
30 |
31 | The tutorial is adapted from [_R for Data Science_](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do).
32 |
33 | The tutorial uses the {ggplot2} and {dplyr} packages, which have been pre-loaded for your convenience.
34 |
35 |
36 | ##
37 |
38 | ```{r}
39 | #| echo: false
40 | #| results: asis
41 | create_buttons("01-histograms.html")
42 | ```
43 |
--------------------------------------------------------------------------------
/visualize-data/04-boxplots/02-similar-geoms.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Similar geoms"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | packages:
16 | - ggplot2
17 | cell-options:
18 | editor-font-scale: 0.85
19 | fig-width: 6
20 | fig-height: 3.7
21 | out-width: "70%"
22 | ---
23 |
24 | ```{r include=FALSE}
25 | knitr::opts_chunk$set(
26 | fig.width = 6,
27 | fig.height = 6 * 0.618,
28 | fig.retina = 3,
29 | dev = "ragg_png",
30 | fig.align = "center",
31 | out.width = "70%"
32 | )
33 |
34 | library(tidyverse)
35 |
36 | source(here::here("R", "helpers.R"))
37 | ```
38 |
39 | ### `geom_dotplot()` {.no-hide}
40 |
41 | Boxplots provide a quick way to represent a distribution, but they leave behind a lot of information. {ggplot2} supplements boxplots with two geoms that show more information.
42 |
43 | The first is `geom_dotplot()`. If you set the `binaxis` parameter of `geom_dotplot()` to `"y"`, `geom_dotplot()` behaves like `geom_boxplot()`, displaying a separate distribution for each group of data.
44 |
45 | Here each group functions like a vertical histogram. Add the parameter `stackdir = "center"` then re-run the code. Can you interpret the results?
46 |
47 | ::: {.panel-tabset}
48 | ## {{< fa code >}} Interactive editor
49 |
50 | ```{webr-r}
51 | ggplot(data = mpg) +
52 | geom_dotplot(mapping = aes(x = class, y = hwy), binaxis = "y",
53 | dotsize = 0.5, binwidth = 1)
54 |
55 |
56 | ```
57 |
58 |
59 | ## {{< fa circle-check >}} Solution
60 |
61 | ```r
62 | ggplot(data = mpg) +
63 | geom_dotplot(mapping = aes(x = class, y = hwy), binaxis = "y",
64 | dotsize = 0.5, binwidth = 1, stackdir = "center")
65 | ```
66 |
67 | :::
68 |
69 | ###
70 |
71 | Good job! When you set `stackdir = "center"`, `geom_dotplot()` arranges each row of dots symmetrically around the $x$ value. This layout will help you understand the next geom.
72 |
73 | As in the histogram tutorial, it takes a lot of tweaking to make a dotplot look right. As a result, I tend to only use them when I want to make a point.
74 |
75 |
76 | ### `geom_violin()`
77 |
78 | `geom_violin()` provides a second alternative to `geom_boxplot()`. A violin plot uses densities to draw a smoothed version of the centered dotplot you just made.
79 |
80 | You can think of a violin plot as an outline drawn around the edges of a centered dotplot. Each "violin" spans the range of the data. The violin is thick where there are many values, and thin where there are few.
81 |
82 | Convert the plot below from a boxplot to a violin plot. Note that violin plots do not use the parameters you saw for dotplots.
83 |
84 | ::: {.panel-tabset}
85 | ## {{< fa code >}} Interactive editor
86 |
87 | ```{webr-r}
88 | ggplot(data = mpg) +
89 | geom_boxplot(mapping = aes(x = class, y = hwy))
90 |
91 |
92 | ```
93 |
94 | ## {{< fa circle-check >}} Solution
95 |
96 | ```r
97 | ggplot(data = mpg) +
98 | geom_violin(mapping = aes(x = class, y = hwy))
99 | ```
100 |
101 | :::
102 |
103 | ###
104 |
105 | Good job! Another way to interpret a violin plot is to mentally "push" the width of each violin all to one side (so the other side is a straight line). The result would be a density (e.g. `geom_density()`) turned on its side for each distribution.
106 |
107 | ### Exercise 7: Violin plots
108 |
109 | You can further enhance violin plots by adding the parameter `draw_quantiles = c(0.25, 0.5, 0.75)`. This will cause ggplot2 to draw horizontal lines across the violins at the 25th, 50th, and 75th percentiles. These are the same three horizontal lines that are displayed in a boxplot (the 25th and 75th percentiles are the bounds of the box, the 50th percentile is the median).
110 |
111 | Add these lines to the violin plot below.
112 |
113 | ::: {.panel-tabset}
114 | ## {{< fa code >}} Interactive editor
115 |
116 | ```{webr-r}
117 | ggplot(data = mpg) +
118 | geom_violin(mapping = aes(x = class, y = hwy))
119 |
120 |
121 | ```
122 |
123 | ## {{< fa circle-check >}} Solution
124 |
125 | ```r
126 | ggplot(data = mpg) +
127 | geom_violin(mapping = aes(x = class, y = hwy), draw_quantiles = c(0.25, 0.5, 0.75))
128 | ```
129 |
130 | :::
131 |
132 | ###
133 |
134 | Good job! Can you predict how you would use `draw_quantiles` to draw a horizontal line at a different percentile, like the 60th percentile?
135 |
136 |
137 | ##
138 |
139 | ```{r}
140 | #| echo: false
141 | #| results: asis
142 | create_buttons("03-counts.html")
143 | ```
144 |
--------------------------------------------------------------------------------
/visualize-data/04-boxplots/03-counts.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Counts"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | packages:
16 | - ggplot2
17 | - dplyr
18 | cell-options:
19 | editor-font-scale: 0.85
20 | fig-width: 6
21 | fig-height: 3.7
22 | out-width: "70%"
23 | ---
24 |
25 | ```{r include=FALSE}
26 | knitr::opts_chunk$set(
27 | fig.width = 6,
28 | fig.height = 6 * 0.618,
29 | fig.retina = 3,
30 | dev = "ragg_png",
31 | fig.align = "center",
32 | out.width = "70%"
33 | )
34 |
35 | library(tidyverse)
36 |
37 | source(here::here("R", "helpers.R"))
38 | ```
39 |
40 | ### `geom_count()` {.no-hide}
41 |
42 | Boxplots provide an efficient way to explore the interaction of a continuous variable and a categorical variable. But what if you have two categorical variables?
43 |
44 | You can see how observations are distributed across two categorical variables with `geom_count()`. `geom_count()` draws a point at each combination of values from the two variables. The size of the point is mapped to the number of observations with this combination of values. Rare combinations will have small points, frequent combinations will have large points.
45 |
46 | ```{r out.width="80%", echo=FALSE, message=FALSE}
47 | ggplot(data = diamonds) +
48 | geom_count(mapping = aes(x = color, y = clarity))
49 | ```
50 |
51 | ### Exercise 8: Count plots
52 |
53 | Use `geom_count()` to plot the interaction of the `cut` and `clarity` variables in the `diamonds` data set.
54 |
55 | ::: {.panel-tabset}
56 | ## {{< fa code >}} Interactive editor
57 |
58 | ```{webr-r}
59 |
60 |
61 |
62 | ```
63 |
64 | ## {{< fa circle-check >}} Solution
65 |
66 | ```r
67 | ggplot(data = diamonds) +
68 | geom_count(mapping = aes(x = cut, y = clarity))
69 | ```
70 |
71 | :::
72 |
73 |
74 | ### `count()`
75 |
76 | You can use the `count()` function in the {dplyr} package to compute the count values displayed by `geom_count()`. To use `count()`, pass it a data frame and then the names of zero or more variables in the data frame. `count()` will return a new table that lists how many observations occur with each possible combination of the listed variables.
77 |
78 | So for example, the code below returns the counts that you visualized in Exercise 8.
79 |
80 | ```{r}
81 | diamonds |>
82 | count(cut, clarity)
83 | ```
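
The "zero or more variables" part is easier to see with two smaller calls. This is a minimal sketch; `total` and `by_cut` are illustrative names, and {ggplot2} is loaded only because it supplies the `diamonds` data set.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# Zero variables: a single row holding the total number of observations
total <- diamonds |>
  count()

# One variable: one row for each level of cut
by_cut <- diamonds |>
  count(cut)

total
by_cut
```

In both cases the counts are stored in a column named `n`, which is the column you map to an aesthetic when you plot the results.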
84 |
85 | ### Heat maps
86 |
87 | Heat maps provide a second way to visualize the relationship between two categorical variables. They work like count plots, but use a fill color instead of a point size to display the number of observations in each combination.
88 |
89 | ### How to make a heat map
90 |
91 | {ggplot2} does not provide a geom function for heat maps, but you can construct a heat map by plotting the results of `count()` with `geom_tile()`.
92 |
93 | To do this, set the x and y aesthetics of `geom_tile()` to the variables that you pass to `count()`. Then map the fill aesthetic to the `n` variable computed by `count()`. The plot below displays the same counts as the plot in Exercise 8.
94 |
95 | ```{r out.width="80%"}
96 | diamonds |>
97 | count(cut, clarity) |>
98 | ggplot() +
99 | geom_tile(mapping = aes(x = cut, y = clarity, fill = n))
100 | ```
101 |
102 | ### Exercise 9: Make a heat map
103 |
104 | Practice the method above by re-creating the heat map below.
105 |
106 | ```{r echo=FALSE, out.width="80%"}
107 | diamonds |>
108 | count(color, cut) |>
109 | ggplot(mapping = aes(x = color, y = cut)) +
110 | geom_tile(mapping = aes(fill = n))
111 | ```
112 |
113 | ::: {.panel-tabset}
114 | ## {{< fa code >}} Interactive editor
115 |
116 | ```{webr-r}
117 |
118 |
119 |
120 | ```
121 |
122 | ## {{< fa circle-check >}} Solution
123 |
124 | ```r
125 | diamonds |>
126 | count(color, cut) |>
127 | ggplot(mapping = aes(x = color, y = cut)) +
128 | geom_tile(mapping = aes(fill = n))
129 | ```
130 |
131 | :::
132 |
133 | ###
134 |
135 | Good job!
136 |
137 | ### Recap
138 |
139 | Boxplots, dotplots and violin plots provide an easy way to look for relationships between a continuous variable and a categorical variable. Violin plots convey a lot of information quickly, but boxplots have a head start in popularity---they were easy to use when statisticians had to draw graphs by hand.
140 |
141 | In any of these graphs, look for distributions, ranges, medians, skewness, or anything else that changes in an unusual way from distribution to distribution. Often, you can make patterns even more revealing with the `fct_reorder()` function from the {forcats} package (we'll wait to learn about {forcats} until after you study factors).
142 |
143 | Count plots and heat maps help you see how observations are distributed across the interactions of two categorical variables.
144 |
145 | ##
146 |
147 | ```{r}
148 | #| echo: false
149 | #| results: asis
150 | create_buttons(NULL)
151 | ```
152 |
--------------------------------------------------------------------------------
/visualize-data/04-boxplots/img/box-png.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/visualize-data/04-boxplots/img/box-png.png
--------------------------------------------------------------------------------
/visualize-data/04-boxplots/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Boxplots and counts"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | knitr::opts_chunk$set(
14 | fig.width = 6,
15 | fig.height = 6 * 0.618,
16 | fig.retina = 3,
17 | dev = "ragg_png",
18 | fig.align = "center",
19 | out.width = "70%"
20 | )
21 |
22 | source(here::here("R", "helpers.R"))
23 | ```
24 |
25 | **Boxplots** display the relationship between a continuous variable and a categorical variable. **Count** plots display the relationship between two categorical variables. This tutorial covers both. You will learn how to:
26 |
27 | * Make and interpret boxplots
28 | * Rotate boxplots by flipping the coordinate system of your plot
29 | * Use *violin* plots and *dotplots*, two geoms that are similar to boxplots
30 | * Make and interpret count plots
31 |
32 | The tutorial is adapted from [_R for Data Science_](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do).
33 |
34 | The tutorial uses the {ggplot2} and {dplyr} packages, which have been pre-loaded for your convenience.
35 |
36 |
37 | ##
38 |
39 | ```{r}
40 | #| echo: false
41 | #| results: asis
42 | create_buttons("01-boxplots.html")
43 | ```
44 |
--------------------------------------------------------------------------------
/visualize-data/05-scatterplots/01-scatterplots.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Scatterplots"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | packages:
16 | - ggplot2
17 | - dplyr
18 | cell-options:
19 | editor-font-scale: 0.85
20 | fig-width: 6
21 | fig-height: 3.7
22 | out-width: "70%"
23 | ---
24 |
25 | ```{r include=FALSE}
26 | knitr::opts_chunk$set(
27 | fig.width = 6,
28 | fig.height = 6 * 0.618,
29 | fig.retina = 3,
30 | dev = "ragg_png",
31 | fig.align = "center",
32 | out.width = "70%"
33 | )
34 |
35 | library(tidyverse)
36 | set.seed(1234)
37 |
38 | source(here::here("R", "helpers.R"))
39 | ```
40 |
41 | ### Review 1: `geom_point()` {.no-hide}
42 |
43 | In [Visualization Basics](/basics/01-visualization-basics/), you learned how to make a scatterplot with `geom_point()`.
44 |
45 | The code below summarizes the mpg data set and begins to plot the results. Finish the plot with `geom_point()`. Put `mean_cty` on the $x$ axis and `mean_hwy` on the $y$ axis.
46 |
47 | ::: {.panel-tabset}
48 | ## {{< fa code >}} Interactive editor
49 |
50 | ```{webr-r}
51 | mpg |>
52 | group_by(class) |>
53 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |>
54 | ggplot()
55 |
56 |
57 | ```
58 |
59 | ## {{< fa circle-check >}} Solution
60 |
61 | ```r
62 | mpg |>
63 | group_by(class) |>
64 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |>
65 | ggplot() +
66 | geom_point(mapping = aes(x = mean_cty, y = mean_hwy))
67 | ```
68 |
69 | :::
70 |
71 | ###
72 |
73 | Good job! It can be tricky to remember when to use `|>` and when to use `+`. Use `|>` to add one complete step to a pipe of code. Use `+` to add one more line to a {ggplot2} call.
74 |
75 | ### `geom_text()` and `geom_label()`
76 |
77 | `geom_text()` and `geom_label()` create scatterplots that use words instead of points to display data. Each requires the extra aesthetic `label`, which you should map to a variable that contains text to display for each observation.
78 |
79 | Convert the plot below from `geom_point()` to `geom_text()` and map the `label` aesthetic to the `class` variable. When you are finished, convert the code to `geom_label()` and rerun the plot. Can you spot the difference?
80 |
81 | ::: {.panel-tabset}
82 | ## {{< fa code >}} Interactive editor
83 |
84 | ```{webr-r}
85 | mpg |>
86 | group_by(class) |>
87 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |>
88 | ggplot() +
89 | geom_point(mapping = aes(x = mean_cty, y = mean_hwy))
90 |
91 |
92 | ```
93 |
94 | ## {{< fa circle-check >}} Solution
95 |
96 | ```r
97 | mpg |>
98 | group_by(class) |>
99 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |>
100 | ggplot() +
101 | geom_text(mapping = aes(x = mean_cty, y = mean_hwy, label = class))
102 |
103 | mpg |>
104 | group_by(class) |>
105 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |>
106 | ggplot() +
107 | geom_label(mapping = aes(x = mean_cty, y = mean_hwy, label = class))
108 | ```
109 |
110 | :::
111 |
112 | ###
113 |
114 | Good job! `geom_text()` replaces each point with a piece of text supplied by the label aesthetic. `geom_label()` replaces each point with a textbox. Notice that some pieces of text overlap each other, and others run off the page. We'll soon look at a way to fix this.
115 |
116 | ### `geom_smooth()`
117 |
118 | In [Visualization Basics](/basics/01-visualization-basics/), you met `geom_smooth()`, which provides a summarized version of a scatterplot.
119 |
120 | `geom_smooth()` uses a model to fit a smoothed line to the data and then visualizes the results. By default, `geom_smooth()` fits a loess smooth to data sets with fewer than 1,000 observations, and a generalized additive model to data sets with 1,000 or more observations.
121 |
122 | ```{r echo=FALSE, out.width="100%", message=FALSE, warning=FALSE}
123 | #| layout-ncol: 2
124 | mpg |>
125 | group_by(class) |>
126 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |>
127 | ggplot() +
128 | geom_point(mapping = aes(x = mean_cty, y = mean_hwy)) +
129 | labs(title = "geom_point()") +
130 | ylim(16, 30)
131 |
132 | mpg |>
133 | group_by(class) |>
134 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |>
135 | ggplot() +
136 | geom_smooth(mapping = aes(x = mean_cty, y = mean_hwy), se = FALSE) +
137 | labs(title = "geom_smooth()") +
138 | ylim(16, 30)
139 | ```
140 |
141 | ### `method`
142 |
143 | You can use the `method` parameter of `geom_smooth()` to fit and display other types of model lines. To do this, pass `method` the name of an R modeling function for `geom_smooth()` to use, such as `"lm"` (for linear models) or `"glm"` (for generalized linear models).
144 |
145 | In the code below, use `geom_smooth()` to draw the linear model line that fits the data.
146 |
147 | ::: {.panel-tabset}
148 | ## {{< fa code >}} Interactive editor
149 |
150 | ```{webr-r}
151 | mpg |>
152 | group_by(class) |>
153 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |>
154 | ggplot()
155 |
156 |
157 | ```
158 |
159 | ## {{< fa circle-check >}} Solution
160 |
161 | ```r
162 | mpg |>
163 | group_by(class) |>
164 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |>
165 | ggplot() +
166 | geom_smooth(mapping = aes(x = mean_cty, y = mean_hwy), method = "lm")
167 | ```
168 |
169 | :::
170 |
171 | ###
172 |
173 | Good job! Now let's look at a way to make `geom_smooth()` much more useful.
174 |
175 | ##
176 |
177 | ```{r}
178 | #| echo: false
179 | #| results: asis
180 | create_buttons("02-layers.html")
181 | ```
182 |
--------------------------------------------------------------------------------
/visualize-data/05-scatterplots/03-coordinate-systems.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Coordinate systems"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | packages:
16 | - ggplot2
17 | cell-options:
18 | editor-font-scale: 0.85
19 | fig-width: 6
20 | fig-height: 3.7
21 | out-width: "70%"
22 | ---
23 |
24 | ```{r include=FALSE}
25 | knitr::opts_chunk$set(
26 | fig.width = 6,
27 | fig.height = 6 * 0.618,
28 | fig.retina = 3,
29 | dev = "ragg_png",
30 | fig.align = "center",
31 | out.width = "70%"
32 | )
33 |
34 | library(tidyverse)
35 | set.seed(1234)
36 |
37 | source(here::here("R", "helpers.R"))
38 | ```
39 |
40 | ### `coord_flip()` {.no-hide}
41 |
42 | One way to customize a scatterplot is to plot it in a new coordinate system. {ggplot2} provides several helper functions that change the coordinate system of a plot. You've already seen one of these in action in the [boxplots tutorial](/visualize-data/04-boxplots/): `coord_flip()` flips the $x$ and $y$ axes of a plot.
43 |
44 | ```{r out.width="80%", message=FALSE, warning=FALSE}
45 | ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
46 | geom_boxplot(outlier.alpha = 0) +
47 | geom_jitter(width = 0) +
48 | coord_flip()
49 | ```
50 |
51 | ### The coord functions
52 |
53 | Altogether, {ggplot2} comes with several `coord` functions:
54 |
55 | * `coord_cartesian()`: (the default) Cartesian coordinates
56 | * `coord_fixed()`: Cartesian coordinates that maintain a fixed aspect ratio as the plot window is resized
57 | * `coord_flip()`: Cartesian coordinates with x and y axes flipped
58 | * `coord_sf()`: cartographic projections for plotting maps
59 | * `coord_polar()` and `coord_radial()`: polar and radial coordinates for round plots like pie charts
60 | * `coord_trans()`: transformed Cartesian coordinates
61 |
62 | By default, {ggplot2} will draw a plot in Cartesian coordinates unless you add one of the functions above to the plot code.
63 |
64 | ### `coord_polar()`
65 |
66 | You use each coord function like you use `coord_flip()`, by adding it to a {ggplot2} call.
67 |
68 | So for example, you could add `coord_polar()` to a plot to make a graph that uses polar coordinates.
69 |
70 | ```{r out.width="80%", message=FALSE, warning=FALSE}
71 | ggplot(data = diamonds) +
72 | geom_bar(mapping = aes(x = cut, fill = cut), width = 1)
73 |
74 | last_plot() +
75 | coord_polar()
76 | ```
77 |
78 | ### Coordinate systems and scatterplots
79 |
80 | How can a coordinate system improve a scatterplot?
81 |
82 | Consider the scatterplot below. It shows a strong relationship between the carat size of a diamond and its price.
83 |
84 | ```{r echo=FALSE, out.width="80%", message=FALSE, warning=FALSE}
85 | ggplot(data = diamonds) +
86 | geom_point(mapping = aes(x = carat, y = price))
87 | ```
88 |
89 | However, the relationship does not appear linear. It appears to have the form $y = x^{n}$, a common relationship found in nature. You can estimate $n$ by replotting the data in a _log-log plot_.
90 |
91 | ### log-log plots
92 |
93 | Log-log plots graph the log of $x$ vs. the log of $y$, which has a valuable visual effect. If you take the log of both sides of a relationship like
94 |
95 | $$
96 | y = x^{n}
97 | $$
98 |
99 | you get a linear relationship with slope $n$:
100 |
101 | $$
102 | \begin{aligned}
103 | \log(y) &= \log(x^{n}) \\
104 | \log(y) &= n \times \log(x)
105 | \end{aligned}
106 | $$
107 |
108 | In other words, log-log plots unbend power relationships into straight lines. Moreover, they display $n$ as the slope of the straight line, which is reasonably easy to estimate.
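
A quick numerical check of the algebra above, as a minimal sketch: it simulates a power relationship with a known exponent and recovers that exponent as the slope of a straight-line fit on the log-log scale. The simulated data, the noise level, and the names `n_true`, `sim`, and `n_hat` are illustrative assumptions, not part of the tutorial's data.

```r
# Simulate y = x^n with multiplicative noise (illustrative values only)
set.seed(1234)
n_true <- 2.5
sim <- data.frame(x = runif(200, min = 1, max = 10))
sim$y <- sim$x^n_true * exp(rnorm(200, sd = 0.1))

# On the log-log scale the relationship is linear, so the fitted slope estimates n
fit <- lm(log(y) ~ log(x), data = sim)
n_hat <- coef(fit)[["log(x)"]]
n_hat
```

The fitted slope should land close to `n_true`, which is the same quantity a log-log plot lets you estimate by eye.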
109 |
110 | Try this by using the diamonds dataset to plot `log(carat)` on the x-axis and `log(price)` on the y-axis:
111 |
112 | ::: {.panel-tabset}
113 | ## {{< fa code >}} Interactive editor
114 |
115 | ```{webr-r}
116 |
117 |
118 |
119 | ```
120 |
121 | ## {{< fa circle-check >}} Solution
122 |
123 | ```r
124 | ggplot(data = diamonds) +
125 | geom_point(mapping = aes(x = log(carat), y = log(price)))
126 | ```
127 |
128 | :::
129 |
130 | ###
131 |
132 | Good job! Now let's look at how you can do the same transformation, and others as well, with a coord function.
133 |
134 | ### `coord_trans()`
135 |
136 | `coord_trans()` provides a second way to do the same transformation, or similar transformations.
137 |
138 | To use `coord_trans()`, give it an $x$ and/or a $y$ argument. Set each to the name of a transformation, such as `"log"`, surrounded by quotation marks. `coord_trans()` will apply that transformation to the specified axis when it plots the raw data.
139 |
140 | ::: {.panel-tabset}
141 | ## {{< fa code >}} Interactive editor
142 |
143 | ```{webr-r}
144 | ggplot(data = diamonds) +
145 | geom_point(mapping = aes(x = carat, y = price)) +
146 | coord_trans(x = "log", y = "log")
147 |
148 |
149 | ```
150 |
151 | :::
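
Other transformations follow the same pattern. As a hedged sketch, the code below applies a square-root transformation to the $y$ axis only. It assumes `"sqrt"` is among the transformation names that `coord_trans()` accepts, and the `mpg` plot and the name `p_sqrt` are just illustrations.

```r
library(ggplot2)

# Transform only the y axis; the x axis stays linear
p_sqrt <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  coord_trans(y = "sqrt")
p_sqrt
```

Because the transformation lives in the coordinate system, the axis labels still show the raw values rather than their square roots.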
152 |
153 |
154 | ### Recap
155 |
156 | Scatterplots are one of the most useful types of plots for data science. You will have many chances to use `geom_point()`, `geom_smooth()`, and `geom_label_repel()` in your day-to-day work.
157 |
158 | However, this tutorial introduced two important concepts that apply to more than just scatterplots:
159 |
160 | * You can add **multiple layers** to any plot that you make with {ggplot2}
161 | * You can add a different **coordinate system** to any plot that you make with {ggplot2}
162 |
163 |
164 | ##
165 |
166 | ```{r}
167 | #| echo: false
168 | #| results: asis
169 | create_buttons(NULL)
170 | ```
171 |
--------------------------------------------------------------------------------
/visualize-data/05-scatterplots/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Scatterplots"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | knitr::opts_chunk$set(
14 | fig.width = 6,
15 | fig.height = 6 * 0.618,
16 | fig.retina = 3,
17 | dev = "ragg_png",
18 | fig.align = "center",
19 | out.width = "70%"
20 | )
21 |
22 | source(here::here("R", "helpers.R"))
23 | ```
24 |
25 | A **scatterplot** displays the relationship between two continuous variables. Scatterplots are one of the most common types of graphs---in fact, you've met scatterplots already in [Visualization Basics](/basics/01-visualization-basics/).
26 |
27 | In this tutorial, you'll learn how to:
28 |
29 | * Make new types of scatterplots with `geom_text()` and `geom_jitter()`
30 | * Add multiple **layers** of geoms to a plot
31 | * Enhance scatterplots with `geom_smooth()`, `geom_rug()`, and `geom_label_repel()`
32 | * Change the **coordinate system** of a plot
33 |
34 | The tutorial is adapted from [_R for Data Science_](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do).
35 |
36 | The tutorial uses the {ggplot2}, {ggrepel}, and {dplyr} packages, which have been pre-loaded for your convenience.
37 |
38 |
39 | ##
40 |
41 | ```{r}
42 | #| echo: false
43 | #| results: asis
44 | create_buttons("01-scatterplots.html")
45 | ```
46 |
--------------------------------------------------------------------------------
/visualize-data/06-line-graphs/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Line plots"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | knitr::opts_chunk$set(
14 | fig.width = 6,
15 | fig.height = 6 * 0.618,
16 | fig.retina = 3,
17 | dev = "ragg_png",
18 | fig.align = "center",
19 | out.width = "70%"
20 | )
21 |
22 | source(here::here("R", "helpers.R"))
23 | ```
24 |
25 | A **line graph** displays a functional relationship between two continuous variables. A **map** displays spatial data. The two may seem different, but they are made in similar ways. This tutorial will examine them both.
26 |
27 | In this tutorial, you'll learn how to:
28 |
29 | * Make new types of line plots with `geom_step()`, `geom_area()`, `geom_path()`, and `geom_polygon()`
30 | * Avoid "whipsawing" with the group aesthetic
31 | * Find and plot map data with `geom_sf()`
32 | * Transform a coordinate system into a map projection with `coord_sf()`
33 |
34 | The tutorial is adapted from [_R for Data Science_](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do).
35 |
36 | The tutorial uses the {ggplot2}, {sf}, and {dplyr} packages, which have been pre-loaded for your convenience.
37 |
38 |
39 | ##
40 |
41 | ```{r}
42 | #| echo: false
43 | #| results: asis
44 | create_buttons("01-line-graphs.html")
45 | ```
46 |
--------------------------------------------------------------------------------
/visualize-data/07-overplotting/01-overplotting.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Overplotting"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | knitr::opts_chunk$set(
14 | fig.width = 6,
15 | fig.height = 6 * 0.618,
16 | fig.retina = 3,
17 | dev = "ragg_png",
18 | fig.align = "center",
19 | out.width = "70%"
20 | )
21 |
22 | library(tidyverse)
23 |
24 | source(here::here("R", "helpers.R"))
25 | ```
26 |
27 | ### What is overplotting? {.no-hide}
28 |
29 | You've seen this plot several times in previous tutorials, but have you noticed that it only displays 126 points? This is unusual because the plot visualizes a data set that contains 234 points.
30 |
31 | ```{r echo=FALSE, out.width="80%"}
32 | ggplot(data = mpg) +
33 | geom_point(mapping = aes(x = displ, y = hwy))
34 | ```
35 |
36 | The missing points are hidden behind other points, a phenomenon known as _overplotting_. Overplotting is a problem because it provides an incomplete picture of the dataset. You cannot determine where the *mass* of the points falls, which makes it difficult to spot relationships in the data.
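
One way to check these counts yourself, as a sketch: compare the number of rows in `mpg` with the number of distinct `displ`/`hwy` combinations. The object names `n_rows` and `n_locations` are illustrative.

```r
library(ggplot2)  # provides the mpg dataset

# Total number of observations
n_rows <- nrow(mpg)

# Number of distinct (displ, hwy) locations that can appear on the plot
n_locations <- nrow(unique(mpg[, c("displ", "hwy")]))

c(rows = n_rows, locations = n_locations)
```

Any gap between the two numbers is made up of points drawn exactly on top of other points.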
37 |
38 | ### Causes of overplotting
39 |
40 | Overplotting usually occurs for one of two reasons:
41 |
42 | 1. The data points have been rounded to a "grid" of common values, as in the plot above
43 | 2. The dataset is so large that it cannot be plotted without points overlapping each other
44 |
45 | How you deal with overplotting will depend on the cause.
46 |
47 |
48 | ##
49 |
50 | ```{r}
51 | #| echo: false
52 | #| results: asis
53 | create_buttons("02-rounding.html")
54 | ```
55 |
--------------------------------------------------------------------------------
/visualize-data/07-overplotting/02-rounding.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Rounding"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | packages:
16 | - ggplot2
17 | cell-options:
18 | editor-font-scale: 0.85
19 | fig-width: 6
20 | fig-height: 3.7
21 | out-width: "70%"
22 | ---
23 |
24 | ```{r include=FALSE}
25 | knitr::opts_chunk$set(
26 | fig.width = 6,
27 | fig.height = 6 * 0.618,
28 | fig.retina = 3,
29 | dev = "ragg_png",
30 | fig.align = "center",
31 | out.width = "70%"
32 | )
33 |
34 | library(tidyverse)
35 |
36 | source(here::here("R", "helpers.R"))
37 | ```
38 |
39 |
40 | ### Overplotting due to rounding {.no-hide}
41 |
42 | If your overplotting is due to rounding, you can obtain a better picture of the data by making each point semi-transparent. For example, you could _set_ the `alpha` aesthetic of the plot below to a _value_ less than one, which will make the points semi-transparent.
43 |
44 | Try this now. Set the points to an alpha of 0.25, which will make each point 25% opaque (i.e. several points stacked on top of each other will look close to solid black).
45 |
46 | ::: {.panel-tabset}
47 | ## {{< fa code >}} Interactive editor
48 |
49 | ```{webr-r}
50 | ggplot(data = mpg) +
51 | geom_point(mapping = aes(x = displ, y = hwy))
52 |
53 |
54 | ```
55 |
56 | ## {{< fa lightbulb >}} Hint
57 |
58 | **Hint:** Make sure you set `alpha = 0.25` *outside* of `aes()`.
59 |
60 | ## {{< fa circle-check >}} Solution
61 |
62 | ```r
63 | ggplot(data = mpg) +
64 | geom_point(mapping = aes(x = displ, y = hwy), alpha = 0.25)
65 | ```
66 |
67 | :::
68 |
69 | ###
70 |
71 | Good job! You can now identify which values contain more observations. The darker locations contain several points stacked on top of each other.
72 |
73 |
74 | ### Adjust the position
75 |
76 | A second strategy for dealing with rounding is to adjust the position of each point. `position = "jitter"` adds a small amount of random noise to the location of each point. Since the noise is random, it is unlikely that two points rounded to the same location will also be jittered to the same location.
77 |
78 | The result is a jittered plot that displays more of the data. Jittering comes with both limitations and benefits. You cannot use a jittered plot to see the _local_ values of the points, but you can use a jittered plot to perceive the _global_ relationship between the variables, something that is hard to do in the presence of overplotting.
79 |
80 | ```{r out.width="80%"}
81 | ggplot(data = mpg) +
82 | geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")
83 | ```
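
If you want to control the amount of noise, or make the jitter reproducible, one option is to pass a `position_jitter()` object instead of the string `"jitter"`. The sketch below relies on the `width`, `height`, and `seed` arguments of `position_jitter()`; the specific values and the name `p_jitter` are arbitrary illustrations.

```r
library(ggplot2)

# Jitter by at most 0.05 units horizontally and 0.5 units vertically;
# a fixed seed makes the random noise the same on every run
p_jitter <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    position = position_jitter(width = 0.05, height = 0.5, seed = 1234)
  )
p_jitter
```

Smaller values keep points closer to their recorded locations, at the cost of more overlap.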
84 |
85 | ### Review: jitter
86 |
87 | In the [Scatterplots tutorial](/visualize-data/05-scatterplots/02-layers.qmd), you learned of a geom that displays the equivalent of `geom_point()` with a `position = "jitter"` adjustment.
88 |
89 | Rewrite the code below to use that geom. Do you obtain similar results?
90 |
91 | ::: {.panel-tabset}
92 | ## {{< fa code >}} Interactive editor
93 |
94 | ```{webr-r}
95 | ggplot(data = mpg) +
96 | geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")
97 |
98 |
99 | ```
100 |
101 | ## {{< fa circle-check >}} Solution
102 |
103 | ```r
104 | ggplot(data = mpg) +
105 | geom_jitter(mapping = aes(x = displ, y = hwy))
106 | ```
107 |
108 | :::
109 |
110 | ###
111 |
112 | Good job! Now let's look at ways to handle overplotting due to large datasets.
113 |
114 |
115 | ##
116 |
117 | ```{r}
118 | #| echo: false
119 | #| results: asis
120 | create_buttons("03-large-data.html")
121 | ```
122 |
--------------------------------------------------------------------------------
/visualize-data/07-overplotting/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Overplotting and big data"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | knitr::opts_chunk$set(
14 | fig.width = 6,
15 | fig.height = 6 * 0.618,
16 | fig.retina = 3,
17 | dev = "ragg_png",
18 | fig.align = "center",
19 | out.width = "70%"
20 | )
21 |
22 | source(here::here("R", "helpers.R"))
23 | ```
24 |
25 |
26 | Data visualization is a useful tool because it makes data accessible to your visual system, which can process large amounts of information quickly. However, two characteristics of data can short-circuit this system. Data cannot be easily visualized if:
27 |
28 | 1. Data points are all rounded to the same values.
29 | 2. The data contains so many points that they occlude each other.
30 |
31 | These features both create _overplotting_, the condition where multiple geoms in the plot are plotted on top of each other, hiding each other. This tutorial will show you several strategies for dealing with overplotting, introducing new geoms along the way.
32 |
33 | The tutorial is adapted from [_R for Data Science_](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do).
34 |
35 | The tutorial uses the {ggplot2} and {hexbin} packages, which have been pre-loaded for your convenience.
36 |
37 |
38 | ##
39 |
40 | ```{r}
41 | #| echo: false
42 | #| results: asis
43 | create_buttons("01-overplotting.html")
44 | ```
45 |
--------------------------------------------------------------------------------
/visualize-data/08-customize/01-zooming.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Zooming"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | packages:
16 | - ggplot2
17 | - dplyr
18 | cell-options:
19 | editor-font-scale: 0.85
20 | fig-width: 6
21 | fig-height: 3.7
22 | out-width: "70%"
23 | ---
24 |
25 | ```{r include=FALSE}
26 | knitr::opts_chunk$set(
27 | fig.width = 6,
28 | fig.height = 6 * 0.618,
29 | fig.retina = 3,
30 | dev = "ragg_png",
31 | fig.align = "center",
32 | out.width = "70%"
33 | )
34 |
35 | library(tidyverse)
36 |
37 | source(here::here("R", "helpers.R"))
38 | ```
39 |
40 | ```{webr-r}
41 | #| context: setup
42 | p <- ggplot(diamonds) +
43 | geom_boxplot(mapping = aes(x = cut, y = price))
44 | ```
45 |
46 | In the previous tutorials, you learned how to visualize data with graphs. Now let's look at how to customize the look and feel of your graphs. To do that we will need to begin with a graph that we can customize.
47 |
48 | ### Review 1: Make a plot
49 |
50 | In the chunk below, make a plot that uses boxplots to display the relationship between the `cut` and `price` variables from the `diamonds` dataset.
51 |
52 | ::: {.panel-tabset}
53 | ## {{< fa code >}} Interactive editor
54 |
55 | ```{webr-r}
56 |
57 |
58 |
59 | ```
60 |
61 | ## {{< fa circle-check >}} Solution
62 |
63 | ```r
64 | ggplot(diamonds) +
65 | geom_boxplot(mapping = aes(x = cut, y = price))
66 | ```
67 |
68 | :::
69 |
70 | ###
71 |
72 | Good job! Let's use this plot as a starting point to make a more pleasing plot that displays a clear message.
73 |
74 | ### Storing plots
75 |
76 | Since we want to use this plot again later, let's go ahead and save it.
77 |
78 | ```{r}
79 | p <- ggplot(diamonds) +
80 | geom_boxplot(mapping = aes(x = cut, y = price))
81 | ```
82 |
83 | Now whenever you call `p`, R will draw your plot. Try it and see.
84 |
85 | ::: {.panel-tabset}
86 | ## {{< fa code >}} Interactive editor
87 |
88 | ```{webr-r}
89 |
90 |
91 |
92 | ```
93 |
94 | ## {{< fa circle-check >}} Solution
95 |
96 | ```r
97 | p
98 | ```
99 |
100 | :::
101 |
102 | ###
103 |
104 | Good job! By the way, have you taken a moment to look at what the plot shows? Let's do that now.
105 |
106 | ### Surprise?
107 |
108 | Our plot shows something surprising: when you group diamonds by `cut`, the worst cut diamonds have the highest median price. It's a little hard to see in the plot, but you can verify it with some data manipulation.
109 |
110 | ```{r}
111 | diamonds |>
112 | group_by(cut) |>
113 | summarise(median = median(price))
114 | ```
115 |
116 | ### Zoom
117 |
118 | ```{r echo=FALSE, out.width="80%"}
119 | p
120 | ```
121 |
122 | The difference between median prices is hard to see in our plot because each group contains distant outliers.
123 |
124 | We can make the difference easier to see by zooming in on the low values of $y$, where the medians are located. There are two ways to zoom with {ggplot2}: with and without clipping.
125 |
126 | ### Clipping
127 |
128 | Clipping refers to how R should treat the data that falls outside of the zoomed region. To see its effect, look at these plots. Each zooms in on the region where price is between \$0 and \$7,500.
129 |
130 | ```{r echo=FALSE, out.width="100%", warning=FALSE, message=FALSE}
131 | #| layout-ncol: 2
132 | p + ylim(0, 7500)
133 | p + coord_cartesian(ylim = c(0, 7500))
134 | ```
135 |
136 | * The plot on the left zooms _by_ clipping. It removes all of the data points that fall outside of the desired region, and then plots the data points that remain.
137 | * The plot on the right zooms _without_ clipping. You can think of it as drawing the entire graph and then zooming into a certain region.
138 |
139 | ### `xlim()` and `ylim()`
140 |
141 | Of these, zooming by clipping is the easiest to do. To zoom your graph on the $x$ axis, add the function `xlim()` to the plot call. To zoom on the $y$ axis, add the function `ylim()`. Each takes a minimum value and a maximum value to zoom to, like this:
142 |
143 | ```{r eval=FALSE}
144 | some_plot +
145 | xlim(0, 100)
146 | ```
147 |
148 | ### Exercise 1: Clipping
149 |
150 | Use `ylim()` to recreate our plot on the left from above. The plot zooms the $y$ axis from 0 to 7,500 by clipping.
151 |
152 | ::: {.panel-tabset}
153 | ## {{< fa code >}} Interactive editor
154 |
155 | ```{webr-r}
156 | p
157 |
158 |
159 | ```
160 |
161 | ## {{< fa circle-check >}} Solution
162 |
163 | ```r
164 | p + ylim(0, 7500)
165 | ```
166 |
167 | :::
168 |
169 | ###
170 |
171 | Good job! Zooming by clipping will sometimes make the graph you want, but in our case it is a very bad idea. Can you tell why?
172 |
173 |
174 | ### A caution
175 |
176 | Zooming by clipping is a bad idea for boxplots. `ylim()` fundamentally changes the information conveyed in the boxplots because it throws out some of the data before drawing the boxplots. Those aren't the medians of the entire data set that we are looking at.
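
To see how much the summary statistics can shift, here is a rough sketch that compares the median price per cut for all diamonds with the median after dropping prices above 7,500. The `filter()` step is only an approximation of what `ylim(0, 7500)` does to the data, and the names `medians_all`, `medians_clipped`, and `comparison` are illustrative.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)

# Medians computed from every diamond
medians_all <- diamonds |>
  group_by(cut) |>
  summarise(median_all = median(price))

# Medians computed after dropping prices above 7,500,
# roughly what the boxplots see once ylim(0, 7500) has removed those rows
medians_clipped <- diamonds |>
  filter(price <= 7500) |>
  group_by(cut) |>
  summarise(median_clipped = median(price))

comparison <- left_join(medians_all, medians_clipped, by = "cut")
comparison
```

Removing only the expensive diamonds can never raise a group's median, so the clipped medians are biased downward wherever they differ.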
177 |
178 | How then can we zoom without clipping?
179 |
180 | ### `xlim` and `ylim`
181 |
182 | To zoom without clipping, set the `xlim` and/or `ylim` arguments of your plot's `coord_` function. Each takes a numeric vector of length two (the minimum and maximum values to zoom to).
183 |
184 | This is easy to do if your plot explicitly calls a `coord_` function:
185 |
186 | ```{r out.width="80%"}
187 | p + coord_flip(ylim = c(0, 7500))
188 | ```
189 |
190 | ### `coord_cartesian()`
191 |
192 | But what if your plot doesn't call a `coord_` function? Then your plot is using Cartesian coordinates (the default). You can adjust the limits of your plot without changing the default coordinate system by adding `coord_cartesian()` to your plot.
193 |
194 | Try it below. Use `coord_cartesian()` to zoom `p` to the region where price falls between 0 and 7500.
195 |
196 | ::: {.panel-tabset}
197 | ## {{< fa code >}} Interactive editor
198 |
199 | ```{webr-r}
200 | p
201 |
202 |
203 | ```
204 |
205 | ## {{< fa circle-check >}} Solution
206 |
207 | ```r
208 | p + coord_cartesian(ylim = c(0, 7500))
209 | ```
210 |
211 | :::
212 |
213 | ###
214 |
215 | Good job! Now it is much easier to see the differences in the median.
216 |
217 |
218 | ### `p`
219 |
220 | Notice that our code so far has used `p` to make a plot, but it hasn't changed the plot that is saved inside of `p`. You can run `p` by itself to get the unzoomed plot.
221 |
222 | ```{r out.width="80%"}
223 | p
224 | ```
225 |
226 | ### Updating `p`
227 |
228 | I like the zooming, so I'm going to deliberately overwrite the plot stored in `p` so that it keeps the zoom.
229 |
230 | ```{r out.width="80%"}
231 | p <- p + coord_cartesian(ylim = c(0, 7500))
232 | p
233 | ```
234 |
235 |
236 | ##
237 |
238 | ```{r}
239 | #| echo: false
240 | #| results: asis
241 | create_buttons("02-labels.html")
242 | ```
243 |
--------------------------------------------------------------------------------
/visualize-data/08-customize/02-labels.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Labels"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | packages:
16 | - ggplot2
17 | cell-options:
18 | editor-font-scale: 0.85
19 | fig-width: 6
20 | fig-height: 3.7
21 | out-width: "70%"
22 | ---
23 |
24 | ```{r include=FALSE}
25 | knitr::opts_chunk$set(
26 | fig.width = 6,
27 | fig.height = 6 * 0.618,
28 | fig.retina = 3,
29 | dev = "ragg_png",
30 | fig.align = "center",
31 | out.width = "70%"
32 | )
33 |
34 | library(tidyverse)
35 |
36 | p <- ggplot(diamonds) +
37 | geom_boxplot(mapping = aes(x = cut, y = price)) +
38 | coord_cartesian(ylim = c(0, 7500))
39 |
40 | source(here::here("R", "helpers.R"))
41 | ```
42 |
43 | ```{webr-r}
44 | #| context: setup
45 | p <- ggplot(diamonds) +
46 | geom_boxplot(mapping = aes(x = cut, y = price)) +
47 | coord_cartesian(ylim = c(0, 7500))
48 | ```
49 |
50 | ### `labs()` {.no-hide}
51 |
52 | The relationship in our plot is now easier to see, but that doesn't mean that everyone who sees our plot will spot it. We can draw their attention to the relationship with a label, like a title or a caption.
53 |
54 | To do this, we will use the `labs()` function. You can think of `labs()` as an all-purpose function for adding labels to a {ggplot2} plot.
55 |
56 | ### Titles
57 |
58 | Give `labs()` a `title` argument to add a title.
59 |
60 | ```{r out.width="80%"}
61 | p + labs(title = "The title appears here")
62 | ```
63 |
64 | ### Subtitles
65 |
66 | Give `labs()` a `subtitle` argument to add a subtitle. If you use multiple arguments, remember to separate them with a comma.
67 |
68 | ```{r out.width="80%"}
69 | p + labs(title = "The title appears here",
70 | subtitle = "The subtitle appears here, slightly smaller")
71 | ```
72 |
73 | ### Captions
74 |
75 | Give `labs()` a `caption` argument to add a caption. I like to use captions to cite my data source.
76 |
77 | ```{r out.width="80%"}
78 | p + labs(title = "The title appears here",
79 | subtitle = "The subtitle appears here, slightly smaller",
80 | caption = "Captions appear at the bottom.")
81 | ```
82 |
83 | ### Axis labels
84 |
85 | Give `labs()` `x` and `y` arguments to change the axis labels.
86 |
87 | ```{r out.width="80%"}
88 | p + labs(title = "The title appears here",
89 | subtitle = "The subtitle appears here, slightly smaller",
90 | caption = "Captions appear at the bottom.",
91 | x = "Diamond cut",
92 | y = "Price")
93 | ```
94 |
95 | ### Legend titles
96 |
97 | If you've mapped a column to an aesthetic like `color`, `fill`, `linetype`, etc., you can change its label with `labs()` too:
98 |
99 | ```{r out.width="80%"}
100 | ggplot(diamonds) +
101 | geom_boxplot(mapping = aes(x = cut, y = price, fill = cut)) +
102 | labs(title = "The title appears here",
103 | subtitle = "The subtitle appears here, slightly smaller",
104 | caption = "Captions appear at the bottom.",
105 | x = "Diamond cut",
106 | y = "Price",
107 | fill = "Diamond cut")
108 | ```
109 |
110 | ### Exercise 2: Labels
111 |
112 | Plot `p` with a set of informative labels. For learning purposes, be sure to use a title, subtitle, caption, and axis labels.
113 |
114 | ::: {.panel-tabset}
115 | ## {{< fa code >}} Interactive editor
116 |
117 | ```{webr-r}
118 | p
119 |
120 |
121 | ```
122 |
123 | ## {{< fa circle-check >}} Solution
124 |
125 | ```r
126 | p + labs(title = "Diamond prices by cut",
127 | subtitle = "Fair cut diamonds fetch the highest median price. Why?",
128 | caption = "Data collected by Hadley Wickham", x = "Diamond cut", y = "Price")
129 | ```
130 |
131 | :::
132 |
133 |
134 | ###
135 |
136 | Good job! By the way, why *do* fair cut diamonds fetch the highest price?
137 |
138 |
139 | ### Exercise 3: Carat size?
140 |
141 | Perhaps a diamond's cut is confounded with its carat size. If fair cut diamonds tend to be larger diamonds, that would explain their higher prices. Let's test this.
142 |
143 | Make a plot that displays the relationship between carat size, price, and cut for all diamonds. How do you interpret the results? Give your plot a title, subtitle, and caption that explain the plot and convey your conclusions.
144 |
145 | If you are looking for a way to start, I recommend using a smooth line with color mapped to cut, perhaps overlaid on the background data.
146 |
147 | ::: {.panel-tabset}
148 | ## {{< fa code >}} Interactive editor
149 |
150 | ```{webr-r}
151 |
152 |
153 |
154 | ```
155 |
156 | ## {{< fa circle-check >}} Solution
157 |
158 | ```r
159 | ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
160 | geom_smooth(mapping = aes(color = cut), se = FALSE) +
161 | labs(title = "Carat size vs. Price",
162 | subtitle = "Fair cut diamonds tend to be large, but they fetch the lowest prices for most carat sizes.",
163 | caption = "Data by Hadley Wickham")
164 | ```
165 |
166 | :::
167 |
168 | ###
169 |
170 | Good job! The plot corroborates our hypothesis.
171 |
172 | ### `p1`
173 |
174 | Unlike `p`, our new plot uses color and has a legend. Let's save it to use later when we learn to customize colors and legends.
175 |
176 | ```{r out.width="80%", message=FALSE}
177 | p1 <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
178 | geom_smooth(mapping = aes(color = cut), se = FALSE) +
179 | labs(title = "Carat size vs. Price",
180 | subtitle = "Fair cut diamonds tend to be large, but they fetch the lowest prices for most carat sizes.",
181 | caption = "Data by Hadley Wickham")
182 | ```
183 |
184 | ### `annotate()`
185 |
186 | `annotate()` provides a final way to label your graph: it adds a single geom to your plot. When you use `annotate()`, you must first choose which type of geom to add. Next, you must manually supply a value for each aesthetic required by the geom.
187 |
188 | So for example, we could use `annotate()` to add text to our plot.
189 |
190 | ```{r message=FALSE}
191 | p1 + annotate("text", x = 4, y = 7500, label = "There are no cheap,\nlarge diamonds")
192 | ```
193 |
194 | Notice that I select `geom_text()` with `"text"`, the suffix of the function name in quotation marks.
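
`annotate()` is not limited to text. As a sketch, the code below shades a region with `"rect"` (the suffix of `geom_rect()`), supplying the `xmin`, `xmax`, `ymin`, and `ymax` aesthetics by hand. The plot is rebuilt here with points so that the example is self-contained; the coordinates, fill color, and the name `p_rect` are arbitrary illustrations.

```r
library(ggplot2)

# Shade the region of large but cheap diamonds, which contains no data
p_rect <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  annotate("rect", xmin = 3, xmax = 5, ymin = 0, ymax = 5000,
           alpha = 0.2, fill = "blue")
p_rect
```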
195 |
196 | ##
197 |
198 | ```{r}
199 | #| echo: false
200 | #| results: asis
201 | create_buttons("03-themes.html")
202 | ```
203 |
--------------------------------------------------------------------------------
/visualize-data/08-customize/03-themes.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Themes"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | packages:
16 | - ggplot2
17 | - ggthemes
18 | cell-options:
19 | editor-font-scale: 0.85
20 | fig-width: 6
21 | fig-height: 3.7
22 | out-width: "70%"
23 | ---
24 |
25 | ```{r include=FALSE}
26 | knitr::opts_chunk$set(
27 | fig.width = 6,
28 | fig.height = 6 * 0.618,
29 | fig.retina = 3,
30 | dev = "ragg_png",
31 | fig.align = "center",
32 | out.width = "70%"
33 | )
34 |
35 | library(tidyverse)
36 | library(ggthemes)
37 |
38 | p1 <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
39 | geom_smooth(mapping = aes(color = cut), se = FALSE) +
40 | labs(title = "Carat size vs. Price",
41 | subtitle = "Fair cut diamonds tend to be large, but they fetch the lowest prices for most carat sizes.",
42 | caption = "Data by Hadley Wickham")
43 |
44 | source(here::here("R", "helpers.R"))
45 | ```
46 |
47 | ```{webr-r}
48 | #| context: setup
49 | p1 <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
50 | geom_smooth(mapping = aes(color = cut), se = FALSE) +
51 | labs(title = "Carat size vs. Price",
52 | subtitle = "Fair cut diamonds tend to be large, but they fetch the lowest prices for most carat sizes.",
53 | caption = "Data by Hadley Wickham")
54 | ```
55 |
56 | One of the most effective ways to control the look of your plot is with a theme.
57 |
58 | ### What is a theme?
59 |
60 | A theme describes how the non-data elements of your plot should look. For example, these two plots show the same data, but they use two very different themes.
61 |
62 | ```{r echo=FALSE, out.width="100%", message=FALSE, warning=FALSE}
63 | #| layout-ncol: 2
64 | p1 + theme_bw()
65 | p1 + theme_economist()
66 | ```
67 |
68 | ### Theme functions
69 |
70 | To change the theme of your plot, add a `theme_` function to your plot call. The {ggplot2} package provides eight theme functions to choose from.
71 |
72 | * `theme_bw()`
73 | * `theme_classic()`
74 | * `theme_dark()`
75 | * `theme_gray()`
76 | * `theme_light()`
77 | * `theme_linedraw()`
78 | * `theme_minimal()`
79 | * `theme_void()`
80 |
81 | Use the box below to plot `p1` with each of the themes. Which theme do you prefer? Which theme does {ggplot2} apply by default?
82 |
83 | ::: {.panel-tabset}
84 | ## {{< fa code >}} Interactive editor
85 |
86 | ```{webr-r}
87 | p1 + theme_bw()
88 |
89 |
90 | ```
91 |
92 | :::
93 |
94 | ###
95 |
96 | Good job! {ggplot2} uses `theme_gray()` by default.
97 |
98 | ### {ggthemes}
99 |
100 | If you would like to give your graph a more complete makeover, the {ggthemes} package provides extra themes that imitate the graph styles of popular software packages and publications. These include:
101 |
102 | * `theme_base()`
103 | * `theme_calc()`
104 | * `theme_economist()`
105 | * `theme_economist_white()`
106 | * `theme_excel()`
107 | * `theme_few()`
108 | * `theme_fivethirtyeight()`
109 | * `theme_foundation()`
110 | * `theme_gdocs()`
111 | * `theme_hc()`
112 | * `theme_igray()`
113 | * `theme_map()`
114 | * `theme_pander()`
115 | * `theme_par()`
116 | * `theme_solarized()`
117 | * `theme_solarized_2()`
118 | * `theme_solid()`
119 | * `theme_stata()`
120 | * `theme_tufte()`
121 | * `theme_wsj()`
122 |
123 | Try plotting `p1` with at least two or three of the themes mentioned above.
124 |
125 | ::: {.panel-tabset}
126 | ## {{< fa code >}} Interactive editor
127 |
128 | ```{webr-r}
129 | p1
130 |
131 |
132 | ```
133 |
134 | ## {{< fa circle-check >}} Solution
135 |
136 | ```r
137 | p1 + theme_wsj()
138 | ```
139 |
140 | :::
141 |
142 | ###
143 |
144 | Good job! Notice that each theme supplies its own font sizes, which means that your captions might run off the page for some themes. In practice, you can fix this by resizing your graph window.
145 |
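Resizing the window is not the only option. As a rough sketch, you can also shrink the text itself: the built-in {ggplot2} themes take a `base_size` argument, which sets the base font size in points. The plot `p_demo` below is a hypothetical stand-in for `p1`, built from the `mpg` dataset that ships with {ggplot2}; its title and caption are invented for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the tutorial's `p1` (the real `p1` is built earlier).
p_demo <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(
    title = "A deliberately long title that could run off a narrow plot",
    caption = "Illustrative caption"
  )

# `base_size` scales all theme text relative to one base font size (in points).
p_small <- p_demo + theme_bw(base_size = 9)
p_small
```

Several {ggthemes} themes appear to accept the same argument, but check each theme's help page (for example `?theme_wsj`) before relying on it.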
146 |
147 | ### Update `p1`
148 |
149 | If you compare the {ggthemes} themes to the styles they imitate, you might notice something: the colors used to plot your data haven't changed. They are still recognizably the default {ggplot2} colors. In the next section, we'll look at how to customize this remaining part of your graph: the data elements.
150 |
151 | Before we go on, I suggest that we update `p1` to use `theme_bw()`. It will make our next set of modifications easier to see.
152 |
153 | ```{r p1, out.width="80%", message=FALSE}
154 | p1 <- p1 + theme_bw()
155 | p1
156 | ```
157 |
158 |
159 | ##
160 |
161 | ```{r}
162 | #| echo: false
163 | #| results: asis
164 | create_buttons("04-scales.html")
165 | ```
166 |
--------------------------------------------------------------------------------
/visualize-data/08-customize/06-quiz.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Quiz"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 |
11 | engine: knitr
12 | filters:
13 | - webr
14 | webr:
15 | packages:
16 | - ggplot2
17 | cell-options:
18 | editor-font-scale: 0.85
19 | fig-width: 6
20 | fig-height: 3.7
21 | out-width: "70%"
22 | ---
23 |
24 | ```{r include=FALSE}
25 | knitr::opts_chunk$set(
26 | fig.width = 6,
27 | fig.height = 6 * 0.618,
28 | fig.retina = 3,
29 | dev = "ragg_png",
30 | fig.align = "center",
31 | out.width = "70%"
32 | )
33 |
34 | library(tidyverse)
35 |
36 | source(here::here("R", "helpers.R"))
37 | ```
38 |
39 | In this tutorial, you learned how to customize the graphs that you make with {ggplot2} in several ways. You learned how to:
40 |
41 | * Zoom in on regions of the graph
42 | * Add titles, subtitles, and annotations
43 | * Add themes
44 | * Add color scales
45 | * Adjust legends
46 |
47 | To cement your skills, combine what you've learned to recreate the plot below.
48 |
49 | ```{r echo=FALSE, message=FALSE}
50 | ggplot(diamonds, aes(x = carat, y = price)) +
51 | geom_point() +
52 | geom_smooth(aes(color = cut), se = FALSE) +
53 | labs(title = "Ideal cut diamonds command the best price for every carat size",
54 | subtitle = "Lines show GAM estimate of mean values for each level of cut",
55 | caption = "Data provided by Hadley Wickham",
56 | x = "Log Carat Size",
57 | y = "Log Price Size",
58 | color = "Cut Rating") +
59 | scale_x_log10() +
60 | scale_y_log10() +
61 | scale_color_brewer(palette = "Greens") +
62 | theme_light()
63 | ```
64 |
65 | ::: {.panel-tabset}
66 | ## {{< fa code >}} Interactive editor
67 |
68 | ```{webr-r}
69 |
70 |
71 |
72 | ```
73 |
74 | ## {{< fa circle-check >}} Solution
75 |
76 | ```r
77 | ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
78 | geom_point() +
79 | geom_smooth(mapping = aes(color = cut), se = FALSE) +
80 | labs(title = "Ideal cut diamonds command the best price for every carat size",
81 | subtitle = "Lines show GAM estimate of mean values for each level of cut",
82 | caption = "Data provided by Hadley Wickham",
83 | x = "Log Carat Size",
84 | y = "Log Price Size",
85 | color = "Cut Rating") +
86 | scale_x_log10() +
87 | scale_y_log10() +
88 | scale_color_brewer(palette = "Greens") +
89 | theme_light()
90 | ```
91 |
92 | :::
93 |
94 | ##
95 |
96 |
97 | ```{r}
98 | #| echo: false
99 | #| results: asis
100 | create_buttons(NULL)
101 | ```
102 |
--------------------------------------------------------------------------------
/visualize-data/08-customize/img/viridis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/visualize-data/08-customize/img/viridis.png
--------------------------------------------------------------------------------
/visualize-data/08-customize/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Customize your plots"
3 | format:
4 | html:
5 | toc: false
6 | section-divs: true
7 | include-after-body:
8 | - text: |
9 |
10 | ---
11 |
12 | ```{r include=FALSE}
13 | knitr::opts_chunk$set(
14 | fig.width = 6,
15 | fig.height = 6 * 0.618,
16 | fig.retina = 3,
17 | dev = "ragg_png",
18 | fig.align = "center",
19 | out.width = "70%"
20 | )
21 |
22 | source(here::here("R", "helpers.R"))
23 | ```
24 |
25 | This tutorial will teach you how to customize the look and feel of your plots. You will learn how to:
26 |
27 | * **Zoom in** on areas of interest
28 | * Add **labels** and **annotations** to your plots
29 | * Change the appearance of your plot with a **theme**
30 | * Use **scales** to select custom color palettes
31 | * Modify the labels, title, and position of **legends**
32 |
33 | The tutorial is adapted from [_R for Data Science_](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do).
34 |
35 | The tutorial uses the {ggplot2}, {dplyr}, {scales}, {ggthemes}, and {viridis} packages, which have been pre-loaded for your convenience.
36 |
37 | ##
38 |
39 | ```{r}
40 | #| echo: false
41 | #| results: asis
42 | create_buttons("01-zooming.html")
43 | ```
44 |
--------------------------------------------------------------------------------