├── .Rprofile ├── .gitignore ├── LICENSE.md ├── R └── helpers.R ├── _extensions ├── coatless │ └── webr │ │ ├── _extension.yml │ │ ├── qwebr-cell-elements.js │ │ ├── qwebr-cell-initialization.js │ │ ├── qwebr-compute-engine.js │ │ ├── qwebr-document-engine-initialization.js │ │ ├── qwebr-document-settings.js │ │ ├── qwebr-document-status.js │ │ ├── qwebr-monaco-editor-element.js │ │ ├── qwebr-monaco-editor-init.html │ │ ├── qwebr-styling.css │ │ ├── qwebr-theme-switch.js │ │ ├── template.qmd │ │ ├── webr-serviceworker.js │ │ ├── webr-worker.js │ │ └── webr.lua └── quarto-ext │ └── fontawesome │ ├── _extension.yml │ ├── assets │ ├── css │ │ ├── all.css │ │ └── latex-fontsize.css │ └── webfonts │ │ ├── FontAwesome6Brands-Regular-400.ttf │ │ ├── FontAwesome6Brands-Regular-400.woff2 │ │ ├── FontAwesome6Free-Regular-400.ttf │ │ ├── FontAwesome6Free-Regular-400.woff2 │ │ ├── FontAwesome6Free-Solid-900.ttf │ │ ├── FontAwesome6Free-Solid-900.woff2 │ │ ├── fa-brands-400.ttf │ │ ├── fa-brands-400.woff2 │ │ ├── fa-regular-400.ttf │ │ ├── fa-regular-400.woff2 │ │ ├── fa-solid-900.ttf │ │ ├── fa-solid-900.woff2 │ │ ├── fa-v4compatibility.ttf │ │ └── fa-v4compatibility.woff2 │ └── fontawesome.lua ├── _quarto.yml ├── about.qmd ├── basics ├── 01-visualization-basics │ ├── 01-code-template.qmd │ ├── 02-aesthetic-mappings.qmd │ ├── 03-geometric-objects.qmd │ ├── 04-ggplot2-package.qmd │ └── index.qmd └── 02-programming-basics │ ├── 01-functions.qmd │ ├── 02-arguments.qmd │ ├── 03-objects.qmd │ ├── 04-vectors.qmd │ ├── 05-types.qmd │ ├── 06-lists.qmd │ ├── 07-packages.qmd │ └── index.qmd ├── deploy.sh ├── html └── custom.scss ├── index.qmd ├── js ├── bootstrapify.js └── progressive-reveal.js ├── r-primers.Rproj ├── renv.lock ├── renv ├── .gitignore ├── activate.R └── settings.json ├── tidy-data └── 01-reshape-data │ ├── 01-tidy-data.qmd │ ├── 02-wide-to-long.qmd │ ├── 03-long-to-wide.qmd │ ├── img │ ├── tidy.png │ └── vectorized.png │ └── index.qmd ├── transform-data ├── 01-tibbles │ 
├── 01-babynames.qmd │ ├── 02-tibbles.qmd │ ├── 03-tidyverse.qmd │ ├── img │ │ └── tibble_display.png │ └── index.qmd ├── 02-isolating │ ├── 01-your-name.qmd │ ├── 02-select.qmd │ ├── 03-filter.qmd │ ├── 04-arrange.qmd │ ├── 05-pipe.qmd │ └── index.qmd └── 03-deriving │ ├── 01-most-popular-names.qmd │ ├── 02-summarize.qmd │ ├── 03-group_by.qmd │ ├── 04-mutate.qmd │ ├── 05-challenges.qmd │ ├── index.qmd │ └── video │ ├── grp-mutate.mp4 │ ├── grp-summarize-00.mp4 │ ├── grp-summarize-01.mp4 │ ├── grp-summarize-02.mp4 │ ├── grp-summarize-03.mp4 │ └── mutate.mp4 └── visualize-data ├── 01-eda ├── 01-eda.qmd ├── 02-variation.qmd ├── 03-covariation.qmd ├── img │ └── plots-table.png └── index.qmd ├── 02-bar-charts ├── 01-bar-charts.qmd ├── 02-aesthetics.qmd ├── 03-position-adjustments.qmd ├── 04-facets.qmd ├── img │ └── positions.png └── index.qmd ├── 03-histograms ├── 01-histograms.qmd ├── 02-similar-geoms.qmd └── index.qmd ├── 04-boxplots ├── 01-boxplots.qmd ├── 02-similar-geoms.qmd ├── 03-counts.qmd ├── img │ └── box-png.png └── index.qmd ├── 05-scatterplots ├── 01-scatterplots.qmd ├── 02-layers.qmd ├── 03-coordinate-systems.qmd └── index.qmd ├── 06-line-graphs ├── 01-line-graphs.qmd ├── 02-similar-geoms.qmd ├── 03-maps.qmd └── index.qmd ├── 07-overplotting ├── 01-overplotting.qmd ├── 02-rounding.qmd ├── 03-large-data.qmd └── index.qmd └── 08-customize ├── 01-zooming.qmd ├── 02-labels.qmd ├── 03-themes.qmd ├── 04-scales.qmd ├── 05-legends.qmd ├── 06-quiz.qmd ├── img └── viridis.png └── index.qmd /.Rprofile: -------------------------------------------------------------------------------- 1 | source("renv/activate.R") 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | 6 | /.quarto/ 7 | _site/ 8 | _freeze/ 9 | **_cache/ 10 | **_files/ 11 | 
-------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | # License 2 | 3 | This is a human-readable summary of (and not a substitute for) the license. 4 | Please see 5 | for the full legal text. 6 | 7 | This work is licensed under the Creative Commons 8 | Attribution-ShareAlike 4.0 License (CC BY-SA 4.0). 9 | 10 | **You are free to:** 11 | 12 | - **Share**---copy and redistribute the material in any medium or format, even commercially. 13 | - **Adapt**---remix, transform, and build upon the material for any purpose, even commercially. 14 | 15 | The licensor cannot revoke these freedoms as long as you follow the license terms. 16 | 17 | **Under the following terms:** 18 | 19 | - **Attribution**---You must give appropriate credit, provide a link to the 20 | license, and indicate if changes were made. You may do so in any reasonable 21 | manner, but not in any way that suggests the licensor endorses you or your 22 | use. 23 | 24 | The primers are derived from the book _R for Data Science_. **For the purposes of this license, appropriate credit requires including the phrase, "R for Data Science from O'Reilly Media, Inc. Copyright © 2017 Garrett Grolemund, Hadley Wickham. Used with permission."** 25 | 26 | - **ShareAlike**---If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. 27 | 28 | - **No additional restrictions**---You may not apply legal terms or 29 | technological measures that legally restrict others from doing 30 | anything the license permits. 31 | 32 | **Notices:** 33 | 34 | You do not have to comply with the license for elements of the material in the 35 | public domain or where your use is permitted by an applicable exception or 36 | limitation. 37 | 38 | No warranties are given. 
The license may not give you all of the permissions 39 | necessary for your intended use. For example, other rights such as publicity, 40 | privacy, or moral rights may limit how you use the material. 41 | -------------------------------------------------------------------------------- /R/helpers.R: -------------------------------------------------------------------------------- 1 | create_buttons <- function(next_topic = "#") { 2 | if (is.null(next_topic)) { 3 | next_button <- "" 4 | } else { 5 | next_button <- glue::glue('Next topic') 6 | } 7 | button_section <- glue::glue(' 8 |
9 | 10 | {next_button} 11 | 12 |
') 13 | 14 | cat(button_section) 15 | } 16 | -------------------------------------------------------------------------------- /_extensions/coatless/webr/_extension.yml: -------------------------------------------------------------------------------- 1 | name: webr 2 | title: Embedded webr code cells 3 | author: James Joseph Balamuta 4 | version: 0.4.2-dev.6 5 | quarto-required: ">=1.4.554" 6 | contributes: 7 | filters: 8 | - webr.lua 9 | -------------------------------------------------------------------------------- /_extensions/coatless/webr/qwebr-cell-initialization.js: -------------------------------------------------------------------------------- 1 | // Handle cell initialization 2 | qwebrCellDetails.map( 3 | (entry) => { 4 | // Handle the creation of the element 5 | qwebrCreateHTMLElement(entry); 6 | // In the event of interactive, initialize the monaco editor 7 | if (entry.options.context == EvalTypes.Interactive) { 8 | qwebrCreateMonacoEditorInstance(entry); 9 | } 10 | } 11 | ); 12 | 13 | // Identify non-interactive cells (in order) 14 | const filteredEntries = qwebrCellDetails.filter(entry => { 15 | const contextOption = entry.options && entry.options.context; 16 | return ['output', 'setup'].includes(contextOption) || (contextOption == "interactive" && entry.options && entry.options.autorun === 'true'); 17 | }); 18 | 19 | // Condition non-interactive cells to only be run after webR finishes its initialization. 20 | qwebrInstance.then( 21 | async () => { 22 | const nHiddenCells = filteredEntries.length; 23 | var currentHiddenCell = 0; 24 | 25 | 26 | // Modify button state 27 | qwebrSetInteractiveButtonState(`🟡 Running hidden code cells ...`, false); 28 | 29 | // Begin processing non-interactive sections 30 | // Due to the iteration policy, we must use a for() loop. 
31 | // Otherwise, we would need to switch to using reduce with an empty 32 | // starting promise 33 | for (const entry of filteredEntries) { 34 | 35 | // Determine cell being examined 36 | currentHiddenCell = currentHiddenCell + 1; 37 | const formattedMessage = `Evaluating hidden cell ${currentHiddenCell} out of ${nHiddenCells}`; 38 | 39 | // Update the document status header 40 | if (qwebrShowStartupMessage) { 41 | qwebrUpdateStatusHeader(formattedMessage); 42 | } 43 | 44 | // Display the update in non-active areas 45 | qwebrUpdateStatusMessage(formattedMessage); 46 | 47 | // Extract details on the active cell 48 | const evalType = entry.options.context; 49 | const cellCode = entry.code; 50 | const qwebrCounter = entry.id; 51 | 52 | if (['output', 'setup'].includes(evalType)) { 53 | // Disable further global status updates 54 | const activeContainer = document.getElementById(`qwebr-non-interactive-loading-container-${qwebrCounter}`); 55 | activeContainer.classList.remove('qwebr-cell-needs-evaluation'); 56 | activeContainer.classList.add('qwebr-cell-evaluated'); 57 | 58 | // Update status on the code cell 59 | const activeStatus = document.getElementById(`qwebr-status-text-${qwebrCounter}`); 60 | activeStatus.innerText = " Evaluating hidden code cell..."; 61 | activeStatus.classList.remove('qwebr-cell-needs-evaluation'); 62 | activeStatus.classList.add('qwebr-cell-evaluated'); 63 | } 64 | 65 | switch (evalType) { 66 | case 'interactive': 67 | // TODO: Make this more standardized. 68 | // At the moment, we're overriding the interactive status update by pretending it's 69 | // output-like. 
70 | const tempOptions = entry.options; 71 | tempOptions["context"] = "output" 72 | // Run the code in a non-interactive state that is geared to displaying output 73 | await qwebrExecuteCode(`${cellCode}`, qwebrCounter, tempOptions); 74 | break; 75 | case 'output': 76 | // Run the code in a non-interactive state that is geared to displaying output 77 | await qwebrExecuteCode(`${cellCode}`, qwebrCounter, entry.options); 78 | break; 79 | case 'setup': 80 | const activeDiv = document.getElementById(`qwebr-noninteractive-setup-area-${qwebrCounter}`); 81 | // Run the code in a non-interactive state with all output thrown away 82 | await mainWebR.evalRVoid(`${cellCode}`); 83 | break; 84 | default: 85 | break; 86 | } 87 | 88 | if (['output', 'setup'].includes(evalType)) { 89 | // Disable further global status updates 90 | const activeContainer = document.getElementById(`qwebr-non-interactive-loading-container-${qwebrCounter}`); 91 | // Disable visibility 92 | activeContainer.style.visibility = 'hidden'; 93 | activeContainer.style.display = 'none'; 94 | } 95 | } 96 | } 97 | ).then( 98 | () => { 99 | // Release document status as ready 100 | 101 | if (qwebrShowStartupMessage) { 102 | qwebrStartupMessage.innerText = "🟢 Ready!" 
103 | } 104 | 105 | qwebrSetInteractiveButtonState( 106 | ` Run Code`, 107 | true 108 | ); 109 | } 110 | ); -------------------------------------------------------------------------------- /_extensions/coatless/webr/qwebr-document-engine-initialization.js: -------------------------------------------------------------------------------- 1 | // Function to install a single package 2 | async function qwebrInstallRPackage(packageName) { 3 | await mainWebR.evalRVoid(`webr::install('${packageName}');`); 4 | } 5 | 6 | // Function to load a single package 7 | async function qwebrLoadRPackage(packageName) { 8 | await mainWebR.evalRVoid(`require('${packageName}', quietly = TRUE)`); 9 | } 10 | 11 | // Generic function to process R packages 12 | async function qwebrProcessRPackagesWithStatus(packages, processType, displayStatusMessageUpdate = true) { 13 | // Switch between contexts 14 | const messagePrefix = processType === 'install' ? 'Installing' : 'Loading'; 15 | 16 | // Modify button state 17 | qwebrSetInteractiveButtonState(`🟡 ${messagePrefix} package ...`, false); 18 | 19 | // Iterate over packages 20 | for (let i = 0; i < packages.length; i++) { 21 | const activePackage = packages[i]; 22 | const formattedMessage = `${messagePrefix} package ${i + 1} out of ${packages.length}: ${activePackage}`; 23 | 24 | // Display the update in header 25 | if (displayStatusMessageUpdate) { 26 | qwebrUpdateStatusHeader(formattedMessage); 27 | } 28 | 29 | // Display the update in non-active areas 30 | qwebrUpdateStatusMessage(formattedMessage); 31 | 32 | // Run package installation 33 | if (processType === 'install') { 34 | await qwebrInstallRPackage(activePackage); 35 | } else { 36 | await qwebrLoadRPackage(activePackage); 37 | } 38 | } 39 | 40 | // Clean slate 41 | if (processType === 'load') { 42 | await mainWebR.flush(); 43 | } 44 | } 45 | 46 | // Start a timer 47 | const initializeWebRTimerStart = performance.now(); 48 | 49 | // Encase with a dynamic import statement 50 | 
globalThis.qwebrInstance = import(qwebrCustomizedWebROptions.baseURL + "webr.mjs").then( 51 | async ({ WebR, ChannelType }) => { 52 | // Populate WebR options with defaults or new values based on `webr` meta 53 | globalThis.mainWebR = new WebR(qwebrCustomizedWebROptions); 54 | 55 | // Initialize WebR 56 | await mainWebR.init(); 57 | 58 | // Setup a shelter 59 | globalThis.mainWebRCodeShelter = await new mainWebR.Shelter(); 60 | 61 | // Setup a pager to allow processing help documentation 62 | await mainWebR.evalRVoid('webr::pager_install()'); 63 | 64 | // Override the existing install.packages() to use webr::install() 65 | await mainWebR.evalRVoid('webr::shim_install()'); 66 | 67 | // Specify the repositories to pull from 68 | // Note: webR does not use the `repos` option, but instead uses `webr_pkg_repos` 69 | // inside of `install()`. However, other R functions still pull from `repos`. 70 | await mainWebR.evalRVoid(` 71 | options( 72 | webr_pkg_repos = c(${qwebrPackageRepoURLS.map(repoURL => `'${repoURL}'`).join(',')}), 73 | repos = c(${qwebrPackageRepoURLS.map(repoURL => `'${repoURL}'`).join(',')}) 74 | ) 75 | `); 76 | 77 | // Check to see if any packages need to be installed 78 | if (qwebrSetupRPackages) { 79 | // Obtain only a unique list of packages 80 | const uniqueRPackageList = Array.from(new Set(qwebrInstallRPackagesList)); 81 | 82 | // Install R packages one at a time (either silently or with a status update) 83 | await qwebrProcessRPackagesWithStatus(uniqueRPackageList, 'install', qwebrShowStartupMessage); 84 | 85 | if (qwebrAutoloadRPackages) { 86 | // Load R packages one at a time (either silently or with a status update) 87 | await qwebrProcessRPackagesWithStatus(uniqueRPackageList, 'load', qwebrShowStartupMessage); 88 | } 89 | } 90 | } 91 | ); 92 | 93 | // Stop timer 94 | const initializeWebRTimerEnd = performance.now(); 95 | -------------------------------------------------------------------------------- 
/_extensions/coatless/webr/qwebr-document-settings.js: -------------------------------------------------------------------------------- 1 | // Document level settings ---- 2 | 3 | // Determine if we need to install R packages 4 | globalThis.qwebrInstallRPackagesList = [{{INSTALLRPACKAGESLIST}}]; 5 | 6 | // Specify possible locations to search for the repository 7 | globalThis.qwebrPackageRepoURLS = [{{RPACKAGEREPOURLS}}]; 8 | 9 | // Check to see if we have an empty array, if we do set to skip the installation. 10 | globalThis.qwebrSetupRPackages = !(qwebrInstallRPackagesList.indexOf("") !== -1); 11 | globalThis.qwebrAutoloadRPackages = {{AUTOLOADRPACKAGES}}; 12 | 13 | // Display a startup message? 14 | globalThis.qwebrShowStartupMessage = {{SHOWSTARTUPMESSAGE}}; 15 | globalThis.qwebrShowHeaderMessage = {{SHOWHEADERMESSAGE}}; 16 | 17 | // Describe the webR settings that should be used 18 | globalThis.qwebrCustomizedWebROptions = { 19 | "baseURL": "{{BASEURL}}", 20 | "serviceWorkerUrl": "{{SERVICEWORKERURL}}", 21 | "homedir": "{{HOMEDIR}}", 22 | "channelType": "{{CHANNELTYPE}}" 23 | }; 24 | 25 | // Store cell data 26 | globalThis.qwebrCellDetails = {{QWEBRCELLDETAILS}}; 27 | -------------------------------------------------------------------------------- /_extensions/coatless/webr/qwebr-document-status.js: -------------------------------------------------------------------------------- 1 | // Declare startupMessageQWebR globally 2 | globalThis.qwebrStartupMessage = document.createElement("p"); 3 | 4 | // Verify if OffScreenCanvas is supported 5 | globalThis.qwebrOffScreenCanvasSupport = function() { 6 | return typeof OffscreenCanvas !== 'undefined' 7 | } 8 | 9 | // Function to set the button text 10 | globalThis.qwebrSetInteractiveButtonState = function(buttonText, enableCodeButton = true) { 11 | document.querySelectorAll(".qwebr-button-run").forEach((btn) => { 12 | btn.innerHTML = buttonText; 13 | btn.disabled = !enableCodeButton; 14 | }); 15 | } 16 | 17 | // 
Function to update the status message in non-interactive cells 18 | globalThis.qwebrUpdateStatusMessage = function(message) { 19 | document.querySelectorAll(".qwebr-status-text.qwebr-cell-needs-evaluation").forEach((elem) => { 20 | elem.innerText = message; 21 | }); 22 | } 23 | 24 | // Function to update the status message 25 | globalThis.qwebrUpdateStatusHeader = function(message) { 26 | qwebrStartupMessage.innerHTML = ` 27 | 28 | ${message}`; 29 | } 30 | 31 | function qwebrPlaceMessageContents(content, html_location = "title-block-header", revealjs_location = "title-slide") { 32 | 33 | // Get references to header elements 34 | const headerHTML = document.getElementById(html_location); 35 | const headerRevealJS = document.getElementById(revealjs_location); 36 | 37 | // Determine where to insert the quartoTitleMeta element 38 | if (headerHTML || headerRevealJS) { 39 | // Append to the existing "title-block-header" element or "title-slide" div 40 | (headerHTML || headerRevealJS).appendChild(content); 41 | } else { 42 | // If neither headerHTML nor headerRevealJS is found, insert after "webr-monaco-editor-init" script 43 | const monacoScript = document.getElementById("qwebr-monaco-editor-init"); 44 | const header = document.createElement("header"); 45 | header.setAttribute("id", "title-block-header"); 46 | header.appendChild(content); 47 | monacoScript.after(header); 48 | } 49 | } 50 | 51 | 52 | function qwebrOffScreenCanvasSupportWarningMessage() { 53 | 54 | // Verify canvas is supported. 
55 | if(qwebrOffScreenCanvasSupport()) return; 56 | 57 | // Create the main container div 58 | var calloutContainer = document.createElement('div'); 59 | calloutContainer.classList.add('callout', 'callout-style-default', 'callout-warning', 'callout-titled'); 60 | 61 | // Create the header div 62 | var headerDiv = document.createElement('div'); 63 | headerDiv.classList.add('callout-header', 'd-flex', 'align-content-center'); 64 | 65 | // Create the icon container div 66 | var iconContainer = document.createElement('div'); 67 | iconContainer.classList.add('callout-icon-container'); 68 | 69 | // Create the icon element 70 | var iconElement = document.createElement('i'); 71 | iconElement.classList.add('callout-icon'); 72 | 73 | // Append the icon element to the icon container 74 | iconContainer.appendChild(iconElement); 75 | 76 | // Create the title container div 77 | var titleContainer = document.createElement('div'); 78 | titleContainer.classList.add('callout-title-container', 'flex-fill'); 79 | titleContainer.innerText = 'Warning: Web Browser Does Not Support Graphing!'; 80 | 81 | // Append the icon container and title container to the header div 82 | headerDiv.appendChild(iconContainer); 83 | headerDiv.appendChild(titleContainer); 84 | 85 | // Create the body container div 86 | var bodyContainer = document.createElement('div'); 87 | bodyContainer.classList.add('callout-body-container', 'callout-body'); 88 | 89 | // Create the paragraph element for the body content 90 | var paragraphElement = document.createElement('p'); 91 | paragraphElement.innerHTML = 'This web browser does not have support for displaying graphs through the quarto-webr extension since it lacks an OffScreenCanvas. 
Please upgrade your web browser to one that supports OffScreenCanvas.'; 92 | 93 | // Append the paragraph element to the body container 94 | bodyContainer.appendChild(paragraphElement); 95 | 96 | // Append the header div and body container to the main container div 97 | calloutContainer.appendChild(headerDiv); 98 | calloutContainer.appendChild(bodyContainer); 99 | 100 | // Append the main container div to the document depending on format 101 | qwebrPlaceMessageContents(calloutContainer, "title-block-header"); 102 | 103 | } 104 | 105 | 106 | // Function that attaches the document status message and diagnostics 107 | function displayStartupMessage(showStartupMessage, showHeaderMessage) { 108 | if (!showStartupMessage) { 109 | return; 110 | } 111 | 112 | // Create the outermost div element for metadata 113 | const quartoTitleMeta = document.createElement("div"); 114 | quartoTitleMeta.classList.add("quarto-title-meta"); 115 | 116 | // Create the first inner div element 117 | const firstInnerDiv = document.createElement("div"); 118 | firstInnerDiv.setAttribute("id", "qwebr-status-message-area"); 119 | 120 | // Create the second inner div element for "WebR Status" heading and contents 121 | const secondInnerDiv = document.createElement("div"); 122 | secondInnerDiv.setAttribute("id", "qwebr-status-message-title"); 123 | secondInnerDiv.classList.add("quarto-title-meta-heading"); 124 | secondInnerDiv.innerText = "WebR Status"; 125 | 126 | // Create another inner div for contents 127 | const secondInnerDivContents = document.createElement("div"); 128 | secondInnerDivContents.setAttribute("id", "qwebr-status-message-body"); 129 | secondInnerDivContents.classList.add("quarto-title-meta-contents"); 130 | 131 | // Describe the WebR state 132 | qwebrStartupMessage.innerText = "🟡 Loading..."; 133 | qwebrStartupMessage.setAttribute("id", "qwebr-status-message-text"); 134 | // Add `aria-live` to auto-announce the startup status to screen readers 135 | 
qwebrStartupMessage.setAttribute("aria-live", "assertive"); 136 | 137 | // Append the startup message to the contents 138 | secondInnerDivContents.appendChild(qwebrStartupMessage); 139 | 140 | // Add a status indicator for COOP and COEP Headers if needed 141 | if (showHeaderMessage) { 142 | const crossOriginMessage = document.createElement("p"); 143 | crossOriginMessage.innerText = `${crossOriginIsolated ? '🟢' : '🟡'} COOP & COEP Headers`; 144 | crossOriginMessage.setAttribute("id", "qwebr-coop-coep-header"); 145 | secondInnerDivContents.appendChild(crossOriginMessage); 146 | } 147 | 148 | // Combine the inner divs and contents 149 | firstInnerDiv.appendChild(secondInnerDiv); 150 | firstInnerDiv.appendChild(secondInnerDivContents); 151 | quartoTitleMeta.appendChild(firstInnerDiv); 152 | 153 | // Place message on webpage 154 | qwebrPlaceMessageContents(quartoTitleMeta); 155 | } 156 | 157 | displayStartupMessage(qwebrShowStartupMessage, qwebrShowHeaderMessage); 158 | qwebrOffScreenCanvasSupportWarningMessage(); -------------------------------------------------------------------------------- /_extensions/coatless/webr/qwebr-monaco-editor-init.html: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /_extensions/coatless/webr/qwebr-styling.css: -------------------------------------------------------------------------------- 1 | .monaco-editor pre { 2 | background-color: unset !important; 3 | } 4 | 5 | .qwebr-editor-toolbar { 6 | width: 100%; 7 | display: flex; 8 | justify-content: space-between; 9 | box-sizing: border-box; 10 | } 11 | 12 | .qwebr-editor-toolbar-left-buttons, .qwebr-editor-toolbar-right-buttons { 13 | display: flex; 14 | } 15 | 16 | .qwebr-non-interactive-loading-container.qwebr-cell-needs-evaluation, .qwebr-non-interactive-loading-container.qwebr-cell-evaluated { 17 | justify-content: center; 18 | display: flex; 19 | background-color: 
rgba(250, 250, 250, 0.65); 20 | border: 1px solid rgba(233, 236, 239, 0.65); 21 | border-radius: 0.5rem; 22 | margin-top: 15px; 23 | margin-bottom: 15px; 24 | } 25 | 26 | .qwebr-r-project-logo { 27 | color: #2767B0; /* R Project's blue color */ 28 | } 29 | 30 | .qwebr-icon-status-spinner { 31 | color: #7894c4; 32 | } 33 | 34 | .qwebr-icon-run-code { 35 | color: #0d9c29 36 | } 37 | 38 | body.quarto-light .qwebr-output-code-stdout { 39 | color: #111; 40 | } 41 | 42 | body.quarto-dark .qwebr-output-code-stdout { 43 | color: #EEE; 44 | } 45 | 46 | .qwebr-output-code-stderr { 47 | color: #db4133; 48 | } 49 | 50 | body.quarto-light .qwebr-editor { 51 | border: 1px solid #EEEEEE; 52 | } 53 | 54 | body.quarto-light .qwebr-editor-toolbar { 55 | background-color: #EEEEEE; 56 | padding: 0.2rem 0.5rem; 57 | } 58 | 59 | body.quarto-dark .qwebr-editor { 60 | border: 1px solid #111; 61 | } 62 | 63 | body.quarto-dark .qwebr-editor-toolbar { 64 | background-color: #111; 65 | padding: 0.2rem 0.5rem; 66 | } 67 | 68 | .qwebr-button { 69 | display: inline-block; 70 | font-weight: 400; 71 | line-height: 1; 72 | text-decoration: none; 73 | text-align: center; 74 | padding: 0.375rem 0.75rem; 75 | font-size: .9rem; 76 | border-radius: 0.25rem; 77 | transition: color .15s ease-in-out,background-color .15s ease-in-out,border-color .15s ease-in-out,box-shadow .15s ease-in-out; 78 | } 79 | 80 | body.quarto-light .qwebr-button { 81 | background-color: #EEEEEE; 82 | color: #000; 83 | border-color: #dee2e6; 84 | border: 1px solid rgba(0,0,0,0); 85 | } 86 | 87 | body.quarto-dark .qwebr-button { 88 | background-color: #111; 89 | color: #EEE; 90 | border-color: #dee2e6; 91 | border: 1px solid rgba(0,0,0,0); 92 | } 93 | 94 | body.quarto-light .qwebr-button:hover { 95 | color: #000; 96 | background-color: #d9dce0; 97 | border-color: #c8ccd0; 98 | } 99 | 100 | body.quarto-dark .qwebr-button:hover { 101 | color: #d9dce0; 102 | background-color: #323232; 103 | border-color: #d9dce0; 104 | } 105 | 106 | 
.qwebr-button:disabled,.qwebr-button.disabled,fieldset:disabled .qwebr-button { 107 | pointer-events: none; 108 | opacity: .65 109 | } 110 | 111 | .qwebr-button-reset { 112 | color: #696969; /*#4682b4;*/ 113 | } 114 | 115 | .qwebr-button-copy { 116 | color: #696969; 117 | } 118 | 119 | 120 | /* Custom styling for RevealJS Presentations*/ 121 | 122 | /* Reset the style of the interactive area */ 123 | .reveal div.qwebr-interactive-area { 124 | display: block; 125 | box-shadow: none; 126 | max-width: 100%; 127 | max-height: 100%; 128 | margin: 0; 129 | padding: 0; 130 | } 131 | 132 | /* Provide space to entries */ 133 | .reveal div.qwebr-output-code-area pre div { 134 | margin: 1px 2px 1px 10px; 135 | } 136 | 137 | /* Collapse the inside code tags to avoid extra space between line outputs */ 138 | .reveal pre div code.qwebr-output-code-stdout, .reveal pre div code.qwebr-output-code-stderr { 139 | padding: 0; 140 | display: contents; 141 | } 142 | 143 | body.reveal.quarto-light pre div code.qwebr-output-code-stdout { 144 | color: #111; 145 | } 146 | 147 | body.reveal.quarto-dark pre div code.qwebr-output-code-stdout { 148 | color: #EEEEEE; 149 | } 150 | 151 | .reveal pre div code.qwebr-output-code-stderr { 152 | color: #db4133; 153 | } 154 | 155 | 156 | /* Create a border around console and output (does not affect graphs) */ 157 | body.reveal.quarto-light div.qwebr-console-area { 158 | border: 1px solid #EEEEEE; 159 | box-shadow: 2px 2px 10px #EEEEEE; 160 | } 161 | 162 | body.reveal.quarto-dark div.qwebr-console-area { 163 | border: 1px solid #111; 164 | box-shadow: 2px 2px 10px #111; 165 | } 166 | 167 | 168 | /* Cap output height and allow text to scroll */ 169 | /* TODO: Is there a better way to fit contents/max it parallel to the monaco editor size? 
*/ 170 | .reveal div.qwebr-output-code-area pre { 171 | max-height: 400px; 172 | overflow: scroll; 173 | } 174 | -------------------------------------------------------------------------------- /_extensions/coatless/webr/qwebr-theme-switch.js: -------------------------------------------------------------------------------- 1 | // Function to update Monaco Editors when body class changes 2 | function updateMonacoEditorsOnBodyClassChange() { 3 | // Select the body element 4 | const body = document.querySelector('body'); 5 | 6 | // Options for the observer (which mutations to observe) 7 | const observerOptions = { 8 | attributes: true, // Observe changes to attributes 9 | attributeFilter: ['class'] // Only observe changes to the 'class' attribute 10 | }; 11 | 12 | // Callback function to execute when mutations are observed 13 | const bodyClassChangeCallback = function(mutationsList, observer) { 14 | for(let mutation of mutationsList) { 15 | if (mutation.type === 'attributes' && mutation.attributeName === 'class') { 16 | // Class attribute has changed 17 | // Update all Monaco Editors on the page 18 | updateMonacoEditorTheme(); 19 | } 20 | } 21 | }; 22 | 23 | // Create an observer instance linked to the callback function 24 | const observer = new MutationObserver(bodyClassChangeCallback); 25 | 26 | // Start observing the target node for configured mutations 27 | observer.observe(body, observerOptions); 28 | } 29 | 30 | // Function to update all instances of Monaco Editors on the page 31 | function updateMonacoEditorTheme() { 32 | // Determine what VS Theme to use 33 | const vsThemeToUse = document.body.classList.contains("quarto-dark") ? 
'vs-dark' : 'vs' ; 34 | 35 | // Iterate through all initialized Monaco Editors 36 | qwebrEditorInstances.forEach( function(editorInstance) { 37 | editorInstance.updateOptions({ theme: vsThemeToUse }); 38 | }); 39 | } 40 | 41 | // Call the function to start observing changes to body class 42 | updateMonacoEditorsOnBodyClassChange(); -------------------------------------------------------------------------------- /_extensions/coatless/webr/template.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "WebR-enabled code cell" 3 | format: html 4 | engine: knitr 5 | #webr: 6 | # show-startup-message: false # Disable display of webR initialization state 7 | # show-header-message: true # Display whether COOP&COEP headers are set for speed. 8 | # packages: ['ggplot2', 'dplyr'] # Pre-install dependencies 9 | # autoload-packages: false # Disable automatic library calls on R packages specified in packages. 10 | # repos: # Specify repositories to check for custom packages 11 | # - https://github-username.github.io/reponame 12 | # - https://username.r-universe.dev 13 | # channel-type: 'post-message' # Specify a specific communication channel type. 14 | # home-dir: "/home/rstudio" # Customize where the working directory is 15 | # base-url: '' # Base URL used for downloading R WebAssembly binaries 16 | # service-worker-url: '' # URL from where to load JavaScript worker scripts when loading webR with the ServiceWorker communication channel. 17 | filters: 18 | - webr 19 | --- 20 | 21 | ## Demo 22 | 23 | This is a webr-enabled code cell in a Quarto HTML document. 
24 | 25 | ```{webr-r} 26 | 1 + 1 27 | ``` 28 | 29 | ```{webr-r} 30 | fit = lm(mpg ~ am, data = mtcars) 31 | summary(fit) 32 | ``` 33 | 34 | ```{webr-r} 35 | plot(pressure) 36 | ``` 37 | -------------------------------------------------------------------------------- /_extensions/coatless/webr/webr-serviceworker.js: -------------------------------------------------------------------------------- 1 | importScripts('https://webr.r-wasm.org/v0.3.3/webr-serviceworker.js'); 2 | -------------------------------------------------------------------------------- /_extensions/coatless/webr/webr-worker.js: -------------------------------------------------------------------------------- 1 | importScripts('https://webr.r-wasm.org/v0.3.3/webr-worker.js'); 2 | -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/_extension.yml: -------------------------------------------------------------------------------- 1 | title: Font Awesome support 2 | author: Carlos Scheidegger 3 | version: 1.1.0 4 | quarto-required: ">=1.2.269" 5 | contributes: 6 | shortcodes: 7 | - fontawesome.lua 8 | -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/assets/css/latex-fontsize.css: -------------------------------------------------------------------------------- 1 | .fa-tiny { 2 | font-size: 0.5em; 3 | } 4 | .fa-scriptsize { 5 | font-size: 0.7em; 6 | } 7 | .fa-footnotesize { 8 | font-size: 0.8em; 9 | } 10 | .fa-small { 11 | font-size: 0.9em; 12 | } 13 | .fa-normalsize { 14 | font-size: 1em; 15 | } 16 | .fa-large { 17 | font-size: 1.2em; 18 | } 19 | .fa-Large { 20 | font-size: 1.5em; 21 | } 22 | .fa-LARGE { 23 | font-size: 1.75em; 24 | } 25 | .fa-huge { 26 | font-size: 2em; 27 | } 28 | .fa-Huge { 29 | font-size: 2.5em; 30 | } 31 | -------------------------------------------------------------------------------- 
/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Brands-Regular-400.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Brands-Regular-400.ttf -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Brands-Regular-400.woff2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Brands-Regular-400.woff2 -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Regular-400.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Regular-400.ttf -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Regular-400.woff2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Regular-400.woff2 -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Solid-900.ttf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Solid-900.ttf -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Solid-900.woff2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/FontAwesome6Free-Solid-900.woff2 -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/assets/webfonts/fa-brands-400.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-brands-400.ttf -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/assets/webfonts/fa-brands-400.woff2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-brands-400.woff2 -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/assets/webfonts/fa-regular-400.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-regular-400.ttf -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/assets/webfonts/fa-regular-400.woff2: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-regular-400.woff2 -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/assets/webfonts/fa-solid-900.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-solid-900.ttf -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/assets/webfonts/fa-solid-900.woff2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-solid-900.woff2 -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/assets/webfonts/fa-v4compatibility.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-v4compatibility.ttf -------------------------------------------------------------------------------- /_extensions/quarto-ext/fontawesome/assets/webfonts/fa-v4compatibility.woff2: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/_extensions/quarto-ext/fontawesome/assets/webfonts/fa-v4compatibility.woff2 -------------------------------------------------------------------------------- 
/_extensions/quarto-ext/fontawesome/fontawesome.lua: -------------------------------------------------------------------------------- 1 | local function ensureLatexDeps() 2 | quarto.doc.use_latex_package("fontawesome5") 3 | end 4 | 5 | local function ensureHtmlDeps() 6 | quarto.doc.add_html_dependency({ 7 | name = 'fontawesome6', 8 | version = '0.1.0', 9 | stylesheets = {'assets/css/all.css', 'assets/css/latex-fontsize.css'} 10 | }) 11 | end 12 | 13 | local function isEmpty(s) 14 | return s == nil or s == '' 15 | end 16 | 17 | local function isValidSize(size) 18 | local validSizes = { 19 | "tiny", 20 | "scriptsize", 21 | "footnotesize", 22 | "small", 23 | "normalsize", 24 | "large", 25 | "Large", 26 | "LARGE", 27 | "huge", 28 | "Huge" 29 | } 30 | for _, v in ipairs(validSizes) do 31 | if v == size then 32 | return size 33 | end 34 | end 35 | return "" 36 | end 37 | 38 | return { 39 | ["fa"] = function(args, kwargs) 40 | 41 | local group = "solid" 42 | local icon = pandoc.utils.stringify(args[1]) 43 | if #args > 1 then 44 | group = icon 45 | icon = pandoc.utils.stringify(args[2]) 46 | end 47 | 48 | local title = pandoc.utils.stringify(kwargs["title"]) 49 | if not isEmpty(title) then 50 | title = " title=\"" .. title .. "\"" 51 | end 52 | 53 | local label = pandoc.utils.stringify(kwargs["label"]) 54 | if isEmpty(label) then 55 | label = " aria-label=\"" .. icon .. "\"" 56 | else 57 | label = " aria-label=\"" .. label .. "\"" 58 | end 59 | 60 | local size = pandoc.utils.stringify(kwargs["size"]) 61 | 62 | -- detect html (excluding epub which won't handle fa) 63 | if quarto.doc.is_format("html:js") then 64 | ensureHtmlDeps() 65 | if not isEmpty(size) then 66 | size = " fa-" .. size 67 | end 68 | return pandoc.RawInline( 69 | 'html', 70 | "" 71 | ) 72 | -- detect pdf / beamer / latex / etc 73 | elseif quarto.doc.is_format("pdf") then 74 | ensureLatexDeps() 75 | if isEmpty(isValidSize(size)) then 76 | return pandoc.RawInline('tex', "\\faIcon{" .. icon .. 
"}") 77 | else 78 | return pandoc.RawInline('tex', "{\\" .. size .. "\\faIcon{" .. icon .. "}}") 79 | end 80 | else 81 | return pandoc.Null() 82 | end 83 | end 84 | } 85 | -------------------------------------------------------------------------------- /_quarto.yml: -------------------------------------------------------------------------------- 1 | project: 2 | type: website 3 | preview: 4 | port: 5555 5 | 6 | execute: 7 | freeze: auto # Re-render only when source changes 8 | 9 | website: 10 | title: "R Primers" 11 | bread-crumbs: false 12 | 13 | repo-url: "https://github.com/andrewheiss/r-primers" 14 | repo-actions: [edit, issue] 15 | 16 | navbar: 17 | pinned: true 18 | left: 19 | - about.qmd 20 | right: 21 | - icon: github 22 | aria-label: github 23 | href: https://github.com/andrewheiss/r-primers 24 | 25 | sidebar: 26 | style: "docked" 27 | collapse-level: 2 28 | contents: 29 | - section: "Basics" 30 | contents: 31 | - auto: "basics/01-visualization-basics" 32 | - auto: "basics/02-programming-basics" 33 | 34 | - section: "Transform data" 35 | contents: 36 | - auto: "transform-data/01-tibbles" 37 | - auto: "transform-data/02-isolating" 38 | - auto: "transform-data/03-deriving" 39 | 40 | - section: "Visualize data" 41 | contents: 42 | - auto: "visualize-data/01-eda" 43 | - auto: "visualize-data/02-bar-charts" 44 | - auto: "visualize-data/03-histograms" 45 | - auto: "visualize-data/04-boxplots" 46 | - auto: "visualize-data/05-scatterplots" 47 | - auto: "visualize-data/06-line-graphs" 48 | - auto: "visualize-data/07-overplotting" 49 | - auto: "visualize-data/08-customize" 50 | 51 | - section: "Tidy data" 52 | contents: 53 | - auto: "tidy-data/01-reshape-data" 54 | 55 | format: 56 | html: 57 | theme: 58 | - zephyr 59 | - html/custom.scss 60 | toc: true 61 | toc-depth: 3 62 | knitr: 63 | opts_chunk: 64 | dev: "ragg_png" 65 | dpi: 300 66 | -------------------------------------------------------------------------------- /about.qmd: 
-------------------------------------------------------------------------------- 1 | --- 2 | title: "About" 3 | sidebar: false 4 | 5 | engine: knitr 6 | filters: 7 | - webr 8 | webr: 9 | cell-options: 10 | editor-font-scale: 0.85 11 | fig-width: 6 12 | fig-height: 3.7 13 | out-width: "70%" 14 | --- 15 | 16 | ## A brief (and probably inaccurate) history of the RStudio/Posit Primers 17 | 18 | In 2018, Garrett Grolemund (co-author of [*R for Data Science*](https://r4ds.had.co.nz/)) created the RStudio Primers—a set of free interactive [{learnr}](https://rstudio.github.io/learnr/) apps to teach R to the public. These were hosted on a [Shiny](https://shiny.posit.co/) server maintained by RStudio and accessible through RStudio.cloud. 19 | 20 | With [RStudio's rebranding to Posit in 2022](https://posit.co/blog/rstudio-is-becoming-posit/), the lessons became the Posit Primers and remained accessible through [Posit.cloud](https://posit.cloud/). 21 | 22 | In December 2023, the Posit Primers were retired in favor of [Posit Recipes](https://posit.cloud/learn/recipes) and [Posit Cheatsheets](https://posit.co/resources/cheatsheets/). These newer resources have been updated to the latest versions of {tidyverse} packages, and they're no longer interactive (which is probably a lot easier for Posit's education team to maintain). 23 | 24 | ## How I've used the Primers in the past 25 | 26 | I've been relying on the RStudio/Posit Primers for teaching [my own R-focused classes](https://www.andrewheiss.com/teaching/) since 2020. In the first few weeks of every semester, I had students complete a bunch of the tutorials to get the hang of {dplyr} and {ggplot2}. 27 | 28 | With the sunsetting of the Primers at the beginning of 2024, though, I had to figure out a new solution. 
29 | 30 | Fortunately, the RStudio/Posit Education team [posted the source for the Primers at GitHub](https://github.com/rstudio-education/primers) under a [Creative Commons license](https://github.com/rstudio-education/primers/blob/master/LICENSE.md), so for Spring 2024, I maintained a Shiny server with the tutorials I needed for my classes. 31 | 32 | ## The magic of webR 33 | 34 | Starting in 2023, [webR](https://docs.r-wasm.org/webr/latest/)—a version of R compiled to run with Javascript in a web browser—underwent rapid development, and a new Quarto extension ([{quarto-webr}](https://quarto-webr.thecoatlessprofessor.com/)) has since been developed to make it almost trivially easy to include Shiny-free interactive R chunks directly in the browser, like this: 35 | 36 | ```{webr-r} 37 | hist(faithful$waiting) 38 | 39 | 40 | ``` 41 | 42 | That's ***so magical***! 43 | 44 | So for my Summer 2024 classes, I decided to take the plunge and convert the Shiny-based {learnr} tutorials that I've been using for so long into a webR-based website. 45 | 46 | ## How it works 47 | 48 | The tutorials aren't nearly as fully featured as {learnr}, but they get the job done. 
49 | 50 | ### {learnr} hints and solutions 51 | 52 | To simulate {learnr}'s hint and solution functionality, I use Quarto's [Tabset Panels](https://quarto.org/docs/interactive/layout.html#tabset-panel): 53 | 54 | ````default 55 | ::: {.panel-tabset} 56 | ## {{{< fa code >}}} Interactive editor 57 | 58 | ```{webr-r} 59 | # Calculate 1 + 2 60 | ``` 61 | 62 | ## {{{< fa lightbulb >}}} Hint 63 | 64 | **Hint:** Think about addition 65 | 66 | ## {{{< fa circle-check >}}} Solution 67 | 68 | ```r 69 | 1 + 2 70 | ``` 71 | 72 | ::: 73 | ```` 74 | 75 | ::: {.panel-tabset} 76 | ## {{< fa code >}} Interactive editor 77 | 78 | ```{webr-r} 79 | # Calculate 1 + 2 80 | 81 | 82 | 83 | ``` 84 | 85 | ## {{< fa lightbulb >}} Hint 86 | 87 | **Hint:** Think about addition 88 | 89 | ## {{< fa circle-check >}} Solution 90 | 91 | ```r 92 | 1 + 2 93 | ``` 94 | 95 | ::: 96 | 97 | ### {learnr} progressive reveal 98 | 99 | One great feature of {learnr} is its [progressive reveal](https://rstudio.github.io/learnr/articles/exercises.html#progressive-reveal), which unhides sections of a tutorial as you work through it. To simulate this with Quarto, I turned to JavaScript. [My `progressive-reveal.js` script](https://github.com/andrewheiss/r-primers/blob/main/js/progressive-reveal.js) looks for all third-level headings on a page (similar to {learnr}) and makes them appear progressively using some buttons at the bottom of the page. It's clunky, but it works. 100 | 101 | ### Quizzes 102 | 103 | {learnr} also supports [interactive questions](https://rstudio.github.io/learnr/articles/questions.html), or inline quiz questions. To simulate this, I use [{checkdown}](https://agricolamz.github.io/checkdown/). It's not as great as {learnr}, but again, it gets the job done.^[I also played with [{webexercises}](https://psyteachr.github.io/webexercises/articles/webexercises.html), which is a little more polished, but doesn't let you give feedback messages for in/correct answers.
I'm tempted to fork {checkdown} or submit a bunch of PRs to make it nicer. Someday.] 104 | 105 | ## Legal stuff 106 | 107 | The original primers were developed by the RStudio/Posit Education Team and made [open source on GitHub](https://github.com/rstudio-education/primers). Following the original license, these tutorials are licensed under the Creative Commons Attribution-ShareAlike 4.0 License (CC BY-SA 4.0). 108 | 109 | The primers are derived from the book [*R for Data Science*](https://r4ds.had.co.nz/) from O'Reilly Media, Inc. Copyright © 2017 Garrett Grolemund, Hadley Wickham. Used with permission. 110 | 111 | [See here for the full license.](https://github.com/andrewheiss/r-primers/blob/main/LICENSE.md) 112 | -------------------------------------------------------------------------------- /basics/01-visualization-basics/03-geometric-objects.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Geometric objects" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | 12 | engine: knitr 13 | filters: 14 | - webr 15 | webr: 16 | packages: 17 | - ggplot2 18 | cell-options: 19 | editor-font-scale: 0.85 20 | fig-width: 6 21 | fig-height: 3.7 22 | out-width: "70%" 23 | --- 24 | 25 | ```{r include=FALSE} 26 | knitr::opts_chunk$set( 27 | fig.width = 6, 28 | fig.height = 6 * 0.618, 29 | fig.retina = 3, 30 | dev = "ragg_png", 31 | fig.align = "center", 32 | out.width = "70%" 33 | ) 34 | 35 | library(tidyverse) 36 | library(checkdown) 37 | 38 | source(here::here("R", "helpers.R")) 39 | ``` 40 | 41 | How are these two plots similar? 
42 | 43 | ```{r echo = FALSE, out.width="100%", message = FALSE} 44 | #| layout-ncol: 2 45 | ggplot(data = mpg) + 46 | geom_point(mapping = aes(x = displ, y = hwy)) 47 | 48 | ggplot(data = mpg) + 49 | geom_smooth(mapping = aes(x = displ, y = hwy)) 50 | ``` 51 | 52 | Both plots contain the same x variable, the same y variable, and both describe the same data. But the plots are not identical. Each plot uses a different visual object to represent the data. In ggplot2 syntax, we say that they use different __geoms__. 53 | 54 | A __geom__ is the geometrical object that a plot uses to represent observations. People often describe plots by the type of geom that the plot uses. For example, bar charts use bar geoms, line charts use line geoms, boxplots use boxplot geoms, and so on. Scatterplots break the trend; they use the point geom. 55 | 56 | As we see above, you can use different geoms to plot the same data. The plot on the left uses the point geom, and the plot on the right uses the smooth geom, a smooth line fitted to the data. 57 | 58 | ### Geom functions 59 | 60 | To change the geom in your plot, change the geom function that you add to `ggplot()`. For example, take this code which makes the plot on the left (above), and change `geom_point()` to `geom_smooth()`. What do you get? 61 | 62 | ::: {.panel-tabset} 63 | ## {{< fa code >}} Interactive editor 64 | 65 | ```{webr-r} 66 | ggplot(data = mpg) + 67 | geom_point(mapping = aes(x = displ, y = hwy)) 68 | ``` 69 | 70 | ## {{< fa circle-check >}} Solution 71 | 72 | ```r 73 | ggplot(data = mpg) + 74 | geom_smooth(mapping = aes(x = displ, y = hwy)) 75 | ``` 76 | 77 | ::: 78 | 79 | ### 80 | 81 | Good job! You get the plot on the right (above). 82 | 83 | 84 | ### More about geoms 85 | 86 | ggplot2 provides over 30 geom functions that you can use to make plots, and extension packages provide even more (see for a sampling). You'll learn how to use these geoms to explore data in the [Visualize Data]() primer. 
87 | 88 | Until then, the best way to get a comprehensive overview of the available geoms is with the [ggplot2 cheatsheet](https://rstudio.github.io/cheatsheets/html/data-visualization.html). To learn more about any single geom, look at its help page, e.g. `?geom_smooth`. 89 | 90 | ### Exercise 1 91 | 92 | What geom would you use to draw a line chart? A boxplot? A histogram? An area chart? 93 | 94 | ### Exercise 2 95 | 96 | ::: {.callout-note appearance="simple" icon=false .question} 97 | 98 | **What does the `se` argument to `geom_smooth()` do?** 99 | 100 | ```{r predict, echo=FALSE} 101 | check_question( 102 | answer = "Adds or removes a standard error ribbon around the smooth line", 103 | options = c( 104 | "Nothing. `se` is not an argument of `geom_smooth()`", 105 | "chooses a method for calculating the smooth line", 106 | "controls whether or not to **s**how **e**rrors", 107 | "Adds or removes a standard error ribbon around the smooth line" 108 | ), 109 | type = "radio", 110 | button_label = "Submit answer", 111 | q_id = 1, 112 | right = c("Correct!") 113 | ) 114 | ``` 115 | ::: 116 | 117 | 118 | ### Putting it all together 119 | 120 | The ideas that you've learned here: geoms, aesthetics, and the implied existence of a data space and a visual space combine to form a system known as the Grammar of Graphics. 121 | 122 | The Grammar of Graphics provides a systematic way to build any graph, and it underlies the ggplot2 package. In fact, the first two letters of ggplot2 stand for "Grammar of Graphics". 123 | 124 | ### The Grammar of Graphics 125 | 126 | The best way to understand the Grammar of Graphics is to see it explained in action: 127 | 128 | ```{=html} 129 |
130 | 131 |
132 | ``` 133 | 134 | ## 135 | 136 | ```{r} 137 | #| echo: false 138 | #| results: asis 139 | create_buttons("04-ggplot2-package.html") 140 | ``` 141 | -------------------------------------------------------------------------------- /basics/01-visualization-basics/04-ggplot2-package.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "The ggplot2 package" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | --- 12 | 13 | ```{r include=FALSE} 14 | knitr::opts_chunk$set( 15 | fig.width = 6, 16 | fig.height = 6 * 0.618, 17 | fig.retina = 3, 18 | dev = "ragg_png", 19 | fig.align = "center", 20 | out.width = "70%" 21 | ) 22 | 23 | library(tidyverse) 24 | 25 | source(here::here("R", "helpers.R")) 26 | ``` 27 | 28 | Throughout this tutorial, I've referred to ggplot2 as a package. What does that mean? 29 | 30 | The R language is subdivided into __packages__, small collections of data sets and functions that all focus on a single task. The functions that we used in this tutorial come from one of those packages, the ggplot2 package, which focuses on visualizing data. 31 | 32 | ### What should you know about packages? 33 | 34 | When you first install R, you get a small collection of core packages known as __base R__. The remaining packages---there are over 10,000 of them---are optional. You don't need to install them unless you want to use them. 35 | 36 | ggplot2 is one of these optional packages, so are the other packages that we will look at in these tutorials. Some of the most popular and most modern parts of R come in the optional packages. 37 | 38 | You don't need to worry about installing packages in these tutorials. Each tutorial comes with all of the packages that you need pre-installed; this is how we make the tutorials easy to use. 39 | 40 | However, one day, you may want to use R outside of these tutorials. 
When that day comes, you'll want to remember which packages to download to acquire the functions you use here. Throughout the tutorials, I will try to make it as clear as possible where each function comes from! 41 | 42 | 43 | ### Where to from here 44 | 45 | Congratulations! You can use the ggplot2 code template to plot any dataset in many different ways. As you begin exploring data, you should incorporate these tools into your workflow. 46 | 47 | There is much more to ggplot2 and Data Visualization than we have covered here. If you would like to learn more about visualizing data with ggplot2, check out RStudio's primer on [Data Visualization](). 48 | 49 | Your new data visualization skills will make it easier to learn other parts of R, because you can now visualize the results of any change that you make to data. You'll put these skills to immediate use in the next tutorial, which will show you how to extract values from datasets, as well as how to compute new variables and summary statistics from your data. See you there. 50 | 51 | ## 52 | 53 | ```{r} 54 | #| echo: false 55 | #| results: asis 56 | create_buttons(NULL) 57 | ``` 58 | -------------------------------------------------------------------------------- /basics/01-visualization-basics/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Data visualization basics" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | --- 12 | 13 | ```{r include=FALSE} 14 | knitr::opts_chunk$set( 15 | fig.width = 6, 16 | fig.height = 6 * 0.618, 17 | fig.retina = 3, 18 | dev = "ragg_png", 19 | fig.align = "center", 20 | out.width = "70%" 21 | ) 22 | 23 | source(here::here("R", "helpers.R")) 24 | ``` 25 | 26 | Visualization is one of the most important tools for data science.
27 | 28 | It is also a great way to start learning R; when you visualize data, you get an immediate payoff that will keep you motivated as you learn. After all, learning a new language can be hard! 29 | 30 | This tutorial will teach you how to visualize data with R's most popular visualization package, `ggplot2`. 31 | 32 | ### 33 | 34 | The tutorial focuses on three basic skills: 35 | 36 | 1. How to create graphs with a reusable **template** 37 | 1. How to add variables to a graph with **aesthetics** 38 | 1. How to make different "types" of graphs with **geoms** 39 | 40 | In this tutorial, we will use the [core tidyverse packages](http://tidyverse.org/), including `ggplot2`. I've already loaded the packages for you, so let's begin! 41 | 42 | *** 43 | 44 | These examples are excerpted from _R for Data Science_ by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do). 45 | 46 | 47 | ## 48 | 49 | ```{r} 50 | #| echo: false 51 | #| results: asis 52 | create_buttons("01-code-template.html") 53 | ``` 54 | -------------------------------------------------------------------------------- /basics/02-programming-basics/01-functions.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Functions" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | cell-options: 16 | editor-font-scale: 0.85 17 | fig-width: 6 18 | fig-height: 3.7 19 | out-width: "70%" 20 | --- 21 | 22 | ```{r include=FALSE} 23 | source(here::here("R", "helpers.R")) 24 | ``` 25 | 26 | ### Functions {.no-hide} 27 | 28 | Watch [this video](https://vimeo.com/220490105): 29 | 30 | ```{=html} 31 |
32 | 33 |
34 | ``` 35 | 36 | ### Run a function 37 | 38 | Can you use the `sqrt()` function in the chunk below to compute the square root of 962? 39 | 40 | ::: {.panel-tabset} 41 | ## {{< fa code >}} Interactive editor 42 | 43 | ```{webr-r} 44 | 45 | 46 | 47 | ``` 48 | 49 | ## {{< fa circle-check >}} Solution 50 | 51 | ```r 52 | sqrt(962) 53 | ``` 54 | 55 | ::: 56 | 57 | ### Code 58 | 59 | Use the code chunk below to examine the code that `sqrt()` runs. 60 | 61 | ::: {.panel-tabset} 62 | ## {{< fa code >}} Interactive editor 63 | 64 | ```{webr-r} 65 | 66 | 67 | 68 | ``` 69 | 70 | ## {{< fa circle-check >}} Solution 71 | 72 | ```r 73 | sqrt 74 | ``` 75 | 76 | ::: 77 | 78 | ### 79 | 80 | Good job! `sqrt` immediately triggers a low level algorithm optimized for performance, so there is not much code to see. 81 | 82 | ### lm 83 | 84 | Compare the code in `sqrt()` to the code in another R function, `lm()`. Examine `lm()`'s code body in the chunk below. 85 | 86 | ::: {.panel-tabset} 87 | ## {{< fa code >}} Interactive editor 88 | 89 | ```{webr-r} 90 | 91 | 92 | 93 | ``` 94 | 95 | ## {{< fa circle-check >}} Solution 96 | 97 | ```r 98 | lm 99 | ``` 100 | 101 | ::: 102 | 103 | 104 | ### Help pages 105 | 106 | Wow! `lm()` runs a lot of code. What does it do? Open the help page for `lm()` in the chunk below and find out. 107 | 108 | ::: {.panel-tabset} 109 | ## {{< fa code >}} Interactive editor 110 | 111 | ```{webr-r} 112 | ?lm 113 | 114 | 115 | ``` 116 | 117 | ## {{< fa circle-check >}} Solution 118 | 119 | ```r 120 | ?lm 121 | ``` 122 | 123 | ::: 124 | 125 | ### 126 | 127 | Good job! `lm()` is R's function for fitting basic linear models. No wonder it runs so much code. 128 | 129 | 130 | ### Code comments 131 | 132 | What do you think the chunk below will return? Run it and see. The result should be nothing. R will not run anything on a line after a `#` symbol. This is useful because it lets you write human readable comments in your code: just place the comments after a `#`. 
Now delete the `#` and re-run the chunk. You should see a result. 133 | 134 | ::: {.panel-tabset} 135 | ## {{< fa code >}} Interactive editor 136 | 137 | ```{webr-r} 138 | # sqrt(962) 139 | 140 | 141 | ``` 142 | 143 | ## {{< fa circle-check >}} Solution 144 | 145 | ```r 146 | sqrt(962) 147 | ``` 148 | 149 | ::: 150 | 151 | ## 152 | 153 | ```{r} 154 | #| echo: false 155 | #| results: asis 156 | create_buttons("02-arguments.html") 157 | ``` 158 | -------------------------------------------------------------------------------- /basics/02-programming-basics/02-arguments.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Arguments" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | 12 | engine: knitr 13 | filters: 14 | - webr 15 | webr: 16 | cell-options: 17 | editor-font-scale: 0.85 18 | fig-width: 6 19 | fig-height: 3.7 20 | out-width: "70%" 21 | --- 22 | 23 | ```{r include=FALSE} 24 | library(tidyverse) 25 | library(checkdown) 26 | 27 | source(here::here("R", "helpers.R")) 28 | ``` 29 | 30 | ### Arguments {.no-hide} 31 | 32 | Watch [this video](https://vimeo.com/220490157): 33 | 34 | ```{=html} 35 |
36 | 37 |
38 | ``` 39 | 40 | ### `args()` 41 | 42 | `rnorm()` is a function that generates random variables from a normal distribution. Find the arguments of `rnorm()` using the `args()` function. 43 | 44 | ::: {.panel-tabset} 45 | ## {{< fa code >}} Interactive editor 46 | 47 | ```{webr-r} 48 | 49 | 50 | 51 | ``` 52 | 53 | ## {{< fa circle-check >}} Solution 54 | 55 | ```r 56 | args(rnorm) 57 | ``` 58 | 59 | ::: 60 | 61 | ### 62 | 63 | Good job! `n` specifies the number of random normal variables to generate. `mean` and `sd` describe the distribution to generate the random values with. 64 | 65 | ### Optional arguments 66 | 67 | ::: {.callout-note appearance="simple" icon=false .question} 68 | 69 | **Which arguments of `rnorm()` are not optional?** 70 | 71 | ```{r predict, echo=FALSE} 72 | check_question( 73 | answer = "n", 74 | options = c( 75 | "n", 76 | "mean", 77 | "sd" 78 | ), 79 | type = "radio", 80 | button_label = "Submit answer", 81 | q_id = 1, 82 | right = c("Correct! `n` is not an optional argument because it does not have a default value.") 83 | ) 84 | ``` 85 | ::: 86 | 87 | ### `rnorm()` 1 88 | 89 | Use `rnorm()` to generate 100 random normal values with a mean of 100 and a standard deviation of 15. 90 | 91 | ::: {.panel-tabset} 92 | ## {{< fa code >}} Interactive editor 93 | 94 | ```{webr-r} 95 | 96 | 97 | 98 | ``` 99 | 100 | ## {{< fa circle-check >}} Solution 101 | 102 | ```r 103 | rnorm(100, mean = 100, sd = 15) 104 | ``` 105 | 106 | ::: 107 | 108 | ### `rnorm()` 2 109 | 110 | Can you spot the error in the code below? Fix the code and then re-run it. 111 | 112 | ::: {.panel-tabset} 113 | ## {{< fa code >}} Interactive editor 114 | 115 | ```{webr-r} 116 | rnorm(100, mu = 100, sd = 15) 117 | 118 | 119 | ``` 120 | 121 | ## {{< fa lightbulb >}} Hint 122 | 123 | **Hint:** In math, $\mu$ (mu, pronounced "mew" or "moo") is a Greek letter that stands for the mean of a distribution.
124 | 125 | ## {{< fa circle-check >}} Solution 126 | 127 | ```r 128 | rnorm(100, mean = 100, sd = 15) 129 | ``` 130 | 131 | ::: 132 | 133 | ## 134 | 135 | ```{r} 136 | #| echo: false 137 | #| results: asis 138 | create_buttons("03-objects.html") 139 | ``` 140 | -------------------------------------------------------------------------------- /basics/02-programming-basics/03-objects.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Objects" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | cell-options: 16 | editor-font-scale: 0.85 17 | fig-width: 6 18 | fig-height: 3.7 19 | out-width: "70%" 20 | --- 21 | 22 | ```{r include=FALSE} 23 | source(here::here("R", "helpers.R")) 24 | ``` 25 | 26 | ### Objects {.no-hide} 27 | 28 | Watch [this video](https://vimeo.com/220493412): 29 | 30 | ```{=html} 31 |
32 | 33 |
34 | ``` 35 | 36 | ### Object names 37 | 38 | You can choose almost any name you like for an object, as long as the name does not begin with a number or a special character like `+`, `-`, `*`, `/`, `^`, `!`, `@`, or `&`. 39 | 40 | For instance, check out this list of some possible object names. Some are okay to use; some are invalid: 41 | 42 | - `today`: This is fine 43 | - `1st`: This is **bad**; it starts with a number 44 | - `+1`: This is **bad**; it starts with a special character 45 | - `vars`: This is fine 46 | - `\^_^`: This is **bad**; it starts with a special character 47 | - `foo`: This is fine 48 | 49 | 50 | ### Using objects 51 | 52 | In the code chunk below, save the results of `rnorm(100, mean = 100, sd = 15)` to an object named `data`. Then, on a new line, call the `hist()` function on `data` to plot a histogram of the random values. 53 | 54 | ::: {.panel-tabset} 55 | ## {{< fa code >}} Interactive editor 56 | 57 | ```{webr-r} 58 | 59 | 60 | 61 | ``` 62 | 63 | ## {{< fa circle-check >}} Solution 64 | 65 | ```r 66 | data <- rnorm(100, mean = 100, sd = 15) 67 | hist(data) 68 | ``` 69 | 70 | ::: 71 | 72 | ### What if? 73 | 74 | What do you think would happen if you assigned `data` to a new object named `copy`, like this? Run the code and then inspect both `data` and `copy`. 75 | 76 | ::: {.panel-tabset} 77 | ## {{< fa code >}} Interactive editor 78 | 79 | ```{webr-r} 80 | data <- rnorm(100, mean = 100, sd = 15) 81 | copy <- data 82 | 83 | 84 | ``` 85 | 86 | ## {{< fa circle-check >}} Solution 87 | 88 | ```r 89 | data <- rnorm(100, mean = 100, sd = 15) 90 | copy <- data 91 | data 92 | copy 93 | ``` 94 | 95 | ::: 96 | 97 | ### 98 | 99 | Good job! R saves a copy of the contents in data to copy. 100 | 101 | ### Datasets 102 | 103 | Objects provide an easy way to store datasets in R. In fact, R comes with many toy datasets pre-loaded. Examine the contents of `mtcars` to see a classic toy dataset. Hint: how could you learn more about the `mtcars` object? 
104 | 105 | ::: {.panel-tabset} 106 | ## {{< fa code >}} Interactive editor 107 | 108 | ```{webr-r} 109 | 110 | 111 | 112 | ``` 113 | 114 | ## {{< fa circle-check >}} Solution 115 | 116 | ```r 117 | mtcars 118 | ``` 119 | 120 | ::: 121 | 122 | ### 123 | 124 | Good job! You can learn more about mtcars by examining its help page with `?mtcars`. 125 | 126 | ### `rm()` 127 | 128 | What if you accidentally overwrite an object? If that object came with R or one of its packages, you can restore the original version of the object by removing your version with `rm()`. Run `rm()` on `mtcars` below to restore the mtcars data set. 129 | 130 | ::: {.panel-tabset} 131 | ## {{< fa code >}} Interactive editor 132 | 133 | ```{webr-r} 134 | mtcars <- 1 135 | mtcars 136 | 137 | 138 | ``` 139 | 140 | ## {{< fa circle-check >}} Solution 141 | 142 | ```r 143 | mtcars <- 1 144 | mtcars 145 | rm(mtcars) 146 | mtcars 147 | ``` 148 | 149 | ::: 150 | 151 | ### 152 | 153 | Good job! Unfortunately, `rm()` cannot help you if you overwrite one of your own objects. 154 | 155 | ## 156 | 157 | ```{r} 158 | #| echo: false 159 | #| results: asis 160 | create_buttons("04-vectors.html") 161 | ``` 162 | -------------------------------------------------------------------------------- /basics/02-programming-basics/04-vectors.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Vectors" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | cell-options: 16 | editor-font-scale: 0.85 17 | fig-width: 6 18 | fig-height: 3.7 19 | out-width: "70%" 20 | --- 21 | 22 | ```{r include=FALSE} 23 | source(here::here("R", "helpers.R")) 24 | ``` 25 | 26 | ### Vectors {.no-hide} 27 | 28 | Watch [this video](https://vimeo.com/220490316): 29 | 30 | ```{=html} 31 |
32 | 33 |
34 | ``` 35 | 36 | ### Create a vector 37 | 38 | In the chunk below, create a vector that contains the integers from one to ten. Use the `c()` function. 39 | 40 | ::: {.panel-tabset} 41 | ## {{< fa code >}} Interactive editor 42 | 43 | ```{webr-r} 44 | 45 | 46 | 47 | ``` 48 | 49 | ## {{< fa circle-check >}} Solution 50 | 51 | ```r 52 | c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) 53 | ``` 54 | 55 | ::: 56 | 57 | 58 | ### `:` 59 | 60 | If your vector contains a sequence of contiguous integers, you can create it with the `:` shortcut. Run `1:10` in the chunk below. What do you get? What do you suppose `1:20` would return? 61 | 62 | ::: {.panel-tabset} 63 | ## {{< fa code >}} Interactive editor 64 | 65 | ```{webr-r} 66 | 67 | 68 | 69 | ``` 70 | 71 | ## {{< fa circle-check >}} Solution 72 | 73 | ```r 74 | 1:10 75 | 1:20 76 | ``` 77 | 78 | ::: 79 | 80 | 81 | ### `[]` 82 | 83 | You can extract any element of a vector by placing a pair of brackets behind the vector. Inside the brackets place the number of the element that you'd like to extract. For example, `vec[3]` would return the third element of the vector named `vec`. 84 | 85 | Use the chunk below to extract the fourth element of `vec`. 86 | 87 | ::: {.panel-tabset} 88 | ## {{< fa code >}} Interactive editor 89 | 90 | ```{webr-r} 91 | vec <- c(1, 2, 4, 8, 16) 92 | 93 | 94 | ``` 95 | 96 | ## {{< fa circle-check >}} Solution 97 | 98 | ```r 99 | vec <- c(1, 2, 4, 8, 16) 100 | vec[4] 101 | ``` 102 | 103 | ::: 104 | 105 | ### More `[]` 106 | 107 | You can also use `[]` to extract multiple elements of a vector. Place the vector `c(1,2,5)` between the brackets below. What does R return? 
108 | 109 | ::: {.panel-tabset} 110 | ## {{< fa code >}} Interactive editor 111 | 112 | ```{webr-r} 113 | vec <- c(1, 2, 4, 8, 16) 114 | vec[] 115 | 116 | 117 | ``` 118 | 119 | ## {{< fa circle-check >}} Solution 120 | 121 | ```r 122 | vec <- c(1, 2, 4, 8, 16) 123 | vec[c(1,2,5)] 124 | ``` 125 | 126 | ::: 127 | 128 | 129 | ### Names 130 | 131 | If the elements of your vector have names, you can extract them by name. To do so place a name or vector of names in the brackets behind a vector. Surround each name with quotation marks, e.g. `vec2[c("alpha", "beta")]`. 132 | 133 | Extract the element named gamma from the vector below. 134 | 135 | ::: {.panel-tabset} 136 | ## {{< fa code >}} Interactive editor 137 | 138 | ```{webr-r} 139 | vec2 <- c(alpha = 1, beta = 2, gamma = 3) 140 | 141 | 142 | ``` 143 | 144 | ## {{< fa circle-check >}} Solution 145 | 146 | ```r 147 | vec2 <- c(alpha = 1, beta = 2, gamma = 3) 148 | vec2["gamma"] 149 | ``` 150 | 151 | ::: 152 | 153 | 154 | ### Vectorised operations 155 | 156 | Predict what the code below will return. Then look at the result. 157 | 158 | ::: {.panel-tabset} 159 | ## {{< fa code >}} Interactive editor 160 | 161 | ```{webr-r} 162 | c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) + c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) 163 | 164 | 165 | ``` 166 | 167 | ::: 168 | 169 | ### 170 | 171 | Good job! Like many R functions, R's math operators are vectorized: they're designed to work with vectors by repeating the operation for each pair of elements. 172 | 173 | ### Vector recycling 174 | 175 | Predict what the code below will return. Then look at the result. 176 | 177 | ::: {.panel-tabset} 178 | ## {{< fa code >}} Interactive editor 179 | 180 | ```{webr-r} 181 | 1 + c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) 182 | 183 | 184 | ``` 185 | 186 | ::: 187 | 188 | ### 189 | 190 | Good job! Whenever you try to work with vectors of varying lengths (recall that `1` is a vector of length one), R will repeat the shorter vector as needed to compute the result. 
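As a rough sketch of the recycling rule described above, R repeats the shorter vector until it matches the length of the longer one. The vectors and the names `scalar_sum` and `pair_sum` are made up for illustration and are not part of the original exercise.

```r
# A length-one vector is recycled across every element of the longer vector
scalar_sum <- 1 + c(1, 2, 3)
scalar_sum
# [1] 2 3 4

# A length-two vector is recycled twice to match a length-four vector,
# as if you had written c(10, 20, 10, 20) + c(1, 2, 3, 4)
pair_sum <- c(10, 20) + c(1, 2, 3, 4)
pair_sum
# [1] 11 22 13 24
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning, so it is usually safest to rely on recycling only for length-one vectors.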
191 | 192 | ## 193 | 194 | ```{r} 195 | #| echo: false 196 | #| results: asis 197 | create_buttons("05-types.html") 198 | ``` 199 | -------------------------------------------------------------------------------- /basics/02-programming-basics/05-types.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Types" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | 12 | engine: knitr 13 | filters: 14 | - webr 15 | webr: 16 | cell-options: 17 | editor-font-scale: 0.85 18 | fig-width: 6 19 | fig-height: 3.7 20 | out-width: "70%" 21 | --- 22 | 23 | ```{r include=FALSE} 24 | library(tidyverse) 25 | library(checkdown) 26 | 27 | source(here::here("R", "helpers.R")) 28 | ``` 29 | 30 | ### Types {.no-hide} 31 | 32 | Watch [this video](https://vimeo.com/220490241): 33 | 34 | ```{=html} 35 |
36 | 37 |
38 | ``` 39 | 40 | ### Atomic types 41 | 42 | ::: {.callout-note appearance="simple" icon=false .question} 43 | 44 | **Which of these is not an atomic data type?** 45 | 46 | ```{r types1, echo=FALSE} 47 | check_question( 48 | answer = "simple", 49 | options = c( 50 | "numeric/double", 51 | "integer", 52 | "character", 53 | "logical", 54 | "complex", 55 | "raw", 56 | "simple" 57 | ), 58 | type = "radio", 59 | button_label = "Submit answer", 60 | q_id = 1, 61 | right = c("Correct!") 62 | ) 63 | ``` 64 | 65 | ::: 66 | 67 | ### What type? 68 | 69 | ::: {.callout-note appearance="simple" icon=false .question} 70 | 71 | **What type of data is `"1L"`?** 72 | 73 | ```{r types2, echo=FALSE} 74 | check_question( 75 | answer = "character", 76 | options = c( 77 | "numeric/double", 78 | "integer", 79 | "character", 80 | "logical" 81 | ), 82 | type = "radio", 83 | button_label = "Submit answer", 84 | q_id = 2, 85 | right = c("Correct! This was tricky because of the quotes. 1L by itself would be an integer, but values become characters when they're in quotes.") 86 | ) 87 | ``` 88 | 89 | ::: 90 | 91 | ### Integers 92 | 93 | Create a vector of integers from one to five. Can you imagine why you might want to use integers instead of numbers/doubles? 94 | 95 | ::: {.panel-tabset} 96 | ## {{< fa code >}} Interactive editor 97 | 98 | ```{webr-r} 99 | 100 | 101 | 102 | ``` 103 | 104 | ## {{< fa circle-check >}} Solution 105 | 106 | ```r 107 | c(1L, 2L, 3L, 4L, 5L) 108 | ``` 109 | 110 | ::: 111 | 112 | 113 | ### Floating point arithmetic 114 | 115 | Computers must use a finite amount of memory to store decimal numbers (which can sometimes require infinite precision). As a result, some decimals can only be saved as very precise approximations. From time to time you'll notice side effects of this imprecision, like below. 116 | 117 | Compute the square root of two, square the answer (e.g. multiply the square root of two by the square root of two), and then subtract two from the result. 
What answer do you expect? What answer do you get? 118 | 119 | ::: {.panel-tabset} 120 | ## {{< fa code >}} Interactive editor 121 | 122 | ```{webr-r} 123 | 124 | 125 | 126 | ``` 127 | 128 | ## {{< fa circle-check >}} Solution 129 | 130 | ```r 131 | sqrt(2) * sqrt(2) - 2 132 | sqrt(2)^2 - 2 133 | ``` 134 | 135 | ::: 136 | 137 | 138 | ### Vectors 139 | 140 | ::: {.callout-note appearance="simple" icon=false .question} 141 | 142 | **How many types of data can you put into a single vector?** 143 | 144 | ```{r types3, echo=FALSE} 145 | check_question( 146 | answer = "1", 147 | options = c( 148 | "1", 149 | "6", 150 | "As many as you like" 151 | ), 152 | type = "radio", 153 | button_label = "Submit answer", 154 | q_id = 3, 155 | right = c("Correct!") 156 | ) 157 | ``` 158 | 159 | ::: 160 | 161 | ### Character or object? 162 | 163 | One of the most common mistakes in R is to call an object when you mean to call a character string and vice versa. 164 | 165 | ::: {.callout-note appearance="simple" icon=false .question} 166 | 167 | **Which of these are object names? What is the difference between object names and character strings?** 168 | 169 | ```{r types4, echo=FALSE} 170 | check_question( 171 | answer = c("foo", "mu", "a"), 172 | options = c( 173 | "foo", 174 | '"num"', 175 | "mu", 176 | '"sigma"', 177 | '"data"', 178 | "a" 179 | ), 180 | type = "checkbox", 181 | button_label = "Submit answer", 182 | q_id = 4, 183 | right = c("Correct! 
Character strings are surrounded by quotation marks, object names are not.") 184 | ) 185 | ``` 186 | 187 | ::: 188 | 189 | 190 | ## 191 | 192 | ```{r} 193 | #| echo: false 194 | #| results: asis 195 | create_buttons("06-lists.html") 196 | ``` 197 | -------------------------------------------------------------------------------- /basics/02-programming-basics/06-lists.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Lists" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | 12 | engine: knitr 13 | filters: 14 | - webr 15 | webr: 16 | cell-options: 17 | editor-font-scale: 0.85 18 | fig-width: 6 19 | fig-height: 3.7 20 | out-width: "70%" 21 | --- 22 | 23 | ```{r include=FALSE} 24 | library(tidyverse) 25 | library(checkdown) 26 | 27 | source(here::here("R", "helpers.R")) 28 | ``` 29 | 30 | ### Lists {.no-hide} 31 | 32 | Watch [this video](https://vimeo.com/220490360): 33 | 34 | ```{=html} 35 |
36 | 37 |
38 | ``` 39 | 40 | ### Lists vs. vectors 41 | 42 | ::: {.callout-note appearance="simple" icon=false .question} 43 | 44 | **Which data structure(s) could you use to store these pieces of data in the same object? `1001`, `TRUE`, `"stories"`** 45 | 46 | ```{r lists1, echo=FALSE} 47 | check_question( 48 | answer = c("a list"), 49 | options = c( 50 | "a vector", 51 | "a list", 52 | "neither" 53 | ), 54 | type = "radio", 55 | button_label = "Submit answer", 56 | q_id = 1, 57 | right = c("Correct! Lists can contain elements that are different types.") 58 | ) 59 | ``` 60 | 61 | ::: 62 | 63 | 64 | ### Make a list 65 | 66 | Make a list that contains the elements `1001`, `TRUE`, and `"stories"`. Give each element a name. 67 | 68 | ::: {.panel-tabset} 69 | ## {{< fa code >}} Interactive editor 70 | 71 | ```{webr-r} 72 | 73 | 74 | 75 | ``` 76 | 77 | ## {{< fa circle-check >}} Solution 78 | 79 | ```r 80 | list(number = 1001, logical = TRUE, string = "stories") 81 | ``` 82 | 83 | ::: 84 | 85 | 86 | ### Extract an element 87 | 88 | Extract the number `1001` from the list below. 89 | 90 | ::: {.panel-tabset} 91 | ## {{< fa code >}} Interactive editor 92 | 93 | ```{webr-r} 94 | things <- list(number = 1001, logical = TRUE, string = "stories") 95 | 96 | 97 | ``` 98 | 99 | ## {{< fa circle-check >}} Solution 100 | 101 | ```r 102 | things <- list(number = 1001, logical = TRUE, string = "stories") 103 | things$number 104 | ``` 105 | 106 | ::: 107 | 108 | ### Data Frames 109 | 110 | You can make a data frame with the `data.frame()` function, which works similarly to `c()` and `list()`. Assemble the vectors below into a data frame with the column names `numbers`, `logicals`, and `strings`. 
111 | 112 | ::: {.panel-tabset} 113 | ## {{< fa code >}} Interactive editor 114 | 115 | ```{webr-r} 116 | nums <- c(1, 2, 3, 4) 117 | logs <- c(TRUE, TRUE, FALSE, TRUE) 118 | strs <- c("apple", "banana", "carrot", "duck") 119 | 120 | 121 | ``` 122 | 123 | ## {{< fa circle-check >}} Solution 124 | 125 | ```r 126 | nums <- c(1, 2, 3, 4) 127 | logs <- c(TRUE, TRUE, FALSE, TRUE) 128 | strs <- c("apple", "banana", "carrot", "duck") 129 | data.frame(numbers = nums, logicals = logs, strings = strs) 130 | ``` 131 | 132 | ::: 133 | 134 | ### 135 | 136 | Good job. When you make a data frame, you must follow one rule: each column vector should be the same length. 137 | 138 | 139 | ### Extract a column 140 | 141 | Given that a data frame is a type of list (with named elements), how could you extract the `strings` column of the `df` data frame below? Do it. 142 | 143 | ::: {.panel-tabset} 144 | ## {{< fa code >}} Interactive editor 145 | 146 | ```{webr-r} 147 | nums <- c(1, 2, 3, 4) 148 | logs <- c(TRUE, TRUE, FALSE, TRUE) 149 | strs <- c("apple", "banana", "carrot", "duck") 150 | df <- data.frame(numbers = nums, logicals = logs, strings = strs) 151 | 152 | 153 | ``` 154 | 155 | ## {{< fa circle-check >}} Solution 156 | 157 | ```r 158 | df$strings 159 | ``` 160 | 161 | ::: 162 | 163 | ## 164 | 165 | ```{r} 166 | #| echo: false 167 | #| results: asis 168 | create_buttons("07-packages.html") 169 | ``` 170 | -------------------------------------------------------------------------------- /basics/02-programming-basics/07-packages.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Packages" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | 12 | engine: knitr 13 | filters: 14 | - webr 15 | webr: 16 | packages: 17 | - tidyverse 18 | cell-options: 19 | editor-font-scale: 0.85 20 | fig-width: 6 21 | fig-height: 3.7 22 | out-width: "70%" 23 | --- 24 | 25 | ```{r 
include=FALSE} 26 | library(tidyverse) 27 | library(checkdown) 28 | 29 | source(here::here("R", "helpers.R")) 30 | ``` 31 | 32 | ### Packages {.no-hide} 33 | 34 | Watch [this video](https://vimeo.com/220490447): 35 | 36 | ```{=html} 37 |
38 | 39 |
40 | ``` 41 | 42 | ### A common error 43 | 44 | ::: {.callout-note appearance="simple" icon=false .question} 45 | 46 | **What does this common error message suggest? `object '_____' not found`** 47 | 48 | ```{r packages1, echo=FALSE} 49 | check_question( 50 | answer = c("Either"), 51 | options = c( 52 | "You misspelled your object name", 53 | "You forgot to load the package that ____ comes in", 54 | "Either" 55 | ), 56 | type = "radio", 57 | button_label = "Submit answer", 58 | q_id = 1, 59 | right = c("Correct!") 60 | ) 61 | ``` 62 | 63 | ::: 64 | 65 | 66 | ### Load a package 67 | 68 | In the code chunk below, load the {tidyverse} package. Whenever you load a package, R will also load all of the packages that the first package depends on. {tidyverse} takes advantage of this to create a shortcut for loading several common packages at once. Whenever you load {tidyverse}, {tidyverse} also loads {ggplot2}, {dplyr}, {tibble}, {tidyr}, {readr}, {purrr}, {forcats}, {stringr}, and {lubridate}. 69 | 70 | ::: {.panel-tabset} 71 | ## {{< fa code >}} Interactive editor 72 | 73 | ```{webr-r} 74 | 75 | 76 | 77 | ``` 78 | 79 | ## {{< fa circle-check >}} Solution 80 | 81 | ```r 82 | library(tidyverse) 83 | ``` 84 | 85 | ::: 86 | 87 | ### 88 | 89 | Good job! R will keep the packages loaded until you close your R session. When you re-open R, you'll need to reload your packages. 90 | 91 | 92 | ### Quotes 93 | 94 | Did you know `library()` is a special function in R? You can pass `library()` a package name in quotes, like `library("tidyverse")`, or not in quotes, like `library(tidyverse)`. Both will work! That's often not the case with R functions. 95 | 96 | In general, you should always use quotes unless you are writing the _name_ of something that is already loaded into R's memory, like a function, vector, or data frame. 97 | 98 | ### Install packages 99 | 100 | But what if the package that you want to load is not installed on your computer? 
How would you install the {dplyr} package on your own computer? 101 | 102 | ::: {.panel-tabset} 103 | ## {{< fa code >}} Interactive editor 104 | 105 | ```{webr-r} 106 | 107 | 108 | 109 | ``` 110 | 111 | ## {{< fa circle-check >}} Solution 112 | 113 | ```r 114 | install.packages("dplyr") 115 | ``` 116 | 117 | ::: 118 | 119 | ### 120 | 121 | Good job! You only need to install a package once, unless you wish to update your local copy by reinstalling the package. Notice that `install.packages()` _always_ requires quotes around the package name. 122 | 123 | 124 | ### Congratulations! 125 | 126 | Congratulations. You now have a formal sense of how the basics of R work. Although you may think of yourself as a data scientist, this brief computer science background will help you as you analyze data. Whenever R does something unexpected, you can apply your knowledge of how R works to figure out what went wrong. 127 | 128 | 129 | ## 130 | 131 | ```{r} 132 | #| echo: false 133 | #| results: asis 134 | create_buttons(NULL) 135 | ``` 136 | -------------------------------------------------------------------------------- /basics/02-programming-basics/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Programming basics" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | source(here::here("R", "helpers.R")) 14 | ``` 15 | 16 | ### Welcome to R! {.no-hide} 17 | 18 | R is easiest to use when you know how the R language works. This tutorial will teach you the implicit background knowledge that informs every piece of R code. 
You'll learn about: 19 | 20 | * **functions** and their **arguments** 21 | * **objects** 22 | * R's basic **data types** 23 | * R's basic data structures including **vectors** and **lists** 24 | * R's **package system** 25 | 26 | ## 27 | 28 | ```{r} 29 | #| echo: false 30 | #| results: asis 31 | create_buttons("01-functions.html") 32 | ``` 33 | -------------------------------------------------------------------------------- /deploy.sh: -------------------------------------------------------------------------------- 1 | REMOTE_HOST="ath-cloud" 2 | REMOTE_DIR="~/sites/r-primers.andrewheiss.com/public" 3 | REMOTE_DEST=$REMOTE_HOST:$REMOTE_DIR 4 | 5 | echo "Uploading new changes to remote server..." 6 | echo 7 | rsync -crvP --delete _site/ $REMOTE_DEST 8 | -------------------------------------------------------------------------------- /html/custom.scss: -------------------------------------------------------------------------------- 1 | @import url('https://fonts.googleapis.com/css2?family=Inter:ital,wght@0,100..900;1,100..900&display=swap'); 2 | 3 | /*-- scss:defaults --*/ 4 | // Tiepolo colors 5 | // MetBrewer::met.brewer("Tiepolo") 6 | $white: #fff !default; 7 | $gray-100: #f8f9fa !default; 8 | $gray-200: #e9ecef !default; 9 | $gray-300: #dee2e6 !default; 10 | $gray-400: #ced4da !default; 11 | $gray-500: #adb5bd !default; 12 | $gray-600: #6c757d !default; 13 | $gray-700: #495057 !default; 14 | $gray-800: #343a40 !default; 15 | $gray-900: #212529 !default; 16 | $black: #000 !default; 17 | 18 | $blue: #17486f !default; 19 | $indigo: #6610f2 !default; 20 | $purple: #6f42c1 !default; 21 | $pink: #d63384 !default; 22 | $red: #802417 !default; 23 | $orange: #c06636 !default; 24 | $yellow: #e8b960 !default; 25 | $green: #646e3b !default; 26 | $teal: #2b5851 !default; 27 | $cyan: #508ea2 !default; 28 | 29 | $primary: $blue !default; 30 | $secondary: $white !default; 31 | $success: $green !default; 32 | $info: $cyan !default; 33 | $warning: $orange !default; 34 | $danger: 
$red !default; 35 | $light: $gray-100 !default; 36 | $dark: $gray-900 !default; 37 | 38 | $font-family-sans-serif: Inter, sans-serif !default; 39 | $font-family-serif: Inter, serif !default; /* Not actually a serif font but whatever */ 40 | 41 | $font-size-base: 1rem !default; 42 | $headings-font-weight: 700 !default; 43 | 44 | // $h1-font-size: $font-size-base * 2.35; 45 | // $h2-font-size: $font-size-base * 1.8; 46 | // $h3-font-size: $font-size-base * 1.45; 47 | // $h4-font-size: $font-size-base * 1.1; 48 | // $h5-font-size: $font-size-base * 1; 49 | // $h6-font-size: $font-size-base * 0.8; 50 | // 51 | // $toc-font-size: 0.95rem; 52 | // $sidebar-font-size: 1.1rem; 53 | // $sidebar-font-size-section: 0.95rem; 54 | // $footer-font-size: 0.95rem; 55 | // 56 | // $link-color: $red; 57 | // $link-hover-color: $yellow; 58 | 59 | // Inline code 60 | $code-bg: $gray-200 !default; 61 | $code-color: $gray-900 !default; 62 | // 63 | // // Block code 64 | // $monokai-bg: #2e3440; 65 | 66 | 67 | /*-- scss:rules --*/ 68 | .hidden { 69 | display: none; 70 | } 71 | 72 | #buttons { 73 | padding-bottom: 600px; 74 | } 75 | 76 | .question { 77 | input[type="submit"] { 78 | margin-top: 0.75em; 79 | } 80 | 81 | input[type="submit"] + div { 82 | margin-top: 0.5em; 83 | } 84 | } 85 | 86 | .navbar-dark .navbar-nav .show>.nav-link, 87 | .navbar-dark .navbar-nav .active>.nav-link, 88 | .navbar-dark .navbar-nav .nav-link.active, 89 | div.sidebar-item-container .active, 90 | div.sidebar-item-container .show>.nav-link, 91 | div.sidebar-item-container .sidebar-link>code{ 92 | color: $orange; 93 | font-weight: bold; 94 | } 95 | 96 | thead th { 97 | text-transform: none; 98 | } 99 | -------------------------------------------------------------------------------- /index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "R Primers" 3 | toc: false 4 | --- 5 | 6 | A version of the old RStudio/Posit Primers, now with Quarto and webR. 
7 | 8 | ## License {.appendix} 9 | 10 | The original primers were developed by the RStudio/Posit Education Team and made [open source on GitHub](https://github.com/rstudio-education/primers). Following the original license, these tutorials are licensed under the Creative Commons Attribution-ShareAlike 4.0 License (CC BY-SA 4.0). 11 | 12 | The primers are derived from the book [*R for Data Science*](https://r4ds.had.co.nz/) from O'Reilly Media, Inc. Copyright © 2017 Garrett Grolemund, Hadley Wickham. Used with permission. 13 | 14 | [See here for the full license.](https://github.com/andrewheiss/r-primers/blob/main/LICENSE.md) 15 | -------------------------------------------------------------------------------- /js/bootstrapify.js: -------------------------------------------------------------------------------- 1 | document.addEventListener('DOMContentLoaded', function() { 2 | // Select all forms that don't have a class set 3 | var formsWithoutClass = document.querySelectorAll('form:not([class])'); 4 | formsWithoutClass.forEach(addBootstrapClasses); 5 | 6 | // Select all forms that use `method="post"` 7 | // var postForms = document.querySelectorAll('form[method="post"]'); 8 | // postForms.forEach(addBootstrapClasses); 9 | }); 10 | 11 | function addBootstrapClasses(form) { 12 | // Add the Bootstrap class 'form-group' to the form 13 | form.classList.add('form-group'); 14 | 15 | // Select the radio inputs within this form and add the Bootstrap class 'form-check-input' 16 | var radioInputs = form.querySelectorAll('input[type="radio"]'); 17 | radioInputs.forEach(function(input) { 18 | input.classList.add('form-check-input'); 19 | }); 20 | 21 | // Select the labels within this form and add the Bootstrap class 'form-check-label' 22 | var labels = form.querySelectorAll('label'); 23 | labels.forEach(function(label) { 24 | label.classList.add('form-check-label'); 25 | }); 26 | 27 | // Select the submit button within this form and add the Bootstrap classes 'btn' and 
'btn-primary' 28 | var submitButton = form.querySelector('input[type="submit"]'); 29 | submitButton.classList.add('btn', 'btn-primary', 'btn-sm'); 30 | } 31 | -------------------------------------------------------------------------------- /js/progressive-reveal.js: -------------------------------------------------------------------------------- 1 | var key = 'currentSection' + window.location.pathname; 2 | var currentSection = localStorage.getItem(key) ? parseInt(localStorage.getItem(key)) : -1; 3 | var sections = Array.from(document.getElementsByClassName('level3')) 4 | .filter(section => !section.classList.contains('no-hide')); 5 | 6 | // Hide all sections initially 7 | sections.forEach(function (section) { 8 | section.classList.add('hidden'); 9 | }); 10 | 11 | function revealSection(sectionIndex) { 12 | sections[sectionIndex].classList.remove('hidden'); 13 | } 14 | 15 | var continueButton = document.getElementById('continueButton'); 16 | var nextTopicButton = document.getElementById('nextTopicButton'); 17 | 18 | // Disable continue button if there are no sections 19 | if (sections.length === 0) { 20 | continueButton.disabled = true; 21 | nextTopicButton.classList.remove('disabled'); 22 | // Otherwise progressively reveal sections 23 | } else { 24 | continueButton.addEventListener('click', function () { 25 | currentSection++; 26 | if (currentSection < sections.length) { 27 | revealSection(currentSection); 28 | localStorage.setItem(key, currentSection); 29 | // Jump to the id anchor for the current section 30 | window.location.hash = sections[currentSection].id; 31 | // Adjust scroll position to account for the height of the navbar 32 | window.scrollBy(0, 70); 33 | } 34 | 35 | if (currentSection >= sections.length - 1) { 36 | continueButton.disabled = true; 37 | nextTopicButton.classList.remove('disabled'); 38 | } 39 | }); 40 | } 41 | 42 | // On page load, reveal up to the current section 43 | window.onload = function () { 44 | for (var i = 0; i <= 
currentSection; i++) { 45 | revealSection(i); 46 | } 47 | }; 48 | 49 | function clearProgress() { 50 | localStorage.removeItem(key); 51 | window.location.hash = '#'; // Remove the anchor from the URL 52 | } 53 | 54 | document.getElementById('resetButton').addEventListener('click', function () { 55 | clearProgress(); 56 | // Reload the page to reflect the reset progress 57 | location.reload(); 58 | }); 59 | -------------------------------------------------------------------------------- /r-primers.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | 17 | ProjectName: R Primers 18 | -------------------------------------------------------------------------------- /renv/.gitignore: -------------------------------------------------------------------------------- 1 | library/ 2 | local/ 3 | cellar/ 4 | lock/ 5 | python/ 6 | sandbox/ 7 | staging/ 8 | -------------------------------------------------------------------------------- /renv/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "bioconductor.version": null, 3 | "external.libraries": [], 4 | "ignored.packages": [], 5 | "package.dependency.fields": [ 6 | "Imports", 7 | "Depends", 8 | "LinkingTo" 9 | ], 10 | "ppm.enabled": null, 11 | "ppm.ignored.urls": [], 12 | "r.version": null, 13 | "snapshot.type": "implicit", 14 | "use.cache": true, 15 | "vcs.ignore.cellar": true, 16 | "vcs.ignore.library": true, 17 | "vcs.ignore.local": true, 18 | "vcs.manage.ignores": true 19 | } 20 | -------------------------------------------------------------------------------- /tidy-data/01-reshape-data/img/tidy.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/tidy-data/01-reshape-data/img/tidy.png -------------------------------------------------------------------------------- /tidy-data/01-reshape-data/img/vectorized.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/tidy-data/01-reshape-data/img/vectorized.png -------------------------------------------------------------------------------- /tidy-data/01-reshape-data/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Reshape data" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | knitr::opts_chunk$set( 14 | fig.width = 6, 15 | fig.height = 6 * 0.618, 16 | fig.retina = 3, 17 | dev = "ragg_png", 18 | fig.align = "center", 19 | out.width = "70%" 20 | ) 21 | 22 | source(here::here("R", "helpers.R")) 23 | ``` 24 | 25 | The tools that you learned in the previous Primers work best when your data is organized in a specific way. This format is known as **tidy data** and it appears throughout the tidyverse. You will spend a lot of time as a data scientist wrangling your data into a usable format, so it is important to learn how to do this fast. 26 | 27 | This tutorial will teach you how to recognize tidy data, as well as how to reshape untidy data into a tidy format. 
In it, you will learn the core data wrangling functions for the tidyverse: 28 | 29 | * `pivot_longer()`, which reshapes wide data into long data, and 30 | * `pivot_wider()`, which reshapes long data into wide data 31 | 32 | This tutorial uses the [core tidyverse packages](http://tidyverse.org/), including {ggplot2}, {dplyr}, and {tidyr}, as well as the {babynames} package. All of these packages have been pre-installed and pre-loaded for your convenience. 33 | 34 | 35 | ## 36 | 37 | ```{r} 38 | #| echo: false 39 | #| results: asis 40 | create_buttons("01-tidy-data.html") 41 | ``` 42 | -------------------------------------------------------------------------------- /transform-data/01-tibbles/01-babynames.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "babynames" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | packages: 16 | - babynames 17 | autoload-packages: false 18 | cell-options: 19 | editor-font-scale: 0.85 20 | fig-width: 6 21 | fig-height: 3.7 22 | out-width: "70%" 23 | --- 24 | 25 | ```{r include=FALSE} 26 | knitr::opts_chunk$set( 27 | fig.width = 6, 28 | fig.height = 6 * 0.618, 29 | fig.retina = 3, 30 | dev = "ragg_png", 31 | fig.align = "center", 32 | out.width = "70%" 33 | ) 34 | 35 | library(tidyverse) 36 | 37 | source(here::here("R", "helpers.R")) 38 | ``` 39 | 40 | ### Loading babynames {.no-hide} 41 | 42 | Before we begin, let's learn a little about our data. The `babynames` dataset comes in the {babynames} package. The package is pre-installed for you, just as {ggplot2} was pre-installed in the last tutorial. But unlike in the last tutorial, I have not pre-_loaded_ {babynames}, or any other package. 43 | 44 | What does this mean? In R, whenever you want to use a package that is not part of base R, you need to load the package with the command `library()`. 
Until you load a package, R will not be able to find the datasets and functions contained in the package. For example, if we asked R to display the `babynames` dataset, which comes in the {babynames} package, right now, we'd get the message below. R cannot find the dataset because we haven't loaded the {babynames} package. 45 | 46 | ```{r error=TRUE} 47 | babynames 48 | ``` 49 | 50 | To load the {babynames} package, you would run the command `library(babynames)`. After you load a package, R will be able to find its contents _until you close R_. The next time you open R, you will need to reload the package if you wish to use it again. 51 | 52 | This might sound like an inconvenience, but choosing which packages to load keeps your R experience simple and orderly. 53 | 54 | In the chunk below, load {babynames} (the package) and then open the help page for `babynames` (the dataset). Be sure to read the help page before going on. 55 | 56 | ::: {.panel-tabset} 57 | ## {{< fa code >}} Interactive editor 58 | 59 | ```{webr-r} 60 | 61 | 62 | 63 | ``` 64 | 65 | ## {{< fa circle-check >}} Solution 66 | 67 | ```r 68 | library(babynames) 69 | ?babynames 70 | ``` 71 | 72 | ::: 73 | 74 | ```{r bnames, include=FALSE} 75 | library(babynames) 76 | ``` 77 | 78 | ### The data 79 | 80 | Now that you know a little about the dataset, let's examine its contents. 
If you were to run `babynames` at your R console, you would get output that looks like this: 81 | 82 | ```{r echo=TRUE, eval=FALSE} 83 | babynames 84 | 85 | #> 187 1880 F Christina 65 6.659495e-04 86 | #> 188 1880 F Lelia 65 6.659495e-04 87 | #> 189 1880 F Nelle 65 6.659495e-04 88 | #> 190 1880 F Sue 65 6.659495e-04 89 | #> 191 1880 F Johanna 64 6.557041e-04 90 | #> 192 1880 F Lilly 64 6.557041e-04 91 | #> 193 1880 F Lucinda 63 6.454587e-04 92 | #> 194 1880 F Minerva 63 6.454587e-04 93 | #> 195 1880 F Lettie 62 6.352134e-04 94 | #> 196 1880 F Roxie 62 6.352134e-04 95 | #> 197 1880 F Cynthia 61 6.249680e-04 96 | #> 198 1880 F Helena 60 6.147226e-04 97 | #> 199 1880 F Hilda 60 6.147226e-04 98 | #> 200 1880 F Hulda 60 6.147226e-04 99 | #> [ reached getOption("max.print") -- omitted 1825233 rows ] 100 | ``` 101 | 102 | Yikes. What is happening? 103 | 104 | ### Displaying large data 105 | 106 | `babynames` is a large data frame, and R is not well equipped to display the contents of large data frames. R prints rows only until it reaches the limit set by the `max.print` option. At that point, R stops, leaving you to look at an arbitrary section of your data. 107 | 108 | You can avoid this behavior by transforming your data frame to a _tibble_.
109 | 110 | 111 | ## 112 | 113 | ```{r} 114 | #| echo: false 115 | #| results: asis 116 | create_buttons("02-tibbles.html") 117 | ``` 118 | -------------------------------------------------------------------------------- /transform-data/01-tibbles/02-tibbles.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "tibbles" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | packages: 16 | - tibble 17 | - babynames 18 | autoload-packages: false 19 | cell-options: 20 | editor-font-scale: 0.85 21 | fig-width: 6 22 | fig-height: 3.7 23 | out-width: "70%" 24 | --- 25 | 26 | ```{r include=FALSE} 27 | knitr::opts_chunk$set( 28 | fig.width = 6, 29 | fig.height = 6 * 0.618, 30 | fig.retina = 3, 31 | dev = "ragg_png", 32 | fig.align = "center", 33 | out.width = "70%" 34 | ) 35 | 36 | library(tidyverse) 37 | 38 | source(here::here("R", "helpers.R")) 39 | ``` 40 | 41 | ```{webr-r} 42 | #| context: setup 43 | library(babynames) 44 | ``` 45 | 46 | ### What is a tibble? {.no-hide} 47 | 48 | A tibble is a special type of table. R displays tibbles in a refined way whenever you have the **tibble** package loaded: R will print only the first ten rows of a tibble as well as all of the columns that fit into your console window. R also adds useful summary information about the tibble, such as the data types of each column and the size of the data set. 49 | 50 | Whenever you do not have the tibble package loaded, R will display the tibble as if it were a data frame. In fact, tibbles _are_ data frames: an enhanced type of data frame. 51 | 52 | You can think of the difference between the data frame display and the tibble display like this: 53 | 54 | ![](img/tibble_display.png){width=75%} 55 | 56 | ### `as_tibble()` 57 | 58 | You can transform a data frame to a tibble with the `as_tibble()` function in the tibble package, e.g.
`as_tibble(cars)`. However, `babynames` is already a tibble. To display it nicely, you just need to load the {tibble} package. 59 | 60 | To see what I mean, use `library()` to load the tibble package in the chunk below and then call `babynames`. 61 | 62 | ::: {.panel-tabset} 63 | ## {{< fa code >}} Interactive editor 64 | 65 | ```{webr-r} 66 | 67 | 68 | 69 | ``` 70 | 71 | ## {{< fa circle-check >}} Solution 72 | 73 | ```r 74 | library(tibble) 75 | library(babynames) 76 | babynames 77 | ``` 78 | 79 | ::: 80 | 81 | ### 82 | 83 | Excellent! If you want to check whether or not an object is a tibble, you can use the `is_tibble()` function that comes in the tibble package. For example, this would return TRUE: `is_tibble(babynames)`. 84 | 85 | 86 | ### `View()` 87 | 88 | What if you'd like to inspect the remaining portions of a tibble? To see the entire tibble, use the `View()` command. R will launch a window that shows a scrollable display of the entire data set. For example, the code below will launch a data viewer in RStudio. 89 | 90 | ```{r eval=FALSE} 91 | View(babynames) 92 | ``` 93 | 94 | `View()` works in conjunction with the software that you run R from: `View()` opens the data editor provided by that software. Unfortunately, this tutorial doesn't come with a data editor, so you won't be able to use `View()` today (unless you open RStudio, for example, and run the code there). 
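
When no data viewer is available, one possible workaround is to ask `print()` for more rows than the default ten. The sketch below assumes the {tibble} package is installed and uses the built-in `cars` data frame mentioned earlier; the name `cars_tbl` is made up for illustration.

```r
library(tibble)

# Convert the built-in cars data frame (50 rows, 2 columns) to a tibble
cars_tbl <- as_tibble(cars)

# Printing a tibble shows only the first ten rows by default
cars_tbl

# Ask for more rows (n) and for every column (width = Inf) explicitly
print(cars_tbl, n = 20, width = Inf)
```

This only changes how much of the tibble is displayed; the data itself is untouched.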
95 | 96 | 97 | ## 98 | 99 | ```{r} 100 | #| echo: false 101 | #| results: asis 102 | create_buttons("03-tidyverse.html") 103 | ``` 104 | -------------------------------------------------------------------------------- /transform-data/01-tibbles/03-tidyverse.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "tidyverse" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | 12 | engine: knitr 13 | filters: 14 | - webr 15 | webr: 16 | packages: 17 | - tidyverse 18 | autoload-packages: false 19 | cell-options: 20 | editor-font-scale: 0.85 21 | fig-width: 6 22 | fig-height: 3.7 23 | out-width: "70%" 24 | --- 25 | 26 | ```{r include=FALSE} 27 | knitr::opts_chunk$set( 28 | fig.width = 6, 29 | fig.height = 6 * 0.618, 30 | fig.retina = 3, 31 | dev = "ragg_png", 32 | fig.align = "center", 33 | out.width = "70%" 34 | ) 35 | 36 | library(tidyverse) 37 | library(checkdown) 38 | 39 | source(here::here("R", "helpers.R")) 40 | ``` 41 | 42 | 43 | ### The tidyverse {.no-hide} 44 | 45 | The {tibble} package is one of several packages that are known collectively as ["the tidyverse"](http://tidyverse.org). Tidyverse packages share a common philosophy and are designed to work well together. For example, in this tutorial you will use the {tibble} package, the {ggplot2} package, and the {dplyr} package, all of which belong to the tidyverse. 46 | 47 | ### The tidyverse package 48 | 49 | When you use tidyverse packages, you can make your life easier by using the {tidyverse} package. The {tidyverse} package provides a shortcut for installing and loading the entire suite of packages in "the tidyverse", e.g. 50 | 51 | ```{r eval = FALSE} 52 | install.packages("tidyverse") 53 | library(tidyverse) 54 | ``` 55 | 56 | ### Installing the tidyverse 57 | 58 | Think of the {tidyverse} package as a placeholder for the packages that are in the "tidyverse". 
By itself, {tidyverse} does not do much, but when you install the {tidyverse} package it instructs R to install every other package in the tidyverse at the same time. In other words, when you run `install.packages("tidyverse")`, R installs the following packages for you in one simple step: 59 | 60 | * ggplot2 61 | * dplyr 62 | * tidyr 63 | * readr 64 | * purrr 65 | * tibble 66 | * hms 67 | * stringr 68 | * lubridate 69 | * forcats 70 | * DBI 71 | * haven 72 | * jsonlite 73 | * readxl 74 | * rvest 75 | * xml2 76 | * modelr 77 | * broom 78 | 79 | ### Loading the tidyverse 80 | 81 | When you load tidyverse with `library("tidyverse")`, it instructs R to load _the most commonly used_ tidyverse packages. These are: 82 | 83 | * ggplot2 84 | * dplyr 85 | * tidyr 86 | * readr 87 | * purrr 88 | * tibble 89 | * stringr 90 | * forcats 91 | * lubridate 92 | 93 | You can load the less commonly used tidyverse packages in the normal way, by running `library()` for each of them. 94 | 95 | Let's give this a try. We will use the ggplot2 and dplyr packages later in this tutorial. Let's use the tidyverse package to load them in the chunk below: 96 | 97 | ::: {.panel-tabset} 98 | ## {{< fa code >}} Interactive editor 99 | 100 | ```{webr-r} 101 | 102 | 103 | 104 | ``` 105 | 106 | ## {{< fa circle-check >}} Solution 107 | 108 | ```r 109 | library(tidyverse) 110 | ``` 111 | 112 | ::: 113 | 114 | ### Quiz 115 | 116 | ::: {.callout-note appearance="simple" icon=false .question} 117 | 118 | **Which package is not loaded by `library("tidyverse")`?** 119 | 120 | ```{r tidyverse-check, echo=FALSE} 121 | check_question( 122 | answer = "babynames", 123 | options = c( 124 | "ggplot2", 125 | "dplyr", 126 | "tibble", 127 | "babynames" 128 | ), 129 | type = "radio", 130 | button_label = "Submit answer", 131 | q_id = 1, 132 | right = c("Correct!") 133 | ) 134 | ``` 135 | ::: 136 | 137 | ### Recap 138 | 139 | Tibbles and the {tidyverse} package are two tools that make life with R easier.
Ironically, you may not come to appreciate their value right away: these tutorials pre-load packages for you. However, you will want to use tibbles and the {tidyverse} package when you move out of the tutorials and begin doing your own work with R inside of RStudio. 140 | 141 | This tutorial also introduced the babynames dataset. In the next tutorial, you will use this data set to plot the popularity of _your_ name over time. Along the way, you will learn how to filter and subset data sets in R. 142 | 143 | ## 144 | 145 | ```{r} 146 | #| echo: false 147 | #| results: asis 148 | create_buttons(NULL) 149 | ``` 150 | -------------------------------------------------------------------------------- /transform-data/01-tibbles/img/tibble_display.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/transform-data/01-tibbles/img/tibble_display.png -------------------------------------------------------------------------------- /transform-data/01-tibbles/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Working with tibbles" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | knitr::opts_chunk$set( 14 | fig.width = 6, 15 | fig.height = 6 * 0.618, 16 | fig.retina = 3, 17 | dev = "ragg_png", 18 | fig.align = "center", 19 | out.width = "70%" 20 | ) 21 | 22 | source(here::here("R", "helpers.R")) 23 | ``` 24 | 25 | In this primer, you will explore the popularity of different names over time. 
To succeed, you will need to master some common tools for manipulating data with R: 26 | 27 | * tibbles and `View()`, which let you inspect raw data 28 | * `select()` and `filter()`, which let you extract columns and rows from a data frame 29 | * `arrange()`, which lets you reorder the rows in your data 30 | * `|>`, which organizes your code into reader-friendly "pipes" 31 | * `mutate()`, `group_by()`, and `summarize()`, which help you use your data to compute new variables and summary statistics 32 | 33 | These are some of the most useful R functions for data science, and the tutorials that follow will provide you with everything you need to learn them. 34 | 35 | In the tutorials, we'll use a dataset named `babynames`, which comes in a package that is also named `babynames`. Within `babynames`, you will find information about almost every name given to children in the United States since 1880. 36 | 37 | This tutorial introduces `babynames` as well as a new data structure that makes working with data in R easy: the tibble. 38 | 39 | In addition to `babynames`, this tutorial uses the [core tidyverse packages](http://tidyverse.org/), including {ggplot2}, {tibble}, and {dplyr}. All of these packages have been pre-installed for your convenience. But they haven't been pre-loaded---something you will soon learn more about!
40 | 41 | 42 | ## 43 | 44 | ```{r} 45 | #| echo: false 46 | #| results: asis 47 | create_buttons("01-babynames.html") 48 | ``` 49 | -------------------------------------------------------------------------------- /transform-data/02-isolating/01-your-name.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Your name" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | knitr::opts_chunk$set( 14 | fig.width = 6, 15 | fig.height = 6 * 0.618, 16 | fig.retina = 3, 17 | dev = "ragg_png", 18 | fig.align = "center", 19 | out.width = "70%" 20 | ) 21 | 22 | library(tidyverse) 23 | library(babynames) 24 | 25 | source(here::here("R", "helpers.R")) 26 | ``` 27 | 28 | ### The history of your name {.no-hide} 29 | 30 | You can use the data in `babynames` to make graphs like this, which reveal the history of a name, perhaps your name. 31 | 32 | ```{r echo=FALSE, message=FALSE, warning=FALSE, out.width="90%"} 33 | babynames |> 34 | filter(name == "Andrew", sex == "M") |> 35 | ggplot() + 36 | geom_line(aes(x = year, y = prop)) + 37 | labs(title = "Popularity of the name Andrew") 38 | ``` 39 | 40 | But before you do, you will need to trim down `babynames`. At the moment, there are more rows in `babynames` than you need to build your plot. 41 | 42 | ### An example 43 | 44 | To see what I mean, consider how I made the plot above: I began with the entire dataset, which if plotted as a scatterplot would've looked like this. 45 | 46 | ```{r plot-all-names, out.width="60%", cache=TRUE} 47 | ggplot(babynames) + 48 | geom_point(aes(x = year, y = prop)) + 49 | labs(title = "Popularity of every name in the dataset") 50 | ``` 51 | 52 | I then narrowed the data to just the rows that contain my name, before plotting the data with a line geom. Here's how the rows with just my name look as a scatterplot. 
53 | 54 | ```{r out.width="60%"} 55 | babynames |> 56 | filter(name == "Andrew", sex == "M") |> 57 | ggplot() + 58 | geom_point(aes(x = year, y = prop)) + 59 | labs(title = "Popularity of the name Andrew") 60 | ``` 61 | 62 | If I had skipped this step, my line graph would've connected all of the points in the large dataset, creating an uninformative graph. 63 | 64 | ```{r out.width="60%", cache=TRUE} 65 | ggplot(babynames) + 66 | geom_line(aes(x = year, y = prop)) + 67 | labs(title = "Popularity of every name in the dataset") 68 | ``` 69 | 70 | Your goal in this section is to repeat this process for your own name (or a name that you choose). Along the way, you will learn a set of functions that isolate information within a dataset. 71 | 72 | ### Isolating data 73 | 74 | This type of task occurs often in data science: you need to extract data from a table before you can use it. You can do this task quickly with three functions that come in the {dplyr} package: 75 | 76 | 1. **`select()`**, which extracts columns from a data frame 77 | 1. **`filter()`**, which extracts rows from a data frame 78 | 1. **`arrange()`**, which moves important rows to the top of a data frame 79 | 80 | Each function takes a data frame or tibble as its first argument and returns a new data frame or tibble as its output.
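
As a rough sketch of what these three functions do, the block below applies each one to a tiny table. The `toy_names` data frame and all of its values are invented for illustration; it only mimics the column layout of `babynames`.

```r
library(dplyr)

# A tiny, made-up table with babynames-style columns (values are invented)
toy_names <- data.frame(
  year = c(1880, 1880, 1881, 1881),
  sex  = c("F", "M", "F", "M"),
  name = c("Mary", "John", "Anna", "John"),
  n    = c(70, 96, 27, 87)
)

# select() keeps the named columns
name_and_n <- select(toy_names, name, n)

# filter() keeps the rows that pass a logical test
johns <- filter(toy_names, name == "John")

# arrange() reorders the rows, smallest value of n first
by_count <- arrange(toy_names, n)
```

In each call the data frame goes in as the first argument and a new data frame comes out, so `toy_names` itself is never modified.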
81 | 82 | 83 | ## 84 | 85 | ```{r} 86 | #| echo: false 87 | #| results: asis 88 | create_buttons("02-select.html") 89 | ``` 90 | -------------------------------------------------------------------------------- /transform-data/02-isolating/02-select.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "`select()`" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | 12 | engine: knitr 13 | filters: 14 | - webr 15 | webr: 16 | packages: 17 | - babynames 18 | - dplyr 19 | cell-options: 20 | editor-font-scale: 0.85 21 | fig-width: 6 22 | fig-height: 3.7 23 | out-width: "70%" 24 | --- 25 | 26 | ```{r include=FALSE} 27 | knitr::opts_chunk$set( 28 | fig.width = 6, 29 | fig.height = 6 * 0.618, 30 | fig.retina = 3, 31 | dev = "ragg_png", 32 | fig.align = "center", 33 | out.width = "70%" 34 | ) 35 | 36 | library(tidyverse) 37 | library(checkdown) 38 | 39 | source(here::here("R", "helpers.R")) 40 | ``` 41 | 42 | `select()` extracts columns of a data frame and returns the columns as a new data frame. To use `select()`, pass it the name of a data frame to extract columns from, and then the names of the columns to extract. The column names do not need to appear in quotation marks or be prefixed with a `$`; `select()` knows to find them in the data frame that you supply. 43 | 44 | ### Exercise: `select()` 45 | 46 | Use the example below to get a feel for `select()`. Can you extract just the `name` column? How about the `name` and `year` columns? How about all of the columns except `prop`? 
47 | 48 | ::: {.panel-tabset} 49 | ## {{< fa code >}} Interactive editor 50 | 51 | ```{webr-r} 52 | select(babynames, name, sex) 53 | 54 | 55 | ``` 56 | 57 | ## {{< fa circle-check >}} Solution 58 | 59 | ```r 60 | select(babynames, name) 61 | select(babynames, name, year) 62 | select(babynames, year, sex, name, n) 63 | ``` 64 | 65 | ::: 66 | 67 | 68 | ### `select()` helpers 69 | 70 | You can also use a series of helpers with `select()`. For example, if you place a minus sign before a column name, `select()` will return every column but that column. Can you predict how the minus sign will work here? 71 | 72 | ::: {.panel-tabset} 73 | ## {{< fa code >}} Interactive editor 74 | 75 | ```{webr-r} 76 | select(babynames, -c(n, prop)) 77 | 78 | 79 | ``` 80 | 81 | ::: 82 | 83 | The table below summarizes the other `select()` helpers that are available in {dplyr}. Study it, and then click "Continue" to test your understanding. 84 | 85 | | Helper function | Use | Example | 86 | |-------------------|------------------------|------------------------------| 87 | | **`-`** | Columns except | `select(babynames, -prop)` | 88 | | **`:`** | Columns between (inclusive) | `select(babynames, year:n)` | 89 | | **`contains()`** | Columns that contain a string | `select(babynames, contains("n"))` | 90 | | **`ends_with()`** | Columns that end with a string | `select(babynames, ends_with("n"))` | 91 | | **`matches()`** | Columns that match a regex | `select(babynames, matches("n"))` | 92 | | **`num_range()`** | Columns with a numerical suffix in the range | Not applicable with `babynames` | 93 | | **`one_of()`** | Columns whose names appear in the given set | `select(babynames, one_of(c("sex", "gender")))` | 94 | | **`starts_with()`** | Columns that start with a string | `select(babynames, starts_with("n"))` | 95 | 96 | : {tbl-colwidths="[15, 35, 35]" .striped .hover .table-sm} 97 | 98 | 99 | ### `select()` quiz 100 | 101 | ::: {.callout-note appearance="simple" icon=false .question} 102
| 103 | **Which of these is not a way to select the `name` and `n` columns together?** 104 | 105 | ```{r predict, echo=FALSE} 106 | check_question( 107 | answer = 'select(babynames, ends_with("n"))', 108 | options = c( 109 | "select(babynames, -c(year, sex, prop))", 110 | "select(babynames, name:n)", 111 | 'select(babynames, starts_with("n"))', 112 | 'select(babynames, ends_with("n"))' 113 | ), 114 | type = "radio", 115 | button_label = "Submit answer", 116 | q_id = 1, 117 | right = c("Correct!") 118 | ) 119 | ``` 120 | ::: 121 | 122 | ## 123 | 124 | ```{r} 125 | #| echo: false 126 | #| results: asis 127 | create_buttons("03-filter.html") 128 | ``` 129 | -------------------------------------------------------------------------------- /transform-data/02-isolating/04-arrange.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "`arrange()`" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | packages: 16 | - babynames 17 | - dplyr 18 | cell-options: 19 | editor-font-scale: 0.85 20 | fig-width: 6 21 | fig-height: 3.7 22 | out-width: "70%" 23 | --- 24 | 25 | ```{r include=FALSE} 26 | knitr::opts_chunk$set( 27 | fig.width = 6, 28 | fig.height = 6 * 0.618, 29 | fig.retina = 3, 30 | dev = "ragg_png", 31 | fig.align = "center", 32 | out.width = "70%" 33 | ) 34 | 35 | library(tidyverse) 36 | library(babynames) 37 | 38 | source(here::here("R", "helpers.R")) 39 | ``` 40 | 41 | `arrange()` returns all of the rows of a data frame reordered by the values of a column. As with `select()`, the first argument of `arrange()` should be a data frame and the remaining arguments should be the names of columns. 
If you give `arrange()` a single column name, it will return the rows of the data frame reordered so that the row with the lowest value in that column appears first, the row with the second lowest value appears second, and so on. If the column contains character strings, `arrange()` will place them in alphabetical order. 42 | 43 | ### Exercise: `arrange()` 44 | 45 | Use the code chunk below to arrange babynames by `n`. Can you tell what the smallest value of `n` is? 46 | 47 | ::: {.panel-tabset} 48 | ## {{< fa code >}} Interactive editor 49 | 50 | ```{webr-r} 51 | 52 | 53 | 54 | ``` 55 | 56 | ## {{< fa circle-check >}} Solution 57 | 58 | ```r 59 | arrange(babynames, n) 60 | ``` 61 | 62 | ::: 63 | 64 | ### 65 | 66 | Good job! The compiler of `babynames` used 5 as a cutoff; a name only made it into `babynames` for a given year and gender if it was used for five or more children. 67 | 68 | ### Tie breakers 69 | 70 | If you supply additional column names, `arrange()` will use them as tie breakers to order rows that have identical values in the earlier columns. Add to the code below, to make `prop` a tie breaker. The result should first order rows by value of `n` and then reorder rows within each value of `n` by values of `prop`. 71 | 72 | ::: {.panel-tabset} 73 | ## {{< fa code >}} Interactive editor 74 | 75 | ```{webr-r} 76 | arrange(babynames, n) 77 | 78 | 79 | ``` 80 | 81 | ## {{< fa circle-check >}} Solution 82 | 83 | ```r 84 | arrange(babynames, n, prop) 85 | ``` 86 | 87 | ::: 88 | 89 | 90 | ### `desc()` 91 | 92 | If you would rather arrange rows in the opposite order, i.e. from _large_ values to _small_ values, surround a column name with `desc()`. `arrange()` will reorder the rows based on the largest values to the smallest. 93 | 94 | Add a `desc()` to the code below to display the most popular name for 2017 (the largest year in the dataset) instead of 1880 (the smallest year in the dataset). 
95 | 96 | ::: {.panel-tabset} 97 | ## {{< fa code >}} Interactive editor 98 | 99 | ```{webr-r} 100 | arrange(babynames, year, desc(prop)) 101 | 102 | 103 | ``` 104 | 105 | ## {{< fa circle-check >}} Solution 106 | 107 | ```r 108 | arrange(babynames, desc(year), desc(prop)) 109 | ``` 110 | 111 | ::: 112 | 113 | Think you have it? Click Continue to test yourself. 114 | 115 | ### `arrange()` quiz 116 | 117 | Which name was the most popular for a single gender in a single year? In the code chunk below, use `arrange()` to make the row with the largest value of `prop` appear at the top of the data set. 118 | 119 | ::: {.panel-tabset} 120 | ## {{< fa code >}} Interactive editor 121 | 122 | ```{webr-r} 123 | 124 | 125 | 126 | ``` 127 | 128 | ## {{< fa circle-check >}} Solution 129 | 130 | ```r 131 | arrange(babynames, desc(prop)) 132 | ``` 133 | 134 | ::: 135 | 136 | Now arrange `babynames` so that the row with the largest value of `n` appears at the top of the data frame. Will this be the same row? Why or why not? 137 | 138 | ::: {.panel-tabset} 139 | ## {{< fa code >}} Interactive editor 140 | 141 | ```{webr-r} 142 | 143 | 144 | 145 | ``` 146 | 147 | ## {{< fa circle-check >}} Solution 148 | 149 | ```r 150 | arrange(babynames, desc(n)) 151 | ``` 152 | 153 | ::: 154 | 155 | ### 156 | 157 | The number of children represented by each proportion grew over time as the population grew. 
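
To make that point concrete, here is a small sketch with invented numbers (the `toy_names` rows below are not taken from `babynames`). The earlier year has far fewer births overall, so a larger share of children can still correspond to a smaller count.

```r
library(dplyr)

# Made-up values: a bigger prop in 1880 but a bigger n in 1950
toy_names <- data.frame(
  year = c(1880, 1950),
  name = c("John", "James"),
  n    = c(9000, 86000),
  prop = c(0.08, 0.05)
)

# Sorting by proportion puts the 1880 row first
top_by_prop <- arrange(toy_names, desc(prop))

# Sorting by count puts the 1950 row first
top_by_n <- arrange(toy_names, desc(n))

# n / prop approximates the total number of births behind each row
approx_births <- toy_names$n / toy_names$prop
```

Under these assumed values the two orderings disagree because the denominator behind `prop` is much larger in the later year.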
158 | 159 | ## 160 | 161 | ```{r} 162 | #| echo: false 163 | #| results: asis 164 | create_buttons("05-pipe.html") 165 | ``` 166 | -------------------------------------------------------------------------------- /transform-data/02-isolating/05-pipe.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "`|>`" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | packages: 16 | - babynames 17 | - dplyr 18 | - ggplot2 19 | cell-options: 20 | editor-font-scale: 0.85 21 | fig-width: 6 22 | fig-height: 3.7 23 | out-width: "70%" 24 | --- 25 | 26 | ```{r include=FALSE} 27 | knitr::opts_chunk$set( 28 | fig.width = 6, 29 | fig.height = 6 * 0.618, 30 | fig.retina = 3, 31 | dev = "ragg_png", 32 | fig.align = "center", 33 | out.width = "70%" 34 | ) 35 | 36 | library(tidyverse) 37 | library(babynames) 38 | 39 | source(here::here("R", "helpers.R")) 40 | ``` 41 | 42 | ### Steps {.no-hide} 43 | 44 | Notice how each {dplyr} function takes a data frame as input and returns a data frame as output. This makes the functions easy to use in a step-by-step fashion. For example, you could: 45 | 46 | 1. Filter `babynames` to just boys born in 2017 47 | 2. Select the `name` and `n` columns from the result 48 | 3. Arrange those columns so that the most popular names appear near the top. 49 | 50 | ```{r} 51 | boys_2017 <- filter(babynames, year == 2017, sex == "M") 52 | boys_2017 <- select(boys_2017, name, n) 53 | boys_2017 <- arrange(boys_2017, desc(n)) 54 | boys_2017 55 | ``` 56 | 57 | ### Redundancy 58 | 59 | The result shows us the most popular boys names from 2017, which is the most recent year in the data set. But take a look at the code. Do you notice how we re-create `boys_2017` at each step so we will have something to pass to the next step? This is an inefficient way to write R code. 
60 | 61 | You could avoid creating `boys_2017` by nesting your functions inside of each other, but this creates code that is hard to read: 62 | 63 | ```{r eval=FALSE} 64 | arrange(select(filter(babynames, year == 2017, sex == "M"), name, n), desc(n)) 65 | ``` 66 | 67 | There is a third way to write sequences of functions: the pipe. 68 | 69 | ### |> 70 | 71 | The pipe operator `|>` performs an extremely simple task: it passes the result on its left into the first argument of the function on its right. Or put another way, `x |> f(y)` is the same as `f(x, y)`. This piece of code punctuation makes it easy to write and read series of functions that are applied in a step-by-step way. For example, we can use the pipe to rewrite our code above: 72 | 73 | ```{r} 74 | babynames |> 75 | filter(year == 2017, sex == "M") |> 76 | select(name, n) |> 77 | arrange(desc(n)) 78 | ``` 79 | 80 | As you read the code, pronounce `|>` as **"and then"**. You'll notice that {dplyr} makes it easy to read pipes. Each function name is a verb, so our code resembles the statement, "Take `babynames`, _and then_ filter it by `year` and `sex`, _and then_ select the `name` and `n` columns, _and then_ arrange the results by descending values of `n`." 81 | 82 | {dplyr} also makes it easy to write pipes. Each {dplyr} function returns a data frame that can be piped into another {dplyr} function, which will accept the data frame as its first argument. In fact, {dplyr} functions are written with pipes in mind: each function does one simple task. {dplyr} expects you to use pipes to combine these simple tasks to produce sophisticated results. 83 | 84 | ### Exercise: Pipes 85 | 86 | I'll use pipes for the remainder of the tutorial, and I will expect you to as well. Let's practice a little by writing a new pipe in the chunk below. The pipe should: 87 | 88 | 1. Filter `babynames` to just the *girls* that were born in 2017 89 | 2. Select the `name` and `n` columns 90 | 3.
Arrange the results so that the most popular names are near the top. 91 | 92 | Try to write your pipe without copying and pasting the code from above. 93 | 94 | ::: {.panel-tabset} 95 | ## {{< fa code >}} Interactive editor 96 | 97 | ```{webr-r} 98 | 99 | 100 | 101 | ``` 102 | 103 | ## {{< fa circle-check >}} Solution 104 | 105 | ```r 106 | babynames |> 107 | filter(year == 2017, sex == "F") |> 108 | select(name, n) |> 109 | arrange(desc(n)) 110 | ``` 111 | 112 | ::: 113 | 114 | ### Your name 115 | 116 | You've now mastered a set of skills that will let you easily plot the popularity of your name over time. In the code chunk below, use a combination of {dplyr} and {ggplot2} functions with `|>` to: 117 | 118 | 1. Trim `babynames` to just the rows that contain your name and your sex 119 | 2. Trim the result to just the columns that will appear in your graph (not strictly necessary, but useful practice) 120 | 3. Plot the results as a line graph with `year` on the x axis and `prop` on the y axis 121 | 122 | Note that the first argument of `ggplot()` takes a data frame, which means you can add `ggplot()` directly to the end of a pipe. However, you will need to switch from `|>` to `+` to finish adding layers to your plot. 123 | 124 | ::: {.panel-tabset} 125 | ## {{< fa code >}} Interactive editor 126 | 127 | ```{webr-r} 128 | 129 | 130 | 131 | ``` 132 | 133 | ## {{< fa circle-check >}} Solution 134 | 135 | ```r 136 | babynames |> 137 | filter(name == "Andrew", sex == "M") |> 138 | select(year, prop) |> 139 | ggplot() + 140 | geom_line(aes(x = year, y = prop)) + 141 | labs(title = "Popularity of the name Andrew") 142 | ``` 143 | 144 | ::: 145 | 146 | ### Recap 147 | 148 | Together, `select()`, `filter()`, and `arrange()` let you quickly find information displayed within your data. 149 | 150 | The next tutorial will show you how to derive information that is implied by your data, but not displayed within your data set. 
151 | 152 | In that tutorial, you will continue to use the `|>` operator, which is an essential part of programming with the {dplyr} package. 153 | 154 | Pipes help make R expressive, like a spoken language. Spoken languages consist of simple words that you combine into sentences to create sophisticated thoughts. 155 | 156 | In the tidyverse, functions are like words: each does one simple task well. You can combine these tasks into pipes with `|>` to perform complex, customized procedures. 157 | 158 | 159 | ## 160 | 161 | ```{r} 162 | #| echo: false 163 | #| results: asis 164 | create_buttons(NULL) 165 | ``` 166 | -------------------------------------------------------------------------------- /transform-data/02-isolating/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Isolating data with {dplyr}" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | knitr::opts_chunk$set( 14 | fig.width = 6, 15 | fig.height = 6 * 0.618, 16 | fig.retina = 3, 17 | dev = "ragg_png", 18 | fig.align = "center", 19 | out.width = "70%" 20 | ) 21 | 22 | source(here::here("R", "helpers.R")) 23 | ``` 24 | 25 | In this case study, you will explore the popularity of your own name over time. Along the way, you will master some of the most useful functions for isolating variables, cases, and values within a data frame: 26 | 27 | * `select()` and `filter()`, which let you extract columns and rows from a data frame 28 | * `arrange()`, which lets you reorder the rows in your data 29 | * `|>`, which organizes your code into reader-friendly "pipes" 30 | 31 | This tutorial uses the [core tidyverse packages](http://tidyverse.org/), including {ggplot2}, {tibble}, and {dplyr}, as well as the {babynames} package. All of these packages have been pre-installed and pre-loaded for your convenience.
32 | 33 | 34 | ## 35 | 36 | ```{r} 37 | #| echo: false 38 | #| results: asis 39 | create_buttons("01-your-name.html") 40 | ``` 41 | -------------------------------------------------------------------------------- /transform-data/03-deriving/01-most-popular-names.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "The most popular names" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | --- 12 | 13 | ```{r include=FALSE} 14 | knitr::opts_chunk$set( 15 | fig.width = 6, 16 | fig.height = 6 * 0.618, 17 | fig.retina = 3, 18 | dev = "ragg_png", 19 | fig.align = "center", 20 | out.width = "70%" 21 | ) 22 | 23 | library(tidyverse) 24 | library(babynames) 25 | library(checkdown) 26 | 27 | source(here::here("R", "helpers.R")) 28 | ``` 29 | 30 | ### What are the most popular names of all time? {.no-hide} 31 | 32 | Let's use `babynames` to answer a different question: what are the most popular names of all time? 33 | 34 | This question seems simple enough, but to answer it we need to be more precise: how do you define "the most popular" names? Try to think of several definitions and then click Continue. After the Continue button, I will suggest two definitions of my own. 35 | 36 | ### Two definitions of popular 37 | 38 | I suggest that we focus on two definitions of _popular_, one that uses sums and one that uses ranks: 39 | 40 | 1. **Sums** - A name is popular _if the total number of children that have the name is large when you sum across years_. 41 | 2. **Ranks** - A name is popular _if it consistently ranks among the top names from year to year_. 42 | 43 | This raises a question: 44 | 45 | ::: {.callout-note appearance="simple" icon=false .question} 46 | 47 | **Do we have enough information in `babynames` to compare the popularity of names?** 48 | 49 | ```{r predict, echo=FALSE} 50 | check_question( 51 | answer = "Yes. 
We can use the information in `babynames` to compute the values we want.", 52 | options = c( 53 | "No. No cell in `babynames` contains a rank value or a sum across years.", 54 | "Yes. We can use the information in `babynames` to compute the values we want." 55 | ), 56 | type = "radio", 57 | button_label = "Submit answer", 58 | q_id = 1, 59 | right = c("Correct!") 60 | ) 61 | ``` 62 | ::: 63 | 64 | ### Deriving information 65 | 66 | Every data frame that you meet implies more information than it displays. For example, `babynames` does not display the total number of children who had your name, but `babynames` certainly implies what that number is. To discover the number, you only need to do a calculation: 67 | 68 | ```{r} 69 | babynames |> 70 | filter(name == "Andrew", sex == "M") |> 71 | summarize(total = sum(n)) 72 | ``` 73 | 74 | ### Useful functions 75 | 76 | {dplyr} provides three functions that can help you reveal the information implied by your data: 77 | 78 | * `summarize()` 79 | * `group_by()` 80 | * `mutate()` 81 | 82 | Like `select()`, `filter()` and `arrange()`, these functions all take a data frame as their first argument and return a new data frame as their output, which makes them easy to use in pipes. 83 | 84 | Let's master each function and use them to analyze popularity as we go. 
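
The "data frame in, data frame out" pattern described above can be checked directly. The sketch below is only an illustration using the one function already shown, `summarize()`; the object names `andrew` and `totals` are made up for this example.

```r
library(dplyr)
library(babynames)

# Keep only the rows for boys named "Andrew" (the result is still a data frame)
andrew <- babynames |>
  filter(name == "Andrew", sex == "M")

# Piping into summarize() is the same as passing the data frame as the first argument
totals <- andrew |> summarize(total = sum(n))
same   <- identical(totals, summarize(andrew, total = sum(n)))

is.data.frame(andrew)
is.data.frame(totals)
same
```

Because every step returns a data frame, the output of one function can always become the first argument of the next, which is what makes pipes possible.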
85 | 86 | ## 87 | 88 | ```{r} 89 | #| echo: false 90 | #| results: asis 91 | create_buttons("02-summarize.html") 92 | ``` 93 | -------------------------------------------------------------------------------- /transform-data/03-deriving/02-summarize.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "`summarize()`" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | packages: 16 | - babynames 17 | - dplyr 18 | cell-options: 19 | editor-font-scale: 0.85 20 | fig-width: 6 21 | fig-height: 3.7 22 | out-width: "70%" 23 | --- 24 | 25 | ```{r include=FALSE} 26 | knitr::opts_chunk$set( 27 | fig.width = 6, 28 | fig.height = 6 * 0.618, 29 | fig.retina = 3, 30 | dev = "ragg_png", 31 | fig.align = "center", 32 | out.width = "70%" 33 | ) 34 | 35 | library(tidyverse) 36 | library(babynames) 37 | 38 | source(here::here("R", "helpers.R")) 39 | ``` 40 | 41 | `summarize()` takes a data frame and uses it to calculate a new data frame of summary statistics. 42 | 43 | ### Syntax 44 | 45 | To use `summarize()`, pass it a data frame and then one or more named arguments. Each named argument should be set to an R expression that generates a single value. `summarize()` will turn each named argument into a column in the new data frame. The name of each argument will become the column name, and the value returned by the argument will become the column contents. 46 | 47 | Importantly, the `summarize()` function is *destructive*. It collapses a dataset into a single row and throws away any columns that we don’t use when summarizing. 
Watch this little animation to see what it does: 48 | 49 | ```{=html} 50 | 53 | ``` 54 | 55 | ### Example 56 | 57 | I used `summarize()` earlier to calculate the total number of boys named "Andrew", but let's expand that code to also calculate 58 | 59 | * `max`: the maximum number of boys named "Andrew" in a single year 60 | * `mean`: the mean number of boys named "Andrew" per year 61 | 62 | ```{r} 63 | babynames |> 64 | filter(name == "Andrew", sex == "M") |> 65 | summarize(total = sum(n), max = max(n), mean = mean(n)) 66 | ``` 67 | 68 | Don't let the code above fool you. The first argument of `summarize()` is always a data frame, but when you use `summarize()` in a pipe, the first argument is provided by the pipe operator, `|>`. Here the first argument will be the data frame that is returned by `babynames |> filter(name == "Andrew", sex == "M")`. 69 | 70 | ### Exercise: `summarize()` 71 | 72 | Use the code chunk below to compute three statistics: 73 | 74 | 1. the total number of children who ever had your name 75 | 1. the maximum number of children given your name in a single year 76 | 1. the mean number of children given your name per year 77 | 78 | If you cannot think of an R function that would compute each statistic, click the Solution tab. 79 | 80 | ::: {.panel-tabset} 81 | ## {{< fa code >}} Interactive editor 82 | 83 | ```{webr-r} 84 | 85 | 86 | 87 | ``` 88 | 89 | ## {{< fa circle-check >}} Solution 90 | 91 | ```r 92 | babynames |> 93 | filter(name == "Andrew", sex == "M") |> 94 | summarize(total = sum(n), max = max(n), mean = mean(n)) 95 | ``` 96 | 97 | ::: 98 | 99 | 100 | ### Summary functions 101 | 102 | So far our `summarize()` examples have relied on `sum()`, `max()`, and `mean()`. But you can use any function in `summarize()` so long as it meets one criterion: the function must take a _vector_ of values as input and return a _single_ value as output. 
Functions that do this are known as **summary functions** and they are common in the field of descriptive statistics. Some of the most useful summary functions include: 103 | 104 | 1. **Measures of location**: `mean(x)`, `median(x)`, `quantile(x, 0.25)`, `min(x)`, and `max(x)` 105 | 1. **Measures of spread**: `sd(x)`, `var(x)`, `IQR(x)`, and `mad(x)` 106 | 1. **Measures of position**: `first(x)`, `nth(x, 2)`, and `last(x)` 107 | 1. **Counts**: `n_distinct(x)` and `n()`, which takes no arguments, and returns the size of the current group or data frame. 108 | 1. **Counts and proportions of logical values**: `sum(!is.na(x))`, which counts the number of `TRUE`s returned by a logical test; `mean(y == 0)`, which returns the proportion of `TRUE`s returned by a logical test. 109 | 110 | Let's apply some of these summary functions. Click Continue to test your understanding. 111 | 112 | ### Khaleesi challenge 113 | 114 | "Khaleesi" is a very modern name that appears to be based on the _Game of Thrones_ TV series, which premiered on April 17, 2011. In the chunk below, filter `babynames` to just the rows where `name == "Khaleesi"`. Then use `summarize()` and a summary function to return the first value of `year` in the data set. 115 | 116 | ::: {.panel-tabset} 117 | ## {{< fa code >}} Interactive editor 118 | 119 | ```{webr-r} 120 | 121 | 122 | 123 | ``` 124 | 125 | ## {{< fa circle-check >}} Solution 126 | 127 | ```r 128 | babynames |> 129 | filter(name == "Khaleesi") |> 130 | summarize(year = first(year)) 131 | ``` 132 | 133 | ::: 134 | 135 | 136 | ### Distinct name challenge 137 | 138 | In the chunk below, use `summarize()` and a summary function to return a data frame with two columns: 139 | 140 | * A column named `n` that displays the total number of rows in `babynames` 141 | * A column named `distinct` that displays the number of distinct names in `babynames` 142 | 143 | Will these numbers be different? Why or why not? 
144 | 145 | ::: {.panel-tabset} 146 | ## {{< fa code >}} Interactive editor 147 | 148 | ```{webr-r} 149 | 150 | 151 | 152 | ``` 153 | 154 | ## {{< fa circle-check >}} Solution 155 | 156 | ```r 157 | babynames |> 158 | summarize(n = n(), distinct = n_distinct(name)) 159 | ``` 160 | 161 | ::: 162 | 163 | ### 164 | 165 | Good job! The two numbers are different because most names appear in the data set more than once. They appear once for each year in which they were used. 166 | 167 | ### `summarize()` by groups? 168 | 169 | How can we apply `summarize()` to find the most popular names in `babynames`? You've seen how to calculate the total number of children that have your name, which provides one of our measures of popularity, i.e. the total number of children that have a name: 170 | 171 | ```{r eval=FALSE} 172 | babynames |> 173 | filter(name == "Andrew", sex == "M") |> 174 | summarize(total = sum(n)) 175 | ``` 176 | 177 | However, we had to isolate your name from the rest of your data to calculate this number. You could imagine writing a program that goes through each name one at a time and: 178 | 179 | 1. filters out the rows with just that name 180 | 2. applies summarize to the rows 181 | 182 | Eventually, the program could combine all of the results back into a single data set. However, you don't need to write such a program; this is the job of {dplyr}'s `group_by()` function. 
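
To make the hand-written approach described above concrete, here is a minimal sketch of such a program. The helper `total_for_name()` and the vector `some_names` are hypothetical names invented for this illustration; they are not part of {dplyr}. The sketch sums across both sexes and only loops over three names to keep it fast.

```r
library(dplyr)
library(babynames)

# Hypothetical helper: isolate one name, then collapse its rows into a single total
total_for_name <- function(data, one_name) {
  data |>
    filter(name == one_name) |>
    summarize(name = one_name, total = sum(n))
}

# Go through a few names one at a time...
some_names <- c("Andrew", "Mary", "Khaleesi")
pieces <- lapply(some_names, function(nm) total_for_name(babynames, nm))

# ...and combine the one-row results back into a single data frame
manual <- bind_rows(pieces)
manual
```

This works, but it repeats the filter step once per name and would be slow and clumsy for every name in `babynames`, which is the motivation for `group_by()`.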
183 | 184 | ## 185 | 186 | ```{r} 187 | #| echo: false 188 | #| results: asis 189 | create_buttons("03-group_by.html") 190 | ``` 191 | -------------------------------------------------------------------------------- /transform-data/03-deriving/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Deriving information with {dplyr}" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | knitr::opts_chunk$set( 14 | fig.width = 6, 15 | fig.height = 6 * 0.618, 16 | fig.retina = 3, 17 | dev = "ragg_png", 18 | fig.align = "center", 19 | out.width = "70%" 20 | ) 21 | 22 | source(here::here("R", "helpers.R")) 23 | ``` 24 | 25 | In this case study, you will identify the most popular American names from 1880 to 2017. While doing this, you will master three more {dplyr} functions: 26 | 27 | * `mutate()`, `group_by()`, and `summarize()`, which help you use your data to compute new variables and summary statistics 28 | 29 | These are some of the most useful R functions for data science, and this tutorial provides everything you need to learn them. 30 | 31 | This tutorial uses the [core tidyverse packages](http://tidyverse.org/), including {ggplot2}, {tibble}, and {dplyr}, as well as the {babynames} package. All of these packages have been pre-installed and pre-loaded for your convenience. 
32 | 33 | 34 | ## 35 | 36 | ```{r} 37 | #| echo: false 38 | #| results: asis 39 | create_buttons("01-most-popular-names.html") 40 | ``` 41 | -------------------------------------------------------------------------------- /transform-data/03-deriving/video/grp-mutate.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/transform-data/03-deriving/video/grp-mutate.mp4 -------------------------------------------------------------------------------- /transform-data/03-deriving/video/grp-summarize-00.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/transform-data/03-deriving/video/grp-summarize-00.mp4 -------------------------------------------------------------------------------- /transform-data/03-deriving/video/grp-summarize-01.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/transform-data/03-deriving/video/grp-summarize-01.mp4 -------------------------------------------------------------------------------- /transform-data/03-deriving/video/grp-summarize-02.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/transform-data/03-deriving/video/grp-summarize-02.mp4 -------------------------------------------------------------------------------- /transform-data/03-deriving/video/grp-summarize-03.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/transform-data/03-deriving/video/grp-summarize-03.mp4 
-------------------------------------------------------------------------------- /transform-data/03-deriving/video/mutate.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/transform-data/03-deriving/video/mutate.mp4 -------------------------------------------------------------------------------- /visualize-data/01-eda/03-covariation.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Covariation" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | --- 12 | 13 | ```{r include=FALSE} 14 | knitr::opts_chunk$set( 15 | fig.width = 6, 16 | fig.height = 6 * 0.618, 17 | fig.retina = 3, 18 | dev = "ragg_png", 19 | fig.align = "center", 20 | out.width = "70%" 21 | ) 22 | 23 | library(tidyverse) 24 | library(checkdown) 25 | 26 | source(here::here("R", "helpers.R")) 27 | ``` 28 | 29 | 30 | ### What is covariation? {.no-hide} 31 | 32 | If variation describes the behavior _within_ a variable, covariation describes the behavior _between_ variables. **Covariation** is the tendency for the values of two or more variables to vary together in a related way. The best way to spot covariation is to visualise the relationship between two or more variables. How you do that should again depend on whether your variables are categorical or continuous. 33 | 34 | ### Two categorical variables 35 | 36 | You can plot the relationship between two categorical variables with a heatmap or with `geom_count()`: 37 | 38 | ```{r echo=FALSE, out.width="100%"} 39 | #| layout-ncol: 2 40 | diamonds |> 41 | count(color, cut) |> 42 | ggplot(mapping = aes(x = color, y = cut)) + 43 | geom_tile(mapping = aes(fill = n)) + 44 | labs(title = "Color grade vs. 
cut quality for 53940 diamonds") 45 | 46 | ggplot(diamonds) + 47 | geom_count(aes(color, cut)) + 48 | labs(title = "Color grade vs. cut quality for 53940 diamonds") 49 | ``` 50 | 51 | Again, don't be concerned if you do not know how to make these graphs. For now, let's focus on the strategy of how to use visualizations in EDA. You'll learn how to make different types of plots in the tutorials that follow. 52 | 53 | ### One continuous and one categorical variable 54 | 55 | You can plot the relationship between one continuous and one categorical variable with a boxplot: 56 | 57 | ```{r echo=FALSE, out.width="80%"} 58 | ggplot(mpg) + 59 | geom_boxplot(aes(reorder(class, hwy, median), hwy)) + 60 | labs(title = "Pickup trucks and SUVs display the lowest fuel efficiency") + 61 | labs(x = "class") 62 | ``` 63 | 64 | ### Two continuous variables 65 | 66 | You can plot the relationship between two continuous variables with a scatterplot: 67 | 68 | ```{r echo=FALSE, message=FALSE, out.width="80%"} 69 | ggplot(data = faithful) + 70 | geom_point(aes(x = eruptions, y = waiting)) + 71 | labs(title = "Length of eruption vs wait time before eruption") 72 | ``` 73 | 74 | ### Patterns 75 | 76 | Patterns in your data provide clues about relationships. If a systematic relationship exists between two variables it will appear as a pattern in the data. If you spot a pattern, ask yourself: 77 | 78 | + Could this pattern be due to coincidence (i.e. random chance)? 79 | 80 | + How can you describe the relationship implied by the pattern? 81 | 82 | + How strong is the relationship implied by the pattern? 83 | 84 | + What other variables might affect the relationship? 85 | 86 | + Does the relationship change if you look at individual subgroups of the data? 87 | 88 | Remember that clusters and outliers are also a type of pattern. Two dimensional plots can reveal clusters and outliers that would not be visible in a one dimensional plot. If you spot either, ask yourself what they imply. 
89 | 90 | ### Review 6: Patterns 91 | 92 | The scatterplot below shows the relationship between the length of an eruption of Old Faithful and the wait time before the eruption (i.e. the amount of time that passed between it and the previous eruption). 93 | 94 | ```{r echo=FALSE, message=FALSE, out.width="80%"} 95 | ggplot(data = faithful) + 96 | geom_point(aes(x = eruptions, y = waiting)) + 97 | labs(title = "Length of eruption vs wait time before eruption") 98 | ``` 99 | 100 | ::: {.callout-note appearance="simple" icon=false .question} 101 | 102 | **Does the scatterplot above reveal a pattern that helps to explain the variation in lengths of Old Faithful eruptions?** 103 | 104 | ```{r echo=FALSE} 105 | check_question( 106 | answer = "Yes. Long eruptions are associated with a _long_ wait before the eruption", 107 | options = c( 108 | "No. There is no pattern.", 109 | "Yes. Long eruptions are associated with a _short_ wait before the eruption", 110 | "Yes. Long eruptions are associated with a _long_ wait before the eruption" 111 | ), 112 | type = "radio", 113 | button_label = "Submit answer", 114 | q_id = 1, 115 | right = c("Correct! The data seems to suggest that a long build up before an eruption is associated with a long eruption. The plot also shows the two clusters that we saw before: there are long eruptions with a long build up and short eruptions with a short build up.") 116 | ) 117 | ``` 118 | ::: 119 | 120 | 121 | ### Uncertainty 122 | 123 | Patterns provide a useful tool for data scientists because they reveal covariation. If you think of variation as a phenomenon that creates uncertainty, covariation is a phenomenon that reduces it. When two variables covary, you can use the values of one variable to make better predictions about the values of the second. If the covariation is due to a causal relationship (a special case), you can use the value of one variable to control the value of the second. 
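
One rough way to see covariation reducing uncertainty is to compare how spread out a variable is before and after you account for a second variable. The sketch below uses the `faithful` data from the plots above and assumes, purely for illustration, that a straight line is a reasonable description of the relationship; the object names are made up for this example.

```r
# `faithful` ships with base R, so no packages are needed here

# Spread of waiting times when we ignore eruption length
sd_overall <- sd(faithful$waiting)

# Fit a straight line (an illustrative assumption) and measure the leftover spread
fit <- lm(waiting ~ eruptions, data = faithful)
sd_residual <- sd(residuals(fit))

c(overall = sd_overall, after_using_eruptions = sd_residual)
```

The second number is smaller than the first: once you know the eruption length, your guesses about the waiting time miss by less on average. This shows prediction only; it says nothing about whether one variable causes the other.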
124 | 125 | 126 | ### Recap 127 | 128 | You've learned a lot in this tutorial. Here's what you should keep with you: 129 | 130 | * EDA is an iterative cycle built around asking and refining questions. 131 | * These two questions are always useful: 132 | 1. What type of variation occurs _within_ my variables? 133 | 1. What type of covariation occurs _between_ my variables? 134 | * Remember the definitions of _variables_, _values_, _observations_, _variation_, _covariation_, _categorical_, and _continuous_. You'll see them again. Frequently. 135 | 136 | Throughout the tutorial, you also encountered several recommendations for plots that visualize variation and covariation for categorical and continuous variables. Plots are a bit like questions in EDA: you should make many quickly and try anything that strikes your fancy. You can refine your plots later to share with others. A lot of refinement will occur naturally as you iterate during EDA. 137 | 138 | The suggestions below can serve as a starting point for visualizing data. In the tutorials that follow, you will learn how to make each type of plot, as well as how to use best practices and advanced skills when visualizing data. 
139 | 140 | ![](img/plots-table.png){width=80%} 141 | 142 | ## 143 | 144 | ```{r} 145 | #| echo: false 146 | #| results: asis 147 | create_buttons(NULL) 148 | ``` 149 | -------------------------------------------------------------------------------- /visualize-data/01-eda/img/plots-table.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/visualize-data/01-eda/img/plots-table.png -------------------------------------------------------------------------------- /visualize-data/01-eda/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Exploratory data analysis" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | knitr::opts_chunk$set( 14 | fig.width = 6, 15 | fig.height = 6 * 0.618, 16 | fig.retina = 3, 17 | dev = "ragg_png", 18 | fig.align = "center", 19 | out.width = "70%" 20 | ) 21 | 22 | source(here::here("R", "helpers.R")) 23 | ``` 24 | 25 | 26 | This tutorial will show you how to explore your data in a systematic way, a task that statisticians call **exploratory data analysis**, or **EDA** for short. In the tutorial you will: 27 | 28 | * Learn a strategy for exploring data 29 | * Practice finding patterns in data 30 | * Get tips about how to use different types of plots to explore data 31 | 32 | The tutorial is excerpted from _R for Data Science_ by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do). 
33 | 34 | 35 | ## 36 | 37 | ```{r} 38 | #| echo: false 39 | #| results: asis 40 | create_buttons("01-eda.html") 41 | ``` 42 | -------------------------------------------------------------------------------- /visualize-data/02-bar-charts/01-bar-charts.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Bar charts" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | 12 | engine: knitr 13 | filters: 14 | - webr 15 | webr: 16 | packages: 17 | - ggplot2 18 | - dplyr 19 | cell-options: 20 | editor-font-scale: 0.85 21 | fig-width: 6 22 | fig-height: 3.7 23 | out-width: "70%" 24 | --- 25 | 26 | ```{r include=FALSE} 27 | knitr::opts_chunk$set( 28 | fig.width = 6, 29 | fig.height = 6 * 0.618, 30 | fig.retina = 3, 31 | dev = "ragg_png", 32 | fig.align = "center", 33 | out.width = "70%" 34 | ) 35 | 36 | library(tidyverse) 37 | library(checkdown) 38 | 39 | source(here::here("R", "helpers.R")) 40 | ``` 41 | 42 | ### How to make a bar chart {.no-hide} 43 | 44 | To make a bar chart with {ggplot2}, add `geom_bar()` to the [ggplot2 template](/basics/01-visualization-basics/01-code-template.qmd). For example, the code below plots a bar chart of the `cut` variable in the `diamonds` dataset, which comes with {ggplot2}. 45 | 46 | ```{r out.width="80%"} 47 | ggplot(data = diamonds) + 48 | geom_bar(mapping = aes(x = cut)) 49 | ``` 50 | 51 | ### The y axis 52 | 53 | You should not supply a $y$ aesthetic when you use `geom_bar()`; {ggplot2} will count how many times each $x$ value appears in the data, and then display the counts on the $y$ axis. So, for example, the plot above shows that over 20,000 diamonds in the data set had a value of `Ideal`. 54 | 55 | You can compute this information manually with the `count()` function from the {dplyr} package. 
56 | 57 | ```{r} 58 | diamonds |> 59 | count(cut) 60 | ``` 61 | 62 | ### `geom_col()` 63 | 64 | Sometimes, you may want to map the heights of the bars not to counts, but to a variable in the data set. To do this, use `geom_col()`, which is short for column. 65 | 66 | ```{r out.width="80%"} 67 | ggplot(data = pressure) + 68 | geom_col(mapping = aes(x = temperature, y = pressure)) 69 | ``` 70 | 71 | ### `geom_col()` data 72 | 73 | When you use `geom_col()`, your $x$ and $y$ values should have a one to one relationship, as they do in the `pressure` data set (i.e. each value of `temperature` is paired with a single value of `pressure`). 74 | 75 | ```{r} 76 | pressure 77 | ``` 78 | 79 | ### Exercise 1: Make a bar chart 80 | 81 | Use the code chunk below to plot the distribution of the `color` variable in the `diamonds` data set, which comes in the {ggplot2} package. 82 | 83 | ::: {.panel-tabset} 84 | ## {{< fa code >}} Interactive editor 85 | 86 | ```{webr-r} 87 | 88 | 89 | 90 | ``` 91 | 92 | ## {{< fa circle-check >}} Solution 93 | 94 | ```r 95 | ggplot(data = diamonds) + 96 | geom_bar(mapping = aes(x = color)) 97 | ``` 98 | 99 | ::: 100 | 101 | 102 | ### Exercise 2: Interpretation 103 | 104 | ```{r out.width="80%", echo=FALSE} 105 | ggplot(data = diamonds) + 106 | geom_bar(mapping = aes(x = cut)) + 107 | labs(title = "Distribution of diamond cuts") 108 | ``` 109 | 110 | ::: {.callout-note appearance="simple" icon=false .question} 111 | 112 | **What is the most common type of cut in the `diamonds` dataset?** 113 | 114 | ```{r echo=FALSE} 115 | check_question( 116 | answer = "Ideal", 117 | options = c( 118 | "Fair", 119 | "Good", 120 | "Very Good", 121 | "Premium", 122 | "Ideal" 123 | ), 124 | type = "radio", 125 | button_label = "Submit answer", 126 | q_id = 1, 127 | right = c("Correct!") 128 | ) 129 | ``` 130 | ::: 131 | 132 | ::: {.callout-note appearance="simple" icon=false .question} 133 | 134 | **How many diamonds in the dataset had a `Good` cut?** 135 | 136 | 
```{r echo=FALSE} 137 | check_question( 138 | answer = "≈5000", 139 | options = c( 140 | "≈2000", 141 | "≈5000", 142 | "≈7000", 143 | "≈20000" 144 | ), 145 | type = "radio", 146 | button_label = "Submit answer", 147 | q_id = 2, 148 | right = c("Correct!") 149 | ) 150 | ``` 151 | ::: 152 | 153 | 154 | ### Exercise 3: What went wrong? 155 | 156 | Diagnose the error below and then fix the code chunk to make a plot. 157 | 158 | ::: {.panel-tabset} 159 | ## {{< fa code >}} Interactive editor 160 | 161 | ```{webr-r} 162 | ggplot(data = pressure) + 163 | geom_bar(mapping = aes(x = temperature, y = pressure)) 164 | 165 | 166 | ``` 167 | 168 | ## {{< fa circle-check >}} Solution 169 | 170 | ```r 171 | ggplot(data = pressure) + 172 | geom_col(mapping = aes(x = temperature, y = pressure)) 173 | ``` 174 | 175 | ::: 176 | 177 | 178 | ### Exercise 4: `count()` and `geom_col()` 179 | 180 | Recreate the bar graph of `color` from exercise one, but this time first use `count()` to manually compute the heights of the bars. Then use `geom_col()` to plot the results as a bar graph. Does your graph look the same as in exercise one? 
181 | 182 | ::: {.panel-tabset} 183 | ## {{< fa code >}} Interactive editor 184 | 185 | ```{webr-r} 186 | 187 | 188 | 189 | ``` 190 | 191 | ## {{< fa circle-check >}} Solution 192 | 193 | ```r 194 | diamonds |> 195 | count(color) |> 196 | ggplot() + 197 | geom_col(mapping = aes(x = color, y = n)) 198 | ``` 199 | 200 | ::: 201 | 202 | 203 | ## 204 | 205 | ```{r} 206 | #| echo: false 207 | #| results: asis 208 | create_buttons("02-aesthetics.html") 209 | ``` 210 | -------------------------------------------------------------------------------- /visualize-data/02-bar-charts/02-aesthetics.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Aesthetics" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | 12 | engine: knitr 13 | filters: 14 | - webr 15 | webr: 16 | packages: 17 | - ggplot2 18 | - dplyr 19 | cell-options: 20 | editor-font-scale: 0.85 21 | fig-width: 6 22 | fig-height: 3.7 23 | out-width: "70%" 24 | --- 25 | 26 | ```{r include=FALSE} 27 | knitr::opts_chunk$set( 28 | fig.width = 6, 29 | fig.height = 6 * 0.618, 30 | fig.retina = 3, 31 | dev = "ragg_png", 32 | fig.align = "center", 33 | out.width = "70%" 34 | ) 35 | 36 | library(tidyverse) 37 | library(checkdown) 38 | 39 | source(here::here("R", "helpers.R")) 40 | ``` 41 | 42 | 43 | ### Aesthetics for bars {.no-hide} 44 | 45 | `geom_bar()` and `geom_col()` can use several aesthetics: 46 | 47 | * `alpha` 48 | * `color` 49 | * `fill` 50 | * `linetype` 51 | * `size` 52 | 53 | One of these, `color`, creates the most surprising results. Predict what the code below will return and then run it. 
54 | 55 | ::: {.panel-tabset} 56 | ## {{< fa code >}} Interactive editor 57 | 58 | ```{webr-r} 59 | ggplot(data = diamonds) + 60 | geom_bar(mapping = aes(x = cut, color = cut)) 61 | 62 | 63 | ``` 64 | 65 | ::: 66 | 67 | ### `fill` 68 | 69 | The `color` aesthetic controls the outline of each bar in your bar plot, which may not be what you want. To color the interior of each bar, use the `fill` aesthetic: 70 | 71 | ```{r echo=FALSE, out.width="100%"} 72 | #| layout-ncol: 2 73 | ggplot(data = diamonds) + 74 | geom_bar(mapping = aes(x = cut, color = cut), linewidth = 1) + 75 | labs(title = "color = cut") 76 | 77 | ggplot(data = diamonds) + 78 | geom_bar(mapping = aes(x = cut, fill = cut)) + 79 | labs(title = "fill = cut") 80 | ``` 81 | 82 | Use the code chunk below to experiment with fill, along with other `geom_bar()` aesthetics, like `alpha`, `linetype`, and `size`. 83 | 84 | ::: {.panel-tabset} 85 | ## {{< fa code >}} Interactive editor 86 | 87 | ```{webr-r} 88 | ggplot(data = diamonds) + 89 | geom_bar(mapping = aes(x = cut, color = cut)) 90 | 91 | 92 | ``` 93 | 94 | ::: 95 | 96 | 97 | ### Width 98 | 99 | You can control the width of each bar in your bar chart with the `width` parameter. In the chunk below, set `width = 1`, then `width = 0.5`. Can you spot the difference? 100 | 101 | ::: {.panel-tabset} 102 | ## {{< fa code >}} Interactive editor 103 | 104 | ```{webr-r} 105 | ggplot(data = diamonds) + 106 | geom_bar(mapping = aes(x = cut, fill = cut), width = 0.9) 107 | 108 | 109 | ``` 110 | 111 | ::: 112 | 113 | Notice that width is a _parameter_, not an aesthetic mapping. Hence, you should set width _outside_ of the `aes()` function. 114 | 115 | ### Exercise 5: Aesthetics 116 | 117 | Create a colored bar chart of the `class` variable from the `mpg` data set, which comes with ggplot2. Map the interior color of each bar to `class`. 
118 | 119 | ::: {.panel-tabset} 120 | ## {{< fa code >}} Interactive editor 121 | 122 | ```{webr-r} 123 | 124 | 125 | 126 | ``` 127 | 128 | ## {{< fa circle-check >}} Solution 129 | 130 | ```r 131 | ggplot(data = mpg) + 132 | geom_bar(mapping = aes(x = class, fill = class)) 133 | ``` 134 | 135 | ::: 136 | 137 | 138 | ## 139 | 140 | ```{r} 141 | #| echo: false 142 | #| results: asis 143 | create_buttons("03-position-adjustments.html") 144 | ``` 145 | -------------------------------------------------------------------------------- /visualize-data/02-bar-charts/04-facets.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Facets" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | 12 | engine: knitr 13 | filters: 14 | - webr 15 | webr: 16 | packages: 17 | - ggplot2 18 | cell-options: 19 | editor-font-scale: 0.85 20 | fig-width: 6 21 | fig-height: 3.7 22 | out-width: "70%" 23 | --- 24 | 25 | ```{r include=FALSE} 26 | knitr::opts_chunk$set( 27 | fig.width = 6, 28 | fig.height = 6 * 0.618, 29 | fig.retina = 3, 30 | dev = "ragg_png", 31 | fig.align = "center", 32 | out.width = "70%" 33 | ) 34 | 35 | library(tidyverse) 36 | library(checkdown) 37 | 38 | source(here::here("R", "helpers.R")) 39 | ``` 40 | 41 | ### Facetting {.no-hide} 42 | 43 | You can more easily compare subgroups of data if you place each subgroup in its own subplot, a process known as **facetting.** 44 | 45 | ```{r echo=FALSE} 46 | ggplot(data = diamonds) + 47 | geom_bar(mapping = aes(x = color, fill = cut)) + 48 | facet_wrap(vars(cut)) 49 | ``` 50 | 51 | ### `facet_grid()` 52 | 53 | {ggplot2} provides two functions for facetting. `facet_grid()` divides the plot into a grid of subplots based on the values of one or two facetting variables. To use it, add `facet_grid()` to the end of your plot call. 54 | 55 | The code chunks below show three ways to facet with `facet_grid()`. 
Spot the differences between the chunks, then run the code to learn what the differences do. 56 | 57 | ::: {.panel-tabset} 58 | ## {{< fa code >}} Interactive editor 59 | 60 | ```{webr-r} 61 | ggplot(data = diamonds) + 62 | geom_bar(mapping = aes(x = color)) + 63 | facet_grid(rows = vars(clarity), cols = vars(cut)) 64 | 65 | 66 | ``` 67 | 68 | ::: 69 | 70 | ::: {.panel-tabset} 71 | ## {{< fa code >}} Interactive editor 72 | 73 | ```{webr-r} 74 | ggplot(data = diamonds) + 75 | geom_bar(mapping = aes(x = color)) + 76 | facet_grid(cols = vars(cut)) 77 | 78 | 79 | ``` 80 | 81 | ::: 82 | 83 | ::: {.panel-tabset} 84 | ## {{< fa code >}} Interactive editor 85 | 86 | ```{webr-r} 87 | ggplot(data = diamonds) + 88 | geom_bar(mapping = aes(x = color)) + 89 | facet_grid(rows = vars(clarity)) 90 | 91 | 92 | ``` 93 | 94 | ::: 95 | 96 | 97 | 98 | ### `facet_grid()` recap 99 | 100 | As you saw in the code examples, you use `facet_grid()` by passing a `rows` and/or a `cols` argument, with the names of the variables inside a `vars()` function. 101 | 102 | * `facet_grid()` will split the plot into facets vertically by the values of the `rows` variable: each facet will contain the observations that have a common value of the variable. 103 | * `facet_grid()` will split the plot horizontally by values of the `cols` variable. The result is a grid of facets, where each specific subplot shows a specific combination of values. 104 | 105 | 106 | ### `facet_wrap()` 107 | 108 | `facet_wrap()` provides a more relaxed way to facet a plot on a _single_ variable. It will split the plot into subplots and then reorganize the subplots into multiple rows so that each plot has a more or less square aspect ratio. In short, `facet_wrap()` _wraps_ the single row of subplots that you would get with `facet_grid()` into multiple rows. 109 | 110 | To use `facet_wrap()` pass it a variable name inside `vars()`, e.g. `facet_wrap(vars(color))`. 
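As a minimal sketch of the `facet_wrap(vars(color))` call mentioned above: the choice of `cut` for the $x$ axis, the object names, and the `nrow = 2` setting are illustrative assumptions, not part of the tutorial's exercise.

```r
library(ggplot2)

# Base bar chart of `cut`; the diamonds data set ships with ggplot2.
base_plot <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# One subplot per value of `color`, wrapped into several rows.
p_wrapped <- base_plot + facet_wrap(vars(color))

# nrow (or ncol) is an optional facet_wrap() argument that fixes the layout.
p_two_rows <- base_plot + facet_wrap(vars(color), nrow = 2)

p_wrapped
p_two_rows
```

Because `color` has seven levels, the default layout spreads the seven subplots over more than one row, while `nrow = 2` forces exactly two.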
111 | 112 | Add `facet_wrap()` to the code below to create the graph that appeared at the start of this section. Facet by `cut`. 113 | 114 | ::: {.panel-tabset} 115 | ## {{< fa code >}} Interactive editor 116 | 117 | ```{webr-r} 118 | ggplot(data = diamonds) + 119 | geom_bar(mapping = aes(x = color, fill = cut)) 120 | 121 | 122 | ``` 123 | 124 | ## {{< fa circle-check >}} Solution 125 | 126 | ```r 127 | ggplot(data = diamonds) + 128 | geom_bar(mapping = aes(x = color, fill = cut)) + 129 | facet_wrap(vars(cut)) 130 | ``` 131 | 132 | ::: 133 | 134 | 135 | ### `scales` 136 | 137 | By default, each facet in your plot will share the same $x$ and $y$ ranges. You can change this by adding a `scales` argument to `facet_wrap()` or `facet_grid()`. 138 | 139 | * `scales = "free"` will let the $x$ and $y$ range of each facet vary 140 | * `scales = "free_x"` will let the $x$ range of each facet vary, but not the $y$ range 141 | * `scales = "free_y"` will let the $y$ range of each facet vary, but not the $x$ range. This is a convenient way to compare the shapes of different distributions 142 | 143 | Try changing the `scales` argument from `free` to `free_x` to `free_y` to see how it works: 144 | 145 | ::: {.panel-tabset} 146 | ## {{< fa code >}} Interactive editor 147 | 148 | ```{webr-r} 149 | ggplot(data = diamonds) + 150 | geom_bar(mapping = aes(x = color, fill = cut)) + 151 | facet_wrap(vars(cut), scales = "free") 152 | 153 | 154 | ``` 155 | 156 | ::: 157 | 158 | 159 | 160 | ### Recap 161 | 162 | In this tutorial, you learned how to make bar charts; but much of what you learned applies to other types of charts as well. Here's what you should know: 163 | 164 | * Bar charts are the basis for histograms, which means that you can interpret histograms in a similar way. 165 | * Bars are not the only geom in {ggplot2} that use the fill aesthetic. You can use both fill and color aesthetics with any geom that has an "interior" region. 
166 | * You can use the same position adjustments with any {ggplot2} geom: `"identity"`, `"stack"`, `"dodge"`, `"fill"`, `"nudge"`, and `"jitter"` (we'll learn about `"nudge"` and `"jitter"` later). Each geom comes with its own sensible default. 167 | * You can facet any {ggplot2} plot by adding `facet_grid()` or `facet_wrap()` to the plot call. 168 | 169 | Bar charts are an excellent way to display the distribution of a categorical variable. In the next tutorial, we'll meet a set of geoms that display the distribution of a continuous variable. 170 | 171 | 172 | ## 173 | 174 | ```{r} 175 | #| echo: false 176 | #| results: asis 177 | create_buttons(NULL) 178 | ``` 179 | -------------------------------------------------------------------------------- /visualize-data/02-bar-charts/img/positions.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/visualize-data/02-bar-charts/img/positions.png -------------------------------------------------------------------------------- /visualize-data/02-bar-charts/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Bar charts" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | knitr::opts_chunk$set( 14 | fig.width = 6, 15 | fig.height = 6 * 0.618, 16 | fig.retina = 3, 17 | dev = "ragg_png", 18 | fig.align = "center", 19 | out.width = "70%" 20 | ) 21 | 22 | source(here::here("R", "helpers.R")) 23 | ``` 24 | 25 | 26 | This tutorial will show you how to make and enhance **bar charts** with the {ggplot2} package. 
You will learn how to: 27 | 28 | * make and interpret bar charts 29 | * customize bar charts with **aesthetics** and **parameters** 30 | * use **position adjustments** 31 | * use **facets** to create subplots 32 | 33 | The tutorial is adapted from [_R for Data Science_](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do). 34 | 35 | The tutorial uses the {ggplot2} and {dplyr} packages, which have been pre-loaded for your convenience. 36 | 37 | 38 | ## 39 | 40 | ```{r} 41 | #| echo: false 42 | #| results: asis 43 | create_buttons("01-bar-charts.html") 44 | ``` 45 | -------------------------------------------------------------------------------- /visualize-data/03-histograms/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Histograms" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | knitr::opts_chunk$set( 14 | fig.width = 6, 15 | fig.height = 6 * 0.618, 16 | fig.retina = 3, 17 | dev = "ragg_png", 18 | fig.align = "center", 19 | out.width = "70%" 20 | ) 21 | 22 | source(here::here("R", "helpers.R")) 23 | ``` 24 | 25 | **Histograms** are the most popular way to visualize continuous distributions. Here we will look at them and their derivatives. You will learn how to: 26 | 27 | * Make and interpret histograms 28 | * Adjust the **binwidth** of a histogram to reveal new information 29 | * Use geoms that are similar to histograms, such as __dotplots__, __frequency polygons__, and __densities__ 30 | 31 | The tutorial is adapted from [_R for Data Science_](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. 
You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do). 32 | 33 | The tutorial uses the {ggplot2} and {dplyr} packages, which have been pre-loaded for your convenience. 34 | 35 | 36 | ## 37 | 38 | ```{r} 39 | #| echo: false 40 | #| results: asis 41 | create_buttons("01-histograms.html") 42 | ``` 43 | -------------------------------------------------------------------------------- /visualize-data/04-boxplots/02-similar-geoms.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Similar geoms" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | packages: 16 | - ggplot2 17 | cell-options: 18 | editor-font-scale: 0.85 19 | fig-width: 6 20 | fig-height: 3.7 21 | out-width: "70%" 22 | --- 23 | 24 | ```{r include=FALSE} 25 | knitr::opts_chunk$set( 26 | fig.width = 6, 27 | fig.height = 6 * 0.618, 28 | fig.retina = 3, 29 | dev = "ragg_png", 30 | fig.align = "center", 31 | out.width = "70%" 32 | ) 33 | 34 | library(tidyverse) 35 | 36 | source(here::here("R", "helpers.R")) 37 | ``` 38 | 39 | ### `geom_dotplot()` {.no-hide} 40 | 41 | Boxplots provide a quick way to represent a distribution, but they leave behind a lot of information. {ggplot2} supplements boxplots with two geoms that show more information. 42 | 43 | The first is `geom_dotplot()`. If you set the `binaxis` parameter of `geom_dotplot()` to `"y"`, `geom_dotplot()` behaves like `geom_boxplot()`, displaying a separate distribution for each group of data. 44 | 45 | Here each group functions like a vertical histogram. Add the parameter `stackdir = "center"` then re-run the code. Can you interpret the results?
46 | 47 | ::: {.panel-tabset} 48 | ## {{< fa code >}} Interactive editor 49 | 50 | ```{webr-r} 51 | ggplot(data = mpg) + 52 | geom_dotplot(mapping = aes(x = class, y = hwy), binaxis = "y", 53 | dotsize = 0.5, binwidth = 1) 54 | 55 | 56 | ``` 57 | 58 | 59 | ## {{< fa circle-check >}} Solution 60 | 61 | ```r 62 | ggplot(data = mpg) + 63 | geom_dotplot(mapping = aes(x = class, y = hwy), binaxis = "y", 64 | dotsize = 0.5, binwidth = 1, stackdir = "center") 65 | ``` 66 | 67 | ::: 68 | 69 | ### 70 | 71 | Good job! When you set `stackdir = "center"`, `geom_dotplot()` arranges each row of dots symmetrically around the $x$ value. This layout will help you understand the next geom. 72 | 73 | As in the histogram tutorial, it takes a lot of tweaking to make a dotplot look right. As a result, I tend to only use them when I want to make a point. 74 | 75 | 76 | ### `geom_violin()` 77 | 78 | `geom_violin()` provides a second alternative to `geom_boxplot()`. A violin plot uses densities to draw a smoothed version of the centered dotplot you just made. 79 | 80 | You can think of a violin plot as an outline drawn around the edges of a centered dotplot. Each "violin" spans the range of the data. The violin is thick where there are many values, and thin where there are few. 81 | 82 | Convert the plot below from a boxplot to a violin plot. Note that violin plots do not use the parameters you saw for dotplots. 83 | 84 | ::: {.panel-tabset} 85 | ## {{< fa code >}} Interactive editor 86 | 87 | ```{webr-r} 88 | ggplot(data = mpg) + 89 | geom_boxplot(mapping = aes(x = class, y = hwy)) 90 | 91 | 92 | ``` 93 | 94 | ## {{< fa circle-check >}} Solution 95 | 96 | ```r 97 | ggplot(data = mpg) + 98 | geom_violin(mapping = aes(x = class, y = hwy)) 99 | ``` 100 | 101 | ::: 102 | 103 | ### 104 | 105 | Good job! Another way to interpret a violin plot is to mentally "push" the width of each violin all to one side (so the other side is a straight line). The result would be a density (e.g.
`geom_density()`) turned on its side for each distribution. 106 | 107 | ### Exercise 7: Violin plots 108 | 109 | You can further enhance violin plots by adding the parameter `draw_quantiles = c(0.25, 0.5, 0.75)`. This will cause ggplot2 to draw horizontal lines across the violins at the 25th, 50th, and 75th percentiles. These are the same three horizontal lines that are displayed in a boxplot (the 25th and 75th percentiles are the bounds of the box, the 50th percentile is the median). 110 | 111 | Add these lines to the violin plot below. 112 | 113 | ::: {.panel-tabset} 114 | ## {{< fa code >}} Interactive editor 115 | 116 | ```{webr-r} 117 | ggplot(data = mpg) + 118 | geom_violin(mapping = aes(x = class, y = hwy)) 119 | 120 | 121 | ``` 122 | 123 | ## {{< fa circle-check >}} Solution 124 | 125 | ```r 126 | ggplot(data = mpg) + 127 | geom_violin(mapping = aes(x = class, y = hwy), draw_quantiles = c(0.25, 0.5, 0.75)) 128 | ``` 129 | 130 | ::: 131 | 132 | ### 133 | 134 | Good job! Can you predict how you would use `draw_quantiles` to draw a horizontal line at a different percentile, like the 60th percentile?
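One possible answer, sketched under the assumption that your installed version of ggplot2 still accepts `draw_quantiles` the way this tutorial uses it: pass a single value between 0 and 1 instead of a vector. The object name `p_q60` is only illustrative.

```r
library(ggplot2)

# A single quantile works the same way as a vector of quantiles:
# 0.6 requests one horizontal line at the 60th percentile of each violin.
p_q60 <- ggplot(data = mpg) +
  geom_violin(mapping = aes(x = class, y = hwy), draw_quantiles = 0.6)

p_q60
```

To draw several custom lines at once, supply them together, e.g. `draw_quantiles = c(0.1, 0.6, 0.9)`.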
135 | 136 | 137 | ## 138 | 139 | ```{r} 140 | #| echo: false 141 | #| results: asis 142 | create_buttons("03-counts.html") 143 | ``` 144 | -------------------------------------------------------------------------------- /visualize-data/04-boxplots/03-counts.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Counts" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | packages: 16 | - ggplot2 17 | - dplyr 18 | cell-options: 19 | editor-font-scale: 0.85 20 | fig-width: 6 21 | fig-height: 3.7 22 | out-width: "70%" 23 | --- 24 | 25 | ```{r include=FALSE} 26 | knitr::opts_chunk$set( 27 | fig.width = 6, 28 | fig.height = 6 * 0.618, 29 | fig.retina = 3, 30 | dev = "ragg_png", 31 | fig.align = "center", 32 | out.width = "70%" 33 | ) 34 | 35 | library(tidyverse) 36 | 37 | source(here::here("R", "helpers.R")) 38 | ``` 39 | 40 | ### `geom_count()` {.no-hide} 41 | 42 | Boxplots provide an efficient way to explore the interaction of a continuous variable and a categorical variable. But what if you have two categorical variables? 43 | 44 | You can see how observations are distributed across two categorical variables with `geom_count()`. `geom_count()` draws a point at each combination of values from the two variables. The size of the point is mapped to the number of observations with this combination of values. Rare combinations will have small points, frequent combinations will have large points. 45 | 46 | ```{r out.width="80%", echo=FALSE, message=FALSE} 47 | ggplot(data = diamonds) + 48 | geom_count(mapping = aes(x = color, y = clarity)) 49 | ``` 50 | 51 | ### Exercise 8: Count plots 52 | 53 | Use `geom_count()` to plot the interaction of the `cut` and `clarity` variables in the `diamonds` data set. 
54 | 55 | ::: {.panel-tabset} 56 | ## {{< fa code >}} Interactive editor 57 | 58 | ```{webr-r} 59 | 60 | 61 | 62 | ``` 63 | 64 | ## {{< fa circle-check >}} Solution 65 | 66 | ```r 67 | ggplot(data = diamonds) + 68 | geom_count(mapping = aes(x = cut, y = clarity)) 69 | ``` 70 | 71 | ::: 72 | 73 | 74 | ### `count()` 75 | 76 | You can use the `count()` function in the {dplyr} package to compute the count values displayed by `geom_count()`. To use `count()`, pass it a data frame and then the names of zero or more variables in the data frame. `count()` will return a new table that lists how many observations occur with each possible combination of the listed variables. 77 | 78 | So for example, the code below returns the counts that you visualized in Exercise 8. 79 | 80 | ```{r} 81 | diamonds |> 82 | count(cut, clarity) 83 | ``` 84 | 85 | ### Heat maps 86 | 87 | Heat maps provide a second way to visualize the relationship between two categorical variables. They work like count plots, but use a fill color instead of a point size, to display the number of observations in each combination. 88 | 89 | ### How to make a heat map 90 | 91 | {ggplot2} does not provide a geom function for heat maps, but you can construct a heat map by plotting the results of `count()` with `geom_tile()`. 92 | 93 | To do this, set the x and y aesthetics of `geom_tile()` to the variables that you pass to `count()`. Then map the fill aesthetic to the `n` variable computed by `count()`. The plot below displays the same counts as the plot in Exercise 8. 94 | 95 | ```{r out.width="80%"} 96 | diamonds |> 97 | count(cut, clarity) |> 98 | ggplot() + 99 | geom_tile(mapping = aes(x = cut, y = clarity, fill = n)) 100 | ``` 101 | 102 | ### Exercise 9: Make a heat map 103 | 104 | Practice the method above by re-creating the heat map below. 
105 | 106 | ```{r echo=FALSE, out.width="80%"} 107 | diamonds |> 108 | count(color, cut) |> 109 | ggplot(mapping = aes(x = color, y = cut)) + 110 | geom_tile(mapping = aes(fill = n)) 111 | ``` 112 | 113 | ::: {.panel-tabset} 114 | ## {{< fa code >}} Interactive editor 115 | 116 | ```{webr-r} 117 | 118 | 119 | 120 | ``` 121 | 122 | ## {{< fa circle-check >}} Solution 123 | 124 | ```r 125 | diamonds |> 126 | count(color, cut) |> 127 | ggplot(mapping = aes(x = color, y = cut)) + 128 | geom_tile(mapping = aes(fill = n)) 129 | ``` 130 | 131 | ::: 132 | 133 | ### 134 | 135 | Good job! 136 | 137 | ### Recap 138 | 139 | Boxplots, dotplots and violin plots provide an easy way to look for relationships between a continuous variable and a categorical variable. Violin plots convey a lot of information quickly, but boxplots have a head start in popularity---they were easy to use when statisticians had to draw graphs by hand. 140 | 141 | In any of these graphs, look for distributions, ranges, medians, skewness or anything else that catches your eye to change in an unusual way from distribution to distribution. Often, you can make patterns even more revealing with the `fct_reorder()` function from the {forcats} package (we'll wait to learn about {forcats} until after you study factors). 142 | 143 | Count plots and heat maps help you see how observations are distributed across the interactions of two categorical variables. 
144 | 145 | ## 146 | 147 | ```{r} 148 | #| echo: false 149 | #| results: asis 150 | create_buttons(NULL) 151 | ``` 152 | -------------------------------------------------------------------------------- /visualize-data/04-boxplots/img/box-png.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/visualize-data/04-boxplots/img/box-png.png -------------------------------------------------------------------------------- /visualize-data/04-boxplots/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Boxplots and counts" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | knitr::opts_chunk$set( 14 | fig.width = 6, 15 | fig.height = 6 * 0.618, 16 | fig.retina = 3, 17 | dev = "ragg_png", 18 | fig.align = "center", 19 | out.width = "70%" 20 | ) 21 | 22 | source(here::here("R", "helpers.R")) 23 | ``` 24 | 25 | **Boxplots** display the relationship between a continuous variable and a categorical variable. **Count** plots display the relationship between two categorical variables. In this tutorial, you will learn how to use both. You will learn how to: 26 | 27 | * Make and interpret boxplots 28 | * Rotate boxplots by flipping the coordinate system of your plot 29 | * Use *violin* plots and *dotplots*, two geoms that are similar to boxplots 30 | * Make and interpret count plots 31 | 32 | The tutorial is adapted from [_R for Data Science_](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do). 33 | 34 | The tutorial uses the {ggplot2} and {dplyr} packages, which have been pre-loaded for your convenience. 
35 | 36 | 37 | ## 38 | 39 | ```{r} 40 | #| echo: false 41 | #| results: asis 42 | create_buttons("01-boxplots.html") 43 | ``` 44 | -------------------------------------------------------------------------------- /visualize-data/05-scatterplots/01-scatterplots.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Scatterplots" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | packages: 16 | - ggplot2 17 | - dplyr 18 | cell-options: 19 | editor-font-scale: 0.85 20 | fig-width: 6 21 | fig-height: 3.7 22 | out-width: "70%" 23 | --- 24 | 25 | ```{r include=FALSE} 26 | knitr::opts_chunk$set( 27 | fig.width = 6, 28 | fig.height = 6 * 0.618, 29 | fig.retina = 3, 30 | dev = "ragg_png", 31 | fig.align = "center", 32 | out.width = "70%" 33 | ) 34 | 35 | library(tidyverse) 36 | set.seed(1234) 37 | 38 | source(here::here("R", "helpers.R")) 39 | ``` 40 | 41 | ### Review 1: `geom_point()` {.no-hide} 42 | 43 | In [Visualization Basics](/basics/01-visualization-basics/), you learned how to make a scatterplot with `geom_point()`. 44 | 45 | The code below summarizes the mpg data set and begins to plot the results. Finish the plot with `geom_point()`. Put `mean_cty` on the $x$ axis and `mean_hwy` on the $y$ axis. 46 | 47 | ::: {.panel-tabset} 48 | ## {{< fa code >}} Interactive editor 49 | 50 | ```{webr-r} 51 | mpg |> 52 | group_by(class) |> 53 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |> 54 | ggplot() 55 | 56 | 57 | ``` 58 | 59 | ## {{< fa circle-check >}} Solution 60 | 61 | ```r 62 | mpg |> 63 | group_by(class) |> 64 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |> 65 | ggplot() + 66 | geom_point(mapping = aes(x = mean_cty, y = mean_hwy)) 67 | ``` 68 | 69 | ::: 70 | 71 | ### 72 | 73 | Good job! It can be tricky to remember when to use `|>` and when to use `+`. 
Use `|>` to add one complete step to a pipe of code. Use `+` to add one more line to a {ggplot2} call. 74 | 75 | ### `geom_text()` and `geom_label()` 76 | 77 | `geom_text()` and `geom_label()` create scatterplots that use words instead of points to display data. Each requires the extra aesthetic `label`, which you should map to a variable that contains text to display for each observation. 78 | 79 | Convert the plot below from `geom_point()` to `geom_text()` and map the `label` aesthetic to the `class` variable. When you are finished convert the code to `geom_label()` and rerun the plot. Can you spot the difference? 80 | 81 | ::: {.panel-tabset} 82 | ## {{< fa code >}} Interactive editor 83 | 84 | ```{webr-r} 85 | mpg |> 86 | group_by(class) |> 87 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |> 88 | ggplot() + 89 | geom_point(mapping = aes(x = mean_cty, y = mean_hwy)) 90 | 91 | 92 | ``` 93 | 94 | ## {{< fa circle-check >}} Solution 95 | 96 | ```r 97 | mpg |> 98 | group_by(class) |> 99 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |> 100 | ggplot() + 101 | geom_text(mapping = aes(x = mean_cty, y = mean_hwy, label = class)) 102 | 103 | mpg |> 104 | group_by(class) |> 105 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |> 106 | ggplot() + 107 | geom_label(mapping = aes(x = mean_cty, y = mean_hwy, label = class)) 108 | ``` 109 | 110 | ::: 111 | 112 | ### 113 | 114 | Good job! `geom_text()` replaces each point with a piece of text supplied by the label aesthetic. `geom_label()` replaces each point with a textbox. Notice that some pieces of text overlap each other, and others run off the page. We'll soon look at a way to fix this. 115 | 116 | ### `geom_smooth()` 117 | 118 | In [Visualization Basics](/basics/01-visualization-basics/), you met `geom_smooth()`, which provides a summarized version of a scatterplot. 119 | 120 | `geom_smooth()` uses a model to fit a smoothed line to the data and then visualizes the results. 
By default, `geom_smooth()` fits a loess smooth to data sets with less than 1,000 observations, and a generalized additive model to data sets with more than 1,000 observations. 121 | 122 | ```{r echo=FALSE, out.width="100%", message=FALSE, warning=FALSE} 123 | #| layout-ncol: 2 124 | mpg |> 125 | group_by(class) |> 126 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |> 127 | ggplot() + 128 | geom_point(mapping = aes(x = mean_cty, y = mean_hwy)) + 129 | labs(title = "geom_point()") + 130 | ylim(16, 30) 131 | 132 | mpg |> 133 | group_by(class) |> 134 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |> 135 | ggplot() + 136 | geom_smooth(mapping = aes(x = mean_cty, y = mean_hwy), se = FALSE) + 137 | labs(title = "geom_smooth()") + 138 | ylim(16, 30) 139 | ``` 140 | 141 | ### `method` 142 | 143 | You can use the `method` parameter of `geom_smooth()` to fit and display other types of model lines. To do this, pass `method` the name of an R modeling function for `geom_smooth()` to use, such as `"lm"` (for linear models) or `"glm"` (for generalized linear models). 144 | 145 | In the code below, use `geom_smooth()` to draw the linear model line that fits the data. 146 | 147 | ::: {.panel-tabset} 148 | ## {{< fa code >}} Interactive editor 149 | 150 | ```{webr-r} 151 | mpg |> 152 | group_by(class) |> 153 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |> 154 | ggplot() 155 | 156 | 157 | ``` 158 | 159 | ## {{< fa circle-check >}} Solution 160 | 161 | ```r 162 | mpg |> 163 | group_by(class) |> 164 | summarize(mean_cty = mean(cty), mean_hwy = mean(hwy)) |> 165 | ggplot() + 166 | geom_smooth(mapping = aes(x = mean_cty, y = mean_hwy), method = "lm") 167 | ``` 168 | 169 | ::: 170 | 171 | ### 172 | 173 | Good job! Now let's look at a way to make `geom_smooth()` much more useful. 
174 | 175 | ## 176 | 177 | ```{r} 178 | #| echo: false 179 | #| results: asis 180 | create_buttons("02-layers.html") 181 | ``` 182 | -------------------------------------------------------------------------------- /visualize-data/05-scatterplots/03-coordinate-systems.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Coordinate systems" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | packages: 16 | - ggplot2 17 | cell-options: 18 | editor-font-scale: 0.85 19 | fig-width: 6 20 | fig-height: 3.7 21 | out-width: "70%" 22 | --- 23 | 24 | ```{r include=FALSE} 25 | knitr::opts_chunk$set( 26 | fig.width = 6, 27 | fig.height = 6 * 0.618, 28 | fig.retina = 3, 29 | dev = "ragg_png", 30 | fig.align = "center", 31 | out.width = "70%" 32 | ) 33 | 34 | library(tidyverse) 35 | set.seed(1234) 36 | 37 | source(here::here("R", "helpers.R")) 38 | ``` 39 | 40 | ### `coord_flip()` {.no-hide} 41 | 42 | One way to customize a scatterplot is to plot it in a new coordinate system. {ggplot2} provides several helper functions that change the coordinate system of a plot. You've already seen one of these in action in the [boxplots tutorial](/visualize-data/04-boxplots/): `coord_flip()` flips the $x$ and $y$ axes of a plot. 
43 | 44 | ```{r out.width="80%", message=FALSE, warning=FALSE} 45 | ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 46 | geom_boxplot(outlier.alpha = 0) + 47 | geom_jitter(width = 0) + 48 | coord_flip() 49 | ``` 50 | 51 | ### The coord functions 52 | 53 | Altogether, {ggplot2} comes with several `coord` functions: 54 | 55 | * `coord_cartesian()`: (the default) Cartesian coordinates 56 | * `coord_fixed()`: Cartesian coordinates that maintain a fixed aspect ratio as the plot window is resized 57 | * `coord_flip()`: Cartesian coordinates with x and y axes flipped 58 | * `coord_sf()`: cartographic projections for plotting maps 59 | * `coord_polar()` and `coord_radial()`: polar and radial coordinates for round plots like pie charts 60 | * `coord_trans()`: transformed Cartesian coordinates 61 | 62 | By default, {ggplot2} will draw a plot in Cartesian coordinates unless you add one of the functions above to the plot code. 63 | 64 | ### `coord_polar()` 65 | 66 | You use each coord function like you use `coord_flip()`, by adding it to a {ggplot2} call. 67 | 68 | So for example, you could add `coord_polar()` to a plot to make a graph that uses polar coordinates. 69 | 70 | ```{r out.width="80%", message=FALSE, warning=FALSE} 71 | ggplot(data = diamonds) + 72 | geom_bar(mapping = aes(x = cut, fill = cut), width = 1) 73 | 74 | last_plot() + 75 | coord_polar() 76 | ``` 77 | 78 | ### Coordinate systems and scatterplots 79 | 80 | How can a coordinate system improve a scatterplot? 81 | 82 | Consider the scatterplot below. It shows a strong relationship between the carat size of a diamond and its price. 83 | 84 | ```{r echo=FALSE, out.width="80%", message=FALSE, warning=FALSE} 85 | ggplot(data = diamonds) + 86 | geom_point(mapping = aes(x = carat, y = price)) 87 | ``` 88 | 89 | However, the relationship does not appear linear. It appears to have the form $y = x^{n}$, a common relationship found in nature. You can estimate the $n$ by replotting the data in a _log-log plot_.
90 | 91 | ### log-log plots 92 | 93 | Log-log plots graph the log of $x$ vs. the log of $y$, which has a valuable visual effect. If you log both sides of a relationship like 94 | 95 | $$ 96 | y = x^{n} 97 | $$ 98 | 99 | You get a linear relationship with slope $n$: 100 | 101 | $$ 102 | \begin{aligned} 103 | \log(y) &= \log(x^{n}) \\ 104 | \log(y) &= n \times \log(x) 105 | \end{aligned} 106 | $$ 107 | 108 | In other words, log-log plots unbend power relationships into straight lines. Moreover, they display $n$ as the slope of the straight line, which is reasonably easy to estimate. 109 | 110 | Try this by using the diamonds dataset to plot `log(carat)` on the x-axis and `log(price)` on the y-axis: 111 | 112 | ::: {.panel-tabset} 113 | ## {{< fa code >}} Interactive editor 114 | 115 | ```{webr-r} 116 | 117 | 118 | 119 | ``` 120 | 121 | ## {{< fa circle-check >}} Solution 122 | 123 | ```r 124 | ggplot(data = diamonds) + 125 | geom_point(mapping = aes(x = log(carat), y = log(price))) 126 | ``` 127 | 128 | ::: 129 | 130 | ### 131 | 132 | Good job! Now let's look at how you can do the same transformation, and others as well with a coord function. 133 | 134 | ### `coord_trans()` 135 | 136 | `coord_trans()` provides a second way to do the same transformation, or similar transformations. 137 | 138 | To use `coord_trans()` give it an $x$ and/or a $y$ argument. Set each to the name of an R function surrounded by quotation marks. `coord_trans()` will use the function to transform the specified axis before plotting the raw data. 139 | 140 | ::: {.panel-tabset} 141 | ## {{< fa code >}} Interactive editor 142 | 143 | ```{webr-r} 144 | ggplot(data = diamonds) + 145 | geom_point(mapping = aes(x = carat, y = price)) + 146 | coord_trans(x = "log", y = "log") 147 | 148 | 149 | ``` 150 | 151 | ::: 152 | 153 | 154 | ### Recap 155 | 156 | Scatterplots are one of the most useful types of plots for data science. 
You will have many chances to use `geom_point()`, `geom_smooth()`, and `geom_label_repel()` in your day-to-day work. 157 | 158 | However, this tutorial introduced two important concepts that apply to more than just scatterplots: 159 | 160 | * You can add **multiple layers** to any plot that you make with {ggplot2} 161 | * You can add a different **coordinate system** to any plot that you make with {ggplot2} 162 | 163 | 164 | ## 165 | 166 | ```{r} 167 | #| echo: false 168 | #| results: asis 169 | create_buttons(NULL) 170 | ``` 171 | -------------------------------------------------------------------------------- /visualize-data/05-scatterplots/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Scatterplots" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | knitr::opts_chunk$set( 14 | fig.width = 6, 15 | fig.height = 6 * 0.618, 16 | fig.retina = 3, 17 | dev = "ragg_png", 18 | fig.align = "center", 19 | out.width = "70%" 20 | ) 21 | 22 | source(here::here("R", "helpers.R")) 23 | ``` 24 | 25 | A **scatterplot** displays the relationship between two continuous variables. Scatterplots are one of the most common types of graphs---in fact, you've met scatterplots already in [Visualization Basics](/basics/01-visualization-basics/). 26 | 27 | In this tutorial, you'll learn how to: 28 | 29 | * Make new types of scatterplots with `geom_text()` and `geom_jitter()` 30 | * Add multiple **layers** of geoms to a plot 31 | * Enhance scatterplots with `geom_smooth()`, `geom_rug()`, and `geom_label_repel()` 32 | * Change the **coordinate system** of a plot 33 | 34 | The tutorial is adapted from [_R for Data Science_](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399.
You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do). 35 | 36 | The tutorial uses the {ggplot2}, {ggrepel}, and {dplyr} packages, which have been pre-loaded for your convenience. 37 | 38 | 39 | ## 40 | 41 | ```{r} 42 | #| echo: false 43 | #| results: asis 44 | create_buttons("01-scatterplots.html") 45 | ``` 46 | -------------------------------------------------------------------------------- /visualize-data/06-line-graphs/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Line plots" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | knitr::opts_chunk$set( 14 | fig.width = 6, 15 | fig.height = 6 * 0.618, 16 | fig.retina = 3, 17 | dev = "ragg_png", 18 | fig.align = "center", 19 | out.width = "70%" 20 | ) 21 | 22 | source(here::here("R", "helpers.R")) 23 | ``` 24 | 25 | A **line graph** displays a functional relationship between two continuous variables. A **map** displays spatial data. The two may seem different, but they are made in similar ways. This tutorial will examine them both. 26 | 27 | In this tutorial, you'll learn how to: 28 | 29 | * Make new types of line plots with `geom_step()`, `geom_area()`, `geom_path()`, and `geom_polygon()` 30 | * Avoid "whipsawing" with the group aesthetic 31 | * Find and plot map data with `geom_sf()` 32 | * Transform a coordinate system into a map projection with `coord_sf()` 33 | 34 | The tutorial is adapted from [_R for Data Science_](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do). 35 | 36 | The tutorial uses the {ggplot2}, {sf}, and {dplyr} packages, which have been pre-loaded for your convenience. 
37 | 38 | 39 | ## 40 | 41 | ```{r} 42 | #| echo: false 43 | #| results: asis 44 | create_buttons("01-line-graphs.html") 45 | ``` 46 | -------------------------------------------------------------------------------- /visualize-data/07-overplotting/01-overplotting.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Overplotting" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | knitr::opts_chunk$set( 14 | fig.width = 6, 15 | fig.height = 6 * 0.618, 16 | fig.retina = 3, 17 | dev = "ragg_png", 18 | fig.align = "center", 19 | out.width = "70%" 20 | ) 21 | 22 | library(tidyverse) 23 | 24 | source(here::here("R", "helpers.R")) 25 | ``` 26 | 27 | ### What is overplotting? {.no-hide} 28 | 29 | You've seen this plot several times in previous tutorials, but have you noticed that it only displays 126 points? This is unusual because the plot visualizes a data set that contains 234 points. 30 | 31 | ```{r echo=FALSE, out.width="80%"} 32 | ggplot(data = mpg) + 33 | geom_point(mapping = aes(x = displ, y = hwy)) 34 | ``` 35 | 36 | The missing points are hidden behind other points, a phenomenon known as _overplotting_. Overplotting is a problem because it provides an incomplete picture of the dataset. You cannot determine where the *mass* of the points fall, which makes it difficult to spot relationships in the data. 37 | 38 | ### Causes of overplotting 39 | 40 | Overplotting usually occurs for two different reasons: 41 | 42 | 1. The data points have been rounded to a "grid" of common values, as in the plot above 43 | 2. The dataset is so large that it cannot be plotted without points overlapping each other 44 | 45 | How you deal with overplotting will depend on the cause. 
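One way to check whether overplotting is hiding points is to compare the number of rows in the data with the number of distinct positions on the plot. The sketch below assumes {ggplot2} and {dplyr} are installed; the object names `n_rows` and `n_positions` are illustrative and are not part of the original tutorial.

```r
library(ggplot2)
library(dplyr)

# Total number of observations in the dataset
n_rows <- nrow(mpg)

# Number of distinct (displ, hwy) positions that geom_point() can actually show
n_positions <- mpg |>
  distinct(displ, hwy) |>
  nrow()

# Positions shared by more than one row are where points hide each other
mpg |>
  count(displ, hwy, name = "n_points") |>
  filter(n_points > 1) |>
  arrange(desc(n_points))
```

If `n_positions` is smaller than `n_rows`, some points sit exactly on top of others, which points to the rounding cause rather than sheer data volume.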
46 | 47 | 48 | ## 49 | 50 | ```{r} 51 | #| echo: false 52 | #| results: asis 53 | create_buttons("02-rounding.html") 54 | ``` 55 | -------------------------------------------------------------------------------- /visualize-data/07-overplotting/02-rounding.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Rounding" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | packages: 16 | - ggplot2 17 | cell-options: 18 | editor-font-scale: 0.85 19 | fig-width: 6 20 | fig-height: 3.7 21 | out-width: "70%" 22 | --- 23 | 24 | ```{r include=FALSE} 25 | knitr::opts_chunk$set( 26 | fig.width = 6, 27 | fig.height = 6 * 0.618, 28 | fig.retina = 3, 29 | dev = "ragg_png", 30 | fig.align = "center", 31 | out.width = "70%" 32 | ) 33 | 34 | library(tidyverse) 35 | 36 | source(here::here("R", "helpers.R")) 37 | ``` 38 | 39 | 40 | ### Overplotting due to rounding {.no-hide} 41 | 42 | If your overplotting is due to rounding, you can obtain a better picture of the data by making each point semi-transparent. For example, you could _set_ the `alpha` aesthetic of the plot below to a _value_ less than one, which will make the points transparent. 43 | 44 | Try this now. Set the points to an alpha of 0.25, which will make each point 25% opaque (i.e., four points stacked on top of each other will create a solid black). 45 | 46 | ::: {.panel-tabset} 47 | ## {{< fa code >}} Interactive editor 48 | 49 | ```{webr-r} 50 | ggplot(data = mpg) + 51 | geom_point(mapping = aes(x = displ, y = hwy)) 52 | 53 | 54 | ``` 55 | 56 | ## {{< fa lightbulb >}} Hint 57 | 58 | **Hint:** Make sure you set `alpha = 0.25` *outside* of `aes()`. 59 | 60 | ## {{< fa circle-check >}} Solution 61 | 62 | ```r 63 | ggplot(data = mpg) + 64 | geom_point(mapping = aes(x = displ, y = hwy), alpha = 0.25) 65 | ``` 66 | 67 | ::: 68 | 69 | ### 70 | 71 | Good job! 
You can now identify which values contain more observations. The darker locations contain several points stacked on top of each other. 72 | 73 | 74 | ### Adjust the position 75 | 76 | A second strategy for dealing with rounding is to adjust the position of each point. `position = "jitter"` adds a small amount of random noise to the location of each point. Since the noise is random, it is unlikely that two points rounded to the same location will also be jittered to the same location. 77 | 78 | The result is a jittered plot that displays more of the data. Jittering comes with both limitations and benefits. You cannot use a jittered plot to see the _local_ values of the points, but you can use a jittered plot to perceive the _global_ relationship between the variables, something that is hard to do in the presence of overplotting. 79 | 80 | ```{r out.width="80%"} 81 | ggplot(data = mpg) + 82 | geom_point(mapping = aes(x = displ, y = hwy), position = "jitter") 83 | ``` 84 | 85 | ### Review: jitter 86 | 87 | In the [Scatterplots tutorial](/visualize-data/05-scatterplots/02-layers.qmd), you learned of a geom that displays the equivalent of `geom_point()` with a `position = "jitter"` adjustment. 88 | 89 | Rewrite the code below to use that geom. Do you obtain similar results? 90 | 91 | ::: {.panel-tabset} 92 | ## {{< fa code >}} Interactive editor 93 | 94 | ```{webr-r} 95 | ggplot(data = mpg) + 96 | geom_point(mapping = aes(x = displ, y = hwy), position = "jitter") 97 | 98 | 99 | ``` 100 | 101 | ## {{< fa circle-check >}} Solution 102 | 103 | ```r 104 | ggplot(data = mpg) + 105 | geom_jitter(mapping = aes(x = displ, y = hwy)) 106 | ``` 107 | 108 | ::: 109 | 110 | ### 111 | 112 | Good job! Now let's look at ways to handle overplotting due to large datasets. 
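The amount of noise is adjustable. As a minimal sketch, `geom_jitter()` accepts `width` and `height` arguments that control how far points may move in each direction. The values below are arbitrary choices for illustration, the name `p_jitter` is hypothetical, and `set.seed()` is there only to make the random noise repeatable.

```r
library(ggplot2)

# Fix the random seed so the jittered positions are the same on every run
set.seed(1234)

# Jitter horizontally only: height = 0 leaves the hwy values where they are
p_jitter <- ggplot(data = mpg) +
  geom_jitter(mapping = aes(x = displ, y = hwy), width = 0.1, height = 0)

p_jitter
```

Smaller values keep points closer to their recorded locations, at the cost of more overlap.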
113 | 114 | 115 | ## 116 | 117 | ```{r} 118 | #| echo: false 119 | #| results: asis 120 | create_buttons("03-large-data.html") 121 | ``` 122 | -------------------------------------------------------------------------------- /visualize-data/07-overplotting/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Overplotting and big data" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | knitr::opts_chunk$set( 14 | fig.width = 6, 15 | fig.height = 6 * 0.618, 16 | fig.retina = 3, 17 | dev = "ragg_png", 18 | fig.align = "center", 19 | out.width = "70%" 20 | ) 21 | 22 | source(here::here("R", "helpers.R")) 23 | ``` 24 | 25 | 26 | Data visualization is a useful tool because it makes data accessible to your visual system, which can process large amounts of information quickly. However, two characteristics of data can short-circuit this system. Data cannot be easily visualized if: 27 | 28 | 1. Data points are rounded to a small set of common values. 29 | 2. The data contains so many points that they occlude each other. 30 | 31 | These features both create _overplotting_, the condition where multiple geoms in the plot are plotted on top of each other, hiding each other. This tutorial will show you several strategies for dealing with overplotting, introducing new geoms along the way. 32 | 33 | The tutorial is adapted from [_R for Data Science_](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do). 34 | 35 | The tutorial uses the {ggplot2} and {hexbin} packages, which have been pre-loaded for your convenience. 
36 | 37 | 38 | ## 39 | 40 | ```{r} 41 | #| echo: false 42 | #| results: asis 43 | create_buttons("01-overplotting.html") 44 | ``` 45 | -------------------------------------------------------------------------------- /visualize-data/08-customize/01-zooming.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Zooming" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | packages: 16 | - ggplot2 17 | - dplyr 18 | cell-options: 19 | editor-font-scale: 0.85 20 | fig-width: 6 21 | fig-height: 3.7 22 | out-width: "70%" 23 | --- 24 | 25 | ```{r include=FALSE} 26 | knitr::opts_chunk$set( 27 | fig.width = 6, 28 | fig.height = 6 * 0.618, 29 | fig.retina = 3, 30 | dev = "ragg_png", 31 | fig.align = "center", 32 | out.width = "70%" 33 | ) 34 | 35 | library(tidyverse) 36 | 37 | source(here::here("R", "helpers.R")) 38 | ``` 39 | 40 | ```{webr-r} 41 | #| context: setup 42 | p <- ggplot(diamonds) + 43 | geom_boxplot(mapping = aes(x = cut, y = price)) 44 | ``` 45 | 46 | In the previous tutorials, you learned how to visualize data with graphs. Now let's look at how to customize the look and feel of your graphs. To do that we will need to begin with a graph that we can customize. 47 | 48 | ### Review 1: Make a plot 49 | 50 | In the chunk below, make a plot that uses boxplots to display the relationship between the `cut` and `price` variables from the `diamonds` dataset. 51 | 52 | ::: {.panel-tabset} 53 | ## {{< fa code >}} Interactive editor 54 | 55 | ```{webr-r} 56 | 57 | 58 | 59 | ``` 60 | 61 | ## {{< fa circle-check >}} Solution 62 | 63 | ```r 64 | ggplot(diamonds) + 65 | geom_boxplot(mapping = aes(x = cut, y = price)) 66 | ``` 67 | 68 | ::: 69 | 70 | ### 71 | 72 | Good job! Let's use this plot as a starting point to make a more pleasing plot that displays a clear message. 
73 | 74 | ### Storing plots 75 | 76 | Since we want to use this plot again later, let's go ahead and save it. 77 | 78 | ```{r} 79 | p <- ggplot(diamonds) + 80 | geom_boxplot(mapping = aes(x = cut, y = price)) 81 | ``` 82 | 83 | Now whenever you call `p`, R will draw your plot. Try it and see. 84 | 85 | ::: {.panel-tabset} 86 | ## {{< fa code >}} Interactive editor 87 | 88 | ```{webr-r} 89 | 90 | 91 | 92 | ``` 93 | 94 | ## {{< fa circle-check >}} Solution 95 | 96 | ```r 97 | p 98 | ``` 99 | 100 | ::: 101 | 102 | ### 103 | 104 | Good job! By the way, have you taken a moment to look at what the plot shows? Let's do that now. 105 | 106 | ### Surprise? 107 | 108 | Our plot shows something surprising: when you group diamonds by `cut`, the worst cut diamonds have the highest median price. It's a little hard to see in the plot, but you can verify it with some data manipulation. 109 | 110 | ```{r} 111 | diamonds |> 112 | group_by(cut) |> 113 | summarise(median = median(price)) 114 | ``` 115 | 116 | ### Zoom 117 | 118 | ```{r echo=FALSE, out.width="80%"} 119 | p 120 | ``` 121 | 122 | The difference between median prices is hard to see in our plot because each group contains distant outliers. 123 | 124 | We can make the difference easier to see by zooming in on the low values of $y$, where the medians are located. There are two ways to zoom with {ggplot2}: with and without clipping. 125 | 126 | ### Clipping 127 | 128 | Clipping refers to how R should treat the data that falls outside of the zoomed region. To see its effect, look at these plots. Each zooms in on the region where price is between \$0 and \$7,500. 129 | 130 | ```{r echo=FALSE, out.width="100%", warning=FALSE, message=FALSE} 131 | #| layout-ncol: 2 132 | p + ylim(0, 7500) 133 | p + coord_cartesian(ylim = c(0, 7500)) 134 | ``` 135 | 136 | * The plot on the left zooms _by_ clipping. It removes all of the data points that fall outside of the desired region, and then plots the data points that remain. 
137 | * The plot on the right zooms _without_ clipping. You can think of it as drawing the entire graph and then zooming into a certain region. 138 | 139 | ### `xlim()` and `ylim()` 140 | 141 | Of these, zooming by clipping is the easiest to do. To zoom your graph on the $x$ axis, add the function `xlim()` to the plot call. To zoom on the $y$ axis add the function `ylim()`. Each takes a minimum value and a maximum value to zoom to, like this 142 | 143 | ```{r eval=FALSE} 144 | some_plot + 145 | xlim(0, 100) 146 | ``` 147 | 148 | ### Exercise 1: Clipping 149 | 150 | Use `ylim()` to recreate our plot on the left from above. The plot zooms the $y$ axis from 0 to 7,500 by clipping. 151 | 152 | ::: {.panel-tabset} 153 | ## {{< fa code >}} Interactive editor 154 | 155 | ```{webr-r} 156 | p 157 | 158 | 159 | ``` 160 | 161 | ## {{< fa circle-check >}} Solution 162 | 163 | ```r 164 | p + ylim(0, 7500) 165 | ``` 166 | 167 | ::: 168 | 169 | ### 170 | 171 | Good job! Zooming by clipping will sometimes make the graph you want, but in our case it is a very bad idea. Can you tell why? 172 | 173 | 174 | ### A caution 175 | 176 | Zooming by clipping is a bad idea for boxplots. `ylim()` fundamentally changes the information conveyed in the boxplots because it throws out some of the data before drawing the boxplots. Those aren't the medians of the entire data set that we are looking at. 177 | 178 | How then can we zoom without clipping? 179 | 180 | ### `xlim` and `ylim` 181 | 182 | To zoom without clipping, set the `xlim` and/or `ylim` arguments of your plot's `coord_` function. Each takes a numeric vector of length two (the minimum and maximum values to zoom to). 183 | 184 | This is easy to do if your plot explicitly calls a `coord_` function 185 | 186 | ```{r out.width="80%"} 187 | p + coord_flip(ylim = c(0, 7500)) 188 | ``` 189 | 190 | ### `coord_cartesian()` 191 | 192 | But what if your plot doesn't call a `coord_` function? 
Then your plot is using Cartesian coordinates (the default). You can adjust the limits of your plot without changing the default coordinate system by adding `coord_cartesian()` to your plot. 193 | 194 | Try it below. Use `coord_cartesian()` to zoom `p` to the region where price falls between 0 and 7500. 195 | 196 | ::: {.panel-tabset} 197 | ## {{< fa code >}} Interactive editor 198 | 199 | ```{webr-r} 200 | p 201 | 202 | 203 | ``` 204 | 205 | ## {{< fa circle-check >}} Solution 206 | 207 | ```r 208 | p + coord_cartesian(ylim = c(0, 7500)) 209 | ``` 210 | 211 | ::: 212 | 213 | ### 214 | 215 | Good job! Now it is much easier to see the differences in the median. 216 | 217 | 218 | ### `p` 219 | 220 | Notice that our code so far has used `p` to make a plot, but it hasn't changed the plot that is saved inside of `p`. You can run `p` by itself to get the unzoomed plot. 221 | 222 | ```{r out.width="80%"} 223 | p 224 | ``` 225 | 226 | ### Updating `p` 227 | 228 | I like the zooming, so I'm purposefully going to overwrite the plot stored in `p` so that it uses it. 
229 | 230 | ```{r out.width="80%"} 231 | p <- p + coord_cartesian(ylim = c(0, 7500)) 232 | p 233 | ``` 234 | 235 | 236 | ## 237 | 238 | ```{r} 239 | #| echo: false 240 | #| results: asis 241 | create_buttons("02-labels.html") 242 | ``` 243 | -------------------------------------------------------------------------------- /visualize-data/08-customize/02-labels.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Labels" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | packages: 16 | - ggplot2 17 | cell-options: 18 | editor-font-scale: 0.85 19 | fig-width: 6 20 | fig-height: 3.7 21 | out-width: "70%" 22 | --- 23 | 24 | ```{r include=FALSE} 25 | knitr::opts_chunk$set( 26 | fig.width = 6, 27 | fig.height = 6 * 0.618, 28 | fig.retina = 3, 29 | dev = "ragg_png", 30 | fig.align = "center", 31 | out.width = "70%" 32 | ) 33 | 34 | library(tidyverse) 35 | 36 | p <- ggplot(diamonds) + 37 | geom_boxplot(mapping = aes(x = cut, y = price)) + 38 | coord_cartesian(ylim = c(0, 7500)) 39 | 40 | source(here::here("R", "helpers.R")) 41 | ``` 42 | 43 | ```{webr-r} 44 | #| context: setup 45 | p <- ggplot(diamonds) + 46 | geom_boxplot(mapping = aes(x = cut, y = price)) + 47 | coord_cartesian(ylim = c(0, 7500)) 48 | ``` 49 | 50 | ### `labs()` {.no-hide} 51 | 52 | The relationship in our plot is now easier to see, but that doesn't mean that everyone who sees our plot will spot it. We can draw their attention to the relationship with a label, like a title or a caption. 53 | 54 | To do this, we will use the `labs()` function. You can think of `labs()` as an all purpose function for adding labels to a {ggplot2} plot. 55 | 56 | ### Titles 57 | 58 | Give `labs()` a `title` argument to add a title. 
59 | 60 | ```{r out.width="80%"} 61 | p + labs(title = "The title appears here") 62 | ``` 63 | 64 | ### Subtitles 65 | 66 | Give `labs()` a `subtitle` argument to add a subtitle. If you use multiple arguments, remember to separate them with a comma. 67 | 68 | ```{r out.width="80%"} 69 | p + labs(title = "The title appears here", 70 | subtitle = "The subtitle appears here, slightly smaller") 71 | ``` 72 | 73 | ### Captions 74 | 75 | Give `labs()` a `caption` argument to add a caption. I like to use captions to cite my data source. 76 | 77 | ```{r out.width="80%"} 78 | p + labs(title = "The title appears here", 79 | subtitle = "The subtitle appears here, slightly smaller", 80 | caption = "Captions appear at the bottom.") 81 | ``` 82 | 83 | ### Axis labels 84 | 85 | Give `labs()` `x` and `y` arguments to change the axis labels. 86 | 87 | ```{r out.width="80%"} 88 | p + labs(title = "The title appears here", 89 | subtitle = "The subtitle appears here, slightly smaller", 90 | caption = "Captions appear at the bottom.", 91 | x = "Diamond cut", 92 | y = "Price") 93 | ``` 94 | 95 | ### Legend titles 96 | 97 | If you've mapped a column to an aesthetic like `color`, `fill`, `linetype`, etc., you can change its label with `labs()` too: 98 | 99 | ```{r out.width="80%"} 100 | ggplot(diamonds) + 101 | geom_boxplot(mapping = aes(x = cut, y = price, fill = cut)) + 102 | labs(title = "The title appears here", 103 | subtitle = "The subtitle appears here, slightly smaller", 104 | caption = "Captions appear at the bottom.", 105 | x = "Diamond cut", 106 | y = "Price", 107 | fill = "Diamond cut") 108 | ``` 109 | 110 | ### Exercise 2: Labels 111 | 112 | Plot `p` with a set of informative labels. For learning purposes, be sure to use a title, subtitle, caption, and axis labels. 
113 | 114 | ::: {.panel-tabset} 115 | ## {{< fa code >}} Interactive editor 116 | 117 | ```{webr-r} 118 | p 119 | 120 | 121 | ``` 122 | 123 | ## {{< fa circle-check >}} Solution 124 | 125 | ```r 126 | p + labs(title = "Diamond prices by cut", 127 | subtitle = "Fair cut diamonds fetch the highest median price. Why?", 128 | caption = "Data collected by Hadley Wickham") 129 | ``` 130 | 131 | ::: 132 | 133 | 134 | ### 135 | 136 | Good job! By the way, why *do* fair cut diamonds fetch the highest price? 137 | 138 | 139 | ### Exercise 3: Carat size? 140 | 141 | Perhaps a diamond's cut is conflated with its carat size. If fair cut diamonds tend to be larger diamonds that would explain their larger prices. Let's test this. 142 | 143 | Make a plot that displays the relationship between carat size, price, and cut for all diamonds. How do you interpret the results? Give your plot a title, subtitle, and caption that explain the plot and convey your conclusions. 144 | 145 | If you are looking for a way to start, I recommend using a smooth line with color mapped to cut, perhaps overlaid on the background data. 146 | 147 | ::: {.panel-tabset} 148 | ## {{< fa code >}} Interactive editor 149 | 150 | ```{webr-r} 151 | 152 | 153 | 154 | ``` 155 | 156 | ## {{< fa circle-check >}} Solution 157 | 158 | ```r 159 | ggplot(data = diamonds, mapping = aes(x = carat, y = price)) + 160 | geom_smooth(mapping = aes(color = cut), se = FALSE) + 161 | labs(title = "Carat size vs. Price", 162 | subtitle = "Fair cut diamonds tend to be large, but they fetch the lowest prices for most carat sizes.", 163 | caption = "Data by Hadley Wickham") 164 | ``` 165 | 166 | ::: 167 | 168 | ### 169 | 170 | Good job! The plot corroborates our hypothesis. 171 | 172 | ### `p1` 173 | 174 | Unlike `p`, our new plot uses color and has a legend. Let's save it to use later when we learn to customize colors and legends. 
175 | 176 | ```{r out.width="80%", message=FALSE} 177 | p1 <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) + 178 | geom_smooth(mapping = aes(color = cut), se = FALSE) + 179 | labs(title = "Carat size vs. Price", 180 | subtitle = "Fair cut diamonds tend to be large, but they fetch the lowest prices for most carat sizes.", 181 | caption = "Data by Hadley Wickham") 182 | ``` 183 | 184 | ### `annotate()` 185 | 186 | `annotate()` provides a final way to label your graph: it adds a single geom to your plot. When you use `annotate()`, you must first choose which type of geom to add. Next, you must manually supply a value for each aesthetic required by the geom. 187 | 188 | So for example, we could use `annotate()` to add text to our plot. 189 | 190 | ```{r message=FALSE} 191 | p1 + annotate("text", x = 4, y = 7500, label = "There are no cheap,\nlarge diamonds") 192 | ``` 193 | 194 | Notice that I select `geom_text()` with `"text"`, the suffix of the function name in quotation marks. 
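The same pattern works for other geoms. As a hedged sketch, the code below rebuilds a plot like `p1` so that it runs on its own, then adds a shaded rectangle with `"rect"` alongside the text label. The rectangle's coordinates and the name `p_annotated` are made up for illustration.

```r
library(ggplot2)

# Rebuild a plot like p1 so the example is self-contained
p1 <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
  geom_smooth(mapping = aes(color = cut), se = FALSE)

# "rect" selects geom_rect(), which needs xmin, xmax, ymin, and ymax
# "text" selects geom_text(), which needs x, y, and label
p_annotated <- p1 +
  annotate("rect", xmin = 3, xmax = 5, ymin = 0, ymax = 10000, alpha = 0.2) +
  annotate("text", x = 4, y = 7500, label = "There are no cheap,\nlarge diamonds")

p_annotated
```

Each call to `annotate()` adds one layer, so the annotations can be stacked in the order you want them drawn.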
195 | 196 | ## 197 | 198 | ```{r} 199 | #| echo: false 200 | #| results: asis 201 | create_buttons("03-themes.html") 202 | ``` 203 | -------------------------------------------------------------------------------- /visualize-data/08-customize/03-themes.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Themes" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | packages: 16 | - ggplot2 17 | - ggthemes 18 | cell-options: 19 | editor-font-scale: 0.85 20 | fig-width: 6 21 | fig-height: 3.7 22 | out-width: "70%" 23 | --- 24 | 25 | ```{r include=FALSE} 26 | knitr::opts_chunk$set( 27 | fig.width = 6, 28 | fig.height = 6 * 0.618, 29 | fig.retina = 3, 30 | dev = "ragg_png", 31 | fig.align = "center", 32 | out.width = "70%" 33 | ) 34 | 35 | library(tidyverse) 36 | library(ggthemes) 37 | 38 | p1 <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) + 39 | geom_smooth(mapping = aes(color = cut), se = FALSE) + 40 | labs(title = "Carat size vs. Price", 41 | subtitle = "Fair cut diamonds tend to be large, but they fetch the lowest prices for most carat sizes.", 42 | caption = "Data by Hadley Wickham") 43 | 44 | source(here::here("R", "helpers.R")) 45 | ``` 46 | 47 | ```{webr-r} 48 | #| context: setup 49 | p1 <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) + 50 | geom_smooth(mapping = aes(color = cut), se = FALSE) + 51 | labs(title = "Carat size vs. Price", 52 | subtitle = "Fair cut diamonds tend to be large, but they fetch the lowest prices for most carat sizes.", 53 | caption = "Data by Hadley Wickham") 54 | ``` 55 | 56 | One of the most effective ways to control the look of your plot is with a theme. 57 | 58 | ### What is a theme? 59 | 60 | A theme describes how the non-data elements of your plot should look. 
For example, these two plots show the same data, but they use two very different themes. 61 | 62 | ```{r echo=FALSE, out.width="100%", message=FALSE, warning=FALSE} 63 | #| layout-ncol: 2 64 | p1 + theme_bw() 65 | p1 + theme_economist() 66 | ``` 67 | 68 | ### Theme functions 69 | 70 | To change the theme of your plot, add a `theme_` function to your plot call. The {ggplot2} package provides eight theme functions to choose from. 71 | 72 | * `theme_bw()` 73 | * `theme_classic()` 74 | * `theme_dark()` 75 | * `theme_gray()` 76 | * `theme_light()` 77 | * `theme_linedraw()` 78 | * `theme_minimal()` 79 | * `theme_void()` 80 | 81 | Use the box below to plot `p1` with each of the themes. Which theme do you prefer? Which theme does {ggplot2} apply by default? 82 | 83 | ::: {.panel-tabset} 84 | ## {{< fa code >}} Interactive editor 85 | 86 | ```{webr-r} 87 | p1 + theme_bw() 88 | 89 | 90 | ``` 91 | 92 | ::: 93 | 94 | ### 95 | 96 | Good job! {ggplot2} uses `theme_gray()` by default. 97 | 98 | ### {ggthemes} 99 | 100 | If you would like to give your graph a more complete makeover, the {ggthemes} package provides extra themes that imitate the graph styles of popular software packages and publications. These include: 101 | 102 | * `theme_base()` 103 | * `theme_calc()` 104 | * `theme_economist()` 105 | * `theme_economist_white()` 106 | * `theme_excel()` 107 | * `theme_few()` 108 | * `theme_fivethirtyeight()` 109 | * `theme_foundation()` 110 | * `theme_gdocs()` 111 | * `theme_hc()` 112 | * `theme_igray()` 113 | * `theme_map()` 114 | * `theme_pander()` 115 | * `theme_par()` 116 | * `theme_solarized()` 117 | * `theme_solarized_2()` 118 | * `theme_solid()` 119 | * `theme_stata()` 120 | * `theme_tufte()` 121 | * `theme_wsj()` 122 | 123 | Try plotting `p1` with at least two or three of the themes mentioned above. 
124 | 125 | ::: {.panel-tabset} 126 | ## {{< fa code >}} Interactive editor 127 | 128 | ```{webr-r} 129 | p1 130 | 131 | 132 | ``` 133 | 134 | ## {{< fa circle-check >}} Solution 135 | 136 | ```r 137 | p1 + theme_wsj() 138 | ``` 139 | 140 | ::: 141 | 142 | ### 143 | 144 | Good job! Notice that each theme supplies its own font sizes, which means that your captions might run off the page for some themes. In practice, you can fix this by resizing your graph window. 145 | 146 | 147 | ### Update `p1` 148 | 149 | If you compare the {ggthemes} themes to the styles they imitate, you might notice something: the colors used to plot your data haven't changed. The colors are noticeably {ggplot2} colors. In the next section, we'll look at how to customize this remaining part of your graph: the data elements. 150 | 151 | Before we go on, I suggest that we update `p1` to use `theme_bw()`. It will make our next set of modifications easier to see. 152 | 153 | ```{r p1, out.width="80%", message=FALSE} 154 | p1 <- p1 + theme_bw() 155 | p1 156 | ``` 157 | 158 | 159 | ## 160 | 161 | ```{r} 162 | #| echo: false 163 | #| results: asis 164 | create_buttons("04-scales.html") 165 | ``` 166 | -------------------------------------------------------------------------------- /visualize-data/08-customize/06-quiz.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Quiz" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | 11 | engine: knitr 12 | filters: 13 | - webr 14 | webr: 15 | packages: 16 | - ggplot2 17 | cell-options: 18 | editor-font-scale: 0.85 19 | fig-width: 6 20 | fig-height: 3.7 21 | out-width: "70%" 22 | --- 23 | 24 | ```{r include=FALSE} 25 | knitr::opts_chunk$set( 26 | fig.width = 6, 27 | fig.height = 6 * 0.618, 28 | fig.retina = 3, 29 | dev = "ragg_png", 30 | fig.align = "center", 31 | out.width = "70%" 32 | ) 33 | 34 | library(tidyverse) 35 | 36 | source(here::here("R", 
"helpers.R")) 37 | ``` 38 | 39 | In this tutorial, you learned how to customize the graphs that you make with ggplot2 in several ways. You learned how to: 40 | 41 | * Zoom in on regions of the graph 42 | * Add titles, subtitles, and annotations 43 | * Add themes 44 | * Add color scales 45 | * Adjust legends 46 | 47 | To cement your skills, combine what you've learned to recreate the plot below. 48 | 49 | ```{r echo=FALSE, message=FALSE} 50 | ggplot(diamonds, aes(x = carat, y = price)) + 51 | geom_point() + 52 | geom_smooth(aes(color = cut), se = FALSE) + 53 | labs(title = "Ideal cut diamonds command the best price for every carat size", 54 | subtitle = "Lines show GAM estimate of mean values for each level of cut", 55 | caption = "Data provided by Hadley Wickham", 56 | x = "Log Carat Size", 57 | y = "Log Price Size", 58 | color = "Cut Rating") + 59 | scale_x_log10() + 60 | scale_y_log10() + 61 | scale_color_brewer(palette = "Greens") + 62 | theme_light() 63 | ``` 64 | 65 | ::: {.panel-tabset} 66 | ## {{< fa code >}} Interactive editor 67 | 68 | ```{webr-r} 69 | 70 | 71 | 72 | ``` 73 | 74 | ## {{< fa circle-check >}} Solution 75 | 76 | ```r 77 | ggplot(data = diamonds, mapping = aes(x = carat, y = price)) + 78 | geom_point() + 79 | geom_smooth(mapping = aes(color = cut), se = FALSE) + 80 | labs(title = "Ideal cut diamonds command the best price for every carat size", 81 | subtitle = "Lines show GAM estimate of mean values for each level of cut", 82 | caption = "Data provided by Hadley Wickham", 83 | x = "Log Carat Size", 84 | y = "Log Price Size", 85 | color = "Cut Rating") + 86 | scale_x_log10() + 87 | scale_y_log10() + 88 | scale_color_brewer(palette = "Greens") + 89 | theme_light() 90 | ``` 91 | 92 | ::: 93 | 94 | ## 95 | 96 | 97 | ```{r} 98 | #| echo: false 99 | #| results: asis 100 | create_buttons(NULL) 101 | ``` 102 | -------------------------------------------------------------------------------- /visualize-data/08-customize/img/viridis.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewheiss/r-primers/d5f6cefec2142dbb6c59189cd2756986db460e33/visualize-data/08-customize/img/viridis.png -------------------------------------------------------------------------------- /visualize-data/08-customize/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Customize your plots" 3 | format: 4 | html: 5 | toc: false 6 | section-divs: true 7 | include-after-body: 8 | - text: | 9 | 10 | --- 11 | 12 | ```{r include=FALSE} 13 | knitr::opts_chunk$set( 14 | fig.width = 6, 15 | fig.height = 6 * 0.618, 16 | fig.retina = 3, 17 | dev = "ragg_png", 18 | fig.align = "center", 19 | out.width = "70%" 20 | ) 21 | 22 | source(here::here("R", "helpers.R")) 23 | ``` 24 | 25 | This tutorial will teach you how to customize the look and feel of your plots. You will learn how to: 26 | 27 | * **Zoom in** on areas of interest 28 | * Add **labels** and **annotations** to your plots 29 | * Change the appearance of your plot with a **theme** 30 | * Use **scales** to select custom color palettes 31 | * Modify the labels, title, and position of **legends** 32 | 33 | The tutorial is adapted from [_R for Data Science_](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at [shop.oreilly.com](http://shop.oreilly.com/product/0636920034407.do). 34 | 35 | The tutorial uses the {ggplot2}, {dplyr}, {scales}, {ggthemes}, and {viridis} packages, which have been pre-loaded for your convenience. 36 | 37 | ## 38 | 39 | ```{r} 40 | #| echo: false 41 | #| results: asis 42 | create_buttons("01-zooming.html") 43 | ``` 44 | --------------------------------------------------------------------------------