├── .Rbuildignore ├── .gitignore ├── LICENSE.md ├── README.md ├── css └── style.css ├── dplyr-learnr.Rmd ├── dplyr-learnr.Rproj ├── dplyr-learnr.html ├── images ├── culmen_depth.png ├── data_wrangler.png ├── dplyr_across_where.jpeg ├── dplyr_case_when_sm.png ├── dplyr_filter_sm.png ├── dplyr_hex.png ├── dplyr_mutate.png ├── dplyr_relocate.png ├── lter_penguins.png ├── r_learners_sm.jpeg └── rename_sm.jpg └── rsconnect └── shinyapps.io └── allisonhorst └── dplyr-learnr.dcf /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^LICENSE\.md$ 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | .DS_Store 6 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | ## creative commons 2 | 3 | # CC0 1.0 Universal 4 | 5 | CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER. 6 | 7 | ### Statement of Purpose 8 | 9 | The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an "owner") of an original work of authorship and/or a database (each, a "Work"). 
10 | 11 | Certain owners wish to permanently relinquish those rights to a Work for the purpose of contributing to a commons of creative, cultural and scientific works ("Commons") that the public can reliably and without fear of later claims of infringement build upon, modify, incorporate in other works, reuse and redistribute as freely as possible in any form whatsoever and for any purposes, including without limitation commercial purposes. These owners may contribute to the Commons to promote the ideal of a free culture and the further production of creative, cultural and scientific works, or to gain reputation or greater distribution for their Work in part through the use and efforts of others. 12 | 13 | For these and/or other purposes and motivations, and without any expectation of additional consideration or compensation, the person associating CC0 with a Work (the "Affirmer"), to the extent that he or she is an owner of Copyright and Related Rights in the Work, voluntarily elects to apply CC0 to the Work and publicly distribute the Work under its terms, with knowledge of his or her Copyright and Related Rights in the Work and the meaning and intended legal effect of CC0 on those rights. 14 | 15 | 1. __Copyright and Related Rights.__ A Work made available under CC0 may be protected by copyright and related or neighboring rights ("Copyright and Related Rights"). Copyright and Related Rights include, but are not limited to, the following: 16 | 17 | i. the right to reproduce, adapt, distribute, perform, display, communicate, and translate a Work; 18 | 19 | ii. moral rights retained by the original author(s) and/or performer(s); 20 | 21 | iii. publicity and privacy rights pertaining to a person's image or likeness depicted in a Work; 22 | 23 | iv. rights protecting against unfair competition in regards to a Work, subject to the limitations in paragraph 4(a), below; 24 | 25 | v. 
rights protecting the extraction, dissemination, use and reuse of data in a Work; 26 | 27 | vi. database rights (such as those arising under Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, and under any national implementation thereof, including any amended or successor version of such directive); and 28 | 29 | vii. other similar, equivalent or corresponding rights throughout the world based on applicable law or treaty, and any national implementations thereof. 30 | 31 | 2. __Waiver.__ To the greatest extent permitted by, but not in contravention of, applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights and associated claims and causes of action, whether now known or unknown (including existing as well as future claims and causes of action), in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each member of the public at large and to the detriment of Affirmer's heirs and successors, fully intending that such Waiver shall not be subject to revocation, rescission, cancellation, termination, or any other legal or equitable action to disrupt the quiet enjoyment of the Work by the public as contemplated by Affirmer's express Statement of Purpose. 32 | 33 | 3. __Public License Fallback.__ Should any part of the Waiver for any reason be judged legally invalid or ineffective under applicable law, then the Waiver shall be preserved to the maximum extent permitted taking into account Affirmer's express Statement of Purpose. 
In addition, to the extent the Waiver is so judged Affirmer hereby grants to each affected person a royalty-free, non transferable, non sublicensable, non exclusive, irrevocable and unconditional license to exercise Affirmer's Copyright and Related Rights in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "License"). The License shall be deemed effective as of the date CC0 was applied by Affirmer to the Work. Should any part of the License for any reason be judged legally invalid or ineffective under applicable law, such partial invalidity or ineffectiveness shall not invalidate the remainder of the License, and in such case Affirmer hereby affirms that he or she will not (i) exercise any of his or her remaining Copyright and Related Rights in the Work or (ii) assert any associated claims and causes of action with respect to the Work, in either case contrary to Affirmer's express Statement of Purpose. 34 | 35 | 4. __Limitations and Disclaimers.__ 36 | 37 | a. No trademark or patent rights held by Affirmer are waived, abandoned, surrendered, licensed or otherwise affected by this document. 38 | 39 | b. Affirmer offers the Work as-is and makes no representations or warranties of any kind concerning the Work, express, implied, statutory or otherwise, including without limitation warranties of title, merchantability, fitness for a particular purpose, non infringement, or the absence of latent or other defects, accuracy, or the present or absence of errors, whether or not discoverable, all to the greatest extent permissible under applicable law. 40 | 41 | c. 
Affirmer disclaims responsibility for clearing rights of other persons that may apply to the Work or any use thereof, including without limitation any person's Copyright and Related Rights in the Work. Further, Affirmer disclaims responsibility for obtaining any necessary consents, permissions or other rights required for any use of the Work. 42 | 43 | d. Affirmer understands and acknowledges that Creative Commons is not a party to this document and has no duty or obligation with respect to this CC0 or use of the Work. 44 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # dplyr-learnr 2 | A colorful introduction to some common functions in dplyr, part of the tidyverse. 3 | -------------------------------------------------------------------------------- /css/style.css: -------------------------------------------------------------------------------- 1 | /****** 2 | dplyr learnr tutorial 3 | *******/ 4 | 5 | /*-------- Google font imports --------*/ 6 | 7 | @import url(//fonts.googleapis.com/css?family=Roboto+Mono); 8 | @import url('https://fonts.googleapis.com/css2?family=Tajawal:wght@200;400;500;700&display=swap'); 9 | 10 | 11 | /*-------- Background color & body font --------*/ 12 | body { 13 | background-color: white; 14 | font-family: 'Tajawal', sans-serif; 15 | color: black; 16 | font-size: 14px; 17 | padding-left: 20px; 18 | padding-right: 20px; 19 | } 20 | 21 | /*-------- Main Panel ---------*/ 22 | .topics { 23 | width: 70%; 24 | overflow-x: auto; 25 | padding-bottom: 600px; 26 | padding-left: 20px; 27 | padding-right: 20px; 28 | background-color: white; 29 | } 30 | 31 | .band { 32 | padding-left: 3%; 33 | padding-right: 3%; 34 | position: relative; 35 | } 36 | 37 | /* Emphasis (e.g. 
italics) */ 38 | em { 39 | font-family: 'Tajawal', sans-serif; 40 | color: dimgray; 41 | } 42 | 43 | /*--------- Headers ---------*/ 44 | 45 | h2 { 46 | color: black; 47 | font-family: 'Tajawal', sans-serif; 48 | font-weight: 500; 49 | } 50 | 51 | h3 { 52 | color: black; 53 | font-family: 'Tajawal', sans-serif; 54 | font-weight: 500; 55 | text-transform: uppercase; 56 | } 57 | 58 | h4 { 59 | color: darkmagenta; 60 | font-family: 'Tajawal', sans-serif; 61 | font-weight: 500; 62 | padding-top: 20px; 63 | text-transform: uppercase; 64 | } 65 | 66 | h5 { 67 | color: mediumorchid; 68 | font-family: 'Tajawal', sans-serif; 69 | font-size: 16px; 70 | } 71 | 72 | /*-------- Code chunk header panel ---------*/ 73 | .panel-default > .panel-heading { 74 | color: white; 75 | background-color: darkslateblue; 76 | border: 0; 77 | } 78 | 79 | /*------- Rectangle panels --------*/ 80 | .panel, pre { 81 | border-radius: 0; 82 | background-color: #EBEBEB; 83 | border: 0; 84 | font-size: 13px; 85 | font: 'Tajawal', sans-serif 86 | } 87 | 88 | /*-------- Code chunk main panel --------*/ 89 | 90 | .ace-tm { 91 | background-color: white; 92 | color: dimgray; 93 | border-bottom: 1px solid gainsboro; 94 | border-right: 1px solid gainsboro; 95 | border-left: 1px solid gainsboro; 96 | font-family: 'Roboto Mono', monospace; 97 | font-size: 13px; 98 | } 99 | 100 | /*-------- Line # cells --------*/ 101 | 102 | .ace_gutter-cell { 103 | padding-left: 19px; 104 | padding-right: 6px; 105 | background-repeat: no-repeat; 106 | background-color: white; 107 | color: lightseagreen; 108 | border-right: gainsboro; 109 | } 110 | 111 | /*-------- Code returns --------*/ 112 | .tutorial-exercise-output > pre { 113 | max-height: 500px; 114 | overflow-y: auto; 115 | background-color: white; 116 | color: dimgray; 117 | font-size: 13px; 118 | border: 1px solid gainsboro; 119 | } 120 | 121 | 122 | /*------- In line code aesthetic --------*/ 123 | code { 124 | font-family: 'Roboto Mono', monospace; 125 | color: 
teal; 126 | background-color: ghostwhite; 127 | font-size: 13px; 128 | } 129 | 130 | .panel, pre { 131 | border-radius: 0; 132 | background-color: #E5E4E2; 133 | border: 0; 134 | font-size: 13px; 135 | font: 'Tajawal', sans-serif; 136 | } 137 | 138 | /*------- Hyperlinks --------*/ 139 | a { 140 | font-family: 'Tajawal', sans-serif; 141 | color: indigo; 142 | text-decoration: none; 143 | } 144 | 145 | a:hover, a:focus { 146 | color: mediumslateblue; 147 | text-decoration: underline; 148 | } 149 | 150 | 151 | /*------- Main menu -------*/ 152 | .topicsList .topic { 153 | font-family: 'Tajawal', sans-serif; 154 | font-weight: 400; 155 | color: teal; 156 | line-height: 45px; 157 | font-size: 16px; 158 | padding-left: 20px; 159 | padding-right: 20px; 160 | cursor: pointer; 161 | background-color: white; 162 | background-repeat: no-repeat; 163 | background-size: 2px 80px; 164 | background-position: left 100%; 165 | background-position-y: 100%; 166 | background-image: url(images/topicProgress.png); 167 | border-bottom: 0; 168 | border-top: 0; 169 | -webkit-transition-property: background-position; 170 | transition-property: background-position; 171 | -webkit-transition-duration: 0.25s; 172 | transition-duration: 0.25s; 173 | } 174 | 175 | /*------- Selected menu item -------*/ 176 | .topicsList .topic.current { 177 | font-family: 'Tajawal', sans-serif; 178 | background-color: mediumslateblue; 179 | color: white; 180 | border: 0; 181 | padding-left: 20px; 182 | padding-right: 20px; 183 | } 184 | /*--- When hovering ---*/ 185 | .topicsList .topic:hover { 186 | background-color: mediumorchid; 187 | color: white; 188 | padding-left: 20px; 189 | padding-right: 20px; 190 | } 191 | 192 | /*------- Solution / Hint buttons -------*/ 193 | .tutorial-exercise-input .btn-tutorial-solution { 194 | background-color: mediumorchid; 195 | color: white; 196 | font-family: 'Tajawal', sans-serif; 197 | padding-top: 5px; 198 | } 199 | 200 | 201 | /*------- Solution popover --------*/ 202 | 
.popover-title { 203 | margin: 0; 204 | background-color: mediumslateblue; 205 | color: white; 206 | font-family: 'Tajawal', sans-serif; 207 | } 208 | 209 | 210 | /*------- Buttons -------*/ 211 | 212 | .tutorial-exercise-input .btn-xs:hover { 213 | font-weight: normal; 214 | background-color: mediumorchid; 215 | color: white; 216 | padding-top: 5px; 217 | } 218 | 219 | /*------- Button default/primary -------*/ 220 | .btn-default { 221 | color: white; 222 | background-color: mediumslateblue; 223 | background-image: none; 224 | border: none; 225 | font-family: 'Tajawal', sans-serif; 226 | text-shadow: none; 227 | padding-top: 5px; 228 | } 229 | 230 | .btn-primary, .btn-success, .btn-info, .btn-light, .btn-xs { 231 | color: white; 232 | background-color: mediumslateblue; 233 | background-image: none; 234 | border: none; 235 | font-family: 'Tajawal', sans-serif; 236 | text-shadow: none; 237 | padding-top: 5px; 238 | } 239 | 240 | .topicActions .btn-primary:hover, .btn-success:hover, .btn-info:hover, .btn-light:hover, .btn-xs:hover { 241 | color: white; 242 | background-color: mediumorchid; 243 | background-image: none; 244 | border: none; 245 | font-family: 'Tajawal', sans-serif; 246 | text-shadow: none; 247 | padding-top: 5px; 248 | } 249 | 250 | .btn-default:hover, .btn-success:hover, .btn-info:hover, .btn-primary:hover { 251 | background-color: mediumorchid; 252 | color: white; 253 | font-family: 'Tajawal', sans-serif; 254 | padding-top: 5px; 255 | } 256 | 257 | hr.examples { 258 | border: 3px solid slateblue; 259 | } 260 | 261 | hr.activities { 262 | border: 3px solid orange; 263 | } 264 | 265 | /*------ FORK ON GITHUB BUTTON -------*/ 266 | 267 | .button { 268 | display: inline-block; 269 | padding: 10px 20px; 270 | padding-top: 10px; 271 | text-align: center; 272 | text-decoration: none; 273 | color: white; 274 | background-color: darkgray; 275 | border-radius: 6px; 276 | outline: none; 277 | } 278 | 
-------------------------------------------------------------------------------- /dplyr-learnr.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wrangling penguins: some basic data wrangling in R with dplyr" 3 | author: "Allison Horst" 4 | output: 5 | learnr::tutorial: 6 | css: css/style.css 7 | runtime: shiny_prerendered 8 | --- 9 | 10 | ```{r setup, include=FALSE} 11 | library(shiny) 12 | library(learnr) 13 | library(tidyverse) 14 | library(palmerpenguins) 15 | library(kableExtra) 16 | library(fontawesome) 17 | library(here) 18 | 19 | 20 | knitr::opts_chunk$set(echo = FALSE) 21 | ``` 22 | 23 | ## 1. Welcome 24 | 25 | In this tutorial, we'll learn some basic functions to help you work with data using functions in the `dplyr` package, part of the `tidyverse` in *R*. 26 | 27 | #### What is the tidyverse? 28 | 29 | The [tidyverse](https://www.tidyverse.org/) is a collection of packages that contain useful functions for working with and visualizing data (and a bunch of other stuff). You don't need to install the `tidyverse` to write or run code in this tutorial, since it's already attached behind the scenes, but you can install it with `install.packages("tidyverse")` to work with it on your own outside of this tutorial. 30 | 31 | #### What is dplyr? 32 | 33 | ```{r, echo=FALSE, out.width="30%", fig.align = "left"} 34 | 35 | knitr::include_graphics("images/dplyr_hex.png") 36 | 37 | ``` 38 | 39 | `dplyr` is one package in the tidyverse. It is the home to many functions that make it easier for us to work with data. Those include things like selecting specific columns, deciding which rows to keep based on whether or not they match our conditions, and finding summary statistics for different variables and groups. Sometimes we call these steps part of "data wrangling." 
40 | 41 | ```{r, echo=FALSE, out.width="100%", fig.align = "center", fig.cap = "Illustration from Hadley Wickham's 2019 talk, The Joy of Functional Programming"} 42 | 43 | knitr::include_graphics("images/data_wrangler.png") 44 | 45 | ``` 46 | 47 | #### What's in this tutorial? 48 | 49 | In this tutorial, you'll learn and practice examples using some functions in `dplyr` to work with data. Those are: 50 | 51 | - `filter()`: keep rows that satisfy your conditions 52 | - `select()`: keep or exclude some columns 53 | - `rename()`: rename columns 54 | - `relocate()`: move columns around 55 | - `mutate()`: add a new column 56 | - `group_by()` + `summarize()`: get summary statistics by group 57 | - `across()`: apply a function across columns 58 | - `count()`: quickly find counts for different groups 59 | - `case_when()`: like friendly if-else 60 | 61 | #### What's not in this tutorial? 62 | 63 | A WHOLE LOT. Visit https://dplyr.tidyverse.org/ and [R for Data Science](https://r4ds.had.co.nz/) for more information and examples. 64 | 65 | #### Code chunks for activities 66 | 67 | In each section of this tutorial, there will be examples and practice activities. You will complete practice activities in code chunks, like the one below. Once you enter your activity code, press 'Run' to see the output. If you get stuck, press the 'Hint' or 'Solution' button. 68 | 69 | ```{r calculator, exercise = TRUE} 70 | 0.2 + 0.3 71 | ``` 72 | 73 | ```{r calculator-hint} 74 | 0.2 + 0.4 # Should be plus 0.4 75 | ``` 76 | 77 | ```{r calculator-solution} 78 | (0.2 + 0.4)*100 # Multiply by 100 to make a percentage 79 | ``` 80 | 81 | When you press 'Run', the **output** of the code is returned below the code chunk. 82 | 83 | Go ahead and try typing in some basic calculations in the area below, and notice that when you update the numbers and press 'Run', the output below it is updated. 
84 | 85 | ```{r practice_calc, exercise = TRUE} 86 | 87 | ``` 88 | 89 | ```{r, echo=FALSE, out.width="80%", fig.align = "center"} 90 | knitr::include_graphics("images/r_learners_sm.jpeg") 91 | ``` 92 | 93 | ### Thank you to: 94 | 95 | #### `dplyr` creators & contributors! 96 | 97 | Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2020). dplyr: A Grammar of Data Manipulation. R package version 1.0.2. https://CRAN.R-project.org/package=dplyr 98 | 99 | #### `tidyverse` creators & contributors! 100 | 101 | Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686 102 | 103 | #### `palmerpenguins` package coauthors 104 | 105 | Thanks palmerpenguins team, Dr. Alison Hill & Dr. Kristen B. Gorman! 106 | 107 | Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. https://allisonhorst.github.io/palmerpenguins/ 108 | 109 | **Valuable feedback and suggestions from:** 110 | 111 | - Gordon Blasco (NCEAS) 112 | - Casey O'Hara (Bren School) 113 | 114 | 115 | Fork GitHub repo 116 | 117 | 118 | ## 2. Meet the data 119 | 120 | We'll practice some wrangling in `dplyr` using data for penguin sizes recorded by Dr. Kristen Gorman and colleagues with the [Palmer Station Long Term Ecological Research site (Palmer LTER)](https://pal.lternet.edu/) at several islands in the Palmer Archipelago, Antarctica. Data are originally published in: Gorman KB, Williams TD, Fraser WR (2014) PLoS ONE 9(3): e90081. doi:10.1371/journal.pone.0090081, and made available through the Environmental Data Initiative (see data citation details [here](https://allisonhorst.github.io/palmerpenguins/index.html)). 121 | 122 | You do **not** need to import the data to work through this tutorial - the data are already here waiting behind the scenes. 
If you *do* ever want to use the `penguins` dataset outside of this tutorial, you can install the `palmerpenguins` package from CRAN using `install.packages("palmerpenguins")` and learn more about the package [here](https://allisonhorst.github.io/palmerpenguins/). 123 | 124 | #### A bit about the penguins: 125 | 126 | The 3 species of penguins in this data set are Adelie, chinstrap and gentoo. They are all awesome and adorable. 127 | 128 | ```{r, echo=FALSE, out.width="100%", fig.align = "center"} 129 | knitr::include_graphics("images/lter_penguins.png") 130 | ``` 131 | 132 | There are 8 variables included: 133 | 134 | - **species:** a factor denoting the penguin species (Adelie, Chinstrap, or Gentoo) 135 | - **island:** a factor denoting the island (in Palmer Archipelago, Antarctica) where observed 136 | - **bill_length_mm:** a number denoting length of the dorsal ridge of penguin bill (millimeters) 137 | - **bill_depth_mm:** a number denoting the depth of the penguin bill (millimeters) 138 | - **flipper_length_mm:** an integer denoting penguin flipper length (millimeters) 139 | - **body_mass_g:** an integer denoting penguin body mass (grams) 140 | - **sex:** a factor denoting penguin sex (male, female) 141 | - **year:** an integer denoting the study year (2007, 2008, or 2009) 142 | 143 | #### What do the data look like? 144 | 145 | Below is a glimpse of the first 10 lines of the penguins data (`NA`s indicate missing values throughout). Notice that the data are already in *tidy format* - meaning that: 146 | 147 | - Each variable is a column 148 | - Each observation is a row 149 | - Each value is in its own cell 150 | 151 | ```{r, echo = FALSE} 152 | penguins %>% 153 | head(10) %>% 154 | kable() %>% 155 | kable_styling(full_width = FALSE) 156 | ``` 157 | 158 | OK, go forth and `filter()`! 159 | 160 | ## 3. dplyr::filter() 161 | 162 | [CLICK HERE](https://dplyr.tidyverse.org/reference/filter.html) for `filter()` documentation from tidyverse.org. 
163 | 164 | Use `filter()` to create a subset of the data only containing rows that satisfy your conditions. 165 | 166 | In the image below, the data must satisfy two conditions for a row (observation) to be retained: *type* must match "otter", and *site* must match "bay". Only two of the rows satisfy those conditions (the ones outlined in purple), so only those two would be retained upon running the code. 167 | 168 | ```{r, echo=FALSE, out.width="80%", fig.align = "center"} 169 | knitr::include_graphics("images/dplyr_filter_sm.png") 170 | ``` 171 | 172 | As a reminder for the following examples, here's a sample from the **penguins** data (5 observations / 344 total). 173 | 174 | ```{r} 175 | penguins[c(3,31,199,220,304),] %>% 176 | kable() %>% 177 | kable_styling(full_width = F) 178 | ``` 179 | Now let's learn some different ways we can use `filter()` to help us keep or exclude rows based on our conditions. 180 | 181 | ### FILTER EXAMPLES 182 | 183 | #### `r fa("fas fa-robot", fill = "purple")` Example 1 184 | 185 | Make a subset with only chinstrap penguins. 186 | 187 | In the code below, we **filter** the **penguins** data to only keep rows where the entry for **species** exactly matches "Chinstrap" (case sensitive). Note that when our condition is based on a string, the string is in quotation marks (here, "Chinstrap"). 188 | 189 | ```{r, echo = TRUE, message = FALSE, warning = FALSE} 190 | dplyr::filter(penguins, species == "Chinstrap") 191 | ``` 192 | 193 | #### `r fa("fas fa-robot", fill = "purple")` Example 2 194 | 195 | From `penguins`, filter to only include chinstrap and gentoo penguins. 196 | 197 | First, we need to be careful about the type of statement we're going to write. If we want to create a subset that contains chinstraps **and** gentoos, that means we want to keep rows where species matches "Chinstrap" **OR** "Gentoo". 
We can do this a couple different ways: 198 | 199 | - Use the "or" operator, `|` (the vertical line), between conditions 200 | - Use the `%in%` operator, followed by a vector of values to look for a match in 201 | 202 | ```{r, echo = TRUE, message = FALSE, warning = FALSE, eval = FALSE} 203 | # First way: using the or | operator 204 | dplyr::filter(penguins, species == "Chinstrap" | species == "Gentoo") 205 | ``` 206 | 207 | Does the same thing as... 208 | 209 | ```{r, echo = TRUE, message = FALSE, warning = FALSE} 210 | # Second way: using the %in% operator 211 | dplyr::filter(penguins, species %in% c("Chinstrap", "Gentoo")) 212 | ``` 213 | 214 | Note that the output is identical for both methods above. It may seem like the first way is easier right now - but if you have a lot of potential matches you're looking for, the second way reduces redundancy and increases code readability. 215 | 216 | **WARNING:** The condition `letter == c("A", "B")` is **very different** from `letter %in% c("A", "B")` - you will probably not EVER want to do the first, because it will work its way through the rows of the `letter` column looking for the strings "A" then "B" **in that order**. In other words, it will look for "A" in Row 1, then for "B" in Row 2, then "A" in Row 3, then "B" in Row 4, etc. - when what you probably want to do is just ask "Look in each row and keep it if it's "A" or "B"", which is what the second option (`letter %in% c("A", "B")`) will do. 217 | 218 | #### `r fa("fas fa-robot", fill = "purple")` Example 3 219 | 220 | Introducing the pipe operator (`%>%`)! 221 | 222 | In the examples above, we specified the dataset within the `filter()` function. That's fine for now, but moving forward we'll want to use the pipe operator (`%>%`) to perform operations in logical sequence. You can think of the pipe operator as saying "and then" in your code. 
For example: 223 | 224 | Written with the pipe, "collect kindling AND THEN open the flue AND THEN start fire" looks like: 225 | 226 | ``` 227 | Collect kindling %>% 228 | open the flue %>% 229 | start fire 230 | ``` 231 | 232 | Let's say we want to use the pipe operator in the example above to filter for chinstraps. We start by telling R the data frame name, AND THEN what to do with it: data `%>%` thing, like this: 233 | 234 | ```{r, echo = TRUE, message = FALSE, warning = FALSE} 235 | penguins %>% 236 | dplyr::filter(species == "Chinstrap") 237 | ``` 238 | 239 | Moving forward in this tutorial, we'll use the pipe operator. 240 | 241 | #### `r fa("fas fa-robot", fill = "purple")` Example 4 242 | 243 | From `penguins`, make a subset with Adelie penguins on Dream Island. 244 | 245 | Think carefully about the conditions. In this case, we only want to keep observations (rows) where the species is "Adelie" **AND** the island is "Dream" - a row should only be retained if both of those conditions are met. There are a number of ways you can write an **and** statement within `filter()`, including: 246 | 247 | - A comma between conditions indicates both must be met (`filter(x == "a", y == "b")`) 248 | - An ampersand between conditions indicates both must be met (`filter(x == "a" & y == "b")`) 249 | 250 | We can create a subset starting from penguins that only contains observations for Adelie penguins on Dream Island as follows: 251 | 252 | ```{r, echo = TRUE, message = FALSE, warning = FALSE} 253 | penguins %>% 254 | filter(species == "Adelie", island == "Dream") 255 | ``` 256 | 257 | ### Keep or exclude rows based on values 258 | 259 | We can also use `dplyr::filter()` to keep or exclude rows based on variable values using standard comparison operators (`==`, `<=`, `>=`, `<`, `>`). 260 | 261 | Unlike strings, numeric values within `filter()` don't need to be within quotation marks. 
262 | 263 | #### `r fa("fas fa-robot", fill = "purple")` Example 5 264 | 265 | From `penguins`, keep penguins with a flipper length greater than 200 mm. 266 | 267 | ```{r, echo = TRUE, message = FALSE, warning = FALSE} 268 | penguins %>% 269 | filter(flipper_length_mm > 200) 270 | ``` 271 | 272 | #### `r fa("fas fa-robot", fill = "purple")` Example 6 273 | 274 | From `penguins`, keep observations where body mass is less than or equal to 2900 g. 275 | 276 | ```{r, echo = TRUE, message = FALSE, warning = FALSE} 277 | penguins %>% 278 | filter(body_mass_g <= 2900) 279 | ``` 280 | 281 | #### `r fa("fas fa-robot", fill = "purple")` Example 7 282 | 283 | From `penguins`, keep observations for Adelie penguins with a bill length greater than 40 mm. 284 | 285 | ```{r, echo = TRUE, message = FALSE, warning = FALSE} 286 | penguins %>% 287 | filter(species == "Adelie", bill_length_mm > 40) 288 | ``` 289 | 290 | ### Exclude rows based on conditions 291 | 292 | Imagine you have a dataset that contains 10 penguin species, and you want to keep observations for 8 of those species. That would be a long list of species to match - if you were referring to the ones you want to keep. 293 | 294 | But we can also use `filter()` by saying what we **don't** want to keep. For example, we could just list the two species we want to *exclude* to reduce the typing we have to do. 295 | 296 | Use `!=` to indicate "does not match" within the `filter()` function. 297 | 298 | #### `r fa("fas fa-robot", fill = "purple")` Example 8 299 | 300 | Exclude observations for chinstraps from the `penguins` data. In the code below, the `!=` operator is used within `filter()` to keep observations where the species variable **does not match** "Chinstrap". 301 | 302 | ```{r, echo = TRUE, message = FALSE, warning = FALSE} 303 | penguins %>% 304 | filter(species != "Chinstrap") 305 | ``` 306 | 307 |
308 |
309 |
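The section above mentions excluding *two* species at once, but Example 8 only excludes one. The chunk below is a minimal sketch of one way to do that, assuming the `dplyr` and `palmerpenguins` packages are installed: wrap an `%in%` condition in `!( )` to negate it. The object name `not_adelie_or_gentoo` is made up for illustration. Note that `species != c("Adelie", "Gentoo")` would run into the same row-by-row recycling problem described in the earlier warning about `==` with a vector, so it is best avoided.

```{r, echo = TRUE, message = FALSE, warning = FALSE}
library(dplyr)
library(palmerpenguins)

# Keep every row whose species is NOT one of the listed values
not_adelie_or_gentoo <- penguins %>%
  filter(!(species %in% c("Adelie", "Gentoo")))

# Check which species remain (only Chinstrap should be left)
not_adelie_or_gentoo %>%
  count(species)
```

The same pattern scales to longer exclusion lists: only the vector inside `c()` changes.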
310 | 311 | ### FILTER PRACTICE ACTIVITIES 312 | 313 | 314 | In the code chunks below, write your own code to practice with the `dplyr::filter()` function. If you get stuck, click on the "Hint" and "Solution" buttons. 315 | 316 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 1 317 | 318 | Use `filter()` to create a subset from `penguins` that only contains gentoo penguins with a bill depth greater than or equal to 15.5 millimeters. 319 | 320 | ```{r filter_q1, exercise = TRUE} 321 | 322 | ``` 323 | 324 | ```{r filter_q1-hint, warning = FALSE} 325 | penguins %>% 326 | filter(species == "_____", bill_depth_mm >= _____) 327 | ``` 328 | 329 | ```{r filter_q1-solution} 330 | penguins %>% 331 | filter(species == "Gentoo", bill_depth_mm >= 15.5) 332 | ``` 333 | 334 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 2 335 | 336 | Use `filter()` to create a subset from `penguins` that contains observations for male penguins recorded at Dream and Biscoe Islands. 337 | 338 | ```{r filter_q2, exercise = TRUE} 339 | 340 | ``` 341 | 342 | ```{r filter_q2-hint} 343 | penguins %>% 344 | filter(island %in% c("_____","_____"), 345 | sex == "_____") 346 | ``` 347 | 348 | ```{r filter_q2-solution} 349 | penguins %>% 350 | filter(island %in% c("Dream","Biscoe"), 351 | sex == "male") 352 | ``` 353 | 354 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 3 355 | 356 | Use `filter()` to create a subset from `penguins` that contains observations for female Adelie penguins with bill lengths less than 35 mm. 
357 | 358 | ```{r filter_q3, exercise = TRUE} 359 | 360 | ``` 361 | 362 | ```{r filter_q3-hint} 363 | penguins %>% 364 | filter(sex == "_____", 365 | species == "_____", 366 | bill_length_mm < _____) 367 | ``` 368 | 369 | 370 | ```{r filter_q3-solution} 371 | penguins %>% 372 | filter(sex == "female", 373 | species == "Adelie", 374 | bill_length_mm < 35) 375 | ``` 376 | 377 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 4 378 | 379 | Create a subset from `penguins` containing observations for female chinstrap penguins on Dream and Torgersen Islands. 380 | 381 | ```{r filter_q4, exercise = TRUE} 382 | 383 | ``` 384 | 385 | ```{r filter_q4-hint} 386 | penguins %>% 387 | filter(sex == "____", 388 | species == "____") %>% 389 | filter(island %in% c("____","____")) 390 | ``` 391 | 392 | ```{r filter_q4-solution} 393 | penguins %>% 394 | filter(sex == "female", 395 | species == "Chinstrap") %>% 396 | filter(island %in% c("Dream","Torgersen")) 397 | ``` 398 | 399 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 5 400 | 401 | Create a subset from `penguins` that contains penguins that are either gentoos **OR** have a body mass greater than 4500 g. 402 | 403 | ```{r filter_q5, exercise = TRUE} 404 | 405 | ``` 406 | 407 | ```{r filter_q5-hint} 408 | penguins %>% 409 | filter(species == "_____" | body_mass_g > _____) 410 | ``` 411 | 412 | ```{r filter_q5-solution} 413 | penguins %>% 414 | filter(species == "Gentoo" | body_mass_g > 4500) 415 | ``` 416 | 417 | 418 | 419 | ## 4. dplyr::select() 420 | 421 | [CLICK HERE](https://dplyr.tidyverse.org/reference/select.html) for `select()` documentation from tidyverse.org. 422 | 423 | The main job of `dplyr::select()` is to help you pick which **columns** (if your data is in tidy format, those are variables) to keep or exclude. 
424 | 425 | While making subsets of variables is rarely *necessary* for analyses (and is often unadvised), it can make large data sets with many variables more manageable. 426 | 427 | As a reminder, here's a sample from the **penguins** data (5 observations / 344 total). 428 | 429 | ```{r} 430 | penguins[c(3,31,199,220,304),] %>% 431 | kable() %>% 432 | kable_styling(full_width = F) 433 | ``` 434 | 435 | ### SELECT EXAMPLES 436 | 437 | Within `select()`, list the variables that you want to keep in your new subset, separated by commas. Select a range of sequential variables using a colon `:` between inclusive endpoint variables. For example, to select from columns `giraffe` to `narwhal` the entire range is referenced with `giraffe:narwhal`. Note: you can also use `select()` to reorder columns - the order they are listed within `select()` is the order they'll appear in the new subset. 438 | 439 | #### `r fa("fas fa-robot", fill = "purple")` Example 1 440 | 441 | Keep columns `year`, `island`, and `species` from the `penguins` data. 442 | 443 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 444 | penguins %>% 445 | select(year, island, species) 446 | ``` 447 | 448 | #### `r fa("fas fa-robot", fill = "purple")` Example 2 449 | 450 | Keep columns `species` and `body_mass_g` from the `penguins` data. 451 | 452 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 453 | penguins %>% 454 | select(species, body_mass_g) 455 | ``` 456 | 457 | #### `r fa("fas fa-robot", fill = "purple")` Example 3 458 | 459 | From the `penguins` data, keep all columns from `species` to `body_mass_g`. 460 | 461 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 462 | penguins %>% 463 | select(species:body_mass_g) 464 | ``` 465 | 466 | #### `r fa("fas fa-robot", fill = "purple")` Example 4 467 | 468 | Keep columns from `species` to `bill_depth_mm` *and* `year` from the penguins data. Note that additional (non-sequential) variables are added after a comma. 
469 | 470 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 471 | penguins %>% 472 | select(species:bill_depth_mm, year) 473 | ``` 474 | 475 | ### Excluding columns 476 | 477 | Use the minus sign (`-`) in front of a variable name to exclude a single variable. For example, if I add `-giraffe` within the `select()` function, the `giraffe` column will be excluded. 478 | 479 | To exclude a range of columns, use `!(giraffe:narwhal)`. 480 | 481 | To exclude several non-sequential columns, use `!c(giraffe, wolf, shark)`. 482 | 483 | #### `r fa("fas fa-robot", fill = "purple")` Example 5 484 | 485 | From `penguins`, keep all variables *except* `island`. 486 | 487 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 488 | penguins %>% 489 | select(-island) 490 | ``` 491 | 492 | #### `r fa("fas fa-robot", fill = "purple")` Example 6 493 | 494 | From `penguins`, keep all variables *except* those from `island` to `bill_depth_mm`. 495 | 496 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 497 | penguins %>% 498 | select(!(island:bill_depth_mm)) 499 | ``` 500 | 501 | #### `r fa("fas fa-robot", fill = "purple")` Example 7 502 | 503 | From `penguins`, keep all variables *except* `species`, `flipper_length_mm`, and `year`. 504 | 505 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 506 | penguins %>% 507 | select(!c(species, flipper_length_mm, year)) 508 | ``` 509 | 510 | ### Some useful helper functions 511 | 512 | There are **many** helper functions that add power to `select()` (for example, selecting all variables of a specified class, or selecting all variables that start or end with a certain string pattern). Check out more [here](https://dplyr.tidyverse.org/reference/select.html). 513 | 514 | Here, we'll just explore a few: `starts_with()`, `ends_with()`, and `contains()`. These allow you to select any columns that match your conditions for the column name. 
515 | 516 | #### `r fa("fas fa-robot", fill = "purple")` Example 8 517 | 518 | Keep columns with names that start with "bill". 519 | 520 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 521 | penguins %>% 522 | select(starts_with("bill")) 523 | ``` 524 | 525 | #### `r fa("fas fa-robot", fill = "purple")` Example 9 526 | 527 | Keep columns with names that end in "mm". 528 | 529 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 530 | penguins %>% 531 | select(ends_with("mm")) 532 | ``` 533 | 534 | #### `r fa("fas fa-robot", fill = "purple")` Example 10 535 | 536 | Keep columns with names that contain "length". 537 | 538 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 539 | penguins %>% 540 | select(contains("length")) 541 | ``` 542 | 543 | #### `r fa("fas fa-robot", fill = "purple")` Example 11 544 | 545 | Keep columns that contain "length" OR start with "bill". 546 | 547 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 548 | penguins %>% 549 | select(contains("length") | starts_with("bill")) 550 | ``` 551 | 552 | ### Examples in sequence 553 | 554 | #### `r fa("fas fa-robot", fill = "purple")` Example 12 555 | 556 | Make a subset from penguins that only contains observations for gentoo penguins, and then only keep columns for island, sex and body mass. 557 | 558 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 559 | penguins %>% 560 | filter(species == "Gentoo") %>% 561 | select(island, sex, body_mass_g) 562 | ``` 563 | 564 | #### `r fa("fas fa-robot", fill = "purple")` Example 13 565 | 566 | Make a subset from penguins that only contains observations for male penguins with flippers longer than 200 mm, and then only keep columns that end with "mm". 567 | 568 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 569 | penguins %>% 570 | filter(sex == "male", flipper_length_mm > 200) %>% 571 | select(ends_with("mm")) 572 | ``` 573 | 574 |
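The same filter-then-select pattern also works with class-based selection, which the helper-function notes mention but do not demonstrate. This is a minimal sketch using `where()`, assuming `dplyr` is installed; `toy` is a made-up data frame with hypothetical values.

```r
library(dplyr)

# Hypothetical toy data with factor, double, and integer columns
toy <- tibble(
  species = factor(c("Adelie", "Gentoo", "Gentoo")),
  island = factor(c("Torgersen", "Biscoe", "Biscoe")),
  bill_length_mm = c(39.1, 46.1, 50.0),
  flipper_length_mm = c(181L, 211L, 230L),
  year = c(2007L, 2008L, 2009L)
)

# Keep Gentoo rows, then keep only the numeric columns
gentoo_numeric <- toy %>%
  filter(species == "Gentoo") %>%
  select(where(is.numeric))

names(gentoo_numeric)
```

Because `where()` takes a predicate function, swapping in `is.factor` or `is.integer` selects those classes instead.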
575 |
576 |
577 | 578 | ### SELECT PRACTICE ACTIVITIES 579 | 580 | In the code chunks below, write your own code to practice with the `dplyr::select()` function. If you get stuck, click on the "Hint" or "Solution" buttons. 581 | 582 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 1 583 | 584 | Starting with the `penguins` data, only keep the `body_mass_g` variable. 585 | 586 | ```{r select_q1, exercise = TRUE} 587 | 588 | ``` 589 | 590 | ```{r select_q1-solution} 591 | penguins %>% 592 | select(body_mass_g) 593 | ``` 594 | 595 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 2 596 | 597 | Starting with the `penguins` data, keep columns from `bill_length_mm` to `body_mass_g`, and `year`. 598 | 599 | ```{r select_q2, exercise = TRUE} 600 | 601 | ``` 602 | 603 | ```{r select_q2-hint} 604 | penguins %>% 605 | select(______:______, ______) 606 | ``` 607 | 608 | ```{r select_q2-solution} 609 | penguins %>% 610 | select(bill_length_mm:body_mass_g, year) 611 | ``` 612 | 613 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 3 614 | 615 | Starting with the `penguins` data, keep all columns except `island`. 616 | 617 | ```{r select_q3, exercise = TRUE} 618 | 619 | ``` 620 | 621 | ```{r select_q3-solution} 622 | penguins %>% 623 | select(-island) 624 | ``` 625 | 626 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 4 627 | 628 | From `penguins`, keep all variables *except* `species`, `sex` and `year`. 629 | 630 | ```{r select_q4, exercise = TRUE} 631 | 632 | ``` 633 | 634 | ```{r select_q4-hint} 635 | penguins %>% 636 | select(!c(____, ____, ____)) 637 | ``` 638 | 639 | ```{r select_q4-solution} 640 | penguins %>% 641 | select(!c(species, sex, year)) 642 | ``` 643 | 644 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 5 645 | 646 | From `penguins`, keep the `species` column and any columns that end with "mm". 
647 | 648 | ```{r select_q5, exercise = TRUE} 649 | 650 | ``` 651 | 652 | ```{r select_q5-hint} 653 | penguins %>% 654 | select(______, ends_with("__")) 655 | ``` 656 | 657 | ```{r select_q5-solution} 658 | penguins %>% 659 | select(species, ends_with("mm")) 660 | ``` 661 | 662 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 6 663 | 664 | From `penguins`, keep any columns that contain "bill" OR end with "mm". 665 | 666 | ```{r select_q6, exercise = TRUE} 667 | 668 | ``` 669 | 670 | ```{r select_q6-hint} 671 | penguins %>% 672 | select(contains("___") | ends_with("___")) 673 | ``` 674 | 675 | ```{r select_q6-solution} 676 | penguins %>% 677 | select(contains("bill") | ends_with("mm")) 678 | ``` 679 | 680 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 7 681 | 682 | In a piped sequence, starting from `penguins`: 683 | 684 | - Only keep observations for female penguins observed on Dream Island, THEN... 685 | - Keep variables `species`, and any variable starting with "bill" 686 | 687 | ```{r select_q7, exercise = TRUE} 688 | 689 | ``` 690 | 691 | ```{r select_q7-hint} 692 | penguins %>% 693 | filter(sex == "_____", island == "_____") %>% 694 | select(_____, starts_with("____")) 695 | ``` 696 | 697 | ```{r select_q7-solution} 698 | penguins %>% 699 | filter(sex == "female", island == "Dream") %>% 700 | select(species, starts_with("bill")) 701 | ``` 702 | 703 | 704 | 705 | ## 5. dplyr::relocate() 706 | 707 | [CLICK HERE](https://dplyr.tidyverse.org/reference/relocate.html) for `relocate()` documentation from tidyverse.org. 708 | 709 | ```{r, echo=FALSE, out.width="100%", fig.align = "center"} 710 | knitr::include_graphics("images/dplyr_relocate.png") 711 | ``` 712 | 713 | Use `relocate()` to move columns around, without messing with rows or groups. Some useful bits: 714 | 715 | - Use `.before` or `.after` to move a column to before or after another (by name or class) 716 | - If a single column is within the function (e.g. 
`relocate(col_A)`), that column is moved to the front 717 | 718 | See details and more examples from tidyverse.org [here](https://dplyr.tidyverse.org/reference/relocate.html). 719 | 720 | As a reminder, here's a sample from the **penguins** data (5 observations / 344 total). 721 | 722 | ```{r} 723 | penguins[c(3,31,199,220,304),] %>% 724 | kable() %>% 725 | kable_styling(full_width = F) 726 | ``` 727 | 728 | ### RELOCATE EXAMPLES 729 | 730 | #### `r fa("fas fa-robot", fill = "purple")` Example 1 731 | 732 | Move the `year` column to the front of the `penguins` data. 733 | 734 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 735 | penguins %>% 736 | relocate(year) 737 | ``` 738 | 739 | #### `r fa("fas fa-robot", fill = "purple")` Example 2 740 | 741 | Starting with `penguins`, move the `flipper_length_mm` column to after the `island` column. 742 | 743 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 744 | penguins %>% 745 | relocate(flipper_length_mm, .after = island) 746 | ``` 747 | 748 | #### `r fa("fas fa-robot", fill = "purple")` Example 3 749 | 750 | Starting with `penguins`, move the `bill_length_mm` column to before `year`. 751 | 752 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 753 | penguins %>% 754 | relocate(bill_length_mm, .before = year) 755 | ``` 756 | 757 | #### `r fa("fas fa-robot", fill = "purple")` Example 4 758 | 759 | Starting with `penguins`, move any numeric variables to after any factor variables. 760 | 761 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 762 | penguins %>% 763 | relocate(where(is.numeric), .after = where(is.factor)) 764 | ``` 765 | 766 |
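The examples above move one column (or one class of columns) at a time. As a further sketch, `relocate()` can also move several named columns together, and `last_col()` can serve as the `.after` target to push a column to the end. This assumes `dplyr` is installed; `toy` is a made-up data frame with hypothetical values.

```r
library(dplyr)

# Hypothetical toy data mimicking a few `penguins` columns
toy <- tibble(
  species = c("Adelie", "Gentoo"),
  island = c("Torgersen", "Biscoe"),
  bill_length_mm = c(39.1, 46.1),
  sex = c("male", "female"),
  year = c(2007L, 2008L)
)

# Move two columns together so they sit just before `species`
year_sex_first <- toy %>%
  relocate(year, sex, .before = species)

# Move `species` to the very end of the data frame
species_last <- toy %>%
  relocate(species, .after = last_col())
```

The moved columns keep the order in which they are listed inside `relocate()`.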
767 |
768 |
769 | 770 | ### RELOCATE PRACTICE ACTIVITIES 771 | 772 | In the code chunks below, write your own code to practice with the `dplyr::relocate()` function. If you get stuck, click on the "Hint" or "Solution" buttons. 773 | 774 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 1 775 | 776 | Starting with `penguins`, move the `species` variable to before the `sex` variable. 777 | 778 | ```{r relocate_q1, exercise = TRUE} 779 | 780 | ``` 781 | ```{r relocate_q1-solution} 782 | penguins %>% 783 | relocate(species, .before = sex) 784 | ``` 785 | 786 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 2 787 | 788 | Starting with `penguins`, relocate the `bill_length_mm` variable so that it is the first column. 789 | 790 | ```{r relocate_q2, exercise = TRUE} 791 | 792 | ``` 793 | ```{r relocate_q2-solution} 794 | penguins %>% 795 | relocate(bill_length_mm) 796 | ``` 797 | 798 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 3 799 | 800 | Move any factor variables (hint: `is.factor`) to after any integer variables (`is.integer`). 801 | 802 | ```{r relocate_q3, exercise = TRUE} 803 | 804 | ``` 805 | 806 | ```{r relocate_q3-solution} 807 | penguins %>% 808 | relocate(where(is.factor), .after = where(is.integer)) 809 | ``` 810 | 811 | 812 | ## 6. dplyr::rename() 813 | 814 | [CLICK HERE](https://dplyr.tidyverse.org/reference/rename.html) for `rename()` documentation from tidyverse.org. 815 | 816 | Use `rename()` to change the name of one or more columns. Generally, it'll look something like this: 817 | 818 | `df %>% rename(new_name = old_name)` 819 | 820 | You can also use a function to rename multiple columns using `rename_with()`. 821 | 822 | ```{r, echo=FALSE, out.width="100%", fig.align = "center"} 823 | knitr::include_graphics("images/rename_sm.jpg") 824 | ``` 825 | 826 | As a reminder, here's a sample from the **penguins** data (5 observations / 344 total). 
827 | 828 | ```{r} 829 | penguins[c(3,31,199,220,304),] %>% 830 | kable() %>% 831 | kable_styling(full_width = F) 832 | ``` 833 | 834 | ### RENAME EXAMPLES 835 | 836 | #### `r fa("fas fa-robot", fill = "purple")` Example 1 837 | 838 | Rename the `island` column to `palmer_island`. 839 | 840 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 841 | penguins %>% 842 | rename(palmer_island = island) 843 | ``` 844 | 845 | #### `r fa("fas fa-robot", fill = "purple")` Example 2 846 | 847 | Rename the `year` column to `study_yr`, and `body_mass_g` to `mass`. 848 | 849 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 850 | penguins %>% 851 | rename(study_yr = year, mass = body_mass_g) 852 | ``` 853 | 854 | #### `r fa("fas fa-robot", fill = "purple")` Example 3 855 | 856 | Convert all column names in `penguins` to upper case with `toupper`. 857 | 858 | Note: This is probably not a thing you'd actually want to do in the wild. 859 | 860 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 861 | penguins %>% 862 | rename_with(toupper) 863 | ``` 864 | 865 | #### `r fa("fas fa-robot", fill = "purple")` Example 4 866 | 867 | Convert all column names in `penguins` that end with "mm" to upper case (`toupper`). 868 | 869 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 870 | penguins %>% 871 | rename_with(toupper, ends_with("mm")) 872 | ``` 873 | 874 |
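`rename_with()` is not limited to built-in functions like `toupper`. As a sketch of a perhaps more realistic use, the function can be a small lambda that edits each name, here stripping a unit suffix. This assumes `dplyr` is installed; `toy` is a made-up data frame with hypothetical values.

```r
library(dplyr)

# Hypothetical toy data mimicking a few `penguins` columns
toy <- tibble(
  species = c("Adelie", "Gentoo"),
  bill_length_mm = c(39.1, 46.1),
  bill_depth_mm = c(18.7, 13.2)
)

# Remove the trailing "_mm" from every column name that ends with "mm";
# `.x` is the character vector of selected column names
renamed <- toy %>%
  rename_with(~ sub("_mm$", "", .x), ends_with("mm"))

names(renamed)
```

Columns not matched by the selection (here `species`) keep their original names.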
875 |
876 |
877 | 878 | ### RENAME PRACTICE ACTIVITIES 879 | 880 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 1 881 | 882 | Starting with `penguins`, rename the `flipper_length_mm` column to `flipper_mm`. 883 | 884 | ```{r rename_q1, exercise = TRUE} 885 | 886 | ``` 887 | 888 | ```{r rename_q1-solution} 889 | penguins %>% 890 | rename(flipper_mm = flipper_length_mm) 891 | ``` 892 | 893 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 2 894 | 895 | Starting with `penguins`, rename the `island` column to `island_name` and the `species` column to `penguin_spp`. 896 | 897 | ```{r rename_q2, exercise = TRUE} 898 | 899 | ``` 900 | 901 | ```{r rename_q2-solution} 902 | penguins %>% 903 | rename(island_name = island, 904 | penguin_spp = species) 905 | ``` 906 | 907 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 3 908 | 909 | Starting with `penguins`, convert any column names that start with "bill" to upper case. 910 | 911 | ```{r rename_q3, exercise = TRUE} 912 | 913 | ``` 914 | 915 | ```{r rename_q3-hint} 916 | penguins %>% 917 | rename_with(____, starts_with("____")) 918 | ``` 919 | 920 | ```{r rename_q3-solution} 921 | penguins %>% 922 | rename_with(toupper, starts_with("bill")) 923 | ``` 924 | 925 | ## 7. dplyr::mutate() 926 | 927 | [CLICK HERE](https://dplyr.tidyverse.org/reference/mutate.html) for `mutate()` documentation from tidyverse.org. 928 | 929 | Use `mutate()` to add a new column, while keeping the existing columns. 
The general structure is: 930 | 931 | ```{r, echo = TRUE, eval = FALSE} 932 | df %>% 933 | mutate(new_column_name = what_it_contains) 934 | ``` 935 | 936 | For example, if I had a data frame `df` with columns `A` and `B`, I can add a new column `C` that is the sum of `A` and `B` as follows (note: `A + B` adds the two columns row by row, whereas `sum(A, B)` would collapse all values into a single total repeated in every row): 937 | 938 | ```{r, echo = TRUE, eval = FALSE} 939 | df %>% 940 | mutate(C = A + B) 941 | ``` 942 | 943 | ...as demonstrated by mutant monsters below: 944 | 945 | ```{r, echo=FALSE, out.width="80%", fig.align = "center"} 946 | knitr::include_graphics("images/dplyr_mutate.png") 947 | ``` 948 | 949 | As a reminder, here's a sample from the **penguins** data (5 observations / 344 total). 950 | 951 | ```{r} 952 | penguins[c(3,31,199,220,304),] %>% 953 | kable() %>% 954 | kable_styling(full_width = F) 955 | ``` 956 | 957 | ### MUTATE EXAMPLES 958 | 959 | #### `r fa("fas fa-robot", fill = "purple")` Example 1 960 | 961 | Add a new column to penguins with body mass (currently in grams) converted to kilograms. 962 | 963 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 964 | penguins %>% 965 | mutate(body_mass_kg = body_mass_g / 1000) 966 | ``` 967 | 968 | #### `r fa("fas fa-robot", fill = "purple")` Example 2 969 | 970 | Add a new column to penguins with the ratio of bill length to bill depth. 971 | 972 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 973 | penguins %>% 974 | mutate(bill_ratio = bill_length_mm / bill_depth_mm) 975 | ``` 976 | 977 | 978 | #### `r fa("fas fa-robot", fill = "purple")` Example 3 979 | 980 | You can also add multiple new columns within a single `mutate()`. 981 | 982 | For example, let's add three new columns to penguins within one `mutate()` function: one column that contains the bill ratio (bill length / bill depth), one that contains the body mass converted to kg, and one that contains the flipper length converted to meters. 
983 | 984 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 985 | penguins %>% 986 | mutate(bill_ratio = bill_length_mm / bill_depth_mm, 987 | body_mass_kg = body_mass_g / 1000, 988 | flipper_length_m = flipper_length_mm / 1000) 989 | ``` 990 | 991 | #### `r fa("fas fa-robot", fill = "purple")` Example 4 992 | 993 | Add a new column with a sequence of values from 1 to the number of rows in the data frame. 994 | 995 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 996 | penguins %>% 997 | mutate(record_number = seq_len(n())) 998 | ``` 999 | 1000 | #### `r fa("fas fa-robot", fill = "purple")` Example 5 1001 | 1002 | Convert the `island` variable to a character using `mutate()`. 1003 | 1004 | Note / danger: this is a different (and a bit dangerous) use of `mutate()`. If you give the new column the same name as an existing column, the existing column will be **replaced**. As a general rule, if in doubt ADD A COLUMN (little cost) instead of overwriting a column. 1005 | 1006 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 1007 | penguins %>% 1008 | mutate(island = as.character(island)) 1009 | ``` 1010 | 1011 | #### `r fa("fas fa-robot", fill = "purple")` Example 6 1012 | 1013 | Use `fct_relevel()` within `mutate()` to reorder the factor levels of `island` to (1) Torgersen, (2) Biscoe, (3) Dream. 1014 | 1015 | Note: See `fct_relevel()` & `fct_reorder()` in the `forcats` package to change the order of factor levels. 1016 | 1017 | 1018 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 1019 | penguins %>% 1020 | mutate(island = fct_relevel(island, "Torgersen", "Biscoe", "Dream")) 1021 | ``` 1022 | 1023 |
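To make the "add versus overwrite" warning from Example 5 concrete, here is a minimal side-by-side sketch. It assumes `dplyr` is installed; `toy` and the column name `island_chr` are hypothetical and used only for illustration.

```r
library(dplyr)

# Hypothetical toy data with `island` stored as a factor
toy <- tibble(
  island = factor(c("Torgersen", "Biscoe", "Dream")),
  body_mass_g = c(3750, 5000, 3500)
)

# Adding a column: the original factor `island` is kept next to the new one
added <- toy %>%
  mutate(island_chr = as.character(island))

# Overwriting: `island` is replaced, so the factor version is gone
overwritten <- toy %>%
  mutate(island = as.character(island))

ncol(added)        # one more column than `toy`
ncol(overwritten)  # same number of columns as `toy`
```

Keeping both versions costs one extra column and leaves the original available if the conversion turns out to be a mistake.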
1024 |
1025 |
1026 | 1027 | ### MUTATE PRACTICE ACTIVITIES 1028 | 1029 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 1 1030 | 1031 | Add a new column to `penguins` called `flipper_m`, which is the `flipper_length_mm` (flipper length in millimeters) converted to units of meters. 1032 | 1033 | ```{r mutate_q1, exercise = TRUE} 1034 | 1035 | ``` 1036 | 1037 | ```{r mutate_q1-hint} 1038 | penguins %>% 1039 | mutate(_____ = _____ / 1000) 1040 | ``` 1041 | 1042 | ```{r mutate_q1-solution} 1043 | penguins %>% 1044 | mutate(flipper_m = flipper_length_mm / 1000) 1045 | ``` 1046 | 1047 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 2 1048 | 1049 | The `year` column in `penguins` is currently an integer. Add a new column named `year_fct` that is the year converted to a factor (hint: `as.factor()`). 1050 | 1051 | ```{r mutate_q2, exercise = TRUE} 1052 | 1053 | ``` 1054 | 1055 | ```{r mutate_q2-hint} 1056 | penguins %>% 1057 | mutate(_____ = as.factor(_____)) 1058 | ``` 1059 | 1060 | ```{r mutate_q2-solution} 1061 | penguins %>% 1062 | mutate(year_fct = as.factor(year)) 1063 | ``` 1064 | 1065 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 3 1066 | 1067 | To `penguins`, add a new column `mass_lb` that contains penguin body mass, currently in grams, converted to pounds (1 gram = 0.0022 lb). 
1068 | 1069 | 1070 | ```{r mutate_q3, exercise = TRUE} 1071 | 1072 | ``` 1073 | 1074 | ```{r mutate_q3-solution} 1075 | penguins %>% 1076 | mutate(mass_lb = body_mass_g * 0.0022) 1077 | ``` 1078 | 1079 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 4 1080 | 1081 | Starting with `penguins`, do the following within a single `mutate()` function: 1082 | 1083 | - Convert the `species` variable to a character 1084 | - Add a new column called `flipper_cm` with flipper length in centimeters 1085 | - Convert the `island` column to lowercase 1086 | 1087 | ```{r mutate_q4, exercise = TRUE} 1088 | 1089 | ``` 1090 | 1091 | ```{r mutate_q4-hint} 1092 | penguins %>% 1093 | mutate(species = as.character(_____), 1094 | flipper_cm = _____ / __, 1095 | island = tolower(_____)) 1096 | ``` 1097 | 1098 | ```{r mutate_q4-solution} 1099 | penguins %>% 1100 | mutate(species = as.character(species), 1101 | flipper_cm = flipper_length_mm / 10, 1102 | island = tolower(island)) 1103 | ``` 1104 | 1105 | ## 8. dplyr::group_by() %>% summarize() 1106 | 1107 | [CLICK HERE](https://dplyr.tidyverse.org/reference/group_by.html) for `group_by()` documentation, and [here](https://dplyr.tidyverse.org/reference/summarise.html) 1108 | for `summarize()` documentation, from tidyverse.org. 1109 | 1110 | Use the combination of `group_by()` and `summarize()` to find summary statistics for different groups, and put them in a nice table. 1111 | 1112 | From [dplyr.tidyverse.org](https://dplyr.tidyverse.org/): 1113 | 1114 | - `group_by()` "takes an existing tbl and converts it into a grouped tbl where operations are performed 'by group'" 1115 | - `summarize()` "creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in the input. 
It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified" 1116 | 1117 | Basically, specify groups within your data with `group_by()`, then use `summarize()` to calculate something (e.g. the mean or other statistic) for each group, & return it in a nice table. It's a powerhouse combo. 1118 | 1119 | As a reminder for the following examples, here's a sample from the **penguins** data (5 observations / 344 total). 1120 | 1121 | ```{r} 1122 | penguins[c(3,31,199,220,304),] %>% 1123 | kable() %>% 1124 | kable_styling(full_width = F) 1125 | ``` 1126 | 1127 | ### GROUP_BY + SUMMARIZE EXAMPLES 1128 | 1129 | #### `r fa("fas fa-robot", fill = "purple")` Example 1 1130 | 1131 | Use `group_by()` and `summarize()` to prepare a summary table containing the mean of penguin body mass, grouped by penguin species. Note that the `na.rm = TRUE` argument is added to exclude missing values. 1132 | 1133 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 1134 | penguins %>% 1135 | group_by(species) %>% 1136 | summarize(mass_mean = mean(body_mass_g, na.rm = TRUE)) 1137 | ``` 1138 | 1139 | #### `r fa("fas fa-robot", fill = "purple")` Example 2 1140 | 1141 | Use `group_by()` and `summarize()` to prepare a summary table containing the mean and standard deviation of penguin body mass, grouped by penguin species. 1142 | 1143 | Notice that there are now *two* columns created by `summarize()`, mass_mean and mass_sd. 1144 | 1145 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 1146 | penguins %>% 1147 | group_by(species) %>% 1148 | summarize(mass_mean = mean(body_mass_g, na.rm = TRUE), 1149 | mass_sd = sd(body_mass_g, na.rm = TRUE)) 1150 | ``` 1151 | 1152 | #### `r fa("fas fa-robot", fill = "purple")` Example 3 1153 | 1154 | Use `group_by()` and `summarize()` to prepare a summary table containing the mean and standard deviation of penguin bill length, grouped by penguin species and sex. 
1155 | 1156 | Note that the data are now grouped by `species` and `sex` (within `group_by()`), and two columns are created: `bill_length_mean` and `bill_length_sd`, which will contain the mean and standard deviation of bill length within each group. 1157 | 1158 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 1159 | penguins %>% 1160 | group_by(species, sex) %>% 1161 | summarize(bill_length_mean = mean(bill_length_mm, na.rm = TRUE), 1162 | bill_length_sd = sd(bill_length_mm, na.rm = TRUE)) 1163 | ``` 1164 | 1165 | #### `r fa("fas fa-robot", fill = "purple")` Example 4 1166 | 1167 | Use `group_by()` and `summarize()` to prepare a summary table containing the maximum and minimum flipper length for male Adelie penguins, grouped by island. 1168 | 1169 | Here, we'll first use `filter()` to only keep rows for male Adelie penguins, then use `group_by()` and `summarize()` to find the minimum and maximum flipper length **by island**, & put them in a nice table. 1170 | 1171 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 1172 | penguins %>% 1173 | filter(species == "Adelie", sex == "male") %>% 1174 | group_by(island) %>% 1175 | summarize(flip_max_length = max(flipper_length_mm), 1176 | flip_min_length = min(flipper_length_mm)) 1177 | ``` 1178 | 1179 |
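Two details that often come up alongside these examples are counting rows per group with `n()` and controlling the grouping of the result with the `.groups` argument of `summarize()`. When grouping by two variables, the result stays grouped by the first one unless told otherwise. This is a minimal sketch, assuming `dplyr` 1.0.0 or later; `toy` is a made-up data frame with hypothetical values.

```r
library(dplyr)

# Hypothetical toy data, including one missing bill length
toy <- tibble(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo"),
  sex = c("male", "female", "male", "female", "female"),
  bill_length_mm = c(39, 37, 41, 45, NA)
)

summary_tbl <- toy %>%
  group_by(species, sex) %>%
  summarize(
    n_obs = n(),  # rows per group, counting rows with missing values too
    bill_length_mean = mean(bill_length_mm, na.rm = TRUE),
    .groups = "drop"  # return a plain, ungrouped table
  )

summary_tbl
```

Dropping the groups at the end avoids surprises when later steps such as `mutate()` or `filter()` would otherwise silently operate within each species.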
1180 |
1181 |
1182 | 1183 | ### GROUP_BY %>% SUMMARIZE PRACTICE ACTIVITIES 1184 | 1185 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 1 1186 | 1187 | Starting with `penguins`, create a summary table containing the maximum and minimum length of flippers (call the columns `flip_max` and `flip_min`) for chinstrap penguins, grouped by island. 1188 | 1189 | ```{r group_summarize_q1, exercise = TRUE} 1190 | 1191 | ``` 1192 | 1193 | ```{r group_summarize_q1-hint} 1194 | penguins %>% 1195 | filter(species == "_______") %>% 1196 | group_by(_____) %>% 1197 | summarize(flip_max = max(_____), 1198 | flip_min = min(_____)) 1199 | ``` 1200 | 1201 | ```{r group_summarize_q1-solution} 1202 | penguins %>% 1203 | filter(species == "Chinstrap") %>% 1204 | group_by(island) %>% 1205 | summarize(flip_max = max(flipper_length_mm), 1206 | flip_min = min(flipper_length_mm)) 1207 | ``` 1208 | 1209 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 2 1210 | 1211 | Starting with `penguins`, group the data by species and year, then create a summary table containing the mean bill depth (call this `bill_depth_mean`) and mean bill length (call this `bill_length_mean`) for each group. 
1212 | 1213 | Don't forget: `na.rm = TRUE` 1214 | 1215 | ```{r group_summarize_q2, exercise = TRUE} 1216 | 1217 | ``` 1218 | 1219 | ```{r group_summarize_q2-hint} 1220 | penguins %>% 1221 | group_by(_____, _____) %>% 1222 | summarize( 1223 | _________ = mean(______, na.rm = TRUE), 1224 | _________ = mean(______, na.rm = TRUE) 1225 | ) 1226 | ``` 1227 | 1228 | ```{r group_summarize_q2-solution} 1229 | penguins %>% 1230 | group_by(species, year) %>% 1231 | summarize( 1232 | bill_depth_mean = mean(bill_depth_mm, na.rm = TRUE), 1233 | bill_length_mean = mean(bill_length_mm, na.rm = TRUE) 1234 | ) 1235 | ``` 1236 | 1237 | #### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 3 1238 | 1239 | Starting with `penguins`, in a piped sequence: 1240 | 1241 | - Add a new column called `bill_ratio` that is the ratio of bill length to bill depth (hint: `mutate()`) 1242 | - Only keep columns `species` and `bill_ratio` 1243 | - Group the data by `species` 1244 | - Create a summary table containing the mean of the `bill_ratio` variable, by species (name the column in the summary table `bill_ratio_mean`) 1245 | 1246 | ```{r group_summarize_q3, exercise = TRUE} 1247 | 1248 | ``` 1249 | 1250 | ```{r group_summarize_q3-hint} 1251 | penguins %>% 1252 | mutate(bill_ratio = ______ / ______) %>% 1253 | select(______, ______) %>% 1254 | group_by(______) %>% 1255 | summarize(______ = mean(______, na.rm = TRUE)) 1256 | ``` 1257 | 1258 | ```{r group_summarize_q3-solution} 1259 | penguins %>% 1260 | mutate(bill_ratio = bill_length_mm / bill_depth_mm) %>% 1261 | select(species, bill_ratio) %>% 1262 | group_by(species) %>% 1263 | summarize(bill_ratio_mean = mean(bill_ratio, na.rm = TRUE)) 1264 | ``` 1265 | 1266 | ## 9. dplyr::across() 1267 | 1268 | [CLICK HERE](https://dplyr.tidyverse.org/reference/across.html) for `across()` documentation from tidyverse.org. 
1269 | 1270 | From [tidyverse.org](https://dplyr.tidyverse.org/reference/across.html), `dplyr::across()` "makes it easy to apply the same transformation to multiple columns." 1271 | 1272 | ```{r, echo=FALSE, out.width="100%", fig.align = "center"} 1273 | knitr::include_graphics("images/dplyr_across_where.jpeg") 1274 | ``` 1275 | 1276 | The `across()` function is especially useful within `summarize()` to efficiently create summary tables with one or more functions applied to multiple variables (columns). 1277 | 1278 | Let's compare two ways of doing the same thing: creating a summary table of mean values for all penguin size measurements ending in "mm" (bill depth, bill length, flipper length), by species. 1279 | 1280 | As a reminder, here's a sample from the **penguins** data (5 observations / 344 total). 1281 | 1282 | ```{r} 1283 | penguins[c(3,31,199,220,304),] %>% 1284 | kable() %>% 1285 | kable_styling(full_width = F) 1286 | ``` 1287 | 1288 | #### Approach 1: Using `group_by()` %>% `summarize()` 1289 | 1290 | ```{r, echo = TRUE, warning = FALSE, message = FALSE, eval = FALSE} 1291 | penguins %>% 1292 | group_by(species) %>% 1293 | summarize(bill_length_mean = mean(bill_length_mm, na.rm = TRUE), 1294 | bill_depth_mean = mean(bill_depth_mm, na.rm = TRUE), 1295 | flipper_length_mean = mean(flipper_length_mm, na.rm = TRUE)) 1296 | ``` 1297 | 1298 | #### Approach 2: Using `across()` within `summarize()` 1299 | 1300 | ```{r, echo = TRUE, warning = FALSE, message = FALSE} 1301 | penguins %>% 1302 | group_by(species) %>% 1303 | summarize(across(ends_with("mm"), mean, na.rm = TRUE)) 1304 | ``` 1305 | 1306 | The output is the same - but the second way (using `across()`) is much more efficient - and becomes even more useful as the number of columns you want to transform increases! 
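The claim that both approaches give the same output can be checked on a small example. The sketch below uses a made-up data frame `toy` (hypothetical values) and assumes `dplyr` is installed. Note one deliberate difference from Approach 2 above: `na.rm = TRUE` is supplied inside a lambda rather than as an extra argument to `across()`, because passing extra arguments through `across()` is deprecated as of dplyr 1.1.0.

```r
library(dplyr)

# Hypothetical toy data, including one missing bill length
toy <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  bill_length_mm = c(39, 41, 46, NA),
  bill_depth_mm = c(18, 20, 14, 16),
  body_mass_g = c(3700, 3900, 5000, 5200)
)

# Approach 1: one summary expression per column
by_hand <- toy %>%
  group_by(species) %>%
  summarize(bill_length_mm = mean(bill_length_mm, na.rm = TRUE),
            bill_depth_mm = mean(bill_depth_mm, na.rm = TRUE))

# Approach 2: across(), applying the same lambda to every "mm" column
with_across <- toy %>%
  group_by(species) %>%
  summarize(across(ends_with("mm"), ~ mean(.x, na.rm = TRUE)))

all.equal(by_hand, with_across)
```

By default `across()` keeps the original column names, which is why Approach 1 is written here to reuse those names for the comparison.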

The `across()` function also happily accepts most helper functions introduced for `select()`, including `starts_with()`, `ends_with()`, and `contains()`, as well as `where()` to select columns by class (e.g. `where(is.numeric)`, `where(is.character)`, etc.). Follow along with the examples & exercises below to learn more!

### ACROSS EXAMPLES

#### `r fa("fas fa-robot", fill = "purple")` Example 1

Starting with `penguins`, use `across()` within `group_by() %>% summarize()` to make a summary table containing the mean value of all columns from bill depth (`bill_depth_mm`) to body mass (`body_mass_g`), grouped by species and year.

```{r, echo = TRUE, warning = FALSE, message = FALSE}
penguins %>%
  group_by(species, year) %>%
  summarize(across(bill_depth_mm:body_mass_g, mean, na.rm = TRUE))
```

#### `r fa("fas fa-robot", fill = "purple")` Example 2

Starting with `penguins`, use `across()` within `group_by()` and `summarize()` to make a summary table containing the minimum values for bill length and body mass, grouped by penguin species.

```{r, echo = TRUE, warning = FALSE, message = FALSE}
penguins %>%
  group_by(species) %>%
  summarize(across(c(bill_length_mm, body_mass_g), min, na.rm = TRUE))
```

#### `r fa("fas fa-robot", fill = "purple")` Example 3

Starting with `penguins`, use `across()` within `group_by()` and `summarize()` to make a summary table containing the maximum value within any column starting with "bill", grouped by `year`.

```{r, echo = TRUE, warning = FALSE, message = FALSE}
penguins %>%
  group_by(year) %>%
  summarize(across(starts_with("bill"), max, na.rm = TRUE))
```

Since what is presented in the table is the *maximum* value for bill length and depth, we probably want to update the column names.
We could do that manually using `rename()`, or we can add the `.names = ` argument within `across()` as shown in the examples below.

#### `r fa("fas fa-robot", fill = "purple")` Example 4

Repeat Example 3, but add an argument that automatically renames the columns holding the maximum bill length and depth so that each starts with "max_" followed by the original column name.

```{r, echo = TRUE, warning = FALSE, message = FALSE}
penguins %>%
  group_by(year) %>%
  summarize(across(starts_with("bill"), max, na.rm = TRUE, .names = "max_{.col}"))
```

#### `r fa("fas fa-robot", fill = "purple")` Example 5

Starting from `penguins`, create a summary table that finds the mean and standard deviation for all variables containing the string "length", grouped by penguin species. Update the column names to start with "avg_" or "stdev_", followed by the original column names.

There's quite a bit happening here, so a little breakdown:

- We use `contains("length")` to indicate we'll apply the functions to any columns with the word "length" in the name
- The functions to apply across columns are given within `list()`, which is also where their "names" of "avg" and "stdev" are set
- We use `.names = ` to define the final column names in the summary table. Here, the name should start with the function "name" specified above ("avg" or "stdev"), then an underscore, then the original column name (that's what `"{.fn}_{.col}"` will do)

```{r, echo = TRUE, warning = FALSE, message = FALSE}
penguins %>%
  group_by(species) %>%
  summarize(across(contains("length"),
                   list(avg = mean, stdev = sd),
                   na.rm = TRUE,
                   .names = "{.fn}_{.col}"))
```
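
To see exactly which names a `.names` specification produces, it can help to print only the column names of the summary table. The sketch below reuses the Example 5 pipeline and then calls `names()`. Two things here are assumptions made for illustration: the object name `length_summary` is hypothetical, and the functions are wrapped so that `na.rm = TRUE` is set inside them rather than forwarded by `across()`.

```{r, echo = TRUE, warning = FALSE, message = FALSE}
library(dplyr)
library(palmerpenguins)

# length_summary is a hypothetical name used only in this sketch
length_summary <- penguins %>%
  group_by(species) %>%
  summarize(across(contains("length"),
                   list(avg = function(x) mean(x, na.rm = TRUE),
                        stdev = function(x) sd(x, na.rm = TRUE)),
                   .names = "{.fn}_{.col}"))

# {.fn} is replaced by the list name, {.col} by the original column name
names(length_summary)
```

The result holds `species` plus four summary columns: `avg_` and `stdev_` versions of both `bill_length_mm` and `flipper_length_mm`.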

### ACROSS PRACTICE ACTIVITIES

#### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 1

Starting with `penguins`, group the data by island, then use `across()` to find the median value, within each island, of every column whose name contains the string "mm". The names in the resulting table should be the original column name followed by an underscore, then the word "median" (e.g. colname_median).

```{r across_q1, exercise = TRUE}

```

```{r across_q1-hint}
penguins %>%
  group_by(______) %>%
  summarize(across(contains("______"),
                   median,
                   na.rm = TRUE,
                   .names = "_______")
  )
```

```{r across_q1-solution}
penguins %>%
  group_by(island) %>%
  summarize(across(contains("mm"),
                   median,
                   na.rm = TRUE,
                   .names = "{.col}_median")
  )
```

#### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 2

Starting with `penguins`, only keep observations for Adelie penguins, then use `across()` to find the maximum value for any numeric variable (hint: `where(is.numeric)`) for each island (i.e. group by island).

```{r across_q2, exercise = TRUE}

```

```{r across_q2-hint}
penguins %>%
  filter(species == "______") %>%
  group_by(______) %>%
  summarize(across(where(_____), _____, na.rm = TRUE))
```

```{r across_q2-solution}
penguins %>%
  filter(species == "Adelie") %>%
  group_by(island) %>%
  summarize(across(where(is.numeric), max, na.rm = TRUE))
```

#### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 3

Starting from `penguins`, write a piped sequence to:

- Exclude penguins observed on Biscoe Island
- Only keep variables `species` through `body_mass_g`
- Rename the `species` variable to `spp_penguin`
- Group the data by `spp_penguin`
- Find the mean value for any variable containing the string "length", by penguin species, with column names updated to the original column name appended with "_avg" at the end

```{r across_q3, exercise = TRUE}

```

```{r across_q3-hint}
penguins %>%
  filter(island != "_____") %>%
  select(_____:_____) %>%
  rename(_____ = _____) %>%
  group_by(_____) %>%
  summarize(across(contains("_____"), mean, na.rm = TRUE, .names = "{.col}_avg"))
```

```{r across_q3-solution}
penguins %>%
  filter(island != "Biscoe") %>%
  select(species:body_mass_g) %>%
  rename(spp_penguin = species) %>%
  group_by(spp_penguin) %>%
  summarize(across(contains("length"), mean, na.rm = TRUE, .names = "{.col}_avg"))
```

## 10. dplyr::count()

[CLICK HERE](https://dplyr.tidyverse.org/reference/count.html) for `count()` documentation from tidyverse.org.

The `dplyr::count()` function wraps a bunch of things into one beautiful friendly line of code to help you find counts of observations by group. To demonstrate what it does, let's find the counts of penguins in the `penguins` dataset by species in two different ways:

1. Using `group_by()` %>% `summarize()` with `n()` to count observations
2. Using `count()` to do the exact same thing

As a reminder, here's a sample from the **penguins** data (5 observations / 344 total).

```{r}
penguins[c(3,31,199,220,304),] %>%
  kable() %>%
  kable_styling(full_width = FALSE)
```

#### Approach 1: `group_by()` %>% `summarize()` w/ `n()`

```{r, echo = TRUE, warning = FALSE, message = FALSE}
penguins %>%
  group_by(species) %>%
  summarize(
    n = n()
  )
```

#### Approach 2: `count()`

```{r, echo = TRUE, warning = FALSE, message = FALSE}
penguins %>%
  count(species)
```

Pretty cool, right? The `dplyr::count()` function does all the work of `group_by()`, `summarize()` **and** `n()` for you!

**Note:** By default, `count()` assumes that each observation is in its own row (case format). If you have a column containing *counts* (i.e. more than one observation is represented in a single row), use the `wt = ` argument to specify that column; `count()` will then sum those values to find totals instead of counting rows.

### COUNT EXAMPLES

#### `r fa("fas fa-robot", fill = "purple")` Example 1

Starting with the `penguins` dataset, find the counts of penguins by species and year.

```{r, echo = TRUE, warning = FALSE, message = FALSE}
penguins %>%
  count(species, year)
```

#### `r fa("fas fa-robot", fill = "purple")` Example 2

Starting from `penguins`, find the number of observations by island.

```{r, echo = TRUE, warning = FALSE, message = FALSE}
penguins %>%
  count(island)
```
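
The `wt = ` argument mentioned in the note above is easier to see with data that are already tallied. Here is a minimal sketch using a small made-up data frame (`penguin_tallies` and its `n_seen` column are hypothetical and not part of `palmerpenguins`):

```{r, echo = TRUE, warning = FALSE, message = FALSE}
library(dplyr)

# Hypothetical pre-tallied data: each row holds a count, not a single penguin
penguin_tallies <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo"),
  island  = c("Biscoe", "Dream", "Biscoe"),
  n_seen  = c(10, 15, 20)
)

# Default behavior: counts rows per species (Adelie = 2, Gentoo = 1)
rows_per_species <- penguin_tallies %>%
  count(species)

# With wt: sums n_seen per species (Adelie = 25, Gentoo = 20)
totals_per_species <- penguin_tallies %>%
  count(species, wt = n_seen)

rows_per_species
totals_per_species
```

In both tables the result column is called `n`; only its meaning changes, from number of rows to the sum of the weighting column.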

### COUNT PRACTICE ACTIVITIES

#### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 1

Starting with `penguins`, find counts of observations by species, island and year.

```{r count_q1, exercise = TRUE}

```

```{r count_q1-solution}
penguins %>%
  count(species, island, year)
```

#### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 2

Starting with `penguins`, filter to only keep Adelie and Gentoo penguins, then find counts by species and sex.

```{r count_q2, exercise = TRUE}

```

```{r count_q2-hint}
penguins %>%
  filter(species %in% c("_____","_____")) %>%
  count(_____, _____)
```

```{r count_q2-solution}
penguins %>%
  filter(species %in% c("Adelie","Gentoo")) %>%
  count(species, sex)
```

## 11. dplyr::case_when()

[CLICK HERE](https://dplyr.tidyverse.org/reference/case_when.html) for `case_when()` documentation from tidyverse.org.

The `case_when()` function is like a really friendly if-else statement. When used within `mutate()`, it allows you to add a new column containing values dependent on your condition(s).

```{r, echo=FALSE, out.width="80%", fig.align = "center"}
knitr::include_graphics("images/dplyr_case_when_sm.png")
```

### CASE_WHEN EXAMPLES

#### `r fa("fas fa-robot", fill = "purple")` Example 1

To `penguins`, add a new column `size_bin` that contains:

- "large" if body mass is greater than 4500 g
- "medium" if body mass is greater than 3000 g, and less than or equal to 4500 g
- "small" if body mass is less than or equal to 3000 g

```{r, echo = TRUE, warning = FALSE, message = FALSE}
penguins %>%
  mutate(size_bin = case_when(
    body_mass_g > 4500 ~ "large",
    body_mass_g > 3000 & body_mass_g <= 4500 ~ "medium",
    body_mass_g <= 3000 ~ "small"
    )
  )
```

#### `r fa("fas fa-robot", fill = "purple")` Example 2

Starting with `penguins`:

- Limit the columns to `species`, `year`, and `flipper_length_mm`
- Rename the `year` column to `study_year`
- Only keep observations for Adelie penguins
- Add a new column called `flipper_rank` that contains:

    - 1 if `flipper_length_mm` is < 200 mm
    - 2 if `flipper_length_mm` is >= 200 mm
    - 0 if `flipper_length_mm` is anything else (e.g. `NA`)

```{r, echo = TRUE, warning = FALSE, message = FALSE}
penguins %>%
  select(species, year, flipper_length_mm) %>%
  rename(study_year = year) %>%
  filter(species == "Adelie") %>%
  mutate(flipper_rank = case_when(
    flipper_length_mm < 200 ~ 1,
    flipper_length_mm >= 200 ~ 2,
    TRUE ~ 0 # 0 for anything else
  ))
```
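
`case_when()` checks its conditions in order and, for each value, uses the first one that is `TRUE`; a value that matches no condition becomes `NA`. A small sketch on a standalone vector (the vector `mass` is made up for illustration) shows both behaviors. Because of the ordering, the "medium" condition does not need to repeat the upper bound:

```{r, echo = TRUE, warning = FALSE, message = FALSE}
library(dplyr)

# Made-up body masses (g), including a missing value
mass <- c(5000, 3500, 2900, NA)

size_bin <- case_when(
  mass > 4500 ~ "large",   # checked first
  mass > 3000 ~ "medium",  # only reached when mass is not > 4500
  mass <= 3000 ~ "small"
)
size_bin
# "large" "medium" "small" NA

# A final TRUE ~ ... condition catches everything left over, including NA
size_bin_filled <- case_when(
  mass > 4500 ~ "large",
  mass > 3000 ~ "medium",
  mass <= 3000 ~ "small",
  TRUE ~ "no record"
)
size_bin_filled
# "large" "medium" "small" "no record"
```

This is why Example 2 ends with `TRUE ~ 0`: without that last line, penguins with a missing flipper length would get `NA` in `flipper_rank`.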

### CASE_WHEN PRACTICE ACTIVITIES

#### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 1

Add a new column to `penguins` called `study_year` that contains:

- "Year 1" if the year is 2007
- "Year 2" if the year is 2008
- "Year 3" if the year is 2009

```{r case_when_q1, exercise = TRUE}

```

```{r case_when_q1-solution}
penguins %>%
  mutate(study_year =
    case_when(
      year == 2007 ~ "Year 1",
      year == 2008 ~ "Year 2",
      year == 2009 ~ "Year 3"
    ))
```

#### `r fa("fas fa-keyboard", fill = "purple")` Practice Activity 2

Starting with `penguins`, only keep observations for Chinstrap penguins, then only keep the `flipper_length_mm` and `body_mass_g` variables. Add a new column called `fm_ratio` that contains the ratio of flipper length to body mass for each penguin. Next, add another column named `ratio_bin` which contains the word "high" if `fm_ratio` is greater than or equal to 0.05, "low" if the ratio is less than 0.05, and "no record" if anything else (e.g. `NA`).

```{r case_when_q2, exercise = TRUE}

```

```{r case_when_q2-hint}
penguins %>%
  filter(species == "_____") %>%
  select(_____, _____) %>%
  mutate(fm_ratio = _____ / _____) %>%
  mutate(ratio_bin = case_when(
    fm_ratio >= 0.05 ~ "_____",
    fm_ratio < 0.05 ~ "_____",
    TRUE ~ "_____"
  ))
```

```{r case_when_q2-solution}
penguins %>%
  filter(species == "Chinstrap") %>%
  select(flipper_length_mm, body_mass_g) %>%
  mutate(fm_ratio = flipper_length_mm / body_mass_g) %>%
  mutate(ratio_bin = case_when(
    fm_ratio >= 0.05 ~ "high",
    fm_ratio < 0.05 ~ "low",
    TRUE ~ "no record"
  ))
```

## Resources

Want to learn more about `dplyr`, the `tidyverse`, or coding in R generally? Here are some great places to start:

- [R for Data Science](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund
- [dplyr.tidyverse.org](https://dplyr.tidyverse.org/)

--------------------------------------------------------------------------------
/dplyr-learnr.Rproj:
--------------------------------------------------------------------------------
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX

AutoAppendNewline: Yes
StripTrailingWhitespace: Yes

BuildType: Website

--------------------------------------------------------------------------------
/images/culmen_depth.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allisonhorst/dplyr-learnr/80a5e64685eac98826e3e44eed785436f74f8edb/images/culmen_depth.png
--------------------------------------------------------------------------------
/images/data_wrangler.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allisonhorst/dplyr-learnr/80a5e64685eac98826e3e44eed785436f74f8edb/images/data_wrangler.png
--------------------------------------------------------------------------------
/images/dplyr_across_where.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allisonhorst/dplyr-learnr/80a5e64685eac98826e3e44eed785436f74f8edb/images/dplyr_across_where.jpeg
--------------------------------------------------------------------------------
/images/dplyr_case_when_sm.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allisonhorst/dplyr-learnr/80a5e64685eac98826e3e44eed785436f74f8edb/images/dplyr_case_when_sm.png
--------------------------------------------------------------------------------
/images/dplyr_filter_sm.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allisonhorst/dplyr-learnr/80a5e64685eac98826e3e44eed785436f74f8edb/images/dplyr_filter_sm.png
--------------------------------------------------------------------------------
/images/dplyr_hex.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allisonhorst/dplyr-learnr/80a5e64685eac98826e3e44eed785436f74f8edb/images/dplyr_hex.png
--------------------------------------------------------------------------------
/images/dplyr_mutate.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allisonhorst/dplyr-learnr/80a5e64685eac98826e3e44eed785436f74f8edb/images/dplyr_mutate.png
--------------------------------------------------------------------------------
/images/dplyr_relocate.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allisonhorst/dplyr-learnr/80a5e64685eac98826e3e44eed785436f74f8edb/images/dplyr_relocate.png
--------------------------------------------------------------------------------
/images/lter_penguins.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allisonhorst/dplyr-learnr/80a5e64685eac98826e3e44eed785436f74f8edb/images/lter_penguins.png
--------------------------------------------------------------------------------
/images/r_learners_sm.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allisonhorst/dplyr-learnr/80a5e64685eac98826e3e44eed785436f74f8edb/images/r_learners_sm.jpeg
--------------------------------------------------------------------------------
/images/rename_sm.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/allisonhorst/dplyr-learnr/80a5e64685eac98826e3e44eed785436f74f8edb/images/rename_sm.jpg
--------------------------------------------------------------------------------
/rsconnect/shinyapps.io/allisonhorst/dplyr-learnr.dcf:
--------------------------------------------------------------------------------
name: dplyr-learnr
title: dplyr-learnr
username:
account: allisonhorst
server: shinyapps.io
hostUrl: https://api.shinyapps.io/v1
appId: 3544451
bundleId: 4173616
url: https://allisonhorst.shinyapps.io/dplyr-learnr/
when: 1612315822.30477
asMultiple: FALSE
asStatic: FALSE

--------------------------------------------------------------------------------