├── chain.png
├── .gitignore
├── README.Rmd
└── README.md


/chain.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/trinker/dplyr_in_a_nutshell/HEAD/chain.png


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# History files
.Rhistory
# Example code in package build process
*-Ex.R
.Rprofile
.Rproj.user
dplyr_in_a_nutshell.Rproj
chain.docx
README.html


--------------------------------------------------------------------------------
/README.Rmd:
--------------------------------------------------------------------------------
dplyr In a Nutshell
===

This is a minimal guide, mostly for myself, to remind me of the most important dplyr functions and how they relate to the base R functions that I'm familiar with. Also check out [tidyr In a Nutshell](https://github.com/trinker/tidyr_in_a_nutshell).

```{r setup, include=FALSE, echo=FALSE}
knitr::opts_chunk$set(comment=NA, tidy=FALSE)
```

# 8 dplyr Functions to Rule the World

### Speedy Table

`tbl_df`


### The 5 Guys + 1

1. `filter`
2. `select`
3. `mutate`
4. `group_by`
5. `summarise`
6. `arrange`

### Chaining (pronounced "then")

`%>%`

# Relating the Functions

### Speedy Table

`tbl_df` works similarly to `data.table` in that it prints sensibly: only the first few rows and the columns that fit on screen.
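
A minimal sketch of what "prints sensibly" means in practice. It assumes a recent dplyr release, where `tbl_df()` is deprecated and `as_tibble()` plays the same role; the name `mtcars_tbl` is only for illustration.

```r
library(dplyr)

# Assumption: a recent dplyr, where as_tibble() replaces the deprecated tbl_df()
mtcars_tbl <- as_tibble(mtcars)

# The wrapped object is still a data frame, with extra classes on top
class(mtcars)      # "data.frame"
class(mtcars_tbl)  # "tbl_df" "tbl" "data.frame"

# Printing shows a truncated view instead of all 32 rows
print(mtcars_tbl)
```

Because the result still inherits from `data.frame`, base functions such as `head()` or `nrow()` should keep working on it.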

### Relating the 5 Guys + 1 to base R

List of dplyr functions and the base functions they're related to:

Base Function    | dplyr Function(s) | Special Powers
-----------------|-------------------|-----------------------------
subset           | filter & select   | filter rows & select columns
transform        | mutate            | operate with columns not yet created
split            | group_by          | splits without cutting
lapply + do.call | summarise         | apply and bind in a single bound
order + with     | arrange           | "I only have to specify dataframe once?"

### Chaining

`%>%`... Do you know ggplot2's `+`? Same idea.

![](chain.png)

*Basically, the previous result in the chain is supplied as the first argument to the function on the right side.*

# Demos
### Speedy Table
```{r, message=FALSE}
library(dplyr)
mtcars2 <- tbl_df(mtcars)
```

### The 5 Guys
```{r, message=FALSE}
filter(mtcars2[1:10, ], cyl == 8)
select(mtcars2[1:10, ], mpg, cyl, hp:vs)
arrange(mtcars2[1:10, ], cyl, disp)
mutate(mtcars2[1:10, ], displ_l = disp / 61.0237, displ_l_add1 = displ_l + 1)
summarise(mtcars, mean(disp))
```

### Chaining

```{r}
mtcars2 %>%
    group_by(cyl) %>%
    summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp))
mtcars2 %>%
    group_by(cyl, gear) %>%
    summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp)) %>%
    arrange(-cyl, -gear)
## Use `%>%` with base functions too!!!
mtcars2 %>%
    group_by(cyl, gear) %>%
    summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp)) %>%
    arrange(-cyl, -gear) %>%
    head()
mtcars2 %>%
    group_by(cyl) %>%
    summarise(max(disp), hp[1])
mtcars2 %>%
    group_by(cyl) %>%
    summarise(n = n())
table(mtcars$cyl)
```



--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
dplyr In a Nutshell
===

This is a minimal guide, mostly for myself, to remind me of the most important dplyr functions and how they relate to the base R functions that I'm familiar with. Also check out [tidyr In a Nutshell](https://github.com/trinker/tidyr_in_a_nutshell).



# 8 dplyr Functions to Rule the World

### Speedy Table

`tbl_df`


### The 5 Guys + 1

1. `filter`
2. `select`
3. `mutate`
4. `group_by`
5. `summarise`
6. `arrange`

### Chaining (pronounced "then")

`%>%`

# Relating the Functions

### Speedy Table

`tbl_df` works similarly to `data.table` in that it prints sensibly: only the first few rows and the columns that fit on screen.

### Relating the 5 Guys + 1 to base R

List of dplyr functions and the base functions they're related to:

Base Function    | dplyr Function(s) | Special Powers
-----------------|-------------------|-----------------------------
subset           | filter & select   | filter rows & select columns
transform        | mutate            | operate with columns not yet created
split            | group_by          | splits without cutting
lapply + do.call | summarise         | apply and bind in a single bound
order + with     | arrange           | "I only have to specify dataframe once?"

### Chaining

`%>%`... Do you know ggplot2's `+`? Same idea.
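
To make the "then" reading of `%>%` concrete, here is a small sketch comparing a nested call with the same steps written as a chain. The variable names `nested` and `chained` are only for illustration.

```r
library(dplyr)

# Nested form: has to be read from the inside out
nested <- head(arrange(filter(mtcars, cyl == 8), disp), 3)

# Chained form: read top to bottom as "filter, then arrange, then head".
# Each result is passed as the first argument of the next function.
chained <- mtcars %>%
    filter(cyl == 8) %>%
    arrange(disp) %>%
    head(3)

# Both forms make the same calls, so this errors only if they differ
stopifnot(identical(nested, chained))
```

The chain costs nothing extra at the call level; it is purely a way to keep the steps in reading order.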

![](chain.png)

*Basically, the previous result in the chain is supplied as the first argument to the function on the right side.*

# Demos
### Speedy Table

```r
library(dplyr)
mtcars2 <- tbl_df(mtcars)
```

### The 5 Guys

```r
filter(mtcars2[1:10, ], cyl == 8)
```

```
Source: local data frame [2 x 11]

   mpg cyl disp  hp drat   wt  qsec vs am gear carb
1 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
2 14.3   8  360 245 3.21 3.57 15.84  0  0    3    4
```

```r
select(mtcars2[1:10, ], mpg, cyl, hp:vs)
```

```
Source: local data frame [10 x 7]

                   mpg cyl  hp drat    wt  qsec vs
Mazda RX4         21.0   6 110 3.90 2.620 16.46  0
Mazda RX4 Wag     21.0   6 110 3.90 2.875 17.02  0
Datsun 710        22.8   4  93 3.85 2.320 18.61  1
Hornet 4 Drive    21.4   6 110 3.08 3.215 19.44  1
Hornet Sportabout 18.7   8 175 3.15 3.440 17.02  0
Valiant           18.1   6 105 2.76 3.460 20.22  1
Duster 360        14.3   8 245 3.21 3.570 15.84  0
Merc 240D         24.4   4  62 3.69 3.190 20.00  1
Merc 230          22.8   4  95 3.92 3.150 22.90  1
Merc 280          19.2   6 123 3.92 3.440 18.30  1
```

```r
arrange(mtcars2[1:10, ], cyl, disp)
```

```
Source: local data frame [10 x 11]

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
2  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
3  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
4  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
5  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
6  19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
7  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
8  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
9  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
10 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
```

```r
mutate(mtcars2[1:10, ], displ_l = disp / 61.0237, displ_l_add1 = displ_l + 1)
```

```
Source: local data frame [10 x 13]

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb displ_l
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4   2.622
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4   2.622
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1   1.770
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1   4.228
5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2   5.899
6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1   3.687
7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4   5.899
8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2   2.404
9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2   2.307
10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4   2.746
Variables not shown: displ_l_add1 (dbl)
```

```r
summarise(mtcars, mean(disp))
```

```
  mean(disp)
1      230.7
```

### Chaining


```r
mtcars2 %>%
    group_by(cyl) %>%
    summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp))
```

```
Source: local data frame [3 x 4]

  cyl    md     mh   mdh
1   4 105.1  82.64 187.8
2   6 183.3 122.29 305.6
3   8 353.1 209.21 562.3
```

```r
mtcars2 %>%
    group_by(cyl, gear) %>%
    summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp)) %>%
    arrange(-cyl, -gear)
```

```
Source: local data frame [8 x 5]
Groups: cyl

  cyl gear    md    mh   mdh
1   8    5 326.0 299.5 625.5
2   8    3 357.6 194.2 551.8
3   6    5 145.0 175.0 320.0
4   6    4 163.8 116.5 280.3
5   6    3 241.5 107.5 349.0
6   4    5 107.7 102.0 209.7
7   4    4 102.6  76.0 178.6
8   4    3 120.1  97.0 217.1
```

```r
## Use `%>%` with base functions too!!!
mtcars2 %>%
    group_by(cyl, gear) %>%
    summarise(md=mean(disp), mh=mean(hp), mdh=mean(disp + hp)) %>%
    arrange(-cyl, -gear) %>%
    head()
```

```
Source: local data frame [6 x 5]
Groups: cyl

  cyl gear    md    mh   mdh
1   8    5 326.0 299.5 625.5
2   8    3 357.6 194.2 551.8
3   6    5 145.0 175.0 320.0
4   6    4 163.8 116.5 280.3
5   6    3 241.5 107.5 349.0
6   4    5 107.7 102.0 209.7
```

```r
mtcars2 %>%
    group_by(cyl) %>%
    summarise(max(disp), hp[1])
```

```
Source: local data frame [3 x 3]

  cyl max(disp) hp[1]
1   4     146.7    93
2   6     258.0   110
3   8     472.0   175
```

```r
mtcars2 %>%
    group_by(cyl) %>%
    summarise(n = n())
```

```
Source: local data frame [3 x 2]

  cyl  n
1   4 11
2   6  7
3   8 14
```

```r
table(mtcars$cyl)
```

```

 4  6  8 
11  7 14 
```


--------------------------------------------------------------------------------
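
The pairings in the table under "Relating the 5 Guys + 1 to base R" can be sketched side by side. This is a rough illustration of how the functions relate, not a claim that the outputs match in every detail (row names and classes can differ). All `base_*` and `dplyr_*` names are made up for the example.

```r
library(dplyr)

# subset ~ filter + select
base_sub  <- subset(mtcars, cyl == 8, select = c(mpg, cyl))
dplyr_sub <- mtcars %>% filter(cyl == 8) %>% select(mpg, cyl)

# transform ~ mutate: mutate can use a column created earlier in the same
# call, while transform needs a second call to see `displ_l`
base_mut  <- transform(transform(mtcars, displ_l = disp / 61.0237),
                       displ_l_add1 = displ_l + 1)
dplyr_mut <- mutate(mtcars, displ_l = disp / 61.0237,
                    displ_l_add1 = displ_l + 1)

# split + lapply + do.call ~ group_by + summarise
base_sum <- do.call(rbind, lapply(split(mtcars, mtcars$cyl), function(d) {
    data.frame(cyl = d$cyl[1], md = mean(d$disp))
}))
dplyr_sum <- mtcars %>% group_by(cyl) %>% summarise(md = mean(disp))

# order + with ~ arrange: the data frame is named only once
base_ord  <- mtcars[with(mtcars, order(cyl, disp)), ]
dplyr_ord <- arrange(mtcars, cyl, disp)
```

In each pair the dplyr version names the data frame once and reads as a single step, which is the "special power" column of the table in code form.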