├── .gitattributes ├── .gitignore ├── LICENSE ├── README.md ├── ggplot2_2nd_ed ├── README.md ├── chapter_02_getting_started_with_ggplot2.Rmd ├── chapter_03_toolbox.Rmd ├── chapter_04_mastering_the_grammar.Rmd ├── chapter_05_build_a_plot_layer_by_layer.Rmd ├── chapter_06_scales_axes_and_legends.Rmd ├── chapter_07_positioning.Rmd ├── chapter_08_themes.Rmd ├── chapter_09_data_analysis.Rmd ├── chapter_10_data_transformation.Rmd ├── chapter_11_modeling_for_visualization.Rmd ├── chapter_12_programming_with_ggplot2.Rmd ├── img │ ├── plot.png │ └── youcanbeapirate-wb-sparkline.jpg ├── input │ └── gw_KS_sf.rds └── output │ ├── output.pdf │ ├── output2.pdf │ └── plot.rds ├── ggplot2_3rd_ed ├── 6_5_raster_maps.R ├── README.md ├── chapter_02_first_steps.Rmd ├── chapter_03_individual_geoms.Rmd ├── chapter_04_collective_geoms.Rmd ├── chapter_05_statistical_summaries.Rmd ├── chapter_06_maps.Rmd ├── chapter_07_networks.Rmd ├── chapter_08_annotations.Rmd ├── chapter_09_arranging_plots.Rmd ├── chapter_10_position_scales_and_axes.Rmd ├── chapter_11_colour_scales_and_legends.Rmd ├── chapter_12_other_aesthetics.Rmd ├── chapter_13_mastering_the_grammar.Rmd ├── chapter_14_build_a_plot_layer_by_layer.Rmd ├── chapter_15_scales_and_guides.Rmd ├── chapter_16_coordinate_system.Rmd ├── chapter_17_faceting.Rmd ├── chapter_18_themes.Rmd ├── chapter_19_programming_with_ggplot2.Rmd ├── chapter_20_internals_of_ggplot2.Rmd ├── chapter_21_extending_ggplot2.Rmd ├── chapter_22_a_case_study.Rmd ├── img │ ├── gghighlight_and_ggthemes_with_logo.jpg │ ├── plot.png │ ├── plot_with_logo.jpg │ ├── plot_with_theme_and_logo.jpg │ ├── satellite_imagery_1.tif │ ├── satellite_imagery_1.tif.aux.xml │ └── youcanbeapirate-wb-sparkline.jpg ├── output │ ├── last_animation.gif │ ├── last_animation.mp4 │ ├── last_animation_2.gif │ ├── output.pdf │ ├── output2.pdf │ ├── plot.pdf │ ├── plot.rds │ └── plot_with_theme.pdf └── raster │ ├── IDE00420.202302122200.tif │ ├── IDE00421.202302122200.tif │ ├── IDE00422.202302122200.tif │ ├── IDE00423.202302122200.tif │ ├── IDE00425.202302122200.tif │ ├── IDE00426.202302122200.tif │ ├── IDE00427.202302122200.tif │ ├── IDE00430.202302122200.tif │ ├── IDE00431.202302122200.tif │ ├── IDE00432.202302122200.tif │ ├── IDE00433.202302122200.tif │ ├── IDE00435.202302122200.tif │ ├── IDE00436.202302122200.tif │ └── IDE00437.202302122200.tif ├── learning-ggplot2.Rproj └── real_life_use_cases └── ggplot2_dependencies_visualized ├── README.html ├── README.md ├── pkgsearch.R ├── pkgsearch.pdf ├── pkgsearch.png ├── pkgsearch_ver2.pdf ├── pkgsearch_ver2.png ├── pkgsearch_ver3.pdf └── pkgsearch_ver3.png /.gitattributes: -------------------------------------------------------------------------------- 1 | *.tif filter=lfs diff=lfs merge=lfs -text 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # History files 2 | .Rhistory 3 | .Rapp.history 4 | 5 | # Session Data files 6 | .RData 7 | 8 | # User-specific files 9 | .Ruserdata 10 | 11 | # Example code in package build process 12 | *-Ex.R 13 | 14 | # Output files from R CMD build 15 | /*.tar.gz 16 | 17 | # Output files from R CMD check 18 | /*.Rcheck/ 19 | 20 | # RStudio files 21 | .Rproj.user/ 22 | 23 | # produced vignettes 24 | vignettes/*.html 25 | vignettes/*.pdf 26 | 27 | # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3 28 | .httr-oauth 29 | 30 | # knitr and R markdown default cache directories 31 | *_cache/ 32 | /cache/ 33 | 34 | # Temporary files created by R markdown 35 | *.utf8.md 36 | *.knit.md 37 | 38 | # R Environment Variables 39 | .Renviron 40 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Antti Rask 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![](ggplot2_2nd_ed/img/youcanbeapirate-wb-sparkline.jpg) 2 | 3 | # Learning ggplot2 4 | 5 | I'm learning __{ggplot2}__ by reading the book __ggplot2: Elegant Graphics for Data Analysis__ by __Hadley Wickham__. 6 | 7 | I've read both the physical copy of the 2nd edition (published in 2016) and the [online version](https://ggplot2-book.org/) of the 3rd edition. 8 | 9 | The idea of this repo is to collect the code for both versions of the book. There are sections that the two books share, but because the structures are different, I've included the same code for each version. While I understand that it isn't ideal, I'm willing to make the sacrifice. So that you don't have to jump from one version to another to follow if you're looking at the code, no matter which version of the book you're reading. 10 | 11 | So why even have my own version of the code for something that is already available through that link I shared? 12 | 13 | 1. The 2nd edition has chapters that aren't included in the 3rd. While I understand the reasoning for leaving them out, I still found them useful. But they are from 2016 and some of the code is either deprecated or superseded ([the tidyverse lifecycle stages](https://lifecycle.r-lib.org/articles/stages.html)). And I wanted to see if I could update the code so that it still works. 14 | 2. While __{ggplot2}__ is part of the [__{tidyverse}__](https://www.tidyverse.org/), not all the code is written using the best that {tidyverse} has to offer. I've translated some of the base R code to a tidier format and have also paid a lot of attention to readability. 15 | 3. Part of what makes {ggplot2} great is the vast number of 3rd party [extensions](https://exts.ggplot2.tidyverse.org/) like the ones on this list (to give a few examples). I've tried to include some code for the ones that weren't featured in the book. 16 | * {cowplot} 17 | * {gganimate} 18 | * {ggdist} 19 | * {ggthemes} 20 | * {patchwork} 21 | 22 | ## Disclaimer! 23 | This repo is not meant to replace the book in any way. You should definitely read the book (or even both versions). It will help you understand {ggplot2} and data visualization in general much better than looking at the code or playing with it. 24 | 25 | Also, I would recommend you buy the book and support the R open-source community by doing so. Here's a direct [link](https://link.springer.com/book/10.1007/978-3-319-24277-4) to __Springer__'s (the publisher) website. 26 | -------------------------------------------------------------------------------- /ggplot2_2nd_ed/README.md: -------------------------------------------------------------------------------- 1 | ![](img/youcanbeapirate-wb-sparkline.jpg) 2 | 3 | # Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (2nd ed.) 4 | 5 | I've tried to comment on the changes I've made to the code, but there are some frequent changes that I'll comment on here so I don't have to repeat myself constantly. 6 | 7 | * In general, I've tried to use a __tibble__ (tbl) instead of a __data frame__ (df). So instead of _as.dataframe()_, you will usually find _tibble()_, but not always. There are some situations where only a data frame will work. 8 | * There have been some changes to ggplot2 since 2016: 9 | * __fun.y__ parameter has become __fun__ 10 | * __size__ has become __linewidth__ (for all geoms that use lines) -------------------------------------------------------------------------------- /ggplot2_2nd_ed/chapter_02_getting_started_with_ggplot2.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (2nd ed.), Chapter 2: Getting Started with ggplot2" 3 | author: "Original Code: Hadley Wickham (except Matt Dancho for ggdist and tidyquant) | Modifications: Antti Rask" 4 | date: "2023-01-05" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 2 Getting Started with ggplot2 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(conflicted) 17 | conflict_prefer("select", "dplyr") 18 | ``` 19 | 20 | ## 2.2 Fuel Economy Data 21 | 22 | ```{r} 23 | mpg 24 | ``` 25 | 26 | ### 2.2.1 Exercises 27 | 28 | 1. 29 | ```{r} 30 | help(mpg) 31 | 32 | dim(mpg) 33 | 34 | summary(mpg) 35 | 36 | str(mpg) 37 | 38 | glimpse(mpg) 39 | ``` 40 | 41 | 2. 42 | ```{r} 43 | data(package = "ggplot2") 44 | ``` 45 | 46 | 3. 47 | ```{r} 48 | mpg %>% 49 | # Miles per gallon 50 | select(cty, hwy) %>% 51 | mutate( 52 | cty_eur = 235.21 / cty, 53 | hwy_eur = 235.21 / hwy 54 | ) 55 | ``` 56 | 57 | 4. 58 | ```{r} 59 | # Without touching models 60 | mpg %>% 61 | distinct(manufacturer, model) %>% 62 | summarize( 63 | n = n(), 64 | .by = manufacturer 65 | ) %>% 66 | arrange(desc(n)) 67 | ``` 68 | 69 | ```{r} 70 | # With the core models only 71 | mpg %>% 72 | mutate( 73 | core_model = str_extract(model, "^\\S+") 74 | ) %>% 75 | distinct(manufacturer, core_model) %>% 76 | summarize( 77 | n = n(), 78 | .by = manufacturer 79 | ) %>% 80 | arrange(desc(n)) 81 | ``` 82 | 83 | ## 2.3 Key Components 84 | 85 | ```{r} 86 | mpg %>% 87 | ggplot(aes(displ, hwy)) + 88 | geom_point() 89 | ``` 90 | 91 | ### 2.3.1 Exercises 92 | 93 | 1. 94 | ```{r} 95 | mpg %>% 96 | ggplot(aes(cty, hwy)) + 97 | geom_point() 98 | ``` 99 | 100 | 2. 101 | ```{r} 102 | # Models and manufacturers 103 | mpg %>% 104 | ggplot(aes(model, manufacturer)) + 105 | geom_point() 106 | 107 | # One example solution with the amount of unique models by manufacturer 108 | mpg %>% 109 | mutate( 110 | manufacturer = manufacturer %>% as.factor(), 111 | core_model = str_extract(model, "^\\S+") 112 | ) %>% 113 | distinct(manufacturer, core_model) %>% 114 | summarize( 115 | n = n(), 116 | .by = manufacturer 117 | ) %>% 118 | 119 | ggplot(aes(fct_reorder(manufacturer, n), n)) + 120 | geom_col() + 121 | coord_flip() + 122 | labs( 123 | title = "Amount of unique models by manufacturer", 124 | subtitle = "", 125 | x = "", 126 | y = "" 127 | ) 128 | ``` 129 | 130 | 3. 131 | ```{r} 132 | # 1. 133 | mpg %>% 134 | ggplot(aes(cty, hwy)) + 135 | geom_point() 136 | 137 | # 2. 138 | diamonds %>% 139 | ggplot(aes(carat, price)) + 140 | geom_point() 141 | 142 | # 3. 143 | economics %>% 144 | ggplot(aes(date, unemploy)) + 145 | geom_line() 146 | 147 | # 4. 148 | mpg %>% 149 | ggplot(aes(cty)) + 150 | geom_histogram() 151 | ``` 152 | 153 | ## 2.4 Color, Size, Shape and Other Aesthetic Attributes 154 | 155 | ```{r} 156 | mpg %>% 157 | ggplot(aes(displ, cty, color = class)) + 158 | geom_point() 159 | ``` 160 | 161 | ```{r} 162 | p <- mpg %>% 163 | ggplot(aes(displ, hwy)) 164 | 165 | p + 166 | geom_point(aes(color = "blue")) 167 | 168 | p + 169 | geom_point(color = "blue") 170 | ``` 171 | 172 | 173 | ### 2.4.1 Exercises 174 | 175 | 1. 176 | ```{r} 177 | # Map color to continuous value 178 | mpg %>% 179 | ggplot(aes(cty, hwy, color = displ)) + 180 | geom_point() 181 | 182 | # Map color to categorical value 183 | mpg %>% 184 | ggplot(aes(cty, hwy, color = trans)) + 185 | geom_point() 186 | 187 | # Use more than one aesthetic in a plot 188 | mpg %>% 189 | ggplot(aes(cty, hwy, color = trans, size = trans)) + 190 | geom_point() 191 | ``` 192 | 193 | 2. 194 | ```{r} 195 | # Commented out, because causes error and can't see the other plot with the error: 196 | # mpg %>% 197 | # ggplot(aes(cty, hwy, shape = displ)) + 198 | # geom_point() 199 | 200 | mpg %>% 201 | ggplot(aes(cty, hwy, shape = trans)) + 202 | geom_point() 203 | ``` 204 | 205 | 3. 206 | ```{r} 207 | mpg %>% 208 | ggplot(aes(drv, cty)) + 209 | geom_boxplot() 210 | 211 | mpg %>% 212 | ggplot(aes(class, displ)) + 213 | geom_point(aes(color = drv)) 214 | ``` 215 | 216 | ## 2.5 Faceting 217 | 218 | ```{r} 219 | mpg %>% 220 | ggplot(aes(displ, hwy)) + 221 | geom_point() + 222 | facet_wrap(vars(class)) 223 | ``` 224 | 225 | ### 2.5.1. Exercises 226 | 227 | 1. 228 | ```{r} 229 | p <- mpg %>% 230 | ggplot(aes(drv, displ, fill = class)) + 231 | geom_col(position = "dodge") 232 | 233 | p + 234 | facet_wrap(vars(hwy)) 235 | 236 | p + 237 | facet_wrap(vars(cyl)) 238 | ``` 239 | 240 | 2. 241 | ```{r} 242 | mpg %>% 243 | ggplot(aes(displ, cty)) + 244 | geom_point() + 245 | facet_wrap(vars(cyl)) 246 | ``` 247 | 248 | 3. 249 | ```{r} 250 | p <- mpg %>% 251 | ggplot(aes(displ, cty)) + 252 | geom_point() 253 | 254 | p + 255 | facet_wrap( 256 | vars(cyl), 257 | nrow = 4, 258 | ncol = 1 259 | ) 260 | 261 | p + 262 | facet_wrap( 263 | vars(cyl), 264 | nrow = 1, 265 | ncol = 4 266 | ) 267 | ``` 268 | 269 | ## 2.6 Plot Geoms 270 | 271 | ### 2.6.1 Adding a Smoother to a Plot 272 | 273 | ```{r} 274 | mpg %>% 275 | ggplot(aes(displ, hwy)) + 276 | geom_point() + 277 | geom_smooth() 278 | ``` 279 | 280 | ```{r} 281 | # How span affects the resulting smoother 282 | p <- mpg %>% 283 | ggplot(aes(displ, hwy)) + 284 | geom_point() 285 | 286 | p + 287 | geom_smooth(span = 0.2) 288 | 289 | p + 290 | geom_smooth(span = 1) 291 | ``` 292 | 293 | ```{r} 294 | library(mgcv) 295 | 296 | mpg %>% 297 | ggplot(aes(displ, hwy)) + 298 | geom_point() + 299 | geom_smooth(method = "gam", formula = y ~ s(x)) 300 | ``` 301 | 302 | ```{r} 303 | # Line of best fit 304 | mpg %>% 305 | ggplot(aes(displ, hwy)) + 306 | geom_point() + 307 | geom_smooth(method = "lm") 308 | ``` 309 | 310 | ### 2.6.2 Boxplots and jittered points 311 | 312 | ```{r} 313 | p <- mpg %>% 314 | ggplot(aes(drv, hwy)) 315 | 316 | p + 317 | geom_point() 318 | 319 | p + 320 | geom_jitter() 321 | 322 | p + 323 | geom_boxplot() 324 | 325 | p + 326 | geom_violin() 327 | ``` 328 | 329 | ### 2.6.3 Histograms and frequency polygons 330 | 331 | ```{r} 332 | p <- mpg %>% 333 | ggplot(aes(hwy)) 334 | 335 | p + 336 | geom_histogram() 337 | 338 | p + 339 | geom_freqpoly() 340 | 341 | p + 342 | geom_freqpoly(binwidth = 2.5) 343 | 344 | p + 345 | geom_freqpoly(binwidth = 1) 346 | ``` 347 | 348 | ```{r} 349 | mpg %>% 350 | ggplot(aes(displ, color = drv)) + 351 | geom_freqpoly(binwidth = 0.5) 352 | 353 | mpg %>% 354 | ggplot(aes(displ, fill = drv)) + 355 | geom_histogram(binwidth = 0.5) + 356 | facet_wrap(vars(drv), ncol = 1) 357 | ``` 358 | 359 | ### 2.6.4 Bar charts 360 | 361 | ```{r} 362 | mpg %>% 363 | ggplot(aes(manufacturer)) + 364 | geom_bar() 365 | ``` 366 | 367 | ```{r} 368 | drugs <- tibble( 369 | drug = c("a", "b", "c"), 370 | effect = c(4.2, 9.7, 6.1) 371 | ) 372 | 373 | p <- drugs %>% 374 | ggplot(aes(drug, effect)) 375 | 376 | p + 377 | geom_bar(stat = "identity") 378 | 379 | p + 380 | geom_point() 381 | ``` 382 | 383 | ### 2.6.5 Time series with line and path plots 384 | 385 | ```{r} 386 | economics %>% 387 | ggplot(aes(date, unemploy / pop)) + 388 | geom_line() 389 | 390 | economics %>% 391 | ggplot(aes(date, uempmed)) + 392 | geom_line() 393 | ``` 394 | 395 | ```{r} 396 | library(lubridate) 397 | 398 | # I decided to replace all of this... 399 | # year <- function(x) as.POSIXlt(x)$year + 1900 400 | # ... with the year() function from lubridate 401 | 402 | p <- economics %>% 403 | ggplot(aes(unemploy / pop, uempmed)) 404 | 405 | p + 406 | geom_path() + 407 | geom_point() 408 | 409 | p + 410 | geom_path(color = "grey50") + 411 | geom_point(aes(color = year(date))) 412 | ``` 413 | 414 | ### 2.6.6 Exercises 415 | 416 | 1. 417 | ```{r} 418 | mpg %>% 419 | ggplot(aes(cty, hwy)) + 420 | geom_jitter(alpha = 0.5) 421 | ``` 422 | 423 | 2. 424 | ```{r} 425 | mpg %>% 426 | ggplot(aes(class, hwy)) + 427 | geom_boxplot() 428 | 429 | mpg %>% 430 | ggplot(aes(reorder(class, hwy), hwy)) + 431 | geom_boxplot() 432 | ``` 433 | 434 | 3. 435 | ```{r} 436 | diamonds %>% 437 | ggplot(aes(carat)) + 438 | geom_histogram(binwidth = 0.3) 439 | ``` 440 | 441 | 4. 442 | ```{r} 443 | diamonds %>% 444 | ggplot(aes(fct_reorder(cut, price), price)) + 445 | geom_boxplot() 446 | 447 | diamonds %>% 448 | ggplot(aes(price, after_stat(density), color = cut)) + 449 | geom_freqpoly(binwidth = 200) 450 | ``` 451 | 452 | 6. 453 | ```{r} 454 | ?geom_bar() 455 | 456 | p <- mpg %>% 457 | ggplot(aes(class)) 458 | 459 | p + 460 | geom_bar() 461 | 462 | p + 463 | geom_bar(aes(weight = displ)) 464 | ``` 465 | 466 | ## 2.7 Modifying the axes 467 | 468 | ```{r} 469 | p <- mpg %>% 470 | ggplot(aes(cty, hwy)) + 471 | geom_point(alpha = 1 / 3) 472 | 473 | p 474 | 475 | p + 476 | labs( 477 | x = "city driving (mpg)", 478 | y = "highway driving (mpg)" 479 | ) 480 | 481 | # Remove the axis labels with NULL 482 | p + 483 | labs( 484 | x = NULL, 485 | y = NULL 486 | ) 487 | ``` 488 | 489 | ```{r} 490 | p <- mpg %>% 491 | ggplot(aes(drv, hwy)) + 492 | geom_jitter(width = 0.25) 493 | 494 | p 495 | 496 | p + 497 | xlim("f", "r") + 498 | ylim(20, 30) 499 | 500 | # For continuous scales, use NA to set only one limit 501 | mpg %>% 502 | ggplot(aes(drv, hwy)) + 503 | geom_jitter(width = 0.25, na.rm = TRUE) + 504 | ylim(NA, 30) 505 | ``` 506 | 507 | ## 2.8 Output 508 | 509 | ```{r} 510 | p <- mpg %>% 511 | ggplot(aes(displ, hwy, color = factor(cyl))) + 512 | geom_point() 513 | ``` 514 | 515 | ```{r} 516 | p 517 | 518 | print(p) 519 | ``` 520 | 521 | ```{r} 522 | # Save png to disk 523 | ggsave("img/plot.png", p, width = 5, height = 5) 524 | ``` 525 | 526 | ```{r} 527 | summary(p) 528 | ``` 529 | 530 | ```{r} 531 | saveRDS(p, "output/plot.rds") 532 | q <- readRDS("output/plot.rds") 533 | q 534 | ``` 535 | 536 | ## 2.X Raincloud plot with ggdist 537 | 538 | https://www.business-science.io/r/2021/07/22/ggdist-raincloud-plots.html 539 | 540 | ```{r} 541 | library(ggdist) 542 | library(tidyquant) 543 | 544 | mpg %>% 545 | ggplot(aes(factor(drv), hwy, fill = factor(drv))) + 546 | 547 | # Add half-violin from {ggdist} package 548 | stat_halfeye( 549 | # Custom bandwidth 550 | adjust = 0.5, 551 | # Move geom to the right 552 | justification = -.2, 553 | # Remove slab interval 554 | .width = 0, 555 | point_color = NA 556 | ) + 557 | 558 | # Add boxplot 559 | geom_boxplot( 560 | width = .12, 561 | # Remove outliers 562 | outlier.color = NA, 563 | alpha = 0.5 564 | ) + 565 | 566 | # Add dot plots from {ggdist} package 567 | stat_dots( 568 | # Orientation to the left 569 | side = "left", 570 | # Move geom to the left 571 | justification = 1.1, 572 | # Adjust grouping (binning) of observations 573 | binwidth = .25 574 | ) + 575 | 576 | # Adjust theme 577 | scale_fill_tq() + 578 | theme_tq() + 579 | labs( 580 | title = "Raincloud plot", 581 | subtitle = "Is it better to get all the information in one plot?", 582 | x = "Drivetrain Type", 583 | y = "Highway Fuel Economy (MPG)", 584 | fill = "Drivetrain" 585 | ) + 586 | coord_flip() 587 | ``` 588 | -------------------------------------------------------------------------------- /ggplot2_2nd_ed/chapter_04_mastering_the_grammar.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (2nd ed.), Chapter 4 - Mastering the Grammar" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-13" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 4 Mastering the Grammar 13 | 14 | ```{r} 15 | library(tidyverse) 16 | ``` 17 | 18 | ## 4.2 Building a Scatterplot 19 | 20 | ```{r} 21 | mpg %>% 22 | ggplot(aes(displ, hwy, color = factor(cyl))) + 23 | geom_point() 24 | ``` 25 | 26 | ### 4.2.1 Mapping aesthetics to data 27 | 28 | ```{r} 29 | p <- mpg %>% 30 | ggplot(aes(displ, hwy, color = factor(cyl))) 31 | 32 | # Don't do this! 33 | p + 34 | geom_line() + 35 | theme(legend.position = "none") 36 | 37 | # Or this! 38 | p + 39 | geom_bar(stat = "identity", position = "identity", fill = NA) + 40 | theme(legend.position = "none") 41 | 42 | # This works 43 | p + 44 | geom_point() + 45 | geom_smooth(method = "lm") 46 | ``` 47 | 48 | ## 4.3 Adding Complexity 49 | 50 | ```{r} 51 | mpg %>% 52 | ggplot(aes(displ, hwy)) + 53 | geom_point() + 54 | geom_smooth() + 55 | facet_wrap(vars(year)) 56 | ``` -------------------------------------------------------------------------------- /ggplot2_2nd_ed/chapter_05_build_a_plot_layer_by_layer.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (2nd ed.), Chapter 5 - Build a Plot Layer by Layer" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-15" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 5 Build a Plot Layer by Layer 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(conflicted) 17 | conflict_prefer("filter", "dplyr") 18 | ``` 19 | 20 | ## 5.2 Building a Plot 21 | 22 | ```{r} 23 | p <- mpg %>% 24 | ggplot(aes(displ, hwy)) 25 | 26 | p 27 | ``` 28 | 29 | ```{r} 30 | p + 31 | geom_point() 32 | 33 | p + 34 | layer( 35 | mapping = NULL, 36 | data = NULL, 37 | geom = "point", 38 | stat = "identity", 39 | position = "identity" 40 | ) 41 | ``` 42 | 43 | ## 5.3 Data 44 | 45 | ```{r} 46 | mod <- 47 | loess(hwy ~ displ, data = mpg) 48 | 49 | grid <- 50 | tibble( 51 | displ = seq( 52 | min(mpg$displ), 53 | max(mpg$displ), 54 | length = 50) 55 | ) 56 | 57 | grid$hwy <- 58 | predict( 59 | mod, 60 | newdata = grid 61 | ) 62 | 63 | grid 64 | 65 | mod 66 | ``` 67 | 68 | ```{r} 69 | std_resid <- resid(mod) / mod$s 70 | 71 | outlier <- mpg %>% 72 | filter(abs(std_resid) > 2) 73 | 74 | outlier 75 | ``` 76 | 77 | ```{r} 78 | mpg %>% 79 | ggplot(aes(displ, hwy)) + 80 | geom_point() + 81 | geom_line( 82 | data = grid, 83 | color = "blue", 84 | linewidth = 1.5 85 | ) + 86 | geom_text(data = outlier, aes(label = model)) 87 | 88 | # Don't do this! 89 | ggplot(mapping = aes(displ, hwy)) + 90 | geom_point(data = mpg) + 91 | geom_line( 92 | data = grid, 93 | color = "blue", 94 | linewidth = 1.5 95 | ) + 96 | geom_text(data = outlier, aes(label = model)) 97 | ``` 98 | 99 | ### 5.3.1 Exercises 100 | 101 | 2. 102 | ```{r} 103 | class <- mpg %>% 104 | group_by(class) %>% 105 | summarise( 106 | n = n(), 107 | hwy = mean(hwy) 108 | ) 109 | 110 | mpg %>% 111 | ggplot(aes(class, hwy)) + 112 | geom_jitter(width = 0.2) + 113 | geom_point( 114 | data = class, 115 | size = 5, 116 | color = "red" 117 | ) + 118 | geom_text( 119 | data = class, 120 | aes(y = 10, label = str_c("n = ", n)), 121 | size = 3 122 | ) 123 | ``` 124 | 125 | ## 5.4 Aesthetic Mappings 126 | 127 | ```{r} 128 | aes(x = displ, y = hwy, color = class) 129 | 130 | aes(displ, hwy, color = class) 131 | ``` 132 | 133 | ### 5.4.1 Specifying the Aesthetics in the Plot vs. in the Layers 134 | 135 | ```{r} 136 | # All create the same plot specification 137 | mpg %>% 138 | ggplot(aes(displ, hwy, color = class)) + 139 | geom_point() 140 | 141 | mpg %>% 142 | ggplot(aes(displ, hwy)) + 143 | geom_point(aes(color = class)) 144 | 145 | mpg %>% 146 | ggplot(aes(displ)) + 147 | geom_point(aes(y = hwy, color = class)) 148 | 149 | mpg %>% 150 | ggplot() + 151 | geom_point(aes(displ, hwy, color = class)) 152 | ``` 153 | 154 | ```{r} 155 | mpg %>% 156 | ggplot(aes(displ, hwy, color = class)) + 157 | geom_point() + 158 | geom_smooth(method = "lm", se = FALSE) + 159 | theme(legend.position = "none") 160 | 161 | mpg %>% 162 | ggplot(aes(displ, hwy)) + 163 | geom_point(aes(color = class)) + 164 | geom_smooth(method = "lm", se = FALSE) + 165 | theme(legend.position = "none") 166 | ``` 167 | 168 | ### 5.4.2 Setting vs. Mapping 169 | 170 | ```{r} 171 | p <- mpg %>% 172 | ggplot(aes(cty, hwy)) 173 | 174 | p + 175 | geom_point(color = "darkblue") 176 | 177 | p + 178 | geom_point(aes(color = "darkblue")) 179 | 180 | # Override the default scale 181 | p + 182 | geom_point(aes(color = "darkblue")) + 183 | scale_color_identity() 184 | ``` 185 | 186 | ```{r} 187 | mpg %>% 188 | ggplot(aes(displ, hwy)) + 189 | geom_point() + 190 | geom_smooth( 191 | aes(color = "loess"), 192 | method = "loess", 193 | se = FALSE 194 | ) + 195 | geom_smooth( 196 | aes(color = "lm"), 197 | method = "lm", 198 | se = FALSE) + 199 | labs(color = "Method") 200 | ``` 201 | 202 | ### 5.4.3 Exercises 203 | 204 | 1. 205 | ```{r} 206 | mpg %>% 207 | ggplot() + 208 | geom_point(aes(displ, hwy)) 209 | 210 | mpg %>% 211 | ggplot(aes(cty, hwy)) + 212 | geom_point() + 213 | geom_smooth() 214 | 215 | msleep %>% 216 | mutate( 217 | log_brainwt = log(brainwt), 218 | log_bodywt = log(bodywt)) %>% 219 | ggplot(aes(log_brainwt, log_bodywt)) + 220 | geom_point() 221 | ``` 222 | 223 | 2. 224 | ```{r} 225 | # Don't do this! 226 | ggplot(mpg) + 227 | geom_point(aes(class, cty)) + 228 | geom_boxplot(aes(trans, hwy)) 229 | ``` 230 | 231 | ## 5.6 Stats 232 | 233 | ```{r} 234 | p <- 235 | mpg %>% 236 | ggplot(aes(trans, cty)) + 237 | geom_point() 238 | 239 | p + 240 | stat_summary( 241 | geom = "point", 242 | fun = "mean", 243 | color = "red", 244 | size = 4 245 | ) 246 | 247 | p + 248 | geom_point( 249 | stat = "summary", 250 | fun = "mean", 251 | color = "red", 252 | size = 4 253 | ) 254 | ``` 255 | 256 | ### 5.6.1 Generated Variables 257 | 258 | ```{r} 259 | p <- diamonds %>% 260 | ggplot(aes(price)) 261 | 262 | p + 263 | geom_histogram(binwidth = 500) 264 | 265 | p + 266 | geom_histogram( 267 | aes(y = after_stat(density)), 268 | binwidth = 500 269 | ) 270 | ``` 271 | 272 | ```{r} 273 | p <- 274 | diamonds %>% 275 | ggplot(aes(price, color = cut)) 276 | 277 | p + 278 | geom_freqpoly(binwidth = 500) + 279 | theme(legend.position = "none") 280 | 281 | p + 282 | geom_freqpoly( 283 | aes(y = after_stat(density)), 284 | binwidth = 500 285 | ) + 286 | theme(legend.position = "none") 287 | ``` 288 | 289 | ### 5.6.2 Exercises 290 | 291 | 1. 292 | ```{r} 293 | mod <- loess(hwy ~ displ, data = mpg) 294 | 295 | smoothed <- tibble(displ = seq(1.6, 7, length = 50)) 296 | 297 | pred <- predict(mod, newdata = smoothed, se = TRUE) 298 | 299 | smoothed$hwy <- pred$fit 300 | 301 | smoothed$hwy_lwr <- pred$fit - 1.96 * pred$se.fit 302 | 303 | smoothed$hwy_upr <- pred$fit + 1.96 * pred$se.fit 304 | 305 | smoothed 306 | 307 | mpg %>% 308 | ggplot(aes(displ, hwy)) + 309 | geom_smooth() 310 | 311 | smoothed %>% 312 | ggplot(aes(displ, hwy)) + 313 | geom_line(color = "blue", linewidth = 1) + 314 | geom_ribbon( 315 | aes(ymin = hwy_lwr, ymax = hwy_upr), 316 | alpha = 0.2 317 | ) 318 | ``` 319 | 320 | 3. 321 | ```{r} 322 | mpg %>% 323 | ggplot(aes(drv, trans)) + 324 | geom_count(aes(size = after_stat(prop), group = 1)) 325 | ``` 326 | 327 | ## 5.7. Position Adjustments 328 | 329 | ```{r} 330 | dplot <- diamonds %>% 331 | ggplot(aes(color, fill = cut)) + 332 | labs( 333 | x = NULL, 334 | y = NULL 335 | ) + 336 | theme(legend.position = "none") 337 | 338 | # Added the titles, so it's easier to remember which one you're looking at 339 | dplot + 340 | geom_bar() + 341 | labs( 342 | title = "position_stack() (default for geom_bar())" 343 | ) 344 | 345 | dplot + 346 | geom_bar(position = "fill") + 347 | labs( 348 | title = "position_fill()" 349 | ) 350 | 351 | dplot + 352 | geom_bar(position = "dodge") + 353 | labs( 354 | title = "position_dodge()" 355 | ) 356 | ``` 357 | 358 | ```{r} 359 | # Don't do this! 360 | dplot + 361 | geom_bar( 362 | position = "identity", 363 | alpha = 1 / 2, 364 | color = "grey50" 365 | ) 366 | 367 | diamonds %>% 368 | ggplot(aes(color, color = cut)) + 369 | geom_line(aes(group = cut), stat = "count") + 370 | labs( 371 | x = NULL, 372 | y = NULL 373 | ) + 374 | theme(legend.position = "none") 375 | ``` 376 | 377 | ```{r} 378 | p <- mpg %>% 379 | ggplot(aes(displ, hwy)) 380 | 381 | p + 382 | geom_point(position = "jitter") 383 | 384 | p + 385 | geom_point( 386 | position = position_jitter( 387 | width = 0.05, 388 | height = 0.5 389 | ) 390 | ) 391 | 392 | p + 393 | geom_jitter(width = 0.05, height = 0.5) 394 | ``` 395 | -------------------------------------------------------------------------------- /ggplot2_2nd_ed/chapter_07_positioning.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (2nd ed.), Chapter 7 - Positioning" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-18" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 7 Positioning 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(conflicted) 17 | conflict_prefer("filter", "dplyr") 18 | conflict_prefer("select", "dplyr") 19 | ``` 20 | 21 | ## 7.2 Facetting 22 | 23 | ```{r} 24 | mpg2 <- mpg %>% 25 | # subset -> filter 26 | filter(cyl != 5 & drv %in% c("4", "f") & class != "2seater") 27 | ``` 28 | 29 | ### 7.2.1 Facet Wrap 30 | 31 | ```{r} 32 | base <- mpg2 %>% 33 | ggplot(aes(displ, hwy)) + 34 | geom_blank() + 35 | labs( 36 | x = NULL, 37 | y = NULL 38 | ) 39 | 40 | base + 41 | facet_wrap( 42 | vars(class), 43 | ncol = 3 44 | ) 45 | 46 | base + facet_wrap( 47 | vars(class), 48 | ncol = 3, 49 | as.table = FALSE 50 | ) 51 | 52 | base + 53 | facet_wrap( 54 | vars(class), 55 | nrow = 3 56 | ) 57 | 58 | base + 59 | facet_wrap( 60 | vars(class), 61 | nrow = 3, 62 | dir = "v" 63 | ) 64 | ``` 65 | 66 | ### 7.2.2 Facet Grid 67 | 68 | ```{r} 69 | base + 70 | facet_grid(cols = vars(cyl)) 71 | 72 | base + 73 | facet_grid(rows = vars(drv)) 74 | 75 | base + 76 | facet_grid(vars(drv), vars(cyl)) 77 | ``` 78 | 79 | ### 7.2.3 Controlling Scales 80 | 81 | ```{r} 82 | p <- mpg2 %>% 83 | ggplot(aes(cty, hwy)) + 84 | geom_abline() + 85 | geom_jitter(width = 0.1, height = 0.1) 86 | 87 | p + 88 | facet_wrap(vars(cyl)) 89 | 90 | p + 91 | facet_wrap(vars(cyl), scales = "free") 92 | ``` 93 | 94 | ```{r} 95 | economics_long 96 | 97 | economics_long %>% 98 | ggplot(aes(date, value)) + 99 | geom_line() + 100 | facet_wrap( 101 | vars(variable), 102 | scales = "free_y", 103 | ncol = 1 104 | ) 105 | ``` 106 | 107 | ```{r} 108 | mpg3 <- mpg2 %>% 109 | mutate( 110 | model = as_factor(model), 111 | manufacturer = as_factor(manufacturer) 112 | ) 113 | 114 | mpg3 %>% 115 | ggplot(aes(cty, fct_reorder(model, cty))) + 116 | geom_point() + 117 | facet_grid( 118 | rows = vars( 119 | fct_reorder( 120 | manufacturer, 121 | cty, 122 | .desc = TRUE 123 | ) 124 | ), 125 | scales = "free", 126 | space = "free" 127 | ) + 128 | theme(strip.text.y = element_text(angle = 0)) + 129 | labs(y = "model") 130 | ``` 131 | 132 | ### 7.2.4 Missing Facetting Variables 133 | 134 | ```{r} 135 | tbl1 <- tibble( 136 | x = 1:3, 137 | y = 1:3, 138 | gender = c("f", "f", "m") 139 | ) 140 | 141 | tbl2 <- tibble(x = 2, y = 2) 142 | 143 | tbl1 %>% 144 | ggplot(aes(x, y)) + 145 | geom_point( 146 | data = tbl2, 147 | color = "red", 148 | size = 2 149 | ) + 150 | geom_point() + 151 | facet_wrap(vars(gender)) 152 | ``` 153 | 154 | ### 7.2.5 Grouping vs. Facetting 155 | 156 | ```{r} 157 | tbl <- data.frame( 158 | x = rnorm(120, c(0, 2, 4)), 159 | y = rnorm(120, c(1, 2, 1)), 160 | z = letters[1:3] 161 | ) %>% as_tibble() 162 | 163 | # Grouping 164 | p <- tbl %>% 165 | ggplot(aes(x, y)) 166 | 167 | p + 168 | geom_point(aes(color = z)) + 169 | labs(title = "Grouping") 170 | 171 | # Faceting 172 | p + 173 | geom_point() + 174 | facet_wrap(vars(z)) + 175 | labs(title = "Faceting") 176 | 177 | # Faceting with means 178 | tbl_sum <- tbl %>% 179 | summarize( 180 | x = mean(x), 181 | y = mean(y), 182 | .by = z 183 | ) %>% 184 | rename(z2 = z) 185 | 186 | p + 187 | geom_point() + 188 | geom_point( 189 | data = tbl_sum, 190 | aes(color = z2), 191 | size = 4 192 | ) + 193 | facet_wrap(vars(z)) + 194 | labs(title = "Faceting with Means") 195 | 196 | # Grouping + faceting 197 | tbl3 <- tbl %>% 198 | select(-z) 199 | 200 | p + 201 | geom_point(data = tbl3, color = "grey70") + 202 | geom_point(aes(color = z)) + 203 | facet_wrap(vars(z)) + 204 | labs(title = "Grouping + Faceting") 205 | ``` 206 | 207 | ### 7.2.6 Continuous Variables 208 | 209 | ```{r} 210 | mpg4 <- mpg2 %>% 211 | mutate( 212 | # Bins of width 1 213 | disp_w = displ %>% cut_width(1), 214 | # Six bins of equal length 215 | disp_i = displ %>% cut_interval(6), 216 | # Six bins containing equal numbers of points 217 | disp_n = displ %>% cut_number(6) 218 | ) 219 | 220 | plot <- mpg4 %>% 221 | ggplot(aes(cty, hwy)) + 222 | geom_point() + 223 | labs( 224 | x = NULL, 225 | y = NULL 226 | ) 227 | 228 | plot + 229 | facet_wrap(vars(disp_w), nrow = 1) 230 | 231 | plot + 232 | facet_wrap(vars(disp_i), nrow = 1) 233 | 234 | plot + 235 | facet_wrap(vars(disp_n), nrow = 1) 236 | ``` 237 | 238 | ### 7.2.7 Exercises 239 | 240 | 1. 241 | ```{r} 242 | carats <- diamonds %>% 243 | mutate(carat = carat %>% as.integer()) 244 | 245 | carats %>% 246 | ggplot( 247 | aes( 248 | price, 249 | color = as.factor(carat) 250 | ) 251 | ) + 252 | geom_freqpoly() + 253 | facet_wrap(vars(cut), scales = "free_y") + 254 | labs(color = "carat") 255 | 256 | carats %>% 257 | ggplot(aes(price, color = cut)) + 258 | geom_freqpoly() + 259 | facet_wrap(vars(carat), scales = "free_y") 260 | ``` 261 | 262 | 2. 263 | ```{r} 264 | p <- diamonds %>% 265 | ggplot(aes(carat, price, color = color)) + 266 | geom_point() 267 | 268 | p 269 | 270 | p + 271 | facet_wrap(vars(color)) 272 | 273 | diamonds2 <- diamonds %>% 274 | select(-color) 275 | 276 | diamonds %>% 277 | ggplot(aes(carat, price)) + 278 | geom_point(data = diamonds2, color = "grey70") + 279 | geom_point(aes(color = color)) + 280 | facet_wrap(vars(color)) 281 | ``` 282 | 283 | 4. 284 | ```{r} 285 | mpg2 %>% 286 | ggplot(aes(displ, hwy)) + 287 | geom_point() + 288 | geom_smooth( 289 | data = mpg2 %>% select(-class), 290 | se = FALSE 291 | ) + 292 | facet_wrap(vars(class)) 293 | ``` 294 | 295 | ## 7.4 Linear Coordinate Systems 296 | 297 | ### 7.4.1 Zooming into a Plot with coord_cartesian() 298 | 299 | ```{r} 300 | base <- mpg %>% 301 | ggplot(aes(displ, hwy)) + 302 | geom_point() + 303 | geom_smooth() 304 | 305 | # Full dataset 306 | base 307 | 308 | # Scaling to 4--6 throws away data outside that range 309 | base + 310 | scale_x_continuous(limits = c(4, 6)) 311 | 312 | # Zooming to 4--6 keeps all the data but only shows some of it 313 | base + 314 | coord_cartesian(xlim = c(4, 6)) 315 | ``` 316 | 317 | ### 7.4.2 Flipping the Axes with coord_flip() 318 | 319 | ```{r} 320 | p <- mpg %>% 321 | ggplot(aes(displ, cty)) + 322 | geom_point() + 323 | geom_smooth() 324 | 325 | p 326 | 327 | # Exchanging cty and displ rotates the plot 90 degrees, # but the smooth is fit to the rotated data. 328 | mpg %>% 329 | ggplot(aes(cty, displ)) + 330 | geom_point() + 331 | geom_smooth() 332 | 333 | # coord_flip() fits the smooth to the original data, and then rotates the output 334 | p + 335 | coord_flip() 336 | ``` 337 | 338 | ## 7.5 Non-linear Coordinate Systems 339 | 340 | ```{r} 341 | rect <- tibble(x = 50, y = 50) 342 | line <- tibble(x = c(1, 200), y = c(100, 1)) 343 | 344 | base <- ggplot(mapping = aes(x, y)) + 345 | geom_tile( 346 | data = rect, 347 | aes(width = 50, height = 50) 348 | ) + 349 | geom_line(data = line) + 350 | labs( 351 | x = NULL, 352 | y = NULL 353 | ) 354 | 355 | base + 356 | labs(title = "base") 357 | 358 | base + 359 | coord_polar("x") + 360 | labs(title = 'coord_polar("x")') 361 | 362 | base + 363 | coord_polar("y") + 364 | labs(title = 'coord_polar("y")') 365 | 366 | base + 367 | coord_flip() + 368 | labs(title = "coord_flip") 369 | 370 | base + 371 | coord_trans(y = "log10") + 372 | labs(title = 'coord_trans(y = "log10")') 373 | 374 | base + 375 | coord_fixed() + 376 | labs(title = "coord_fixed()") 377 | ``` 378 | 379 | Munching: 380 | 381 | 1. We start with a line parameterised by its two endpoints: 382 | ```{r} 383 | tbl <- tibble( 384 | r = c(0, 1), 385 | theta = c(0, 3 / 2 * pi) 386 | ) 387 | 388 | tbl %>% 389 | ggplot(aes(r, theta)) + 390 | geom_line() + 391 | geom_point(size = 2, color = "red") 392 | ``` 393 | 394 | 2. We break it into multiple line segments, each with two endpoints: 395 | ```{r} 396 | interp <- function(rng, n) { 397 | seq(rng[1], rng[2], length = n) 398 | } 399 | 400 | munched <- tibble( 401 | r = interp(tbl$r, 15), 402 | theta = interp(tbl$theta, 15) 403 | ) 404 | 405 | munched %>% 406 | ggplot(aes(r, theta)) + 407 | geom_line() + 408 | geom_point(size = 2, color = "red") 409 | ``` 410 | 411 | 3. We transform the locations of each piece: 412 | ```{r} 413 | transformed <- transform( 414 | munched, 415 | x = r * sin(theta), 416 | y = r * cos(theta) 417 | ) 418 | 419 | transformed %>% 420 | ggplot(aes(x, y)) + 421 | geom_path() + 422 | geom_point(size = 2, color = "red") + 423 | coord_fixed() 424 | ``` 425 | 426 | ### 7.5.1 Transformations with coord_trans() 427 | 428 | ```{r, warning=FALSE} 429 | library(scales) 430 | 431 | # Linear model on original scale is poor fit 432 | base <- diamonds %>% 433 | ggplot(aes(carat, price)) + 434 | stat_bin2d() + 435 | geom_smooth(method = "lm") + 436 | labs( 437 | x = NULL, 438 | y = NULL 439 | ) + 440 | theme(legend.position = "none") 441 | 442 | base 443 | 444 | # Better fit on log scale, but harder to interpret 445 | base + 446 | scale_x_log10() + 447 | scale_y_log10() 448 | 449 | # Fit on log scale, then backtransform to original. 450 | # Highlights lack of expensive diamonds with large 451 | # carats 452 | pow10 <- exp_trans(10) 453 | 454 | base + 455 | scale_x_log10() + 456 | scale_y_log10() + 457 | coord_trans(x = pow10, y = pow10) 458 | ``` 459 | 460 | ### 7.5.2 Polar coordinates with coord_polar() 461 | 462 | ```{r} 463 | base <- mtcars %>% 464 | ggplot(aes(factor(1), fill = factor(cyl))) + 465 | geom_bar(width = 1) + 466 | theme(legend.position = "none") + 467 | scale_x_discrete(NULL, expand = c(0, 0)) + 468 | scale_y_continuous(NULL, expand = c(0, 0)) 469 | 470 | # Stacked barchart 471 | base 472 | 473 | # Pie chart 474 | base + 475 | coord_polar(theta = "y") 476 | 477 | # The bullseye chart 478 | base + 479 | coord_polar() 480 | ``` 481 | 482 | ### 7.5.3 Map Projections with coord_map() 483 | 484 | ```{r} 485 | # Prepare a map of NZ 486 | nzmap <- map_data("nz") %>% 487 | ggplot(aes(long, lat, group = group)) + 488 | geom_polygon(fill = "white", color = "black") + 489 | labs( 490 | x = NULL, 491 | y = NULL 492 | ) 493 | 494 | # Plot it in cartesian coordinates 495 | nzmap 496 | 497 | # With the aspect ratio approximation 498 | nzmap + 499 | # coord_quickmap -> coord_sf 500 | coord_sf() 501 | ``` 502 | 503 | ```{r} 504 | # Since coord_map() has been superseded by coord_sf, I've modified this code to use that instead 505 | world <- map_data("world") 506 | finland <- map_data("world", "finland") 507 | 508 | worldmap <- world %>% 509 | ggplot(aes(long, lat, group = group)) 510 | 511 | worldmap + 512 | geom_path() + 513 | geom_path(data = finland, color = "blue") + 514 | # coord_map -> coord_sf 515 | coord_sf() 516 | 517 | worldmap + 518 | geom_polygon(fill = "white", color = "black") + 519 | geom_polygon(data = finland, fill = "blue") + 520 | # coord_map -> coord_sf 521 | coord_sf() 522 | ``` 523 | -------------------------------------------------------------------------------- /ggplot2_2nd_ed/chapter_08_themes.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (2nd ed.), Chapter 8 - Themes" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-20" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 8 Themes 13 | 14 | ```{r} 15 | library(tidyverse) 16 | ``` 17 | 18 | ## 8.1 Introduction 19 | 20 | ```{r} 21 | base <- mpg %>% 22 | ggplot(aes(cty, hwy, color = factor(cyl))) + 23 | geom_jitter() + 24 | geom_abline(color = "grey50", linewidth = 2) 25 | 26 | base 27 | 28 | labelled <- base + 29 | labs( 30 | x = "City mileage/gallon", 31 | y = "Highway mileage/gallon", 32 | color = "Cylinders", 33 | title = "Highway and city mileage are highly correlated" 34 | ) + 35 | scale_color_brewer(type = "seq", palette = "Spectral") 36 | 37 | labelled 38 | 39 | styled <- labelled + 40 | theme_bw() + 41 | theme( 42 | plot.title = element_text(face = "bold", size = 12), 43 | legend.background = element_rect( 44 | fill = "white", 45 | linewidth = 2, 46 | color = "grey70" 47 | ), 48 | legend.justification = c(0.99, 0), 49 | legend.position = c(0.95, 0.05), 50 | axis.ticks = element_line( 51 | color = "grey70", 52 | linewidth = 0.2 53 | ), 54 | panel.grid.major = element_line( 55 | color = "grey70", 56 | linewidth = 0.2 57 | ), 58 | panel.grid.minor = element_blank() 59 | ) 60 | 61 | styled 62 | ``` 63 | 64 | ## 8.2 Complete Themes 65 | 66 | ```{r} 67 | tbl <- tibble(x = 1:3, y = 1:3) 68 | 69 | base <- tbl %>% 70 | ggplot(aes(x, y)) + 71 | geom_point() 72 | 73 | base + 74 | theme_grey() + 75 | ggtitle("theme_grey()") 76 | 77 | base + 78 | theme_bw() + 79 | ggtitle("theme_bw()") 80 | 81 | base + 82 | theme_linedraw() + 83 | ggtitle("theme_linedraw()") 84 | 85 | base + 86 | theme_light() + 87 | ggtitle("theme_light()") 88 | 89 | base + 90 | theme_dark() + 91 | ggtitle("theme_dark()") 92 | 93 | base + 94 | theme_minimal() + 95 | ggtitle("theme_minimal()") 96 | 97 | base + 98 | theme_classic() + 99 | ggtitle("theme_classic()") 100 | 101 | base + 102 | theme_void() + 103 | ggtitle("theme_void()") 104 | ``` 105 | 106 | ```{r} 107 | library(ggthemes) 108 | 109 | base + 110 | theme_tufte() + 111 | ggtitle("theme_tufte()") 112 | 113 | base + 114 | theme_solarized() + 115 | ggtitle("theme_solarized()") 116 | 117 | base + 118 | theme_excel() + 119 | ggtitle("theme_excel()") 120 | ``` 121 | 122 | ### 8.2.1 Exercises 123 | 124 | 1. 125 | ```{r} 126 | base + 127 | theme_base() + 128 | ggtitle("theme_base()") 129 | 130 | base + 131 | theme_calc() + 132 | ggtitle("theme_calc()") 133 | 134 | base + 135 | theme_clean() + 136 | ggtitle("theme_clean()") 137 | 138 | base + 139 | theme_economist() + 140 | ggtitle("theme_economist()") 141 | 142 | base + 143 | theme_economist_white() + 144 | ggtitle("theme_economist_white()") 145 | 146 | base + 147 | theme_excel_new() + 148 | ggtitle("theme_excel_new()") 149 | 150 | base + 151 | theme_few() + 152 | ggtitle("theme_few()") 153 | 154 | base + 155 | theme_fivethirtyeight() + 156 | ggtitle("theme_fivethirtyeight()") 157 | 158 | base + 159 | theme_foundation() + 160 | ggtitle("theme_foundation()") 161 | 162 | base + 163 | theme_gdocs() + 164 | ggtitle("theme_gdocs()") 165 | 166 | base + 167 | theme_hc() + 168 | ggtitle("theme_hc()") 169 | 170 | base + 171 | theme_igray() + 172 | ggtitle("theme_igray()") 173 | 174 | nzmap <- map_data("nz") %>% 175 | ggplot(aes(long, lat, group = group)) + 176 | geom_polygon(fill = "white", color = "black") + 177 | labs( 178 | x = NULL, 179 | y = NULL 180 | ) 181 | 182 | nzmap + 183 | coord_sf() + 184 | theme_map() + 185 | ggtitle("theme_map()") 186 | 187 | base + 188 | theme_pander() + 189 | ggtitle("theme_pander()") 190 | 191 | base + 192 | theme_par() + 193 | ggtitle("theme_par()") 194 | 195 | base + 196 | theme_solarized_2() + 197 | ggtitle("theme_solarized_2()") 198 | 199 | base + 200 | theme_solid() + 201 | ggtitle("theme_solid()") 202 | 203 | base + 204 | theme_stata() + 205 | ggtitle("theme_stata()") 206 | 207 | base + 208 | theme_wsj() + 209 | ggtitle("theme_wsj()") 210 | ``` 211 | 212 | ## 8.3 Modifying Theme Components 213 | 214 | ```{r} 215 | base_t <- base + 216 | labs( 217 | title = "This is a ggplot", 218 | x = NULL, 219 | y = NULL 220 | ) 221 | 222 | base_t + 223 | theme(plot.title = element_text(size = 16)) 224 | 225 | base_t + 226 | theme( 227 | plot.title = element_text( 228 | face = "bold", 229 | color = "red" 230 | ) 231 | ) 232 | 233 | base_t + 234 | theme(plot.title = element_text(hjust = 1)) 235 | 236 | base_t + 237 | theme(plot.title = element_text(margin = margin())) 238 | 239 | base_t + 240 | theme( 241 | plot.title = element_text( 242 | margin = margin(t = 10, b = 10) 243 | ) 244 | ) 245 | 246 | base_t + 247 | theme( 248 | axis.title.y = element_text( 249 | margin = margin(r = 10) 250 | ) 251 | ) 252 | ``` 253 | 254 | ```{r} 255 | base + 256 | theme(panel.grid.major = element_line(color = "black")) 257 | 258 | base + 259 | theme(panel.grid.major = element_line(linewidth = 2)) 260 | 261 | base + 262 | theme( 263 | panel.grid.major = element_line( 264 | linetype = "dotted" 265 | ) 266 | ) 267 | ``` 268 | 269 | ```{r} 270 | base + 271 | theme(plot.background = element_rect(fill = "grey80", color = NA)) 272 | 273 | base + 274 | theme(plot.background = element_rect(color = "red", linewidth = 2)) 275 | 276 | base + 277 | theme(panel.background = element_rect(fill = "linen")) 278 | ``` 279 | 280 | ```{r} 281 | base 282 | 283 | base + 284 | theme(panel.grid.minor = element_blank()) 285 | 286 | base + 287 | theme(panel.grid.major = element_blank()) 288 | ``` 289 | 290 | ```{r} 291 | base + 292 | theme(panel.background = element_blank()) 293 | 294 | base + 295 | theme( 296 | axis.title.x = element_blank(), 297 | axis.title.y = element_blank() 298 | ) 299 | 300 | base + 301 | theme(axis.line = element_line(color = "grey50")) 302 | ``` 303 | 304 | ```{r} 305 | old_theme <- theme_update( 306 | plot.background = element_rect(fill = "lightblue3", color = NA), 307 | panel.background = element_rect(fill = "lightblue", color = NA), 308 | axis.text = element_text(color = "linen"), 309 | axis.title = element_text(color = "linen") 310 | ) 311 | 312 | base 313 | 314 | theme_set(old_theme) 315 | 316 | base 317 | ``` 318 | 319 | ## 8.4 Theme Elements 320 | 321 | ### 8.4.1 Plot Elements 322 | 323 | ```{r} 324 | base + 325 | theme(plot.background = element_rect(color = "grey50", linewidth = 2)) 326 | 327 | base + 328 | theme( 329 | plot.background = element_rect(color = "grey50", linewidth = 2), 330 | plot.margin = margin(2, 2, 2, 2) 331 | ) 332 | 333 | base + 334 | theme(plot.background = element_rect(fill = "lightblue")) 335 | ``` 336 | 337 | ### 8.4.2 Axis Elements 338 | 339 | ```{r} 340 | tbl <- tibble(x = 1:3, y = 1:3) 341 | 342 | base <- tbl %>% 343 | ggplot(aes(x, y)) + 344 | geom_point() 345 | 346 | # Accentuate the axes 347 | base + 348 | theme(axis.line = element_line(color = "grey50", linewidth = 1)) 349 | 350 | # Style both x and y axis labels 351 | base + 352 | theme(axis.text = element_text(color = "blue", size = 12)) 353 | 354 | # Useful for long labels 355 | base + 356 | theme(axis.text.x = element_text(angle = -90, vjust = 0.5)) 357 | ``` 358 | 359 | ```{r} 360 | tbl <- tibble( 361 | x = c("label", "a long label", "an even longer label"), 362 | y = 1:3 363 | ) 364 | 365 | base <- tbl %>% 366 | ggplot(aes(x, y)) + 367 | geom_point() 368 | 369 | base 370 | 371 | base + 372 | theme(axis.text.x = element_text(angle = -30, vjust = 1, hjust = 0)) + 373 | labs( 374 | x = NULL, 375 | y = NULL 376 | ) 377 | 378 | # My favorite alternative, using coord_flip() 379 | base + 380 | labs( 381 | x = NULL, 382 | y = NULL 383 | ) + 384 | coord_flip() 385 | ``` 386 | 387 | ### 8.4.3 Legends Elements 388 | 389 | ```{r} 390 | tbl <- tibble(x = 1:4, y = 1:4, z = rep(c("a", "b"), each = 2)) 391 | 392 | base <- tbl %>% 393 | ggplot(aes(x, y, color = z)) + 394 | geom_point() 395 | 396 | base + 397 | theme( 398 | legend.background = element_rect( 399 | fill = "lemonchiffon", 400 | color = "grey50", 401 | linewidth = 1 402 | ) 403 | ) 404 | 405 | base + 406 | theme( 407 | legend.key = element_rect(color = "grey50"), 408 | legend.key.width = unit(0.9, "cm"), 409 | legend.key.height = unit(0.75, "cm") 410 | ) 411 | 412 | base + 413 | theme( 414 | legend.text = element_text(size = 15), 415 | legend.title = element_text(size = 15, face = "bold") 416 | ) 417 | ``` 418 | 419 | ### 8.4.4 Panel Elements 420 | 421 | ```{r} 422 | base <- tbl %>% 423 | ggplot(aes(x, y)) + 424 | geom_point() 425 | 426 | # Modify background 427 | base + 428 | theme(panel.background = element_rect(fill = "lightblue")) 429 | 430 | # Tweak major grid lines 431 | base + 432 | theme(panel.grid.major = element_line(color = "gray60", linewidth = 0.8)) 433 | 434 | # Just in one direction 435 | base + 436 | theme(panel.grid.major.x = element_line(color = "gray60", linewidth = 0.8)) 437 | ``` 438 | 439 | ```{r} 440 | base2 <- base + 441 | theme(plot.background = element_rect(color = "grey50")) 442 | 443 | # Wide screen 444 | base2 + 445 | theme(aspect.ratio = 9 / 16) 446 | 447 | # Long and skiny 448 | base2 + 449 | theme(aspect.ratio = 2 / 1) 450 | 451 | # Square 452 | base2 + 453 | theme(aspect.ratio = 1) 454 | ``` 455 | 456 | ### 8.4.5 Facetting Elements 457 | 458 | ```{r} 459 | tbl <- tibble(x = 1:4, y = 1:4, z = c("a", "a", "b", "b")) 460 | 461 | base_f <- tbl %>% 462 | ggplot(aes(x, y)) + 463 | geom_point() + 464 | facet_wrap(vars(z)) 465 | 466 | base_f 467 | 468 | base_f + 469 | theme(panel.spacing = unit(0.5, "in")) 470 | 471 | base_f + 472 | theme( 473 | strip.background = element_rect( 474 | fill = "grey20", 475 | color = "grey80", 476 | linewidth = 1 477 | ), 478 | strip.text = element_text(color = "white") 479 | ) 480 | ``` 481 | 482 | ### 8.4.6 Exercises 483 | 484 | 2. 485 | ```{r} 486 | p <- base + 487 | theme_dark() + 488 | ggtitle("theme_dark()") 489 | 490 | p 491 | 492 | p + theme( 493 | plot.background = element_rect(fill = "black"), 494 | plot.title = element_text(color = "gray"), 495 | axis.title = element_text(color = "gray"), 496 | axis.text = element_text(color = "gray"), 497 | ) 498 | ``` 499 | 500 | ## 8.5 Saving Your Output 501 | 502 | ```{r} 503 | pdf("output/output.pdf", width = 6, height = 6) 504 | 505 | p <- mpg %>% 506 | ggplot(aes(displ, cty)) + 507 | geom_point() 508 | 509 | p 510 | 511 | dev.off() 512 | ``` 513 | 514 | ```{r} 515 | p 516 | 517 | ggsave("output/output2.pdf") 518 | ``` 519 | -------------------------------------------------------------------------------- /ggplot2_2nd_ed/chapter_09_data_analysis.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (2nd ed.), Chapter 9 - Data Analysis" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-21" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 9 Data Analysis 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(conflicted) 17 | conflict_prefer("filter", "dplyr") 18 | conflict_prefer("select", "dplyr") 19 | ``` 20 | 21 | ## 9.2 Tidy Data 22 | 23 | Since there was no ec2 dataset ready (as it says in the book, it was derived from the economics dataset, the book just didn't show how), I thought we should start by creating it, using the economics dataset as a starting point. 24 | 25 | ```{r} 26 | library(lubridate) 27 | 28 | ec2 <- economics %>% 29 | filter(date >= "2006-01-01") %>% 30 | separate( 31 | date, 32 | c("year", "month"), 33 | sep = "-", 34 | extra = "drop" 35 | ) %>% 36 | select(year, month, uempmed) %>% 37 | pivot_wider( 38 | names_from = year, 39 | values_from = uempmed 40 | ) %>% 41 | mutate(month = month %>% as.numeric()) 42 | 43 | ec2 44 | ``` 45 | 46 | ## 9.3 pivot_longer() and pivot_wider() 47 | 48 | This was actually called Spread and Gather in the 2nd edition. However, since they have been replaced by pivot_longer and pivot_wider, I thought it would make sense to rename these sub chapters. 49 | 50 | If you want to read more about this, you can check out these two: 51 | pivot_longer(): https://tidyr.tidyverse.org/reference/pivot_longer.html 52 | pivot_wider(): https://tidyr.tidyverse.org/reference/pivot_wider.html 53 | 54 | ### 9.3.1 pivot_longer() 55 | 56 | ```{r} 57 | ec2 %>% 58 | pivot_longer( 59 | cols = `2006`:`2015`, 60 | names_to = "year", 61 | values_to = "unemp" 62 | ) 63 | 64 | ec2 %>% 65 | pivot_longer( 66 | cols = -month, 67 | names_to = "year", 68 | values_to = "unemp" 69 | ) 70 | 71 | economics_2 <- ec2 %>% 72 | pivot_longer( 73 | cols = -month, 74 | names_to = "year", 75 | values_to = "rate", 76 | names_transform = list(year = as.integer), 77 | values_drop_na = TRUE 78 | ) 79 | 80 | economics_2 81 | ``` 82 | 83 | ```{r} 84 | economics_2 %>% 85 | ggplot(aes(year + (month - 1) / 12, rate)) + 86 | geom_line() 87 | 88 | economics_2 %>% 89 | ggplot(aes(month, rate, group = year, color = year)) + 90 | geom_line(linewidth = 1) 91 | ``` 92 | 93 | ### 9.3.2 pivot_wider() 94 | 95 | ```{r} 96 | weather <- tibble( 97 | day = rep(1:3, 2), 98 | obs = rep(c("temp", "rain"), each = 3), 99 | val = c(c(23, 22, 20), c(0, 0, 5)) 100 | ) 101 | 102 | weather 103 | ``` 104 | 105 | ```{r} 106 | weather %>% 107 | pivot_wider( 108 | names_from = obs, 109 | values_from = val 110 | ) 111 | ``` 112 | 113 | ## 9.4 Separate and Unite 114 | 115 | ```{r} 116 | trt <- tibble( 117 | # There's perhaps a temptation to switch paste0 to str_c. In this case that would be a mistake 118 | var = paste0( 119 | rep(c("beg", "end"), each = 3), "_", rep(c("a", "b", "c")) 120 | ), 121 | val = c(1, 4, 2, 10, 5, 11) 122 | ) 123 | 124 | trt 125 | ``` 126 | 127 | ```{r} 128 | trt %>% 129 | separate( 130 | var, 131 | c("time", "treatment"), 132 | "_" 133 | ) 134 | ``` 135 | 136 | ## 9.5 Case Studies 137 | 138 | ### 9.5.1 Blood Pressure 139 | 140 | ```{r} 141 | bpd <- read_table( 142 | "name age start week1 week2 week3 143 | Anne 35 2014-03-27 100/80 100/75 120/90 144 | Ben 41 2014-03-09 110/65 100/65 135/70 145 | Carl 33 2014-04-02 125/80 ", 146 | na = "" 147 | ) 148 | 149 | bpd 150 | ``` 151 | 152 | ```{r} 153 | bpd_1 <- bpd %>% 154 | pivot_longer( 155 | starts_with("week"), 156 | names_to = "week", 157 | values_to = "bp" 158 | ) 159 | 160 | bpd_1 161 | ``` 162 | 163 | ```{r} 164 | bpd_2 <- bpd_1 %>% 165 | separate(bp, c("sys", "dia"), "/") 166 | 167 | bpd_2 168 | ``` 169 | 170 | ```{r} 171 | bpd_3 <- bpd_2 %>% 172 | extract( 173 | week, 174 | "week", 175 | "(\\d)", 176 | convert = TRUE 177 | ) %>% 178 | arrange(name, week) 179 | 180 | bpd_3 181 | ``` 182 | 183 | ### 9.5.2 Test Scores 184 | 185 | ```{r} 186 | scores <- tibble( 187 | person = rep(c("Greg", "Sally", "Sue"), each = 2), 188 | time = rep(c("pre", "post"), 3), 189 | test1 = round(rnorm(6, mean = 80, sd = 4), 0), 190 | test2 = round(jitter(test1, 15), 0) 191 | ) 192 | 193 | scores 194 | ``` 195 | 196 | ```{r} 197 | scores_1 <- scores %>% 198 | pivot_longer( 199 | cols = starts_with("test"), 200 | names_to = "test", 201 | values_to = "score" 202 | ) 203 | 204 | scores_1 205 | ``` 206 | 207 | ```{r} 208 | scores_2 <- scores_1 %>% 209 | pivot_wider( 210 | names_from = time, 211 | values_from = score 212 | ) 213 | 214 | scores_2 215 | ``` 216 | 217 | ```{r} 218 | scores_3 <- scores_2 %>% 219 | mutate(diff = post - pre) 220 | 221 | scores_3 %>% 222 | ggplot(aes(person, diff, color = test)) + 223 | geom_hline( 224 | linewidth = 2, 225 | color = "white", 226 | yintercept = 0 227 | ) + 228 | geom_point() + 229 | geom_path( 230 | aes(group = person), 231 | color = "grey50", 232 | arrow = arrow(length = unit(0.25, "cm")) 233 | ) 234 | ``` 235 | -------------------------------------------------------------------------------- /ggplot2_2nd_ed/chapter_10_data_transformation.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (2nd ed.), Chapter 10 - Data Transformation" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-21" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 10 Data Transformation 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(conflicted) 17 | conflict_prefer("filter", "dplyr") 18 | conflict_prefer("select", "dplyr") 19 | ``` 20 | 21 | ## 10.2 Filter Observations 22 | 23 | ```{r} 24 | diamonds %>% 25 | ggplot(aes(x, y)) + 26 | geom_bin2d() 27 | ``` 28 | 29 | 30 | ```{r} 31 | diamonds %>% 32 | filter(x == 0 | y == 0) 33 | ``` 34 | 35 | ```{r} 36 | diamonds_ok <- diamonds %>% 37 | filter(x > 0, y > 0, y < 20) 38 | 39 | diamonds_ok %>% 40 | ggplot(aes(x, y)) + 41 | geom_bin2d() + 42 | geom_abline(slope = 1, color = "white", linewidth = 1, alpha = 0.5) 43 | ``` 44 | 45 | ### 10.2.2 Missing Values 46 | 47 | ```{r} 48 | x <- c(1, NA, 2) 49 | 50 | x == 1 51 | 52 | x > 2 53 | 54 | x + 10 55 | ``` 56 | 57 | ```{r} 58 | # Don't do this! 59 | x == NA 60 | x != NA 61 | 62 | # Do this instead! 63 | is.na(x) 64 | ``` 65 | 66 | ### 10.2.3 Exercises 67 | 68 | 1. 69 | ```{r} 70 | diamonds %>% 71 | filter(x == y) 72 | 73 | diamonds %>% 74 | filter(depth %>% between(55, 70)) %>% 75 | arrange(desc(depth)) 76 | 77 | diamonds %>% 78 | filter(carat < carat %>% median()) %>% 79 | arrange(desc(carat)) 80 | 81 | diamonds %>% 82 | filter(price / carat > 10000) %>% 83 | mutate(price_per_carat = price / carat) %>% 84 | arrange(price_per_carat) 85 | 86 | diamonds %>% 87 | filter(cut >= "Good") 88 | ``` 89 | 90 | 2. 91 | ```{r} 92 | x <- c(TRUE, FALSE, NA) 93 | 94 | x 95 | 96 | !x 97 | 98 | is.na(x) 99 | 100 | !is.na(x) 101 | 102 | x | is.na(x) 103 | 104 | !x | is.na(x) 105 | ``` 106 | 107 | 3. 108 | ```{r} 109 | diamonds %>% 110 | ggplot(aes(x, z)) + 111 | geom_bin2d() 112 | 113 | diamonds %>% 114 | ggplot(aes(y, z)) + 115 | geom_bin2d() 116 | 117 | diamonds_ok <- diamonds %>% 118 | filter(x > 0, y > 0, z > 0, y < 20, z < 10) 119 | 120 | diamonds_ok %>% 121 | ggplot(aes(x, z)) + 122 | geom_bin2d() + 123 | geom_smooth( 124 | method = "lm", 125 | se = FALSE, 126 | color = "white", 127 | linewidth = 1, 128 | alpha = 0.5 129 | ) 130 | 131 | diamonds_ok %>% 132 | ggplot(aes(y, z)) + 133 | geom_bin2d() + 134 | geom_smooth( 135 | method = "lm", 136 | se = FALSE, 137 | color = "white", 138 | linewidth = 1, 139 | alpha = 0.5 140 | ) 141 | ``` 142 | 143 | 4. 144 | ```{r} 145 | library(ggplot2movies) 146 | 147 | movies 148 | 149 | movies %>% 150 | summary() 151 | 152 | movies %>% 153 | ggplot(aes(year, color = is.na(budget))) + 154 | geom_freqpoly() 155 | 156 | movies %>% 157 | ggplot(aes(rating, color = is.na(budget))) + 158 | geom_freqpoly() 159 | ``` 160 | 161 | 5. 162 | ```{r} 163 | NA & FALSE 164 | 165 | NA | TRUE 166 | 167 | NA * 0 168 | 169 | NA ^ 0 170 | ``` 171 | 172 | ## 10.3 Create New Variables 173 | 174 | ```{r} 175 | diamonds_ok2 <- diamonds_ok %>% 176 | mutate( 177 | sym = x - y, 178 | size = sqrt(x ^ 2 + y ^ 2) 179 | ) 180 | 181 | diamonds_ok2 182 | ``` 183 | 184 | ```{r} 185 | diamonds_ok2 %>% 186 | ggplot(aes(size, sym)) + 187 | stat_bin2d() 188 | 189 | diamonds_ok2 %>% 190 | ggplot(aes(abs(sym))) + 191 | geom_histogram(binwidth = 0.1) 192 | 193 | diamonds_ok2 %>% 194 | filter(abs(sym) < 0.2) %>% 195 | ggplot(aes(abs(sym))) + 196 | geom_histogram(binwidth = 0.01) 197 | ``` 198 | 199 | ### 10.3.2 Exercises 200 | 201 | 1. 202 | ```{r} 203 | diamonds %>% 204 | mutate( 205 | volume = x * y * z * (2/3), 206 | density = carat * 0.2 / volume 207 | ) 208 | 209 | diamonds %>% 210 | mutate(price_per_carat = price / carat) 211 | 212 | diamonds %>% 213 | mutate( 214 | lcarat = log(carat), 215 | lprice = log(price) 216 | ) 217 | ``` 218 | 219 | 2. 220 | ```{r} 221 | diamonds %>% 222 | ggplot(aes(x, z)) + 223 | stat_bin2d() 224 | 225 | diamonds %>% 226 | ggplot(aes(log(x), log(z))) + 227 | stat_bin2d() 228 | ``` 229 | 230 | 3. 231 | ```{r} 232 | diamonds %>% 233 | mutate( 234 | depth_real = round(z / ((x + y) / 2) * 100), 235 | depth_diff = depth - depth_real 236 | ) %>% 237 | select(depth, depth_real, depth_diff) %>% 238 | filter(depth_real < 200) %>% 239 | ggplot(aes(depth_real, depth)) + 240 | geom_point() 241 | ``` 242 | 243 | 4. 244 | ```{r} 245 | diamonds %>% 246 | filter(x > y) %>% 247 | ggplot(aes(x)) + 248 | geom_histogram(binwidth = 0.2) 249 | 250 | diamonds %>% 251 | filter(y > x) %>% 252 | ggplot(aes(x)) + 253 | geom_histogram(binwidth = 0.2) 254 | ``` 255 | 256 | ## 10.4 Group-wise Summaries 257 | 258 | ```{r} 259 | sum_clarity <- diamonds %>% 260 | summarize( 261 | price = mean(price), 262 | .by = clarity 263 | ) 264 | 265 | sum_clarity 266 | ``` 267 | 268 | ```{r} 269 | p <- sum_clarity %>% 270 | ggplot(aes(clarity, price)) 271 | 272 | p + 273 | geom_col() 274 | 275 | p + 276 | geom_point() 277 | ``` 278 | 279 | ```{r} 280 | # It's clearer to tell explicitly with the .by argument that this summarization has been made with two groups (cut and depth) and the next one with one (cut) 281 | 282 | cut_depth <- diamonds %>% 283 | summarize( 284 | n = n(), 285 | .by = c(cut, depth) 286 | ) %>% 287 | filter(between(depth, 55, 70)) 288 | 289 | cut_depth 290 | ``` 291 | 292 | ```{r} 293 | cut_depth %>% 294 | ggplot(aes(depth, n, color = cut)) + 295 | geom_line() 296 | ``` 297 | 298 | ```{r} 299 | cut_depth %>% 300 | mutate( 301 | prop = n / sum(n), 302 | .by = cut 303 | ) %>% 304 | ggplot(aes(depth, prop, color = cut)) + 305 | geom_line() 306 | ``` 307 | 308 | ### 10.4.1 Useful Tools 309 | 310 | ```{r} 311 | diamonds %>% 312 | summarize( 313 | n_big = sum(carat >= 4), 314 | prop_cheap = mean(price < 1000) 315 | ) 316 | ``` 317 | 318 | ### 10.4.2 Statistical Considerations 319 | 320 | ```{r} 321 | by_clarity <- diamonds %>% 322 | summarize( 323 | n = n(), 324 | mean = mean(price), 325 | lq = quantile(price, 0.25), 326 | uq = quantile(price, 0.75), 327 | .by = clarity 328 | ) 329 | 330 | by_clarity 331 | ``` 332 | 333 | ```{r} 334 | by_clarity %>% 335 | ggplot(aes(clarity, mean)) + 336 | geom_linerange(aes(ymin = lq, ymax = uq)) + 337 | geom_point(aes(size = n)) 338 | ``` 339 | 340 | ```{r} 341 | data(Batting, package = "Lahman") 342 | 343 | Batting 344 | 345 | ba <- Batting %>% 346 | filter(AB > 0) %>% 347 | summarize( 348 | ba = sum(H, na.rm = TRUE) / sum(AB, na.rm = TRUE), 349 | .by = playerID 350 | ) 351 | 352 | ba 353 | ``` 354 | 355 | ```{r} 356 | ba %>% 357 | ggplot(aes(ba)) + 358 | geom_histogram(binwidth = 0.01) 359 | ``` 360 | 361 | ```{r} 362 | ba <- Batting %>% 363 | filter(AB > 0) %>% 364 | summarize( 365 | ba = sum(H, na.rm = TRUE) / sum(AB, na.rm = TRUE), 366 | ab = sum(AB, na.rm = TRUE), 367 | .by = playerID 368 | ) 369 | 370 | ba %>% 371 | ggplot(aes(ab, ba)) + 372 | geom_bin2d(bins = 100) + 373 | geom_smooth() 374 | 375 | ba %>% 376 | filter(ab >= 10) %>% 377 | ggplot(aes(ab, ba)) + 378 | geom_bin2d() + 379 | geom_smooth() 380 | ``` 381 | 382 | ### 10.4.3 Exercises 383 | 384 | 1. 385 | ```{r} 386 | movies_with_budget <- movies %>% 387 | mutate( 388 | budget_known = case_when( 389 | is.na(budget) ~ FALSE, 390 | TRUE ~ TRUE 391 | ) 392 | ) %>% 393 | summarize( 394 | has_budget = sum(budget_known), 395 | total = n(), 396 | prop_no_budget = 1 - has_budget / total, 397 | .by = year 398 | ) 399 | 400 | movies_with_budget %>% 401 | ggplot(aes(year, prop_no_budget)) + 402 | geom_area() 403 | ``` 404 | 405 | 2. 406 | ```{r} 407 | by_avg_length <- movies %>% 408 | summarize( 409 | n = n(), 410 | mean = mean(length), 411 | lq = quantile(length, 0.25), 412 | uq = quantile(length, 0.75), 413 | .by = year 414 | ) 415 | 416 | by_avg_length 417 | 418 | by_avg_length %>% 419 | ggplot(aes(year, mean)) + 420 | geom_linerange(aes(ymin = lq, ymax = uq)) + 421 | geom_point() 422 | ``` 423 | 424 | 3. 425 | ```{r} 426 | diamonds2 <- diamonds %>% 427 | summarize( 428 | number = n(), 429 | avg_price = mean(price), 430 | avg_size = mean(depth), 431 | .by = cut 432 | ) 433 | 434 | diamonds2 435 | 436 | # By cut 437 | diamonds %>% 438 | ggplot(aes(cut)) + 439 | geom_bar() + 440 | ggtitle("Number of diamonds in each cut") 441 | 442 | diamonds2 %>% 443 | ggplot(aes(cut, avg_price)) + 444 | geom_bar(stat = "identity") + 445 | ggtitle("Average Price of Each Cut") 446 | 447 | diamonds2 %>% 448 | ggplot(aes(cut, avg_size)) + 449 | geom_bar(stat = "identity") + 450 | ggtitle("Average Size of Each Cut") 451 | 452 | # By color 453 | diamonds3 <- diamonds %>% 454 | summarize( 455 | number = n(), 456 | avg_price = mean(price), 457 | avg_size = mean(depth), 458 | .by = color 459 | ) 460 | 461 | diamonds %>% 462 | ggplot(aes(color)) + 463 | geom_bar() + 464 | ggtitle("Number of diamonds in Each Color") 465 | 466 | diamonds3 %>% 467 | ggplot(aes(color, avg_price)) + 468 | geom_bar(stat = "identity") + 469 | ggtitle("Average Price of Each Color") 470 | 471 | diamonds3 %>% 472 | ggplot(aes(color, avg_size)) + 473 | geom_bar(stat = "identity") + 474 | ggtitle("Average Size of Each Color") 475 | 476 | # By clarity 477 | 478 | diamonds4 <- diamonds %>% 479 | summarise( 480 | number = n(), 481 | avg_price = mean(price), 482 | avg_size = mean(depth), 483 | .by = clarity 484 | ) 485 | 486 | diamonds %>% 487 | ggplot(aes(clarity)) + 488 | geom_bar() + 489 | ggtitle("Number of diamonds in Each Clarity") 490 | 491 | diamonds4 %>% 492 | ggplot(aes(clarity, avg_price)) + 493 | geom_bar(stat = "identity") + 494 | ggtitle("Average Price of Each Clarity") 495 | 496 | diamonds4 %>% 497 | ggplot(aes(clarity, avg_size)) + 498 | geom_bar(stat = "identity") + 499 | ggtitle("Average Size of Each Clarity") 500 | ``` 501 | 502 | 4. 503 | ```{r} 504 | # Version 1 505 | diamonds %>% 506 | group_by( 507 | cut = cut( 508 | carat, 509 | seq( 510 | min(carat), 511 | max(carat), 512 | by = 0.1 513 | ) 514 | ) 515 | ) %>% 516 | summarize(count = n()) %>% 517 | ungroup() %>% 518 | ggplot(aes(cut, count)) + 519 | geom_bar(stat = "identity") 520 | 521 | 522 | 523 | # Version 2 524 | n <- round((max(diamonds$carat) - min(diamonds$carat)) / 0.1) 525 | 526 | diamonds %>% 527 | group_by(cut = cut_interval(carat, n = n, width = 0.1)) %>% 528 | summarize(count = n()) %>% 529 | ungroup %>% 530 | ggplot(aes(cut, count)) + 531 | geom_bar(stat = "identity") 532 | 533 | # Original 534 | diamonds %>% 535 | ggplot(aes(carat)) + 536 | geom_histogram(binwidth = 0.1) 537 | ``` 538 | 539 | ## 10.5 Transformation Pipelines 540 | 541 | ```{r} 542 | # Don't do this! 543 | cut_depth <- group_by(diamonds, cut, depth) 544 | cut_depth <- summarize(cut_depth, n = n()) 545 | cut_depth <- filter(cut_depth, depth > 55, depth < 70) 546 | cut_depth <- mutate(cut_depth, prop = n / sum(n)) %>% 547 | ungroup() 548 | 549 | cut_depth 550 | 551 | # Don't do this! 552 | mutate( 553 | filter( 554 | summarize( 555 | group_by( 556 | diamonds, 557 | cut, 558 | depth 559 | ), 560 | n = n() 561 | ), 562 | depth > 55, 563 | depth < 70 564 | ), 565 | prop = n / sum(n) 566 | ) %>% 567 | ungroup() 568 | 569 | # Do this instead! 570 | cut_depth_2 <- diamonds %>% 571 | group_by(cut, depth) %>% 572 | summarize(n = n()) %>% 573 | filter(depth > 55, depth < 70) %>% 574 | mutate(prop = n / sum(n)) %>% 575 | ungroup() 576 | 577 | cut_depth_2 578 | ``` 579 | 580 | ### 10.5.1 Exercises 581 | 582 | 2. 583 | ```{r} 584 | library(magrittr) 585 | 586 | x <- runif(100) 587 | 588 | x %>% 589 | subtract(mean(.)) %>% 590 | raise_to_power(2) %>% 591 | mean() %>% 592 | sqrt() 593 | ``` 594 | -------------------------------------------------------------------------------- /ggplot2_2nd_ed/chapter_11_modeling_for_visualization.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (2nd ed.), Chapter 11 - Modeling for Visualization" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-21" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 11 Modeling for Visualization 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(conflicted) 17 | conflict_prefer("filter", "dplyr") 18 | conflict_prefer("select", "dplyr") 19 | ``` 20 | 21 | ## 11.2 Removing Trend 22 | 23 | ```{r} 24 | diamonds2 <- diamonds %>% 25 | filter(carat <= 2) %>% 26 | mutate( 27 | lcarat = log2(carat), 28 | lprice = log2(price) 29 | ) 30 | 31 | diamonds2 32 | ``` 33 | 34 | ```{r} 35 | diamonds2 %>% 36 | ggplot(aes(lcarat, lprice)) + 37 | geom_bin2d() + 38 | geom_smooth( 39 | method = "lm", 40 | se = FALSE, 41 | linewidth = 2, 42 | color = "yellow" 43 | ) 44 | ``` 45 | 46 | ```{r} 47 | mod <- lm(lprice ~ lcarat, data = diamonds2) 48 | 49 | mod %>% 50 | summary() %>% 51 | coef() 52 | ``` 53 | 54 | ```{r} 55 | diamonds3 <- diamonds2 %>% 56 | mutate(rel_price = mod %>% resid()) 57 | 58 | diamonds3 %>% 59 | ggplot(aes(carat, rel_price)) + 60 | geom_bin2d() 61 | ``` 62 | 63 | ```{r} 64 | xgrid <- seq(-2, 1, by = 1/3) 65 | 66 | tibble(logx = xgrid, x = round(2 ^ xgrid, 2)) 67 | ``` 68 | 69 | ```{r} 70 | color_cut <- diamonds3 %>% 71 | summarize( 72 | price = mean(price), 73 | rel_price = mean(rel_price), 74 | .by = c(color, cut) 75 | ) 76 | 77 | color_cut 78 | ``` 79 | 80 | ```{r} 81 | color_cut %>% 82 | ggplot(aes(color, price, color = cut)) + 83 | geom_line(aes(group = cut)) + 84 | geom_point() 85 | ``` 86 | 87 | ```{r} 88 | color_cut %>% 89 | ggplot(aes(color, rel_price, color = cut)) + 90 | geom_line(aes(group = cut)) + 91 | geom_point() 92 | ``` 93 | 94 | ### 11.2.1 Exercises 95 | 96 | 1. 97 | ```{r} 98 | diamonds2 <- diamonds %>% 99 | mutate( 100 | lcarat = log2(carat), 101 | lprice = log2(price) 102 | ) 103 | 104 | diamonds2 %>% 105 | ggplot(aes(lcarat, lprice)) + 106 | geom_bin2d() + 107 | geom_smooth( 108 | method = "lm", 109 | se = FALSE, 110 | linewidth = 2, 111 | color = "yellow" 112 | ) 113 | 114 | mod <- lm(lprice ~ lcarat, data = diamonds2) 115 | 116 | mod %>% 117 | summary() %>% 118 | coef() 119 | 120 | diamonds3 <- diamonds2 %>% 121 | mutate(rel_price = mod %>% resid()) 122 | 123 | diamonds3 %>% 124 | ggplot(aes(carat, rel_price)) + 125 | geom_bin2d() 126 | 127 | xgrid <- seq(-2, 1, by = 1/3) 128 | 129 | tibble(logx = xgrid, x = round(2 ^ xgrid, 2)) 130 | 131 | color_cut <- diamonds3 %>% 132 | summarize( 133 | price = mean(price), 134 | rel_price = mean(rel_price), 135 | .by = c(color, cut) 136 | ) 137 | 138 | color_cut %>% 139 | ggplot(aes(color, price)) + 140 | geom_line(aes(group = cut, color = cut)) + 141 | geom_point(aes(color = cut)) 142 | 143 | color_cut %>% 144 | ggplot(aes(color, rel_price)) + 145 | geom_line(aes(group = cut, color = cut)) + 146 | geom_point(aes(color = cut)) 147 | ``` 148 | 149 | 2. 150 | ```{r} 151 | diamonds3 %>% 152 | ggplot(aes(color, carat, color = cut)) + 153 | geom_boxplot() 154 | ``` 155 | 156 | 3. 157 | ```{r} 158 | diamonds3 %>% 159 | ggplot(aes(cut, rel_price, color = color)) + 160 | geom_boxplot() + 161 | facet_wrap(vars(clarity)) 162 | ``` 163 | 164 | 4. 165 | ```{r} 166 | diamonds3 %>% 167 | ggplot(aes(depth, rel_price)) + 168 | geom_point() 169 | 170 | diamonds3 %>% 171 | ggplot(aes(table, rel_price)) + 172 | geom_point() 173 | ``` 174 | 175 | ## 11.3 Texas Housing Data 176 | 177 | ```{r} 178 | txhousing 179 | ``` 180 | 181 | ```{r} 182 | txhousing %>% 183 | ggplot(aes(date, sales, group = city)) + 184 | geom_line(alpha = 1/2) 185 | ``` 186 | 187 | ```{r} 188 | txhousing %>% 189 | ggplot(aes(date, log(sales), group = city)) + 190 | geom_line(alpha = 1/2) 191 | ``` 192 | 193 | ```{r} 194 | abilene <- txhousing %>% 195 | filter(city == "Abilene") 196 | 197 | abilene %>% 198 | ggplot(aes(date, log(sales))) + 199 | geom_line() 200 | 201 | mod <- lm( 202 | log(sales) ~ factor(month), 203 | data = abilene 204 | ) 205 | 206 | abilene2 <- abilene %>% 207 | mutate(rel_sales = mod %>% resid()) 208 | 209 | abilene2 %>% 210 | ggplot(aes(date, rel_sales)) + 211 | geom_line() 212 | ``` 213 | 214 | ```{r} 215 | de_season <- function(x, month) { 216 | lm( 217 | x ~ factor(month), 218 | na.action = na.exclude 219 | ) %>% 220 | resid() 221 | } 222 | 223 | txhousing2 <- txhousing %>% 224 | mutate( 225 | rel_sales = de_season(log(sales), month), 226 | .by = city 227 | ) 228 | 229 | txhousing2 %>% 230 | ggplot(aes(date, rel_sales)) + 231 | geom_line(aes(group = city), alpha = 1/5) + 232 | # Due to a warning message, I switched this to a stat_summary instead 233 | stat_summary( 234 | geom = "line", 235 | fun = "mean", 236 | color = "red" 237 | ) 238 | ``` 239 | 240 | ### 11.3.1 Exercises 241 | 242 | 1. 243 | ```{r} 244 | txhousing2 %>% 245 | ggplot(aes(date, rel_sales)) + 246 | geom_line(aes(group = city), alpha = 1/5) + 247 | geom_smooth( 248 | method = "loess", 249 | se = FALSE, 250 | color = "red" 251 | ) 252 | ``` 253 | 254 | 2. 255 | ```{r} 256 | txhousing %>% 257 | ggplot(aes(date, sales, group = city)) + 258 | geom_line(alpha = 1/2) 259 | ``` 260 | 261 | ## 11.4 Visualizing Models 262 | 263 | ```{r} 264 | library(broom) 265 | 266 | model_sum <- txhousing %>% 267 | group_by(city) %>% 268 | group_modify( 269 | ~ glance( 270 | lm( 271 | log2(sales) ~ factor(month), 272 | data = ., 273 | na.action = na.exclude 274 | ) 275 | ) 276 | ) %>% 277 | ungroup() 278 | 279 | model_sum 280 | ``` 281 | 282 | ```{r} 283 | model_sum %>% 284 | mutate(city = fct_reorder(city, r.squared)) %>% 285 | ggplot(aes(r.squared, city)) + 286 | geom_point() 287 | ``` 288 | 289 | ```{r} 290 | top3 <- c("Bryan-College Station", "Lubbock", "NE Tarrant County") 291 | 292 | bottom3 <- c("McAllen", "Brownsville", "Harlingen") 293 | 294 | extreme <- txhousing %>% 295 | filter( 296 | city %in% c(top3, bottom3), 297 | !is.na(sales) 298 | ) %>% 299 | mutate(city = factor(city, c(top3, bottom3))) 300 | 301 | extreme %>% 302 | ggplot(aes(month, log(sales), group = year)) + 303 | geom_line() + 304 | facet_wrap(vars(city)) 305 | ``` 306 | 307 | ## 11.6 Coefficient-Level Summaries 308 | 309 | ```{r} 310 | library(broom) 311 | 312 | coefs <- txhousing %>% 313 | group_by(city) %>% 314 | group_modify( 315 | ~ tidy( 316 | lm( 317 | log2(sales) ~ factor(month), 318 | data = ., 319 | na.action = na.exclude 320 | ) 321 | ) 322 | ) %>% 323 | ungroup() 324 | 325 | coefs 326 | ``` 327 | 328 | ```{r} 329 | months <- coefs %>% 330 | filter(str_detect(term, "factor")) %>% 331 | extract(term, "month", "(\\d+)", convert = TRUE) 332 | 333 | months 334 | ``` 335 | 336 | ```{r} 337 | months %>% 338 | ggplot(aes(month, 2 ^ estimate, group = city)) + 339 | geom_line() 340 | ``` 341 | 342 | ```{r} 343 | coef_sum <- months %>% 344 | group_by(city) %>% 345 | summarize(max = max(estimate)) 346 | 347 | coef_sum %>% 348 | mutate(city = fct_reorder(city, max)) %>% 349 | ggplot(aes(2 ^ max, city)) + 350 | geom_point() 351 | ``` 352 | 353 | ## 11.7. Observation Data 354 | 355 | ```{r} 356 | library(broom) 357 | 358 | obs_sum <- txhousing %>% 359 | group_by(city) %>% 360 | nest() %>% 361 | mutate( 362 | mod = map( 363 | data, 364 | ~ lm( 365 | log2(sales) ~ factor(month), 366 | data = . 367 | ) 368 | ), 369 | augmented = map(mod, augment, se_fit = TRUE) 370 | ) %>% 371 | unnest(augmented) %>% 372 | ungroup() %>% 373 | select(-c(data, mod, .rownames)) %>% 374 | rename( 375 | "log2.sales" = "log2(sales)", 376 | "factor.month" = "factor(month)" 377 | ) 378 | 379 | obs_sum 380 | ``` 381 | 382 | ```{r} 383 | obs_sum %>% 384 | ggplot(aes(.std.resid)) + 385 | geom_histogram(binwidth = 0.1) 386 | 387 | obs_sum %>% 388 | ggplot(aes(abs(.std.resid))) + 389 | geom_histogram(binwidth = 0.1) 390 | ``` 391 | 392 | ```{r} 393 | obs_sum %>% 394 | filter(abs(.std.resid) > 2) %>% 395 | summarize( 396 | n = n(), 397 | avg = mean(abs(.std.resid)), 398 | .by = city 399 | ) %>% 400 | arrange(desc(n)) 401 | ``` 402 | -------------------------------------------------------------------------------- /ggplot2_2nd_ed/chapter_12_programming_with_ggplot2.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (2nd ed.), Chapter 12 - Programming with ggplot2" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-21" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 12 Programming with ggplot2 13 | 14 | ```{r} 15 | library(tidyverse) 16 | ``` 17 | 18 | ## 12.2 Single Components 19 | 20 | ```{r} 21 | bestfit <- geom_smooth( 22 | method = "lm", 23 | se = FALSE, 24 | color = alpha("steelblue", 0.5), 25 | linewidth = 2 26 | ) 27 | 28 | mpg %>% 29 | ggplot(aes(cty, hwy)) + 30 | geom_point() + 31 | bestfit 32 | 33 | mpg %>% 34 | ggplot(aes(displ, hwy)) + 35 | geom_point() + 36 | bestfit 37 | ``` 38 | 39 | ```{r} 40 | geom_lm <- function( 41 | formula = y ~ x, 42 | color = alpha("steelblue", 0.5), 43 | linewidth = 2, 44 | ... 45 | ) { 46 | geom_smooth( 47 | formula = formula, 48 | se = FALSE, 49 | method = "lm", 50 | color = color, 51 | linewidth = linewidth, 52 | ... 53 | ) 54 | } 55 | 56 | p <- mpg %>% 57 | ggplot(aes(displ, 1 / hwy)) + 58 | geom_point() 59 | 60 | p + 61 | geom_lm() 62 | 63 | p + 64 | geom_lm( 65 | y ~ poly(x, 2), 66 | linewidth = 1, 67 | color = "red" 68 | ) 69 | ``` 70 | 71 | ### 12.2.1 Exercises 72 | 73 | 1. 74 | ```{r} 75 | histogram_pink_100_bins <- geom_histogram(bins = 100, fill = "pink") 76 | 77 | mpg %>% 78 | ggplot(aes(cty)) + 79 | histogram_pink_100_bins 80 | ``` 81 | 82 | 2. 83 | ```{r} 84 | brewer_fill_blues <- scale_fill_brewer(palette = "Blues") 85 | 86 | mpg %>% 87 | ggplot(aes(factor(manufacturer), fill = factor(drv))) + 88 | geom_bar(position = "stack") + 89 | brewer_fill_blues + 90 | coord_flip() 91 | ``` 92 | 93 | ## 12.3 Multiple Components 94 | 95 | ```{r} 96 | geom_mean <- function() { 97 | list( 98 | stat_summary( 99 | fun = "mean", 100 | geom = "bar", 101 | fill = "grey70" 102 | ), 103 | stat_summary( 104 | fun.data = "mean_cl_normal", 105 | geom = "errorbar", 106 | width = 0.4 107 | ) 108 | ) 109 | } 110 | 111 | mpg %>% 112 | ggplot(aes(class, cty)) + 113 | geom_mean() 114 | 115 | mpg %>% 116 | ggplot(aes(drv, cty)) + 117 | geom_mean() 118 | ``` 119 | 120 | ```{r} 121 | geom_mean <- function(se = TRUE) { 122 | list( 123 | stat_summary( 124 | fun = "mean", 125 | geom = "bar", 126 | fill = "grey70" 127 | ), 128 | if (se) { 129 | stat_summary( 130 | fun.data = "mean_cl_normal", 131 | geom = "errorbar", 132 | width = 0.4 133 | ) 134 | } 135 | ) 136 | } 137 | 138 | p <- mpg %>% 139 | ggplot(aes(drv, cty)) 140 | 141 | p + 142 | geom_mean() 143 | 144 | p + 145 | geom_mean(se = FALSE) 146 | ``` 147 | 148 | ### 12.3.2 Annotation 149 | 150 | ```{r} 151 | # borders -> borders2 152 | borders2 <- function( 153 | database = "world", 154 | regions = ".", 155 | fill = NA, 156 | color = "grey50", 157 | ... 158 | ) { 159 | 160 | tbl <- map_data(database, regions) 161 | 162 | geom_polygon( 163 | # aes_ -> aes + ~lat -> long + ~long -> lat + ~group -> group 164 | aes(long, lat, group = group), 165 | data = tbl, 166 | fill = fill, 167 | color = color, 168 | ..., 169 | inherit.aes = FALSE, 170 | show.legend = FALSE 171 | ) 172 | } 173 | 174 | # These weren't part of the original code, but I wanted to see what the function above would look like 175 | p <- ggplot() + 176 | borders2(fill = "blue") 177 | 178 | p 179 | 180 | p + 181 | coord_sf() 182 | ``` 183 | 184 | ### 12.3.3 Additional Arguments 185 | 186 | ```{r} 187 | geom_mean <- function(..., bar.params = list(), errorbar.params = list()) { 188 | params <- list(...) 189 | bar.params <- modifyList(params, bar.params) 190 | errorbar.params <- modifyList(params, errorbar.params) 191 | 192 | bar <- do.call( 193 | "stat_summary", 194 | modifyList( 195 | list( 196 | fun = "mean", 197 | geom = "bar", 198 | fill = "grey70" 199 | ), 200 | bar.params 201 | ) 202 | ) 203 | 204 | errorbar <- do.call( 205 | "stat_summary", 206 | modifyList( 207 | list( 208 | fun.data = "mean_cl_normal", 209 | geom = "errorbar", 210 | width = 0.4 211 | ), 212 | errorbar.params 213 | ) 214 | ) 215 | 216 | list(bar, errorbar) 217 | } 218 | 219 | mpg %>% 220 | ggplot(aes(class, cty)) + 221 | geom_mean( 222 | color = "steelblue", 223 | errorbar.params = list(width = 0.5, linewidth = 1) 224 | ) 225 | 226 | mpg %>% 227 | ggplot(aes(class, cty)) + 228 | geom_mean( 229 | bar.params = list(fill = "steelblue"), 230 | errorbar.params = list(color = "blue") 231 | ) 232 | ``` 233 | 234 | ## 12.4. Plot Functions 235 | 236 | ```{r} 237 | piechart <- function(data, mapping) { 238 | ggplot(data, mapping) + 239 | geom_bar(width = 1) + 240 | coord_polar(theta = "y") + 241 | labs( 242 | x = NULL, 243 | y = NULL 244 | ) 245 | } 246 | 247 | mpg %>% 248 | piechart(aes(factor(1), fill = class)) 249 | ``` 250 | 251 | ```{r} 252 | pcp_data <- function(df) { 253 | # vapply -> map_vec + .progress = TRUE + .ptype = logical(1) 254 | is_numeric <- map_vec(df, is.numeric, .progress = TRUE, .ptype = logical(1)) 255 | 256 | # Rescale numeric columns 257 | rescale01 <- function(x) { 258 | rng <- range(x, na.rm = TRUE) 259 | (x - rng[1]) / (rng[2] - rng[1]) 260 | } 261 | # lapply -> map 262 | df[is_numeric] <- map(df[is_numeric], rescale01) 263 | 264 | # Add row identifier 265 | df$.row <- rownames(df) 266 | 267 | # Treat numerics as value (aka measure) variables 268 | # gather_ -> pivot_longer + cols = names(df)[is_numeric] + names_to = "variable" + values_to = "value" 269 | pivot_longer(df, cols = names(df)[is_numeric], names_to = "variable", values_to = "value") 270 | } 271 | 272 | pcp <- function(df, ...) { 273 | df <- pcp_data(df) 274 | df %>% 275 | ggplot(aes(variable, value, group = .row)) + 276 | geom_line(...) 277 | } 278 | 279 | mpg %>% 280 | pcp() 281 | 282 | mpg %>% 283 | pcp(aes(color = drv)) 284 | ``` 285 | 286 | ### 12.4.1 Indirectly Referring to Variables 287 | 288 | ```{r} 289 | # Since aes_ has been deprecated, this block has changed significantly 290 | x_var <- "displ" 291 | 292 | # Don't do this! 293 | aes(x_var) 294 | 295 | # Do one of these instead: 296 | aes(displ) 297 | 298 | # This doesn't look right by itself, but will prove useful in the next block 299 | aes(.data[[x_var]]) 300 | 301 | aes({{ x_var }}) 302 | ``` 303 | 304 | ```{r} 305 | # Since aes_ has been deprecated, this block has changed significantly, but there is a working replacement shown below: 306 | 307 | piechart1 <- function(data, var, ...) { 308 | # aes_(~factor(1) -> aes(factor(1) + as.name(var) -> .data[[var]] 309 | piechart(data, aes(factor(1), fill = .data[[var]])) 310 | } 311 | 312 | mpg %>% 313 | piechart1("class") + 314 | theme(legend.position = "none") 315 | 316 | piechart2 <- function(data, var, ...) { 317 | # aes_(~factor(1) -> aes(factor(1) + as.name(var) -> {{ var }} 318 | piechart(data, aes(factor(1), fill = {{ var }})) 319 | } 320 | 321 | mpg %>% 322 | piechart2(class) + 323 | theme(legend.position = "none") 324 | ``` 325 | 326 | ### 12.4.2 The Plot Environment 327 | 328 | ```{r} 329 | # This didn't use to work, but works nowadays just fine: 330 | f <- function() { 331 | n <- 10 332 | geom_line(aes(x / n)) 333 | } 334 | 335 | tbl <- tibble(x = 1:3, y = 1:3) 336 | 337 | tbl %>% 338 | ggplot(aes(x, y)) + f() 339 | 340 | # This still works, too: 341 | f <- function() { 342 | color <- "blue" 343 | geom_line(color = color) 344 | } 345 | 346 | tbl %>% 347 | ggplot(aes(x, y)) + f() 348 | ``` 349 | 350 | ## 12.5 Functional Programming 351 | 352 | ```{r} 353 | geoms <- list( 354 | geom_point(), 355 | geom_boxplot(aes(group = cut_width(displ, 1))), 356 | list(geom_point(), geom_smooth()) 357 | ) 358 | 359 | p <- mpg %>% 360 | ggplot(aes(displ, hwy)) 361 | 362 | # lapply -> map 363 | map( 364 | geoms, 365 | function(g) { 366 | p + g 367 | } 368 | ) 369 | ``` 370 | -------------------------------------------------------------------------------- /ggplot2_2nd_ed/img/plot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_2nd_ed/img/plot.png -------------------------------------------------------------------------------- /ggplot2_2nd_ed/img/youcanbeapirate-wb-sparkline.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_2nd_ed/img/youcanbeapirate-wb-sparkline.jpg -------------------------------------------------------------------------------- /ggplot2_2nd_ed/input/gw_KS_sf.rds: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_2nd_ed/input/gw_KS_sf.rds -------------------------------------------------------------------------------- /ggplot2_2nd_ed/output/output.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_2nd_ed/output/output.pdf -------------------------------------------------------------------------------- /ggplot2_2nd_ed/output/output2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_2nd_ed/output/output2.pdf -------------------------------------------------------------------------------- /ggplot2_2nd_ed/output/plot.rds: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_2nd_ed/output/plot.rds -------------------------------------------------------------------------------- /ggplot2_3rd_ed/6_5_raster_maps.R: -------------------------------------------------------------------------------- 1 | # ## 6.5 Raster maps 2 | # 3 | # There was a problem with the {bomrang} package and it has been archived. So, the example code for the 3rd 4 | # edition doesn't work as it is. 5 | # 6 | # I opened an issue about it (https://github.com/hadley/ggplot2-book/issues/338#issuecomment-1422203499) and Adam H. Sparks, the creator (you can correct me if I'm wrong) of the {bomrang} package was kind enough to 7 | # provide a function that does what the get_available_imagery() function did in the original example. 8 | # 9 | # ```{r} 10 | # library(curl) 11 | # library(tidyverse) 12 | # 13 | # get_available_imagery <- function(product_id = "all") { 14 | # 15 | # ftp_base <- "ftp://ftp.bom.gov.au/anon/gen/gms/" 16 | # 17 | # .ftp_images <- function(product_id, bom_server) { 18 | # 19 | # list_files <- new_handle() 20 | # 21 | # handle_setopt( 22 | # handle = list_files, 23 | # CONNECTTIMEOUT = 60L, 24 | # TIMEOUT = 120L, 25 | # ftp_use_epsv = TRUE, 26 | # dirlistonly = TRUE 27 | # ) 28 | # 29 | # # get file list from FTP server 30 | # con <- curl( 31 | # url = ftp_base, 32 | # open = "r", 33 | # handle = list_files 34 | # ) 35 | # 36 | # tif_files <- readLines(con) 37 | # 38 | # close(con) 39 | # 40 | # # filter only the GeoTIFF files 41 | # tif_files <- tif_files %>% 42 | # as_tibble() %>% 43 | # filter(str_detect(value, "^.*\\.tif")) %>% 44 | # pull() 45 | # 46 | # # check if the Product ID requested provides any files on the server 47 | # if (length(tif_files) == 0 | tif_files[1] == ftp_base) { 48 | # stop( 49 | # str_c( 50 | # "\nSorry, no files are currently available for ", 51 | # product_id 52 | # ) 53 | # ) 54 | # } 55 | # return(tif_files) 56 | # } 57 | # 58 | # tif_list <- .ftp_images(product_id, bom_server = ftp_base) 59 | # 60 | # write_lines(tif_list, file = file.path(tempdir(), "tif_list")) 61 | # 62 | # cat("\nThe following files are currently available for download:\n") 63 | # 64 | # print(tif_list) 65 | # 66 | # } 67 | # ``` 68 | # 69 | # 70 | # ```{r} 71 | # library(lubridate) 72 | # 73 | # yesterday_10pm <- 74 | # as.character(floor_date(now() - ddays(1), "day") + dhours(22)) %>% 75 | # str_replace_all("-", "") %>% 76 | # str_replace_all(":", "") %>% 77 | # str_replace_all(" ", "") %>% 78 | # str_sub(1, 12) 79 | # 80 | # yesterday_10pm 81 | # ``` 82 | # 83 | # ```{r} 84 | # files <- get_available_imagery() %>% 85 | # str_subset(yesterday_10pm) 86 | # ``` 87 | # 88 | # ```{r} 89 | # walk2( 90 | # .x = str_c("ftp://ftp.bom.gov.au/anon/gen/gms/", files), 91 | # .y = file.path("raster", files), 92 | # .f = ~ download.file(url = .x, destfile = .y) 93 | # ) 94 | # ``` 95 | # 96 | # ```{r} 97 | # dir("raster") 98 | # ``` 99 | # 100 | # ```{r} 101 | # img_vis <- file.path("raster", "IDE00420.202302122200.tif") 102 | # img_inf <- file.path("raster", "IDE00421.202302122200.tif") 103 | # ``` 104 | # 105 | # ```{r} 106 | # library(stars) 107 | # 108 | # sat_vis <- read_stars( 109 | # img_vis, 110 | # RasterIO = list(nBufXSize = 600, nBufYSize = 600), 111 | # proxy = TRUE 112 | # ) 113 | # 114 | # sat_inf <- read_stars( 115 | # img_inf, 116 | # RasterIO = list(nBufXSize = 600, nBufYSize = 600), 117 | # proxy = TRUE 118 | # ) 119 | # ``` 120 | # 121 | # ```{r} 122 | # sat_vis 123 | # ``` 124 | # 125 | # ```{r} 126 | # ggplot() + 127 | # geom_stars(data = sat_vis) + 128 | # coord_equal() 129 | # ``` 130 | # 131 | # ```{r} 132 | # ggplot() + 133 | # geom_stars(data = sat_vis, show.legend = FALSE) + 134 | # facet_wrap(vars(band)) + 135 | # coord_equal() + 136 | # scale_fill_gradient(low = "black", high = "white") 137 | # ``` 138 | # 139 | # ```{r} 140 | # oz_states <- st_transform(oz_states, crs = st_crs(sat_vis)) 141 | # ``` 142 | # 143 | # ```{r} 144 | # ggplot() + 145 | # geom_stars(data = sat_vis, show.legend = FALSE) + 146 | # geom_sf(data = oz_states, fill = NA, color = "white") + 147 | # coord_sf() + 148 | # theme_void() + 149 | # scale_fill_gradient(low = "black", high = "white") 150 | # ``` 151 | # 152 | # ```{r} 153 | # cities <- oz_capitals %>% 154 | # st_as_sf(coords = c("lon", "lat"), crs = 4326, remove = FALSE) 155 | # ``` 156 | # 157 | # ```{r} 158 | # cities <- st_transform(cities, st_crs(sat_vis)) 159 | # ``` 160 | # 161 | # ```{r} 162 | # ggplot() + 163 | # geom_stars(data = sat_vis, show.legend = FALSE) + 164 | # geom_sf(data = oz_states, fill = NA, color = "white") + 165 | # geom_sf(data = cities, color = "red") + 166 | # coord_sf() + 167 | # theme_void() + 168 | # scale_fill_gradient(low = "black", high = "white") 169 | # ``` 170 | # 171 | # ```{r} 172 | # geom_sf_text(data = cities, mapping = aes(label = city)) 173 | # ``` 174 | # 175 | # 176 | # 177 | # 178 | # 179 | # 180 | # 181 | 182 | library(tiff) 183 | 184 | # Read in the large tif file 185 | img <- read_tif("ggplot2_3rd_ed/raster/IDE00420.202302122200.tif") 186 | 187 | # Convert the image to a matrix 188 | img_matrix <- as.matrix(img[1]) 189 | 190 | # Convert the matrix to a data frame 191 | img_df <- as.data.frame(img_matrix) 192 | 193 | # Visualize the image using ggplot2 194 | library(ggplot2) 195 | 196 | ggplot(data = img_df, aes(x = 1:nrow(img_df), y = 1:ncol(img_df))) + 197 | geom_raster(aes(fill = value)) -------------------------------------------------------------------------------- /ggplot2_3rd_ed/README.md: -------------------------------------------------------------------------------- 1 | ![](img/youcanbeapirate-wb-sparkline.jpg) 2 | 3 | # Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.) 4 | 5 | I've tried to comment on the changes I've made to the code, but there are some frequent changes that I'll comment on here so I don't have to repeat myself constantly. 6 | 7 | * In general, I've tried to use a __tibble__ (tbl) instead of a __data frame__ (df). So instead of _as.dataframe()_, you will usually find _tibble()_, but not always. There are some situations where only a data frame will work. 8 | * There have been some changes to ggplot2 since 2016: 9 | * __fun.y__ parameter has become __fun__ 10 | * __size__ has become __linewidth__ (for all geoms that use lines) -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_02_first_steps.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 2: First steps" 3 | author: "Original Code: Hadley Wickham (except Matt Dancho for ggdist and tidyquant) | Modifications: Antti Rask" 4 | date: "2023-01-05" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 2 First steps 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(conflicted) 17 | conflict_prefer("select", "dplyr") 18 | ``` 19 | 20 | ## 2.2 Fuel economy data 21 | 22 | ```{r} 23 | mpg 24 | ``` 25 | 26 | ### 2.2.1 Exercises 27 | 28 | 1. 29 | ```{r} 30 | help(mpg) 31 | 32 | dim(mpg) 33 | 34 | summary(mpg) 35 | 36 | str(mpg) 37 | 38 | glimpse(mpg) 39 | ``` 40 | 41 | 2. 42 | ```{r} 43 | data(package = "ggplot2") 44 | ``` 45 | 46 | 3. 47 | ```{r} 48 | mpg %>% 49 | # Miles per gallon 50 | select(cty, hwy) %>% 51 | mutate( 52 | cty_eur = 235.21 / cty, 53 | hwy_eur = 235.21 / hwy 54 | ) 55 | ``` 56 | 57 | 4. 58 | ```{r} 59 | # Without touching models 60 | mpg %>% 61 | distinct(manufacturer, model) %>% 62 | summarize( 63 | n = n(), 64 | .by = manufacturer 65 | ) %>% 66 | arrange(desc(n)) 67 | ``` 68 | 69 | ```{r} 70 | # With the core models only 71 | mpg %>% 72 | mutate( 73 | core_model = str_extract(model, "^\\S+") 74 | ) %>% 75 | distinct(manufacturer, core_model) %>% 76 | summarize( 77 | n = n(), 78 | .by = manufacturer 79 | ) %>% 80 | arrange(desc(n)) 81 | ``` 82 | 83 | ## 2.3 Key components 84 | 85 | ```{r} 86 | mpg %>% 87 | ggplot(aes(displ, hwy)) + 88 | geom_point() 89 | ``` 90 | 91 | ### 2.3.1 Exercises 92 | 93 | 1. 94 | ```{r} 95 | mpg %>% 96 | ggplot(aes(cty, hwy)) + 97 | geom_point() 98 | ``` 99 | 100 | 2. 101 | ```{r} 102 | # Models and manufacturers 103 | mpg %>% 104 | ggplot(aes(model, manufacturer)) + 105 | geom_point() 106 | 107 | # One example solution with the amount of unique models by manufacturer 108 | mpg %>% 109 | mutate( 110 | manufacturer = manufacturer %>% as.factor(), 111 | core_model = str_extract(model, "^\\S+") 112 | ) %>% 113 | distinct(manufacturer, core_model) %>% 114 | summarize( 115 | n = n(), 116 | .by = manufacturer 117 | ) %>% 118 | 119 | ggplot(aes(fct_reorder(manufacturer, n), n)) + 120 | geom_col() + 121 | coord_flip() + 122 | labs( 123 | title = "Amount of unique models by manufacturer", 124 | subtitle = "", 125 | x = "", 126 | y = "" 127 | ) 128 | ``` 129 | 130 | 3. 131 | ```{r} 132 | # 1. 133 | mpg %>% 134 | ggplot(aes(cty, hwy)) + 135 | geom_point() 136 | 137 | # 2. 138 | diamonds %>% 139 | ggplot(aes(carat, price)) + 140 | geom_point() 141 | 142 | # 3. 143 | economics %>% 144 | ggplot(aes(date, unemploy)) + 145 | geom_line() 146 | 147 | # 4. 148 | mpg %>% 149 | ggplot(aes(cty)) + 150 | geom_histogram() 151 | ``` 152 | 153 | ## 2.4 Color, size, shape and other aesthetic attributes 154 | 155 | ```{r} 156 | mpg %>% 157 | ggplot(aes(displ, cty, color = class)) + 158 | geom_point() 159 | ``` 160 | 161 | ```{r} 162 | p <- mpg %>% 163 | ggplot(aes(displ, hwy)) 164 | 165 | p + 166 | geom_point(aes(color = "blue")) 167 | 168 | p + 169 | geom_point(color = "blue") 170 | ``` 171 | 172 | ### 2.4.1 Exercises 173 | 174 | 1. 175 | ```{r} 176 | # Map color to continuous value 177 | mpg %>% 178 | ggplot(aes(cty, hwy, color = displ)) + 179 | geom_point() 180 | 181 | # Map color to categorical value 182 | mpg %>% 183 | ggplot(aes(cty, hwy, color = trans)) + 184 | geom_point() 185 | 186 | # Use more than one aesthetic in a plot 187 | mpg %>% 188 | ggplot(aes(cty, hwy, color = trans, size = trans)) + 189 | geom_point() 190 | ``` 191 | 192 | 2. 193 | ```{r} 194 | # Commented out, because causes error and can't see the other plot with the error: 195 | # mpg %>% 196 | # ggplot(aes(cty, hwy, shape = displ)) + 197 | # geom_point() 198 | 199 | mpg %>% 200 | ggplot(aes(cty, hwy, shape = trans)) + 201 | geom_point() 202 | ``` 203 | 204 | 3. 205 | ```{r} 206 | mpg %>% 207 | ggplot(aes(drv, cty)) + 208 | geom_boxplot() 209 | 210 | mpg %>% 211 | ggplot(aes(class, displ)) + 212 | geom_point(aes(color = drv)) 213 | ``` 214 | 215 | ## 2.5 Faceting 216 | 217 | ```{r} 218 | mpg %>% 219 | ggplot(aes(displ, hwy)) + 220 | geom_point() + 221 | facet_wrap(vars(class)) 222 | ``` 223 | 224 | ### 2.5.1. Exercises 225 | 226 | 1. 227 | ```{r} 228 | p <- mpg %>% 229 | ggplot(aes(drv, displ, fill = class)) + 230 | geom_col(position = "dodge") 231 | 232 | p + 233 | facet_wrap(vars(hwy)) 234 | 235 | p + 236 | facet_wrap(vars(cyl)) 237 | ``` 238 | 239 | 2. 240 | ```{r} 241 | mpg %>% 242 | ggplot(aes(displ, cty)) + 243 | geom_point() + 244 | facet_wrap(vars(cyl)) 245 | ``` 246 | 247 | 3. 248 | ```{r} 249 | p <- mpg %>% 250 | ggplot(aes(displ, cty)) + 251 | geom_point() 252 | 253 | p + 254 | facet_wrap( 255 | vars(cyl), 256 | nrow = 4, 257 | ncol = 1 258 | ) 259 | 260 | p + 261 | facet_wrap( 262 | vars(cyl), 263 | nrow = 1, 264 | ncol = 4 265 | ) 266 | ``` 267 | 268 | ## 2.6 Plot geoms 269 | 270 | ### 2.6.1 Adding a smoother to a plot 271 | 272 | ```{r} 273 | mpg %>% 274 | ggplot(aes(displ, hwy)) + 275 | geom_point() + 276 | geom_smooth() 277 | ``` 278 | 279 | ```{r} 280 | # How span affects the resulting smoother 281 | p <- mpg %>% 282 | ggplot(aes(displ, hwy)) + 283 | geom_point() 284 | 285 | p + 286 | geom_smooth(span = 0.2) 287 | 288 | p + 289 | geom_smooth(span = 1) 290 | ``` 291 | 292 | #### mgcv 293 | ```{r} 294 | library(mgcv) 295 | 296 | mpg %>% 297 | ggplot(aes(displ, hwy)) + 298 | geom_point() + 299 | geom_smooth(method = "gam", formula = y ~ s(x)) 300 | ``` 301 | 302 | ```{r} 303 | # Line of best fit 304 | mpg %>% 305 | ggplot(aes(displ, hwy)) + 306 | geom_point() + 307 | geom_smooth(method = "lm") 308 | ``` 309 | 310 | ### 2.6.2 Boxplots and jittered points 311 | 312 | ```{r} 313 | p <- mpg %>% 314 | ggplot(aes(drv, hwy)) 315 | 316 | p + 317 | geom_point() 318 | 319 | p + 320 | geom_jitter() 321 | 322 | p + 323 | geom_boxplot() 324 | 325 | p + 326 | geom_violin() 327 | ``` 328 | 329 | ### 2.6.3 Histograms and frequency polygons 330 | 331 | ```{r} 332 | p <- mpg %>% 333 | ggplot(aes(hwy)) 334 | 335 | p + 336 | geom_histogram() 337 | 338 | p + 339 | geom_freqpoly() 340 | 341 | p + 342 | geom_freqpoly(binwidth = 2.5) 343 | 344 | p + 345 | geom_freqpoly(binwidth = 1) 346 | ``` 347 | 348 | ```{r} 349 | mpg %>% 350 | ggplot(aes(displ, color = drv)) + 351 | geom_freqpoly(binwidth = 0.5) 352 | 353 | mpg %>% 354 | ggplot(aes(displ, fill = drv)) + 355 | geom_histogram(binwidth = 0.5) + 356 | facet_wrap(vars(drv), ncol = 1) 357 | ``` 358 | 359 | ### 2.6.4 Bar charts 360 | 361 | ```{r} 362 | mpg %>% 363 | ggplot(aes(manufacturer)) + 364 | geom_bar() 365 | ``` 366 | 367 | ```{r} 368 | drugs <- tibble( 369 | drug = c("a", "b", "c"), 370 | effect = c(4.2, 9.7, 6.1) 371 | ) 372 | 373 | p <- drugs %>% 374 | ggplot(aes(drug, effect)) 375 | 376 | p + 377 | geom_bar(stat = "identity") 378 | 379 | p + 380 | geom_point() 381 | ``` 382 | 383 | ### 2.6.5 Time series with line and path plots 384 | 385 | ```{r} 386 | economics %>% 387 | ggplot(aes(date, unemploy / pop)) + 388 | geom_line() 389 | 390 | economics %>% 391 | ggplot(aes(date, uempmed)) + 392 | geom_line() 393 | ``` 394 | 395 | #### lubridate 396 | ```{r} 397 | library(lubridate) 398 | 399 | # I decided to replace all of this... 400 | # year <- function(x) as.POSIXlt(x)$year + 1900 401 | # ... with the year() function from lubridate 402 | 403 | p <- economics %>% 404 | ggplot(aes(unemploy / pop, uempmed)) 405 | 406 | p + 407 | geom_path() + 408 | geom_point() 409 | 410 | p + 411 | geom_path(color = "grey50") + 412 | geom_point(aes(color = year(date))) 413 | ``` 414 | 415 | ### 2.6.6 Exercises 416 | 417 | 1. 418 | ```{r} 419 | mpg %>% 420 | ggplot(aes(cty, hwy)) + 421 | geom_jitter(alpha = 0.5) 422 | ``` 423 | 424 | 2. 425 | ```{r} 426 | mpg %>% 427 | ggplot(aes(class, hwy)) + 428 | geom_boxplot() 429 | 430 | mpg %>% 431 | ggplot(aes(fct_reorder(class, hwy), hwy)) + 432 | geom_boxplot() 433 | ``` 434 | 435 | 3. 436 | ```{r} 437 | diamonds %>% 438 | ggplot(aes(carat)) + 439 | geom_histogram(binwidth = 0.3) 440 | ``` 441 | 442 | 4. 443 | ```{r} 444 | diamonds %>% 445 | ggplot(aes(fct_reorder(cut, price), price)) + 446 | geom_boxplot() 447 | 448 | diamonds %>% 449 | ggplot(aes(price, after_stat(density), color = cut)) + 450 | geom_freqpoly(binwidth = 200) 451 | ``` 452 | 453 | 6. 454 | ```{r} 455 | ?geom_bar() 456 | 457 | p <- mpg %>% 458 | ggplot(aes(class)) 459 | 460 | p + 461 | geom_bar() 462 | 463 | p + 464 | geom_bar(aes(weight = displ)) 465 | ``` 466 | 467 | ## 2.7 Modifying the axes 468 | 469 | ```{r} 470 | p <- mpg %>% 471 | ggplot(aes(cty, hwy)) + 472 | geom_point(alpha = 1 / 3) 473 | 474 | p 475 | 476 | p + 477 | labs( 478 | x = "city driving (mpg)", 479 | y = "highway driving (mpg)" 480 | ) 481 | 482 | # Remove the axis labels with NULL 483 | p + 484 | labs( 485 | x = NULL, 486 | y = NULL 487 | ) 488 | ``` 489 | 490 | ```{r} 491 | p <- mpg %>% 492 | ggplot(aes(drv, hwy)) + 493 | geom_jitter(width = 0.25) 494 | 495 | p 496 | 497 | p + 498 | xlim("f", "r") + 499 | ylim(20, 30) 500 | 501 | # For continuous scales, use NA to set only one limit 502 | mpg %>% 503 | ggplot(aes(drv, hwy)) + 504 | geom_jitter(width = 0.25, na.rm = TRUE) + 505 | ylim(NA, 30) 506 | ``` 507 | 508 | ## 2.8 Output 509 | 510 | ```{r} 511 | p <- mpg %>% 512 | ggplot(aes(displ, hwy, color = factor(cyl))) + 513 | geom_point() 514 | ``` 515 | 516 | ```{r} 517 | p 518 | 519 | print(p) 520 | ``` 521 | 522 | ```{r} 523 | # Save png to disk 524 | ggsave("img/plot.png", p, width = 5, height = 5) 525 | ``` 526 | 527 | ```{r} 528 | summary(p) 529 | ``` 530 | 531 | ```{r} 532 | saveRDS(p, "output/plot.rds") 533 | q <- readRDS("output/plot.rds") 534 | q 535 | ``` 536 | 537 | ## 2.X Raincloud plot with ggdist and tidyquant 538 | 539 | Original from: https://www.business-science.io/r/2021/07/22/ggdist-raincloud-plots.html 540 | 541 | ```{r} 542 | library(ggdist) 543 | library(tidyquant) 544 | 545 | mpg %>% 546 | ggplot(aes(factor(drv), hwy, fill = factor(drv))) + 547 | 548 | # Add half-violin from {ggdist} package 549 | stat_halfeye( 550 | # Custom bandwidth 551 | adjust = 0.5, 552 | # Move geom to the right 553 | justification = -.2, 554 | # Remove slab interval 555 | .width = 0, 556 | point_color = NA 557 | ) + 558 | 559 | # Add boxplot 560 | geom_boxplot( 561 | width = .12, 562 | # Remove outliers 563 | outlier.color = NA, 564 | alpha = 0.5 565 | ) + 566 | 567 | # Add dot plots from {ggdist} package 568 | stat_dots( 569 | # Orientation to the left 570 | side = "left", 571 | # Move geom to the left 572 | justification = 1.1, 573 | # Adjust grouping (binning) of observations 574 | binwidth = .25 575 | ) + 576 | 577 | # Adjust theme 578 | scale_fill_tq() + 579 | theme_tq() + 580 | labs( 581 | title = "Raincloud plot", 582 | subtitle = "Is it better to get all the information in one plot?", 583 | x = "Drivetrain Type", 584 | y = "Highway Fuel Economy (MPG)", 585 | fill = "Drivetrain" 586 | ) + 587 | coord_flip() 588 | ``` 589 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_03_individual_geoms.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 3 - Individual geoms" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-12" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 3 Individual geoms 13 | 14 | ```{r} 15 | library(tidyverse) 16 | ``` 17 | 18 | ## 3.1 Basic plot types 19 | 20 | ```{r} 21 | tbl <- tibble( 22 | x = c(3, 1, 5), 23 | y = c(2, 4, 6), 24 | label = c("a","b","c") 25 | ) 26 | 27 | p <- tbl %>% 28 | ggplot(aes(x, y, label = label)) + 29 | # Hide axis label 30 | labs( 31 | x = NULL, 32 | y = NULL 33 | ) + 34 | # Shrink plot title 35 | theme(plot.title = element_text(size = 12)) 36 | 37 | p + 38 | geom_point() + 39 | ggtitle("point") 40 | 41 | p + 42 | geom_text() + 43 | ggtitle("text") 44 | 45 | p + 46 | geom_bar(stat = "identity") + 47 | ggtitle("bar") 48 | 49 | p + 50 | geom_tile() + 51 | ggtitle("raster") 52 | ``` 53 | 54 | ```{r} 55 | p + 56 | geom_line() + 57 | ggtitle("line") 58 | 59 | p + 60 | geom_area() + 61 | ggtitle("area") 62 | 63 | p + 64 | geom_path() + 65 | ggtitle("path") 66 | 67 | p + 68 | geom_polygon() + 69 | ggtitle("polygon") 70 | ``` 71 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_04_collective_geoms.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 4 - Collective geoms" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-12" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 4 Collective geoms 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(conflicted) 17 | conflict_prefer("filter", "dplyr") 18 | ``` 19 | 20 | ```{r} 21 | library(nlme) 22 | 23 | oxboys <- Oxboys 24 | 25 | head(oxboys) 26 | ``` 27 | 28 | ### 4.1 Multiple groups, one aesthetic 29 | 30 | ```{r} 31 | oxboys %>% 32 | ggplot(aes(age, height, group = Subject)) + 33 | geom_point() + 34 | geom_line() 35 | ``` 36 | 37 | ```{r} 38 | # Don't do this! 39 | oxboys %>% 40 | ggplot(aes(age, height)) + 41 | geom_point() + 42 | geom_line() 43 | ``` 44 | 45 | ### 4.2 Different groups on different layers 46 | 47 | ```{r} 48 | # Don't do this! 49 | oxboys %>% 50 | ggplot(aes(age, height, group = Subject)) + 51 | geom_line() + 52 | geom_smooth(method = "lm", se = FALSE) 53 | ``` 54 | 55 | ```{r} 56 | oxboys %>% 57 | ggplot(aes(age, height)) + 58 | geom_line(aes(group = Subject)) + 59 | geom_smooth( 60 | method = "lm", 61 | linewidth = 2, 62 | se = FALSE 63 | ) 64 | ``` 65 | 66 | ### 4.3 Overriding the default grouping 67 | 68 | ```{r} 69 | p <- oxboys %>% 70 | ggplot(aes(Occasion, height)) + 71 | geom_boxplot() 72 | 73 | p 74 | 75 | # Don't do this! 76 | p + 77 | geom_line(color = "#3366FF", alpha = 0.5) 78 | 79 | # This works 80 | p + 81 | geom_line( 82 | aes(group = Subject), 83 | color = "#3366FF", 84 | alpha = 0.5 85 | ) 86 | ``` 87 | 88 | ### 4.4 Matching aesthetics to graphic objects 89 | 90 | ```{r} 91 | tbl <- tibble(x = 1:3, y = 1:3, color = c(1,3,5)) 92 | 93 | # 1 94 | tbl %>% 95 | ggplot(aes(x, y, color = factor(color))) + 96 | geom_line(aes(group = 1), linewidth = 2) + 97 | geom_point(size = 5) 98 | 99 | # 2 100 | tbl %>% 101 | ggplot(aes(x, y, color = color)) + 102 | geom_line(aes(group = 1), linewidth = 2) + 103 | geom_point(size = 5) 104 | 105 | # 3 106 | xgrid <- with( 107 | tbl, 108 | seq(min(x), max(x), length = 50) 109 | ) 110 | 111 | interpolation <- tibble( 112 | x = xgrid, 113 | y = approx( 114 | x = tbl$x, 115 | y = tbl$y, 116 | xout = xgrid)$y, 117 | color = approx( 118 | tbl$x, 119 | tbl$color, 120 | xout = xgrid)$y 121 | ) 122 | 123 | interpolation %>% 124 | ggplot(aes(x, y, color = color)) + 125 | geom_line(linewidth = 2) + 126 | geom_point(data = tbl, size = 5) 127 | ``` 128 | 129 | ```{r} 130 | mpg %>% 131 | ggplot(aes(class)) + 132 | geom_bar() 133 | 134 | mpg %>% 135 | ggplot(aes(class, fill = drv)) + 136 | geom_bar() 137 | ``` 138 | 139 | ```{r} 140 | # Don't do this! 141 | mpg %>% 142 | ggplot(aes(class, fill = hwy)) + 143 | geom_bar() 144 | 145 | mpg %>% 146 | ggplot(aes(class, fill = hwy, group = hwy)) + 147 | geom_bar() 148 | ``` 149 | 150 | ### 4.5 Exercises 151 | 152 | 1. 153 | ```{r} 154 | mpg %>% 155 | ggplot(aes(cyl, hwy, group = cyl)) + 156 | geom_boxplot() 157 | ``` 158 | 159 | 2. 160 | ```{r} 161 | mpg %>% 162 | ggplot(aes(displ, cty, group = as.integer(displ))) + 163 | geom_boxplot() 164 | ``` 165 | 166 | 3. 167 | ```{r} 168 | tbl <- tibble(x = 1:3, y = 1:3, color = c(1,3,5)) 169 | 170 | # Group = 1 171 | ggplot(tbl, aes(x, y, color = factor(color))) + 172 | geom_line(aes(group = 1), linewidth = 2) + 173 | geom_point(size = 5) + 174 | labs(title = "Group = 1") 175 | 176 | # Group = 1 omitted 177 | ggplot(tbl, aes(x, y, color = factor(color))) + 178 | geom_line(linewidth = 2) + 179 | geom_point(size = 5) + 180 | labs(title = "Group = 1 omitted") 181 | 182 | # Group = 2 183 | ggplot(tbl, aes(x, y, color = factor(color))) + 184 | geom_line(aes(group = 2), linewidth = 2) + 185 | geom_point(size = 5) + 186 | labs(title = "Group = 2") 187 | ``` 188 | 189 | 4. 190 | ```{r} 191 | mpg %>% 192 | ggplot(aes(drv)) + 193 | geom_bar(color = "white") 194 | 195 | mpg %>% 196 | ggplot(aes(drv, fill = hwy, group = hwy)) + 197 | geom_bar(color = "white") 198 | 199 | mpg2 <- mpg %>% 200 | arrange(hwy) %>% 201 | mutate(id = seq_along(hwy)) 202 | 203 | mpg2 %>% 204 | ggplot(aes(drv, fill = hwy, group = id)) + 205 | geom_bar(color = "white") 206 | ``` 207 | 208 | 5. 209 | ```{r} 210 | library(babynames) 211 | 212 | hadley <- babynames %>% 213 | filter(name == "Hadley") 214 | 215 | hadley %>% 216 | ggplot(aes(year, n, group = sex, color = sex)) + 217 | geom_line() 218 | ``` -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_05_statistical_summaries.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 5 - Statistical summaries" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-12" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 5 Statistical summaries 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(conflicted) 17 | conflict_prefer("filter", "dplyr") 18 | ``` 19 | 20 | ## 5.1 Revealing uncertainty 21 | 22 | ```{r} 23 | y <- c(18, 11, 16) 24 | tbl <- tibble( 25 | x = 1:3, 26 | y = y, 27 | se = c(1.2, 0.5, 1.0) 28 | ) 29 | 30 | base <- tbl %>% 31 | ggplot(aes(x, y, ymin = y - se, ymax = y + se)) 32 | 33 | # Added the titles, so it's easier to remember which one you're looking at 34 | base + 35 | geom_crossbar() + 36 | labs(title = "geom_crossbar") 37 | 38 | base + 39 | geom_pointrange() + 40 | labs(title = "geom_pointrange") 41 | 42 | base + 43 | geom_smooth(stat = "identity") + 44 | labs(title = "geom_smooth") 45 | 46 | base + 47 | geom_errorbar() + 48 | labs(title = "geom_errorbar") 49 | 50 | base + 51 | geom_linerange() + 52 | labs(title = "geom_linerange") 53 | 54 | base + 55 | geom_ribbon() + 56 | labs(title = "geom_ribbon") 57 | ``` 58 | 59 | ## 5.2 Weighted data 60 | 61 | ```{r} 62 | # Unweighted 63 | p <- midwest %>% 64 | ggplot(aes(percwhite, percbelowpoverty)) 65 | 66 | p + 67 | geom_point() 68 | 69 | # Weight by population 70 | p + 71 | geom_point(aes(size = poptotal / 1e6)) + 72 | scale_size_area( 73 | "Population\n(millions)", 74 | breaks = c(0.5, 1, 2, 4) 75 | ) 76 | ``` 77 | 78 | ```{r} 79 | # Unweighted 80 | p + 81 | geom_point() + 82 | geom_smooth(method = lm, linewidth = 1) 83 | 84 | # Weighted by population 85 | p + 86 | geom_point(aes(size = poptotal / 1e6)) + 87 | geom_smooth( 88 | aes(weight = poptotal), 89 | method = lm, 90 | linewidth = 1 91 | ) + 92 | scale_size_area(guide = "none") 93 | ``` 94 | 95 | ```{r} 96 | p <- midwest %>% 97 | ggplot(aes(percbelowpoverty)) 98 | 99 | p + 100 | geom_histogram(binwidth = 1) + 101 | labs(y = "Counties") 102 | 103 | p + 104 | geom_histogram( 105 | aes(weight = poptotal), 106 | binwidth = 1 107 | ) + 108 | labs(y = "Population (1000s)") 109 | ``` 110 | 111 | ## 5.3 Diamonds data 112 | 113 | ```{r} 114 | diamonds 115 | ``` 116 | 117 | ## 5.4 Displaying distributions 118 | 119 | ```{r} 120 | p <- diamonds %>% 121 | ggplot(aes(depth)) 122 | 123 | p + 124 | geom_histogram() 125 | 126 | p + 127 | geom_histogram(binwidth = 0.1) + 128 | xlim(55, 70) 129 | ``` 130 | 131 | ```{r} 132 | p <- diamonds %>% 133 | ggplot(aes(depth)) 134 | 135 | p + 136 | geom_freqpoly( 137 | aes(color = cut), 138 | binwidth = 0.1, 139 | na.rm = TRUE 140 | ) + 141 | xlim(58, 68) + 142 | theme(legend.position = "none") 143 | 144 | p + 145 | geom_histogram( 146 | aes(fill = cut), 147 | binwidth = 0.1, 148 | position = "fill", 149 | na.rm = TRUE 150 | ) + 151 | xlim(58, 68) + 152 | theme(legend.position = "none") 153 | ``` 154 | 155 | ```{r} 156 | diamonds %>% 157 | ggplot(aes(depth)) + 158 | geom_density(na.rm = TRUE) + 159 | xlim(58, 68) + 160 | theme(legend.position = "none") 161 | 162 | diamonds %>% 163 | ggplot(aes(depth, fill = cut, color = cut)) + 164 | geom_density(alpha = 0.2, na.rm = TRUE) + 165 | xlim(58, 68) + 166 | theme(legend.position = "none") 167 | ``` 168 | 169 | ```{r} 170 | diamonds %>% 171 | ggplot(aes(clarity, depth)) + 172 | geom_boxplot() 173 | 174 | diamonds %>% 175 | ggplot(aes(carat, depth)) + 176 | geom_boxplot(aes(group = cut_width(carat, 0.1))) + 177 | xlim(NA, 2.05) 178 | ``` 179 | 180 | ```{r} 181 | diamonds %>% 182 | ggplot(aes(clarity, depth)) + 183 | geom_violin() 184 | 185 | diamonds %>% 186 | ggplot(aes(carat, depth)) + 187 | geom_violin(aes(group = cut_width(carat, 0.1))) + 188 | xlim(NA, 2.05) 189 | ``` 190 | 191 | ### 3.11.1 Exercises (5.4.1) 192 | 193 | 1. 194 | ```{r} 195 | diamonds %>% 196 | ggplot(aes(carat)) + 197 | geom_histogram(binwidth = 0.2) 198 | ``` 199 | 200 | 2. 201 | ```{r} 202 | p <- diamonds %>% 203 | ggplot(aes(price)) 204 | 205 | p + 206 | geom_histogram(binwidth = 100) 207 | 208 | p + 209 | geom_histogram(binwidth = 500) 210 | ``` 211 | 212 | 3. 213 | ```{r} 214 | diamonds %>% 215 | ggplot(aes(clarity, price)) + 216 | geom_boxplot() 217 | ``` 218 | 219 | 4. 220 | ```{r} 221 | diamonds %>% 222 | count(depth) %>% 223 | mutate( 224 | sum = sum(n), 225 | density = n / sum 226 | ) %>% 227 | ggplot(aes(depth, density)) + 228 | geom_line() 229 | ``` 230 | 231 | ## 5.5 Dealing with overplotting 232 | 233 | ```{r} 234 | tbl <- tibble(x = rnorm(2000), y = rnorm(2000)) 235 | 236 | norm <- tbl %>% 237 | ggplot(aes(x, y)) + 238 | labs( 239 | x = NULL, 240 | y = NULL 241 | ) 242 | 243 | # Added the titles, so it's easier to remember which one you're looking at 244 | 245 | norm + 246 | geom_point() 247 | 248 | # Hollow circles 249 | norm + 250 | geom_point(shape = 1) + 251 | labs(title = "shape = 1") 252 | 253 | # Pixel sized 254 | norm + 255 | geom_point(shape = ".") + 256 | labs(title = 'shape = "."') 257 | 258 | norm + 259 | geom_point(alpha = 1 / 3) + 260 | labs(title = "alpha = 1 / 3") 261 | 262 | norm + 263 | geom_point(alpha = 1 / 5) + 264 | labs(title = "alpha = 1 / 5") 265 | 266 | norm + 267 | geom_point(alpha = 1 / 10) + 268 | labs(title = "alpha = 1 / 10") 269 | 270 | ``` 271 | 272 | ```{r} 273 | norm + 274 | geom_bin2d() 275 | 276 | norm + 277 | geom_bin2d(bins = 10) 278 | ``` 279 | 280 | ```{r} 281 | norm + 282 | geom_hex() 283 | 284 | norm + 285 | geom_hex(bins = 10) 286 | ``` 287 | 288 | ## 5.6 Statistical summaries 289 | 290 | ```{r} 291 | diamonds %>% 292 | ggplot(aes(color)) + 293 | geom_bar() 294 | 295 | diamonds %>% 296 | ggplot(aes(color, price)) + 297 | geom_bar(stat = "summary_bin", fun = mean) 298 | ``` 299 | 300 | ```{r} 301 | diamonds %>% 302 | ggplot(aes(table, depth)) + 303 | geom_bin2d(binwidth = 1, na.rm = TRUE) + 304 | xlim(50, 70) + 305 | ylim(50, 70) 306 | 307 | diamonds %>% 308 | ggplot(aes(table, depth, z = price)) + 309 | geom_raster( 310 | binwidth = 1, 311 | stat = "summary_2d", 312 | fun = mean, 313 | na.rm = TRUE 314 | ) + 315 | xlim(50, 70) + 316 | ylim(50, 70) 317 | 318 | diamonds %>% 319 | ggplot(aes(table, depth, z = price)) + 320 | geom_tile( 321 | binwidth = 1, 322 | stat = "summary_2d", 323 | fun = mean, 324 | na.rm = TRUE 325 | ) + 326 | xlim(50, 70) + 327 | ylim(50, 70) 328 | ``` 329 | 330 | ## 5.7 Surfaces 331 | 332 | ```{r} 333 | faithfuld %>% 334 | ggplot(aes(eruptions, waiting)) + 335 | # ..level.. -> after_stat(level) 336 | geom_contour(aes(z = density, color = after_stat(level))) 337 | ``` 338 | 339 | ```{r} 340 | faithfuld %>% 341 | ggplot(aes(eruptions, waiting)) + 342 | geom_raster(aes(fill = density)) 343 | ``` 344 | 345 | ```{r} 346 | # Bubble plots work better with fewer observations 347 | small <- faithfuld %>% 348 | slice(seq(1, nrow(.), 10)) 349 | small 350 | 351 | # Or alternatively 352 | small2 <- faithfuld %>% 353 | slice(which(row_number() %% 10 == 1)) 354 | small2 355 | 356 | # Or alternatively 357 | small3 <- faithfuld %>% 358 | filter(row_number() %% 10 == 1) 359 | small3 360 | 361 | small %>% 362 | ggplot(aes(eruptions, waiting)) + 363 | geom_point(aes(size = density), alpha = 1/3) + 364 | scale_size_area() 365 | ``` 366 | 367 | ## 5.X 368 | 369 | ### rgl - 3D visualization device system for R using OpenGL 370 | 371 | https://dmurdoch.github.io/rgl/ 372 | 373 | ```{r} 374 | library(rgl) 375 | 376 | with( 377 | iris, 378 | plot3d( 379 | Sepal.Length, 380 | Sepal.Width, 381 | Petal.Length, 382 | type = "s", 383 | col = as.numeric(Species) 384 | ) 385 | ) 386 | ``` 387 | 388 | ```{r} 389 | close3d() 390 | ``` 391 | 392 | ```{r} 393 | library(rgl) 394 | library(MASS) 395 | 396 | set.seed(123) 397 | x <- rgamma(100, shape = 5, rate = 0.1) 398 | fit <- fitdistr(x, dgamma, list(shape = 1, rate = 0.1), lower = 0.001) 399 | 400 | loglik <- function(shape, rate) { 401 | sum(dgamma(x, shape = shape, rate = rate, log = TRUE)) 402 | } 403 | 404 | loglik <- Vectorize(loglik) 405 | xlim <- fit$estimate[1] + 4 * fit$sd[1] * c(-1,1) 406 | ylim <- fit$estimate[2] + 4 * fit$sd[2] * c(-1,1) 407 | mfrow3d(1, 2, sharedMouse = TRUE) 408 | persp3d( 409 | loglik, 410 | xlim = xlim, 411 | ylim = ylim, 412 | n = 30 413 | ) 414 | zlim <- fit$loglik + c(-qchisq(0.99, 2)/2, 0) 415 | next3d() 416 | persp3d( 417 | loglik, 418 | xlim = xlim, 419 | ylim = ylim, 420 | zlim = zlim, 421 | n = 30 422 | ) 423 | ``` 424 | 425 | ```{r} 426 | close3d() 427 | ``` 428 | 429 | ```{r} 430 | library(rgl) 431 | triangles3d( 432 | cbind( 433 | x = rnorm(9), 434 | y = rnorm(9), 435 | z = rnorm(9) 436 | ), 437 | col = "green" 438 | ) 439 | decorate3d() 440 | bg3d("lightgray") 441 | aspect3d(1, 1, 1) 442 | ``` 443 | 444 | ```{r} 445 | close3d() 446 | ``` -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_06_maps.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 6 - Maps" 3 | author: "Original Code: Hadley Wickham (except Taro Mieno for Raster Images, Duncan Murdoch for rgl and Dr. Dominic Royé for OpenStreetMaps) | Modifications: Antti Rask" 4 | date: "2023-01-12" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 6 Maps 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(conflicted) 17 | conflict_prefer("filter", "dplyr") 18 | conflict_prefer("select", "dplyr") 19 | ``` 20 | 21 | ## 6.1 Polygon maps 22 | 23 | ```{r} 24 | mi_counties <- map_data("county", "michigan") %>% 25 | as_tibble() %>% 26 | select(lon = long, lat, group, id = subregion) 27 | 28 | head(mi_counties) 29 | ``` 30 | 31 | ```{r} 32 | p <- mi_counties %>% 33 | ggplot(aes(lon, lat, group = group)) 34 | 35 | p + 36 | geom_point(size = .25, show.legend = FALSE) + 37 | coord_sf() 38 | 39 | p + 40 | geom_polygon() + 41 | coord_sf() 42 | 43 | p + 44 | geom_polygon( 45 | fill = NA, 46 | color = "grey50" 47 | ) + 48 | coord_sf() 49 | ``` 50 | 51 | ## 6.2 Simple features map 52 | 53 | ```{r} 54 | library(ozmaps) 55 | library(sf) 56 | 57 | oz_states <- ozmap_states 58 | oz_states 59 | 60 | oz_states %>% 61 | ggplot() + 62 | geom_sf() + 63 | coord_sf() 64 | ``` 65 | 66 | ### 6.2.1 Layered maps 67 | 68 | ```{r} 69 | library(rmapshaper) 70 | 71 | oz_states <- ozmap_states %>% 72 | filter(NAME != "Other Territories") 73 | 74 | oz_votes <- ms_simplify(abs_ced) 75 | 76 | ggplot() + 77 | geom_sf( 78 | data = oz_states, 79 | mapping = aes(fill = NAME), 80 | show.legend = FALSE 81 | ) + 82 | geom_sf(data = oz_votes, fill = NA) + 83 | coord_sf() 84 | ``` 85 | 86 | ### 6.2.2 Labelled maps 87 | 88 | ```{r} 89 | # Filter electorates in the Sydney metropolitan region 90 | sydney_map <- abs_ced %>% 91 | filter( 92 | NAME %in% c( 93 | "Sydney", 94 | "Wentworth", 95 | "Warringah", 96 | "Kingsford Smith", 97 | "Grayndler", 98 | "Lowe", 99 | "North Sydney", 100 | "Barton", 101 | "Bradfield", 102 | "Banks", 103 | "Blaxland", 104 | "Reid", 105 | "Watson", 106 | "Fowler", 107 | "Werriwa", 108 | "Prospect", 109 | "Parramatta", 110 | "Bennelong", 111 | "Mackellar", 112 | "Greenway", 113 | "Mitchell", 114 | "Chifley", 115 | "McMahon" 116 | ) 117 | ) 118 | 119 | # Draw the electoral map of Sydney 120 | sydney_map %>% 121 | ggplot() + 122 | geom_sf(aes(fill = NAME), show.legend = FALSE) + 123 | coord_sf( 124 | xlim = c(150.97, 151.3), 125 | ylim = c(-33.98, -33.79) 126 | ) + 127 | geom_sf_label( 128 | aes(label = NAME), 129 | label.padding = unit(1, "mm") 130 | ) 131 | 132 | # From the book: 133 | # "The warning message is worth noting. Internally geom_sf_label() uses the function st_point_on_surface() from the sf package to place labels, and the warning message occurs because most algorithms used by sf to compute geometric quantities (e.g., centroids, interior points) are based on an assumption that the points lie in on a flat two dimensional surface and parameterised with Cartesian co-ordinates. This assumption is not strictly warranted, and in some cases (e.g., regions near the poles) calculations that treat longitude and latitude in this way will give erroneous answers. For this reason, the sf package produces warning messages when it relies on this approximation." 134 | ``` 135 | 136 | ### 6.2.3 Adding other geoms 137 | 138 | ```{r} 139 | oz_capitals <- tribble( 140 | ~city, ~lat, ~lon, 141 | "Sydney", -33.8688, 151.2093, 142 | "Melbourne", -37.8136, 144.9631, 143 | "Brisbane", -27.4698, 153.0251, 144 | "Adelaide", -34.9285, 138.6007, 145 | "Perth", -31.9505, 115.8605, 146 | "Hobart", -42.8821, 147.3272, 147 | "Canberra", -35.2809, 149.1300, 148 | "Darwin", -12.4634, 130.8456, 149 | ) 150 | 151 | ggplot() + 152 | geom_sf(data = oz_votes) + 153 | geom_sf( 154 | data = oz_states, 155 | color = "black", 156 | fill = NA 157 | ) + 158 | geom_point( 159 | data = oz_capitals, 160 | mapping = aes(x = lon, y = lat), color = "red") + 161 | coord_sf() 162 | ``` 163 | 164 | ## 6.3 Map projections 165 | 166 | ```{r} 167 | st_crs(oz_votes) 168 | 169 | st_crs(oz_votes) == st_crs(4283) 170 | ``` 171 | 172 | ```{r} 173 | p <- oz_votes %>% 174 | ggplot() + 175 | geom_sf() 176 | 177 | p 178 | 179 | p + 180 | coord_sf(crs = st_crs(3112)) 181 | ``` 182 | 183 | ## 6.4 Working with sf data 184 | 185 | ```{r} 186 | edenmonaro <- abs_ced %>% 187 | filter(NAME == "Eden-Monaro") 188 | 189 | p <- edenmonaro %>% 190 | ggplot() + 191 | geom_sf() 192 | 193 | p + 194 | coord_sf( 195 | xlim = c(147.75, 150.25), 196 | ylim = c(-37.5, -34.5) 197 | ) 198 | 199 | p + 200 | coord_sf( 201 | xlim = c(150, 150.25), 202 | ylim = c(-36.3, -36) 203 | ) 204 | ``` 205 | 206 | ```{r} 207 | edenmonaro_geometry <- edenmonaro %>% 208 | pull(geometry) 209 | 210 | st_bbox(edenmonaro_geometry) 211 | ``` 212 | 213 | ```{r} 214 | edenmonaro_geometry 215 | ``` 216 | 217 | ```{r} 218 | st_cast(edenmonaro_geometry, "POLYGON") 219 | ``` 220 | 221 | ```{r} 222 | dawson <- abs_ced %>% 223 | filter(NAME == "Dawson") %>% 224 | pull(geometry) 225 | 226 | dawson 227 | 228 | dawson %>% 229 | ggplot() + 230 | geom_sf() + 231 | coord_sf() 232 | ``` 233 | 234 | ```{r} 235 | dawson <- st_cast(dawson, "POLYGON") 236 | 237 | which.max(st_area(dawson)) 238 | 239 | dawson[-69] %>% 240 | ggplot() + 241 | geom_sf() + 242 | coord_sf() 243 | ``` 244 | 245 | ## 6.5 Raster maps 246 | 247 | ```{r} 248 | library(terra) 249 | library(weatherOz) 250 | 251 | # Option 1 252 | avail <- get_available_imagery(product_id = "IDE00425") 253 | 254 | i <- get_satellite_imagery(product_id = avail, scans = 1) 255 | 256 | plot(i) 257 | ``` 258 | 259 | ```{r} 260 | library(stars) 261 | 262 | # Option 2 263 | 264 | avail <- get_available_imagery(product_id = "IDE00425") 265 | 266 | # Loop through all available images 267 | # 268 | for (index in 1:1) { 269 | i <- get_satellite_imagery(product_id = avail[index], scans = 1) 270 | 271 | # Save the satellite imagery as a .tif file with a unique name 272 | output_file <- str_c("img/satellite_imagery_", index, ".tif") 273 | writeRaster(i, output_file, overwrite = TRUE) 274 | } 275 | 276 | sat_vis <- read_stars( 277 | "img/satellite_imagery_1.tif", 278 | RasterIO = list(nBufXSize = 600, nBufYSize = 600) 279 | ) 280 | 281 | sat_vis 282 | 283 | ggplot() + 284 | geom_stars(data = sat_vis) + 285 | coord_equal() + 286 | theme_void() 287 | ``` 288 | 289 | ```{r} 290 | ggplot() + 291 | geom_stars(data = sat_vis, show.legend = FALSE) + 292 | facet_wrap(vars(band)) + 293 | scale_fill_gradient(low = "black", high = "white") + 294 | coord_equal() + 295 | theme_void() 296 | ``` 297 | 298 | ```{r} 299 | oz_states <- st_transform(oz_states, crs = st_crs(sat_vis)) 300 | 301 | ggplot() + 302 | geom_stars(data = sat_vis, show.legend = FALSE) + 303 | geom_sf(data = oz_states, fill = NA, color = "white") + 304 | coord_sf() + 305 | scale_fill_gradient(low = "black", high = "white") + 306 | theme_void() 307 | ``` 308 | 309 | ```{r} 310 | cities <- oz_capitals %>% 311 | st_as_sf(coords = c("lon", "lat"), crs = 4326, remove = FALSE) 312 | 313 | cities <- st_transform(cities, st_crs(sat_vis)) 314 | 315 | ggplot() + 316 | geom_stars(data = sat_vis, show.legend = FALSE) + 317 | geom_sf(data = oz_states, fill = NA, color = "white") + 318 | geom_sf(data = cities, color = "red") + 319 | geom_sf_text(data = cities, mapping = aes(label = city), color = "white") + 320 | scale_fill_gradient(low = "black", high = "white") + 321 | coord_sf() + 322 | theme_void() 323 | ``` 324 | 325 | ## 6.X OpenStreetMap with Dr. Dominic Royé 326 | 327 | https://dominicroye.github.io/en/2018/accessing-openstreetmap-data-with-r/ 328 | 329 | ```{r} 330 | library(osmdata) 331 | library(ggmap) 332 | 333 | # The first five features 334 | available_features() %>% 335 | head() 336 | 337 | # Amenities 338 | available_tags("amenity") %>% 339 | head() 340 | 341 | # Shops 342 | available_tags("shop") %>% 343 | head() 344 | ``` 345 | 346 | ```{r} 347 | # Building the query 348 | q <- getbb("Madrid") %>% 349 | opq() %>% 350 | add_osm_feature("amenity", "cinema") 351 | 352 | # Query structure 353 | q %>% 354 | str() 355 | ``` 356 | 357 | ```{r} 358 | cinema <- osmdata_sf(q) 359 | cinema 360 | ``` 361 | 362 | ```{r} 363 | # Our background map 364 | mad_map <- get_map( 365 | getbb("Madrid"), 366 | maptype = "toner-background", 367 | source = "stamen" 368 | ) 369 | 370 | # Final map 371 | mad_map %>% 372 | ggmap() + 373 | geom_sf( 374 | data = cinema$osm_points, 375 | inherit.aes = FALSE, 376 | color = "#238443", 377 | fill = "#004529", 378 | alpha = .5, 379 | size = 4, 380 | shape = 21 381 | ) + 382 | labs( 383 | x = "", 384 | y = "" 385 | ) 386 | ``` 387 | 388 | ```{r} 389 | # Bounding box for the Iberian Peninsula 390 | m <- c(-10, 30, 5, 46) 391 | 392 | # Building the query 393 | q <- m %>% 394 | opq(timeout = 25*100) %>% 395 | add_osm_feature("name", "Mercadona") %>% 396 | add_osm_feature("shop", "supermarket") 397 | 398 | # Query 399 | mercadona <- osmdata_sf(q) 400 | 401 | # Final map 402 | ggplot(mercadona$osm_points) + 403 | geom_sf( 404 | color = "#08519c", 405 | fill = "#08306b", 406 | alpha = .5, 407 | size = 1, 408 | shape = 21 409 | ) + 410 | theme_void() 411 | ``` 412 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_08_annotations.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 8 - Annotations" 3 | author: "Original Code: Hadley Wickham (except Claus Wilke for ggfittext) | Modifications: Antti Rask" 4 | date: "2023-01-20" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 8 Annotations 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(conflicted) 17 | conflict_prefer("filter", "dplyr") 18 | ``` 19 | 20 | ## 8.1 Plot and axis titles 21 | 22 | ```{r} 23 | mpg %>% 24 | ggplot(aes(displ, hwy, color = factor(cyl))) + 25 | geom_point() + 26 | labs( 27 | x = "Engine displacement (litres)", 28 | y = "Highway miles per gallon", 29 | color = "Number of cylinders", 30 | title = "Mileage by engine size and cylinders", 31 | subtitle = "Source: http://fueleconomy.gov" 32 | ) 33 | ``` 34 | 35 | ```{r} 36 | values <- seq(from = -2, to = 2, by = .01) 37 | tbl <- tibble(x = values, y = values ^ 3) 38 | 39 | tbl %>% 40 | ggplot(aes(x, y)) + 41 | geom_path() + 42 | labs(y = quote(f(x) == x^3)) 43 | ``` 44 | 45 | ```{r} 46 | library(ggtext) 47 | 48 | tbl <- tibble(x = 1:3, y = 1:3) 49 | base <- tbl %>% 50 | ggplot(aes(x, y)) + 51 | geom_point() + 52 | labs( 53 | x = "Axis title with *italics* and **boldface**" 54 | ) 55 | 56 | base 57 | 58 | base + 59 | theme(axis.title.x = element_markdown()) 60 | ``` 61 | 62 | ```{r} 63 | base <- mpg %>% 64 | ggplot(aes(cty, hwy, color = factor(cyl))) + 65 | geom_jitter() + 66 | geom_abline(color = "grey50", linewidth = 2) 67 | 68 | base 69 | 70 | labelled <- base + 71 | labs( 72 | x = "City mileage/gallon", 73 | y = "Highway mileage/gallon", 74 | color = "Cylinders", 75 | title = "Highway and city mileage are highly correlated" 76 | ) + 77 | scale_color_brewer(type = "seq", palette = "Spectral") 78 | 79 | labelled 80 | 81 | styled <- labelled + 82 | theme_bw() + 83 | theme( 84 | plot.title = element_text(face = "bold", size = 12), 85 | legend.background = element_rect( 86 | fill = "white", 87 | linewidth = 2, 88 | color = "grey70" 89 | ), 90 | legend.justification = c(0.99, 0), 91 | legend.position = c(0.95, 0.05), 92 | axis.ticks = element_line( 93 | color = "grey70", 94 | linewidth = 0.2 95 | ), 96 | panel.grid.major = element_line( 97 | color = "grey70", 98 | linewidth = 0.2 99 | ), 100 | panel.grid.minor = element_blank() 101 | ) 102 | 103 | styled 104 | ``` 105 | 106 | ## 8.2 Text labels 107 | 108 | ```{r} 109 | tbl <- tibble( 110 | x = 1, 111 | y = 3:1, 112 | family = c("sans", "serif", "mono") 113 | ) 114 | 115 | tbl %>% 116 | ggplot(aes(x, y)) + 117 | geom_text(aes(label = family, family = family)) 118 | ``` 119 | 120 | ```{r} 121 | tbl <- tibble( 122 | x = 1, 123 | y = 3:1, 124 | face = c("plain", "bold", "italic") 125 | ) 126 | 127 | tbl %>% 128 | ggplot(aes(x, y)) + 129 | geom_text(aes(label = face, fontface = face)) 130 | ``` 131 | 132 | ```{r} 133 | tbl <- tibble( 134 | x = c(1, 1, 2, 2, 1.5), 135 | y = c(1, 2, 1, 2, 1.5), 136 | text = c( 137 | "bottom-left", "top-left", 138 | "bottom-right", "top-right", "center" 139 | ) 140 | ) 141 | 142 | tbl %>% 143 | ggplot(aes(x, y)) + 144 | geom_text(aes(label = text)) 145 | 146 | tbl %>% 147 | ggplot(aes(x, y)) + 148 | geom_text( 149 | aes(label = text), 150 | vjust = "inward", 151 | hjust = "inward" 152 | ) 153 | ``` 154 | 155 | ```{r} 156 | tbl <- tibble( 157 | treatment = c("a", "b", "c"), 158 | response = c(1.2, 3.4, 2.5) 159 | ) 160 | 161 | tbl %>% 162 | ggplot(aes(treatment, response)) + 163 | geom_point() + 164 | geom_text( 165 | # paste0 -> str_c 166 | aes(label = str_c("(", response, ")")), 167 | nudge_x = -0.3 168 | ) + 169 | ylim(1.1, 3.6) 170 | ``` 171 | 172 | ```{r} 173 | p <- mpg %>% 174 | ggplot(aes(displ, hwy)) 175 | 176 | p + 177 | geom_text(aes(label = model)) + 178 | xlim(1, 8) 179 | 180 | p + 181 | geom_text(aes(label = model), check_overlap = TRUE) + 182 | xlim(1, 8) 183 | ``` 184 | 185 | ```{r} 186 | tbl <- tibble( 187 | waiting = c(55, 80), 188 | eruptions = c(2, 4.3), 189 | label = c("peak one", "peak two") 190 | ) 191 | 192 | faithfuld %>% 193 | ggplot(aes(waiting, eruptions)) + 194 | geom_tile(aes(fill = density)) + 195 | geom_label(data = tbl, aes(label = label)) 196 | ``` 197 | 198 | ```{r} 199 | library(ggrepel) 200 | 201 | mini_mpg <- mpg %>% 202 | # sample + nrow -> slice_sample 203 | slice_sample(n = 20) 204 | 205 | mpg %>% 206 | ggplot(aes(displ, hwy)) + 207 | geom_point(color = "red") + 208 | geom_text_repel(data = mini_mpg, aes(label = class)) 209 | ``` 210 | 211 | ## 8.3 Building custom annotations 212 | 213 | ```{r} 214 | economics %>% 215 | ggplot(aes(date, unemploy)) + 216 | geom_line() 217 | ``` 218 | 219 | ```{r} 220 | presidential_filtered <- presidential %>% 221 | # subset + economics$date[1] -> filter + min(economics$date) 222 | filter( 223 | start > min(economics$date), 224 | # added the second filter, because the economics data ends way before the presidential one 225 | start < max(economics$date) 226 | ) 227 | 228 | economics %>% 229 | ggplot() + 230 | geom_rect( 231 | aes(xmin = start, xmax = end, fill = party), 232 | ymin = -Inf, 233 | ymax = Inf, 234 | alpha = 0.2, 235 | data = presidential_filtered 236 | ) + 237 | geom_vline( 238 | aes(xintercept = as.numeric(start)), 239 | data = presidential_filtered, 240 | color = "grey50", 241 | alpha = 0.5 242 | ) + 243 | geom_text( 244 | aes(x = start, y = 2500, label = name), 245 | data = presidential_filtered, 246 | size = 3, 247 | vjust = 0, 248 | hjust = 0, 249 | nudge_x = 50 250 | ) + 251 | geom_line(aes(date, unemploy)) + 252 | scale_fill_manual(values = c("blue", "red")) + 253 | labs( 254 | x = "date", 255 | y = "unemployment" 256 | ) 257 | ``` 258 | 259 | ```{r} 260 | yrng <- range(economics$unemploy) 261 | xrng <- range(economics$date) 262 | 263 | # paste -> str_c 264 | caption <- str_c( 265 | # strwrap -> str_wrap 266 | str_wrap( 267 | "Unemployment rates in the US have varied a lot over the years", 268 | 40 269 | ), 270 | collapse = "\n" 271 | ) 272 | 273 | economics %>% 274 | ggplot(aes(date, unemploy)) + 275 | geom_line() + 276 | geom_text( 277 | aes(x, y, label = caption), 278 | data = tibble( 279 | # xrng[1] -> min(xrng) 280 | x = min(xrng), 281 | # yrng[2] -> max(yrng) 282 | y = max(yrng), 283 | caption = caption 284 | ), 285 | hjust = 0, 286 | vjust = 1, 287 | size = 4 288 | ) 289 | 290 | # Same, but easier thanks to annotate() 291 | economics %>% 292 | ggplot(aes(date, unemploy)) + 293 | geom_line() + 294 | annotate( 295 | geom = "text", 296 | # xrng[1] -> min(xrng) 297 | x = min(xrng), 298 | # yrng[2] -> max(yrng) 299 | y = max(yrng), 300 | label = caption, 301 | hjust = 0, 302 | vjust = 1, 303 | size = 4 304 | ) 305 | ``` 306 | 307 | ```{r} 308 | p <- mpg %>% 309 | ggplot(aes(displ, hwy)) + 310 | geom_point( 311 | data = filter(mpg, manufacturer == "subaru"), 312 | color = "orange", 313 | size = 3 314 | ) + 315 | geom_point() 316 | 317 | p + 318 | annotate( 319 | geom = "point", 320 | x = 5.5, 321 | y = 40, 322 | color = "orange", 323 | size = 3 324 | ) + 325 | annotate( 326 | geom = "point", 327 | x = 5.5, 328 | y = 40) + 329 | annotate( 330 | geom = "text", 331 | x = 5.6, 332 | y = 40, 333 | label = "Subaru", 334 | hjust = "left" 335 | ) 336 | ``` 337 | 338 | ```{r} 339 | p + 340 | annotate( 341 | geom = "curve", 342 | x = 4, 343 | y = 35, 344 | xend = 2.65, 345 | yend = 27, 346 | curvature = .3, 347 | arrow = arrow(length = unit(2, "mm")) 348 | ) + 349 | annotate( 350 | # Changed the geom to label and added the orange fill to make the Subaru pop out a bit more 351 | geom = "label", 352 | fill = "orange", 353 | x = 4.1, 354 | y = 35, 355 | label = "Subaru", 356 | hjust = "left" 357 | ) 358 | ``` 359 | 360 | ## 8.4 Direct labelling 361 | 362 | ```{r} 363 | library(directlabels) 364 | 365 | p <- 366 | mpg %>% 367 | ggplot(aes(displ, hwy, color = class)) 368 | 369 | p + 370 | geom_point() 371 | 372 | p + 373 | geom_point(show.legend = FALSE) + 374 | geom_dl( 375 | aes(label = class), 376 | method = "smart.grid" 377 | ) 378 | ``` 379 | 380 | ```{r} 381 | library(ggforce) 382 | 383 | mpg %>% 384 | ggplot(aes(displ, hwy)) + 385 | geom_point() + 386 | geom_mark_ellipse(aes(label = cyl, group = cyl)) 387 | ``` 388 | 389 | ```{r} 390 | library(gghighlight) 391 | library(nlme) 392 | 393 | oxboys <- Oxboys 394 | 395 | oxboys %>% 396 | ggplot(aes(age, height, group = Subject)) + 397 | geom_line() + 398 | geom_point() + 399 | gghighlight(Subject %in% 1:3) 400 | ``` 401 | 402 | ## 8.5 Annotation across facets 403 | 404 | ```{r} 405 | diamonds %>% 406 | ggplot(aes(log10(carat), log10(price))) + 407 | geom_bin2d() + 408 | facet_wrap( 409 | vars(cut), 410 | nrow = 1 411 | ) 412 | ``` 413 | 414 | ```{r} 415 | mod_coef <- coef(lm(log10(price) ~ log10(carat), data = diamonds)) 416 | 417 | diamonds %>% 418 | ggplot(aes(log10(carat), log10(price))) + 419 | geom_bin2d() + 420 | geom_abline( 421 | intercept = mod_coef[1], 422 | slope = mod_coef[2], 423 | color = "white", 424 | linewidth = 1 425 | ) + 426 | facet_wrap( 427 | vars(cut), 428 | nrow = 1 429 | ) 430 | ``` 431 | 432 | ```{r} 433 | library(gghighlight) 434 | 435 | mpg %>% 436 | ggplot(aes(displ, hwy, color = factor(cyl))) + 437 | geom_point() + 438 | gghighlight() + 439 | facet_wrap(vars(cyl)) 440 | ``` 441 | 442 | ## 8.X ggfittext 443 | 444 | Original from: https://github.com/wilkox/ggfittext 445 | 446 | ### Fitting text inside a box 447 | 448 | ```{r} 449 | library(ggfittext) 450 | 451 | p <- animals %>% 452 | ggplot(aes(type, flies, label = animal)) + 453 | geom_tile(fill = "white", color = "black") 454 | 455 | p + 456 | geom_fit_text() 457 | ``` 458 | 459 | ### Reflowing text 460 | 461 | ```{r} 462 | p + 463 | geom_fit_text(reflow = TRUE) 464 | ``` 465 | 466 | ### Growing text 467 | 468 | ```{r} 469 | p + 470 | geom_fit_text(reflow = TRUE, grow = TRUE) 471 | ``` 472 | 473 | ### Placing text 474 | 475 | ```{r} 476 | p + 477 | geom_fit_text(place = "topleft", reflow = TRUE) 478 | ``` 479 | 480 | ### Bar plots 481 | 482 | ```{r} 483 | altitudes %>% 484 | ggplot(aes(craft, altitude, label = altitude)) + 485 | geom_col() + 486 | geom_bar_text() 487 | ``` 488 | 489 | ```{r} 490 | p <- beverages %>% 491 | ggplot( 492 | aes(beverage, proportion, label = ingredient, fill = ingredient) 493 | ) 494 | 495 | p + 496 | geom_col(position = "stack") + 497 | geom_bar_text(position = "stack", reflow = TRUE) 498 | 499 | p + 500 | geom_col(position = "dodge") + 501 | geom_bar_text( 502 | position = "dodge", 503 | grow = TRUE, 504 | reflow = TRUE, 505 | place = "left" 506 | ) + 507 | coord_flip() 508 | ``` 509 | 510 | ### Specifying the box limits 511 | 512 | ```{r} 513 | presidential %>% 514 | ggplot( 515 | aes( 516 | ymin = start, ymax = end, x = party, label = name) 517 | ) + 518 | geom_fit_text(grow = TRUE) + 519 | geom_errorbar(alpha = 0.5) 520 | ``` 521 | 522 | ### Experimental feature: text in polar coordinates 523 | 524 | ```{r} 525 | gold %>% 526 | ggplot( 527 | aes( 528 | xmin = xmin, 529 | xmax = xmax, 530 | ymin = ymin, 531 | ymax = ymax, 532 | fill = linenumber, 533 | label = line 534 | ) 535 | ) + 536 | coord_polar() + 537 | geom_rect() + 538 | geom_fit_text(min.size = 0, grow = TRUE) + 539 | scale_fill_gradient(low = "#fee391", high = "#238443") 540 | ``` 541 | 542 | ### Other useful arguments 543 | 544 | ```{r} 545 | animals %>% 546 | ggplot( 547 | aes(type, flies, fill = mass, label = animal) 548 | ) + 549 | geom_tile() + 550 | geom_fit_text( 551 | reflow = TRUE, 552 | grow = TRUE, 553 | contrast = TRUE 554 | ) 555 | ``` 556 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_10_position_scales_and_axes.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 10 - Position scales and axes" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-16" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 10 Position scales and axes 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(conflicted) 17 | conflict_prefer("filter", "dplyr") 18 | ``` 19 | 20 | ```{r} 21 | mpg %>% 22 | ggplot(aes(displ)) + 23 | geom_histogram() 24 | 25 | mpg %>% 26 | ggplot(aes(displ, after_stat(count))) + 27 | geom_histogram() 28 | ``` 29 | 30 | ## 10.1 Numeric position scales 31 | 32 | ### 10.1.1 Limits 33 | 34 | ```{r} 35 | ggplot(mpg, aes(displ, hwy)) + 36 | geom_point() + 37 | facet_wrap(vars(year)) 38 | 39 | base_99 <- mpg %>% 40 | filter(year == 1999) %>% 41 | ggplot(aes(displ, hwy)) + 42 | geom_point() 43 | 44 | base_08 <- mpg %>% filter(year == 2008) %>% 45 | ggplot(aes(displ, hwy)) + 46 | geom_point() 47 | 48 | base_99 49 | base_08 50 | ``` 51 | 52 | ```{r} 53 | base_99 + 54 | scale_x_continuous(limits = c(1, 7)) + 55 | scale_y_continuous(limits = c(10, 45)) 56 | 57 | base_08 + 58 | scale_x_continuous(limits = c(1, 7)) + 59 | scale_y_continuous(limits = c(10, 45)) 60 | ``` 61 | 62 | ```{r} 63 | base_99 + 64 | lims(x = c(1, 7), y = c(10, 45) 65 | ) 66 | 67 | base_08 + 68 | lims(x = c(1, 7), y = c(10, 45)) 69 | ``` 70 | 71 | ### 10.1.2 Zooming in 72 | 73 | ```{r} 74 | base <- ggplot(mpg, aes(drv, hwy)) + 75 | geom_hline(yintercept = 28, colour = "red") + 76 | geom_boxplot() 77 | 78 | base + 79 | labs(title = "base") 80 | 81 | # works as expected 82 | base + 83 | coord_cartesian(ylim = c(10, 35)) + 84 | labs(title = "coord_cartesian(ylim = c(10, 35))") 85 | 86 | # distorts the boxplot, avoid! 87 | base + 88 | ylim(10, 35) + 89 | labs(title = "ylim(10, 35)") 90 | ``` 91 | 92 | ### 10.1.3 Visual range expansion 93 | 94 | ```{r} 95 | p <- faithfuld %>% 96 | ggplot(aes(waiting, eruptions)) + 97 | geom_raster(aes(fill = density)) + 98 | theme(legend.position = "none") + 99 | labs( 100 | x = NULL, 101 | y = NULL 102 | ) 103 | 104 | p 105 | 106 | p + 107 | scale_x_continuous(expand = c(0, 0)) + 108 | scale_y_continuous(expand = c(0, 0)) 109 | 110 | # Alternatively 111 | p + 112 | scale_x_continuous(expand = expansion(0)) + 113 | scale_y_continuous(expand = expansion(0)) 114 | ``` 115 | 116 | ```{r} 117 | # Additive expansion of three units on both axes 118 | p + 119 | scale_x_continuous(expand = expansion(add = 3)) + 120 | scale_y_continuous(expand = expansion(add = 3)) 121 | 122 | # Multiplicative expansion of 20% on both axes 123 | p + 124 | scale_x_continuous(expand = expansion(mult = .2)) + 125 | scale_y_continuous(expand = expansion(mult = .2)) 126 | 127 | # Multiplicative expansion of 5% at the lower end of each axes, and 20% at the upper end; for the y-axis the expansion is set directly instead of using expansion() 128 | p + 129 | scale_x_continuous( 130 | expand = expansion(mult = c(.05, .2)) 131 | ) + 132 | scale_y_continuous( 133 | expand = c(.05, 0, .2, 0) 134 | ) 135 | ``` 136 | 137 | ### 10.1.4 Breaks 138 | 139 | ```{r} 140 | toy <- tibble( 141 | const = 1, 142 | up = 1:4, 143 | txt = letters[1:4], 144 | big = (1:4) * 1000, 145 | log = c(2, 5, 10, 2000) 146 | ) 147 | 148 | toy 149 | ``` 150 | 151 | ```{r} 152 | base <- toy %>% ggplot(aes(big, const)) + 153 | geom_point() + 154 | labs( 155 | x = NULL, 156 | y = NULL 157 | ) + 158 | scale_y_continuous(breaks = NULL) 159 | 160 | base 161 | ``` 162 | 163 | ```{r} 164 | library(scales) 165 | 166 | base + 167 | scale_x_continuous(breaks = c(1000, 2000, 4000)) + 168 | labs(title = "scale_x_continuous(breaks = c(1000, 2000, 4000))") 169 | 170 | base + 171 | scale_x_continuous(breaks = c(1000, 1500, 2000, 4000)) + 172 | labs(title = "scale_x_continuous(breaks = c(1000, 1500, 2000, 4000))") 173 | 174 | # same as the base 175 | base + 176 | scale_x_continuous(breaks = breaks_extended()) + 177 | labs(title = "scale_x_continuous(breaks = breaks_extended()) (same as base)") 178 | 179 | base + 180 | scale_x_continuous(breaks = breaks_extended(n = 2)) + 181 | labs(title = "breaks = breaks_extended(n = 2)") 182 | ``` 183 | 184 | ```{r} 185 | library(scales) 186 | 187 | base + 188 | scale_x_continuous(breaks = breaks_width(800)) + 189 | labs(title = "scale_x_continuous(breaks = breaks_width(800))") 190 | 191 | base + 192 | scale_x_continuous( 193 | breaks = breaks_width(800, offset = 200) 194 | ) + 195 | labs(title = "scale_x_continuous(breaks = breaks_width(800, offset = 200))") 196 | 197 | base + 198 | scale_x_continuous( 199 | breaks = breaks_width(800, offset = -200) 200 | ) + 201 | labs(title = "scale_x_continuous(breaks = breaks_width(800, offset = -200))") 202 | ``` 203 | 204 | ### 10.1.5 Minor breaks 205 | 206 | ```{r} 207 | mb <- unique(as.numeric(1:10 %o% 10 ^ (0:3))) 208 | mb 209 | ``` 210 | 211 | ```{r} 212 | base <- toy %>% 213 | ggplot(aes(log, const)) + 214 | geom_point() + 215 | labs( 216 | x = NULL, 217 | y = NULL 218 | ) + 219 | scale_y_continuous(breaks = NULL) 220 | 221 | base + 222 | scale_x_log10() + 223 | labs(title = "scale_x_log10()") 224 | 225 | base + 226 | scale_x_log10(minor_breaks = mb) + 227 | labs(title = "scale_x_log10(minor_breaks = mb)") 228 | ``` 229 | 230 | ### 10.1.6 Labels 231 | 232 | ```{r} 233 | base <- toy %>% ggplot(aes(big, const)) + 234 | geom_point() + 235 | labs( 236 | x = NULL, 237 | y = NULL 238 | ) + 239 | scale_y_continuous(breaks = NULL) 240 | 241 | base 242 | 243 | base + 244 | scale_x_continuous( 245 | breaks = c(2000, 4000), 246 | labels = c("2k", "4k") 247 | ) 248 | ``` 249 | 250 | ```{r} 251 | library(scales) 252 | 253 | base <- toy %>% 254 | ggplot(aes(big, const)) + 255 | geom_point() + 256 | labs( 257 | x = NULL, 258 | y = NULL 259 | ) + 260 | scale_x_continuous(breaks = NULL) 261 | 262 | base 263 | 264 | base + 265 | scale_y_continuous(labels = label_percent()) + 266 | labs(title = "scale_y_continuous(labels = label_percent())") 267 | 268 | base + 269 | scale_y_continuous(labels = label_dollar(prefix = "", suffix = "€")) + 270 | labs(title = 'scale_y_continuous(labels = label_dollar(prefix = "", suffix = "€"))') 271 | 272 | base + 273 | scale_y_continuous(breaks = NULL) + 274 | labs(title = "scale_y_continuous(breaks = NULL)") 275 | 276 | base + 277 | scale_y_continuous(labels = NULL) + 278 | labs(title = "scale_y_continuous(labels = NULL)") 279 | ``` 280 | 281 | ### 10.1.7 Transformations 282 | 283 | ```{r} 284 | base <- mpg %>% 285 | ggplot(aes(displ, hwy)) + 286 | geom_point() 287 | 288 | base 289 | 290 | base + 291 | scale_x_reverse() 292 | 293 | base + 294 | scale_y_reverse() 295 | ``` 296 | 297 | ```{r} 298 | # Convert from fuel economy to fuel consumption 299 | mpg %>% 300 | ggplot(aes(displ, hwy)) + 301 | geom_point() + 302 | scale_y_continuous(trans = "reciprocal") 303 | 304 | # Log transform x and y axes 305 | diamonds %>% 306 | ggplot(aes(price, carat)) + 307 | geom_bin2d() + 308 | scale_x_continuous(trans = "log10") + 309 | scale_y_continuous(trans = "log10") 310 | ``` 311 | 312 | ```{r} 313 | library(scales) 314 | 315 | # The following are equivalent 316 | p <- ggplot(mpg, aes(displ, hwy)) + 317 | geom_point() 318 | 319 | p + 320 | scale_y_continuous(trans = "reciprocal") 321 | 322 | p + 323 | scale_y_continuous(trans = reciprocal_trans()) 324 | ``` 325 | 326 | ```{r} 327 | # The following are equivalent 328 | 329 | p <- diamonds %>% 330 | ggplot(aes(price, carat)) + 331 | geom_bin2d() 332 | 333 | p + 334 | scale_x_continuous(trans = "log10") + 335 | scale_y_continuous(trans = "log10") 336 | 337 | p + 338 | scale_x_log10() + 339 | scale_y_log10() 340 | ``` 341 | 342 | ```{r} 343 | # These two are similar, but there are minor differences in the tick labels 344 | 345 | # manual transformation 346 | mpg %>% 347 | ggplot(aes(log10(displ), hwy)) + 348 | geom_point() 349 | 350 | # transform using scales 351 | mpg %>% 352 | ggplot(aes(displ, hwy)) + 353 | geom_point() + 354 | scale_x_log10() 355 | ``` 356 | 357 | ## 10.2 Data-time position scales 358 | 359 | ### 10.2.1 Breaks 360 | 361 | ```{r} 362 | date_base <- economics %>% 363 | ggplot(aes(date, psavert)) + 364 | geom_line(na.rm = TRUE) + 365 | labs( 366 | x = NULL, 367 | y = NULL 368 | ) 369 | 370 | date_base 371 | 372 | date_base + 373 | scale_x_date(date_breaks = "15 years") 374 | ``` 375 | 376 | ```{r} 377 | library(lubridate) 378 | library(scales) 379 | 380 | the_year <- as_date(c("2021-01-01", "2021-12-31")) 381 | 382 | set_breaks <- breaks_width("1 month") 383 | 384 | set_breaks(the_year) 385 | ``` 386 | 387 | ```{r} 388 | library(scales) 389 | 390 | set_breaks <- breaks_width("1 month", offset = 8) 391 | 392 | set_breaks(the_year) 393 | ``` 394 | 395 | ### 10.2.2 Minor breaks 396 | 397 | ```{r} 398 | library(lubridate) 399 | 400 | tbl <- tibble(y = as_date(c("2022-01-01", "2022-04-01"))) 401 | 402 | base <- tbl %>% 403 | ggplot(aes(y = y)) + 404 | labs(y = NULL) + 405 | theme_minimal() + 406 | theme( 407 | panel.grid.major = element_line(colour = "black"), 408 | panel.grid.minor = element_line(colour = "grey50") 409 | ) 410 | 411 | base + 412 | scale_y_date(date_breaks = "1 month") 413 | 414 | base + 415 | scale_y_date(date_breaks = "1 month", date_minor_breaks = "1 week") 416 | ``` 417 | 418 | ### 10.2.3 Labels 419 | 420 | ```{r} 421 | library(lubridate) 422 | library(scales) 423 | 424 | base <- economics %>% 425 | ggplot(aes(date, psavert)) + 426 | geom_line(na.rm = TRUE) + 427 | labs( 428 | x = NULL, 429 | y = NULL 430 | ) 431 | 432 | base + 433 | scale_x_date( 434 | date_breaks = "5 years" 435 | ) 436 | 437 | base + 438 | scale_x_date( 439 | date_breaks = "5 years", 440 | date_labels = "%y" 441 | ) 442 | 443 | Sys.setlocale(category = "LC_ALL", locale = "us") 444 | 445 | lim <- as_date(c("2004-01-01", "2005-01-01")) 446 | 447 | base + 448 | scale_x_date(limits = lim, date_labels = "%b %y") 449 | 450 | base + 451 | scale_x_date(limits = lim, date_labels = "%B\n%Y") 452 | 453 | base + 454 | scale_x_date( 455 | limits = lim, 456 | labels = label_date_short() 457 | ) 458 | ``` 459 | 460 | ## 10.3 Discrete position scales 461 | 462 | ```{r} 463 | p <- mpg %>% 464 | ggplot(aes(hwy, class)) + 465 | geom_point() 466 | 467 | # These two are equivalent 468 | p 469 | 470 | p + 471 | scale_x_continuous() + 472 | scale_y_discrete() 473 | 474 | # To add annotation 475 | p + 476 | annotate("text", x = 5, y = 1:7, label = 1:7) 477 | 478 | # To add jitter 479 | 480 | mpg %>% 481 | ggplot(aes(hwy, class)) + 482 | geom_jitter(width = 0, height = .25) + 483 | annotate("text", x = 5, y = 1:7, label = 1:7) 484 | ``` 485 | 486 | ```{r} 487 | p <- mpg %>% 488 | ggplot(aes(x = drv, y = hwy)) 489 | 490 | p + 491 | geom_boxplot() 492 | 493 | p + 494 | geom_boxplot(width = .4) 495 | ``` 496 | 497 | ### 10.3.1 Limits, breaks, and labels 498 | 499 | ```{r} 500 | base <- toy %>% 501 | ggplot(aes(const, txt, label = txt)) + 502 | geom_label() + 503 | scale_x_continuous(breaks = NULL) + 504 | labs( 505 | x = NULL, 506 | y = NULL 507 | ) 508 | 509 | base 510 | 511 | base + 512 | scale_y_discrete(limits = c("a", "b", "c", "d", "e")) + 513 | labs(title = 'scale_y_discrete(limits = c("a", "b", "c", "d", "e"))') 514 | 515 | base + 516 | scale_y_discrete(limits = c("d", "c", "a", "b")) + 517 | labs(title = 'scale_y_discrete(limits = c("d", "c", "a", "b"))') 518 | 519 | base + 520 | scale_y_discrete(breaks = c("b", "c")) + 521 | labs(title = 'scale_y_discrete(breaks = c("b", "c"))') 522 | 523 | base + 524 | scale_y_discrete(labels = c(c = "carrot", b = "banana")) + 525 | labs(title = 'scale_y_discrete(labels = c(c = "carrot", b = "banana"))') 526 | ``` 527 | 528 | ### 10.3.2 Label positions 529 | 530 | ```{r} 531 | base <- mpg %>% 532 | # It's nicer to look at these in order so I added this mutation before going to ggplot 533 | mutate(manufacturer = factor(manufacturer) %>% fct_reorder(hwy)) %>% 534 | ggplot(aes(manufacturer, hwy)) + 535 | geom_boxplot() + 536 | labs(x = "manufacturer") 537 | 538 | base 539 | 540 | # These two do the same thing 541 | base + 542 | guides(x = guide_axis(n.dodge = 3)) 543 | 544 | base + 545 | scale_x_discrete(guide = guide_axis(n.dodge = 3)) 546 | 547 | # These two do the same thing 548 | base + 549 | guides(x = guide_axis(angle = 90)) 550 | 551 | base + 552 | scale_x_discrete(guide = guide_axis(angle = 90)) 553 | 554 | # You can always flip the axes 555 | base + 556 | coord_flip() 557 | ``` 558 | 559 | ## 10.4 Binned position scales 560 | 561 | ```{r} 562 | p <- mpg %>% 563 | ggplot(aes(hwy)) 564 | 565 | p + 566 | geom_histogram(bins = 8) 567 | 568 | p + 569 | geom_bar() + 570 | scale_x_binned() 571 | ``` 572 | 573 | ```{r} 574 | base <- mpg %>% 575 | ggplot(aes(displ, hwy)) + 576 | geom_count() 577 | 578 | base 579 | 580 | base + 581 | scale_x_binned(n.breaks = 15) + 582 | scale_y_binned(n.breaks = 15) 583 | ``` 584 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_12_other_aesthetics.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 12 - Other aesthetics" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-16" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 12 Other aesthetics 13 | 14 | ```{r} 15 | library(tidyverse) 16 | ``` 17 | 18 | ## 12.1 Size 19 | 20 | ```{r} 21 | base <- mpg %>% 22 | ggplot(aes(displ, hwy, size = cyl)) + 23 | geom_point() 24 | 25 | base 26 | 27 | base + 28 | scale_size(range = c(1, 2)) 29 | ``` 30 | 31 | ### 12.1.1 Radius size scales 32 | 33 | ```{r} 34 | # Since there is no ready planets dataset, I copy pasted the dataframe from the online version and used the tribble function to turn it into a tibble 35 | 36 | planets <- tribble( 37 | ~name, ~type, ~position, ~radius, ~orbit, 38 | "Mercury", "Inner", 1, 2440, 5.79e+07, 39 | "Venus", "Inner", 2, 6052, 1.08e+08, 40 | "Earth", "Inner", 3, 6378, 1.50e+08, 41 | "Mars", "Inner", 4, 3390, 2.28e+08, 42 | "Jupiter", "Outer", 5, 71400, 7.78e+08, 43 | "Saturn", "Outer", 6, 60330, 1.43e+09, 44 | "Uranus", "Outer", 7, 25559, 2.87e+09, 45 | "Neptune", "Outer", 8, 24764, 4.50e+09 46 | ) %>% 47 | mutate( 48 | name = as_factor(name) %>% fct_reorder(position) 49 | ) 50 | 51 | planets 52 | ``` 53 | 54 | ```{r} 55 | base <- planets %>% 56 | ggplot(aes(1, name, size = radius)) + 57 | geom_point() + 58 | scale_x_continuous(breaks = NULL) + 59 | labs( 60 | x = NULL, 61 | y = NULL, 62 | size = NULL 63 | ) 64 | 65 | base + 66 | ggtitle("not to scale") 67 | 68 | base + 69 | scale_radius(limits = c(0, NA), range = c(0, 10)) + 70 | ggtitle("to scale") 71 | ``` 72 | 73 | ### 12.1.2 Binned size scales 74 | 75 | ```{r} 76 | base <- mpg %>% 77 | ggplot(aes(displ, manufacturer, size = hwy)) + 78 | geom_point(alpha = .2) + 79 | scale_size_binned() 80 | 81 | base 82 | 83 | base + 84 | guides(size = guide_bins(show.limits = TRUE)) 85 | 86 | base + 87 | guides(size = guide_bins(axis = FALSE)) 88 | 89 | base + 90 | guides( 91 | size = guide_bins( 92 | axis.colour = "red", 93 | axis.arrow = arrow( 94 | length = unit(.1, "inches"), 95 | ends = "first", 96 | type = "closed" 97 | ) 98 | ) 99 | ) 100 | 101 | base + 102 | guides(size = guide_bins(direction = "horizontal")) 103 | 104 | base + 105 | guides(size = guide_bins(axis = FALSE, direction = "horizontal")) 106 | 107 | base + 108 | guides(size = guide_bins(axis = FALSE, direction = "horizontal", show.limits = TRUE)) 109 | ``` 110 | 111 | ## 12.2 Shape 112 | 113 | ```{r} 114 | base <- mpg %>% 115 | ggplot(aes(displ, hwy, shape = factor(cyl))) + 116 | geom_point() 117 | 118 | base 119 | 120 | base + 121 | scale_shape(solid = FALSE) 122 | 123 | base + 124 | scale_shape_manual( 125 | values = c("4" = 16, "5" = 17, "6" = 1 , "8" = 2) 126 | ) 127 | ``` 128 | 129 | ## 12.3 Line type 130 | 131 | ```{r} 132 | economics_long %>% 133 | ggplot(aes(date, value01, linetype = variable)) + 134 | geom_line() 135 | ``` 136 | 137 | ```{r} 138 | tbl <- tibble(value = letters[1:13]) 139 | 140 | base <- tbl %>% 141 | ggplot(aes(linetype = value)) + 142 | geom_segment( 143 | mapping = aes(x = 0, xend = 1, y = value, yend = value), 144 | show.legend = FALSE 145 | ) + 146 | theme(panel.grid = element_blank()) + 147 | scale_x_continuous(NULL, NULL) 148 | 149 | base 150 | 151 | linetypes <- function(n) { 152 | types <- c("55", "75", "95", "1115", "111115", "11111115", "5158", "9198", "c1c8") 153 | return(types[seq_len(n)]) 154 | } 155 | 156 | base + 157 | scale_linetype(palette = linetypes) 158 | 159 | base + 160 | scale_linetype(palette = linetypes, na.value = "dotted") 161 | ``` 162 | 163 | ## 12.4 Manual scales 164 | 165 | ```{r} 166 | huron <- tibble( 167 | year = 1875:1972, 168 | level = as.numeric(LakeHuron) 169 | ) 170 | 171 | huron %>% 172 | ggplot(aes(year)) + 173 | geom_line(aes(y = level + 5), color = "red") + 174 | geom_line(aes(y = level - 5), color = "blue") 175 | 176 | p <- huron %>% 177 | ggplot(aes(year)) + 178 | geom_line(aes(y = level + 5, color = "above")) + 179 | geom_line(aes(y = level - 5, color = "below")) 180 | 181 | p 182 | 183 | p + 184 | scale_color_manual( 185 | "Direction", 186 | values = c("above" = "red", "below" = "blue") 187 | ) 188 | ``` 189 | 190 | ## 12.5 Identity scales 191 | 192 | ```{r} 193 | head(luv_colours) 194 | 195 | luv_colours %>% 196 | ggplot(aes(u, v, color = col)) + 197 | geom_point(size = 3) + 198 | scale_color_identity() + 199 | coord_equal() 200 | ``` 201 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_13_mastering_the_grammar.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 13 - Mastering the grammar" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-13" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 13 Mastering the grammar 13 | 14 | ```{r} 15 | library(tidyverse) 16 | ``` 17 | 18 | ## 13.2 Building a scatterplot 19 | 20 | ```{r} 21 | mpg %>% 22 | ggplot(aes(displ, hwy, color = factor(cyl))) + 23 | geom_point() 24 | ``` 25 | 26 | ### 13.2.1 Mapping aesthetics to data 27 | 28 | ```{r} 29 | p <- mpg %>% 30 | ggplot(aes(displ, hwy, color = factor(cyl))) 31 | 32 | # Don't do this! 33 | p + 34 | geom_line() + 35 | theme(legend.position = "none") 36 | 37 | # Or this! 38 | p + 39 | geom_bar(stat = "identity", position = "identity", fill = NA) + 40 | theme(legend.position = "none") 41 | 42 | # This works 43 | p + 44 | geom_point() + 45 | geom_smooth(method = "lm") 46 | ``` 47 | 48 | ## 13.3 Adding Complexity 49 | 50 | ```{r} 51 | mpg %>% 52 | ggplot(aes(displ, hwy)) + 53 | geom_point() + 54 | geom_smooth() + 55 | facet_wrap(vars(year)) 56 | ``` 57 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_14_build_a_plot_layer_by_layer.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 14 - Build a plot layer by layer" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-15" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 14 Build a plot layer by layer 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(conflicted) 17 | conflict_prefer("filter", "dplyr") 18 | ``` 19 | 20 | ## 14.2 Building a plot 21 | 22 | ```{r} 23 | p <- mpg %>% 24 | ggplot(aes(displ, hwy)) 25 | 26 | p 27 | ``` 28 | 29 | ```{r} 30 | p + 31 | geom_point() 32 | 33 | p + 34 | layer( 35 | mapping = NULL, 36 | data = NULL, 37 | geom = "point", 38 | stat = "identity", 39 | position = "identity" 40 | ) 41 | ``` 42 | 43 | ## 14.3 Data 44 | 45 | ```{r} 46 | mod <- 47 | loess(hwy ~ displ, data = mpg) 48 | 49 | grid <- 50 | tibble( 51 | displ = seq( 52 | min(mpg$displ), 53 | max(mpg$displ), 54 | length = 50) 55 | ) 56 | 57 | grid$hwy <- 58 | predict( 59 | mod, 60 | newdata = grid 61 | ) 62 | 63 | grid 64 | 65 | mod 66 | ``` 67 | 68 | ```{r} 69 | std_resid <- resid(mod) / mod$s 70 | 71 | outlier <- mpg %>% 72 | filter(abs(std_resid) > 2) 73 | 74 | outlier 75 | ``` 76 | 77 | ```{r} 78 | mpg %>% 79 | ggplot(aes(displ, hwy)) + 80 | geom_point() + 81 | geom_line( 82 | data = grid, 83 | color = "blue", 84 | linewidth = 1.5 85 | ) + 86 | geom_text(data = outlier, aes(label = model)) 87 | 88 | # Don't do this! 89 | ggplot(mapping = aes(displ, hwy)) + 90 | geom_point(data = mpg) + 91 | geom_line( 92 | data = grid, 93 | color = "blue", 94 | linewidth = 1.5 95 | ) + 96 | geom_text(data = outlier, aes(label = model)) 97 | ``` 98 | 99 | ### 14.3.1 Exercises 100 | 101 | 2. 102 | ```{r} 103 | class <- mpg %>% 104 | group_by(class) %>% 105 | summarise( 106 | n = n(), 107 | hwy = mean(hwy) 108 | ) 109 | 110 | mpg %>% 111 | ggplot(aes(class, hwy)) + 112 | geom_jitter(width = 0.2) + 113 | geom_point( 114 | data = class, 115 | size = 5, 116 | color = "red" 117 | ) + 118 | geom_text( 119 | data = class, 120 | aes(y = 10, label = str_c("n = ", n)), 121 | size = 3 122 | ) 123 | ``` 124 | 125 | ## 14.4 Aesthetic mappings 126 | 127 | ```{r} 128 | aes(x = displ, y = hwy, color = class) 129 | 130 | aes(displ, hwy, color = class) 131 | ``` 132 | 133 | ### 14.4.1 Specifying the aesthetics in the plot vs. in the layers 134 | 135 | ```{r} 136 | # All create the same plot specification 137 | mpg %>% 138 | ggplot(aes(displ, hwy, color = class)) + 139 | geom_point() 140 | 141 | mpg %>% 142 | ggplot(aes(displ, hwy)) + 143 | geom_point(aes(color = class)) 144 | 145 | mpg %>% 146 | ggplot(aes(displ)) + 147 | geom_point(aes(y = hwy, color = class)) 148 | 149 | mpg %>% 150 | ggplot() + 151 | geom_point(aes(displ, hwy, color = class)) 152 | ``` 153 | 154 | ```{r} 155 | mpg %>% 156 | ggplot(aes(displ, hwy, color = class)) + 157 | geom_point() + 158 | geom_smooth(method = "lm", se = FALSE) + 159 | theme(legend.position = "none") 160 | 161 | mpg %>% 162 | ggplot(aes(displ, hwy)) + 163 | geom_point(aes(color = class)) + 164 | geom_smooth(method = "lm", se = FALSE) + 165 | theme(legend.position = "none") 166 | ``` 167 | 168 | ### 14.4.2 Setting vs. mapping 169 | 170 | ```{r} 171 | p <- mpg %>% 172 | ggplot(aes(cty, hwy)) 173 | 174 | p + 175 | geom_point(color = "darkblue") 176 | 177 | p + 178 | geom_point(aes(color = "darkblue")) 179 | 180 | # Override the default scale 181 | p + 182 | geom_point(aes(color = "darkblue")) + 183 | scale_color_identity() 184 | ``` 185 | 186 | ```{r} 187 | mpg %>% 188 | ggplot(aes(displ, hwy)) + 189 | geom_point() + 190 | geom_smooth( 191 | aes(color = "loess"), 192 | method = "loess", 193 | se = FALSE 194 | ) + 195 | geom_smooth( 196 | aes(color = "lm"), 197 | method = "lm", 198 | se = FALSE) + 199 | labs(color = "Method") 200 | ``` 201 | 202 | ### 14.4.3 Exercises 203 | 204 | 1. 205 | ```{r} 206 | mpg %>% 207 | ggplot() + 208 | geom_point(aes(displ, hwy)) 209 | 210 | mpg %>% 211 | ggplot(aes(cty, hwy)) + 212 | geom_point() + 213 | geom_smooth() 214 | 215 | msleep %>% 216 | mutate( 217 | log_brainwt = log(brainwt), 218 | log_bodywt = log(bodywt)) %>% 219 | ggplot(aes(log_brainwt, log_bodywt)) + 220 | geom_point() 221 | ``` 222 | 223 | 2. 224 | ```{r} 225 | # Don't do this! 226 | ggplot(mpg) + 227 | geom_point(aes(class, cty)) + 228 | geom_boxplot(aes(trans, hwy)) 229 | ``` 230 | 231 | ## 14.6 Stats 232 | 233 | ```{r} 234 | p <- 235 | mpg %>% 236 | ggplot(aes(trans, cty)) + 237 | geom_point() 238 | 239 | p + 240 | stat_summary( 241 | geom = "point", 242 | fun = "mean", 243 | color = "red", 244 | size = 4 245 | ) 246 | 247 | p + 248 | geom_point( 249 | stat = "summary", 250 | fun = "mean", 251 | color = "red", 252 | size = 4 253 | ) 254 | ``` 255 | 256 | ### 14.6.1 Generated variables 257 | 258 | ```{r} 259 | p <- diamonds %>% 260 | ggplot(aes(price)) 261 | 262 | p + 263 | geom_histogram(binwidth = 500) 264 | 265 | p + 266 | geom_histogram( 267 | aes(y = after_stat(density)), 268 | binwidth = 500 269 | ) 270 | ``` 271 | 272 | ```{r} 273 | p <- 274 | diamonds %>% 275 | ggplot(aes(price, color = cut)) 276 | 277 | p + 278 | geom_freqpoly(binwidth = 500) + 279 | theme(legend.position = "none") 280 | 281 | p + 282 | geom_freqpoly( 283 | aes(y = after_stat(density)), 284 | binwidth = 500 285 | ) + 286 | theme(legend.position = "none") 287 | ``` 288 | 289 | ### 14.6.2 Exercises 290 | 291 | 1. 292 | ```{r} 293 | mod <- loess(hwy ~ displ, data = mpg) 294 | 295 | smoothed <- tibble(displ = seq(1.6, 7, length = 50)) 296 | 297 | pred <- predict(mod, newdata = smoothed, se = TRUE) 298 | 299 | smoothed$hwy <- pred$fit 300 | 301 | smoothed$hwy_lwr <- pred$fit - 1.96 * pred$se.fit 302 | 303 | smoothed$hwy_upr <- pred$fit + 1.96 * pred$se.fit 304 | 305 | smoothed 306 | 307 | mpg %>% 308 | ggplot(aes(displ, hwy)) + 309 | geom_smooth() 310 | 311 | smoothed %>% 312 | ggplot(aes(displ, hwy)) + 313 | geom_line(color = "blue", linewidth = 1) + 314 | geom_ribbon( 315 | aes(ymin = hwy_lwr, ymax = hwy_upr), 316 | alpha = 0.2 317 | ) 318 | ``` 319 | 320 | 3. 321 | ```{r} 322 | mpg %>% 323 | ggplot(aes(drv, trans)) + 324 | geom_count(aes(size = after_stat(prop), group = 1)) 325 | ``` 326 | 327 | ## 14.7 Position adjustments 328 | 329 | ```{r} 330 | dplot <- diamonds %>% 331 | ggplot(aes(color, fill = cut)) + 332 | labs( 333 | x = NULL, 334 | y = NULL 335 | ) + 336 | theme(legend.position = "none") 337 | 338 | # Added the titles, so it's easier to remember which one you're looking at 339 | dplot + 340 | geom_bar() + 341 | labs( 342 | title = "position_stack() (default for geom_bar())" 343 | ) 344 | 345 | dplot + 346 | geom_bar(position = "fill") + 347 | labs( 348 | title = "position_fill()" 349 | ) 350 | 351 | dplot + 352 | geom_bar(position = "dodge") + 353 | labs( 354 | title = "position_dodge()" 355 | ) 356 | ``` 357 | 358 | ```{r} 359 | # Don't do this! 360 | dplot + 361 | geom_bar( 362 | position = "identity", 363 | alpha = 1 / 2, 364 | color = "grey50" 365 | ) 366 | 367 | diamonds %>% 368 | ggplot(aes(color, color = cut)) + 369 | geom_line(aes(group = cut), stat = "count") + 370 | labs( 371 | x = NULL, 372 | y = NULL 373 | ) + 374 | theme(legend.position = "none") 375 | ``` 376 | 377 | ```{r} 378 | p <- mpg %>% 379 | ggplot(aes(displ, hwy)) 380 | 381 | p + 382 | geom_point(position = "jitter") 383 | 384 | p + 385 | geom_point( 386 | position = position_jitter( 387 | width = 0.05, 388 | height = 0.5 389 | ) 390 | ) 391 | 392 | p + 393 | geom_jitter(width = 0.05, height = 0.5) 394 | ``` 395 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_15_scales_and_guides.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 15 - Scales and Guides" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-16" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 15 Scales and guides 13 | 14 | ```{r} 15 | library(tidyverse) 16 | ``` 17 | 18 | ## 15.1 Theory of scales and guides 19 | 20 | ### 15.1.1 Scale specification 21 | 22 | ```{r} 23 | p <- mpg %>% 24 | ggplot(aes(displ, hwy)) + 25 | geom_point(aes(color = class)) 26 | 27 | p 28 | 29 | p + 30 | scale_x_continuous() + 31 | scale_y_continuous() + 32 | scale_color_discrete() 33 | ``` 34 | 35 | ```{r} 36 | p <- mpg %>% 37 | ggplot(aes(displ, hwy)) 38 | 39 | p + 40 | geom_point(aes(color = class)) + 41 | scale_x_continuous("A really awesome x axis") + 42 | scale_y_continuous("An amazingly great y axis") 43 | 44 | # Don't do this! 45 | p + 46 | geom_point() + 47 | scale_x_continuous("Label 1") + 48 | scale_x_continuous("Label 2") 49 | 50 | # Do this instead! 51 | p + 52 | geom_point() + 53 | scale_x_continuous("Label 2") 54 | 55 | p + 56 | geom_point(aes(color = class)) + 57 | scale_x_sqrt() + 58 | scale_color_brewer() 59 | ``` 60 | 61 | ## 15.4 Scale limits 62 | 63 | ```{r} 64 | library(scales) 65 | 66 | tbl <- tibble(x = 1:6, y = 8:13) 67 | 68 | base <- tbl %>% 69 | ggplot(aes(x, y, fill = x)) + 70 | # bar chart 71 | geom_col() + 72 | # for visual clarity only 73 | geom_vline(xintercept = 3.5, color = "red") 74 | 75 | base 76 | 77 | base + 78 | scale_fill_gradient(limits = c(1, 3)) 79 | 80 | base + 81 | scale_fill_gradient( 82 | limits = c(1, 3), 83 | oob = squish 84 | ) 85 | ``` 86 | 87 | ## 15.6 Scale transformation 88 | 89 | ```{r} 90 | base <- faithfuld %>% 91 | ggplot(aes(waiting, eruptions)) + 92 | geom_raster(aes(fill = density)) + 93 | scale_x_continuous(NULL, NULL, expand = c(0, 0)) + 94 | scale_y_continuous(NULL, NULL, expand = c(0, 0)) 95 | 96 | base 97 | 98 | base + 99 | scale_fill_continuous(trans = "sqrt") 100 | ``` 101 | 102 | ```{r} 103 | tbl <- tibble( 104 | x = runif(20), 105 | y = runif(20), 106 | z = sample(20) 107 | ) 108 | 109 | base <- tbl %>% 110 | ggplot(aes(x, y, size = z)) + 111 | geom_point() 112 | 113 | base 114 | 115 | base + scale_size(trans = "reverse") 116 | ``` 117 | 118 | ## 15.7 Legend merging and splitting 119 | 120 | ### 15.7.1 Merging legends 121 | 122 | ```{r} 123 | toy <- tibble( 124 | const = 1, 125 | up = 1:4, 126 | txt = letters[1:4], 127 | big = (1:4) * 1000, 128 | log = c(2, 5, 10, 2000) 129 | ) 130 | 131 | p <- toy %>% 132 | ggplot(aes(up, up)) 133 | 134 | p + 135 | geom_point(size = 4, color = "grey20") + 136 | geom_point(aes(color = txt), size = 2) 137 | 138 | p + 139 | geom_point( 140 | size = 4, color = "grey20", show.legend = TRUE 141 | ) + 142 | geom_point(aes(color = txt), size = 2) 143 | ``` 144 | 145 | ```{r} 146 | base <- toy %>% 147 | ggplot(aes(const, up)) + 148 | scale_x_continuous(NULL, breaks = NULL) 149 | 150 | base + 151 | geom_point(aes(color = txt)) 152 | 153 | base + 154 | geom_point(aes(shape = txt)) 155 | 156 | base + 157 | geom_point(aes(shape = txt, color = txt)) 158 | ``` 159 | 160 | ```{r} 161 | base <- toy %>% 162 | ggplot(aes(const, up)) + 163 | geom_point(aes(shape = txt, color = txt)) + 164 | scale_x_continuous(NULL, breaks = NULL) 165 | 166 | base 167 | 168 | base + 169 | labs(shape = "Split legend") 170 | 171 | base + 172 | labs( 173 | shape = "Merged legend", 174 | color = "Merged legend" 175 | ) 176 | ``` 177 | 178 | ### 15.7.2 Splitting legends 179 | 180 | ```{r} 181 | library(ggnewscale) 182 | 183 | base <- mpg %>% 184 | ggplot(aes(displ, hwy)) + 185 | geom_point(aes(color = factor(year)), size = 5) + 186 | scale_color_brewer( 187 | "year", 188 | type = "qual", 189 | palette = 5 190 | ) 191 | 192 | base 193 | 194 | base + 195 | new_scale_color() + 196 | geom_point( 197 | aes(color = cyl == 4), 198 | size = 1, 199 | fill = NA 200 | ) + 201 | scale_color_manual( 202 | "4 cylinder", 203 | values = c("grey60", "black") 204 | ) 205 | ``` 206 | 207 | ## 15.8 Legend key glyphs 208 | 209 | ```{r} 210 | base <- economics %>% 211 | ggplot(aes(date, psavert, color = "savings")) 212 | 213 | base + 214 | geom_line() 215 | 216 | base + 217 | geom_line(key_glyph = "timeseries") 218 | 219 | # Alternatively 220 | base + 221 | geom_line(key_glyph = draw_key_timeseries) 222 | ``` 223 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_16_coordinate_system.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 16 - Coordinate systems" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-18" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 16 Coordinate systems 13 | 14 | ```{r} 15 | library(tidyverse) 16 | ``` 17 | 18 | ## 16.1 Linear coordinate systems 19 | 20 | ### 16.1.1 Zooming into a plot with coord_cartesian() 21 | 22 | ```{r} 23 | base <- mpg %>% 24 | ggplot(aes(displ, hwy)) + 25 | geom_point() + 26 | geom_smooth() 27 | 28 | # Full dataset 29 | base 30 | 31 | # Scaling to 4--6 throws away data outside that range 32 | base + 33 | scale_x_continuous(limits = c(4, 6)) 34 | 35 | # Zooming to 4--6 keeps all the data but only shows some of it 36 | base + 37 | coord_cartesian(xlim = c(4, 6)) 38 | ``` 39 | 40 | ### 16.1.2 Flipping the axes with coord_flip() 41 | 42 | ```{r} 43 | p <- mpg %>% 44 | ggplot(aes(displ, cty)) + 45 | geom_point() + 46 | geom_smooth() 47 | 48 | p 49 | 50 | # Exchanging cty and displ rotates the plot 90 degrees, # but the smooth is fit to the rotated data. 51 | mpg %>% 52 | ggplot(aes(cty, displ)) + 53 | geom_point() + 54 | geom_smooth() 55 | 56 | # coord_flip() fits the smooth to the original data, and then rotates the output 57 | p + 58 | coord_flip() 59 | ``` 60 | 61 | ## 16.2 Non-linear coordinate systems 62 | 63 | ```{r} 64 | rect <- tibble(x = 50, y = 50) 65 | line <- tibble(x = c(1, 200), y = c(100, 1)) 66 | 67 | base <- ggplot(mapping = aes(x, y)) + 68 | geom_tile( 69 | data = rect, 70 | aes(width = 50, height = 50) 71 | ) + 72 | geom_line(data = line) + 73 | labs( 74 | x = NULL, 75 | y = NULL 76 | ) 77 | 78 | base + 79 | labs(title = "base") 80 | 81 | base + 82 | coord_polar("x") + 83 | labs(title = 'coord_polar("x")') 84 | 85 | base + 86 | coord_polar("y") + 87 | labs(title = 'coord_polar("y")') 88 | 89 | base + 90 | coord_flip() + 91 | labs(title = "coord_flip") 92 | 93 | base + 94 | coord_trans(y = "log10") + 95 | labs(title = 'coord_trans(y = "log10")') 96 | 97 | base + 98 | coord_fixed() + 99 | labs(title = "coord_fixed()") 100 | ``` 101 | 102 | Munching: 103 | 104 | 1. We start with a line parameterised by its two endpoints: 105 | ```{r} 106 | tbl <- tibble( 107 | r = c(0, 1), 108 | theta = c(0, 3 / 2 * pi) 109 | ) 110 | 111 | tbl %>% 112 | ggplot(aes(r, theta)) + 113 | geom_line() + 114 | geom_point(size = 2, color = "red") 115 | ``` 116 | 117 | 2. We break it into multiple line segments, each with two endpoints: 118 | ```{r} 119 | interp <- function(rng, n) { 120 | seq(rng[1], rng[2], length = n) 121 | } 122 | 123 | munched <- tibble( 124 | r = interp(tbl$r, 15), 125 | theta = interp(tbl$theta, 15) 126 | ) 127 | 128 | munched %>% 129 | ggplot(aes(r, theta)) + 130 | geom_line() + 131 | geom_point(size = 2, color = "red") 132 | ``` 133 | 134 | 3. We transform the locations of each piece: 135 | ```{r} 136 | transformed <- transform( 137 | munched, 138 | x = r * sin(theta), 139 | y = r * cos(theta) 140 | ) 141 | 142 | transformed %>% 143 | ggplot(aes(x, y)) + 144 | geom_path() + 145 | geom_point(size = 2, color = "red") + 146 | coord_fixed() 147 | ``` 148 | 149 | ### 16.2.1 Transformations with coord_trans() 150 | 151 | ```{r, warning=FALSE} 152 | library(scales) 153 | 154 | # Linear model on original scale is poor fit 155 | base <- diamonds %>% 156 | ggplot(aes(carat, price)) + 157 | stat_bin2d() + 158 | geom_smooth(method = "lm") + 159 | labs( 160 | x = NULL, 161 | y = NULL 162 | ) + 163 | theme(legend.position = "none") 164 | 165 | base 166 | 167 | # Better fit on log scale, but harder to interpret 168 | p <- base + 169 | scale_x_log10() + 170 | scale_y_log10() 171 | 172 | p 173 | 174 | # Fit on log scale, then backtransform to original. 175 | # Highlights lack of expensive diamonds with large 176 | # carats 177 | pow10 <- exp_trans(10) 178 | 179 | p + 180 | coord_trans(x = pow10, y = pow10) 181 | ``` 182 | 183 | ### 16.2.2 Polar coordinates with coord_polar() 184 | 185 | ```{r} 186 | base <- mtcars %>% 187 | ggplot(aes(factor(1), fill = factor(cyl))) + 188 | geom_bar(width = 1) + 189 | theme(legend.position = "none") + 190 | scale_x_discrete(NULL, expand = c(0, 0)) + 191 | scale_y_continuous(NULL, expand = c(0, 0)) 192 | 193 | # Stacked barchart 194 | base 195 | 196 | # Pie chart 197 | base + 198 | coord_polar(theta = "y") 199 | 200 | # The bullseye chart 201 | base + 202 | coord_polar() 203 | ``` 204 | 205 | ### 16.2.3 Map Projections with coord_map() 206 | 207 | ```{r} 208 | # Prepare a map of NZ 209 | nzmap <- map_data("nz") %>% 210 | ggplot(aes(long, lat, group = group)) + 211 | geom_polygon(fill = "white", color = "black") + 212 | labs( 213 | x = NULL, 214 | y = NULL 215 | ) 216 | 217 | # Plot it in cartesian coordinates 218 | nzmap 219 | 220 | # With the aspect ratio approximation 221 | nzmap + 222 | # coord_quickmap -> coord_sf 223 | coord_sf() 224 | ``` 225 | 226 | ```{r} 227 | # Since coord_map() has been superseded by coord_sf, I've modified this code to use that instead 228 | world <- map_data("world") 229 | finland <- map_data("world", "finland") 230 | 231 | worldmap <- world %>% 232 | ggplot(aes(long, lat, group = group)) 233 | 234 | worldmap + 235 | geom_path() + 236 | geom_path(data = finland, color = "blue") + 237 | # coord_map -> coord_sf 238 | coord_sf() 239 | 240 | worldmap + 241 | geom_polygon(fill = "white", color = "black") + 242 | geom_polygon(data = finland, fill = "blue") + 243 | # coord_map -> coord_sf 244 | coord_sf() 245 | ``` 246 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_17_faceting.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 17 - Faceting" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-18" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 17 Faceting 13 | 14 | ```{r} 15 | library(tidyverse) 16 | library(conflicted) 17 | conflict_prefer("filter", "dplyr") 18 | conflict_prefer("select", "dplyr") 19 | ``` 20 | 21 | ```{r} 22 | mpg2 <- mpg %>% 23 | # subset -> filter 24 | filter(cyl != 5 & drv %in% c("4", "f") & class != "2seater") 25 | ``` 26 | 27 | ## 17.1 Facet wrap 28 | 29 | ```{r} 30 | base <- mpg2 %>% 31 | ggplot(aes(displ, hwy)) + 32 | geom_blank() + 33 | labs( 34 | x = NULL, 35 | y = NULL 36 | ) 37 | 38 | base + 39 | facet_wrap( 40 | vars(class), 41 | ncol = 3 42 | ) 43 | 44 | base + facet_wrap( 45 | vars(class), 46 | ncol = 3, 47 | as.table = FALSE 48 | ) 49 | 50 | base + 51 | facet_wrap( 52 | vars(class), 53 | nrow = 3 54 | ) 55 | 56 | base + 57 | facet_wrap( 58 | vars(class), 59 | nrow = 3, 60 | dir = "v" 61 | ) 62 | ``` 63 | 64 | ## 17.2 Facet grid 65 | 66 | ```{r} 67 | base + 68 | facet_grid(cols = vars(cyl)) 69 | 70 | base + 71 | facet_grid(rows = vars(drv)) 72 | 73 | base + 74 | facet_grid(vars(drv), vars(cyl)) 75 | ``` 76 | 77 | ## 17.3 Controlling scales 78 | 79 | ```{r} 80 | p <- mpg2 %>% 81 | ggplot(aes(cty, hwy)) + 82 | geom_abline() + 83 | geom_jitter(width = 0.1, height = 0.1) 84 | 85 | p + 86 | facet_wrap(vars(cyl)) 87 | 88 | p + 89 | facet_wrap(vars(cyl), scales = "free") 90 | ``` 91 | 92 | ```{r} 93 | economics_long 94 | 95 | economics_long %>% 96 | ggplot(aes(date, value)) + 97 | geom_line() + 98 | facet_wrap( 99 | vars(variable), 100 | scales = "free_y", 101 | ncol = 1 102 | ) 103 | ``` 104 | 105 | ```{r} 106 | mpg3 <- mpg2 %>% 107 | mutate( 108 | model = as_factor(model), 109 | manufacturer = as_factor(manufacturer) 110 | ) 111 | 112 | mpg3 %>% 113 | ggplot(aes(cty, fct_reorder(model, cty))) + 114 | geom_point() + 115 | facet_grid( 116 | rows = vars( 117 | fct_reorder( 118 | manufacturer, 119 | cty, 120 | .desc = TRUE 121 | ) 122 | ), 123 | scales = "free", 124 | space = "free" 125 | ) + 126 | theme(strip.text.y = element_text(angle = 0)) + 127 | labs(y = "model") 128 | ``` 129 | 130 | ## 17.4 Missing faceting variables 131 | 132 | ```{r} 133 | tbl1 <- tibble( 134 | x = 1:3, 135 | y = 1:3, 136 | gender = c("f", "f", "m") 137 | ) 138 | 139 | tbl2 <- tibble(x = 2, y = 2) 140 | 141 | tbl1 %>% 142 | ggplot(aes(x, y)) + 143 | geom_point( 144 | data = tbl2, 145 | color = "red", 146 | size = 2 147 | ) + 148 | geom_point() + 149 | facet_wrap(vars(gender)) 150 | ``` 151 | 152 | ## 17.5 Grouping vs. faceting 153 | 154 | ```{r} 155 | tbl <- data.frame( 156 | x = rnorm(120, c(0, 2, 4)), 157 | y = rnorm(120, c(1, 2, 1)), 158 | z = letters[1:3] 159 | ) %>% as_tibble() 160 | 161 | # Grouping 162 | p <- tbl %>% 163 | ggplot(aes(x, y)) 164 | 165 | p + 166 | geom_point(aes(color = z)) + 167 | labs(title = "Grouping") 168 | 169 | # Faceting 170 | p + 171 | geom_point() + 172 | facet_wrap(vars(z)) + 173 | labs(title = "Faceting") 174 | 175 | # Faceting with means 176 | tbl_sum <- tbl %>% 177 | summarize( 178 | x = mean(x), 179 | y = mean(y), 180 | .by = z 181 | ) %>% 182 | rename(z2 = z) 183 | 184 | p + 185 | geom_point() + 186 | geom_point( 187 | data = tbl_sum, 188 | aes(color = z2), 189 | size = 4 190 | ) + 191 | facet_wrap(vars(z)) + 192 | labs(title = "Faceting with Means") 193 | 194 | # Grouping + faceting 195 | tbl3 <- tbl %>% 196 | select(-z) 197 | 198 | p + 199 | geom_point(data = tbl3, color = "grey70") + 200 | geom_point(aes(color = z)) + 201 | facet_wrap(vars(z)) + 202 | labs(title = "Grouping + Faceting") 203 | ``` 204 | 205 | ## 17.6 Continuous variables 206 | 207 | ```{r} 208 | mpg4 <- mpg2 %>% 209 | mutate( 210 | # Bins of width 1 211 | disp_w = displ %>% cut_width(1), 212 | # Six bins of equal length 213 | disp_i = displ %>% cut_interval(6), 214 | # Six bins containing equal numbers of points 215 | disp_n = displ %>% cut_number(6) 216 | ) 217 | 218 | plot <- mpg4 %>% 219 | ggplot(aes(cty, hwy)) + 220 | geom_point() + 221 | labs( 222 | x = NULL, 223 | y = NULL 224 | ) 225 | 226 | plot + 227 | facet_wrap(vars(disp_w), nrow = 1) 228 | 229 | plot + 230 | facet_wrap(vars(disp_i), nrow = 1) 231 | 232 | plot + 233 | facet_wrap(vars(disp_n), nrow = 1) 234 | ``` 235 | 236 | ## 17.7 Exercises 237 | 238 | 1. 239 | ```{r} 240 | carats <- diamonds %>% 241 | mutate(carat = carat %>% as.integer()) 242 | 243 | carats %>% 244 | ggplot( 245 | aes( 246 | price, 247 | color = as.factor(carat) 248 | ) 249 | ) + 250 | geom_freqpoly() + 251 | facet_wrap(vars(cut), scales = "free_y") + 252 | labs(color = "carat") 253 | 254 | carats %>% 255 | ggplot(aes(price, color = cut)) + 256 | geom_freqpoly() + 257 | facet_wrap(vars(carat), scales = "free_y") 258 | ``` 259 | 260 | 2. 261 | ```{r} 262 | p <- diamonds %>% 263 | ggplot(aes(carat, price, color = color)) + 264 | geom_point() 265 | 266 | p 267 | 268 | p + 269 | facet_wrap(vars(color)) 270 | 271 | diamonds2 <- diamonds %>% 272 | select(-color) 273 | 274 | diamonds %>% 275 | ggplot(aes(carat, price)) + 276 | geom_point(data = diamonds2, color = "grey70") + 277 | geom_point(aes(color = color)) + 278 | facet_wrap(vars(color)) 279 | ``` 280 | 281 | 4. 282 | ```{r} 283 | mpg2 %>% 284 | ggplot(aes(displ, hwy)) + 285 | geom_point() + 286 | geom_smooth( 287 | data = mpg2 %>% select(-class), 288 | se = FALSE 289 | ) + 290 | facet_wrap(vars(class)) 291 | ``` 292 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_19_programming_with_ggplot2.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 19 - Programming with ggplot2" 3 | author: "Original Code: Hadley Wickham | Modifications: Antti Rask" 4 | date: "2023-01-21" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 19 Programming with ggplot2 13 | 14 | ```{r} 15 | library(tidyverse) 16 | ``` 17 | 18 | ## 19.2 Single components 19 | 20 | ```{r} 21 | bestfit <- geom_smooth( 22 | method = "lm", 23 | se = FALSE, 24 | color = alpha("steelblue", 0.5), 25 | linewidth = 2 26 | ) 27 | 28 | mpg %>% 29 | ggplot(aes(cty, hwy)) + 30 | geom_point() + 31 | bestfit 32 | 33 | mpg %>% 34 | ggplot(aes(displ, hwy)) + 35 | geom_point() + 36 | bestfit 37 | ``` 38 | 39 | ```{r} 40 | geom_lm <- function( 41 | formula = y ~ x, 42 | color = alpha("steelblue", 0.5), 43 | linewidth = 2, 44 | ... 45 | ) { 46 | geom_smooth( 47 | formula = formula, 48 | se = FALSE, 49 | method = "lm", 50 | color = color, 51 | linewidth = linewidth, 52 | ... 53 | ) 54 | } 55 | 56 | p <- mpg %>% 57 | ggplot(aes(displ, 1 / hwy)) + 58 | geom_point() 59 | 60 | p + 61 | geom_lm() 62 | 63 | p + 64 | geom_lm( 65 | y ~ poly(x, 2), 66 | linewidth = 1, 67 | color = "red" 68 | ) 69 | ``` 70 | 71 | ### 19.2.1 Exercises 72 | 73 | 1. 74 | ```{r} 75 | histogram_pink_100_bins <- geom_histogram(bins = 100, fill = "pink") 76 | 77 | mpg %>% 78 | ggplot(aes(cty)) + 79 | histogram_pink_100_bins 80 | ``` 81 | 82 | 2. 83 | ```{r} 84 | brewer_fill_blues <- scale_fill_brewer(palette = "Blues") 85 | 86 | mpg %>% 87 | ggplot(aes(factor(manufacturer), fill = factor(drv))) + 88 | geom_bar(position = "stack") + 89 | brewer_fill_blues + 90 | coord_flip() 91 | ``` 92 | 93 | ## 19.3 Multiple components 94 | 95 | ```{r} 96 | geom_mean <- function() { 97 | list( 98 | stat_summary( 99 | fun = "mean", 100 | geom = "bar", 101 | fill = "grey70" 102 | ), 103 | stat_summary( 104 | fun.data = "mean_cl_normal", 105 | geom = "errorbar", 106 | width = 0.4 107 | ) 108 | ) 109 | } 110 | 111 | mpg %>% 112 | ggplot(aes(class, cty)) + 113 | geom_mean() 114 | 115 | mpg %>% 116 | ggplot(aes(drv, cty)) + 117 | geom_mean() 118 | ``` 119 | 120 | ```{r} 121 | geom_mean <- function(se = TRUE) { 122 | list( 123 | stat_summary( 124 | fun = "mean", 125 | geom = "bar", 126 | fill = "grey70" 127 | ), 128 | if (se) { 129 | stat_summary( 130 | fun.data = "mean_cl_normal", 131 | geom = "errorbar", 132 | width = 0.4 133 | ) 134 | } 135 | ) 136 | } 137 | 138 | p <- mpg %>% 139 | ggplot(aes(drv, cty)) 140 | 141 | p + 142 | geom_mean() 143 | 144 | p + 145 | geom_mean(se = FALSE) 146 | ``` 147 | 148 | ### 19.3.2 Annotation 149 | 150 | ```{r} 151 | # borders -> borders2 152 | borders2 <- function( 153 | database = "world", 154 | regions = ".", 155 | fill = NA, 156 | color = "grey50", 157 | ... 158 | ) { 159 | 160 | tbl <- map_data(database, regions) 161 | 162 | geom_polygon( 163 | # aes_ -> aes + ~lat -> long + ~long -> lat + ~group -> group 164 | aes(long, lat, group = group), 165 | data = tbl, 166 | fill = fill, 167 | color = color, 168 | ..., 169 | inherit.aes = FALSE, 170 | show.legend = FALSE 171 | ) 172 | } 173 | 174 | # These weren't part of the original code, but I wanted to see what the function above would look like 175 | p <- ggplot() + 176 | borders2(fill = "blue") 177 | 178 | p 179 | 180 | p + 181 | coord_sf() 182 | ``` 183 | 184 | ### 19.3.3 Additional arguments 185 | 186 | ```{r} 187 | geom_mean <- function(..., bar.params = list(), errorbar.params = list()) { 188 | params <- list(...) 189 | bar.params <- modifyList(params, bar.params) 190 | errorbar.params <- modifyList(params, errorbar.params) 191 | 192 | bar <- do.call( 193 | "stat_summary", 194 | modifyList( 195 | list( 196 | fun = "mean", 197 | geom = "bar", 198 | fill = "grey70" 199 | ), 200 | bar.params 201 | ) 202 | ) 203 | 204 | errorbar <- do.call( 205 | "stat_summary", 206 | modifyList( 207 | list( 208 | fun.data = "mean_cl_normal", 209 | geom = "errorbar", 210 | width = 0.4 211 | ), 212 | errorbar.params 213 | ) 214 | ) 215 | 216 | list(bar, errorbar) 217 | } 218 | 219 | mpg %>% 220 | ggplot(aes(class, cty)) + 221 | geom_mean( 222 | color = "steelblue", 223 | errorbar.params = list(width = 0.5, linewidth = 1) 224 | ) 225 | 226 | mpg %>% 227 | ggplot(aes(class, cty)) + 228 | geom_mean( 229 | bar.params = list(fill = "steelblue"), 230 | errorbar.params = list(color = "blue") 231 | ) 232 | ``` 233 | 234 | ## 19.4. Plot functions 235 | 236 | ```{r} 237 | piechart <- function(data, mapping) { 238 | ggplot(data, mapping) + 239 | geom_bar(width = 1) + 240 | coord_polar(theta = "y") + 241 | labs( 242 | x = NULL, 243 | y = NULL 244 | ) 245 | } 246 | 247 | mpg %>% 248 | piechart(aes(factor(1), fill = class)) 249 | ``` 250 | 251 | ```{r} 252 | pcp_data <- function(tbl) { 253 | # vapply -> map_vec + .progress = TRUE + .ptype = logical(1) 254 | is_numeric <- map_vec(tbl, is.numeric, .progress = TRUE, .ptype = logical(1)) 255 | 256 | # Rescale numeric columns 257 | rescale01 <- function(x) { 258 | rng <- range(x, na.rm = TRUE) 259 | (x - rng[1]) / (rng[2] - rng[1]) 260 | } 261 | # lapply -> map 262 | tbl[is_numeric] <- map(tbl[is_numeric], rescale01) 263 | 264 | # Add row identifier 265 | tbl$.row <- rownames(tbl) 266 | 267 | # Treat numerics as value (aka measure) variables 268 | # gather_ -> pivot_longer + cols = names(tbl)[is_numeric] + names_to = "variable" + values_to = "value" 269 | pivot_longer(tbl, cols = names(tbl)[is_numeric], names_to = "variable", values_to = "value") 270 | } 271 | 272 | pcp <- function(tbl, ...) { 273 | tbl <- pcp_data(tbl) 274 | tbl %>% 275 | ggplot(aes(variable, value, group = .row)) + 276 | geom_line(...) 277 | } 278 | 279 | mpg %>% 280 | pcp() 281 | 282 | mpg %>% 283 | pcp(aes(color = drv)) 284 | ``` 285 | 286 | ### 19.4.1 Indirectly referring to variables 287 | 288 | ```{r} 289 | my_function <- function(x_var) { 290 | aes(x = x_var) 291 | } 292 | 293 | my_function(abc) 294 | ``` 295 | 296 | ```{r} 297 | my_function <- function(x_var) { 298 | aes(x = {{ x_var }}) 299 | } 300 | my_function(abc) 301 | ``` 302 | 303 | 304 | 305 | ```{r} 306 | piechart1 <- function(data, var) { 307 | # aes_(~factor(1) -> aes(factor(1) + as.name(var) -> .data[[var]] 308 | piechart(data, aes(factor(1), fill = .data[[var]])) 309 | } 310 | 311 | mpg %>% 312 | piechart1("class") 313 | 314 | piechart2 <- function(data, var) { 315 | # aes_(~factor(1) -> aes(factor(1) + as.name(var) -> {{ var }} 316 | piechart(data, aes(factor(1), fill = {{ var }})) 317 | } 318 | 319 | mpg %>% 320 | piechart2(class) 321 | ``` 322 | 323 | ## 19.5 Functional programming 324 | 325 | ```{r} 326 | geoms <- list( 327 | geom_point(), 328 | geom_boxplot(aes(group = cut_width(displ, 1))), 329 | list(geom_point(), geom_smooth()) 330 | ) 331 | 332 | p <- mpg %>% 333 | ggplot(aes(displ, hwy)) 334 | 335 | # lapply -> map 336 | map( 337 | geoms, 338 | function(g) { 339 | p + g 340 | } 341 | ) 342 | ``` 343 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/chapter_20_internals_of_ggplot2.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wickham, Hadley - ggplot2: Elegant Graphics for Data Analysis (3rd ed.), Chapter 20 - Internals of ggplot2" 3 | author: "Antti Rask" 4 | date: "2023-02-13" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | # 20 Internals of ggplot2 13 | 14 | ```{r} 15 | library(tidyverse) 16 | ``` 17 | 18 | ## 20.1 The plot() method 19 | 20 | ```{r} 21 | library(grid) 22 | 23 | p <- mpg %>% 24 | ggplot(aes(displ, hwy, color = drv)) + 25 | geom_point(position = "jitter") + 26 | geom_smooth(method = "lm", formula = y ~ x) + 27 | facet_wrap(vars(year)) + 28 | ggtitle("A plot for expository purposes") 29 | 30 | ggprint <- function(x) { 31 | data <- ggplot_build(x) 32 | gtable <- ggplot_gtable(data) 33 | grid.newpage() 34 | grid.draw(gtable) 35 | return(invisible(x)) 36 | } 37 | 38 | ggprint(p) 39 | ``` 40 | 41 | ## 20.3 The gtable step 42 | 43 | ### 20.3.4 Output 44 | 45 | ```{r} 46 | p_built <- ggplot_build(p) 47 | p_gtable <- ggplot_gtable(p_built) 48 | 49 | class(p_gtable) 50 | 51 | p_gtable 52 | ``` 53 | 54 | ```{r} 55 | library(grid) 56 | 57 | grid.newpage() 58 | grid.draw(p_gtable) 59 | ``` 60 | 61 | ## 20.4 Introducing ggproto 62 | 63 | ### 20.4.1 ggproto objects 64 | 65 | ```{r} 66 | NewObject <- ggproto( 67 | `_class` = NULL, 68 | `_inherits` = NULL 69 | ) 70 | 71 | NewObject <- ggproto(NULL, NULL) 72 | 73 | NewObject 74 | ``` 75 | 76 | ### 20.4.2 Creating new classes 77 | 78 | ```{r} 79 | NewClass <- ggproto("NewClass", NULL) 80 | 81 | NewClass 82 | ``` 83 | 84 | ```{r} 85 | Person <- ggproto("Person", NULL, 86 | 87 | # fields 88 | given_name = NA, 89 | family_name = NA, 90 | birth_date = NA, 91 | 92 | # methods 93 | full_name = function(self, family_last = TRUE) { 94 | if(family_last == TRUE) { 95 | return(paste(self$given_name, self$family_name)) 96 | } 97 | return(paste(self$family_name, self$given_name)) 98 | }, 99 | age = function(self) { 100 | days_old <- Sys.Date() - self$birth_date 101 | floor(as.integer(days_old) / 365.25) 102 | }, 103 | description = function(self) { 104 | paste(self$full_name(), "is", self$age(), "years old") 105 | } 106 | ) 107 | 108 | Person 109 | 110 | Person$full_name 111 | ``` 112 | 113 | ### 20.4.3 Creating new instances 114 | 115 | ```{r} 116 | Thomas <- ggproto(NULL, Person, 117 | given_name = "Thomas Lin", 118 | family_name = "Pedersen", 119 | birth_date = as.Date("1985/10/12") 120 | ) 121 | 122 | Danielle <- ggproto(NULL, Person, 123 | given_name = "Danielle Jasmine", 124 | family_name = "Navarro", 125 | birth_date = as.Date("1977/09/12") 126 | ) 127 | 128 | Thomas$description() 129 | Danielle$description() 130 | ``` 131 | 132 | ### 20.4.4 Creating subclasses 133 | 134 | ```{r} 135 | # define the subclass 136 | NewSubClass <- ggproto("NewSubClass", Person) 137 | 138 | # verify that this works 139 | NewSubClass 140 | ``` 141 | 142 | ```{r} 143 | Royalty <- ggproto( 144 | "Royalty", 145 | Person, 146 | rank = NA, 147 | territory = NA, 148 | full_name = function(self) { 149 | paste( 150 | self$rank, 151 | self$given_name, 152 | "of", 153 | self$territory 154 | ) 155 | } 156 | ) 157 | 158 | Victoria <- ggproto(NULL, Royalty, 159 | given_name = "Victoria", 160 | family_name = "Hanover", 161 | rank = "Queen", 162 | territory = "the United Kingdom", 163 | birth_date = as.Date("1819/05/24") 164 | ) 165 | 166 | Victoria$full_name() 167 | 168 | Victoria$description() 169 | ``` 170 | 171 | ```{r} 172 | Police <- ggproto( 173 | "Police", 174 | Person, 175 | rank = NA, 176 | description = function(self) { 177 | paste( 178 | self$rank, 179 | ggproto_parent(Person, self)$description() 180 | ) 181 | } 182 | ) 183 | 184 | John <- ggproto( 185 | NULL, 186 | Police, 187 | given_name = "John", 188 | family_name = "McClane", 189 | rank = "Detective", 190 | birth_date = as.Date("1955/03/19") 191 | ) 192 | 193 | John$full_name() 194 | John$description() 195 | ``` 196 | 197 | ### 20.4.5 Style guide for ggproto 198 | 199 | ```{r} 200 | GeomErrorbar <- ggproto( 201 | # ... 202 | setup_params = function(data, params) { 203 | GeomLinerange$setup_params(data, params) 204 | } 205 | # ... 206 | ) 207 | 208 | GeomErrorbar 209 | ``` 210 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/img/gghighlight_and_ggthemes_with_logo.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/img/gghighlight_and_ggthemes_with_logo.jpg -------------------------------------------------------------------------------- /ggplot2_3rd_ed/img/plot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/img/plot.png -------------------------------------------------------------------------------- /ggplot2_3rd_ed/img/plot_with_logo.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/img/plot_with_logo.jpg -------------------------------------------------------------------------------- /ggplot2_3rd_ed/img/plot_with_theme_and_logo.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/img/plot_with_theme_and_logo.jpg -------------------------------------------------------------------------------- /ggplot2_3rd_ed/img/satellite_imagery_1.tif: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:f1d052db1c8646c370f3ef2e57cb6f20d7aa171372f7404d48f30149a9fbae26 3 | size 189918256 4 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/img/satellite_imagery_1.tif.aux.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | red 4 | 5 | 255 6 | -9999 7 | 0 8 | -9999 9 | 10 | 11 | 12 | green 13 | 14 | 255 15 | -9999 16 | 0 17 | -9999 18 | 19 | 20 | 21 | blue 22 | 23 | 255 24 | -9999 25 | 0 26 | -9999 27 | 28 | 29 | 30 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/img/youcanbeapirate-wb-sparkline.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/img/youcanbeapirate-wb-sparkline.jpg -------------------------------------------------------------------------------- /ggplot2_3rd_ed/output/last_animation.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/output/last_animation.gif -------------------------------------------------------------------------------- /ggplot2_3rd_ed/output/last_animation.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/output/last_animation.mp4 -------------------------------------------------------------------------------- /ggplot2_3rd_ed/output/last_animation_2.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/output/last_animation_2.gif -------------------------------------------------------------------------------- /ggplot2_3rd_ed/output/output.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/output/output.pdf -------------------------------------------------------------------------------- /ggplot2_3rd_ed/output/output2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/output/output2.pdf -------------------------------------------------------------------------------- /ggplot2_3rd_ed/output/plot.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/output/plot.pdf -------------------------------------------------------------------------------- /ggplot2_3rd_ed/output/plot.rds: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/output/plot.rds -------------------------------------------------------------------------------- /ggplot2_3rd_ed/output/plot_with_theme.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/output/plot_with_theme.pdf -------------------------------------------------------------------------------- /ggplot2_3rd_ed/raster/IDE00420.202302122200.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/raster/IDE00420.202302122200.tif -------------------------------------------------------------------------------- /ggplot2_3rd_ed/raster/IDE00421.202302122200.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/raster/IDE00421.202302122200.tif -------------------------------------------------------------------------------- /ggplot2_3rd_ed/raster/IDE00422.202302122200.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/raster/IDE00422.202302122200.tif -------------------------------------------------------------------------------- /ggplot2_3rd_ed/raster/IDE00423.202302122200.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/raster/IDE00423.202302122200.tif -------------------------------------------------------------------------------- /ggplot2_3rd_ed/raster/IDE00425.202302122200.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/raster/IDE00425.202302122200.tif -------------------------------------------------------------------------------- /ggplot2_3rd_ed/raster/IDE00426.202302122200.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/raster/IDE00426.202302122200.tif -------------------------------------------------------------------------------- /ggplot2_3rd_ed/raster/IDE00427.202302122200.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/raster/IDE00427.202302122200.tif -------------------------------------------------------------------------------- /ggplot2_3rd_ed/raster/IDE00430.202302122200.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/ggplot2_3rd_ed/raster/IDE00430.202302122200.tif -------------------------------------------------------------------------------- /ggplot2_3rd_ed/raster/IDE00431.202302122200.tif: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:f53740858a7f2b6ef438061b5e51f093f3195702a7766faf01b7518b34d85a08 3 | size 2670478 4 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/raster/IDE00432.202302122200.tif: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:ca3556bdb874929e04d9efd036a11f0f14d203a71217d167efc2d755f6964a92 3 | size 3693444 4 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/raster/IDE00433.202302122200.tif: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:9a926891510dbd003a456e7ed940d598b02c9d111b8b9eebb4c60983dcd3d4b9 3 | size 4802629 4 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/raster/IDE00435.202302122200.tif: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:b0626c249393c348aa21ac407a7f03f76584d587ef29141456ba3572a40eb0eb 3 | size 14881590 4 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/raster/IDE00436.202302122200.tif: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:dd7893361e31e82f68bf7aff3008177b03846e6a4ca9a27189ff57dd1a101179 3 | size 4541947 4 | -------------------------------------------------------------------------------- /ggplot2_3rd_ed/raster/IDE00437.202302122200.tif: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:41ffd1257c887c621edacf503956418ca5b2a0d3cf24c44d27e310126dc4605e 3 | size 3296839 4 | -------------------------------------------------------------------------------- /learning-ggplot2.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 4 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | -------------------------------------------------------------------------------- /real_life_use_cases/ggplot2_dependencies_visualized/README.md: -------------------------------------------------------------------------------- 1 | ![](../../ggplot2_2nd_ed/img/youcanbeapirate-wb-sparkline.jpg) 2 | 3 | # Number of Packages on CRAN depending on, importing, or suggesting {ggplot2} 4 | 5 | __Georgios Karamanis__ shared their recent __Tidy Tuesday__ visualization on [LinkedIn](https://www.linkedin.com/posts/georgios-karamanis-a54926153_tidytuesday-rstats-dataviz-activity-7111680233430224896-EdtA/). 6 | 7 | Here's a link to the [GitHub repo](https://github.com/gkaramanis/tidytuesday/tree/master/2023/2023-week_38). 8 | 9 | The thing is, Georgios' original graph shows the years for the LATEST release of the ggplot2-related packages on CRAN. While that is an interesting question, I've been trying to find out a good data source for another question: what are the years for the initial releases of those packages. That question led me down a rabbit hole and I eventually found __{pkgsearch}__. 10 | 11 | So what I did was take Georgios' original code (with their blessing) and 12 | 13 | 1. change the data source (using __{pkgsearch}__) 14 | 2. use __{purrr}__ to easily get all the ggplot2-related packages' metadata 15 | 3. bring in a third type/category 16 | 4. make the stream chart less 'wavy' 17 | 5. change the color scheme 18 | 6. change the fonts to _Roboto Mono_ (using __{showtext}__) 19 | 7. annotate all the major __{ggplot2}__ releases 20 | 8. other, smaller changes 21 | 22 | I'm thankful to Georgios for their support and continuing inspiration. If you're interested in data visualization, they are one of the people to follow. Just follow those links I listed earlier. 23 | 24 | One more thing, this visualization is part of the background work I'm doing for my upcoming book about ggplot2 extension packages called _'ggplot2 extended'_ (working title). If you are interested in seeing how that project advances, you can start by following me on LinkedIn. I'm also happy to have conversations about the different ggplot2 extensions, if you have strong opinions and/or knowledge about them. So, don't hesitate to DM me on LinkedIn! Do mention it's about ggplot2 and the response rate will be significantly higher... 25 | -------------------------------------------------------------------------------- /real_life_use_cases/ggplot2_dependencies_visualized/pkgsearch.R: -------------------------------------------------------------------------------- 1 | # Number of Packages on CRAN depending on, importing, or suggesting {ggplot2} # 2 | 3 | # Inspiration ---- 4 | 5 | # Georgios Karamanis shared their recent Tidy Tuesday visualization on LinkedIn: 6 | # https://www.linkedin.com/posts/georgios-karamanis-a54926153_tidytuesday-rstats-dataviz-activity-7111680233430224896-EdtA/ 7 | 8 | # Here's a link to the GitHub repo: 9 | # https://github.com/gkaramanis/tidytuesday/tree/master/2023/2023-week_38 10 | 11 | # The thing is, Georgios' original graph shows the years for the LATEST release of the ggplot2-related 12 | # packages on CRAN. While that is an interesting question, I've been trying to find out a good data 13 | # source for another question: what are the years for the initial releases of those packages. That 14 | # question led me down a rabbit hole and I eventually found {pkgsearch}. 15 | 16 | # So what I did was take Georgios' original code (with their blessing) and 17 | # 1) change the data source (using {pkgsearch}) 18 | # 2) use {purrr} to easily get all the ggplot2-related packages' metadata 19 | # 3) bring in a third type/category 20 | # 4) make the stream chart less 'wavy' 21 | # 5) change the color scheme 22 | # 6) change the fonts to Roboto Mono (using {showtext}) 23 | # 7) annotate all the major ggplot2 releases 24 | # 8) other, smaller changes 25 | 26 | # I'm thankful to Georgios for their support and continuing inspiration. If you're interested in data 27 | # visualization, they are one of the people to follow. 28 | 29 | # One more thing, this visualization is part of the background work I'm doing for my upcoming book 30 | # about ggplot2 extension packages called 'ggplot2 extended' (working title). If you are interested in 31 | # seeing how that project advances, you can start by following me on LinkedIn. I'm also happy to have 32 | # conversations about the different ggplot2 extensions, if you have strong opinions and/or knowledge 33 | # about them. So, don't hesitate to DM me on LinkedIn! Do mention it's about ggplot2 and the response 34 | # rate will be significantly higher... 35 | 36 | # Packages ---- 37 | library(colorspace) # A Toolbox for Manipulating and Assessing Colors and Palettes 38 | library(conflicted) # An Alternative Conflict Resolution Strategy 39 | conflicts_prefer(dplyr::filter) 40 | library(ggrepel) # Automatically Position Non-Overlapping Text Labels with 'ggplot2' 41 | library(ggstream) # Create Streamplots in 'ggplot2' 42 | library(ggtext) # Improved Text Rendering Support for 'ggplot2' 43 | library(pkgsearch) # Search and Query CRAN R Packages 44 | library(showtext) # Using Fonts More Easily in R Graphs 45 | library(tidyverse) # Easily Install and Load the 'Tidyverse' 46 | 47 | # Data ---- 48 | 49 | ## Vector of ggplot2 related packages ---- 50 | ggplot2_pkg_names <- pkg_search("ggplot2", size = 6000) %>% 51 | as_tibble() %>% 52 | arrange(package, .locale = "en") %>% # alphabetical order, ignore case 53 | filter(package != "irtplay") %>% # irtplay was removed from CRAN and was causing an error 54 | pull(package) 55 | 56 | ggplot2_pkg_names 57 | 58 | ## Fetch the package history information for the ggplot2 related packages ---- 59 | 60 | # Note: This might take a while! 61 | ggplot2_history <- map_dfr( 62 | ggplot2_pkg_names, function(pkg) { 63 | cran_package_history(pkg) %>% 64 | as_tibble() 65 | } 66 | ) 67 | 68 | ggplot2_history 69 | 70 | ## Fetch initial release dates for the packages ---- 71 | initial_release_dates <- ggplot2_history %>% 72 | summarize( 73 | initial_release_date = min(date) %>% as_date(), 74 | .by = Package 75 | ) %>% 76 | mutate(year = year(initial_release_date)) 77 | 78 | initial_release_dates 79 | 80 | ## Fetch dependencies for the latest version of the packages ---- 81 | ggplot2_dependencies <- ggplot2_history %>% 82 | 83 | # Have to unnest dependencies first. The nested column contains 'type', which is the type of 84 | # dependency (depends/enhances/imports/suggests) and 'package', which is the package towards 85 | # which there is that dependency. The confusing part is that we already have the column Package. 86 | # But we're only using package to filter in only the ones that mention ggplot2 as a dependency. 87 | 88 | unnest(dependencies) %>% 89 | filter(package == "ggplot2") %>% 90 | filter( 91 | date == max(date) %>% as_date(), 92 | .by = Package 93 | ) %>% 94 | distinct(Package, type) 95 | 96 | ggplot2_dependencies 97 | 98 | ## Create the final tibble ---- 99 | year_end <- Sys.Date() %>% 100 | # year() 101 | year() - 1 102 | 103 | ggplot2_years_and_dependencies <- ggplot2_dependencies %>% 104 | inner_join(initial_release_dates) %>% 105 | count(year, type) %>% 106 | filter( 107 | 108 | # Get rid of this type, because there are only 4 packages of its kind 109 | type != "Enhances", 110 | 111 | # There are 32 packages that were released before 2007, which is possible 112 | # due to the fact that the dependency could have appeared after the initial 113 | # release. I decided to leave them out for the sake of clarity. 114 | between(year, 2007, year_end) 115 | ) 116 | 117 | ggplot2_years_and_dependencies 118 | 119 | # Colors ---- 120 | color_1 <- "#F36523" 121 | color_2 <- "#125184" 122 | color_3 <- "#2E8B57" 123 | colors <- c(color_1, color_2, color_3) 124 | 125 | # Annotation ---- 126 | annotation_numbers <- ggplot2_years_and_dependencies %>% 127 | summarize(n = sum(n), .by = type) %>% 128 | arrange(type) %>% 129 | mutate(y = c(400, 75, -325)) %>% 130 | mutate( 131 | label = case_when( 132 | type == "Depends" ~ str_glue("**{n}**"), 133 | type == "Imports" ~ str_glue("**{n}**"), 134 | type == "Suggests" ~ str_glue("**{n}**") 135 | ) 136 | ) 137 | 138 | annotation_numbers 139 | 140 | # Fonts ---- 141 | font_add_google("Roboto Mono", "Roboto") 142 | showtext_auto() 143 | font_family <- "Roboto" 144 | 145 | # Plot ---- 146 | ggplot2_years_and_dependencies %>% 147 | filter() %>% 148 | ggplot() + 149 | 150 | ## ggplot2 releases ---- 151 | geom_point( 152 | aes(x = 2007, y = 0), 153 | data = NULL, 154 | size = 1.5, 155 | stat = "unique", 156 | ) + 157 | geom_label_repel( 158 | aes(x = 2007, y = 0, label = "{ggplot2}\nver 0.5"), 159 | data = NULL, 160 | stat = "unique", 161 | nudge_y = 75, 162 | label.size = NA, 163 | lineheight = 0.9, 164 | family = font_family 165 | ) + 166 | geom_label_repel( 167 | aes(x = 2014, y = 50, label = "{ggplot2}\nver 1.0"), 168 | data = NULL, 169 | stat = "unique", 170 | nudge_y = 115, 171 | label.size = NA, 172 | lineheight = 0.9, 173 | family = font_family 174 | ) + 175 | geom_label_repel( 176 | aes(x = 2015, y = 125, label = "{ggplot2}\nver 2.0"), 177 | data = NULL, 178 | stat = "unique", 179 | nudge_y = 100, 180 | label.size = NA, 181 | lineheight = 0.9, 182 | family = font_family 183 | ) + 184 | geom_label_repel( 185 | aes(x = 2018, y = 100, label = "{ggplot2}\nver 3.0"), 186 | data = NULL, 187 | stat = "unique", 188 | nudge_y = 200, 189 | label.size = NA, 190 | lineheight = 0.9, 191 | family = font_family 192 | ) + 193 | # geom_label_repel( 194 | # aes(x = 2024, y = 225, label = "{ggplot2}\nver 3.5"), 195 | # data = NULL, 196 | # stat = "unique", 197 | # nudge_y = 200, 198 | # label.size = NA, 199 | # lineheight = 0.9, 200 | # family = font_family 201 | # ) + 202 | 203 | ## Stream ---- 204 | geom_stream( 205 | aes( 206 | x = year, 207 | y = n, 208 | fill = type, 209 | color = after_scale(darken(fill)) 210 | ), 211 | bw = 1, 212 | linewidth = 0.1 213 | ) + 214 | 215 | ## Labels ---- 216 | 217 | # Text 218 | geom_richtext( 219 | aes( 220 | x = year_end + 0.1, 221 | y = 475, 222 | label = "Total # of
packages
currently:" 223 | ), 224 | data = NULL, 225 | stat = "unique", 226 | hjust = 0, 227 | lineheight = 0.9, 228 | label.size = NA, 229 | family = font_family 230 | ) + 231 | 232 | # Numbers 233 | geom_richtext( 234 | data = annotation_numbers, 235 | aes( 236 | x = year_end + 0.2, 237 | y = y, 238 | label = label 239 | ), 240 | hjust = 0, 241 | lineheight = 0.9, 242 | label.size = NA, 243 | size = 5, 244 | family = font_family 245 | ) + 246 | 247 | ## Scales ---- 248 | scale_x_continuous( 249 | breaks = seq(2007, year_end, 2), 250 | minor_breaks = 2007:year_end 251 | ) + 252 | scale_fill_manual( 253 | values = colors 254 | ) + 255 | 256 | ## Coord ---- 257 | coord_cartesian(clip = "off") + 258 | 259 | ## Labels ---- 260 | labs( 261 | title = str_glue("Number of packages on CRAN depending on, importing, or suggesting {{ggplot2}"), 262 | subtitle = "Aggregated by the initial package release years. Categories may change from one version to another and were taken from the latest versions.", 263 | caption = "Data: CRAN via {pkgsearch} | Visualization: Antti Rask | Updated: 2023-12-31" 264 | ) + 265 | 266 | ## Theme ---- 267 | theme_minimal(base_family = font_family) + 268 | theme( 269 | axis.text.x = element_text( 270 | size = 14, 271 | face = "bold", 272 | margin = margin(10, 0, 0, 0) 273 | ), 274 | axis.text.y = element_blank(), 275 | axis.title = element_blank(), 276 | legend.position = "none", 277 | panel.grid.major.y = element_blank(), 278 | panel.grid.minor.y = element_blank(), 279 | plot.margin = margin(10, 50, 10, 10), 280 | plot.title = element_markdown( 281 | face = "bold", 282 | size = 20, 283 | hjust = 0.5 284 | ), 285 | plot.subtitle = element_text( 286 | hjust = 0.5, 287 | margin = margin(0, 0, 20, 0) 288 | ), 289 | plot.caption = element_text( 290 | size = 10, 291 | color = darken("darkgrey", 0.4), 292 | hjust = 0.5, 293 | margin = margin(20, 0, 0, 0) 294 | ) 295 | ) 296 | -------------------------------------------------------------------------------- /real_life_use_cases/ggplot2_dependencies_visualized/pkgsearch.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/real_life_use_cases/ggplot2_dependencies_visualized/pkgsearch.pdf -------------------------------------------------------------------------------- /real_life_use_cases/ggplot2_dependencies_visualized/pkgsearch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/real_life_use_cases/ggplot2_dependencies_visualized/pkgsearch.png -------------------------------------------------------------------------------- /real_life_use_cases/ggplot2_dependencies_visualized/pkgsearch_ver2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/real_life_use_cases/ggplot2_dependencies_visualized/pkgsearch_ver2.pdf -------------------------------------------------------------------------------- /real_life_use_cases/ggplot2_dependencies_visualized/pkgsearch_ver2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/real_life_use_cases/ggplot2_dependencies_visualized/pkgsearch_ver2.png -------------------------------------------------------------------------------- /real_life_use_cases/ggplot2_dependencies_visualized/pkgsearch_ver3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/real_life_use_cases/ggplot2_dependencies_visualized/pkgsearch_ver3.pdf -------------------------------------------------------------------------------- /real_life_use_cases/ggplot2_dependencies_visualized/pkgsearch_ver3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AnttiRask/learning_ggplot2/81242b7a53f120397e942b7fe3406fd5047fed80/real_life_use_cases/ggplot2_dependencies_visualized/pkgsearch_ver3.png --------------------------------------------------------------------------------