├── .Rbuildignore ├── cover.png ├── references.qmd ├── style ├── akbar.woff ├── graphs.scss ├── joyce.theme └── joyce_temp.R ├── tips.txt ├── .gitignore ├── copy.R ├── intro.qmd ├── graphs.Rproj ├── geoms.qmd ├── resources.qmd ├── appendix.qmd ├── DESCRIPTION ├── .github └── workflows │ └── quartobook.yml ├── helpers.R ├── references.bib ├── _quarto.yml ├── acknowledgments.qmd ├── grammar.qmd ├── index.qmd └── layers.qmd /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^.*\.Rproj$ 2 | ^\.Rproj\.user$ 3 | -------------------------------------------------------------------------------- /cover.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jtr13/graphs/main/cover.png -------------------------------------------------------------------------------- /references.qmd: -------------------------------------------------------------------------------- 1 | # References {.unnumbered} 2 | 3 | ::: {#refs} 4 | ::: 5 | -------------------------------------------------------------------------------- /style/akbar.woff: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jtr13/graphs/main/style/akbar.woff -------------------------------------------------------------------------------- /tips.txt: -------------------------------------------------------------------------------- 1 | Tips and tricks 2 | 3 | .footnotes { 4 | font-size: .9rem; 5 | } 6 | 7 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | _book/ 6 | /.quarto/ 7 | .DS_Store 8 | *_cache/ 9 | *_files/ 10 | -------------------------------------------------------------------------------- /copy.R: 
-------------------------------------------------------------------------------- 1 | # need to draft in an .R file to get color previews in RStudio 2 | copy <- function() { 3 | file <- readLines("style/joyce_temp.R") 4 | writeLines(file, "style/joyce.theme") 5 | } -------------------------------------------------------------------------------- /intro.qmd: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | This is a book created from markdown and executable code. 4 | 5 | See @knuth84 for additional discussion of literate programming. 6 | 7 | ```{r} 8 | 1 + 1 9 | ``` 10 | -------------------------------------------------------------------------------- /graphs.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | BuildType: None 16 | -------------------------------------------------------------------------------- /geoms.qmd: -------------------------------------------------------------------------------- 1 | # Six geoms 2 | 3 | ## `geom_histogram()` 4 | 5 | 6 | ## `geom_density()` 7 | 8 | ```{r} 9 | library(ggplot2) 10 | ggplot(faithful, aes(x = eruptions, y = after_stat(density))) + 11 | geom_histogram(breaks = seq(1.5, 5.5, .25), color = "blue", 12 | fill = "cornflowerblue", alpha = .5) + 13 | geom_density(linewidth = 1.5, color = "red") 14 | ``` 15 | 16 | -------------------------------------------------------------------------------- /resources.qmd: -------------------------------------------------------------------------------- 1 | # ggplot2 resources 2 | 3 | ## Themes 4 | 5 | How to create BBC style graphics https://bbc.github.io/rcookbook/ 6 | 7 | Todd Schneider's Simpsons Theme 
https://github.com/toddwschneider/flim-springfield/blob/f9b7f123aa8962b56b624fe4032cd4af3f68cc14/analysis/helpers.R 8 | 9 | https://towardsdatascience.com/themes-to-spice-up-visualizations-with-ggplot2-3e275038dafa 10 | 11 | -------------------------------------------------------------------------------- /appendix.qmd: -------------------------------------------------------------------------------- 1 | # Required mappings 2 | 3 | ```{r} 4 | #| echo: false 5 | library(ggplot2) 6 | x <- lsf.str("package:ggplot2") 7 | geominfo <- data.frame(GEOM = x[stringr::str_detect(x, "^geom")]) 8 | get_req <- function(geom) { 9 | t <- evaluate::evaluate(paste0("ggplot(mtcars) + ", 10 | geom, "()")) 11 | message <- t[[length(t)]]$parent$message 12 | missing <- ifelse(!is.null(message), 13 | stringr::str_remove_all(message, "^.*aesthetics: |^.*requires an | aesthetic.$"), NA) 14 | } 15 | geominfo$`REQUIRED MAPPINGS` <- sapply(geominfo$GEOM, get_req) 16 | knitr::kable(geominfo) 17 | ``` 18 | 19 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: graphs 2 | Title: What the Package Does (One Line, Title Case) 3 | Version: 0.0.0.9000 4 | Authors@R: 5 | person(given = "First", 6 | family = "Last", 7 | role = c("aut", "cre"), 8 | email = "first.last@example.com", 9 | comment = c(ORCID = "YOUR-ORCID-ID")) 10 | Description: What the package does (one paragraph). 
11 | License: `use_mit_license()`, `use_gpl3_license()` or friends to 12 | pick a license 13 | Encoding: UTF-8 14 | LazyData: true 15 | Roxygen: list(markdown = TRUE) 16 | RoxygenNote: 7.1.1 17 | Imports: 18 | evaluate, 19 | ggthemes, 20 | knitr, 21 | patchwork, 22 | rmarkdown, 23 | tidyverse 24 | -------------------------------------------------------------------------------- /.github/workflows/quartobook.yml: -------------------------------------------------------------------------------- 1 | 2 | on: 3 | push: 4 | branches: 5 | - main 6 | schedule: 7 | # run every Saturday at 8am (12 UTC) 8 | - cron: '0 12 * * 6' 9 | 10 | name: quarto 11 | 12 | jobs: 13 | build-deploy: 14 | runs-on: ubuntu-latest 15 | steps: 16 | - uses: actions/checkout@v3 17 | 18 | - name: Install Quarto 19 | uses: quarto-dev/quarto-actions/setup@v2 20 | 21 | - name: setup R 22 | uses: r-lib/actions/setup-r@v2 23 | 24 | - name: setup dependencies 25 | uses: r-lib/actions/setup-r-dependencies@v2 26 | 27 | - name: Deploy 🚀 28 | uses: quarto-dev/quarto-actions/publish@v2 29 | with: 30 | target: gh-pages 31 | -------------------------------------------------------------------------------- /helpers.R: -------------------------------------------------------------------------------- 1 | # creates a graph with arrows 2 | 3 | arrow_chart <- function(domain, range) { 4 | if (length(domain) != length(range)) stop("Domain and range must be the same length") 5 | num <- length(domain) 6 | domain <- paste(domain, " ") 7 | range <- paste(" ", range) 8 | info <- data.frame(x = 0, y = num:1, xend = 1, yend = num:1, t1 = domain, 9 | t2 = range) 10 | ggplot(info, aes(x = x, y = y, xend = xend, yend = yend)) + 11 | geom_segment(arrow = arrow()) + 12 | geom_text(aes(label = t1), size = 7, hjust = 1, family = "mono") + 13 | geom_text(aes(x = xend, y = yend, label = t2), size = 7, hjust = 0, family = "mono") + 14 | lims(x = c(-1, 3), y = c(-4, num + 1)) + 15 | theme_void(fs) + 16 | theme(plot.title = 
element_text(hjust = .5)) 17 | } 18 | -------------------------------------------------------------------------------- /style/graphs.scss: -------------------------------------------------------------------------------- 1 | /*-- scss:defaults --*/ 2 | 3 | @import url('https://fonts.googleapis.com/css2?family=Lato&display=swap'); 4 | 5 | $border-color-left: #80abd7!default; 6 | $background-color: #e6eef7 !default; 7 | $headings-color: #3379be; 8 | $link-color: #3379be; 9 | $code-color: #3379be; 10 | 11 | $font-family-sans-serif: "Lato", sans-serif; 12 | 13 | /*-- scss:rules --*/ 14 | 15 | p code:not(.sourceCode), td code:not(.sourceCode), 16 | li code:not(.sourceCode) { 17 | font-weight: bold; 18 | background-color: white; 19 | } 20 | 21 | .big { 22 | font-size: 1.1rem; 23 | } 24 | 25 | .footnotes { 26 | font-size: .85rem; 27 | } 28 | 29 | // https://stackoverflow.com/questions/74647399/define-a-new-callout-in-quarto 30 | 31 | div.callout-tip.callout { 32 | border-left-color: $border-color-left; 33 | } 34 | 35 | div.callout-tip.callout-style-default>.callout-header { 36 | background-color: $background-color; 37 | } 38 | 39 | -------------------------------------------------------------------------------- /references.bib: -------------------------------------------------------------------------------- 1 | @article{knuth84, 2 | author = {Knuth, Donald E.}, 3 | title = {Literate Programming}, 4 | year = {1984}, 5 | issue_date = {May 1984}, 6 | publisher = {Oxford University Press, Inc.}, 7 | address = {USA}, 8 | volume = {27}, 9 | number = {2}, 10 | issn = {0010-4620}, 11 | url = {https://doi.org/10.1093/comjnl/27.2.97}, 12 | doi = {10.1093/comjnl/27.2.97}, 13 | journal = {Comput. 
J.}, 14 | month = may, 15 | pages = {97–111}, 16 | numpages = {15} 17 | } 18 | 19 | 20 | 21 | @book{agresti2015, 22 | title = {Foundations of linear and generalized linear models}, 23 | author = {Agresti, Alan}, 24 | year = {2015}, 25 | date = {2015}, 26 | publisher = {John Wiley & Sons Inc}, 27 | series = {Wiley series in probability and statistics}, 28 | address = {Hoboken, New Jersey} 29 | } 30 | 31 | @misc{blue-yel, 32 | title = {Blue-yellow Color Palette}, 33 | url = {https://www.color-hex.com/color-palette/80255}, 34 | langid = {en} 35 | } 36 | -------------------------------------------------------------------------------- /_quarto.yml: -------------------------------------------------------------------------------- 1 | project: 2 | type: book 3 | 4 | book: 5 | title: "A Solid Start to ggplot2" 6 | author: "Joyce Robbins" 7 | date: today 8 | site-url: https://jtr13.github.io/graphs 9 | repo-url: https://github.com/jtr13/graphs 10 | repo-branch: main 11 | repo-actions: [source, edit, issue] 12 | chapters: 13 | - index.qmd 14 | - intro.qmd 15 | - grammar.qmd 16 | - layers.qmd 17 | - geoms.qmd 18 | - acknowledgments.qmd 19 | - references.qmd 20 | - appendix.qmd 21 | 22 | bibliography: references.bib 23 | 24 | format: 25 | html: 26 | theme: [default, style/graphs.scss] 27 | code-link: true 28 | highlight-style: style/joyce.theme 29 | callout-icon: false 30 | fig-width: 4 31 | fig-height: 3 32 | 33 | knitr: 34 | opts_chunk: 35 | fig.align: center 36 | out.width: 50% 37 | 38 | 39 | execute: 40 | echo: true 41 | warning: false 42 | message: false 43 | error: true 44 | cache: false 45 | 46 | editor: source 47 | 48 | -------------------------------------------------------------------------------- /acknowledgments.qmd: -------------------------------------------------------------------------------- 1 | # Making this book 2 | 3 | - New Project, New Directory, Quarto book 4 | 5 | - edit `_quarto.yml` 6 | 7 | To add GitHub Actions: 8 | 9 | - add `DESCRIPTION` 10 | 11 | - 
add `_book/` to `.gitignore` 12 | 13 | - add GitHub Actions workflow 14 | 15 | - Settings, Actions -- give GA write access 16 | 17 | ## Colors 18 | 19 | Blue yellow Color Palette: https://www.color-hex.com/color-palette/80255 20 | 21 | Aki no Fujisan Color Palette: https://www.color-hex.com/color-palette/1026103 22 | 23 | 24 | ## ggplot2 fonts 25 | 26 | **Custom fonts** https://r-graph-gallery.com/custom-fonts-in-R-and-ggplot2.html 27 | 28 | http://www.cookbook-r.com/Graphs/Fonts/ 29 | 30 | https://www.stat.auckland.ac.nz/~paul/R/fontfamily.pdf 31 | 32 | ## quarto 33 | 34 | ### _quarto.yml 35 | 36 | https://github.com/hadley/r4ds/blob/main/_quarto.yml 37 | 38 | https://github.com/vizdata-s23/vizdata-s23/blob/main/style/sta313.scss 39 | 40 | ## Syntax highlighting 41 | 42 | https://www.garrickadenbuie.com/blog/pandoc-syntax-highlighting-examples/ 43 | 44 | Translation between syntax highlighting terms and CSS classes: 45 | [https://raw.githubusercontent.com/tajmone/pandoc-goodies/master/skylighting/css/built-in-styles/zenburn.css](https://raw.githubusercontent.com/tajmone/pandoc-goodies/master/skylighting/css/built-in-styles/zenburn.css) 46 | 47 | ## GitHub Actions 48 | 49 | -------------------------------------------------------------------------------- /grammar.qmd: -------------------------------------------------------------------------------- 1 | # Grammar of graphics 2 | 3 | The grammar of graphics on which ggplot2 is based... 
4 | 5 | ```{r} 6 | #| echo: false 7 | library(tidyverse) 8 | library(ggthemes) 9 | library(patchwork) 10 | data_color <- "#008fd5" 11 | fs <- 13 12 | 13 | df <- data.frame(state.x77) |> 14 | rownames_to_column("State") |> 15 | mutate(Region = state.region) 16 | 17 | # full graph 18 | 19 | ggplot(df, aes(Income/1000, Illiteracy/100)) + 20 | geom_point(color = data_color) + 21 | facet_wrap(~Region) + 22 | scale_x_continuous(name = "Per capita income (in thousands of $)") + 23 | scale_y_continuous(name = "Illiteracy rate", labels = scales::percent) + 24 | labs(title = "Illiteracy vs. Income by State") + 25 | theme_fivethirtyeight(11, base_family = "Chalkboard") + 26 | theme(plot.title = element_text(size = rel(1.2)), 27 | axis.title = element_text()) 28 | 29 | ``` 30 | 31 | 32 | ## Components 33 | 34 | 35 | ```{r} 36 | #| echo: false 37 | #| fig-height: 2.75 38 | #| fig-width: 7 39 | #| out-width: 70% 40 | 41 | p1 <- ggplot(df, aes(Income, Illiteracy)) + 42 | geom_point(color = data_color) + 43 | theme_void(fs) + 44 | ggtitle("Layer(s)") 45 | 46 | p2 <- ggplot(df, aes(Income, Illiteracy/100)) + 47 | theme_classic(fs) + 48 | ggtitle("Scales") + 49 | scale_y_continuous(name = "Illiteracy", labels = scales::percent) 50 | 51 | p3 <- ggplot(df, aes(Income, Illiteracy/100)) + 52 | scale_x_continuous(name = NULL) + 53 | scale_y_continuous(name = NULL) + 54 | ggtitle("Coordinate System") + 55 | theme_bw(fs) + 56 | theme(axis.text = element_blank()) 57 | 58 | p4 <- ggplot(df, aes(Income, Illiteracy/100)) + 59 | scale_x_continuous(name = NULL) + 60 | scale_y_continuous(name = NULL) + 61 | ggtitle("Faceting") + 62 | theme_bw(fs) + 63 | theme(axis.text = element_blank()) + 64 | facet_wrap(~Region) 65 | 66 | 67 | p5 <- ggplot(df, aes(Income, Illiteracy/100)) + 68 | ggtitle("Theme") + 69 | theme_fivethirtyeight(13, base_family = "Chalkboard") + 70 | theme(plot.title = element_text(size = rel(1.2))) 71 | 72 | p1 + plot_spacer() + p2 + plot_spacer() + p3 + 73 | plot_layout(widths = 
c(.35, .05, .35, .05, .35)) 74 | ``` 75 | 76 | ```{r} 77 | #| echo: false 78 | #| fig-height: 2.75 79 | #| fig-width: 7 80 | #| out-width: 70% 81 | plot_spacer() + p4 + plot_spacer() + p5 + plot_spacer() + 82 | plot_layout(widths = c(.1, .37, .05, .37, .1)) 83 | ``` 84 | 85 | -------------------------------------------------------------------------------- /index.qmd: -------------------------------------------------------------------------------- 1 | # Preface {.unnumbered} 2 | 3 | This tutorial is designed for R users with no experience with ggplot2 or those who have tried to learn but got stuck along the way. It is not designed to be a comprehensive guide but rather to build a solid foundation. It differs from other resources in the following ways: 4 | 5 | * *Theory* We begin with the grammar of graphics, the philosophical approach upon which ggplot2 is based. 6 | 7 | * *Data* We focus on data types and data shape. From my experience a good portion of ggplot2 problems are caused by data having the wrong *type* or wrong *shape* rather than the wrong ggplot2 code. 8 | 9 | * *Essentials* We focus on the elements that matter most for creating effective graphs: the data layers, scales, and faceting. We pay minimal attention to theme (non-data) elements such as tweaking the size and positions of labels and the like. As such, this is not a comprehensive guide. If you have a basic understanding of ggplot2 and wish to learn how to do something specific, the following are great resources: [ggplot2: Elegant Graphics for Data Analysis (3e)](https://ggplot2-book.org/) or [R Graphics Cookbook, 2nd edition](https://r-graphics.org/) 10 | 11 | * *Good graphs* 12 | 13 | * *Errors* 14 | 15 | ## Setup 16 | 17 | As this is not an introduction to R, you probably already have R installed. 
If you haven't updated R recently, say within the last year, download and install the appropriate version for your operating system from [The Comprehensive R Archive Network](https://cloud.r-project.org). If you use RStudio, check for updates by clicking "Help" > "Check for Updates" from within the application. Finally, update or install the tidyverse packages by running `install.packages("tidyverse")`. For all three -- R, RStudio, **tidyverse** -- do not ignore the advice to update! 18 | 19 | ## The basics 20 | 21 | ggplot2 is based on a *grammar of graphics* (the "gg" in ggplot2), which makes it different from graphics programs that are based on chart types. Think of the old-style Lego kits that give you building blocks with which you can make whatever you want. To make the most of the package, it's very helpful to think of it in these terms. Rather than think in terms of names of charts, think in terms of the graphical elements that are needed to create that type of chart. With knowledge of how to put together the basic elements you will be able to create anything you want. 22 | 23 | 24 | 25 | 26 | 27 | This is a Quarto book. 28 | 29 | To learn more about Quarto books visit <https://quarto.org/docs/books>. 
30 | -------------------------------------------------------------------------------- /style/joyce.theme: -------------------------------------------------------------------------------- 1 | { 2 | "metadata" : { 3 | "copyright": [ 4 | "SPDX-FileCopyrightText: 2016 Volker Krause ", 5 | "SPDX-FileCopyrightText: 2016 Dominik Haumann " 6 | ], 7 | "license": "SPDX-License-Identifier: MIT", 8 | "revision" : 5, 9 | "name" : "Printing" 10 | }, 11 | "text-styles": { 12 | "Normal" : { 13 | "text-color" : "#000000", 14 | "selected-text-color" : "#ffffff", 15 | "bold" : false, 16 | "italic" : false, 17 | "underline" : false, 18 | "strike-through" : false 19 | }, 20 | "Attribute" : { 21 | "text-color" : "#9753B8", 22 | "bold": true 23 | }, 24 | "Function" : { 25 | "text-color" : "#3379be", 26 | "bold" : true 27 | }, 28 | "SpecialChar" : { 29 | "text-color" : "#ff5500", 30 | "bold" : true 31 | }, 32 | "String" : { 33 | "text-color" : "#666666" 34 | }, 35 | "DecVal" : { 36 | "text-color" : "#666666" 37 | }, 38 | "Comment" : { 39 | "text-color" : "#666666" 40 | }, 41 | "Float" : { 42 | "text-color" : "#666666" 43 | }, 44 | "Attribute-old" : { 45 | "text-color" : "#2E8B57", 46 | "bold": true 47 | }, 48 | 49 | 50 | 51 | "Keyword" : { 52 | "text-color" : "#000000", 53 | "selected-text-color" : "#ffffff", 54 | "bold" : true 55 | }, 56 | "Variable" : { 57 | "text-color" : "#0057ae", 58 | "selected-text-color" : "#00316e" 59 | }, 60 | "ControlFlow" : { 61 | "text-color" : "#000000", 62 | "selected-text-color" : "#ffffff", 63 | "bold" : true 64 | }, 65 | "Operator" : { 66 | "text-color" : "#000000", 67 | "selected-text-color" : "#ffffff" 68 | }, 69 | "BuiltIn" : { 70 | "text-color" : "#644a9b", 71 | "selected-text-color" : "#452886" 72 | }, 73 | "Extension" : { 74 | "text-color" : "#0095ff", 75 | "selected-text-color" : "#ffffff", 76 | "bold" : true 77 | }, 78 | "Preprocessor" : { 79 | "text-color" : "#006e28", 80 | "selected-text-color" : "#006e28" 81 | }, 82 | "Char" : { 83 | 
"text-color" : "#924c9d", 84 | "selected-text-color" : "#6c2477" 85 | }, 86 | "VerbatimString" : { 87 | "text-color" : "#ea0404", 88 | "selected-text-color" : "#9c0e0e" 89 | }, 90 | "SpecialString" : { 91 | "text-color" : "#ff5500", 92 | "selected-text-color" : "#ff5500" 93 | }, 94 | "Import" : { 95 | "text-color" : "#644a9b", 96 | "selected-text-color" : "#452886" 97 | }, 98 | "DataType" : { 99 | "text-color" : "#0057ae", 100 | "selected-text-color" : "#00316e" 101 | }, 102 | "BaseN" : { 103 | "text-color" : "#b08000", 104 | "selected-text-color" : "#805c00" 105 | }, 106 | "Constant" : { 107 | "text-color" : "#aa5500", 108 | "selected-text-color" : "#5e2f00" 109 | }, 110 | 111 | "Documentation" : { 112 | "text-color" : "#607880", 113 | "selected-text-color" : "#46585e" 114 | }, 115 | "Annotation" : { 116 | "text-color" : "#ca60ca", 117 | "selected-text-color" : "#a44ea4" 118 | }, 119 | "CommentVar" : { 120 | "text-color" : "#0095ff", 121 | "selected-text-color" : "#ffffff" 122 | }, 123 | "RegionMarker" : { 124 | "text-color" : "#0057ae", 125 | "selected-text-color" : "#00316e", 126 | "background-color" : "#e0e9f8" 127 | }, 128 | "Information" : { 129 | "text-color" : "#d2d2d2", 130 | "selected-text-color" : "#805c00" 131 | }, 132 | "Warning" : { 133 | "text-color" : "#d2d2d2", 134 | "selected-text-color" : "#9c0e0e" 135 | }, 136 | "Alert" : { 137 | "text-color" : "#bf0303", 138 | "selected-text-color" : "#9c0e0e", 139 | "background-color" : "#f7e6e6", 140 | "bold" : true 141 | }, 142 | "Error" : { 143 | "text-color" : "#bf0303", 144 | "selected-text-color" : "#9c0e0e", 145 | "underline" : true 146 | }, 147 | "Others" : { 148 | "text-color" : "#006e28", 149 | "selected-text-color" : "#006e28" 150 | } 151 | }, 152 | "editor-colors": { 153 | "BackgroundColor" : "#f5f5f5", 154 | "CodeFolding" : "#94caef", 155 | "BracketMatching" : "#edf9ff", 156 | "CurrentLine" : "#f8f7f6", 157 | "IconBorder" : "#d6d2d0", 158 | "IndentationLine" : "#d2d2d2", 159 | "LineNumbers" : 
"#221f1e", 160 | "CurrentLineNumber" : "#221f1e", 161 | "MarkBookmark" : "#0000ff", 162 | "MarkBreakpointActive" : "#ff0000", 163 | "MarkBreakpointReached" : "#ffff00", 164 | "MarkBreakpointDisabled" : "#ff00ff", 165 | "MarkExecution" : "#a0a0a4", 166 | "MarkWarning" : "#00ff00", 167 | "MarkError" : "#ff0000", 168 | "ModifiedLines" : "#f6e6e6", 169 | "ReplaceHighlight" : "#00ff00", 170 | "SavedLines" : "#baf8ce", 171 | "SearchHighlight" : "#ffff00", 172 | "TextSelection" : "#94caef", 173 | "Separator" : "#221f1e", 174 | "SpellChecking" : "#bf0303", 175 | "TabMarker" : "#d2d2d2", 176 | "TemplateBackground" : "#d6d2d0", 177 | "TemplatePlaceholder" : "#baf8ce", 178 | "TemplateFocusedPlaceholder" : "#76da98", 179 | "TemplateReadOnlyPlaceholder" : "#f6e6e6", 180 | "WordWrapMarker" : "#ededed" 181 | } 182 | } 183 | -------------------------------------------------------------------------------- /style/joyce_temp.R: -------------------------------------------------------------------------------- 1 | { 2 | "metadata" : { 3 | "copyright": [ 4 | "SPDX-FileCopyrightText: 2016 Volker Krause ", 5 | "SPDX-FileCopyrightText: 2016 Dominik Haumann " 6 | ], 7 | "license": "SPDX-License-Identifier: MIT", 8 | "revision" : 5, 9 | "name" : "Printing" 10 | }, 11 | "text-styles": { 12 | "Normal" : { 13 | "text-color" : "#000000", 14 | "selected-text-color" : "#ffffff", 15 | "bold" : false, 16 | "italic" : false, 17 | "underline" : false, 18 | "strike-through" : false 19 | }, 20 | "Attribute" : { 21 | "text-color" : "#9753B8", 22 | "bold": true 23 | }, 24 | "Function" : { 25 | "text-color" : "#3379be", 26 | "bold" : true 27 | }, 28 | "SpecialChar" : { 29 | "text-color" : "#ff5500", 30 | "bold" : true 31 | }, 32 | "String" : { 33 | "text-color" : "#666666" 34 | }, 35 | "DecVal" : { 36 | "text-color" : "#666666" 37 | }, 38 | "Comment" : { 39 | "text-color" : "#666666" 40 | }, 41 | "Float" : { 42 | "text-color" : "#666666" 43 | }, 44 | "Attribute-old" : { 45 | "text-color" : "#2E8B57", 46 | 
"bold": true 47 | }, 48 | 49 | 50 | 51 | "Keyword" : { 52 | "text-color" : "#000000", 53 | "selected-text-color" : "#ffffff", 54 | "bold" : true 55 | }, 56 | "Variable" : { 57 | "text-color" : "#0057ae", 58 | "selected-text-color" : "#00316e" 59 | }, 60 | "ControlFlow" : { 61 | "text-color" : "#000000", 62 | "selected-text-color" : "#ffffff", 63 | "bold" : true 64 | }, 65 | "Operator" : { 66 | "text-color" : "#000000", 67 | "selected-text-color" : "#ffffff" 68 | }, 69 | "BuiltIn" : { 70 | "text-color" : "#644a9b", 71 | "selected-text-color" : "#452886" 72 | }, 73 | "Extension" : { 74 | "text-color" : "#0095ff", 75 | "selected-text-color" : "#ffffff", 76 | "bold" : true 77 | }, 78 | "Preprocessor" : { 79 | "text-color" : "#006e28", 80 | "selected-text-color" : "#006e28" 81 | }, 82 | "Char" : { 83 | "text-color" : "#924c9d", 84 | "selected-text-color" : "#6c2477" 85 | }, 86 | "VerbatimString" : { 87 | "text-color" : "#ea0404", 88 | "selected-text-color" : "#9c0e0e" 89 | }, 90 | "SpecialString" : { 91 | "text-color" : "#ff5500", 92 | "selected-text-color" : "#ff5500" 93 | }, 94 | "Import" : { 95 | "text-color" : "#644a9b", 96 | "selected-text-color" : "#452886" 97 | }, 98 | "DataType" : { 99 | "text-color" : "#0057ae", 100 | "selected-text-color" : "#00316e" 101 | }, 102 | "BaseN" : { 103 | "text-color" : "#b08000", 104 | "selected-text-color" : "#805c00" 105 | }, 106 | "Constant" : { 107 | "text-color" : "#aa5500", 108 | "selected-text-color" : "#5e2f00" 109 | }, 110 | 111 | "Documentation" : { 112 | "text-color" : "#607880", 113 | "selected-text-color" : "#46585e" 114 | }, 115 | "Annotation" : { 116 | "text-color" : "#ca60ca", 117 | "selected-text-color" : "#a44ea4" 118 | }, 119 | "CommentVar" : { 120 | "text-color" : "#0095ff", 121 | "selected-text-color" : "#ffffff" 122 | }, 123 | "RegionMarker" : { 124 | "text-color" : "#0057ae", 125 | "selected-text-color" : "#00316e", 126 | "background-color" : "#e0e9f8" 127 | }, 128 | "Information" : { 129 | "text-color" : 
"#d2d2d2", 130 | "selected-text-color" : "#805c00" 131 | }, 132 | "Warning" : { 133 | "text-color" : "#d2d2d2", 134 | "selected-text-color" : "#9c0e0e" 135 | }, 136 | "Alert" : { 137 | "text-color" : "#bf0303", 138 | "selected-text-color" : "#9c0e0e", 139 | "background-color" : "#f7e6e6", 140 | "bold" : true 141 | }, 142 | "Error" : { 143 | "text-color" : "#bf0303", 144 | "selected-text-color" : "#9c0e0e", 145 | "underline" : true 146 | }, 147 | "Others" : { 148 | "text-color" : "#006e28", 149 | "selected-text-color" : "#006e28" 150 | } 151 | }, 152 | "editor-colors": { 153 | "BackgroundColor" : "#f5f5f5", 154 | "CodeFolding" : "#94caef", 155 | "BracketMatching" : "#edf9ff", 156 | "CurrentLine" : "#f8f7f6", 157 | "IconBorder" : "#d6d2d0", 158 | "IndentationLine" : "#d2d2d2", 159 | "LineNumbers" : "#221f1e", 160 | "CurrentLineNumber" : "#221f1e", 161 | "MarkBookmark" : "#0000ff", 162 | "MarkBreakpointActive" : "#ff0000", 163 | "MarkBreakpointReached" : "#ffff00", 164 | "MarkBreakpointDisabled" : "#ff00ff", 165 | "MarkExecution" : "#a0a0a4", 166 | "MarkWarning" : "#00ff00", 167 | "MarkError" : "#ff0000", 168 | "ModifiedLines" : "#f6e6e6", 169 | "ReplaceHighlight" : "#00ff00", 170 | "SavedLines" : "#baf8ce", 171 | "SearchHighlight" : "#ffff00", 172 | "TextSelection" : "#94caef", 173 | "Separator" : "#221f1e", 174 | "SpellChecking" : "#bf0303", 175 | "TabMarker" : "#d2d2d2", 176 | "TemplateBackground" : "#d6d2d0", 177 | "TemplatePlaceholder" : "#baf8ce", 178 | "TemplateFocusedPlaceholder" : "#76da98", 179 | "TemplateReadOnlyPlaceholder" : "#f6e6e6", 180 | "WordWrapMarker" : "#ededed" 181 | } 182 | } 183 | -------------------------------------------------------------------------------- /layers.qmd: -------------------------------------------------------------------------------- 1 | # Layers 2 | 3 | The layers represent the data, what the graph is all about. 
Everything else--the scales, coordinate system, faceting, and themes--is an accessory to make the data clear and comprehensible. Therefore it is essential to get the data right. Even if everything else looks perfect, if the data is wrong the graph is worthless. Each layer consists of [five components](https://ggplot2-book.org/layers.html): 1) data, 2) aesthetic mapping, 3) geom, 4) stat, and 5) position. Most of the time you can rely on the defaults for 4) stat and 5) position, so we'll start with the first three components, all of which are required. 4 | 5 | ## Data 6 | 7 | ggplot2 is designed to work only with data frames. That means no vectors, matrices, tables, or lists. If your data is not in data frame form, you'll need to convert it to a data frame first. How can you tell if you have the right format? Use `class` to check: 8 | 9 | ```{r} 10 | library(ggplot2) 11 | class(faithful) 12 | class(CO2) 13 | class(diamonds) 14 | class(Titanic) 15 | class(Seatbelts) 16 | ``` 17 | 18 | As long as `data.frame` is one of the classes returned, you're good to go. So `faithful` and `CO2`, two of the built-in base R datasets, would work, as would `diamonds`, a dataset that comes with the ggplot2 package. Note that `class(diamonds)` also returns `tbl_df` and `tbl`, indications that `diamonds` is also a *tibble*, the **tidyverse** version of a data frame. We'll return to this topic later. Neither `Titanic` nor `Seatbelts` is a data frame, so both would produce errors if we tried to create graphs from this data with ggplot2 without converting the data. If you read data from a file with `read.csv()`, `read_csv()`, or other functions for reading tabular data, it will be a `data.frame`. 19 | 20 | ## Geoms 21 | 22 | Geoms are the heart and soul of graphics made with ggplot2. A "geom" is short-hand for geometric object, the shapes that represent the data. We will begin with six commonly used geoms, shown below. 
23 | 24 | ```{r} 25 | #| echo: false 26 | #| fig-width: 3 27 | #| layout-ncol: 3 28 | #| out-width: 85% 29 | 30 | 31 | library(tidyverse) 32 | data_color <- "#008fd5" 33 | fs <- 14 34 | 35 | df <- data.frame(state.x77) |> 36 | rownames_to_column("State") |> 37 | mutate(Region = state.region) 38 | 39 | m <- .25 40 | 41 | ggplot(df, aes(Income, Illiteracy)) + 42 | geom_point(color = data_color, size = 2) + ggtitle("geom_point()") + 43 | scale_x_continuous(expand = expansion(mult = m)) + 44 | scale_y_continuous(expand = expansion(mult = m)) + 45 | theme_void(13, base_family = "Menlo") + 46 | theme(plot.title = element_text(color = "#3379be", face = "bold")) 47 | 48 | ggplot(df, aes(Income)) + 49 | geom_histogram(bins = 15, color = data_color, fill = data_color, alpha = .5, lwd = 1.25) + ggtitle("geom_histogram()") + 50 | scale_x_continuous(expand = expansion(mult = m)) + 51 | scale_y_continuous(expand = expansion(mult = m)) + 52 | theme_void(13, base_family = "Menlo") + 53 | theme(plot.title = element_text(color = "#3379be", face = "bold")) 54 | 55 | ggplot(df, aes(Income)) + 56 | geom_density(color = data_color, lwd = 1.25) + 57 | ggtitle("geom_density()") + 58 | scale_x_continuous(expand = expansion(mult = m)) + 59 | scale_y_continuous(expand = expansion(mult = m)) + 60 | theme_void(13, base_family = "Menlo") + 61 | theme(plot.title = element_text(color = "#3379be", face = "bold")) 62 | 63 | ggplot(df, aes(x = Region, y = Income)) + 64 | geom_boxplot(color = data_color, lwd = 1.25) + ggtitle("geom_boxplot()") + 65 | scale_x_discrete(expand = expansion(mult = m)) + 66 | scale_y_continuous(expand = expansion(mult = m)) + 67 | theme_void(13, base_family = "Menlo") + 68 | theme(plot.title = element_text(color = "#3379be", face = "bold")) 69 | 70 | ggplot(df, aes(x = Region)) + 71 | geom_bar(fill = data_color, color = data_color, alpha = .5, width = .8, lwd = 1.25) + ggtitle("geom_bar()") + 72 | scale_x_discrete(expand = expansion(mult = 4*m)) + 73 | 
scale_y_continuous(expand = expansion(mult = 1.5*m)) + 74 | theme_void(13, base_family = "Menlo") + 75 | theme(plot.title = element_text(color = "#3379be", face = "bold")) 76 | 77 | ggplot(df, aes(x = Region)) + 78 | geom_bar(fill = data_color, color = data_color, alpha = .5, width = .8, lwd = 1.25) + ggtitle("geom_col()") + 79 | scale_x_discrete(expand = expansion(mult = 4*m)) + 80 | scale_y_continuous(expand = expansion(mult = 1.5*m)) + 81 | theme_void(13, base_family = "Menlo") + 82 | theme(plot.title = element_text(color = "#3379be", face = "bold")) 83 | ``` 84 | 85 | Once these geoms are mastered, the hope is that it will be easy to learn additional geoms as you'll know how they work. (Did you know that `geom_bar()` and `geom_col()` produce the same visual? We'll discuss why later.) 86 | 87 | ## Aesthetic mappings 88 | 89 | An aesthetic mapping relates visual properties with variables (also called features or columns) in the data. There are a limited number of aesthetic mappings; some of the most common are `x`, `y`, `color`, and `fill`. For example, to create the following scatterplot, we map `x` to `Income` and `y` to `Illiteracy`: 90 | 91 | ```{r} 92 | #| echo: false 93 | #| layout-ncol: 2 94 | 95 | ggplot(df, aes(Income/1000, Illiteracy/100)) + 96 | geom_point(color = data_color) + 97 | scale_x_continuous(name = "Per capita income (in thousands of $)") + 98 | scale_y_continuous(name = "Illiteracy rate", labels = scales::percent) + 99 | labs(title = "Illiteracy vs. Income by State", caption = "Data: state.x77, base R dataset") + 100 | theme_bw(13) 101 | 102 | source("helpers.R") 103 | arrow_chart(domain = c("x", "y"), range = c("Income", "Illiteracy")) + 104 | ggtitle("Aesthetic mappings") 105 | 106 | ``` 107 | 108 | For each geom, there is a small set of required mappings and a much larger set of optional mappings. The catch is that sometimes it may not be clear which mappings are required. 
In this guide we will always make a special point of indicating the required mappings, as this can be a stumbling block for beginners. 109 | 110 | ::: callout-tip 111 | Whenever you learn a new geom, pay careful attention to the required mappings. 112 | ::: 113 | 114 | Let's consider an example. `geom_histogram()` has **one** required mapping: **`x` or `y`**. A standard histogram with vertical bars is produced by mapping `x`, though there may be circumstances in which a `y` mapping is desired, for example to create a population pyramid. 115 | 116 | ```{r} 117 | #| echo: false 118 | #| layout-ncol: 2 119 | 120 | ggplot(df, aes(x = Income)) + 121 | geom_histogram(bins = 15, color = data_color, fill = data_color, alpha = .5, lwd = 1.25) + 122 | scale_x_continuous(expand = expansion(mult = m)) + 123 | scale_y_continuous(expand = expansion(mult = m)) + 124 | ggtitle("x mapped to variable") + 125 | theme_void(fs) 126 | 127 | ggplot(df, aes(y = Income)) + 128 | geom_histogram(bins = 15, color = data_color, fill = data_color, alpha = .5, lwd = 1.25) + 129 | scale_x_continuous(expand = expansion(mult = m)) + 130 | scale_y_continuous(expand = expansion(mult = m)) + 131 | ggtitle("y mapped to variable") + 132 | theme_void(fs) 133 | 134 | ``` 135 | 136 | ## Continuous vs. discrete mappings 137 | 138 | ::: callout-tip 139 | ## Don't skip this section. 140 | 141 | It's really important. 🤓 142 | ::: 143 | 144 | In addition to knowing the required mappings, it is critical to know whether the visual component (`x`, `y`, `fill`, `color`, etc.) must be mapped to a *continuous* (think numerical) variable, a *discrete* (think categorical) variable, or either one. If you're working with columns in a data frame you will likely know their data types, but there are times when you'll need to check that the data types are correct. There are many ways to check; depending on the method, continuous columns will be marked as `numeric`, `num`, `dbl`, `integer`, or `int`.
*Discrete* columns will be marked as `factor`, `character`, `chr`, `Factor`, `fct`, `Ord.factor`, `ord`, `logi`, `logical`, or `lgl`, to indicate that the variable is a character, factor, or logical. (The differences among the terms within each group are not important at the moment.) 145 | 146 | Let's consider the built-in dataset `CO2`. Try running the code shown below for practice. 147 | 148 | Recall that we first must be sure that we're working with a data frame: 149 | 150 | ```{r} 151 | class(CO2) 152 | ``` 153 | 154 | Now let's look at the data types of the columns with `str()`: 155 | 156 | ```{r} 157 | str(CO2) 158 | ``` 159 | 160 | Note that both `conc` and `uptake` have data type `num`, which is *continuous*, while `Plant`, `Type` and `Treatment` have data types `Factor` or `Ord.factor`, which are *discrete*. 161 | 162 | Another method is to use `glimpse()` from the **dplyr** package, which is similar to `str()` but shows more data and less attribute information: 163 | 164 | ```{r} 165 | library(dplyr) 166 | glimpse(CO2) 167 | ``` 168 | 169 | We see that `glimpse()` labels the *continuous* (numeric) columns as `<dbl>` rather than `num` and the *discrete* columns as `<ord>` and `<fct>`. Again, for our purposes, this distinction isn't important. 170 | 171 | ::: callout-tip 172 | `View()` and `head()` -- common methods for looking at data -- do NOT show data types so should not be used in this situation. 173 | ::: 174 | 175 | ### An example 176 | 177 | Suppose we wish to draw histograms of the *continuous* variables in the `CO2` dataset. Since a histogram represents the distribution of a numerical variable and has no meaning for categorical variables, the mapping must be to a numerical variable.
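As a quick check, and only as a sketch using base R, the numeric columns of `CO2` can be listed programmatically. The names `is_num` and `numeric_cols` are introduced here purely for illustration.

```{r}
# Flag each column of CO2 as numeric (continuous) or not
is_num <- vapply(CO2, is.numeric, logical(1))

# Keep only the names of the numeric columns
numeric_cols <- names(CO2)[is_num]
numeric_cols
```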
Our choices therefore are `conc` and `uptake`: 178 | 179 | ```{r} 180 | #| echo: false 181 | #| fig-height: 4 182 | #| fig-width: 4 183 | #| layout: [[-5, 45, -5, 45, -5]] 184 | 185 | ggplot(CO2, aes(conc)) + 186 | geom_histogram(breaks = seq(0, 1000, 125), color = data_color, fill = "#98C8EA", lwd = 1.25) + 187 | ggtitle("Histogram of conc") + 188 | theme_bw(16) 189 | 190 | ggplot(CO2, aes(uptake)) + 191 | geom_histogram(breaks = seq(0, 50, 5), color = data_color, fill = "#98C8EA", lwd = 1.25) + 192 | ggtitle("Histogram of uptake") + 193 | theme_bw(16) 194 | ``` 195 | 196 | If we try, though, to draw a histogram of a *discrete* variable, such as `Plant`, we'll get an error: 197 | 198 | ```{r} 199 | #| echo: false 200 | #| error: true 201 | ggplot(CO2, aes(Plant)) + 202 | geom_histogram(color = "black", fill = data_color) + 203 | ggtitle("Histogram of Plant") + 204 | theme_bw(fs) 205 | ``` 206 | 207 | We're ready now to combine the three essentials of a layer--data, an aesthetic mapping, and a geom--to create our first plot. 208 | 209 | ## Our first plot 210 | 211 | To review, we said that plots are made up of layers, scales, coordinate systems, faceting, and themes. We begin by focusing on the layers and relying on defaults for all the rest. Each layer, we said, is composed of data, aesthetic mappings, a geom, a stat, and a position. We will focus on **data**, **aesthetic mappings**, and **geoms**, and rely on defaults for stat and position. 212 | 213 | ::: callout-tip 214 | Always start with the geom and ask yourself: what are the required aesthetic mappings? 215 | ::: 216 | 217 | Let's make a histogram using the built-in dataset `faithful`.
We confirm that it's a data frame and then determine that both variables are continuous (type `num`), so either would work with `geom_histogram()`, which, as we said, requires a continuous aesthetic mapping: 218 | 219 | ```{r} 220 | class(faithful) 221 | str(faithful) 222 | ``` 223 | 224 | Let's draw a histogram with the `waiting` variable. We now have everything we need: 225 | 226 | 1. data: `faithful` 227 | 2. mapping: `x` ➞ `waiting` 228 | 3. geom: `geom_histogram()` 229 | 230 | All that's left is to convert this to ggplot2 code. Typically the data and mapping are indicated in the call to `ggplot()`, which initializes the plot, and then we add in the geom. 231 | 232 | ```{r} 233 | #| message: true 234 | library(ggplot2) 235 | ggplot(data = faithful, mapping = aes(x = waiting)) + 236 | geom_histogram() 237 | ``` 238 | 239 | And we have our first plot! A few things to note: 240 | 241 | - ggplot2 tries its best to teach you how to create good graphs.[^layers-1] In this case, the message it prints tells you that the default number of bins for histograms drawn with ggplot2 is 30. Rather than rely on the default, you should try different values for [binwidth =]{.at} (or [bins =]{.at} or [breaks =]{.at}) to find one that best captures the shape of the distribution. 242 | 243 | - You cannot start a new line with [+]{.sc}. It is good practice to start a new line *after* every [+]{.sc}. 244 | 245 | - This plot isn't very pretty. It's very tempting to change the colors, font size, etc. but we're going to save that for later. 246 | 247 | [^layers-1]: There's a long history in R, dating back to the development of its predecessor, S, of following best practices for creating statistical graphics. Many of the help files for base R and ggplot2 plotting functions contain references to research by William Cleveland and others on creating effective graphs. (add references) 248 | --------------------------------------------------------------------------------