├── .Rbuildignore ├── .github ├── .gitignore └── workflows │ └── build_book.yaml ├── .gitignore ├── DESCRIPTION ├── IDE00422.202001072100.tif ├── README.md ├── _quarto.yml ├── after_body.tex ├── annotations.qmd ├── arranging-plots.qmd ├── before_body.tex ├── book ├── .gitignore ├── ggplot2-book.tex ├── latexmk ├── latexmkrc ├── render-tex.R └── springer │ ├── spbasic.bst │ ├── svind.ist │ └── svmono.cls ├── chicago-fullnote-bibliography.csl ├── collective-geoms.qmd ├── colour-wheel.R ├── common.R ├── contributing.md ├── contributors.csv ├── coord.qmd ├── cover.jpg ├── diagrams ├── colour-wheel.png ├── cover.pdf ├── diamond-dimensions.pdf ├── diamond-dimensions.png ├── empty.pdf ├── ggplot-pipeline.graffle ├── grid-grobs.graffle │ ├── data.plist │ └── image1.pdf ├── grid-grobs.pdf ├── grid-panelvp.graffle ├── grid-panelvp.pdf ├── grid-viewports.graffle │ ├── data.plist │ └── image1.pdf ├── grid-viewports.pdf ├── hcl-space.png ├── mastery-schema.graffle ├── mastery-schema.pdf ├── mastery-schema.png ├── position-facets.graffle ├── position-facets.pdf ├── position-facets.png ├── scale-guides.graffle │ ├── data.plist │ └── image1.pdf ├── scale-guides.pdf ├── scale-guides.png ├── vector-raster.graffle ├── vector-raster.pdf └── vector-raster.png ├── errata.txt ├── ext-springs.qmd ├── extending.qmd ├── extensions.qmd ├── facet.qmd ├── ga_script.html ├── getting-started.qmd ├── ggplot2-book.Rproj ├── ggplot2-review ├── forrence.md ├── ggplot2-book-David-Robinson.pdf ├── ggplot2-book-comments-forrence.pdf ├── ggplot2-book-pastor.pdf ├── ggplot2-book-wdoane-edits.pdf └── ggplot2-book_ygc.pdf ├── index.qmd ├── individual-geoms.qmd ├── internals.qmd ├── internals_ggbuild.R ├── internals_gggtable.R ├── introduction.qmd ├── layers.qmd ├── maps.qmd ├── mastery.qmd ├── mi_raster.rds ├── networks.qmd ├── preamble.tex ├── preface-2e.qmd ├── preface-3e.qmd ├── programming.qmd ├── references.bib ├── references.qmd ├── scales-colour.qmd ├── scales-guides.qmd ├── scales-other.qmd ├── 
scales-position.qmd ├── scales.qmd ├── springer ├── contract-1-amendment.pdf ├── contract-1.pdf ├── contract-2.pdf ├── contract-3.pdf ├── marketing-ggplot2.txt └── proposal-3e.md ├── start.qmd ├── statistical-summaries.qmd ├── style.css ├── themes.qmd ├── todo.numbers └── toolbox.qmd /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^.*\.Rproj$ 2 | ^\.Rproj\.user$ 3 | ^\.github$ 4 | -------------------------------------------------------------------------------- /.github/.gitignore: -------------------------------------------------------------------------------- 1 | *.html 2 | -------------------------------------------------------------------------------- /.github/workflows/build_book.yaml: -------------------------------------------------------------------------------- 1 | on: 2 | push: 3 | branches: main 4 | pull_request: 5 | branches: main 6 | # to be able to trigger a manual build 7 | workflow_dispatch: 8 | schedule: 9 | # run every day at 11 PM 10 | - cron: "0 23 * * *" 11 | 12 | name: build_book.yaml 13 | 14 | env: 15 | isExtPR: ${{ github.event.pull_request.head.repo.fork == true }} 16 | RUST_BACKTRACE: 1 17 | 18 | jobs: 19 | build: 20 | runs-on: ubuntu-latest 21 | env: 22 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 23 | steps: 24 | - uses: actions/checkout@v3 25 | 26 | - uses: r-lib/actions/setup-r@v2 27 | with: 28 | use-public-rspm: true 29 | 30 | - uses: r-lib/actions/setup-r-dependencies@v2 31 | 32 | - name: Render book to all format 33 | # Add any command line argument needed 34 | run: | 35 | quarto render 36 | 37 | - name: Upload website artifact 38 | if: ${{ github.ref == 'refs/heads/main' || github.ref == 'refs/heads/master' }} 39 | uses: actions/upload-pages-artifact@v3 40 | with: 41 | path: "_book" 42 | 43 | deploy: 44 | needs: build 45 | 46 | permissions: 47 | pages: write # to deploy to Pages 48 | id-token: write # to verify the deployment originates from an appropriate source 49 | 50 | 
environment: 51 | name: github-pages 52 | url: ${{ steps.deployment.outputs.page_url }} 53 | 54 | runs-on: ubuntu-latest 55 | steps: 56 | - name: Deploy to GitHub Pages 57 | id: deployment 58 | uses: actions/deploy-pages@v4 59 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | /*.html 2 | /*.pdf 3 | .Rproj.user 4 | .Rhistory 5 | book/ggplot2-book.pdf 6 | temp.Rmd 7 | rsconnect 8 | www 9 | _figures 10 | _book/ 11 | _bookdown_files/ 12 | _main.rds 13 | _main.adx 14 | _main.log 15 | _main.aux 16 | _main.idx 17 | _main.out 18 | _main.toc 19 | 20 | /.quarto/ 21 | *_cache 22 | *_files 23 | _main.Rmd 24 | site_libs 25 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: ggbook2 2 | Title: ggbook2 is not really an R package, but this is here 3 | to list the dependencies for building the ggplot2 book. 
4 | Version: 0.1 5 | Authors@R: 6 | person(given = "First", 7 | family = "Last", 8 | role = c("aut", "cre"), 9 | email = "first.last@example.com") 10 | URL: https://github.com/hadley/ggplot2-book 11 | Depends: 12 | R (>= 3.1.0) 13 | Imports: 14 | colorBlindness, 15 | directlabels, 16 | dplyr, 17 | ggforce, 18 | gghighlight, 19 | ggnewscale, 20 | ggplot2 (>= 3.5.0), 21 | ggraph, 22 | ggrepel, 23 | ggtext, 24 | ggthemes, 25 | hexbin, 26 | Hmisc, 27 | mapproj, 28 | maps, 29 | munsell, 30 | ozmaps, 31 | paletteer (>= 1.2.0), 32 | patchwork, 33 | rmapshaper, 34 | scico, 35 | seriation, 36 | sf, 37 | stars, 38 | tidygraph, 39 | tidyr, 40 | wesanderson 41 | Suggests: 42 | babynames, 43 | bookdown, 44 | bslib, 45 | conflicted, 46 | desc, 47 | downlit, 48 | jsonlite, 49 | sessioninfo 50 | Encoding: UTF-8 51 | -------------------------------------------------------------------------------- /IDE00422.202001072100.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/IDE00422.202001072100.tif -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ggplot2 book 2 | 3 | 4 | [![Build status](https://github.com/hadley/ggplot2-book/actions/workflows/bookdown.yaml/badge.svg?event=push)](https://github.com/hadley/ggplot2-book/actions) 5 | 6 | 7 | This is code and text behind the [ggplot2: elegant graphics for data analysis](http://ggplot2-book.org/) book. Please help us make it better by [contributing](contributing.md)! 8 | 9 | ## Installing dependencies 10 | 11 | Install the R packages used by the book with: 12 | 13 | ```r 14 | # install.packages("devtools") 15 | devtools::install_deps() 16 | ``` 17 | 18 | ## Build the book 19 | 20 | In RStudio, press Cmd/Ctrl + Shift + B. 
Or run: 21 | 22 | ```R 23 | bookdown::render_book("index.Rmd") 24 | ``` 25 | 26 | 27 | -------------------------------------------------------------------------------- /_quarto.yml: -------------------------------------------------------------------------------- 1 | project: 2 | type: book 3 | output-dir: _book 4 | 5 | engine: knitr 6 | 7 | book: 8 | title: "ggplot2: Elegant Graphics for Data Analysis (3e)" 9 | reader-mode: true 10 | 11 | page-footer: 12 | left: | 13 | ggplot2: Elegant Graphics for Data Analysis (3e) was written by 14 | Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen. 15 | right: | 16 | This book was built with Quarto. 17 | cover-image: cover.jpg 18 | favicon: cover.jpg 19 | site-url: https://ggplot2-book.org/ 20 | repo-url: https://github.com/hadley/ggplot2-book/ 21 | repo-branch: main 22 | repo-actions: [edit, issue] 23 | chapters: 24 | - index.qmd 25 | 26 | - preface-3e.qmd 27 | - preface-2e.qmd 28 | 29 | - part: start.qmd 30 | chapters: 31 | - introduction.qmd 32 | - getting-started.qmd 33 | 34 | - part: toolbox.qmd 35 | chapters: 36 | - individual-geoms.qmd 37 | - collective-geoms.qmd 38 | - statistical-summaries.qmd 39 | - maps.qmd 40 | - networks.qmd 41 | - annotations.qmd 42 | - arranging-plots.qmd 43 | 44 | - part: scales.qmd 45 | chapters: 46 | - scales-position.qmd 47 | - scales-colour.qmd 48 | - scales-other.qmd 49 | 50 | - part: mastery.qmd 51 | chapters: 52 | - layers.qmd 53 | - scales-guides.qmd 54 | - coord.qmd 55 | - facet.qmd 56 | - themes.qmd 57 | 58 | - part: extending.qmd 59 | chapters: 60 | - programming.qmd 61 | - internals.qmd 62 | - extensions.qmd 63 | - ext-springs.qmd 64 | 65 | bibliography: references.bib 66 | 67 | format: 68 | html: 69 | theme: 70 | - cosmo 71 | code-link: true 72 | 73 | author-meta: "Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen" 74 | include-in-header: "ga_script.html" 75 | callout-appearance: simple 76 | 77 | editor: visual 78 | 79 | 
-------------------------------------------------------------------------------- /after_body.tex: -------------------------------------------------------------------------------- 1 | \backmatter 2 | 3 | \let\hyperlink=\oldhyperlink % Restore old hyperlink behaviour 4 | \cleardoublepage 5 | \markboth{Index}{Index} 6 | \addcontentsline{toc}{chapter}{Index} 7 | \printindex 8 | 9 | \addcontentsline{toc}{chapter}{Code index} 10 | \printindex[code] 11 | -------------------------------------------------------------------------------- /arranging-plots.qmd: -------------------------------------------------------------------------------- 1 | # Arranging plots {#sec-arranging-plots} 2 | 3 | ```{r} 4 | #| echo: false 5 | #| message: false 6 | #| results: asis 7 | source("common.R") 8 | status("drafting") 9 | ``` 10 | 11 | The grammar presented in ggplot2 is concerned with creating single plots. 12 | While the faceting system provides the means to produce several subplots all of these are part of the same main visualization, sharing layers, data, and scales. 13 | However, it is often necessary to use multiple disparate plots to tell a story or make an argument. 14 | These can of course be created individually and assembled in a layout program, but it is beneficial to do this in code to avoid time consuming and non-reproducible manual labor. 15 | A range of packages have risen to the occasion and provide different approaches to arranging separate plots. 16 | While this chapter will focus on the patchwork package you may also find some of the same functionalities in the cowplot, gridExtra and ggpubr packages. 17 | 18 | This chapter will be split into two parts. 19 | The first will be concerned with arranging plots side by side with no overlap, while the second will be concerned with arranging plots on top of each other. 
20 | While these two scenarios are not necessarily in opposition to each other, the former scenario will often benefit from functionality that makes little sense in the latter, e.g. alignment of plotting regions. 21 | 22 | ## Laying out plots side by side 23 | 24 | Often, one wants to show two or more plots side by side to show different aspects of the same story in a compelling way. 25 | This is the scenario that patchwork was built to solve. 26 | At its heart, patchwork is a package that extends ggplot2's use of the `+` operator to work between multiple plots, and adds operators for specialized compositions and for working with compositions of plots. 27 | 28 | As an example of the most basic use of patchwork, we'll use the following 4 plots of the `mpg` dataset: 29 | 30 | ```{r} 31 | p1 <- ggplot(mpg) + 32 | geom_point(aes(x = displ, y = hwy)) 33 | 34 | p2 <- ggplot(mpg) + 35 | geom_bar(aes(x = as.character(year), fill = drv), position = "dodge") + 36 | labs(x = "year") 37 | 38 | p3 <- ggplot(mpg) + 39 | geom_density(aes(x = hwy, fill = drv), colour = NA) + 40 | facet_grid(rows = vars(drv)) 41 | 42 | p4 <- ggplot(mpg) + 43 | stat_summary(aes(x = drv, y = hwy, fill = drv), geom = "col", fun.data = mean_se) + 44 | stat_summary(aes(x = drv, y = hwy), geom = "errorbar", fun.data = mean_se, width = 0.5) 45 | ``` 46 | 47 | The simplest use of patchwork is to use `+` to add plots together, thus creating an ensemble of plots to display together: 48 | 49 | ```{r} 50 | library(patchwork) 51 | 52 | p1 + p2 53 | ``` 54 | 55 | `+` does not specify any particular layout, only that the plots should be displayed together. 56 | In the absence of a layout the same algorithm that governs the number of rows and columns in `facet_wrap()` will decide the number of rows and columns. 57 | This means that adding 3 plots together will create a 1x3 grid while adding 4 plots together will create a 2x2 grid. 
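The grid sizes described above can be checked without drawing anything. This is a minimal sketch, assuming that the relevant algorithm is the one exposed by `ggplot2::wrap_dims()`, the helper that computes default panel dimensions for `facet_wrap()`. That patchwork defers to exactly this helper is an assumption made for illustration, not something stated in this chapter.

```{r}
library(ggplot2)

# wrap_dims(n) returns c(nrow, ncol) for n panels when neither
# nrow nor ncol is supplied
dims3 <- wrap_dims(3)
dims4 <- wrap_dims(4)

dims3
dims4
```

If the default grid is not the one you want, the layout controls introduced in the next section let you override it.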
58 | 59 | ```{r} 60 | p1 + p2 + p3 + p4 61 | ``` 62 | 63 | As can be seen from the two examples above, patchwork takes care of aligning the different parts of the plots with each other. 64 | You can see that all plotting regions are aligned, even in the presence of faceting. 65 | Further, you can see that the y-axis titles in the two left-most plots are aligned despite the axis text in the bottom left plot being wider. 66 | 67 | ### Taking control of the layout 68 | 69 | Often the automatically created grid is not what you want, and it is of course possible to control it. 70 | The most direct and powerful way to do this is to add a `plot_layout()` specification to the plot: 71 | 72 | ```{r} 73 | p1 + p2 + p3 + plot_layout(ncol = 2) 74 | ``` 75 | 76 | A common scenario is wanting to force a single row or column. 77 | patchwork provides two operators, `|` and `/` respectively, to facilitate this (under the hood they simply set the number of rows or columns in the layout to 1). 78 | 79 | ```{r} 80 | p1 / p2 81 | ``` 82 | 83 | ```{r} 84 | # Basically the same as using `+` but the intent is clearer 85 | p3 | p4 86 | ``` 87 | 88 | patchwork allows nesting layouts, which means that it is possible to create some very intricate layouts using just these two operators: 89 | 90 | ```{r} 91 | p3 | (p2 / (p1 | p4)) 92 | ``` 93 | 94 | Alternatively, for very complex layouts, it is possible to specify non-tabular layouts with a textual representation in the `design` argument in `plot_layout()`. 95 | 96 | ```{r} 97 | layout <- " 98 | AAB 99 | C#B 100 | CDD 101 | " 102 | 103 | p1 + p2 + p3 + p4 + plot_layout(design = layout) 104 | ``` 105 | 106 | As has been apparent in the last couple of plots, the legend often becomes redundant between plots. 
107 | While it is possible to remove the legend in all but one plot before assembling them, patchwork provides something easier for the common case: 108 | 109 | ```{r} 110 | p1 + p2 + p3 + plot_layout(ncol = 2, guides = "collect") 111 | ``` 112 | 113 | Electing to collect guides will take all guides and put them together at the position governed by the global theme. 114 | Further, it will remove any duplicate guides, leaving only unique guides in the plot. 115 | The duplication detection looks at the appearance of the guide, and not the underlying scale it comes from. 116 | Thus, it will only remove guides that are exactly alike. 117 | If you want to optimize space use by putting guides in an empty area of the layout, you can specify a plotting area for collected guides: 118 | 119 | ```{r} 120 | p1 + p2 + p3 + guide_area() + plot_layout(ncol = 2, guides = "collect") 121 | ``` 122 | 123 | ### Modifying subplots 124 | 125 | One of the tenets of patchwork is that the plots remain as standard ggplot objects until rendered. 126 | This means that they are amenable to modification after they have been assembled. 127 | The specific plots can be retrieved and set with `[[]]` indexing: 128 | 129 | ```{r} 130 | p12 <- p1 + p2 131 | p12[[2]] <- p12[[2]] + theme_light() 132 | p12 133 | ``` 134 | 135 | Often though, it is necessary to modify all subplots at once to e.g. give them a common theme. 136 | patchwork provides the `&` operator for this scenario: 137 | 138 | ```{r} 139 | p1 + p4 & theme_minimal() 140 | ``` 141 | 142 | This can also be used to give plots a common axis if they share the same aesthetic on that axis: 143 | 144 | ```{r} 145 | p1 + p4 & scale_y_continuous(limits = c(0, 45)) 146 | ``` 147 | 148 | ### Adding annotation 149 | 150 | Once plots have been assembled, they start to form a single unit. 151 | This also means that titles, subtitles, and captions will often pertain to the full ensemble and not individual plots. 152 | Titles etc. 
can be added to patchwork plots using the `plot_annotation()` function. 153 | 154 | ```{r} 155 | p34 <- p3 + p4 + plot_annotation( 156 | title = "A closer look at the effect of drive train in cars", 157 | caption = "Source: mpg dataset in ggplot2" 158 | ) 159 | p34 160 | ``` 161 | 162 | The titles are formatted according to the theme specification in the `plot_annotation()` call. 163 | 164 | ```{r} 165 | p34 + plot_annotation(theme = theme_gray(base_family = "mono")) 166 | ``` 167 | 168 | As the global theme often follows the theme of the subplots, using `&` along with a theme object will modify the global theme as well as the themes of the subplots: 169 | 170 | ```{r} 171 | p34 & theme_gray(base_family = "mono") 172 | ``` 173 | 174 | Another type of annotation, known especially in scientific literature, is to add tags to each subplot that will then be used to identify them in the text and caption. 175 | ggplot2 has the `tag` element for exactly this and patchwork offers functionality to set this automatically using the `tag_levels` argument. 176 | It can generate automatic levels in Latin characters, Arabic numerals, or Roman numerals: 177 | 178 | ```{r} 179 | p123 <- p1 | (p2 / p3) 180 | p123 + plot_annotation(tag_levels = "I") # Uppercase Roman numerals 181 | ``` 182 | 183 | An additional feature is that it is possible to use nesting to define new tagging levels: 184 | 185 | ```{r} 186 | p123[[2]] <- p123[[2]] + plot_layout(tag_level = "new") 187 | p123 + plot_annotation(tag_levels = c("I", "a")) 188 | ``` 189 | 190 | ------------------------------------------------------------------------ 191 | 192 | As can be seen, patchwork offers a wide range of possibilities when it comes to arranging plots, and the API scales with the level of complexity of the assembly, from simply using `+` to place multiple plots in the same area, to using nesting, layouts, and annotations to create advanced custom layouts. 
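The textual `design` used earlier in this section can also be written with patchwork's `area()` constructor, which takes the top, left, bottom, and right grid positions of each plot. The following is a minimal sketch that mirrors the 3x3 `AAB` / `C#B` / `CDD` design. The stand-in plots `q1` to `q4` are hypothetical and exist only to keep the chunk self-contained.

```{r}
library(ggplot2)
library(patchwork)

# Hypothetical stand-in plots, used only for illustration
q1 <- ggplot(mpg) + geom_point(aes(displ, hwy))
q2 <- ggplot(mpg) + geom_bar(aes(drv))
q3 <- ggplot(mpg) + geom_density(aes(hwy))
q4 <- ggplot(mpg) + geom_boxplot(aes(drv, hwy))

# One area per plot, matched to plots in the order they are added
design <- c(
  area(t = 1, l = 1, b = 1, r = 2), # A: top row, columns 1-2
  area(t = 1, l = 3, b = 2, r = 3), # B: right column, rows 1-2
  area(t = 2, l = 1, b = 3, r = 1), # C: left column, rows 2-3
  area(t = 3, l = 2, b = 3, r = 3)  # D: bottom row, columns 2-3
)

q1 + q2 + q3 + q4 + plot_layout(design = design)
```

The cell left empty by `#` in the textual form is simply not covered by any area here.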
193 | 194 | ## Arranging plots on top of each other 195 | 196 | While a lot of the functionality in patchwork is concerned with aligning plots in a grid, it also allows you to make insets, i.e. small plots placed on top of another plot. 197 | The functionality for this is wrapped in the `inset_element()` function which serves to mark the given plot as an inset to be placed on the preceding plot, along with recording the wanted placement etc. 198 | The basic usage is like this: 199 | 200 | ```{r} 201 | p1 + inset_element(p2, left = 0.5, bottom = 0.4, right = 0.9, top = 0.95) 202 | ``` 203 | 204 | The position is specified by giving the left, right, top, and bottom location of the inset. 205 | The default is to use `npc` units, which go from 0 to 1 in the given area, but any `grid::unit()` can be used by giving them explicitly. 206 | The location is by default set to the panel area, but this can be changed with the `align_to` argument. 207 | Combining all this we can place an inset exactly 15 mm from the top right corner like this: 208 | 209 | ```{r} 210 | p1 + 211 | inset_element( 212 | p2, 213 | left = 0.4, 214 | bottom = 0.4, 215 | right = unit(1, "npc") - unit(15, "mm"), 216 | top = unit(1, "npc") - unit(15, "mm"), 217 | align_to = "full" 218 | ) 219 | ``` 220 | 221 | Insets are not confined to ggplots. 222 | Any graphics supported by `wrap_elements()` can be used, including patchworks: 223 | 224 | ```{r} 225 | p24 <- p2 / p4 + plot_layout(guides = "collect") 226 | p1 + inset_element(p24, left = 0.5, bottom = 0.05, right = 0.95, top = 0.9) 227 | ``` 228 | 229 | A nice feature of insets is that they behave as standard patchwork subplots until they are rendered. 230 | This means that they are amenable to modifications after assembly, e.g. 
using `&`: 231 | 232 | ```{r} 233 | p12 <- p1 + inset_element(p2, left = 0.5, bottom = 0.5, right = 0.9, top = 0.95) 234 | p12 & theme_bw() 235 | ``` 236 | 237 | And auto tagging works as expected as well: 238 | 239 | ```{r} 240 | p12 + plot_annotation(tag_levels = "A") 241 | ``` 242 | 243 | ## Wrapping up 244 | 245 | This chapter has given a brief overview of some of the composition possibilities provided by patchwork, but is in no way exhaustive. 246 | Patchwork provides support for more than just ggplots and allows you to combine grid and base graphic elements with your plots as well if need be. 247 | It also allows even more complex designs using the `area()` constructor instead of the textual representation showcased here. 248 | All of these functionalities and many more are covered in the different guides available on its website: 249 | -------------------------------------------------------------------------------- /before_body.tex: -------------------------------------------------------------------------------- 1 | \frontmatter 2 | 3 | %\begin{dedication} 4 | \begin{quote} 5 | To my parents, Alison \& Brian Wickham. Without them, and their unconditional 6 | love and support, none of this would have been possible. 
7 | \end{quote} 8 | %\end{dedication} 9 | -------------------------------------------------------------------------------- /book/.gitignore: -------------------------------------------------------------------------------- 1 | CHAPTERS 2 | tex/diagrams 3 | tex/ggplot2-book.fls 4 | tex/ggplot2-book.pdf 5 | tex/latexmk 6 | tex/latexmkrc 7 | tex.zip 8 | tex 9 | -------------------------------------------------------------------------------- /book/ggplot2-book.tex: -------------------------------------------------------------------------------- 1 | \documentclass[graybox,envcountchap,sectrefs]{svmono} 2 | 3 | \usepackage[scaled=0.92,varqu]{inconsolata} 4 | 5 | \usepackage{float} 6 | \usepackage{index} 7 | % index functions separately 8 | \newindex{code}{adx}{and}{R code index} 9 | \newcommand{\indexf}[1]{\index[code]{#1@\texttt{#1()}}} 10 | \newcommand{\indexc}[1]{\index[code]{#1@\texttt{#1}}} 11 | 12 | % Taken from pandoc x.md -o test.tex --standalone 13 | \usepackage{color} 14 | \usepackage{fancyvrb} 15 | \newcommand{\VerbBar}{|} 16 | \newcommand{\VERB}{\Verb[commandchars=\\\{\}]} 17 | \DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}} 18 | \newenvironment{Shaded}{}{} 19 | \newcommand{\KeywordTok} [1]{\textcolor[rgb]{0.00,0.44,0.13}{{#1}}} 20 | \newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.56,0.13,0.00}{{#1}}} 21 | \newcommand{\DecValTok} [1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}} 22 | \newcommand{\BaseNTok} [1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}} 23 | \newcommand{\FloatTok} [1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}} 24 | \newcommand{\CharTok} [1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}} 25 | \newcommand{\StringTok} [1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}} 26 | \newcommand{\CommentTok} [1]{\textcolor[rgb]{0.38,0.63,0.69}{{#1}}} 27 | \newcommand{\OtherTok} [1]{\textcolor[rgb]{0.00,0.44,0.13}{{#1}}} 28 | \newcommand{\AlertTok} [1]{\textcolor[rgb]{1.00,0.00,0.00}{{#1}}} 29 | \newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.02,0.16,0.49}{{#1}}} 30 | 
\newcommand{\ErrorTok} [1]{\textcolor[rgb]{1.00,0.00,0.00}{{#1}}} 31 | \newcommand{\NormalTok} [1]{{#1}} 32 | 33 | \newcommand{\OperatorTok} [1]{{#1}} 34 | \newcommand{\ControlFlowTok} [1]{{#1}} 35 | % 36 | \usepackage{longtable} 37 | \usepackage{booktabs} 38 | \usepackage{graphicx} 39 | \DeclareGraphicsExtensions{.pdf,.png} 40 | \providecommand{\tightlist}{\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} 41 | 42 | \usepackage[hyphens]{url} 43 | \usepackage{hyperref} 44 | 45 | % Place links in parens 46 | \renewcommand{\href}[2]{#2 (\url{#1})} 47 | % Use auto ref for internal links 48 | \let\oldhyperlink=\hyperlink 49 | \renewcommand{\hyperlink}[2]{\autoref{#1}} 50 | \def\chapterautorefname{Chapter} 51 | \def\sectionautorefname{Section} 52 | \def\subsectionautorefname{Section} 53 | \def\subsubsectionautorefname{Section} 54 | 55 | \setlength{\emergencystretch}{3em} % prevent overfull lines 56 | \vbadness=10000 % suppress underfull \vbox 57 | \hbadness=10000 % suppress underfull \vbox 58 | \hfuzz=10pt 59 | 60 | \makeindex 61 | \title{ggplot2} 62 | \subtitle{Elegant Graphics for Data Analysis} 63 | \author{Hadley Wickham} 64 | 65 | \begin{document} 66 | 67 | \frontmatter 68 | \maketitle 69 | 70 | \begin{dedication} 71 | To my parents, Alison \& Brian Wickham. Without them, and their unconditional 72 | love and support, none of this would have been possible. 
73 | \end{dedication} 74 | 75 | \include{preface} 76 | 77 | \tableofcontents 78 | 79 | \mainmatter 80 | 81 | \part{Getting started} 82 | 83 | \include{introduction} 84 | \include{ggplot} 85 | \include{toolbox} 86 | 87 | \part{The Grammar} 88 | 89 | \include{mastery} 90 | \include{layers} 91 | \include{scales} 92 | \include{position} 93 | \include{themes} 94 | 95 | \part{Data analysis} 96 | 97 | \include{tidy-data} 98 | \include{data-manip} 99 | \include{modelling} 100 | \include{programming} 101 | 102 | \backmatter 103 | 104 | \let\hyperlink=\oldhyperlink % Restore old hyperlink behaviour 105 | \cleardoublepage 106 | \markboth{Index}{Index} 107 | \addcontentsline{toc}{chapter}{Index} 108 | \printindex 109 | 110 | \addcontentsline{toc}{chapter}{Code index} 111 | \printindex[code] 112 | 113 | \end{document} 114 | -------------------------------------------------------------------------------- /book/latexmkrc: -------------------------------------------------------------------------------- 1 | add_cus_dep('adx', 'and', 0, 'makeadx2and'); 2 | sub makeadx2and { 3 | system( "makeindex -o \"$_[0].and\" \"$_[0].adx\"" ); 4 | } 5 | -------------------------------------------------------------------------------- /book/render-tex.R: -------------------------------------------------------------------------------- 1 | library("methods") # avoids weird broom error 2 | library("rmarkdown") 3 | 4 | tex_chapter <- function (chapter = NULL, latex_engine = c("xelatex", "pdflatex", 5 | "lualatex"), code_width = 65) { 6 | options(digits = 3) 7 | set.seed(1014) 8 | latex_engine <- match.arg(latex_engine) 9 | rmarkdown::output_format(rmarkdown::knitr_options("html", chapter), 10 | rmarkdown::pandoc_options(to = "latex", 11 | from = "markdown_style", 12 | ext = ".tex", 13 | args = c("--top-level-division=chapter", 14 | rmarkdown::pandoc_latex_engine_args(latex_engine)) 15 | ), 16 | clean_supporting = FALSE) 17 | } 18 | 19 | path <- commandArgs(trailingOnly = TRUE) 20 | # command line args 
should contain just one chapter name 21 | if (length(path) == 0) { 22 | message("No input supplied") 23 | } else { 24 | base <- tex_chapter() 25 | base$knitr$opts_knit$width <- 67 26 | base$pandoc$from <- "markdown" 27 | 28 | rmarkdown::render(path, base, output_dir = "book/tex", envir = globalenv(), quiet = TRUE) 29 | } -------------------------------------------------------------------------------- /book/springer/svind.ist: -------------------------------------------------------------------------------- 1 | headings_flag 1 2 | heading_prefix "{\\bf " 3 | heading_suffix "}\\nopagebreak%\n \\indexspace\\nopagebreak%" 4 | delim_0 "\\idxquad " 5 | delim_1 "\\idxquad " 6 | delim_2 "\\idxquad " 7 | delim_n ",\\," 8 | -------------------------------------------------------------------------------- /collective-geoms.qmd: -------------------------------------------------------------------------------- 1 | # Collective geoms {#sec-collective-geoms} 2 | 3 | ```{r} 4 | #| echo: false 5 | #| message: false 6 | #| results: asis 7 | source("common.R") 8 | status("drafting") 9 | ``` 10 | 11 | Geoms can be roughly divided into individual and collective geoms. 12 | An **individual** geom draws a distinct graphical object for each observation (row). 13 | For example, the point geom draws one point per row. 14 | A **collective** geom displays multiple observations with one geometric object. 15 | This may be a result of a statistical summary, like a boxplot, or may be fundamental to the display of the geom, like a polygon. 16 | Lines and paths fall somewhere in between: each line is composed of a set of straight segments, but each segment represents two points. 17 | How do we control the assignment of observations to graphical elements? 18 | This is the job of the `group` aesthetic. 19 | \index{Grouping} \indexc{group} \index{Geoms!collective} 20 | 21 | By default, the `group` aesthetic is mapped to the interaction of all discrete variables in the plot. 
22 | This often partitions the data correctly, but when it does not, or when no discrete variable is used in a plot, you'll need to explicitly define the grouping structure by mapping group to a variable that has a different value for each group. 23 | 24 | There are three common cases where the default is not enough, and we will consider each one below. 25 | In the following examples, we will use a simple longitudinal dataset, `Oxboys`, from the nlme package. 26 | It records the heights (`height`) and centered ages (`age`) of 26 boys (`Subject`), measured on nine occasions (`Occasion`). 27 | `Subject` and `Occasion` are stored as ordered factors. 28 | \index{nlme} \index{Data!Oxboys@\texttt{Oxboys}} 29 | 30 | ```{r} 31 | #| label: oxboys 32 | data(Oxboys, package = "nlme") 33 | head(Oxboys) 34 | ``` 35 | 36 | ## Multiple groups, one aesthetic 37 | 38 | In many situations, you want to separate your data into groups, but render them in the same way. 39 | In other words, you want to be able to distinguish individual subjects, but not identify them. 40 | This is common in longitudinal studies with many subjects, where the plots are often descriptively called spaghetti plots. 41 | For example, the following plot shows the growth trajectory for each boy (each `Subject`): \index{Data!longitudinal} \indexf{geom\_line} 42 | 43 | ```{r} 44 | #| label: oxboys-line 45 | ggplot(Oxboys, aes(age, height, group = Subject)) + 46 | geom_point() + 47 | geom_line() 48 | ``` 49 | 50 | If you incorrectly specify the grouping variable, you'll get a characteristic sawtooth appearance: 51 | 52 | ```{r} 53 | #| label: oxboys-line-bad 54 | ggplot(Oxboys, aes(age, height)) + 55 | geom_point() + 56 | geom_line() 57 | ``` 58 | 59 | If a group isn't defined by a single variable, but instead by a combination of multiple variables, use `interaction()` to combine them, e.g. `aes(group = interaction(school_id, student_id))`. 
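The `interaction()` idiom above can be tried on a small made-up dataset. This is a minimal sketch: the `scores` data frame and its columns are hypothetical, chosen so that `student_id` restarts within each school and neither column identifies a student on its own.

```{r}
library(ggplot2)

# Hypothetical data: two schools, two students per school, three years each
scores <- data.frame(
  school_id  = rep(c("a", "b"), each = 6),
  student_id = rep(rep(1:2, each = 3), times = 2),
  year       = rep(1:3, times = 4),
  score      = c(50, 55, 61, 48, 50, 57, 62, 66, 71, 58, 65, 66)
)

# One line per school/student combination, not one per student_id
ggplot(scores, aes(year, score, group = interaction(school_id, student_id))) +
  geom_line()
```

Grouping by `student_id` alone would join students who share an id across schools into a single line, which is why the combined grouping is needed here.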
60 | \indexf{interaction} 61 | 62 | ## Different groups on different layers 63 | 64 | Sometimes we want to plot summaries that use different levels of aggregation: one layer might display individuals, while another displays an overall summary. 65 | Building on the previous example, suppose we want to add a single smooth line, showing the overall trend for *all* boys. 66 | If we use the same grouping in both layers, we get one smooth per boy: \indexf{geom\_smooth} 67 | 68 | ```{r} 69 | #| label: layer18 70 | ggplot(Oxboys, aes(age, height, group = Subject)) + 71 | geom_line() + 72 | geom_smooth(method = "lm", se = FALSE) 73 | ``` 74 | 75 | This is not what we wanted; we have inadvertently added a smoothed line for each boy. 76 | Grouping controls both the display of the geoms, and the operation of the stats: one statistical transformation is run for each group. 77 | 78 | Instead of setting the grouping aesthetic in `ggplot()`, where it will apply to all layers, we set it in `geom_line()` so it applies only to the lines. 79 | There are no discrete variables in the plot so the default grouping variable will be a constant and we get one smooth: 80 | 81 | ```{r} 82 | #| label: layer19 83 | ggplot(Oxboys, aes(age, height)) + 84 | geom_line(aes(group = Subject)) + 85 | geom_smooth(method = "lm", linewidth = 2, se = FALSE) 86 | ``` 87 | 88 | ## Overriding the default grouping 89 | 90 | Some plots have a discrete x scale, but you still want to draw lines connecting *across* groups. 91 | This is the strategy used in interaction plots, profile plots, and parallel coordinate plots, among others. 92 | For example, imagine we've drawn boxplots of height at each measurement occasion: \indexf{geom\_boxplot} 93 | 94 | ```{r} 95 | #| label: oxbox 96 | ggplot(Oxboys, aes(Occasion, height)) + 97 | geom_boxplot() 98 | ``` 99 | 100 | There is one discrete variable in this plot, `Occasion`, so we get one boxplot for each unique x value. 
101 | Now we want to overlay lines that connect each individual boy. 102 | Simply adding `geom_line()` does not work: the lines are drawn within each occasion, not across each subject: 103 | 104 | ```{r} 105 | #| label: oxbox-line-bad 106 | ggplot(Oxboys, aes(Occasion, height)) + 107 | geom_boxplot() + 108 | geom_line(colour = "#3366FF", alpha = 0.5) 109 | ``` 110 | 111 | To get the plot we want, we need to override the grouping to say we want one line per boy: 112 | 113 | ```{r} 114 | #| label: oxbox-line 115 | ggplot(Oxboys, aes(Occasion, height)) + 116 | geom_boxplot() + 117 | geom_line(aes(group = Subject), colour = "#3366FF", alpha = 0.5) 118 | ``` 119 | 120 | ## Matching aesthetics to graphic objects {#sec-matching} 121 | 122 | A final important issue with collective geoms is how the aesthetics of the individual observations are mapped to the aesthetics of the complete entity. 123 | What happens when different aesthetics are mapped to a single geometric element? 124 | \index{Aesthetics!matching to geoms} 125 | 126 | In ggplot2, this is handled differently for different collective geoms. 127 | Lines and paths operate on a "first value" principle: each segment is defined by two observations, and ggplot2 applies the aesthetic value (e.g., colour) associated with the *first* observation when drawing the segment. 128 | That is, the aesthetic for the first observation is used when drawing the first segment, the second observation is used when drawing the second segment and so on. 
129 | The aesthetic value for the last observation is not used: 130 | 131 | ```{r} 132 | #| layout-ncol: 2 133 | #| fig-width: 4 134 | df <- data.frame(x = 1:3, y = 1:3, colour = c(1, 3, 5)) 135 | 136 | ggplot(df, aes(x, y, colour = factor(colour))) + 137 | geom_line(aes(group = 1), linewidth = 2) + 138 | geom_point(size = 5) 139 | 140 | ggplot(df, aes(x, y, colour = colour)) + 141 | geom_line(aes(group = 1), linewidth = 2) + 142 | geom_point(size = 5) 143 | ``` 144 | 145 | On the left --- where colour is discrete --- the first point and first line segment are red, the second point and second line segment are green, and the final point (with no corresponding segment) is blue. 146 | On the right --- where colour is continuous --- the same principle is applied to the three different shades of blue. 147 | Notice that even though the colour variable is continuous, ggplot2 does not smoothly blend from one aesthetic value to another. 148 | If this is the behaviour you want, you can perform the linear interpolation yourself: 149 | 150 | ```{r} 151 | #| label: matching-lines2 152 | xgrid <- with(df, seq(min(x), max(x), length.out = 50)) 153 | interp <- data.frame( 154 | x = xgrid, 155 | y = approx(df$x, df$y, xout = xgrid)$y, 156 | colour = approx(df$x, df$colour, xout = xgrid)$y 157 | ) 158 | ggplot(interp, aes(x, y, colour = colour)) + 159 | geom_line(linewidth = 2) + 160 | geom_point(data = df, size = 5) 161 | ``` 162 | 163 | An additional limitation for paths and lines is worth noting: the line type must be constant over each individual line. 164 | In R there is no way to draw a line which has varying line type. 165 | \indexf{geom\_line} \indexf{geom\_path} 166 | 167 | What about other collective geoms, such as polygons? 168 | Most collective geoms are more complicated than lines and paths, and a single geometric object can map onto many observations. 169 | In such cases it is not obvious how the aesthetics of individual observations should be combined.
170 | For instance, how would you colour a polygon that had a different fill colour for each point on its border? 171 | Due to this ambiguity ggplot2 adopts a simple rule: the aesthetics from the individual components are used only if they are all the same. 172 | If the aesthetics differ for each component, ggplot2 uses a default value instead. 173 | \indexf{geom\_polygon} 174 | 175 | These issues are most relevant when mapping aesthetics to continuous variables. 176 | For discrete variables, the default behaviour of ggplot2 is to treat the variable as part of the group aesthetic, as described above. 177 | This has the effect of splitting the collective geom into smaller pieces. 178 | This works particularly well for bar and area plots, because stacking the individual pieces produces the same shape as the original ungrouped data: 179 | 180 | ```{r} 181 | #| label: bar-split-disc 182 | #| layout-ncol: 2 183 | #| fig-width: 4 184 | ggplot(mpg, aes(class)) + 185 | geom_bar() 186 | ggplot(mpg, aes(class, fill = drv)) + 187 | geom_bar() 188 | ``` 189 | 190 | If you try to map the fill aesthetic to a continuous variable (e.g., `hwy`) in the same way, it doesn't work. 191 | The default grouping will only be based on `class`, so each bar is now associated with multiple colours (depending on the value of `hwy` for the observations in each class). 192 | Because a bar can only display one colour, ggplot2 reverts to the default grey in this case. 
193 | To show multiple colours, we need multiple bars for each `class`, which we can get by overriding the grouping: 194 | 195 | ```{r} 196 | #| label: bar-split-cont 197 | #| layout-ncol: 2 198 | #| fig-width: 4 199 | ggplot(mpg, aes(class, fill = hwy)) + 200 | geom_bar() 201 | ggplot(mpg, aes(class, fill = hwy, group = hwy)) + 202 | geom_bar() 203 | ``` 204 | 205 | In the plot on the right, the "shaded bars" for each `class` have been constructed by stacking many distinct bars on top of each other, each filled with a different shade based on the value of `hwy`. 206 | Note that when you do this, the bars are stacked in the order defined by the grouping variable (in this example `hwy`). 207 | If you need fine control over this behaviour, you'll need to create a factor with levels ordered as needed. 208 | 209 | ## Exercises 210 | 211 | 1. Draw a boxplot of `hwy` for each value of `cyl`, without turning `cyl` into a factor. 212 | What extra aesthetic do you need to set? 213 | 214 | 2. Modify the following plot so that you get one boxplot per integer value of `displ`. 215 | 216 | ```{r} 217 | #| eval: false 218 | ggplot(mpg, aes(displ, cty)) + 219 | geom_boxplot() 220 | ``` 221 | 222 | 3. When illustrating the difference between mapping continuous and discrete colours to a line, the discrete example needed `aes(group = 1)`. 223 | Why? 224 | What happens if that is omitted? 225 | What's the difference between `aes(group = 1)` and `aes(group = 2)`? 226 | Why? 227 | 228 | 4. How many bars are in each of the following plots? 229 | 230 | ```{r} 231 | #| eval: false 232 | ggplot(mpg, aes(drv)) + 233 | geom_bar() 234 | 235 | ggplot(mpg, aes(drv, fill = hwy, group = hwy)) + 236 | geom_bar() 237 | 238 | library(dplyr) 239 | mpg2 <- mpg %>% arrange(hwy) %>% mutate(id = seq_along(hwy)) 240 | ggplot(mpg2, aes(drv, fill = hwy, group = id)) + 241 | geom_bar() 242 | ``` 243 | 244 | (Hint: try adding an outline around each bar with `colour = "white"`) 245 | 246 | 5. 
Install the babynames package. 247 | It contains data about the popularity of baby names in the US. 248 | Run the following code and fix the resulting graph. 249 | Why does this graph make us unhappy? 250 | 251 | ```{r} 252 | #| eval: false 253 | library(babynames) 254 | hadley <- dplyr::filter(babynames, name == "Hadley") 255 | ggplot(hadley, aes(year, n)) + 256 | geom_line() 257 | ``` 258 | -------------------------------------------------------------------------------- /colour-wheel.R: -------------------------------------------------------------------------------- 1 | library(ggplot2) 2 | library(scales) 3 | library(dplyr) 4 | library(colorspace) 5 | 6 | hcl <- expand.grid(x = seq(-1, 1, length.out = 100), y = seq(-1, 1, length.out = 100)) %>% 7 | as_tibble() %>% 8 | filter(x^2 + y^2 < 1) %>% 9 | mutate( 10 | r = sqrt(x^2 + y^2), 11 | c = 100 * r, 12 | h = 180 / pi * atan2(y, x), 13 | l = 65, 14 | colour = hcl(h, c, l) 15 | ) 16 | 17 | # sin(h) = y / (c / 100) 18 | # y = sin(h) * c / 100 19 | 20 | cols <- hue_pal()(5) 21 | selected <- RGB(t(col2rgb(cols)) / 255) %>% 22 | as("polarLUV") %>% 23 | coords() %>% 24 | as.data.frame() %>% 25 | mutate( 26 | x = cos(H / 180 * pi) * C / 100, 27 | y = sin(H / 180 * pi) * C / 100, 28 | colour = cols 29 | ) 30 | 31 | ggplot(hcl, aes(x, y)) + 32 | geom_raster(aes(fill = colour)) + 33 | scale_fill_identity() + 34 | scale_colour_identity() + 35 | coord_equal() + 36 | scale_x_continuous("", breaks = NULL) + 37 | scale_y_continuous("", breaks = NULL) + 38 | geom_point(data = selected, size = 10, color = "white") + 39 | geom_point(data = selected, size = 5, aes(colour = colour)) 40 | -------------------------------------------------------------------------------- /common.R: -------------------------------------------------------------------------------- 1 | set.seed(451) 2 | 3 | library(ggplot2) 4 | conflicted::conflict_prefer("Position", "ggplot2") 5 | 6 | library(dplyr) 7 | conflicted::conflict_prefer("filter", "dplyr") 8 | conflicted::conflict_prefer("pull", 
"dplyr") # in case git2r is loaded 9 | 10 | library(tidyr) 11 | conflicted::conflict_prefer("extract", "tidyr") 12 | 13 | options(digits = 3, dplyr.print_min = 6, dplyr.print_max = 6) 14 | options(crayon.enabled = FALSE) 15 | 16 | # suppress startup message 17 | library(maps) 18 | 19 | knitr::opts_chunk$set( 20 | comment = "#>", 21 | collapse = TRUE, 22 | fig.show = "hold", 23 | dpi = 300, 24 | cache = TRUE 25 | ) 26 | 27 | is_latex <- function() { 28 | identical(knitr::opts_knit$get("rmarkdown.pandoc.to"), "latex") 29 | } 30 | 31 | status <- function(type) { 32 | status <- switch(type, 33 | polishing = "should be readable but is currently undergoing final polishing", 34 | restructuring = "is undergoing heavy restructuring and may be confusing or incomplete", 35 | drafting = "is currently a dumping ground for ideas, and we don't recommend reading it", 36 | complete = "is largely complete and just needs final proof reading", 37 | stop("Invalid `type`", call. = FALSE) 38 | ) 39 | 40 | class <- switch(type, 41 | polishing = "note", 42 | restructuring = "important", 43 | drafting = "important", 44 | complete = "note" 45 | ) 46 | 47 | callout <- paste0( 48 | "\n", 49 | "::: {.callout-", class, "} \n", 50 | "You are reading the work-in-progress third edition of the ggplot2 book. ", 51 | "This chapter ", status, ". \n", 52 | "::: \n" 53 | ) 54 | 55 | cat(callout) 56 | } 57 | 58 | 59 | # Draw parts of plots ----------------------------------------------------- 60 | 61 | draw_legends <- function(...) { 62 | plots <- list(...) 
63 | gtables <- lapply(plots, function(x) ggplot_gtable(ggplot_build(x))) 64 | guides <- lapply(gtables, gtable::gtable_filter, "guide-box") 65 | 66 | one <- Reduce(function(x, y) cbind(x, y, size = "first"), guides) 67 | 68 | grid::grid.newpage() 69 | grid::grid.draw(one) 70 | } 71 | 72 | 73 | # Customised plot layout -------------------------------------------------- 74 | 75 | plot_hook_bookdown <- function(x, options) { 76 | paste0( 77 | begin_figure(x, options), 78 | include_graphics(x, options), 79 | end_figure(x, options) 80 | ) 81 | } 82 | 83 | begin_figure <- function(x, options) { 84 | if (!knitr_first_plot(options)) 85 | return("") 86 | 87 | paste0( 88 | "\\begin{figure}[H]\n", 89 | if (options$fig.align == "center") " \\centering\n" 90 | ) 91 | } 92 | end_figure <- function(x, options) { 93 | if (!knitr_last_plot(options)) 94 | return("") 95 | 96 | paste0( 97 | if (!is.null(options$fig.cap)) { 98 | paste0( 99 | ' \\caption{', options$fig.cap, '}\n', 100 | ' \\label{fig:', options$label, '}\n' 101 | ) 102 | }, 103 | "\\end{figure}\n" 104 | ) 105 | } 106 | include_graphics <- function(x, options) { 107 | opts <- c( 108 | sprintf('width=%s', options$out.width), 109 | sprintf('height=%s', options$out.height), 110 | options$out.extra 111 | ) 112 | if (length(opts) > 0) { 113 | opts_str <- paste0("[", paste(opts, collapse = ", "), "]") 114 | } else { 115 | opts_str <- "" 116 | } 117 | 118 | paste0(" \\includegraphics", 119 | opts_str, 120 | "{", tools::file_path_sans_ext(x), "}", 121 | if (options$fig.cur != options$fig.num) "%", 122 | "\n" 123 | ) 124 | } 125 | 126 | knitr_first_plot <- function(x) { 127 | x$fig.show != "hold" || x$fig.cur == 1L 128 | } 129 | knitr_last_plot <- function(x) { 130 | x$fig.show != "hold" || x$fig.cur == x$fig.num 131 | } 132 | 133 | 134 | # control output lines ---------------------------------------------------- 135 | 136 | hook_output <- knitr::knit_hooks$get("output") 137 | knitr::knit_hooks$set(output = function(x, options) 
{ 138 | lines <- options$output.lines 139 | if (is.null(lines)) { 140 | return(hook_output(x, options)) # pass to default hook 141 | } 142 | 143 | x <- unlist(strsplit(x, "\n")) 144 | 145 | if (length(lines)==1) { # first n lines 146 | if (length(x) > lines) { 147 | # truncate the output, but add .... 148 | x <- c(head(x, lines), "....") 149 | } 150 | } else { 151 | x <- x[lines] # don't add ... when we get vector input 152 | } 153 | # paste these lines together 154 | x <- paste(c(x, ""), collapse = "\n") 155 | hook_output(x, options) 156 | }) 157 | 158 | -------------------------------------------------------------------------------- /contributing.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | 3 | This book will be updated in the open, and it wouldn't be nearly as good without your contributions. There are a number of ways you can help make the book even better: 4 | 5 | * If there’s a particular section of the book that you think needs an update (or is just plain missing), please let us know by filing an [issue](https://github.com/hadley/ggplot2-book/issues). 6 | 7 | * Submit a pull request including a brief description of your changes: 8 | fixing typos is perfectly adequate. 9 | 10 | * If you make significant changes, include the phrase "I assign the copyright of this contribution to Hadley Wickham" - I need this so I can publish the printed book. 11 | 12 | * If you don't understand something, please [let me know](mailto:h.wickham@gmail.com). Your feedback on what is confusing or hard to understand is valuable. 
-------------------------------------------------------------------------------- /contributors.csv: -------------------------------------------------------------------------------- 1 | login,n,name,blog 2 | agisga,1,Alexej Gossmann,http://www.alexejgossmann.com/ 3 | chriselrod,1,NA,NA 4 | cpsievert,115,Carson Sievert,https://cpsievert.me/ 5 | davechilders,1,Dave Childers,NA 6 | dicorynia,1,NA,NA 7 | djmurphy420,2,Dennis Murphy,NA 8 | djnavarro,25,Danielle Navarro,https://djnavarro.net 9 | dmurdoch,1,NA,NA 10 | dongzhuoer,4,Zhuoer Dong,https://dongzhuoer.github.io 11 | dyavorsky,1,Dan Yavorsky,https://sites.google.com/view/dan-yavorskys-website/about 12 | fjuniorr,1,Francisco Júnior,NA 13 | gokceneraslan,5,Gökçen Eraslan,NA 14 | hadley,534,Hadley Wickham,http://hadley.nz 15 | jashapiro,1,jashapiro,NA 16 | jimhester,3,Jim Hester,http://www.jimhester.com 17 | jmgirard,1,Jeffrey Girard,https://www.jmgirard.com 18 | joelgombin,1,Joel Gombin,joelgombin.github.com 19 | jonas-hag,1,NA,NA 20 | lindbrook,5,NA,NA 21 | MarHer90,2,NA,NA 22 | mine-cetinkaya-rundel,1,Mine Cetinkaya-Rundel,http://mine-cr.com 23 | neilmcguigan,1,Neil McGuigan,NA 24 | nevrome,1,Clemens Schmid,https://nevrome.de 25 | pitmonticone,1,Pietro Monticone,NA 26 | pkq,1,Patrick Kennedy,NA 27 | Robinlovelace,5,Robin,https://www.robinlovelace.net 28 | seaaan,1,Sean Hughes,NA 29 | thomasp85,21,Thomas Lin Pedersen,www.data-imaginist.com 30 | tklebel,1,Thomas Klebel,https://thomasklebel.eu 31 | tomjemmett,1,Tom Jemmett,https://www.strategyunitwm.nhs.uk/ 32 | truemoid,2,Alex Trueman,NA 33 | wibeasley,1,Will Beasley,http://scholar.google.com/citations?user=ffsJTC0AAAAJ&hl=en 34 | wjakethompson,1,Jake Thompson,https://wjakethompson.com/ 35 | yiluheihei,3,Yang Cao,https://www.choyang.me 36 | yutannihilation,3,Hiroaki Yutani,https://yutani.rbind.io/ 37 | -------------------------------------------------------------------------------- /coord.qmd: 
-------------------------------------------------------------------------------- 1 | # Coordinate systems {#sec-coord} 2 | 3 | ```{r} 4 | #| echo: false 5 | #| message: false 6 | #| results: asis 7 | source("common.R") 8 | status("drafting") 9 | ``` 10 | 11 | Coordinate systems have two main jobs: \index{Coordinate systems} 12 | 13 | - Combine the two position aesthetics to produce a 2d position on the plot. 14 | The position aesthetics are called `x` and `y`, but they might be better called position 1 and 2 because their meaning depends on the coordinate system used. 15 | For example, with the polar coordinate system they become angle and radius (or radius and angle), and with maps they become latitude and longitude. 16 | 17 | - In coordination with the faceter, coordinate systems draw axes and panel backgrounds. 18 | While the scales control the values that appear on the axes, and how they map from data to position, it is the coordinate system which actually draws them. 19 | This is because their appearance depends on the coordinate system: an angle axis looks quite different than an x axis. 20 | 21 | There are two types of coordinate systems. 22 | Linear coordinate systems preserve the shape of geoms: 23 | 24 | - `coord_cartesian()`: the default Cartesian coordinate system, where the 2d position of an element is given by the combination of the x and y positions. 25 | 26 | - `coord_flip()`: Cartesian coordinate system with x and y axes flipped. 27 | 28 | - `coord_fixed()`: Cartesian coordinate system with a fixed aspect ratio. 29 | 30 | On the other hand, non-linear coordinate systems can change the shapes: a straight line may no longer be straight. 31 | The closest distance between two points may no longer be a straight line. 32 | 33 | - `coord_map()`/`coord_quickmap()`/`coord_sf()`: Map projections. 34 | 35 | - `coord_polar()`: Polar coordinates. 
36 | 37 | - `coord_trans()`: Apply arbitrary transformations to x and y positions, after the data has been processed by the stat. 38 | 39 | Each coordinate system is described in more detail below. 40 | 41 | ## Linear coordinate systems {#sec-cartesian} 42 | 43 | There are three linear coordinate systems: `coord_cartesian()`, `coord_flip()`, `coord_fixed()`. 44 | \index{Coordinate systems!Cartesian} \indexf{coord\_cartesian} 45 | 46 | ### Zooming into a plot with `coord_cartesian()` 47 | 48 | `coord_cartesian()` has arguments `xlim` and `ylim`. 49 | If you think back to the scales chapter, you might wonder why we need these. 50 | Doesn't the limits argument of the scales already allow us to control what appears on the plot? 51 | The key difference is how the limits work: when setting scale limits, any data outside the limits is thrown away; but when setting coordinate system limits, we still use all the data, but we only display a small region of the plot. 52 | Setting coordinate system limits is like looking at the plot under a magnifying glass. 53 | \index{Zooming} 54 | 55 | ```{r} 56 | #| label: limits-smooth 57 | #| layout-ncol: 3 58 | #| fig-width: 3 59 | base <- ggplot(mpg, aes(displ, hwy)) + 60 | geom_point() + 61 | geom_smooth() 62 | 63 | # Full dataset 64 | base 65 | # Scaling to 4--6 throws away data outside that range 66 | base + scale_x_continuous(limits = c(4, 6)) 67 | # Zooming to 4--6 keeps all the data but only shows some of it 68 | base + coord_cartesian(xlim = c(4, 6)) 69 | ``` 70 | 71 | ### Flipping the axes with `coord_flip()` {#sec-coord-flip} 72 | 73 | Most statistics and geoms assume you are interested in y values conditional on x values (e.g., smooth, summary, boxplot, line): in most statistical models, the x values are assumed to be measured without error. 74 | If you are interested in x conditional on y (or you just want to rotate the plot 90 degrees), you can use `coord_flip()` to exchange the x and y axes. 
75 | Compare this with just exchanging the variables mapped to x and y: \index{Rotating} \index{Coordinate systems!flipped} \indexf{coord\_flip} 76 | 77 | ```{r} 78 | #| label: coord-flip 79 | #| layout-ncol: 3 80 | #| fig-width: 3 81 | ggplot(mpg, aes(displ, cty)) + 82 | geom_point() + 83 | geom_smooth() 84 | # Exchanging cty and displ rotates the plot 90 degrees, but the smooth 85 | # is fit to the rotated data. 86 | ggplot(mpg, aes(cty, displ)) + 87 | geom_point() + 88 | geom_smooth() 89 | # coord_flip() fits the smooth to the original data, and then rotates 90 | # the output 91 | ggplot(mpg, aes(displ, cty)) + 92 | geom_point() + 93 | geom_smooth() + 94 | coord_flip() 95 | ``` 96 | 97 | ### Equal scales with `coord_fixed()` 98 | 99 | `coord_fixed()` fixes the ratio of length on the x and y axes. 100 | The default `ratio` ensures that the x and y axes have equal scales: i.e., 1 cm along the x axis represents the same range of data as 1 cm along the y axis. 101 | The aspect ratio will also be set to ensure that the mapping is maintained regardless of the shape of the output device. 102 | See the documentation of `coord_fixed()` for more details. 103 | \index{Aspect ratio} \index{Coordinate systems!equal} \indexf{coord\_equal} 104 | 105 | ## Non-linear coordinate systems {#sec-coord-non-linear} 106 | 107 | Unlike linear coordinates, non-linear coordinates can change the shape of geoms. 108 | For example, in polar coordinates a rectangle becomes an arc; in a map projection, the shortest path between two points is not necessarily a straight line. 109 | The code below shows how a line and a rectangle are rendered in a few different coordinate systems. 
110 | \index{Transformation!coordinate system} \index{Coordinate systems!non-linear} 111 | 112 | ```{r} 113 | #| label: coord-trans-ex 114 | #| layout-ncol: 3 115 | #| fig-width: 3 116 | rect <- data.frame(x = 50, y = 50) 117 | line <- data.frame(x = c(1, 200), y = c(100, 1)) 118 | base <- ggplot(mapping = aes(x, y)) + 119 | geom_tile(data = rect, aes(width = 50, height = 50)) + 120 | geom_line(data = line) + 121 | xlab(NULL) + ylab(NULL) 122 | base 123 | base + coord_polar("x") 124 | base + coord_polar("y") 125 | ``` 126 | 127 | ```{r} 128 | #| label: coord-trans-ex-2 129 | #| layout-ncol: 3 130 | #| fig-width: 3 131 | base + coord_flip() 132 | base + coord_trans(y = "log10") 133 | base + coord_fixed() 134 | ``` 135 | 136 | The transformation takes place in two steps. 137 | Firstly, the parameterisation of each geom is changed to be purely location-based, rather than location- and dimension-based. 138 | For example, a bar can be represented as an x position (a location), a height and a width (two dimensions). 139 | Interpreting height and width in a non-Cartesian coordinate system is hard because a rectangle may no longer have constant height and width, so we convert to a purely location-based representation, a polygon defined by the four corners. 140 | This effectively converts all geoms to a combination of points, lines and polygons. 141 | \index{Geoms!parameterisation} \index{Coordinate systems!transformation} 142 | 143 | Once all geoms have a location-based representation, the next step is to transform each location into the new coordinate system. 144 | It is easy to transform points, because a point is still a point no matter what coordinate system you are in. 145 | Lines and polygons are harder, because a straight line may no longer be straight in the new coordinate system. 
146 | To make the problem tractable we assume that all coordinate transformations are smooth, in the sense that all very short lines will still be very short straight lines in the new coordinate system. 147 | With this assumption in hand, we can transform lines and polygons by breaking them up into many small line segments and transforming each segment. 148 | This process is called munching and is illustrated below: \index{Munching} 149 | 150 | 1. We start with a line parameterised by its two endpoints: 151 | 152 | ```{r} 153 | df <- data.frame(r = c(0, 1), theta = c(0, 3 / 2 * pi)) 154 | ggplot(df, aes(r, theta)) + 155 | geom_line() + 156 | geom_point(size = 2, colour = "red") 157 | ``` 158 | 159 | 2. We break it into multiple line segments, each with two endpoints. 160 | 161 | ```{r} 162 | interp <- function(rng, n) { 163 | seq(rng[1], rng[2], length = n) 164 | } 165 | munched <- data.frame( 166 | r = interp(df$r, 15), 167 | theta = interp(df$theta, 15) 168 | ) 169 | 170 | ggplot(munched, aes(r, theta)) + 171 | geom_line() + 172 | geom_point(size = 2, colour = "red") 173 | ``` 174 | 175 | 3. We transform the locations of each piece: 176 | 177 | ```{r} 178 | transformed <- transform(munched, 179 | x = r * sin(theta), 180 | y = r * cos(theta) 181 | ) 182 | 183 | ggplot(transformed, aes(x, y)) + 184 | geom_path() + 185 | geom_point(size = 2, colour = "red") + 186 | coord_fixed() 187 | ``` 188 | 189 | Internally ggplot2 uses many more segments so that the result looks smooth. 190 | 191 | ### Transformations with `coord_trans()` 192 | 193 | Like limits, we can also transform the data in two places: at the scale level or at the coordinate system level. 194 | `coord_trans()` has arguments `x` and `y` which should be strings naming the transformer or transformer objects (see @sec-scale-position). 195 | Transforming at the scale level occurs before statistics are computed and does not change the shape of the geom. 
196 | Transforming at the coordinate system level occurs after the statistics have been computed, and does affect the shape of the geom. 197 | Using both together allows us to model the data on a transformed scale and then backtransform it for interpretation: a common pattern in analysis. 198 | \index{Transformation!coordinate system} \index{Coordinate systems!transformed} \indexf{coord\_trans} 199 | 200 | ```{r} 201 | #| label: backtrans 202 | #| warning: false 203 | #| layout-ncol: 3 204 | #| fig-width: 3 205 | # Linear model on original scale is poor fit 206 | base <- ggplot(diamonds, aes(carat, price)) + 207 | stat_bin2d() + 208 | geom_smooth(method = "lm") + 209 | xlab(NULL) + 210 | ylab(NULL) + 211 | theme(legend.position = "none") 212 | base 213 | 214 | # Better fit on log scale, but harder to interpret 215 | base + 216 | scale_x_log10() + 217 | scale_y_log10() 218 | 219 | # Fit on log scale, then backtransform to original. 220 | # Highlights lack of expensive diamonds with large carats 221 | pow10 <- scales::exp_trans(10) 222 | base + 223 | scale_x_log10() + 224 | scale_y_log10() + 225 | coord_trans(x = pow10, y = pow10) 226 | ``` 227 | 228 | ### Polar coordinates with `coord_polar()` 229 | 230 | Using polar coordinates gives rise to pie charts and wind roses (from bar geoms), and radar charts (from line geoms). 231 | Polar coordinates are often used for circular data, particularly time or direction, but the perceptual properties are not good because the angle is harder to perceive for small radii than it is for large radii. 232 | The `theta` argument determines which position variable is mapped to angle (by default, x) and which to radius. 233 | 234 | The code below shows how we can turn a bar into a pie chart or a bullseye chart by changing the coordinate system. 235 | The documentation includes other examples. 
236 | \index{Polar coordinates} \index{Coordinate systems!polar} \indexf{coord\_polar} 237 | 238 | ```{r} 239 | #| label: polar 240 | #| layout-ncol: 3 241 | #| fig-width: 3 242 | base <- ggplot(mtcars, aes(factor(1), fill = factor(cyl))) + 243 | geom_bar(width = 1) + 244 | theme(legend.position = "none") + 245 | scale_x_discrete(NULL, expand = c(0, 0)) + 246 | scale_y_continuous(NULL, expand = c(0, 0)) 247 | 248 | # Stacked barchart 249 | base 250 | 251 | # Pie chart 252 | base + coord_polar(theta = "y") 253 | 254 | # The bullseye chart 255 | base + coord_polar() 256 | ``` 257 | 258 | ### Map projections with `coord_map()` 259 | 260 | Maps are intrinsically displays of spherical data. 261 | Simply plotting raw longitudes and latitudes is misleading, so we must *project* the data. 262 | There are two ways to do this with ggplot2: \index{Maps!projections} \index{Coordinate systems!map projections} \indexf{coord\_map} \indexf{coord\_quickmap} \index{mapproj} 263 | 264 | - `coord_quickmap()` is a quick and dirty approximation that sets the aspect ratio so that one metre in the north-south direction and one metre in the east-west direction cover the same distance on the plot, measured at the middle of the plot. 265 | This is a reasonable place to start for smaller regions, and is very fast. 266 | 267 | ```{r} 268 | #| label: map-nz 269 | #| layout-ncol: 2 270 | #| fig-width: 4 271 | # Prepare a map of NZ 272 | nzmap <- ggplot(map_data("nz"), aes(long, lat, group = group)) + 273 | geom_polygon(fill = "white", colour = "black") + 274 | xlab(NULL) + ylab(NULL) 275 | 276 | # Plot it in cartesian coordinates 277 | nzmap 278 | # With the aspect ratio approximation 279 | nzmap + coord_quickmap() 280 | ``` 281 | 282 | - `coord_map()` uses the **mapproj** package to do a formal map projection. 283 | It takes the same arguments as `mapproj::mapproject()` for controlling the projection. 284 | It is much slower than `coord_quickmap()` because it must munch the data and transform each piece. 
285 | 286 | ```{r} 287 | #| label: map-world 288 | #| layout-ncol: 3 289 | #| fig-width: 3 290 | #| dev: png 291 | world <- map_data("world") 292 | worldmap <- ggplot(world, aes(long, lat, group = group)) + 293 | geom_path() + 294 | scale_y_continuous(NULL, breaks = (-2:3) * 30, labels = NULL) + 295 | scale_x_continuous(NULL, breaks = (-4:4) * 45, labels = NULL) 296 | 297 | worldmap + coord_map() 298 | # Some crazier projections 299 | worldmap + coord_map("ortho") 300 | worldmap + coord_map("stereographic") 301 | ``` 302 | -------------------------------------------------------------------------------- /cover.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/cover.jpg -------------------------------------------------------------------------------- /diagrams/colour-wheel.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/colour-wheel.png -------------------------------------------------------------------------------- /diagrams/cover.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/cover.pdf -------------------------------------------------------------------------------- /diagrams/diamond-dimensions.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/diamond-dimensions.pdf -------------------------------------------------------------------------------- /diagrams/diamond-dimensions.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/diamond-dimensions.png -------------------------------------------------------------------------------- /diagrams/empty.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/empty.pdf -------------------------------------------------------------------------------- /diagrams/ggplot-pipeline.graffle: -------------------------------------------------------------------------------- [OmniGraffle property-list data; markup lost in extraction. Recoverable content: created 2007-03-22 by Hadley Wickham; grouped columns of rectangles labelled "Map aesthetics", "Train scales", "Calculate statistics" and "Train scales", and a circle labelled "Scales".]
{\colortbl;\red255\green255\blue255;} 335 | \pard\tx560\tx1120\tx1680\tx2240\tx2800\tx3360\tx3920\tx4480\tx5040\tx5600\tx6160\tx6720\qc\pardirnatural 336 | 337 | \f0\fs24 \cf0 Train scales} 338 | 339 | 340 | 341 | Bounds 342 | {{193.5, 126}, {102, 37}} 343 | Class 344 | ShapedGraphic 345 | ID 346 | 23 347 | Shape 348 | Rectangle 349 | Style 350 | 351 | shadow 352 | 353 | Draws 354 | NO 355 | 356 | 357 | Text 358 | 359 | Text 360 | {\rtf1\mac\ansicpg10000\cocoartf824\cocoasubrtf420 361 | {\fonttbl\f0\fswiss\fcharset77 Helvetica;} 362 | {\colortbl;\red255\green255\blue255;} 363 | \pard\tx560\tx1120\tx1680\tx2240\tx2800\tx3360\tx3920\tx4480\tx5040\tx5600\tx6160\tx6720\qc\pardirnatural 364 | 365 | \f0\fs24 \cf0 Map aesthetics} 366 | 367 | 368 | 369 | ID 370 | 18 371 | 372 | 373 | Bounds 374 | {{124, 426}, {102, 37}} 375 | Class 376 | ShapedGraphic 377 | ID 378 | 11 379 | Shape 380 | Rectangle 381 | Style 382 | 383 | shadow 384 | 385 | Draws 386 | NO 387 | 388 | 389 | Text 390 | 391 | Text 392 | {\rtf1\mac\ansicpg10000\cocoartf824\cocoasubrtf420 393 | {\fonttbl\f0\fswiss\fcharset77 Helvetica;} 394 | {\colortbl;\red255\green255\blue255;} 395 | \pard\tx560\tx1120\tx1680\tx2240\tx2800\tx3360\tx3920\tx4480\tx5040\tx5600\tx6160\tx6720\qc\pardirnatural 396 | 397 | \f0\fs24 \cf0 Grobs} 398 | 399 | 400 | 401 | Bounds 402 | {{124, 368.5}, {102, 37}} 403 | Class 404 | ShapedGraphic 405 | ID 406 | 10 407 | Shape 408 | Rectangle 409 | Style 410 | 411 | shadow 412 | 413 | Draws 414 | NO 415 | 416 | 417 | Text 418 | 419 | Text 420 | {\rtf1\mac\ansicpg10000\cocoartf824\cocoasubrtf420 421 | {\fonttbl\f0\fswiss\fcharset77 Helvetica;} 422 | {\colortbl;\red255\green255\blue255;} 423 | \pard\tx560\tx1120\tx1680\tx2240\tx2800\tx3360\tx3920\tx4480\tx5040\tx5600\tx6160\tx6720\qc\pardirnatural 424 | 425 | \f0\fs24 \cf0 Map scales} 426 | 427 | 428 | 429 | Bounds 430 | {{124, 58}, {102, 37}} 431 | Class 432 | ShapedGraphic 433 | ID 434 | 3 435 | Shape 436 | Rectangle 437 | Style 438 | 439 | 
shadow 440 | 441 | Draws 442 | NO 443 | 444 | 445 | Text 446 | 447 | Text 448 | {\rtf1\mac\ansicpg10000\cocoartf824\cocoasubrtf420 449 | {\fonttbl\f0\fswiss\fcharset77 Helvetica;} 450 | {\colortbl;\red255\green255\blue255;} 451 | \pard\tx560\tx1120\tx1680\tx2240\tx2800\tx3360\tx3920\tx4480\tx5040\tx5600\tx6160\tx6720\qc\pardirnatural 452 | 453 | \f0\fs24 \cf0 Data} 454 | 455 | 456 | 457 | GridInfo 458 | 459 | GuidesLocked 460 | NO 461 | GuidesVisible 462 | YES 463 | HPages 464 | 1 465 | ImageCounter 466 | 1 467 | IsPalette 468 | NO 469 | KeepToScale 470 | 471 | Layers 472 | 473 | 474 | Lock 475 | NO 476 | Name 477 | Layer 1 478 | Print 479 | YES 480 | View 481 | YES 482 | 483 | 484 | LayoutInfo 485 | 486 | LinksVisible 487 | NO 488 | MagnetsVisible 489 | NO 490 | MasterSheet 491 | Master 1 492 | MasterSheets 493 | 494 | 495 | ActiveLayerIndex 496 | 0 497 | AutoAdjust 498 | 499 | CanvasColor 500 | 501 | w 502 | 1 503 | 504 | CanvasOrigin 505 | {0, 0} 506 | CanvasScale 507 | 1 508 | ColumnAlign 509 | 1 510 | ColumnSpacing 511 | 36 512 | DisplayScale 513 | 1 cm = 1 cm 514 | GraphicsList 515 | 516 | GridInfo 517 | 518 | HPages 519 | 1 520 | IsPalette 521 | NO 522 | KeepToScale 523 | 524 | Layers 525 | 526 | 527 | Lock 528 | NO 529 | Name 530 | Layer 1 531 | Print 532 | YES 533 | View 534 | YES 535 | 536 | 537 | LayoutInfo 538 | 539 | Orientation 540 | 2 541 | OutlineStyle 542 | Basic 543 | RowAlign 544 | 1 545 | RowSpacing 546 | 36 547 | SheetTitle 548 | Master 1 549 | UniqueID 550 | 1 551 | VPages 552 | 1 553 | 554 | 555 | ModificationDate 556 | 2007-03-22 15:10:58 -0500 557 | Modifier 558 | Hadley Wickham 559 | NotesVisible 560 | NO 561 | Orientation 562 | 2 563 | OriginVisible 564 | NO 565 | OutlineStyle 566 | Basic 567 | PageBreaks 568 | YES 569 | PrintInfo 570 | 571 | NSBottomMargin 572 | 573 | float 574 | 0 575 | 576 | NSLeftMargin 577 | 578 | float 579 | 0 580 | 581 | NSPaperSize 582 | 583 | size 584 | {612, 792} 585 | 586 | NSRightMargin 587 | 588 | float 589 | 
0 590 | 591 | NSTopMargin 592 | 593 | float 594 | 0 595 | 596 | 597 | ReadOnly 598 | NO 599 | RowAlign 600 | 1 601 | RowSpacing 602 | 36 603 | SheetTitle 604 | Canvas 1 605 | SmartAlignmentGuidesActive 606 | YES 607 | SmartDistanceGuidesActive 608 | YES 609 | UniqueID 610 | 1 611 | UseEntirePage 612 | 613 | VPages 614 | 1 615 | WindowInfo 616 | 617 | CurrentSheet 618 | 0 619 | DrawerOpen 620 | 621 | DrawerTab 622 | Outline 623 | DrawerWidth 624 | 209 625 | FitInWindow 626 | 627 | Frame 628 | {{835, 107}, {609, 889}} 629 | ShowRuler 630 | 631 | ShowStatusBar 632 | 633 | VisibleRegion 634 | {{0, 0}, {594, 775}} 635 | Zoom 636 | 1 637 | 638 | 639 | 640 | -------------------------------------------------------------------------------- /diagrams/grid-grobs.graffle/data.plist: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/grid-grobs.graffle/data.plist -------------------------------------------------------------------------------- /diagrams/grid-grobs.graffle/image1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/grid-grobs.graffle/image1.pdf -------------------------------------------------------------------------------- /diagrams/grid-grobs.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/grid-grobs.pdf -------------------------------------------------------------------------------- /diagrams/grid-panelvp.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/grid-panelvp.graffle 
-------------------------------------------------------------------------------- /diagrams/grid-panelvp.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/grid-panelvp.pdf -------------------------------------------------------------------------------- /diagrams/grid-viewports.graffle/data.plist: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/grid-viewports.graffle/data.plist -------------------------------------------------------------------------------- /diagrams/grid-viewports.graffle/image1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/grid-viewports.graffle/image1.pdf -------------------------------------------------------------------------------- /diagrams/grid-viewports.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/grid-viewports.pdf -------------------------------------------------------------------------------- /diagrams/hcl-space.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/hcl-space.png -------------------------------------------------------------------------------- /diagrams/mastery-schema.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/mastery-schema.graffle 
-------------------------------------------------------------------------------- /diagrams/mastery-schema.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/mastery-schema.pdf -------------------------------------------------------------------------------- /diagrams/mastery-schema.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/mastery-schema.png -------------------------------------------------------------------------------- /diagrams/position-facets.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/position-facets.graffle -------------------------------------------------------------------------------- /diagrams/position-facets.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/position-facets.pdf -------------------------------------------------------------------------------- /diagrams/position-facets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/position-facets.png -------------------------------------------------------------------------------- /diagrams/scale-guides.graffle/data.plist: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/scale-guides.graffle/data.plist 
-------------------------------------------------------------------------------- /diagrams/scale-guides.graffle/image1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/scale-guides.graffle/image1.pdf -------------------------------------------------------------------------------- /diagrams/scale-guides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/scale-guides.pdf -------------------------------------------------------------------------------- /diagrams/scale-guides.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/scale-guides.png -------------------------------------------------------------------------------- /diagrams/vector-raster.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/vector-raster.graffle -------------------------------------------------------------------------------- /diagrams/vector-raster.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/vector-raster.pdf -------------------------------------------------------------------------------- /diagrams/vector-raster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/diagrams/vector-raster.png -------------------------------------------------------------------------------- 
/errata.txt: -------------------------------------------------------------------------------- 1 | 2 | Final example in grid appendix should be 3 | p <- qplot(wt, mpg, data=mtcars, colour=cyl) 4 | grob <- ggplotGrob(p) 5 | grob2 <- geditGrob(grob, gPath("legend.text"), gp=gpar(fontface = "bold")) 6 | grid.newpage() 7 | grid.draw(grob2) 8 | 9 | In figure 3.8 on page 38, the figure label says that small values occur at the bottom of the scales. But in the figure they do not, e.g. 2 is shown at the top and 10 at the bottom. 10 | 11 | From Antony: -------------------------------------------------------- 12 | Appendix B 13 | Caption to Fig B1, something is wrong 14 | 15 | Chap 10 16 | p177-8 Interactive zooming would be so much better! 17 | 18 | Chap 9 19 | Figs 9.1 and 9.2 are claimed to be the same, but they are not. 20 | 21 | The two plots in Fig 9.2 are supposed to have the same predictions, but do not. 22 | 23 | Fig 9.5 (right) is not nearly as good as 9.5 (left) for the purpose. How would you handle lags? Did you ever see Graham Wills' Diamond Fast for interactive time series graphics? 24 | 25 | §9.2.2 Not a great pcp example. Would prefer examples where the plot tells us something and is chosen for that reason. What do the pcp's here tell us? 26 | 27 | Fig 6.1 p95 "What does it eat?" There is no legend entry for insecti 28 | 29 | p72 (jittered figures) Fig 5.5 is not nice 30 | 31 | p75 Fig 5.9 seems to me misleading, better to stick with the raw data resolution 32 | Or at least say more about what you are trying to achieve (simulating the likely variability) 33 | 34 | p79 Shouldn't cities be shown in the righthand plot? 35 | 36 | p79 Does the US look like this?! (Figs 5.12, 5.13) 37 | 38 | From Karl Ove Hufthammer ------------------------------------------ 39 | On page 162-163, there are shown four plots; let's call them A, B, C and D. 40 | B should be identical to C, except for the confidence ribbons. However, both the x and y axes differ. 
41 | D should be identical to B (and to C plus confidence limits). But D looks very different. For example, the peak in B and C lies under the other lines in D. And the axes are also slightly different from B. 42 | 43 | From Scott Kostyshak --------------------------------------------------------- 44 | 45 | p. 200 46 | I don't know anything about viewports but it seems strange that the upper-left and bottom-right panels are both _2_1, should one be _1_2? 47 | 48 | Rob Baer --------------------------------------------------------------------- 49 | 50 | p100 Figure 6.4 legend. I think the reference to the left and right graphs is reversed. 51 | p. 116 line 17 reads, "cation (e.g., x~y) ..."; consider instead "cation (e.g., y~x) ..."; 52 | p. 135, fig 7.15 Legend suggests third panel has a smooth that fits the original data. It actually smooths as in panel 2. This is exactly the way the code on p.134 draws it, though, so I guess we would say the legend not the panel is wrong. 53 | p. 136, figure 8.9b. When I execute the code, the viewport figure as written still covers the points of plot b unlike shown in the figure. The figure appears to also include a change in viewport origin not included in sample code. Is there a missing redefinition of subvp? 54 | ------------------------------------------------------------------------------ 55 | From Stefan Schreiber 56 | p. 101, 2nd bullet: 57 | For 14/10/1979 to be displayed the format should be “%d/%m/%Y”. In the book it looks like a lower case ‘y’. 58 | p. 143, Table 8.1: Theme elements. 59 | It says that the theme elements axis.text.x and axis.text.y control the axis labels, and axis.title.x and axis.title.y control the tick labels. I believe it is the other way around (also when considering Fig. 6.12: The components of the axes and legend). 
60 | ------------------------------------------------------------------------------ 61 | From Raymond Balise 62 | Page 79 assault and murder should be Assault and Murder 63 | The code for pages 82 and 83 is not included in the book 64 | page 84 the first code block is missing the code to make the m2 dataset. I think this is what you did to make m2: 65 | m2 <- ggplot(movies, aes(x = round(rating), y = log10(votes))) 66 | page 84 the second code block is missing the code to make the m dataset. I think this is what you did to make m: 67 | m <- ggplot(movies, aes(x = year, y = rating)) 68 | page 124 the code for the left panel of the plot is not included in the book. I got it by tweaking the mpg3 dataset as follows: 69 | mpg3 <- within(mpg2, { 70 | model <- reorder(as.character(model), cty) 71 | manufacturer <- reorder(manufacturer, -cty) 72 | }) 73 | models <- qplot(cty, model, data = mpg3) 74 | models 75 | page 125 has a comma where a decimal point belongs: 76 | xmaj <- c(.3, .5, 1,3, 5) should be xmaj <- c(.3, .5, 1.3, 5) 77 | While not really a typo... the American spelling of the word colour on the last line of page 125 does not work. 78 | 79 | From Carson Sievert: 80 | First paragraph of scales chapter: should "the range of the range" be "the range of the _scale_"? 81 | -------------------------------------------------------------------------------- /extending.qmd: -------------------------------------------------------------------------------- 1 | # Advanced topics {#sec-advanced-topics .unnumbered} 2 | 3 | ```{r} 4 | #| echo: false 5 | #| message: false 6 | #| results: asis 7 | source("common.R") 8 | ``` 9 | 10 | As you become more fluent in ggplot2, you may find yourself wanting to use it in more advanced ways. 11 | You may want to write your own functions that create plots in a reusable fashion, or you may want to write your own packages that extend ggplot2. 
12 | If this describes you, then the chapters in this part of the book are designed to get you started. 13 | In @sec-programming we discuss programming techniques you can use to create flexible and reusable ggplot2 visualisations. 14 | This is followed by @sec-internals which dives into the mechanics of what ggplot2 does when creating a plot, and @sec-extensions which builds upon this discussion to talk about how ggplot2 extensions are written. 15 | Finally, in order to make these ideas a little more concrete, @sec-spring1 presents a worked example of developing a ggplot2 extension. 16 | -------------------------------------------------------------------------------- /facet.qmd: -------------------------------------------------------------------------------- 1 | # Faceting {#sec-facet} 2 | 3 | ```{r} 4 | #| echo: false 5 | #| message: false 6 | #| results: asis 7 | source("common.R") 8 | status("drafting") 9 | ``` 10 | 11 | You first encountered faceting in @sec-qplot-faceting. 12 | Faceting generates small multiples each showing a different subset of the data. 13 | Small multiples are a powerful tool for exploratory data analysis: you can rapidly compare patterns in different parts of the data and see whether they are the same or different. 14 | This section will discuss how you can fine-tune facets, particularly the way in which they interact with position scales. 15 | \index{Faceting} \index{Positioning!faceting} 16 | 17 | There are three types of faceting: 18 | 19 | - `facet_null()`: a single plot, the default. 20 | \indexf{facet\_null} 21 | 22 | - `facet_wrap()`: "wraps" a 1d ribbon of panels into 2d. 23 | 24 | - `facet_grid()`: produces a 2d grid of panels defined by variables which form the rows and columns. 25 | 26 | The differences between `facet_wrap()` and `facet_grid()` are illustrated in the figure below. 
27 | 28 | ```{r} 29 | #| label: facet-sketch 30 | #| echo: false 31 | #| out.width: 75% 32 | #| fig.cap: A sketch illustrating the difference between the two faceting systems. `facet_grid()` 33 | #| (left) is fundamentally 2d, being made up of two independent components. `facet_wrap()` 34 | #| (right) is 1d, but wrapped into 2d to save space. 35 | knitr::include_graphics("diagrams/position-facets.png", dpi = 300, auto_pdf = TRUE) 36 | ``` 37 | 38 | Faceted plots have the capability to fill up a lot of space, so for this chapter we will use a subset of the mpg dataset that has a manageable number of levels: three cylinders (4, 6, 8), two types of drive train (4 and f), and six classes. 39 | 40 | ```{r} 41 | #| label: mpg2 42 | mpg2 <- subset(mpg, cyl != 5 & drv %in% c("4", "f") & class != "2seater") 43 | ``` 44 | 45 | ## Facet wrap {#sec-facet-wrap} 46 | 47 | `facet_wrap()` makes a long ribbon of panels (generated by any number of variables) and wraps it into 2d. 48 | This is useful if you have a single variable with many levels and want to arrange the plots in a more space efficient manner. 49 | \index{Faceting!wrapped} \indexf{facet\_wrap} \indexc{\textasciitilde} 50 | 51 | You can control how the ribbon is wrapped into a grid with `ncol`, `nrow`, `as.table` and `dir`. 52 | `ncol` and `nrow` control how many columns and rows (you only need to set one). 53 | `as.table` controls whether the facets are laid out like a table (`TRUE`), with highest values at the bottom-right, or a plot (`FALSE`), with the highest values at the top-right. 54 | `dir` controls the direction of wrap: **h**orizontal or **v**ertical. 
55 | 56 | ```{r} 57 | #| layout-ncol: 2 58 | #| fig-width: 4 59 | base <- ggplot(mpg2, aes(displ, hwy)) + 60 | geom_blank() + 61 | xlab(NULL) + 62 | ylab(NULL) 63 | 64 | base + facet_wrap(~class, ncol = 3) 65 | base + facet_wrap(~class, ncol = 3, as.table = FALSE) 66 | ``` 67 | 68 | ```{r} 69 | #| layout-ncol: 2 70 | #| fig-width: 4 71 | base + facet_wrap(~class, nrow = 3) 72 | base + facet_wrap(~class, nrow = 3, dir = "v") 73 | ``` 74 | 75 | ## Facet grid 76 | 77 | `facet_grid()` lays out plots in a 2d grid, as defined by a formula: \index{Faceting!grid} \indexf{facet\_grid} 78 | 79 | - `. ~ a` spreads the values of `a` across the columns. 80 | This direction facilitates comparisons of y position, because the vertical scales are aligned. 81 | 82 | ```{r} 83 | base + facet_grid(. ~ cyl) 84 | ``` 85 | 86 | - `b ~ .` spreads the values of `b` down the rows. 87 | This direction facilitates comparison of x position because the horizontal scales are aligned. 88 | This makes it particularly useful for comparing distributions. 89 | 90 | ```{r} 91 | base + facet_grid(drv ~ .) 92 | ``` 93 | 94 | - `b ~ a` spreads `a` across columns and `b` down rows. 95 | You'll usually want to put the variable with the greatest number of levels in the columns, to take advantage of the aspect ratio of your screen. 96 | 97 | ```{r} 98 | base + facet_grid(drv ~ cyl) 99 | ``` 100 | 101 | You can use multiple variables in the rows or columns, by "adding" them together, e.g. `a + b ~ c + d`. 102 | Variables appearing together on the rows or columns are nested in the sense that only combinations that appear in the data will appear in the plot. 103 | Variables that are specified on rows and columns will be crossed: all combinations will be shown, including those that didn't appear in the original dataset: this may result in empty panels. 
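
A minimal sketch of the nesting and crossing just described, using `class + drv ~ cyl` as an illustrative formula. The formula and the names `n_pairs` and `p_nested` are choices made for this sketch rather than anything taken from the chapter, and the `mpg2` subset is recreated so the block runs on its own.

```r
library(ggplot2)

# Same subset as `mpg2` defined earlier in the chapter
mpg2 <- subset(mpg, cyl != 5 & drv %in% c("4", "f") & class != "2seater")

# class and drv are nested on the rows: only (class, drv) pairs that
# occur in the data get a row of panels.
n_pairs <- nrow(unique(mpg2[c("class", "drv")]))

# The rows (class, drv) are crossed with the columns (cyl), so a panel is
# drawn for every row/column combination, even when it holds no data.
p_nested <- ggplot(mpg2, aes(displ, hwy)) +
  geom_point() +
  facet_grid(class + drv ~ cyl)
p_nested
```

Comparing `n_pairs` with the 6 x 2 = 12 possible class and drive train pairs shows how many rows nesting leaves out, while any blank panels in the plot come from crossing those rows with `cyl`.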
104 | 105 | ## Controlling scales {#sec-controlling-scales} 106 | 107 | For both `facet_wrap()` and `facet_grid()` you can control whether the position scales are the same in all panels (fixed) or allowed to vary between panels (free) with the `scales` parameter: \index{Faceting!interaction with scales} \index{Scales!interaction with faceting} \index{Faceting!controlling scales} 108 | 109 | - `scales = "fixed"`: x and y scales are fixed across all panels. 110 | - `scales = "free_x"`: the x scale is free, and the y scale is fixed. 111 | - `scales = "free_y"`: the y scale is free, and the x scale is fixed. 112 | - `scales = "free"`: x and y scales vary across panels. 113 | 114 | `facet_grid()` imposes an additional constraint on the scales: all panels in a column must have the same x scale, and all panels in a row must have the same y scale. 115 | This is because each column shares an x axis, and each row shares a y axis. 116 | 117 | Fixed scales make it easier to see patterns across panels; free scales make it easier to see patterns within panels. 118 | 119 | ```{r} 120 | #| label: fixed-vs-free 121 | #| layout-ncol: 2 122 | #| fig-width: 4 123 | p <- ggplot(mpg2, aes(cty, hwy)) + 124 | geom_abline() + 125 | geom_jitter(width = 0.1, height = 0.1) 126 | 127 | p + facet_wrap(~cyl) 128 | p + facet_wrap(~cyl, scales = "free") 129 | ``` 130 | 131 | Free scales are also useful when we want to display multiple time series that were measured on different scales. 132 | To do this, we first need to change from 'wide' to 'long' data, stacking the separate variables into a single column. 133 | An example of this is shown below with the long form of the `economics` data. 
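
The wide-to-long step can be sketched as follows. This assumes the tidyr package is installed (it is not otherwise used in this section), and `econ_long` is a hypothetical name chosen for the sketch. The `economics_long` dataset that ships with ggplot2 is already in this shape and also carries a rescaled `value01` column, which the sketch does not reproduce.

```r
library(ggplot2)
library(tidyr)

# `economics` is wide: one column per measured series, plus `date`.
# Stack every column except `date` into variable/value pairs.
econ_long <- pivot_longer(
  economics,
  cols = -date,
  names_to = "variable",
  values_to = "value"
)

# Each series gets its own panel and its own y scale.
ggplot(econ_long, aes(date, value)) +
  geom_line() +
  facet_wrap(~variable, scales = "free_y", ncol = 1)
```

Stacking the series into a single `value` column is what lets one `facet_wrap()` call produce a panel per series.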
134 | \index{Data!economics\_long@\texttt{economics\_long}} 135 | 136 | ```{r} 137 | #| label: time 138 | economics_long 139 | ggplot(economics_long, aes(date, value)) + 140 | geom_line() + 141 | facet_wrap(~variable, scales = "free_y", ncol = 1) 142 | ``` 143 | 144 | `facet_grid()` has an additional parameter called `space`, which takes the same values as `scales`. 145 | When `space` is `"free"`, each column (or row) will have width (or height) proportional to the range of the scale for that column (or row). 146 | This makes the scaling equal across the whole plot: 1 cm on each panel maps to the same range of data. 147 | (This is somewhat analogous to the 'sliced' axis limits of lattice.) For example, if panel a had range 2 and panel b had range 4, one-third of the space would be given to a, and two-thirds to b. 148 | This is most useful for categorical scales, where we can assign space proportionally based on the number of levels in each facet, as illustrated below. 149 | 150 | ```{r} 151 | #| label: discrete-free 152 | mpg2$model <- reorder(mpg2$model, mpg2$cty) 153 | mpg2$manufacturer <- reorder(mpg2$manufacturer, -mpg2$cty) 154 | 155 | ggplot(mpg2, aes(cty, model)) + 156 | geom_point() + 157 | facet_grid(manufacturer ~ ., scales = "free", space = "free") + 158 | theme(strip.text.y = element_text(angle = 0)) 159 | ``` 160 | 161 | ## Missing faceting variables {#sec-missing-faceting-columns} 162 | 163 | If you are using faceting on a plot with multiple datasets, what happens when one of those datasets is missing the faceting variables? 164 | This situation commonly arises when you are adding contextual information that should be the same in all panels. 165 | For example, imagine you have a spatial display of disease faceted by gender. 166 | What happens when you add a map layer that does not contain the gender variable? 167 | Here ggplot2 will do what you expect: it will display the map in every facet, because missing faceting variables are treated as if they take all values. 
168 | \index{Faceting!missing data} 169 | 170 | Here's a simple example. 171 | Note how the single red point from `df2` appears in both panels. 172 | 173 | ```{r} 174 | df1 <- data.frame(x = 1:3, y = 1:3, gender = c("f", "f", "m")) 175 | df2 <- data.frame(x = 2, y = 2) 176 | 177 | ggplot(df1, aes(x, y)) + 178 | geom_point(data = df2, colour = "red", size = 2) + 179 | geom_point() + 180 | facet_wrap(~gender) 181 | ``` 182 | 183 | This technique is particularly useful when you add annotations to make it easier to compare between facets, as shown in the next section. 184 | 185 | ## Grouping vs. faceting {#sec-group-vs-facet} 186 | 187 | Faceting is an alternative to using aesthetics (like colour, shape or size) to differentiate groups. 188 | Both techniques have strengths and weaknesses, based around the relative positions of the subsets. 189 | \index{Faceting!vs. grouping} \index{Grouping!vs. faceting} With faceting, each group is quite far apart in its own panel, and there is no overlap between the groups. 190 | This is good if the groups overlap a lot, but it does make small differences harder to see. 191 | When using aesthetics to differentiate groups, the groups are close together and may overlap, but small differences are easier to see. 192 | 193 | ```{r} 194 | df <- data.frame( 195 | x = rnorm(120, c(0, 2, 4)), 196 | y = rnorm(120, c(1, 2, 1)), 197 | z = letters[1:3] 198 | ) 199 | 200 | ggplot(df, aes(x, y)) + 201 | geom_point(aes(colour = z)) 202 | ``` 203 | 204 | ```{r} 205 | ggplot(df, aes(x, y)) + 206 | geom_point() + 207 | facet_wrap(~z) 208 | ``` 209 | 210 | Comparisons between facets often benefit from some thoughtful annotation. 211 | For example, in this case we could show the mean of each group in every panel. 212 | To do this we group and summarise the data using the dplyr package, which is covered in *R for Data Science*. 213 | Note that we need two "z" variables: one for the facets and one for the colours. 
214 | \index{Faceting!adding annotations} 215 | 216 | ```{r} 217 | df_sum <- df %>% 218 | group_by(z) %>% 219 | summarise(x = mean(x), y = mean(y)) %>% 220 | rename(z2 = z) 221 | 222 | ggplot(df, aes(x, y)) + 223 | geom_point() + 224 | geom_point(data = df_sum, aes(colour = z2), size = 4) + 225 | facet_wrap(~z) 226 | ``` 227 | 228 | Another useful technique is to put all the data in the background of each panel: 229 | 230 | ```{r} 231 | df2 <- dplyr::select(df, -z) 232 | 233 | ggplot(df, aes(x, y)) + 234 | geom_point(data = df2, colour = "grey70") + 235 | geom_point(aes(colour = z)) + 236 | facet_wrap(~z) 237 | ``` 238 | 239 | ## Continuous variables {#sec-continuous-variables} 240 | 241 | To facet continuous variables, you must first discretise them. 242 | ggplot2 provides three helper functions to do so: \index{Faceting!by continuous variables} 243 | 244 | - Divide the data into `n` bins each of the same length: `cut_interval(x, n)`. \indexf{cut\_interval} 245 | 246 | - Divide the data into bins of width `width`: `cut_width(x, width)`. 247 | \indexf{cut\_width} 248 | 249 | - Divide the data into `n` bins each containing (approximately) the same number of points: `cut_number(x, n = 10)`.
250 | \indexf{cut\_number} 251 | 252 | They are illustrated below: 253 | 254 | ```{r} 255 | #| label: discretising 256 | #| layout-ncol: 3 257 | #| fig-width: 3 258 | # Bins of width 1 259 | mpg2$disp_w <- cut_width(mpg2$displ, 1) 260 | # Six bins of equal length 261 | mpg2$disp_i <- cut_interval(mpg2$displ, 6) 262 | # Six bins containing equal numbers of points 263 | mpg2$disp_n <- cut_number(mpg2$displ, 6) 264 | 265 | plot <- ggplot(mpg2, aes(cty, hwy)) + 266 | geom_point() + 267 | labs(x = NULL, y = NULL) 268 | plot + facet_wrap(~disp_w, nrow = 1) 269 | plot + facet_wrap(~disp_i, nrow = 1) 270 | plot + facet_wrap(~disp_n, nrow = 1) 271 | ``` 272 | 273 | Note that the faceting formula does not evaluate functions, so you must first create a new variable containing the discretised data. 274 | 275 | ## Exercises 276 | 277 | 1. Diamonds: display the distribution of price conditional on cut and carat. 278 | Try faceting by cut and grouping by carat. 279 | Try faceting by carat and grouping by cut. 280 | Which do you prefer? 281 | 282 | 2. Diamonds: compare the relationship between price and carat for each colour. 283 | What makes it hard to compare the groups? 284 | Is grouping better or faceting? 285 | If you use faceting, what annotation might you add to make it easier to see the differences between panels? 286 | 287 | 3. Why is `facet_wrap()` generally more useful than `facet_grid()`? 288 | 289 | 4. Recreate the following plot. 290 | It facets `mpg2` by class, overlaying a smooth curve fit to the full dataset. 
291 | 292 | ```{r} 293 | #| echo: false 294 | ggplot(mpg2, aes(displ, hwy)) + 295 | geom_smooth(data = select(mpg2, -class), se = FALSE) + 296 | geom_point() + 297 | facet_wrap(~class, nrow = 2) 298 | ``` 299 | -------------------------------------------------------------------------------- /ga_script.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 10 | 11 | -------------------------------------------------------------------------------- /ggplot2-book.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: knitr 13 | LaTeX: XeLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | 18 | BuildType: Website 19 | 20 | MarkdownWrap: Sentence 21 | MarkdownCanonical: Yes 22 | -------------------------------------------------------------------------------- /ggplot2-review/forrence.md: -------------------------------------------------------------------------------- 1 | 2 | 1 - It's not always obvious what packages are used in a given chapter. For example, dplyr is attached in one of the exercises in Chapter 3 (3.5.5), and the pipe operator and print method are both used throughout the rest of the chapter. It might be a little confusing for someone new to the dplyr/tidyr/ggplot2 combo. 3 | 4 | 2 - Perhaps modify the message to include that `bins` is now a valid input as of #1158. For instance, `stat_bin()` using `bins = 30`. Pick better value with `binwidth` or `bins`.` I can make a PR if it's more convenient. 5 | 6 | 3 - Do you want to set warnings and messages to false for the entire doc, minus the times when you want to emphasize the warning/message (for instance, in 2.6.3)? It would save some space. 
7 | 8 | 4 - Just a lot of little things, like commas, capitalization, etc. 9 | - eg. `facetting` and `faceting` are both used throughout the book, though it looks like the double t is the British English way. 10 | - I tend to break sentences up more. It's a preference thing, so many of those are optional. 11 | 12 | 5 - A few of the examples were broken when using previous package versions, such as the example on pg. 207 with dplyr 0.4.2. The most recent dev versions seemed to clear these up. 13 | -------------------------------------------------------------------------------- /ggplot2-review/ggplot2-book-David-Robinson.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/ggplot2-review/ggplot2-book-David-Robinson.pdf -------------------------------------------------------------------------------- /ggplot2-review/ggplot2-book-comments-forrence.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/ggplot2-review/ggplot2-book-comments-forrence.pdf -------------------------------------------------------------------------------- /ggplot2-review/ggplot2-book-pastor.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/ggplot2-review/ggplot2-book-pastor.pdf -------------------------------------------------------------------------------- /ggplot2-review/ggplot2-book-wdoane-edits.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/ggplot2-review/ggplot2-book-wdoane-edits.pdf -------------------------------------------------------------------------------- 
/ggplot2-review/ggplot2-book_ygc.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/ggplot2-review/ggplot2-book_ygc.pdf -------------------------------------------------------------------------------- /index.qmd: -------------------------------------------------------------------------------- 1 | # Welcome {.unnumbered} 2 | 3 | This is the online version of the work-in-progress **3rd edition** of "ggplot2: elegant graphics for data analysis" published by Springer. 4 | You can learn what's changed from the 2nd edition in the [Preface](#preface-3e). 5 | 6 | While this book gives some details on the basics of ggplot2, its primary focus is explaining the Grammar of Graphics that ggplot2 uses, and describing it in full detail. 7 | It is not a [cookbook](https://r-graphics.org), and won't necessarily help you create any specific graphic that you need. 8 | But it will help you understand the details of the underlying theory, giving you the power to tailor any plot specifically to your needs. 9 | 10 | The book is written by [Hadley Wickham](http://hadley.nz), [Danielle Navarro](https://djnavarro.net), and [Thomas Lin Pedersen](https://www.data-imaginist.com). 11 | -------------------------------------------------------------------------------- /individual-geoms.qmd: -------------------------------------------------------------------------------- 1 | # Individual geoms {#sec-individual-geoms} 2 | 3 | ```{r} 4 | #| echo: false 5 | #| message: false 6 | #| results: asis 7 | source("common.R") 8 | status("drafting") 9 | ``` 10 | 11 | ## Basic plot types {#sec-basics} 12 | 13 | These geoms are the fundamental building blocks of ggplot2. 14 | They are useful in their own right, but are also used to construct more complex geoms.
15 | Most of these geoms are associated with a named plot: when that geom is used by itself in a plot, that plot has a special name. 16 | 17 | Each of these geoms is two dimensional and requires both `x` and `y` aesthetics. 18 | All of them understand `colour` (or `color`) and `size` aesthetics, and the filled geoms (bar, tile and polygon) also understand `fill`. 19 | 20 | - `geom_area()` draws an **area plot**, which is a line plot filled to the y-axis (filled lines). 21 | Multiple groups will be stacked on top of each other. 22 | \index{Area plot} \indexf{geom\_area} 23 | 24 | - `geom_bar(stat = "identity")` makes a **bar chart**. 25 | We need `stat = "identity"` because the default stat automatically counts values (so is essentially a 1d geom, see @sec-distributions). 26 | The identity stat leaves the data unchanged. 27 | Multiple bars in the same location will be stacked on top of one another.\index{Barchart} \indexf{geom\_bar} 28 | 29 | - `geom_line()` makes a **line plot**. 30 | The `group` aesthetic determines which observations are connected; see @sec-collective-geoms for more detail. 31 | `geom_line()` connects points from left to right; `geom_path()` is similar but connects points in the order they appear in the data. 32 | Both `geom_line()` and `geom_path()` also understand the aesthetic `linetype`, which maps a categorical variable to solid, dotted and dashed lines. 33 | \index{Line plot} \indexf{geom\_line} \indexf{geom\_path} 34 | 35 | - `geom_point()` produces a **scatterplot**. 36 | `geom_point()` also understands the `shape` aesthetic. 37 | \indexf{geom\_point} 38 | 39 | - `geom_polygon()` draws polygons, which are filled paths. 40 | Each vertex of the polygon requires a separate row in the data. 41 | It is often useful to merge a data frame of polygon coordinates with the data just prior to plotting. 42 | @sec-maps illustrates this concept in more detail for map data. 
43 | \indexf{geom\_polygon} 44 | 45 | - `geom_rect()`, `geom_tile()` and `geom_raster()` draw rectangles. 46 | `geom_rect()` is parameterised by the four corners of the rectangle, `xmin`, `ymin`, `xmax` and `ymax`. 47 | `geom_tile()` is exactly the same, but parameterised by the center of the rect and its size, `x`, `y`, `width` and `height`. 48 | `geom_raster()` is a fast special case of `geom_tile()` used when all the tiles are the same size. 49 | \index{Image plot} \index{Level plot} \indexf{geom\_tile} 50 | \indexf{geom\_rect} \indexf{geom\_raster} 51 | 52 | - `geom_text()` adds text to a plot. 53 | It requires a `label` aesthetic that provides the text to display, and has a number of parameters (`angle`, `family`, `fontface`, `hjust` and `vjust`) that control the appearance of the text. 54 | 55 | Each geom is shown in the code below. 56 | Observe the different axis ranges for the bar, area and tile plots: these geoms take up space outside the range of the data, and so push the axes out. 57 | 58 | ```{r} 59 | #| label: geom-basic 60 | #| layout-ncol: 4 61 | #| fig-width: 2.5 62 | df <- data.frame( 63 | x = c(3, 1, 5), 64 | y = c(2, 4, 6), 65 | label = c("a","b","c") 66 | ) 67 | p <- ggplot(df, aes(x, y, label = label)) + 68 | labs(x = NULL, y = NULL) + # Hide axis label 69 | theme(plot.title = element_text(size = 12)) # Shrink plot title 70 | p + geom_point() + ggtitle("point") 71 | p + geom_text() + ggtitle("text") 72 | p + geom_bar(stat = "identity") + ggtitle("bar") 73 | p + geom_tile() + ggtitle("tile") 74 | ``` 75 | 76 | ```{r} 77 | #| layout-ncol: 4 78 | #| fig-width: 2.5 79 | p + geom_line() + ggtitle("line") 80 | p + geom_area() + ggtitle("area") 81 | p + geom_path() + ggtitle("path") 82 | p + geom_polygon() + ggtitle("polygon") 83 | ``` 84 | 85 | ### Exercises 86 | 87 | 1. What geoms would you use to draw each of the following named plots? 88 | 89 | 1. Scatterplot 90 | 2. Line chart 91 | 3. Histogram 92 | 4. Bar chart 93 | 5. Pie chart 94 | 95 | 2.
What's the difference between `geom_path()` and `geom_polygon()`? 96 | What's the difference between `geom_path()` and `geom_line()`? 97 | 98 | 3. What low-level geoms are used to draw `geom_smooth()`? 99 | What about `geom_boxplot()` and `geom_violin()`? 100 | -------------------------------------------------------------------------------- /internals_ggbuild.R: -------------------------------------------------------------------------------- 1 | # The ggbuild() function mimics the behaviour of the ggplot_build() 2 | # function (from version 3.3.0.9000). It is a direct copy of the source 3 | # code from the original, with calls to internal ggplot2 functions 4 | # namespaced via ::: 5 | # 6 | # The only substantive modification from the original is that it maintains 7 | # a list "all_steps" that records the state of the data at various points in 8 | # the build process, in addition to the final ggplot_built object. 9 | # 10 | # Lines that are marked with # ****** are those that have been modified or inserted 11 | 12 | 13 | ggbuild <- function(plot) { 14 | 15 | all_steps <- list() # ****** 16 | 17 | plot <- ggplot2:::plot_clone(plot) # ****** 18 | if (length(plot$layers) == 0) { 19 | plot <- plot + geom_blank() 20 | } 21 | 22 | layers <- plot$layers 23 | layer_data <- lapply(layers, function(y) y$layer_data(plot$data)) 24 | 25 | scales <- plot$scales 26 | # Apply function to layer and matching data 27 | by_layer <- function(f) { 28 | out <- vector("list", length(data)) 29 | for (i in seq_along(data)) { 30 | out[[i]] <- f(l = layers[[i]], d = data[[i]]) 31 | } 32 | out 33 | } 34 | 35 | # Allow all layers to make any final adjustments based 36 | # on raw input data and plot info 37 | data <- layer_data 38 | data <- by_layer(function(l, d) l$setup_layer(d, plot)) 39 | 40 | # Initialise panels, add extra data for margins & missing faceting 41 | # variables, and add on a PANEL variable to data 42 | layout <- ggplot2:::create_layout(plot$facet, plot$coordinates) # ****** 43 |
data <- layout$setup(data, plot$data, plot$plot_env) 44 | 45 | # Compute aesthetics to produce data with generalised variable names 46 | data <- by_layer(function(l, d) l$compute_aesthetics(d, plot)) 47 | 48 | # Record the data at the end of the "preparation" stage 49 | all_steps$prepared <- data # ****** 50 | 51 | 52 | # Transform all scales 53 | data <- lapply(data, scales$transform_df) # ****** 54 | 55 | # Record the layer data after scale transformation applied 56 | all_steps$transformed <- data # ****** 57 | 58 | # Map and train positions so that statistics have access to ranges 59 | # and all positions are numeric 60 | scale_x <- function() scales$get_scales("x") 61 | scale_y <- function() scales$get_scales("y") 62 | 63 | layout$train_position(data, scale_x(), scale_y()) 64 | data <- layout$map_position(data) 65 | 66 | # Record the layer data after position scales are trained and mapped 67 | all_steps$positioned <- data # ****** 68 | 69 | # Apply and map statistics 70 | data <- by_layer(function(l, d) l$compute_statistic(d, layout)) 71 | data <- by_layer(function(l, d) l$map_statistic(d, plot)) 72 | 73 | # Record the state of the layer data after statistics are computed and mapped 74 | all_steps$poststat <- data # ****** 75 | 76 | # Make sure missing (but required) aesthetics are added 77 | scales$add_missing(c("x", "y"), plot$plot_env) # ****** 78 | 79 | # Reparameterise geoms from (e.g.) y and width to ymin and ymax 80 | data <- by_layer(function(l, d) l$compute_geom_1(d)) 81 | 82 | # Apply position adjustments 83 | data <- by_layer(function(l, d) l$compute_position(d, layout)) 84 | 85 | # Record the state of the data once geom and position adjustments are made 86 | all_steps$geompos <- data # ****** 87 | 88 | # Reset position scales, then re-train and map.
This ensures that facets 89 | # have control over the range of a plot: is it generated from what is 90 | # displayed, or does it include the range of underlying data 91 | layout$reset_scales() 92 | layout$train_position(data, scale_x(), scale_y()) 93 | layout$setup_panel_params() 94 | data <- layout$map_position(data) 95 | 96 | # Train and map non-position scales 97 | npscales <- scales$non_position_scales() 98 | if (npscales$n() > 0) { 99 | lapply(data, npscales$train_df) # ****** 100 | data <- lapply(data, npscales$map_df) # ****** 101 | } 102 | 103 | # Fill in defaults etc. 104 | data <- by_layer(function(l, d) l$compute_geom_2(d)) 105 | 106 | # Let layer stat have a final say before rendering 107 | data <- by_layer(function(l, d) l$finish_statistics(d)) 108 | 109 | # Let Layout modify data before rendering 110 | data <- layout$finish_data(data) 111 | 112 | 113 | # Record the final built plot object 114 | all_steps$built <- structure( # ****** 115 | list(data = data, layout = layout, plot = plot), # ****** 116 | class = "ggplot_built" # ****** 117 | ) # ****** 118 | 119 | return(all_steps) # ****** 120 | } 121 | 122 | -------------------------------------------------------------------------------- /internals_gggtable.R: -------------------------------------------------------------------------------- 1 | # The gggtable() function mimics the behaviour of the ggplot_gtable() 2 | # function (from version 3.3.0.9000).
It is a direct copy of the source 3 | # code from the original, with calls to internal ggplot2 functions 4 | # namespaced via :::, and calls to grid and gtable functions namespaced with :: 5 | # 6 | # The only substantive modification from the original is that it maintains 7 | # a list "all_states" that records the state of the gtable at various points in 8 | # the rendering process in addition to the final gtable 9 | # 10 | # Lines that are marked with # ****** are those that have been modified or inserted 11 | 12 | gggtable <- function(data) { 13 | 14 | `%||%` <- ggplot2:::`%||%` # ****** 15 | 16 | plot <- data$plot 17 | layout <- data$layout 18 | data <- data$data 19 | theme <- ggplot2:::plot_theme(plot) # ****** 20 | 21 | geom_grobs <- Map(function(l, d) l$draw_geom(d, layout), plot$layers, data) 22 | plot_table <- layout$render(geom_grobs, data, theme, plot$labels) 23 | 24 | # Record the state after the panel layouts have done their job (I think!) 25 | all_states <- list() # ****** 26 | all_states$panels <- plot_table # ****** 27 | 28 | # Legends 29 | position <- theme$legend.position %||% "right" 30 | if (length(position) == 2) { 31 | position <- "manual" 32 | } 33 | 34 | legend_box <- plot$guides$assemble(theme) 35 | plot_table <- ggplot2:::table_add_legends(plot_table, legend_box, theme) 36 | 37 | # Record the state of the gtable after the legends have been added 38 | all_states$legend <- plot_table # ****** 39 | 40 | 41 | # Title 42 | title <- element_render(theme, "plot.title", plot$labels$title, margin_y = TRUE) 43 | title_height <- grid::grobHeight(title) # ****** 44 | 45 | # Subtitle 46 | subtitle <- element_render(theme, "plot.subtitle", plot$labels$subtitle, margin_y = TRUE) 47 | subtitle_height <- grid::grobHeight(subtitle) # ****** 48 | 49 | # Tag 50 | tag <- element_render(theme, "plot.tag", plot$labels$tag, margin_y = TRUE, margin_x = TRUE) 51 | tag_height <- grid::grobHeight(tag) # ****** 52 | tag_width <- grid::grobWidth(tag) # ****** 53 | 54 | #
whole plot annotation 55 | caption <- element_render(theme, "plot.caption", plot$labels$caption, margin_y = TRUE) 56 | caption_height <- grid::grobHeight(caption) # ****** 57 | 58 | # positioning of title and subtitle is governed by plot.title.position 59 | # positioning of caption is governed by plot.caption.position 60 | # "panel" means align to the panel(s) 61 | # "plot" means align to the entire plot (except margins and tag) 62 | title_pos <- theme$plot.title.position %||% "panel" 63 | if (!(title_pos %in% c("panel", "plot"))) { 64 | abort('plot.title.position should be either "panel" or "plot".') 65 | } 66 | caption_pos <- theme$plot.caption.position %||% "panel" 67 | if (!(caption_pos %in% c("panel", "plot"))) { 68 | abort('plot.caption.position should be either "panel" or "plot".') 69 | } 70 | 71 | pans <- plot_table$layout[grepl("^panel", plot_table$layout$name), , drop = FALSE] 72 | if (title_pos == "panel") { 73 | title_l = min(pans$l) 74 | title_r = max(pans$r) 75 | } else { 76 | title_l = 1 77 | title_r = ncol(plot_table) 78 | } 79 | if (caption_pos == "panel") { 80 | caption_l = min(pans$l) 81 | caption_r = max(pans$r) 82 | } else { 83 | caption_l = 1 84 | caption_r = ncol(plot_table) 85 | } 86 | 87 | plot_table <- gtable::gtable_add_rows(plot_table, subtitle_height, pos = 0) # ****** 88 | plot_table <- gtable::gtable_add_grob(plot_table, subtitle, name = "subtitle", # ****** 89 | t = 1, b = 1, l = title_l, r = title_r, clip = "off") 90 | 91 | plot_table <- gtable::gtable_add_rows(plot_table, title_height, pos = 0) # ****** 92 | plot_table <- gtable::gtable_add_grob(plot_table, title, name = "title", # ****** 93 | t = 1, b = 1, l = title_l, r = title_r, clip = "off") 94 | 95 | plot_table <- gtable::gtable_add_rows(plot_table, caption_height, pos = -1) # ****** 96 | plot_table <- gtable::gtable_add_grob(plot_table, caption, name = "caption", # ****** 97 | t = -1, b = -1, l = caption_l, r = caption_r, clip = "off") 98 | 99 | plot_table <- 
gtable::gtable_add_rows(plot_table, unit(0, 'pt'), pos = 0) # ****** 100 | plot_table <- gtable::gtable_add_cols(plot_table, unit(0, 'pt'), pos = 0) # ****** 101 | plot_table <- gtable::gtable_add_rows(plot_table, unit(0, 'pt'), pos = -1) # ****** 102 | plot_table <- gtable::gtable_add_cols(plot_table, unit(0, 'pt'), pos = -1) # ****** 103 | 104 | tag_pos <- theme$plot.tag.position %||% "topleft" 105 | if (length(tag_pos) == 2) tag_pos <- "manual" 106 | valid_pos <- c("topleft", "top", "topright", "left", "right", "bottomleft", 107 | "bottom", "bottomright") 108 | 109 | if (!(tag_pos == "manual" || tag_pos %in% valid_pos)) { 110 | abort(glue("plot.tag.position should be a coordinate or one of ", 111 | glue_collapse(valid_pos, ', ', last = " or "))) 112 | } 113 | 114 | if (tag_pos == "manual") { 115 | xpos <- theme$plot.tag.position[1] 116 | ypos <- theme$plot.tag.position[2] 117 | tag_parent <- justify_grobs(tag, x = xpos, y = ypos, 118 | hjust = theme$plot.tag$hjust, 119 | vjust = theme$plot.tag$vjust, 120 | int_angle = theme$plot.tag$angle, 121 | debug = theme$plot.tag$debug) 122 | plot_table <- gtable_add_grob(plot_table, tag_parent, name = "tag", t = 1, 123 | b = nrow(plot_table), l = 1, 124 | r = ncol(plot_table), clip = "off") 125 | } else { 126 | # Widths and heights are reassembled below instead of assigning into them 127 | # in order to avoid bug in grid 3.2 and below. 
128 | if (tag_pos == "topleft") { 129 | plot_table$widths <- grid::unit.c(tag_width, plot_table$widths[-1]) # ****** 130 | plot_table$heights <- grid::unit.c(tag_height, plot_table$heights[-1]) # ****** 131 | plot_table <- gtable::gtable_add_grob(plot_table, tag, name = "tag", # ****** 132 | t = 1, l = 1, clip = "off") 133 | } else if (tag_pos == "top") { 134 | plot_table$heights <- grid::unit.c(tag_height, plot_table$heights[-1]) # ****** 135 | plot_table <- gtable::gtable_add_grob(plot_table, tag, name = "tag", # ****** 136 | t = 1, l = 1, r = ncol(plot_table), 137 | clip = "off") 138 | } else if (tag_pos == "topright") { 139 | plot_table$widths <- grid::unit.c(plot_table$widths[-ncol(plot_table)], tag_width) # ****** 140 | plot_table$heights <- grid::unit.c(tag_height, plot_table$heights[-1]) # ****** 141 | plot_table <- gtable::gtable_add_grob(plot_table, tag, name = "tag", # ****** 142 | t = 1, l = ncol(plot_table), clip = "off") 143 | } else if (tag_pos == "left") { 144 | plot_table$widths <- grid::unit.c(tag_width, plot_table$widths[-1]) # ****** 145 | plot_table <- gtable::gtable_add_grob(plot_table, tag, name = "tag", # ****** 146 | t = 1, b = nrow(plot_table), l = 1, 147 | clip = "off") 148 | } else if (tag_pos == "right") { 149 | plot_table$widths <- grid::unit.c(plot_table$widths[-ncol(plot_table)], tag_width) # ****** 150 | plot_table <- gtable::gtable_add_grob(plot_table, tag, name = "tag", # ****** 151 | t = 1, b = nrow(plot_table), l = ncol(plot_table), 152 | clip = "off") 153 | } else if (tag_pos == "bottomleft") { 154 | plot_table$widths <- grid::unit.c(tag_width, plot_table$widths[-1]) # ****** 155 | plot_table$heights <- grid::unit.c(plot_table$heights[-nrow(plot_table)], tag_height) # ****** 156 | plot_table <- gtable::gtable_add_grob(plot_table, tag, name = "tag", # ****** 157 | t = nrow(plot_table), l = 1, clip = "off") 158 | } else if (tag_pos == "bottom") { 159 | plot_table$heights <- grid::unit.c(plot_table$heights[-nrow(plot_table)], 
tag_height) # ****** 160 | plot_table <- gtable::gtable_add_grob(plot_table, tag, name = "tag", # ****** 161 | t = nrow(plot_table), l = 1, r = ncol(plot_table), clip = "off") 162 | } else if (tag_pos == "bottomright") { 163 | plot_table$widths <- grid::unit.c(plot_table$widths[-ncol(plot_table)], tag_width) # ****** 164 | plot_table$heights <- grid::unit.c(plot_table$heights[-nrow(plot_table)], tag_height) # ****** 165 | plot_table <- gtable::gtable_add_grob(plot_table, tag, name = "tag", # ****** 166 | t = nrow(plot_table), l = ncol(plot_table), clip = "off") 167 | } 168 | } 169 | 170 | # Margins 171 | plot_table <- gtable::gtable_add_rows(plot_table, theme$plot.margin[1], pos = 0) # ****** 172 | plot_table <- gtable::gtable_add_cols(plot_table, theme$plot.margin[2]) # ****** 173 | plot_table <- gtable::gtable_add_rows(plot_table, theme$plot.margin[3]) # ****** 174 | plot_table <- gtable::gtable_add_cols(plot_table, theme$plot.margin[4], pos = 0) # ****** 175 | 176 | if (inherits(theme$plot.background, "element")) { 177 | plot_table <- gtable::gtable_add_grob(plot_table, # ****** 178 | element_render(theme, "plot.background"), 179 | t = 1, l = 1, b = -1, r = -1, name = "background", z = -Inf) 180 | plot_table$layout <- plot_table$layout[c(nrow(plot_table$layout), 1:(nrow(plot_table$layout) - 1)),] 181 | plot_table$grobs <- plot_table$grobs[c(nrow(plot_table$layout), 1:(nrow(plot_table$layout) - 1))] 182 | } 183 | 184 | 185 | # Record the final state of the gtable 186 | all_states$final <- plot_table # ****** 187 | return(all_states) # ****** 188 | } 189 | -------------------------------------------------------------------------------- /introduction.qmd: -------------------------------------------------------------------------------- 1 | \mainmatter 2 | 3 | # Introduction {#sec-introduction} 4 | 5 | ```{r} 6 | #| echo: false 7 | #| message: false 8 | #| results: asis 9 | source("common.R") 10 | status("drafting") 11 | ``` 12 | 13 | ## Welcome to ggplot2 14 | 15 | 
ggplot2 is an R package for producing statistical, or data, graphics. 16 | Unlike most other graphics packages, ggplot2 has an underlying grammar, based on the Grammar of Graphics [@wilkinson:2006], that allows you to compose graphs by combining independent components. 17 | This makes ggplot2 powerful. 18 | Rather than being limited to sets of pre-defined graphics, you can create novel graphics that are tailored to your specific problem. 19 | While the idea of having to learn a grammar may sound overwhelming, ggplot2 is actually easy to learn: there is a simple set of core principles and there are very few special cases. 20 | The hard part is that it may take a little time to forget all the preconceptions that you bring over from using other graphics tools. 21 | 22 | ggplot2 provides beautiful, hassle-free plots that take care of fiddly details like drawing legends. 23 | In fact, its carefully chosen defaults mean that you can produce publication-quality graphics in seconds. 24 | However, if you do have special formatting requirements, ggplot2's comprehensive theming system makes it easy to do what you want. 25 | Ultimately, this means that rather than spending your time making your graph look pretty, you can instead focus on creating the graph that best reveals the message in your data. 26 | 27 | ggplot2 is designed to work iteratively. 28 | You start with a layer that shows the raw data. 29 | Then you add layers of annotations and statistical summaries. 30 | This allows you to produce graphics using the same structured thinking that you would use to design an analysis. 31 | This reduces the distance between the plot in your head and the one on the page. 32 | This is especially helpful for students who have not yet developed the structured approach to analysis used by experts. 33 | 34 | Learning the grammar will not only help you create graphics that you're familiar with, but will also help you to create newer, better graphics. 
35 | Without a grammar, there is no underlying theory, so most graphics packages are just a big collection of special cases. 36 | For example, in base R, if you design a new graphic, it's composed of raw plot elements like lines and points so it's hard to design new components that combine with existing plots. 37 | In ggplot2, the expressions used to create a new graphic are composed of higher-level elements, like representations of the raw data and statistical transformations, that can easily be combined with new datasets and other plots. 38 | 39 | This book provides a hands-on introduction to ggplot2 with lots of example code and graphics. 40 | It also explains the grammar on which ggplot2 is based. 41 | Like other formal systems, ggplot2 is useful even when you don't understand the underlying model. 42 | However, the more you learn about it, the more effectively you'll be able to use ggplot2. 43 | 44 | This book will introduce you to ggplot2 assuming that you're a novice, unfamiliar with the grammar; teach you the basics so that you can re-create plots you are already familiar with; show you how to use the grammar to create new types of graphics; and eventually turn you into an expert who can build new components to extend the grammar. 45 | 46 | ## What is the grammar of graphics? 47 | 48 | @wilkinson:2006 created the grammar of graphics to describe the fundamental features that underlie all statistical graphics. 49 | The grammar of graphics is an answer to the question of what is a statistical graphic? 50 | ggplot2 [@wickham:2007d] builds on Wilkinson's grammar by focussing on the primacy of layers and adapting it for use in R. 51 | In brief, the grammar tells us that a graphic maps the data to the aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). 52 | The plot may also include statistical transformations of the data and information about the plot's coordinate system. 
53 | Faceting can be used to plot different subsets of the data. 54 | The combination of these independent components is what makes up a graphic. 55 | 56 | As the book progresses, the formal grammar will be explained in greater detail. 57 | The first description of the components follows below. 58 | It introduces some of the terminology that will be used throughout the book and outlines the basic function of each component. 59 | Don't worry if it doesn't make sense right away: you'll have many more opportunities to learn about the components and how they work together. 60 | 61 | All plots are composed of the **data**, the information you want to visualise, and a **mapping**, the description of how the data's variables are mapped to aesthetic attributes. 62 | There are five mapping components: 63 | 64 | - A **layer** is a collection of geometric elements and statistical transformations. 65 | Geometric elements, **geom**s for short, represent what you actually see in the plot: points, lines, polygons, etc. 66 | Statistical transformations, **stat**s for short, summarise the data: for example, binning and counting observations to create a histogram, or fitting a linear model. 67 | 68 | - **Scale**s map values in the data space to values in the aesthetic space. 69 | This includes the use of colour, shape or size. 70 | Scales also draw the legend and axes, which make it possible to read the original data values from the plot (an inverse mapping). 71 | 72 | - A **coord**, or coordinate system, describes how data coordinates are mapped to the plane of the graphic. 73 | It also provides axes and gridlines to help read the graph. 74 | We normally use the Cartesian coordinate system, but a number of others are available, including polar coordinates and map projections. 75 | 76 | - A **facet** specifies how to break up and display subsets of data as small multiples. 77 | This is also known as conditioning or latticing/trellising.
78 | 79 | - A **theme** controls the finer points of display, like the font size and background colour. 80 | While the defaults in ggplot2 have been chosen with care, you may need to consult other references to create an attractive plot. 81 | A good starting place is Tufte's early works [@tufte:1990; @tufte:1997; @tufte:2001]. 82 | 83 | It's also important to note what the grammar doesn't do: 84 | 85 | - It doesn't suggest which graphics to use. 86 | While this book endeavours to promote a sensible process for producing plots, the focus is on how to produce the plots you want, not on which plot to produce. 87 | For more advice on choosing or creating plots to answer the question you're interested in, you may want to consult @robbins:2004, @cleveland:1993, @chambers:1983, and @tukey:1977. 88 | 89 | - It doesn't describe interactive graphics, only static ones. 90 | There is essentially no difference between displaying ggplot2 graphs on a computer screen and printing them on a piece of paper. 91 | 92 | ## How does ggplot2 fit in with other R graphics? 93 | 94 | There are a number of other graphics systems available in R: base graphics, grid graphics and trellis/lattice graphics. 95 | How does ggplot2 differ from them? 96 | 97 | - Base graphics were written by Ross Ihaka based on experience implementing the S graphics driver and partly looking at @chambers:1983. 98 | Base graphics has a pen on paper model: you can only draw on top of the plot, you cannot modify or delete existing content. 99 | There is no (user accessible) representation of the graphics, apart from their appearance on the screen. 100 | Base graphics includes both tools for drawing primitives and entire plots. 101 | Base graphics functions are generally fast, but have limited scope. 102 | If you've created a single scatterplot, or histogram, or a set of boxplots in the past, you've probably used base graphics. 
103 | \index{Base graphics} 104 | 105 | - The development of "grid" graphics, a much richer system of graphical primitives, started in 2000. 106 | Grid is developed by Paul Murrell, growing out of his PhD work [@murrell:1998]. 107 | Grid grobs (graphical objects) can be represented independently of the plot and modified later. 108 | A system of viewports (each containing its own coordinate system) makes it easier to lay out complex graphics. 109 | Grid provides drawing primitives, but no tools for producing statistical graphics. 110 | \index{grid} 111 | 112 | - The lattice package, developed by Deepayan Sarkar, uses grid graphics to implement the trellis graphics system of @cleveland:1993 and is a considerable improvement over base graphics. 113 | You can easily produce conditioned plots and some plotting details (e.g., legends) are taken care of automatically. 114 | However, lattice graphics lacks a formal model, which can make it hard to extend. 115 | Lattice graphics are explained in depth in @sarkar:2008. 116 | \index{lattice} 117 | 118 | - ggplot2, started in 2005, is an attempt to take the good things about base and lattice graphics and improve on them with a strong underlying model which supports the production of any kind of statistical graphic, based on the principles outlined above. 119 | The solid underlying model of ggplot2 makes it easy to describe a wide range of graphics with a compact syntax, and independent components make extension easy. 120 | Like lattice, ggplot2 uses grid to draw the graphics, which means you can exercise a great deal of low-level control over the appearance of the plot. 121 | 122 | - htmlwidgets provides a common framework for accessing web visualisation tools from R. 123 | Packages built on top of htmlwidgets include leaflet (maps), dygraph (time series) and networkD3 (networks). 124 | \index{htmlwidgets} 125 | 126 | - plotly is a popular JavaScript visualisation toolkit with an R interface. 
127 | It's a great tool if you want to make interactive graphics for HTML documents, and even comes with a `ggplotly()` function that can convert many ggplot2 graphics into their interactive equivalents. 128 | 129 | Many other R packages, such as vcd [@meyer:2006], plotrix [@plotrix] and gplots [@gplots], implement specialist graphics, but no others provide a framework for producing statistical graphics. 130 | A comprehensive list of all graphical tools available in other packages can be found in the CRAN graphics task view. 131 | 132 | ## About this book 133 | 134 | The first chapter, @sec-getting-started, describes how to quickly get started using ggplot2 to make useful graphics. 135 | This chapter introduces several important ggplot2 concepts: geoms, aesthetic mappings and facetting. 136 | 137 | @sec-individual-geoms to @sec-arranging-plots explore how to use the basic toolbox to solve a wide range of visualisation problems that you're likely to encounter in practice. 138 | 139 | Then @sec-scale-position to @sec-scale-other show you how to control the most important scales, allowing you to tweak the details of axes and legends. 140 | 141 | In "The Grammar" we describe the layered grammar of graphics which underlies ggplot2. 142 | The theory is illustrated in @sec-layers which demonstrates how to add additional layers to your plot, exercising full control over the geoms and stats used within them. 143 | 144 | Understanding how scales work is crucial for fine-tuning the perceptual properties of your plot. 145 | Customising scales gives fine control over the exact appearance of the plot and helps to support the story that you are telling. 146 | @sec-scale-position, @sec-scale-colour and @sec-scale-other will show you what scales are available, how to adjust their parameters, and how to control the appearance of axes and legends. 147 | 148 | Coordinate systems and facetting control the position of elements of the plot. 149 | These are described in @sec-position. 
150 | Faceting is a very powerful graphical tool as it allows you to rapidly compare different subsets of your data. 151 | Different coordinate systems are less commonly needed, but are very important for certain types of data. 152 | 153 | To polish your plots for publication, you will need to learn about the tools described in @sec-polishing. 154 | There you will learn about how to control the theming system of ggplot2 and how to save plots to disk. 155 | 156 | ## Prerequisites {#sec-prerequisites} 157 | 158 | \index{Installation} 159 | 160 | Before we continue, make sure you have all the software you need for this book: 161 | 162 | - **R**: If you don't have R installed already, you may be reading the wrong book; we assume a basic familiarity with R throughout this book. 163 | If you'd like to learn how to use R, we'd recommend [*R for Data Science*](https://r4ds.had.co.nz/) which is designed to get you up and running with R with a minimum of fuss. 164 | 165 | - **RStudio**: RStudio is a free and open source integrated development environment (IDE) for R. 166 | While you can write and use R code with any R environment (including R GUI and [ESS](http://ess.r-project.org)), RStudio has some nice features specifically for authoring and debugging your code. 167 | We recommend giving it a try, but it's not required to be successful with ggplot2 or with this book. 168 | You can download RStudio Desktop from the Posit website. 169 | 170 | - **R packages**: This book uses a bunch of R packages. 
171 | You can install them all at once by running: 172 | 173 | ```{r} 174 | #| echo: false 175 | #| cache: false 176 | deps <- desc::desc_get_deps() 177 | pkgs <- sort(deps$package[deps$type == "Imports"]) 178 | pkgs2 <- strwrap(paste(encodeString(pkgs, quote = '"'), collapse = ", "), exdent = 2) 179 | 180 | install <- paste0( 181 | "install.packages(c(\n ", 182 | paste(pkgs2, "\n", collapse = ""), 183 | "))" 184 | ) 185 | ``` 186 | 187 | ```{r} 188 | #| code: !expr install 189 | #| eval: false 190 | ``` 191 | 192 | ## Other resources {#sec-other-resources} 193 | 194 | This book teaches you the elements of ggplot2's grammar and how they fit together, but it does not document every function in complete detail. 195 | You will need additional documentation as your use of ggplot2 becomes more complex and varied. 196 | 197 | The best resource for specific details of ggplot2 functions and their arguments will always be the built-in documentation. 198 | This is accessible online and from within R using the usual help syntax. 199 | The advantage of the online documentation is that you can see all the example plots and navigate between topics more easily. 200 | 201 | If you use ggplot2 regularly, it's a good idea to sign up for the ggplot2 mailing list. 202 | The list has relatively low traffic and is very friendly to new users. 203 | Another useful resource is Stack Overflow. 204 | There is an active ggplot2 community on Stack Overflow, and many common questions have already been asked and answered. 205 | In either place, you're much more likely to get help if you create a minimal reproducible example. 206 | The [reprex](https://github.com/jennybc/reprex) package by Jenny Bryan provides a convenient way to do this, and also includes advice on creating a good example. 207 | The more information you provide, the easier it is for the community to help you. 
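As a rough sketch of what a minimal reproducible example might look like, the snippet below uses only ggplot2 and its bundled `mpg` dataset, so anyone can run it unchanged. The commented-out `reprex::reprex()` call is an assumption that you have the reprex package installed; it is left as a comment because it is meant to be run interactively.

```r
# A small, self-contained example: one package, one built-in dataset,
# and only the code needed to show the plot in question.
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()
p

# Assumption: the reprex package is installed.
# Wrapping the same code in reprex() renders it together with its output,
# ready to paste into a mailing list post or a question.
# reprex::reprex({
#   library(ggplot2)
#   ggplot(mpg, aes(displ, hwy)) + geom_point()
# })
```

Keeping the example this small makes it far more likely that someone else can reproduce the problem on their own machine.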
208 | 209 | The number of functions in ggplot2 can be overwhelming, but RStudio provides some great cheatsheets to jog your memory. 210 | 211 | Finally, the complete source code for the book is available online. 212 | This contains the complete text for the book, as well as all the code and data needed to recreate all the plots. 213 | 214 | ## Colophon 215 | 216 | This book was written in [RStudio](https://posit.co/products/open-source/rstudio/) using [bookdown](http://bookdown.org/). 217 | The [website](http://ggplot2-book.org/) is hosted with [netlify](http://netlify.com/), and automatically updated after every commit by [Github Actions](https://github.com/features/actions). 218 | The complete source is available from [GitHub](https://github.com/hadley/ggplot2-book). 219 | 220 | This version of the book was built with `r R.version.string` and the following packages: 221 | 222 | ```{r} 223 | #| echo: false 224 | #| results: asis 225 | pkgs <- sessioninfo::package_info(pkgs, dependencies = FALSE) 226 | df <- tibble( 227 | package = pkgs$package, 228 | version = pkgs$ondiskversion, 229 | source = gsub("@", "\\\\@", pkgs$source) 230 | ) 231 | knitr::kable(df, format = "markdown") 232 | ``` 233 | -------------------------------------------------------------------------------- /mi_raster.rds: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/mi_raster.rds -------------------------------------------------------------------------------- /networks.qmd: -------------------------------------------------------------------------------- 1 | # Networks {#sec-networks} 2 | 3 | ```{r} 4 | #| echo: false 5 | #| message: false 6 | #| results: asis 7 | source("common.R") 8 | status("drafting") 9 | ``` 10 | 11 | Like maps and spatial data, networks and graphs occupy a special part of the visualization landscape, but whereas spatial data mostly differ 
from regular plotting in their use of projections, networks bring their own data structure as well as their own visualization paradigms to the table. 12 | Because of these complications, networks are not directly supported in ggplot2. 13 | Several efforts over the years have tried to add this missing piece, and in this chapter, we will see how to use [ggraph](https://ggraph.data-imaginist.com) for network visualization. 14 | Other packages that offer some of the same functionality include [geomnet](http://sctyner.github.io/geomnet/), [ggnetwork](https://briatte.github.io/ggnetwork/), and [GGally](https://ggobi.github.io/ggally/) for regular network plots, and [ggtree](https://github.com/YuLab-SMU/ggtree) and [ggdendro](http://andrie.github.io/ggdendro/) for tree visualization specifically. 15 | 16 | ## What is network data? 17 | 18 | Networks (or graphs as their mathematical concept is called) are data that consist of entities (*nodes* or *vertices*) and their relations (*edges* or *links*). 19 | Both nodes and edges can have additional data attached, and edges can furthermore be considered directed or undirected depending on the nature of the connection (a network encoding mutual friendship would have undirected edges, whereas an ancestor network will have directed edges because *child-of* is not a symmetrical relation). 20 | 21 | The nature of network data means that it is not readily representable in a single data frame, which is one of the key complications to using it with ggplot2. 22 | However, it can be encoded as two interrelated data frames, one encoding the nodes, and one encoding the edges. 23 | This is the approach used in tidygraph, which is the data-manipulation package underlying ggraph. 24 | To make better use of ggraph it is thus beneficial to understand a little about tidygraph. 
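To make the two-data-frame encoding concrete, here is a minimal sketch that builds a small graph with `tbl_graph()`. The node names, ages, and edge weights are invented purely for illustration, and the object is called `friends` so that it does not clash with objects created later in the chapter.

```r
library(tidygraph)

# Hypothetical toy data: one row per node, with any attributes as columns.
nodes <- data.frame(
  name = c("Ann", "Bob", "Cat", "Dan"),
  age = c(31, 45, 27, 52)
)

# One row per edge. `from` and `to` refer to row positions in `nodes`.
edges <- data.frame(
  from = c(1L, 1L, 2L, 3L),
  to = c(2L, 3L, 3L, 4L),
  weight = c(0.5, 1, 2, 0.1)
)

# Mutual friendship is symmetric, so the edges are undirected.
friends <- tbl_graph(nodes = nodes, edges = edges, directed = FALSE)
friends
```

Printing the object shows the node table and the edge table side by side, which is the same two-part structure the rest of this chapter manipulates.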
25 | 26 | ### A tidy network manipulation API 27 | 28 | tidygraph can be considered first and foremost a dplyr API for network data, allowing the same semantics for manipulating networks as is known from dplyr. 29 | An example of this can be seen below where we create a random graph using the Erdős-Rényi sampling method, assign a random label to the nodes, and sort the edges based on the label of their source node. 30 | 31 | ```{r} 32 | library(tidygraph) 33 | 34 | graph <- play_erdos_renyi(n = 10, p = 0.2) %>% 35 | activate(nodes) %>% 36 | mutate(class = sample(letters[1:4], n(), replace = TRUE)) %>% 37 | activate(edges) %>% 38 | arrange(.N()$class[from]) 39 | 40 | graph 41 | ``` 42 | 43 | While `mutate()`, `arrange()`, and `n()` are well-known, we can see some new functions that need explaining: `activate()` tells tidygraph which part of the network you want to work on, either `nodes` or `edges`. 44 | Further, we see the use of `.N()` which gives access to the node data of the current graph even when working with the edges (there's also a corresponding `.E()` function to access the edge data, and `.G()` to access the whole graph). 45 | 46 | ### Conversion 47 | 48 | Network data is often presented in a range of different formats depending on where you get it from. 49 | tidygraph understands most of the different classes used in R for network data and these can be converted using `as_tbl_graph()`. 50 | Below is an example of converting a data frame encoded as an edgelist, as well as converting the result of `hclust()`. 51 | 52 | ```{r} 53 | data(highschool, package = "ggraph") 54 | head(highschool) 55 | ``` 56 | 57 | ```{r} 58 | hs_graph <- as_tbl_graph(highschool, directed = FALSE) 59 | hs_graph 60 | ``` 61 | 62 | ```{r} 63 | luv_clust <- hclust(dist(luv_colours[, 1:3])) 64 | luv_graph <- as_tbl_graph(luv_clust) 65 | luv_graph 66 | ``` 67 | 68 | We can see that tidygraph automatically adds additional information when converting, e.g. 
the *year* column in the highschool data, and the *label* and *leaf* properties of the nodes in the hierarchical clustering. 69 | 70 | ### Algorithms 71 | 72 | While simply manipulating networks is nice, the real benefit of networks comes from the different operations that can be performed on them using the underlying structure. 73 | tidygraph has rich support for a range of different groups of algorithms such as centrality calculation (which node is most central), ranking (order nodes so nodes are located close to those they are connected to), grouping (finding clusters inside the network), etc. 74 | The algorithm API is designed to be used inside `mutate()` and will always return a vector with length and order matching the nodes or edges. 75 | Further, it does not require you to specify the graph or nodes you want to calculate for since this is given implicitly in the `mutate()` call. 76 | As an example, we will calculate the centrality of the nodes in our sample graph using the PageRank algorithm and sort the nodes according to that: 77 | 78 | ```{r} 79 | graph %>% 80 | activate(nodes) %>% 81 | mutate(centrality = centrality_pagerank()) %>% 82 | arrange(desc(centrality)) 83 | ``` 84 | 85 | ### Want more? 86 | 87 | This is just a brief glimpse into tidygraph, for the sake of understanding ggraph. 88 | If you are interested in learning more, the tidygraph website gives an overview of all the functionality in the package. 89 | 90 | ## Visualizing networks 91 | 92 | ggraph builds on top of tidygraph and ggplot2 to allow a complete and familiar grammar of graphics for network data. 93 | Still, it is a little different from most ggplot2 extension packages since it works with another data type that is fundamentally different from tabular data. 94 | Moreover, most network visualizations don't concern themselves with mapping variables to `x` and `y` aesthetics since they are concerned with showing the network topology more than relations between two variables. 
95 | In order to show network topology, the concept of layouts is employed. 96 | Layouts are algorithms that use the network structure to calculate (often arbitrary) `x` and `y` values for each node that can then be used for visualization purposes. 97 | To put it another way, when plotting tabular data the `x` and `y` aesthetics are almost always mapped to existing variables in the data (or statistical transformations of existing data) whereas when plotting network data `x` and `y` are mapped to values derived from the topology of the network and which are by themselves meaningless. 98 | 99 | ### Setting up the visualization 100 | 101 | Whereas a normal ggplot2 plot is initialized with a `ggplot()` call, a ggraph plot is initialized with a `ggraph()` call. 102 | The first argument is the data, which can be a tbl_graph or any object convertible to one. 103 | The second argument is a layout function and any further arguments will be passed on to that function. 104 | The default layout will choose an appropriate layout based on the type of graph you provide, but while it is often a decent starting point you should always take control and explore the different layouts available --- networks are notorious for their ability to show non-existing or exaggerated relations in some layouts. 105 | There is more to layouts than described in this section. 106 | The [Getting Started guide to layouts](https://ggraph.data-imaginist.com/articles/Layouts.html) will tell you even more and showcase all the different layouts provided by ggraph. 107 | 108 | #### Specifying a layout 109 | 110 | The layout argument can either take a string or a function. 111 | If a string is provided, the name will be matched to one of the built-in layouts (of which there are many). 112 | If a function is provided it is assumed that the function takes a tbl_graph and returns a data frame with at least an x and y column and with the same number of rows as there are nodes in the input graph. 
113 | Below we can see examples of using the default layout, specifying a specific layout, and providing arguments to the layout (which are evaluated in the context of the input graph): 114 | 115 | ```{r} 116 | library(ggraph) 117 | ggraph(hs_graph) + 118 | geom_edge_link() + 119 | geom_node_point() 120 | ``` 121 | 122 | ```{r} 123 | ggraph(hs_graph, layout = "drl") + 124 | geom_edge_link() + 125 | geom_node_point() 126 | ``` 127 | 128 | ```{r} 129 | hs_graph <- hs_graph %>% 130 | activate(edges) %>% 131 | mutate(edge_weights = runif(n())) 132 | ggraph(hs_graph, layout = "stress", weights = edge_weights) + 133 | geom_edge_link(aes(alpha = edge_weights)) + 134 | geom_node_point() + 135 | scale_edge_alpha_identity() 136 | ``` 137 | 138 | In order to show the graph above we're using the `geom_edge_link()` and `geom_node_point()` functions, and while we have not yet discussed these they do exactly what you may imagine: drawing nodes as points and edges as straight lines. 139 | 140 | #### Circularity 141 | 142 | Some layouts may be used in both a linear and circular version. 143 | The correct way to change this in ggplot2 would be to use `coord_polar()` to change the coordinate system, but since we only want to change the position of nodes in the layout, and not affect the edges, this is a function of the layout. 144 | The following shows the difference: 145 | 146 | ```{r} 147 | ggraph(luv_graph, layout = "dendrogram", circular = TRUE) + 148 | geom_edge_link() + 149 | coord_fixed() 150 | ``` 151 | 152 | ```{r} 153 | ggraph(luv_graph, layout = "dendrogram") + 154 | geom_edge_link() + 155 | coord_polar() + 156 | scale_y_reverse() 157 | ``` 158 | 159 | As we can see, using `coord_polar()` will bend our edges, which is hardly a desirable choice. 160 | 161 | ### Drawing nodes 162 | 163 | Of the two types of data stored in a graph, nodes are by far the more similar to what we are used to plotting. 
164 | After all, they are often shown as points in very much the same way as observations are displayed in a scatter plot. 165 | While conceptually simple we still won't cover everything there is to know about nodes, so as with layouts the interested reader is directed towards the [Getting Started guide to nodes](https://ggraph.data-imaginist.com/articles/Nodes.html) to learn more. 166 | All the node drawing geoms in ggraph are prefixed with `geom_node_` and the one you are most likely to use the most is `geom_node_point()`. 167 | While it may superficially look a lot like `geom_point()` it has some additional features that it shares with all node and edge geoms. 168 | First, you don't have to specify the `x` and `y` aesthetics. 169 | These are given by the layout and their mapping is implicit. 170 | Second, you have access to a `filter` aesthetic that allows you to turn off the drawing of specific nodes. 171 | Third, you may use any tidygraph algorithms inside the `aes()` and they will get evaluated on the graph being visualized. 172 | To see this in action we plot our highschool graph again, but this time only showing nodes with more than 2 connections, and colored by their power centrality: 173 | 174 | ```{r} 175 | ggraph(hs_graph, layout = "stress") + 176 | geom_edge_link() + 177 | geom_node_point( 178 | aes(filter = centrality_degree() > 2, 179 | colour = centrality_power()), 180 | size = 4 181 | ) 182 | ``` 183 | 184 | Being able to use algorithms directly inside the visualization code is a powerful way to iterate on your visualization as you don't need to go back and change the input graph. 185 | 186 | Apart from points, there are more specialized geoms, many tied to a specific type of layout. 
187 | If one wishes to draw a treemap the `geom_node_tile()` is needed: 188 | 189 | ```{r} 190 | ggraph(luv_graph, layout = "treemap") + 191 | geom_node_tile(aes(fill = depth)) 192 | ``` 193 | 194 | ### Drawing edges 195 | 196 | Edge geoms have a lot more bells and whistles than node geoms, mainly because there are so many different ways one can connect two things. 197 | There is no way to cover it all, both in terms of the different types of geoms, as well as the common functionality they have. 198 | The [Getting Started guide to edges](https://ggraph.data-imaginist.com/articles/Edges.html) will give a complete overview. 199 | 200 | We have already seen `geom_edge_link()` in action, which draws a straight line between the connected nodes, but it can do more than we've seen already. 201 | Under the hood it will split up the line into a bunch of small fragments and it is possible to use that to draw a gradient along the edge, e.g. to show direction: 202 | 203 | ```{r} 204 | ggraph(graph, layout = "stress") + 205 | geom_edge_link(aes(alpha = after_stat(index))) 206 | ``` 207 | 208 | If you are drawing a lot of edges this expansion might become prohibitively time consuming and ggraph provides a `0` suffixed version that draws it as a simple geom (and doesn't allow you to draw gradients). 209 | Further, for the special case where you want to interpolate between two values at the end points (e.g. variables on the nodes) a `2` suffixed version exists as well: 210 | 211 | ```{r} 212 | ggraph(graph, layout = "stress") + 213 | geom_edge_link2( 214 | aes(colour = node.class), 215 | width = 3, 216 | lineend = "round") 217 | ``` 218 | 219 | The use of the `node.class` variable might surprise you. 220 | Edge geoms have access to the variables of the terminal nodes through specially prefixed variables. 
221 | For the standard and `0` version these are available through `node1.` and `node2.` prefixed variables, and for the `2` version they are available through `node.` prefixed variables (as used above). 222 | The three versions of edge geoms are common to all edge geom types, not just `geom_edge_link()`. 223 | 224 | There are more ways to draw edges than simple straight lines. 225 | Some are specific to trees or specific layouts, but many are general purpose. 226 | One specific use-case for another edge type is when you have multiple edges running between the same nodes. 227 | Drawing them as straight lines will obscure the multiplicity of the edges, which is e.g. apparent with the highschool graph where multiple parallel edges are present but invisible in the plots above. 228 | In order to show parallel edges you can either use `geom_edge_fan()` or `geom_edge_parallel()`: 229 | 230 | ```{r} 231 | ggraph(hs_graph, layout = "stress") + 232 | geom_edge_fan() 233 | ``` 234 | 235 | ```{r} 236 | ggraph(hs_graph, layout = "stress") + 237 | geom_edge_parallel() 238 | ``` 239 | 240 | It is clear that these geoms should only be used for relatively simple graphs since they increase the amount of clutter and overplotting in the plot. 241 | Looking at trees and specifically dendrograms, one commonly used edge type is the elbow edge: 242 | 243 | ```{r} 244 | ggraph(luv_graph, layout = "dendrogram", height = height) + 245 | geom_edge_elbow() 246 | ``` 247 | 248 | `geom_edge_bend()` and `geom_edge_diagonal()` are smoother versions of this. 249 | 250 | #### Clipping edges around the nodes 251 | 252 | A common issue, especially when using arrows to show directionality of edges, is that the node will overlap the edge because it runs to the center of the node, not the edge of the point showing the node. 
253 | This can be seen below: 254 | 255 | ```{r} 256 | ggraph(graph, layout = "stress") + 257 | geom_edge_link(arrow = arrow()) + 258 | geom_node_point(aes(colour = class), size = 8) 259 | ``` 260 | 261 | Obviously, we would like the edges to stop before they reach the point so that the arrow is not obscured. 262 | This is possible in ggraph using the `start_cap` and `end_cap` aesthetics which allow you to specify a clipping region around the terminal nodes. 263 | To fix the above plot we would set a circular clipping region of the correct size around each node: 264 | 265 | ```{r} 266 | ggraph(graph, layout = "stress") + 267 | geom_edge_link( 268 | arrow = arrow(), 269 | start_cap = circle(5, "mm"), 270 | end_cap = circle(5, "mm") 271 | ) + 272 | geom_node_point(aes(colour = class), size = 8) 273 | ``` 274 | 275 | #### An edge is not always a line 276 | 277 | While it is natural to think of edges as different kinds of lines connecting points, this is only true for certain network plot types. 278 | One should always be mindful that nodes and edges are abstract concepts and can be visualized in a multitude of ways. 279 | As an example of this we can look at matrix plots which show nodes implicitly by row and column position, and show edges as points or tiles. 280 | 281 | ```{r} 282 | ggraph(hs_graph, layout = "matrix", sort.by = node_rank_traveller()) + 283 | geom_edge_point() 284 | ``` 285 | 286 | ### Faceting 287 | 288 | Faceting is not a concept often applied to network visualization, but it is just as powerful for networks as it is for tabular data. 289 | While the standard faceting functions in ggplot2 do technically work with ggraph, they do not on a conceptual level, since nodes and edges are connected and splitting nodes on multiple subplots will automatically move edges with them even though the edges do not have the faceting variable in their data. 290 | Because of this, ggraph provides its own specialized versions of `facet_wrap()` and `facet_grid()`. 
291 | `facet_nodes()` and `facet_edges()` will target either nodes or edges and wrap the panels in the same manner as `facet_wrap()`. 292 | For `facet_nodes()` the convention is that if an edge goes between two nodes in the same panel it will be shown in that panel, but if it is split between multiple panels it will be removed. 293 | For `facet_edges()` nodes will be repeated in all panels. 294 | To see it in action we can look at our highschool graph and see how their friendships have evolved over the years. 295 | 296 | ```{r} 297 | ggraph(hs_graph, layout = "stress") + 298 | geom_edge_link() + 299 | geom_node_point() + 300 | facet_edges(~year) 301 | ``` 302 | 303 | With faceting, we see a clear evolution of the friendships going from two completely separate groups to a more mixed single group. 304 | 305 | As faceting also accepts tidygraph algorithms it is a great way to evaluate e.g. the result of groupings on the fly. 306 | 307 | ```{r} 308 | ggraph(hs_graph, layout = "stress") + 309 | geom_edge_link() + 310 | geom_node_point() + 311 | facet_nodes(~ group_spinglass()) 312 | ``` 313 | 314 | The last included facet type is `facet_graph()` which works as `facet_grid()`, but allows you to specify which part the rows and columns should facet on, edges or nodes. 315 | 316 | ## Want more? 317 | 318 | This is just a taste of the possibilities presented in ggraph. 319 | If you want to dive deeper you may find the resources on the ggraph and tidygraph websites helpful. 320 | Understanding the tidygraph foundation and API will increase your mastery and understanding of ggraph, so make sure you study them in unison. 
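As one last sketch, here is how `facet_graph()`, described in the faceting section but not demonstrated there, might be used. The `popularity` node variable and the `hs_popular` object are made up for this illustration: popularity is derived by cutting the in-degree of each node into three bins. By default, rows facet on an edge variable and columns on a node variable.

```r
library(tidygraph)
library(ggraph)

data(highschool, package = "ggraph")

# Illustrative node variable: bin each node's in-degree into three levels.
hs_popular <- as_tbl_graph(highschool) %>%
  activate(nodes) %>%
  mutate(popularity = as.character(cut(
    centrality_degree(mode = "in"),
    breaks = 3,
    labels = c("low", "medium", "high")
  )))

# Rows split the edges by year, columns split the nodes by popularity.
p <- ggraph(hs_popular, layout = "stress") +
  geom_edge_link() +
  geom_node_point() +
  facet_graph(year ~ popularity)
p
```

The `row_type` and `col_type` arguments of `facet_graph()` can be used to change which part of the graph each side of the formula refers to.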
321 | -------------------------------------------------------------------------------- /preamble.tex: -------------------------------------------------------------------------------- 1 | \usepackage{booktabs} 2 | 3 | \usepackage{float} 4 | \usepackage{index} 5 | % index functions separately 6 | \newindex{code}{adx}{and}{R code index} 7 | \newcommand{\indexf}[1]{\index[code]{#1@\texttt{#1()}}} 8 | \newcommand{\indexc}[1]{\index[code]{#1@\texttt{#1}}} 9 | 10 | \DeclareGraphicsExtensions{.pdf,.png} 11 | 12 | % Place links in parens 13 | \renewcommand{\href}[2]{#2 (\url{#1})} 14 | % Use auto ref for internal links 15 | \let\oldhyperlink=\hyperlink 16 | \renewcommand{\hyperlink}[2]{\autoref{#1}} 17 | \def\chapterautorefname{Chapter} 18 | \def\sectionautorefname{Section} 19 | \def\subsectionautorefname{Section} 20 | \def\subsubsectionautorefname{Section} 21 | 22 | \setlength{\emergencystretch}{3em} % prevent overfull lines 23 | \vbadness=10000 % suppress underfull \vbox 24 | \hbadness=10000 % suppress underfull \hbox 25 | \hfuzz=10pt 26 | 27 | \makeindex 28 | -------------------------------------------------------------------------------- /preface-2e.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | engine: knitr 3 | --- 4 | 5 | # Preface to the second edition {.unnumbered} 6 | 7 | Welcome to the second edition of "ggplot2: elegant graphics for data analysis". 8 | I'm so excited to have an updated book that shows off all the latest and greatest ggplot2 features, as well as the great things that have been happening in R and in the ggplot2 community over the last five years. 9 | The ggplot2 community is vibrant: the ggplot2 mailing list has over 7,000 members and there is a very active Stack Overflow community, with nearly 10,000 questions tagged with ggplot2. 10 | While most of my development effort is no longer going into ggplot2 (more on that below), there's never been a better time to learn it and use it. 
11 | 12 | I am tremendously grateful for the success of ggplot2. 13 | It's one of the most commonly downloaded R packages (over a million downloads in the last year!) and has influenced the design of graphics packages for other languages. 14 | Personally, ggplot2 has brought me many exciting opportunities to travel the world and meet interesting people. 15 | I love hearing how people are using R and ggplot2 to understand the data that they care about. 16 | 17 | A big thanks for this edition goes to Carson Sievert, who helped me modernise the code, including converting the sources to R Markdown. 18 | He also updated many of the examples and helped me proofread the book. 19 | 20 | ## Major changes {.unnumbered} 21 | 22 | I've spent a lot of effort ensuring that this edition is a true upgrade over the first. 23 | As well as updating the code everywhere to make sure it's fully compatible with the latest version of ggplot2, I have: 24 | 25 | - Shown much more code in the book, so it's easier to use as a reference. 26 | Overall the book has a more "knitr"-ish sensibility: there are fewer floating figures and tables, and more inline code. 27 | This makes the layout a little less pretty but keeps related items closer together. 28 | 29 | - Published the complete source online. 30 | 31 | - Switched from `qplot()` to `ggplot()` in the introduction. 32 | Feedback indicated that `qplot()` was a crutch: it makes simple plots a little easier, but it doesn't help with mastering the grammar. 33 | 34 | - Added practice exercises throughout the book so you can practice new techniques immediately after learning about them. 35 | 36 | - Added pointers to the rich ecosystem of packages that have built up around ggplot2. 37 | You'll now see a number of other packages highlighted in the book, and get pointers to other packages I think are particularly useful. 38 | 39 | - Overhauled the toolbox chapter to cover all the new geoms. 
40 | I've added a completely new section on text labels, since it's important and not covered in detail elsewhere. 41 | The mapping section has been considerably expanded to talk more about the different types of map data, and where you might find them. 42 | 43 | - Completely rewritten the scales chapter to focus on the most important tasks. 44 | It also discusses the new features that give finer control over legend appearance, and shows off some of the new scales added to ggplot2. 45 | 46 | - Split the data analysis chapter into three pieces: data tidying (with tidyr), data manipulation (with dplyr), and model visualisation (with broom). 47 | I discuss the latest iteration of my data manipulation tools, and introduce the fantastic broom package by David Robinson. 48 | 49 | The book is accompanied by a new version of ggplot2: version 2.0.0. 50 | This includes a number of minor tweaks and improvements, and considerable improvements to the documentation. 51 | Coming back to ggplot2 development after a considerable pause has helped me to see many problems that previously escaped notice. 52 | ggplot2 2.0.0 (finally!) contains an official extension mechanism so that others can contribute new ggplot2 components in their own packages. 53 | This is documented in a new vignette, `vignette("extending-ggplot2")`. 54 | 55 | ## The future {.unnumbered} 56 | 57 | ggplot2 is now stable, and is unlikely to change much in the future. 58 | There will be bug fixes and there may be new geoms, but there will be no large changes to how ggplot2 works. 59 | The next iteration of ggplot2 is ggvis. 60 | ggvis is significantly more ambitious because it aims to provide a grammar of *interactive* graphics. 61 | ggvis is still young, and lacks many of the features of ggplot2 (most notably it currently lacks faceting and has no way to make static graphics), but over the coming years the goal is to make ggvis better than ggplot2. 62 | 63 | The syntax of ggvis is a little different to ggplot2. 
64 | You won't be able to trivially convert your ggplot2 plots to ggvis, but we think the cost is worth it: the new syntax is considerably more consistent, and will be easier for newcomers to learn. 65 | If you've mastered ggplot2, you'll find your skills transfer very well to ggvis and after struggling with the syntax for a while, it will start to feel quite natural. 66 | The important skills you learn when mastering ggplot2 are not the programmatic details of describing a plot in code, but the much harder challenge of thinking about how to turn data into effective visualisations. 67 | 68 | ## Acknowledgements {.unnumbered} 69 | 70 | Many people have contributed to this book with high-level structural insights, spelling and grammar corrections and bug reports. 71 | I'd particularly like to thank William E. J. Doane, Alexander Forrence, Devin Pastoor, David Robinson, and Guangchuang Yu, for their detailed technical reviews of the book. 72 | 73 | Many others have contributed over the (now quite long!) lifetime of ggplot2. 74 | I would like to thank: Leland Wilkinson, for discussions and comments that cemented my understanding of the grammar; Gabor Grothendieck, for early helpful comments; Heike Hofmann and Di Cook, for being great advisors and supporting the development of ggplot2 during my PhD; Charlotte Wickham; the students of stat480 and stat503 at ISU, for trying it out when it was very young; Debby Swayne, for masses of helpful feedback and advice; Bob Muenchen, Reinhold Kliegl, Philipp Pagel, Richard Stahlhut, Baptiste Auguie, Jean-Olivier Irisson, Thierry Onkelinx and the many others who have read draft versions of the book and given me feedback; and last, but not least, the members of R-help and the ggplot2 mailing list, for providing the many interesting and challenging graphics problems that have helped motivate this book. 
75 | 76 | ```{block2} 77 | #| type: flushright 78 | #| html.tag: p 79 | Hadley Wickham 80 | September 2015 81 | ``` 82 | -------------------------------------------------------------------------------- /preface-3e.qmd: -------------------------------------------------------------------------------- 1 | # Preface to the third edition {#sec-preface-3e .unnumbered} 2 | 3 | Welcome to the third edition of "ggplot2: elegant graphics for data analysis". 4 | I'm so excited to have a new edition of the book updated for all the changes that have happened to ggplot2 in the last five years. 5 | I'm also excited to finally have an online version of the book, <https://ggplot2-book.org/>, thanks to a renegotiated contract with Springer. 6 | 7 | Since the last version of the book, the major change to ggplot2 itself is the growth in the contributor community. 8 | While I still lead the project and continue to care deeply about visualisation, I'm no longer involved in the day-to-day development of the package. 9 | At the time of writing, the core developers of ggplot2 are: 10 | 11 | - [Winston Chang](https://github.com/wch) 12 | - [Lionel Henry](https://github.com/lionel-) 13 | - [Thomas Lin Pedersen](https://github.com/thomasp85) 14 | - [Kohske Takahashi](https://github.com/kohske) 15 | - [Claus Wilke](https://github.com/clauswilke) 16 | - [Kara Woo](https://github.com/karawoo) 17 | - [Hiroaki Yutani](https://github.com/yutannihilation) 18 | - [Dewey Dunnington](https://github.com/paleolimbot) 19 | 20 | You can see an up-to-date list and how to become a core developer in the [ggplot2 governance document](https://github.com/tidyverse/ggplot2/blob/master/GOVERNANCE.md). 21 | 22 | ## Major changes {.unnumbered} 23 | 24 | - The *Data Analysis*, *Data Transformation*, and *Modelling for Visualisation* chapters have been removed so that the book can focus on visualisation.
25 | If you're looking for general advice on doing data science in R, we recommend [R for Data Science (2e)](https://r4ds.hadley.nz). 26 | 27 | - The *Toolbox* chapter has been expanded into six chapters that cover practical applications of layers. 28 | This includes more material on maps and annotations, and a new chapter that discusses how to arrange multiple plots on one page. 29 | 30 | - Similarly, the old *Scales, Axes, and Legends* chapter has been split into four chapters. 31 | The first three cover the practical combination of scales and guides for the most common scale types, and the final chapter focusses on the underlying theory. 32 | 33 | - The old *Positioning* chapter has been split into new *Coordinate Systems* and *Faceting* chapters, giving more room for details on these important topics. 34 | 35 | - New chapters describe more about the internals of ggplot2, and how you can extend it in your own package. 36 | 37 | ## Acknowledgements {.unnumbered} 38 | 39 | This edition of the book was made possible by two new co-authors: Danielle Navarro and Thomas Lin Pedersen. 40 | Danielle contributed most of the new material in the layers and scales chapters, and Thomas contributed new chapters on arranging plots (using his patchwork package), and on how to extend ggplot2. 41 | 42 | This book was written in the open and chapters were advertised on twitter when complete. 43 | It is truly a community effort: many people read drafts, fixed typos, suggested improvements, and contributed content. 44 | Without those contributors, the book wouldn't be nearly as good as it is, and I'm deeply grateful for their help.
45 | 46 | ```{r} 47 | #| eval: false 48 | #| echo: false 49 | library(tidyverse) 50 | contribs_all_json <- gh::gh("/repos/:owner/:repo/contributors", 51 | owner = "hadley", 52 | repo = "ggplot2-book", 53 | .limit = Inf 54 | ) 55 | contribs_all <- tibble( 56 | login = contribs_all_json %>% map_chr("login"), 57 | n = contribs_all_json %>% map_int("contributions") 58 | ) 59 | 60 | contribs_old <- read_csv("contributors.csv", col_types = list()) 61 | contribs_new <- contribs_all %>% anti_join(contribs_old, by = "login") 62 | 63 | # Get info for new contributors 64 | needed_json <- map( 65 | contribs_new$login, 66 | ~ gh::gh("/users/:username", username = .x) 67 | ) 68 | info_new <- tibble( 69 | login = contribs_new$login, 70 | name = map_chr(needed_json, "name", .default = NA), 71 | blog = map_chr(needed_json, "blog", .default = NA) 72 | ) 73 | info_old <- contribs_old %>% select(login, name, blog) 74 | info_all <- bind_rows(info_old, info_new) 75 | 76 | contribs_all <- contribs_all %>% 77 | left_join(info_all, by = "login") %>% 78 | arrange(login) 79 | write_csv(contribs_all, "contributors.csv") 80 | ``` 81 | 82 | ```{r} 83 | #| results: asis 84 | #| echo: false 85 | #| message: false 86 | library(dplyr) 87 | contributors <- read.csv("contributors.csv", stringsAsFactors = FALSE) 88 | contributors <- contributors %>% 89 | filter(login != "hadley") %>% 90 | mutate( 91 | login = paste0("\\@", login), 92 | desc = ifelse(is.na(name), login, paste0(name, " (", login, ")")) 93 | ) 94 | 95 | cat("A big thank you to all ", nrow(contributors), " people who contributed specific improvements via GitHub pull requests (in alphabetical order by username): ", sep = "") 96 | cat(paste0(contributors$desc, collapse = ", ")) 97 | cat(".\n") 98 | ``` 99 | -------------------------------------------------------------------------------- /programming.qmd: -------------------------------------------------------------------------------- 1 | # Programming with ggplot2 {#sec-programming} 2 | 3 | 
```{r} 4 | #| echo: false 5 | #| message: false 6 | #| results: asis 7 | source("common.R") 8 | status("drafting") 9 | ``` 10 | 11 | ## Introduction 12 | 13 | A major requirement of a good data analysis is flexibility. 14 | If your data changes, or you discover something that makes you rethink your basic assumptions, you need to be able to easily change many plots at once. 15 | The main inhibitor of flexibility is code duplication. 16 | If you have the same plotting statement repeated over and over again, you'll have to make the same change in many different places. 17 | Often just the thought of making all those changes is exhausting! 18 | This chapter will help you overcome that problem by showing you how to program with ggplot2. 19 | \index{Programming} 20 | 21 | To make your code more flexible, you need to reduce duplicated code by writing functions. 22 | When you notice you're doing the same thing over and over again, think about how you might generalise it and turn it into a function. 23 | If you're not that familiar with how functions work in R, you might want to brush up your knowledge at . 24 | 25 | In this chapter we'll show how to write functions that create: 26 | 27 | - A single ggplot2 component. 28 | - Multiple ggplot2 components. 29 | - A complete plot. 30 | 31 | And then we'll finish off with a brief illustration of how you can apply functional programming techniques to ggplot2 objects. 32 | 33 | You might also find the [cowplot](https://github.com/wilkelab/cowplot) and [ggthemes](https://github.com/jrnold/ggthemes) packages helpful. 34 | As well as providing reusable components that help you directly, you can also read the source code of the packages to figure out how they work. 35 | 36 | ## Single components 37 | 38 | Each component of a ggplot plot is an object. 39 | Most of the time you create the component and immediately add it to a plot, but you don't have to. 
40 | Instead, you can save any component to a variable (giving it a name), and then add it to multiple plots: 41 | 42 | ```{r} 43 | #| label: layer9 44 | #| layout-ncol: 2 45 | #| fig-width: 4 46 | bestfit <- geom_smooth( 47 | method = "lm", 48 | se = FALSE, 49 | colour = alpha("steelblue", 0.5), 50 | linewidth = 2 51 | ) 52 | ggplot(mpg, aes(cty, hwy)) + 53 | geom_point() + 54 | bestfit 55 | ggplot(mpg, aes(displ, hwy)) + 56 | geom_point() + 57 | bestfit 58 | ``` 59 | 60 | That's a great way to reduce simple types of duplication (it's much better than copying-and-pasting!), but requires that the component be exactly the same each time. 61 | If you need more flexibility, you can wrap these reusable snippets in a function. 62 | For example, we could extend our `bestfit` object to a more general function for adding lines of best fit to a plot. 63 | The following code creates a `geom_lm()` with three parameters: the model `formula`, the line `colour` and the `linewidth`: 64 | 65 | ```{r} 66 | #| label: geom-lm 67 | #| layout-ncol: 2 68 | #| fig-width: 4 69 | geom_lm <- function(formula = y ~ x, colour = alpha("steelblue", 0.5), 70 | linewidth = 2, ...) { 71 | geom_smooth(formula = formula, se = FALSE, method = "lm", colour = colour, 72 | linewidth = linewidth, ...) 73 | } 74 | ggplot(mpg, aes(displ, 1 / hwy)) + 75 | geom_point() + 76 | geom_lm() 77 | ggplot(mpg, aes(displ, 1 / hwy)) + 78 | geom_point() + 79 | geom_lm(y ~ poly(x, 2), linewidth = 1, colour = "red") 80 | ``` 81 | 82 | Pay close attention to the use of "`...`". 83 | When included in the function definition "`...`" allows a function to accept arbitrary additional arguments. 84 | Inside the function, you can then use "`...`" to pass those arguments on to another function. 85 | Here we pass "`...`" onto `geom_smooth()` so the user can still modify all the other arguments we haven't explicitly overridden. 86 | When you write your own component functions, it's a good idea to always use "`...`" in this way. 
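
The forwarding behaviour of "`...`" described above can be checked directly on the layer object. The following is only a minimal sketch: `geom_point_soft()` is a hypothetical wrapper invented for this illustration (it is not part of ggplot2), and it assumes that fixed aesthetic settings end up in the layer's `aes_params` field.

```r
library(ggplot2)

# Hypothetical wrapper (not part of ggplot2): sets default colour and size,
# and forwards every other argument to geom_point() through `...`
geom_point_soft <- function(colour = "steelblue", size = 2, ...) {
  geom_point(colour = colour, size = size, ...)
}

# Only the defaults are used
layer_default <- geom_point_soft()

# alpha and shape are not named parameters of the wrapper,
# so they reach geom_point() via `...`
layer_custom <- geom_point_soft(alpha = 0.3, shape = 17)

# The forwarded settings are stored alongside the explicit ones
str(layer_custom$aes_params)

p <- ggplot(mpg, aes(displ, hwy)) + layer_custom
```

Without "`...`" in the signature, the call with `alpha` and `shape` would fail with an unused-argument error, which is why forwarding is a sensible default for component functions.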
87 | \indexc{...} 88 | 89 | Finally, note that you can only *add* components to a plot; you can't modify or remove existing objects. 90 | 91 | ### Exercises 92 | 93 | 1. Create an object that represents a pink histogram with 100 bins. 94 | 95 | 2. Create an object that represents a fill scale with the Blues ColorBrewer palette. 96 | 97 | 3. Read the source code for `theme_grey()`. 98 | What are its arguments? 99 | How does it work? 100 | 101 | 4. Create `scale_colour_wesanderson()`. 102 | It should have a parameter to pick the palette from the wesanderson package, and create either a continuous or discrete scale. 103 | 104 | ## Multiple components 105 | 106 | It's not always possible to achieve your goals with a single component. 107 | Fortunately, ggplot2 has a convenient way of adding multiple components to a plot in one step with a list. 108 | The following function adds two layers: one to show the mean, and one to show its 95% confidence interval: 109 | 110 | ```{r} 111 | #| label: geom-mean-1 112 | #| warning: false 113 | #| layout-ncol: 2 114 | #| fig-width: 4 115 | geom_mean <- function() { 116 | list( 117 | stat_summary(fun = "mean", geom = "bar", fill = "grey70"), 118 | stat_summary(fun.data = "mean_cl_normal", geom = "errorbar", width = 0.4) 119 | ) 120 | } 121 | ggplot(mpg, aes(class, cty)) + geom_mean() 122 | ggplot(mpg, aes(drv, cty)) + geom_mean() 123 | ``` 124 | 125 | If the list contains any `NULL` elements, they're ignored. 
126 | This makes it easy to conditionally add components: 127 | 128 | ```{r} 129 | #| label: geom-mean-2 130 | #| warning: false 131 | #| layout-ncol: 2 132 | #| fig-width: 4 133 | geom_mean <- function(se = TRUE) { 134 | list( 135 | stat_summary(fun = "mean", geom = "bar", fill = "grey70"), 136 | if (se) 137 | stat_summary(fun.data = "mean_cl_normal", geom = "errorbar", width = 0.4) 138 | ) 139 | } 140 | 141 | ggplot(mpg, aes(drv, cty)) + geom_mean() 142 | ggplot(mpg, aes(drv, cty)) + geom_mean(se = FALSE) 143 | ``` 144 | 145 | ### Plot components 146 | 147 | You're not just limited to adding layers in this way. 148 | You can also include any of the following object types in the list: 149 | 150 | - A data.frame, which will override the default dataset associated with the plot. 151 | (If you add a data frame by itself, you'll need to use `%+%`, but this is not necessary if the data frame is in a list.) 152 | 153 | - An `aes()` object, which will be combined with the existing default aesthetic mapping. 154 | 155 | - Scales, which override existing scales, with a warning if they've already been set by the user. 156 | 157 | - Coordinate systems and faceting specification, which override the existing settings. 158 | 159 | - Theme components, which override the specified components. 160 | 161 | ### Annotation 162 | 163 | It's often useful to add standard annotations to a plot. 164 | In this case, your function will also set the data in the layer function, rather than inheriting it from the plot. 165 | There are two other options that you should set when you do this. 166 | These ensure that the layer is self-contained: \index{Annotation!functions} 167 | 168 | - `inherit.aes = FALSE` prevents the layer from inheriting aesthetics from the parent plot. 169 | This ensures your annotation works regardless of what else is on the plot. 170 | \indexc{inherit.aes} 171 | 172 | - `show.legend = FALSE` ensures that your annotation won't appear in the legend. 
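
As a rough sketch of a self-contained annotation layer, the helper below supplies its own data and sets both options from the list above. `geom_reference_points()` is a hypothetical name used only for illustration; it is not a ggplot2 function.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): marks fixed reference
# points on any plot, whatever the plot's own data and mappings are
geom_reference_points <- function(x, y, colour = "red", ...) {
  df <- data.frame(x = x, y = y)
  geom_point(
    aes(x = x, y = y),
    data = df, colour = colour, ...,
    inherit.aes = FALSE,  # ignore the plot-level colour = drv mapping
    show.legend = FALSE   # keep the annotation out of the legend
  )
}

p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  geom_reference_points(x = c(2, 6), y = c(40, 15), size = 4)
```

Because the layer does not inherit aesthetics, it does not look for a `drv` column in its own two-row data frame, so it should work on plots with arbitrary default mappings.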
173 | \indexc{show.legend} 174 | 175 | One example of this technique is the `borders()` function built into ggplot2. 176 | It's designed to add map borders from one of the datasets in the maps package: \indexf{borders} 177 | 178 | ```{r} 179 | borders <- function(database = "world", regions = ".", fill = NA, 180 | colour = "grey50", ...) { 181 | df <- map_data(database, regions) 182 | geom_polygon( 183 | aes_(~long, ~lat, group = ~group), 184 | data = df, fill = fill, colour = colour, ..., 185 | inherit.aes = FALSE, show.legend = FALSE 186 | ) 187 | } 188 | ``` 189 | 190 | ### Additional arguments 191 | 192 | If you want to pass additional arguments to the components in your function, `...` is no good: there's no way to direct different arguments to different components. 193 | Instead, you'll need to think about how you want your function to work, balancing the benefits of having one function that does it all vs. the cost of having a complex function that's harder to understand. 194 | \indexc{...} 195 | 196 | To get you started, here's one approach using `modifyList()` and `do.call()`: \indexf{modifyList} \indexf{do.call} 197 | 198 | ```{r} 199 | #| layout-ncol: 2 200 | #| fig-width: 4 201 | geom_mean <- function(..., bar.params = list(), errorbar.params = list()) { 202 | params <- list(...) 
203 | bar.params <- modifyList(params, bar.params) 204 | errorbar.params <- modifyList(params, errorbar.params) 205 | 206 | bar <- do.call("stat_summary", modifyList( 207 | list(fun = "mean", geom = "bar", fill = "grey70"), 208 | bar.params) 209 | ) 210 | errorbar <- do.call("stat_summary", modifyList( 211 | list(fun.data = "mean_cl_normal", geom = "errorbar", width = 0.4), 212 | errorbar.params) 213 | ) 214 | 215 | list(bar, errorbar) 216 | } 217 | 218 | ggplot(mpg, aes(class, cty)) + 219 | geom_mean( 220 | colour = "steelblue", 221 | errorbar.params = list(width = 0.5, linewidth = 1) 222 | ) 223 | ggplot(mpg, aes(class, cty)) + 224 | geom_mean( 225 | bar.params = list(fill = "steelblue"), 226 | errorbar.params = list(colour = "blue") 227 | ) 228 | ``` 229 | 230 | If you need more complex behaviour, it might be easier to create a custom geom or stat. 231 | You can learn about that in the extending ggplot2 vignette included with the package. 232 | Read it by running `vignette("extending-ggplot2")`. 233 | 234 | ### Exercises 235 | 236 | 1. To make the best use of space, many examples in this book hide the axes labels and legend. 237 | We've just copied-and-pasted the same code into multiple places, but it would make more sense to create a reusable function. 238 | What would that function look like? 239 | 240 | 2. Extend the `borders()` function to also add `coord_quickmap()` to the plot. 241 | 242 | 3. Look through your own code. 243 | What combinations of geoms or scales do you use all the time? 244 | How could you extract the pattern into a reusable function? 245 | 246 | ## Plot functions {#sec-functions} 247 | 248 | Creating small reusable components is most in line with the ggplot2 spirit: you can recombine them flexibly to create whatever plot you want. 249 | But sometimes you're creating the same plot over and over again, and you don't need that flexibility.
250 | Instead of creating components, you might want to write a function that takes data and parameters and returns a complete plot. 251 | \index{Plot functions} 252 | 253 | For example, you could wrap up the complete code needed to make a piechart: 254 | 255 | ```{r} 256 | piechart <- function(data, mapping) { 257 | ggplot(data, mapping) + 258 | geom_bar(width = 1) + 259 | coord_polar(theta = "y") + 260 | xlab(NULL) + 261 | ylab(NULL) 262 | } 263 | piechart(mpg, aes(factor(1), fill = class)) 264 | ``` 265 | 266 | This is much less flexible than the component-based approach, but equally, it's much more concise. 267 | Note that we were careful to return the plot object, rather than printing it. 268 | That makes it possible to add on other ggplot2 components. 269 | 270 | You can take a similar approach to drawing parallel coordinates plots (PCPs). 271 | PCPs require a transformation of the data, so we recommend writing two functions: one that does the transformation and one that generates the plot. 272 | Keeping these two pieces separate makes life much easier if you later want to reuse the same transformation for a different visualisation. 273 | \index{Parallel coordinate plots} 274 | 275 | ```{r} 276 | #| label: pcp_data 277 | #| layout-ncol: 2 278 | #| fig-width: 4 279 | pcp_data <- function(df) { 280 | is_numeric <- vapply(df, is.numeric, logical(1)) 281 | 282 | # Rescale numeric columns 283 | rescale01 <- function(x) { 284 | rng <- range(x, na.rm = TRUE) 285 | (x - rng[1]) / (rng[2] - rng[1]) 286 | } 287 | df[is_numeric] <- lapply(df[is_numeric], rescale01) 288 | 289 | # Add row identifier 290 | df$.row <- rownames(df) 291 | 292 | # Treat numerics as value (aka measure) variables 293 | # gather_ is the standard-evaluation version of gather, and 294 | # is usually easier to program with. 295 | tidyr::gather_(df, "variable", "value", names(df)[is_numeric]) 296 | } 297 | pcp <- function(df, ...) 
{ 298 | df <- pcp_data(df) 299 | ggplot(df, aes(variable, value, group = .row)) + geom_line(...) 300 | } 301 | pcp(mpg) 302 | pcp(mpg, aes(colour = drv)) 303 | ``` 304 | 305 | ### Indirectly referring to variables 306 | 307 | The `piechart()` function above is a little unappealing because it requires the user to know the exact `aes()` specification that generates a pie chart. 308 | It would be more convenient if the user could simply specify the name of the variable to plot. 309 | To do that you'll need to learn a bit more about how `aes()` works. 310 | 311 | `aes()` uses tidy-evaluation: rather than looking at the values of its arguments, it looks at their expressions. 312 | This makes programming difficult because when you want it to refer to a variable provided in an argument it instead uses the name of the argument: 313 | 314 | ```{r} 315 | my_function <- function(x_var) { 316 | aes(x = x_var) 317 | } 318 | my_function(abc) 319 | ``` 320 | 321 | We resolve this using the standard technique for programming with tidy-evaluation: embracing. 322 | Embracing tells ggplot2 to look "inside" the argument and use its value, not its literal name: 323 | 324 | ```{r} 325 | my_function <- function(x_var) { 326 | aes(x = {{ x_var }}) 327 | } 328 | my_function(abc) 329 | ``` 330 | 331 | This makes it easy to update our piechart function: 332 | 333 | ```{r} 334 | piechart <- function(data, var) { 335 | ggplot(data, aes(factor(1), fill = {{ var }})) + 336 | geom_bar(width = 1) + 337 | coord_polar(theta = "y") + 338 | xlab(NULL) + 339 | ylab(NULL) 340 | } 341 | mpg |> piechart(class) 342 | ``` 343 | 344 | ### Exercises 345 | 346 | 1. Create a `distribution()` function specially designed for visualising continuous distributions. 347 | Allow the user to supply a dataset and the name of a variable to visualise. 348 | Let them choose between histograms, frequency polygons, and density plots. 349 | What other arguments might you want to include? 350 | 351 | 2.
What additional arguments should `pcp()` take? 352 | What are the downsides of how `...` is used in the current code? 353 | 354 | ## Functional programming 355 | 356 | Since ggplot2 objects are just regular R objects, you can put them in a list. 357 | This means you can apply all of R's great functional programming tools. 358 | For example, if you wanted to add different geoms to the same base plot, you could put them in a list and use `lapply()`. 359 | \index{Functional programming} \indexf{lapply} 360 | 361 | ```{r} 362 | #| layout-ncol: 3 363 | #| fig-width: 3 364 | geoms <- list( 365 | geom_point(), 366 | geom_boxplot(aes(group = cut_width(displ, 1))), 367 | list(geom_point(), geom_smooth()) 368 | ) 369 | 370 | p <- ggplot(mpg, aes(displ, hwy)) 371 | lapply(geoms, function(g) p + g) 372 | ``` 373 | 374 | If you're not familiar with functional programming, read through and think about how you might apply the techniques to your duplicated plotting code. 375 | 376 | ### Exercises 377 | 378 | 1. How could you add a `geom_point()` layer to each element of the following list? 379 | 380 | ```{r} 381 | #| eval: false 382 | plots <- list( 383 | ggplot(mpg, aes(displ, hwy)), 384 | ggplot(diamonds, aes(carat, price)), 385 | ggplot(faithfuld, aes(waiting, eruptions, size = density)) 386 | ) 387 | ``` 388 | 389 | 2. What does the following function do? 390 | What's a better name for it? 391 | 392 | ```{r} 393 | #| eval: false 394 | mystery <- function(...) 
{ 395 | Reduce(`+`, list(...), accumulate = TRUE) 396 | } 397 | 398 | mystery( 399 | ggplot(mpg, aes(displ, hwy)) + geom_point(), 400 | geom_smooth(), 401 | xlab(NULL), 402 | ylab(NULL) 403 | ) 404 | ``` 405 | -------------------------------------------------------------------------------- /references.qmd: -------------------------------------------------------------------------------- 1 | `r if (knitr::is_html_output()) ' # References {-} '` 2 | -------------------------------------------------------------------------------- /scales-other.qmd: -------------------------------------------------------------------------------- 1 | # Other aesthetics {#sec-scale-other} 2 | 3 | ```{r} 4 | #| echo: false 5 | #| message: false 6 | #| results: asis 7 | source("common.R") 8 | status("polishing") 9 | ``` 10 | 11 | In addition to position and colour, there are several other aesthetics that ggplot2 can use to represent data. In this chapter we'll look at size scales (@sec-scale-size), shape scales (@sec-scale-shape), line width scales (@sec-scale-linewidth), and line type scales (@sec-scale-linetype), which use visual features other than location and colour to represent data values. Additionally, we'll talk about manual scales (@sec-scale-manual) and identity scales (@sec-scale-identity): these don't necessarily use different visual features, but they construct data mappings in an unusual way. 
12 | 13 | ## Size {#sec-scale-size} 14 | 15 | \index{Size} 16 | 17 | ```{r} 18 | #| echo: false 19 | planets <- data.frame( 20 | name = c( 21 | "Mercury", 22 | "Venus", 23 | "Earth", 24 | "Mars", 25 | "Jupiter", 26 | "Saturn", 27 | "Uranus", 28 | "Neptune" 29 | ), 30 | type = c(rep("Inner", 4), rep("Outer", 4)), 31 | position = 1:8, 32 | radius = c(2440, 6052, 6378, 3390, 71400, 60330, 25559, 24764), 33 | orbit = c( 34 | 57900000, 35 | 108200000, 36 | 149600000, 37 | 227900000, 38 | 778300000, 39 | 1427000000, 40 | 2871000000, 41 | 4497100000 42 | ) 43 | # mass = c(3.3022e+23, 4.8685e+24, 5.9736e+24, 6.4185e+23, 1.8986e+27, 5.6846e+26, 8.681e+25, 1.0243e+26) 44 | ) 45 | planets$name <- with(planets, factor(name, name)) 46 | ``` 47 | 48 | The size aesthetic is typically used to scale points and text. The default scale for size aesthetics is `scale_size()` in which a linear increase in the variable is mapped onto a linear increase in the area (not the radius) of the geom. Scaling as a function of area is a sensible default as human perception of size is more closely mimicked by area scaling than by radius scaling. By default the smallest value in the data (more precisely in the scale limits) is mapped to a size of 1 and the largest is mapped to a size of 6. The `range` argument allows you to scale the size of the geoms: 49 | 50 | ```{r} 51 | #| layout-ncol: 2 52 | #| fig-width: 4 53 | base <- ggplot(mpg, aes(displ, hwy, size = cyl)) + 54 | geom_point() 55 | 56 | base 57 | base + scale_size(range = c(1, 2)) 58 | ``` 59 | 60 | There are several size scales worth noting briefly: 61 | 62 | - `scale_size_area()` and `scale_size_binned_area()` are versions of `scale_size()` and `scale_size_binned()` that ensure that a value of 0 maps to an area of 0. 63 | 64 | - `scale_radius()` maps the data value to the radius rather than to the area (@sec-radius-scaling). 
65 | 66 | - `scale_size_binned()` is a size scale that behaves like `scale_size()` but maps continuous values onto discrete size categories, analogous to the binned position and colour scales discussed in @sec-binned-position and @sec-binned-colour respectively. Legends associated with this scale are discussed in @sec-guide-bins. 67 | 68 | - `scale_size_date()` and `scale_size_datetime()` are designed to handle date data, analogous to the date scales discussed in @sec-date-scales. 69 | 70 | ### Radius size scales {#sec-radius-scaling} 71 | 72 | There are situations where area scaling is undesirable, and for such situations the `scale_radius()` function is provided. To illustrate when `scale_radius()` is appropriate consider a data set containing astronomical data that includes the radius of different planets: 73 | 74 | ```{r} 75 | planets 76 | ``` 77 | 78 | In this instance a plot that uses the size aesthetic to represent the radius of the planets should use `scale_radius()` rather than the default `scale_size()`. It is also important in this case to set the scale limits so that a planet with radius 0 would be drawn with a disc with radius 0. 79 | 80 | ```{r} 81 | #| layout-ncol: 2 82 | #| fig-width: 4 83 | base <- ggplot(planets, aes(1, name, size = radius)) + 84 | geom_point() + 85 | scale_x_continuous(breaks = NULL) + 86 | labs(x = NULL, y = NULL, size = NULL) 87 | 88 | base + ggtitle("not to scale") 89 | base + 90 | scale_radius(limits = c(0, NA), range = c(0, 10)) + 91 | ggtitle("to scale") 92 | ``` 93 | 94 | On the left it is difficult to distinguish Jupiter from Saturn, despite the fact that the difference between the two should be double the size of Earth; compare this to the plot on the right where the radius of Jupiter is visibly larger. 95 | 96 | ### Binned size scales {#sec-guide-bins} 97 | 98 | Binned size scales work similarly to binned scales for colour and position aesthetics (@sec-binned-colour and @sec-binned-position). 
One difference is how legends are displayed. The default legend for a binned size scale, and all binned scales except position and colour aesthetics, is governed by `guide_bins()`. For instance, in the `mpg` data we could use `scale_size_binned()` to create a binned version of the continuous variable `hwy`: 99 | 100 | ```{r} 101 | base <- ggplot(mpg, aes(displ, manufacturer, size = hwy)) + 102 | geom_point(alpha = .2) + 103 | scale_size_binned() 104 | 105 | base 106 | ``` 107 | 108 | Unlike `guide_legend()`, the guide created for a binned scale by `guide_bins()` does not organise the individual keys into a table. Instead they are arranged in a column (or row) along a single vertical (or horizontal) axis, which by default is displayed with its own axis. The important arguments to `guide_bins()` are listed below: 109 | 110 | - `axis` indicates whether the axis should be drawn (default is `TRUE`) 111 | 112 | ```{r} 113 | base + guides(size = guide_bins(axis = FALSE)) 114 | ``` 115 | 116 | - `direction` is a character string specifying the direction of the guide, either `"vertical"` (the default) or `"horizontal"` 117 | 118 | ```{r} 119 | base + guides(size = guide_bins(direction = "horizontal")) 120 | ``` 121 | 122 | - `show.limits` specifies whether tick marks are shown at the ends of the guide axis (default is FALSE) 123 | 124 | ```{r} 125 | base + guides(size = guide_bins(show.limits = TRUE)) 126 | ``` 127 | 128 | - `axis.colour`, `axis.linewidth` and `axis.arrow` are used to control the guide axis that is displayed alongside the legend keys 129 | 130 | ```{r} 131 | base + 132 | guides( 133 | size = guide_bins( 134 | axis.colour = "red", 135 | axis.arrow = arrow( 136 | length = unit(.1, "inches"), 137 | ends = "first", 138 | type = "closed" 139 | ) 140 | ) 141 | ) 142 | ``` 143 | 144 | - `keywidth`, `keyheight`, `reverse` and `override.aes` have the same behaviour for `guide_bins()` as they do for `guide_legend()` (see @sec-guide-legend) 145 | 146 | ## Shape 
{#sec-scale-shape} 147 | 148 | Values can be mapped to the shape aesthetic. The typical use for this is when you have a small number of discrete categories: if the data variable contains more than 6 values it becomes difficult to distinguish between shapes, and ggplot2 will produce a warning. The default `scale_shape()` function contains a single argument: set `solid = TRUE` (the default) to use a "palette" consisting of three solid shapes and three hollow shapes, or set `solid = FALSE` to use six hollow shapes: 149 | 150 | ```{r} 151 | #| layout-ncol: 2 152 | #| fig-width: 4 153 | base <- ggplot(mpg, aes(displ, hwy, shape = factor(cyl))) + 154 | geom_point() 155 | 156 | base 157 | base + scale_shape(solid = FALSE) 158 | ``` 159 | 160 | Although any one plot is unlikely to be readable with more than 6 distinct markers, there are 25 possible shapes to choose from, each associated with an integer value: 161 | 162 | ```{r} 163 | #| echo: false 164 | df <- data.frame( 165 | shape = 1:25, 166 | x = (0:24) %% 13, 167 | y = 2 - floor((0:24) / 13) 168 | ) 169 | ggplot(df, aes(x, y, shape = shape)) + 170 | geom_point(size = 4) + 171 | geom_text(aes(label = shape), nudge_y = .3) + 172 | theme_void() + 173 | scale_shape_identity() + 174 | ylim(.8, 2.5) 175 | ``` 176 | 177 | You can specify the marker types for each data value manually using `scale_shape_manual()`: 178 | 179 | ```{r} 180 | base + 181 | scale_shape_manual( 182 | values = c("4" = 16, "5" = 17, "6" = 1, "8" = 2) 183 | ) 184 | ``` 185 | 186 | For more information about manual scales see @sec-scale-manual. 187 | 188 | ## Line width {#sec-scale-linewidth} 189 | 190 | The linewidth aesthetic, introduced in ggplot2 3.4.0, is used to control the width of lines. In earlier versions of ggplot2 the size aesthetic was used for this purpose, which caused some difficulty for complex geoms such as `geom_pointrange()` that contain both points and lines. 
For these geoms it's often important to be able to separately control the size of the points and the width of the lines. This is illustrated in the plots below. In the leftmost plot both the size and linewidth aesthetics are set at their default values. The middle plot increases the size of the points while leaving the linewidth unchanged, whereas the plot on the right increases the linewidth while leaving the point size unchanged. 191 | 192 | ```{r} 193 | #| layout-ncol: 3 194 | #| fig-width: 3 195 | base <- ggplot(airquality, aes(x = factor(Month), y = Temp)) 196 | 197 | base + geom_pointrange(stat = "summary", fun.data = "median_hilow") 198 | base + 199 | geom_pointrange( 200 | stat = "summary", 201 | fun.data = "median_hilow", 202 | size = 2 203 | ) 204 | base + 205 | geom_pointrange( 206 | stat = "summary", 207 | fun.data = "median_hilow", 208 | linewidth = 2 209 | ) 210 | ``` 211 | 212 | In practice you're most likely to set linewidth as a fixed parameter, as shown in the previous example, but it is a true aesthetic and can be mapped onto data values: 213 | 214 | ```{r} 215 | ggplot(airquality, aes(Day, Temp, group = Month)) + 216 | geom_line(aes(linewidth = Month)) + 217 | scale_linewidth(range = c(0.5, 3)) 218 | ``` 219 | 220 | Linewidth scales behave like size scales in most ways, but there are differences. As discussed earlier the default behaviour of a size scale is to increase linearly with the area of the plot marker (e.g., the diameter of a circular plot marker increases with the square root of the data value). In contrast, the linewidth increases linearly with the data value. 221 | 222 | Binned linewidth scales can be added using `scale_linewidth_binned()`. 223 | 224 | ## Line type {#sec-scale-linetype} 225 | 226 | It is possible to map a variable onto the linetype aesthetic in ggplot2. This works best for discrete variables with a small number of categories, and `scale_linetype()` is an alias for `scale_linetype_discrete()`. 
Continuous variables cannot be mapped to line types unless `scale_linetype_binned()` is used: although there is a `scale_linetype_continuous()` function, all it does is produce an error. To see why the linetype aesthetic is suited only to cases with a few categories, consider this plot: 227 | 228 | ```{r} 229 | ggplot(economics_long, aes(date, value01, linetype = variable)) + 230 | geom_line() 231 | ``` 232 | 233 | With five categories the plot is quite difficult to read, and it is unlikely you will want to use the linetype aesthetic for more than that. The default "palette" for linetype is supplied by the `scales::linetype_pal()` function, and includes the 13 linetypes shown below: 234 | 235 | ```{r} 236 | df <- data.frame(value = letters[1:13]) 237 | base <- ggplot(df, aes(linetype = value)) + 238 | geom_segment( 239 | mapping = aes(x = 0, xend = 1, y = value, yend = value), 240 | show.legend = FALSE 241 | ) + 242 | theme(panel.grid = element_blank()) + 243 | scale_x_continuous(NULL, NULL) 244 | 245 | base 246 | ``` 247 | 248 | You can control the line type by specifying a string with up to 8 hexadecimal values (i.e., from 0 to F). 249 | In this specification, the first value is the length of the first line segment, the second value is the length of the first space between segments, and so on. 250 | This allows you to specify your own line types using `scale_linetype_manual()`, or alternatively, by passing a custom function to the `palette` argument: 251 | 252 | ```{r} 253 | #| eval: false 254 | linetypes <- function(n) { 255 | types <- c( 256 | "55", 257 | "75", 258 | "95", 259 | "1115", 260 | "111115", 261 | "11111115", 262 | "5158", 263 | "9198", 264 | "c1c8" 265 | ) 266 | return(types[seq_len(n)]) 267 | } 268 | 269 | base + discrete_scale("linetype", palette = linetypes) 270 | ``` 271 | 272 | Note that the last four lines are blank, because the `linetypes()` function defined above returns `NA` when the number of categories exceeds 9. 
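The `NA` values come from ordinary R subsetting rather than from anything specific to ggplot2: indexing a vector past its end pads the result with `NA`. A minimal sketch of that behaviour, using a shortened, purely illustrative vector `types` in place of the nine-element one inside `linetypes()`:

```{r}
# A shortened stand-in for the vector inside linetypes(); illustrative only
types <- c("55", "75", "95")

# Asking for five elements from a length-three vector pads the result with NA
padded <- types[seq_len(5)]
padded

# The positions beyond the end of the vector are the missing ones
is.na(padded)
```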
273 | The `discrete_scale()` function contains a `na.value` argument used to specify what kind of line is plotted for these values. 274 | By default this produces a blank line, but you can override this by setting `na.value = "dotted"`: 275 | 276 | ```{r} 277 | #| eval: false 278 | base + discrete_scale("linetype", palette = linetypes, na.value = "dotted") 279 | ``` 280 | 281 | Valid line types can be set using a human-readable character string: `"blank"`, `"solid"`, `"dashed"`, `"dotted"`, `"dotdash"`, `"longdash"`, and `"twodash"` are all understood. 282 | 283 | ## Manual scales {#sec-scale-manual} 284 | 285 | Manual scales are just a list of valid values that are mapped to the unique discrete values. If you want to customise these scales, you need to create your own new scale with the "manual" version of each: `scale_linetype_manual()`, `scale_shape_manual()`, `scale_colour_manual()`, etc. The manual scale has one important argument, `values`, where you specify the values that the scale should produce. If this vector is named, it will match the values of the output to the values of the input; otherwise it will match in order of the levels of the discrete variable. You will need some knowledge of the valid aesthetic values, which are described in `vignette("ggplot2-specs")`. \index{Shape} \index{Line type} \indexf{scale\_shape\_manual} \indexf{scale\_colour\_manual} \indexf{scale\_linetype\_manual} 286 | 287 | Manual scales have appeared earlier, in @sec-manual-colour and @sec-scale-shape. In this example we'll show a creative use of `scale_colour_manual()` to display multiple variables on the same plot and show a useful legend. 
In most plotting systems, you'd colour the lines and then add a legend: \index{Data!longitudinal} 288 | 289 | ```{r} 290 | #| label: huron 291 | huron <- data.frame(year = 1875:1972, level = as.numeric(LakeHuron)) 292 | ggplot(huron, aes(year)) + 293 | geom_line(aes(y = level + 5), colour = "red") + 294 | geom_line(aes(y = level - 5), colour = "blue") 295 | ``` 296 | 297 | That doesn't work in ggplot because there's no way to add a legend manually. Instead, give the lines informative labels: 298 | 299 | ```{r} 300 | #| label: huron2 301 | ggplot(huron, aes(year)) + 302 | geom_line(aes(y = level + 5, colour = "above")) + 303 | geom_line(aes(y = level - 5, colour = "below")) 304 | ``` 305 | 306 | And then tell the scale how to map labels to colours: 307 | 308 | ```{r} 309 | #| label: huron3 310 | ggplot(huron, aes(year)) + 311 | geom_line(aes(y = level + 5, colour = "above")) + 312 | geom_line(aes(y = level - 5, colour = "below")) + 313 | scale_colour_manual( 314 | "Direction", 315 | values = c("above" = "red", "below" = "blue") 316 | ) 317 | ``` 318 | 319 | ## Identity scales {#sec-scale-identity} 320 | 321 | Identity scales --- such as `scale_colour_identity()` and `scale_shape_identity()` --- are used when your data is already scaled such that the data and aesthetic spaces are the same. The code below shows an example where the identity scale is useful. `luv_colours` contains the locations of all R's built-in colours in the LUV colour space (the space that HCL is based on). A legend is unnecessary, because the point colour represents itself: the data and aesthetic spaces are the same. 
\index{Scales!identity} \indexf{scale\_identity} 322 | 323 | ```{r} 324 | #| label: scale-identity 325 | head(luv_colours) 326 | 327 | ggplot(luv_colours, aes(u, v)) + 328 | geom_point(aes(colour = col), size = 3) + 329 | scale_color_identity() + 330 | coord_equal() 331 | ``` 332 | -------------------------------------------------------------------------------- /scales.qmd: -------------------------------------------------------------------------------- 1 | # Scales {#sec-scales .unnumbered} 2 | 3 | ```{r} 4 | #| echo: false 5 | #| message: false 6 | #| results: asis 7 | source("common.R") 8 | ``` 9 | 10 | \index{Scales} 11 | 12 | Scales in ggplot2 control the mapping from data to aesthetics. 13 | They take your data and turn it into something that you can see, like size, colour, position or shape. 14 | They also provide the tools that let you interpret the plot: the axes and legends. 15 | You can generate plots with ggplot2 without knowing how scales work, but understanding scales and learning how to manipulate them will give you much more control. 16 | 17 | In ggplot2, guides are produced automatically based on the layers in your plot. 18 | You don't directly control the legends and axes; instead, you set up the data so that there's a clear mapping between data and aesthetics, and a guide is generated for you. 19 | This is very different to base R graphics, where you have total control over the legend, and can be frustrating when you first start using ggplot2. 20 | However, once you get the hang of it, you'll find that it saves you time, and there is little you cannot do. 21 | 22 | The scales toolbox divides scales into three main groups, covered in separate chapters: 23 | 24 | - Position scales and axes are described in @sec-scale-position. 25 | - Colour scales and legends are described in @sec-scale-colour. 26 | - Scales for other aesthetics are described in @sec-scale-other. 
27 | 28 | The theory of scales is covered in @sec-scales-guides, which expands on these chapters as well as other sections in the book that refer to scales (e.g., @sec-titles is extended by @sec-scale-names). 29 | -------------------------------------------------------------------------------- /springer/contract-1-amendment.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/springer/contract-1-amendment.pdf -------------------------------------------------------------------------------- /springer/contract-1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/springer/contract-1.pdf -------------------------------------------------------------------------------- /springer/contract-2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/springer/contract-2.pdf -------------------------------------------------------------------------------- /springer/contract-3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/springer/contract-3.pdf -------------------------------------------------------------------------------- /springer/marketing-ggplot2.txt: -------------------------------------------------------------------------------- 1 | > Springer will include the preface, table of contents, and one sample 2 | > chapter on our web site. Which chapter should we use? 3 | 4 | Chapter 2: Getting started with qplot 5 | 6 | 7 | > The back-cover copy consists of 2-4 paragraphs and is written for a 8 | > technical audience so you can assume some knowledge of the area. 
The 9 | > first paragraph or two states what is important about the book, who 10 | > would find it useful, and the prerequisites. 11 | 12 | ggplot2 is a new data visualisation package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics. Practically, \ggplot provides beautiful, hassle-free plots that take care of fiddly details like drawing legends. A carefully chosen set of defaults means that most of the time you can toss off a publication-quality graphic in seconds, but if you do have special formatting requirements, a comprehensive theming system makes it easy to do what you want. Instead of spending time making your graph look pretty, you can focus on creating a graph that best reveals the messages in your data. 13 | 14 | This book will be useful to everyone who has struggled with displaying their data in an informative and attractive way. You will need some basic knowledge of R (i.e. you should be able to get your data into R), but ggplot2 is a mini-language specifically tailored for producing graphics, and you'll learn everything you need in the book. After reading this book you'll be able to produce graphics customised precisely for your problems, and you'll find it easy to get graphics out of your head and on to the screen or page. 15 | 16 | Hadley Wickham is an Assistant Professor of Statistics at Rice University, and is interested in developing tools (both computational and cognitive) for making data preparation, visualisation and analysis easier. He has developed 15 R packages and in 2006 he won the John Chambers Award for Statistical Computing for his work on the ggplot and reshape R packages. 17 | 18 | 19 | > We also need a briefer and non-technical description for our bookstore 20 | > sales reps and bookstore/library buyers with minimal background. How 21 | > would you explain to these buyers what your book is about and why it is 22 | > important? 
23 | 24 | This book describes ggplot2, a plotting system for R. R is an open-source statistical computing environment that is very popular in the academic statistics community, and is increasingly used in other areas of academia and industry. ggplot2 is an add-on package for data visualisation. It provides you with the tools to make pretty much any type of data visualisation that you can imagine, using a consistent syntax that is easy to learn. 25 | 26 | > Please supply three unique selling points that we can present in 27 | > bulleted form on our web page. 28 | 29 | * Provides both rich theory and powerful applications. 30 | 31 | * All figures are accompanied by the code required to produce them. This makes it easier to see an interesting plot and modify it for your own data. 32 | 33 | * Full-colour figures are visually interesting, and use one of the most powerful aspects of human vision: colour perception. 34 | 35 | 36 | 37 | > Please supply a maximum of five key words/phrases for your book. We will 38 | > make them available to search engines such as Springer.com and 39 | > Amazon.com. 40 | 41 | * Grammar of Graphics 42 | * Data graphics 43 | * Statistical graphics 44 | * Visualisation 45 | * R 46 | 47 | 48 | 49 | > What is the main competing book and what advantages does your book 50 | > offer? 51 | 52 | * Lattice, by Deepayan Sarkar. This is competing in the sense that the underlying graphics systems (lattice and ggplot2) are competing. I would expect many people interested in R graphics to buy both books. I would claim ggplot2 is more useful in the long run because you are less limited in the types of plots you can produce. 53 | 54 | It is complementary to: 55 | 56 | * R Graphics, by Paul Murrell, which describes the lower-level details of the drawing system used by ggplot2 57 | 58 | * The Grammar of Graphics, by Lee Wilkinson, which goes into more detail about the theory behind ggplot2. 
59 | 60 | 61 | 62 | > Your book will be advertised to the main statistical societies. Please 63 | > list any other societies that are an important market for your book. 64 | 65 | IEEE - esp. infovis community 66 | 67 | > What supplementary material will be available? If you have exercise 68 | > sets, will you be able to supply answers/solutions/hints for instructors 69 | > that require the book for a course? 70 | 71 | ggplot2 website provides: 72 | 73 | * over 400 examples of ggplot2 plots, with code required to produce them 74 | * all code used in book 75 | 76 | 77 | 78 | > We want to promote your book in various ways, and welcome any 79 | > suggestions from you. You can mention your new book on your homepage. 80 | > Are there any e-mail lists that you can used to announce your book? We 81 | > can supply a .jpg file of your book cover before your book is published. 82 | > You can add information about the book, like the full contents, data 83 | > sets, errata lists, color figures, computer code, and related research 84 | > work. You can link your page to the Springer online catalog page 85 | > featuring your book. Send us your URL so we can link to it. You can also 86 | > send information to relevant newsgroups. 87 | 88 | I'll promote the book on the ggplot2 mailing list, which currently has ~160 members. My homepage is http://had.co.nz/ 89 | 90 | YOUR CHOICE FOR BOOKSTORE LOCATION IS: 91 | Statistics 92 | 93 | MARKETING CATEGORIES 94 | 95 | Statistics and Computing/Statistical Programs 96 | 97 | Mathematics: Visualization 98 | Computer Science: Computer Graphics 99 | -------------------------------------------------------------------------------- /springer/proposal-3e.md: -------------------------------------------------------------------------------- 1 | # Revised TOC 2 | 3 | Getting Started 4 | 1. Introduction 5 | 2. Getting Started with ggplot2 6 | 7 | Toolbox 8 | 3. Individual geoms 9 | 4. Collective geoms 10 | 5. Statistical summaries 11 | 6. 
Space and time 12 | 7. Annotations 13 | 8. Arranging multiple plots 14 | 9. Frequently asked questions 15 | 16 | The Grammar 17 | 10. Mastering the grammar 18 | 11. Build a plot layer-by-layer 19 | 12. Scales, Axes, and Legends 20 | 13. Positioning 21 | 14. Themes 22 | 23 | Extending ggplot2 24 | 15. Programming with ggplot2 25 | 16. Introduction to ggproto 26 | 17. Extending ggplot2 27 | 28 | Appendices 29 | A. Aesthetic specification 30 | 31 | # Summary of major changes 32 | 33 | * Expand toolbox chapter into multiple chapters to give greater space 34 | to go into useful details. New FAQ chapter to cover the most common 35 | problems people encounter in practice. 36 | 37 | * Remove the "data analysis", "data transformation", and "modelling 38 | for visualisation" chapters (since they are now covered better 39 | elsewhere) 40 | 41 | * Add a new "Extending ggplot2" part to discuss how to extend ggplot2 42 | in packages and elsewhere. 43 | 44 | * The entire book will be updated to use the latest features and idioms 45 | of ggplot2. -------------------------------------------------------------------------------- /start.qmd: -------------------------------------------------------------------------------- 1 | # Getting Started {#sec-start} 2 | 3 | ```{r} 4 | #| echo: false 5 | #| message: false 6 | #| results: asis 7 | source("common.R") 8 | status("drafting") 9 | ``` 10 | 11 | To be consistent with the other parts of the book there should be some opening text here. 
12 | -------------------------------------------------------------------------------- /style.css: -------------------------------------------------------------------------------- 1 | .book .book-header h1 { 2 | opacity: 1; 3 | text-align: left; 4 | } 5 | 6 | #header .title { 7 | margin-bottom: 0em; 8 | } 9 | #header h4.author { 10 | margin: 0; 11 | color: #666; 12 | } 13 | #header h4.author em { 14 | font-style: normal; 15 | } 16 | 17 | /* Sidebar formating --------------------------------------------*/ 18 | 19 | div.sidebar, div.base { 20 | border: 1px solid #ccc; 21 | border-left-width: 5px; 22 | border-radius: 5px; 23 | padding: 1em; 24 | margin: 1em 0; 25 | } 26 | 27 | /* .book .book-body .page-wrapper .page-inner section.normal is needed 28 | to override the styles produced by gitbook, which are ridiculously 29 | overspecified. Goal of the selectors is to ensure internal "margins" 30 | controlled only by padding of container */ 31 | 32 | .book .book-body .page-wrapper .page-inner section.normal div.sidebar > :first-child, 33 | .book .book-body .page-wrapper .page-inner section.normal div.base > :first-child { 34 | margin-top: 0; 35 | } 36 | 37 | .book .book-body .page-wrapper .page-inner section.normal div.sidebar > :last-child, 38 | .book .book-body .page-wrapper .page-inner section.normal div.base > :last-child { 39 | margin-bottom: 0; 40 | } 41 | 42 | div.base::before { 43 | display: block; 44 | content: "In base R"; 45 | } 46 | 47 | div.base::before, 48 | .book .book-body .page-wrapper .page-inner section.normal .sidebar h3 { 49 | font-size: 1.1em; 50 | font-weight: 700; 51 | margin-bottom: 0.25em; 52 | color: #333; 53 | } 54 | 55 | .todo { 56 | display: block; 57 | border: 1px solid red; 58 | border-left-width: 5px; 59 | border-radius: 5px; 60 | padding: 0.5em 1em; 61 | margin: 1em 0; 62 | } 63 | 64 | .todo::before { 65 | content: "TO DO: "; 66 | font-weight: bold; 67 | color: red; 68 | } 69 | 70 | /* Other gitbook tweaks 
-------------------------------------------------- */ 71 | 72 | .book .book-body .page-wrapper .page-inner section.normal code { 73 | padding: 2px 0; 74 | } 75 | -------------------------------------------------------------------------------- /todo.numbers: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/ggplot2-book/34b224a18d65048864ebfd1d8761cead473ab674/todo.numbers -------------------------------------------------------------------------------- /toolbox.qmd: -------------------------------------------------------------------------------- 1 | # Layers {#sec-toolbox .unnumbered} 2 | 3 | ```{r} 4 | #| echo: false 5 | #| message: false 6 | #| results: asis 7 | source("common.R") 8 | status("drafting") 9 | ``` 10 | 11 | The layered structure of ggplot2 encourages you to design and construct graphics in a structured manner. You've learned the basics in the previous chapter, and in this chapter you'll get a more comprehensive task-based introduction. 12 | The goal here is not to exhaustively explore every option of every geom, but instead to show the most important tools for a given task. 13 | For more information about individual geoms, along with many more examples illustrating their use, see the documentation. 14 | 15 | It is useful to think about the purpose of each layer before it is added. 16 | In general, there are three purposes for a layer: \index{Layers!strategy} 17 | 18 | - To display the **data**. 19 | We plot the raw data for many reasons, relying on our skills at pattern detection to spot gross structure, local structure, and outliers. 20 | This layer appears on virtually every graphic. 21 | In the earliest stages of data exploration, it is often the only layer. 22 | 23 | - To display a statistical **summary** of the data. 24 | As we develop and explore models of the data, it is useful to display model predictions in the context of the data. 
25 | Showing the data helps us improve the model, and showing the model helps reveal subtleties of the data that we might otherwise miss. 26 | Summaries are usually drawn on top of the data. 27 | 28 | - To add additional **metadata**: context, annotations, and references. 29 | A metadata layer displays background context, annotations that help to give meaning to the raw data, or fixed references that aid comparisons across panels. 30 | Metadata can be useful in the background and foreground. 31 | 32 | A map is often used as a background layer with spatial data. 33 | Background metadata should be rendered so that it doesn't interfere with your perception of the data, so it is usually displayed underneath the data and formatted so that it is minimally perceptible. 34 | That is, if you concentrate on it, you can see it with ease, but it doesn't jump out at you when you are casually browsing the plot. 35 | 36 | Other metadata is used to highlight important features of the data. 37 | If you have added explanatory labels to a couple of inflection points or outliers, then you want to render them so that they pop out at the viewer. 38 | In that case, you want this to be the very last layer drawn. 39 | 40 | This chapter is broken up into the following sections, each of which deals with a particular graphical challenge. 41 | This is not an exhaustive or exclusive categorisation, and there are many other possible ways to break up graphics into different categories. 42 | Each geom can be used for many different purposes, especially if you are creative. 43 | However, this breakdown should cover many common tasks and help you learn about some of the possibilities. 44 | 45 | - Basic plot types that produce common, 'named' graphics like scatterplots and line charts, @sec-basics. 46 | 47 | - Displaying text, @sec-text-labels. 48 | 49 | - Adding arbitrary additional annotations, @sec-annotations. 50 | 51 | - Surface plots to display 3d surfaces in 2d, @sec-surface. 
52 | 53 | - Drawing maps, @sec-maps. 54 | 55 | - Revealing uncertainty and error, with various 1d and 2d intervals, @sec-uncertainty. 56 | 57 | - Weighted data, @sec-weighting. 58 | 59 | - In @sec-diamonds, you'll learn about the diamonds dataset. 60 | 61 | The final three sections use this data to discuss techniques for visualising larger datasets: 62 | 63 | - Displaying distributions, continuous and discrete, 1d and 2d, joint and conditional, @sec-distributions. 64 | 65 | - Dealing with overplotting in scatterplots, a challenge with large datasets,\ 66 | 67 | 1. 68 | 69 | - Displaying statistical summaries instead of the raw data, @sec-summary. 70 | --------------------------------------------------------------------------------