├── .gitignore ├── README.md ├── Texas_income.Rmd ├── Texas_income.html ├── bundestag_pie.Rmd ├── bundestag_pie.html ├── corruption_human_development.Rmd ├── corruption_human_development.html ├── goode.Rmd ├── goode.html ├── health_status.Rmd ├── health_status.html └── practical_ggplot2.Rproj /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Practical ggplot2 2 | 3 | Claus O. Wilke 4 | 5 | The R package ggplot2 provides a powerful and flexible approach to data visualization, and it is suitable both for rapid exploration of different visualization approaches and for producing carefully crafted publication-quality figures. However, getting ggplot2 to make figures that look exactly the way you want them to can sometimes be challenging, and beginners and experts alike can get confused by themes, scales, coords, guides, or facets. This repository houses a set of step-by-step examples demonstrating how to get the most out of ggplot2, including how to choose and customize scales, how to theme plots, and when and how to use extension packages. 6 | 7 | The examples shown are based on the book ["Fundamentals of Data Visualization."](https://serialmentor.com/dataviz) However, there are minor differences between the figures here and the ones in the book. Most importantly, the book uses the Myriad Pro font family, which is not freely available. I have also cleaned up the ggplot2 code where appropriate, and I have made adjustments to font and figure sizes so the figures look appropriate in the default R Markdown html style. 
8 | 9 | List of examples provided: 10 | 11 | - [Bundestag pie chart](https://htmlpreview.github.io/?https://github.com/clauswilke/practical_ggplot2/blob/master/bundestag_pie.html) 12 | - [Corruption and human development](https://htmlpreview.github.io/?https://github.com/clauswilke/practical_ggplot2/blob/master/corruption_human_development.html) 13 | - [Health status by age](https://htmlpreview.github.io/?https://github.com/clauswilke/practical_ggplot2/blob/master/health_status.html) 14 | - [Interrupted Goode homolosine](https://htmlpreview.github.io/?https://github.com/clauswilke/practical_ggplot2/blob/master/goode.html) 15 | - [Median Texas income by county](https://htmlpreview.github.io/?https://github.com/clauswilke/practical_ggplot2/blob/master/Texas_income.html) 16 | -------------------------------------------------------------------------------- /Texas_income.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Median Texas income by county" 3 | output: 4 | html_document: 5 | df_print: paged 6 | --- 7 | 8 | ```{r setup, echo = FALSE} 9 | knitr::opts_chunk$set( 10 | fig.width = 6, fig.height = .75*6 11 | ) 12 | ``` 13 | 14 | The dviz.supp package contains the dataset we are working with. 15 | ```{r} 16 | # devtools::install_github("clauswilke/dviz.supp") 17 | ``` 18 | 19 | The data, in table form and simple longitude--latitude plot. 20 | 21 | ```{r message = FALSE} 22 | library(tidyverse) 23 | library(sf) 24 | 25 | # load data 26 | data(texas_income, package = "dviz.supp") 27 | 28 | texas_income 29 | 30 | ggplot(texas_income, aes(fill = estimate)) + 31 | geom_sf() 32 | ``` 33 | 34 | 35 | Transform to more appropriate coordinate system, [EPSG:3083,](https://epsg.io/3083) a Texas-centric Albers equal area projection. 
36 | 37 | ```{r} 38 | # EPSG:3083 Texas-centric Albers equal area 39 | # https://epsg.io/3083 40 | texas_crs <- "+proj=aea +lat_1=27.5 +lat_2=35 +lat_0=18 +lon_0=-100 +x_0=1500000 +y_0=6000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs" 41 | 42 | texas_transf <- st_transform(texas_income, crs = texas_crs) 43 | 44 | ggplot(texas_transf, aes(fill = estimate)) + 45 | geom_sf() 46 | ``` 47 | 48 | Use different color scale. 49 | 50 | ```{r} 51 | library(colorspace) # for scale_fill_continuous_sequential 52 | 53 | ggplot(texas_transf, aes(fill = estimate)) + 54 | geom_sf(color = "white") + 55 | scale_fill_continuous_sequential( 56 | palette = "Blues", rev = TRUE, 57 | na.value = "grey60" 58 | ) 59 | ``` 60 | 61 | We want the legend to be oriented horizontally and placed to the lower left of Texas. Therefore, we need to change the plot limits to create space for the legend. We choose limits of -110, -93.5 degrees longitude at 30 degrees latitude, transformed to X, Y with EPSG:3083. We could do the transformation with `st_transform()`, as above, or we can do it manually at https://epsg.io/transform. 62 | 63 | ```{r} 64 | # https://epsg.io/transform#s_srs=4326&t_srs=3083&x=-110.0000000&y=30.0000000 65 | # https://epsg.io/transform#s_srs=4326&t_srs=3083&x=-93.5000000&y=30.0000000 66 | # (-110, 30) is (538250.08, 7363459.44) 67 | # (-93.5, 30) is (2125629.02, 7338358.43) 68 | 69 | texas_xlim <- c(538250, 2125629) 70 | 71 | ggplot(texas_transf, aes(fill = estimate)) + 72 | geom_sf(color = "white") + 73 | coord_sf(xlim = texas_xlim) + 74 | scale_fill_continuous_sequential( 75 | palette = "Blues", rev = TRUE, 76 | na.value = "grey60" 77 | ) 78 | ``` 79 | 80 | Now we can move the legend into place. 
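A note on the limits computed above: the two corner points can also be projected with `st_transform()` instead of the web tool. The following is a minimal sketch, assuming the sf package is installed with access to the EPSG database; the names `corners_ll`, `corners_tx`, and `texas_xlim_check` are illustrative and not part of the original code.

```r
library(sf)

# the two limit points in longitude-latitude coordinates (EPSG:4326)
corners_ll <- st_sfc(
  st_point(c(-110, 30)),
  st_point(c(-93.5, 30)),
  crs = 4326
)

# project to EPSG:3083 and extract the X, Y coordinates as a matrix
corners_tx <- st_coordinates(st_transform(corners_ll, crs = 3083))
corners_tx

# the X column supplies the limits for coord_sf()
texas_xlim_check <- unname(corners_tx[, "X"])
texas_xlim_check
```

The results should agree with the values quoted in the comments above to within rounding, although the exact digits may depend on the installed PROJ version.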
81 | 82 | ```{r} 83 | ggplot(texas_transf, aes(fill = estimate)) + 84 | geom_sf(color = "white") + 85 | coord_sf(xlim = texas_xlim) + 86 | scale_fill_continuous_sequential( 87 | palette = "Blues", rev = TRUE, 88 | na.value = "grey60", 89 | guide = guide_colorbar( 90 | direction = "horizontal", 91 | label.position = "bottom", 92 | title.position = "top", 93 | barwidth = grid::unit(3.0, "in"), 94 | barheight = grid::unit(0.2, "in") 95 | ) 96 | ) + 97 | theme( 98 | legend.title.align = 0.5, 99 | legend.text.align = 0.5, 100 | legend.justification = c(0, 0), 101 | legend.position = c(0.02, 0.1) 102 | ) 103 | ``` 104 | 105 | Fine-tune legend title and breaks. 106 | 107 | ```{r} 108 | ggplot(texas_transf, aes(fill = estimate)) + 109 | geom_sf(color = "white") + 110 | coord_sf(xlim = texas_xlim) + 111 | scale_fill_continuous_sequential( 112 | palette = "Blues", rev = TRUE, 113 | na.value = "grey60", 114 | name = "annual median income (USD)", 115 | limits = c(18000, 90000), 116 | breaks = 20000*c(1:4), 117 | labels = c("$20,000", "$40,000", "$60,000", "$80,000"), 118 | guide = guide_colorbar( 119 | direction = "horizontal", 120 | label.position = "bottom", 121 | title.position = "top", 122 | barwidth = grid::unit(3.0, "in"), 123 | barheight = grid::unit(0.2, "in") 124 | ) 125 | ) + 126 | theme( 127 | legend.title.align = 0.5, 128 | legend.text.align = 0.5, 129 | legend.justification = c(0, 0), 130 | legend.position = c(0.02, 0.1) 131 | ) 132 | ``` 133 | 134 | Switch the theme to a simple, mostly empty theme suitable for a map. 
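The dollar labels in the previous chunk are typed by hand. As a sketch of one alternative, they can be generated from the breaks in base R so that breaks and labels cannot drift apart; `income_breaks` and `income_labels` are illustrative names, not part of the original code.

```r
# same break positions as in the fill scale
income_breaks <- 20000 * 1:4

# prefix a dollar sign and insert thousands separators
income_labels <- paste0(
  "$",
  format(income_breaks, big.mark = ",", scientific = FALSE, trim = TRUE)
)
income_labels
```

These vectors could then be passed to the scale as `breaks = income_breaks, labels = income_labels`.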
135 | 136 | ```{r message = FALSE} 137 | library(cowplot) # for theme_map() 138 | 139 | ggplot(texas_transf, aes(fill = estimate)) + 140 | geom_sf(color = "white") + 141 | coord_sf(xlim = texas_xlim) + 142 | scale_fill_continuous_sequential( 143 | palette = "Blues", rev = TRUE, 144 | na.value = "grey60", 145 | name = "annual median income (USD)", 146 | limits = c(18000, 90000), 147 | breaks = 20000*c(1:4), 148 | labels = c("$20,000", "$40,000", "$60,000", "$80,000"), 149 | guide = guide_colorbar( 150 | direction = "horizontal", 151 | label.position = "bottom", 152 | title.position = "top", 153 | barwidth = grid::unit(3.0, "in"), 154 | barheight = grid::unit(0.2, "in") 155 | ) 156 | ) + 157 | theme_map(12) + 158 | theme( 159 | legend.title.align = 0.5, 160 | legend.text.align = 0.5, 161 | legend.justification = c(0, 0), 162 | legend.position = c(0.02, 0.1) 163 | ) 164 | ``` 165 | 166 | Remove the graticule by setting `datum = NA` in `coord_sf()`. 167 | ```{r} 168 | ggplot(texas_transf, aes(fill = estimate)) + 169 | geom_sf(color = "white") + 170 | coord_sf(xlim = texas_xlim, datum = NA) + 171 | scale_fill_continuous_sequential( 172 | palette = "Blues", rev = TRUE, 173 | na.value = "grey60", 174 | name = "annual median income (USD)", 175 | limits = c(18000, 90000), 176 | breaks = 20000*c(1:4), 177 | labels = c("$20,000", "$40,000", "$60,000", "$80,000"), 178 | guide = guide_colorbar( 179 | direction = "horizontal", 180 | label.position = "bottom", 181 | title.position = "top", 182 | barwidth = grid::unit(3.0, "in"), 183 | barheight = grid::unit(0.2, "in") 184 | ) 185 | ) + 186 | theme_map(12) + 187 | theme( 188 | legend.title.align = 0.5, 189 | legend.text.align = 0.5, 190 | legend.justification = c(0, 0), 191 | legend.position = c(0.02, 0.1) 192 | ) 193 | ``` 194 | -------------------------------------------------------------------------------- /bundestag_pie.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: 
"Bundestag pie chart" 3 | output: 4 | html_document: 5 | df_print: paged 6 | --- 7 | 8 | The dviz.supp package contains the dataset we are working with. 9 | ```{r} 10 | # devtools::install_github("clauswilke/dviz.supp") 11 | ``` 12 | 13 | 14 | The data, shown in table form and basic pie-chart form. 15 | ```{r message = FALSE} 16 | library(tidyverse) 17 | 18 | bundestag <- dviz.supp::bundestag %>% 19 | select(party, seats, colors) 20 | 21 | bundestag 22 | 23 | ggplot(bundestag, aes(x = 1, y = seats, fill = party)) + 24 | geom_col() + 25 | coord_polar(theta = "y") 26 | ``` 27 | 28 | We could try to style this pie chart to make it look the way we want, but I usually find it easier to draw pie charts in cartesian coordinates, using `geom_arc_bar()` from ggforce. This requires a little more data preparation up front but gives much more predictable results on the back end. 29 | 30 | ```{r} 31 | library(ggforce) # for geom_arc_bar() 32 | 33 | bund_pie <- bundestag %>% 34 | arrange(seats) %>% 35 | mutate( 36 | end_angle = 2*pi*cumsum(seats)/sum(seats), # ending angle for each pie slice 37 | start_angle = lag(end_angle, default = 0), # starting angle for each pie slice 38 | mid_angle = 0.5*(start_angle + end_angle), # middle of each pie slice, for the text label 39 | # horizontal and vertical justifications depend on whether we're to the left/right 40 | # or top/bottom of the pie 41 | hjust = ifelse(mid_angle > pi, 1, 0), 42 | vjust = ifelse(mid_angle < pi/2 | mid_angle > 3*pi/2, 0, 1) 43 | ) 44 | 45 | bund_pie 46 | 47 | # radius of the pie and radius for outside and inside labels 48 | rpie = 1 49 | rlabel_out = 1.05 * rpie 50 | rlabel_in = 0.6 * rpie 51 | 52 | ggplot(bund_pie) + 53 | geom_arc_bar( 54 | aes( 55 | x0 = 0, y0 = 0, r0 = 0, r = rpie, 56 | start = start_angle, end = end_angle, fill = party 57 | ) 58 | ) + 59 | coord_fixed() 60 | ``` 61 | 62 | Next we add labels representing the numbers of seats for each party. 
63 | 64 | ```{r} 65 | ggplot(bund_pie) + 66 | geom_arc_bar( 67 | aes( 68 | x0 = 0, y0 = 0, r0 = 0, r = rpie, 69 | start = start_angle, end = end_angle, fill = party 70 | ) 71 | ) + 72 | geom_text( 73 | aes( 74 | x = rlabel_in * sin(mid_angle), 75 | y = rlabel_in * cos(mid_angle), 76 | label = seats 77 | ), 78 | size = 14/.pt # use 14 pt font size 79 | ) + 80 | coord_fixed() 81 | ``` 82 | 83 | And we provide labels for the parties outside of the pie. 84 | 85 | ```{r} 86 | ggplot(bund_pie) + 87 | geom_arc_bar( 88 | aes( 89 | x0 = 0, y0 = 0, r0 = 0, r = rpie, 90 | start = start_angle, end = end_angle, fill = party 91 | ) 92 | ) + 93 | geom_text( 94 | aes( 95 | x = rlabel_in * sin(mid_angle), 96 | y = rlabel_in * cos(mid_angle), 97 | label = seats 98 | ), 99 | size = 14/.pt 100 | ) + 101 | geom_text( 102 | aes( 103 | x = rlabel_out * sin(mid_angle), 104 | y = rlabel_out * cos(mid_angle), 105 | label = party, 106 | hjust = hjust, vjust = vjust 107 | ), 108 | size = 14/.pt 109 | ) + 110 | coord_fixed() 111 | ``` 112 | 113 | We see that we need to make some space for the outside labels. We can do that by manually setting the scale limits and expansion. 
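The angle bookkeeping and label placement used so far can be checked without any plotting packages. This is a base-R sketch with hypothetical seat counts (not the Bundestag data); it assumes, as the `sin()`/`cos()` pairing in the plotting code implies, that angles are measured clockwise from the top of the pie.

```r
# hypothetical seat counts, sorted ascending as arrange(seats) would leave them
seats <- c(2, 3, 7)

end_angle   <- 2 * pi * cumsum(seats) / sum(seats)
# base-R stand-in for dplyr's lag(end_angle, default = 0)
start_angle <- c(0, head(end_angle, -1))
mid_angle   <- 0.5 * (start_angle + end_angle)

# justification flips between the right/left and top/bottom halves of the pie
hjust <- ifelse(mid_angle > pi, 1, 0)
vjust <- ifelse(mid_angle < pi / 2 | mid_angle > 3 * pi / 2, 0, 1)

# angles are measured clockwise from the top, so x uses sin and y uses cos
rlabel_out <- 1.05
label_x <- rlabel_out * sin(mid_angle)
label_y <- rlabel_out * cos(mid_angle)

data.frame(seats, mid_angle, label_x, label_y, hjust, vjust)

# the anchor points stay within rlabel_out of the origin; the text drawn
# at them extends further, which is what wider limits make room for
range(label_x)
range(label_y)
```

Because the label anchors never leave a circle of radius `rlabel_out`, the extra room needed in the limits depends only on how long the label text is.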
114 | 115 | ```{r} 116 | ggplot(bund_pie) + 117 | geom_arc_bar( 118 | aes( 119 | x0 = 0, y0 = 0, r0 = 0, r = rpie, 120 | start = start_angle, end = end_angle, fill = party 121 | ) 122 | ) + 123 | geom_text( 124 | aes( 125 | x = rlabel_in * sin(mid_angle), 126 | y = rlabel_in * cos(mid_angle), 127 | label = seats 128 | ), 129 | size = 14/.pt 130 | ) + 131 | geom_text( 132 | aes( 133 | x = rlabel_out * sin(mid_angle), 134 | y = rlabel_out * cos(mid_angle), 135 | label = party, 136 | hjust = hjust, vjust = vjust 137 | ), 138 | size = 14/.pt 139 | ) + 140 | scale_x_continuous( 141 | name = NULL, 142 | limits = c(-1.5, 1.5), 143 | expand = c(0, 0) 144 | ) + 145 | scale_y_continuous( 146 | name = NULL, 147 | limits = c(-1.05, 1.15), 148 | expand = c(0, 0) 149 | ) + 150 | coord_fixed() 151 | ``` 152 | 153 | This plot shows how using a Cartesian coordinate system is helpful. We can see exactly where elements lie and how we need to expand the limits to fully show all the labels. We have expanded the x axis limits by the same amount on both sides so that the final pie chart is centered in the drawing area. 154 | 155 | Next we change the pie colors. The dataset provides appropriate party colors, and we use those directly with `scale_fill_identity()`. Note that this scale eliminates the legend. We don't need a legend anyway, because we have directly labeled the pie slices. 
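To illustrate what an identity scale does, here is a small sketch with hypothetical data (the party names and hex codes are made up, not taken from the dataset). The fill column already holds valid color values, and the scale passes them through unchanged.

```r
library(ggplot2)

# hypothetical data: the `colors` column holds ready-to-use color values
slices <- data.frame(
  party  = c("A", "B", "C"),
  seats  = c(2, 3, 7),
  colors = c("#4E4E4E", "#B6494A", "#E7D739")
)

p <- ggplot(slices, aes(x = party, y = seats, fill = colors)) +
  geom_col() +
  scale_fill_identity()  # use the values in `colors` as-is; no legend by default

# the fill values reaching the geom are the strings from the data
unique(layer_data(p)$fill)
```

If a legend were wanted after all, `scale_fill_identity()` accepts `guide = "legend"` together with `labels` and `breaks`.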
156 | 157 | 158 | ```{r} 159 | ggplot(bund_pie) + 160 | geom_arc_bar( 161 | aes( 162 | x0 = 0, y0 = 0, r0 = 0, r = rpie, 163 | start = start_angle, end = end_angle, fill = colors 164 | ) 165 | ) + 166 | geom_text( 167 | aes( 168 | x = rlabel_in * sin(mid_angle), 169 | y = rlabel_in * cos(mid_angle), 170 | label = seats 171 | ), 172 | size = 14/.pt 173 | ) + 174 | geom_text( 175 | aes( 176 | x = rlabel_out * sin(mid_angle), 177 | y = rlabel_out * cos(mid_angle), 178 | label = party, 179 | hjust = hjust, vjust = vjust 180 | ), 181 | size = 14/.pt 182 | ) + 183 | scale_x_continuous( 184 | name = NULL, 185 | limits = c(-1.5, 1.5), 186 | expand = c(0, 0) 187 | ) + 188 | scale_y_continuous( 189 | name = NULL, 190 | limits = c(-1.05, 1.15), 191 | expand = c(0, 0) 192 | ) + 193 | scale_fill_identity() + 194 | coord_fixed() 195 | ``` 196 | 197 | The black color for the text labels doesn't work well on top of the dark fill colors, and the black outline also looks overbearing, so we'll change those colors to white. 
198 | 199 | ```{r} 200 | ggplot(bund_pie) + 201 | geom_arc_bar( 202 | aes( 203 | x0 = 0, y0 = 0, r0 = 0, r = rpie, 204 | start = start_angle, end = end_angle, fill = colors 205 | ), 206 | color = "white" 207 | ) + 208 | geom_text( 209 | aes( 210 | x = rlabel_in * sin(mid_angle), 211 | y = rlabel_in * cos(mid_angle), 212 | label = seats 213 | ), 214 | size = 14/.pt, 215 | color = c("black", "white", "white") 216 | ) + 217 | geom_text( 218 | aes( 219 | x = rlabel_out * sin(mid_angle), 220 | y = rlabel_out * cos(mid_angle), 221 | label = party, 222 | hjust = hjust, vjust = vjust 223 | ), 224 | size = 14/.pt 225 | ) + 226 | scale_x_continuous( 227 | name = NULL, 228 | limits = c(-1.5, 1.5), 229 | expand = c(0, 0) 230 | ) + 231 | scale_y_continuous( 232 | name = NULL, 233 | limits = c(-1.05, 1.15), 234 | expand = c(0, 0) 235 | ) + 236 | scale_fill_identity() + 237 | coord_fixed() 238 | ``` 239 | 240 | Finally, we need to apply a theme that removes the background grid and axes 241 | 242 | ```{r message = FALSE} 243 | library(cowplot) # for theme_map() 244 | 245 | ggplot(bund_pie) + 246 | geom_arc_bar( 247 | aes( 248 | x0 = 0, y0 = 0, r0 = 0, r = rpie, 249 | start = start_angle, end = end_angle, fill = colors 250 | ), 251 | color = "white" 252 | ) + 253 | geom_text( 254 | aes( 255 | x = rlabel_in * sin(mid_angle), 256 | y = rlabel_in * cos(mid_angle), 257 | label = seats 258 | ), 259 | size = 14/.pt, 260 | color = c("black", "white", "white") 261 | ) + 262 | geom_text( 263 | aes( 264 | x = rlabel_out * sin(mid_angle), 265 | y = rlabel_out * cos(mid_angle), 266 | label = party, 267 | hjust = hjust, vjust = vjust 268 | ), 269 | size = 14/.pt 270 | ) + 271 | scale_x_continuous( 272 | name = NULL, 273 | limits = c(-1.5, 1.5), 274 | expand = c(0, 0) 275 | ) + 276 | scale_y_continuous( 277 | name = NULL, 278 | limits = c(-1.05, 1.15), 279 | expand = c(0, 0) 280 | ) + 281 | scale_fill_identity() + 282 | coord_fixed() + 283 | theme_map() 284 | ``` 285 | 286 | 
-------------------------------------------------------------------------------- /corruption_human_development.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Corruption and human development" 3 | output: 4 | html_document: 5 | df_print: paged 6 | --- 7 | 8 | The dviz.supp package contains the dataset we are working with. 9 | ```{r} 10 | # devtools::install_github("clauswilke/dviz.supp") 11 | ``` 12 | 13 | 14 | The data, shown in table form and basic scatterplot form. 15 | ```{r message = FALSE} 16 | library(tidyverse) 17 | 18 | corrupt <- dviz.supp::corruption %>% 19 | filter(year == 2015) %>% 20 | na.omit() 21 | 22 | head(corrupt) 23 | 24 | ggplot(corrupt, aes(cpi, hdi, color = region)) + 25 | geom_point() 26 | ``` 27 | 28 | Basic styling: point colors and theme. 29 | ```{r message = FALSE} 30 | library(cowplot) # for theme_minimal_hgrid() 31 | library(colorspace) # for darken() 32 | 33 | # Okabe Ito colors 34 | region_cols <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#999999") 35 | 36 | ggplot(corrupt, aes(cpi, hdi)) + 37 | geom_point( 38 | aes(color = region, fill = region), 39 | size = 2.5, alpha = 0.5, shape = 21 40 | ) + 41 | scale_color_manual( 42 | values = darken(region_cols, 0.3) 43 | ) + 44 | scale_fill_manual( 45 | values = region_cols 46 | ) + 47 | theme_minimal_hgrid(12, rel_small = 1) # font size 12 pt throughout 48 | ``` 49 | 50 | Add smoothing line. 
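The smoothing line in this step is an ordinary linear model fitted to a log-transformed x variable. A base-R sketch of that model, using hypothetical data that lie exactly on a known curve (not the corruption dataset):

```r
# hypothetical data lying exactly on y = 2 + 3 * log(x)
x <- exp(0:4)
y <- 2 + 3 * log(x)

# the model that method = "lm" with formula = y ~ log(x) corresponds to
fit <- lm(y ~ log(x))
coef(fit)

# predictions on an even grid trace the curve that would be drawn
grid <- data.frame(x = seq(min(x), max(x), length.out = 5))
predict(fit, newdata = grid)
```

The fitted line is straight in `log(x)` but curved when drawn against `x`, which is the shape seen in the plot.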
51 | 52 | ```{r} 53 | ggplot(corrupt, aes(cpi, hdi)) + 54 | geom_smooth( 55 | aes(color = "y ~ log(x)", fill = "y ~ log(x)"), 56 | method = 'lm', formula = y~log(x), se = FALSE, fullrange = TRUE 57 | ) + 58 | geom_point( 59 | aes(color = region, fill = region), 60 | size = 2.5, alpha = 0.5, shape = 21 61 | ) + 62 | scale_color_manual( 63 | values = darken(region_cols, 0.3) 64 | ) + 65 | scale_fill_manual( 66 | values = region_cols 67 | ) + 68 | theme_minimal_hgrid(12, rel_small = 1) 69 | ``` 70 | 71 | Set the same scale name for color and fill scale, to force merging of guides. 72 | 73 | ```{r} 74 | ggplot(corrupt, aes(cpi, hdi)) + 75 | geom_smooth( 76 | aes(color = "y ~ log(x)", fill = "y ~ log(x)"), 77 | method = 'lm', formula = y~log(x), se = FALSE, fullrange = TRUE 78 | ) + 79 | geom_point( 80 | aes(color = region, fill = region), 81 | size = 2.5, alpha = 0.5, shape = 21 82 | ) + 83 | scale_color_manual( 84 | name = NULL, 85 | values = darken(region_cols, 0.3) 86 | ) + 87 | scale_fill_manual( 88 | name = NULL, 89 | values = region_cols 90 | ) + 91 | theme_minimal_hgrid(12, rel_small = 1) 92 | ``` 93 | 94 | Override legend aesthetics. 95 | 96 | ```{r} 97 | ggplot(corrupt, aes(cpi, hdi)) + 98 | geom_smooth( 99 | aes(color = "y ~ log(x)", fill = "y ~ log(x)"), 100 | method = 'lm', formula = y~log(x), se = FALSE, fullrange = TRUE 101 | ) + 102 | geom_point( 103 | aes(color = region, fill = region), 104 | size = 2.5, alpha = 0.5, shape = 21 105 | ) + 106 | scale_color_manual( 107 | name = NULL, 108 | values = darken(region_cols, 0.3) 109 | ) + 110 | scale_fill_manual( 111 | name = NULL, 112 | values = region_cols 113 | ) + 114 | guides( 115 | color = guide_legend( 116 | override.aes = list( 117 | linetype = c(rep(0, 5), 1), 118 | shape = c(rep(21, 5), NA) 119 | ) 120 | ) 121 | ) + 122 | theme_minimal_hgrid(12, rel_small = 1) 123 | ``` 124 | 125 | Set x and y scales, move legend on top. 
126 | ```{r} 127 | ggplot(corrupt, aes(cpi, hdi)) + 128 | geom_smooth( 129 | aes(color = "y ~ log(x)", fill = "y ~ log(x)"), 130 | method = 'lm', formula = y~log(x), se = FALSE, fullrange = TRUE 131 | ) + 132 | geom_point( 133 | aes(color = region, fill = region), 134 | size = 2.5, alpha = 0.5, shape = 21 135 | ) + 136 | scale_color_manual( 137 | name = NULL, 138 | values = darken(region_cols, 0.3) 139 | ) + 140 | scale_fill_manual( 141 | name = NULL, 142 | values = region_cols 143 | ) + 144 | scale_x_continuous( 145 | name = "Corruption Perceptions Index, 2015 (100 = least corrupt)", 146 | limits = c(10, 95), 147 | breaks = c(20, 40, 60, 80, 100), 148 | expand = c(0, 0) 149 | ) + 150 | scale_y_continuous( 151 | name = "Human Development Index, 2015\n(1.0 = most developed)", 152 | limits = c(0.3, 1.05), 153 | breaks = c(0.2, 0.4, 0.6, 0.8, 1.0), 154 | expand = c(0, 0) 155 | ) + 156 | guides( 157 | color = guide_legend( 158 | override.aes = list( 159 | linetype = c(rep(0, 5), 1), 160 | shape = c(rep(21, 5), NA) 161 | ) 162 | ) 163 | ) + 164 | theme_minimal_hgrid(12, rel_small = 1) + 165 | theme( 166 | legend.position = "top", 167 | legend.justification = "right", 168 | legend.text = element_text(size = 9), 169 | legend.box.spacing = unit(0, "pt") 170 | ) 171 | ``` 172 | 173 | Reformat legend into a single row. 
174 | 175 | ```{r} 176 | corrupt <- corrupt %>% 177 | mutate(region = case_when( 178 | region == "Middle East and North Africa" ~ "Middle East\nand North Africa", 179 | region == "Europe and Central Asia" ~ "Europe and\nCentral Asia", 180 | region == "Sub Saharan Africa" ~ "Sub-Saharan\nAfrica", 181 | TRUE ~ region) 182 | ) 183 | 184 | ggplot(corrupt, aes(cpi, hdi)) + 185 | geom_smooth( 186 | aes(color = "y ~ log(x)", fill = "y ~ log(x)"), 187 | method = 'lm', formula = y~log(x), se = FALSE, fullrange = TRUE 188 | ) + 189 | geom_point( 190 | aes(color = region, fill = region), 191 | size = 2.5, alpha = 0.5, shape = 21 192 | ) + 193 | scale_color_manual( 194 | name = NULL, 195 | values = darken(region_cols, 0.3) 196 | ) + 197 | scale_fill_manual( 198 | name = NULL, 199 | values = region_cols 200 | ) + 201 | scale_x_continuous( 202 | name = "Corruption Perceptions Index, 2015 (100 = least corrupt)", 203 | limits = c(10, 95), 204 | breaks = c(20, 40, 60, 80, 100), 205 | expand = c(0, 0) 206 | ) + 207 | scale_y_continuous( 208 | name = "Human Development Index, 2015\n(1.0 = most developed)", 209 | limits = c(0.3, 1.05), 210 | breaks = c(0.2, 0.4, 0.6, 0.8, 1.0), 211 | expand = c(0, 0) 212 | ) + 213 | guides( 214 | color = guide_legend( 215 | nrow = 1, 216 | override.aes = list( 217 | linetype = c(rep(0, 5), 1), 218 | shape = c(rep(21, 5), NA) 219 | ) 220 | ) 221 | ) + 222 | theme_minimal_hgrid(12, rel_small = 1) + 223 | theme( 224 | legend.position = "top", 225 | legend.justification = "right", 226 | legend.text = element_text(size = 9), 227 | legend.box.spacing = unit(0, "pt") 228 | ) 229 | ``` 230 | 231 | Highlight select countries. 
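The highlighting relies on a label column that holds the country name for selected countries and an empty string for everything else. A base-R sketch with a hypothetical handful of countries:

```r
# hypothetical subset of countries and a short highlight list
country <- c("Germany", "France", "Chad", "Peru")
country_highlight <- c("Germany", "Chad")

# keep the name for highlighted countries, use "" for all others
label <- ifelse(country %in% country_highlight, country, "")
label
```

Using empty strings rather than filtering rows keeps every point in the data given to `geom_text_repel()`. As I understand the ggrepel documentation, this lets labels steer clear of unlabeled points as well.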
232 | 233 | ```{r} 234 | library(ggrepel) 235 | 236 | country_highlight <- c("Germany", "Norway", "United States", "Greece", "Singapore", "Rwanda", "Russia", "Venezuela", "Sudan", "Iraq", "Ghana", "Niger", "Chad", "Kuwait", "Qatar", "Myanmar", "Nepal", "Chile", "Argentina", "Japan", "China") 237 | 238 | corrupt <- corrupt %>% 239 | mutate( 240 | label = ifelse(country %in% country_highlight, country, "") 241 | ) 242 | 243 | ggplot(corrupt, aes(cpi, hdi)) + 244 | geom_smooth( 245 | aes(color = "y ~ log(x)", fill = "y ~ log(x)"), 246 | method = 'lm', formula = y~log(x), se = FALSE, fullrange = TRUE 247 | ) + 248 | geom_point( 249 | aes(color = region, fill = region), 250 | size = 2.5, alpha = 0.5, shape = 21 251 | ) + 252 | geom_text_repel( 253 | aes(label = label), 254 | color = "black", 255 | size = 9/.pt, # font size 9 pt 256 | point.padding = 0.1, 257 | box.padding = .6, 258 | min.segment.length = 0, 259 | seed = 7654 260 | ) + 261 | scale_color_manual( 262 | name = NULL, 263 | values = darken(region_cols, 0.3) 264 | ) + 265 | scale_fill_manual( 266 | name = NULL, 267 | values = region_cols 268 | ) + 269 | scale_x_continuous( 270 | name = "Corruption Perceptions Index, 2015 (100 = least corrupt)", 271 | limits = c(10, 95), 272 | breaks = c(20, 40, 60, 80, 100), 273 | expand = c(0, 0) 274 | ) + 275 | scale_y_continuous( 276 | name = "Human Development Index, 2015\n(1.0 = most developed)", 277 | limits = c(0.3, 1.05), 278 | breaks = c(0.2, 0.4, 0.6, 0.8, 1.0), 279 | expand = c(0, 0) 280 | ) + 281 | guides( 282 | color = guide_legend( 283 | nrow = 1, 284 | override.aes = list( 285 | linetype = c(rep(0, 5), 1), 286 | shape = c(rep(21, 5), NA) 287 | ) 288 | ) 289 | ) + 290 | theme_minimal_hgrid(12, rel_small = 1) + 291 | theme( 292 | legend.position = "top", 293 | legend.justification = "right", 294 | legend.text = element_text(size = 9), 295 | legend.box.spacing = unit(0, "pt") 296 | ) 297 | ``` 298 | 
-------------------------------------------------------------------------------- /goode.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Interrupted Goode homolosine" 3 | output: 4 | html_document: 5 | df_print: paged 6 | --- 7 | 8 | ```{r echo = FALSE, message = FALSE} 9 | knitr::opts_chunk$set( 10 | fig.height = 3.5 11 | ) 12 | ``` 13 | 14 | The interrupted Goode homolosine projection is available via `coord_sf()`. 15 | 16 | ```{r message = FALSE} 17 | library(tidyverse) 18 | library(rworldmap) # for getMap() 19 | library(sf) 20 | library(cowplot) 21 | 22 | world_sf <- st_as_sf(getMap(resolution = "low")) 23 | 24 | crs_goode <- "+proj=igh" 25 | 26 | ggplot(world_sf) + 27 | geom_sf(size = 0.5/.pt) + 28 | coord_sf(crs = crs_goode) + 29 | theme_minimal_grid() 30 | ``` 31 | 32 | This figure reveals a few problems: First, land masses that are separated by the cuts (Greenland and Antarctica) are drawn as if they occupied the space between the cuts as well. Second, the parallels are not interrupted by the cuts. We can address these problems by drawing a polygon that covers the parts that should not have been drawn in the first place. 33 | 34 | We start by creating a polygon that encircles the world but excludes the cuts. This is easy in longitude-latitude coordinates. The cuts are at -40 degrees longitude above the equator and at -100, -20, and 80 degrees longitude below the equator. Also remember: The equator lies at 0 degrees latitude, the north pole at 90 degrees latitude, and the south pole at -90 degrees. Longitudes run from -180 degrees to 180 degrees. 
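Each cut in the outline is traced twice: once from the pole to the equator along one edge, and once back to the pole along the other edge, with the two edges offset slightly in longitude. A sketch of that construction for a southern cut; the helper `south_cut()` and its `eps` argument are hypothetical names, not part of the original code.

```r
# vertices for one cut below the equator at longitude `lon`:
# up the eastern edge from the south pole to the equator,
# then back down the western edge
south_cut <- function(lon, eps = 0.01) {
  cbind(
    longs = rep(c(lon + eps, lon - eps), each = 91),
    lats  = c(-90:0, 0:-90)
  )
}

third_cut <- south_cut(80)
dim(third_cut)           # 182 rows, 2 columns
third_cut[c(1, 182), ]   # starts and ends at the south pole
```

The small offset keeps the two edges from coinciding, so the cut has nonzero width and can open up once the outline is projected.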
35 | 36 | ```{r} 37 | # projection outline in long-lat coordinates 38 | lats <- c( 39 | 90:-90, # right side down 40 | -90:0, 0:-90, # third cut bottom 41 | -90:0, 0:-90, # second cut bottom 42 | -90:0, 0:-90, # first cut bottom 43 | -90:90, # left side up 44 | 90:0, 0:90, # cut top 45 | 90 # close 46 | ) 47 | longs <- c( 48 | rep(180, 181), # right side down 49 | rep(c(80.01, 79.99), each = 91), # third cut bottom 50 | rep(c(-19.99, -20.01), each = 91), # second cut bottom 51 | rep(c(-99.99, -100.01), each = 91), # first cut bottom 52 | rep(-180, 181), # left side up 53 | rep(c(-40.01, -39.99), each = 91), # cut top 54 | 180 # close 55 | ) 56 | 57 | goode_outline <- 58 | list(cbind(longs, lats)) %>% 59 | st_polygon() %>% 60 | st_sfc( 61 | crs = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs" 62 | ) 63 | ``` 64 | 65 | To demonstrate what we have done, we plot the world in long-lat coordinates, with the Goode outline on top. We fill the outline with a light red, mostly transparent color to show where it lies. We see the cuts as four vertical lines, one from the top to the middle and three from the bottom to the middle. 66 | 67 | ```{r} 68 | ggplot(world_sf) + 69 | geom_sf(size = 0.5/.pt) + 70 | geom_sf(data = goode_outline, fill = "#FF808015", color = "#FF8080", size = 0.5) + 71 | theme_minimal_grid() 72 | ``` 73 | 74 | If we repeat the same plot but using the Goode transformation, it becomes clearer what we are doing. Our outline exactly covers the world map, and it has now opened up along the cuts. 75 | ```{r} 76 | ggplot(world_sf) + 77 | geom_sf(size = 0.5/.pt) + 78 | geom_sf(data = goode_outline, fill = "#FF808015", color = "#FF8080", size = 0.5) + 79 | coord_sf(crs = crs_goode) + 80 | theme_minimal_grid() 81 | ``` 82 | 83 | However, we need to cover up the parts that are drawn outside this shaded area. To do so, we create a rectangle that is 10% larger than the bounding box of the Goode outline and then subtract the outline from that rectangle. 
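Multiplying the bounding-box limits by 1.1, as the next chunk does, expands the box about the origin. That equals a 10% expansion only when the box is centered on zero, which I assume is approximately the case for the projected Goode outline. A base-R sketch of the general case; `expand_range()` is a hypothetical helper.

```r
# widen a c(min, max) range about its own center by the fraction `frac`
expand_range <- function(lim, frac = 0.1) {
  center <- mean(lim)
  half   <- diff(lim) / 2
  center + c(-1, 1) * half * (1 + frac)
}

# symmetric range: identical to multiplying by 1.1
expand_range(c(-200, 200))

# offset range: plain multiplication moves the lower limit inward instead
expand_range(c(100, 300))
c(100, 300) * 1.1
```

For a range that is not centered on zero, the helper is the safer choice; for the symmetric case the two approaches coincide.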
84 | ```{r} 85 | # now we need to work in transformed coordinates, not in long-lat coordinates 86 | goode_outline <- st_transform(goode_outline, crs = crs_goode) 87 | 88 | # get the bounding box in transformed coordinates and expand by 10% 89 | xlim <- st_bbox(goode_outline)[c("xmin", "xmax")]*1.1 90 | ylim <- st_bbox(goode_outline)[c("ymin", "ymax")]*1.1 91 | 92 | # turn into enclosing rectangle 93 | goode_encl_rect <- 94 | list( 95 | cbind( 96 | c(xlim[1], xlim[2], xlim[2], xlim[1], xlim[1]), 97 | c(ylim[1], ylim[1], ylim[2], ylim[2], ylim[1]) 98 | ) 99 | ) %>% 100 | st_polygon() %>% 101 | st_sfc(crs = crs_goode) 102 | 103 | # calculate the area outside the earth outline as the difference 104 | # between the enclosing rectangle and the earth outline 105 | goode_without <- st_difference(goode_encl_rect, goode_outline) 106 | 107 | ggplot(world_sf) + 108 | geom_sf(size = 0.5/.pt) + 109 | geom_sf(data = goode_without, fill = "#FF808015", color = "#FF8080", size = 0.5) + 110 | coord_sf(crs = crs_goode) + 111 | theme_minimal_grid() 112 | ``` 113 | 114 | If we plot this polygon with a solid white fill and no outline, we have successfully covered the parts that should not make it into the final figure. 115 | 116 | ```{r} 117 | ggplot(world_sf) + 118 | geom_sf(size = 0.5/.pt) + 119 | geom_sf(data = goode_without, fill = "white", color = "NA") + 120 | coord_sf(crs = crs_goode) + 121 | theme_minimal_grid() 122 | ``` 123 | 124 | Now we can start to apply some styling. We set the background color to light blue and the fill color for the land masses to a light orange-brown. 
125 | 126 | ```{r} 127 | ggplot(world_sf) + 128 | geom_sf(fill = "#E69F00B0", color = "black", size = 0.5/.pt) + 129 | geom_sf(data = goode_without, fill = "white", color = "NA") + 130 | coord_sf(crs = crs_goode) + 131 | theme_minimal_grid() + 132 | theme( 133 | panel.background = element_rect(fill = "#56B4E950", color = "white", size = 1), 134 | panel.grid.major = element_line(color = "gray30", size = 0.25) 135 | ) 136 | ``` 137 | 138 | One problem that arises is that `coord_sf()` automatically expands the coordinate system to beyond the size of the largest object drawn, so that we end up with a blue band around the outside of the figure. We can address this problem by setting the limits manually. 139 | 140 | ```{r} 141 | ggplot(world_sf) + 142 | geom_sf(fill = "#E69F00B0", color = "black", size = 0.5/.pt) + 143 | geom_sf(data = goode_without, fill = "white", color = "NA") + 144 | coord_sf(crs = crs_goode, xlim = 0.95*xlim, ylim = 0.95*ylim, expand = FALSE) + 145 | theme_minimal_grid() + 146 | theme( 147 | panel.background = element_rect(fill = "#56B4E950", color = "white", size = 1), 148 | panel.grid.major = element_line(color = "gray30", size = 0.25) 149 | ) 150 | ``` 151 | 152 | Finally, we add the outline back in, to give the map a defined visual extent. 
153 | 154 | ```{r} 155 | ggplot(world_sf) + 156 | geom_sf(fill = "#E69F00B0", color = "black", size = 0.5/.pt) + 157 | geom_sf(data = goode_without, fill = "white", color = "NA") + 158 | geom_sf(data = goode_outline, fill = NA, color = "gray30", size = 0.5/.pt) + 159 | coord_sf(crs = crs_goode, xlim = 0.95*xlim, ylim = 0.95*ylim, expand = FALSE) + 160 | theme_minimal_grid() + 161 | theme( 162 | panel.background = element_rect(fill = "#56B4E950", color = "white", size = 1), 163 | panel.grid.major = element_line(color = "gray30", size = 0.25) 164 | ) 165 | ``` 166 | -------------------------------------------------------------------------------- /health_status.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Health status by age" 3 | output: 4 | html_document: 5 | df_print: paged 6 | --- 7 | 8 | ```{r echo = FALSE, message = FALSE} 9 | knitr::opts_chunk$set( 10 | fig.height = 2.7 11 | ) 12 | ``` 13 | 14 | The dviz.supp package contains the dataset we are working with. 15 | ```{r} 16 | # devtools::install_github("clauswilke/dviz.supp") 17 | ``` 18 | 19 | 20 | The data, shown in table form and as basic density plots. 21 | ```{r message = FALSE} 22 | library(tidyverse) 23 | library(cowplot) 24 | 25 | data_health <- select(dviz.supp::happy, age, health) %>% 26 | na.omit() %>% 27 | mutate(health = fct_rev(health)) # revert factor order 28 | 29 | head(data_health) 30 | 31 | ggplot(data_health, aes(x = age, y = stat(count))) + 32 | geom_density(fill = "lightblue") + 33 | facet_wrap(~health, nrow = 1) 34 | ``` 35 | 36 | Add overall distribution as background. 
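Mapping `y` to the computed `count` variable scales each density by the number of observations in its group, so larger groups produce taller curves. A base-R sketch of that scaling with hypothetical ages (not the survey data):

```r
set.seed(1)
# hypothetical ages
age <- runif(200, min = 18, max = 89)

d <- density(age)
# count-scaled density: density times number of observations
count_y <- d$y * length(age)

# trapezoid rule: the area under the count-scaled curve
# approximates the sample size
area <- sum(diff(d$x) * (head(count_y, -1) + tail(count_y, -1)) / 2)
area
```

In ggplot2 3.3.0 and later, the same mapping is written `after_stat(count)`; `stat(count)` is the older spelling and has since been deprecated.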
```{r}
ggplot(data_health, aes(x = age, y = stat(count))) +
  # we add a density layer that uses the data without the column
  # by which we are faceting; this places the full dataset into
  # each facet
  geom_density(
    data = select(data_health, -health),
    aes(fill = "all people surveyed")
  ) +
  geom_density(aes(fill = "highlighted group")) +
  facet_wrap(~health, nrow = 1)
```

Define the scales.
```{r}
ggplot(data_health, aes(x = age, y = stat(count))) +
  geom_density(
    data = select(data_health, -health),
    aes(fill = "all people surveyed")
  ) +
  geom_density(aes(fill = "highlighted group")) +
  scale_x_continuous(
    name = "age (years)",
    limits = c(15, 98),
    expand = c(0, 0)
  ) +
  scale_y_continuous(
    name = "count",
    expand = c(0, 0)
  ) +
  scale_fill_manual(
    values = c("#b3b3b3a0", "#2b8cbed0"),
    name = NULL,
    guide = guide_legend(direction = "horizontal")
  ) +
  facet_wrap(~health, nrow = 1)
```

Basic theme; move legend to bottom; remove outline around densities.
```{r}
ggplot(data_health, aes(x = age, y = stat(count))) +
  geom_density(
    data = select(data_health, -health),
    aes(fill = "all people surveyed"),
    color = NA
  ) +
  geom_density(aes(fill = "highlighted group"), color = NA) +
  scale_x_continuous(
    name = "age (years)",
    limits = c(15, 98),
    expand = c(0, 0)
  ) +
  scale_y_continuous(
    name = "count",
    expand = c(0, 0)
  ) +
  scale_fill_manual(
    values = c("#b3b3b3a0", "#2b8cbed0"),
    name = NULL,
    guide = guide_legend(direction = "horizontal")
  ) +
  facet_wrap(~health, nrow = 1) +
  theme_minimal_hgrid(12) +
  theme(
    legend.position = "bottom",
    legend.justification = "right"
  )
```

Theme tweaks: Larger strip labels, move legend closer to plot, adjust horizontal legend spacing.

```{r}
ggplot(data_health, aes(x = age, y = stat(count))) +
  geom_density(
    data = select(data_health, -health),
    aes(fill = "all people surveyed"),
    color = NA
  ) +
  geom_density(aes(fill = "highlighted group"), color = NA) +
  scale_x_continuous(
    name = "age (years)",
    limits = c(15, 98),
    expand = c(0, 0)
  ) +
  scale_y_continuous(
    name = "count",
    expand = c(0, 0)
  ) +
  scale_fill_manual(
    values = c("#b3b3b3a0", "#2b8cbed0"),
    name = NULL,
    guide = guide_legend(direction = "horizontal")
  ) +
  facet_wrap(~health, nrow = 1) +
  theme_minimal_hgrid(12) +
  theme(
    strip.text = element_text(size = 12, margin = margin(0, 0, 6, 0, "pt")),
    legend.position = "bottom",
    legend.justification = "right",
    legend.margin = margin(6, 0, 1.5, 0, "pt"),
    legend.spacing.x = grid::unit(3, "pt"),
    legend.spacing.y = grid::unit(0, "pt"),
    legend.box.spacing = grid::unit(0, "pt")
  )
```

Remove axis line, add spacing between legend items.

```{r}
ggplot(data_health, aes(x = age, y = stat(count))) +
  geom_density(
    data = select(data_health, -health),
    # a simple workaround to a limitation in ggplot2:
    # add a few spaces at the end of the legend text
    # to space out the legend items
    aes(fill = "all people surveyed "),
    color = NA
  ) +
  geom_density(aes(fill = "highlighted group"), color = NA) +
  scale_x_continuous(
    name = "age (years)",
    limits = c(15, 98),
    expand = c(0, 0)
  ) +
  scale_y_continuous(
    name = "count",
    expand = c(0, 0)
  ) +
  scale_fill_manual(
    values = c("#b3b3b3a0", "#2b8cbed0"),
    name = NULL,
    guide = guide_legend(direction = "horizontal")
  ) +
  facet_wrap(~health, nrow = 1) +
  theme_minimal_hgrid(12) +
  theme(
    axis.line = element_blank(),
    strip.text = element_text(size = 12, margin = margin(0, 0, 6, 0, "pt")),
    legend.position = "bottom",
    legend.justification = "right",
    legend.margin = margin(6, 0, 1.5, 0, "pt"),
    legend.spacing.x = grid::unit(3, "pt"),
    legend.spacing.y = grid::unit(0, "pt"),
    legend.box.spacing = grid::unit(0, "pt")
  )
```

Turn off clipping.
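Clipping is a property of the coord: by default, ggplot2 clips anything drawn outside the panel area, so shapes that sit exactly on the panel edge are cut off. A minimal sketch on made-up data shows the setting in isolation (the data frame `df` is hypothetical and not part of the original example):

```{r}
library(ggplot2)

# Two points placed exactly at the corners of the panel.
df <- data.frame(x = c(0, 1), y = c(0, 1))

p <- ggplot(df, aes(x, y)) +
  geom_point(size = 6) +
  # Remove the default padding so the points sit on the panel boundary.
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  # With clip = "off", the points are drawn in full instead of being cut
  # at the panel edge.
  coord_cartesian(clip = "off")

p
```

The chunk below adds the same `coord_cartesian(clip = "off")` call to the health-status plot.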

```{r}
ggplot(data_health, aes(x = age, y = stat(count))) +
  geom_density(
    data = select(data_health, -health),
    aes(fill = "all people surveyed "),
    color = NA
  ) +
  geom_density(aes(fill = "highlighted group"), color = NA) +
  scale_x_continuous(
    name = "age (years)",
    limits = c(15, 98),
    expand = c(0, 0)
  ) +
  scale_y_continuous(
    name = "count",
    expand = c(0, 0)
  ) +
  scale_fill_manual(
    values = c("#b3b3b3a0", "#2b8cbed0"),
    name = NULL,
    guide = guide_legend(direction = "horizontal")
  ) +
  facet_wrap(~health, nrow = 1) +
  coord_cartesian(clip = "off") +
  theme_minimal_hgrid(12) +
  theme(
    axis.line = element_blank(),
    strip.text = element_text(size = 12, margin = margin(0, 0, 6, 0, "pt")),
    legend.position = "bottom",
    legend.justification = "right",
    legend.margin = margin(6, 0, 1.5, 0, "pt"),
    legend.spacing.x = grid::unit(3, "pt"),
    legend.spacing.y = grid::unit(0, "pt"),
    legend.box.spacing = grid::unit(0, "pt")
  )
```

--------------------------------------------------------------------------------
/practical_ggplot2.Rproj:
--------------------------------------------------------------------------------
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX
--------------------------------------------------------------------------------