├── .gitignore ├── Building-a-ggplot_files └── figure-gfm │ ├── unnamed-chunk-3-1.png │ ├── unnamed-chunk-4-1.png │ ├── unnamed-chunk-5-1.png │ ├── unnamed-chunk-7-1.png │ ├── unnamed-chunk-8-1.png │ ├── unnamed-chunk-9-1.png │ ├── unnamed-chunk-10-1.png │ ├── unnamed-chunk-11-1.png │ ├── unnamed-chunk-12-1.png │ ├── unnamed-chunk-14-1.png │ ├── unnamed-chunk-15-1.png │ ├── unnamed-chunk-16-1.png │ ├── unnamed-chunk-17-1.png │ └── unnamed-chunk-18-1.png ├── BrightonR-ggplot-demo.Rproj ├── README.md ├── live └── building-a-ggplot.R ├── Building-a-ggplot.Rmd └── Building-a-ggplot.md /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | /recording/* 6 | -------------------------------------------------------------------------------- /Building-a-ggplot_files/figure-gfm/unnamed-chunk-3-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MangoTheCat/BrightonR-ggplot-demo/master/Building-a-ggplot_files/figure-gfm/unnamed-chunk-3-1.png -------------------------------------------------------------------------------- /Building-a-ggplot_files/figure-gfm/unnamed-chunk-4-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MangoTheCat/BrightonR-ggplot-demo/master/Building-a-ggplot_files/figure-gfm/unnamed-chunk-4-1.png -------------------------------------------------------------------------------- /Building-a-ggplot_files/figure-gfm/unnamed-chunk-5-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MangoTheCat/BrightonR-ggplot-demo/master/Building-a-ggplot_files/figure-gfm/unnamed-chunk-5-1.png -------------------------------------------------------------------------------- /Building-a-ggplot_files/figure-gfm/unnamed-chunk-7-1.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/MangoTheCat/BrightonR-ggplot-demo/master/Building-a-ggplot_files/figure-gfm/unnamed-chunk-7-1.png -------------------------------------------------------------------------------- /Building-a-ggplot_files/figure-gfm/unnamed-chunk-8-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MangoTheCat/BrightonR-ggplot-demo/master/Building-a-ggplot_files/figure-gfm/unnamed-chunk-8-1.png -------------------------------------------------------------------------------- /Building-a-ggplot_files/figure-gfm/unnamed-chunk-9-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MangoTheCat/BrightonR-ggplot-demo/master/Building-a-ggplot_files/figure-gfm/unnamed-chunk-9-1.png -------------------------------------------------------------------------------- /Building-a-ggplot_files/figure-gfm/unnamed-chunk-10-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MangoTheCat/BrightonR-ggplot-demo/master/Building-a-ggplot_files/figure-gfm/unnamed-chunk-10-1.png -------------------------------------------------------------------------------- /Building-a-ggplot_files/figure-gfm/unnamed-chunk-11-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MangoTheCat/BrightonR-ggplot-demo/master/Building-a-ggplot_files/figure-gfm/unnamed-chunk-11-1.png -------------------------------------------------------------------------------- /Building-a-ggplot_files/figure-gfm/unnamed-chunk-12-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MangoTheCat/BrightonR-ggplot-demo/master/Building-a-ggplot_files/figure-gfm/unnamed-chunk-12-1.png 
-------------------------------------------------------------------------------- /Building-a-ggplot_files/figure-gfm/unnamed-chunk-14-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MangoTheCat/BrightonR-ggplot-demo/master/Building-a-ggplot_files/figure-gfm/unnamed-chunk-14-1.png -------------------------------------------------------------------------------- /Building-a-ggplot_files/figure-gfm/unnamed-chunk-15-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MangoTheCat/BrightonR-ggplot-demo/master/Building-a-ggplot_files/figure-gfm/unnamed-chunk-15-1.png -------------------------------------------------------------------------------- /Building-a-ggplot_files/figure-gfm/unnamed-chunk-16-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MangoTheCat/BrightonR-ggplot-demo/master/Building-a-ggplot_files/figure-gfm/unnamed-chunk-16-1.png -------------------------------------------------------------------------------- /Building-a-ggplot_files/figure-gfm/unnamed-chunk-17-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MangoTheCat/BrightonR-ggplot-demo/master/Building-a-ggplot_files/figure-gfm/unnamed-chunk-17-1.png -------------------------------------------------------------------------------- /Building-a-ggplot_files/figure-gfm/unnamed-chunk-18-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MangoTheCat/BrightonR-ggplot-demo/master/Building-a-ggplot_files/figure-gfm/unnamed-chunk-18-1.png -------------------------------------------------------------------------------- /BrightonR-ggplot-demo.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | 
RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # BrightonR-ggplot-demo 2 | The code to go along with the "Building a ggplot" talk/live demo from BrightonR on the 20/08/2020 3 | 4 | The talk will build up the plot in real time (wow!) but check out `Building-a-ggplot.md` for a step-by-step run through! 5 | The code I put together during the talk is in the `live` directory. 6 | 7 | ## Data 8 | I've taken the data from tidy tuesday, specifically week 14 on beer! 9 | -------------------------------------------------------------------------------- /live/building-a-ggplot.R: -------------------------------------------------------------------------------- 1 | library(tidytuesdayR) 2 | library(dplyr) 3 | library(ggplot2) 4 | 5 | beer <- tt_load(2020, week = 14) 6 | 7 | beer_states <- beer$beer_states %>% 8 | filter(!is.na(barrels), barrels != 0) 9 | 10 | beer_states <- beer_states %>% 11 | filter(state != "total") 12 | 13 | df <- data.frame( 14 | x = 10, 15 | y = 100, 16 | type = "On Premises", 17 | label = "An increase in recent years" 18 | ) 19 | 20 | ggplot(data = beer_states, 21 | mapping = aes(x = factor(year), 22 | y = barrels)) + 23 | geom_boxplot(fill = "orange") + 24 | labs(x = "Year", 25 | y = "Beer production (# of barrels)", 26 | title = "Beer produced in each state, split by type and year") + 27 | scale_y_log10(labels = scales::comma) + 28 | facet_grid(rows = vars(type)) + 29 | geom_text(mapping = aes(x = x, y = y, label = label), 30 | data = df, 31 | size = 3) + 32 | theme_minimal() 33 | 34 | # GitHub: 35 | # MangoTheCat/BrightonR-ggplot-demo 36 | # 37 | # jtalboys@mango-solutions.com 38 | 39 | 40 
| -------------------------------------------------------------------------------- /Building-a-ggplot.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Building a ggplot!" 3 | author: "Jack Talboys" 4 | date: "20/08/2020" 5 | output: github_document 6 | --- 7 | 8 | Hello! Welcome to the step-by-step walkthrough of my talk on building up a ggplot from BrightonR on the 20th of August. From personal experience, using `{ggplot2}` most effectively involves a lot of trial and error and plenty of googling. Unfortunately I can't give a massive sense of that in this document, but just know there was plenty of chopping and changing of plots before I settled on something I liked! 9 | 10 | We're going to use data from #TidyTuesday. If you haven't come across it, check out its [GitHub page](https://github.com/rfordatascience/tidytuesday) for more information. Essentially a weekly dataset is released, and anyone can create a visualization however they want, and show it to the world using the #TidyTuesday hashtag on Twitter (a really good way to get your data visualizations out there!). 11 | 12 | We're going to look at the data on beer from week 14! 13 | 14 | Start by pulling the data using the `{tidytuesdayR}` package then assigning each individual data frame to its own object. 15 | 16 | ```{r, message=FALSE} 17 | # install.packages("tidytuesdayR") 18 | library(tidytuesdayR) 19 | beer <- tt_load(2020, week = 14) 20 | 21 | beer_states <- beer$beer_states 22 | beer_taxed <- beer$beer_taxed 23 | brewer_size <- beer$brewer_size 24 | brewing_materials <- beer$brewing_materials 25 | ``` 26 | 27 | I've only got half an hour so I'm just going to look at improving a simple visualization that doesn't require much pre-processing of the data.
28 | The figures come from the `beer_states` dataset: 29 | 30 | ```{r, message=FALSE} 31 | head(beer_states) 32 | 33 | 34 | # remove all rows with an NA or 0 in barrels column, 35 | # I'm sure these have an interesting story but I've not got time to 36 | # explore it here! 37 | library(dplyr) 38 | beer_states <- beer_states %>% 39 | filter(!is.na(barrels), 40 | barrels != 0) 41 | ``` 42 | 43 | We'll start with a simple plot of the number of barrels produced by year. This is where we finally get to use `{ggplot2}` - there are some steps that will be necessary in all of your ggplots so I'll introduce those first. 44 | 45 | ```{r} 46 | # first load (and maybe install if you need to) the ggplot2 package 47 | # install.packages("ggplot2") 48 | library(ggplot2) 49 | 50 | # All plots begin with a call to ggplot, inside we specify the data we're going 51 | # to create our visualization from 52 | ggplot(data = beer_states) 53 | 54 | ``` 55 | 56 | This returns a blank plot, and it's from here that we start to build our plot. 57 | 58 | We decided to plot the number of barrels produced by year. To specify this we can add to our `ggplot` call by passing information through to the `mapping` argument via the `aes` function as follows: 59 | 60 | ```{r} 61 | ggplot(data = beer_states, 62 | mapping = aes(x = year, 63 | y = barrels)) 64 | ``` 65 | 66 | So this time we get a blank plot, but our axes are in place with the correct range of values and labels taken straight from the column names. 67 | 68 | No data has been plotted yet because we've not told `{ggplot2}` how we'd like our data to appear. This is where we get to start making choices beyond just our data. I'd highly recommend starting with the [`{ggplot2}` cheatsheet](https://rstudio.com/wp-content/uploads/2015/05/ggplot2-cheatsheet.pdf). 69 | 70 | In our case we've got two variables (R thinks they're both continuous but we know that really `year` is categorical) so we can see from the cheatsheet there are plenty of options.
We'll start with a simple scatter plot using `geom_point` 71 | 72 | ```{r} 73 | ggplot(data = beer_states, 74 | mapping = aes(x = year, 75 | y = barrels)) + # we chain functions together using `+` 76 | # think of it as the %>% for ggplots 77 | geom_point() 78 | ``` 79 | 80 | Not the best plot - we see a massive outlier for each year along the top of the plot. A bit of investigation shows this is actually the total row for each year, which combines the barrels produced across all states for that year. Let's remove these rows using `filter` from `{dplyr}` 81 | 82 | ```{r, message=FALSE} 83 | library(dplyr) 84 | 85 | beer_states %>% 86 | arrange(-barrels) %>% 87 | head() 88 | 89 | # remove all rows where state = total 90 | beer_states <- beer_states %>% 91 | filter(state != "total") # read this as "keep all rows where state is not equal to total" 92 | 93 | ``` 94 | 95 | Now let's have a look at the plot from before 96 | 97 | ```{r} 98 | ggplot(data = beer_states, 99 | mapping = aes(x = year, 100 | y = barrels)) + 101 | geom_point() 102 | ``` 103 | 104 | So our data is a bit clearer to see now. Next we want to convert our `year` variable so that it's a factor rather than its current numerical state. This is as simple as wrapping our call to `year` in the `aes` function with the `factor` function like so: 105 | 106 | ```{r} 107 | ggplot(data = beer_states, 108 | mapping = aes(x = factor(year), 109 | y = barrels)) + 110 | geom_point() 111 | ``` 112 | 113 | Great - this has also sorted out the axis ticks! Although it has also messed up the x axis label. 114 | 115 | One thing I'm not so keen on is the high density of points at the bottom of the plot for all years. As the values are close together, the points overlap so it's hard to tell how many are _really_ down there. 116 | 117 | One solution is to change our geom: `geom_jitter` will add some random noise to each observation to spread them out.
Here we only want some horizontal jitter; any vertical jitter would change the value of `barrels` that we perceive for that point. We can control the amount of horizontal and vertical jitter using `width` and `height` respectively (I had to check the help file for that, don't worry). 118 | 119 | ```{r} 120 | ggplot(data = beer_states, 121 | mapping = aes(x = factor(year), 122 | y = barrels)) + 123 | geom_jitter(height = 0, 124 | width = 0.2) 125 | ``` 126 | 127 | It takes a bit of experimenting with the value of `width` to get the right amount of _jitter_, but `0.2` seems to work well. We want to be able to see most of the points (or at least appreciate the density of points everywhere) but still have clearly defined categories (in this case the years). 128 | 129 | Editing the labels on a ggplot can be done in two different ways, either specifying all labels in the `labs` function, or for significant labels like the title and the axes there are specific functions: `xlab`, `ylab` and `ggtitle`. For now I'm going to use `labs` but it really doesn't matter. 130 | 131 | ```{r} 132 | ggplot(data = beer_states, 133 | mapping = aes(x = factor(year), 134 | y = barrels)) + 135 | geom_jitter(height = 0, 136 | width = 0.2) + 137 | labs(x = "Year", 138 | y = "Beer produced (# of barrels)", 139 | title = "Barrels of beer produced in each category by year across states") 140 | ``` 141 | 142 | Looking at the long tail for each year, it might be worth looking at this plot with a logarithmic scale on the y axis. We can implement this using one of the `scale_*` functions from `{ggplot2}`; there's a wide range of them but here we'll need to find the one which implements logarithms base 10.
143 | 144 | ```{r} 145 | ggplot(data = beer_states, 146 | mapping = aes(x = factor(year), 147 | y = barrels)) + 148 | geom_jitter(height = 0, 149 | width = 0.2) + 150 | labs(x = "Year", 151 | y = "Beer produced (# of barrels)", 152 | title = "Barrels of beer produced in each category by year across states") + 153 | scale_y_log10(labels = scales::comma) # we can also change from scientific notation here 154 | ``` 155 | We make better use of the space on offer using a log scale, and the addition of `scales::comma` I think really shows the log scale a lot clearer than the scientific notation from before. 156 | 157 | I don't know if the points are working for me, luckily all we need to do to change the geom is change the call from `geom_jitter` to `geom_boxplot` (for example) 158 | 159 | ```{r} 160 | ggplot(data = beer_states, 161 | mapping = aes(x = factor(year), 162 | y = barrels)) + 163 | geom_boxplot() + 164 | labs(x = "Year", 165 | y = "Beer produced (# of barrels)", 166 | title = "Barrels of beer produced in each category by year across states") + 167 | scale_y_log10(labels = scales::comma) 168 | ``` 169 | 170 | We actually have 3 different types of beer output in this dataset 171 | 172 | ```{r} 173 | unique(beer_states$type) 174 | ``` 175 | 176 | Splitting our plot so that there's a separate box plot for each type of beer is as easy as adding to the `aes` call in our initial call to `ggplot`. 177 | 178 | ```{r} 179 | ggplot(data = beer_states, 180 | mapping = aes(x = factor(year), 181 | y = barrels, 182 | colour = type)) + 183 | geom_boxplot() + 184 | labs(x = "Year", 185 | y = "Beer produced (# of barrels)", 186 | title = "Barrels of beer produced in each category by year across states") + 187 | scale_y_log10(labels = scales::comma) 188 | ``` 189 | 190 | This plot isn't ideal - everything's a bit squished. 
Something we could try instead is 'faceting', in this case that's where each type of beer will get its own plot: 191 | 192 | ```{r} 193 | ggplot(data = beer_states, 194 | mapping = aes(x = factor(year), 195 | y = barrels)) + # remove colour = type 196 | geom_boxplot() + 197 | labs(x = "Year", 198 | y = "Beer produced (# of barrels)", 199 | title = "Barrels of beer produced in each category by year across states") + 200 | scale_y_log10(labels = scales::comma) + 201 | facet_grid(rows = vars(type)) # have to wrap the variable with `vars()` 202 | ``` 203 | 204 | Adding an annotation on to just the "On Premises" plot is the trickiest thing we'll do in this example (and took the most googling during my prep). It requires defining a new data frame, which we pass through to `geom_text`; its `type` column tells `facet_grid` which panel the label belongs in. 205 | 206 | ```{r} 207 | # To add annotations to a faceted plot we need a separate data frame 208 | # (I'll be honest this bit took some googling) 209 | df <- data.frame(x = 10.5, 210 | y = 100, 211 | label = "Increase in recent years \nin on premises beer", 212 | type = "On Premises") 213 | 214 | ggplot(data = beer_states, 215 | mapping = aes(x = factor(year), 216 | y = barrels)) + 217 | geom_boxplot() + 218 | labs(x = "Year", 219 | y = "Beer produced (# of barrels)", 220 | title = "Barrels of beer produced in each category by year across states") + 221 | scale_y_log10(labels = scales::comma) + 222 | facet_grid(rows = vars(type)) + 223 | geom_text(mapping = aes(x = x, y = y, label = label), 224 | data = df, 225 | size = 4) # The size argument does what it says on the tin 226 | ``` 227 | 228 | Now we can focus on colouring and making cosmetic changes to our plot. We can make our boxplots look (kind of) like beer by giving them a lovely orange colour. Pass `'orange'` through to the fill argument of `geom_boxplot`.
229 | 230 | ```{r} 231 | 232 | df <- data.frame(x = 10.5, 233 | y = 100, 234 | label = "Increase in recent years \nin on premises beer", 235 | type = "On Premises") 236 | 237 | ggplot(data = beer_states, 238 | mapping = aes(x = factor(year), 239 | y = barrels)) + 240 | geom_boxplot(fill = 'orange') + 241 | labs(x = "Year", 242 | y = "Beer produced (# of barrels)", 243 | title = "Barrels of beer produced in each category by year across states") + 244 | scale_y_log10(labels = scales::comma) + 245 | facet_grid(rows = vars(type)) + 246 | geom_text(mapping = aes(x = x, y = y, label = label), 247 | data = df, 248 | size = 4) 249 | ``` 250 | 251 | There are loads of little changes we can make to the look of our plot (just check out all the arguments to the `theme` function). So you've got a massive amount of control over your plot. But, if you're lazy like me, there are some pre-made themes you can use... just tag the theme function you want on to the end of our stack of functions 252 | 253 | ```{r} 254 | 255 | df <- data.frame(x = 10.5, 256 | y = 100, 257 | label = "Increase in recent years \nin on premises beer", 258 | type = "On Premises") 259 | 260 | ggplot(data = beer_states, 261 | mapping = aes(x = factor(year), 262 | y = barrels)) + 263 | geom_boxplot(fill = 'orange') + 264 | labs(x = "Year", 265 | y = "Beer produced (# of barrels)", 266 | title = "Barrels of beer produced in each category by year across states") + 267 | scale_y_log10(labels = scales::comma) + 268 | facet_grid(rows = vars(type)) + 269 | geom_text(mapping = aes(x = x, y = y, label = label), 270 | data = df, 271 | size = 4) + 272 | theme_minimal() 273 | ``` 274 | 275 | As you can see the theme makes a big difference here! 276 | 277 | Well, that's it for building up this ggplot. Hopefully even in just half an hour I've been able to give you a sense of the process for making your best ggplot!
Thanks for listening/reading/watching. If you have any questions, feel free to get in touch: jtalboys@mango-solutions.com and (shameless plug) check out [Mango's website](https://www.mango-solutions.com/) for how we can help you and your company on your data journey! -------------------------------------------------------------------------------- /Building-a-ggplot.md: -------------------------------------------------------------------------------- 1 | Building a ggplot\! 2 | ================ 3 | Jack Talboys 4 | 20/08/2020 5 | 6 | Hello\! Welcome to the step-by-step walkthrough of my talk on 7 | building up a ggplot from BrightonR on the 20th of August. From personal 8 | experience, using `{ggplot2}` most effectively involves a lot of trial 9 | and error and plenty of googling. Unfortunately I can’t give a massive 10 | sense of that in this document, but just know there was plenty of 11 | chopping and changing of plots before I settled on something I liked\! 12 | 13 | We’re going to use data from \#TidyTuesday. If you haven’t come across it, check 14 | out its [GitHub page](https://github.com/rfordatascience/tidytuesday) 15 | for more information. Essentially a weekly dataset is released, and 16 | anyone can create a visualization however they want, and show it to the 17 | world using the \#TidyTuesday hashtag on Twitter (a really good way to get your 18 | data visualizations out there\!). 19 | 20 | We’re going to look at the data on beer from week 14\! 21 | 22 | Start by pulling the data using the `{tidytuesdayR}` package then 23 | assigning each individual data frame to its own object.
24 | 25 | ``` r 26 | # install.packages("tidytuesdayR") 27 | library(tidytuesdayR) 28 | beer <- tt_load(2020, week = 14) 29 | ``` 30 | 31 | ## 32 | ## Downloading file 1 of 4: `beer_states.csv` 33 | ## Downloading file 2 of 4: `beer_taxed.csv` 34 | ## Downloading file 3 of 4: `brewer_size.csv` 35 | ## Downloading file 4 of 4: `brewing_materials.csv` 36 | 37 | ``` r 38 | beer_states <- beer$beer_states 39 | beer_taxed <- beer$beer_taxed 40 | brewer_size <- beer$brewer_size 41 | brewing_materials <- beer$brewing_materials 42 | ``` 43 | 44 | I’ve only got half an hour so I’m just going to look at improving a simple 45 | visualization that doesn’t require much pre-processing of the data. The 46 | figures come from the `beer_states` dataset: 47 | 48 | ``` r 49 | head(beer_states) 50 | ``` 51 | 52 | ## # A tibble: 6 x 4 53 | ## state year barrels type 54 | ## 55 | ## 1 AK 2008 2068. On Premises 56 | ## 2 AK 2009 2264. On Premises 57 | ## 3 AK 2010 1929. On Premises 58 | ## 4 AK 2011 2251. On Premises 59 | ## 5 AK 2012 2312. On Premises 60 | ## 6 AK 2013 2156. On Premises 61 | 62 | ``` r 63 | # remove all rows with an NA or 0 in barrels column, 64 | # I'm sure these have an interesting story but I've not got time to 65 | # explore it here! 66 | library(dplyr) 67 | beer_states <- beer_states %>% 68 | filter(!is.na(barrels), 69 | barrels != 0) 70 | ``` 71 | 72 | We’ll start with a simple plot of the number of barrels produced by year. 73 | This is where we finally get to use `{ggplot2}` - there are some steps 74 | that will be necessary in all of your ggplots so I’ll introduce 75 | those first.
76 | 77 | ``` r 78 | # first load (and maybe install if you need to) the ggplot2 package 79 | # install.packages("ggplot2") 80 | library(ggplot2) 81 | 82 | # All plots begin with a call to ggplot, inside we specify the data we're going 83 | # to create our visualization from 84 | ggplot(data = beer_states) 85 | ``` 86 | 87 | ![](Building-a-ggplot_files/figure-gfm/unnamed-chunk-3-1.png) 88 | 89 | This returns a blank plot, and it’s from here that we start to build our 90 | plot. 91 | 92 | We decided to plot the number of barrels produced by year. To specify this 93 | we can add to our `ggplot` call by passing information through to the 94 | `mapping` argument via the `aes` function as follows: 95 | 96 | ``` r 97 | ggplot(data = beer_states, 98 | mapping = aes(x = year, 99 | y = barrels)) 100 | ``` 101 | 102 | ![](Building-a-ggplot_files/figure-gfm/unnamed-chunk-4-1.png) 103 | 104 | So this time we get a blank plot, but our axes are in place with the 105 | correct range of values and labels taken straight from the column names. 106 | 107 | No data has been plotted yet because we’ve not told `{ggplot2}` how we’d 108 | like our data to appear. This is where we get to start making choices 109 | beyond just our data. I’d highly recommend starting with the 110 | [`{ggplot2}` 111 | cheatsheet](https://rstudio.com/wp-content/uploads/2015/05/ggplot2-cheatsheet.pdf). 112 | 113 | In our case we’ve got two variables (R thinks they’re both continuous 114 | but we know that really `year` is categorical) so we can see from the 115 | cheatsheet there are plenty of options.
We’ll start with a simple scatter 116 | plot using `geom_point` 117 | 118 | ``` r 119 | ggplot(data = beer_states, 120 | mapping = aes(x = year, 121 | y = barrels)) + # we chain functions together using `+` 122 | # think of it as the %>% for ggplots 123 | geom_point() 124 | ``` 125 | 126 | ![](Building-a-ggplot_files/figure-gfm/unnamed-chunk-5-1.png) 127 | 128 | Not the best plot - we see a massive outlier for each year along the top of 129 | the plot. A bit of investigation shows this is actually the 130 | total row for each year, which combines the barrels produced across all 131 | states for that year. Let’s remove these rows using `filter` from 132 | `{dplyr}` 133 | 134 | ``` r 135 | library(dplyr) 136 | 137 | beer_states %>% 138 | arrange(-barrels) %>% 139 | head() 140 | ``` 141 | 142 | ## # A tibble: 6 x 4 143 | ## state year barrels type 144 | ## 145 | ## 1 total 2008 166930012. Bottles and Cans 146 | ## 2 total 2009 165432247. Bottles and Cans 147 | ## 3 total 2010 162972113. Bottles and Cans 148 | ## 4 total 2012 161692656. Bottles and Cans 149 | ## 5 total 2011 159708194. Bottles and Cans 150 | ## 6 total 2013 159413579. Bottles and Cans 151 | 152 | ``` r 153 | # remove all rows where state = total 154 | beer_states <- beer_states %>% 155 | filter(state != "total") # read this as "keep all rows where state is not equal to total" 156 | ``` 157 | 158 | Now let’s have a look at the plot from before 159 | 160 | ``` r 161 | ggplot(data = beer_states, 162 | mapping = aes(x = year, 163 | y = barrels)) + 164 | geom_point() 165 | ``` 166 | 167 | ![](Building-a-ggplot_files/figure-gfm/unnamed-chunk-7-1.png) 168 | 169 | So our data is a bit clearer to see now. Next we want to convert our 170 | `year` variable so that it’s a factor rather than its current numerical 171 | state.
This is as simple as wrapping our call to `year` in the `aes` 172 | function with the `factor` function like so: 173 | 174 | ``` r 175 | ggplot(data = beer_states, 176 | mapping = aes(x = factor(year), 177 | y = barrels)) + 178 | geom_point() 179 | ``` 180 | 181 | ![](Building-a-ggplot_files/figure-gfm/unnamed-chunk-8-1.png) 182 | 183 | Great - this has also sorted out the axis ticks\! Although it has also 184 | messed up the x axis label. 185 | 186 | One thing I’m not so keen on is the high density of points at the bottom 187 | of the plot for all years. As the values are close together, the points 188 | overlap so it’s hard to tell how many are *really* down there. 189 | 190 | One solution is to change our geom: `geom_jitter` will add some random 191 | noise to each observation to spread them out. Here we only want some 192 | horizontal jitter; any vertical jitter would change the value of 193 | `barrels` that we perceive for that point. We can control the amount of 194 | horizontal and vertical jitter using `width` and `height` respectively 195 | (I had to check the help file for that, don’t worry). 196 | 197 | ``` r 198 | ggplot(data = beer_states, 199 | mapping = aes(x = factor(year), 200 | y = barrels)) + 201 | geom_jitter(height = 0, 202 | width = 0.2) 203 | ``` 204 | 205 | ![](Building-a-ggplot_files/figure-gfm/unnamed-chunk-9-1.png) 206 | 207 | It takes a bit of experimenting with the value of `width` to get the 208 | right amount of *jitter*, but `0.2` seems to work well. We want to be 209 | able to see most of the points (or at least appreciate the density of 210 | points everywhere) but still have clearly defined categories (in this 211 | case the years). 212 | 213 | Editing the labels on a ggplot can be done in two different ways, either 214 | specifying all labels in the `labs` function, or for significant labels 215 | like the title and the axes there are specific functions: `xlab`, `ylab` 216 | and `ggtitle`.
For now I’m going to use `labs` but it really doesn’t 217 | matter 218 | 219 | ``` r 220 | ggplot(data = beer_states, 221 | mapping = aes(x = factor(year), 222 | y = barrels)) + 223 | geom_jitter(height = 0, 224 | width = 0.2) + 225 | labs(x = "Year", 226 | y = "Beer produced (# of barrels)", 227 | title = "Barrels of beer produced in each category by year across states") 228 | ``` 229 | 230 | ![](Building-a-ggplot_files/figure-gfm/unnamed-chunk-10-1.png) 231 | 232 | Looking at the long tail for each year, it might be worth looking at 233 | this plot with a logarithmic scale on the y axis, we can implement this 234 | using one of the `scale_*` functions from `{ggplot2}`, there’s a wide 235 | range of them but here we’ll need to find the one which implements 236 | logarithms base 10. 237 | 238 | ``` r 239 | ggplot(data = beer_states, 240 | mapping = aes(x = factor(year), 241 | y = barrels)) + 242 | geom_jitter(height = 0, 243 | width = 0.2) + 244 | labs(x = "Year", 245 | y = "Beer produced (# of barrels)", 246 | title = "Barrels of beer produced in each category by year across states") + 247 | scale_y_log10(labels = scales::comma) # we can also change from scientific notation here 248 | ``` 249 | 250 | ![](Building-a-ggplot_files/figure-gfm/unnamed-chunk-11-1.png) 251 | We make better use of the space on offer using a log scale, and the 252 | addition of `scales::comma` I think really shows the log scale a lot 253 | clearer than the scientific notation from before. 
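To get a feel for what the log scale and `scales::comma` are doing, here is a minimal sketch outside of any plot. The `barrels` values are made up for illustration (they are not taken from the beer data); the only assumption is that the `{scales}` package is installed, which it will be if `{ggplot2}` is.

``` r
library(scales)

# Made-up barrel counts spanning several orders of magnitude
barrels <- c(100, 10000, 1000000)

# Scientific notation, similar to the axis labels we had before
format(barrels, scientific = TRUE)

# scales::comma() adds thousands separators instead
comma(barrels)
# "100" "10,000" "1,000,000"

# On a log10 axis these values sit at equally spaced positions
log10(barrels)
# 2 4 6
```

Each step of 1 on the log scale is a factor of 10 in barrels, which is why the small producers are no longer squashed against the bottom of the plot.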
254 | 255 | I don’t know if the points are working for me, luckily all we need to do 256 | to change the geom is change the call from `geom_jitter` to 257 | `geom_boxplot` (for example) 258 | 259 | ``` r 260 | ggplot(data = beer_states, 261 | mapping = aes(x = factor(year), 262 | y = barrels)) + 263 | geom_boxplot() + 264 | labs(x = "Year", 265 | y = "Beer produced (# of barrels)", 266 | title = "Barrels of beer produced in each category by year across states") + 267 | scale_y_log10(labels = scales::comma) 268 | ``` 269 | 270 | ![](Building-a-ggplot_files/figure-gfm/unnamed-chunk-12-1.png) 271 | 272 | We actually have 3 different types of beer output in this dataset 273 | 274 | ``` r 275 | unique(beer_states$type) 276 | ``` 277 | 278 | ## [1] "On Premises" "Bottles and Cans" "Kegs and Barrels" 279 | 280 | Splitting our plot so that there’s a separate box plot for each type of 281 | beer is as easy as adding to the `aes` call in our initial call to 282 | `ggplot`. 283 | 284 | ``` r 285 | ggplot(data = beer_states, 286 | mapping = aes(x = factor(year), 287 | y = barrels, 288 | colour = type)) + 289 | geom_boxplot() + 290 | labs(x = "Year", 291 | y = "Beer produced (# of barrels)", 292 | title = "Barrels of beer produced in each category by year across states") + 293 | scale_y_log10(labels = scales::comma) 294 | ``` 295 | 296 | ![](Building-a-ggplot_files/figure-gfm/unnamed-chunk-14-1.png) 297 | 298 | This plot isn’t ideal - everything’s a bit squished. 
Something we could try instead is ‘faceting’, which in this case means
each type of beer gets its own plot:

``` r
ggplot(data = beer_states,
       mapping = aes(x = factor(year),
                     y = barrels)) + # remove colour = type
  geom_boxplot() +
  labs(x = "Year",
       y = "Beer produced (# of barrels)",
       title = "Barrels of beer produced in each category by year across states") +
  scale_y_log10(labels = scales::comma) +
  facet_grid(rows = vars(type)) # have to wrap the variable with `vars()`
```

![](Building-a-ggplot_files/figure-gfm/unnamed-chunk-15-1.png)

Adding an annotation on to just the “On Premises” plot is the trickiest
thing we’ll do in this example (and took the most googling during my
prep). It requires defining a new data frame which we pass through to
`geom_text`.

``` r
# To add annotations to a faceted plot we need a separate data frame
# (I'll be honest this bit took some googling)
df <- data.frame(x = 10.5,
                 y = 100,
                 label = "Increase in recent years \nin on premises beer",
                 type = "On Premises")

ggplot(data = beer_states,
       mapping = aes(x = factor(year),
                     y = barrels)) +
  geom_boxplot() +
  labs(x = "Year",
       y = "Beer produced (# of barrels)",
       title = "Barrels of beer produced in each category by year across states") +
  scale_y_log10(labels = scales::comma) +
  facet_grid(rows = vars(type)) +
  geom_text(mapping = aes(x = x, y = y, label = label),
            data = df,
            size = 4) # The size argument does what it says on the tin
```

![](Building-a-ggplot_files/figure-gfm/unnamed-chunk-16-1.png)

Now we can focus on colouring and making cosmetic changes to our plot.
We can make our boxplots look (kind of) like beer by giving them a
lovely orange colour.
Pass `'orange'` through to the `fill` argument of `geom_boxplot`.

``` r
df <- data.frame(x = 10.5,
                 y = 100,
                 label = "Increase in recent years \nin on premises beer",
                 type = "On Premises")

ggplot(data = beer_states,
       mapping = aes(x = factor(year),
                     y = barrels)) +
  geom_boxplot(fill = 'orange') +
  labs(x = "Year",
       y = "Beer produced (# of barrels)",
       title = "Barrels of beer produced in each category by year across states") +
  scale_y_log10(labels = scales::comma) +
  facet_grid(rows = vars(type)) +
  geom_text(mapping = aes(x = x, y = y, label = label),
            data = df,
            size = 4)
```

![](Building-a-ggplot_files/figure-gfm/unnamed-chunk-17-1.png)

There are loads of little changes we can make to the look of our plot
(just check out all the arguments to the `theme` function), so you’ve
got a massive amount of control over your plot. But, if you’re lazy like
me, there are some pre-made themes you can use… just tag the theme
function you want on to the end of our stack of functions.

``` r
df <- data.frame(x = 10.5,
                 y = 100,
                 label = "Increase in recent years \nin on premises beer",
                 type = "On Premises")

ggplot(data = beer_states,
       mapping = aes(x = factor(year),
                     y = barrels)) +
  geom_boxplot(fill = 'orange') +
  labs(x = "Year",
       y = "Beer produced (# of barrels)",
       title = "Barrels of beer produced in each category by year across states") +
  scale_y_log10(labels = scales::comma) +
  facet_grid(rows = vars(type)) +
  geom_text(mapping = aes(x = x, y = y, label = label),
            data = df,
            size = 4) +
  theme_minimal()
```

![](Building-a-ggplot_files/figure-gfm/unnamed-chunk-18-1.png)

As you can see, the theme makes a big difference here!
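
If a pre-made theme gets you most of the way there, you can still layer
individual `theme()` tweaks on top of it. Here’s a minimal sketch,
assuming a made-up data frame (`toy` is hypothetical, not the real
`beer_states` data), and the two tweaks are just examples I picked. The
one thing to watch is the order: a complete theme like `theme_minimal()`
replaces everything that came before it, so the `theme()` call has to
come after.

``` r
library(ggplot2)

# Hypothetical toy data standing in for `beer_states` (values are made up)
toy <- data.frame(
  year    = factor(c(2018, 2018, 2018, 2019, 2019, 2019)),
  barrels = c(10, 1000, 100000, 50, 5000, 500000)
)

p <- ggplot(data = toy,
            mapping = aes(x = year,
                          y = barrels)) +
  geom_boxplot(fill = 'orange') +
  labs(x = "Year",
       y = "Beer produced (# of barrels)",
       title = "Toy example: pre-made theme plus tweaks") +
  scale_y_log10(labels = scales::comma) +
  theme_minimal() +                                   # pre-made theme first
  theme(plot.title = element_text(face = "bold"),     # then make the title bold
        panel.grid.minor = element_blank())           # and drop the minor grid lines

# Print `p` to draw the plot
```

Swapping the last two calls around would let `theme_minimal()` overwrite
the tweaks, which is an easy mistake to make when stacking functions.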

Well, that’s it for building up this ggplot. Hopefully, even in just
half an hour, I’ve been able to give you a sense of the process for
making your best ggplot! Thanks for listening/reading/watching. If you
have any questions, feel free to get in touch, and (shameless plug)
check out [Mango’s website](https://www.mango-solutions.com/) for how we
can help you and your company on your data journey!

--------------------------------------------------------------------------------