├── .gitignore
├── README.md
├── assets
    ├── mess_up_iris.R
    ├── project_files.png
    ├── project_screen1.png
    ├── project_screen2.png
    ├── project_screen3.png
    └── tidy_data.png
├── installing_software.md
├── iris.csv
├── loading_data.md
├── next_steps.md
├── plotting.Rmd
├── plotting.md
├── plotting_files
    └── figure-markdown_strict
    │   ├── unnamed-chunk-1-1.png
    │   └── unnamed-chunk-2-1.png
├── r_markdown.md
├── r_project.md
├── reproduciblecodeR.Rproj
├── summarising_data.Rmd
├── summarising_data.md
├── tidying_data.Rmd
├── tidying_data.md
├── tidying_output.Rmd
└── tidying_output.md


/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | .Ruserdata
5 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Writing reproducible code in R
 2 | 
 3 | In this session, we are going to go through the basics of writing reproducible code in R, using an RMarkdown document. 
 4 | 
 5 | The material is self-paced and includes an example analysis. It is suggested that you work through the sections in order.
 6 | 
 7 | * [Setting up an R project](./r_project.md) - Create a self-contained project in RStudio
 8 | * [Creating an RMarkdown notebook](./r_markdown.md) - Create a notebook to have all analyses and notes in one place
 9 | * [Loading Data](./loading_data.md) - Getting your own data into R
10 | * [Tidying Data](./tidying_data.md) - A quick example of transforming a messy dataset into something workable
11 | * [Manipulating and summarising data](./summarising_data.md) - How to take our tidy data and create some useful summaries
12 | * [Tidying model output](./tidying_output.md) - An example of how to tidy the output from a basic statistical model
13 | * [Plotting](./plotting.md) - To finish up, how to plot a summary of a model using ggplot2
14 | * [Additional resources](./next_steps.md) - Steps to further learning
15 | 
16 | My version of the R Notebook we have been working on can be found [here](https://github.com/laurajanegraham/reproducible_r).
17 | 


--------------------------------------------------------------------------------
/assets/mess_up_iris.R:
--------------------------------------------------------------------------------
 1 | library(tidyr)
 2 | library(dplyr)
 3 | library(readr)
 4 | 
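 5 | iris$ID <- rep(paste0("sample", 1:50), 3)  # label samples 1-50 within each species
 6 | iris_mess <- gather(iris, measurement, value, -Species, -ID) %>%  # reshape wide to long
 7 |   spread(Species, value) %>%  # one column per species
 8 |   unite(measurement, ID, measurement)  # merge ID and variable, e.g. "sample1_Petal.Length"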
 9 | 
10 | write.csv(iris_mess, file="iris.csv", row.names = FALSE)
11 | 
12 | 
13 | 


--------------------------------------------------------------------------------
/assets/project_files.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BES2016Workshop/reproduciblecodeR/97002f73e7587450107b66216555a08b1e182d9f/assets/project_files.png


--------------------------------------------------------------------------------
/assets/project_screen1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BES2016Workshop/reproduciblecodeR/97002f73e7587450107b66216555a08b1e182d9f/assets/project_screen1.png


--------------------------------------------------------------------------------
/assets/project_screen2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BES2016Workshop/reproduciblecodeR/97002f73e7587450107b66216555a08b1e182d9f/assets/project_screen2.png


--------------------------------------------------------------------------------
/assets/project_screen3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BES2016Workshop/reproduciblecodeR/97002f73e7587450107b66216555a08b1e182d9f/assets/project_screen3.png


--------------------------------------------------------------------------------
/assets/tidy_data.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BES2016Workshop/reproduciblecodeR/97002f73e7587450107b66216555a08b1e182d9f/assets/tidy_data.png


--------------------------------------------------------------------------------
/installing_software.md:
--------------------------------------------------------------------------------
 1 | # Installing the software
 2 | 
 3 | Many users of R use it from within another free piece of software called **RStudio.**
 4 | RStudio is a powerful and productive user interface for R. It’s free and open source, and works great on Windows, Mac, and Linux.
 5 | 
 6 | RStudio's version control functionality is provided by yet another piece of software called **git**.
 7 | 
 8 | Our first task, therefore, is to install R, RStudio and git.
 9 | 
10 | ### Install R
11 | 
12 | Install R first. Downloads are available at https://cran.rstudio.com/
13 |   * Direct link for Windows https://cran.r-project.org/bin/windows/base/
14 |   * Direct link for MacOS X https://cran.r-project.org/bin/macosx/
15 |   * Direct link for Linux https://cran.r-project.org/bin/linux/
16 | 
17 | ### Install RStudio
18 | 
19 | Next, install RStudio. **If you already have RStudio, make sure you have the latest version (1.0.44).** The R Notebook used in later lessons will not work in earlier versions.
20 | 
21 | * Downloads are available at https://www.rstudio.com/products/rstudio/download/
22 | 
23 | ### Installing R packages
24 | 
25 | We are going to be using a number of packages in the following example. To install these packages, run the following code in the R console.
26 | 
27 | `install.packages(c("RCurl", "readr", "tidyr", "dplyr", "broom", "ggplot2", "cowplot"))`
28 | 
29 | ### Install git
30 | 
31 | Git is one of the most popular version control systems in the world. It is free and open source.
32 | 
33 | * Windows & OS X: http://git-scm.com/downloads
34 | * Debian/Ubuntu: `sudo apt-get install git`
35 | * Fedora/RedHat: `sudo yum install git`
36 | 
37 | To check that the installation worked, open a terminal or command prompt:
38 | 
39 | **Windows**
40 | 
41 | * Go to the Start menu
42 | * In the Search or Run line type **cmd** and press enter.
43 | 
44 | **Mac**
45 | 
46 | * Go to **Applications** -> **Utilities** -> **Terminal**
47 | 
48 | Type `git version`. You should see a short message containing some version information.
49 | 
50 | ### Configure git
51 | 
52 | After installing git, you need to tell it who you are. Open a terminal window or command prompt (see above) and type the following:
53 | 
54 | ```
55 | git config --global user.email "you@youremail.com"
56 | git config --global user.name "Your Name"
57 | ```
58 | 
59 | On successful completion, you should see no output from these commands.
60 | 
61 | You can also configure git to use your preferred editor for commit messages, e.g. on a Mac:
62 | 
63 | ```
64 | git config --global core.editor nano
65 | ```
66 | 
67 | or on Windows:
68 | 
69 | ```
70 | git config --global core.editor notepad
71 | ```
72 | 
73 | It's a good idea to follow this step since the default editor selected by git is quite difficult to use!
74 | 
75 | ### Sign up for an account on GitHub
76 | 
77 | GitHub is a popular online hosting service for git repositories. It provides a useful interface for collaboration and code sharing.
78 | 
79 | Create a free account on GitHub:
80 | 
81 | [https://github.com/join](https://github.com/join)
82 | 
83 | ***If you have an academic email account you should use it here.***
84 | GitHub users can create an unlimited number of free, public repositories but only a limited number of private repositories. However, academic users can request access to an unlimited number of free private repositories.
85 | 
86 | **Optional**
87 | 
88 | If you are an academic user, sign up for free private repositories here:
89 | 
90 | [https://education.github.com/discount_requests/new](https://education.github.com/discount_requests/new)
91 | 
92 | ***This requires your account to be associated with an academic email address.***
93 | 
94 | It may take a while to receive the verification email for this step. Don't worry, we won't need this for the tutorial.


--------------------------------------------------------------------------------
/iris.csv:
--------------------------------------------------------------------------------
  1 | "measurement","setosa","versicolor","virginica"
  2 | "sample1_Petal.Length",1.4,4.7,6
  3 | "sample1_Petal.Width",0.2,1.4,2.5
  4 | "sample1_Sepal.Length",5.1,7,6.3
  5 | "sample1_Sepal.Width",3.5,3.2,3.3
  6 | "sample10_Petal.Length",1.5,3.9,6.1
  7 | "sample10_Petal.Width",0.1,1.4,2.5
  8 | "sample10_Sepal.Length",4.9,5.2,7.2
  9 | "sample10_Sepal.Width",3.1,2.7,3.6
 10 | "sample11_Petal.Length",1.5,3.5,5.1
 11 | "sample11_Petal.Width",0.2,1,2
 12 | "sample11_Sepal.Length",5.4,5,6.5
 13 | "sample11_Sepal.Width",3.7,2,3.2
 14 | "sample12_Petal.Length",1.6,4.2,5.3
 15 | "sample12_Petal.Width",0.2,1.5,1.9
 16 | "sample12_Sepal.Length",4.8,5.9,6.4
 17 | "sample12_Sepal.Width",3.4,3,2.7
 18 | "sample13_Petal.Length",1.4,4,5.5
 19 | "sample13_Petal.Width",0.1,1,2.1
 20 | "sample13_Sepal.Length",4.8,6,6.8
 21 | "sample13_Sepal.Width",3,2.2,3
 22 | "sample14_Petal.Length",1.1,4.7,5
 23 | "sample14_Petal.Width",0.1,1.4,2
 24 | "sample14_Sepal.Length",4.3,6.1,5.7
 25 | "sample14_Sepal.Width",3,2.9,2.5
 26 | "sample15_Petal.Length",1.2,3.6,5.1
 27 | "sample15_Petal.Width",0.2,1.3,2.4
 28 | "sample15_Sepal.Length",5.8,5.6,5.8
 29 | "sample15_Sepal.Width",4,2.9,2.8
 30 | "sample16_Petal.Length",1.5,4.4,5.3
 31 | "sample16_Petal.Width",0.4,1.4,2.3
 32 | "sample16_Sepal.Length",5.7,6.7,6.4
 33 | "sample16_Sepal.Width",4.4,3.1,3.2
 34 | "sample17_Petal.Length",1.3,4.5,5.5
 35 | "sample17_Petal.Width",0.4,1.5,1.8
 36 | "sample17_Sepal.Length",5.4,5.6,6.5
 37 | "sample17_Sepal.Width",3.9,3,3
 38 | "sample18_Petal.Length",1.4,4.1,6.7
 39 | "sample18_Petal.Width",0.3,1,2.2
 40 | "sample18_Sepal.Length",5.1,5.8,7.7
 41 | "sample18_Sepal.Width",3.5,2.7,3.8
 42 | "sample19_Petal.Length",1.7,4.5,6.9
 43 | "sample19_Petal.Width",0.3,1.5,2.3
 44 | "sample19_Sepal.Length",5.7,6.2,7.7
 45 | "sample19_Sepal.Width",3.8,2.2,2.6
 46 | "sample2_Petal.Length",1.4,4.5,5.1
 47 | "sample2_Petal.Width",0.2,1.5,1.9
 48 | "sample2_Sepal.Length",4.9,6.4,5.8
 49 | "sample2_Sepal.Width",3,3.2,2.7
 50 | "sample20_Petal.Length",1.5,3.9,5
 51 | "sample20_Petal.Width",0.3,1.1,1.5
 52 | "sample20_Sepal.Length",5.1,5.6,6
 53 | "sample20_Sepal.Width",3.8,2.5,2.2
 54 | "sample21_Petal.Length",1.7,4.8,5.7
 55 | "sample21_Petal.Width",0.2,1.8,2.3
 56 | "sample21_Sepal.Length",5.4,5.9,6.9
 57 | "sample21_Sepal.Width",3.4,3.2,3.2
 58 | "sample22_Petal.Length",1.5,4,4.9
 59 | "sample22_Petal.Width",0.4,1.3,2
 60 | "sample22_Sepal.Length",5.1,6.1,5.6
 61 | "sample22_Sepal.Width",3.7,2.8,2.8
 62 | "sample23_Petal.Length",1,4.9,6.7
 63 | "sample23_Petal.Width",0.2,1.5,2
 64 | "sample23_Sepal.Length",4.6,6.3,7.7
 65 | "sample23_Sepal.Width",3.6,2.5,2.8
 66 | "sample24_Petal.Length",1.7,4.7,4.9
 67 | "sample24_Petal.Width",0.5,1.2,1.8
 68 | "sample24_Sepal.Length",5.1,6.1,6.3
 69 | "sample24_Sepal.Width",3.3,2.8,2.7
 70 | "sample25_Petal.Length",1.9,4.3,5.7
 71 | "sample25_Petal.Width",0.2,1.3,2.1
 72 | "sample25_Sepal.Length",4.8,6.4,6.7
 73 | "sample25_Sepal.Width",3.4,2.9,3.3
 74 | "sample26_Petal.Length",1.6,4.4,6
 75 | "sample26_Petal.Width",0.2,1.4,1.8
 76 | "sample26_Sepal.Length",5,6.6,7.2
 77 | "sample26_Sepal.Width",3,3,3.2
 78 | "sample27_Petal.Length",1.6,4.8,4.8
 79 | "sample27_Petal.Width",0.4,1.4,1.8
 80 | "sample27_Sepal.Length",5,6.8,6.2
 81 | "sample27_Sepal.Width",3.4,2.8,2.8
 82 | "sample28_Petal.Length",1.5,5,4.9
 83 | "sample28_Petal.Width",0.2,1.7,1.8
 84 | "sample28_Sepal.Length",5.2,6.7,6.1
 85 | "sample28_Sepal.Width",3.5,3,3
 86 | "sample29_Petal.Length",1.4,4.5,5.6
 87 | "sample29_Petal.Width",0.2,1.5,2.1
 88 | "sample29_Sepal.Length",5.2,6,6.4
 89 | "sample29_Sepal.Width",3.4,2.9,2.8
 90 | "sample3_Petal.Length",1.3,4.9,5.9
 91 | "sample3_Petal.Width",0.2,1.5,2.1
 92 | "sample3_Sepal.Length",4.7,6.9,7.1
 93 | "sample3_Sepal.Width",3.2,3.1,3
 94 | "sample30_Petal.Length",1.6,3.5,5.8
 95 | "sample30_Petal.Width",0.2,1,1.6
 96 | "sample30_Sepal.Length",4.7,5.7,7.2
 97 | "sample30_Sepal.Width",3.2,2.6,3
 98 | "sample31_Petal.Length",1.6,3.8,6.1
 99 | "sample31_Petal.Width",0.2,1.1,1.9
100 | "sample31_Sepal.Length",4.8,5.5,7.4
101 | "sample31_Sepal.Width",3.1,2.4,2.8
102 | "sample32_Petal.Length",1.5,3.7,6.4
103 | "sample32_Petal.Width",0.4,1,2
104 | "sample32_Sepal.Length",5.4,5.5,7.9
105 | "sample32_Sepal.Width",3.4,2.4,3.8
106 | "sample33_Petal.Length",1.5,3.9,5.6
107 | "sample33_Petal.Width",0.1,1.2,2.2
108 | "sample33_Sepal.Length",5.2,5.8,6.4
109 | "sample33_Sepal.Width",4.1,2.7,2.8
110 | "sample34_Petal.Length",1.4,5.1,5.1
111 | "sample34_Petal.Width",0.2,1.6,1.5
112 | "sample34_Sepal.Length",5.5,6,6.3
113 | "sample34_Sepal.Width",4.2,2.7,2.8
114 | "sample35_Petal.Length",1.5,4.5,5.6
115 | "sample35_Petal.Width",0.2,1.5,1.4
116 | "sample35_Sepal.Length",4.9,5.4,6.1
117 | "sample35_Sepal.Width",3.1,3,2.6
118 | "sample36_Petal.Length",1.2,4.5,6.1
119 | "sample36_Petal.Width",0.2,1.6,2.3
120 | "sample36_Sepal.Length",5,6,7.7
121 | "sample36_Sepal.Width",3.2,3.4,3
122 | "sample37_Petal.Length",1.3,4.7,5.6
123 | "sample37_Petal.Width",0.2,1.5,2.4
124 | "sample37_Sepal.Length",5.5,6.7,6.3
125 | "sample37_Sepal.Width",3.5,3.1,3.4
126 | "sample38_Petal.Length",1.4,4.4,5.5
127 | "sample38_Petal.Width",0.1,1.3,1.8
128 | "sample38_Sepal.Length",4.9,6.3,6.4
129 | "sample38_Sepal.Width",3.6,2.3,3.1
130 | "sample39_Petal.Length",1.3,4.1,4.8
131 | "sample39_Petal.Width",0.2,1.3,1.8
132 | "sample39_Sepal.Length",4.4,5.6,6
133 | "sample39_Sepal.Width",3,3,3
134 | "sample4_Petal.Length",1.5,4,5.6
135 | "sample4_Petal.Width",0.2,1.3,1.8
136 | "sample4_Sepal.Length",4.6,5.5,6.3
137 | "sample4_Sepal.Width",3.1,2.3,2.9
138 | "sample40_Petal.Length",1.5,4,5.4
139 | "sample40_Petal.Width",0.2,1.3,2.1
140 | "sample40_Sepal.Length",5.1,5.5,6.9
141 | "sample40_Sepal.Width",3.4,2.5,3.1
142 | "sample41_Petal.Length",1.3,4.4,5.6
143 | "sample41_Petal.Width",0.3,1.2,2.4
144 | "sample41_Sepal.Length",5,5.5,6.7
145 | "sample41_Sepal.Width",3.5,2.6,3.1
146 | "sample42_Petal.Length",1.3,4.6,5.1
147 | "sample42_Petal.Width",0.3,1.4,2.3
148 | "sample42_Sepal.Length",4.5,6.1,6.9
149 | "sample42_Sepal.Width",2.3,3,3.1
150 | "sample43_Petal.Length",1.3,4,5.1
151 | "sample43_Petal.Width",0.2,1.2,1.9
152 | "sample43_Sepal.Length",4.4,5.8,5.8
153 | "sample43_Sepal.Width",3.2,2.6,2.7
154 | "sample44_Petal.Length",1.6,3.3,5.9
155 | "sample44_Petal.Width",0.6,1,2.3
156 | "sample44_Sepal.Length",5,5,6.8
157 | "sample44_Sepal.Width",3.5,2.3,3.2
158 | "sample45_Petal.Length",1.9,4.2,5.7
159 | "sample45_Petal.Width",0.4,1.3,2.5
160 | "sample45_Sepal.Length",5.1,5.6,6.7
161 | "sample45_Sepal.Width",3.8,2.7,3.3
162 | "sample46_Petal.Length",1.4,4.2,5.2
163 | "sample46_Petal.Width",0.3,1.2,2.3
164 | "sample46_Sepal.Length",4.8,5.7,6.7
165 | "sample46_Sepal.Width",3,3,3
166 | "sample47_Petal.Length",1.6,4.2,5
167 | "sample47_Petal.Width",0.2,1.3,1.9
168 | "sample47_Sepal.Length",5.1,5.7,6.3
169 | "sample47_Sepal.Width",3.8,2.9,2.5
170 | "sample48_Petal.Length",1.4,4.3,5.2
171 | "sample48_Petal.Width",0.2,1.3,2
172 | "sample48_Sepal.Length",4.6,6.2,6.5
173 | "sample48_Sepal.Width",3.2,2.9,3
174 | "sample49_Petal.Length",1.5,3,5.4
175 | "sample49_Petal.Width",0.2,1.1,2.3
176 | "sample49_Sepal.Length",5.3,5.1,6.2
177 | "sample49_Sepal.Width",3.7,2.5,3.4
178 | "sample5_Petal.Length",1.4,4.6,5.8
179 | "sample5_Petal.Width",0.2,1.5,2.2
180 | "sample5_Sepal.Length",5,6.5,6.5
181 | "sample5_Sepal.Width",3.6,2.8,3
182 | "sample50_Petal.Length",1.4,4.1,5.1
183 | "sample50_Petal.Width",0.2,1.3,1.8
184 | "sample50_Sepal.Length",5,5.7,5.9
185 | "sample50_Sepal.Width",3.3,2.8,3
186 | "sample6_Petal.Length",1.7,4.5,6.6
187 | "sample6_Petal.Width",0.4,1.3,2.1
188 | "sample6_Sepal.Length",5.4,5.7,7.6
189 | "sample6_Sepal.Width",3.9,2.8,3
190 | "sample7_Petal.Length",1.4,4.7,4.5
191 | "sample7_Petal.Width",0.3,1.6,1.7
192 | "sample7_Sepal.Length",4.6,6.3,4.9
193 | "sample7_Sepal.Width",3.4,3.3,2.5
194 | "sample8_Petal.Length",1.5,3.3,6.3
195 | "sample8_Petal.Width",0.2,1,1.8
196 | "sample8_Sepal.Length",5,4.9,7.3
197 | "sample8_Sepal.Width",3.4,2.4,2.9
198 | "sample9_Petal.Length",1.4,4.6,5.8
199 | "sample9_Petal.Width",0.2,1.3,1.8
200 | "sample9_Sepal.Length",4.4,6.6,6.7
201 | "sample9_Sepal.Width",2.9,2.9,2.5
202 | 


--------------------------------------------------------------------------------
/loading_data.md:
--------------------------------------------------------------------------------
 1 | # Loading Data
 2 | 
 3 | R packages exist to load in pretty much any form of data you can think of. Some key examples include:
 4 | 
 5 | - [readr](https://cran.r-project.org/web/packages/readr/README.html) tends to work faster and have more functionality for flat files (.csv, .txt) than base R (useful for big files)
 6 | - [readxl](https://blog.rstudio.org/2015/04/15/readxl-0-1-0/) for Excel spreadsheets
 7 | - [RODBC](https://cran.r-project.org/web/packages/RODBC/RODBC.pdf) for many types of database including Access
 8 | - [RPostgreSQL](https://www.r-bloggers.com/getting-started-with-postgresql-in-r/) for PostgreSQL databases
 9 | - [googlesheets](https://cran.r-project.org/web/packages/googlesheets/googlesheets.pdf) to interface with Google sheets
10 | - [raster](https://cran.r-project.org/web/packages/raster/raster.pdf) and [rgdal](https://cran.r-project.org/web/packages/rgdal/rgdal.pdf) for spatial data
11 | - [RCurl](https://cran.r-project.org/web/packages/RCurl/RCurl.pdf) contains functions to fetch data from webpages (along with lots more functionality for interfacing with webpages)
12 | 
13 | To load these packages into your R session, use `library()`, e.g. `library(RCurl)`.
14 | 
15 | > ### Challenge
16 | >
17 | > In a new code chunk in your R Notebook, download [iris.csv](https://raw.githubusercontent.com/BES2016Workshop/reproduciblecodeR/master/iris.csv) using `getURL()` from the RCurl package, read into R using `read.csv()` and assign to the object name `iris`.
18 | >
19 | > **HINT** Use `df <- read.csv(text = getURL("/url/of/file"))` to read straight into R from a webpage (replace `/url/of/file` with the location of the file).
20 | 
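As a minimal offline sketch of what the `text =` argument in the hint does: `read.csv()` can parse CSV content held in a character string instead of a file on disk, and a string is what `getURL()` returns. The string below is typed in by hand from the first rows of iris.csv, so no network access is assumed; `csv_text` is a name chosen here for illustration.

```r
# CSV content held in a character string, standing in for the text
# that getURL() would return from a web page
csv_text <- paste(
  '"measurement","setosa","versicolor","virginica"',
  '"sample1_Petal.Length",1.4,4.7,6',
  '"sample1_Petal.Width",0.2,1.4,2.5',
  sep = "\n"
)

# text = tells read.csv() to parse the string rather than open a file
df <- read.csv(text = csv_text)

# Inspect the column names and types that were read in
str(df)
```

Checking the result with `str()` straight after loading is a quick way to confirm that the columns have the types you expect.
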
21 | **Next:** [Tidying Data](./tidying_data.md)


--------------------------------------------------------------------------------
/next_steps.md:
--------------------------------------------------------------------------------
 1 | # Additional resources
 2 | 
 3 | I hope that you’ve found this guide to making your research more reproducible with R helpful. There are plenty of links throughout to learn more about the packages I've talked about. If you’d like to learn more about R or about reproducibility, I’d highly recommend the following resources:
 4 | 
 5 | - [RStudio cheatsheets](https://www.rstudio.com/resources/cheatsheets/): includes ggplot2, RMarkdown, dplyr, tidyr and more
 6 | 
 7 | - [Swirl](http://swirlstats.com/): tutorials for tidyr, dplyr and much more directly in the R console
 8 | 
 9 | - [Python pandas comparison with R](http://pandas.pydata.org/pandas-docs/stable/comparison_with_r.html): Python can be quicker than R, and so is particularly useful when you have a large dataset. This website provides a more detailed look at the R language and its many third-party libraries as they relate to the Python pandas library
10 | 
11 | - [Software Carpentry lessons](http://software-carpentry.org/lessons/): freely available lessons taught on the Software Carpentry courses. To host or run a workshop also see this site.
12 | 
13 | - [Reproducible Research on Coursera](https://www.coursera.org/learn/reproducible-research): taught by Roger Peng, Jeff Leek and Brian Caffo at Johns Hopkins University


--------------------------------------------------------------------------------
/plotting.Rmd:
--------------------------------------------------------------------------------
 1 | ---
 2 | output: md_document
 3 | ---
 4 | # Plotting
 5 | 
 6 | Finally, we want to plot our data to summarise the model from the previous step. [ggplot2](https://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf) is designed to work with tidy data formats and is based on the idea of the [grammar of graphics](https://ramnathv.github.io/pycon2014-r/visualize/ggplot2.html). This concept makes building up graphs from very simple to complex quite straightforward by adding additional layers. However, ggplot2 does have some less than ideal formatting like a grey gridded background. The cowplot package overrides some of these settings to make publication-quality plots. Cowplot also has some nice functionality for arranging plots. The [R graphics cookbook](http://www.cookbook-r.com/Graphs/) provides some helpful tutorials for building up plots using ggplot2.
 7 | 
 8 | The features that I find most useful in ggplot2 are:
 9 | 
10 | - Build up plots layer-by-layer
11 | - Can use `facet_wrap()` and `facet_grid()` to create separate plots by a factor in the dataframe
12 | 
13 | Let's make a plot of the `mtcars` model from the previous step:
14 | 
15 | ```{r}
16 | data(mtcars)
17 | library(ggplot2)
18 | library(cowplot)
19 | 
20 | p <- ggplot(mtcars, aes(x = wt, y = mpg)) + 
21 |   geom_point() + 
22 |   geom_smooth(method = "lm")
23 | 
24 | p
25 | ```
26 | 
27 | We can then use `facet_wrap()` to get a separate plot for each number of cylinders:
28 | 
29 | ```{r}
30 | p <- p + facet_wrap(~cyl)
31 | 
32 | p
33 | ```
34 | 
35 | The `aes()` part of the call to `ggplot()` allows us to set the aesthetics of the plot, for example the `colour`, based on variables in the dataframe.
36 | 
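As a hedged illustration of mapping an aesthetic to a variable, the sketch below colours the `mtcars` points by number of cylinders. The object name `p_colour` and the legend title are choices made here for illustration, and `factor()` is used on the assumption that cylinders should be shown as discrete groups rather than on a continuous colour scale.

```r
library(ggplot2)

# Map colour to the number of cylinders; factor() makes ggplot2 treat cyl
# as discrete categories rather than a continuous scale
p_colour <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(colour = "Cylinders")

p_colour
```

Because the mapping sits inside `aes()`, ggplot2 builds the legend automatically from the values in the data.
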
37 | > ### Challenge
38 | >
39 | > In a new code chunk in your R Notebook, load ggplot2 using `library(ggplot2)` and make a plot of the linear model created in the previous step. Colour the points by species name. 
40 | >
41 | > **HINT** Loading the cowplot package will change the look of the plots to be more suitable for publication.
42 | 
43 | **Next:** [Additional resources](./next_steps.md)
44 | 
45 | 


--------------------------------------------------------------------------------
/plotting.md:
--------------------------------------------------------------------------------
 1 | Plotting
 2 | ========
 3 | 
 4 | Finally, we want to plot our data to summarise the model from the
 5 | previous step.
 6 | [ggplot2](https://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf)
 7 | is designed to work with tidy data formats and is based on the idea of
 8 | the [grammar of
 9 | graphics](https://ramnathv.github.io/pycon2014-r/visualize/ggplot2.html).
10 | This concept makes building up graphs from very simple to complex quite
11 | straightforward by adding additional layers. However, ggplot2 does have
12 | some less than ideal formatting like a grey gridded background. The
13 | cowplot package overrides some of these settings to make publication
14 | quality plots. Cowplot also has some nice functionality for arranging
15 | plots. The [R graphics cookbook](http://www.cookbook-r.com/Graphs/)
16 | provides some helpful tutorials for building up plots using ggplot2.
17 | 
18 | The features that I find most useful in ggplot2 are:
19 | 
20 | -   Build up plots layer-by-layer
21 | -   Can use `facet_wrap()` and `facet_grid()` to create separate plots
22 |     by a factor in the dataframe
23 | 
24 | Let's make a plot of the `mtcars` model from the previous step:
25 | 
26 |     data(mtcars)
27 |     library(ggplot2)
28 | 
29 |     ## Warning: package 'ggplot2' was built under R version 3.3.2
30 | 
31 |     library(cowplot)
32 | 
33 |     ## Warning: package 'cowplot' was built under R version 3.3.2
34 | 
35 |     ## 
36 |     ## Attaching package: 'cowplot'
37 | 
38 |     ## The following object is masked from 'package:ggplot2':
39 |     ## 
40 |     ##     ggsave
41 | 
42 |     p <- ggplot(mtcars, aes(x = wt, y = mpg)) + 
43 |       geom_point() + 
44 |       geom_smooth(method = "lm")
45 | 
46 |     p
47 | 
48 | ![](plotting_files/figure-markdown_strict/unnamed-chunk-1-1.png)
49 | 
50 | We can then use `facet_wrap()` to get a separate plot for each number of
51 | cylinders:
52 | 
53 |     p <- p + facet_wrap(~cyl)
54 | 
55 |     p
56 | 
57 | ![](plotting_files/figure-markdown_strict/unnamed-chunk-2-1.png)
58 | 
59 | The `aes()` part of the call to `ggplot()` allows us to set the
60 | aesthetics of the plot, for example the `colour`, based on variables in
61 | the dataframe.
62 | 
63 | > ### Challenge
64 | >
65 | > In a new code chunk in your R Notebook, load ggplot2 using
66 | > `library(ggplot2)` and make a plot of the linear model created in the
67 | > previous step. Colour the points by species name.
68 | >
69 | > **HINT** Loading the cowplot package will change the look of the plots
70 | > to be more suitable for publication.
71 | 
72 | **Next:** [Additional resources](./next_steps.md)
73 | 


--------------------------------------------------------------------------------
/plotting_files/figure-markdown_strict/unnamed-chunk-1-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BES2016Workshop/reproduciblecodeR/97002f73e7587450107b66216555a08b1e182d9f/plotting_files/figure-markdown_strict/unnamed-chunk-1-1.png


--------------------------------------------------------------------------------
/plotting_files/figure-markdown_strict/unnamed-chunk-2-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BES2016Workshop/reproduciblecodeR/97002f73e7587450107b66216555a08b1e182d9f/plotting_files/figure-markdown_strict/unnamed-chunk-2-1.png


--------------------------------------------------------------------------------
/r_markdown.md:
--------------------------------------------------------------------------------
 1 | # Creating an RMarkdown notebook
 2 | 
 3 | [R Notebooks](http://rmarkdown.rstudio.com/r_notebooks.html) are Markdown documents which allow users to execute chunks of R code independently and interactively while producing publication quality output. They are an example of [literate programming](https://en.wikipedia.org/wiki/Literate_programming).
 4 | 
 5 | Create an R Notebook as follows:
 6 | 
 7 | **File** -> **New File** -> **R Notebook**
 8 | 
 9 | Edit the title of the notebook at the top of the document and try following some of the automatically generated instructions within the notebook. 
10 | 
11 | In the worked example that follows, we can enter R commands into code chunks and make notes using Markdown code in the main part of the document. 
12 | 
13 | Markdown is intended to be as easy-to-read and easy-to-write as possible: [handy guide to Markdown syntax](https://guides.github.com/pdfs/markdown-cheatsheet-online.pdf)
14 | 
15 | **Next:** [Loading Data](./loading_data.md)


--------------------------------------------------------------------------------
/r_project.md:
--------------------------------------------------------------------------------
 1 | # Setting up an R project
 2 | 
 3 | A project is a folder that contains everything concerning your analysis and may include code, data and documentation. It is a complete research object that can be used to describe and reproduce your research.
 4 | 
 5 | Create a new project in RStudio as follows:
 6 | 
 7 | **File** -> **New Project** -> **New Directory**
 8 | 
 9 | ![](./assets/project_screen1.png)
10 | 
11 | In the **Project Type** screen, click on **Empty Project**.
12 | 
13 | ![](./assets/project_screen2.png)
14 | 
15 | In the **Create New Project** screen, give your project a name, set the folder to an appropriate location by clicking browse, and ensure that **create a git repository** is checked. Click on **Create Project**.
16 | 
17 | ![](./assets/project_screen3.png)
18 | 
19 | RStudio will create a new folder containing an empty project and set R's working directory to within it.
20 | 
21 | ![](./assets/project_files.png)
22 | 
23 | Two files are created in the otherwise empty project:
24 | 
25 | * **.gitignore** - Specifies files that should be ignored by the version control system.
26 | * **reproducible_r.Rproj** - Configuration information for the RStudio project
27 | 
28 | There is no need to worry about the contents of either of these for the purposes of this tutorial. Tamora will be covering how to use git for version control in one of the other breakout sessions.
29 | 
30 | **Next:** [Creating an RMarkdown notebook](./r_markdown.md)


--------------------------------------------------------------------------------
/reproduciblecodeR.Rproj:
--------------------------------------------------------------------------------
 1 | Version: 1.0
 2 | 
 3 | RestoreWorkspace: Default
 4 | SaveWorkspace: Default
 5 | AlwaysSaveHistory: Default
 6 | 
 7 | EnableCodeIndexing: Yes
 8 | UseSpacesForTab: Yes
 9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 | 
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 | 


--------------------------------------------------------------------------------
/summarising_data.Rmd:
--------------------------------------------------------------------------------
 1 | ---
 2 | output: md_document
 3 | ---
 4 | # Manipulating and summarising data
 5 | 
 6 | Once we have tidy data, we need to be able to apply data transformation functions to subset, order, summarise and create new variables. tidyr has as its complement the [dplyr](https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html) package.
 7 | 
 8 | dplyr includes the following data transformation functions:
 9 | 
10 | - `select()` subset the columns of the data by selecting variables
11 | - `filter()` subset the rows of the data by a condition
12 | - `group_by()` groups data by one or more variables
13 | - `summarise()` summarise data by functions of choice (e.g. `mean()`, `max()`, `sd()`)
14 | - `arrange()` order data by a variable
15 | - `left_join()`, `inner_join()` and related functions join two dataframes
16 | - `mutate()` create new variables
17 | - `summarise_each()` and `mutate_each()` allow for applying functions to one or more columns
18 | 
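A minimal sketch of a few of these verbs applied one at a time to the built-in `mtcars` data. The object name `four_cyl` and the derived column `km_per_l` are made up for illustration, and the conversion factor is approximate.

```r
library(dplyr)

# filter(): keep only the rows for four-cylinder cars
four_cyl <- filter(mtcars, cyl == 4)

# select(): keep only the columns of interest
four_cyl <- select(four_cyl, mpg, wt)

# mutate(): add a new variable (1 US mpg is roughly 0.425 km per litre)
four_cyl <- mutate(four_cyl, km_per_l = mpg * 0.425)

# arrange(): order rows by a variable, here from lightest to heaviest
four_cyl <- arrange(four_cyl, wt)

head(four_cyl)
```

Each call takes a dataframe as its first argument and returns a dataframe, which is why the result of one step can be handed directly to the next.
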
19 | tidyr and dplyr also include the `%>%` pipe function. This takes the output of the previous command and 'pipes' it as the input into the next command. This is neater than using a nested approach to commands, and removes the need to create intermediate objects.
20 | 
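To make the contrast concrete, here is a small sketch that writes the same calculation twice, once nested and once piped. The object names `nested` and `piped` are chosen here for illustration only.

```r
library(dplyr)

# Nested form: has to be read from the inside out
nested <- summarise(group_by(mtcars, cyl), mean_mpg = mean(mpg))

# Piped form: reads from top to bottom, one step per line
piped <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# Compare the summary column produced by the two approaches
all.equal(nested$mean_mpg, piped$mean_mpg)
```

The pipe passes the left-hand result in as the first argument of the function on its right, so `mtcars %>% group_by(cyl)` is equivalent to `group_by(mtcars, cyl)`.
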
21 | A brief example using the built-in `mtcars` data:
22 | 
23 | ```{r, echo=FALSE}
24 | library(dplyr)
25 | ```
26 | 
27 | ```{r}
28 | data(mtcars)
29 | mtcars_summary <- group_by(mtcars, cyl) %>%
30 |   summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
31 | mtcars_summary
32 | ```
33 | 
34 | Here we have the mean and standard deviation of MPG for each number of cylinders.
35 | 
36 | > ### Challenge
37 | >
38 | > In a new code chunk in your R Notebook, load the dplyr package using `library(dplyr)` and calculate the mean and standard deviation for each of the measured variables, grouped by species. 
39 | >
40 | > **HINT** Use `summarise_each()` rather than multiple calls to `summarise()`.
41 | 
42 | **Next:** [Tidying model output](./tidying_output.md)
43 | 
44 | 


--------------------------------------------------------------------------------
/summarising_data.md:
--------------------------------------------------------------------------------
 1 | Manipulating and summarising data
 2 | =================================
 3 | 
 4 | Once we have tidy data, we need to be able to apply data transformation
 5 | functions to subset, order, summarise and create new variables. tidyr
 6 | has as its complement the
 7 | [dplyr](https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html)
 8 | package.
 9 | 
10 | dplyr includes the following data transformation functions:
11 | 
12 | -   `select()` subsets the columns of the data by selecting variables
13 | -   `filter()` subsets the rows of the data by a condition
14 | -   `group_by()` groups data by one or more variables
15 | -   `summarise()` summarises data using functions of your choice (e.g.
16 |     `mean()`, `max()`, `sd()`)
17 | -   `arrange()` orders data by one or more variables
18 | -   `left_join()`, `inner_join()` and related functions join two data frames
19 | -   `mutate()` creates new variables
20 | -   `summarise_each()` and `mutate_each()` apply functions to one or more
21 |     columns (both are deprecated in later dplyr releases in favour of `across()`)
22 | 
23 | tidyr and dplyr also provide the `%>%` pipe operator. It takes the
24 | output of the previous command and 'pipes' it in as the first argument of
25 | the next command. This is neater than nesting function calls, and
26 | removes the need to create intermediate objects.
27 | 
28 | A brief example using the built-in `mtcars` data:
29 | 
30 |     ## Warning: package 'dplyr' was built under R version 3.3.2
31 | 
32 |     ## 
33 |     ## Attaching package: 'dplyr'
34 | 
35 |     ## The following objects are masked from 'package:stats':
36 |     ## 
37 |     ##     filter, lag
38 | 
39 |     ## The following objects are masked from 'package:base':
40 |     ## 
41 |     ##     intersect, setdiff, setequal, union
42 | 
43 |     data(mtcars)
44 |     mtcars_summary <- group_by(mtcars, cyl) %>%
45 |       summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
46 |     mtcars_summary
47 | 
48 |     ## # A tibble: 3 × 3
49 |     ##     cyl mean_mpg   sd_mpg
50 |     ##   <dbl>    <dbl>    <dbl>
51 |     ## 1     4 26.66364 4.509828
52 |     ## 2     6 19.74286 1.453567
53 |     ## 3     8 15.10000 2.560048
54 | 
55 | This gives the mean and standard deviation of miles per gallon (`mpg`)
56 | for each number of cylinders (`cyl`).
57 | 
58 | > ### Challenge
59 | >
60 | > In a new code chunk in your R Notebook, load the dplyr package using
61 | > `library(dplyr)` and calculate the mean and standard deviation for
62 | > each of the measured variables, grouped by species.
63 | >
64 | > **HINT** Use `summarise_each()` rather than multiple calls to
65 | > `summarise()`.
66 | 
67 | **Next:** [Tidying model output](./tidying_output.md)
68 | 


--------------------------------------------------------------------------------
/tidying_data.Rmd:
--------------------------------------------------------------------------------
 1 | ---
 2 | output: md_document
 3 | ---
 4 | # Tidying Data
 5 | 
 6 | [“Tidy datasets are all alike but every messy dataset is messy in its own way” (Hadley Wickham, 2014)](http://vita.had.co.nz/papers/tidy-data.html)
 7 | 
 8 | Key features of tidy data are:
 9 | 
10 | - Observations in rows
11 | - Variables in columns
12 | - Each type of observational unit is a table
13 | 
14 | ![](./assets/tidy_data.png)
15 | 
16 | Messy data can take many forms. For example:
17 | 
18 | - Column headers are values, not variable names
19 | - Multiple variables stored in one column
20 | - Variables stored in both rows and columns
21 | - Multiple observational unit types in the same table
22 | - Single observational unit in multiple tables
23 | 
24 | Let's explore the data we loaded in the last exercise:
25 | ```{r, echo=FALSE}
26 | iris <- read.csv("iris.csv")
27 | ```
28 | 
29 | ```{r}
30 | # get the structure of the dataframe
31 | str(iris)
32 | 
33 | # head gives us the first 6 rows to explore 
34 | head(iris)
35 | ```
36 | 
37 | Here we have three characteristics of messy data:
38 | 
39 | - Species names (which are values) as column headers
40 | - Multiple variables stored in one column: sample number and measurement type as a compound variable
41 | - Variables (the measurement types) are stored in rows instead of columns
42 | 
43 | The [tidyr](https://blog.rstudio.org/2014/07/22/introducing-tidyr/) package provides functions to fix many of the issues in messy datasets. 
44 | 
45 | - `gather()` takes multiple columns and gathers them into key-value pairs. We can use this to get the species names into rows.
46 | 
47 | - `separate()` takes one column and separates it into multiple columns. We can use this to split the sample number from the measurement type. 
48 | 
49 | - `spread()` takes two columns (a key-value pair) and spreads them into multiple columns. We can use this to get the measurement types to form columns.
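
As a rough sketch of how these three functions chain together, the example below tidies a small made-up data frame. The data, the column names `site`, `plot` and `variable`, and the object names `messy` and `tidied` are all hypothetical. In later tidyr releases `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`, but they remain available.

```r
library(tidyr)

# Hypothetical toy data: site names as column headers, plus a compound
# "plot_variable" column
messy <- data.frame(
  measurement = c("plot1_height", "plot1_width", "plot2_height", "plot2_width"),
  north = c(10, 2, 12, 3),
  south = c(8, 1, 9, 2),
  stringsAsFactors = FALSE
)

tidied <- messy %>%
  # move the site names out of the column headers and into rows
  gather(key = "site", value = "value", north, south) %>%
  # split the compound column at the underscore
  separate(measurement, into = c("plot", "variable"), sep = "_") %>%
  # turn each measurement type into its own column
  spread(key = variable, value = value)

tidied
```

The result has one row per plot and site, with `height` and `width` as separate columns.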
50 | 
51 | Two other useful packages for tidying data are [lubridate](https://cran.r-project.org/web/packages/lubridate/lubridate.pdf) for working with dates and [taxize](https://ropensci.org/tutorials/taxize_tutorial.html) for cleaning taxonomic information.
52 | 
53 | > ### Challenge
54 | >
55 | > In a new code chunk in your R Notebook, load the tidyr package using `library(tidyr)` and use the suggested functions to get the data into tidy data format. 
56 | >
57 | > **HINT** Use `?` to get help on how to use a function (e.g. `?separate`)
58 | 
59 | **Next:** [Manipulating and summarising data](./summarising_data.md)


--------------------------------------------------------------------------------
/tidying_data.md:
--------------------------------------------------------------------------------
 1 | Tidying Data
 2 | ============
 3 | 
 4 | [“Tidy datasets are all alike but every messy dataset is messy in its own
 5 | way” (Hadley Wickham,
 6 | 2014)](http://vita.had.co.nz/papers/tidy-data.html)
 7 | 
 8 | Key features of tidy data are:
 9 | 
10 | -   Observations in rows
11 | -   Variables in columns
12 | -   Each type of observational unit is a table
13 | 
14 | ![](./assets/tidy_data.png)
15 | 
16 | Messy data can take many forms. For example:
17 | 
18 | -   Column headers are values, not variable names
19 | -   Multiple variables stored in one column
20 | -   Variables stored in both rows and columns
21 | -   Multiple observational unit types in the same table
22 | -   Single observational unit in multiple tables
23 | 
24 | Let's explore the data we loaded in the last exercise:
25 | 
26 |     # get the structure of the dataframe
27 |     str(iris)
28 | 
29 |     ## 'data.frame':    200 obs. of  4 variables:
30 |     ##  $ measurement: Factor w/ 200 levels "sample1_Petal.Length",..: 1 2 3 4 5 6 7 8 9 10 ...
31 |     ##  $ setosa     : num  1.4 0.2 5.1 3.5 1.5 0.1 4.9 3.1 1.5 0.2 ...
32 |     ##  $ versicolor : num  4.7 1.4 7 3.2 3.9 1.4 5.2 2.7 3.5 1 ...
33 |     ##  $ virginica  : num  6 2.5 6.3 3.3 6.1 2.5 7.2 3.6 5.1 2 ...
34 | 
35 |     # head gives us the first 6 rows to explore 
36 |     head(iris)
37 | 
38 |     ##             measurement setosa versicolor virginica
39 |     ## 1  sample1_Petal.Length    1.4        4.7       6.0
40 |     ## 2   sample1_Petal.Width    0.2        1.4       2.5
41 |     ## 3  sample1_Sepal.Length    5.1        7.0       6.3
42 |     ## 4   sample1_Sepal.Width    3.5        3.2       3.3
43 |     ## 5 sample10_Petal.Length    1.5        3.9       6.1
44 |     ## 6  sample10_Petal.Width    0.1        1.4       2.5
45 | 
46 | Here we have three characteristics of messy data:
47 | 
48 | -   Species names (which are values) as column headers
49 | -   Multiple variables stored in one column: sample number and
50 |     measurement type as a compound variable
51 | -   Variables (the measurement types) are stored in rows instead of
52 |     columns
53 | 
54 | The [tidyr](https://blog.rstudio.org/2014/07/22/introducing-tidyr/)
55 | package provides functions to fix many of the issues in messy datasets.
56 | 
57 | -   `gather()` takes multiple columns and gathers them into
58 |     key-value pairs. We can use this to get the species names into rows.
59 | 
60 | -   `separate()` takes one column and separates it into multiple columns.
61 |     We can use this to split the sample number from the
62 |     measurement type.
63 | 
64 | -   `spread()` takes two columns (a key-value pair) and spreads them
65 |     into multiple columns. We can use this to get the measurement types
66 |     to form columns.
67 | 
68 | Two other useful packages for tidying data are
69 | [lubridate](https://cran.r-project.org/web/packages/lubridate/lubridate.pdf)
70 | for working with dates and
71 | [taxize](https://ropensci.org/tutorials/taxize_tutorial.html) for
72 | cleaning taxonomic information.
73 | 
74 | > ### Challenge
75 | >
76 | > In a new code chunk in your R Notebook, load the tidyr package using
77 | > `library(tidyr)` and use the suggested functions to get the data into
78 | > tidy data format.
79 | >
80 | > **HINT** Use `?` to get help on how to use a function (e.g.
81 | > `?separate`)
82 | 
83 | **Next:** [Manipulating and summarising data](./summarising_data.md)
84 | 


--------------------------------------------------------------------------------
/tidying_output.Rmd:
--------------------------------------------------------------------------------
 1 | ---
 2 | output: md_document
 3 | ---
 4 | # Tidying model output
 5 | 
 6 | In this section, we will run a basic linear model and programmatically tidy the output from the model. 
 7 | 
 8 | Using the `mtcars` data again, here is an example of the output from a linear model:
 9 | 
10 | ```{r}
11 | data(mtcars)
12 | lmfit <- lm(mpg ~ wt, data = mtcars)
13 | summary(lmfit)
14 | ```
15 | 
16 | While this summary is useful for assessing a single model, it becomes difficult to work with once the number of models starts to increase. This is where the [broom](https://cran.r-project.org/web/packages/broom/vignettes/broom.html) package comes in handy. It provides functions that convert model coefficient estimates, predicted values and residuals, and summary statistics into data frames.
17 | 
18 | ```{r}
19 | library(broom)
20 | # we can view a table of the coefficient estimates and p values
21 | tidy(lmfit)
22 | 
23 | # we can view a table of the fit statistics
24 | glance(lmfit)
25 | ```
26 | 
27 | The functions from the broom package work on most classes of model output. 
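
To illustrate why tidy output helps once there are several models, the sketch below fits the same regression separately for each number of cylinders and stacks the coefficient tables into one data frame. It is one possible approach using base R helpers, not a prescribed workflow; the object names `models`, `coef_list` and `all_coefs` are made up for the example.

```r
library(broom)

# Fit mpg ~ wt separately for each number of cylinders
models <- lapply(split(mtcars, mtcars$cyl), function(d) lm(mpg ~ wt, data = d))

# Tidy each model and label its rows with the group it came from
coef_list <- lapply(names(models), function(cyl) {
  cbind(cyl = cyl, tidy(models[[cyl]]))
})

# Stack the per-model tables into a single data frame
all_coefs <- do.call(rbind, coef_list)
all_coefs
```

Because every model contributes rows with the same columns, the combined table can be filtered, summarised or plotted like any other data frame.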
28 | 
29 | > ### Challenge
30 | >
31 | > In a new code chunk in your R Notebook, load the broom package with `library(broom)`. Then, using the `lm()` and `tidy()` functions, fit a linear model relating petal length to petal width and output the table of coefficients.
32 | 
33 | **Next:** [Plotting](./plotting.md)


--------------------------------------------------------------------------------
/tidying_output.md:
--------------------------------------------------------------------------------
 1 | Tidying model output
 2 | ====================
 3 | 
 4 | In this section, we will run a basic linear model and programmatically
 5 | tidy the output from the model.
 6 | 
 7 | Using the `mtcars` data again, here is an example of the output from a
 8 | linear model:
 9 | 
10 |     data(mtcars)
11 |     lmfit <- lm(mpg ~ wt, data = mtcars)
12 |     summary(lmfit)
13 | 
14 |     ## 
15 |     ## Call:
16 |     ## lm(formula = mpg ~ wt, data = mtcars)
17 |     ## 
18 |     ## Residuals:
19 |     ##     Min      1Q  Median      3Q     Max 
20 |     ## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
21 |     ## 
22 |     ## Coefficients:
23 |     ##             Estimate Std. Error t value Pr(>|t|)    
24 |     ## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
25 |     ## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
26 |     ## ---
27 |     ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
28 |     ## 
29 |     ## Residual standard error: 3.046 on 30 degrees of freedom
30 |     ## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
31 |     ## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
32 | 
33 | While this summary is useful for assessing a single model, it becomes
34 | difficult to work with once the number of models starts to increase.
35 | This is where the
36 | [broom](https://cran.r-project.org/web/packages/broom/vignettes/broom.html)
37 | package comes in handy. It provides functions that convert model
38 | coefficient estimates, predicted values and residuals, and summary
39 | statistics into data frames.
40 | 
41 |     library(broom)
42 |     # we can view a table of the coefficient estimates and p values
43 |     tidy(lmfit)
44 | 
45 |     ##          term  estimate std.error statistic      p.value
46 |     ## 1 (Intercept) 37.285126  1.877627 19.857575 8.241799e-19
47 |     ## 2          wt -5.344472  0.559101 -9.559044 1.293959e-10
48 | 
49 |     # we can view a table of the fit statistics
50 |     glance(lmfit)
51 | 
52 |     ##   r.squared adj.r.squared    sigma statistic      p.value df    logLik
53 |     ## 1 0.7528328     0.7445939 3.045882  91.37533 1.293959e-10  2 -80.01471
54 |     ##        AIC      BIC deviance df.residual
55 |     ## 1 166.0294 170.4266 278.3219          30
56 | 
57 | The functions from the broom package work on most classes of model
58 | output.
59 | 
60 | > ### Challenge
61 | >
62 | > In a new code chunk in your R Notebook, load the broom package with
63 | > `library(broom)`. Then, using the `lm()` and `tidy()` functions, fit a
64 | > linear model relating petal length to petal width and output the table
65 | > of coefficients.
66 | 
67 | **Next:** [Plotting](./plotting.md)
68 | 


--------------------------------------------------------------------------------