├── Data
│   ├── 2
│   │   └── example.csv
│   ├── 3
│   │   ├── data.zip
│   │   └── example.csv
│   ├── 4
│   │   ├── data.zip
│   │   └── example.csv
│   ├── 5
│   │   └── data.zip
│   ├── 6
│   │   └── data.zip
│   └── 7
│       └── data.zip
├── LICENSE
├── README.md
└── R_Markdown
    ├── 1-basics.RMd
    ├── 2-recoding-data.Rmd
    ├── 3-importing-external-data.Rmd
    ├── 4-attribute-joins.Rmd
    ├── 5-basic-maps.Rmd
    ├── 6-basic-spatial-analysis.Rmd
    ├── 7-converting-coordinates.Rmd
    └── common-error-msg.Rmd

/Data/2/example.csv:
--------------------------------------------------------------------------------
Name,Age,Place,School,Degree
John,20,Liverpool,Hillside High School,Geography BA (Hons)
Rachel,21,Norwich,Colman High School,Geography & Archaeology BA (Joint Hons)
Helen,34,Liverpool,Hillside High School,Geography BA (Hons)
Mia,20,Liverpool,Central High School,Geography BA (Hons)
Carl,26,Exeter,Central High School,Geography BSc (Hons)
Kerryn,21,Exeter,Central High School,Geography BSc (Hons)
--------------------------------------------------------------------------------
/Data/3/data.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexsingleton/R-Tutorial-Materials/d5d62b21bee53fc9fceab8c45b240a27560f6c26/Data/3/data.zip
--------------------------------------------------------------------------------
/Data/3/example.csv:
--------------------------------------------------------------------------------
Header text we want to ignore
Name,Age,Place,School
John,20,Liverpool,Hillside High School
Rachel,21,Norwich,Colman High School
Helen,34,Liverpool,Hillside High School

--------------------------------------------------------------------------------
/Data/4/data.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexsingleton/R-Tutorial-Materials/d5d62b21bee53fc9fceab8c45b240a27560f6c26/Data/4/data.zip
--------------------------------------------------------------------------------
/Data/4/example.csv:
--------------------------------------------------------------------------------
Header text we want to ignore
Name,Age,Place,School
John,20,Liverpool,Hillside High School
Rachel,21,Norwich,Colman High School
Helen,34,Liverpool,Hillside High School

--------------------------------------------------------------------------------
/Data/5/data.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexsingleton/R-Tutorial-Materials/d5d62b21bee53fc9fceab8c45b240a27560f6c26/Data/5/data.zip
--------------------------------------------------------------------------------
/Data/6/data.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexsingleton/R-Tutorial-Materials/d5d62b21bee53fc9fceab8c45b240a27560f6c26/Data/6/data.zip
--------------------------------------------------------------------------------
/Data/7/data.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexsingleton/R-Tutorial-Materials/d5d62b21bee53fc9fceab8c45b240a27560f6c26/Data/7/data.zip
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2014 alexsingleton

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
Using R as a GIS
====================

This repository provides the code for a series of R tutorials that illustrate the use of R as a GIS. They are written in R Markdown; however, PDF versions are also available to download [here](http://www.alex-singleton.com/R-Tutorial-Materials/).

--------------------------------------------------------------------------------
/R_Markdown/1-basics.RMd:
--------------------------------------------------------------------------------
```{r set-options, echo=FALSE,comment=NA, cache=FALSE}
options(width=65)
```

1. R Basics - How do I use R?
========

R is not a traditional program like Word, Excel or Chrome - instead of using the mouse to click on menus, with R you type in commands, which it then runs. Initially this might be a bit harder to learn, but it does mean that you can easily rerun the same set of commands without having to remember which menus you have clicked. You also have a record of the work you have done, and both of these are very useful, as you will see later on.

To get started, just click the R icon, and a new window (called the R console) will appear:

![](r-screenshot.jpg)

-----

>> _One key concept you need to know when using R is the working directory. This is the folder where you keep your R files and any other files you happen to be using. In these helpsheets it will usually be set to `M:/R work`._

>> _If you want to save files elsewhere, this is fine (just put in the appropriate file path). To find out where your current working directory is, run `getwd()`. To set your working directory, run `setwd("M:/R work")`. Make sure that the directory exists - otherwise `setwd()` will give an error message._

-----

### R as a calculator

At the most basic level R can be used as a calculator. Try typing the following and then press enter/return.

```{r,results="hide"}
6 + 8
```

This should output:

```{r,echo=FALSE,comment=NA}
6+8
```

Don't worry about the `[1]` for the moment - just note that R printed out `14`, since this is the answer to the sum you typed in. These helpsheets contain a mix of things you should type in (such as `6 + 8` above) and things R will output (`14` above), and each is always shown in the same style.

R uses `*` for multiplication, so for example 5 times 4 is:

```{r,results='hide'}
5 * 4
```

Which outputs:

```{r,echo=FALSE,comment=NA}
5 * 4
```

You also have '`-`' for subtraction and '`/`' for division:

```{r,results='hide'}
15 - 8
```

```{r,echo=FALSE,comment=NA}
15 - 8
```

```{r,results='hide'}
125 / 5
```

```{r,echo=FALSE,comment=NA}
125 / 5
```

R also has functions like square root, sine, cosine and so on.
For example:

```{r,results='hide'}
sqrt(25)
```

```{r,echo=FALSE,comment=NA}
sqrt(25)
```

These expressions can also be combined in a single line:

```{r,results='hide'}
sqrt(5 + 6 * 10 / 4)
```

```{r,echo=FALSE,comment=NA}
sqrt(5 + 6 * 10 / 4)
```

-----

>>_Note: It is always important to remember the order of operations in combined calculations like the one above. For example, `5 + 6 * 10 / 4` is `20` (the multiplication and division are done first), which is not the same as `(5 + 6) * 10 / 4`, which is `27.5`. Put in brackets where they are required (see http://www.mathsisfun.com/operation-order-bodmas.html or http://en.wikipedia.org/wiki/Order\_of\_operations)._

-----

### Variables

You can also assign numbers and results from calculations to variables as follows:

```{r,results='hide'}
price <- 300
```

This has stored the value `300` in the variable `price`. The `<-` symbol puts the value on the right into the variable on the left. It is typed with a `<` followed by a `-`.

We can then do calculations using this variable in the same way as the numbers above. For example, to reduce the `price` by 20% we could do this:

```{r,results='hide'}
price - price * 0.2
```

```{r,echo=FALSE,comment=NA}
price - price * 0.2
```

Or store the discount in a second variable and use both:

```{r,results='hide'}
discount <- price * 0.2
price - discount
```

```{r,echo=FALSE,comment=NA}
discount <- price * 0.2
price - discount
```

-----

>> _Remember that variable names in R are case-sensitive (a bit like passwords). This means that `price` is not the same variable as `Price`. Try it with `discount <- Price * 0.2`. What happens?
It gives you an error message like this:_

>> _`Error: object 'Price' not found`_

>>_This means it can't find the object / variable `Price`._

>>_If you ever want to check which variables are defined in your "workspace", just run `ls()` and R will print a list of the variables you have. `rm(x)`, where `x` is the variable name, will remove that variable - note there is no undo! Try `rm(price)`, which will remove the variable `price`._

-----

R can also work with lists of numbers as well as individual ones (in R these are called vectors). You can specify a list of numbers using the `c()` function. Suppose you have a list of house prices, specified in thousands of pounds. You could store them in a variable called `house.prices` like this:

```{r,results='hide'}
house.prices <- c(120, 150, 212, 99, 199, 299, 159)
house.prices
```

```{r,echo=FALSE,comment=NA}
house.prices <- c(120, 150, 212, 99, 199, 299, 159)
house.prices
```

Variable names can contain full stops, like `house.prices` above; they work in the same way as any other name.

You can apply functions to the list. For example, to take the average of a list, enter:

```{r,results='hide'}
mean(house.prices)
```

```{r,echo=FALSE,comment=NA}
mean(house.prices)
```

Since the house prices are in thousands of pounds, this tells us that the mean house price is about `176.9` thousand pounds. R displays the answer with more significant digits, so you will see `176.8571` as the mean value.

### Data Frames

Data frames are an important component of R and worth spending some time on. They are like a spreadsheet, in that they can have columns of related information.
We are going to create something like this:

House Price | Burglary Rate
--- | ---
200 | 0
130 | 7
200 | 0
200 | 0
... | ...

Enter the following two lists into R by copying and pasting the code:

```{r,results='hide'}
house.prices <- c(200, 130, 200, 200, 180, 140, 65, 220, 180, 200, 210, 170,
180, 160, 180, 130, 240, 180, 170, 230, 150, 200, 200, 210, 220, 180, 200,
210, 150, 200, 230, 120, 180, 180, 190, 72, 80, 190, 220, 150, 200, 170,
170, 230, 200, 160, 140, 100, 140, 170, 180, 260, 170, 230, 190, 220, 140,
220, 120, 96, 210, 170, 180, 140, 150, 67, 200, 230, 140, 230, 83, 170,
200, 210, 240, 180, 200, 210, 250, 140, 130, 190, 110, 160, 150, 230, 160,
210, 200, 230, 210, 190, 120, 180, 87, 160, 190, 190, 230, 180, 110, 200,
250, 180, 200, 130, 180, 190, 190, 230, 210, 210, 150, 190, 210, 200, 210,
170)

burg.rates <- c(0, 7, 0, 0, 6, 19, 32, 0, 0, 0, 15, 6, 12, 8, 7, 6, 0, 0, 6,
0, 7, 0, 0, 0, 0, 0, 0, 0, 17, 0, 0, 21, 7, 12, 7, 36, 18, 0, 0, 7, 6, 0,
0, 0, 0, 0, 13, 22, 0, 0, 0, 7, 12, 7, 5, 11, 0, 0, 13, 13, 0, 6, 15, 6,
17, 37, 0, 6, 6, 5, 24, 0, 0, 0, 0, 0, 0, 0, 5, 15, 0, 5, 6, 0, 0, 0, 13,
0, 6, 0, 0, 0, 23, 6, 13, 15, 6, 0, 0, 7, 7, 0, 0, 0, 0, 19, 13, 0, 0, 0,
6, 9, 0, 0, 0, 0, 0, 5)
```

Now, before we go any further, we need to make sure that all the data have been entered into R correctly. We can see how many items are in each variable using `length(x)`, where `x` is the variable name.

```{r,results='hide'}
length(house.prices)
length(burg.rates)
```

You should get `118` for both. If you don't, try adding the numbers into the variables again.

For any variable, if you just type its name (e.g.
`burg.rates`), R will list all of the values contained within it:

```{r,echo=FALSE,comment=NA}
burg.rates
```

This command shows all of the values, along with some numbers in square brackets - these give the position in the list of the first number on each row. In the example above, the second row begins with `[21]`, which means that the first number on this row (a `7` in this case) is the 21st number in the list. This makes it easier to find the position of values further along the list. How many values fit on each row depends on the width of your console window, so your rows may break at different positions.

-----

>>_A handy hint to remember is that pressing the up arrow key will get R to show the previous command you typed - handy if you want to repeat something, or make a small correction. Pressing up again will take you to earlier commands, and so on. Try this now._

>>_You can also use the `history()` command, which will open a new window showing the commands that you have typed in R. `history()` may not work outside the Windows version of R (for example, on OS X or Ubuntu)._

-----

We can now combine these two lists (`house.prices` and `burg.rates`) into a data frame. You can think of a data frame as a bit like a spreadsheet, where all relevant data are stored together as a set of columns. This is similar to the way data sets are stored in SPSS, where each variable corresponds to a column and each case (or observation) corresponds to a row. However, while SPSS can only have one data set active at a time, in R you can have several of them, similar to multiple sheets in an Excel workbook. These are stored in your workspace.
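
As a minimal sketch of that idea, the snippet below builds two small data frames and keeps both in the workspace at once, much like two sheets in a workbook. The names `north` and `south` are hypothetical, and the values are simply a few of the burglary rates and house prices from the lists above:

```r
# Two small, hypothetical data frames held in the workspace at the same time
north <- data.frame(Burglary = c(0, 7, 0), Price = c(200, 130, 200))
south <- data.frame(Burglary = c(6, 19), Price = c(180, 140))

# Each column is a variable and each row is a case (observation)
nrow(north)   # 3 cases (rows)
ncol(north)   # 2 variables (columns)

# Both objects are listed in the workspace
ls()
```

You can remove these practice objects afterwards with `rm(north, south)`.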

To create a data frame containing the two lists, enter:

```{r,results='hide'}
hp.data <- data.frame(Burglary = burg.rates, Price = house.prices)
```

Then type in its name to list it:

```{r,results='hide'}
hp.data
```

A little bit of explanation: *read this to understand what has just happened!*

The function `data.frame()` takes as arguments all of the variables that you wish to have as columns. `Burglary = burg.rates` creates a column in the data frame called `Burglary`, containing the values from the variable `burg.rates` created in the last section. Similarly, `Price = house.prices` creates a column called `Price`, containing the values from `house.prices`. The new data frame is stored in an object called `hp.data` (an object in R is similar to a variable, but it can hold more complex things, such as data frames, not just a list of values). Typing in the name of a data frame object (once it has been created) lists the values in its columns.

### Summarising Data

With a data frame, we can use the `ncol()` and `nrow()` commands to see how many columns and rows it has. Try this now:

```{r,results='hide'}
ncol(hp.data)
```

```{r,echo=FALSE,comment=NA}
ncol(hp.data)
```

```{r,results='hide'}
nrow(hp.data)
```

```{r,echo=FALSE,comment=NA}
nrow(hp.data)
```

We can also get R to give summary values of the data frame by typing:

```{r,results='hide'}
summary(hp.data)
```

```{r,echo=FALSE,comment=NA}
summary(hp.data)
```

For each column, a number of values are listed:

Item | Description
--- | ---
Min. | The smallest value in the column
1st Qu. | The first quartile (the value ¼ of the way along a sorted list of values)
Median | The median (the value ½ of the way along a sorted list of values)
Mean | The average (arithmetic mean) of the column
3rd Qu. | The third quartile (the value ¾ of the way along a sorted list of values)
Max. | The largest value in the column

Together, these numbers give an impression of the spread of values of each variable. In particular, the median house price by neighbourhood in this area ranges from £65,000 to £260,000, and the middle half of the prices lie between £152,500 and £210,000. Also, since the median burglary rate is zero, at least half of the areas had no burglaries in the month when the counts were compiled.

### Data Subsets

As we saw above, you can get a summary of a data set using `summary()`, which gives some summary statistics for the data set specified between the brackets. As well as printing out the whole data set by typing the object's name (`hp.data`), we can get R to output just the first 6 rows using the `head()` command.

```{r,results='hide'}
head(hp.data)
```

We can also select a specific row and/or column by putting either numbers or column/row names in square brackets `[]`. For example:

```{r,results='hide'}
hp.data[15,2]
```

This prints the value in the 15th row and the 2nd column, `180` in our case. You can also use column/row names, in quote marks:

```{r,results='hide'}
hp.data[15,"Price"]
```

You can also get a range of rows, using a colon:

```{r,results='hide'}
hp.data[10:15,"Price"]
```

This lists the house prices in rows 10 to 15 of the data set.

If you want to specify a full column (i.e.
all of the burglary rates), just leave the part where you would write the row range empty:

```{r,results='hide'}
hp.data[ ,"Burglary"]
```

You can use a similar approach to select a whole row of the data, this time leaving the column part empty:

```{r,results='hide'}
hp.data[12, ]
```

This gives the Burglary and Price (i.e. house price) values for the 12th row in the data frame.

Another way of selecting a column is to use the `$` (dollar) symbol:

```{r,results='hide'}
hp.data$Price
```

This prints the column called `Price`.

### Graphics

R can produce a range of simple graphics to help you understand your data.

Firstly, a histogram:

```{r,results='hide',eval=FALSE}
hist(burg.rates)
```

```{r,results='hide',echo=FALSE,comment=NA,warning=FALSE}
pdf('plot1.pdf',4, 4)
hist(burg.rates)
dev.off()
```

\begin{center}
\includegraphics{plot1.pdf}
\par
\end{center}

A new window will appear containing the histogram, which you can copy and paste into Word, PowerPoint or elsewhere. R will generally draw basic plots unless you tell it otherwise.
To get a histogram with red bars, enter:

```{r,results='hide',eval=FALSE}
hist(burg.rates, col = "red")
```

```{r,results='hide',echo=FALSE,comment=NA,warning=FALSE}
pdf('plot2.pdf', 4, 4)
hist(burg.rates, col = "red")
dev.off()
```

\begin{center}
\includegraphics{plot2.pdf}
\par
\end{center}

To add a title (`main`), an x-axis label (`xlab`) and a y-axis label (`ylab`), use:

```{r,results='hide',eval=FALSE}
hist(burg.rates, col = "red", main = "Burglaries per 1000 households", xlab = "Rate",
ylab = "Frequency")
```

```{r,results='hide',echo=FALSE,comment=NA,warning=FALSE}
pdf('plot3.pdf', 4, 4)
hist(burg.rates, col = "red", main = "Burglaries per 1000 households", xlab = "Rate",
ylab = "Frequency")
dev.off()
```

\begin{center}
\includegraphics{plot3.pdf}
\par
\end{center}

You can also see the relationship between the two variables (median house price and burglary rate) by creating a scatter plot:

```{r,results='hide',eval=FALSE}
plot(burg.rates, house.prices, main = "Burglary vs. House Price",
xlab = "Burglaries (per 1000 households)",
ylab = "Median House Price (1000s Pounds)")
```

```{r,results='hide',echo=FALSE,comment=NA,warning=FALSE}
pdf('plot4.pdf', 4, 4)
plot(burg.rates, house.prices, main = "Burglary vs. House Price",
xlab = "Burglaries (per 1000 households)",
ylab = "Median House Price (1000s Pounds)")
dev.off()
```

\begin{center}
\includegraphics{plot4.pdf}
\par
\end{center}

This shows that there is a relationship between the two quantities, although there is still a fair amount of scatter.
The points show a general tendency for house prices to fall as the burglary rate increases, but there are clearly other factors affecting house prices as well.
--------------------------------------------------------------------------------
/R_Markdown/2-recoding-data.Rmd:
--------------------------------------------------------------------------------
```{r set-options, echo=FALSE,comment=NA, cache=FALSE}
options(width=87)
```

2. Reworking and Recoding Data
=================

Often when working on a project you will have a data set that contains additional information you don't need for your analysis, or attributes that aren't in the form you require. This helpsheet explains how to remove unwanted attributes and add new ones.

For example, let's say we have a data set as follows:

Name | Age | Place | School | Degree
--- | --- | --- | --- | ---
John | 20 | Liverpool | Hillside High School | Geography BA (Hons)
Rachel | 21 | Norwich | Colman High School | Geography & Archaeology BA (Joint Hons)
... | ... | ... | ... | ...

For this exercise we are only interested in people's ages, so we don't need all of the other data.

Before we start, we need to set up the working directory and read in the data:

>>*For more information on working directories, see the worksheet '1. R Basics'.
Remember to create the folder `R work` if it doesn't exist already.*

```{r,echo=FALSE,comment=NA,results='hide'}
setwd("/Users/nickbearman/Dropbox/r-helpsheets/helpsheets/2-recoding-data")
```

```{r,echo=FALSE,comment=NA,results='hide'}
file_location <- "example.csv"
data <- read.csv(file_location, header = TRUE)
```
```{r,eval=FALSE,results='hide'}
# Set working directory
setwd("M:/R work")
# Read data from the web
data <- read.csv("http://data.alex-singleton.com/r-helpsheets/2/example.csv", header = TRUE)
```

We will now display the data to check that it has been read in correctly:

```{r,results='hide'}
data
```

Which should give you this:

```{r,echo=FALSE,comment=NA}
data
```

The `subset()` function can be used to extract just the specified columns (and/or rows) from a data set. For example, to keep only the `Name` and `Age` columns:

```{r,results='hide'}
subset(data, select = c("Name", "Age"))
```

Or, to keep the same columns but only for the people whose `Place` is Liverpool:

```{r,results='hide'}
subset(data, Place == "Liverpool", select = c("Name", "Age"))
```

We can also store this as a new object:

```{r,results='hide'}
data.Liverpool <- subset(data, Place == "Liverpool", select = c("Name", "Age"))
```

Because the statement assigns the output of `subset()` to a new object called `data.Liverpool`, nothing will be printed. To check the result, type `data.Liverpool`:

```{r,echo=FALSE,comment=NA}
data.Liverpool
```

Adding a column to a data frame is done using the `$` symbol. We will initially store `NA` (i.e. no value) in the column.

```{r,results='hide'}
data.Liverpool$diff100 <- NA
```

We can use the same principle to calculate each person's age difference from 100:

```{r,results='hide'}
data.Liverpool$diff100 <- 100 - data.Liverpool$Age
```

Perhaps we decide that we don't like the label of the first column, "`Name`", and that it would be more appropriate to call it "`FirstName`". To make this change, we create a vector containing the column labels that we want:

```{r,results='hide'}
new_column_names <- c("FirstName","Age","diff100")
```

When doing this, it is always a good idea to check that the length of the object we have just created (it should be `3`) is the same as the number of columns in our data frame.

```{r,results='hide'}
length(new_column_names)
```

```{r,results='hide'}
ncol(data.Liverpool)
```

We can then apply the new column names to the data frame:

```{r,results='hide'}
colnames(data.Liverpool) <- new_column_names
```

If you print the data frame now, the names should have changed:

```{r,echo=FALSE,comment=NA}
data.Liverpool
```

Instead of recording people's age in years, perhaps we just need this in two categories - 21 and over, and under 21. We can _recode_ the `Age` variable into a new variable as follows:

```{r,results='hide'}
data.Liverpool$AgeCat[data.Liverpool$Age < 21] <- "Under 21"
data.Liverpool$AgeCat[data.Liverpool$Age >= 21] <- "21 or over"
```

This will create the new variable, `AgeCat`.
To see what has happened to the object, print `data.Liverpool` again:

```{r,echo=FALSE,comment=NA}
data.Liverpool
```

--------------------------------------------------------------------------------
/R_Markdown/3-importing-external-data.Rmd:
--------------------------------------------------------------------------------
3. Importing External Data
===============

Often one of the first steps when doing a project in R is to import some data. This helpsheet will cover reading in a CSV file and a Shapefile. A CSV file is a basic plain-text format for tabular data; a Shapefile is a collection of files that together store geographic features (points, lines or polygons), their associated attribute data and their projection information. Once such files have been read into R, you might need to tidy them up before doing any analysis - see helpsheet "2. Reworking and Recoding Data" for more information.

### CSV Files

CSV (Comma Separated Values) files typically look like this when opened in a text editor:

```
colname1,colname2,....
row1value,row1value,....
row2value,row2value,....
```

Each column is separated by a comma, and each row by a line break. We will now read an example CSV file into a data frame in R.
The file we will use in this exercise is called `example.csv`, and it looks like this:

```
Header text we want to ignore
Name,Age,Place,School
John,20,Liverpool,Hillside High School
Rachel,21,Norwich,Colman High School
Helen,34,Liverpool,Hillside High School
```

To read the file in, run this command:
```{r,echo=FALSE,comment=NA,results='hide'}
setwd("/Users/nickbearman/Dropbox/r-helpsheets/helpsheets/3-importing-external-data")
file_location <- "example.csv"
```

```{r,echo=FALSE,comment=NA,results='hide'}
file_location <- "example.csv"
data <- read.csv(file_location, header = TRUE, skip = 1)
```
```{r,eval=FALSE,results='hide'}
# Set working directory
setwd("M:/R work")
# Read data from the web
file_location <- "http://data.alex-singleton.com/r-helpsheets/3/example.csv"
data <- read.csv(file_location, header = TRUE, skip = 1)
```

To check that it has been read in correctly (always a good idea with R), run:

```{r,results='hide'}
data
```

This should output:

```{r,echo=FALSE,comment=NA}
data
```

Here, the object we created is called `data` and the function that we used is called `read.csv()`, which takes a number of arguments:

1. `file_location` is where the file is stored: either a path relative to your working directory (see helpsheet "1. R Basics" for more details) or, as here, a web address.

2. `header = TRUE` tells R that the first line it reads contains header information (column names), in this case `Name`, `Age`, `Place` and `School`.

3. `skip = 1` tells R to ignore the first line of the CSV file (`Header text we want to ignore`), as we don't want this in the data set. R is case-sensitive, so these argument names must be written in lower case.
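
If you want to experiment with `skip` and `header` without downloading anything, one option is to pass the same lines to `read.csv()` as a string through its `text` argument. This is a minimal sketch, and the variable name `csv_text` is just for illustration:

```r
# The contents of example.csv, typed in as a single string
csv_text <- "Header text we want to ignore
Name,Age,Place,School
John,20,Liverpool,Hillside High School
Rachel,21,Norwich,Colman High School
Helen,34,Liverpool,Hillside High School"

# skip = 1 drops the first line; header = TRUE then uses the next line as column names
data <- read.csv(text = csv_text, header = TRUE, skip = 1)

colnames(data)  # "Name" "Age" "Place" "School"
nrow(data)      # 3
```

Without `skip = 1`, R would try to use `Header text we want to ignore` as the line of column names, which is why the argument is needed for this file.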

We can now look at the object `data` in the normal way and, for example, check the column names using:

```{r,results='hide'}
colnames(data)
```

Which should output:

```{r,echo=FALSE,comment=NA}
colnames(data)
```

If you want to rename columns or "recode" the attributes of your data, see helpsheet "2. Reworking and Recoding Data".

### Shapefiles

Shapefiles contain geographic data that we can also read into R, but to do this R needs some additional packages. These are already installed, but just need to be loaded.

>> _Note: `rgeos`, `maptools` and `rgdal` have since been retired from CRAN (in 2023), so they may not be available for recent versions of R. The `sf` package provides equivalent functionality, such as `sf::st_read()` for reading Shapefiles._

To do this, run these commands:

```{r,results='hide',message=FALSE}
library(sp)
library(rgeos)
library(maptools)
library(RColorBrewer)
library(GISTools)
library(rgdal)
```

When you load each package, R may write some output to the console. Check for any error messages, and if everything seems to have worked, continue to the next section.

We can read in a Shapefile and then display it in R.

```{r,echo=FALSE,comment=NA,results='hide'}
setwd("/Users/nickbearman/Dropbox/r-helpsheets/helpsheets/3-importing-external-data")
```

```{r,eval=FALSE,results='hide'}
# Set working directory
setwd("M:/R work")
# Download data.zip from the web
download.file("http://data.alex-singleton.com/r-helpsheets/3/data.zip", "data.zip")
# Unzip file
unzip("data.zip")
```
```{r,results='hide'}
# Read in Shapefile
Wards <- readOGR(".", "england-caswa_2001")
```

```{r,eval=FALSE,highlight=TRUE}
plot(Wards)
```

```{r,results='hide',echo=FALSE,comment=NA,warning=FALSE}
pdf('plot1.pdf', 4, 4)
plot(Wards)
dev.off()
```

![Image](plot1.pdf)\


The object `Wards` now contains the contents of the Shapefile: both the geometry and the attribute data.
`readOGR()` has created a new type of object called a `SpatialPolygonsDataFrame`. If the Shapefile had contained lines (e.g. roads), this would be a `SpatialLinesDataFrame`; if it had contained points, a `SpatialPointsDataFrame`. These object types contain the spatial information (e.g. the boundary locations) as well as attribute data for each of the spatial features (e.g. each ward). A `SpatialPolygonsDataFrame` contains a number of different 'slots', each of which holds different information. Use the `slotNames()` function to get a list of the slots:

```{r,results='hide'}
slotNames(Wards)
```

```{r,echo=FALSE,comment=NA}
slotNames(Wards)
```

The slot `data` contains the attribute information for the Shapefile, and it is accessed using the `@` symbol:

```{r,results='hide'}
head(Wards@data)
```

```{r,echo=FALSE,comment=NA}
head(Wards@data)
```

The `data` slot is a standard data frame, so it can be used in the same way as any other data frame.

--------------------------------------------------------------------------------
/R_Markdown/4-attribute-joins.Rmd:
--------------------------------------------------------------------------------
4. Joining Data
=====

When doing spatial data analysis, it is quite common to need to merge different data sets together. There are two main ways of doing this: firstly, using the `merge()` function, which matches attribute data in data frames; and secondly, using the match technique, which works with attribute data in Shapefiles.

### Merging Attribute Data in Data Frames

The `merge()` function allows us to take two data sets and combine them into one, based on a common variable.
To test this, import the following data by running these commands: 9 | 10 | ```{r,echo=FALSE,comment=NA,results='hide'} 11 | setwd("/Users/nickbearman/Dropbox/r-helpsheets/helpsheets/4-attribute-joins") 12 | file_location <- "example.csv" 13 | ``` 14 | 19 | ```{r,echo=FALSE,comment=NA,results='hide'} 20 | file_location <- "example.csv" 21 | data <- read.csv(file_location, header = TRUE, skip = 1) 22 | ``` 23 | ```{r,eval=FALSE,results='hide'} 24 | # Set working directory 25 | setwd("M:/R work") 26 | # Read data from the web, skipping the first line of header text 27 | data <- read.csv("http://data.alex-singleton.com/r-helpsheets/4/example.csv", header = TRUE, skip = 1) 28 | ``` 29 | 30 | ```{r,results='hide',echo=FALSE,comment=NA} 31 | data <- read.csv(file_location, header = TRUE, skip = 1) 32 | ``` 33 | 34 | To check that it has imported correctly (which is always a good idea), run: 35 | 36 | ```{r,results='hide'} 37 | data 38 | ``` 39 | 40 | Which should output: 41 | 42 | ```{r,echo=FALSE,comment=NA} 43 | data 44 | ``` 45 | 46 | You now need to create another data frame to use in the example. You could create another csv file and import this; however, we will illustrate another way of achieving this by joining a series of vectors.
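As a preview of the join carried out below, here is a minimal sketch on two small, hypothetical data frames (`people` and `favourites` are illustrative names, not part of the helpsheet data). Each data frame is built by joining vectors of equal length. Because the key column is called `Name` in both, a single `by = "Name"` is enough, and `all.x = TRUE` keeps every row of the first data frame, so a person with no match gets `NA`. `stringsAsFactors = FALSE` keeps the names as plain text, which helps make sure the key columns are in the same format.

```r
# Two small, hypothetical data frames, each built from vectors of equal length
people <- data.frame(Name = c("John", "Rachel", "Helen"),
                     Place = c("Liverpool", "Norwich", "Liverpool"),
                     stringsAsFactors = FALSE)
favourites <- data.frame(Name = c("Helen", "Mia", "John"),
                         Function = c("ncol()", "length()", "colnames()"),
                         stringsAsFactors = FALSE)

# The key column has the same name in both, so one 'by' argument is enough.
# all.x = TRUE keeps every row of 'people'; Rachel has no match, so she gets NA.
joined <- merge(people, favourites, by = "Name", all.x = TRUE)
joined
```

Dropping `all.x = TRUE` would keep only the people found in both data frames (here John and Helen), and Mia is never included because she is not in `people`.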
47 | 48 | ```{r,results='hide'} 49 | # Create a person vector 50 | Person <- c("Paul", "Mike", "John", "Helen", "Mia", "Leo", "Rachel") 51 | # Create a favourite functions vector 52 | Function <- c("merge()", "read.csv()", "colnames()", "ncol()", "length()", "getwd()", "save.image()") 53 | # We can now join these two vectors into a new data frame of favourite functions 54 | fav_fun <- data.frame(Person, Function) 55 | # View fav_fun 56 | fav_fun 57 | ``` 58 | Which should look like this: 59 | 60 | ```{r,echo=FALSE,comment=NA} 61 | fav_fun 62 | ``` 63 | 64 | We now have two data sets: `data`, which contains a list of people, places and schools, and `fav_fun`, which contains a list including those people as well as additional people who have attended R workshops. 65 | 66 | The next step is to combine the two. What we are going to do is select the people in the `fav_fun` data frame who also appear in the `data` data frame, and copy their favourite R function into a new data frame, along with all the information from `data`. 67 | 68 | We will refer to the two data frames as `x` and `y`. The `x` data frame is `data`, and `y` is `fav_fun`. In `x`, the column containing the list of people is called "Name", and in `y`, it is called "Person". The `merge()` function first takes the two data frames, and then the lookup columns as `by.x` and `by.y`. You should also include `all.x = TRUE` as a final parameter. This tells the function to keep all the records in `x`, but only those in `y` that match; anyone in `x` without a match in `y` gets `NA` in the new column. 69 | 70 | ```{r,results='hide'} 71 | People_And_Functions <- merge(data, fav_fun, by.x = "Name", by.y = "Person", all.x = TRUE) 72 | ``` 73 | 74 | To see what this command has done, type `People_And_Functions` to show the content of the new data frame. This should look like: 75 | 76 | ```{r,echo=FALSE,comment=NA} 77 | People_And_Functions 78 | ``` 79 | 80 | If the `by` columns had the same name in both `x` and `y` (e.g.
both called "Name"), we could specify this more simply with `by = "Name"` rather than `by.x` and `by.y`. Finally, a critical issue when making any join is ensuring that the `by` columns are in the same format (e.g. both text or both numbers, with identical spelling and spacing). 81 | 82 | ### Match data in a Shapefile 83 | 84 | The `match()` function works in a similar way to `merge()`, but can be used to append attribute data to a shape file; `merge()` will often cause errors when working with spatial data frames. 85 | 86 | Load the required packages and example shapefile from helpsheet "3. Importing External Data". 87 | 88 | ```{r,results='hide',message=FALSE} 89 | library(rgdal) 90 | ``` 91 | ```{r,echo=FALSE,comment=NA,results='hide'} 92 | setwd("/Users/nickbearman/Dropbox/r-helpsheets/helpsheets/4-attribute-joins") 93 | file_location <- "example.csv" 94 | ``` 95 | 100 | ```{r,eval=FALSE,results='hide'} 101 | # Set working directory 102 | setwd("M:/R work") 103 | # Download data.zip from the web 104 | download.file("http://data.alex-singleton.com/r-helpsheets/4/data.zip", "data.zip") 105 | # Unzip file 106 | unzip("data.zip") 107 | ``` 108 | ```{r,results='hide'} 109 | # Read in shape file 110 | Wards <- readOGR(".", "england_caswa_2001") 111 | ``` 112 | 113 | ```{r,eval=FALSE,results='hide'} 114 | # Plot Wards to check it has been imported correctly 115 | plot(Wards) 116 | ``` 117 | 118 | ```{r,results='hide',echo=FALSE,comment=NA,warning=FALSE} 119 | pdf('plot1.pdf', 4, 4) 120 | plot(Wards) 121 | dev.off() 122 | ``` 123 | 124 | ![Image](plot1.pdf)\ 125 | 126 | 127 | We now have the content of the `Wards` shapefile in R.
Have a look at the content of the data in the data slot: 128 | 129 | ```{r,results='hide'} 130 | head(Wards@data) 131 | ``` 132 | 133 | ```{r,echo=FALSE,comment=NA} 134 | head(Wards@data) 135 | ``` 136 | 137 | We are now going to append the following data onto it, which are index scores for the rate of diabetes prevalence: 138 | 139 | Ward | Rate 140 | --- | --- 141 | 00BYGC | 50 142 | 00BYFN | 198 143 | 00BYFU | 56 144 | 00BYFC | 78 145 | 00BYFG | 123 146 | 00BYFS | 21 147 | 148 | Run this code to create this data frame: 149 | 150 | ```{r,results='hide'} 151 | # Create a vector of ONS ward codes 152 | Ward <- c("00BYGC", "00BYFN", "00BYFU", "00BYFC", "00BYFG", "00BYFS") 153 | # Create a rate vector 154 | Rate <- c(50, 198, 56, 78, 123, 21) 155 | # We can now join these two vectors into a new data frame of wards_diabetes 156 | wards_diabetes <- data.frame(Ward, Rate) 157 | # View wards_diabetes 158 | wards_diabetes 159 | ``` 160 | This should look like: 161 | 162 | ```{r,echo=FALSE,comment=NA} 163 | wards_diabetes 164 | ``` 165 | 166 | We can then use the `match()` function to append these diabetes rates on to `Wards@data`, by matching the `Ward` column from the `wards_diabetes` data frame to the `ons_label` column in the data slot of the `Wards` SpatialPolygonsDataFrame. `match()` returns, for each row of `Wards@data`, the row number of the matching ward in `wards_diabetes` (or `NA` if there is no match), so the diabetes rows are put in the same order as the wards before they are added as new columns. 167 | 168 | ```{r,results='hide'} 169 | Wards@data <- data.frame(Wards@data, wards_diabetes[match(Wards@data[, "ons_label"], wards_diabetes[, "Ward"]), ]) 170 | ``` 171 | 172 | And to check, run: 173 | 174 | ```{r,results='hide'} 175 | head(Wards@data) 176 | ``` 177 | 178 | ```{r,echo=FALSE,comment=NA} 179 | head(Wards@data) 180 | ``` 181 | 182 | We have now appended the data, but the ward code is now listed twice (in `ons_label` and `Ward`).
To remove the duplicate column, run: 183 | 184 | ```{r,results='hide'} 185 | Wards@data$Ward <- NULL 186 | head(Wards@data) 187 | ``` 188 | 189 | Which changes `Wards@data` to: 190 | 191 | ```{r,echo=FALSE,comment=NA} 192 | head(Wards@data) 193 | ``` 194 | -------------------------------------------------------------------------------- /R_Markdown/5-basic-maps.Rmd: -------------------------------------------------------------------------------- 1 | # 5. Basic Maps 2 | 3 | This helpsheet shows you how to make a simple map using the `GISTools` package. 4 | 5 | To start with, we need to load the `GISTools` package as well as some other packages we need: 6 | 7 | ```{r,results='hide', message=FALSE} 8 | library(rgdal) 9 | library(GISTools) 10 | library(RColorBrewer) 11 | ``` 12 | 13 | We also need to load a data set, which in this example relates to Lower Layer Super Output Area (LSOA) zones within Liverpool, and also an outline of England. The following commands will set your working directory, download, unzip and load the data files. 14 | 15 | ```{r,echo=FALSE,comment=NA,results='hide'} 16 | setwd("/Users/nickbearman/Dropbox/r-helpsheets/helpsheets/5-basic-maps") 17 | ``` 18 | 23 | ```{r,eval=FALSE,results='hide'} 24 | # Set working directory 25 | setwd("M:/R work") 26 | # Download data.zip from the web 27 | download.file("http://data.alex-singleton.com/r-helpsheets/5/data.zip", "data.zip") 28 | # Unzip file 29 | unzip("data.zip") 30 | ``` 31 | ```{r,results='hide'} 32 | # Read in both shapefiles 33 | LSOA <- readOGR(".", "england_LSOA_2011_dwelling_count") 34 | outline <- readOGR(".", "England_ol_2011_gen_clipped") 35 | ``` 36 | 37 | We can do a very basic plot of the map using: 38 | 39 | ```{r,eval=FALSE,highlight=TRUE} 40 | plot(LSOA) 41 | ``` 42 | 43 | This gives us a map showing just the boundaries of the LSOAs.
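The maps in the rest of this helpsheet pass colours to `plot()` either as names (such as `"red"`) or as HEX codes (such as `"#2C7FB820"`). As a quick sketch, base R's `col2rgb()` shows how an eight-digit code splits into red, green, blue and alpha (opacity) values, each from 0 to 255, and `adjustcolor()` can build a code for a chosen opacity. The colour values come from this helpsheet; the 20% opacity target is only an example.

```r
# Split an 8-digit HEX colour into red, green, blue and alpha values (0-255 each)
col2rgb("#2C7FB820", alpha = TRUE)

# The last two digits, "20", are hexadecimal: 2 * 16 + 0 = 32 out of 255
alpha_value <- strtoi("20", base = 16)
alpha_value / 255   # about 0.125, i.e. roughly 12.5% opaque

# To ask for a given opacity instead, scale the alpha of an opaque colour;
# alpha.f = 0.2 gives 20% opacity, which is "33" in hexadecimal
blue_20 <- adjustcolor("#2C7FB8", alpha.f = 0.2)
blue_20
```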
44 | 45 | 46 | 47 | ```{r,results='hide',echo=FALSE,comment=NA,warning=FALSE,fig.cap = "Map of LSOAs in Liverpool"} 48 | pdf('plot1.pdf', 5, 5) 49 | plot(LSOA) 50 | dev.off() 51 | ``` 52 | 53 | ![Image](plot1.pdf)\ 54 | 55 | We can also plot an outline of England in a similar way. 56 | 57 | ```{r,eval=FALSE,highlight=TRUE} 58 | plot(outline) 59 | ``` 60 | 61 | This replaces the first map, but we can get R to overlay one on top of the other by adding the argument `add = TRUE` to the second `plot()` command. The order of plots is key here - R will keep the scale and extent of the first map. We can also set the border colour to red (`border="red"`) and the fill colour to a shade of blue (`col="#2C7FB820"`). These represent two ways of specifying colours: by name, or as a HEX code. The HEX code here has eight characters after the `#`: the first six are a standard HEX colour code (two characters each for red, green and blue), and the final two set the opacity (alpha). `20` in hexadecimal is 32 out of 255, so the fill is about 12.5% opaque (mostly transparent). To view various colours that can be used in R, have a look at the website http://research.stowers-institute.org/efg/R/Color/Chart/ColorChart.pdf. _Sometimes when running R in Windows, the transparency option will not work - it will just fill it with a solid colour.
In this case, remove the `col = "#2C7FB820"` argument from the plot command to draw just a red outline._ 62 | 63 | 64 | ```{r,eval=FALSE,highlight=TRUE} 65 | # Plot the LSOA Map 66 | plot(LSOA) 67 | # Overplot the outline map 68 | plot(outline, add = TRUE, border = "red", col = "#2C7FB820") 69 | ``` 70 | 71 | ```{r,results='hide',echo=FALSE,comment=NA} 72 | pdf('plot2.pdf', 5, 5) 73 | plot(LSOA) 74 | plot(outline, add = TRUE, border = "red", col = "#2C7FB820") 75 | dev.off() 76 | ``` 77 | 78 | \begin{center} 79 | \includegraphics{plot2.pdf} 80 | \par 81 | \end{center} 82 | 83 | The `LSOA` object contains some more information, which we can see by looking in the data slot of the object: 84 | 85 | ```{r,results='hide'} 86 | head(LSOA@data) 87 | ``` 88 | 89 | ```{r,echo=FALSE,comment=NA} 90 | head(LSOA@data) 91 | ``` 92 | 93 | This shows us that the shape file contains a field called `COUNT_DWEL` (Shapefile field names are limited to 10 characters), which contains the number of dwellings in each LSOA. We can use this to create a choropleth map with: 94 | 95 | ```{r,eval=FALSE,highlight=TRUE} 96 | choropleth(LSOA, LSOA$COUNT_DWEL) 97 | ``` 98 | 99 | ```{r,results='hide',echo=FALSE,comment=NA} 100 | pdf('plot3.pdf', 5, 5) 101 | choropleth(LSOA, LSOA$COUNT_DWEL) 102 | dev.off() 103 | ``` 104 | 105 | \begin{center} 106 | \includegraphics{plot3.pdf} 107 | \par 108 | \end{center} 109 | 110 | This map is OK, but we can easily make it more effective with a few extra commands. The new commands include: 111 | 112 | 1. `brewer.pal`, which returns a set of colours from a range of pre-set palettes that look good on maps. In this case, we are getting `5` colours from the `"Blues"` palette. For more information on the R command, type `?brewer.pal` into R; for more information on the concept, see http://colorbrewer.org. 113 | 114 | 1.
`auto.shading`, which divides the data we want to show on the map (in this case, `LSOA$COUNT_DWEL`) into the specified number of classes (`5`), coloured with the specified colours (`cols = brewer.pal(5, "Blues")`). 115 | 116 | 1. `choro.legend` and `north.arrow` both take a pair of map coordinates (here an Easting and a Northing, e.g. `331089, 384493`) as their first two parameters. These set where each item is placed on the map. You may have to fiddle with these to get the spacing correct (see note below). 117 | 118 | Run the commands below in R, and read the notes after the code for more information. 119 | 120 | ```{r,eval=FALSE,highlight=TRUE} 121 | # Set colour and number of classes 122 | shades <- auto.shading(LSOA$COUNT_DWEL, n = 5, cols = brewer.pal(5, "Blues")) 123 | # Draw the map 124 | choropleth(LSOA, LSOA$COUNT_DWEL, shades) 125 | # Add a legend 126 | choro.legend(331089, 384493, shades, fmt = "%g", title = "Count of Dwellings") 127 | # Add a title to the map 128 | title("Count of Dwellings by LSOA, 2011") 129 | # Add a north arrow 130 | north.arrow(332308, 387467, 300) 131 | # Draw a box around the map 132 | box(which = "outer") 133 | ``` 134 | 135 | ```{r,results='hide',echo=FALSE,comment=NA} 136 | pdf('plot4.pdf', 7, 7) 137 | # Set colour and number of classes 138 | shades <- auto.shading(LSOA$COUNT_DWEL, n = 5, cols = brewer.pal(5, "Blues")) 139 | # Draw the map 140 | choropleth(LSOA, LSOA$COUNT_DWEL, shades) 141 | # Add a legend 142 | choro.legend(328089, 384493, shades, fmt = "%g", title = "Count of Dwellings") 143 | # Add a title to the map 144 | title("Count of Dwellings by LSOA, 2011") 145 | # Add a north arrow 146 | north.arrow(332308, 387467, 300) 147 | # Draw a box around the map 148 | box(which = "outer") 149 | dev.off() 150 | ``` 151 | 152 | See the next page for the map. 153 | 154 | You may need to adjust the location or size of the legend to get it to fit onto your plot correctly.
To find a new set of location coordinates, type `locator()` into the console and press Enter. After doing this, when you hover over the plot, the mouse pointer will turn into a cross. If you click on the map, and then right-click and choose 'Stop', the coordinates of each click are printed to the console - you can use these to re-position items in the plot. 155 | 156 | To change the size of the legend, use the `cex` argument. Update the `choro.legend` line to read `choro.legend(328089, 384493, shades, fmt = "%g", title = "Count of Dwellings", cex = 1.1)` and see what happens. The `cex` value is a multiplier which increases or decreases the size of the legend. Experiment with this until you find something that works well. 157 | 158 | For more information on the `GISTools` package, have a look at http://cran.r-project.org/web/packages/GISTools/GISTools.pdf. 159 | 160 | \begin{center} 161 | \includegraphics{plot4.pdf} 162 | \par 163 | \end{center} 164 | 165 | -------------------------------------------------------------------------------- /R_Markdown/6-basic-spatial-analysis.Rmd: -------------------------------------------------------------------------------- 1 | # 6. Basic Spatial Analysis 2 | 3 | This helpsheet will explore a variety of basic spatial analysis techniques, including *clipping*, *point in polygon* and *buffering*. 4 | 5 | ### Clipping 6 | 7 | Clipping allows us to use one set of boundaries to cut another, a bit like using a cookie cutter.
8 | 9 | ```{r,results='hide',message=FALSE} 10 | # Load the Libraries 11 | library(rgdal) 12 | library(maptools) 13 | library(rgeos) 14 | library(stringr) 15 | ``` 16 | 17 | ```{r,echo=FALSE,results='hide'} 18 | setwd("/Users/nickbearman/Dropbox/r-helpsheets/helpsheets/6-basic-spatial-analysis") 19 | ``` 20 | 25 | ```{r,eval=FALSE,results='hide'} 26 | # Set working directory 27 | setwd("M:/R work") 28 | # Download data.zip from the web 29 | download.file("http://data.alex-singleton.com/r-helpsheets/6/data.zip", "data.zip") 30 | # Unzip file 31 | unzip("data.zip") 32 | ``` 33 | ```{r,results='hide'} 34 | # Read in both shape files 35 | LSOA <- readOGR(".", "england_LSOA_2011") 36 | outline <- readOGR(".", "England-outline") 37 | ``` 38 | 39 | First of all, we can plot the LSOA zones in Liverpool. 40 | 41 | ```{r,results='hide',eval=FALSE} 42 | # Plot the LSOA Map 43 | plot(LSOA) 44 | ``` 45 | 46 | ```{r,results='hide',echo=FALSE,warning=FALSE} 47 | pdf('plot1.pdf', 5, 5) 48 | plot(LSOA) 49 | dev.off() 50 | ``` 51 | 52 | ![Image](plot1.pdf)\ 53 | 54 | 55 | 56 | We can also plot the England outline, but if we just run `plot(outline)` it will replace the LSOA plot on the display. To add the `outline` layer to the existing plot window, run the code below, which uses `add = TRUE`. It also sets the border colour to red (`border="red"`) and the fill colour to a shade of blue (`col="#2C7FB820"`). These represent two ways of specifying colours: by name, or as a HEX code. The HEX code here has eight characters after the `#`: the first six are a standard HEX colour code (two characters each for red, green and blue), and the final two set the opacity (alpha). `20` in hexadecimal is 32 out of 255, so the fill is about 12.5% opaque (mostly transparent). To view various colours that can be used in R, have a look at the website http://research.stowers-institute.org/efg/R/Color/Chart/ColorChart.pdf. _Sometimes when running R in Windows, the transparency option will not work - it will just fill it with a solid colour.
In this case, remove the `col = "#2C7FB820"` argument from the plot command to draw just a red outline._ 57 | 58 | ```{r,results='hide',eval=FALSE} 59 | # Overplot the outline map 60 | plot(outline, add = TRUE, border = "red", col = "#2C7FB820") 61 | ``` 62 | 63 | ```{r,results='hide',echo=FALSE,warning=FALSE} 64 | pdf('plot2.pdf', 5, 5) 65 | plot(LSOA) 66 | plot(outline, add = TRUE, border = "red", col = "#2C7FB820") 67 | dev.off() 68 | ``` 69 | 70 | \begin{center} 71 | \includegraphics{plot2.pdf} 72 | \par 73 | \end{center} 74 | 75 | As you can see, the LSOA boundaries cross the River Mersey and stop at the river centre line. This doesn't look very nice, so we can tidy this up by getting R to clip the LSOA boundaries where they cross the England outline border. 76 | 77 | We do this using the `gIntersection` command, passing it the two layers (`LSOA` and `outline`). Specifying `byid = TRUE` tells R to intersect each LSOA separately, rather than treating them all as one shape, and `id = my_area_id` labels each clipped polygon with its LSOA zone code. Be aware that the `gIntersection` command may take up to 90 seconds to run - do not worry if your computer appears to freeze. Just wait for the command to complete.
78 | 79 | ```{r,results='hide',eval=FALSE} 80 | # Use the LSOA zone codes as IDs for the clipped polygons 81 | my_area_id <- as.character(LSOA@data$ZONECODE) 82 | # Run the intersection, saving the output to clipLSOA; this may take up to 90 seconds 83 | clipLSOA <- gIntersection(LSOA, outline, byid = TRUE, id = my_area_id) 84 | # Replot the map as above to see what we have done 85 | plot(clipLSOA) 86 | plot(outline, add = TRUE, border = "red", col = "#2C7FB820") 87 | ``` 88 | 89 | ```{r,results='hide',echo=FALSE,warning=FALSE} 90 | pdf('plot3.pdf', 5, 5) 91 | # Use the LSOA zone codes as IDs for the clipped polygons 92 | my_area_id <- as.character(LSOA@data$ZONECODE) 93 | # Run the intersection, saving the output to clipLSOA; this may take up to 90 seconds 94 | clipLSOA <- gIntersection(LSOA, outline, byid = TRUE, id = my_area_id) 95 | # Replot the map as above to see what we have done 96 | plot(clipLSOA) 97 | plot(outline, add = TRUE, border = "red", col = "#2C7FB820") 98 | dev.off() 99 | ``` 100 | 101 | \begin{center} 102 | \includegraphics{plot3.pdf} 103 | \par 104 | \end{center} 105 | 106 | We have now removed the parts of the LSOAs that fall outside the England outline (such as those in the River Mersey), and the map looks much more attractive. 107 | 108 | ### Point in Polygon Analysis 109 | 110 | Point in polygon analysis is useful when you want to create a subset of points from a larger set based on their spatial location. In this example we will load a list of locations that relate to all doctors' surgeries in England, and use the polygons of ward boundaries in Leeds to create a subset of the Leeds doctors' surgeries. To begin with, we need to load the libraries and get the GP and Wards data.
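The helpsheet uses `over()` for this step, which does the geometry for us. To show the idea behind a point in polygon test, here is a minimal sketch of the even-odd (ray casting) rule in base R: imagine a line running from each point to the right; if it crosses the polygon boundary an odd number of times, the point is inside. The function name `point_in_polygon()` and the square "ward" and "surgery" coordinates are hypothetical, this is not how `over()` is implemented internally, and points lying exactly on a boundary are not handled consistently.

```r
# Even-odd (ray casting) point in polygon test, written in base R for illustration.
# px, py: coordinates of one point; vx, vy: polygon vertices in order (not closed).
point_in_polygon <- function(px, py, vx, vy) {
  inside <- FALSE
  n <- length(vx)
  j <- n
  for (i in seq_len(n)) {
    # Does the edge from vertex j to vertex i straddle the horizontal line y = py?
    if ((vy[i] > py) != (vy[j] > py)) {
      # x coordinate where that edge crosses y = py
      x_cross <- vx[j] + (py - vy[j]) * (vx[i] - vx[j]) / (vy[i] - vy[j])
      # Each crossing to the right of the point flips inside/outside
      if (px < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# A hypothetical square "ward" and three hypothetical "surgeries"
ward_x <- c(0, 10, 10, 0)
ward_y <- c(0, 0, 10, 10)
surgeries <- data.frame(x = c(5, 15, 2), y = c(5, 5, 9))
surgeries$in_ward <- mapply(point_in_polygon, surgeries$x, surgeries$y,
                            MoreArgs = list(vx = ward_x, vy = ward_y))

# Keep only the surgeries inside the ward, as the Leeds subset does below
surgeries[surgeries$in_ward, ]
```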
111 | 112 | ```{r,results='hide'} 113 | # Load the Libraries 114 | library(maptools) 115 | library(rgeos) 116 | ``` 117 | 118 | ```{r,echo=FALSE,results='hide'} 119 | setwd("/Users/nickbearman/Dropbox/r-helpsheets/helpsheets/6-basic-spatial-analysis") 120 | ``` 121 | 126 | ```{r,eval=FALSE,results='hide'} 127 | # Set working directory 128 | setwd("M:/R work") 129 | # Download data.zip from the web 130 | download.file("http://data.alex-singleton.com/r-helpsheets/6/data.zip", "data.zip") 131 | # Unzip file 132 | unzip("data.zip") 133 | ``` 134 | ```{r,results='hide'} 135 | # Read in shapefile 136 | Wards <- readShapeSpatial("CAS-leeds", proj4string = CRS("+init=epsg:27700")) 137 | ``` 138 | 139 | It's worth having a quick look at the Leeds data so we know what it looks like: 140 | 141 | ```{r,eval=FALSE,results='hide'} 142 | # Plot Wards to check it has been read in correctly 143 | plot(Wards) 144 | ``` 145 | 146 | ```{r,echo=FALSE,results='hide',warning=FALSE} 147 | pdf('plot4.pdf', 5, 5) 148 | plot(Wards) 149 | dev.off() 150 | ``` 151 | 152 | \begin{center} 153 | \includegraphics{plot4.pdf} 154 | \par 155 | \end{center} 156 | 157 | The doctors' surgeries data are quite untidy - once we've read them in, we need to remove some extra columns that we don't need, and rename the ones we do.
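The column name `Practice.Doctor.s.Name` used below most likely comes from `read.csv()`, which passes headers through `make.names()` and replaces characters that are not allowed in R names (such as spaces, slashes and apostrophes) with dots. The sketch below runs the same clean-up steps on a small, made-up extract; the raw header `Practice/Doctor's Name`, the `Address` column and all values are hypothetical, and the real file's header may differ. It also filters missing coordinates with `!is.na()`, since `read.csv()` reads blank cells in a numeric column as `NA`; this is an alternative to the `!= ""` test used in the helpsheet.

```r
# A hypothetical three-row extract; the second surgery has no coordinates
raw_text <- "Practice/Doctor's Name,Address,Easting,Northing
Surgery A,1 High St,430000,433000
Surgery B,2 Low Rd,,
Surgery C,3 Mill Ln,431500,434200"

GP_demo <- read.csv(text = raw_text, header = TRUE)
names(GP_demo)   # the first name becomes "Practice.Doctor.s.Name"

# Keep only the columns we need, then give them shorter names
GP_demo <- subset(GP_demo, select = c("Practice.Doctor.s.Name", "Easting", "Northing"))
colnames(GP_demo) <- c("Surgery", "Easting", "Northing")

# Blank numeric fields are read as NA, so drop rows with a missing coordinate
GP_demo <- subset(GP_demo, !is.na(Easting) & !is.na(Northing))
GP_demo
```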
158 | 159 | ```{r,results='hide',comment=NA} 160 | # Get Data 161 | GP <- read.csv("General Practices 2006.csv", header = TRUE, skip = 3) 162 | 163 | # Extract the columns we want 164 | GP <- subset(GP, select = c("Practice.Doctor.s.Name", "Easting", "Northing")) 165 | 166 | # Rename the columns to something more helpful 167 | colnames(GP) <- c("Surgery", "Easting", "Northing") 168 | ``` 169 | 170 | ```{r,eval=FALSE,results='hide'} 171 | # Do a plot to check what the data look like 172 | plot(GP$Easting, GP$Northing) 173 | ``` 174 | 175 | ```{r,echo=FALSE,results='hide',warning=FALSE} 176 | pdf('plot5.pdf', 6, 6) 177 | # Do a plot to check what the data look like 178 | plot(GP$Easting, GP$Northing) 179 | dev.off() 180 | ``` 181 | 182 | \begin{center} 183 | \includegraphics{plot5.pdf} 184 | \par 185 | \end{center} 186 | 187 | This should look like the above. The next stage is to convert the data into a SpatialPointsDataFrame. 188 | 189 | ```{r,results='hide'} 190 | # Remove those GPs without an Easting or Northing 191 | GP <- subset(GP, Easting != "" & Northing != "") 192 | # Create a unique ID for each GP 193 | GP$GP_ID <- 1:nrow(GP) 194 | # Create the SpatialPointsDataFrame 195 | GP_SP <- SpatialPointsDataFrame(coords = c(GP[2], GP[3]), data = data.frame(GP$Surgery, GP$GP_ID), proj4string = CRS("+init=epsg:27700")) 196 | ``` 197 | 198 | The first command is a `subset` command which removes any entries which have a blank value for Northing or Easting. `!=` means 'not equal to' and `&` means 'AND', so in "English" the command reads "overwrite the GP data frame with a subset of the GP data frame where the Easting field is not blank and the Northing field is not blank". 199 | 200 | In a SpatialPointsDataFrame each entry must have a unique ID, so the second command creates an ID in the column `GP_ID`. The third command brings together the different elements to create the SpatialPointsDataFrame, `GP_SP`.
`GP[2]` and `GP[3]` are the `Easting` and `Northing` columns respectively, and the `data =` argument tells R which columns of the data frame to keep as attributes. In this case we only want the surgery name (`GP$Surgery`) and the ID number (`GP$GP_ID`). The final argument (`proj4string`) specifies which coordinate system the data set is in - in this case, British National Grid (`epsg:27700`). 201 | 202 | ```{r,eval=FALSE,results='hide'} 203 | # Show the results 204 | plot(GP_SP) 205 | ``` 206 | 207 | ```{r,echo=FALSE,results='hide',warning=FALSE} 208 | pdf('plot6.pdf', 6, 6) 209 | # Show the results 210 | plot(GP_SP) 211 | dev.off() 212 | ``` 213 | 214 | \begin{center} 215 | \includegraphics{plot6.pdf} 216 | \par 217 | \end{center} 218 | 219 | This plot will look similar to the previous one, but the data are now stored in a SpatialPointsDataFrame. We can now carry out a point in polygon analysis, i.e. select those points which lie within the boundary of Leeds. The last line of code below uses `!is.na()`. `is.na()` tests whether a value is `NA`, and `!` means NOT, so the command tests whether the value (of `GP_SP@data$label`) is not `NA`. 220 | 221 | ```{r,results='hide'} 222 | # Point in polygon - returns a data frame with one row per point, holding the 223 | # attributes of the polygon that the point is within 224 | o <- over(GP_SP, Wards) 225 | 226 | # Many of these will be NA values - because most GPs are not in Leeds!
227 | head(o) 228 | 229 | # Add the attributes back onto the GP_SP SpatialPointsDataFrame (they have the same number of rows, in the same order) 230 | GP_SP@data <- cbind(GP_SP@data, o) 231 | 232 | # Use the NA values to remove those points not within Leeds 233 | GP_SP_Leeds <- GP_SP[!is.na(GP_SP@data$label), ] 234 | ``` 235 | 236 | ```{r,eval=FALSE,results='hide'} 237 | # Map your results 238 | plot(GP_SP_Leeds) 239 | ``` 240 | 241 | ```{r,echo=FALSE,results='hide',warning=FALSE} 242 | pdf('plot7.pdf', 6, 6) 243 | # Map your results 244 | plot(GP_SP_Leeds) 245 | dev.off() 246 | ``` 247 | 248 | \begin{center} 249 | \includegraphics{plot7.pdf} 250 | \par 251 | \end{center} 252 | 253 | We can also plot the points over the Leeds wards: 254 | 255 | ```{r,eval=FALSE,results='hide'} 256 | plot(Wards) 257 | plot(GP_SP_Leeds, add = TRUE) 258 | ``` 259 | 260 | ```{r,echo=FALSE,results='hide',warning=FALSE} 261 | pdf('plot8.pdf', 6, 6) 262 | plot(Wards) 263 | plot(GP_SP_Leeds, add = TRUE) 264 | dev.off() 265 | ``` 266 | 267 | \begin{center} 268 | \includegraphics{plot8.pdf} 269 | \par 270 | \end{center} 271 | 272 | We can also view the data slot of `GP_SP_Leeds`. 273 | 274 | ```{r,results='hide'} 275 | # View the data slot of the results 276 | head(GP_SP_Leeds@data) 277 | ``` 278 | 279 | ```{r,echo=FALSE,comment=NA} 280 | head(GP_SP_Leeds@data) 281 | ``` 282 | 283 | 284 | ### Buffers 285 | 286 | >> _This section looks at buffers. It carries on from the section on point in polygon analysis, so make sure you complete that section first._ 287 | 288 | Buffers are often used in spatial analysis to define the area around points. In this example we will create a buffer around each doctors' surgery representing a 20 minute walk, based on an average walking speed of 3 mph, which is 1 mile or around 1608m. 289 | 290 | The `rgeos` package has a function called `gBuffer()` that can be used to create buffers around points, lines or polygon objects.
In the following example we create a new SpatialPolygons object called `GP_SP_Leeds_Buffers`. This then needs to be converted into a SpatialPolygonsDataFrame object by joining the `@data` from `GP_SP_Leeds` back onto `GP_SP_Leeds_Buffers`, because SpatialPolygons objects do not have a data slot. 291 | 292 | ```{r,results='hide'} 293 | # Create a 1608m buffer around each GP (byid = TRUE gives one buffer per point) 294 | GP_SP_Leeds_Buffers <- gBuffer(GP_SP_Leeds, width = 1608, byid = TRUE) 295 | 296 | # Convert GP_SP_Leeds_Buffers into a SpatialPolygonsDataFrame (rather than 297 | # SpatialPolygons) by joining the data of the GP_SP_Leeds 298 | # SpatialPointsDataFrame 299 | GP_SP_Leeds_Buffers <- SpatialPolygonsDataFrame(GP_SP_Leeds_Buffers, GP_SP_Leeds@data) 300 | ``` 301 | 302 | We can now plot the buffers on top of the Wards map. 303 | 304 | ```{r,eval=FALSE,results='hide'} 305 | # Ward boundaries 306 | plot(Wards, axes = FALSE, col = "#6E7B8B", border = "#CAE1FF") 307 | # GP locations 308 | plot(GP_SP_Leeds, pch = 19, cex = 0.4, col = "#5CACEE", add = TRUE) 309 | # Catchment buffers 310 | plot(GP_SP_Leeds_Buffers, axes = FALSE, col = NA, border = "red", add = TRUE) 311 | ``` 312 | 313 | ```{r,echo=FALSE,results='hide',warning=FALSE} 314 | pdf('plot9.pdf', 6, 6) 315 | # Ward boundaries 316 | plot(Wards, axes = FALSE, col = "#6E7B8B", border = "#CAE1FF") 317 | # GP locations 318 | plot(GP_SP_Leeds, pch = 19, cex = 0.4, col = "#5CACEE", add = TRUE) 319 | # Catchment buffers 320 | plot(GP_SP_Leeds_Buffers, axes = FALSE, col = NA, border = "red", add = TRUE) 321 | dev.off() 322 | ``` 323 | 324 | \begin{center} 325 | \includegraphics{plot9.pdf} 326 | \par 327 | \end{center} 328 | -------------------------------------------------------------------------------- /R_Markdown/7-converting-coordinates.Rmd: -------------------------------------------------------------------------------- 1 | ```{r set-options, echo=FALSE,comment=NA, cache=FALSE} 2 | options(width=62) 3 | ``` 4 | 5 | # 7.
Converting Coordinates 6 | 7 | Sometimes you will need to convert spatial data from one coordinate system to another. This is often called reprojecting, as different coordinate systems typically use different projections, i.e. different ways of representing the curved Earth as a flat surface. There are lots of different projections, including the Mercator and Gall-Peters projections, as shown below: 8 | 9 | ![The Mercator projection on the left and the Gall-Peters projection on the right. _Images from http://en.wikipedia.org/wiki/File:Mercator\_projection\_SW.jpg and http://en.wikipedia.org/wiki/File:Gall%E2%80%93Peters\_projection\_SW.jpg._](Mercator_projection_SW-Gall-Peters-projection_SW.jpg) 10 | 11 | This helpsheet will take you through the process of converting BNG (British National Grid coordinates, Eastings and Northings) to Latitude and Longitude, which requires reprojection between the OSGB36 and WGS84 datums. The same principle can be applied to any reprojection, though. 12 | 13 | ### Setup 14 | 15 | There are some initial commands we need to run to set up R for this exercise: firstly, loading the required library, and secondly, declaring some variables for the two different coordinate systems we will be using. 16 | 17 | ```{r,results='hide',message=FALSE} 18 | # Load the packages 19 | library(rgdal) 20 | 21 | # Variables for holding the coordinate system types (see: http://www.epsg.org/ for details) 22 | ukgrid <- "+init=epsg:27700" 23 | latlong <- "+init=epsg:4326" 24 | ``` 25 | 26 | We will use the doctors' surgeries data as an example.
Download and import it using the following commands: 27 | 28 | ```{r,echo=FALSE,results='hide'} 29 | setwd("/Users/nickbearman/Dropbox/r-helpsheets/helpsheets/7-converting-coordinates") 30 | ``` 31 | 36 | ```{r,eval=FALSE,results='hide'} 37 | # Set working directory 38 | setwd("M:/R work") 39 | 40 | # Download data.zip from the web 41 | download.file("http://data.alex-singleton.com/r-helpsheets/7/data.zip", "data.zip") 42 | 43 | # Unzip file 44 | unzip("data.zip") 45 | ``` 46 | ```{r,results='hide',comment=NA} 47 | # Get doctors' surgeries data 48 | GP <- read.csv("General Practices 2006.csv", header = TRUE, skip = 3) 49 | 50 | # Extract the columns we want 51 | GP <- subset(GP, select = c("Practice.Doctor.s.Name", "Easting", "Northing")) 52 | 53 | # Rename the columns to something more helpful 54 | colnames(GP) <- c("Surgery", "Easting", "Northing") 55 | ``` 56 | 57 | We now have the doctors' surgeries, with their Eastings and Northings. To show the first few rows, run: 58 | 59 | ```{r,warning=FALSE,comment=NA} 60 | head(GP) 61 | ``` 62 | 63 | We next need to convert the GP object from a data frame into a SpatialPointsDataFrame. 64 | 65 | ```{r,results='hide',comment=NA} 66 | # Remove those doctors' surgeries with a missing Easting or Northing 67 | GP <- subset(GP, Easting != "" & Northing != "") 68 | # Create a unique ID for each GP 69 | GP$GP_ID <- 1:nrow(GP) 70 | # Build a numeric coordinate matrix (as.character first, in case the columns were read as factors) 71 | coords <- cbind(Easting = as.numeric(as.character(GP$Easting)), Northing = as.numeric(as.character(GP$Northing))) 72 | # Create the SpatialPointsDataFrame 73 | GP_SP <- SpatialPointsDataFrame(coords, data = data.frame(GP$Surgery, GP$GP_ID), proj4string = CRS("+init=epsg:27700")) 74 | ``` 75 | 76 | `GP_SP` is now a SpatialPointsDataFrame. We can do a quick `plot(GP_SP)` to see what this looks like.
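The `as.numeric(as.character(...))` step above guards against a common trap. Before R 4.0.0, `read.csv()` turned text columns into factors by default, and a coordinate column can end up as text if any cell is not a clean number. Calling `as.numeric()` directly on a factor returns its internal level codes rather than the numbers it displays. A minimal sketch with hypothetical Easting values:

```r
# Hypothetical Easting values stored as a factor
easting_f <- factor(c("435000", "351200", "429800"))

# Wrong: returns the internal level codes (levels are sorted as text)
as.numeric(easting_f)                 # 3 1 2

# Right: convert the labels to text first, then to numbers
as.numeric(as.character(easting_f))   # 435000 351200 429800
```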
77 | 78 | ```{r,eval=FALSE,results='hide'} 79 | # Show the results 80 | plot(GP_SP) 81 | ``` 82 | 83 | ```{r,results='hide',echo=FALSE,warning=FALSE} 84 | pdf('plot1.pdf', 5, 5) 85 | plot(GP_SP) 86 | dev.off() 87 | ``` 88 | 89 | ![Image](plot1.pdf)\ 90 | 91 | 92 | 93 | Because `GP_SP` is now a SpatialPointsDataFrame, we need to use `head(GP_SP@data)` to view its attribute data. 94 | 95 | ```{r,eval=FALSE} 96 | head(GP_SP@data) 97 | ``` 98 | 99 | ```{r,echo=FALSE,warning=FALSE,comment=NA} 100 | head(GP_SP@data) 101 | ``` 102 | 103 | You can see that the Eastings and Northings are no longer visible. In fact, they are just stored in a different slot of the SpatialPointsDataFrame. Try `head(GP_SP@coords)` instead. 104 | 105 | ```{r,eval=FALSE} 106 | head(GP_SP@coords) 107 | ``` 108 | 109 | ```{r,echo=FALSE,warning=FALSE,comment=NA} 110 | head(GP_SP@coords) 111 | ``` 112 | 113 | And there they are! The `coords` slot is a numeric matrix, which we can index in the same way as a data frame, for example `head(GP_SP@coords[,1])`. See the helpsheet "1. R Basics" for more information on data frames. 114 | 115 | Next, we reproject from British National Grid (Eastings and Northings) into WGS84 (Latitude and Longitude) using `spTransform()`. 116 | 117 | ```{r,results='hide',comment=NA} 118 | # Convert from Eastings and Northings to Latitude and Longitude 119 | GP_SP_LL <- spTransform(GP_SP, CRS(latlong)) 120 | # The transformed coordinates keep their old column names, so rename them 121 | colnames(GP_SP_LL@coords)[colnames(GP_SP_LL@coords)=="Easting"] <- "Longitude" 122 | colnames(GP_SP_LL@coords)[colnames(GP_SP_LL@coords)=="Northing"] <- "Latitude" 123 | ``` 124 | 125 | ```{r,results='hide'} 126 | head(GP_SP_LL@coords) 127 | ``` 128 | 129 | ```{r,echo=FALSE,warning=FALSE,comment=NA} 130 | head(GP_SP_LL@coords) 131 | ``` 132 | 133 | Now the data are in Latitude and Longitude.
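Once the coordinates are in Latitude and Longitude, you may want them as ordinary columns, for example to save a CSV file. The sketch below uses a hypothetical two-point matrix standing in for `GP_SP_LL@coords` (the surgery names, coordinate values and output file name are made up). It repeats the renaming idiom used above and then combines coordinates and attributes with `data.frame()`; with the real object, `data.frame(GP_SP_LL@data, GP_SP_LL@coords)` should give a similar table, since the rows of the data and coords slots correspond.

```r
# A hypothetical two-point coordinate matrix, standing in for GP_SP_LL@coords
coords_ll <- cbind(Easting = c(-2.98, -1.55), Northing = c(53.41, 53.80))

# Rename columns by matching the old names (same idiom as above)
colnames(coords_ll)[colnames(coords_ll) == "Easting"] <- "Longitude"
colnames(coords_ll)[colnames(coords_ll) == "Northing"] <- "Latitude"

# Combine hypothetical attribute data with the coordinates into one data frame
attrs <- data.frame(Surgery = c("Surgery A", "Surgery B"), stringsAsFactors = FALSE)
gp_table <- data.frame(attrs, coords_ll)

# Save as CSV (written to a temporary folder here; use your own path in practice)
out_file <- file.path(tempdir(), "gp_latlong.csv")
write.csv(gp_table, out_file, row.names = FALSE)
```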
--------------------------------------------------------------------------------
/R_Markdown/common-error-msg.Rmd:
--------------------------------------------------------------------------------
# Why doesn't my code work? - Common things to check

There could be many reasons why your code doesn't work, but that doesn't mean all is lost. These are the most common things you should check:

### Error Messages

Read the error message. R can sometimes be a bit cryptic with error messages, but they usually point you in the right direction. Most of the time the fix involves checking exactly what you typed, as typos are very common in R. Remember you can press the Up arrow key to bring back the last command and edit it, so you don't need to type out the whole thing again. You can also run the `history()` command to see all of your previous commands.

Here are some hints on specific error messages:

1. "`Error: unexpected ','`" is fairly self-explanatory: remove the extra comma!

1. "`Error: unexpected symbol`" could mean that you've missed an `=` sign, a quote mark `'`, or some other small but vital piece of code.

1. "`Error: object 'x' not found`" means that R can't find the object you are referring to (here called `x`). Remember R is case sensitive (i.e. the lower case and CAPITAL letters must match when referring to an object), so `House.prices` is not the same object as `house.prices`. Also check that you've spelt the object name correctly. You can use `ls()` to list all the objects currently in R.

If you get a different error message, or no message, check exactly what you have typed. If you can't see anything wrong, get the person sitting next to you to check; a second pair of eyes is often useful.

### Packages

Sometimes missing packages can be a problem.

1. Remember that using a package takes two stages: installing the package, and then loading it (using the `library()` command).

2. The install command looks like this: `install.packages("maptools", dependencies = TRUE)`, where `maptools` is the package name in this case. When you do this, R may open a new window asking you to select a mirror; just click one of the UK ones to continue.

3. If R says `Error: package 'sp' required by 'maptools' could not be found`, it means R couldn't install the `sp` package for some reason. Try installing it separately (`install.packages("sp", dependencies = TRUE)`) and then install `maptools` again.


### What does `x` do?

If you're not sure what a particular function does, type `?` followed by the function name (e.g. `?summary`) and R will open the help file for that function (`summary` in this case). You could also Google 'R summary', which should give some useful results.
--------------------------------------------------------------------------------