├── .gitignore
├── syntax
│   ├── syntax.pdf
│   └── syntax.Rmd
├── readme-chapter-1.pdf
├── intro_to_r
│   ├── images
│   │   ├── setwd.png
│   │   ├── sample_df.png
│   │   └── clear_objects.png
│   ├── intro_to_r.pdf
│   ├── intro_to_r.R
│   └── intro_to_r.Rmd
├── tour_rstudio
│   ├── images
│   │   ├── new.png
│   │   ├── left.png
│   │   ├── four-panes.png
│   │   ├── top-right.png
│   │   ├── bottom-right.png
│   │   └── four-panes copy.png
│   ├── tour_rstudio.pdf
│   └── tour_rstudio.Rmd
├── project_for_class
│   ├── images
│   │   ├── project.png
│   │   ├── r_chunks.png
│   │   ├── rmd_file.png
│   │   ├── cut_the_cord.png
│   │   ├── folder_rscripts.png
│   │   ├── multiple_projects.png
│   │   ├── run_chunks_above.png
│   │   └── run_current_chunk.png
│   ├── project_for_class.pdf
│   └── project_for_class.Rmd
├── initial_exploration
│   ├── initial_exploration.pdf
│   ├── initial_exploration.R
│   └── initial_exploration.Rmd
├── learn-chapter-1.Rproj
├── chapter-1
│   ├── _navbar.html
│   ├── index.Rmd
│   └── index.html
└── readme.MD

/.gitignore:
--------------------------------------------------------------------------------
.Rproj.user
.Rhistory
.RData
.Ruserdata
.DS_Store

--------------------------------------------------------------------------------
/syntax/syntax.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/syntax/syntax.pdf
--------------------------------------------------------------------------------
/readme-chapter-1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/readme-chapter-1.pdf
--------------------------------------------------------------------------------
/intro_to_r/images/setwd.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/intro_to_r/images/setwd.png
--------------------------------------------------------------------------------
/intro_to_r/intro_to_r.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/intro_to_r/intro_to_r.pdf
--------------------------------------------------------------------------------
/tour_rstudio/images/new.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/tour_rstudio/images/new.png
--------------------------------------------------------------------------------
/tour_rstudio/images/left.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/tour_rstudio/images/left.png
--------------------------------------------------------------------------------
/tour_rstudio/tour_rstudio.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/tour_rstudio/tour_rstudio.pdf
--------------------------------------------------------------------------------
/intro_to_r/images/sample_df.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/intro_to_r/images/sample_df.png
--------------------------------------------------------------------------------
/tour_rstudio/images/four-panes.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/tour_rstudio/images/four-panes.png
--------------------------------------------------------------------------------
/tour_rstudio/images/top-right.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/tour_rstudio/images/top-right.png
--------------------------------------------------------------------------------
/intro_to_r/images/clear_objects.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/intro_to_r/images/clear_objects.png
--------------------------------------------------------------------------------
/project_for_class/images/project.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/project_for_class/images/project.png
--------------------------------------------------------------------------------
/project_for_class/images/r_chunks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/project_for_class/images/r_chunks.png
--------------------------------------------------------------------------------
/project_for_class/images/rmd_file.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/project_for_class/images/rmd_file.png
--------------------------------------------------------------------------------
/tour_rstudio/images/bottom-right.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/tour_rstudio/images/bottom-right.png
--------------------------------------------------------------------------------
/project_for_class/project_for_class.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/project_for_class/project_for_class.pdf
--------------------------------------------------------------------------------
/tour_rstudio/images/four-panes copy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/tour_rstudio/images/four-panes copy.png
--------------------------------------------------------------------------------
/project_for_class/images/cut_the_cord.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/project_for_class/images/cut_the_cord.png
--------------------------------------------------------------------------------
/initial_exploration/initial_exploration.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/initial_exploration/initial_exploration.pdf
--------------------------------------------------------------------------------
/project_for_class/images/folder_rscripts.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/project_for_class/images/folder_rscripts.png
--------------------------------------------------------------------------------
/project_for_class/images/multiple_projects.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/project_for_class/images/multiple_projects.png
--------------------------------------------------------------------------------
/project_for_class/images/run_chunks_above.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/project_for_class/images/run_chunks_above.png
--------------------------------------------------------------------------------
/project_for_class/images/run_current_chunk.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/r-journalism/learn-chapter-1/HEAD/project_for_class/images/run_current_chunk.png
--------------------------------------------------------------------------------
/learn-chapter-1.Rproj:
--------------------------------------------------------------------------------
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX

--------------------------------------------------------------------------------
/chapter-1/_navbar.html:
--------------------------------------------------------------------------------
/initial_exploration/initial_exploration.R:
--------------------------------------------------------------------------------
vec1 <- c(1,4,6,8,10)
vec1

# Pull out the fifth element
vec1[5]

# Replace the third element with 12
vec1[3] <- 12
vec1

# A sequence from 0 to 1 in steps of 0.25
vec2 <- seq(from=0, to=1, by=0.25)
vec2

sum(vec1)

patientID <- c(111, 208, 113, 408)
age <- c(25, 34, 28, 52)
sex <- c(1,2,1,1)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c(1,2,3,1)

patientdata <- data.frame(patientID, age, sex, diabetes, status)
patientdata

# A colon (:) means "through", so 1:2 means columns 1 through 2
patientdata[1:2]

# Select columns by name
patientdata[c("diabetes", "status")]

patientdata$age

patientdata[1:2]

# Rows 1 and 3, columns 1 through 2
patientdata[c(1,3),1:2]

# Rows 2 through 3, columns 1 through 2
patientdata[2:3, 1:2]

mean(patientdata$age)

mean(patientdata[["age"]])

g <- "My First List"
h <- c(25, 26, 18, 39)
# The line below creates a matrix, 5 rows deep, of the numbers 1 through (:) 10, filled column by column
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, ages = h, j, k)

names(mylist)

mylist[[2]]

mylist[["ages"]][[1]]

# Use the element's full name, "ages"
mylist$ages + 10

# Run the lines of code below
sample_df <- data.frame(id=c(1001,1002,1003,1004), name=c("Steve", "Pam", "Jim", "Dwight"), age=c(26, 65, 15, 7), race=c("White", "Black", "White", "Hispanic"))
# Before R 4.0, data.frame() turned text into factors; this keeps name as character either way
sample_df$name <- as.character(sample_df$name)
sample_df

# Number of elements in the name column
length(sample_df$name)

sample_df$name[1]
# Number of characters in the first name
nchar(sample_df$name[1])

dim(sample_df)

ncol(sample_df)

nrow(sample_df)

str(sample_df)

summary(sample_df)

# Opens the data frame in RStudio's spreadsheet-style viewer
View(sample_df)

rm(sample_df)

--------------------------------------------------------------------------------
/intro_to_r/intro_to_r.R:
--------------------------------------------------------------------------------
# On a Mac, it'd look like this
setwd("~/projects/learn-r-journalism")

# On a PC, it might look like this
setwd("C:/Documents/learn-r-journalism")

10^2 + 26

a <- 4
a

a*5

a

a <- a + 10
a

# Remove every object from your workspace
rm(list=ls())

b=c(3,4,5)
b

(3+4+5)/3

# x is the name of mean()'s first argument
mean(x=b)

mean(b)

# rnorm() is a base function that draws random samples from a normal distribution
x <- rnorm(100)

# plot() is a base function that charts
plot(x)

# The line below only works if a file called script.R exists in your working directory; it's just an illustration
source("script.R")

j <- c(1,2,NA)

# Returns NA because j contains a missing value
max(j)

# na.rm=T drops missing values before calculating
max(j, na.rm=T)

sum(j)
# compared to
sum(j, na.rm=T)

m <- "apples"
m

# This throws an error: object 'pears' not found (text needs quotation marks)
n <- pears

# This throws an error: you can't add a number to text
m + 2

# strptime() turns text into date-times using the format you give it
date1 <- strptime(c("20100225230000", "20100226000000", "20100226010000"), format="%Y%m%d%H%M%S")
date1

# If you don't currently have the lubridate package installed, uncomment the line below and run it
# install.packages("lubridate")

library(lubridate)

# lubridate's ymd_hms() works out the same year-month-day hour-minute-second order from its name
date1 <- ymd_hms(c("20100225230000", "20100226000000", "20100226010000"))

sample_df <- data.frame(id=c(1001,1002,1003,1004), name=c("Steve", "Pam", "Jim", "Dwight"),
                        age=c(26, 65, 15, 7), race=c("White", "Black", "White", "Hispanic"))
sample_df$race <- factor(sample_df$race)
sample_df$id <- factor(sample_df$id)
sample_df$name <- as.character(sample_df$name)

str(sample_df)

# Counts how many rows fall into each level of the factor
summary(sample_df$race)

sample_df$race
as.character(sample_df$race)

sample_df$name
factor(sample_df$name)

sample_df$id
# Careful: this returns the factor's level codes (1, 2, 3, 4), not the original ids
as.numeric(sample_df$id)

sample_df$id
# Convert to character first to get back 1001, 1002, 1003, 1004
as.numeric(as.character(sample_df$id))
percent_change <- function(first_number, second_number) {
  # Percent change from first_number to second_number
  pc <- (second_number - first_number) / first_number * 100
  return(pc)
}

# Returns 50
percent_change(100, 150)

# This throws an error: non-numeric argument to binary operator
percent_change("what", "happens")

--------------------------------------------------------------------------------
/tour_rstudio/tour_rstudio.Rmd:
--------------------------------------------------------------------------------
---
title: "RStudio guide"
author: "Andrew Ba Tran"
output:
  pdf_document:
    toc: yes
  html_document:
    toc: yes
    toc_float: yes
description: https://learn.r-journalism.com/en/how_to_use_r/
---

This is from the [first chapter](http://learn.r-journalism.com/en/how_to_use_r/tour_rstudio/rstudio-tour/) of [learn.r-journalism.com](https://learn.r-journalism.com/).

## RStudio tour

When you first open RStudio, the app is divided into three sections.

Most sections have tabs that offer additional views of different kinds of information.

**Tip**: You can customize these sections and place them wherever you want through the RStudio menu: *Preferences > Pane Layout*.

```{r img1, echo=F, out.width='100%'}
library(knitr)
include_graphics("images/left.png")
```

The tall section on the left is the *console*, where you type R code to execute it.

Each piece of code you run there is often called a *command*; most commands call one or more *functions*.

```{r img2, echo=F, out.width='100%'}
include_graphics("images/top-right.png")
```

In the top right section is the **Environment** tab, where you can see the data and other objects you are currently working with.

At first this section is empty because you have not loaded any data yet.

There's also a **History** tab in the top right section; this is where RStudio keeps track of the commands you run in the console.
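
To watch the **Environment** tab fill up, you could type a few lines like these into the console. This is a minimal sketch with made-up object names (`city` and `counts` are not part of the course files): each assignment adds an entry to **Environment**, each line you run is logged in **History**, and `ls()` prints the same list of object names in the console.

```r
# Made-up example objects; each one appears in the Environment tab once created
city <- "Springfield"
counts <- c(12, 7, 30)

# ls() returns the names of the objects in your workspace, sorted alphabetically
ls()

# rm() deletes an object, and it disappears from the Environment tab too
rm(city)
ls()
```
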

```{r img3, echo=F, out.width='100%'}
include_graphics("images/bottom-right.png")
```

The bottom right section is a *viewer*, with tabs to flip through the **Files** and folder structure of your computer (like in Finder or Explorer), any **Plots** (charts) you've generated, your list of available R **Packages**, **Help** pages, and more.

## Reproducibility: Save your scripts

There's a difference between quick, on-the-fly analysis and analysis you want to rerun later on.

The code you come up with can be saved in scripts and R Markdown files. Scripts end with the .R file extension; R Markdown files, which mix R code and Markdown text, end with .Rmd.

In this course we'll alternate between typing code in the console and typing and saving code in a script. Disposable code written just for quick exploration will go in the console. Code we want to reuse and repurpose later should be saved in a script.

```{r img4, echo=F, out.width='100%'}
include_graphics("images/four-panes.png")
```

These R source files open in a **Source** section, which pushes the console window down to make room.

To create a new script, go through the menu *File > New File > R Script* or click the green plus button at the top left.

```{r img5, echo=F, out.width='40%'}
include_graphics("images/new.png")
```

When saved, these files end with the **.R** extension.

Any code typed here can be sent to the console to run. Clicking the `Run` button at the top of the script window runs the line of code the cursor is on (the keyboard shortcut is Ctrl+Enter, or Cmd+Enter on a Mac).

To run multiple lines of code, highlight them and click `Run`.

Be sure to save your scripts after you create them.
Also save before running your code, in case you write code that makes R crash (which will happen once in a while).

--------------------------------------------------------------------------------
/readme.MD:
--------------------------------------------------------------------------------
---
output:
  pdf_document: default
  html_document: default
---
## Chapter 1

These files and folders are meant to accompany the first chapter of the learn.r-journalism.com class.

PDFs, R Markdown files, and R scripts for each lesson are provided. There are also exercises to practice the code you’ve learned.

There is no “correct” way to follow along with this course aside from the quiz and completing the assignments in the discussion forums.

Approach these lessons at your own pace and with whichever combination works best for your situation.

----

The quickest way to get the files and data for this chapter starts with making sure your current project directory is logically organized.

Ideally, you've created a folder called `learning-r`, or something like that, in your `Documents` folder.

Once you've set your working directory (for example, with `setwd()`), run these commands in your console to download the correct folders and data:

```
install.packages("usethis")
```

and then

```
usethis::use_course("https://github.com/r-journalism/learn-chapter-1/archive/master.zip")
```

----

Otherwise:

Clone or download and unzip this repo. Make sure you've fully extracted the folder and aren't working from a **temporary** unzipped copy.

To ensure these files work as expected, please make sure the project directory is set up correctly:

When you type `getwd()`, the result should look something like `your_file_path/learn-chapter-1`.

You can set this up manually with:

*Session > Set Working Directory > To Project Directory*

Alternatively, you can double-click the **learn-chapter-1.Rproj** file in Finder or your file browser.

----

Navigate to the .Rmd file for each sub-chapter to open it and follow along with each section.

Or you can run the following commands in the console to bring up the files.

* `file.edit("tour_rstudio/tour_rstudio.Rmd")`
* `file.edit("intro_to_r/intro_to_r.Rmd")`
* `file.edit("initial_exploration/initial_exploration.Rmd")`
* `file.edit("project_for_class/project_for_class.Rmd")`
* `file.edit("syntax/syntax.Rmd")`

To view a rendered HTML version of a sub-chapter, you can try the following commands in your console (they only work if the .html files exist in your copy of the folder):

* `browseURL("tour_rstudio/tour_rstudio.html")`
* `browseURL("intro_to_r/intro_to_r.html")`
* `browseURL("initial_exploration/initial_exploration.html")`
* `browseURL("project_for_class/project_for_class.html")`
* `browseURL("syntax/syntax.html")`

----

## Test yourself

There are links to exercises on what you've learned throughout this chapter.

It's possible to run these files locally to test yourself if you've downloaded the files for the chapter as instructed above.
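
If you're not sure the project directory is right, one quick check is whether R can see the chapter's .Rproj file from the working directory. The helper below is a hypothetical sketch, not part of the course files; it assumes the project file is named `learn-chapter-1.Rproj`, as it is in this repo.

```r
# Hypothetical helper (not part of the course files): TRUE when `path`
# contains the chapter's project file, i.e. it looks like the project folder
in_chapter_project <- function(path = getwd(),
                               proj_file = "learn-chapter-1.Rproj") {
  file.exists(file.path(path, proj_file))
}

# Check the current working directory and print a hint either way
if (in_chapter_project()) {
  message("Working directory looks right: ", getwd())
} else {
  message("learn-chapter-1.Rproj not found in ", getwd(),
          "; try Session > Set Working Directory > To Project Directory")
}
```
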

Make sure your project directory is correct and then run these lines in the console:

```
install.packages("learnr")
install.packages("rmarkdown")
install.packages("tidyverse")
```

and then

```
rmarkdown::run("chapter-1/index.Rmd")
```

## License

The online version of this course is licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/).

--------------------------------------------------------------------------------
/project_for_class/project_for_class.Rmd:
--------------------------------------------------------------------------------
---
title: "Setting up for this class"
author: "Andrew Ba Tran"
output:
  html_document:
    toc: yes
    toc_float: yes
  pdf_document:
    toc: yes
description: https://learn.r-journalism.com/en/how_to_use_r/
---

This is from the [first chapter](http://learn.r-journalism.com/en/how_to_use_r/project_for_class/class-project/) of [learn.r-journalism.com](https://learn.r-journalism.com/).

This section covers how to a) get the files needed for each chapter of the class and b) follow along with each chapter's code-through.

## Getting files and folders

The files for this class include the raw data, the scripts, and R Markdown files.

There are a few options to download them:

### 1) Download the files with the usethis package

Each chapter has a prompt to download its associated files.

Ideally, you've created a **New Project** called `learning-r`, or something like that, in your `Documents` folder. This creates an *.Rproj* file that keeps the working directory relative to that folder; all your data and scripts should go in there.
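
To see why a project folder helps, here is a minimal sketch of relative paths. It's an illustration rather than a course step: it builds a throwaway folder under `tempdir()` to stand in for `learning-r`, writes a small made-up CSV into a `data` subfolder, and reads it back using a path relative to that folder. Inside an RStudio project you'd skip the `setwd()` calls, because opening the .Rproj file sets the working directory for you.

```r
# Throwaway folder standing in for your learning-r project (hypothetical path)
project_dir <- file.path(tempdir(), "learning-r")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# setwd() returns the previous working directory, so we can restore it later
old_wd <- setwd(project_dir)

# Save a small made-up data set using a path relative to the project folder
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)),
          "data/sample.csv", row.names = FALSE)

# The same relative path reads it back, wherever the project folder lives
sample_data <- read.csv("data/sample.csv")
sample_data

# Go back to where we started
setwd(old_wd)
```
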

Once you've set your working directory (for example, with `setwd()`), you can run these commands in your console to download the correct folders and data:

```{r install2, eval=F}
install.packages("usethis")

# This is for chapter 2 files and folders
usethis::use_course("https://github.com/r-journalism/learn-chapter-2/archive/master.zip")
```

These project files work because the working directory is **relative** to the project/chapter folder rather than an **absolute** folder path. Double-click the *.Rproj* file in each project/chapter folder to open it in RStudio.

### 2) Download the entire repo from GitHub

Each repo that begins with `learn-chapter-X` has files and folders for the chapter we're going through. You can clone or download each one and open the .Rproj file in its folder to follow along.

### 3) Follow from scratch

Alternatively, you could create the files yourself from scratch with the correct folder structure. This is pretty tough for beginners but useful for really understanding how project and folder structures work.

* Create a new project in RStudio right now and call it learning-r-journalism
* Save all your scripts and your data into it for now (ideally with an organized folder structure)
* Data referenced in each chapter/sub-chapter can be downloaded individually from their respective repos
* Create new folders following the structure of the code in each walkthrough and place data downloaded from the repos as indicated

----

## Following the code through

Each sub-chapter has a **.Rmd** file that contains the scripted code for each walkthrough.

```{r img1, echo=F, out.width='100%'}
library(knitr)
include_graphics("images/rmd_file.png")
```

This is an R Markdown file, which we will explain in detail in chapter 6.

But for now, the important thing to understand is that it's a mix of Markdown text and R code.

The R code sections, or *chunks*, are the grayed-out areas.

```{r img2, echo=F, out.width='100%'}
include_graphics("images/r_chunks.png")
```

Each chunk can be run with the green triangle button at its top right.

```{r img3, echo=F, out.width='100%'}
include_graphics("images/run_current_chunk.png")
```

If you get an error, it might be because you didn't run a chunk of code above it. An R Markdown file is essentially an R script broken up into pieces and separated by text, so each chunk depends on the code above it having run successfully.

Here's how to run all the chunks above the current one:

```{r img4, echo=F, out.width='100%'}
include_graphics("images/run_chunks_above.png")
```

## The ideal way to follow along

Do as much manually as you can.

After you get the files and folders, open a new script file and type out the code from each video and sub-chapter code-through. An R script containing only the code for each section is also in each folder. Avoid using it unless you have to.

Avoid just running code chunks in R Markdown.

Avoid copying and pasting the code.

Avoid passively learning.

Manually typing out your code builds muscle memory and trains your problem solving.

**That being said**, if you get stuck, use the code I've provided however you want.
--------------------------------------------------------------------------------
/syntax/syntax.Rmd:
--------------------------------------------------------------------------------
---
title: "R basics, help, mistakes"
author: "Andrew Ba Tran"
output:
  html_document:
    toc: yes
    toc_float: yes
description: https://learn.r-journalism.com/en/how_to_use_r/
---

This is from the [first chapter](http://learn.r-journalism.com/en/how_to_use_r/syntax/r-syntax/) of [learn.r-journalism.com](https://learn.r-journalism.com/).

## Some R code basics

* `<-` is known as an "assignment operator": it means "Make the object named to the left equal to the output of the code to the right"
* `&` means AND, in Boolean logic
* `|` means OR, in Boolean logic
* `!` means NOT, in Boolean logic
* When referring to values entered as text, or to dates, put them in quotation marks like this: `"United States"`, or `"2016-07-26"`. Numbers are not quoted
* When entering two or more values, combine them into a vector using the function `c()`, for combine, with the values separated by commas, for example: `c("2017-07-26", "2017-08-04")`
* As in a spreadsheet, you can specify a range of values with a colon, for example: `1:10` creates a vector of integers (whole numbers) from one to ten (`c(1:10)` gives the same result)
* Some common operators:
    * `+` `-` add, subtract
    * `*` `/` multiply, divide
    * `>` `<` greater than, less than
    * `>=` `<=` greater than or equal to, less than or equal to
    * `!=` not equal to
* **Equal signs can be confusing**
    * `==` tests whether the objects on either side are equal. This is often used in filtering data
    * `=` sets the value of an argument inside a function call, as in `mean(x = b)`. It can also assign objects, like `<-`, but `<-` is the usual convention for that
* Handling missing (null) values:
    * Missing values are designated as `NA`
    * `is.na(x)` returns `TRUE` for each missing value in variable `x`
    * `!is.na(x)` returns `TRUE` for each non-missing value in variable `x`

Here, `is.na()` is a **function**. Functions are followed by parentheses and act on the code or data inside the parentheses.

{{% notice disclaimer %}}
Object and variable names in R should not contain spaces.
{{% /notice %}}

## R Workspace

* Your current R working environment
* Includes any user-defined objects (e.g. vectors, data frames, functions)

| Function | Action |
| ------ | -------------------------------------------------- |
| `getwd()` | Show the current working directory |
| `setwd("mydirectory")` | Change the current working directory to mydirectory |
| `ls()` | List the objects in the current workspace |
| `rm(object)` | Delete object |
| `save(objectlist, file="myfile")` | Save specific objects to a file |
| `load("myfile")` | Load objects saved with `save()` (or a saved workspace, such as .RData) into the current session |

## Packages

* Collections of R functions, data, and compiled code in a well-defined format
* Massively extend the functionality of R
* Thousands of user-written packages on CRAN
* [https://cran.r-project.org/web/packages](https://cran.r-project.org/web/packages/)

{{% notice tip %}}
Mac users may need to alter their security preferences to allow apps from non-Apple developers to install. If you see an error, try changing [your system preferences](https://www.youtube.com/watch?v=xFpVqkyXFy4).
{{% /notice %}}

## Getting Help

| Function | Action |
| ------ | -------------------------------------------------- |
| `help.start()` | General help |
| `help("foo")` or `?foo` | Help on function foo (the quotation marks are optional) |
| `help.search("foo")` or `??foo` | Search the help system for instances of the string foo |
| `example("foo")` | Examples of function foo (the quotation marks are optional) |

## Working with Packages

* `install.packages("packagename")` installs a package
* `update.packages()` updates your installed packages

* `library(packagename)` loads an installed package
* `help(package = "packagename")` shows a package's documentation

* `library()` lists the packages in your library
* `search()` lists the packages that are loaded (attached)

## Common Mistakes

* **Using the wrong case**
    * R is case-sensitive: `help()`, `Help()`, and `HELP()` are three different names, and only the first one will work
* **Forgetting to use quotation marks when they are needed**
    * `install.packages("gclus")` will work, while `install.packages(gclus)` will generate an error.
* **Forgetting to include the parentheses in a function call**
    * Type `help()` rather than `help`. Even if there are no arguments, you still need the `()`; the bare name just prints the function's code.
* **Using `\` in a path name on Windows**
    * R treats the backslash character as an escape character.
    * `setwd("c:\mydata")` will generate an error. Use `setwd("c:/mydata")` or `setwd("c:\\mydata")` instead.
* **Using a function from a package that is not loaded**
    * For example, the function `str_trim()` is contained in the **stringr** package.
    * If you try to use it before loading the package, you will get an error such as `could not find function "str_trim"`.

--------------------------------------------------------------------------------
/chapter-1/index.Rmd:
--------------------------------------------------------------------------------
---
title: "Chapter 1"
tutorial:
  id: "code.r-journalism/chapter-1"
  version: .85
output:
  learnr::tutorial:
    progressive: true
    theme: lumen
    highlight: espresso
runtime: shiny_prerendered
---

```{r setup, include=FALSE}
library(learnr)
library(tidyverse)

knitr::opts_chunk$set(echo = FALSE)

# Objects the exercises below rely on
years <- c(301, 978, 101)
burgers <- data.frame(id=c(60006,60007,60008,60009, 60010), name=c("Bob", "Linda", "Louise", "Tina", "Gene"), age=c(45, 44, 12, 13, 11), shirt=c("White", "Red", "Pink", "Blue", "Yellow"))
burgers$shirt <- factor(burgers$shirt)
burgers$id <- factor(burgers$id)
burgers$name <- as.character(burgers$name)

# Strip spaces and newlines so code can be compared as plain text
nospace <- function(a) {
  b <- gsub(" ", "", a)
  gsub("\\n", "", b)
}

# Returns TRUE (or a logical vector/matrix containing TRUE) when the
# learner's code doesn't produce the same result as the solution code
is_bad_code <- function(user_code, check_code, envir_result) {

  is_null <- is.null(eval(parse(text = user_code)))
  e_rows <- nrow(eval(parse(text = user_code)))
  r_rows <- nrow(eval(parse(text = check_code), envir = envir_result))
  e_cols <- ncol(eval(parse(text = user_code)))
  r_cols <- ncol(eval(parse(text = check_code), envir = envir_result))
  e_type <- typeof(eval(parse(text = user_code)))
  r_type <- typeof(eval(parse(text = check_code), envir = envir_result))
  e_len <- length(eval(parse(text = user_code)))
  r_len <- length(eval(parse(text = check_code), envir = envir_result))
  if (!is_null) {
    if (e_len != 0 && r_len != 0) {
      if (e_type == r_type) {
        if (!(e_type %in% c("character", "double", "integer", "logical"))) {
          if (e_rows == r_rows && e_cols == r_cols) {
            eval(parse(text = user_code)) != eval(parse(text =
check_code), envir = envir_result) 53 | } else { 54 | TRUE 55 | } 56 | } else { eval(parse(text = user_code)) != eval(parse(text = check_code), envir = envir_result) } 57 | } else { 58 | TRUE 59 | } 60 | } else { TRUE } 61 | } else { nospace(user_code) != nospace(check_code)} 62 | } 63 | 64 | 65 | checker <- function(label, user_code, check_code, envir_result, evaluate_result, ...) { 66 | if (is_bad_code(user_code, check_code, envir_result)) { 67 | return(list(message = "Code wasn't right!", correct = FALSE, location = "append")) 68 | } else { 69 | return(list(message = "Great Job!", correct = TRUE, location = "append")) 70 | } 71 | } 72 | 73 | tutorial_options(exercise.timelimit = 30, exercise.checker = checker) 74 | #tutorial_options(exercise.checker = checkthat::check_exercise) 75 | 76 | 77 | ``` 78 | 79 | 80 | ## Intro to R 81 | 82 | 83 | ### Objects 84 | 85 | Assign the number 17 to the object **ub** 86 | 87 | 88 | ```{r filter, exercise=TRUE} 89 | ub 90 | 91 | ub 92 | ``` 93 | 94 | 95 | ```{r filter-check} 96 | ub <- 17 97 | 98 | ub 99 | ``` 100 | 101 | ### Array 102 | 103 | Create an array of numbers: 301, 978, and 101. 104 | 105 | Assign it to the object "years" 106 | 107 | ```{r y1, exercise=TRUE} 108 | years #replace this with your code 109 | 110 | years 111 | ``` 112 | 113 | 114 | ```{r y1-check} 115 | years <- c(301, 978, 101) 116 | 117 | years 118 | ``` 119 | 120 | 121 | ### Functions 122 | 123 | 124 | What's the average of the array of numbers assigned to "years"? 

```{r a1, exercise=TRUE}
(years)
```


```{r a1-check}
mean(years)
```

### Classes

Take a look at the structure of **burgers**:


```{r st1, exercise=TRUE}

```


```{r st1-check}
str(burgers)
```


```{r first_quiz}
quiz(
  question("What kind of class is the variable id?",
    answer("character"),
    answer("number"),
    answer("factor", correct = TRUE),
    answer("date"),
    random_answer_order = TRUE
  ))
```

## Data structures in R


### Pulling a column of data

Consider this data frame **burgers**:

```{r burger_show}
burgers
```


How do you refer to the *shirt* variable/column with **[]**?


```{r v1, exercise=TRUE}
# Add to the line below

burgers
```



```{r v1-check}
burgers[,4]
```


How do you refer to the *shirt* variable/column with **$**?

```{r v2, exercise=TRUE}
# Add to the line below

burgers
```

```{r v2-check}
burgers$shirt
```



### Pulling a row of data

Extract the entire row for Linda using **[]**.

```{r v4, exercise=TRUE}
# Add to the line below

burgers
```

```{r v4-check}
burgers[2,]
```


### Converting data classes

Convert the *id* variable of the **burgers** data frame to numeric.


```{r v3, exercise=TRUE}
# Add to the line below

burgers$id <-

burgers$id
class(burgers$id)
```

```{r v3-check}
burgers$id <- as.numeric(as.character(burgers$id))

burgers$id
class(burgers$id)
```


### Boolean logic

Check if Gene's age is 11.

*Note:* Is the answer the same as above (correct) or is it 1-5 (false)?


```{r b, exercise=TRUE}
# Modify the line of code below

age_test <- burgers$age[5] 11

age_test
```

```{r b-check}
age_test <- burgers$age[5] == 11

age_test
```

--------------------------------------------------------------------------------
/initial_exploration/initial_exploration.Rmd:
--------------------------------------------------------------------------------
---
title: "Data structures in R"
author: "Andrew Ba Tran"
output:
  html_document:
    toc: yes
    toc_float: yes
description: https://learn.r-journalism.com/en/how_to_use_r/
---

This is from the [first chapter](https://learn.r-journalism.com/en/how_to_use_r/initial_exploration/data-structures/) of [learn.r-journalism.com](https://learn.r-journalism.com/).


## Vectors

A **vector** is a sequence of data elements of the same basic type. The parts that make up a vector are called **components** or **elements**.

```{r vectors1}
vec1 <- c(1,4,6,8,10)
vec1
```

A vector such as `vec1` is explicitly constructed by the concatenation function `c()`.

```{r vectors2}
vec1[5]
```

Elements in vectors can be addressed by standard `[i]` indexing.

```{r vectors3}
vec1[3] <- 12
vec1
```

One of the elements in the vector is replaced with a new number.
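Because every element of a vector must share one basic type, R quietly converts (coerces) mixed input to a common type instead of raising an error. This short sketch, using made-up values, shows what `c()` does with mixed input and how `class()` reports the result:

```r
# Mixing a number, a string and a logical value in one vector:
# R coerces every element to character, the most flexible of the three types
mixed <- c(1, "two", TRUE)
mixed
class(mixed)

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
nums <- c(10, TRUE, FALSE)
nums
class(nums)
```

This is why a single stray text value in a column of numbers can turn the whole column into text.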

```{r vectors4}
vec2 <- seq(from=0, to=1, by=0.25)
vec2
```

This shows another useful way of creating a vector: the `seq()` or sequence function.

Functions such as `sum()` work on every element of a vector at once:

```{r vectors5}
sum(vec1)
```

## Matrices

**Matrices** are two-dimensional vectors: every element has a row and a column position, and all elements share the same basic type.

It looks like this:

```{r matrix}
mat <- matrix(data=c(9,2,3,4,5,6), ncol=3)
mat
```

The argument `data` specifies which numbers should be in the matrix.

Use either `ncol` to specify the number of columns or `nrow` to specify the number of rows.

Matrix operations are similar to vector operations.

```{r matrix2}
mat[1,2]
```

Elements of a matrix can be addressed in the usual way, with `[row, column]`.

```{r matrix3}
mat[2,1]
```

To select a whole row, leave the spot for the column number empty (`mat[1,]`); to select a whole column, leave the spot for the row number empty:

```{r matrix4}
mat[,3]
```


## Data frames

If you're used to working with spreadsheets, then *data frames* will make the most sense to you in R.

This is how to create a data frame from vectors. You don't have to fully understand this at this point: the data you'll be working with will come pre-structured if you're importing spreadsheets.


```{r df1}
patientID <- c(111, 208, 113, 408)
age <- c(25, 34, 28, 52)
sex <- c(1,2,1,1)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c(1,2,3,1)

patientdata <- data.frame(patientID, age, sex, diabetes, status)
patientdata
```

Here's what's happening: a set of vectors is created, and the function `data.frame()` joins them together into a data frame structure.
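Each column of a data frame is a vector, so every value within a column shares one type while different columns can hold different types, and the vectors have to be the same length. Here is a small sketch with made-up values (the `patients` object is hypothetical; in R 4.0.0 and later, text columns stay as character by default):

```r
patientID <- c(111, 208, 113, 408)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")

# data.frame() lines the vectors up side by side as columns
patients <- data.frame(patientID, age, diabetes)

# One class per column: numbers stay numeric, text stays character
sapply(patients, class)

# Vectors of lengths 3 and 2 can't be lined up into rows,
# so data.frame() stops with an error; tryCatch() returns its message
result <- tryCatch(
  data.frame(id = 1:3, score = 1:2),
  error = function(e) conditionMessage(e)
)
result
```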

**How to pull elements from a data frame:**

```{r patientdata}
# A : means "through", so 1:2 means 1 through 2
# With a single index, [] selects columns: here, columns 1 through 2
patientdata[1:2]

# Columns can also be selected by name
patientdata[c("diabetes", "status")]

# The $ sign pulls a single column out as a vector
patientdata$age

# Rows 1 and 3, columns 1 through 2
patientdata[c(1,3),1:2]

# Rows 2 through 3, columns 1 through 2
patientdata[2:3, 1:2]
```


{{% notice tip %}}
You can reference a column with `patientdata$age`, and you can also refer to the column by its index. In this instance it's 2, so `patientdata[,2]` is the equivalent. If you only wanted the third row, then it'd look like `patientdata[3,]`. Think of it as `data[Row, Column]`. I remember it as data[rocks], as in data[Ro,Cks].
{{% /notice %}}

Instead of using `mean(patientdata[,2])`, you can select the column `age` from the `patientdata` data frame with the `$` sign.

```{r df2}
mean(patientdata$age)
```

Here's an alternative way to refer to the `age` column of the `patientdata` data frame, using double square brackets. But you will rarely use this method.


```{r df3}
mean(patientdata[["age"]])
```

## Lists

Another basic structure in R is a *list*.

The main advantage of lists is that the "columns" (they're not really ordered in columns any more, but are more of a collection of vectors) don't have to be of the same length, unlike matrices and data frames.

This is similar to how JSON files are structured.
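To see that difference in practice, here is a small sketch with made-up values: a list whose elements have different lengths and types, plus the two ways of pulling an element out. `lengths()` is a base R function that returns the length of each element.

```r
record <- list(
  title = "Survey",                 # character, length 1
  ages  = c(25, 26, 18, 39),        # numeric, length 4
  tags  = c("one", "two", "three")  # character, length 3
)

# Each element keeps its own length, which a data frame would not allow
lengths(record)

# [[ ]] returns the element itself: here, a numeric vector
record[["ages"]]

# [ ] returns a smaller list that still contains the element
record["ages"]
```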

```{r list1}
g <- "My First List"
h <- c(25, 26, 18, 39)
# The line below creates a matrix 5 rows deep, holding the numbers 1 through (":") 10
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, ages = h, j, k)
```

`names()` shows the names of the elements in the list. The last two elements were not given names, so they appear as empty strings.

```{r list2}
names(mylist)
```

This is how to extract data from the list: `[[2]]` pulls out the second element, and `[["ages"]][[1]]` pulls out the first value of the element named `ages`.

```{r list_extract}
mylist[[2]]

mylist[["ages"]][[1]]
```

You can then use the numbers in the list, for example by adding 10 to each age:

```{r list3}
mylist$ages + 10
```

## Functions for working with objects

Let's start with the sample_df dataframe below.

```{r factor1}
# Run the lines of code below
sample_df <- data.frame(id=c(1001,1002,1003,1004), name=c("Steve", "Pam", "Jim", "Dwight"), age=c(26, 65, 15, 7), race=c("White", "Black", "White", "Hispanic"))
sample_df$name <- as.character(sample_df$name)
sample_df
```

`length(x)` - Returns the number of elements in an object or vector

```{r length}
length(sample_df$name)
```

`nchar(x)` - If **x** is a string, counts how many characters there are

```{r nchar}
sample_df$name[1]
nchar(sample_df$name[1])
```

`dim(x)` - Gives the dimensions of **x**

```{r dim}
dim(sample_df)
```

`ncol(x)` - Counts the number of columns

```{r ncol}
ncol(sample_df)
```

`nrow(x)` - Returns the number of rows of **x**

```{r nrow}
nrow(sample_df)
```

`str(x)` - Returns the structure of **x**

```{r str}
str(sample_df)
```

`summary(x)` - Summarizes the object as understood by R

```{r summary}
summary(sample_df)
```


`View(x)` - Opens the object in RStudio so you can browse it

```{r view, eval=F}
View(sample_df)
```

`rm(x)` - Removes **x**

```{r rm, error=T}
rm(sample_df)
# sample_df no longer exists, so the next line returns an error
sample_df
```

## Your turn

Challenge yourself with [these exercises](http://code.r-journalism.com/chapter-1/#section-data-structures-in-r) so you'll retain the knowledge of this section.

Instructions on how to run the exercise app are in the [intro page](http://learn.r-journalism.com/en/how_to_use_r/) to this section.

--------------------------------------------------------------------------------
/intro_to_r/intro_to_r.Rmd:
--------------------------------------------------------------------------------
---
title: "Introduction to R"
author: "Andrew Ba Tran"
output:
  html_document:
    toc: yes
    toc_float: yes
description: https://learn.r-journalism.com/en/how_to_use_r/
---

This is from the [first chapter](http://learn.r-journalism.com/en/how_to_use_r/tour_rstudio/intro-to-r/) of [learn.r-journalism.com](https://learn.r-journalism.com/).


## Syntax

R has specific syntax rules that you must follow for your code to work. R won't guess what you meant.

* R is a case-sensitive (unlike SQL) [interpreted language](https://en.wikipedia.org/wiki/Interpreted_language) (unlike C)
* You can enter commands at the prompt `>` or run them in batch
* Comments are preceded by `#`
    * Use them often in your code for documentation
    * You'll see them often in this course
* Statements are lines of code such as function calls and object assignments
* Statements are separated by new lines or by semicolons


## Working directories

Your *working directory* is the folder on your computer in which you are currently working.
When you ask R to open a certain file, it will look in the working directory for this file, and when you tell R to save a data file or figure, it will save it in the working directory.

Before you start working, please set your working directory to where all your data and script files are or should be stored.

**Tip**: When you see code in a black box throughout this class, **I want you to run that code in R unless noted otherwise**. You can run it in the console, but I'd prefer you do so in a script so you can see your history. In the video, you'll see me swap back and forth between coding in the console and the script. This really depends on whether I'm experimenting with code quickly or whether it's something I know I should track. Often I will copy and paste code executed from the console into a growing script as an afterthought. Also, **be sure to read the commented-out code** because I try to give additional context, like tips on what to do if the command fails.


This is an example of setting the working directory to a specific folder on your computer.

```{r setwd, eval=F, warning=F, message=F}
# On a mac, it'd look like this
setwd("~/projects/learn-r-journalism")

# On a pc, it might look like this
setwd("C:/Documents/learn-r-journalism")
```

**Note**: Make sure that slashes are forward slashes and that you don't forget the quotation marks.

Within RStudio, you can also set the working directory via the menu *Tools > Set Working Directory*.

```{r img2, echo=F, out.width='100%'}
library(knitr)
include_graphics("images/setwd.png")
```



The `setwd()` commands above are examples of setting an **absolute** path to a folder.

This works for you for now, but if you wanted to share your methodology and script in the future, or if you wanted to run the code on another computer, it would likely **not** work. The script would be looking through a folder structure that doesn't exist on any computer except the one where it was originally written. This is not ideal for reproducibility.


Working directories are a tough concept. We'll be going over best practices in more detail in [Chapter 6](https://learn.r-journalism.com/en/publishing/). Meanwhile, here's a link that might [help explain](https://wangfanghelsinki.wordpress.com/2015/03/31/understanding-and-setting-rstudio-working-directory/) some more.


## Libraries

R can do many statistical and data analyses.

Its tools for them are organized in so-called *packages* or *libraries*.

You can do a lot of statistical analysis in R without any additional libraries. This is called base R.

But other users have created libraries with functions that solve common problems. R users download only the libraries that they need for an individual project.

To get a list of all installed packages, go to the packages window or type `library()` in the console window. If the box in front of the package name is ticked in the packages window, the package is loaded and the functions within it are ready to be called.

There are many more packages available on CRAN (the Comprehensive R Archive Network). If you want to install and use a package (for example, the package called "dplyr"), you should:

* Install the package: click `install packages` in the packages window and type `dplyr`, or type `install.packages("dplyr")` in the console window.
* Load the package: check the box in front of `dplyr` or type `library("dplyr")` in the console window.

# Examples of R commands

## Calculator

R can be used as a calculator.

Just type an equation in the console window after the `>`.

**Note**: In these code sections, lines preceded by `##` are the output of the code above them.

```{r calc}
10^2 + 26
```

## Workspace

You can give numbers a name.

By doing so, they become so-called variables, which can be used later.


```{r a4}
a <- 4
a
```

You can do calculations with `a` now.

```{r amath}
a*5
```

If you type `a` again, it still holds 4: the result of `a*5` was not saved, because you did not assign it to anything.

```{r a_again}
a
```


You can also assign a value to `a` using the old one:

```{r a10}
a <- a + 10
a
```

To remove all variables from R's memory, type:

```{r rm}
rm(list=ls())
```

or click the "clear all" broom button in the workspace window.

```{r img3, echo=F, out.width='100%'}
library(knitr)
include_graphics("images/clear_objects.png")
```

## Scalars and vectors

Like many other programs, R organizes numbers in *scalars* (a single number; 0-dimensional), *vectors* (a row of numbers, also called arrays; 1-dimensional) and *matrices* (which we won't get into now).

The `a` you defined was a scalar. (Strictly speaking, R stores a scalar as a vector of length 1.)

To define a vector with the numbers 3, 4, and 5, you need the function `c()`, which is short for concatenate (or paste together).

```{r combine}
b <- c(3,4,5)
b
```


## Functions

If you would like to compute the mean of all the elements in the vector `b` from the example above, you could type

```{r mean}
(3+4+5)/3
```

But when the vector is very long, this is very boring and time-consuming work.

Functions do things to data.
R is built on them. Some functions come with R, like `median()` or `summary()`, and others come as part of packages that others have created.

When you use a function to compute an average, you'll type

```{r mean_two}
mean(x=b)
```

Within the parentheses you specify the *arguments*.

Arguments give extra information to the function. In this case, the argument *x* says of which set of numbers (vector) the mean should be computed (namely of `b`).

Sometimes the name of the argument is not necessary; this also works:

```{r mean_three}
mean(b)
```

## Plots

R can make simple graphics right away.

```{r norm_graph}
# rnorm() is a base function that draws random samples from a normal distribution

x <- rnorm(100)

# plot() is a base function that draws charts

plot(x)
```

* In the first line, 100 random numbers are assigned to the variable x, which becomes a vector by this operation.
* In the second line, all these values are plotted in the plot window.

## Scripts

R is an interpreter that uses a command-line-based environment.

This means that you have to type commands, rather than use the mouse and menus.

This has the advantage that you do not always have to retype commands.

You can store your commands in files, the so-called *scripts*. These scripts typically have file names with the extension **.R**, as in **script.R**.

You can open an editor window to edit these files by clicking *File > New* or *File > Open file...*

You can run (send to the console window) part of the code by selecting lines and pressing **CTRL+ENTER** or **CMD+ENTER**, or by clicking the *Run* button at the top of the script editor window. If you do not select anything, R will run the line your cursor is on.

You can always run the whole script with the function `source()`.

For example, to run the entire saved **script.R** if it's in the working directory, type:

```{r run_script, eval=F}
source("script.R")
```

The line above won't work for you as-is, because there isn't actually an R script called script.R in the directory.

You can also click *Run all* in the editor window or type **CTRL+SHIFT+S** or **CMD+SHIFT+S**.


## Not available data

When you work with real data, you will encounter missing values because of instrument failures or human error.

When a value is *not available*, you write `NA` instead of a number.

```{r na}
j <- c(1,2,NA)
```

Strictly speaking, computing statistics of incomplete data sets is not possible.

Maybe the largest value occurred during the weekend when you didn't measure. Therefore, R will say that it doesn't know what the largest value of `j` is:

```{r na2}
max(j)

```

If you don't mind the missing data and want to compute the statistics anyway, you can add the argument `na.rm=TRUE` (Should I remove the `NA`s? Yes).

```{r na3}
max(j, na.rm=TRUE)
```



**Warning**: `NA`s will also affect any sort of math if you're not careful.


```{r na4}
sum(j)
# compared to
sum(j, na.rm=TRUE)
```

Here are [some](http://faculty.nps.edu/sebuttre/home/R/missings.html) [links](https://datascienceplus.com/missing-values-in-r/) on how to [handle](https://stats.idre.ucla.edu/r/faq/how-does-r-handle-missing-values/) `NA`s in your data.

# Data types

We've been working so far with numbers.

But sometimes the data we work with is something else, like characters and strings, Boolean values like **TRUE** or **FALSE**, or dates.
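A quick way to check which type R has assigned to a value is `class()`. The sketch below, using made-up values, applies it to one example of each kind of data mentioned above (the `L` suffix marks a whole number as an integer, and `as.Date()` turns a string into a date):

```r
values <- list(3.5, 2L, "apples", TRUE, as.Date("2010-02-25"))

# vapply() runs class() on each value and collects the answers as text
vapply(values, class, character(1))
```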

## Characters


```{r characters}
m <- "apples"
m
```

To tell R that something is a character string, you should type the text between quotation marks; otherwise R will start looking for a defined variable with the same name.

```{r characters2, error=TRUE}
n <- pears
```

You can't do math with characters.

```{r characters3, error=TRUE}
m + 2
```

## Dates

Dates and times are complicated.

R has to know that 3 o'clock comes after 2:59 and that February has 29 days in some years.

The base way to tell R that something is a date-time combination is with the function `strptime()`.

```{r dates1}
date1 <- strptime(c("20100225230000", "20100226000000", "20100226010000"), format="%Y%m%d%H%M%S")
date1
```

A vector is created with `c()`, and the numbers between the quotation marks are strings, because that's what the `strptime()` function requires.

That's followed by the argument **format**, which defines how the character string should be read. In this instance, the year is denoted first (%Y), then the month (%m), day (%d), hour (%H), minute (%M) and second (%S). You don't have to specify all of them, as long as the format corresponds to the character string.



In this course, we'll be using a less messy way to deal with dates: the package **lubridate**.

```{r dates2}
# If you don't currently have the lubridate package installed, uncomment the line below and run it

# install.packages("lubridate")

library(lubridate)

date1 <- ymd_hms(c("20100225230000", "20100226000000", "20100226010000"))
```

The function `ymd_hms()` converted the year, month, day and hour, minutes, and seconds in the string. We'll go over this in more detail in [Chapter 3](http://learn.r-journalism.com/en/wrangling/dates/dates/).
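Because `%m` (month) and `%M` (minute) differ only in case, it's worth checking a format string before trusting it. This base R sketch, with a made-up timestamp, parses one string and then uses `format()` to print individual parts back out:

```r
# A made-up timestamp: 25 February 2010, 23:15:00
stamp <- strptime("20100225231500", format = "%Y%m%d%H%M%S", tz = "UTC")

format(stamp, "%m")  # lowercase m: the month
format(stamp, "%M")  # uppercase M: the minute
format(stamp, "%Y-%m-%d %H:%M:%S")

# The format only needs to match the layout of the string, whatever it is
as.Date(strptime("25/02/2010", format = "%d/%m/%Y", tz = "UTC"))
```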

## Factors

Complicated.

* Data structure specifying categorical (nominal) or ordered categorical (ordinal) variables
* Tells R how to handle that variable in analyses
* Very important and misunderstood
* Any variable that is categorical or ordinal should usually be stored as a factor.

For example, race might be input as "White", "Black", and "Hispanic".

When importing that data from a spreadsheet, versions of R before 4.0.0 would by default interpret a text column like this as a **factor**. Since R 4.0.0, base functions such as `read.csv()` and `data.frame()` keep text as characters unless you ask for factors, which is why the code below converts the column with `factor()` explicitly.

Run these lines of code to create a new object, a dataframe called **sample_df**:

```{r factor1}
sample_df <- data.frame(id=c(1001,1002,1003,1004), name=c("Steve", "Pam", "Jim", "Dwight"),
                        age=c(26, 65, 15, 7), race=c("White", "Black", "White", "Hispanic"))
sample_df$race <- factor(sample_df$race)
sample_df$id <- factor(sample_df$id)
sample_df$name <- as.character(sample_df$name)
```

```{r img4, echo=F, out.width='100%'}
include_graphics("images/sample_df.png")
```

Let's take a look at the structure behind the dataframe we've created.

```{r factor2}
str(sample_df)
```

R sees that the **race** column is a factor variable with three levels.

```{r factor3}
levels(sample_df$race)
```

This means that R groups statistics by these levels.

```{r factor4}
summary(sample_df$race)
```


Internally, R stores the integer values 1, 2, and 3, and maps the character strings in alphabetical order to these values: 1=Black, 2=Hispanic, and 3=White.

Why is this important to know?

Journalists are less concerned with factors and will often find themselves converting factors to character strings. But when you reach the point where you want to build models such as linear regressions, you'll be happy that it's an option.
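The sketch below, with made-up values, shows the integer codes behind a factor and how the `levels` argument of `factor()` lets you choose the order instead of accepting the alphabetical default:

```r
race <- factor(c("White", "Black", "White", "Hispanic"))

levels(race)      # alphabetical by default
as.integer(race)  # the integer codes R actually stores

# Supplying levels yourself sets the order (and therefore the codes)
race2 <- factor(c("White", "Black", "White", "Hispanic"),
                levels = c("White", "Black", "Hispanic"))
levels(race2)
as.integer(race2)

# Counts per level, in level order
table(race)
```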

**Tip**: Most odd quirks when it comes to R can be traced back to the fact that R was created by and for statisticians. R has grown a lot since then, and the community has helped it evolve to handle data the way we are more used to. But some habits die hard and are ingrained.


## Integers & Numbers

These are what they sound like: whole or decimal numbers that can be used in calculations.

R doesn't understand units that are combined with numbers.

For example, "4in" will be treated as a string and not as 4.


## Converting between the different types

Here's a warning.

* You can convert factors into strings.

```{r convert1}
sample_df$race
as.character(sample_df$race)
```

* You can convert strings into factors.

```{r convert2}
sample_df$name
factor(sample_df$name)
```

* You **cannot** convert factors directly into the numbers they display.

```{r convert3}
sample_df$id
as.numeric(sample_df$id)
```

That's because R stores **factors** as **integer** codes, so `as.numeric()` returns the codes (1, 2, 3, 4) rather than the IDs.

You must convert factors into characters first *before* converting them to numbers.

So you can nest the two functions:

```{r convert4}
sample_df$id
as.numeric(as.character(sample_df$id))
```

**Tip**: It's okay if you don't fully understand this next section. It's pretty advanced. I give you permission to skip down to **Your turn**.



## Revisiting Functions

You can also save code you've written that simplifies your process into a function.

```{r function}
percent_change <- function(first_number, second_number) {
  pc <- (second_number-first_number)/first_number*100
  return(pc)
}

percent_change(100,150)
```

This is what's happening in the code above:

* **percent_change** is the name of the function, and assigned to it is the function `function()`
* Two arguments must be passed to this function: **first_number** and **second_number**
* A new object `pc` is created using some math that calculates the percent change between the two arguments
* The function `return()` sends the value of `pc` back as the result of calling `percent_change()`

Build enough functions and you can save them as [your own package](https://github.com/andrewbtran/muckrakr).

**The important thing about functions** is that they're programmed by humans.

I constructed this function, so I know that it takes only two inputs, that each one should be numeric, and that they go in order: first, then second.

If you're working in R and a function you're using is giving an error, it most likely means there's something wrong with one or more of the variables you're passing to the function.

Here's what happens when I pass a string to my `percent_change()` function.

```{r function2, error=T}
percent_change("what", "happens")
```


**This is the point:**

Sometimes really great R programmers will anticipate errors, catch bad inputs, and output helpful suggestions instead of a generic error.

This particular error isn't very explicit. It needs to be interpreted, but **you** know that the function needs numbers to work correctly because you created it.

New users might not know that intuitively.

And that's how you're going to feel when functions you run don't work.
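As a sketch of what catching bad inputs can look like, here is a hypothetical variant, `percent_change_checked()`, that tests its arguments with `is.numeric()` and stops with a readable message before doing any math. The function name and message wording are illustrative only, not part of any package:

```r
percent_change_checked <- function(first_number, second_number) {
  # Stop early, with a plain-language message, if either input isn't a number
  if (!is.numeric(first_number) || !is.numeric(second_number)) {
    stop("percent_change_checked() needs two numbers, ",
         "for example percent_change_checked(100, 150)")
  }
  (second_number - first_number) / first_number * 100
}

percent_change_checked(100, 150)

# Catch the error so we can see its message instead of halting
tryCatch(
  percent_change_checked("what", "happens"),
  error = function(e) conditionMessage(e)
)
```

The check costs a few lines, but the person calling the function now learns what went wrong without reading its code.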

You'll have to Google the error or peek into the code to see what it expects and how it might break down with what you've passed it.

And it won't be entirely your fault.

When we write code and share it with others, we can't anticipate all the ways in which they might want to use it in the future.

Shoot the function writer a message, or, if you wrote the package, welcome feedback from others.

This is what makes participating in the R community so great. We just want to do better.

## Your turn

Challenge yourself with [these exercises](http://code.r-journalism.com/chapter-1/#section-intro-to-r) so you'll retain the knowledge of this section.

Instructions on how to run the exercise app are in the [intro page](http://learn.r-journalism.com/en/how_to_use_r/) to this section.

--------------------------------------------------------------------------------
/chapter-1/index.html:
--------------------------------------------------------------------------------