├── .gitignore
├── 00-Getting-started.Rmd
├── 01-Visualize.Rmd
├── 02-Transform.Rmd
├── 03-Tidy.Rmd
├── 04-Case-Study.Rmd
├── 05-Data-Types.Rmd
├── 06-Iterate.Rmd
├── 07-Model.Rmd
├── 08-Organize.Rmd
├── 99-Setup.md
├── README.md
├── cheatsheets
    ├── dplyr-data-transformation.pdf
    ├── ggplot2-data-visualization.pdf
    ├── lubridate-dates-times.pdf
    ├── purrr-iterate.pdf
    ├── rstudio-ide.pdf
    ├── stringr-strings.pdf
    └── tidyr-readr-data-import-tidy.pdf
├── data-science-in-the-tidyverse.Rproj
├── email-to-participants.md
├── resources
    ├── 01-setup-login.png
    ├── 02-setup-temp-project.png
    ├── 03-setup-navigate-to-project.png
    ├── 04-setup-rproj-file.png
    ├── 05-setup-open-project.png
    ├── 06-setup-inside-project.png
    ├── 07-setup-all-done.png
    └── bialik-fridaythe13th-2.png
└── slides
    ├── 00-Introduction.pdf
    ├── 01-Visualize.pdf
    ├── 02-Transform.pdf
    ├── 03-Tidy.pdf
    ├── 04-Case-Study.pdf
    ├── 05-Data-Types.pdf
    ├── 06-Iteration.pdf
    ├── 07-Model.pdf
    ├── 08-Organize.pdf
    └── 10-wrapping-up.pdf


/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | .Ruserdata
5 | *.html
6 | 


--------------------------------------------------------------------------------
/00-Getting-started.Rmd:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: "R Notebook"
 3 | output: html_notebook
 4 | ---
 5 | 
 6 | ```{r setup}
 7 | library(tidyverse)
 8 | ```
 9 | 
10 | ## R notebooks
11 | 
12 | This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code. 
13 | 
14 | R code goes in **code chunks**, denoted by three backticks. Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Ctrl+Shift+Enter* (Windows) or *Cmd+Shift+Enter* (Mac). 
15 | 
16 | ```{r}
17 | ggplot(data = mpg) +
18 |   geom_point(mapping = aes(x = displ, y = hwy))
19 | ```
20 | 
21 | Add a new chunk by clicking the *Insert* button on the toolbar and selecting *R*, or by pressing *Ctrl+Alt+I* (Windows) or *Cmd+Option+I* (Mac).
22 | 
23 | When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the *Preview* button or press *Ctrl+Shift+K* (Windows) or *Cmd+Shift+K* (Mac) to preview the HTML file). 
24 | 
25 | The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike *Knit*, *Preview* does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.
26 | 
27 | 


--------------------------------------------------------------------------------
/01-Visualize.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Visualize Data"
  3 | output: html_notebook
  4 | ---
  5 | 
  6 | <!-- This file by Charlotte Wickham is licensed under a Creative Commons Attribution 4.0 International License, adapted from the original work at https://github.com/rstudio/master-the-tidyverse by RStudio. -->
  7 | 
  8 | ```{r setup}
  9 | library(tidyverse)
 10 | ```
 11 | 
 12 | 
 13 | ```{r}
 14 | mpg
 15 | ```
 16 | 
 17 | ## Quiz
 18 | 
 19 | What relationship do you expect to see between engine size (displ) and highway fuel efficiency (hwy)?
 20 | 
 21 | ## Your Turn 1
 22 | 
 23 | Run the code on the slide to make a graph. Pay strict attention to spelling, capitalization, and parentheses!
 24 | 
 25 | ```{r}
 26 | 
 27 | 
 28 | ```
 29 | 
 30 | ## Your Turn 2
 31 | 
 32 | Add `color`, `size`, `alpha`, and `shape` aesthetics to your graph. Experiment.  
 33 | 
 34 | ```{r}
 35 | ggplot(data = mpg) + 
 36 |   geom_point(mapping = aes(x = displ, y = hwy))
 37 | ```
 38 | 
 39 | ## Your Turn 3
 40 | 
 41 | Replace this scatterplot with one that draws boxplots. Use the cheatsheet. Try your best guess.
 42 | 
 43 | ```{r}
 44 | ggplot(data = mpg) +
 45 |   geom_point(mapping = aes(x = class, y = hwy))
 46 | ```
 47 | 
 48 | ## Your Turn 4
 49 | 
 50 | Make a histogram of the `hwy` variable from `mpg`.
 51 | 
 52 | ```{r}
 53 | ggplot(data = mpg) 
 54 | 
 55 | ```
 56 | 
 57 | ## Your Turn 5
 58 | 
 59 | Make a density plot of `hwy` colored by `class`.
 60 | 
 61 | ```{r}
 62 | ggplot(data = mpg) 
 63 | 
 64 | ```
 65 | 
 66 | ## Your Turn 6
 67 | 
 68 | Make a bar chart of `class` colored by `class`.
 69 | 
 70 | ```{r}
 71 | ggplot(data = mpg) 
 72 | 
 73 | ```
 74 | 
 75 | ## Your Turn 7
 76 | 
 77 | Predict what this code will do. Then run it.
 78 | 
 79 | ```{r}
 80 | ggplot(mpg) + 
 81 |   geom_point(aes(displ, hwy)) +
 82 |   geom_smooth(aes(displ, hwy))
 83 | ```
 84 | 
 85 | ## Your Turn 8
 86 | 
 87 | What does `getwd()` return?
 88 | 
 89 | ```{r}
 90 | getwd()
 91 | ```
 92 | 
 93 | ## Your Turn 9
 94 | 
 95 | Save the last plot and then locate it in the files pane.
 96 | 
 97 | ```{r}
 98 | 
 99 | ```
100 | 
101 | ***
102 | 
103 | # Take aways
104 | 
105 | You can use this code template to make thousands of graphs with **ggplot2**.
106 | 
107 | ```{r, eval = FALSE}
108 | ggplot(data = <DATA>) +
109 |   <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
110 | ```
111 | 
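As a minimal sketch of filling in the template, assuming the built-in `mtcars` data as a stand-in (it is not used elsewhere in this notebook), `<DATA>` becomes `mtcars`, `<GEOM_FUNCTION>` becomes `geom_point`, and `<MAPPINGS>` becomes `x = wt, y = mpg`:

```{r}
library(ggplot2)

# <DATA> = mtcars, <GEOM_FUNCTION> = geom_point, <MAPPINGS> = x = wt, y = mpg
# (mtcars is an illustrative stand-in, not part of the workshop data)
p <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg))

# Printing the object draws the plot
p
```

Swapping only the geom function or the mappings gives a different graph from the same data.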


--------------------------------------------------------------------------------
/02-Transform.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Transform Data"
  3 | output: html_notebook
  4 | ---
  5 | 
  6 | <!-- This file by Charlotte Wickham is licensed under a Creative Commons Attribution 4.0 International License, adapted from the original work at https://github.com/rstudio/master-the-tidyverse by RStudio. -->
  7 | 
  8 | ```{r setup}
  9 | library(tidyverse)
 10 | library(gapminder)
 11 | 
 12 | # Toy dataset to use
 13 | pollution <- tribble(
 14 |        ~city,   ~size, ~amount, 
 15 |   "New York", "large",      23,
 16 |   "New York", "small",      14,
 17 |     "London", "large",      22,
 18 |     "London", "small",      16,
 19 |    "Beijing", "large",      121,
 20 |    "Beijing", "small",      56
 21 | )
 22 | ```
 23 | 
 24 | ## gapminder
 25 | 
 26 | ```{r}
 27 | gapminder
 28 | ```
 29 | 
 30 | ## Your Turn 1
 31 | 
 32 | See if you can use the logical operators to manipulate our code below to show:
 33 | 
 34 | The data for the United States
 35 | ```{r}
 36 | filter(gapminder, country == "New Zealand")
 37 | ```
 38 | 
 39 | All data for countries in Oceania
 40 | ```{r}
 41 | filter(gapminder, country == "New Zealand")
 42 | ```
 43 | 
 44 | Rows where the life expectancy is greater than 82
 45 | ```{r}
 46 | filter(gapminder, country == "New Zealand")
 47 | ```
 48 | 
 49 | 
 50 | ## Your Turn 2
 51 | 
 52 | Use Boolean operators to alter the code below to return only the rows that contain:
 53 | 
 54 | * United States before 1970
 55 | 
 56 | ```{r}
 57 | filter(gapminder, country == "New Zealand", year > 2000)
 58 | ```
 59 | 
 60 | *  Countries where life expectancy in 2007 is below 50
 61 | 
 62 | ```{r}
 63 | filter(gapminder, country == "New Zealand", year > 2000)
 64 | ```
 65 | 
 66 | * Records for any of "New Zealand", "Canada" or "United States"
 67 | 
 68 | ```{r}
 69 | filter(gapminder, country == "New Zealand", year > 2000)
 70 | ```
 71 | 
 72 | ## Your Turn 3
 73 | 
 74 | Use `filter()` to get the records for the US, then plot the life expectancy over time.
 75 | 
 76 | ```{r}
 77 | gapminder
 78 | ```
 79 | 
 80 | ## Your Turn 4
 81 | 
 82 | Find the records with the smallest population.
 83 | ```{r}
 84 | 
 85 | ```
 86 | 
 87 | Find the records with the largest GDP per capita.
 88 | ```{r}
 89 | 
 90 | ```
 91 | 
 92 | ## Quiz 
 93 | 
 94 | A function that returns a vector the same length as the input is called **vectorized**.
 95 | 
 96 | Which of the following functions are vectorized?
 97 | 
 98 |   * `ifelse()`
 99 |   * `diff()`
100 |   * `sum()`
101 | 
102 | You might try these:
103 | ```{r}
104 | gapminder %>% 
105 |   mutate(size = ifelse(pop < 10e06, "small", "large"))
106 | ```
107 | 
108 | ```{r, error = TRUE}
109 | gapminder %>% 
110 |   mutate(diff_pop = diff(pop))
111 | ```
112 | 
113 | ```{r}
114 | gapminder %>% 
115 |   mutate(total_pop = sum(as.numeric(pop)))
116 | ```
117 | 
118 | ## Your Turn 5
119 | 
120 | Alter the code to add a `prev_lifeExp` column that contains the life expectancy from the previous record.
121 | 
122 | (Hint: use the cheatsheet; you want to offset elements by one)
123 | 
124 | Extra challenge: Why isn't this quite the 'life expectancy five years ago'?
125 | 
126 | ```{r}
127 | gapminder %>%
128 |   mutate()
129 | ```
130 | 
131 | ## Your Turn 6
132 | 
133 | Use summarise() to compute three statistics about the data:
134 | 
135 | * The first (minimum) year in the dataset
136 | * The last (maximum) year in the dataset
137 | * The number of countries represented in the data (Hint: use cheatsheet)
138 | 
139 | ```{r}
140 | gapminder 
141 | ```
142 | 
143 | ## Your Turn 7
144 | 
145 | Extract the rows where continent == "Africa" and year == 2007. 
146 | 
147 | Then use summarise() and summary functions to find:
148 | 
149 | 1. The number of unique countries
150 | 2. The median life expectancy
151 | 
152 | ```{r}
153 | gapminder 
154 | ```
155 | 
156 | ## Your Turn 8
157 | 
158 | Find the median life expectancy by continent in 2007.
159 | 
160 | ```{r}
161 | gapminder %>%
162 |   filter(year == 2007)
163 | ```
164 | 
165 | ## Your Turn 9
166 | 
167 | Brainstorm with your neighbor the sequence of operations to find the country with the biggest jump in life expectancy (between any two consecutive records) for each continent.
168 | 
169 | ## Your Turn 10
170 | 
171 | Find the country with the biggest jump in life expectancy (between any two consecutive records) for each continent.
172 | 
173 | ```{r}
174 | 
175 | 
176 | ```
177 | 
178 | ## Your Turn 11
179 | 
180 | Use `left_join()` to add the country codes in `country_codes` to the gapminder data.
181 | 
182 | ```{r}
183 | country_codes
184 | ```
185 | 
186 | **Challenge**: Which codes in country_codes have no matches in gapminder?
187 | 
188 | ```{r}
189 | 
190 | ```
191 | 
192 | 
193 | ***
194 | 
195 | # Take aways
196 | 
197 | * Extract cases with `filter()`  
198 | * Make new variables, with `mutate()`  
199 | * Make tables of summaries with `summarise()`  
200 | * Do groupwise operations with `group_by()`
201 | * Connect operations with `%>%`  
202 | * Joins are two table verbs
203 | 
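The verbs above can be chained into one pipeline. This is only a sketch, assuming the built-in `mtcars` data as a stand-in for `gapminder`; the names `hp_per_wt`, `heavy_by_cyl`, and `cyl_labels` are made up for illustration:

```{r}
library(dplyr)

# Illustrative stand-in data: mtcars, where wt is weight in 1000 lbs
heavy_by_cyl <- mtcars %>%
  filter(wt > 3) %>%                  # extract cases
  mutate(hp_per_wt = hp / wt) %>%     # make a new variable
  group_by(cyl) %>%                   # set up groupwise operations
  summarise(n = n(), mean_hp_per_wt = mean(hp_per_wt))

# A hypothetical lookup table with no entry for 8 cylinders
cyl_labels <- data.frame(cyl = c(4, 6), label = c("four", "six"))

# left_join() keeps every row of the first table; unmatched rows get NA
joined <- left_join(heavy_by_cyl, cyl_labels, by = "cyl")
joined
```

Because `left_join()` keeps all rows from the left table, the 8-cylinder group survives the join with a missing `label`.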


--------------------------------------------------------------------------------
/03-Tidy.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Tidy Data"
  3 | output: html_notebook
  4 | ---
  5 | 
  6 | <!-- This file by Charlotte Wickham is licensed under a Creative Commons Attribution 4.0 International License, adapted from the original work at https://github.com/rstudio/master-the-tidyverse by RStudio. -->
  7 | 
  8 | ```{r setup}
  9 | library(tidyverse)
 10 | 
 11 | # Toy data
 12 | cases <- tribble(
 13 |   ~Country, ~"2011", ~"2012", ~"2013",
 14 |       "FR",    7000,    6900,    7000,
 15 |       "DE",    5800,    6000,    6200,
 16 |       "US",   15000,   14000,   13000
 17 | )
 18 | 
 19 | pollution <- tribble(
 20 |        ~city,   ~size, ~amount,
 21 |   "New York", "large",      23,
 22 |   "New York", "small",      14,
 23 |     "London", "large",      22,
 24 |     "London", "small",      16,
 25 |    "Beijing", "large",     121,
 26 |    "Beijing", "small",     56
 27 | )
 28 | 
 29 | 
 30 | bp_systolic <- tribble(
 31 |   ~ subject_id,  ~ time_1, ~ time_2, ~ time_3,
 32 |              1,       120,      118,      121,
 33 |              2,       125,      131,       NA,
 34 |              3,       141,       NA,       NA 
 35 | )
 36 | 
 37 | bp_systolic2 <- tribble(
 38 |   ~ subject_id,  ~ time, ~ systolic,
 39 |              1,       1,        120,
 40 |              1,       2,        118,
 41 |              1,       3,        121,
 42 |              2,       1,        125,
 43 |              2,       2,        131,
 44 |              3,       1,        141
 45 | )
 46 | ```
 47 | 
 48 | ## Tidy and untidy data
 49 | 
 50 | `table1` is tidy:
 51 | ```{r}
 52 | table1 
 53 | ```
 54 | 
 55 | For example, it's easy to add a rate column with `mutate()`:
 56 | ```{r}
 57 | table1 %>%
 58 |   mutate(rate = cases/population)
 59 | ```
 60 | 
 61 | `table2` isn't tidy; the count column really contains two variables:
 62 | ```{r}
 63 | table2
 64 | ```
 65 | 
 66 | It makes it very hard to manipulate.
 67 | 
 68 | ## Your Turn 1
 69 | 
 70 | Is `bp_systolic` tidy?
 71 | 
 72 | ```{r}
 73 | bp_systolic2 
 74 | ```
 75 | 
 76 | ## Your Turn 2
 77 | 
 78 | Using `bp_systolic2` with `group_by()`, and `summarise()`:
 79 | 
 80 | * Find the average systolic blood pressure for each subject
 81 | * Find the last time each subject was measured
 82 | 
 83 | ```{r}
 84 | bp_systolic2
 85 | ```
 86 | 
 87 | ## Your Turn 3
 88 | 
 89 | On a sheet of paper, draw how the cases data set would look if it had the same values grouped into three columns: **country**, **year**, **n**
 90 | 
 91 | ## Your Turn 4
 92 | 
 93 | Use `gather()` to reorganize `table4a` into three columns: **country**, **year**, and **cases**.
 94 | 
 95 | ```{r}
 96 | table4a 
 97 | ```
 98 | 
 99 | ## Your Turn 5
100 | 
101 | On a sheet of paper, draw how this data set would look if it had the same values grouped into three columns: **city**, **large**, **small**
102 | 
103 | ## Your Turn 6
104 | 
105 | Use `spread()` to reorganize `table2` into four columns: **country**, **year**, **cases**, and **population**.
106 | 
107 | ```{r}
108 | table2 
109 | ```
110 | 
111 | ***
112 | 
113 | # Take Aways
114 | 
115 | Data comes in many formats but R prefers just one: _tidy data_.
116 | 
117 | A data set is tidy if and only if:
118 | 
119 | 1. Every variable is in its own column
120 | 2. Every observation is in its own row
121 | 3. Every value is in its own cell (which follows from the above)
122 | 
123 | What is a variable and an observation may depend on your immediate goal.
124 | 
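To make the three rules concrete, here is a minimal sketch on a hypothetical toy table (`scores_wide` is invented for illustration and is not one of the workshop datasets). It uses the same `gather()` and `spread()` functions as the exercises:

```{r}
library(tidyr)

# Hypothetical toy data: the test number is hidden in the column names
scores_wide <- data.frame(
  student = c("a", "b"),
  test1 = c(90, 70),
  test2 = c(85, 75)
)

# gather() stacks the test columns so each row is one student-test observation
scores_long <- gather(scores_wide, key = "test", value = "score", test1, test2)

# spread() reverses the move, putting each test back in its own column
scores_back <- spread(scores_long, key = test, value = score)
```

Which of the two layouts counts as tidy depends on whether you treat each test or each student as the observation.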


--------------------------------------------------------------------------------
/04-Case-Study.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Case Study: Friday the 13th Effect"
  3 | output: html_notebook
  4 | ---
  5 | 
  6 | <!-- This file by Charlotte Wickham is licensed under a Creative Commons Attribution 4.0 International License. -->
  7 | 
  8 | ```{r setup}
  9 | library(fivethirtyeight)
 10 | library(tidyverse)
 11 | ```
 12 | 
 13 | ## Task
 14 | 
 15 | Reproduce this figure from fivethirtyeight's article [*Some People Are Too Superstitious To Have A Baby On Friday The 13th*](https://fivethirtyeight.com/features/some-people-are-too-superstitious-to-have-a-baby-on-friday-the-13th/):
 16 | 
 17 | ![](resources/bialik-fridaythe13th-2.png)
 18 | 
 19 | ## Data
 20 | 
 21 | In the `fivethirtyeight` package there are two datasets containing birth data, but for now let's just work with one, `US_births_1994_2003`.  Note that since we have data from 1994-2003, our results may differ somewhat from the figure based on 1994-2014.
 22 | 
 23 | ## Your Turn 1 
 24 | 
 25 | With your neighbour, brainstorm the steps needed to get the data in a form ready to make the plot.
 26 | 
 27 | ```{r}
 28 | US_births_1994_2003
 29 | ```
 30 | 
 31 | ## Some overviews of the data
 32 | 
 33 | Whole time series:
 34 | ```{r}
 35 | ggplot(US_births_1994_2003, aes(x = date, y = births)) +
 36 |   geom_line()
 37 | ```
 38 | There is so much fluctuation it's really hard to see what is going on.
 39 | 
 40 | Let's try just looking at one year:
 41 | ```{r}
 42 | US_births_1994_2003 %>%
 43 |   filter(year == 1994) %>%
 44 |   ggplot(mapping = aes(x = date, y = births)) +
 45 |     geom_line()
 46 | ```
 47 | Strong weekly pattern accounts for most variation.
 48 | 
 49 | ## Strategy
 50 | 
 51 | Use the figure as a guide for what the data should look like to make the final plot.  We want to end up with something like:
 52 | 
 53 | ---------------------------
 54 |  day_of_week   avg_diff_13 
 55 | ------------- -------------
 56 |      Mon         -2.686    
 57 | 
 58 |     Tues         -1.378    
 59 | 
 60 |      Wed         -3.274    
 61 |      
 62 |      ...          ...
 63 |      
 64 | ---------------------------     
 65 | 
 66 | 
 67 | ## Your Turn 2
 68 | 
 69 | Extract just the 6th, 13th and 20th of each month:
 70 | 
 71 | ```{r}
 72 | US_births_1994_2003 %>%
 73 |   select(-date) 
 74 | 
 75 | ```
 76 | 
 77 | ## Your Turn 3
 78 | 
 79 | Which arrangement is tidy?
 80 | 
 81 | **Option 1:**
 82 | 
 83 | -----------------------------------------------------
 84 |  year   month   date_of_month   day_of_week   births 
 85 | ------ ------- --------------- ------------- --------
 86 |  1994     1           6            Thurs      11406  
 87 | 
 88 |  1994     1          13            Thurs      11212  
 89 | 
 90 |  1994     1          20            Thurs      11682  
 91 | -----------------------------------------------------
 92 | 
 93 | **Option 2:**
 94 | 
 95 | ----------------------------------------------------
 96 |  year   month   day_of_week     6      13      20   
 97 | ------ ------- ------------- ------- ------- -------
 98 |  1994     1        Thurs      11406   11212   11682 
 99 | ----------------------------------------------------
100 | 
101 | (**Hint:** think about our next step *"Find the percent difference between the 13th and the average of the 6th and 20th"*. In which layout will this be easier using our tidy tools?)
102 | 
103 | ## Your Turn 4
104 | 
105 | Tidy the filtered data to have the days in columns.
106 | 
107 | ```{r}
108 | US_births_1994_2003 %>%
109 |   select(-date) %>% 
110 |   filter(date_of_month %in% c(6, 13, 20))
111 | ```
112 | 
113 | ## Your Turn 5
114 | 
115 | Now use `mutate()` to add columns for:
116 | 
117 | * The average of the births on the 6th and 20th
118 | * The percentage difference between the number of births on the 13th and the average of the 6th and 20th
119 | 
120 | ```{r}
121 | US_births_1994_2003 %>%
122 |   select(-date) %>% 
123 |   filter(date_of_month %in% c(6, 13, 20)) %>%
124 |   spread(date_of_month, births) 
125 | ```
126 | 
127 | ## A little additional exploring
128 | 
129 | Now that we have a percent difference between the 13th and the average of the 6th and 20th of each month, it's probably worth exploring a little (at the very least to check that our calculations seem reasonable).
130 | 
131 | To make it a little easier let's assign our current data to a variable
132 | ```{r}
133 | births_diff_13 <- US_births_1994_2003 %>%
134 |   select(-date) %>% 
135 |   filter(date_of_month %in% c(6, 13, 20)) %>%
136 |   spread(date_of_month, births) %>%
137 |   mutate(
138 |     avg_6_20 = (`6` + `20`)/2,
139 |     diff_13 = (`13` - avg_6_20) / avg_6_20 * 100
140 |   )
141 | ```
142 | 
143 | Then take a look
144 | ```{r}
145 | births_diff_13 %>% 
146 |   ggplot(mapping = aes(day_of_week, diff_13)) +
147 |     geom_point()
148 | ```
149 | 
150 | Looks like we are on the right path.  There's a big outlier one Monday
151 | ```{r}
152 | births_diff_13 %>%
153 |   filter(day_of_week == "Mon", diff_13 > 10)
154 | ```
155 | 
156 | Seems to be driven by a particularly low number of births on the 6th of Sep 1999. Maybe a holiday effect? Labour Day was on the 6th of Sep that year.
157 | 
158 | ## Your Turn 6
159 | 
160 | Summarize each day of the week to have mean of diff_13.
161 | 
162 | Then, recreate the fivethirtyeight plot.
163 | 
164 | ```{r}
165 | US_births_1994_2003 %>%
166 |   select(-date) %>% 
167 |   filter(date_of_month %in% c(6, 13, 20)) %>%
168 |   spread(date_of_month, births) %>%
169 |   mutate(
170 |     avg_6_20 = (`6` + `20`)/2,
171 |     diff_13 = (`13` - avg_6_20) / avg_6_20 * 100
172 |   ) 
173 | ```
174 | 
175 | ## Extra Challenges
176 | 
177 | * If you wanted to use the `US_births_2000_2014` data instead, what would you need to change in the pipeline?  How about using both `US_births_1994_2003` and `US_births_2000_2014`?
178 | 
179 | * Try not removing the `date` column. At what point in the pipeline does it cause problems? Why?
180 | 
181 | * Can you come up with an alternative way to investigate the Friday the 13th effect?  Try it out!
182 | 
183 | ## Takeaways
184 | 
185 | The power of the tidyverse comes from being able to easily combine functions that do simple things well.  
186 | 
187 | 


--------------------------------------------------------------------------------
/05-Data-Types.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Data Types"
  3 | output: html_notebook
  4 | ---
  5 | 
  6 | <!-- This file by Charlotte Wickham is licensed under a Creative Commons Attribution 4.0 International License, adapted from the original work at https://github.com/rstudio/master-the-tidyverse by RStudio. -->
  7 | 
  8 | ```{r setup}
  9 | library(tidyverse)
 10 | library(lubridate)
 11 | 
 12 | # Example of a factor
 13 | eyes <- factor(x = c("blue", "green", "green"), 
 14 |                levels = c("blue", "brown", "green"))
 15 | 
 16 | # An example for times/dates
 17 | library(fivethirtyeight)
 18 | births <- US_births_1994_2003 %>%
 19 |   select(date, births)
 20 | ```
 21 | 
 22 | ## Warm-up / Review
 23 | 
 24 | Using the data `gss_cat`, find the average hours of tv watched (`tvhours`) for each category of marital status (`marital`).
 25 | 
 26 | ```{r}
 27 | gss_cat
 28 | ```
 29 | 
 30 | ## Your Turn 1
 31 | 
 32 | What kind of object is the `marital` variable?  
 33 | 
 34 | ```{r}
 35 | gss_cat
 36 | ```
 37 | 
 38 | Brainstorm with your neighbor all the things you know about that kind of object.
 39 | 
 40 | # Factors
 41 | 
 42 | ## Your Turn 2
 43 | 
 44 | Fix your summary of average hours of tv watched (`tvhours`) by marital status (`marital`), to drop missing values in `tvhours`, then create a plot to examine the results.
 45 | 
 46 | ```{r}
 47 | gss_cat %>%
 48 |   group_by(marital) %>% 
 49 |   summarise(avg_tvhours = mean(tvhours)) 
 50 | ```
 51 | 
 52 | ## Your Turn 3
 53 | 
 54 | Fill in the blanks (`   `) to explore the average hours of tv watched by religion.
 55 | 
 56 | ```{r, error = TRUE}
 57 | gss_cat %>%
 58 |   drop_na(    ) %>%
 59 |   group_by(    ) %>%
 60 |   summarise(    ) %>%
 61 |   ggplot() +
 62 |     geom_point(mapping = aes(x =     , y =    ))
 63 | ```
 64 | 
 65 | ## Quiz 
 66 | 
 67 | Why is this plot not very useful?
 68 | 
 69 | ```{r}
 70 | gss_cat %>%
 71 |   drop_na(tvhours) %>%
 72 |   group_by(denom) %>%
 73 |   summarise(avg_tvhours = mean(tvhours)) %>%
 74 |   ggplot() +
 75 |     geom_point(mapping = aes(x = avg_tvhours,
 76 |       y = fct_reorder(denom, avg_tvhours)))
 77 | ```
 78 | 
 79 | ## Your Turn 4
 80 | 
 81 | Edit the code to also relabel some other Baptist denominations:
 82 | 
 83 | * "Baptist-dk which" -> "Baptist - Don't Know"    
 84 | * "Other baptists" -> "Baptist - Other"
 85 | 
 86 | ```{r}
 87 | gss_cat %>%
 88 |   mutate(denom = fct_recode(denom,
 89 |     "Baptist - Southern" = "Southern baptist")
 90 |   ) %>%
 91 |   pull(denom) %>%
 92 |   levels()
 93 | ```
 94 | 
 95 | ## Your Turn 5
 96 | 
 97 | What does the function `detect_denom()` do?
 98 | 
 99 | ```{r}
100 | detect_denom <- function(x){
101 |   case_when(
102 |     str_detect(x, "[Bb]ap") ~ "Baptist", 
103 |     str_detect(x, "[Pp]res") ~ "Presbyterian",
104 |     str_detect(x, "[Ll]uth") ~ "Lutheran",
105 |     str_detect(x, "[Mm]eth") ~ "Methodist",
106 |     TRUE ~ x
107 |   )
108 | }
109 | 
110 | gss_cat %>% pull(denom) %>% levels() %>% detect_denom()
111 | ```
112 | 
113 | # Strings
114 | 
115 | With your neighbor, predict what these might return:
116 | 
117 | ```{r}
118 | strings <- c("Apple", "Pineapple", "Orange")
119 | 
120 | str_detect(strings, pattern = "pp")
121 | str_detect(strings, pattern =  "apple")
122 | str_detect(strings, pattern = "[Aa]pple")
123 | ```
124 | 
125 | Then run them!
126 | 
127 | # Times and Dates
128 | 
129 | ## Your Turn 7
130 | 
131 | For each of the following formats (of the same date), pick the right `ymd()` function to parse them:
132 | 
133 | ```{r}
134 | "2018 Feb 01"
135 | "2-1-18"
136 | "01/02/2018"
137 | ```
138 | 
139 | ## Your Turn 8
140 | 
141 | Fill in the blanks to:
142 | 
143 | * Extract the month from date. 
144 | * Extract the year from date.
145 | * Find the total births for each year/month.
146 | * Plot the results as a line chart.
147 | 
148 | ```{r, error = TRUE}
149 | births %>%
150 |   mutate(year = ___,
151 |     month = ___) %>%
152 |   group_by(___, ___) %>%
153 |   summarise(total_births = ___) %>%
154 |   ggplot() + 
155 |     geom_line(aes(x = month, y = total_births, group = year))
156 | ```
157 | 
158 | 
159 | # Take Aways
160 | 
161 | Dplyr gives you three _general_ functions for manipulating data: `mutate()`, `summarise()`, and `group_by()`. Augment these with functions from the packages below, which focus on specific types of data.
162 | 
163 | Package   | Data Type
164 | --------- | --------
165 | forcats   | factors
166 | stringr   | strings
167 | hms       | times
168 | lubridate | dates and times
169 | 
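As a rough sketch of one function from each of three packages in the table, using small made-up vectors rather than the workshop data (`sizes`, `recoded`, `has_an`, and `parsed_month` are illustrative names):

```{r}
library(forcats)
library(stringr)
library(lubridate)

# forcats: relabel factor levels (new name on the left, old on the right)
sizes <- factor(c("sm", "lg", "sm"))
recoded <- fct_recode(sizes, "small" = "sm", "large" = "lg")

# stringr: test each string for a pattern
has_an <- str_detect(c("banana", "kiwi"), pattern = "an")

# lubridate: parse a year-month-day string, then extract the month
parsed_month <- month(ymd("2017-11-30"))
```

Each of these returns a vector the same length as its input, so they slot directly into `mutate()`.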


--------------------------------------------------------------------------------
/06-Iterate.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Iteration"
  3 | output: html_document
  4 | ---
  5 | 
  6 | <!-- This file by Charlotte Wickham is licensed under a Creative Commons Attribution 4.0 International License, adapted from the original work at https://github.com/rstudio/master-the-tidyverse by RStudio. -->
  7 | 
  8 | ```{r setup}
  9 | library(tidyverse)
 10 | 
 11 | # Toy data
 12 | set.seed(1000)
 13 | exams <- list(
 14 |   student1 = round(runif(10, 50, 100)),
 15 |   student2 = round(runif(10, 50, 100)),
 16 |   student3 = round(runif(10, 50, 100)),
 17 |   student4 = round(runif(10, 50, 100)),
 18 |   student5 = round(runif(10, 50, 100))
 19 | )
 20 | 
 21 | extra_credit <- list(0, 0, 10, 10, 15)
 22 | ```
 23 | 
 24 | ## Your Turn 1
 25 | 
 26 | What kind of object is `mod`?  Why are models stored as this kind of object?
 27 | 
 28 | ```{r}
 29 | mod <- lm(price ~ carat + cut + color + clarity, data = diamonds)
 30 | View(mod)
 31 | ```
 32 | 
 33 | ## Quiz
 34 | 
 35 | What's the difference between a list and an **atomic** vector?
 36 | 
 37 | Atomic vectors are: "logical", "integer", "numeric" (synonym "double"), "complex", "character" and "raw" vectors.
 38 | 
 39 | ## Your Turn 2
 40 | 
 41 | Here is a list:
 42 | 
 43 | ```{r}
 44 | a_list <- list(nums = c(8, 9), 
 45 |             log = TRUE,    
 46 |             cha = c("a", "b", "c"))
 47 | ```
 48 | 
 49 | Here are two subsetting commands. Do they return the same values? Run the code chunk above, _and then_ run the code chunks below to confirm
 50 | 
 51 | ```{r}
 52 | a_list["nums"] 
 53 | ```
 54 | 
 55 | ```{r}
 56 | a_list$nums
 57 | ```
 58 | 
 59 | ## Your Turn 3
 60 | 
 61 | What will each of these return? Run the code chunks to confirm.
 62 | 
 63 | ```{r}
 64 | vec <- c(-2, -1, 0, 1, 2)
 65 | abs(vec)
 66 | ```
 67 | 
 68 | ```{r, error = TRUE}
 69 | lst <- list(-2, -1, 0, 1, 2)
 70 | abs(lst)
 71 | ```
 72 | 
 73 | ## Your Turn 4
 74 | 
 75 | Run the code in the chunks. What does it return?
 76 | 
 77 | ```{r}
 78 | list(student1 = mean(exams$student1),
 79 |      student2 = mean(exams$student2),
 80 |      student3 = mean(exams$student3),
 81 |      student4 = mean(exams$student4),
 82 |      student5 = mean(exams$student5))
 83 | ```
 84 | 
 85 | ```{r}
 86 | library(purrr)
 87 | map(exams, mean)
 88 | ```
 89 | 
 90 | ## Your Turn 5
 91 | 
 92 | Calculate the variance (`var()`) of each student’s exam grades.
 93 | 
 94 | ```{r}
 95 | exams 
 96 | ```
 97 | 
 98 | ## Your Turn 6
 99 | 
100 | Calculate the max grade (`max()`) for each student. Return the result as a vector.
101 | 
102 | ```{r}
103 | exams
104 | ```
105 | 
106 | ## Your Turn 7
107 | 
108 | Write a function that counts the best exam twice and then takes the average. Use it to grade all of the students.
109 | 
110 | 1. Write code that solves the problem for a real object  
111 | 2. Wrap the code in `function(){}` to save it  
112 | 3. Add the name of the real object as the function argument 
113 | 
114 | ```{r}
115 | vec <- exams[[1]]
116 | 
117 | 
118 | ```
119 | 
120 | ## Your Turn 8
121 | 
122 | Compute a final grade for each student, where the final grade is the average test score plus any `extra_credit` assigned to the student. Return the results as a double (i.e. numeric) vector.
123 | 
124 | ```{r}
125 | 
126 | ```
127 | 
128 | 
129 | ***
130 | 
131 | # Take Aways
132 | 
133 | Lists are a useful way to organize data, but you need to arrange manually for functions to iterate over the elements of a list.
134 | 
135 | You can do this with the `map()` family of functions in the purrr package.
136 | 
137 | To write a function, 
138 | 
139 | 1. Write code that solves the problem for a real object  
140 | 2. Wrap the code in `function(){}` to save it  
141 | 3. Add the name of the real object as the function argument 
142 | 
143 | This sequence will help prevent bugs in your code (and reduce the time you spend correcting bugs). 
144 | 
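The three steps can be sketched on a deliberately different problem from the exercises. Everything here is hypothetical: `quizzes` is made-up toy data and `score_spread()` is an illustrative function name.

```{r}
library(purrr)

# Hypothetical toy data, separate from `exams`
quizzes <- list(ann = c(7, 9, 10), bob = c(6, 6, 8))

# 1. Write code that solves the problem for a real object
vec <- quizzes[[1]]
max(vec) - min(vec)

# 2. Wrap the code in function(){} to save it
# 3. Add the name of the real object as the function argument
score_spread <- function(vec) {
  max(vec) - min(vec)
}

# map_dbl() applies the function to each element and returns a double vector
spreads <- map_dbl(quizzes, score_spread)
spreads
```

Keeping the object name from step 1 as the argument name means the body of the function needs no edits when it is wrapped.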


--------------------------------------------------------------------------------
/07-Model.Rmd:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: "Model"
 3 | output: html_notebook
 4 | ---
 5 | 
 6 | ```{r setup, message=FALSE}
 7 | library(tidyverse)
 8 | library(modelr)
 9 | library(broom)
10 | 
11 | wages <- heights %>% filter(income > 0)
12 | ```
13 | 
14 | ## Your Turn 1
15 | 
16 | Fit the model on the slide and then examine the output. What does it look like?
17 | 
18 | ```{r}
19 | mod_e <- lm(log(income) ~ education, data = wages)
20 | mod_e
21 | ```
22 | 
23 | ## Your Turn 2
24 | 
25 | Use a pipe to model `log(income)` against `height`. Then use broom and dplyr functions to extract:
26 | 
27 | 1. The **coefficient estimates** and their related statistics 
28 | 2. The **adj.r.squared** and **p.value** for the overall model
29 | 
30 | ```{r, error = TRUE}
31 | mod_h <- wages %>% lm(    )
32 | 
33 | 
34 | ```
35 | 
36 | ## Your Turn 3
37 | 
38 | Model `log(income)` against `education` _and_ `height`. Do the coefficients change?
39 | 
40 | ```{r, error = TRUE}
41 | mod_eh <- wages %>% lm(    )
42 | 
43 | ```
44 | 
45 | ## Your Turn 4
46 | 
47 | Model `log(income)` against `education` and `height` and `sex`. Can you interpret the coefficients?
48 | 
49 | ```{r, error = TRUE}
50 | mod_ehs <- wages %>%  lm(   )
51 | ```
52 | 
53 | ## Your Turn 5
54 | 
55 | Use a broom function and ggplot2 to make a line graph of `height` vs `.fitted` for our heights model, `mod_h`.
56 | 
57 | _Bonus: Overlay the plot on the original data points._
58 | 
59 | ```{r}
60 | mod_h <- wages %>% lm(log(income) ~ height, data = .)
61 | 
62 | ```
63 | 
64 | ## Your Turn 6
65 | 
66 | Repeat the process to make a line graph of `height` vs `.fitted` colored by `sex` for model `mod_ehs`. Are the results interpretable? Add `+ facet_wrap(~education)` to the end of your code. What happens?
67 | 
68 | ```{r}
69 | mod_ehs <- wages %>% lm(log(income) ~ education + height + sex, data = .)
70 | 
71 | ```
72 | 
73 | ## Your Turn 7
74 | 
75 | Use one of `spread_predictions()` or `gather_predictions()` to make a line graph of `height` vs `pred` colored by `model` for each of mod_h, mod_eh, and mod_ehs. Are the results interpretable? 
76 | 
77 | Add `+ facet_grid(sex ~ education)` to the end of your code. What happens?
78 | 
79 | ```{r warning = FALSE, message = FALSE}
80 | mod_h <- wages %>% lm(log(income) ~ height, data = .)
81 | mod_eh <- wages %>% lm(log(income) ~ education + height, data = .)
82 | mod_ehs <- wages %>% lm(log(income) ~ education + height + sex, data = .)
83 | 
84 | 
85 | ```
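
To show what `gather_predictions()` produces, here is a minimal sketch on the built-in `mtcars` data. The two models `m_w` and `m_wh` are hypothetical stand-ins for the workshop models, not the exercise solution.

```r
library(dplyr)
library(modelr)
library(ggplot2)

# Two stand-in models of increasing complexity
m_w  <- lm(mpg ~ wt, data = mtcars)
m_wh <- lm(mpg ~ wt + hp, data = mtcars)

# gather_predictions() stacks one copy of the data per model,
# adding a `model` column (the model's name) and a `pred` column
preds <- mtcars %>% gather_predictions(m_w, m_wh)

p <- ggplot(preds, aes(x = wt, y = pred, color = model)) +
  geom_line()

p
```

`spread_predictions()` takes the same arguments but adds one prediction column per model instead of stacking rows, so the long form from `gather_predictions()` is usually the easier one to map to `color`.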
86 | 
87 | ***
88 | 
89 | # Take Aways
90 | 
91 | * Use `glance()`, `tidy()`, and `augment()` from the **broom** package to return model values in a data frame.
92 | 
93 | * Use `add_predictions()` or `gather_predictions()` or `spread_predictions()` from the **modelr** package to visualize predictions.
94 | 
95 | * Use `add_residuals()` or `gather_residuals()` or `spread_residuals()` from the **modelr** package to visualize residuals.
96 | 
97 | 


--------------------------------------------------------------------------------
/08-Organize.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Organize with List Columns"
  3 | output: html_notebook
  4 | ---
  5 | 
  6 | <!-- This file by Charlotte Wickham is licensed under a Creative Commons Attribution 4.0 International License, adapted from the original work at https://github.com/rstudio/master-the-tidyverse by RStudio. -->
  7 | 
  8 | ```{r setup}
  9 | library(tidyverse)
 10 | library(gapminder)
 11 | library(broom)
 12 | 
 13 | nz <- gapminder %>%
 14 |   filter(country == "New Zealand")
 15 | us <- gapminder %>%
 16 |   filter(country == "United States")
 17 | ```
 18 | 
 19 | ## Your turn 1
 20 | 
 21 | How has life expectancy changed over time?
 22 | Make a line plot of lifeExp vs. year grouped by country.  
 23 | Set alpha to 0.2, to see the results better.
 24 | 
 25 | ```{r}
 26 | gapminder
 27 | 
 28 | 
 29 | ```
 30 | 
 31 | ## Quiz
 32 | 
 33 | How is a data frame/tibble similar to a list?
 34 | 
 35 | ## Quiz
 36 | 
 37 | If one of the elements of a list can be another list,
 38 | can one of the columns of a data frame be another list?
 39 | 
 40 | ## Your turn 2
 41 | 
 42 | Run this chunk:
 43 | ```{r}
 44 | gapminder_nested <- gapminder %>%
 45 |   group_by(country) %>%
 46 |   nest()
 47 | 
 48 | fit_model <- function(df) lm(lifeExp ~ year, data = df)
 49 | 
 50 | gapminder_nested <- gapminder_nested %>% 
 51 |   mutate(model = map(data, fit_model))
 52 | 
 53 | get_rsq <- function(mod) glance(mod)$r.squared
 54 | 
 55 | gapminder_nested <- gapminder_nested %>% 
 56 |   mutate(r.squared = map_dbl(model, get_rsq))
 57 | ```
 58 | 
 59 | Then filter `gapminder_nested` to find the countries with r.squared less than 0.5.  
 60 | 
 61 | ```{r}
 62 | 
 63 | ```
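
The nest, map, and filter pattern used above can also be sketched on a built-in dataset. The example below groups `mtcars` by `cyl` instead of grouping `gapminder` by `country`; the model `mpg ~ wt` and the name `by_cyl` are assumptions for illustration only.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%                      # one row per group, data in a list column
  mutate(
    # fit one model per nested data frame
    model = map(data, function(df) lm(mpg ~ wt, data = df)),
    # pull a single number out of each model
    r.squared = map_dbl(model, function(mod) glance(mod)$r.squared)
  )

# Ordinary dplyr verbs work on the summary column
by_cyl %>% filter(r.squared < 0.5)
```

Keeping the data, the model, and its summary in the same row means a `filter()` on the summary never separates a model from the data it was fitted to.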
 64 | 
 65 | ## Your Turn 3
 66 | 
 67 | Edit the code in the chunk provided to instead find and plot countries with a slope above 0.6 years/year.
 68 | 
 69 | ```{r}
 70 | get_slope <- function(mod) {
 71 |   tidy(mod) %>% filter(term == "year") %>% pull(estimate)
 72 | }
 73 | 
 74 | # Add new column with r-squared
 75 | gapminder_nested <- gapminder_nested %>% 
 76 |   mutate(r.squared = map_dbl(model, get_rsq))
 77 | 
 78 | # keep only the countries with a low r-squared (poor fit)
 79 | poor_fit <- gapminder_nested %>% 
 80 |   filter(r.squared < 0.5)
 81 | 
 82 | # unnest and plot result
 83 | unnest(poor_fit, data) %>%
 84 |   ggplot(aes(x = year, y = lifeExp)) + 
 85 |     geom_line(aes(color = country))
 86 | ```
 87 | 
 88 | ## Your Turn 4
 89 | 
 90 | **Challenge:**
 91 | 
 92 | 1. Create your own copy of `gapminder_nested` and then add one more list column: `output` which contains the output of `augment()` for each model.
 93 | 
 94 | 
 95 | ```{r}
 96 | 
 97 | ```
 98 | 
 99 | # Take away
100 | 
101 | 


--------------------------------------------------------------------------------
/99-Setup.md:
--------------------------------------------------------------------------------
 1 | # Getting Set Up
 2 | 
 3 | During the workshop you'll do your work on [rstudio.cloud](https://rstudio.cloud/).  This provides an easy way for me to share all the materials with you, and removes the hassle of getting the right versions of R, RStudio or any packages.
 4 | 
 5 | ## To get started:
 6 | 
 7 | Follow these steps:
 8 | 
 9 | 1.  Visit the project at https://rstudio.cloud/project/10871
10 | 
11 | 2.  Log in using Google, GitHub, shinyapps.io or "Sign Up".
12 | 
13 | ![](resources/01-setup-login.png)
14 | 
15 | 3.  The "Data Science in the tidyverse" project will open, but it's a *Temporary copy*.  Click *Save a copy*.
16 | 
17 | ![](resources/02-setup-temp-project.png)
18 | 
19 | 4. Now the "Data Science in the tidyverse" project will open again, but this time it is your own copy.  Navigate to the project folder:
20 | 
21 | ![](resources/03-setup-navigate-to-project.png)
22 | 
23 | 5. Inside the project folder navigate to the "data-science-in-the-tidyverse.Rproj" file and click it.
24 | 
25 | ![](resources/04-setup-rproj-file.png)
26 | 
27 | 6. You'll be asked if you want to open the project, hit Yes.
28 | 
29 | ![](resources/05-setup-open-project.png)
30 | 
31 | 7. All going well, you should now see your project looking like this.  Now, open "00-Getting-started.Rmd"
32 | 
33 | ![](resources/06-setup-inside-project.png)
34 | 
35 | 8.  You're all set!  You might like to read through "00-Getting-started.Rmd" and do what it tells you.
36 | 
37 | ![](resources/07-setup-all-done.png)
38 | 
39 | ## Once you are set up
40 | 
41 | You can access your copy of the project from *Your Workspace* on [rstudio.cloud](https://rstudio.cloud/).  


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | This is the repo for *"Data Science in the tidyverse"* given at `rstudio::conf(2018)` in Jan 2018.
 2 | 
 3 | ## Description
 4 | 
 5 | This is a two-day hands on workshop based on the book [R for Data Science](http://r4ds.had.co.nz/). This workshop is designed for people who are familiar with R and want to learn how to achieve their data analysis goals the "tidy" way. You will learn how to visualize, transform, and model data in R and work with date-times, character strings, and untidy data formats. Along the way, you will learn and use many packages from the tidyverse including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, lubridate, and forcats.
 6 | 
 7 | ## Software Requirements
 8 | 
 9 | You'll be using RStudio Cloud, so (all going well) on the day of the workshop all you'll need is **a laptop that can access the internet** (wifi will be available).  
10 | 
11 | However, as a backup (e.g. in case of wifi issues) it would be best if you also have R and RStudio installed locally, along with the following packages:
12 | 
13 |     install.packages(c("tidyverse", "fivethirtyeight",
14 |        "gapminder", "rmarkdown"))
15 | 
16 | Don't forget to bring your power cable!
17 | 
18 | ## Instructor Info
19 | 
20 | Charlotte Wickham
21 | 
22 | -   [cwickham@gmail.com](mailto:cwickham@gmail.com)
23 | -   [cwick.co.nz](http://www.cwick.co.nz)
24 | -   @[cvwickham](http://www.twitter.com/cvwickham)
25 | 
26 | ## License
27 | 
28 | <a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a>
29 | 
30 | <span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">*Data Science in the tidyverse*</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="https://github.com/cwickham/data-science-in-the-tidyverse" property="cc:attributionName" rel="cc:attributionURL">Charlotte Wickham</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.  Based on a work at <a xmlns:dct="http://purl.org/dc/terms/" href="https://github.com/rstudio/master-the-tidyverse" rel="dct:source">https://github.com/rstudio/master-the-tidyverse</a>.
31 | 


--------------------------------------------------------------------------------
/cheatsheets/dplyr-data-transformation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/cheatsheets/dplyr-data-transformation.pdf


--------------------------------------------------------------------------------
/cheatsheets/ggplot2-data-visualization.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/cheatsheets/ggplot2-data-visualization.pdf


--------------------------------------------------------------------------------
/cheatsheets/lubridate-dates-times.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/cheatsheets/lubridate-dates-times.pdf


--------------------------------------------------------------------------------
/cheatsheets/purrr-iterate.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/cheatsheets/purrr-iterate.pdf


--------------------------------------------------------------------------------
/cheatsheets/rstudio-ide.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/cheatsheets/rstudio-ide.pdf


--------------------------------------------------------------------------------
/cheatsheets/stringr-strings.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/cheatsheets/stringr-strings.pdf


--------------------------------------------------------------------------------
/cheatsheets/tidyr-readr-data-import-tidy.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/cheatsheets/tidyr-readr-data-import-tidy.pdf


--------------------------------------------------------------------------------
/data-science-in-the-tidyverse.Rproj:
--------------------------------------------------------------------------------
 1 | Version: 1.0
 2 | 
 3 | RestoreWorkspace: Default
 4 | SaveWorkspace: Default
 5 | AlwaysSaveHistory: Default
 6 | 
 7 | EnableCodeIndexing: Yes
 8 | UseSpacesForTab: Yes
 9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 | 
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 | 


--------------------------------------------------------------------------------
/email-to-participants.md:
--------------------------------------------------------------------------------
 1 | *(This will be sent to registered participants by email, but I'm also posting here as a convenient place to field any questions/issues.)*
 2 | 
 3 | Hello,
 4 | 
 5 | I'm excited to meet you all next week in "Data Science in the tidyverse"!
 6 | 
 7 | The workshop starts Weds Jan 31st at 9am in Seaport H, but you may want to arrive a little before 9am to get set up.  The TAs and I will be there to help you if you need it. 
 8 | 
 9 | **What should you bring?**
10 | 
11 | A **laptop** that can access the internet (wifi will be available) and your **power cable**.
12 | 
13 | You'll be using [RStudio Cloud](https://rstudio.cloud/), so on the day of the workshop (all going well), you shouldn't need anything else.  
14 | 
15 | However, as a backup (e.g. in case of wifi issues), **you should also have**:
16 | 
17 | * R and RStudio installed locally,
18 | 
19 | * the following packages:
20 |     ```    
21 |     install.packages(c("tidyverse", "fivethirtyeight",
22 |       "gapminder", "rmarkdown"))
23 |     ```
24 |     
25 | * and the materials as a .ZIP from https://github.com/cwickham/data-science-in-tidyverse/archive/master.zip
26 | 
27 | **Is this class for me?**
28 | 
29 | This class is designed for people who have some experience with R, but want to learn about tools in the tidyverse.  In particular,
30 | here are some things I'll assume you already know how to do:
31 | 
32 | * You know how to assign variables in R
33 | * You recognize the basic syntax for calling R functions, i.e. `function_name(arg_1 = arg1_value, ...)` etc.
34 | * You can predict the output of something like: `mean(c(1, 2, 3))`
35 | 
36 | Here are some others that I'll remind you about, but assume you have probably seen before:
37 | 
38 | * How to open an R script, or R notebook and execute the code in it
39 | * How to get an overview of a data frame or tibble (things like `head()`, `names()`, `dplyr::glimpse()`, `View()`)
40 | * The difference between lists and atomic vectors (double, integer, character etc.)
41 | 
42 | If those items seem foreign, you might be better served in  "Intro to R & RStudio".  You'll still cover some of the tidyverse, but the pace will be a little slower, and more time will be spent getting familiar with R and RStudio.
43 | 
44 | **Is there anything you should do before arriving?**
45 | 
46 | Visit [rstudio.cloud](https://rstudio.cloud/) and check you can log in (you can use your existing google or github account, or sign up for an account).
47 | 
48 | If you are dying to get started, feel free to poke around the materials on [github](https://github.com/cwickham/data-science-in-tidyverse), or even try [getting set up](https://github.com/cwickham/data-science-in-tidyverse/blob/master/99-Setup.md) early, but be aware that there might be some small changes between now and next Wednesday.
49 | 
50 | If you have any other questions, feel free to ask them over at https://community.rstudio.com/t/information-for-data-science-in-the-tidyverse/4539
51 | 
52 | 
53 | 


--------------------------------------------------------------------------------
/resources/01-setup-login.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/01-setup-login.png


--------------------------------------------------------------------------------
/resources/02-setup-temp-project.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/02-setup-temp-project.png


--------------------------------------------------------------------------------
/resources/03-setup-navigate-to-project.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/03-setup-navigate-to-project.png


--------------------------------------------------------------------------------
/resources/04-setup-rproj-file.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/04-setup-rproj-file.png


--------------------------------------------------------------------------------
/resources/05-setup-open-project.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/05-setup-open-project.png


--------------------------------------------------------------------------------
/resources/06-setup-inside-project.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/06-setup-inside-project.png


--------------------------------------------------------------------------------
/resources/07-setup-all-done.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/07-setup-all-done.png


--------------------------------------------------------------------------------
/resources/bialik-fridaythe13th-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/bialik-fridaythe13th-2.png


--------------------------------------------------------------------------------
/slides/00-Introduction.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/00-Introduction.pdf


--------------------------------------------------------------------------------
/slides/01-Visualize.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/01-Visualize.pdf


--------------------------------------------------------------------------------
/slides/02-Transform.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/02-Transform.pdf


--------------------------------------------------------------------------------
/slides/03-Tidy.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/03-Tidy.pdf


--------------------------------------------------------------------------------
/slides/04-Case-Study.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/04-Case-Study.pdf


--------------------------------------------------------------------------------
/slides/05-Data-Types.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/05-Data-Types.pdf


--------------------------------------------------------------------------------
/slides/06-Iteration.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/06-Iteration.pdf


--------------------------------------------------------------------------------
/slides/07-Model.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/07-Model.pdf


--------------------------------------------------------------------------------
/slides/08-Organize.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/08-Organize.pdf


--------------------------------------------------------------------------------
/slides/10-wrapping-up.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/10-wrapping-up.pdf


--------------------------------------------------------------------------------