├── .gitignore
├── 00-Getting-started.Rmd
├── 01-Visualize.Rmd
├── 02-Transform.Rmd
├── 03-Tidy.Rmd
├── 04-Case-Study.Rmd
├── 05-Data-Types.Rmd
├── 06-Iterate.Rmd
├── 07-Model.Rmd
├── 08-Organize.Rmd
├── 99-Setup.md
├── README.md
├── cheatsheets
│   ├── dplyr-data-transformation.pdf
│   ├── ggplot2-data-visualization.pdf
│   ├── lubridate-dates-times.pdf
│   ├── purrr-iterate.pdf
│   ├── rstudio-ide.pdf
│   ├── stringr-strings.pdf
│   └── tidyr-readr-data-import-tidy.pdf
├── data-science-in-the-tidyverse.Rproj
├── email-to-participants.md
├── resources
│   ├── 01-setup-login.png
│   ├── 02-setup-temp-project.png
│   ├── 03-setup-navigate-to-project.png
│   ├── 04-setup-rproj-file.png
│   ├── 05-setup-open-project.png
│   ├── 06-setup-inside-project.png
│   ├── 07-setup-all-done.png
│   └── bialik-fridaythe13th-2.png
└── slides
    ├── 00-Introduction.pdf
    ├── 01-Visualize.pdf
    ├── 02-Transform.pdf
    ├── 03-Tidy.pdf
    ├── 04-Case-Study.pdf
    ├── 05-Data-Types.pdf
    ├── 06-Iteration.pdf
    ├── 07-Model.pdf
    ├── 08-Organize.pdf
    └── 10-wrapping-up.pdf

/.gitignore:
--------------------------------------------------------------------------------
.Rproj.user
.Rhistory
.RData
.Ruserdata
*.html

--------------------------------------------------------------------------------
/00-Getting-started.Rmd:
--------------------------------------------------------------------------------
---
title: "R Notebook"
output: html_notebook
---

```{r setup}
library(tidyverse)
```

## R notebooks

This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code.

R code goes in **code chunks**, denoted by three backticks. Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Ctrl+Shift+Enter* (Windows) or *Cmd+Shift+Enter* (Mac).
```{r}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
```

Add a new chunk by clicking the *Insert* button on the toolbar, then selecting *R* or by pressing *Ctrl+Alt+I* (Windows) or *Cmd+Option+I* (Mac).

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the *Preview* button or press *Ctrl+Shift+K* (Windows) or *Cmd+Shift+K* (Mac) to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike *Knit*, *Preview* does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.

--------------------------------------------------------------------------------
/01-Visualize.Rmd:
--------------------------------------------------------------------------------
---
title: "Visualize Data"
output: html_notebook
---

```{r setup}
library(tidyverse)
```

```{r}
mpg
```

## Quiz

What relationship do you expect to see between engine size (displ) and highway fuel efficiency (hwy)?

## Your Turn 1

Run the code on the slide to make a graph. Pay strict attention to spelling, capitalization, and parentheses!

```{r}


```

## Your Turn 2

Add `color`, `size`, `alpha`, and `shape` aesthetics to your graph. Experiment.

```{r}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
```

## Your Turn 3

Replace this scatterplot with one that draws boxplots. Use the cheatsheet. Try your best guess.

```{r}
ggplot(data = mpg) +
  geom_point(mapping = aes(x = class, y = hwy))
```

## Your Turn 4

Make a histogram of the `hwy` variable from `mpg`.
```{r}
ggplot(data = mpg)

```

## Your Turn 5

Make a density plot of `hwy` colored by `class`.

```{r}
ggplot(data = mpg)

```

## Your Turn 6

Make a bar chart of `class` colored by `class`.

```{r}
ggplot(data = mpg)

```

## Your Turn 7

Predict what this code will do. Then run it.

```{r}
ggplot(mpg) +
  geom_point(aes(displ, hwy)) +
  geom_smooth(aes(displ, hwy))
```

## Your Turn 8

What does `getwd()` return?

```{r}
getwd()
```

## Your Turn 9

Save the last plot and then locate it in the files pane.

```{r}

```

***

# Take aways

You can use this code template to make thousands of graphs with **ggplot2**.

```{r, eval = FALSE}
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
```

--------------------------------------------------------------------------------
/02-Transform.Rmd:
--------------------------------------------------------------------------------
---
title: "Transform Data"
output: html_notebook
---

```{r setup}
library(tidyverse)
library(gapminder)

# Toy dataset to use
pollution <- tribble(
  ~city,      ~size,   ~amount,
  "New York", "large", 23,
  "New York", "small", 14,
  "London",   "large", 22,
  "London",   "small", 16,
  "Beijing",  "large", 121,
  "Beijing",  "small", 56
)
```

## gapminder

```{r}
gapminder
```

## Your Turn 1

See if you can use the logical operators to manipulate our code below to show:

The data for the United States
```{r}
filter(gapminder, country == "New Zealand")
```

All data for countries in Oceania
```{r}
filter(gapminder, country == "New Zealand")
```

Rows where the life expectancy is greater than 82
```{r}
filter(gapminder, country == "New Zealand")
```


## Your Turn 2

Use Boolean operators to alter the code below to return only the rows that contain:

* United States before 1970

```{r}
filter(gapminder, country == "New Zealand", year > 2000)
```

* Countries where life expectancy in 2007 is below 50

```{r}
filter(gapminder, country == "New Zealand", year > 2000)
```

* Records for any of "New Zealand", "Canada" or "United States"

```{r}
filter(gapminder, country == "New Zealand", year > 2000)
```

## Your Turn 3

Use `filter()` to get the records for the US, then plot the life expectancy over time.

```{r}
gapminder
```

## Your Turn 4

Find the records with the smallest population.
```{r}

```

Find the records with the largest GDP per capita.
```{r}

```

## Quiz

A function that returns a vector the same length as the input is called **vectorized**.

Which of the following functions are vectorized?

* `ifelse()`
* `diff()`
* `sum()`

You might try these:
```{r}
gapminder %>%
  mutate(size = ifelse(pop < 10e06, "small", "large"))
```

```{r, error = TRUE}
gapminder %>%
  mutate(diff_pop = diff(pop))
```

```{r}
gapminder %>%
  mutate(total_pop = sum(as.numeric(pop)))
```

## Your Turn 5

Alter the code to add a `prev_lifeExp` column that contains the life expectancy from the previous record.

(Hint: use the cheatsheet; you want to offset elements by one.)

Extra challenge: Why isn't this quite the 'life expectancy five years ago'?
```{r}
gapminder %>%
  mutate()
```

## Your Turn 6

Use `summarise()` to compute three statistics about the data:

* The first (minimum) year in the dataset
* The last (maximum) year in the dataset
* The number of countries represented in the data (Hint: use the cheatsheet)

```{r}
gapminder
```

## Your Turn 7

Extract the rows where `continent == "Africa"` and `year == 2007`.

Then use `summarise()` and summary functions to find:

1. The number of unique countries
2. The median life expectancy

```{r}
gapminder
```

## Your Turn 8

Find the median life expectancy by continent in 2007.

```{r}
gapminder %>%
  filter(year == 2007)
```

## Your Turn 9

Brainstorm with your neighbor the sequence of operations to find the country with the biggest jump in life expectancy (between any two consecutive records) for each continent.

## Your Turn 10

Find the country with the biggest jump in life expectancy (between any two consecutive records) for each continent.

```{r}


```

## Your Turn 11

Use `left_join()` to add the country codes in `country_codes` to the gapminder data.

```{r}
country_codes
```

**Challenge**: Which codes in `country_codes` have no matches in gapminder?
```{r}

```


***

# Take aways

* Extract cases with `filter()`
* Make new variables with `mutate()`
* Make tables of summaries with `summarise()`
* Do groupwise operations with `group_by()`
* Connect operations with `%>%`
* Joins are two-table verbs

--------------------------------------------------------------------------------
/03-Tidy.Rmd:
--------------------------------------------------------------------------------
---
title: "Tidy Data"
output: html_notebook
---

```{r setup}
library(tidyverse)

# Toy data
cases <- tribble(
  ~Country, ~"2011", ~"2012", ~"2013",
  "FR",     7000,    6900,    7000,
  "DE",     5800,    6000,    6200,
  "US",     15000,   14000,   13000
)

pollution <- tribble(
  ~city,      ~size,   ~amount,
  "New York", "large", 23,
  "New York", "small", 14,
  "London",   "large", 22,
  "London",   "small", 16,
  "Beijing",  "large", 121,
  "Beijing",  "small", 56
)


bp_systolic <- tribble(
  ~ subject_id, ~ time_1, ~ time_2, ~ time_3,
  1,            120,      118,      121,
  2,            125,      131,      NA,
  3,            141,      NA,       NA
)

bp_systolic2 <- tribble(
  ~ subject_id, ~ time, ~ systolic,
  1,            1,      120,
  1,            2,      118,
  1,            3,      121,
  2,            1,      125,
  2,            2,      131,
  3,            1,      141
)
```

## Tidy and untidy data

`table1` is tidy:
```{r}
table1
```

For example, it's easy to add a rate column with `mutate()`:
```{r}
table1 %>%
  mutate(rate = cases/population)
```

`table2` isn't tidy: the `count` column really contains two variables:
```{r}
table2
```

That makes it much harder to manipulate.

## Your Turn 1

Is `bp_systolic` tidy?
```{r}
bp_systolic
```

## Your Turn 2

Using `bp_systolic2` with `group_by()` and `summarise()`:

* Find the average systolic blood pressure for each subject
* Find the last time each subject was measured

```{r}
bp_systolic2
```

## Your Turn 3

On a sheet of paper, draw how the `cases` data set would look if it had the same values grouped into three columns: **country**, **year**, **n**

## Your Turn 4

Use `gather()` to reorganize `table4a` into three columns: **country**, **year**, and **cases**.

```{r}
table4a
```

## Your Turn 5

On a sheet of paper, draw how the `pollution` data set would look if it had the same values grouped into three columns: **city**, **large**, **small**

## Your Turn 6

Use `spread()` to reorganize `table2` into four columns: **country**, **year**, **cases**, and **population**.

```{r}
table2
```

***

# Take Aways

Data comes in many formats but R prefers just one: _tidy data_.

A data set is tidy if and only if:

1. Every variable is in its own column
2. Every observation is in its own row
3. Every value is in its own cell (which follows from the above)

What counts as a variable and what counts as an observation may depend on your immediate goal.
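As a minimal sketch of these rules (not an answer to any of the exercises above), `gather()` can turn the untidy `bp_systolic` toy data from the setup chunk into the tidy layout of `bp_systolic2`. The block re-creates both tibbles so it runs on its own. Using `sub()` to turn `"time_1"` into `1` is just one choice; `readr::parse_number()` would also work.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Untidy: one column per measurement time (copied from the setup chunk)
bp_systolic <- tribble(
  ~ subject_id, ~ time_1, ~ time_2, ~ time_3,
  1,            120,      118,      121,
  2,            125,      131,      NA,
  3,            141,      NA,       NA
)

# Tidy: one row per subject per measurement time (copied from the setup chunk)
bp_systolic2 <- tribble(
  ~ subject_id, ~ time, ~ systolic,
  1,            1,      120,
  1,            2,      118,
  1,            3,      121,
  2,            1,      125,
  2,            2,      131,
  3,            1,      141
)

bp_tidy <- bp_systolic %>%
  # stack the time_* columns into key (time) / value (systolic) pairs,
  # dropping the NA cells that only exist because of the wide layout
  gather(key = "time", value = "systolic", time_1:time_3, na.rm = TRUE) %>%
  # turn "time_1" into 1 so `time` is numeric, as in bp_systolic2
  mutate(time = as.numeric(sub("time_", "", time))) %>%
  arrange(subject_id, time)

bp_tidy
```

Each row of `bp_tidy` is one observation (a subject measured at one time), and `time` and `systolic` each have their own column.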
--------------------------------------------------------------------------------
/04-Case-Study.Rmd:
--------------------------------------------------------------------------------
---
title: "Case Study: Friday the 13th Effect"
output: html_notebook
---

```{r setup}
library(fivethirtyeight)
library(tidyverse)
```

## Task

Reproduce this figure from fivethirtyeight's article [*Some People Are Too Superstitious To Have A Baby On Friday The 13th*](https://fivethirtyeight.com/features/some-people-are-too-superstitious-to-have-a-baby-on-friday-the-13th/):

![](resources/bialik-fridaythe13th-2.png)

## Data

In the `fivethirtyeight` package there are two datasets containing birth data, but for now let's just work with one, `US_births_1994_2003`. Note that since we have data from 1994-2003, our results may differ somewhat from the figure, which is based on 1994-2014.

## Your Turn 1

With your neighbour, brainstorm the steps needed to get the data in a form ready to make the plot.

```{r}
US_births_1994_2003
```

## Some overviews of the data

Whole time series:
```{r}
ggplot(US_births_1994_2003, aes(x = date, y = births)) +
  geom_line()
```
There is so much fluctuation it's really hard to see what is going on.

Let's try just looking at one year:
```{r}
US_births_1994_2003 %>%
  filter(year == 1994) %>%
  ggplot(mapping = aes(x = date, y = births)) +
    geom_line()
```
A strong weekly pattern accounts for most of the variation.

## Strategy

Use the figure as a guide for what the data should look like to make the final plot. We want to end up with something like:

---------------------------
 day_of_week   avg_diff_13
------------- -------------
     Mon         -2.686

    Tues         -1.378

     Wed         -3.274

     ...           ...
---------------------------


## Your Turn 2

Extract just the 6th, 13th and 20th of each month:

```{r}
US_births_1994_2003 %>%
  select(-date)

```

## Your Turn 3

Which arrangement is tidy?

**Option 1:**

-----------------------------------------------------
 year   month   date_of_month   day_of_week   births
------ ------- --------------- ------------- --------
 1994     1          6             Thurs      11406

 1994     1          13            Thurs      11212

 1994     1          20            Thurs      11682
-----------------------------------------------------

**Option 2:**

----------------------------------------------------
 year   month   day_of_week     6       13      20
------ ------- ------------- ------- ------- -------
 1994     1        Thurs      11406   11212   11682
----------------------------------------------------

(**Hint:** think about our next step *"Find the percent difference between the 13th and the average of the 6th and 20th"*. In which layout will this be easier using our tidy tools?)

## Your Turn 4

Tidy the filtered data to have the days in columns.

```{r}
US_births_1994_2003 %>%
  select(-date) %>%
  filter(date_of_month %in% c(6, 13, 20))
```

## Your Turn 5

Now use `mutate()` to add columns for:

* The average of the births on the 6th and 20th
* The percentage difference between the number of births on the 13th and the average of the 6th and 20th

```{r}
US_births_1994_2003 %>%
  select(-date) %>%
  filter(date_of_month %in% c(6, 13, 20)) %>%
  spread(date_of_month, births)
```

## A little additional exploring

Now that we have the percent difference between the 13th and the average of the 6th and 20th of each month, it's probably worth exploring a little (at the very least to check our calculations seem reasonable).
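For instance, a hand check on the single January 1994 row from the Option 1 table gives a sense of the size of number to expect. This is a sketch: the one-row tibble is typed in from that table rather than taken from `US_births_1994_2003`, and it uses the same formula as the `mutate()` step below.

```r
library(tibble)
library(dplyr)

# One row in the spread() layout, typed in from the Option 1 table:
# births on Thursday 6, 13 and 20 January 1994
jan_1994 <- tibble(
  year = 1994, month = 1, day_of_week = "Thurs",
  `6` = 11406, `13` = 11212, `20` = 11682
)

check <- jan_1994 %>%
  mutate(
    avg_6_20 = (`6` + `20`) / 2,                  # (11406 + 11682) / 2 = 11544
    diff_13  = (`13` - avg_6_20) / avg_6_20 * 100 # (11212 - 11544) / 11544 * 100
  )

check$diff_13  # about -2.88: roughly 2.9% fewer births on the 13th
```

A value of a few percent either side of zero is the kind of magnitude the fivethirtyeight figure shows, so much larger values are worth a closer look.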
To make it a little easier, let's assign our current data to a variable:
```{r}
births_diff_13 <- US_births_1994_2003 %>%
  select(-date) %>%
  filter(date_of_month %in% c(6, 13, 20)) %>%
  spread(date_of_month, births) %>%
  mutate(
    avg_6_20 = (`6` + `20`)/2,
    diff_13 = (`13` - avg_6_20) / avg_6_20 * 100
  )
```

Then take a look:
```{r}
births_diff_13 %>%
  ggplot(mapping = aes(day_of_week, diff_13)) +
    geom_point()
```

Looks like we are on the right path. There's a big outlier one Monday:
```{r}
births_diff_13 %>%
  filter(day_of_week == "Mon", diff_13 > 10)
```

It seems to be driven by a particularly low number of births on the 6th of Sep 1999. Maybe a holiday effect? Labor Day was on the 6th of Sep that year.

## Your Turn 6

Summarize each day of the week to get the mean of `diff_13`.

Then, recreate the fivethirtyeight plot.

```{r}
US_births_1994_2003 %>%
  select(-date) %>%
  filter(date_of_month %in% c(6, 13, 20)) %>%
  spread(date_of_month, births) %>%
  mutate(
    avg_6_20 = (`6` + `20`)/2,
    diff_13 = (`13` - avg_6_20) / avg_6_20 * 100
  )
```

## Extra Challenges

* If you wanted to use the `US_births_2000_2014` data instead, what would you need to change in the pipeline? How about using both `US_births_1994_2003` and `US_births_2000_2014`?

* Try not removing the `date` column. At what point in the pipeline does it cause problems? Why?

* Can you come up with an alternative way to investigate the Friday the 13th effect? Try it out!

## Takeaways

The power of the tidyverse comes from being able to easily combine functions that do simple things well.
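As a small illustration of that idea, here is the toy `pollution` data from 02-Transform (re-created so the block runs on its own). Three single-purpose verbs chained with `%>%` answer a question none of them answers alone: which city has the highest total pollution?

```r
library(tibble)
library(dplyr)

# Toy data, copied from the setup chunk of 02-Transform.Rmd
pollution <- tribble(
  ~city,      ~size,   ~amount,
  "New York", "large", 23,
  "New York", "small", 14,
  "London",   "large", 22,
  "London",   "small", 16,
  "Beijing",  "large", 121,
  "Beijing",  "small", 56
)

city_totals <- pollution %>%
  group_by(city) %>%                  # one group per city
  summarise(total = sum(amount)) %>%  # collapse each group to one row
  arrange(desc(total))                # largest total first

city_totals
# Beijing 177, London 38, New York 37
```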
--------------------------------------------------------------------------------
/05-Data-Types.Rmd:
--------------------------------------------------------------------------------
---
title: "Data Types"
output: html_notebook
---

```{r setup}
library(tidyverse)
library(lubridate)

# Example of a factor
eyes <- factor(x = c("blue", "green", "green"),
               levels = c("blue", "brown", "green"))

# An example for times/dates
library(fivethirtyeight)
births <- US_births_1994_2003 %>%
  select(date, births)
```

## Warm-up / Review

Using the data `gss_cat`, find the average hours of tv watched (`tvhours`) for each category of marital status (`marital`).

```{r}
gss_cat
```

## Your Turn 1

What kind of object is the `marital` variable?

```{r}
gss_cat
```

Brainstorm with your neighbor all the things you know about that kind of object.

# Factors

## Your Turn 2

Fix your summary of average hours of tv watched (`tvhours`) by marital status (`marital`) to drop missing values in `tvhours`, then create a plot to examine the results.

```{r}
gss_cat %>%
  group_by(marital) %>%
  summarise(avg_tvhours = mean(tvhours))
```

## Your Turn 3

Fill in the blanks (` `) to explore the average hours of tv watched by religion.

```{r, error = TRUE}
gss_cat %>%
  drop_na( ) %>%
  group_by( ) %>%
  summarise( ) %>%
  ggplot() +
    geom_point(mapping = aes(x = , y = ))
```

## Quiz

Why is this plot not very useful?
```{r}
gss_cat %>%
  drop_na(tvhours) %>%
  group_by(denom) %>%
  summarise(avg_tvhours = mean(tvhours)) %>%
  ggplot() +
    geom_point(mapping = aes(x = avg_tvhours,
                             y = fct_reorder(denom, avg_tvhours)))
```

## Your Turn 4

Edit the code to also relabel some other Baptist denominations:

* "Baptist-dk which" -> "Baptist - Don't Know"
* "Other baptists" -> "Baptist - Other"

```{r}
gss_cat %>%
  mutate(denom = fct_recode(denom,
                            "Baptist - Southern" = "Southern baptist")
  ) %>%
  pull(denom) %>%
  levels()
```

## Your Turn 5

What does the function `detect_denom()` do?

```{r}
detect_denom <- function(x){
  case_when(
    str_detect(x, "[Bb]ap") ~ "Baptist",
    str_detect(x, "[Pp]res") ~ "Presbyterian",
    str_detect(x, "[Ll]uth") ~ "Lutheran",
    str_detect(x, "[Mm]eth") ~ "Methodist",
    TRUE ~ x
  )
}

gss_cat %>% pull(denom) %>% levels() %>% detect_denom()
```

# Strings

With your neighbor, predict what these might return:

```{r}
strings <- c("Apple", "Pineapple", "Orange")

str_detect(strings, pattern = "pp")
str_detect(strings, pattern = "apple")
str_detect(strings, pattern = "[Aa]pple")
```

Then run them!

# Times and Dates

## Your Turn 7

For each of the following formats (of the same date), pick the right function from the `ymd()` family to parse them:

```{r}
"2018 Feb 01"
"2-1-18"
"01/02/2018"
```

## Your Turn 8

Fill in the blanks to:

* Extract the month from date.
* Extract the year from date.
* Find the total births for each year/month.
* Plot the results as a line chart.
```{r, error = TRUE}
births %>%
  mutate(year = ___,
         month = ___) %>%
  group_by(___, ___) %>%
  summarise(total_births = ___) %>%
  ggplot() +
    geom_line(aes(x = month, y = total_births, group = year))
```


# Take Aways

Dplyr gives you three _general_ functions for manipulating data: `mutate()`, `summarise()`, and `group_by()`. Augment these with functions from the packages below, which focus on specific types of data.

Package   | Data Type
--------- | ---------------
forcats   | factors
stringr   | strings
hms       | times
lubridate | dates and times

--------------------------------------------------------------------------------
/06-Iterate.Rmd:
--------------------------------------------------------------------------------
---
title: "Iteration"
output: html_document
---

```{r setup}
library(tidyverse)

# Toy data
set.seed(1000)
exams <- list(
  student1 = round(runif(10, 50, 100)),
  student2 = round(runif(10, 50, 100)),
  student3 = round(runif(10, 50, 100)),
  student4 = round(runif(10, 50, 100)),
  student5 = round(runif(10, 50, 100))
)

extra_credit <- list(0, 0, 10, 10, 15)
```

## Your Turn 1

What kind of object is `mod`? Why are models stored as this kind of object?

```{r}
mod <- lm(price ~ carat + cut + color + clarity, data = diamonds)
View(mod)
```

## Quiz

What's the difference between a list and an **atomic** vector?

Atomic vectors are: "logical", "integer", "numeric" (synonym "double"), "complex", "character" and "raw" vectors.

## Your Turn 2

Here is a list:

```{r}
a_list <- list(nums = c(8, 9),
               log = TRUE,
               cha = c("a", "b", "c"))
```

Here are two subsetting commands. Do they return the same values? Run the code chunk above, _and then_ run the code chunks below to confirm.

```{r}
a_list["nums"]
```

```{r}
a_list$nums
```

## Your Turn 3

What will each of these return? Run the code chunks to confirm.

```{r}
vec <- c(-2, -1, 0, 1, 2)
abs(vec)
```

```{r, error = TRUE}
lst <- list(-2, -1, 0, 1, 2)
abs(lst)
```

## Your Turn 4

Run the code in the chunks. What does it return?

```{r}
list(student1 = mean(exams$student1),
     student2 = mean(exams$student2),
     student3 = mean(exams$student3),
     student4 = mean(exams$student4),
     student5 = mean(exams$student5))
```

```{r}
library(purrr)
map(exams, mean)
```

## Your Turn 5

Calculate the variance (`var()`) of each student’s exam grades.

```{r}
exams
```

## Your Turn 6

Calculate the max grade (`max()`) for each student. Return the result as a vector.

```{r}
exams
```

## Your Turn 7

Write a function that counts the best exam twice and then takes the average. Use it to grade all of the students.

1. Write code that solves the problem for a real object
2. Wrap the code in `function(){}` to save it
3. Add the name of the real object as the function argument

```{r}
vec <- exams[[1]]


```

## Your Turn 8

Compute a final grade for each student, where the final grade is the average test score plus any `extra_credit` assigned to the student. Return the results as a double (i.e. numeric) vector.

```{r}

```


***

# Take Aways

Lists are a useful way to organize data, but you have to explicitly arrange for functions to iterate over the elements of a list.
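For example, with the `vec` and `lst` objects from Your Turn 3 (re-created below), `abs()` works on the atomic vector but fails on the list. purrr's `map_dbl()` is one way to apply it to each list element and get a double vector back:

```r
library(purrr)

vec <- c(-2, -1, 0, 1, 2)     # atomic (double) vector
lst <- list(-2, -1, 0, 1, 2)  # list of five length-one elements

abs(vec)                      # works: abs() is vectorized over atomic vectors

# abs(lst) fails with "non-numeric argument to mathematical function";
# try() captures the error so the rest of the block still runs
res <- try(abs(lst), silent = TRUE)
class(res)                    # "try-error"

# map_dbl() calls abs() on each element and simplifies to a double vector
map_dbl(lst, abs)             # 2 1 0 1 2
```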
You can do this with the `map()` family of functions in the purrr package.

To write a function,

1. Write code that solves the problem for a real object
2. Wrap the code in `function(){}` to save it
3. Add the name of the real object as the function argument

This sequence will help prevent bugs in your code (and reduce the time you spend correcting bugs).

--------------------------------------------------------------------------------
/07-Model.Rmd:
--------------------------------------------------------------------------------
---
title: "Model"
output: html_notebook
---

```{r setup, message=FALSE}
library(tidyverse)
library(modelr)
library(broom)

wages <- heights %>% filter(income > 0)
```

## Your Turn 1

Fit the model on the slide and then examine the output. What does it look like?

```{r}
mod_e <- lm(log(income) ~ education, data = wages)
mod_e
```

## Your Turn 2

Use a pipe to model `log(income)` against `height`. Then use broom and dplyr functions to extract:

1. The **coefficient estimates** and their related statistics
2. The **adj.r.squared** and **p.value** for the overall model

```{r, error = TRUE}
mod_h <- wages %>% lm( )


```

## Your Turn 3

Model `log(income)` against `education` _and_ `height`. Do the coefficients change?

```{r, error = TRUE}
mod_eh <- wages %>% lm( )

```

## Your Turn 4

Model `log(income)` against `education` and `height` and `sex`. Can you interpret the coefficients?

```{r, error = TRUE}
mod_ehs <- wages %>% lm( )
```

## Your Turn 5

Use a broom function and ggplot2 to make a line graph of `height` vs `.fitted` for our heights model, `mod_h`.
_Bonus: Overlay the plot on the original data points._

```{r}
mod_h <- wages %>% lm(log(income) ~ height, data = .)

```

## Your Turn 6

Repeat the process to make a line graph of `height` vs `.fitted` colored by `sex` for model `mod_ehs`. Are the results interpretable? Add `+ facet_wrap(~education)` to the end of your code. What happens?

```{r}
mod_ehs <- wages %>% lm(log(income) ~ education + height + sex, data = .)

```

## Your Turn 7

Use one of `spread_predictions()` or `gather_predictions()` to make a line graph of `height` vs `pred` colored by `model` for each of `mod_h`, `mod_eh`, and `mod_ehs`. Are the results interpretable?

Add `+ facet_grid(sex ~ education)` to the end of your code. What happens?

```{r warning = FALSE, message = FALSE}
mod_h <- wages %>% lm(log(income) ~ height, data = .)
mod_eh <- wages %>% lm(log(income) ~ education + height, data = .)
mod_ehs <- wages %>% lm(log(income) ~ education + height + sex, data = .)


```

***

# Take Aways

* Use `glance()`, `tidy()`, and `augment()` from the **broom** package to return model values in a data frame.

* Use `add_predictions()` or `gather_predictions()` or `spread_predictions()` from the **modelr** package to visualize predictions.

* Use `add_residuals()` or `gather_residuals()` or `spread_residuals()` from the **modelr** package to visualize residuals.
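A minimal sketch of these functions on `mod_h` and `mod_eh` from this notebook. The block re-creates `wages` so it runs on its own, and it assumes the modelr and broom packages are installed. The comments describe the shape of each result rather than specific numbers.

```r
library(dplyr)
library(modelr)  # heights data, gather_predictions()
library(broom)   # tidy(), glance(), augment()

wages  <- heights %>% filter(income > 0)
mod_h  <- lm(log(income) ~ height, data = wages)
mod_eh <- lm(log(income) ~ education + height, data = wages)

coefs <- tidy(mod_h)     # one row per coefficient: term, estimate, std.error, statistic, p.value
fit   <- glance(mod_h)   # one row for the whole model: r.squared, adj.r.squared, p.value, ...
aug   <- augment(mod_h)  # one row per observation used in the fit, plus .fitted, .resid, ...

# Long format: the rows of `wages` repeated once per model,
# labelled in a `model` column, with predictions in a `pred` column
preds <- wages %>% gather_predictions(mod_h, mod_eh)

coefs
fit
head(aug)
```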
--------------------------------------------------------------------------------
/08-Organize.Rmd:
--------------------------------------------------------------------------------
---
title: "Organize with List Columns"
output: html_notebook
---

```{r setup}
library(tidyverse)
library(gapminder)
library(broom)

nz <- gapminder %>%
  filter(country == "New Zealand")
us <- gapminder %>%
  filter(country == "United States")
```

## Your turn 1

How has life expectancy changed over time?
Make a line plot of `lifeExp` vs. `year` grouped by country.
Set alpha to 0.2 to see the results better.

```{r}
gapminder


```

## Quiz

How is a data frame/tibble similar to a list?

## Quiz

If one of the elements of a list can be another list,
can one of the columns of a data frame be another list?

## Your turn 2

Run this chunk:
```{r}
gapminder_nested <- gapminder %>%
  group_by(country) %>%
  nest()

fit_model <- function(df) lm(lifeExp ~ year, data = df)

gapminder_nested <- gapminder_nested %>%
  mutate(model = map(data, fit_model))

get_rsq <- function(mod) glance(mod)$r.squared

gapminder_nested <- gapminder_nested %>%
  mutate(r.squared = map_dbl(model, get_rsq))
```

Then filter `gapminder_nested` to find the countries with `r.squared` less than 0.5.

```{r}

```

## Your Turn 3

Edit the code in the chunk provided to instead find and plot countries with a slope above 0.6 years/year.
```{r}
get_slope <- function(mod) {
  tidy(mod) %>% filter(term == "year") %>% pull(estimate)
}

# Add new column with r-squared
gapminder_nested <- gapminder_nested %>%
  mutate(r.squared = map_dbl(model, get_rsq))

# keep only the low r-squared countries
poor_fit <- gapminder_nested %>%
  filter(r.squared < 0.5)

# unnest and plot result
unnest(poor_fit, data) %>%
  ggplot(aes(x = year, y = lifeExp)) +
    geom_line(aes(color = country))
```

## Your Turn 4

**Challenge:**

1. Create your own copy of `gapminder_nested` and then add one more list column: `output`, which contains the output of `augment()` for each model.


```{r}

```

# Take away

--------------------------------------------------------------------------------
/99-Setup.md:
--------------------------------------------------------------------------------
# Getting Set Up

During the workshop you'll do your work on [rstudio.cloud](https://rstudio.cloud/). This provides an easy way for me to share all the materials with you, and removes the hassle of getting the right versions of R, RStudio or any packages.

## To get started:

To get set up follow these steps:

1. Visit the project at https://rstudio.cloud/project/10871

2. Log in using Google, GitHub, shinyapps.io, or "Sign Up".

   ![](resources/01-setup-login.png)

3. The "Data Science in the tidyverse" project will open, but it's a *Temporary copy*. Click *Save a copy*.

   ![](resources/02-setup-temp-project.png)

4. Now the "Data Science in the tidyverse" project will open again, but this time it is your own copy. Navigate to the project folder:

   ![](resources/03-setup-navigate-to-project.png)

5. Inside the project folder, navigate to the "data-science-in-the-tidyverse.Rproj" file and click it.
24 | 
25 | ![](resources/04-setup-rproj-file.png)
26 | 
27 | 6. You'll be asked whether you want to open the project; click *Yes*.
28 | 
29 | ![](resources/05-setup-open-project.png)
30 | 
31 | 7. All going well, you should now see your project looking like this. Now, open "00-Getting-started.Rmd".
32 | 
33 | ![](resources/06-setup-inside-project.png)
34 | 
35 | 8. You're all set! You might like to read through "00-Getting-started.Rmd" and do what it tells you.
36 | 
37 | ![](resources/07-setup-all-done.png)
38 | 
39 | ## Once you are set up
40 | 
41 | You can access your copy of the project from *Your Workspace* on [rstudio.cloud](https://rstudio.cloud/).
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | This is the repo for *"Data Science in the tidyverse"* given at `rstudio::conf(2018)` in Jan 2018.
2 | 
3 | ## Description
4 | 
5 | This is a two-day, hands-on workshop based on the book [R for Data Science](http://r4ds.had.co.nz/). This workshop is designed for people who are familiar with R and want to learn how to achieve their data analysis goals the "tidy" way. You will learn how to visualize, transform, and model data in R and work with date-times, character strings, and untidy data formats. Along the way, you will learn and use many packages from the tidyverse, including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, lubridate, and forcats.
6 | 
7 | ## Software Requirements
8 | 
9 | You'll be using RStudio Cloud, so (all going well) on the day of the workshop all you'll need is **a laptop that can access the internet** (wifi will be available).
10 | 
11 | However, as a backup (e.g., in case of wifi issues), it would be best if you also have R and RStudio installed locally, along with the following packages:
12 | 
13 |     install.packages(c("tidyverse", "fivethirtyeight",
14 |                        "gapminder", "rmarkdown"))
15 | 
16 | Don't forget to bring your power cable!
17 | 
18 | ## Instructor Info
19 | 
20 | Charlotte Wickham
21 | 
22 | - [cwickham@gmail.com](mailto:cwickham@gmail.com)
23 | - [cwick.co.nz](http://www.cwick.co.nz)
24 | - @[cvwickham](http://www.twitter.com/cvwickham)
25 | 
26 | ## License
27 | 
28 | Creative Commons License
29 | 
30 | *Data Science in the tidyverse* by Charlotte Wickham is licensed under a Creative Commons Attribution 4.0 International License. Based on a work at https://github.com/rstudio/master-the-tidyverse.
31 | 
--------------------------------------------------------------------------------
/cheatsheets/dplyr-data-transformation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/cheatsheets/dplyr-data-transformation.pdf
--------------------------------------------------------------------------------
/cheatsheets/ggplot2-data-visualization.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/cheatsheets/ggplot2-data-visualization.pdf
--------------------------------------------------------------------------------
/cheatsheets/lubridate-dates-times.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/cheatsheets/lubridate-dates-times.pdf
--------------------------------------------------------------------------------
/cheatsheets/purrr-iterate.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/cheatsheets/purrr-iterate.pdf
--------------------------------------------------------------------------------
/cheatsheets/rstudio-ide.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/cheatsheets/rstudio-ide.pdf
--------------------------------------------------------------------------------
/cheatsheets/stringr-strings.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/cheatsheets/stringr-strings.pdf
--------------------------------------------------------------------------------
/cheatsheets/tidyr-readr-data-import-tidy.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/cheatsheets/tidyr-readr-data-import-tidy.pdf
--------------------------------------------------------------------------------
/data-science-in-the-tidyverse.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 | 
3 | RestoreWorkspace: Default
4 | SaveWorkspace: Default
5 | AlwaysSaveHistory: Default
6 | 
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 | 
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 | 
--------------------------------------------------------------------------------
/email-to-participants.md:
--------------------------------------------------------------------------------
1 | *(This will be sent to registered participants by email, but I'm also posting it here as a convenient place to field any questions/issues.)*
2 | 
3 | Hello,
4 | 
5 | I'm excited to meet you all next week in "Data Science in the tidyverse"!
6 | 
7 | The workshop starts Weds Jan 31st at 9am in Seaport H, but you may want to arrive a little before 9am to get set up. The TAs and I will be there to help you if you need it.
8 | 
9 | **What should you bring?**
10 | 
11 | A **laptop** that can access the internet (wifi will be available) and your **power cable**.
12 | 
13 | You'll be using [RStudio Cloud](https://rstudio.cloud/), so on the day of the workshop (all going well), you shouldn't need anything else.
14 | 
15 | However, as a backup (e.g., in case of wifi issues), **you should also have**:
16 | 
17 | * R and RStudio installed locally,
18 | 
19 | * the following packages:
20 |   ```
21 |   install.packages(c("tidyverse", "fivethirtyeight",
22 |                      "gapminder", "rmarkdown"))
23 |   ```
24 | 
25 | * and the materials as a .ZIP from https://github.com/cwickham/data-science-in-tidyverse/archive/master.zip
26 | 
27 | **Is this class for me?**
28 | 
29 | This class is designed for people who have some experience with R but want to learn about tools in the tidyverse. In particular,
30 | here are some things I'll assume you already know how to do:
31 | 
32 | * You know how to assign values to variables in R
33 | * You recognize the basic syntax for calling R functions, e.g., `function_name(arg_1 = arg1_value, ...)`
34 | * You can predict the output of something like: `mean(c(1, 2, 3))`
35 | 
36 | Here are some other things I'll remind you about, but assume you have probably seen before:
37 | 
38 | * How to open an R script or R notebook and execute the code in it
39 | * How to get an overview of a data frame or tibble (things like `head()`, `names()`, `dplyr::glimpse()`, `View()`)
40 | * The difference between lists and atomic vectors (double, integer, character, etc.)
41 | 
42 | If those items seem foreign, you might be better served in "Intro to R & RStudio". You'll still cover some of the tidyverse, but the pace will be a little slower, and more time will be spent getting familiar with R and RStudio.
43 | 
44 | **Is there anything you should do before arriving?**
45 | 
46 | Visit [rstudio.cloud](https://rstudio.cloud/) and check that you can log in (you can use your existing Google or GitHub account, or sign up for an account).
47 | 
48 | If you are dying to get started, feel free to poke around the materials on [GitHub](https://github.com/cwickham/data-science-in-tidyverse), or even try [getting set up](https://github.com/cwickham/data-science-in-tidyverse/blob/master/99-Setup.md) early, but be aware that there might be some small changes between now and next Wednesday.
49 | 
50 | If you have any other questions, feel free to ask them over at https://community.rstudio.com/t/information-for-data-science-in-the-tidyverse/4539
51 | 
52 | 
53 | 
--------------------------------------------------------------------------------
/resources/01-setup-login.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/01-setup-login.png
--------------------------------------------------------------------------------
/resources/02-setup-temp-project.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/02-setup-temp-project.png
--------------------------------------------------------------------------------
/resources/03-setup-navigate-to-project.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/03-setup-navigate-to-project.png
--------------------------------------------------------------------------------
/resources/04-setup-rproj-file.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/04-setup-rproj-file.png
--------------------------------------------------------------------------------
/resources/05-setup-open-project.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/05-setup-open-project.png
--------------------------------------------------------------------------------
/resources/06-setup-inside-project.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/06-setup-inside-project.png
--------------------------------------------------------------------------------
/resources/07-setup-all-done.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/07-setup-all-done.png
--------------------------------------------------------------------------------
/resources/bialik-fridaythe13th-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/resources/bialik-fridaythe13th-2.png
--------------------------------------------------------------------------------
/slides/00-Introduction.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/00-Introduction.pdf
--------------------------------------------------------------------------------
/slides/01-Visualize.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/01-Visualize.pdf
--------------------------------------------------------------------------------
/slides/02-Transform.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/02-Transform.pdf
--------------------------------------------------------------------------------
/slides/03-Tidy.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/03-Tidy.pdf
--------------------------------------------------------------------------------
/slides/04-Case-Study.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/04-Case-Study.pdf
--------------------------------------------------------------------------------
/slides/05-Data-Types.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/05-Data-Types.pdf
--------------------------------------------------------------------------------
/slides/06-Iteration.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/06-Iteration.pdf
--------------------------------------------------------------------------------
/slides/07-Model.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/07-Model.pdf
--------------------------------------------------------------------------------
/slides/08-Organize.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/08-Organize.pdf
--------------------------------------------------------------------------------
/slides/10-wrapping-up.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cwickham/data-science-in-tidyverse/bcd8438c9be11d0e47f062958389e329eb00b766/slides/10-wrapping-up.pdf
--------------------------------------------------------------------------------
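The list-column pattern that 08-Organize.Rmd builds up in the Your Turn 2 chunk (nest the data per group, `map()` a model over each nested tibble, then `map_dbl()` a single number out of each model) can be seen in isolation on a tiny data set. The sketch below illustrates that pattern only and is not part of the workshop materials or an answer key: the toy data, its `site`/`x`/`y` columns, and the helper `slope_of_x()` are all hypothetical, and it assumes the tidyverse (tidyr 1.0 or later) and broom are installed.

```r
library(tidyverse)  # attaches tibble, dplyr, tidyr, purrr, ggplot2, ...
library(broom)      # provides tidy() for model summaries

# Hypothetical toy data: two sites with opposite linear trends
toy <- tibble(
  site = rep(c("a", "b"), each = 4),
  x    = rep(1:4, times = 2),
  y    = c(1, 3, 3, 5,   # site a: least-squares slope is 1.2
           5, 3, 3, 1)   # site b: least-squares slope is -1.2
)

# Hypothetical helper, written the same way as get_slope() in 08-Organize.Rmd
slope_of_x <- function(mod) {
  tidy(mod) %>% filter(term == "x") %>% pull(estimate)
}

toy_nested <- toy %>%
  group_by(site) %>%
  nest() %>%          # one row per site; `data` is a list column of tibbles
  ungroup() %>%
  mutate(
    model = map(data, function(df) lm(y ~ x, data = df)),  # list column of lm fits
    slope = map_dbl(model, slope_of_x)                     # plain double column
  )

# Keep the sites with a positive trend, then unnest back to one row per observation
toy_nested %>%
  filter(slope > 0) %>%
  select(site, data) %>%
  unnest(data)
```

Because `slope` ends up as an ordinary double column, it can be filtered like any other column; `unnest()` then turns the stored tibbles back into rows, the same move the Your Turn 3 chunk makes with `unnest(poor_fit, data)` before plotting.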