├── .gitignore ├── img └── title-slide.png ├── ds-puzzles.Rproj ├── 11_sandwiches ├── 11_soln.R ├── 11_text.Rmd ├── sample-data-walkthrough.Rmd ├── sample-data-walkthrough.md └── 11_data.csv ├── README.Rmd └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | .Rhistory 2 | .RData 3 | .Rproj.user 4 | -------------------------------------------------------------------------------- /img/title-slide.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/isteves/ds-puzzles/HEAD/img/title-slide.png -------------------------------------------------------------------------------- /ds-puzzles.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 4 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | -------------------------------------------------------------------------------- /11_sandwiches/11_soln.R: -------------------------------------------------------------------------------- 1 | #' --- 2 | #' title: Sandwiches 3 | #' --- 4 | #' 5 | #' Use _Ctrl (Cmd) + Shift + K_ to render this file 6 | #' 7 | #+ r setup, include = FALSE 8 | options(tidyverse.quiet = TRUE) 9 | 10 | #+ r 11 | library(tidyverse) 12 | library(here) 13 | data_path <- here::here('11_sandwiches', '11_data.csv') 14 | 15 | # YOUR SOLUTION CODE HERE 16 | 17 | -------------------------------------------------------------------------------- /11_sandwiches/11_text.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | output: html_document 3 | --- 4 | 5 | ```{r setup, include = FALSE} 6 | knitr::opts_chunk$set(echo = FALSE, warning = FALSE, error = FALSE) 7 | library(tidyverse) 8 | ``` 9 | 10 | The 
neighborhood sandwich store makes the _best_ sandwiches! They’ve got everything from classics like BLTs to more unusual options like Fluffernutters. Since many of their specialty ingredients keep going bad, they've decided to cut their selection and only focus on their best-selling sandwich. 11 | 12 | To help with the decision, the storeowners have collected data on their customers’ favorite sandwiches. Most people listed several varieties (in no particular order). Here’s a sample of the data: 13 | 14 | ```{r warning = FALSE, echo = FALSE} 15 | library(tidyverse) 16 | sw <- tibble::tribble( 17 | ~names, ~sandwiches, 18 | "Abby", "Denver; BLT; Torta ahogada; Barbecue", 19 | "Abigail", "BLT; Ftira; Primanti; Ice cream; Choripán", 20 | "Adam", "Corned beef; Montadito; Cheesesteak; Tripleta; Dagwood; Jambon-beurre", 21 | "Alexa", "Dagwood; Mortadella", 22 | "Alexandria", "Slider; Beschuit met muisjes; Chicken salad", 23 | "Ana", "Fried brain; Polish boy; Vegetable; Pudgy Pie; Dagwood" 24 | ) 25 | 26 | sw %>% knitr::kable() 27 | ``` 28 | 29 | In this sample, the Dagwood sandwich is the most popular. 30 | 31 | In the full dataset, **what is the most popular sandwich among the customers?** 32 | 33 | -------------------------------------------------------------------------------- /11_sandwiches/sample-data-walkthrough.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Sample data walk-through" 3 | output: md_document 4 | --- 5 | 6 | ```{r setup, include=FALSE} 7 | knitr::opts_chunk$set(echo = TRUE) 8 | options(tidyverse.quiet = TRUE) 9 | ``` 10 | 11 |  12 | 13 | In the puzzle, we're given some sample data to use as a _test case_. That is, if we can determine the most popular sandwich in our sample, we'll be most--if not all--of the way to answering this question for our full dataset. 
14 | 15 | ```{r warning = FALSE, echo = FALSE} 16 | library(tidyverse) 17 | sw <- tibble::tribble( 18 | ~names, ~sandwiches, 19 | "Abby", "Denver; BLT; Torta ahogada; Barbecue", 20 | "Abigail", "BLT; Ftira; Primanti; Ice cream; Choripán", 21 | "Adam", "Corned beef; Montadito; Cheesesteak; Tripleta; Dagwood; Jambon-beurre", 22 | "Alexa", "Dagwood; Mortadella", 23 | "Alexandria", "Slider; Beschuit met muisjes; Chicken salad", 24 | "Ana", "Fried brain; Polish boy; Vegetable; Pudgy Pie; Dagwood" 25 | ) 26 | 27 | sw %>% knitr::kable() 28 | ``` 29 | 30 | Let's take a look at the sample data in tibble-mode. Note that there are a few non-English letters that could give you some trouble depending on how you import the data into R. Sometimes the letters get "translated" into a mix of letters and punctuation (e.g., "ChoripÃ¡n" rather than "Choripán"). 31 | 32 | ```{r} 33 | sw 34 | ``` 35 | 36 | As a first step, let's separate out the sandwiches into individual observations using the handy `tidyr` function, `separate_rows()`. 37 | 38 | ```{r} 39 | sw %>% 40 | separate_rows(sandwiches, sep = "; ") 41 | ``` 42 | 43 | Keep in mind that omitting the space in the separator may cause some of the results not to match up. For example, "BLT" and " BLT" would require an extra cleaning step, such as `mutate(sandwiches = str_trim(sandwiches))`. 44 | 45 | Next, we can count the sandwiches to determine which type is the most popular. Adding `sort = TRUE` brings the most popular sandwich to the top of the tibble. 46 | 47 | ```{r} 48 | sw %>% 49 | separate_rows(sandwiches, sep = "; ") %>% 50 | count(sandwiches, sort = TRUE) 51 | ``` 52 | 53 | That's it! With this small sample, you've got the basics of a working wrangling script that you can try out on the full data. 
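The walkthrough steps can be collected into one reusable function, sketched below. This is only an illustration: `count_sandwiches()` is a hypothetical helper name that does not exist in the repo, and the commented-out lines at the end assume the full CSV has the same `names` and `sandwiches` columns as the sample, which should be checked before relying on it. The sketch splits on a bare `;` and then trims whitespace, so it tolerates entries with or without a space after the separator.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical helper (not part of the repo): tally sandwiches in a data
# frame that has a semicolon-delimited `sandwiches` column, as in the sample.
count_sandwiches <- function(data) {
  data %>%
    separate_rows(sandwiches, sep = ";") %>%      # one sandwich per row
    mutate(sandwiches = str_trim(sandwiches)) %>% # drop stray spaces
    count(sandwiches, sort = TRUE)                # most popular first
}

# The sample data from the puzzle text, used as a test case.
sw <- tibble::tribble(
  ~names,       ~sandwiches,
  "Abby",       "Denver; BLT; Torta ahogada; Barbecue",
  "Abigail",    "BLT; Ftira; Primanti; Ice cream; Choripán",
  "Adam",       "Corned beef; Montadito; Cheesesteak; Tripleta; Dagwood; Jambon-beurre",
  "Alexa",      "Dagwood; Mortadella",
  "Alexandria", "Slider; Beschuit met muisjes; Chicken salad",
  "Ana",        "Fried brain; Polish boy; Vegetable; Pudgy Pie; Dagwood"
)

sample_counts <- count_sandwiches(sw)
head(sample_counts, 1) # top row: Dagwood, n = 3

# Assumption: the full CSV shares the sample's column names.
# full <- readr::read_csv(here::here("11_sandwiches", "11_data.csv"))
# count_sandwiches(full)
```

Checking the function against the sample first (where the answer is known to be Dagwood) gives some confidence before running it on the full data.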
54 | -------------------------------------------------------------------------------- /README.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | output: md_document 3 | --- 4 | # Teaching data science with puzzles 5 | ### useR! 2019 [slides](https://speakerdeck.com/isteves/teaching-data-science-with-puzzles-fd46c088-e5d5-4297-9629-60e81cc6403c) 6 | ### rstudio::conf 2019, [slides](https://speakerdeck.com/isteves/teaching-data-science-with-puzzles), [video](https://resources.rstudio.com/rstudio-conf-2019/teaching-data-science-with-puzzles) 7 | 8 | Of the many coding puzzles on the web, few focus on the programming skills needed for handling untidy data. During my summer internship at RStudio, I worked with Jenny Bryan to develop a series of data science puzzles known as the "Tidies of March." These puzzles isolate data wrangling tasks into bite-sized pieces to nurture core data science skills such as importing, reshaping, and summarizing data. We also provide access to puzzles and puzzle data directly in R through an accompanying Tidies of March package. I will show how this package models best practices for both data wrangling and project management. 9 | 10 | [](https://speakerdeck.com/isteves/teaching-data-science-with-puzzles) 11 | 12 | If you'd like to take a closer look at the sandwiches example from the talk, check out the [sandwiches folder](https://github.com/isteves/ds-puzzles/tree/master/11_sandwiches) in this repo. 13 | 14 | ## Additional resources 15 | 16 | - [How to name files](https://speakerdeck.com/jennybc/how-to-name-files) talk by Jenny Bryan 17 | - [A summer of puzzles at RStudio](https://irene.rbind.io/post/summer-rstudio/) blogpost about my internship experience 18 | - [it’s not the maths, it’s the code - how testing has changed my workflow](http://cantabile.rbind.io/posts/2019-01-05-its-not-not-the-math-its-the-code/) blogpost by Charles T. 
Gray 19 | 20 | Packages mentioned in my talk: 21 | 22 | - [usethis](https://usethis.r-lib.org/) - a workflow package: it automates repetitive tasks that arise during project setup and development, both for R packages and non-package projects 23 | - [testthat](https://testthat.r-lib.org/) - to make testing fun 24 | - [testrmd](https://github.com/ropenscilabs/testrmd) - test chunks for RMarkdown 25 | - [reprex](https://reprex.tidyverse.org/) - render bits of R code for sharing, e.g., on GitHub or StackOverflow 26 | - [rmarkdown](https://rmarkdown.rstudio.com/) - create reproducible text and analyses 27 | 28 | ## Thank yous 29 | 30 | A big thanks to the Tidyverse team, fellow interns, and RStudio folks for a fun & interesting summer! 31 | 32 | Also thanks to Maria Novosolov, Alex Slavenko, Alex Hayes, Steven Chong, and Julien Brun for their comments and support in early versions of this talk! 33 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Teaching data science with puzzles 2 | ================================== 3 | 4 | ### useR! 2019 [slides](https://speakerdeck.com/isteves/teaching-data-science-with-puzzles-fd46c088-e5d5-4297-9629-60e81cc6403c) 5 | 6 | ### rstudio::conf 2019, [slides](https://speakerdeck.com/isteves/teaching-data-science-with-puzzles), [video](https://resources.rstudio.com/rstudio-conf-2019/teaching-data-science-with-puzzles) 7 | 8 | Of the many coding puzzles on the web, few focus on the programming 9 | skills needed for handling untidy data. During my summer internship at 10 | RStudio, I worked with Jenny Bryan to develop a series of data science 11 | puzzles known as the "Tidies of March." These puzzles isolate data 12 | wrangling tasks into bite-sized pieces to nurture core data science 13 | skills such as importing, reshaping, and summarizing data. 
We also 14 | provide access to puzzles and puzzle data directly in R through an 15 | accompanying Tidies of March package. I will show how this package 16 | models best practices for both data wrangling and project management. 17 | 18 |  19 | 20 | If you'd like to take a closer look at the sandwiches example from the 21 | talk, check out the [sandwiches 22 | folder](https://github.com/isteves/ds-puzzles/tree/master/11_sandwiches) 23 | in this repo. 24 | 25 | Additional resources 26 | -------------------- 27 | 28 | - [How to name 29 | files](https://speakerdeck.com/jennybc/how-to-name-files) talk by 30 | Jenny Bryan 31 | - [A summer of puzzles at 32 | RStudio](https://irene.rbind.io/post/summer-rstudio/) blogpost about 33 | my internship experience 34 | - [it’s not the maths, it’s the code - how testing has changed my 35 | workflow](http://cantabile.rbind.io/posts/2019-01-05-its-not-not-the-math-its-the-code/) 36 | blogpost by Charles T. Gray 37 | 38 | Packages mentioned in my talk: 39 | 40 | - [usethis](https://usethis.r-lib.org/) - a workflow package: it 41 | automates repetitive tasks that arise during project setup and 42 | development, both for R packages and non-package projects 43 | - [testthat](https://testthat.r-lib.org/) - to make testing fun 44 | - [testrmd](https://github.com/ropenscilabs/testrmd) - test chunks for 45 | RMarkdown 46 | - [reprex](https://reprex.tidyverse.org/) - render bits of R code for 47 | sharing, e.g., on GitHub or StackOverflow 48 | - [rmarkdown](https://rmarkdown.rstudio.com/) - create reproducible 49 | text and analyses 50 | 51 | Thank yous 52 | ---------- 53 | 54 | A big thanks to the Tidyverse team, fellow interns, and RStudio folks 55 | for a fun & interesting summer! 56 | 57 | Also thanks to Maria Novosolov, Alex Slavenko, Alex Hayes, Steven Chong, 58 | and Julien Brun for their comments and support in early versions of this 59 | talk! 
60 | -------------------------------------------------------------------------------- /11_sandwiches/sample-data-walkthrough.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | In the puzzle, we’re given some sample data to use as a *test case*. 4 | That is, if we can determine the most popular sandwich in our sample, 5 | we’ll be most–if not all–of the way to answering this question for our 6 | full dataset. 7 | 8 |
| names | sandwiches |
|---|---|
| Abby | Denver; BLT; Torta ahogada; Barbecue |
| Abigail | BLT; Ftira; Primanti; Ice cream; Choripán |
| Adam | Corned beef; Montadito; Cheesesteak; Tripleta; Dagwood; Jambon-beurre |
| Alexa | Dagwood; Mortadella |
| Alexandria | Slider; Beschuit met muisjes; Chicken salad |
| Ana | Fried brain; Polish boy; Vegetable; Pudgy Pie; Dagwood |