├── .github ├── .gitignore └── workflows │ ├── deploy_bookdown.yml │ ├── pr_check.yml │ └── pr_check_readme.yml ├── preamble.tex ├── data ├── .gitignore ├── roster.xlsx ├── students.rds ├── survey.xlsx ├── bake-sale.xlsx ├── penguins.xlsx ├── students.xlsx ├── students.parquet ├── 03-sales.csv ├── 02-sales.csv ├── 01-sales.csv ├── students-2.csv └── students.csv ├── images ├── sql.png ├── 19_lt.png ├── 28-bib.png ├── cases.png ├── ge_aes.png ├── ge_all.png ├── hadley.jpg ├── layers.png ├── merge.png ├── shape.png ├── 19_anti.png ├── 19_cross.png ├── 19_full.png ├── 19_inner.png ├── 19_left.png ├── 19_right.png ├── 19_semi.png ├── 28-fig28.png ├── 6-tidy-1.png ├── 28_just-1.png ├── 28_themes.png ├── 5_diagram_1.odg ├── 5_diagram_1.png ├── 5_diagram_2.odg ├── 5_diagram_2.png ├── 5_diagram_3.odg ├── 5_diagram_3.png ├── 5_diagram_4.odg ├── 5_diagram_4.png ├── 6-Projects.png ├── duplicates.png ├── duplicates2.png ├── files_pane.png ├── ge_themes.png ├── script_pane.png ├── transform.png ├── 19_relational.png ├── 22-resampling.png ├── 28_book_cairo.png ├── console_pane.png ├── data-science.png ├── ggplot2_logo.png ├── string_stuck.png ├── 14_venn_diagrams.png ├── 19_many-to-many.png ├── 19_many-to-one.png ├── 19_one-to-many.png ├── 28-chunk-label.png ├── 28-chunk-options.png ├── 28-execute-yaml.png ├── 28-knitr-options.png ├── 6-column-names.png ├── 6-multiple-names.png ├── 6-panes_layout.png ├── environment_pane.png ├── horst-spelling.png ├── quarto-chunk-nav.png ├── quarto-dark-bg.jpeg ├── test_functions.png ├── 17_datetime_codes.png ├── 19_equality_match.png ├── 6-names-and-values.png ├── transform-logical.png ├── 22-data-science-model.png ├── data-science-explore.png ├── seperate_wider_delim.png ├── stringr-autocomplete.png ├── 28-quarto-visual-editor.png ├── seperate_longer_delim1.png ├── seperate_wider_position.png ├── visualization-stat-bar.png ├── 17-lord_howe_stick_insect.jpg ├── data-structures-overview.png ├── seperate_longer_position.png ├── 
15_search_google_sheets_regex.png ├── special_missing_values_doubles.png └── visualization-coordinate-systems.png ├── penguin-plot.png ├── .Rbuildignore ├── _bookdown.yml ├── .gitignore ├── book.bib ├── bookclub-r4ds.Rproj ├── _output.yml ├── test.qmd ├── quarto └── markdown.qmd ├── references.bib ├── style.css ├── DESCRIPTION ├── index.Rmd ├── README.md ├── 24-web_scraping.Rmd ├── 08-workflow_getting_help.Rmd ├── 21-databases.Rmd ├── 99-24-model_building.Rmd ├── 17-dates_and_times.Rmd ├── 02-workflow_basics.Rmd ├── 22-arrow.Rmd ├── 29-quarto_formats.Rmd ├── 26-iteration.Rmd ├── 10-exploratory_data_analysis.Rmd ├── 27-base_r.Rmd ├── 11-communication.Rmd ├── 00-introduction.Rmd ├── 04-workflow_code_style.Rmd ├── 20-spreadsheets.Rmd ├── 16-factors.Rmd ├── 99-23-model_basics.Rmd ├── 23-hierarchical_data.Rmd └── 12-logical_vectors.Rmd /.github/.gitignore: -------------------------------------------------------------------------------- 1 | *.html 2 | -------------------------------------------------------------------------------- /preamble.tex: -------------------------------------------------------------------------------- 1 | \usepackage{booktabs} 2 | -------------------------------------------------------------------------------- /data/.gitignore: -------------------------------------------------------------------------------- 1 | seattle-library-checkouts 2 | seattle-library-checkouts.csv 3 | -------------------------------------------------------------------------------- /images/sql.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/sql.png -------------------------------------------------------------------------------- /data/roster.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/data/roster.xlsx 
-------------------------------------------------------------------------------- /data/students.rds: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/data/students.rds -------------------------------------------------------------------------------- /data/survey.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/data/survey.xlsx -------------------------------------------------------------------------------- /images/19_lt.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/19_lt.png -------------------------------------------------------------------------------- /images/28-bib.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/28-bib.png -------------------------------------------------------------------------------- /images/cases.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/cases.png -------------------------------------------------------------------------------- /images/ge_aes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/ge_aes.png -------------------------------------------------------------------------------- /images/ge_all.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/ge_all.png -------------------------------------------------------------------------------- /images/hadley.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/hadley.jpg -------------------------------------------------------------------------------- /images/layers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/layers.png -------------------------------------------------------------------------------- /images/merge.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/merge.png -------------------------------------------------------------------------------- /images/shape.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/shape.png -------------------------------------------------------------------------------- /penguin-plot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/penguin-plot.png -------------------------------------------------------------------------------- /data/bake-sale.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/data/bake-sale.xlsx -------------------------------------------------------------------------------- /data/penguins.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/data/penguins.xlsx -------------------------------------------------------------------------------- /data/students.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/data/students.xlsx -------------------------------------------------------------------------------- /images/19_anti.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/19_anti.png -------------------------------------------------------------------------------- /images/19_cross.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/19_cross.png -------------------------------------------------------------------------------- /images/19_full.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/19_full.png -------------------------------------------------------------------------------- /images/19_inner.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/19_inner.png -------------------------------------------------------------------------------- /images/19_left.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/19_left.png -------------------------------------------------------------------------------- /images/19_right.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/19_right.png -------------------------------------------------------------------------------- /images/19_semi.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/19_semi.png -------------------------------------------------------------------------------- /images/28-fig28.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/28-fig28.png 
-------------------------------------------------------------------------------- /images/6-tidy-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/6-tidy-1.png -------------------------------------------------------------------------------- /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^renv$ 2 | ^renv\.lock$ 3 | ^\.github$ 4 | ^.*\.Rproj$ 5 | ^\.Rproj\.user$ 6 | -------------------------------------------------------------------------------- /data/students.parquet: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/data/students.parquet -------------------------------------------------------------------------------- /images/28_just-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/28_just-1.png -------------------------------------------------------------------------------- /images/28_themes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/28_themes.png -------------------------------------------------------------------------------- /images/5_diagram_1.odg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/5_diagram_1.odg -------------------------------------------------------------------------------- /images/5_diagram_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/5_diagram_1.png -------------------------------------------------------------------------------- /images/5_diagram_2.odg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/5_diagram_2.odg -------------------------------------------------------------------------------- /images/5_diagram_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/5_diagram_2.png -------------------------------------------------------------------------------- /images/5_diagram_3.odg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/5_diagram_3.odg -------------------------------------------------------------------------------- /images/5_diagram_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/5_diagram_3.png -------------------------------------------------------------------------------- /images/5_diagram_4.odg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/5_diagram_4.odg -------------------------------------------------------------------------------- /images/5_diagram_4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/5_diagram_4.png -------------------------------------------------------------------------------- /images/6-Projects.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/6-Projects.png -------------------------------------------------------------------------------- /images/duplicates.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/duplicates.png -------------------------------------------------------------------------------- /images/duplicates2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/duplicates2.png -------------------------------------------------------------------------------- /images/files_pane.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/files_pane.png -------------------------------------------------------------------------------- /images/ge_themes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/ge_themes.png -------------------------------------------------------------------------------- /images/script_pane.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/script_pane.png -------------------------------------------------------------------------------- /images/transform.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/transform.png -------------------------------------------------------------------------------- /images/19_relational.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/19_relational.png -------------------------------------------------------------------------------- /images/22-resampling.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/22-resampling.png 
-------------------------------------------------------------------------------- /images/28_book_cairo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/28_book_cairo.png -------------------------------------------------------------------------------- /images/console_pane.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/console_pane.png -------------------------------------------------------------------------------- /images/data-science.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/data-science.png -------------------------------------------------------------------------------- /images/ggplot2_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/ggplot2_logo.png -------------------------------------------------------------------------------- /images/string_stuck.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/string_stuck.png -------------------------------------------------------------------------------- /images/14_venn_diagrams.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/14_venn_diagrams.png -------------------------------------------------------------------------------- /images/19_many-to-many.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/19_many-to-many.png -------------------------------------------------------------------------------- 
/images/19_many-to-one.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/19_many-to-one.png -------------------------------------------------------------------------------- /images/19_one-to-many.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/19_one-to-many.png -------------------------------------------------------------------------------- /images/28-chunk-label.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/28-chunk-label.png -------------------------------------------------------------------------------- /images/28-chunk-options.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/28-chunk-options.png -------------------------------------------------------------------------------- /images/28-execute-yaml.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/28-execute-yaml.png -------------------------------------------------------------------------------- /images/28-knitr-options.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/28-knitr-options.png -------------------------------------------------------------------------------- /images/6-column-names.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/6-column-names.png -------------------------------------------------------------------------------- /images/6-multiple-names.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/6-multiple-names.png -------------------------------------------------------------------------------- /images/6-panes_layout.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/6-panes_layout.png -------------------------------------------------------------------------------- /images/environment_pane.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/environment_pane.png -------------------------------------------------------------------------------- /images/horst-spelling.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/horst-spelling.png -------------------------------------------------------------------------------- /images/quarto-chunk-nav.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/quarto-chunk-nav.png -------------------------------------------------------------------------------- /images/quarto-dark-bg.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/quarto-dark-bg.jpeg -------------------------------------------------------------------------------- /images/test_functions.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/test_functions.png -------------------------------------------------------------------------------- /images/17_datetime_codes.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/17_datetime_codes.png -------------------------------------------------------------------------------- /images/19_equality_match.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/19_equality_match.png -------------------------------------------------------------------------------- /images/6-names-and-values.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/6-names-and-values.png -------------------------------------------------------------------------------- /images/transform-logical.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/transform-logical.png -------------------------------------------------------------------------------- /images/22-data-science-model.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/22-data-science-model.png -------------------------------------------------------------------------------- /images/data-science-explore.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/data-science-explore.png -------------------------------------------------------------------------------- /images/seperate_wider_delim.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/seperate_wider_delim.png -------------------------------------------------------------------------------- /images/stringr-autocomplete.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/stringr-autocomplete.png -------------------------------------------------------------------------------- /images/28-quarto-visual-editor.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/28-quarto-visual-editor.png -------------------------------------------------------------------------------- /images/seperate_longer_delim1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/seperate_longer_delim1.png -------------------------------------------------------------------------------- /images/seperate_wider_position.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/seperate_wider_position.png -------------------------------------------------------------------------------- /images/visualization-stat-bar.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/visualization-stat-bar.png -------------------------------------------------------------------------------- /images/17-lord_howe_stick_insect.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/17-lord_howe_stick_insect.jpg -------------------------------------------------------------------------------- /images/data-structures-overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/data-structures-overview.png 
-------------------------------------------------------------------------------- /images/seperate_longer_position.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/seperate_longer_position.png -------------------------------------------------------------------------------- /images/15_search_google_sheets_regex.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/15_search_google_sheets_regex.png -------------------------------------------------------------------------------- /images/special_missing_values_doubles.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/special_missing_values_doubles.png -------------------------------------------------------------------------------- /images/visualization-coordinate-systems.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/r4ds/bookclub-r4ds/HEAD/images/visualization-coordinate-systems.png -------------------------------------------------------------------------------- /data/03-sales.csv: -------------------------------------------------------------------------------- 1 | month,year,brand,item,n 2 | March,2019,1,1234,3 3 | March,2019,1,3627,1 4 | March,2019,1,8820,3 5 | March,2019,2,7253,1 6 | March,2019,2,8766,3 7 | March,2019,2,8288,6 8 | -------------------------------------------------------------------------------- /_bookdown.yml: -------------------------------------------------------------------------------- 1 | book_filename: "bookclub-r4ds" 2 | repo: https://github.com/r4ds/bookclub-r4ds 3 | edit: "https://github.com/r4ds/bookclub-r4ds/edit/main/%s" 4 | output_dir: "_book" 5 | delete_merged_file: true 6 | 
-------------------------------------------------------------------------------- /data/02-sales.csv: -------------------------------------------------------------------------------- 1 | month,year,brand,item,n 2 | February,2019,1,1234,8 3 | February,2019,1,8721,2 4 | February,2019,1,1822,3 5 | February,2019,2,3333,1 6 | February,2019,2,2156,3 7 | February,2019,2,3987,6 8 | -------------------------------------------------------------------------------- /data/01-sales.csv: -------------------------------------------------------------------------------- 1 | month,year,brand,item,n 2 | January,2019,1,1234,3 3 | January,2019,1,8721,9 4 | January,2019,1,1822,2 5 | January,2019,2,3333,1 6 | January,2019,2,2156,9 7 | January,2019,2,3987,6 8 | January,2019,2,3827,6 -------------------------------------------------------------------------------- /.github/workflows/deploy_bookdown.yml: -------------------------------------------------------------------------------- 1 | on: 2 | push: 3 | branches: main 4 | paths-ignore: 5 | - 'README.md' 6 | workflow_dispatch: 7 | 8 | jobs: 9 | bookdown: 10 | uses: r4ds/r4dsactions/.github/workflows/render_pages.yml@main 11 | -------------------------------------------------------------------------------- /.github/workflows/pr_check.yml: -------------------------------------------------------------------------------- 1 | on: 2 | pull_request: 3 | branches: main 4 | paths-ignore: 5 | - 'README.md' 6 | workflow_dispatch: 7 | 8 | jobs: 9 | pr_check: 10 | uses: r4ds/r4dsactions/.github/workflows/render_check.yml@main 11 | -------------------------------------------------------------------------------- /.github/workflows/pr_check_readme.yml: -------------------------------------------------------------------------------- 1 | on: 2 | pull_request: 3 | branches: main 4 | paths: 5 | - 'README.md' 6 | workflow_dispatch: 7 | 8 | jobs: 9 | pr_check: 10 | uses: r4ds/r4dsactions/.github/workflows/render_check_readme.yml@main 11 | 
-------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .Rdata 4 | .Renviron 5 | .Rprofile 6 | .httr-oauth 7 | .DS_Store 8 | _book 9 | _bookdown_files 10 | bookclub-r4ds.Rmd 11 | bookclub-r4ds_files 12 | *.html 13 | libs 14 | renv 15 | bookclub-r4ds.knit.md 16 | -------------------------------------------------------------------------------- /book.bib: -------------------------------------------------------------------------------- 1 | @Book{xie2015, 2 | title = {Dynamic Documents with {R} and knitr}, 3 | author = {Yihui Xie}, 4 | publisher = {Chapman and Hall/CRC}, 5 | address = {Boca Raton, Florida}, 6 | year = {2015}, 7 | edition = {2nd}, 8 | note = {ISBN 978-1498716963}, 9 | url = {http://yihui.org/knitr/}, 10 | } 11 | -------------------------------------------------------------------------------- /data/students-2.csv: -------------------------------------------------------------------------------- 1 | student_id,full_name,favourite_food,meal_plan,age 2 | 1,Sunil Huffmann,Strawberry yoghurt,Lunch only,4 3 | 2,Barclay Lynn,French fries,Lunch only,5 4 | 3,Jayendra Lyne,NA,Breakfast and lunch,7 5 | 4,Leon Rossini,Anchovies,Lunch only,NA 6 | 5,Chidiegwu Dunkel,Pizza,Breakfast and lunch,5 7 | 6,Güvenç Attila,Ice cream,Lunch only,6 8 | -------------------------------------------------------------------------------- /data/students.csv: -------------------------------------------------------------------------------- 1 | Student ID,Full Name,favourite.food,mealPlan,AGE 2 | 1,Sunil Huffmann,Strawberry yoghurt,Lunch only,4 3 | 2,Barclay Lynn,French fries,Lunch only,5 4 | 3,Jayendra Lyne,N/A,Breakfast and lunch,7 5 | 4,Leon Rossini,Anchovies,Lunch only, 6 | 5,Chidiegwu Dunkel,Pizza,Breakfast and lunch,five 7 | 6,Güvenç Attila,Ice cream,Lunch only,6 
-------------------------------------------------------------------------------- /bookclub-r4ds.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | ProjectId: 0e4ac423-e010-4dc1-ba28-4aa8506b6d4c 3 | 4 | RestoreWorkspace: Default 5 | SaveWorkspace: Default 6 | AlwaysSaveHistory: Default 7 | 8 | EnableCodeIndexing: Yes 9 | UseSpacesForTab: Yes 10 | NumSpacesForTab: 2 11 | Encoding: UTF-8 12 | 13 | RnwWeave: Sweave 14 | LaTeX: pdfLaTeX 15 | 16 | BuildType: Website 17 | -------------------------------------------------------------------------------- /_output.yml: -------------------------------------------------------------------------------- 1 | bookdown::gitbook: 2 | css: style.css 3 | split_by: section 4 | config: 5 | toc: 6 | collapse: section 7 | before: | 8 |
  • R for Data Science Book Club
  • 9 | after: | 10 |
  • Published with bookdown
  • 11 | edit: 12 | link: https://github.com/r4ds/bookclub-r4ds/edit/main/%s 13 | text: "Edit" 14 | sharing: 15 | github: yes 16 | facebook: no 17 | twitter: no 18 | -------------------------------------------------------------------------------- /test.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Untitled" 3 | format: html 4 | editor: visual 5 | bibliography: references.bib 6 | --- 7 | 8 | ###### Quarto 9 | 10 | Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see . 11 | 12 | ## Running Code 13 | 14 | When you click the **Render** button a document will be generated that includes both content and the output of embedded code. You can embed code like this: 15 | 16 | ```{r} 17 | 1 + 1 18 | ``` 19 | 20 | You can add options to executable code like this [@abrahms2016] 21 | 22 | ```{r} 23 | #| echo: false 24 | 2 * 2 25 | ``` 26 | 27 | The `echo: false` option disables the printing of code (only output is displayed) . 28 | 29 | ```{r} 30 | 2+2 31 | 32 | i = 1 33 | print (i) 34 | ``` 35 | -------------------------------------------------------------------------------- /quarto/markdown.qmd: -------------------------------------------------------------------------------- 1 | ## Text formatting 2 | 3 | *italic* **bold** ~~strikeout~~ `code` 4 | 5 | superscript^2^ subscript~2~ 6 | 7 | [underline]{.underline} [small caps]{.smallcaps} 8 | 9 | ## Headings 10 | 11 | # 1st Level Header 12 | 13 | ## 2nd Level Header 14 | 15 | ### 3rd Level Header 16 | 17 | ## Lists 18 | 19 | - Bulleted list item 1 20 | 21 | - Item 2 22 | 23 | - Item 2a 24 | 25 | - Item 2b 26 | 27 | 1. Numbered list item 1 28 | 29 | 2. Item 2. 30 | The numbers are incremented automatically in the output. 
31 | 32 | ## Links and images 33 | 34 | 35 | 36 | [linked phrase](http://example.com) 37 | 38 | ![optional caption text](quarto.png){fig-alt="Quarto logo and the word quarto spelled in small case letters"} 39 | 40 | ## Tables 41 | 42 | | First Header | Second Header | 43 | |--------------|---------------| 44 | | Content Cell | Content Cell | 45 | | Content Cell | Content Cell | 46 | -------------------------------------------------------------------------------- /references.bib: -------------------------------------------------------------------------------- 1 | 2 | @article{abrahms2016, 3 | title = {Lessons from integrating behaviour and resource selection: activity-specific responses of African wild dogs to roads}, 4 | author = {Abrahms, B and Jordan, NR and Golabek, KA and McNutt, JW and Wilson, AM and Brashares, JS}, 5 | year = {2016}, 6 | date = {2016}, 7 | journal = {Animal Conservation}, 8 | pages = {247{\textendash}255}, 9 | volume = {19}, 10 | number = {3}, 11 | note = {Publisher: Wiley Online Library} 12 | } 13 | -------------------------------------------------------------------------------- /style.css: -------------------------------------------------------------------------------- 1 | .page-inner { 2 | max-width: 1000px !important; 3 | } 4 | 5 | .book.font-size-0 .book-body .page-inner section { 6 | font-size: 1em !important; 7 | } 8 | .book.font-size-1 .book-body .page-inner section { 9 | font-size: 1.5em !important; 10 | } 11 | .book.font-size-2 .book-body .page-inner section {
12 | font-size: 2em !important; 13 | } 14 | .book.font-size-3 .book-body .page-inner section { 15 | font-size: 2.5em !important; 16 | } 17 | .book.font-size-4 .book-body .page-inner section { 18 | font-size: 3em !important; 19 | } 20 | 21 | /* Styles below here were customized before standardization. 22 | Try to get rid of these! */ 23 | 24 | p.caption { 25 | color: #777; 26 | margin-top: 10px; 27 | } 28 | p code { 29 | white-space: inherit; 30 | } 31 | pre { 32 | word-break: normal; 33 | word-wrap: normal; 34 | } 35 | pre code { 36 | white-space: inherit; 37 | } 38 | 39 | .book .book-body .page-wrapper .page-inner section.normal img.robot { 40 | float: left; 41 | height: 75px; 42 | margin-right: 20px; 43 | } 44 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: bookclub-r4ds 2 | Title: R for Data Science Book Club 3 | Version: 0.0.9.9000 4 | Authors@R: 5 | person("Data Science Learning Community", role = c("aut", "cre", "cph")) 6 | URL: https://r4ds.github.io/bookclub-r4ds, 7 | https://github.com/r4ds/bookclub-r4ds 8 | Depends: 9 | R (>= 3.1.0) 10 | Imports: 11 | arrow, 12 | babynames, 13 | bookdown, 14 | curl, 15 | DBI, 16 | dbplyr, 17 | details, 18 | duckdb, 19 | emo, 20 | gapminder, 21 | ggplot2, 22 | ggthemes, 23 | googlesheets4, 24 | here, 25 | hexbin, 26 | hrbrthemes, 27 | htmlwidgets, 28 | janitor, 29 | jsonlite, 30 | Lahman, 31 | lubridate, 32 | manipulate, 33 | maps, 34 | microbenchmark, 35 | nycflights13, 36 | palmerpenguins, 37 | patchwork, 38 | reactable, 39 | reactablefmtr, 40 | readxl, 41 | repurrrsive, 42 | rvest, 43 | styler, 44 | tidyverse, 45 | tufte, 46 | tvthemes, 47 | viridis, 48 | writexl 49 | Remotes: 50 | hadley/emo 51 | Encoding: UTF-8 52 | -------------------------------------------------------------------------------- /index.Rmd: 
-------------------------------------------------------------------------------- 1 | --- 2 | title: "R for Data Science Book Club" 3 | date: "`r Sys.Date()`" 4 | site: bookdown::bookdown_site 5 | documentclass: book 6 | bibliography: book.bib 7 | biblio-style: apalike 8 | link-citations: yes 9 | github-repo: r4ds/bookclub-r4ds 10 | description: "This is the product of the Data Science Learning Community's Book Club." 11 | --- 12 | 13 | # Welcome {-} 14 | 15 | This is a companion for the book [R for Data Science](https://r4ds.hadley.nz/) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. 16 | This companion is available at [dslc.io/r4ds](https://dslc.io/r4ds). 17 | 18 | This website is being developed by the [Data Science Learning Community](https://dslc.io). Follow along, and [join the community](https://dslc.io/join) to participate. 19 | 20 | This companion follows the [Data Science Learning Community Code of Conduct](https://dslc.io/conduct). 21 | 22 | ## Book club meetings {-} 23 | 24 | - Volunteer leads discussion of a chapter 25 | - **This is the best way to learn the material.** 26 | - Presentations: 27 | - Review of material 28 | - Questions you have 29 | - Maybe live demo 30 | - More info about editing: [this github repo](https://github.com/r4ds/bookclub-r4ds). 31 | - Recorded, available on the [Data Science Learning Community YouTube Channel](https://dslc.io/youtube). 32 | 33 | ## Pace {-} 34 | 35 | - **Goal:** 1 chapter/week 36 | - Ok to split overwhelming chapters 37 | - Ok to combine short chapters 38 | - Meet ***every*** week except holidays, etc 39 | - We'll discuss even if presenter unavailable 40 | 41 | ## Learning objectives {-} 42 | 43 | - Students who study with LOs in mind ***retain more.*** 44 | - **Tips:** 45 | - "After today's session, you will be able to..." 
46 | - *Very* roughly **1 per section.** 47 | - Likely need to be refined 48 | 49 | ## Today's learning objectives {-} 50 | 51 | After today's session, you will be able to... 52 | 53 | - Explain how our weekly meetings work. 54 | - Sign up to lead a discussion. 55 | - Edit notes on GitHub. 56 | - (More LOs coming in Chapter 1) 57 | 58 | ## GitHub {-} 59 | 60 | - Even *tech bros* can figure it out, ***you'll be fine!*** 61 | - See README for setup instructions 62 | - [Cohort 9 Week 1](https://youtu.be/9ar16FGFgT0) included a walk-through 63 | - Ok to edit directly in browser! 64 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DSLC R for Data Science Book Club 2 | 3 | Welcome to the DSLC R for Data Science Book Club! 4 | 5 | We are working together to read [R for Data Science](https://r4ds.hadley.nz/) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. 6 | Join the #book_club-r_for_data_science channel on the [DSLC Slack](https://dslc.io/join) to participate. 7 | As we read, we are producing [notes about the book](https://r4ds.github.io/bookclub-r4ds/). 8 | 9 | ## Meeting Schedule 10 | 11 | If you would like to present, please see the sign-up sheet for your cohort (linked below, and pinned in the [#book_club-r4ds](https://dslcio.slack.com/archives/C012VLJ0KRB) channel on Slack)! 
12 | 13 | - Cohort 1 (started 2020-07-31, ended 2020-10-12): [meeting videos](https://youtube.com/playlist?list=PL3x6DOfs2NGgUOBkwtRJQW0hDWCwdzboM) 14 | - Cohort 2 (started 2020-08-03, ended 2021-03-29): [meeting videos](https://www.youtube.com/playlist?list=PL3x6DOfs2NGglHEO3WBEaxiEZ0_ZiwZJi) 15 | - Cohort 3 (started 2020-12-08, ended 2021-11-09): [meeting videos](https://www.youtube.com/playlist?list=PL3x6DOfs2NGiiKcrDqW4m9qhlpbiQ7HCt) 16 | - Cohort 4 (started 2020-12-16, ended 2021-06-23): [meeting videos](https://www.youtube.com/playlist?list=PL3x6DOfs2NGjtn1_4BSX99R5wrLjK7XvY) 17 | - Cohort 5 (started 2021-07-24, ended 2022-04-23): [meeting videos](https://www.youtube.com/playlist?list=PL3x6DOfs2NGjk1sPsrn2CazGiel0yZrhc) 18 | - Cohort 6 (started 2021-10-15, ended 2022-11-17): [meeting videos](https://www.youtube.com/playlist?list=PL3x6DOfs2NGiYnQdq8mgMBeob3YONUWRM) 19 | - Cohort 7 (started 2022-08-29, ended 2023-07-31): [meeting videos](https://youtube.com/playlist?list=PL3x6DOfs2NGi3qrPu8xxURdUoYAJpko5G) 20 | - Cohort 8 (started 2022-09-24, ended 2023-08-19): [meeting videos](https://www.youtube.com/playlist?list=PL3x6DOfs2NGjeq_14X43I3OHYxuE2mO4I) 21 | - Cohort 9 (started 2023-07-30, ended 2024-04-28): [meeting videos](https://www.youtube.com/playlist?list=PL3x6DOfs2NGjVMs1NtbWu4s_ZgGhGKnrN) 22 | - Cohort 10 (started 2023-10-06, ended 2024-07-19): [meeting videos](https://www.youtube.com/playlist?list=PL3x6DOfs2NGj_fqbuP0xWjm5pD9hz6G5Z) 23 | - Cohort 11 (started 2024-08-22, ended 2025-05-01): [meeting videos](https://www.youtube.com/playlist?list=PL3x6DOfs2NGhcXLwZHIEnDLv2HhmhD4ma) 24 | 25 | 26 | ## How to Present 27 | 28 | This repository is structured as a [{bookdown}](https://CRAN.R-project.org/package=bookdown) site. 29 | To present, follow these instructions: 30 | 31 | Do these steps once: 32 | 33 | 1. 
[Setup Git and GitHub to work with RStudio](https://github.com/r4ds/bookclub-setup) (click through for detailed, step-by-step instructions; I recommend checking this out even if you're pretty sure you're all set). 34 | 2. `usethis::create_from_github("r4ds/bookclub-r4ds")` (cleanly creates your own copy of this repository). 35 | 36 | Do these steps each time you present another chapter: 37 | 38 | 1. Open your project for this book. 39 | 2. `usethis::pr_init("my-chapter")` (creates a branch for your work, to avoid confusion, making sure that you have the latest changes from other contributors; replace `my-chapter` with a descriptive name, ideally). 40 | 3. `devtools::install_dev_deps()` (installs any packages used by the book that you don't already have installed). 41 | 4. Edit the appropriate chapter file, if necessary. Use `##` to indicate new slides (new sections). 42 | 5. If you use any packages that are not already in the `DESCRIPTION`, add them. You can use `usethis::use_package("myCoolPackage")` to add them quickly! 43 | 6. Build the book! ctrl-shift-b (or command-shift-b) will render the full book, or ctrl-shift-k (command-shift-k) to render just your slide. Please do this to make sure it works before you push your changes up to the main repo! 44 | 7. Commit your changes (either through the command line or using Rstudio's Git tab). 45 | 8. `usethis::pr_push()` (pushes the changes up to github, and opens a "pull request" (PR) to let us know your work is ready). 46 | 9. (If we request changes, make them) 47 | 10. When your PR has been accepted ("merged"), `usethis::pr_finish()` to close out your branch and prepare your local repository for future work. 48 | 11. Now that your local copy is up-to-date with the main repo, you need to update your remote fork. Run `gert::git_push("origin")` or click the `Push` button on the `Git` tab of Rstudio. 
49 | 50 | When your PR is checked into the main branch, the bookdown site will rebuild, adding your slides to [this site](https://dslc.io/r4ds). 51 | -------------------------------------------------------------------------------- /24-web_scraping.Rmd: -------------------------------------------------------------------------------- 1 | # Web scraping 2 | 3 | **Learning objectives** 4 | 5 | - Decide whether to scrape data from a web page. 6 | - Recognize enough HTML to find your way around a web page. 7 | - Extract tables from web pages. 8 | - Extract other data from web pages. 9 | 10 | ```{r web_scraping-packages, eval=TRUE, message=FALSE, warning=FALSE} 11 | library(rvest) 12 | library(tidyverse) 13 | ``` 14 | 15 | ## Ethics & Legalities {-} 16 | 17 | > [If the data isn’t public, non-personal, or factual or you’re scraping the data specifically to make money with it, you’ll need to talk to a lawyer.](https://r4ds.hadley.nz/webscraping#scraping-ethics-and-legalities) 18 | 19 | - Be polite (and {[polite](https://dmi3kno.github.io/polite/)}) 20 | - Check Terms of Service 21 | - Beware PII 22 | - Facts usually aren't copyrightable 23 | 24 | ## Typical HTML structure {-} 25 | 26 | HTML = **H**yper**T**ext **M**arkup **L**anguage 27 | 28 | - Hierarchical structure 29 | - Element = `<tag attribute="a" other="b">content</tag>` 30 | - Start tag: `<tag>` 31 | - Attributes: `attribute="a" other="b"` 32 | - Content: `content` 33 | - End tag: `</tag>` 34 | - Elements nest inside elements (as content) 35 | - Nested elements = "children" 36 | 37 | ## Use {rvest} to scrape web pages {-} 38 | 39 | [{rvest}](https://rvest.tidyverse.org/) ("harvest") = tidyverse web-scraping package 40 | 41 | - Load html to scrape: `read_html()` 42 | - Shortcut for tables: `html_table()` 43 | 44 | ## Example: Table {-} 45 | 46 | [Wikipedia List of world expositions](https://en.wikipedia.org/wiki/List_of_world_expositions) 47 | 48 | ```{r web_scraping-tables, eval=TRUE} 49 | url <- "https://en.wikipedia.org/wiki/List_of_world_expositions" 50 |
html <- read_html(url) 51 | html |> 52 | html_table() 53 | ``` 54 | 55 | ## Select a specific element {-} 56 | 57 | `html_element()` returns same # outputs as inputs (1 thing in, 1 thing out) 58 | 59 | - `"thing"` = `<thing>` tag 60 | - `".thing"` = something with attribute `class="thing"` 61 | - `"#thing"` = something with attribute `id="thing"` 62 | 63 | ## Example: One specific table {-} 64 | 65 | ```{r web_scraping-html_element} 66 | html |> 67 | html_element("table.wikitable") |> 68 | html_table() 69 | ``` 70 | 71 | ## Select finer-grained elements {-} 72 | 73 | `html_elements()` finds *all* matches 74 | 75 | 👍 Rule of thumb: 76 | 77 | - `html_elements()` to get observations (rows) 78 | - `html_element()` to get variables for each observation (columns) 79 | 80 | ## Extract data {-} 81 | 82 | - `html_text()` for raw text (you probably don't want this) 83 | - `html_text2()` for clean text 84 | - `html_attr()` for attribute value (eg url `href`) 85 | 86 | ## Example: Star Wars Rows {-} 87 | 88 | [Star Wars films (1-7)](https://rvest.tidyverse.org/articles/starwars.html) 89 | 90 | ```{r web_scraping-star_wars-section} 91 | url <- "https://rvest.tidyverse.org/articles/starwars.html" 92 | html <- read_html(url) 93 | 94 | section <- html |> html_elements("section") 95 | section 96 | ``` 97 | 98 | ## Example: Star Wars Directors {-} 99 | 100 | ```{r web_scraping-star_wars-directors} 101 | section |> html_element(".director") |> html_text2() 102 | ``` 103 | 104 | ## Example: Star Wars Tibble {-} 105 | 106 | ```{r web_scraping-star_wars-tibble} 107 | tibble( 108 | title = section |> 109 | html_element("h2") |> 110 | html_text2(), 111 | released = section |> 112 | html_element("p") |> 113 | html_text2() |> 114 | stringr::str_remove("Released: ") |> 115 | readr::parse_date(), 116 | director = section |> 117 | html_element(".director") |> 118 | html_text2(), 119 | intro = section |> 120 | html_element(".crawl") |> 121 | html_text2() 122 | ) 123 | ``` 124 | 125 | ## Learn more {-}
126 | 127 | - [SelectorGadget](https://rvest.tidyverse.org/articles/selectorgadget.html) 128 | - [CSS Diner](https://flukeout.github.io/) 129 | - [MDN CSS selectors](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors) 130 | - [*Web APIs with R* book club](https://DSLC.io/wapir) 131 | 132 | ## Meeting Videos {-} 133 | 134 | ### Cohort 7 {-} 135 | 136 | `r knitr::include_url("https://www.youtube.com/embed/G5_pr9HxbT4")` 137 | 138 |
    139 | Meeting chat log 140 | ``` 141 | 00:04:47 Oluwafemi Oyedele: Hi Tim!!! 142 | 00:05:27 Tim Newby: Hi Oluwafemi - can you hear me? 143 | 00:05:37 Oluwafemi Oyedele: Yes 144 | 00:11:53 Oluwafemi Oyedele: start 145 | 00:33:49 Oluwafemi Oyedele: https://rvest.tidyverse.org/articles/selectorgadget.html 146 | 00:40:24 Oluwafemi Oyedele: stop 147 | ``` 148 |
    149 | 150 | `r knitr::include_url("https://www.youtube.com/embed/HnJ3ZY1seY4")` 151 | 152 | ### Cohort 8 {-} 153 | 154 | `r knitr::include_url("https://www.youtube.com/embed/dOVWSSqUvt0")` 155 | 156 | ### Cohort 9 {-} 157 | 158 | `r knitr::include_url("https://www.youtube.com/embed/Hs928CH-_E4")` 159 | -------------------------------------------------------------------------------- /08-workflow_getting_help.Rmd: -------------------------------------------------------------------------------- 1 | # Workflow: getting help 2 | 3 | **Learning objectives:** 4 | 5 | - Describe a few tips beyond the book on how to get help and to help you keep learning. 6 | 7 | ## Google 8 | 9 | - If you get an R error message and you have no idea what it means, chances are that someone else has been confused by it in the past, and there will be help somewhere on the web. 10 | 11 | - Typically adding "R" to a Google query is enough to restrict it to relevant results, but if the search isn't useful, try adding package names like "tidyverse" or "ggplot2" to narrow down the results 12 | 13 | - e.g., "how to make a boxplot in R" vs. "how to make a boxplot in R with ggplot2". 14 | 15 | - If the error message isn't in English, run `Sys.setenv(LANGUAGE = "en")` and re-run the code as you're more likely to find help for English error messages. 16 | 17 | - If Google doesn't help, try spending a little time searching [Stack Overflow](https://stackoverflow.com/) for an existing answer, including [R], to restrict your search to questions and answers that use R. 18 | 19 | ## Reprex 20 | 21 | - If your googling doesn't find anything useful, it's a really good idea to prepare a **reprex**, short for minimal **repr**oducible **ex**ample. 22 | 23 | - A good reprex makes it easier for other people to help you, and often you'll figure out the problem yourself in the course of making it. 
24 | 25 | - There are two parts to creating a reprex: 26 | 27 | - *Make your code reproducible*: Capture everything, i.e. include any `library()` calls and create all necessary objects. 28 | - *Make your code minimal*: Strip away everything that is not directly related to your problem by creating a much smaller and simpler R object than the one you're facing in real life, or even by using built-in data. 29 | 30 | - Creating a reprex may sound like a lot of work, but it has a great payoff: 31 | 32 | - Creating an excellent reprex often reveals the source of your problem and may allow you to answer your own question. 33 | 34 | - You'll capture the essence of your problem in a way that is easy for others to play with, which improves your chances of getting help. 35 | 36 | - The easiest way to avoid accidentally missing something when creating a reprex by hand is to use the [`reprex`](https://reprex.tidyverse.org/) package. 37 | 38 | ## Making reprexes reproducible 39 | 40 | - There are three things you need to include to make your example reproducible: required packages, data, and code. 41 | 42 | - Packages should be loaded at the top of the script so it's easy to see which ones the example needs. Also check that you're using the latest version of each package. 43 | 44 | - You may have discovered a bug that's been fixed since you installed or last updated the package. For packages in the tidyverse, the easiest way to check is to run `tidyverse_update()`. 45 | 46 | - The easiest way to include data is to use `dput()` to generate the R code needed to recreate it. For example, to recreate the `mtcars` dataset in R, perform the following steps: 47 | 48 | - Run `dput(mtcars)` in R. 49 | - Copy the output. 50 | - In reprex, type `mtcars <-`, then paste. 51 | - Alternatively, click Addins, then Render reprex.
52 | 53 | - Spend a little bit of time ensuring that your code is easy for others to read: 54 | 55 | - Make sure you've used spaces and your variable names are concise yet informative. 56 | 57 | - Use comments to indicate where your problem lies. 58 | 59 | - Do your best to remove everything that is not related to the problem, because the shorter your code is, the easier it is to understand and the easier it is to fix. 60 | 61 | - Try to use the smallest subset of your data that still reveals the problem, and finish by checking that you have actually made a reproducible example by starting a fresh R session and copying and pasting your script. 62 | 63 | ## Investing in yourself 64 | 65 | - It will take some practice to learn to create good, truly minimal reprexes. However, learning to ask questions that include the code, and investing the time to make it reproducible, will continue to pay off as you learn and master R. 66 | 67 | - Also, spend time preparing yourself to solve problems before they occur. Investing a little time in learning R each day will pay off handsomely in the long run. 68 | 69 | - One way is to follow what the tidyverse team is doing on the [tidyverse blog](https://www.tidyverse.org/blog/). 70 | - To keep up with the R community more broadly, we recommend reading [R Weekly](https://rweekly.org/), a community effort to aggregate the most interesting news in the R community each week. 71 | 72 | ## Meeting Videos 73 | 74 | ### Cohort 7 75 | 76 | `r knitr::include_url("https://www.youtube.com/embed/kmc54BI9GTg")` 77 | 78 |
    79 | 80 | Meeting chat log 81 | 82 | ``` 83 | 00:27:53 Oluwafemi Oyedele: https://www.youtube.com/watch?v=5gqksthQ0cM 84 | 00:43:48 Oluwafemi Oyedele: #TidyTuesday 85 | ``` 86 | 87 |
    88 | 89 | ### Cohort 8 90 | 91 | `r knitr::include_url("https://www.youtube.com/embed/rbYO0oVkJC4")` 92 | -------------------------------------------------------------------------------- /21-databases.Rmd: -------------------------------------------------------------------------------- 1 | # Databases 2 | 3 | **Learning objectives:** 4 | 5 | - Use {DBI} to connect to a database and retrieve data. 6 | - Use {dbplyr} to translate dplyr code to SQL. 7 | 8 | ```{r 21-packages-used, message=FALSE, warning=FALSE} 9 | library(DBI) 10 | library(dbplyr) 11 | library(tidyverse) 12 | ``` 13 | 14 | ## Database basics {-} 15 | 16 | ![](https://media.tenor.com/OUpv1OW0bJMAAAAM/database-db.gif) 17 | 18 | - database (db) = collection of data frames (dfs) 19 | - each df = "table" 20 | - named columns where every value is the same type 21 | - db tables vs dfs: 22 | - db tables on disk (can be huge), dfs in memory (limited) 23 | - db tables have indexes, dfs don't 24 | - dbs row-oriented for fast data collection, dfs column-oriented for fast analysis 25 | 26 | ## Connecting to a database {-} 27 | 28 | - {DBI} = generic SQL interface 29 | - Specific package for your DBMS ({RPostgres}, {RMariaDB}, {duckdb}, etc) 30 | - {odbc} if no specific package available 31 | 32 | ```{r} 33 | con <- DBI::dbConnect(duckdb::duckdb()) 34 | ``` 35 | 36 | - When using duckdb in a project 37 | ```{r,eval=FALSE,warning=FALSE,message=FALSE} 38 | con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "duckdb") 39 | ``` 40 | 41 | ## Load some data {-} 42 | 43 | ```{r} 44 | dbWriteTable(con, "mpg", ggplot2::mpg) 45 | dbWriteTable(con, "diamonds", ggplot2::diamonds) 46 | ``` 47 | 48 | ## DBI basics {-} 49 | ```{r} 50 | dbListTables(con) 51 | 52 | 53 | con |> 54 | dbReadTable("diamonds") |> 55 | as_tibble() 56 | ``` 57 | 58 | - SQL Syntax 59 | 60 | ```{r} 61 | sql <- " 62 | SELECT carat, cut, clarity, color, price 63 | FROM diamonds 64 | WHERE price > 15000 65 | " 66 | ``` 67 | 68 | ```{r} 69 | 
as_tibble(dbGetQuery(con, sql)) 70 | ``` 71 | 72 | ## dbplyr basics {-} 73 | 74 | ```{r} 75 | diamonds_db <- tbl(con, "diamonds") 76 | 77 | diamonds_db 78 | ``` 79 | 80 | ```{r} 81 | big_diamonds_db <- diamonds_db |> 82 | filter(price > 15000) |> 83 | select(carat:clarity, price) 84 | 85 | big_diamonds_db 86 | ``` 87 | 88 | ```{r} 89 | big_diamonds_db |> 90 | show_query() 91 | ``` 92 | 93 | - `collect()` moves data into R 94 | 95 | ```{r} 96 | big_diamonds <- big_diamonds_db |> 97 | collect() 98 | big_diamonds 99 | ``` 100 | 101 | ## SQL {-} 102 | 103 | ```{r} 104 | dbplyr::copy_nycflights13(con) 105 | 106 | 107 | flights <- tbl(con, "flights") 108 | planes <- tbl(con, "planes") 109 | ``` 110 | 111 | ## SQL basics {-} 112 | 113 | - *statements* = top level 114 | - `CREATE` = new tables 115 | - `INSERT` = add data 116 | - `SELECT` = retrieve data 117 | - aka "queries" 118 | 119 | ```{r} 120 | flights |> show_query() 121 | 122 | planes |> show_query() 123 | 124 | ``` 125 | 126 | - `WHERE` = `filter()` 127 | - `ORDER BY` = `arrange()` 128 | 129 | ```{r} 130 | flights |> 131 | filter(dest == "IAH") |> 132 | arrange(dep_delay) |> 133 | show_query() 134 | ``` 135 | 136 | ## SELECT {-} 137 | 138 | `SELECT` = tons of things! 
139 | 140 | - `select()`, `rename()`, and `relocate()` 141 | 142 | ```{r} 143 | planes |> 144 | select(tailnum, type, manufacturer, model, year) |> 145 | show_query() 146 | 147 | 148 | planes |> 149 | select(tailnum, type, manufacturer, model, year) |> rename(year_built = year) |> 150 | show_query() 151 | 152 | 153 | planes |> 154 | select(tailnum, type, manufacturer, model, year) |> 155 | relocate(manufacturer, model, .before = type) |> 156 | show_query() 157 | ``` 158 | 159 | Not shown: `mutate()`, `summarize()` are also `SELECT` 160 | 161 | ## Subqueries {-} 162 | 163 | Sometimes {dbplyr} uses subqueries to translate {dplyr} code 164 | 165 | - **subquery** = query used in `FROM` in place of a table 166 | 167 | ```{r} 168 | flights |> 169 | mutate( 170 | year1 = year + 1, 171 | year2 = year1 + 1 172 | ) |> 173 | show_query() 174 | ``` 175 | 176 | ## Joins {-} 177 | 178 | SQL joins similar to {dplyr} joins 179 | 180 | ```{r} 181 | flights |> 182 | left_join(planes |> rename(year_built = year), by = "tailnum") |> 183 | show_query() 184 | ``` 185 | 186 | ## Other verbs {-} 187 | 188 | - `distinct()` 189 | - `slice_*()` 190 | - `intersect()` 191 | - `tidyr::pivot_longer()` 192 | - `tidyr::pivot_wider()` 193 | - Full list on [dbplyr website](https://dbplyr.tidyverse.org/reference/) 194 | 195 | ## Function translations {-} 196 | 197 | How does {dbplyr} deal with `mean()` vs `median()`? 198 | 199 | ```{r} 200 | summarize_query <- function(df, ...) { 201 | df |> 202 | summarize(...) |> 203 | show_query() 204 | } 205 | mutate_query <- function(df, ...) 
{ 206 | df |> 207 | mutate(..., .keep = "none") |> 208 | show_query() 209 | } 210 | ``` 211 | 212 | ```{r} 213 | flights |> 214 | group_by(year, month, day) |> 215 | summarize_query( 216 | mean = mean(arr_delay, na.rm = TRUE), 217 | median = median(arr_delay, na.rm = TRUE) 218 | ) 219 | ``` 220 | 221 | ```{r} 222 | flights |> 223 | group_by(year, month, day) |> 224 | mutate_query( 225 | mean = mean(arr_delay, na.rm = TRUE), 226 | ) 227 | ``` 228 | 229 | ```{r} 230 | flights |> 231 | group_by(dest) |> 232 | arrange(time_hour) |> 233 | mutate_query( 234 | lead = lead(arr_delay), 235 | lag = lag(arr_delay) 236 | ) 237 | ``` 238 | 239 | 240 | ## Clean up {-} 241 | 242 | ```{r clean-up} 243 | dbDisconnect(con, shutdown = TRUE) 244 | ``` 245 | 246 | ## Meeting Videos {-} 247 | 248 | ### Cohort 7 {-} 249 | 250 | `r knitr::include_url("https://www.youtube.com/embed/0AWywckm3W4")` 251 | 252 |
    253 | Meeting chat log 254 | ``` 255 | 00:09:36 Oluwafemi Oyedele: Hi Tim, Good Evening!!! 256 | 00:10:59 Tim Newby: Hi Oluwafemi :-) 257 | 00:14:10 Oluwafemi Oyedele: start 258 | 00:48:43 Oluwafemi Oyedele: https://dbplyr.tidyverse.org/reference/ 259 | 00:48:58 Oluwafemi Oyedele: https://dbplyr.tidyverse.org/articles/dbplyr.html 260 | 00:56:01 Oluwafemi Oyedele: https://sqlfordatascientists.com/ 261 | 00:56:09 Oluwafemi Oyedele: https://www.practicalsql.com/ 262 | 00:57:28 Oluwafemi Oyedele: stop 263 | ``` 264 |
    265 | 266 | 267 | ### Cohort 8 {-} 268 | 269 | `r knitr::include_url("https://www.youtube.com/embed/ylTfwbQq1v0")` 270 | 271 | `r knitr::include_url("https://www.youtube.com/embed/HnJ3ZY1seY4")` 272 | -------------------------------------------------------------------------------- /99-24-model_building.Rmd: -------------------------------------------------------------------------------- 1 | # Model building {-} 2 | 3 | **Learning objectives:** 4 | 5 | - Build a **linear model** to explain trends in data. 6 | - Examine the **residuals** of a model to identify remaining trends in data. 7 | - Perform **feature engineering** to explain trends in data. 8 | - Recognize some resources to **learn more about modeling.** 9 | 10 | ## EDA vs Prediction 11 | 12 | **Reminder:** This book focuses on exploratory data analysis, not prediction. 13 | 14 | ![](images/data-science-explore.png) 15 | 16 | ## Build a Linear Model 17 | 18 | ```{r 99-24-setup, include = FALSE} 19 | # By this point these are probably already libraried, but I want to be sure. 
20 | library(tidyverse) 21 | library(modelr) 22 | library(nycflights13) 23 | library(lubridate) 24 | ``` 25 | 26 | ```{r 99-24-lm} 27 | diamonds2 <- diamonds %>% 28 | filter(carat <= 2.5) %>% 29 | mutate(log_price = log2(price), log_carat = log2(carat)) 30 | 31 | mod_diamond <- lm(log_price ~ log_carat, data = diamonds2) 32 | 33 | grid <- diamonds2 %>% 34 | data_grid(carat = seq_range(carat, 20)) %>% 35 | mutate(log_carat = log2(carat)) %>% 36 | add_predictions(mod_diamond, "log_price") %>% 37 | mutate(price = 2 ^ log_price) 38 | 39 | ggplot(diamonds2) + 40 | aes(carat, price) + 41 | geom_hex(bins = 50) + 42 | geom_line(data = grid, color = "red", linewidth = 1) 43 | ``` 44 | 45 | ## Examine Residuals 46 | 47 | ```{r 99-24-residuals} 48 | diamonds2 <- diamonds2 %>% 49 | add_residuals(mod_diamond, "log_resid") 50 | 51 | ggplot(diamonds2) + 52 | aes(log_carat, log_resid) + 53 | geom_hex(bins = 50) 54 | ``` 55 | 56 | ```{r 99-24-residuals-plots} 57 | base_plot <- ggplot(diamonds2) + 58 | aes(y = log_resid) + 59 | geom_boxplot() 60 | 61 | base_plot + 62 | aes(cut) 63 | 64 | base_plot + 65 | aes(color) 66 | 67 | base_plot + 68 | aes(clarity) 69 | ``` 70 | 71 | ## Another Diamonds Model 72 | 73 | ```{r 99-24-lm2} 74 | mod_diamond2 <- lm( 75 | log_price ~ log_carat + color + cut + clarity, 76 | data = diamonds2 77 | ) 78 | 79 | plot_mod2 <- function(parameter) { 80 | grid <- diamonds2 %>% 81 | data_grid({{parameter}}, .model = mod_diamond2) %>% 82 | add_predictions(mod_diamond2) 83 | 84 | ggplot(grid) + 85 | aes(x = {{parameter}}, y = pred) + 86 | geom_point() 87 | } 88 | 89 | plot_mod2(cut) 90 | plot_mod2(color) 91 | plot_mod2(clarity) 92 | ``` 93 | 94 | ```{r 99-24-diamond-leftovers} 95 | diamonds2 <- diamonds2 %>% 96 | add_residuals(mod_diamond2, "log_resid2") 97 | 98 | ggplot(diamonds2) + 99 | aes(log_carat, log_resid2) + 100 | geom_hex(bins = 50) 101 | ``` 102 | 103 | ## Feature Engineering 104 | 105 | ```{r 99-24-flights} 106 | daily <- flights %>% 107 | mutate(date 
= make_date(year, month, day)) %>% 108 | group_by(date) %>% 109 | summarise(n = n()) 110 | 111 | ggplot(daily) + 112 | aes(date, n) + 113 | geom_line() 114 | ``` 115 | 116 | Feature engineering = using data to create new features to use in models 117 | 118 | ```{r 99-24-wday} 119 | daily <- daily %>% 120 | mutate(wday = wday(date, label = TRUE, week_start = 1)) 121 | ggplot(daily) + 122 | aes(wday, n) + 123 | geom_boxplot() 124 | ``` 125 | 126 | ```{r 99-24-wday-mod} 127 | mod <- lm(n ~ wday, data = daily) 128 | 129 | grid <- daily %>% 130 | data_grid(wday) %>% 131 | add_predictions(mod, "n") 132 | 133 | ggplot(daily) + 134 | aes(wday, n) + 135 | geom_boxplot() + 136 | geom_point(data = grid, colour = "red", size = 4) 137 | ``` 138 | 139 | ```{r 99-24-wday-residuals} 140 | daily <- daily %>% 141 | add_residuals(mod) 142 | 143 | base_plot <- ggplot(daily) + 144 | aes(date, resid) + 145 | geom_ref_line(h = 0) + 146 | geom_line() 147 | 148 | base_plot 149 | 150 | base_plot + 151 | aes(color = wday) 152 | 153 | base_plot + 154 | geom_smooth(se = FALSE, span = 0.20) 155 | ``` 156 | 157 | ```{r 99-24-wday-low} 158 | daily %>% 159 | filter(resid < -100) %>% 160 | pull(date, wday) 161 | ``` 162 | 163 | ```{r 99-24-seasonal} 164 | term <- function(date) { 165 | cut(date, 166 | breaks = ymd(20130101, 20130605, 20130825, 20140101), 167 | labels = c("spring", "summer", "fall") 168 | ) 169 | } 170 | 171 | daily <- daily %>% 172 | mutate(term = term(date)) 173 | 174 | mod2 <- MASS::rlm(n ~ wday * term, data = daily) 175 | 176 | daily %>% 177 | add_residuals(mod2, "resid") %>% 178 | ggplot() + 179 | aes(date, resid) + 180 | geom_hline(yintercept = 0, linewidth = 2, colour = "white") + 181 | geom_line() 182 | ``` 183 | 184 | ## Learning More 185 | 186 | - An Introduction to Statistical Learning (with Applications in R) ([statlearning.com](https://www.statlearning.com/) / #book_club-islr): Statistical explanations of various machine learning methods, with explanations of how to 
apply them in R. A good introduction to all of the types of models and why they work (or don't work) the way they do. 187 | - Tidy Modeling with R ([tmwr.org](https://www.tmwr.org/) / #book_club-tmwr): An opinionated introduction to using the tidymodels family of packages to build predictive models. Very hands-on and useful, but I think I might want to read it again after ISLR. 188 | - Feature Engineering and Selection: A Practical Approach for Predictive Models ([feat.engineering](http://www.feat.engineering/) / #book_club-feat_eng): Techniques for manipulating data to get better results out of models. 189 | - Applied Predictive Modeling ([github.com/topepo/tidy-apm](https://github.com/topepo/tidy-apm) / #project-tidy_apm): There isn't a free online version of this book yet, but it's at least theoretically in the works. This was published about 10 years ago by the leader of the tidymodels team, and he has started to update it to tidymodels code. I'd recommend *not* reading this one until/unless he takes that project back up (very possibly with the help of the DSLC community). 190 | 191 | ## Meeting Videos 192 | 193 | ### Cohort 5 194 | 195 | `r knitr::include_url("https://www.youtube.com/embed/jZmSbkkJIzQ")` 196 | 197 |
    198 | Meeting chat log 199 | ``` 200 | 00:18:47 Njoki Njuki Lucy: yes 201 | 00:56:00 Ryan Metcalf: @Sandra, here is a LARGE section to answer your question. I’m banking that Federica will provide a more specific code snippet… https://ggplot2-book.org/scales-guides.html#scales-guides 202 | 00:56:09 Federica Gazzelloni: https://ggplot2.tidyverse.org/reference/guide_colourbar.html 203 | 00:57:03 Federica Gazzelloni: ggplot()+geom_…()+guides() 204 | 00:58:35 Federica Gazzelloni: guides(color=guide_colourbar()) 205 | ``` 206 |
    207 | 208 | ### Cohort 6 209 | 210 | `r knitr::include_url("https://www.youtube.com/embed/FXR0WWyqDf8")` 211 | 212 | `r knitr::include_url("https://www.youtube.com/embed/jMXyhgS4AVg")` 213 | -------------------------------------------------------------------------------- /17-dates_and_times.Rmd: -------------------------------------------------------------------------------- 1 | # Dates and times 2 | 3 | **Learning objectives:** 4 | 5 | - **Create date** and **datetime** objects. 6 | - Work with **datetime components.** 7 | - Perform **arithmetic** on **timespans.** 8 | - Recognize ways to deal with **timezones** in R. 9 | 10 | ```{r dates-and-times-libraries, warning=FALSE, message=FALSE} 11 | library(tidyverse) 12 | library(nycflights13) 13 | ``` 14 | 15 | ## Date/time objects {-} 16 | 17 | 3 types of date/time objects: 18 | 19 | - **date** = `` 20 | - **time** = `