├── .gitignore ├── README.md ├── abstract_Rpackaging.Rmd ├── caa2021_Rpackage_workshop.Rproj ├── exercises ├── exercise_build_package.Rmd ├── exercise_functions.R └── solution_exercise_functions.R ├── figures ├── Screenshot_documentation.png ├── Screenshot_pckgdevcheatsheet.png ├── Screenshot_readxl_github.png ├── Screenshot_rextensions.png └── Screenshot_rpackages.png ├── handout ├── caa2021_handout.pdf └── caa2021_handout.svg ├── references.bib ├── render_slides.R ├── rendered_slides ├── 01_slides_intro.pdf ├── 02_slides_functions.pdf ├── 03_slides_Rpackage_structure.pdf ├── 04_slides_documentation.pdf ├── 05_slides_fluffy_context.pdf ├── 06_slides_data.pdf ├── 07_slides_advanced_topics.pdf └── 08_slides_wrap_up.pdf └── slides ├── 01_slides_intro.Rmd ├── 02_slides_functions.Rmd ├── 03_slides_Rpackage_structure.Rmd ├── 04_slides_documentation.Rmd ├── 05_slides_fluffy_context.Rmd ├── 06_slides_data.Rmd ├── 07_slides_advanced_topics.Rmd ├── 08_slides_wrap_up.Rmd └── preamble.tex /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | 6 | # rendered output 7 | *.pdf 8 | !rendered_slides/*.pdf 9 | *.html 10 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Developing R Packages 2 | 3 | This workshop by the [CAA–SIG Scientific Scripting Languages in Archaeology](https://sslarch.github.io) given at [CAA2021, Cyprus](https://2021.caaconference.org) offers a low-level introduction to R package development. The workshop was complementary to the session 4 | [Tools for the Revolution: developing packages for scientific programming in archaeology](https://github.com/sslarch/caa2021_packages) at the conference. 5 | 6 | The slides are available as .pdf in [`rendered_slides`](rendered_slides). 
7 | 8 | ## What this workshop covers 9 | 10 | In this workshop we focused on the main points in Hadley Wickham's book on package development ([Wickham 2020](https://r-pkgs.org)) and created an example application together. Workshop attendees will get to know a structured workflow, which will aid them in organizing their personal scripts afterwards. Main topics include function definition in R, the R package structure and the typical package development cycle. If time allows, we will also introduce topics like vignettes, unit tests or shipping data with packages. 11 | 12 | ## What you need to code along 13 | 14 | Basic R knowledge is strongly recommended. Beyond that you need the following software on the computer you use to participate in the workshop: 15 | 16 | - [R](https://cran.rstudio.com/) 17 | - [RStudio Desktop](https://rstudio.com/products/rstudio/download/#download) 18 | - For Windows users: [Rtools](https://cran.r-project.org/bin/windows/Rtools) 19 | 20 | To test if your R setup is ready for package development, you can try to install a package from Github. 21 | 22 | ``` 23 | if(!require('remotes')) install.packages('remotes') 24 | remotes::install_github("r-lib/devtools") 25 | ``` 26 | 27 | Devtools is also the main package we will need for the workshop. 
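Once the installation has finished, a quick check along the following lines can confirm that devtools is actually available. This is only a minimal sketch using base R; the variable name `devtools_ready` is an illustrative assumption and not part of the workshop material.

```r
# Check whether devtools can be loaded, without attaching it to the session.
# `devtools_ready` is a hypothetical name chosen for this sketch.
devtools_ready <- requireNamespace("devtools", quietly = TRUE)

if (devtools_ready) {
  # Report the installed version
  message("devtools ", as.character(utils::packageVersion("devtools")), " is installed.")
} else {
  message("devtools is not installed yet; rerun the installation commands above.")
}
```

If the second message appears, the most likely causes are a missing internet connection or, on Windows, a missing Rtools installation.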
28 | -------------------------------------------------------------------------------- /abstract_Rpackaging.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Developing R packages" 3 | author: 4 | - Sophie Schmidt <>, DAI 5 | - Petr Pajdla <>, UAM MUNI 6 | - Clemens Schmid <>, MPI-SHH 7 | output: pdf_document 8 | bibliography: references.bib 9 | geometry: "left=2cm,right=2cm,top=0cm,bottom=1cm" 10 | --- 11 | 12 | "Other" session proposal for CAA2021, Cyprus 13 | Organised by the CAA *Scientific Scripting Languages in Archaeology* special interest group ([SIG SSLA](https://sslarch.github.io/)) 14 | 15 | ## Abstract 16 | 17 | A growing number of researchers use the scripting language R [@R] for scientific data analysis. Many organise their code in scripts and functions to perform sequences of data manipulation, statistics and visualisation. Sometimes these workflows gain in complexity and it becomes feasible to outsource core components into a dedicated R package. Packages are one of the best ways to make R code reproducible as they provide a well established structure to share functions, data and their documentation with other R-users. The vast numbers of packages by diverse developers on the Comprehensive R Archive Network ([CRAN](https://cran.r-project.org/)) indicate their popularity in the scientific community and they could very well become a pillar of scientific progress in archaeology [@Schmidt2020]. Indeed more and more packages are also being developed by and for archaeologists (e.g. ). 18 | 19 | For CAA2021 we would like to offer a workshop to teach R-users how to develop R packages from their scripts. We believe that many archaeological R-users do not engage in package development as they lack training and the learning curve *seems* steep. We will try to fill this gap and offer a low-level introduction to R package development for users with basic R-skills. 
20 | 21 | This workshop is designed in tandem with the session *"Tools for the Revolution: developing packages for scientific programming in archaeology"* by the SIG SSLA. 22 | 23 | ### Therefore: 24 | 25 | - Do you use the scientific scripting language R for your analyses? 26 | - Do you, too, now have a number of script files flying about and don't know how to organise them? 27 | 28 | Join us and learn how to create an R-package! 29 | 30 | In this workshop we will focus on the main points in Hadley Wickham's book on package development [@wickham_2020, ] and create an example application together. Workshop attendees will get to know a structured workflow, which will aid them in organizing their personal scripts afterwards. 31 | 32 | Basic topics will include: Package setup, function documentation and development cycle. As every package should come with example data, we will show how to implement these into a package, as well as more detailed function explanations within a vignette. Testing routines and licensing for publication, e.g. using git ([Github](https://github.com/), [Gitlab](https://gitlab.com/) or similar) will enable attendees to share their work safely. 33 | 34 | Basic R knowledge is strongly recommended for the workshop. Software requirements will be announced to registered attendees later. 
35 | 36 | ## References 37 | -------------------------------------------------------------------------------- /caa2021_Rpackage_workshop.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | -------------------------------------------------------------------------------- /exercises/exercise_build_package.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Exercise package building" 3 | author: "Sophie C. Schmidt" 4 | date: "7 6 2021" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(eval = FALSE, 10 | echo = TRUE) 11 | ``` 12 | 13 | # Create your package! 14 | 15 | ```{r} 16 | usethis::create_package("/path/to/myAMAZINGpackage") 17 | ``` 18 | 19 | R will tell you, what it does, it should look something like this: 20 | 21 | ```r 22 | ✓ Creating '/path/to/myAMAZINGpackage/' 23 | ✓ Setting active project to '/path/to/myAMAZINGpackage' 24 | ✓ Creating 'R/' 25 | ✓ Writing 'DESCRIPTION' 26 | ... tells you what it writes in there 27 | ✓ Writing 'NAMESPACE' 28 | ✓ Writing 'myAMAZINGpackage.Rproj' 29 | ✓ Adding '^myAMAZINGpackage\\.Rproj$' to '.Rbuildignore' 30 | ✓ Adding '.Rproj.user' to '.gitignore' 31 | ✓ Adding '^\\.Rproj\\.user$' to '.Rbuildignore' 32 | ✓ Opening '/path/to/myAMAZINGpackage' in new RStudio session 33 | ✓ Setting active project to '' 34 | ``` 35 | 36 | So you now have a new Rstudio project and the bare bones of a package. 
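To double-check that the skeleton really exists, one could test for the three pieces that `create_package()` reports writing: `DESCRIPTION`, `NAMESPACE` and `R/`. The helper `has_package_skeleton()` below is a hypothetical function written for this sketch; it is not part of usethis or devtools.

```r
# Hypothetical helper (not from usethis/devtools): returns TRUE if the
# directory `path` contains a DESCRIPTION file, a NAMESPACE file and an R/ folder.
has_package_skeleton <- function(path) {
  file.exists(file.path(path, "DESCRIPTION")) &&
    file.exists(file.path(path, "NAMESPACE")) &&
    dir.exists(file.path(path, "R"))
}

# In the newly opened project the working directory is the package root,
# so "." refers to the package itself.
has_package_skeleton(".")
```

If this returns `FALSE`, you are probably still in the old session or project rather than in the new package.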
37 | 38 | In RStudio Cloud, now make sure your build tools are set up for package building: 39 | 40 | - upper horizontal menu -> Build -> Configure Build Tools -> in the drop-down menu "project build tools" choose *package* 41 | - a new tab in the upper right-hand corner (next to Environment) should appear 42 | 43 | ## edit the DESCRIPTION file 44 | In your new package, open the DESCRIPTION file and change the title, description and authors - with dummy data, if you want. For the Title: use title case and no more than 65 characters. For the Description: keep each line to no more than 80 characters and indent the 2nd, 3rd and 4th lines with four spaces. 45 | 46 | ## add a function to the R/ folder 47 | 48 | Now we will add a small function to the R-folder. 49 | 50 | Create a script, call it "doublemean.R". In it write: 51 | 52 | ```{r} 53 | doublemean <- function(x) { 54 | y <- mean(x)*2 55 | return(y) 56 | } 57 | ``` 58 | 59 | and save it. 60 | 61 | ## try it! 62 | 63 | Let's try the function we just defined! 64 | 65 | ```{r} 66 | doublemean(c(2,3,4)) 67 | ``` 68 | 69 | Uh. Couldn't find function. 70 | 71 | Yes, because we didn't load our new package yet. We didn't actually run the code. 72 | 73 | ## load your function! 74 | 75 | Now, make sure your working directory is set to your package, load `devtools` and try `load_all()`. There should be a small message saying "Loading yourpackagename". 76 | 77 | Let's now try the doublemean function! 78 | 79 | ```{r} 80 | doublemean(c(2,3,4)) 81 | ``` 82 | 83 | Did it work? Yay! 84 | 85 | 86 | ## adding dependencies 87 | 88 | Now, let's go back to our doublemean function and add a twist. For absolutely logical reasons we want to know the doublemean only if our vector x could come from the same distribution as c(1,2,3), that is, if a Kolmogorov-Smirnov test gives p > 0.05... otherwise, we just want to be told that the difference is significant. 89 | 90 | ```{r} 91 | doublemean <- function(x) { 92 | #Does x come from the same distribution as 1,2,3? 
93 | n <- c(1,2,3) 94 | z <- ks.test(x, n) 95 | if (z$p.value > 0.05) { 96 | y <- mean(x)*2 97 | return(y) 98 | } 99 | else 100 | {print("It's significant")} 101 | 102 | } 103 | 104 | ``` 105 | 106 | Copy the code, save the doublemean.R - file, do a `load_all()` and try the function. 107 | 108 | Why doesn't it work? 109 | 110 | Because we forgot to add the dependency. 111 | 112 | ks.test comes from the package stats. 113 | 114 | We therefore need to do two changes: 115 | 116 | a) Add `stats::` in front of the function call inside our function code and 117 | b) Add `stats` to the `Imports` in the DESCRIPTION 118 | 119 | 120 | Now, after these changes, what will you do? 121 | 122 | Yeah. 123 | 124 | `load_all()` and try again... 125 | 126 | Does it run? 127 | 128 | Super! 129 | 130 | ## check it! 131 | run 132 | ```{r} 133 | devtools::check() 134 | ``` 135 | Are there any errors, warnings or messages plopping up? 136 | Do they help you? 137 | 138 | I can see one warning about the license I specified (I didn't change anything). Don't worry about it for now. a) we will talk about licensing later and b) it's not an error. 139 | 140 | ## export the function 141 | 142 | Now, we are at the moment working "inside" our package. To make the function available "outside" our package, we define it in the NAMESPACE. 143 | Go open NAMESPACE, ignore the line *# Generated by roxygen2: do not edit by hand* for now and type underneath: 144 | 145 | ```{r} 146 | export(doublemean) 147 | ``` 148 | 149 | We will later learn how to do this automatically. For now, let's continue: 150 | 151 | ## build it! 152 | Let's now build our package! 153 | 154 | ```{r} 155 | devtools::build() 156 | ``` 157 | 158 | This should return something like 159 | 160 | ```r 161 | ✓ checking for file ‘/path/myexample/DESCRIPTION’ ... 162 | ─ preparing ‘myexample’: 163 | ✓ checking DESCRIPTION meta-information ... 
164 | ─ checking for LF line-endings in source and make files and shell scripts 165 | ─ checking for empty or unneeded directories 166 | Removed empty directory ‘myexample/man’ 167 | ─ building ‘myexample_0.0.0.9000.tar.gz’ 168 | 169 | [1] "/path/myexample_0.0.0.9000.tar.gz" 170 | ``` 171 | 172 | Yay! We've got a tar.gz! Now, let's install it. 173 | 174 | Make sure you're not in the package - project or workspace anymore. Now let's install our package from source: 175 | ```{r} 176 | install.packages("/samepath/myexample_0.0.0.9000.tar.gz", repos = NULL, type="source") 177 | ``` 178 | 179 | And let's have a look whether our amazing doublemean-function runs! 180 | ```{r} 181 | myexample::doublemean(c(3,4,5)) 182 | ``` 183 | 184 | Yay! 185 | 186 | 187 | # Now you! 188 | 189 | Please add another function to your package, in which you check whether a vector is normally distributed and if yes, print the mean, if not, give the message "It's not normal, use median!". Use the `jarque.bera.test()` from the `tseries` - package. If p larger than 0.05 x should be normally distributed. 190 | 191 | 192 | 193 | -------------------------------------------------------------------------------- /exercises/exercise_functions.R: -------------------------------------------------------------------------------- 1 | #### Generate test data #### 2 | input_data <- data.frame( 3 | id = LETTERS[1:26], 4 | value_A = runif(26), 5 | value_B = runif(26) 6 | ) 7 | input_data$value_A[sample(1:26, 5)] <- NA 8 | input_data$value_B[sample(1:26, 5)] <- NA 9 | 10 | #### What the following code does #### 11 | # This (very verbose) code creates a third column for input_data that 12 | # combines the two columns value_A and value_B in a somewhat complicated 13 | # manner. 14 | # This is not a real world application, but resembles typical data-cleaning 15 | # operations. 
16 | 17 | # create an empty vector to store the loop's result 18 | # (so the values for the new column) 19 | loop_res <- rep(NA, nrow(input_data)) 20 | # loop to somehow combine the two value columns 21 | # for each row of the data.frame 22 | for (i in seq_len(nrow(input_data))) { 23 | # get the row for the current loop cycle 24 | current_row <- input_data[i,] 25 | if (current_row$id < "G") { 26 | # if the id is smaller than the cutoff-letter "G", 27 | # then the combination of value_A and value_B is very simple 28 | loop_res[i] <- current_row$value_A + current_row$value_B 29 | } else { 30 | # else it gets more complicated, because NA values 31 | # have to be replaced with 0 32 | if (is.na(current_row$value_A)) { 33 | # NA values in value A get replaced with 0 34 | v_A <- 0 35 | } else { 36 | # normal values just stay as they are 37 | v_A <- current_row$value_A 38 | } 39 | if (is.na(current_row$value_B)) { 40 | # NA values in value B get replaced with 0 41 | v_B <- 0 42 | } else { 43 | # again: normal values just stay as they are 44 | v_B <- current_row$value_B 45 | } 46 | # finally the two modified values are added 47 | loop_res[i] <- v_A + v_B 48 | } 49 | } 50 | 51 | # result vector 52 | loop_res 53 | 54 | #### Your task #### 55 | # Imagine you have to perform the operation implemented above for 56 | # hundreds of data.frames equivalent to input_data. 57 | # But you have to treat them with a different cutoff-letter each, 58 | # so not only "G". 59 | 60 | # Can you rewrite this code into one or multiple functions? 61 | 62 | #### Hints #### 63 | # - Take a look at the input data to understand its structure! 64 | # - What are the moving parts in this code, so what are the input 65 | # arguments of your new function? 66 | # - Does this code section where we replace NA by 0 remind you of 67 | # something? 
68 | -------------------------------------------------------------------------------- /exercises/solution_exercise_functions.R: -------------------------------------------------------------------------------- 1 | # This solution introduces the function my_function to solve 2 | # the data manipulation task for arbitrary cutoff-letters. 3 | # It also replaces some of the code with the %na0plus% operator 4 | # we already implemented for one of the mini-exercises. 5 | 6 | #### Generate test data #### 7 | input_data <- data.frame( 8 | id = LETTERS[1:26], 9 | value_A = runif(26), 10 | value_B = runif(26) 11 | ) 12 | input_data$value_A[sample(1:26, 5)] <- NA 13 | input_data$value_B[sample(1:26, 5)] <- NA 14 | 15 | #### Solution code #### 16 | 17 | `%na0plus%` <- function(x, y) { 18 | x <- `if`(is.na(x), 0, x) 19 | y <- `if`(is.na(y), 0, y) 20 | x + y 21 | } 22 | 23 | my_function <- function(x, cutoff_letter) { 24 | loop_res <- rep(NA, nrow(x)) 25 | for (i in seq_len(nrow(x))) { 26 | current_row <- x[i,] 27 | if (current_row$id < cutoff_letter) { 28 | loop_res[i] <- current_row$value_A + current_row$value_B 29 | } else { 30 | loop_res[i] <- current_row$value_A %na0plus% current_row$value_B 31 | } 32 | } 33 | return(loop_res) 34 | } 35 | 36 | my_function(input_data, "G") 37 | -------------------------------------------------------------------------------- /figures/Screenshot_documentation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sslarch/caa2021_Rpackage_workshop/a1a1a071d25e2c581b669d4949544deff29c5781/figures/Screenshot_documentation.png -------------------------------------------------------------------------------- /figures/Screenshot_pckgdevcheatsheet.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sslarch/caa2021_Rpackage_workshop/a1a1a071d25e2c581b669d4949544deff29c5781/figures/Screenshot_pckgdevcheatsheet.png 
-------------------------------------------------------------------------------- /figures/Screenshot_readxl_github.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sslarch/caa2021_Rpackage_workshop/a1a1a071d25e2c581b669d4949544deff29c5781/figures/Screenshot_readxl_github.png -------------------------------------------------------------------------------- /figures/Screenshot_rextensions.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sslarch/caa2021_Rpackage_workshop/a1a1a071d25e2c581b669d4949544deff29c5781/figures/Screenshot_rextensions.png -------------------------------------------------------------------------------- /figures/Screenshot_rpackages.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sslarch/caa2021_Rpackage_workshop/a1a1a071d25e2c581b669d4949544deff29c5781/figures/Screenshot_rpackages.png -------------------------------------------------------------------------------- /handout/caa2021_handout.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sslarch/caa2021_Rpackage_workshop/a1a1a071d25e2c581b669d4949544deff29c5781/handout/caa2021_handout.pdf -------------------------------------------------------------------------------- /handout/caa2021_handout.svg: -------------------------------------------------------------------------------- [SVG handout; markup not preserved, text content only] [CAA 2021~]$ Call for papers & participants | $./ Brought to you by CAA Special Interest Group Scientific Scripting Languages in Archaeology | ~/conference: 14th - 18th June online, https://2021.caaconference.org, 1st March deadline for paper submissions | ~/workshop S31: Developing R packages, Sophie C. Schmidt, Petr Pajdla, Clemens Schmid, https://github.com/sslarch/caa2021_Rpackage_workshop | ~/session S17: Tools for the Revolution: developing packages for scientific programming in archaeology, Joe Roe, Martin Hinz, Clemens Schmid, https://github.com/sslarch/caa2021_packages | Please check the repositories for details before applying! 
-------------------------------------------------------------------------------- /references.bib: -------------------------------------------------------------------------------- 1 | 2 | @Manual{R, 3 | title = {R: A Language and Environment for Statistical Computing}, 4 | author = {{R Core Team}}, 5 | organization = {R Foundation for Statistical Computing}, 6 | address = {Vienna, Austria}, 7 | year = {2020}, 8 | url = {https://www.R-project.org/}, 9 | } 10 | 11 | 12 | @ARTICLE{Schmidt2020, 13 | title = "{Tool-Driven} Revolutions in Archaeological Science", 14 | author = "Schmidt, Sophie C and Marwick, Ben", 15 | journal = "Journal of Computer Applications in Archaeology", 16 | volume = 3, 17 | number = 1, 18 | pages = "18--32", 19 | month = jan, 20 | year = 2020, 21 | issn = "2514-8362", 22 | doi = "10.5334/jcaa.29" 23 | } 24 | 25 | 26 | 27 | @book{wickham_2020, 28 | edition = {2}, 29 | title = {R Packages. 
Organize, test, document and share your code}, 30 | url = {https://r-pkgs.org/}, 31 | abstract = {source repo: https://github.com/hadley/r-pkgs/}, 32 | publisher = {O'Reilly}, 33 | author = {Wickham, Hadley}, 34 | urldate = {2020-12-14}, 35 | date = {2020} 36 | } -------------------------------------------------------------------------------- /render_slides.R: -------------------------------------------------------------------------------- 1 | # render the Rmd files in slides/ to pdf documents in rendered_slides/ 2 | 3 | Map( 4 | \(x) rmarkdown::render(x, output_dir = "rendered_slides"), 5 | list.files("slides", pattern = "\\.Rmd$", full.names = TRUE) 6 | ) 7 | -------------------------------------------------------------------------------- /rendered_slides/01_slides_intro.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sslarch/caa2021_Rpackage_workshop/a1a1a071d25e2c581b669d4949544deff29c5781/rendered_slides/01_slides_intro.pdf -------------------------------------------------------------------------------- /rendered_slides/02_slides_functions.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sslarch/caa2021_Rpackage_workshop/a1a1a071d25e2c581b669d4949544deff29c5781/rendered_slides/02_slides_functions.pdf -------------------------------------------------------------------------------- /rendered_slides/03_slides_Rpackage_structure.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sslarch/caa2021_Rpackage_workshop/a1a1a071d25e2c581b669d4949544deff29c5781/rendered_slides/03_slides_Rpackage_structure.pdf -------------------------------------------------------------------------------- /rendered_slides/04_slides_documentation.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/sslarch/caa2021_Rpackage_workshop/a1a1a071d25e2c581b669d4949544deff29c5781/rendered_slides/04_slides_documentation.pdf -------------------------------------------------------------------------------- /rendered_slides/05_slides_fluffy_context.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sslarch/caa2021_Rpackage_workshop/a1a1a071d25e2c581b669d4949544deff29c5781/rendered_slides/05_slides_fluffy_context.pdf -------------------------------------------------------------------------------- /rendered_slides/06_slides_data.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sslarch/caa2021_Rpackage_workshop/a1a1a071d25e2c581b669d4949544deff29c5781/rendered_slides/06_slides_data.pdf -------------------------------------------------------------------------------- /rendered_slides/07_slides_advanced_topics.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sslarch/caa2021_Rpackage_workshop/a1a1a071d25e2c581b669d4949544deff29c5781/rendered_slides/07_slides_advanced_topics.pdf -------------------------------------------------------------------------------- /rendered_slides/08_slides_wrap_up.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sslarch/caa2021_Rpackage_workshop/a1a1a071d25e2c581b669d4949544deff29c5781/rendered_slides/08_slides_wrap_up.pdf -------------------------------------------------------------------------------- /slides/01_slides_intro.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "'Hello World' and Welcome" 3 | author: "Sophie Schmidt" 4 | output: 5 | beamer_presentation: 6 | slide_level: 2 7 | theme: "Singapore" 8 | includes: 9 | in_header: preamble.tex 10 | editor_options: 11 | 
chunk_output_type: console 12 | --- 13 | 14 | # Welcome to the R package workshop 15 | 16 | ## short introduction 17 | 18 | Please tell us in one sentence: 19 | 20 | - Who you are and what you hope to gain from this workshop 21 | 22 | ## Reasons to create R packages 23 | 24 | - Sharing with my future self 25 | - Sharing with the scientific community 26 | - Reproducibility 27 | - "speeds up" science 28 | - Standards facilitate use by others 29 | 30 | ## Let's dive in! 31 | 32 | 1. function writing 33 | 34 | 2. package structure 35 | 36 | 3. documentation! 37 | 38 | 4. fluffy context 39 | 40 | 5. data 41 | 42 | 6. advanced topics 43 | 44 | **breaks will be decided by the system** -------------------------------------------------------------------------------- /slides/02_slides_functions.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "R functions" 3 | author: "Clemens Schmid" 4 | output: 5 | beamer_presentation: 6 | slide_level: 2 7 | theme: "Singapore" 8 | includes: 9 | in_header: preamble.tex 10 | editor_options: 11 | chunk_output_type: console 12 | --- 13 | 14 | ```{r setup, include=FALSE} 15 | knitr::opts_chunk$set(eval = FALSE) 16 | ``` 17 | 18 | # Functions 19 | 20 | ## What is a function? 21 | 22 | Functions are code modules that perform a specific operation 23 | 24 | - Self-contained 25 | - Take in specific input and return output 26 | - Code in the function body runs in an own scope 27 | - Black box: We don't have to understand a function to use it 28 | - Reusable 29 | - Can be called over and over again in different contexts 30 | - Can be made more or less generic for more or less specific usecases 31 | - Can be made available to other developers (via R Packages) 32 | 33 | Writing packages is mostly writing functions 34 | 35 | # Functions in R 36 | 37 | ## Functions in R 38 | 39 | R provides several thousand functions in the core packages (base, graphics, stats, methods, ...) 
40 | 41 | ```r 42 | mean(c(1,2,3)) 43 | ``` 44 | 45 | More functions are distributed through packages in a large ecosystem (CRAN, GitHub, ...) 46 | 47 | ```r 48 | c14bazAAR::get_c14data("IRDD") 49 | ``` 50 | 51 | ## Functions in R 52 | 53 | Developers and users can easily define their own functions 54 | 55 | ```r 56 | # function definition 57 | myfunc <- function(x, y) { 58 | z <- x + y 59 | return(z) 60 | } 61 | # function application 62 | myfunc(1,2) 63 | ``` 64 | 65 | ## Syntax 66 | 67 | R uses the following syntax for function definition 68 | 69 | ```r 70 | myfunc <- function(x, y) { 71 | z <- x + y 72 | return(z) 73 | } 74 | ``` 75 | - Function name: `myfunc` 76 | - Input arguments: `x` and `y` 77 | - Function body: 78 | 79 | ```r 80 | z <- x + y 81 | return(z) 82 | ``` 83 | 84 | - Return value: `z` 85 | 86 | ## Exercise 87 | 88 | Write a function that 89 | 90 | - takes a numeric vector without NA and 91 | - returns the arithmetic mean of this input vector 92 | 93 | ## Exercise 94 | 95 | Write a function that 96 | 97 | - takes a numeric vector without NA and 98 | - returns the arithmetic mean of this input vector 99 | 100 | Possible solution: 101 | 102 | ```r 103 | mymean <- function(x) { 104 | z <- sum(x) / length(x) 105 | return(z) 106 | } 107 | ``` 108 | 109 | ```r 110 | mymean(c(1,2,3,4)) # 2.5 111 | ``` 112 | 113 | ## Syntax 114 | 115 | We can reduce the function definition: 116 | 117 | Functions don't need an explicit `return` statement. 
The value of the last statement is automatically returned 118 | 119 | ```r 120 | myfunc <- function(x, y) { x + y } 121 | ``` 122 | 123 | For one-line function bodies we also don't need curly brackets 124 | 125 | ```r 126 | myfunc <- function(x, y) x + y 127 | ``` 128 | 129 | ## Syntax 130 | 131 | Functions do not even need a name: Lambda functions 132 | 133 | ```r 134 | function(x, y) x + y 135 | ``` 136 | 137 | This is useful when we want to use a function only once for a very specific purpose (later: Higher Order functions) 138 | 139 | ```r 140 | myvec1 <- c(1,2,3) 141 | myvec2 <- c(5,6,7) 142 | Map( 143 | f = function(x, y) x + y, 144 | myvec1, myvec2 145 | ) 146 | ``` 147 | 148 | ## Syntax 149 | 150 | R 4.1 introduced even shorter syntax for function definition 151 | 152 | ```r 153 | \(x, y) x + y 154 | ``` 155 | 156 | This makes for really elegant code 157 | 158 | ```r 159 | myvec1 <- c(1,2,3) 160 | myvec2 <- c(5,6,7) 161 | Map( 162 | f = \(x, y) x + y, 163 | myvec1, myvec2 164 | ) 165 | ``` 166 | 167 | But this syntax is very new and only works for the latest R versions 168 | 169 | ## Input and Output 170 | 171 | Each function has zero, one or multiple input arguments, but only exactly one output argument 172 | 173 | Even a function without output returns `NULL` 174 | 175 | ```r 176 | myfunc <- function(x, y) { message(x) } 177 | myfunc(1) # NULL 178 | ``` 179 | 180 | Usually the output of functions is immediately printed on the console. This can be prevented with `invisible` instead of `return` 181 | 182 | ```r 183 | myfunc <- function(x, y) { 184 | z <- x + y 185 | invisible(z) 186 | } 187 | ``` 188 | 189 | ## Types 190 | 191 | Every object in a programming environment has some type. 
But R functions have no way of being explicit about this: Dynamic type system 192 | 193 | ```r 194 | myfunc <- function(x, y) { x + y } 195 | ``` 196 | 197 | We can use `myfunc` with every input for which the `+` operator is defined, so all kinds of numeric data types 198 | 199 | ```r 200 | class(1.1) # "numeric" 201 | myfunc(1.1, 2.4) # 3.5 202 | ``` 203 | 204 | ```r 205 | class(1L) # "integer" 206 | myfunc(1L, 2L) # 3 207 | ``` 208 | 209 | ```r 210 | class(1i) # "complex" 211 | myfunc(1i, 2i) # 0+3i 212 | ``` 213 | 214 | ## Types 215 | 216 | But it fails for other data types 217 | 218 | ```r 219 | class("A") # "character" 220 | myfunc("A", "B") 221 | # Error in x + y : 222 | # non-numeric argument to binary operator 223 | ``` 224 | 225 | ## Types 226 | 227 | Advantages of dynamic typing 228 | 229 | - less verbose function definition, because the types don't have to be named 230 | - potentially very flexible functions (functions often automatically work for multiple input types) 231 | - rapid prototyping 232 | 233 | Disadvantages 234 | 235 | - Nasty error messages for wrong input: No explicit input validation 236 | - No type validation at "compile"-time: significant loss of robustness 237 | 238 | That's just the way R is designed 239 | 240 | ## Namespaces 241 | 242 | Functions we did not define ourselves always come from some package 243 | 244 | - Core R functions can just be accessed directly 245 | 246 | ```r 247 | mean() 248 | ``` 249 | 250 | - For all other packages we have to load the respective namespace to access their functions 251 | 252 | ```r 253 | library(ggplot2) 254 | require(magrittr) # rarely used 255 | ``` 256 | 257 | - Or we contextualize the function explicitly with the `::` operator 258 | 259 | ```r 260 | dplyr::mutate() 261 | c14bazAAR:::check_connection_to_url() # internal 262 | ``` 263 | 264 | ## Namespaces 265 | 266 | When developing a package, this becomes less convenient: 267 | 268 | All functions need to be called with an explicit namespace, unless they come 
from the base package 269 | 270 | ```r 271 | stats::anova() 272 | ``` 273 | 274 | Alternatively we could use the `importFrom` statement in the `NAMESPACE` file 275 | 276 | ## Exercise 277 | 278 | Convert this function to a function that could exist in a package 279 | 280 | ```r 281 | library(palmerpenguins) 282 | 283 | myfunc <- function() { 284 | bm <- penguins$body_mass_g 285 | bl <- penguins$bill_length_mm 286 | plot(bm, bl) 287 | pearson <- cor(bm, bl, use = "complete.obs") 288 | text(x = 5500, y = 35, labels = pearson) 289 | } 290 | ``` 291 | 292 | ## Exercise 293 | 294 | Convert this function to a function that could exist in a package 295 | 296 | Possible solution: 297 | 298 | ```r 299 | myfunc <- function() { 300 | bm <- palmerpenguins::penguins$body_mass_g 301 | bl <- palmerpenguins::penguins$bill_length_mm 302 | graphics::plot(bm, bl) 303 | pearson <- stats::cor(bm, bl, use = "complete.obs") 304 | graphics::text(x = 5500, y = 35, labels = pearson) 305 | } 306 | ``` 307 | 308 | Package code is often more verbose than script code 309 | 310 | Tidy evaluation: https://dplyr.tidyverse.org/articles/programming.html 311 | 312 | # Advanced topics 313 | 314 | ## Infix operators 315 | 316 | Infix operators are special binary functions, so functions with two arguments, that can be written in between the function arguments 317 | 318 | ```r 319 | 3 + 5 320 | ``` 321 | 322 | R comes with a set of operators, some prefix, some infix, some postfix 323 | 324 | ``` 325 | +, -, *, /, ^, &, |, :, ::, :::, $, =, <-, <<-, ==, 326 | <, <=, >, >=, !=, ~, &&, ||, !, ?, @, :=, (, {, [, [[ 327 | ``` 328 | 329 | R allows you to write infix operators as normal functions and to define own infix operators 330 | 331 | ## Infix operators 332 | 333 | Using an infix operator as a normal function 334 | 335 | ```r 336 | 3 + 5 # 8 337 | ``` 338 | 339 | ```r 340 | `+`(3, 5) # 8 341 | ``` 342 | 343 | 3 is the LHS (Left-hand side) and 5 the RHS (Right-hand side) input of the `+` operator 344 | 
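Because operators are ordinary functions, they can also be handed to other functions by name. A minimal sketch using base R only; the list `x` is made-up example data and not part of the workshop material:

```r
# Made-up example data: two named numeric vectors
x <- list(a = c(1, 2, 3), b = c(4, 5, 6))

# `[` is the subsetting operator written as a normal function:
# pick the second element of every vector in the list
second <- sapply(x, `[`, 2)

# `+` handed to Reduce(): ((1 + 2) + 3) + 4
total <- Reduce(`+`, 1:4)
```

Here `second` is a named vector with the values 2 and 5, and `total` is 10.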
345 | ## Infix operators 346 | 347 | Defining your own infix operator 348 | 349 | ```r 350 | `%horseplus%` <- function(x, y) { 351 | z <- x + y 352 | message("This horse likes bread.") 353 | return(z) 354 | } 355 | ``` 356 | 357 | ```r 358 | 3 %horseplus% 5 359 | # This horse likes bread. 360 | # 8 361 | ``` 362 | 363 | All self-defined infix operators have to be fenced with `%` 364 | 365 | https://stackoverflow.com/questions/24697248/is-it-possible-to-define-operator-without 366 | 367 | ## Exercise 368 | 369 | Define an infix operator `%na0plus%` that 370 | 371 | - takes two numeric scalars (so individual numbers, not vectors) 372 | - returns the sum of the input values, but replaces NA with 0 373 | 374 | ```r 375 | NA + 5 # NA 376 | ``` 377 | 378 | ```r 379 | NA %na0plus% 5 # 5 380 | ``` 381 | 382 | ## Exercise 383 | 384 | Define an infix operator `%na0plus%` that 385 | 386 | - takes two numeric scalars (so individual numbers, not vectors) 387 | - returns the sum of the input values, but replaces NA with 0 388 | 389 | Possible solution: 390 | 391 | ```r 392 | `%na0plus%` <- function(x, y) { 393 | x <- `if`(is.na(x), 0, x) 394 | y <- `if`(is.na(y), 0, y) 395 | x + y 396 | } 397 | ``` 398 | 399 | ```r 400 | NA %na0plus% 5 # 5 401 | ``` 402 | 403 | https://codegolf.stackexchange.com/questions/4024/tips-for-golfing-in-r 404 | 405 | ## Chaining functions together 406 | 407 | The pipe `%>%` in the magrittr package is nothing but a clever infix operator 408 | 409 | ```r 410 | c(1,2,3) %>% mean() 411 | ``` 412 | 413 | It *pipes* the LHS *in* as the first argument of the function appearing on the RHS 414 | 415 | That allows for sequences of functions ("tidyverse style") 416 | 417 | ```r 418 | mtcars %>% 419 | dplyr::group_by(cyl) %>% 420 | dplyr::summarise(mean_mpg = mean(mpg)) 421 | ``` 422 | 423 | ## Default input values 424 | 425 | R functions can have default values for all of their arguments. 
That is a great way to simplify complicated interfaces for normal use cases 426 | 427 | ```r 428 | myfunc <- function(x, y = 5) { 429 | z <- x + y 430 | return(z) 431 | } 432 | ``` 433 | 434 | ```r 435 | myfunc(1) # 6 436 | myfunc(1, 2) # 3 437 | ``` 438 | 439 | ## Default input values 440 | 441 | Default values can even refer to other arguments of the same function 442 | 443 | ```r 444 | calibrate <- function ( 445 | x, choices = c("calrange"), sigma = 2, 446 | calCurves = rep("intcal20", nrow(x)) 447 | ) { ... } 448 | ``` 449 | 450 | ## The ellipsis 451 | 452 | The ellipsis `...` is a very special function argument that can collect an arbitrary number of unspecified arguments 453 | 454 | ```r 455 | myfunc <- function(...) { 456 | ell_args <- list(...) 457 | z <- Reduce(`+`, ell_args, init = 0) 458 | return(z) 459 | } 460 | ``` 461 | 462 | ```r 463 | myfunc(1, 2) # 3 464 | myfunc(x = 1, y = 2) # 3 465 | myfunc(x = 1, y = 2, z = 5) # 8 466 | myfunc(1,2,3,4,5,6,7,8,9,10,11,12) # 78 467 | ``` 468 | 469 | ```r 470 | myfunc(1, 2, NA) # NA 471 | ``` 472 | 473 | ## The ellipsis 474 | 475 | The ellipsis can also be combined with normal arguments 476 | 477 | ```r 478 | myfunc <- function(..., na.rm = TRUE) { 479 | ell_args <- list(...) 480 | if (!na.rm) { 481 | z <- Reduce(`+`, ell_args, init = 0) 482 | } else { 483 | z <- Reduce(`%na0plus%`, ell_args, init = 0) 484 | } 485 | return(z) 486 | } 487 | ``` 488 | 489 | ```r 490 | myfunc(1, 2, NA) # 3 491 | ``` 492 | 493 | For scalar inputs that behaves like `base::sum(..., na.rm = TRUE)` (but is very inefficient) 494 | 495 | ## Higher-order functions 496 | 497 | A higher-order function is a function that does at least one of the following: 498 | 499 | - takes one or more functions as arguments 500 | - returns a function as its result 501 | 502 | R supports this, so functions are "first class citizens" in R 503 | 504 | Why would one want to do this? 505 | 506 | - make a function interface more powerful 507 | - Mapping, Folding, Moving windows...
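The mapping and folding mentioned in the list above can be sketched with base R's own higher-order functions `Map()`, `Filter()` and `Reduce()`. The input values are made up for illustration:

```r
# Mapping: apply a function to every element, get a list back
squares <- Map(function(x) x^2, 1:4)

# Folding: collapse a list into one value with a binary function
total <- Reduce(`+`, squares)

# Filtering: keep only the elements for which the predicate is TRUE
evens <- Filter(function(x) x %% 2 == 0, 1:10)
```

In all three calls the first argument is itself a function, which is what makes them higher-order functions.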
508 | 509 | ## Higher-order functions 510 | 511 | Functions as an input argument 512 | 513 | ```r 514 | myfunc <- function(vec, f) { 515 | z <- f(vec) 516 | return(z) 517 | } 518 | ``` 519 | 520 | ```r 521 | myfunc(c(1, 2, 3), mean) # 2 522 | myfunc(c(1, 2, 3), sum) # 6 523 | ``` 524 | 525 | ## Higher-order functions 526 | 527 | A function as a function's input and output 528 | 529 | ```r 530 | times_two <- function(x) { x * 2 } 531 | ``` 532 | 533 | ```r 534 | do_it_twice <- function(f) { 535 | function(x) { f(f(x)) } 536 | } 537 | ``` 538 | 539 | ```r 540 | times_two(5) # 10 541 | do_it_twice(times_two)(5) # 20 542 | do_it_twice(do_it_twice(times_two))(5) # 80 543 | ``` 544 | 545 | # Practical concerns 546 | 547 | ## How to write functions 548 | 549 | 1. Define the purpose of your function 550 | - What operation should be performed? 551 | 2. Think about the function interface 552 | - What goes into the function (input)? 553 | - What should the function return (output)? 554 | 3. Implement the function 555 | - Which algorithm is capable of performing the desired operation?
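The three steps above can be walked through on a small, made-up task. The function name `range_width` and its `na.rm` argument are illustrative assumptions, not part of the workshop material:

```r
# 1. Purpose: report how far apart the smallest and largest value are.
# 2. Interface: a numeric vector goes in, a single number comes out.
# 3. Implementation: base R's range() and diff() already do the work.
range_width <- function(x, na.rm = FALSE) {
  diff(range(x, na.rm = na.rm))
}

range_width(c(3, 9, 5)) # 6
```

Writing down purpose and interface as comments first often makes the implementation step much shorter than expected.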
556 | 557 | ## Functions in scripts 558 | 559 | General advice for using functions in scripts 560 | 561 | - Identify repeating code patterns in your script 562 | - If you do something **three or more** times, then it is worth putting it into a function 563 | - Cover small differences between patterns with function arguments 564 | - Function length: One function should only do one thing, but complete atomization decreases readability 565 | 566 | ## Scripts and Packages 567 | 568 | A script usually covers one workflow, but in a package all code lives in functions, so workflows live in sequences of functions 569 | 570 | ```r 571 | read_data("path/to/file") %>% 572 | mypackage::manipulate_data_A() %>% 573 | another_package::manipulate_data_B() %>% 574 | mypackage::manipulate_data_C() %>% 575 | mypackage::plot_data_D() 576 | ``` 577 | 578 | Not only your functions, of course 579 | 580 | ## Communication with the user 581 | 582 | Package functions have (!) to communicate with the user (beyond the documentation) 583 | 584 | - What is going on? 585 | - Why did something fail? 586 | 587 | Interface options to improve user feedback 588 | 589 | - Check conditions and use `message()`, `warning()`, `stop()` to inform the user 590 | - Write clear, helpful messages with advice how to solve an issue 591 | - Show progress updates and progress bars for long operations 592 | - Catch and handle errors that might occur in complex code (`try`, `?conditions`) 593 | 594 | ## Input argument validation 595 | 596 | As R is a dynamically typed language, a better user experience can be ensured with explicit input argument validation 597 | 598 | ```r 599 | myfunc <- function(x, y) { 600 | checkmate::assert_numeric(x, len = 1) 601 | checkmate::assert_numeric(y, len = 1) 602 | z <- x + y 603 | return(z) 604 | } 605 | ``` 606 | 607 | Different packages simplify this, e.g. 
the `checkmate` package 608 | 609 | ```r 610 | myfunc(x = 5, y = "cookies") 611 | # Assertion on 'y' failed: Must be of type 'numeric', 612 | # not 'character'. 613 | ``` 614 | 615 | # Final exercise 616 | 617 | ## Final exercise 618 | 619 | https://github.com/sslarch/caa2021_Rpackage_workshop/blob/main/exercises/exercise_functions.R 620 | -------------------------------------------------------------------------------- /slides/03_slides_Rpackage_structure.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "R package structure" 3 | author: "Sophie Schmidt" 4 | output: 5 | beamer_presentation: 6 | slide_level: 2 7 | theme: "Singapore" 8 | includes: 9 | in_header: preamble.tex 10 | editor_options: 11 | chunk_output_type: console 12 | --- 13 | 14 | ```{r setup, include=FALSE} 15 | knitr::opts_chunk$set(eval = FALSE) 16 | ``` 17 | 18 | # What is a package? 19 | 20 | ## What makes a package a package? 21 | 22 | 5 different "states" of a package: 23 | 24 | - source = directory of files with a specific structure 25 | - bundled = compressed into a single file (.tar.gz = "tarball") 26 | - binary = compressed in one file, platform specific (Mac: .tgz, Windows: .zip), used by `install.packages()` 27 | - installed = binary package that’s been decompressed into a package library 28 | - in-memory = after library() 29 | 30 | ## What do we do with a package? 
31 | 32 | create: 33 | 34 | - source = directory of files with a specific structure 35 | 36 | build: 37 | 38 | - bundled = compressed into a single file (.tar.gz = "tarball") 39 | - binary = compressed in one file, platform specific (Mac: .tgz, Windows: .zip), used by install.packages() 40 | 41 | use: 42 | 43 | - installed = binary package that’s been decompressed into a package library 44 | - in-memory = after library() 45 | 46 | ## Source Package 47 | 48 | ![screenshot of github of readxl package](../figures/Screenshot_readxl_github.png) 49 | 50 | ## most important elements of a source package 51 | - R/ folder 52 | - DESCRIPTION file 53 | - NAMESPACE file 54 | 55 | ## R/ 56 | - the "heart" of the package: all your beautiful functions live here 57 | - each script file ends in .R 58 | - what goes in one file? 59 | - if a function is very large, it may live alone 60 | - often: one important function + its helpers 61 | - often: a family of functions 62 | - "utils.R" often contains "helpers" needed in several other functions 63 | - the functions are defined when the package is built --> make everything a function! 64 | 65 | ## DESCRIPTION 66 | - We <3 Metadata! 67 | - human & machine readable 68 | - shows up on CRAN 69 | - it's what makes a package a package 70 | 71 | - a text file that follows DCF, the Debian control format 72 | - each line consists of a field name followed by a : (colon) and the value 73 | 74 | ## bare bones DESCRIPTION: 75 | ```r 76 | Package: myexample 77 | Title: What the Package Does (One Line, Title Case) 78 | Version: 0.0.0.9000 79 | Authors@R: 80 | person(given = "xxx", 81 | family = "yyy", 82 | role = c("aut", "cre"), 83 | email = "first.last@example.com") 84 | Description: What the package does (one paragraph).
85 | Encoding: UTF-8 86 | LazyData: true 87 | ``` 88 | ## most important DESCRIPTION parts 89 | 90 | `Title` is a one line description of the package, plain text (no markup), capitalised like a title, does NOT end in a period, < 65 characters 91 | 92 | `Description` is more detailed than the title, one paragraph. If your description spans multiple lines (each line <= 80 characters), indent subsequent lines with 4 spaces 93 | 94 | `Authors@R`: That's you and your collaborators. Think about your roles. 95 | 96 | - aut: author, cre: creator <- must have 97 | - ctb: contributors, cph: copyright holder, ... <- might have 98 | - one person may have more than one role, several ppl may have the same role 99 | - give at least one email 100 | 101 | 102 | # Dependencies 103 | 104 | ## Dependencies management 105 | - we need to manage how to deal with our functions relying on other functions 106 | - Linux users now nod sagely please 107 | - everyone, who ever used library() or require() please nod sagely 108 | - in package building we need to do things differently than in scripts 109 | 110 | ### in CODE: 111 | - DO: `package::function()` 112 | - DON'T `library()` or `require()` ! 
113 | - add packages to your DESCRIPTION using `usethis::use_package("pkgname")` 114 | - use the namespace 115 | 116 | ## in DESCRIPTION 117 | 118 | manually add (or using `use_package()` leads to) 119 | ```r 120 | Imports: 121 | dplyr (>= 0.2), 122 | ggplot2 123 | ``` 124 | - packages listed under `Imports` 125 | - are the ones that MUST be there or your package won't work 126 | - will be installed if your package is installed and they are missing 127 | - specific version of a package in () behind the name should be minimum version (>= not just =) 128 | - (otherwise things get complicated fast) 129 | - have a reason for the minimum version, ppl might have to install it 130 | - giving a minimum version leads to better error messages for ppl who may not have the needed version installed 131 | - packages listed under `Suggests` are not required for the code to run, but are needed e.g. for example data sets, to build the vignette, ... 132 | 133 | ## what to do about tidyverse pipe 134 | 135 | - `magrittr::%>%` ?? 136 | 137 | - `usethis::use_pipe(export = TRUE)` 138 | - requires roxygen (see later) 139 | - adds magrittr to Imports in DESCRIPTION 140 | - imports the pipe specifically 141 | - exports the pipe (if `export = TRUE`) so that `%>%` is available to the users of your package 142 | --> adds the file `R/utils-pipe.R`, which provides the roxygen template to import and re-export `%>%` 143 | 144 | ## NAMESPACE 145 | - NAMESPACE is another text file just chilling in the project directory 146 | - listed are (most important): 147 | - import and importFrom: packages & functions you want to load 148 | - export: functions you define to be used outside of your package 149 | 150 | ```r 151 | # Generated by roxygen2 (4.0.2): do not edit by hand 152 | importFrom(methods,setRefClass) 153 | export(myfunc) 154 | ``` 155 | - just added underneath each other 156 | - usually done by Roxygen2 (see later) 157 | 158 | 163 | 164 | 165 | 166 | # Pack the package!
167 | 168 | ## Create the package 169 | 170 | ### naming problems 171 | - name may contain numbers, letters and periods. 172 | - name must start with a letter and mustn't end with a period 173 | - recommendation: just don't use periods 174 | - have fun and read the blog post here: *https://www.njtierney.com/post/2018/06/20/naming-things/* 175 | 176 | - check whether the name is available: 177 | 178 | ```r 179 | library(available) 180 | 181 | available("doofus") 182 | ``` 183 | 184 | ## make it so (mighty wizard usethis) 185 | 186 | ```r 187 | usethis::create_package("path/to/package/amazingpkgname") 188 | ``` 189 | This path should not lead to your lib or anywhere near your installed packages! 190 | 191 | - now we have a package at the given path which contains the "most important parts": DESCRIPTION, R/, NAMESPACE 192 | 193 | - Rstudio users will notice .Rbuildignore is being created and Rproj-files added to it 194 | - in .Rbuildignore we can add file names, that will be ignored when it's time to build the pkg 195 | - using an R project makes the workflow a bit easier, but isn't necessary 196 | 197 | ## Tweaking workflow 198 | 199 | - “lather, rinse, repeat” cycle of package development: 200 | 201 | 1. Tweak a function 202 | 2. `devtools::load_all()` 203 | 3. Try out the change -> run a small example / test 204 | 205 | - in Rstudio you can do `load_all()` using: 206 | 207 | - Keyboard shortcut: Cmd+Shift+L (macOS), Ctrl+Shift+L (Windows, Linux) 208 | - Build pane’s More ... menu 209 | - Build > Load All 210 | 211 | ## load_all does: 212 | 213 | ```{r} 214 | # with devtools attached and 215 | # working directory = top-level of the source package ... 
216 | 217 | load_all() 218 | ``` 219 | 220 | - simulates the process of building, installing and attaching the package 221 | - load_all "sources" the script files safely for you 222 | - `source()` is not a good idea, because paths change during package development 223 | - no need to :: your own package "under development" 224 | 225 | # check that package 226 | 227 | ## R CMD check 228 | - once you're happy the functions work 229 | - and you think you did all the right documentation steps, and added packages to DESCRIPTION etc 230 | - check your package! run: `devtools::check()` or press Ctrl/Cmd + Shift + E (in Rstudio) 231 | 232 | `devtools::check()` 233 | 234 | - ensures that the documentation is up-to-date by running devtools::document(). 235 | - bundles the package before checking it 236 | - sets the NOT_CRAN environment variable to TRUE. This allows you to selectively skip tests on CRAN 237 | - checks a lot: metadata of the package, package-structure, DESCRIPTION & NAMESPACE (esp. dependencies), Code for non-ASCII characters, syntax errors... 238 | - for a list see: *https://r-pkgs.org/r-cmd-check.html* 239 | 240 | ## messages 241 | 242 | - ERROR: needs to be addressed! 243 | - WARNING: probably needs to be fixed if the pckg should go to CRAN 244 | - NOTE: mild problems (will be checked by humans for CRAN submission) 245 | 246 | 247 | ### 3 typical error messages and warnings 248 | - "there is no package" --> forgot to add a package to DESCRIPTION 249 | - "Undocumented code objects" --> forgot to add documentation 250 | - "no visible binding for global variable a" --> happens when using dplyr 251 | ```{r} 252 | # option 1 (then you should also put utils in Imports) 253 | utils::globalVariables(c("a")) 254 | # option 2 255 | a <- NULL 256 | ``` 257 | 258 | ## Build the package 259 | No more error messages?! 260 | 261 | Congrats!
Time to build using `devtools::build()` 262 | 263 | - `devtools::build(binary = FALSE)` --> tar.gz (should be usable by anyone) 264 | - `devtools::build(binary = TRUE)` --> platform specific (zip or tgz), built for your own platform 265 | 266 | Using `devtools::install()` (re-)installs your package right away on your system and attaches it. 267 | 268 | # exercise 269 | 270 | follow this tutorial: 271 | 272 | https://github.com/sslarch/caa2021_Rpackage_workshop/blob/main/exercises/exercise_build_package.Rmd -------------------------------------------------------------------------------- /slides/04_slides_documentation.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Documentation" 3 | author: "Petr Pajdla" 4 | output: 5 | beamer_presentation: 6 | slide_level: 2 7 | theme: "Singapore" 8 | includes: 9 | in_header: preamble.tex 10 | editor_options: 11 | chunk_output_type: console 12 | --- 13 | 14 | ```{r setup, include=FALSE} 15 | knitr::opts_chunk$set(eval = FALSE) 16 | ``` 17 | 18 | # Documentation 19 | 20 | ## Documentation 21 | 22 | > Crucial both for other people and for future-you! 23 | 24 | - `README` File 25 | 26 | - *Object* documentation 27 | 28 | - documentation of individual objects (functions) 29 | - accessed by `help(myfun)` or `?myfun` 30 | 31 | - Vignettes 32 | 33 | - long-form documentation 34 | - whole *workflows* of functions implemented in the package 35 | 36 | ## Documenting functions 37 | 38 | - Documentation goes into the `man/` directory 39 | - *Each* (non-lambda) function should be documented 40 | 41 | For `myfun()` function: 42 | 43 | ```r 44 | myfun <- function(x) { 45 | ... 46 | } 47 | ``` 48 | 49 | a file `myfun.Rd` exists in `./man` 50 | 51 | ## The `.Rd` file 52 | 53 | - Plain text file, loosely based on `LaTeX` 54 | 55 | ``` 56 | \name{myfun} 57 | \alias{myfun} 58 | \title{myfun, doing this and that (...)
59 | The myfun function calculates (...)} 60 | \usage{ 61 | myfun(x, y, method = "something", type = (...)) 62 | } 63 | \arguments{ 64 | \item{x}{x is a vector of length (...)} 65 | } 66 | \description{ 67 | The myfun function calculates (...) 68 | } 69 | ``` 70 | 71 | # `roxygen2` 72 | 73 | ## `roxygen2` 74 | 75 | - Using **`roxygen2`** simplifies the process of creating documentation 76 | 77 | ### Workflow 78 | 79 | 1. Create a **function** 80 | 2. Add ***roxygen* comments** in the `.R` script with special *tags* 81 | 3. Run `roxygen2::roxygenize()` to generate documentation 82 | 83 | (`devtools::document()` does the trick as well) 84 | 85 | 4. **Preview** the documentation (`help()` or `?`) 86 | 5. Have fun and **repeat**! 87 | 88 | ## *roxygen* comment blocks 89 | 90 | - Written *above* the function definition in a given `.R` file 91 | - Always start with **`#'`** 92 | - Tags for various *sections* `#' @tag` 93 | 94 | ``` 95 | #' My Function. 96 | #' 97 | #' My function does this and that. 98 | #' 99 | #' @param x (...) 100 | #' @return A number (...) 101 | #' 102 | myfun <- function(x) { 103 | ... 104 | } 105 | ``` 106 | 107 | ## Basic structure 108 | 109 | - First sentence = **title of documentation** 110 | 111 | - Sentence Case, ends with a Full Stop. 112 | 113 | - Second paragraph = **description** 114 | 115 | - short description of the function 116 | 117 | - Third and subsequent paragraphs = *details section* 118 | 119 | ![part of documentation of sum function](../figures/Screenshot_documentation.png) 120 | 121 | # Tags 122 | 123 | ## Parameters (function arguments) 124 | 125 | `@param name description` 126 | 127 | - function parameters - document all inputs! 128 | - A sentence, paragraph or even longer text if necessary. 129 | 130 | ``` 131 | #' @param x A numeric vector. 132 | #' @param data A data frame. See below for details. 133 | #' @param x,y Numeric vectors. 
134 | ``` 135 | 136 | ## Examples of code 137 | 138 | `@examples` 139 | 140 | - `R` code with examples of the function in practice 141 | - the code **must** work, it is run during the checks (`R CMD check`) 142 | - `\dontrun{...}` = code is not run 143 | 144 | `@example` 145 | 146 | - contains relative path to `.R` file with examples 147 | 148 | ``` 149 | #' @examples 150 | #' mean(c(1, 2, 3)) 151 | #' 152 | #' \dontrun{ 153 | #' mean(c("a", "b", "c")) 154 | #' } 155 | ``` 156 | 157 | ## Function output 158 | 159 | `@return description` 160 | 161 | - describes the output of the function 162 | 163 | ``` 164 | #' @return The default method returns a length-one 165 | #' object of the same type as \code{x}. If (...) 166 | 167 | #' @return An object of the same type as \code{data} 168 | #' is returned. 169 | ``` 170 | 171 | ## Export the function 172 | 173 | `@export` 174 | 175 | - exports the function for the *end* user 176 | - adds a proper line to **`NAMESPACE`** file 177 | - functions that are not exported remain *internal* 178 | 179 | - to access internal function, use `pckgname:::funname` 180 | 181 | 182 | 183 | ## Linking 184 | 185 | `@seealso` 186 | 187 | - points to other resources inside the package or elsewhere 188 | 189 | ``` 190 | #' @seealso For details, see similar 191 | #' function \code{\link{funname}}. 192 | 193 | #' @seealso See \url{http://...} for details. 194 | 195 | #' @seealso See \code{\link[pckgname]{funname}} 196 | #' function from package (...) 197 | ``` 198 | 199 | 202 | 203 | ## Other tags 204 | 205 | `@section` 206 | 207 | - allows to break long texts, i.e., in *Details* section 208 | 209 | `@aliases alias1 alias2 ...` 210 | 211 | - adds additional aliases to the function 212 | - the topic is found by `?alias1` or `?alias2` etc. 
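The tags introduced so far can be combined into one complete roxygen block. This is only a sketch: what `doublemean()` actually computes and the alias `twicemean` are assumptions made for illustration.

```r
#' Double Mean.
#'
#' Calculates the arithmetic mean of a numeric vector and multiplies it by two.
#'
#' @param x A numeric vector.
#' @param na.rm Logical. Should missing values be removed before averaging?
#' @return A length-one numeric vector.
#' @seealso \code{\link[base]{mean}} for the underlying calculation.
#' @aliases twicemean
#' @examples
#' doublemean(c(1, 2, 3))
#' @export
doublemean <- function(x, na.rm = FALSE) {
  # Assumed behaviour for this sketch: twice the ordinary mean
  2 * mean(x, na.rm = na.rm)
}
```

Title, description, parameters, return value, link, alias, example and export each get their own tag, so the generated help page has all standard sections filled in.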
213 | 214 | 219 | 220 | 225 | 226 | # `Rd` Markup 227 | 228 | ## Formatting 229 | 230 | - `@` = start of roxygen tag, to write at sign (@), use `@@` 231 | - `%` = LaTeX comment sign, escape it with a backslash `\%` 232 | - `\` = LaTeX escape sign, escape it to get a single backslash `\\` 233 | 234 | ``` 235 | #' @author My Name 236 | ``` 237 | 238 | - `\code{}` - code snippets 239 | - `\link{funname}` - link to function in this package 240 | - `\link[pckgname]{funname}` - link to function in another package 241 | 242 | ``` 243 | #' @return Object of class \code{data.frame}. 244 | #' @seealso See \code{\link[base]{data.frame}} for details. 245 | ``` 246 | 247 | ## Equations 248 | 249 | - standard LaTeX math (without AMS or other extensions) 250 | - `\eqn{}` - inline equation 251 | - `\deqn{}` - block (*display*) equation 252 | 253 | ``` 254 | #' @section Details on maths: 255 | #' Sample mean is calculated as 256 | #' \deqn{\overline{x} = \frac{1}{n}\sum_{i=1}^n x_i} 257 | ``` 258 | 259 | should result in something like this: 260 | 261 | $$ 262 | \overline{x} = \frac{1}{n}\sum_{i=1}^n x_i 263 | $$ 264 | *(but in the html help files, equations are simplified...)* 265 | 266 | # Wrap up & exercise 267 | 268 | ## Where to get help? 269 | 270 | - Do I have to know all of this by heart?! 271 | 272 | - Luckily, nope! 273 | 274 | - package development [cheat sheet](https://raw.githubusercontent.com/rstudio/cheatsheets/master/package-development.pdf) (https://www.rstudio.com/resources/cheatsheets/) 275 | - See the introduction and other vignettes for `roxygen2` package 276 | 277 | - `vignette("roxygen2")` 278 | - `vignette(package = "roxygen2")` 279 | 280 | 281 | If you are using `RStudio`: 282 | 283 | - go to `Help/Roxygen Quick Reference` 284 | 285 | ## Exercise 286 | 287 | - Do you have a package with two functions defined? 288 | 289 | - `doublemean()` 290 | - `normalmean()` (?) 291 | 292 | - Let's document the functions! 293 | 294 | 1. Add meaningful title 295 | 2.
Add basic description 296 | 3. Document the parameters 297 | 4. What does the function return? 298 | 5. Any examples? 299 | 6. Export the function to the `NAMESPACE` 300 | 7. Add a link to function `mean` from `base R` 301 | 8. Run `devtools::document()` 302 | 9. Explore the help files with `?doublemean` 303 | 304 | -------------------------------------------------------------------------------- /slides/05_slides_fluffy_context.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Fluff it up!" 3 | author: "Petr Pajdla" 4 | output: 5 | beamer_presentation: 6 | slide_level: 2 7 | theme: "Singapore" 8 | includes: 9 | in_header: preamble.tex 10 | editor_options: 11 | chunk_output_type: console 12 | --- 13 | 14 | ```{r setup, include=FALSE} 15 | knitr::opts_chunk$set(eval = FALSE) 16 | ``` 17 | 18 | # Readme 19 | 20 | ## `README` file 21 | 22 | - Simple `Markdown` file in the root directory (`./README.md`) 23 | - Sometimes all you have to read: 24 | 25 | - purpose of the package 26 | - basic usage of the package 27 | - how to install the package 28 | - stage of the development 29 | - citation, license information and much more 30 | 31 | - use `usethis::use_readme_md()` or `usethis::use_readme_rmd()` if you want to include examples of code 32 | 33 | See examples of `README` files on GitHub, e.g. [c14bazAAR](https://github.com/ropensci/c14bazAAR) 34 | 35 | # Versioning 36 | 37 | ## Semantic versioning 38 | 39 | **`..`**, e.g. 4.2.0 40 | 41 | - **major version** `usethis::use_version("major")` 42 | 43 | incremented when an existing function is changed or removed, i.e. the change might (oh it will!) break existing code 44 | 45 | - **minor version** `usethis::use_version("minor")` 46 | 47 | new functionality is added but the code is backward compatible, i.e. old code works but there are some new functions 48 | 49 | - **patch** `usethis::use_version("patch")` 50 | 51 | updates to existing functions, bugs are fixed etc. 
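Version numbers of this form are compared component by component, not as text. A small sketch with base R's `package_version()`; the version strings are made up for illustration:

```r
old <- package_version("4.2.0")
new <- package_version("4.10.1")

# Component-wise comparison: minor version 10 is newer than 2
new_is_newer <- new > old

# Plain string comparison gets this wrong, because "1" sorts before "2"
string_says_newer <- "4.10.1" > "4.2.0"

# A development version sorts before the first proper release
dev <- package_version("0.0.0.9000")
dev_before_release <- dev < package_version("0.1.0")
```

This is one reason to state minimum versions in `DESCRIPTION` in the `pkg (>= x.y.z)` form instead of comparing version strings by hand.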
52 | 53 | ### Versioning packages in development 54 | 55 | - Start at **0.0.0.9000** and increment when adding features 56 | 57 | `usethis::use_version("dev")` 58 | 59 | # Licenses 60 | 61 | ## Licenses 62 | 63 | - There are many options and differences under various legal systems, in general, you want your code to be accessible... 64 | - What if I do not specify an **open source** license? 65 | 66 | [https://choosealicense.com/](https://choosealicense.com/) helps you with choosing a license... 67 | 68 | - Most common options for software: 69 | 70 | - **MIT License**: simple & permissive 71 | 72 | *"Do whatever you want with my stuff."* 73 | 74 | - **GNU GPLv3**: *copyleft* license 75 | 76 | *"Do whatever you want, but always show the source code."* 77 | 78 | \scriptsize 79 | - What about the `CC-BY-Licenses`? 80 | 81 | - \scriptsize These are licenses for the data (etc.), `CC-BY-4.0` and `CC-BY-SA-4.0` should not be used for software. 82 | 83 | [https://creativecommons.org/choose/](https://creativecommons.org/choose/) 84 | 85 | \tiny 86 | For details, see chapter **[Licenses](https://r-pkgs.org/license.html?q=license#license-compatibility)** in R Packages book by Hadley Wickham and **[Licensing R book](https://thinkr-open.github.io/licensing-r/)** by Colin Fay. 87 | 88 | ## Adding a license 89 | 90 | - License lives in the `LICENSE` file 91 | - It is specified in a proper field in the `DESCRIPTION` 92 | 93 | `usethis::use_*_license()` helper function: 94 | 95 | - `usethis::use_mit_license("First Last")` 96 | 97 | to use MIT License 98 | 99 | - `usethis::use_gpl3_license()` 100 | 101 | to use GNU GPL v3 102 | 103 | - `usethis::use_cc0_license()` or `usethis::use_ccby_license()` 104 | 105 | to use CC License, with a *data* package only!
106 | 107 | # Citations 108 | 109 | ## Citations 110 | 111 | > `We have invested a lot of time and effort in creating R, please cite it when using it for data analysis.` 112 | 113 | - Citing packages employed in your analysis is a good practice 114 | (as well as citing books, articles and other sources you use...) 115 | 116 | - `R` makes this super easy! 117 | 118 | `citation()` returns citation for `R` 119 | 120 | `citation(package = "pckgname")` returns citation for a package 121 | 122 | \footnotesize 123 | sometimes, there are several items you can cite and no BiBTeX, wrap the `citation()` call into `toBibtex()` function: `toBibtex(citation(package = "spatstat"))` 124 | 125 | ## Citing `R` 126 | 127 | \footnotesize 128 | 129 | ``` 130 | > citation() 131 | 132 | To cite R in publications use: 133 | 134 | R Core Team (2021). R: A language and environment (...) 135 | R Foundation for Statistical Computing, Vienna, (...) 136 | URL https://www.R-project.org/. 137 | 138 | A BibTeX entry for LaTeX users is 139 | 140 | @Manual{, 141 | title = {R: A Language and Environment (...)}, 142 | author = {{R Core Team}}, 143 | organization = {R Foundation for Statistical (...)}, 144 | (...) 145 | } 146 | ``` 147 | 148 | ## Adding a citation to your package 149 | 150 | - if you do not add it yourself, it is generated from `DESCRIPTION` 151 | - lives in a file `inst/CITATION` 152 | 153 | **`usethis::use_citation()`** creates the file for you: 154 | 155 | \scriptsize 156 | ``` 157 | citHeader("To cite myAMAZINGpackage in publications use:") 158 | citEntry( 159 | entry = "Article", 160 | title = "myAMAZINGpacakge", 161 | author = as.person("first last"), 162 | journal = "Amazing packages journal", 163 | year = "2021", 164 | volume = "4", 165 | number = "2", 166 | pages = "42-69", 167 | url = "www.myAMAZINGwebsite.org", 168 | textVersion = paste( 169 | "First, Last 2021: myAMAZINGpackage. Amazing packages journal 170 | 4(2), 42-69. 
www.myAMAZINGwebsite.org")) 171 | ``` 172 | 173 | # Communities 174 | 175 | ## Fostering community development 176 | 177 | - Use git with [GitHub](https://github.com/) or [GitLab](https://gitlab.com/) so people can cooperate 178 | 179 | (slightly more on this in the *Advanced topics*) 180 | 181 | - Be clear on **how** to contribute by including 182 | 183 | - Contributing guidelines 184 | - Code of conduct (`usethis::use_code_of_conduct()`) 185 | - How to troubleshoot issues etc. 186 | 187 | # Exercise 188 | 189 | ## Exercise I 190 | 191 | ### Add and explore a basic `README` file, fill it in 192 | 193 | 194 | 195 | ### Add a development version (0.0.0.9000) 196 | 197 | ### Add license 198 | 199 | 1. Explore the `DESCRIPTION` file, what does it say about the license? 200 | 2. Add MIT License with your name to your package. 201 | 3. Explore the `DESCRIPTION`, `LICENSE` and `LICENSE.md` files 202 | 203 | ## Exercise II 204 | 205 | ### Citations 206 | 207 | 1. Load your package 208 | 209 | (`devtools::load_all()` or `Ctrl+Shift+L` in `RStudio`) 210 | 211 | 2. See the default citation `citation("pckgname")` 212 | 213 | 3. Let's edit the citation! 214 | 215 | 4. Use `usethis::use_citation()` to generate a `CITATION` file 216 | 217 | 5. Fill in the gaps! 218 | 219 | 6. Load the package. 220 | 221 | 7. Inspect the citation. 222 | 223 | 8. Repeat until: 224 | 225 | 1. You get no error messages and/or 226 | 2. It is perfect... 
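As a companion to the citation exercise, the following sketch builds a citation entry directly in an R session with `utils::bibentry()` and renders it both as text and as BibTeX, which is a quick way to check the fields before putting them into `inst/CITATION`. All field values are the same made-up placeholders used on the slide; the object names `entry` and `bib` are only for illustration.

```r
# Hypothetical citation entry; every field value is a placeholder
entry <- utils::bibentry(
  bibtype = "Article",
  title   = "myAMAZINGpackage",
  author  = utils::person("First", "Last"),
  journal = "Amazing packages journal",
  year    = "2021",
  volume  = "4",
  number  = "2",
  pages   = "42-69"
)

# Plain-text rendering of the entry
print(entry, style = "text")

# BibTeX rendering for LaTeX users
bib <- utils::toBibtex(entry)
print(bib)

# The same conversion works for the citation of R itself
r_bib <- utils::toBibtex(utils::citation())
```

If a required field for the chosen `bibtype` is missing, `bibentry()` stops with an error, so this is a reasonably cheap way to run the "repeat until you get no error messages" step outside the package.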
227 | 228 | 229 | -------------------------------------------------------------------------------- /slides/06_slides_data.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Data in R packages" 3 | author: "Sophie Schmidt" 4 | output: 5 | beamer_presentation: 6 | slide_level: 2 7 | theme: "Singapore" 8 | includes: 9 | in_header: preamble.tex 10 | editor_options: 11 | chunk_output_type: console 12 | --- 13 | 14 | ```{r, echo=FALSE} 15 | knitr::opts_chunk$set(eval = FALSE, 16 | echo = TRUE) 17 | ``` 18 | 19 | ## Why should we want to put data in a package? 20 | 21 | - example data for use cases 22 | - distribute data along with documentation for others to use 23 | - is part of the service your package provides 24 | 25 | # Three ways to add data to a package 26 | 27 | - binary: use the folder data/ 28 | - parsed data that is not available to the user: store it as R/sysdata.rda 29 | - raw data, available to the user: inst/extdata 30 | 31 | --> in package development, working with the source package, data/ is the usual choice 32 | 33 | ## Exported data using /data 34 | 35 | - save each object into an `.rda` file with the same name 36 | - the `use_data()` function can take several objects; it creates the data folder and writes each object there as an `.rda` file, using the object names as file names. 
37 | 38 | ```r 39 | x <- sample(1000) 40 | usethis::use_data(x, mtcars) 41 | ``` 42 | 43 | --> leads to data/x.rda and data/mtcars.rda 44 | 45 | - DESCRIPTION: `LazyData: true` --> data will be lazily loaded --> doesn't occupy memory until used 46 | - is the default when using `usethis::create_package()` 47 | 48 | ## raw data 49 | 50 | - data included in the package is often a cleaned version of some raw data 51 | - recommended: include the raw data + the code used to clean it in the source version of the package 52 | - makes it easy to update and reproduce the packaged data 53 | 54 | - this code can go in a data-raw/ folder 55 | - isn't needed in the bundled version of the package --> add it to .Rbuildignore. 56 | 57 | - usethis wizardry does it all for you: 58 | ```r 59 | usethis::use_data_raw() 60 | ``` 61 | - input should be name of dataset --> a string in " " 62 | 63 | # Documenting data 64 | 65 | - objects in data/ always have to be documented! 66 | - similar to documenting functions 67 | - can't write Roxygen documentation "into" the dataset 68 | - instead: write it in an R-file in R/ with the same name as the data 69 | - like function documentation: #', first paragraph = title, second paragraph = description 70 | 71 | ## example documentation (from `ggplot2`): 72 | \scriptsize 73 | ```{r} 74 | #' Prices of 50,000 round cut diamonds. 75 | #' 76 | #' A dataset containing the prices and other attributes of almost 54,000 77 | #' diamonds. 78 | #' 79 | #' @format A data frame with 53940 rows and 10 variables: 80 | #' \describe{ 81 | #' \item{price}{price, in US dollars} 82 | #' \item{carat}{weight of the diamond, in carats} 83 | #' ... 
84 | #' } 85 | #' @source \url{http://www.diamondse.info/} 86 | "diamonds" 87 | ``` 88 | - `@format`: overview of the dataset, description of variables and their units + `@source`: where you got the data from 89 | - DON'T `@export` your data 90 | 91 | ## internal data 92 | 93 | - sometimes functions need "invisible" pre-computed data tables 94 | - save these in R/sysdata.rda 95 | - example: munsell uses R/sysdata.rda to store large tables of colour data 96 | 97 | - `usethis::use_data()` to create this file with the argument `internal = TRUE`: 98 | 99 | ```{r} 100 | x <- sample(1000) 101 | usethis::use_data(x, mtcars, internal = TRUE) 102 | ``` 103 | 104 | - code used to prepare this --> data-raw/ 105 | - Objects in R/sysdata.rda are not exported --> don’t need to be documented 106 | 107 | ## Raw data for the bundled package 108 | 109 | - if you want to show e.g. how to load raw data --> inst/extdata 110 | - all files in inst/ move up one level to the top-level directory when the package is installed 111 | - to refer to files in inst/extdata (whether installed or not), use system.file() 112 | - readr package uses inst/extdata to store delimited files for use in examples: 113 | 114 | \scriptsize 115 | ```{r} 116 | system.file("extdata", "mtcars.csv", package = "readr") 117 | #> [1] "/Users/runner/work/_temp/Library/readr/extdata/mtcars.csv" 118 | 119 | ``` 120 | - by default, if the file does not exist, system.file() does not return an error - it just returns the empty string 121 | - argument mustWork = TRUE --> error message if file doesn't exist 122 | 123 | # Exercise! 124 | 125 | - create a small dataset, save it with `usethis::use_data()` and try `usethis::use_data_raw()` 126 | - document the data in an R-file within the R-folder (same name as dataset!) 
127 | - remember to use `@format` with 128 | 129 | \scriptsize 130 | ```r 131 | \describe{ 132 | \item {variable}{unit} 133 | } 134 | 135 | # example data: 136 | ceram <- data.frame(c("A","B","C"), c(10,5,2), c(10.5,2.6,3.4)) 137 | colnames(ceram) <-c("sites", "n_types", "ha") 138 | ``` 139 | 140 | -------------------------------------------------------------------------------- /slides/07_slides_advanced_topics.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Advanced Topics" 3 | author: "Clemens Schmid" 4 | output: 5 | beamer_presentation: 6 | slide_level: 2 7 | theme: "Singapore" 8 | editor_options: 9 | chunk_output_type: console 10 | --- 11 | 12 | ```{r setup, include=FALSE} 13 | #knitr::opts_chunk$set(echo = FALSE) 14 | ``` 15 | 16 | ## Git 17 | 18 | **https://r-pkgs.org/git.html** 19 | 20 | Git is a version control system that documents all changes to files in a directory 21 | 22 | - Documentation of all steps in the development process 23 | - Browsing and rolling back to old stages 24 | - Safe collaboration on the same files 25 | 26 | ```r 27 | ?usethis::use_git() 28 | ``` 29 | 30 | ## Github 31 | 32 | **https://github.com** 33 | 34 | Github is an online platform to store, publish and manage Git projects 35 | 36 | - Free and easy "backup" and publication 37 | - Low lock-in, because based on git 38 | - Manage collaborative working: Issues, Pull Requests 39 | - Well implemented interaction with many coding web services 40 | - Social media features: A community of coding archaeologists 41 | 42 | ```r 43 | ?usethis::use_github() 44 | ``` 45 | 46 | ## Vignettes 47 | 48 | **https://r-pkgs.org/vignettes.html** 49 | 50 | Vignettes are a special part of the R package documentation, that explains the workflows a package supports 51 | 52 | ```r 53 | browseVignettes() 54 | vignette(topic = "introduction", package = "cowplot") 55 | ``` 56 | 57 | - Free form: Tutorial/Blog post/Book chapter/Paper 58 | - Less 
technical, more focussed on applications and use cases 59 | - Usually written in R Markdown 60 | 61 | ```r 62 | ?usethis::use_vignette() 63 | ``` 64 | 65 | ## Unit testing 66 | 67 | **https://r-pkgs.org/tests.html** 68 | 69 | A unit test is test code to check if a function returns what you expect given a certain input 70 | 71 | - Guarantees correctness of complex functions by testing them and their smaller parts 72 | - Facilitates refactoring and prevents you from accidentally breaking things 73 | - Test code serves as a concrete example for how your functions can be used 74 | 75 | ```r 76 | ?usethis::use_testthat() 77 | ``` 78 | 79 | ## CI 80 | 81 | **https://github.com/r-lib/actions** 82 | 83 | Continuous integration means automatic testing and checking of code changes. Multiple companies offer free web services for open source projects (Travis-CI, GitLab, GitHub) 84 | 85 | - Each change to your project triggers a full test 86 | - Clean, virtual environments and multiple operating systems 87 | - Not just for testing, but also for any other operations (e.g. deploying a static website) 88 | - Should be used sparingly to save energy 89 | 90 | ```r 91 | ?usethis::use_github_action 92 | ``` 93 | 94 | ## OOP and custom types 95 | 96 | **https://adv-r.hadley.nz/oo.html** 97 | 98 | R supports multiple different object-oriented programming systems to implement packages with specific needs for their data types 99 | 100 | - Competing systems: S3, S4, RC, R6, R.oo, proto, ? 
101 | - Many packages rely on custom S3, S4 or R6 classes 102 | - Many functions in these packages are class methods, so they only work with specific input data types or work explicitly different for different input types 103 | - base R relies extensively on S3 classes 104 | 105 | ## OOP and custom types 106 | 107 | ```r 108 | ?as.data.frame 109 | ``` 110 | 111 | ```r 112 | ## S3 method for class 'list' 113 | as.data.frame( 114 | x, row.names = NULL, optional = FALSE, ..., 115 | cut.names = FALSE, col.names = names(x), 116 | fix.empty.names = TRUE, 117 | stringsAsFactors = default.stringsAsFactors() 118 | ) 119 | 120 | ## S3 method for class 'matrix' 121 | as.data.frame( 122 | x, row.names = NULL, optional = FALSE, 123 | make.names = TRUE, ..., 124 | stringsAsFactors = default.stringsAsFactors() 125 | ) 126 | ``` 127 | 128 | ## Compiled code 129 | 130 | **https://r-pkgs.org/src.html** 131 | 132 | R packages can incorporate code from compiled languages to speed up processes 133 | 134 | - C, C++, Fortran, ... 
135 | - Orders-of-magnitude performance increases 136 | - Steep learning curve 137 | - C++ with Rcpp (https://adv-r.hadley.nz/rcpp.html) 138 | 139 | ```r 140 | ?usethis::use_rcpp() 141 | ``` 142 | 143 | ## Releasing a package to CRAN 144 | 145 | **https://r-pkgs.org/release.html** 146 | 147 | At the very end of the initial package development process you can consider a submission to CRAN 148 | 149 | - Intensive checks on different test systems 150 | - The CRAN submission process 151 | - There are serious alternatives to a CRAN submission (https://ropensci.org/r-universe) 152 | 153 | `?devtools::release()` 154 | 155 | -------------------------------------------------------------------------------- /slides/08_slides_wrap_up.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Wrap up and final discussion" 3 | author: "Sophie Schmidt, Clemens Schmid and Petr Pajdla" 4 | output: 5 | beamer_presentation: 6 | slide_level: 2 7 | theme: "Singapore" 8 | includes: 9 | in_header: preamble.tex 10 | editor_options: 11 | chunk_output_type: console 12 | --- 13 | 14 | ```{r setup, include=FALSE} 15 | knitr::opts_chunk$set(eval = FALSE) 16 | ``` 17 | 18 | # Recap 19 | 20 | ## Recap 21 | 22 | - Adopting package development techniques is good for: 23 | 24 | - future-self 25 | - the community (reproducibility etc.) 26 | 27 | - Don't be afraid to use the *wizards*, they are here to help! 28 | 29 | - `devtools` 30 | - `usethis` 31 | - `roxygen2` 32 | 33 | ## The basic `R` package directory tree 34 | 35 | \scriptsize 36 | ``` 37 | . 38 | |--- DESCRIPTION 39 | |--- NAMESPACE 40 | |--- mypackage.Rproj 41 | |--- R 42 | | |--- myfunction.R 43 | | |--- myotherfunction.R 44 | |--- man 45 | | |--- myfunction.Rd 46 | | |--- myotherfunction.Rd 47 | ``` 48 | 49 | ## Package structure after fluffing it up 50 | 51 | \scriptsize 52 | ``` 53 | . 
54 | |--- DESCRIPTION 55 | |--- NAMESPACE 56 | |--- mypackage.Rproj 57 | |--- R 58 | | |--- myfunction.R 59 | | |--- myotherfunction.R 60 | |--- man 61 | | |--- myfunction.Rd 62 | | |--- myotherfunction.Rd 63 | |--- README.md 64 | |--- LICENSE 65 | |--- LICENSE.md 66 | |--- inst 67 | | |--- CITATION 68 | ``` 69 | 70 | # Going further 71 | 72 | ## Where to learn more I 73 | 74 | - Official `CRAN` **Writing R Extensions** manual 75 | 76 | - [https://cran.r-project.org/doc/manuals/R-exts.html](https://cran.r-project.org/doc/manuals/R-exts.html) 77 | - very long and thorough 78 | 79 | \begin{center} 80 | \includegraphics[width=300px]{"../figures/Screenshot_rextensions.png"} 81 | \end{center} 82 | 83 | ## Where to learn more II 84 | 85 | - **R Packages** book by Hadley Wickham and Jenny Bryan 86 | 87 | - online version at **[https://r-pkgs.org/](https://r-pkgs.org/)** 88 | 89 | \begin{center} 90 | \includegraphics[width=120px]{"../figures/Screenshot_rpackages.png"} 91 | \end{center} 92 | 93 | ## Where to get help? 94 | 95 | - Package development **cheat sheet** 96 | 97 | - [https://www.rstudio.com/resources/cheatsheets/](https://www.rstudio.com/resources/cheatsheets/) 98 | 99 | \begin{center} 100 | \includegraphics[width=300px]{"../figures/Screenshot_pckgdevcheatsheet.png"} 101 | \end{center} 102 | 103 | - and of course **help files** and package **vignettes**... 104 | 105 | ## Questions and general discussion 106 | 107 | Don't forget to check out session 108 | 109 | - **S17 Tools for the Revolution:** 110 | 111 | Developing packages for scientific programming in archaeology 112 | 113 | - organized by Joe Roe, Martin Hinz and Clemens Schmid 114 | - Wednesday, **June 16th 10:50 - 14:20** EET 115 | 116 | For materials from the workshop, see 117 | 118 | [https://github.com/sslarch/caa2021_Rpackage_workshop](https://github.com/sslarch/caa2021_Rpackage_workshop) 119 | 120 | ## 121 | 122 | Please give us some feedback! 
123 | 124 | [https://forms.gle/MzTsMntBxtCCmge1A](https://forms.gle/MzTsMntBxtCCmge1A) 125 | 126 | \vfill 127 | 128 | \centering 129 | \Large 130 | **Thank you for attending the workshop!** 131 | 132 | 133 | \flushright 134 | \normalsize 135 | *Sophie, Clemens and Petr* -------------------------------------------------------------------------------- /slides/preamble.tex: -------------------------------------------------------------------------------- 1 | \setbeamertemplate{navigation symbols}{} 2 | \setbeamertemplate{footline}[page number] --------------------------------------------------------------------------------