├── .github
    └── FUNDING.yml
├── .gitignore
└── README.md


/.github/FUNDING.yml:
--------------------------------------------------------------------------------
# These are supported funding model platforms

patreon: bryanjenks
github: tallguyjenks
custom: ["https://www.buymeacoffee.com/tallguyjenks", "https://www.paypal.me/tallguyjenks"]


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
.Rproj.user
.Rhistory
.RData
.Ruserdata


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Table Of Contents

* [Setup](#setup)
    - [Step 1 - New Package](#Create-a-new-package-file)
    - [Step 2 - DESCRIPTION](#Fill-out-description-file)
    - [Step 3 - Package Loading](#use-the-package-loading-script)
    - [Step 4 - TODO Management](#todo-management)
    - [Step 5 - Data/](#create-data-directory)
    - [Step 6 - .Rbuildignore](#Update-Rbuildignore)
    - [Step 7 - Ethics](#ethics)
* [Analysis](#analysis)
    - [Step 1 - Writing .Rmd](#Begin-Writing-Your-Content)
    - [Step 2 - Visualization](#visualization)
    - [Step 3 - New Functions](#Create-New-R-Function-as-needed)
    - [Step 4 - Unit Tests](#write-unit-tests)
    - [Step 5 - Iteration](#test-fix-iterate)
    - [Step 6 - Documentation](#Document-Completed-R-Functions)
    - [Step 7 - Documentation Compilation](#Compile-Your-Documentation)
* [Modeling](#modeling)
* [References](#references)
* [Reproducability](#reproducability)
* [Fine Tuning](#fine-tuning)
* [Clean Code](#clean-code)
    - [Step 1 - Devtools Check](#devtools-check)
    - [Step 2 - Lintr](#lintr)
    - [Step 3 - README](#a-readme)
    - [Step 4 - Issue Templates](#issue-templates)
    - [Step 5 - CHANGELOG](#changelog)
* [Tips](#tips)

# SETUP

[Return To Table Of Contents](#table-of-contents)

## Create a new package file

[Return To Table Of Contents](#table-of-contents)

> File --> New Project --> New/Existing Directory --> R Package

## Fill out description file

[Return To Table Of Contents](#table-of-contents)

```yaml
Package: workflow
Title: A Robust workflow for software driven data analysis
Version: 0.0.1
Authors@R:
    person(given = "Bryan",
           family = "Jenks",
           role = c("aut", "cre"),
           email = "bryanjenks@protonmail.com",
           comment = c(ORCID = "0000-0002-9604-3069"))
Description: A Robust document for discussing a great way to structure analysis.
License: MIT + file LICENSE # The license can be written out in the 'LICENSE' File
Encoding: UTF-8
LazyData: true
# Roxygen: list(markdown = TRUE) if you want markdown support for the documentation use this option
```


## Use the package loading script

[Return To Table Of Contents](#table-of-contents)

`xfun::pkg_attach2()` loops over a vector of package names, installs any that are not already installed, and attaches all of them so they are available to the RMarkdown document.

```r
packages <- c("tidyverse", "here", "todor", "lintr", "DT", "kableExtra", "roxygen2", "testthat", "usethis", "devtools", "tidylog")
xfun::pkg_attach2(packages, message = FALSE)
```

If you are performing a reproducible analysis, use `packrat` to snapshot the packages and libraries your project depends on.

```r
# set up a packrat snapshot in your new package/project
packrat::init(here::here())
# to add a package to your project in this snapshot environment, install it as normal:
install.packages("runes")
# when you're ready to save your snapshot to packrat for your reproducible project:
packrat::snapshot()
# to check the status of your snapshot
packrat::status()
# to remove a package from your snapshot
remove.packages("runes")
# and to restore one
packrat::restore()
# if packages are not used:
# Use packrat::clean() to remove them. Or, if they are actually needed
# by your project, add `library(packagename)` calls to a .R file
# somewhere in your project.
```

There are also plenty of GUI options for working with `packrat` in RStudio.

## TODO management

[Return To Table Of Contents](#table-of-contents)

If you have multiple files or a large RMarkdown document, leave commented TODO items in them, and want to see where all of them are, use the `todor` package with the following snippet:

```r
# Create a vector of document paths in the current directory
# (combine with the `here` package for project-relative paths)
# This is great for multiple R markdown documents
docs <- dir(pattern = "\\.Rmd$")
todor::todor(file = docs)

# A less hacky way of checking a whole PACKAGE for TODOs is the built-in function:
todor::todor_package()
```

## Create Data directory

[Return To Table Of Contents](#table-of-contents)

Create the `Data/` directory to hold raw data files that will be cleaned and processed by `R` scripts in the `R/` directory when they are sourced for the RMarkdown document.

To save tibbles or other data from `R` that have already been tidied without losing their column specifications (for example, that a column is a factor), use the _{feather}_ package.

```r
library(feather)
# `df` and the file path are placeholders for your own data frame and location
# write_feather(x, path): save a data frame to a .feather file
feather::write_feather(df, "Data/df.feather")
# read_feather(path): read it back with the column types intact
df <- feather::read_feather("Data/df.feather")
```

## Update Rbuildignore

[Return To Table Of Contents](#table-of-contents)

When building a package for installation and reproducibility, list in `.Rbuildignore` the files, directories, and other items that the build process should not touch.

**OPTIONAL**

If you keep the package under `git` version control, also update the `.gitignore`.


## Ethics

[Return To Table Of Contents](#table-of-contents)

"`deon` is a command line tool that allows you to easily add an ethics checklist to your data science projects. The conversation about ethics in data science, machine learning, and AI is increasingly important. The goal of `deon` is to push that conversation forward and provide concrete, actionable reminders to the developers that have influence over how data science gets done."

[deon](https://github.com/drivendataorg/deon)

[![Deon badge](https://img.shields.io/badge/ethics%20checklist-deon-brightgreen.svg?style=popout-square)](http://deon.drivendata.org/)

# ANALYSIS

[Return To Table Of Contents](#table-of-contents)

## Begin Writing Your Content

[Return To Table Of Contents](#table-of-contents)

In your RMarkdown document you can begin filling in your content with whatever template or style you prefer. There are many ways to convey the results and workflow of your analysis: a package, a single standalone RMarkdown document, a bookdown book, HTML-only output, and so on. This document covers some of the more common parts of the workflow and leaves the nuances to personalization and preference.

Never use `require()` or `library()` in a packaged analysis; list the packages under `Imports` or `Suggests` in the DESCRIPTION file instead.

For local file management in the `.Rproj` project directory, I and many others prefer the `here` package, which treats the project root as the base directory so that other files in your package can be referenced with relative paths.

## Visualization

[Return To Table Of Contents](#table-of-contents)

Three great RStudio addins for creating and editing plots graphically without having to type all the code from scratch:

- `esquisse` --- Initial plot creation to minimize boilerplate writing
- `ggedit` --- Editing created plots graphically
- `colourpicker` --- Custom color code pickers for themes and general use

## Create New R Function as needed

[Return To Table Of Contents](#table-of-contents)

Put functions into separate `R` script files in `R/`. If there are a lot of functions, name the files with a shared prefix convention that groups them, e.g. `AAA_Function.R`.

## Write Unit Tests

[Return To Table Of Contents](#table-of-contents)

To start using unit tests, run `usethis::use_testthat()` (older versions of devtools exposed this as `devtools::use_testthat()`).

To run all current tests, press `Ctrl + Shift + T` or run `devtools::test()`.

## Test Fix Iterate

[Return To Table Of Contents](#table-of-contents)

Run your tests on the functions you are developing and fix any **ERRORS**, **WARNINGS**, or **NOTES** that come up.

To find answers to your errors you can use the [`tracestack`](https://github.com/dgrtwo/tracestack) package to search for the last error message on Stack Overflow.

## Document Completed R Functions

[Return To Table Of Contents](#table-of-contents)

Use `roxygen2` documentation on all function script files in `R/`.

- **First line:** Title
- **Second line:** Description
- **Subsequent lines:** Details

[A link to Cheat Sheet Documentation](https://roxygen2.r-lib.org/articles/rd-formatting.html#introduction)

Bare Bones Template:

```r
#' @title # This is the name of your function
#' @description # This is a good explanation of your function
#' @details # This is each granular detail of your function (there can be multiple of these sections)
#' @param # This is a parameter of your function
#' @return # This is what your function returns
#' @export # This is how your function gets exported to the NAMESPACE and is available for use after library() otherwise you use :::
```

[Documentation Info](http://r-pkgs.had.co.nz/man.html)

## Compile Your Documentation

[Return To Table Of Contents](#table-of-contents)

Run `devtools::document()` (or press `Ctrl + Shift + D` in RStudio) to compile your roxygen comments into the function documentation that appears in the `man/` directory and into the NAMESPACE file that lists all `@export` functions.

# MODELING

Beyond just fitting with `lm()`, you can create a model object with `model <- lm(var1 ~ var2 + var4, data)` and then pass that model object to

```r
performance::check_model(model)
```

The output is graphical and very useful, though it is a bit slow.



# REFERENCES

[Return To Table Of Contents](#table-of-contents)

Writing a bibliography for your R packages:

```r
# automatically create a bib database for R packages
knitr::write_bib(c(
  .packages(), packages # `packages` is the vector made in the package loading section
), 'packages.bib')
```

In the `yaml` portion of the RMarkdown document you can use a `yaml` array to list multiple `.bib` files: one solely for the R packages, generated by the code chunk above, and another for any other cited sources you compile manually or otherwise. Like so:

```yaml
bibliography: [cited.bib, packages.bib]
```

For packages, you can use this `yaml` trick to have all non-inline citations, i.e. the `R` packages used, cited at the end of the document:

```yaml
nocite: '@*'
```

You can also use the {`citr`} package, which provides an RStudio addin for citations. In newer versions of RStudio this integrates seamlessly with Zotero.

# REPRODUCABILITY

[Return To Table Of Contents](#table-of-contents)

One of the most important parts of science and academia is the ability for research or conclusions to be reproduced. People shouldn't be left wondering what software you were using or which versions you were running. One way of capturing this information is to write your session info to a text document.

```r
writeLines(capture.output(sessionInfo()), "sessionInfo.txt")
```

# FINE TUNING

[Return To Table Of Contents](#table-of-contents)

With the `profvis` package you can select a chunk of code in RStudio, click the profile option, and see which parts of the selected code are resource intensive so you can tune them to be more efficient. Good background on this is available [HERE](https://resources.rstudio.com/rstudio-conf-2017/understand-code-performance-with-the-profiler-winston-chang).
Another tool, for comparing variations of a function to see which is the most efficient, is the `microbenchmark` package.

```r
# this defaults to 100 iterations of each passed expression and benchmarks them,
# showing which of the different variations is the most efficient.
# func1(), func2() and x are placeholders for your own function variations and input
microbenchmark::microbenchmark(func1(x), func2(x))
```

# CLEAN CODE

[Return To Table Of Contents](#table-of-contents)

## Devtools Check

[Return To Table Of Contents](#table-of-contents)

To check whether your package is installable and ready for distribution, use:

`devtools::check()`, or press `Ctrl + Shift + E` in RStudio, to check your package for ERRORS, WARNINGS, or NOTES.

## Lintr

[Return To Table Of Contents](#table-of-contents)

Use `lintr` for linting your R code.

```r
# Good suggestions for making legible and consistently formatted code
lintr::lint_package()
```

## A README

[Return To Table Of Contents](#table-of-contents)

Use a `README.md` file for GitHub or for general user information. If you wish, keep an `.Rmd` document that compiles to Markdown; it can explain the package in medium-to-long form so the user knows how to reproduce the analysis or use the package.
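
A minimal sketch of the `.Rmd`-to-Markdown README approach described above. It assumes the `usethis` and `devtools` packages from the package loading step are installed and that it is run from the package root; treat it as one possible setup, not the only way to do it.

```r
# One-time setup: create README.Rmd from a template and add it to
# .Rbuildignore so it is left out of the built package
usethis::use_readme_rmd()

# After editing README.Rmd, knit it to README.md
devtools::build_readme()
```

With this setup you edit only `README.Rmd`; `README.md` is generated output and is overwritten on each build.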

## Issue Templates

[Return To Table Of Contents](#table-of-contents)

[IBM Watson](https://github.com/IBM-Watson/design-guide/wiki/Issue-Label-Style-Guide_)

## CHANGELOG

[Return To Table Of Contents](#table-of-contents)

Use `NEWS.md` as the CHANGELOG for your package.

I start all changelogs with the semantic versioning conventions from the 'Keep a Changelog' project. A snippet of such a changelog might look like this:

```markdown
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Summary

> Given a version number MAJOR.MINOR.PATCH, increment the:
>
> 1. MAJOR version when you make incompatible API changes,
> 2. MINOR version when you add functionality in a backwards compatible manner, and
> 3. PATCH version when you make backwards compatible bug fixes.
>
> Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.

### Guiding Principles

> Changelogs are for _humans_, not machines.
> There should be an entry for every single version.
> The same types of changes should be grouped.
> Versions and sections should be linkable.
> The latest version comes first.
> The release date of each version is displayed.
> Mention whether you follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

### Types of changes

> Added for new features.
> Changed for changes in existing functionality.
> Deprecated for soon-to-be removed features.
> Removed for now removed features.
> Fixed for any bug fixes.
> Security in case of vulnerabilities.

## [Unreleased]

### Added

-

### Changed

-

### Deprecated

-

### Removed

-

### Fixed

-

### Security

-

## [0.0.1] - 2020-01-22

### Added

- Core functionality of a-z alphabet translation to Elder Futhark
- all special characters, numeric, or otherwise all ignored as pass through
- New function `runes_table()` create a 3 column data frame with the unicode sequence, transcription, and character of the entire Elder Futhark alphabet to be used for inputting into documents, reference, or any other purpose

### Changed

- new parameter to `runes()` `hide=FALSE` is now the default option but when set to true, the English 'x' & 'q' characters will not pass through the function and appear at all since there is no equivalent rune.
    + added unit tests to support this new parameter option

### Deprecated

-

### Removed

-

### Fixed

-

### Security

-


```

# TIPS

[Return To Table Of Contents](#table-of-contents)

- Using `# TEXT -----` inside an R code chunk adds it to the RStudio document outline (table of contents) for the RMarkdown document
- Good websites for [RDocumentation](https://www.rdocumentation.org/) and searching for [R Resources](https://rseek.org/)
- The lubridate date operators `%m+%` and `%m-%` add or subtract a period from a date without rolling past the end of the month, e.g. Jan 31 plus one month gives Feb 28/29 rather than an invalid Feb 31
- If moving data between languages, use the feather package and `.feather` files to make the interchange



To be expounded on later:

Template package, project, and document templates for my analyses

- reference: https://github.com/ledbettc/CIDAtools
- reference: https://github.com/atlas-aai/ratlas

--------------------------------------------------------------------------------