├── .Rbuildignore ├── .github ├── CONTRIBUTING.md ├── issue_template.md └── pull_request_template.md ├── .gitignore ├── .travis.yml ├── CONDUCT.md ├── DESCRIPTION ├── LICENSE ├── NAMESPACE ├── NEWS.md ├── R ├── cacheDirectories.R ├── deleteCache.R ├── examples │ └── example.R ├── listCaches.R ├── loadCaches.R ├── sharedCaches.R ├── simpleCache.R ├── storeCache.R └── utility.R ├── README.md ├── _pkgdown.yaml ├── cran-comments.md ├── inst ├── cache │ └── existingCache.RData └── templates │ └── slurm-advanced.tmpl ├── man ├── addCacheSearchEnvironment.Rd ├── deleteCaches.Rd ├── dot-tooOld.Rd ├── getCacheDir.Rd ├── listCaches.Rd ├── loadCaches.Rd ├── resetCacheSearchEnvironment.Rd ├── secToTime.Rd ├── setCacheBuildDir.Rd ├── setCacheDir.Rd ├── setSharedCacheDir.Rd ├── simpleCache-package.Rd ├── simpleCache.Rd ├── simpleCacheGlobal.Rd ├── simpleCacheOptions.Rd ├── simpleCacheShared.Rd ├── simpleCacheSharedGlobal.Rd ├── storeCache.Rd ├── tic.Rd └── toc.Rd ├── paper ├── paper.bib └── paper.md ├── simpleCache.Rproj ├── tests ├── testthat.R └── testthat │ ├── helper-lifespan.R │ ├── test_all.R │ └── test_cache_lifespan.R └── vignettes ├── clusterCaches.Rmd ├── sharingCaches.Rmd └── simpleCacheIntroduction.Rmd /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^.*\.Rproj$ 2 | ^\.Rproj\.user$ 3 | ^\.travis\.yml$ 4 | ^cran-comments.md$ 5 | ^CONDUCT\.md$ 6 | ^paper$ 7 | .github 8 | ^_pkgdown\.yaml$ 9 | ^doc$ 10 | ^Meta$ 11 | -------------------------------------------------------------------------------- /.github/CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # CONTRIBUTING # 2 | 3 | ### Please contribute! 4 | 5 | We love collaboration. 6 | 7 | ### Bugs? 
8 | 9 | * Submit an issue on the [Issues page](https://github.com/databio/simpleCache/issues) 10 | 11 | ### Code contributions 12 | 13 | * Fork this repo to your GitHub account 14 | * Clone your fork down to your machine, e.g., `git clone https://github.com/<your-username>/simpleCache.git` 15 | * Track progress upstream (i.e., on our version of `simpleCache` at `databio/simpleCache`) by running `git remote add upstream https://github.com/databio/simpleCache.git`. Before making changes, pull in changes from upstream, either with `git fetch upstream` followed by a later merge, or with `git pull upstream master` to fetch and merge in one step 16 | * Make your changes (bonus points for making changes on a new feature branch) 17 | * Push your changes up to your fork 18 | * Submit a pull request to home base (likely master branch, but check to make sure) at `databio/simpleCache` 19 | 20 | ### Thanks for contributing! -------------------------------------------------------------------------------- /.github/issue_template.md: -------------------------------------------------------------------------------- 1 | 2 | 3 |
Session Info 4 | 5 | ```r 6 | 7 | ``` 8 |
-------------------------------------------------------------------------------- /.github/pull_request_template.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | ## Description 4 | 5 | 6 | ## Related Issue 7 | 10 | 11 | ## Example 12 | 14 | 15 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Compiled source # 2 | ################### 3 | *.com 4 | *.class 5 | *.dll 6 | *.exe 7 | *.o 8 | *.so 9 | 10 | # Packages # 11 | ############ 12 | # it's better to unpack these files and commit the raw source 13 | # git has its own built in compression methods 14 | *.7z 15 | *.dmg 16 | *.gz 17 | *.iso 18 | *.jar 19 | *.rar 20 | *.tar 21 | *.zip 22 | 23 | # Logs and databases # 24 | ###################### 25 | *.log 26 | *.sql 27 | *.sqlite 28 | 29 | # IDEs # 30 | ######## 31 | .idea/ 32 | 33 | # OS generated files # 34 | ###################### 35 | .DS_Store 36 | .DS_Store? 37 | ._* 38 | .Spotlight-V100 39 | .Trashes 40 | ehthumbs.db 41 | Thumbs.db 42 | 43 | # Gedit temporary files # 44 | ######################### 45 | 46 | *~ 47 | ~ 48 | .Rproj.user 49 | .Rhistory 50 | /doc/ 51 | /Meta/ 52 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: R 2 | sudo: false 3 | cache: packages 4 | r_packages: batchtools -------------------------------------------------------------------------------- /CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Code of Conduct 2 | 3 | As contributors and maintainers of this project, we pledge to respect all people who 4 | contribute through reporting issues, posting feature requests, updating documentation, 5 | submitting pull requests or patches, and other activities. 
6 | 7 | We are committed to making participation in this project a harassment-free experience for 8 | everyone, regardless of level of experience, gender, gender identity and expression, 9 | sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion. 10 | 11 | Examples of unacceptable behavior by participants include the use of sexual language or 12 | imagery, derogatory comments or personal attacks, trolling, public or private harassment, 13 | insults, or other unprofessional conduct. 14 | 15 | Project maintainers have the right and responsibility to remove, edit, or reject comments, 16 | commits, code, wiki edits, issues, and other contributions that are not aligned to this 17 | Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed 18 | from the project team. 19 | 20 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by 21 | opening an issue or contacting one or more of the project maintainers. 22 | 23 | This Code of Conduct is adapted from the Contributor Covenant 24 | (http://contributor-covenant.org), version 1.0.0, available at 25 | http://contributor-covenant.org/version/1/0/0/ 26 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: simpleCache 2 | Version: 0.4.2 3 | Date: 2021-04-16 4 | Title: Simply Caching R Objects 5 | Description: Provides intuitive functions for caching R objects, encouraging 6 | reproducible, restartable, and distributed R analysis. The user selects a 7 | location to store caches, and then provides nothing more than a cache name 8 | and instructions (R code) for how to produce the R object.
Also 9 | provides some advanced options like environment assignments, recreating or 10 | reloading caches, and cluster compute bindings (using the 'batchtools' 11 | package), making it flexible enough for use in large-scale data analysis 12 | projects. 13 | Authors@R: c(person("VP", "Nagraj", email = "vpnagraj@virginia.edu", role = 14 | c("aut")), person("Nathan", "Sheffield", email = "nathan@code.databio.org", 15 | role = c("aut", "cre"))) 16 | Suggests: 17 | knitr, 18 | rmarkdown, 19 | testthat 20 | Enhances: batchtools 21 | VignetteBuilder: knitr 22 | License: BSD_2_clause + file LICENSE 23 | Encoding: UTF-8 24 | URL: https://github.com/databio/simpleCache 25 | BugReports: https://github.com/databio/simpleCache/issues 26 | RoxygenNote: 7.1.1 27 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | YEAR: 2017 2 | COPYRIGHT HOLDER: Nathan Sheffield -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | export(addCacheSearchEnvironment) 4 | export(deleteCaches) 5 | export(getCacheDir) 6 | export(listCaches) 7 | export(loadCaches) 8 | export(resetCacheSearchEnvironment) 9 | export(setCacheBuildDir) 10 | export(setCacheDir) 11 | export(setSharedCacheDir) 12 | export(simpleCache) 13 | export(simpleCacheGlobal) 14 | export(simpleCacheOptions) 15 | export(simpleCacheShared) 16 | export(simpleCacheSharedGlobal) 17 | export(storeCache) 18 | -------------------------------------------------------------------------------- /NEWS.md: -------------------------------------------------------------------------------- 1 | # Change log 2 | All notable changes to this project will be documented in this file.
3 | 4 | ## simpleCache [0.4.2] -- 2021-04-16 5 | 6 | - updates to accommodate latest knitr for vignettes 7 | 8 | ## simpleCache [0.4.1] -- 2019-02-26 9 | 10 | - fixes unit tests on windows 11 | - fixes lifespan bug that used creation time instead of modification time 12 | - allow arg-less directory setting to default to current working dir 13 | 14 | ## simpleCache [0.4.0] -- 2017-12-20 15 | 16 | - adds a lifespan arg to simpleCache() to create auto-expiring caches 17 | - remove unnecessary parse argument to simpleCache() 18 | - viewCacheDirs() renamed to simpleCacheOptions() 19 | 20 | ## simpleCache [0.3.1] -- 2017-08-21 21 | 22 | - fixed a bug in unit tests that left behind a test cache in user home dir. 23 | - changes cache building to happen in parent.frame() 24 | - repaired vignette so R code is displayed properly 25 | - added deleteCaches() function and docs 26 | - reduced size of unit test cache for speed increase 27 | 28 | ## simpleCache [0.3.0] -- 2017-08-21 29 | 30 | - switched default cache dir to tempdir() 31 | - changed availCaches() to listCaches() 32 | - changes cache building to happen in globalenv(), so that any loaded 33 | packages are available for cache building 34 | 35 | 36 | ## simpleCache [0.2.1] -- 2017-07-30 37 | 38 | - added examples 39 | 40 | ## simpleCache [0.2.0] -- 2017-07-30 41 | 42 | - support for batchjobs parallel processing 43 | - docs, prep for submission to CRAN 44 | 45 | ## simpleCache [0.0.1] 46 | 47 | - long-term stable version 48 | -------------------------------------------------------------------------------- /R/cacheDirectories.R: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | # CACHE DIRECTORY FUNCTIONS 3 | ################################################################################ 4 | # These are exported functions for interacting with global variables that 5 | # specify default directories for 
2 cache types: project caches and shared 6 | # caches. 7 | 8 | #' Sets a global variable specifying the default cache directory for 9 | #' \code{\link{simpleCache}} calls. 10 | #' 11 | #' @param cacheDir Directory where caches should be stored (defaults to the current working directory) 12 | #' @export 13 | #' @example 14 | #' R/examples/example.R 15 | setCacheDir = function(cacheDir=NULL) { .setDir("RCACHE.DIR", cacheDir) } 16 | 17 | #' Fetcher of the currently set cache directory. 18 | #' 19 | #' \code{getCacheDir} retrieves the value of the option that stores the currently 20 | #' set cache directory path. 21 | #' 22 | #' @return If the option is set, the path to the currently set cache directory; otherwise, \code{NULL}. 23 | #' @export 24 | getCacheDir = function() { getOption("RCACHE.DIR") } 25 | 26 | #' Set shared cache directory 27 | #' 28 | #' Sets a global variable specifying the default cache directory for 29 | #' \code{\link{simpleCacheShared}} calls, which are a helper alias for caching 30 | #' results that will be used across projects. 31 | #' 32 | #' @param sharedCacheDir Directory where shared caches should be stored (defaults to the current working directory) 33 | #' @export 34 | setSharedCacheDir = function(sharedCacheDir=NULL) { .setDir("RESOURCES.RCACHE", sharedCacheDir) } 35 | 36 | #' Sets the local cache build directory, which holds scripts for building caches. 37 | #' 38 | #' @param cacheBuildDir Directory where build scripts are stored (defaults to the current working directory).
39 | #' @export 40 | setCacheBuildDir = function(cacheBuildDir=NULL) { .setDir("RBUILD.DIR", cacheBuildDir) } 41 | 42 | #' View simpleCache options 43 | #' 44 | #' Prints the current values of the simpleCache global options. 45 | #' @export 46 | simpleCacheOptions = function() { 47 | message("RESOURCES.RCACHE:\t", getOption("RESOURCES.RCACHE")) 48 | message("RCACHE.DIR:\t", getCacheDir()) 49 | message("RBUILD.DIR:\t", getOption("RBUILD.DIR")) 50 | message("SIMPLECACHE.ENV:\t", getOption("SIMPLECACHE.ENV")) 51 | } 52 | 53 | #' Add a cache search environment 54 | #' 55 | #' Add a new environment name (a character string) to the front of a global 56 | #' option which is a vector of such names. simpleCache searches all of these 57 | #' environments to check whether a cache is already loaded before reloading it. 58 | #' 59 | #' @param addEnv Name of an environment (a character string) to add to the cache search list 60 | #' @export 61 | addCacheSearchEnvironment = function(addEnv) { 62 | options(SIMPLECACHE.ENV=append(addEnv, getOption("SIMPLECACHE.ENV"))) 63 | } 64 | 65 | #' Sets global option of cache search environments to \code{NULL}. 66 | #' 67 | #' @export 68 | resetCacheSearchEnvironment = function() { 69 | options(SIMPLECACHE.ENV=NULL) 70 | } 71 | 72 | 73 | .setDir = function(optname, dirpath=NULL) { 74 | diropts = list(ifelse(is.null(dirpath), getwd(), dirpath)) 75 | names(diropts) = optname 76 | do.call(options, diropts) 77 | } 78 | -------------------------------------------------------------------------------- /R/deleteCache.R: -------------------------------------------------------------------------------- 1 | #' Deletes caches 2 | #' 3 | #' Given one or more cache names, this function attempts to delete the 4 | #' corresponding cache files on disk.
5 | #' @param cacheNames Name(s) of the cache(s) to delete 6 | #' @param cacheDir Directory where caches are kept 7 | #' @param force Force deletion without user prompt 8 | #' @export 9 | #' @example 10 | #' R/examples/example.R 11 | deleteCaches = function(cacheNames, cacheDir=getCacheDir(), 12 | force=FALSE) { 13 | 14 | if (force) { 15 | response = "y" 16 | } else { 17 | response = readline("Are you sure you want to delete the selected cache(s)? [y/N] ") 18 | } 19 | 20 | if (tolower(response) == "yes" || tolower(response) == "y") { 21 | for (cacheName in cacheNames) { 22 | cacheFile = file.path(cacheDir, paste0(cacheName, ".RData")) 23 | message("Deleting ", cacheFile) 24 | unlink(cacheFile) 25 | } 26 | } else { 27 | message("User aborted cache delete.") 28 | } 29 | } -------------------------------------------------------------------------------- /R/examples/example.R: -------------------------------------------------------------------------------- 1 | # choose location to store caches 2 | cacheDir = tempdir() 3 | cacheDir 4 | setCacheDir(cacheDir) 5 | 6 | # build some caches 7 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE) 8 | simpleCache("normSample", { rnorm(5e3, 0,1) }) 9 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE) 10 | 11 | # storing a cache after-the-fact 12 | normSample2 = rnorm(10, 0, 1) 13 | storeCache("normSample2") 14 | 15 | # what's available? 16 | listCaches() 17 | 18 | # load a cache 19 | simpleCache("normSample") 20 | 21 | # load multiple caches 22 | loadCaches(c("normSample", "normSample2"), reload=TRUE) 23 | -------------------------------------------------------------------------------- /R/listCaches.R: -------------------------------------------------------------------------------- 1 | #' Show available caches. 2 | #' 3 | #' Lists any cache files in the cache directory. 4 | #' 5 | #' @param cacheSubDir Optional parameter to specify a subdirectory of the cache folder.
6 | #' @return \code{character} vector in which each element is the name of a file that 7 | #' represents an available cache (within \code{getOption("RCACHE.DIR")}) 8 | #' @export 9 | #' @example 10 | #' R/examples/example.R 11 | listCaches = function(cacheSubDir="") { 12 | cacheDirFiles = list.files(file.path(getCacheDir(), cacheSubDir)) 13 | cacheDirFiles[endsWith(cacheDirFiles, ".RData")] 14 | } 15 | 16 | -------------------------------------------------------------------------------- /R/loadCaches.R: -------------------------------------------------------------------------------- 1 | #' Loads pre-made caches 2 | #' 3 | #' This function just takes a list of caches, and loads them. It's designed 4 | #' for stuff you already cached previously, so it won't build any caches. 5 | #' 6 | #' @param cacheNames Vector of caches to load. 7 | #' @param loadEnvir Environment into which to load each cache. 8 | #' @param ... Additional parameters passed to simpleCache. 9 | #' @export 10 | #' @example 11 | #' R/examples/example.R 12 | loadCaches = function(cacheNames, loadEnvir=NULL, ...) { 13 | if (is.null(loadEnvir)) { loadEnvir = parent.frame() } 14 | for (i in seq_along(cacheNames)) { 15 | # By default, load these caches into the environment that 16 | # calls loadCaches (the grandparent of each simpleCache call), 17 | # rather than into this function's own frame. 18 | simpleCache(cacheNames[i], loadEnvir=loadEnvir, ...) 19 | } 20 | } 21 | -------------------------------------------------------------------------------- /R/sharedCaches.R: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | # Helper aliases for common options 3 | 4 | #' Alias to default to a shared cache folder. 5 | #' 6 | #' Helper alias for caching across experiments/people.
7 | #' Just sets the cacheDir to the default SHARE directory 8 | #' (instead of the typical default PROJECT directory) 9 | 10 | #' 11 | #' @param ... Parameters passed to \code{\link{simpleCache}}. 12 | #' @export 13 | simpleCacheShared = function(...) { 14 | # Because simpleCache is called from inside this wrapper, loadEnvir must be set here; 15 | # otherwise, by default, simpleCache would load the data into *this* environment, 16 | # which is promptly discarded when the wrapper returns. The downside is that 17 | # this function cannot take a custom loadEnvir. 18 | simpleCache(..., cacheDir=getOption("RESOURCES.RCACHE"), loadEnvir=parent.frame()) 19 | } 20 | 21 | #' Helper alias for loading caches into the global environment. 22 | #' simpleCache normally loads variables into the calling environment; this 23 | #' ensures that the variables are loaded in the global environment. 24 | #' 25 | #' @param ... Parameters passed to \code{\link{simpleCache}}. 26 | #' @export 27 | simpleCacheGlobal = function(...) { 28 | simpleCache(..., loadEnvir=globalenv()) 29 | } 30 | 31 | #' Helper alias for loading shared caches into the global environment. 32 | #' 33 | #' @param ... Parameters passed to \code{\link{simpleCache}}. 34 | #' @export 35 | simpleCacheSharedGlobal = function(...)
{ 36 | simpleCache(..., cacheDir=getOption("RESOURCES.RCACHE"), loadEnvir=globalenv()) 37 | } 38 | 39 | -------------------------------------------------------------------------------- /R/simpleCache.R: -------------------------------------------------------------------------------- 1 | ## Package documentation 2 | #' Provides intuitive functions for caching R objects, encouraging faster, 3 | #' reproducible, and restartable R analysis 4 | #' 5 | #' @references \url{https://github.com/databio/simpleCache} 6 | #' @docType package 7 | #' @author Nathan Sheffield 8 | #' @aliases simpleCache-package 9 | "_PACKAGE" 10 | 11 | ################################################################################ 12 | 13 | #' Create a new cache or load a previously created cache. 14 | #' 15 | #' Given a unique name for an R object, and instructions for how to make that 16 | #' object, use the simpleCache function to create and cache or load the object. 17 | #' This should be used for computations that take a long time and generate a 18 | #' table or other object used repeatedly (in other scripts, for example). Because 19 | #' the cache is tied to the object name, misusing the caching system can cause 20 | #' problems; the object should be considered 21 | #' static. 22 | #' 23 | #' You should pass a braced R code snippet like \code{{ rnorm(500) }} as the 24 | #' instruction, and simpleCache will create the object. Alternatively, if the 25 | #' code to create the cache is large, you can put an R script called object.R in 26 | #' the \code{\link[=setCacheBuildDir]{RBUILD.DIR}} (the name of the file *must* match the name of the object it 27 | #' creates *exactly*). If you don't provide an instruction, the function sources 28 | 29 | #' RBUILD.DIR/object.R and caches the result as the object. This source file 30 | #' *must* create an object with exactly the same name as the cache.
If you already have 31 | #' an object with the cache's name in your current environment, 32 | #' this function will not try to reload the object; instead, it uses the 33 | #' existing object. In essence, it assumes that this is a static object, which you 34 | #' will not change. You can force it to load the cached version instead with 35 | #' \code{reload=TRUE}. 36 | #' 37 | #' Because R uses lexical scope and not dynamic scope, you may need to pass in some 38 | #' variables that your instruction code uses. You can do this 39 | #' with the buildEnvir parameter (just provide a list of named variables). 40 | #' 41 | #' @param cacheName A character string giving a unique name for the cache. Choose it carefully. 42 | #' @param instruction R expression (in braces) to be evaluated. The returned value of this 43 | #' code is what will be cached under the cacheName. 44 | #' @param buildEnvir An environment (or list) providing additional variables 45 | #' necessary for evaluating the code in instruction. 46 | #' @param reload Logical indicating whether to force re-loading the cache, 47 | #' even if the object already exists in the environment. 48 | #' @param recreate Logical indicating whether to force reconstruction of the 49 | #' cache. 50 | #' @param noload Logical indicating whether to create but not load the cache. 51 | #' This is useful when you want to create caches without loading them (for 52 | #' example, in a cache creation loop). 53 | #' @param cacheDir Character vector specifying the directory where caches are 54 | #' saved (and loaded from). Defaults to the variable set by 55 | #' \code{\link[=setCacheDir]{setCacheDir()}}. 56 | #' @param cacheSubDir Character vector specifying a subdirectory within the 57 | #' \code{cacheDir} variable. Defaults to \code{NULL}. 58 | #' @param assignToVariable Character vector for a variable name to load the 59 | #' cache into. By default, \code{simpleCache} assigns the cache to a 60 | #' variable named \code{cacheName}; you can overrule that here.
61 | #' @param loadEnvir The environment into which the cache is 62 | #' loaded. Defaults to \code{\link[base]{parent.frame}}. 63 | #' @param searchEnvir A vector of environment names (character strings) to search 64 | #' for an already loaded cache. 65 | #' @param timer Logical indicating whether to report how long it took to create 66 | #' the cache. 67 | #' @param buildDir Location of build files (scripts used if the instruction 68 | #' argument is not provided). Defaults to the RBUILD.DIR 69 | #' global option. 70 | #' @param nofail By default, simpleCache throws an error if the instructions 71 | #' fail. Use this option to convert this error into a warning. No cache will 72 | #' be created, but simpleCache will not hard-stop your processing. This 73 | #' is useful, for example, if you are creating many caches (e.g., 74 | #' with \code{lapply}) and it's OK if some of them do not complete. 75 | #' @param batchRegistry A \code{batchtools} registry object (built with 76 | #' \code{\link[batchtools]{makeRegistry}}). If provided, this cache will be created on 77 | #' the cluster using your batchtools configuration. 78 | #' @param batchResources A list of variables to provide to batchtools for 79 | #' cluster resource managers. Passed as the \code{resources} argument to 80 | #' \code{\link[batchtools]{submitJobs}}. 81 | #' @param lifespan Numeric specifying the maximum age of the cache, in days, to 82 | #' allow before automatically triggering \code{recreate=TRUE}. If \code{NULL}, the \code{MAX.CACHE.AGE} option is used, if set. 83 | #' @param pepSettings Experimental untested feature. 84 | #' @param ignoreLock Internal parameter used for batch job submission; don't 85 | #' touch.
86 | #' @export 87 | #' @example 88 | #' R/examples/example.R 89 | simpleCache = function(cacheName, instruction=NULL, buildEnvir=NULL, 90 | reload=FALSE, recreate=FALSE, noload=FALSE, 91 | cacheDir=getCacheDir(), cacheSubDir=NULL, timer=FALSE, 92 | buildDir=getOption("RBUILD.DIR"), assignToVariable=NULL, 93 | loadEnvir=parent.frame(), searchEnvir=getOption("SIMPLECACHE.ENV"), 94 | nofail=FALSE, batchRegistry=NULL, batchResources=NULL, pepSettings=NULL, 95 | ignoreLock=FALSE, lifespan=NULL) { 96 | 97 | if (!"character" %in% class(cacheName)) { 98 | stop("simpleCache expects the cacheName variable to be a character vector.") 99 | } 100 | 101 | # Because R evaluates arguments lazily (only when they are used), 102 | # it will not evaluate the instruction if I first wrap it in a 103 | # primitive substitute call. Then I can evaluate conditionally 104 | # (if the cache needs to be recreated) 105 | instruction = substitute(instruction) 106 | if ("character" %in% class(instruction)) { 107 | message("Character instruction; consider wrapping in braces.") 108 | parse = TRUE 109 | } else { parse = FALSE } 110 | 111 | # Handle directory paths (fall back to tempdir() before appending any subdirectory). 112 | if (is.null(cacheDir)) { 113 | message(strwrap("No cacheDir specified. You should set global option 114 | RCACHE.DIR with setCacheDir(), or specify a cacheDir parameter directly 115 | to simpleCache(). With no other option, simpleCache will use tempdir(): 116 | ", initial="", prefix=" "), tempdir()) 117 | cacheDir = tempdir() 118 | } 119 | if (!is.null(cacheSubDir)) { cacheDir = file.path(cacheDir, cacheSubDir) } 120 | 121 | if (!file.exists(cacheDir)) { 122 | dir.create(cacheDir, recursive=TRUE) 123 | } 124 | cacheFile = file.path(cacheDir, paste0(cacheName, ".RData")) 125 | lockFile = file.path(cacheDir, paste0(cacheName, ".lock")) 126 | if (ignoreLock) { 127 | # remove the lock file when this function call is complete.
128 | on.exit(file.remove(lockFile)) 129 | } 130 | submitted = FALSE 131 | # Check if cache exists in any provided search environment. 132 | searchEnvir = append(searchEnvir, ".GlobalEnv") # Assume global env. 133 | cacheExists = FALSE 134 | cacheWhere = NULL 135 | 136 | for ( curEnv in searchEnvir ) { 137 | if(! ( exists(curEnv) && is.environment(get(curEnv))) ) { 138 | warning(curEnv, " is not an environment.") 139 | } else if( exists(cacheName, where=get(curEnv))) { 140 | cacheExists = TRUE 141 | cacheWhere = curEnv 142 | break 143 | } 144 | } 145 | 146 | 147 | ret = NULL # The default, in case the cache construction fails. 148 | 149 | if (.tooOld(cacheFile, lifespan)) { 150 | message(sprintf("Stale cache: '%s' (age > %s day(s))", cacheFile, 151 | format(if (is.null(lifespan)) getOption("MAX.CACHE.AGE") else lifespan))) 152 | recreate = TRUE 153 | } 154 | 155 | if(cacheExists & !reload & !recreate) { 156 | message("::Object exists (in ", cacheWhere, ")::\t", cacheName) 157 | #return(get(cacheName)) 158 | #return() 159 | ret = get(cacheName, pos = get(cacheWhere)) 160 | } else if (file.exists(lockFile) & !ignoreLock) { 161 | message("::Cache processing (lock file exists)::\t", lockFile) 162 | #check for slurm log... 163 | 164 | if (!is.null(batchRegistry)) { 165 | # Grabbing log from batchtools 166 | # 1 is the job id. 167 | message(paste(batchtools::getLog(1, reg=batchRegistry), collapse="\n")) 168 | } 169 | if (!is.null(pepSettings)) { 170 | # TODO: retrieve log 171 | stop("PEP settings submission is not yet implemented") 172 | 173 | } 174 | 175 | return() 176 | 177 | } else if(file.exists(cacheFile) & !recreate & !noload) { 178 | message("::Loading cache::\t", cacheFile) 179 | load(cacheFile) 180 | } else if(file.exists(cacheFile) & !recreate) { 181 | message("::Cache exists (no load)::\t", cacheFile) 182 | return(NULL) 183 | } else { 184 | message("::Creating cache::\t", cacheFile) 185 | 186 | tryCatch( { # Intercept any errors with creating this cache.
187 | 188 | if(is.null(instruction)) { 189 | if (is.null(buildDir)) { 190 | stop(strwrap("::Error::\tIf you do not provide an 191 | instruction argument, you must set global option RBUILD.DIR 192 | with setCacheBuildDir, or specify a buildDir parameter 193 | directly to simpleCache().")) 194 | } 195 | RBuildFile = file.path(buildDir, paste0(cacheName, ".R")) 196 | 197 | if (!file.exists(RBuildFile)) { 198 | stop("::Error::\tNo instruction or RBuild file provided.") 199 | } 200 | 201 | if (timer) { tic() } 202 | source(RBuildFile, local=FALSE) 203 | if (timer) { toc() } 204 | ret = get(cacheName) 205 | } else { 206 | 207 | if (is.null(buildEnvir)) { 208 | if (timer) { tic() } 209 | if ( ! is.null(batchRegistry) ) { 210 | # Submit to cluster using batchtools 211 | 212 | if (! requireNamespace("batchtools", quietly=TRUE)) { 213 | stop("Install batchtools for cluster submission...") 214 | } 215 | if (is.null(batchResources)) { 216 | stop("You must provide both batchRegistry and batchResources.") 217 | } 218 | message("Submitting job to cluster") 219 | # You have to wrap `instruction` in substitute() so it won't be evaluated, 220 | # then you have to wrap that in list so it won't be misinterpreted 221 | # by batchMap as multiple arguments, causing extra jobs. 222 | args = list(cacheName=cacheName, 223 | instruction=list(substitute(instruction)), 224 | cacheDir=cacheDir, ignoreLock=TRUE) 225 | 226 | ids = batchtools::batchMap( 227 | fun=simpleCache, 228 | args=args, 229 | reg=batchRegistry) 230 | 231 | # lock cache so it won't be loaded prematurely or double-written 232 | file.create(lockFile) 233 | batchtools::submitJobs(ids=ids, reg=batchRegistry, resources=batchResources) 234 | 235 | message("Done submitting to cluster") 236 | submitted = "batch" 237 | } else if ( !
is.null(pepSettings) ) { 238 | stop("PEP settings submission is not yet implemented") 239 | # Build a simpleCache command 240 | #simpleCacheCode = paste0("simpleCache('", cacheName, "', 241 | # instruction='", paste0(deparse(instruction), collapse="\n"), "', 242 | # recreate=", recreate, ", 243 | # cacheDir='", cacheDir,"', 244 | # ignoreLock=TRUE)") 245 | #if (slurmParams$jobName=="test") { slurmParams$jobName=cacheName } 246 | #with(slurmParams, buildSlurmScript( 247 | # simpleCacheCode, preamble, submit, hpcFolder, 248 | # jobName, mem, cores, partition, timeLimit, sourceProjectInit)) 249 | } else { 250 | # No cluster submission request, so just run it here! 251 | # "ret" (for return) holds the value that will be cached under cacheName. 252 | if (parse) { 253 | ret = eval(parse(text=instruction), envir=parent.frame()) 254 | } else { 255 | # Here we do the evaluation in the parent frame so that 256 | # it will have access to any packages the user has loaded 257 | # that may be required to run the code. Otherwise, it will 258 | # run in the simpleCache namespace which could lack these 259 | # packages (or have a different search path hierarchy), 260 | # leading to failures. The earlier `substitute` call ensures 261 | # the code isn't evaluated at argument stage, but is retained 262 | # until it makes it to the `eval` call. 263 | ret = eval(instruction, envir=parent.frame()) 264 | } 265 | } 266 | if (timer) { toc() } 267 | } else { 268 | # Build environment was provided. 269 | # we must place the instruction in the environment to build from 270 | if (exists("instruction", buildEnvir)) { 271 | stop("Can't provide a variable named 'instruction' in buildEnvir") 272 | } 273 | buildEnvir$instruction = instruction 274 | be = as.environment(buildEnvir) 275 | # As described above, this puts global package functions into 276 | # scope so instructions can use them.
277 | parent.env(be) = parent.frame() 278 | if (timer) { tic() } 279 | if (parse) { 280 | ret = with(be, eval(parse(text=instruction))) 281 | } else { 282 | #ret = with(buildEnvir, evalq(instruction)) 283 | ret = with(be, eval(instruction)) 284 | 285 | } 286 | if (timer) { toc() } 287 | } 288 | } 289 | 290 | # tryCatch 291 | }, error = function(e) { if (nofail) warning(e) else stop(e) }) 292 | 293 | if (submitted == "batch") { 294 | message("Job submitted, check for cache.") 295 | return() 296 | } else if (is.null(ret)) { 297 | message("NULL value returned, no cache created") 298 | return() #so we don't assign NULL to the object. 299 | } else { 300 | save(ret, file=cacheFile) 301 | } 302 | } 303 | if (noload) { 304 | rm(ret) 305 | gc() 306 | return() 307 | } 308 | if(is.null(assignToVariable)) { 309 | assignToVariable = cacheName 310 | } 311 | assign(assignToVariable, ret, envir=loadEnvir) 312 | 313 | #return() #used to return ret, but not any more 314 | } 315 | -------------------------------------------------------------------------------- /R/storeCache.R: -------------------------------------------------------------------------------- 1 | #' Stores an already-produced R object as a cache 2 | #' 3 | #' Sometimes you use significant computational power to create an object, but 4 | #' you didn't cache it with \code{\link{simpleCache}}. Oops, maybe you wish you had, after the 5 | #' fact. This function lets you save an existing object from your environment to disk so it can 6 | #' be loaded by future calls to \code{simpleCache}. 7 | #' 8 | #' This can be used in interactive sessions, but it also suits another 9 | #' use case: you have a complicated set of instructions (too much to pass as the 10 | #' instruction argument to \code{simpleCache}), so you can just put a call to 11 | #' \code{storeCache} at the end. 12 | #' 13 | #' @param cacheName Unique name for the cache (and R object to be cached).
14 | #' @param cacheDir The directory where caches are saved (and loaded from). 15 | #' Defaults to the global \code{\link[=setCacheDir]{RCACHE.DIR}} variable. 16 | #' @param cacheSubDir You can specify a subdirectory within the cacheDir 17 | #' variable. Defaults to \code{NULL}. 18 | #' @param recreate Forces reconstruction of the cache. 19 | #' @export 20 | #' @example 21 | #' R/examples/example.R 22 | storeCache = function(cacheName, cacheDir = getCacheDir(), 23 | cacheSubDir = NULL, recreate=FALSE) { 24 | 25 | if (is.null(cacheDir)) { 26 | message(strwrap("You must set global option RCACHE.DIR with 27 | setCacheDir(), or specify a cacheDir parameter directly to 28 | storeCache().")) 29 | return(NA); 30 | } 31 | 32 | if(!is.null(cacheSubDir)) { 33 | cacheDir = file.path(cacheDir, cacheSubDir) 34 | } 35 | 36 | 37 | if(! "character" %in% class(cacheName)) { 38 | stop(strwrap("storeCache expects the cacheName variable to be a 39 | character vector.")) 40 | } 41 | 42 | if (!file.exists(cacheDir)) { 43 | dir.create(cacheDir, recursive=TRUE) 44 | } 45 | cacheFile = file.path(cacheDir, paste0(cacheName, ".RData")) 46 | if(file.exists(cacheFile) && !recreate) { 47 | message("::Cache already exists (use recreate to overwrite)::\t", 48 | cacheFile) 49 | return (NULL) 50 | } else if (!exists(cacheName, envir=parent.frame())) { 51 | message("::Object does not exist::\t", cacheName) 52 | return(NULL) 53 | } else { 54 | message("::Creating cache::\t", cacheFile) 55 | ret = get(cacheName, envir=parent.frame()) 56 | save(ret, file=cacheFile) 57 | } 58 | } 59 | -------------------------------------------------------------------------------- /R/utility.R: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | # UTILITY FUNCTIONS 3 | ################################################################################ 4 | # These are functions copied over from my repository of utilities used 5 | # by this
package. They are repeated here simply for portability, so this 6 | # package can be deployed on systems without access to my utilities. 7 | # Any changes should probably be backported to the primary functions rather 8 | # than made only in these convenience duplications. 9 | # 10 | # These functions should probably remain interior to the package (not exported) 11 | # 12 | #' Determine if a cache file is sufficiently old to warrant refresh. 13 | #' 14 | #' \code{.tooOld} accepts a maximum cache age and checks for an option with 15 | #' that setting under \code{MAX.CACHE.AGE} if such an argument isn't passed. 16 | #' If the indicated file exists and is older than the threshold passed or 17 | #' set as an option, the file is deemed "stale." If an age threshold is 18 | #' provided, no check for an option is performed. If the file does not 19 | #' exist or there's not an age threshold directly passed or set as an option, 20 | #' the result is \code{FALSE}. 21 | #' 22 | #' @param pathCacheFile Path to file to ask about staleness. 23 | #' @param lifespan Maximum file age, in days, before it's "stale." 24 | #' @return \code{TRUE} if the file exists and its age exceeds 25 | #' \code{lifespan} if given or 26 | #' \code{getOption("MAX.CACHE.AGE")} if no age threshold is passed 27 | #' and that option exists; \code{FALSE} otherwise. 28 | .tooOld = function(pathCacheFile, lifespan=NULL) { 29 | if (!utils::file_test("-f", pathCacheFile)) { return(FALSE) } 30 | if (is.null(lifespan)) { lifespan = getOption("MAX.CACHE.AGE") } 31 | if (is.null(lifespan)) { return(FALSE) } 32 | cacheTime = file.info(pathCacheFile)$mtime 33 | cacheAge = difftime(Sys.time(), cacheTime, units="days") 34 | as.numeric(cacheAge) > as.numeric(lifespan) 35 | } 36 | 37 | 38 | # MATLAB-style timing functions to start/stop timer. 39 | # These functions were based on an idea by some helpful soul on 40 | # Stack Overflow that I can no longer recall...
41 | 42 | #' This function takes a time in seconds and converts it to a more 43 | #' human-readable string that always shows hours, minutes, and seconds 44 | #' (for example, "01h 02m 3.0s"). Used by my implementation of tic()/toc(). 45 | #' @param timeInSec numeric value of time measured in seconds. 46 | secToTime = function(timeInSec) { 47 | hr = timeInSec %/% 3600 #hours 48 | min = timeInSec %% 3600 %/% 60 #minutes 49 | sec = timeInSec %% 60 #seconds 50 | return(paste0(sprintf("%02d", hr), "h ", sprintf("%02d", min), "m ", 51 | sprintf("%02.01f", signif(sec, 3)), "s")) 52 | } 53 | 54 | ticTocEnv = new.env() 55 | 56 | #' Start a timer 57 | #' @param gcFirst Garbage Collect before starting the timer? 58 | #' @param type Type of time to return, 59 | #' can be 'elapsed', 'user.self', or 'sys.self' 60 | tic = function(gcFirst = TRUE, type=c("elapsed", "user.self", "sys.self")) { 61 | type <- match.arg(type) 62 | assign(".type_simpleCache", type, envir=ticTocEnv) 63 | if(gcFirst) gc(FALSE) 64 | tic <- proc.time()[type] 65 | assign(".tic_simpleCache", tic, envir=ticTocEnv) 66 | invisible(tic) 67 | } 68 | 69 | #' Check the time since the current timer was started with tic() 70 | toc = function() { 71 | type <- get(".type_simpleCache", envir=ticTocEnv) 72 | toc <- proc.time()[type] 73 | tic <- get(".tic_simpleCache", envir=ticTocEnv) 74 | timeInSec = as.numeric(toc-tic); 75 | message("<", secToTime(timeInSec), ">", appendLF=FALSE) 76 | invisible(toc) 77 | } 78 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | simpleCache: R caching for restartable analysis 2 | ----------------------------------------------- 3 | 4 | 5 | 6 | `simpleCache` is an R package providing functions for caching R objects.
Its 7 | purpose is to encourage writing reusable, restartable, and reproducible analysis 8 | pipelines for projects with massive data and computational requirements. 9 | 10 | As its name indicates, `simpleCache` is intended to be simple. You choose a 11 | location to store your caches, and then provide the function with nothing more 12 | than a cache name and instructions (R code) for how to produce the R object. 13 | While simple, `simpleCache` also provides some advanced options like environment 14 | assignments, recreating caches, reloading caches, and even cluster compute 15 | bindings (using the `batchtools` package), making it flexible enough for use in 16 | large-scale data analysis projects. 17 | 18 | -------------------------------------------------------------------------------- 19 | ### Installing simpleCache 20 | 21 | `simpleCache` is on 22 | [CRAN](https://cran.r-project.org/package=simpleCache) and can 23 | be installed as usual: 24 | 25 | ``` 26 | install.packages("simpleCache") 27 | ``` 28 | 29 | -------------------------------------------------------------------------------- 30 | ### Running simpleCache 31 | 32 | `simpleCache` comes with a single primary function (`simpleCache()`) that will do almost 33 | everything you need. In short, you run it with a few lines like this: 34 | 35 | ``` 36 | library(simpleCache) 37 | setCacheDir(tempdir()) 38 | simpleCache("normSample", { rnorm(1e7, 0,1) }, recreate=TRUE) # builds and saves the cache 39 | simpleCache("normSample", { rnorm(1e7, 0,1) }) # normSample already exists, so nothing is recomputed 40 | ``` 41 | 42 | `simpleCache` also interfaces with the `batchtools` package to let you build 43 | caches on a cluster through any resource manager that `batchtools` supports, such as SLURM.
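A minimal sketch of a cluster submission, assuming a SLURM cluster, an installed `batchtools`, and the `slurm-advanced.tmpl` template that this repository ships in `inst/templates`. The resource names (`walltime`, `memory`, `ncpus`, `partition`) are the ones that template reads; the values, the partition name `"standard"`, and the `simpleCache-demo` directory under `$HOME` are placeholders, and the sketch assumes that directory sits on a filesystem the compute nodes can also see. Submission is skipped when `sbatch` is not on the `PATH`.

```r
library(simpleCache)

# Assumption: this directory is on a filesystem shared with the compute nodes,
# because both the caches and the batchtools registry must be reachable from
# inside the job.
sharedDir = file.path(Sys.getenv("HOME"), "simpleCache-demo")
setCacheDir(file.path(sharedDir, "caches"))

# Resources read by slurm-advanced.tmpl: walltime in seconds (the template
# divides it by 60 for --time), memory in MB per CPU, CPUs per task, and an
# optional partition. "standard" is a placeholder partition name.
slurmResources = list(walltime = 3600, memory = 4000, ncpus = 1,
                      partition = "standard")

# Only attempt a submission where SLURM's sbatch command is available.
if (nzchar(Sys.which("sbatch"))) {
  dir.create(sharedDir, recursive = TRUE, showWarnings = FALSE)
  # tempfile() gives a fresh, not-yet-existing path for the registry.
  registry = batchtools::makeRegistry(
    file.dir = tempfile("registry", tmpdir = sharedDir))
  registry$cluster.functions = batchtools::makeClusterFunctionsSlurm(
    template = system.file("templates", "slurm-advanced.tmpl",
                           package = "simpleCache"))
  # Returns once the job is submitted; the job itself writes the cache file.
  simpleCache("normSample", { rnorm(1e7, 0, 1) },
              batchRegistry = registry, batchResources = slurmResources)
}
```

The submitting call only reports "Job submitted, check for cache." and does not assign `normSample`; once the job has finished, a plain `simpleCache("normSample")` call loads the result like any other cache.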
44 | 45 | -------------------------------------------------------------------------------- 46 | ### Highlights of exported functions 47 | 48 | - `simpleCache()`: Creates and caches or reloads cached results of provided R instruction code 49 | - `listCaches()`: Lists all of the caches available in the `cacheDir` 50 | - `deleteCaches()`: Deletes cache(s) from the `cacheDir` 51 | - `setCacheDir()`: Sets a global option for a cache directory so you don't have to specify one in each `simpleCache` call 52 | - `simpleCacheOptions()`: Views all of the `simpleCache` global options that have been set 53 | 54 | ### simpleCache Philosophy 55 | 56 | The use case I had in mind for `simpleCache` is that you find yourself 57 | constantly recalculating the same R object in several different scripts, or 58 | repeatedly in the same script, every time you open it and want to continue that 59 | project. SimpleCache is well-suited for interactive analysis, allowing you to 60 | pick up right where you left off in a new R session, without having to 61 | recalculate everything. It is equally useful in automatic pipelines, where 62 | separate scripts may benefit from loading, instead of recalculating, the same R 63 | objects produced by other scripts. 64 | 65 | R provides some base functions (`save`, `serialize`, and `load`) to let you save 66 | and reload such objects, but these low-level functions are a bit cumbersome. 67 | `simpleCache` simply provides a convenient, user-friendly interface to these 68 | functions, streamlining the process. For example, a single `simpleCache` call 69 | will check for a cache and load it if it exists, or create it if it does not. 70 | With the base R `save` and `load` functions, you can't just write a single 71 | function call and then run the same thing every time you start the script -- 72 | even this simple use case requires additional logic to check for an existing 73 | cache. `simpleCache` just does all this for you. 
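For comparison, here is a minimal base-R sketch of the check-then-build logic that a single `simpleCache` call replaces. `loadOrBuild` is a hypothetical helper written for illustration only, not part of the package. It mirrors two details visible in the package source (a cache lives at `<cacheDir>/<name>.RData`, and the object is saved under the name `ret`) but leaves out everything else `simpleCache` handles, such as `reload`/`recreate`, `lifespan`, environment searching, and cluster submission.

```r
# Hypothetical helper (not part of simpleCache): load a cached object if its
# .RData file exists, otherwise evaluate the expression and save the result.
loadOrBuild = function(name, expr, cacheDir = tempdir(),
                       envir = parent.frame()) {
  cacheFile = file.path(cacheDir, paste0(name, ".RData"))
  if (file.exists(cacheFile)) {
    tmp = new.env()
    load(cacheFile, envir = tmp)  # the file holds one object named `ret`
    ret = tmp$ret
  } else {
    # substitute() captures the unevaluated code, so it runs only here,
    # in the caller's environment.
    ret = eval(substitute(expr), envir = envir)
    dir.create(cacheDir, recursive = TRUE, showWarnings = FALSE)
    save(ret, file = cacheFile)
  }
  assign(name, ret, envir = envir)
  invisible(ret)
}

# Usage: the second call reads the file instead of re-running the code.
demoDir = file.path(tempdir(), "loadOrBuild-demo")
unlink(demoDir, recursive = TRUE)
builds = 0
loadOrBuild("squares", { builds = builds + 1; (1:5)^2 }, cacheDir = demoDir)
loadOrBuild("squares", { builds = builds + 1; (1:5)^2 }, cacheDir = demoDir)
print(builds)  # 1
```

Saving under a fixed name such as `ret` keeps the file contents independent of the cache name, so whoever loads the file decides which variable receives the value; `simpleCache` exposes the same choice through `assignToVariable`.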
74 | 75 | The thing to keep in mind with `simpleCache` is that **the cache name is 76 | paramount**. `simpleCache` assumes that your name for an object is a perfect 77 | identifier for that object; in other words, don't cache things that you plan to 78 | change. 79 | 80 | ### Contributing 81 | 82 | `simpleCache` is licensed under the [2-Clause BSD License](https://opensource.org/licenses/BSD-2-Clause). Questions, feature requests and bug reports are welcome via the [issue queue](https://github.com/databio/simpleCache/issues). The maintainer will review pull requests and incorporate contributions at his discretion. 83 | 84 | For more information refer to the contributing document and pull request / issue templates in the [.github folder](https://github.com/databio/simpleCache/tree/master/.github) of this repository. 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | -------------------------------------------------------------------------------- /_pkgdown.yaml: -------------------------------------------------------------------------------- 1 | 2 | template: 3 | params: 4 | bootswatch: yeti 5 | 6 | navbar: 7 | left: 8 | - text: Vignettes 9 | icon: fa-play-circle 10 | href: articles/index.html 11 | - text: Documentation 12 | icon: fa-pencil 13 | href: reference/index.html 14 | - text: GitHub 15 | icon: fa-github fa-lg 16 | href: https://github.com/databio/simpleCache 17 | 18 | right: 19 | - text: Databio.org 20 | href: http://databio.org 21 | - text: Software & Data 22 | href: http://databio.org/software/ 23 | 24 | articles: 25 | - title: Introduction 26 | contents: 27 | - simpleCacheIntroduction 28 | - title: Advanced features 29 | contents: 30 | - clusterCaches 31 | - sharingCaches 32 | -------------------------------------------------------------------------------- /cran-comments.md: -------------------------------------------------------------------------------- 1 | ## Test environments 2 | 3 | * local OS X install, R 3.3.2 4 | * Ubuntu 12.04, R 
3.3.2 via TRAVIS CI 5 | * win-builder 6 | 7 | ## R CMD check results 8 | 9 | There were no ERRORs or WARNINGs. 10 | 11 | There was 1 note: 12 | 13 | `checking CRAN incoming feasibility ...` 14 | 15 | ## Downstream dependencies 16 | 17 | None. -------------------------------------------------------------------------------- /inst/cache/existingCache.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/databio/simpleCache/806f818dc2c61d5bd433eea540e2b18e84010416/inst/cache/existingCache.RData -------------------------------------------------------------------------------- /inst/templates/slurm-advanced.tmpl: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | ## Job Resource Interface Definition 4 | ## 5 | ## ntasks [integer(1)]: Number of required tasks, 6 | ## Set larger than 1 if you want to further parallelize 7 | ## with MPI within your job. 8 | ## ncpus [integer(1)]: Number of required cpus per task, 9 | ## Set larger than 1 if you want to further parallelize 10 | ## with multicore/parallel within each task. 11 | ## walltime [integer(1)]: Walltime for this job, in seconds; the template 12 | ## converts it to minutes for --time, so use at least 60. 13 | ## memory [integer(1)]: Memory in megabytes for each cpu. 14 | ## Must be at least 100 (when I tried lower values my 15 | ## jobs did not start at all). 16 | ## 17 | ## Default resources can be set in your .batchtools.conf.R by defining the variable 18 | ## 'default.resources' as a named list.
19 | 20 | <% 21 | # relative paths are not handled well by Slurm 22 | log.file = normalizePath(log.file, winslash = "/", mustWork = FALSE) 23 | -%> 24 | 25 | 26 | #SBATCH --job-name=<%= job.hash %> 27 | #SBATCH --output=<%= log.file %> 28 | #SBATCH --error=<%= log.file %> 29 | #SBATCH --time=<%= ceiling(resources$walltime / 60) %> 30 | #SBATCH --ntasks=1 31 | #SBATCH --cpus-per-task=<%= resources$ncpus %> 32 | #SBATCH --mem-per-cpu=<%= resources$memory %> 33 | <%= if (!is.null(resources$partition)) sprintf(paste0("#SBATCH --partition='", resources$partition, "'")) %> 34 | <%= if (array.jobs) sprintf("#SBATCH --array=1-%i", nrow(jobs)) else "" %> 35 | 36 | ## Initialize work environment like 37 | ## source /etc/profile 38 | ## module add ... 39 | 40 | ## Export value of DEBUGME environment variable to the job 41 | export DEBUGME=<%= Sys.getenv("DEBUGME") %> 42 | 43 | ## Run R: 44 | ## we merge R output with stdout from SLURM, which is then logged via the --output option 45 | Rscript -e 'batchtools::doJobCollection("<%= uri %>")' 46 | -------------------------------------------------------------------------------- /man/addCacheSearchEnvironment.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/cacheDirectories.R 3 | \name{addCacheSearchEnvironment} 4 | \alias{addCacheSearchEnvironment} 5 | \title{Add a cache search environment} 6 | \usage{ 7 | addCacheSearchEnvironment(addEnv) 8 | } 9 | \arguments{ 10 | \item{addEnv}{Environment to append to the shared cache search list} 11 | } 12 | \description{ 13 | Append a new Environment name (a character string) to a global option 14 | which is a vector of such names. SimpleCache will search all of these 15 | environments to check if a cache is previously loaded, before reloading it.
16 | } 17 | -------------------------------------------------------------------------------- /man/deleteCaches.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/deleteCache.R 3 | \name{deleteCaches} 4 | \alias{deleteCaches} 5 | \title{Deletes caches} 6 | \usage{ 7 | deleteCaches(cacheNames, cacheDir = getCacheDir(), force = FALSE) 8 | } 9 | \arguments{ 10 | \item{cacheNames}{Name(s) of the cache to delete} 11 | 12 | \item{cacheDir}{Directory where caches are kept} 13 | 14 | \item{force}{Force deletion without user prompt} 15 | } 16 | \description{ 17 | Given a cache name, this function will attempt to delete the cache of that 18 | name on disk. 19 | } 20 | \examples{ 21 | # choose location to store caches 22 | cacheDir = tempdir() 23 | cacheDir 24 | setCacheDir(cacheDir) 25 | 26 | # build some caches 27 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE) 28 | simpleCache("normSample", { rnorm(5e3, 0,1) }) 29 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE) 30 | 31 | # storing a cache after-the-fact 32 | normSample2 = rnorm(10, 0, 1) 33 | storeCache("normSample2") 34 | 35 | # what's available? 
36 | listCaches() 37 | 38 | # load a cache 39 | simpleCache("normSample") 40 | 41 | # load multiple caches 42 | loadCaches(c("normSample", "normSample2"), reload=TRUE) 43 | } 44 | -------------------------------------------------------------------------------- /man/dot-tooOld.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/utility.R 3 | \name{.tooOld} 4 | \alias{.tooOld} 5 | \title{Determine if a cache file is sufficiently old to warrant refresh.} 6 | \usage{ 7 | .tooOld(pathCacheFile, lifespan = NULL) 8 | } 9 | \arguments{ 10 | \item{pathCacheFile}{Path to file to ask about staleness.} 11 | 12 | \item{lifespan}{Maximum file age, in days, before it's "stale."} 13 | } 14 | \value{ 15 | \code{TRUE} if the file exists and its age exceeds 16 | \code{lifespan} if given or 17 | \code{getOption("MAX.CACHE.AGE")} if no age threshold is passed 18 | and that option exists; \code{FALSE} otherwise. 19 | } 20 | \description{ 21 | \code{.tooOld} accepts a maximum cache age and checks for an option with 22 | that setting under \code{MAX.CACHE.AGE} if such an argument isn't passed. 23 | If the indicated file exists and is older than the threshold passed or 24 | set as an option, the file is deemed "stale." If an age threshold is 25 | provided, no check for an option is performed. If the file does not 26 | exist or there's not an age threshold directly passed or set as an option, 27 | the result is \code{FALSE}.
28 | } 29 | -------------------------------------------------------------------------------- /man/getCacheDir.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/cacheDirectories.R 3 | \name{getCacheDir} 4 | \alias{getCacheDir} 5 | \title{Fetcher of the currently set cache directory.} 6 | \usage{ 7 | getCacheDir() 8 | } 9 | \value{ 10 | If the option is set, the path to the currently set cache directory; otherwise, \code{NULL}. 11 | } 12 | \description{ 13 | \code{getCacheDir} retrieves the value of the option that stores the currently 14 | set cache directory path. 15 | } 16 | -------------------------------------------------------------------------------- /man/listCaches.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/listCaches.R 3 | \name{listCaches} 4 | \alias{listCaches} 5 | \title{Show available caches.} 6 | \usage{ 7 | listCaches(cacheSubDir = "") 8 | } 9 | \arguments{ 10 | \item{cacheSubDir}{Optional parameter to specify a subdirectory of the cache folder.} 11 | } 12 | \value{ 13 | \code{character} vector in which each element is the path to a file that 14 | represents an available cache (within \code{getOption("RCACHE.DIR")}) 15 | } 16 | \description{ 17 | Lists any cache files in the cache directory. 18 | } 19 | \examples{ 20 | # choose location to store caches 21 | cacheDir = tempdir() 22 | cacheDir 23 | setCacheDir(cacheDir) 24 | 25 | # build some caches 26 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE) 27 | simpleCache("normSample", { rnorm(5e3, 0,1) }) 28 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE) 29 | 30 | # storing a cache after-the-fact 31 | normSample2 = rnorm(10, 0, 1) 32 | storeCache("normSample2") 33 | 34 | # what's available? 
35 | listCaches() 36 | 37 | # load a cache 38 | simpleCache("normSample") 39 | 40 | # load multiple caches 41 | loadCaches(c("normSample", "normSample2"), reload=TRUE) 42 | } 43 | -------------------------------------------------------------------------------- /man/loadCaches.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/loadCaches.R 3 | \name{loadCaches} 4 | \alias{loadCaches} 5 | \title{Loads pre-made caches} 6 | \usage{ 7 | loadCaches(cacheNames, loadEnvir = NULL, ...) 8 | } 9 | \arguments{ 10 | \item{cacheNames}{Vector of caches to load.} 11 | 12 | \item{loadEnvir}{Environment into which to load each cache.} 13 | 14 | \item{...}{Additional parameters passed to simpleCache.} 15 | } 16 | \description{ 17 | This function just takes a vector of cache names and loads them. It's designed 18 | for objects you have already cached, so it won't build any caches. 19 | } 20 | \examples{ 21 | # choose location to store caches 22 | cacheDir = tempdir() 23 | cacheDir 24 | setCacheDir(cacheDir) 25 | 26 | # build some caches 27 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE) 28 | simpleCache("normSample", { rnorm(5e3, 0,1) }) 29 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE) 30 | 31 | # storing a cache after-the-fact 32 | normSample2 = rnorm(10, 0, 1) 33 | storeCache("normSample2") 34 | 35 | # what's available?
36 | listCaches() 37 | 38 | # load a cache 39 | simpleCache("normSample") 40 | 41 | # load multiple caches 42 | loadCaches(c("normSample", "normSample2"), reload=TRUE) 43 | } 44 | -------------------------------------------------------------------------------- /man/resetCacheSearchEnvironment.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/cacheDirectories.R 3 | \name{resetCacheSearchEnvironment} 4 | \alias{resetCacheSearchEnvironment} 5 | \title{Sets global option of cache search environments to \code{NULL}.} 6 | \usage{ 7 | resetCacheSearchEnvironment() 8 | } 9 | \description{ 10 | Sets global option of cache search environments to \code{NULL}. 11 | } 12 | -------------------------------------------------------------------------------- /man/secToTime.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/utility.R 3 | \name{secToTime} 4 | \alias{secToTime} 5 | \title{This function takes a time in seconds and converts it to a more 6 | human-readable string that always shows hours, minutes, and seconds 7 | (for example, "01h 02m 3.0s"). Used by my implementation of tic()/toc().} 8 | \usage{ 9 | secToTime(timeInSec) 10 | } 11 | \arguments{ 12 | \item{timeInSec}{numeric value of time measured in seconds.} 13 | } 14 | \description{ 15 | This function takes a time in seconds and converts it to a more 16 | human-readable string that always shows hours, minutes, and seconds 17 | (for example, "01h 02m 3.0s"). Used by my implementation of tic()/toc().
18 | } 19 | -------------------------------------------------------------------------------- /man/setCacheBuildDir.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/cacheDirectories.R 3 | \name{setCacheBuildDir} 4 | \alias{setCacheBuildDir} 5 | \title{Sets local cache build directory with scripts for building files.} 6 | \usage{ 7 | setCacheBuildDir(cacheBuildDir = NULL) 8 | } 9 | \arguments{ 10 | \item{cacheBuildDir}{Directory where build scripts are stored.} 11 | } 12 | \description{ 13 | Sets local cache build directory with scripts for building files. 14 | } 15 | -------------------------------------------------------------------------------- /man/setCacheDir.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/cacheDirectories.R 3 | \name{setCacheDir} 4 | \alias{setCacheDir} 5 | \title{Sets a global variable specifying the default cache directory for 6 | \code{\link{simpleCache}} calls.} 7 | \usage{ 8 | setCacheDir(cacheDir = NULL) 9 | } 10 | \arguments{ 11 | \item{cacheDir}{Directory where caches should be stored} 12 | } 13 | \description{ 14 | Sets a global variable specifying the default cache directory for 15 | \code{\link{simpleCache}} calls. 16 | } 17 | \examples{ 18 | # choose location to store caches 19 | cacheDir = tempdir() 20 | cacheDir 21 | setCacheDir(cacheDir) 22 | 23 | # build some caches 24 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE) 25 | simpleCache("normSample", { rnorm(5e3, 0,1) }) 26 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE) 27 | 28 | # storing a cache after-the-fact 29 | normSample2 = rnorm(10, 0, 1) 30 | storeCache("normSample2") 31 | 32 | # what's available? 
33 | listCaches() 34 | 35 | # load a cache 36 | simpleCache("normSample") 37 | 38 | # load multiple caches 39 | loadCaches(c("normSample", "normSample2"), reload=TRUE) 40 | } 41 | -------------------------------------------------------------------------------- /man/setSharedCacheDir.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/cacheDirectories.R 3 | \name{setSharedCacheDir} 4 | \alias{setSharedCacheDir} 5 | \title{Set shared cache directory} 6 | \usage{ 7 | setSharedCacheDir(sharedCacheDir = NULL) 8 | } 9 | \arguments{ 10 | \item{sharedCacheDir}{Directory where shared caches should be stored} 11 | } 12 | \description{ 13 | Sets global variable specifying the default cache directory for 14 | \code{\link{simpleCacheShared}} calls; this function is simply a helper alias for caching 15 | results that will be used across projects. 16 | } 17 | -------------------------------------------------------------------------------- /man/simpleCache-package.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/simpleCache.R 3 | \docType{package} 4 | \name{simpleCache-package} 5 | \alias{simpleCache-package} 6 | \alias{_PACKAGE} 7 | \title{Provides intuitive functions for caching R objects, encouraging faster, 8 | reproducible and restartable R analysis} 9 | \description{ 10 | Provides intuitive functions for caching R objects, encouraging 11 | reproducible, restartable, and distributed R analysis. The user selects a 12 | location to store caches, and then provides nothing more than a cache name 13 | and instructions (R code) for how to produce the R object.
Also 14 | provides some advanced options like environment assignments, recreating or 15 | reloading caches, and cluster compute bindings (using the 'batchtools' 16 | package) making it flexible enough for use in large-scale data analysis 17 | projects. 18 | } 19 | \references{ 20 | \url{https://github.com/databio/simpleCache} 21 | } 22 | \seealso{ 23 | Useful links: 24 | \itemize{ 25 | \item \url{https://github.com/databio/simpleCache} 26 | \item Report bugs at \url{https://github.com/databio/simpleCache} 27 | } 28 | 29 | } 30 | \author{ 31 | Nathan Sheffield 32 | } 33 | -------------------------------------------------------------------------------- /man/simpleCache.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/simpleCache.R 3 | \name{simpleCache} 4 | \alias{simpleCache} 5 | \title{Create a new cache or load a previously created cache.} 6 | \usage{ 7 | simpleCache( 8 | cacheName, 9 | instruction = NULL, 10 | buildEnvir = NULL, 11 | reload = FALSE, 12 | recreate = FALSE, 13 | noload = FALSE, 14 | cacheDir = getCacheDir(), 15 | cacheSubDir = NULL, 16 | timer = FALSE, 17 | buildDir = getOption("RBUILD.DIR"), 18 | assignToVariable = NULL, 19 | loadEnvir = parent.frame(), 20 | searchEnvir = getOption("SIMPLECACHE.ENV"), 21 | nofail = FALSE, 22 | batchRegistry = NULL, 23 | batchResources = NULL, 24 | pepSettings = NULL, 25 | ignoreLock = FALSE, 26 | lifespan = NULL 27 | ) 28 | } 29 | \arguments{ 30 | \item{cacheName}{A character vector for a unique name for the cache. Be careful.} 31 | 32 | \item{instruction}{R expression (in braces) to be evaluated. 
The returned value of this 33 | code is what will be cached under the cacheName.} 34 | 35 | \item{buildEnvir}{An environment (or list) providing additional variables 36 | necessary for evaluating the code in instruction.} 37 | 38 | \item{reload}{Logical indicating whether to force re-loading the cache, 39 | even if it already exists in the environment.} 40 | 41 | \item{recreate}{Logical indicating whether to force reconstruction of the 42 | cache.} 43 | 44 | \item{noload}{Logical indicating whether to create but not load the cache. 45 | This is useful when you want to create caches without loading them, for 46 | example in a cache-creation loop.} 47 | 48 | \item{cacheDir}{Character vector specifying the directory where caches are 49 | saved (and loaded from). Defaults to the variable set by 50 | \code{\link[=setCacheDir]{setCacheDir()}}.} 51 | 52 | \item{cacheSubDir}{Character vector specifying a subdirectory within the 53 | \code{cacheDir} variable. Defaults to \code{NULL}.} 54 | 55 | \item{timer}{Logical indicating whether to report how long it took to create 56 | the cache.} 57 | 58 | \item{buildDir}{Location of build files (R scripts used as instructions if 59 | the \code{instruction} argument is not provided). Defaults to the RBUILD.DIR 60 | global option.} 61 | 62 | \item{assignToVariable}{Character vector for a variable name to load the 63 | cache into. By default, \code{simpleCache} assigns the cache to a 64 | variable named \code{cacheName}; you can overrule that here.} 65 | 66 | \item{loadEnvir}{The environment into which to load the variable. 67 | Defaults to \code{\link[base]{parent.frame}}.} 68 | 69 | \item{searchEnvir}{A vector of environments to search for the already loaded 70 | cache.} 71 | 72 | \item{nofail}{By default, simpleCache throws an error if the instructions 73 | fail. Use this option to convert this error into a warning. No cache will 74 | be created, but simpleCache will not then hard-stop your processing.
This 75 | is useful, for example, if you are creating a bunch of caches (for 76 | instance, with \code{lapply}) and it's ok if some of them do not complete.} 77 | 78 | \item{batchRegistry}{A \code{batchtools} registry object (built with 79 | \code{\link[batchtools]{makeRegistry}}). If provided, this cache will be created on 80 | the cluster using your batchtools configuration.} 81 | 82 | \item{batchResources}{A list of variables to provide to batchtools for 83 | cluster resource managers. Used as the \code{res} argument to 84 | \code{\link[batchtools]{batchMap}}} 85 | 86 | \item{pepSettings}{Experimental untested feature.} 87 | 88 | \item{ignoreLock}{Internal parameter used for batch job submission; don't 89 | touch.} 90 | 91 | \item{lifespan}{Numeric specifying the maximum age of cache, in days, to 92 | allow before automatically triggering \code{recreate=TRUE}.} 93 | } 94 | \description{ 95 | Given a unique name for an R object, and instructions for how to make that 96 | object, use the simpleCache function to create and cache or load the object. 97 | This should be used for computations that take a long time and generate an 98 | object, such as a table, that is used repeatedly (in other scripts, for example). Because 99 | the cache is tied to the object name, misusing the caching system can 100 | cause trouble. The object should be considered 101 | static. 102 | } 103 | \details{ 104 | You should pass a bracketed R code snippet like \code{{ rnorm(500) }} as the 105 | instruction, and simpleCache will create the object. Alternatively, if the 106 | code to create the cache is large, you can put an R script called object.R in 107 | the \code{\link[=setCacheBuildDir]{RBUILD.DIR}} (the name of the file *must* match the name of the object it 108 | creates *exactly*). If you don't provide an instruction, the function sources 109 | RBUILD.DIR/object.R and caches the result as the object.
This source file 110 | \emph{must} create an object with the same name as the object. If you already have 111 | an object with the name of the object to load in your current environment, 112 | this function will not try to reload the object; instead, it returns the 113 | local object. In essence, it assumes that this is a static object, which you 114 | will not change. You can force it to load the cached version instead with 115 | \code{reload=TRUE}. 116 | 117 | Because R uses lexical scope and not dynamic scope, you may need to pass some 118 | variables that your instruction code uses from the calling environment. You can do this 119 | with the \code{buildEnvir} parameter (just provide a list of named variables). 120 | } 121 | \examples{ 122 | # choose location to store caches 123 | cacheDir = tempdir() 124 | cacheDir 125 | setCacheDir(cacheDir) 126 | 127 | # build some caches 128 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE) 129 | simpleCache("normSample", { rnorm(5e3, 0,1) }) 130 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE) 131 | 132 | # storing a cache after-the-fact 133 | normSample2 = rnorm(10, 0, 1) 134 | storeCache("normSample2") 135 | 136 | # what's available? 137 | listCaches() 138 | 139 | # load a cache 140 | simpleCache("normSample") 141 | 142 | # load multiple caches 143 | loadCaches(c("normSample", "normSample2"), reload=TRUE) 144 | } 145 | -------------------------------------------------------------------------------- /man/simpleCacheGlobal.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/sharedCaches.R 3 | \name{simpleCacheGlobal} 4 | \alias{simpleCacheGlobal} 5 | \title{Helper alias for loading caches into the global environment. 6 | simpleCache normally loads variables into the calling environment; this 7 | ensures that the variables are loaded in the global environment.} 8 | \usage{ 9 | simpleCacheGlobal(...)
10 | } 11 | \arguments{ 12 | \item{...}{Parameters passed to \code{\link{simpleCache}}.} 13 | } 14 | \description{ 15 | Helper alias for loading caches into the global environment. 16 | simpleCache normally loads variables into the calling environment; this 17 | ensures that the variables are loaded in the global environment. 18 | } 19 | -------------------------------------------------------------------------------- /man/simpleCacheOptions.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/cacheDirectories.R 3 | \name{simpleCacheOptions} 4 | \alias{simpleCacheOptions} 5 | \title{View simpleCache options} 6 | \usage{ 7 | simpleCacheOptions() 8 | } 9 | \description{ 10 | Views simpleCache global variables 11 | } 12 | -------------------------------------------------------------------------------- /man/simpleCacheShared.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/sharedCaches.R 3 | \name{simpleCacheShared} 4 | \alias{simpleCacheShared} 5 | \title{Alias to default to a shared cache folder.} 6 | \usage{ 7 | simpleCacheShared(...) 8 | } 9 | \arguments{ 10 | \item{...}{Parameters passed to \code{\link{simpleCache}}.} 11 | } 12 | \description{ 13 | Helper alias for caching across experiments/people. 
14 | Just sets the cacheDir to the default SHARE directory 15 | (instead of the typical default PROJECT directory) 16 | } 17 | -------------------------------------------------------------------------------- /man/simpleCacheSharedGlobal.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/sharedCaches.R 3 | \name{simpleCacheSharedGlobal} 4 | \alias{simpleCacheSharedGlobal} 5 | \title{Helper alias for loading shared caches into the global environment.} 6 | \usage{ 7 | simpleCacheSharedGlobal(...) 8 | } 9 | \arguments{ 10 | \item{...}{Parameters passed to \code{\link{simpleCache}}.} 11 | } 12 | \description{ 13 | Helper alias for loading shared caches into the global environment. 14 | } 15 | -------------------------------------------------------------------------------- /man/storeCache.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/storeCache.R 3 | \name{storeCache} 4 | \alias{storeCache} 5 | \title{Stores as a cache an already-produced R object} 6 | \usage{ 7 | storeCache( 8 | cacheName, 9 | cacheDir = getCacheDir(), 10 | cacheSubDir = NULL, 11 | recreate = FALSE 12 | ) 13 | } 14 | \arguments{ 15 | \item{cacheName}{Unique name for the cache (and R object to be cached).} 16 | 17 | \item{cacheDir}{The directory where caches are saved (and loaded from). 18 | Defaults to the global \code{\link[=setCacheDir]{RCACHE.DIR}} variable} 19 | 20 | \item{cacheSubDir}{You can specify a subdirectory within the cacheDir 21 | variable. Defaults to \code{NULL}.} 22 | 23 | \item{recreate}{Forces reconstruction of the cache} 24 | } 25 | \description{ 26 | Sometimes you use significant computational power to create an object, but 27 | you didn't cache it with \code{\link{simpleCache}}. Oops, maybe you wish you had, after the 28 | fact. 
This function lets you store an existing object as a cache in the cache directory so it can 29 | be loaded by future calls to \code{simpleCache}. 30 | } 31 | \details{ 32 | This can be used in interactive sessions, but it also supports another 33 | use case: if you have a complicated set of instructions (too long to pass as the 34 | instruction argument to \code{simpleCache}), you can add a call to 35 | \code{storeCache} at the end of your script. 36 | } 37 | \examples{ 38 | # choose location to store caches 39 | cacheDir = tempdir() 40 | cacheDir 41 | setCacheDir(cacheDir) 42 | 43 | # build some caches 44 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE) 45 | simpleCache("normSample", { rnorm(5e3, 0,1) }) 46 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE) 47 | 48 | # storing a cache after-the-fact 49 | normSample2 = rnorm(10, 0, 1) 50 | storeCache("normSample2") 51 | 52 | # what's available? 53 | listCaches() 54 | 55 | # load a cache 56 | simpleCache("normSample") 57 | 58 | # load multiple caches 59 | loadCaches(c("normSample", "normSample2"), reload=TRUE) 60 | } 61 | -------------------------------------------------------------------------------- /man/tic.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/utility.R 3 | \name{tic} 4 | \alias{tic} 5 | \title{Start a timer} 6 | \usage{ 7 | tic(gcFirst = TRUE, type = c("elapsed", "user.self", "sys.self")) 8 | } 9 | \arguments{ 10 | \item{gcFirst}{Run garbage collection before starting the timer?} 11 | 12 | \item{type}{Type of time to return, 13 | can be 'elapsed', 'user.self', or 'sys.self'.} 14 | } 15 | \description{ 16 | Start a timer 17 | } 18 | -------------------------------------------------------------------------------- /man/toc.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit
documentation in R/utility.R 3 | \name{toc} 4 | \alias{toc} 5 | \title{Check the time since the current timer was started with tic()} 6 | \usage{ 7 | toc() 8 | } 9 | \description{ 10 | Check the time since the current timer was started with tic() 11 | } 12 | -------------------------------------------------------------------------------- /paper/paper.bib: -------------------------------------------------------------------------------- 1 | @Manual{R, 2 | title = {R: A Language and Environment for Statistical Computing}, 3 | author = {{R Core Team}}, 4 | organization = {R Foundation for Statistical Computing}, 5 | address = {Vienna, Austria}, 6 | year = {2016}, 7 | url = {https://www.R-project.org/}, 8 | } 9 | 10 | @Article{batchtools, 11 | title = {batchtools: Tools for R to work on batch systems}, 12 | author = {Michel Lang and Bernd Bischl and Dirk Surmann}, 13 | journal = {The Journal of Open Source Software}, 14 | year = {2017}, 15 | month = {feb}, 16 | volume = {2}, 17 | number = {10}, 18 | doi = {10.21105/joss.00135}, 19 | url = {https://doi.org/10.21105/joss.00135}, 20 | } 21 | 22 | @Article{RPIM, 23 | author="Sheffield, N. C. and Pierron, G. and Klughammer, J. and Datlinger, P. and Schonegger, A. and Schuster, M. and Hadler, J. and Surdez, D. and Guillemot, D. and Lapouble, E. and Freneaux, P. and Champigneulle, J. and Bouvier, R. and Walder, D. and Ambros, I. M. and Hutter, C. and Sorz, E. and Amaral, A. T. and de Alava, E. and Schallmoser, K. and Strunk, D. and Rinner, B. and Liegl-Atzwanger, B. and Huppertz, B. and Leithner, A. and de Pinieux, G. and Terrier, P. and Laurence, V. and Michon, J. and Ladenstein, R. and Holter, W. and Windhager, R. and Dirksen, U. and Ambros, P. F. and Delattre, O. and Kovar, H. and Bock, C. and Tomazou, E. M. 
", 24 | title={DNA methylation heterogeneity defines a disease spectrum in Ewing sarcoma}, 25 | journal={Nature Medicine}, 26 | year={2017}, 27 | volume={23}, 28 | number={3}, 29 | pages={386-395}, 30 | month={Mar}, 31 | doi = {10.1038/nm.4273} 32 | } 33 | 34 | @Article{LOLA, 35 | author={Sheffield, N. C. and Bock, C.}, 36 | title={LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor}, 37 | journal={Bioinformatics}, 38 | year={2016}, 39 | volume={32}, 40 | number={4}, 41 | pages={587-589}, 42 | month={Feb}, 43 | doi = {10.1093/bioinformatics/btv612} 44 | } 45 | -------------------------------------------------------------------------------- /paper/paper.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'simpleCache: R caching for reproducible, distributed, large-scale projects' 3 | authors: 4 | - name: Nathan C. Sheffield 5 | orcid: 0000-0001-5643-4068 6 | affiliation: 1 7 | - name: VP Nagraj 8 | orcid: 0000-0003-0060-566X 9 | affiliation: 1 10 | - name: Vince Reuter 11 | orcid: 0000-0002-7967-976X 12 | affiliation: 1 13 | affiliations: 14 | - name: University of Virginia 15 | index: 1 16 | date: 28 October 2017 17 | bibliography: paper.bib 18 | --- 19 | 20 | # Summary 21 | 22 | `simpleCache` is an R [@R] package that provides functions for caching R objects. Its purpose is to encourage writing reusable, restartable, and reproducible analyses for projects with large data and computational requirements. As its name indicates, `simpleCache` is intended to be simple. Users specify a location to store caches and then provide nothing more than a cache name and instructions (R code) for how to produce an R object. `simpleCache` either creates and saves or simply loads the result as necessary with just a single function call.
23 | 24 | In addition to this basic functionality, `simpleCache` has advanced options for assigning objects to specific environments, recreating caches, reloading caches, and even distributing caching operations to cluster computing resources via the `batchtools` [@batchtools] interface. These features make the package particularly useful for large-scale data analysis and research projects. `simpleCache` is most helpful for caching objects that are computationally expensive to create but used in multiple scripts or by multiple users. 25 | 26 | `simpleCache` is also useful for improving the performance of packages that rely on large databases. For example, `simpleCache` has been incorporated into the LOLA R package [@LOLA] to more efficiently cache and retrieve genomic region databases. Similarly, `simpleCache` has been used to store baseline statistical tables for faster lookup when determining statistical differences in tables with hundreds of millions of data points [@RPIM]. 27 | 28 | In summary, `simpleCache` provides a user-friendly interface to help the R programmer manage computationally intensive, repeated data analysis.
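The create-or-load pattern summarized above can be illustrated with a minimal base-R sketch. The helper `cacheOrLoad()` is hypothetical and is not part of the `simpleCache` API; it only mimics the core behavior, storing one `.RData` file per cache name with `save()` and relying on R's lazy evaluation so that the instruction block runs only when no cache file exists yet. Unlike `simpleCache`, it does not check whether the object is already in memory and offers none of the package's other options.

```r
# Hypothetical helper (not part of simpleCache) sketching create-or-load caching.
cacheOrLoad = function(cacheName, instruction, cacheDir = tempdir()) {
  cacheFile = file.path(cacheDir, paste0(cacheName, ".RData"))
  env = new.env()
  if (file.exists(cacheFile)) {
    # A cache file exists: load it; the lazy 'instruction' argument is never forced.
    load(cacheFile, envir = env)
  } else {
    # No cache yet: forcing the lazy argument runs the instruction block,
    # and the result is saved to disk under the cache name.
    assign(cacheName, instruction, envir = env)
    save(list = cacheName, envir = env, file = cacheFile)
  }
  get(cacheName, envir = env, inherits = FALSE)
}

# Example usage: the second call finds the .RData file and skips rnorm().
demoDir = file.path(tempdir(), "cacheOrLoadDemo")
dir.create(demoDir, showWarnings = FALSE)
x1 = cacheOrLoad("normSample", { rnorm(5, 0, 1) }, cacheDir = demoDir)
x2 = cacheOrLoad("normSample", { rnorm(5, 0, 1) }, cacheDir = demoDir)
stopifnot(identical(x1, x2))
```

Keying the file name on the cache name, rather than asking the caller for a path, mirrors the design choice the package makes.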
29 | 30 | # References 31 | -------------------------------------------------------------------------------- /simpleCache.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | BuildType: Package 16 | PackageUseDevtools: Yes 17 | PackageInstallArgs: --no-multiarch --with-keep.source 18 | -------------------------------------------------------------------------------- /tests/testthat.R: -------------------------------------------------------------------------------- 1 | library(testthat) 2 | library(simpleCache) 3 | 4 | test_check("simpleCache") 5 | -------------------------------------------------------------------------------- /tests/testthat/helper-lifespan.R: -------------------------------------------------------------------------------- 1 | # Ancillary functions for cache lifespan tests. 2 | 3 | # Build a small data frame to cache. 4 | buildTestFrame = function() { data.frame(matrix(1:9, nrow=3)) } 5 | 6 | # Remove test case's temp cache folder. 7 | cleanLSTest = function() { unlink(lifespanTestsTmpdir(), recursive=TRUE) } 8 | 9 | # Count the number of items in the cache folder. 10 | countCacheItems = function() { length(list.files(getOption("RCACHE.DIR"))) } 11 | 12 | # Generate path to temp folder for test case. 13 | lifespanTestsTmpdir = function() { file.path(tempdir(), "lifespan") } 14 | 15 | # Establish a temp folder and set the cache home location to it. 
16 | setupLSTest = function() { 17 | testdir = lifespanTestsTmpdir() 18 | if (!file_test("-d", testdir)) { dir.create(testdir) } 19 | setCacheDir(lifespanTestsTmpdir()) 20 | } 21 | -------------------------------------------------------------------------------- /tests/testthat/test_all.R: -------------------------------------------------------------------------------- 1 | library(simpleCache) 2 | 3 | context("error checking") 4 | 5 | 6 | # Map option name to its setter. 7 | kSetters = list(RCACHE.DIR=setCacheDir, RESOURCES.RCACHE=setSharedCacheDir, RBUILD.DIR=setCacheBuildDir) 8 | 9 | 10 | # Test a cache dir setting in managed context fashion, resetting before and after test. 11 | test_dir_default = function(cacheDirOptname) { 12 | resetCacheSearchEnvironment() 13 | test_that(sprintf("%s setter uses current folder for argument-less call", cacheDirOptname), { 14 | do.call(kSetters[[cacheDirOptname]], args=list()) 15 | expect_equal(getwd(), getOption(cacheDirOptname)) 16 | }) 17 | resetCacheSearchEnvironment() 18 | } 19 | 20 | 21 | test_that("notifications and messages as expected", { 22 | 23 | # message if cache exists 24 | simpleCache("normSample", instruction = {rnorm(5e3, 0,1)}, cacheDir = tempdir(), recreate=TRUE) 25 | expect_message(simpleCache("normSample", instruction = {rnorm(5e3, 0,1)}, cacheDir = tempdir(), recreate=FALSE, noload = TRUE), "^::Cache exists") 26 | deleteCaches("normSample", force = TRUE) 27 | 28 | # storeCache should not accept non-character cacheName 29 | expect_error(storeCache(cacheName = normSample, recreate = TRUE, cacheDir = tempdir()), "storeCache expects the cacheName variable to be a character vector.") 30 | 31 | # message when cacheDir isn't defined 32 | expect_message(simpleCache("normSample", { rnorm(5e3, 0,1) }), regexp = "^No cacheDir specified.") 33 | 34 | # error when buildDir is empty without instruction 35 | expect_error(simpleCache("normSample", cacheDir = tempdir(), buildDir = tempdir(), recreate = TRUE), "::Error::\tNo 
instruction or RBuild file provided.") 36 | 37 | # error when buildEnvir includes "instruction" 38 | expect_error(simpleCache("normSample", { rnorm(5e3, 0,1) }, buildEnvir = list(instruction="foo"), recreate=TRUE, cacheDir = tempdir()), "Can't provide a variable named 'instruction' in buildEnvir") 39 | 40 | # error when instruction and buildDir are NULL 41 | expect_error(simpleCache("normSample", instruction = NULL, buildDir = NULL, cacheDir = tempdir(), recreate=TRUE)) 42 | 43 | # error when cacheName is not character 44 | expect_error(simpleCache(12345, instruction = { rnorm(5e3, 0,1) }, buildDir = NULL, cacheDir = tempdir(), recreate=TRUE)) 45 | 46 | # message when the instruction returns NULL 47 | expect_message(simpleCache("normSample", instruction = {normSample <- NULL}, recreate = TRUE, cacheDir = tempdir()), "NULL value returned, no cache created") 48 | 49 | # we must clean up any temporary caches we make 50 | deleteCaches("normSample", force=TRUE, cacheDir = tempdir()) 51 | 52 | }) 53 | 54 | test_that("Caching respects existing files", { 55 | setCacheDir(tempdir()) 56 | set.seed(1) 57 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE) 58 | expect_equal(signif(normSample[1], 6), -0.626454) 59 | 60 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE) 61 | expect_equal(signif(normSample[1], 6), -1.51637) 62 | 63 | # Object already exists, so the instruction should not be evaluated 64 | simpleCache("normSample", { rnorm(5e3, 0,1) }) 65 | expect_equal(signif(normSample[1], 6), -1.51637) 66 | 67 | 68 | # Deleting the cache should force the reload to recreate it 69 | deleteCaches("normSample", force=TRUE) 70 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE) 71 | expect_equal(signif(normSample[1], 6), -0.804332) 72 | 73 | # we must clean up any temporary caches we make 74 | deleteCaches("normSample", force=TRUE) 75 | 76 | }) 77 | 78 | context("basic functionality") 79 | 80 | test_that("timer works", { 81 | 82 | setCacheDir(tempdir()) 83 | timeout <-
capture_messages(simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer = TRUE))[2] 84 | expect_match(timeout, "<[0-9][0-9]h [0-9][0-9]m [0-9].[0-9]s>") 85 | 86 | # we must clean up any temporary caches we make 87 | deleteCaches("normSample", force=TRUE) 88 | 89 | }) 90 | 91 | test_that("cache can be created without loading", { 92 | 93 | expect_null(simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate = TRUE, noload = TRUE, cacheDir = tempdir())) 94 | 95 | expect_true("normSample.RData" %in% listCaches()) 96 | 97 | # we must clean up any temporary caches we make 98 | deleteCaches("normSample", force=TRUE) 99 | 100 | }) 101 | 102 | test_that("object can be stored as cache", { 103 | 104 | normSample2 <<- rnorm(5e3,0,1) 105 | 106 | expect_message(storeCache("normSample2", cacheDir = NULL, recreate = TRUE), "^You must set global option RCACHE.DIR") 107 | 108 | expect_message(storeCache("normSample2", cacheDir = tempdir(), recreate = TRUE), "^::Creating cache::") 109 | 110 | expect_message(storeCache("normSample2", cacheDir = tempdir(), recreate = FALSE), "^::Cache already exists") 111 | 112 | # we must clean up any temporary caches we make 113 | deleteCaches("normSample2", force=TRUE) 114 | 115 | }) 116 | 117 | test_that("option setting works", { 118 | 119 | # set all options 120 | setCacheDir(tempdir()) 121 | setSharedCacheDir(tempdir()) 122 | setCacheBuildDir(tempdir()) 123 | addCacheSearchEnvironment("cacheEnv") 124 | 125 | # Windows paths contain backslashes, which grep() treats as regex escapes; 126 | # this command doubles each backslash so that grep() 127 | # matches the path literally.
128 | grep_tempdir = gsub("\\\\", "\\\\\\\\", tempdir()) 129 | # capture output and check 130 | options_out <- capture_messages(simpleCacheOptions()) 131 | 132 | expect_true(grepl(grep_tempdir, options_out[1])) 133 | expect_true(grepl(grep_tempdir, options_out[2])) 134 | expect_true(grepl(grep_tempdir, options_out[3])) 135 | expect_true(grepl("cacheEnv", options_out[4])) 136 | 137 | # reset the cache search option 138 | resetCacheSearchEnvironment() 139 | 140 | # check to make sure it is gone 141 | options_out <- capture_messages(simpleCacheOptions()) 142 | expect_true(!grepl("cacheEnv", options_out[4])) 143 | 144 | }) 145 | 146 | test_that("Cache dir fetch works", { 147 | options(RCACHE.DIR = NULL) 148 | expect_true(is.null(getCacheDir())) 149 | setCacheDir(tempdir()) 150 | expect_false(is.null(getCacheDir())) 151 | expect_equal(getCacheDir(), tempdir()) 152 | }) 153 | 154 | # Test each cache directory option setter. 155 | for (optname in names(kSetters)) { test_dir_default(optname) } 156 | 157 | 158 | test_that("objects pass through in buildEnvir", { 159 | 160 | setCacheDir(tempdir()) 161 | 162 | set.seed(1) 163 | simpleCache("piSample", { pi^x }, buildEnvir = list(x=2), recreate=TRUE, timer = TRUE) 164 | rm(piSample) 165 | 166 | simpleCache("piSample", reload = TRUE) 167 | 168 | expect_equal(signif(piSample, 3), 9.87) 169 | 170 | # we must clean up any temporary caches we make 171 | deleteCaches("piSample", force=TRUE) 172 | 173 | }) 174 | 175 | test_that("caches can be loaded", { 176 | 177 | setCacheDir(tempdir()) 178 | 179 | simpleCache("loadSample", { rnorm(5e3, 0,1) }, recreate=TRUE) 180 | loadCaches("loadSample") 181 | 182 | expect_true("loadSample" %in% ls()) 183 | 184 | # we must clean up any temporary caches we make 185 | deleteCaches("loadSample", force=TRUE) 186 | 187 | }) 188 | context("misc") 189 | 190 | test_that("listCaches returns name of given cache", { 191 | 192 | setCacheDir(tempdir()) 193 | set.seed(1) 194 | simpleCache("normSample", { 
rnorm(5e3, 0,1) }, recreate=TRUE) 195 | expect_true("normSample.RData" %in% listCaches()) 196 | 197 | # we must clean up any temporary caches we make 198 | deleteCaches("normSample", force=TRUE) 199 | 200 | }) 201 | 202 | -------------------------------------------------------------------------------- /tests/testthat/test_cache_lifespan.R: -------------------------------------------------------------------------------- 1 | # test_cache_lifespan.R 2 | # Tests for enforcing the cache lifespan requirement. 3 | 4 | # The pattern for each test is roughly: 5 | # 1. Ensure the test case has a fresh, clean folder. 6 | # 2. Create a dummy cache. 7 | # 3. Check that only 1 file is in the cache. 8 | # 4. Grab the cache timestamp. 9 | # 5. Make another simpleCache call. 10 | # 6. Again check that there's a single cache file and grab the timestamp. 11 | # 7. Compare timestamps. 12 | 13 | context("lifespan") 14 | 15 | # Provide a clean, pre-set cache folder for each test case. 16 | my_test_that = function(description, instruction) { 17 | setupLSTest() 18 | test_that(description, instruction) 19 | cleanLSTest() 20 | } 21 | 22 | # Control loading behavior for these tests to focus on lifespan/recreate effects. 23 | mySimpleCache = function(...)
{ simpleCache(..., noload=TRUE) } 24 | 25 | # Negative control 26 | my_test_that("Cache file isn't replaced if no lifespan is specified and recreate=FALSE", { 27 | expect_equal(0, countCacheItems()) 28 | mySimpleCache("testDF", recreate=FALSE, instruction={ buildTestFrame() }) 29 | expect_equal(1, countCacheItems()) 30 | fp = file.path(getOption("RCACHE.DIR"), "testDF.RData") 31 | t0 = file.info(fp)$ctime 32 | mySimpleCache("testDF", recreate=FALSE, instruction={ buildTestFrame() }) 33 | expect_equal(1, countCacheItems()) 34 | t1 = file.info(fp)$ctime 35 | expect_equal(t0, t1) 36 | }) 37 | 38 | # Another sort of control 39 | my_test_that("Cache file is replaced if no lifespan is specified and recreate=TRUE", { 40 | expect_equal(0, countCacheItems()) 41 | fp = file.path(getOption("RCACHE.DIR"), "testDF.RData") 42 | expect_false(file_test("-f", fp)) 43 | mySimpleCache("testDF", instruction={ buildTestFrame() }) 44 | expect_equal(1, countCacheItems()) 45 | expect_true(file_test("-f", fp)) 46 | t0 = file.info(fp)$mtime 47 | Sys.sleep(1) # Delay so that our time comparison can work. 
48 | mySimpleCache("testDF", recreate=TRUE, instruction={ buildTestFrame() }) 49 | expect_equal(1, countCacheItems()) 50 | t1 = file.info(fp)$mtime 51 | expect_true(t1 > t0) 52 | }) 53 | 54 | # Specificity 55 | my_test_that("Cache remains unchanged if younger than explicit lifespan", { 56 | expect_equal(0, countCacheItems()) 57 | fp = file.path(getOption("RCACHE.DIR"), "testDF.RData") 58 | expect_false(file_test("-f", fp)) 59 | mySimpleCache("testDF", instruction={ buildTestFrame() }) 60 | expect_equal(1, countCacheItems()) 61 | expect_true(file_test("-f", fp)) 62 | t0 = file.info(fp)$mtime 63 | mySimpleCache("testDF", lifespan=0.5, instruction={ buildTestFrame() }) 64 | expect_equal(1, countCacheItems()) 65 | t1 = file.info(fp)$mtime 66 | expect_true(t1 == t0) 67 | }) 68 | 69 | # Sensitivity 70 | my_test_that("Cache is replaced if older than explicit lifespan", { 71 | expect_equal(0, countCacheItems()) 72 | fp = file.path(getOption("RCACHE.DIR"), "testDF.RData") 73 | expect_false(file_test("-f", fp)) 74 | mySimpleCache("testDF", instruction={ buildTestFrame() }) 75 | expect_equal(1, countCacheItems()) 76 | expect_true(file_test("-f", fp)) 77 | t0 = file.info(fp)$mtime 78 | Sys.sleep(1) # Time difference comparison reliability. 79 | mySimpleCache("testDF", lifespan=0, instruction={ buildTestFrame() }) 80 | expect_equal(1, countCacheItems()) 81 | t1 = file.info(fp)$mtime 82 | expect_true(t1 > t0) 83 | }) 84 | 85 | # Explicit recreate argument trumps cache lifespan to determine recreation. 86 | my_test_that("Cache is replaced if recreate=TRUE even if cache is fresh", { 87 | expect_equal(0, countCacheItems()) 88 | fp = file.path(getOption("RCACHE.DIR"), "testDF.RData") 89 | expect_false(file_test("-f", fp)) 90 | mySimpleCache("testDF", instruction={ buildTestFrame() }) 91 | expect_true(file_test("-f", fp)) 92 | expect_equal(1, countCacheItems()) 93 | t0 = file.info(fp)$mtime 94 | Sys.sleep(1) # Time difference comparison reliability. 
95 | mySimpleCache("testDF", recreate=TRUE, lifespan=0, instruction={ buildTestFrame() }) 96 | expect_equal(1, countCacheItems()) 97 | t1 = file.info(fp)$mtime 98 | expect_true(t1 > t0) 99 | }) 100 | 101 | my_test_that("simpleCache can pick up option specifying max cache age.", { 102 | options(MAX.CACHE.AGE = 0) 103 | fp = file.path(getOption("RCACHE.DIR"), "testDF.RData") 104 | expect_false(file_test("-f", fp)) 105 | mySimpleCache("testDF", instruction={ buildTestFrame() }) 106 | expect_true(file_test("-f", fp)) 107 | t0 = file.info(fp)$mtime 108 | Sys.sleep(1) # Time difference comparison reliability. 109 | mySimpleCache("testDF", instruction={ buildTestFrame() }) 110 | t1 = file.info(fp)$mtime 111 | expect_true(t1 > t0) 112 | }) 113 | 114 | my_test_that("Direct lifespan specification is preferred to background option", { 115 | options(MAX.CACHE.AGE = 1) 116 | fp = file.path(getOption("RCACHE.DIR"), "testDF.RData") 117 | expect_false(file_test("-f", fp)) 118 | mySimpleCache("testDF", instruction={ buildTestFrame() }) 119 | expect_true(file_test("-f", fp)) 120 | t0 = file.info(fp)$mtime 121 | Sys.sleep(1) 122 | mySimpleCache("testDF", instruction={ buildTestFrame() }) 123 | expect_equal(t0, file.info(fp)$mtime) # Cache is fresh via MAX.CACHE.AGE. 124 | Sys.sleep(1) # Time difference comparison reliability. 125 | mySimpleCache("testDF", lifespan=0, instruction={ buildTestFrame() }) 126 | t1 = file.info(fp)$mtime 127 | expect_true(t1 > t0) # Cache is stale via lifespan.
128 | }) 129 | -------------------------------------------------------------------------------- /vignettes/clusterCaches.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Generating caches on a cluster" 3 | author: "Nathan Sheffield" 4 | date: "`r Sys.Date()`" 5 | vignette: > 6 | %\VignetteEngine{knitr::rmarkdown} 7 | %\VignetteIndexEntry{Generating caches on a cluster} 8 | output: knitr:::html_vignette 9 | --- 10 | 11 | # Generating caches in parallel using batchtools 12 | 13 | By default, `simpleCache` creates caches in the R session you use to call it. If you need to make lots of caches, or very large caches, you may instead want to submit these as jobs to a cluster resource manager (like SLURM). `simpleCache` can do this using functionality from the `batchtools` package. 14 | 15 | This vignette is unevaluated because it relies on the `batchtools` package and a cluster environment. 16 | 17 | To do this, first create a `batchtools` registry. The `batchtools` package documentation covers registries in more detail, but here's some code to get you started: 18 | 19 | ```{r Try it out, eval=FALSE} 20 | library(simpleCache) 21 | setCacheDir(tempdir()) 22 | 23 | registry = batchtools::makeRegistry(NA) 24 | templateFile = system.file("templates/slurm-advanced.tmpl", package = "simpleCache") 25 | registry$cluster.functions = batchtools::makeClusterFunctionsSlurm( 26 | template = templateFile) 27 | registry 28 | ``` 29 | 30 | Notice that I'm using a custom SLURM template here, which ships with `simpleCache`.
With a registry in hand, we next need to define the resources this cache job will require: 31 | 32 | ```{r} 33 | resources = list(ncpus=1, memory=1000, walltime=60, partition="serial") 34 | ``` 35 | 36 | Then, we simply add these as arguments to `simpleCache()` like so: 37 | ```{r, eval=FALSE} 38 | simpleCache("testBatch", { 39 | rnorm(1e7, 0, 1) 40 | }, batchRegistry=registry, batchResources=resources) 41 | ``` 42 | 43 | This creates and submits a job script to the cluster. That job script contains R code that creates your `testBatch` cache by running the instruction from your `simpleCache` call, `rnorm(1e7, 0, 1)`. The next time you run this function, it will just load the cache without recreating it, as you would expect `simpleCache` to do. You can also use `batchtools` to inspect and manage these jobs: 44 | 45 | ```{r, eval=FALSE} 46 | batchtools::getJobTable(reg=registry) 47 | batchtools::getJobPars(reg=registry) 48 | batchtools::getStatus(reg=registry) 49 | 50 | batchtools::getJobPars(1, reg=registry) 51 | batchtools::loadResult(1, reg=registry) 52 | # batchtools::testJob(1, reg=registry) 53 | # batchtools::killJobs(reg=registry) 54 | ``` 55 | 56 | When you're done, you may want to remove your temporary registry: 57 | ```{r, eval=FALSE} 58 | batchtools::removeRegistry(reg=registry) 59 | ``` 60 | 61 | See the `batchtools` documentation for more details on using registries.
-------------------------------------------------------------------------------- /vignettes/sharingCaches.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Sharing caches across projects" 3 | author: "Nathan Sheffield" 4 | date: "`r Sys.Date()`" 5 | vignette: > 6 | %\VignetteEngine{knitr::rmarkdown} 7 | %\VignetteIndexEntry{Sharing caches across projects} 8 | output: knitr:::html_vignette 9 | --- 10 | 11 | # Sharing caches across projects 12 | 13 | By default, `simpleCache` stores the caches for your project in the directory given by the `RCACHE.DIR` global option. This is designed to be a project-specific directory, so I have a different `RCACHE.DIR` for each of my projects. Sometimes, though, I want to share caches across projects, so it's useful to define a shared cache directory. I think of this as a general resource. For instance, I use it to store the locations of all CpGs in the human genome, which I use repeatedly in many projects. 14 | 15 | To solve this problem, `simpleCache` uses a second global option, `RESOURCES.RCACHE`, which you can set with the convenience setter `setSharedCacheDir()`.
Then, use `simpleCache` as normal, either passing the shared directory as the `cacheDir` argument or using the convenience alias `simpleCacheShared()`, as shown below: 16 | 17 | ```{r Try it out} 18 | library(simpleCache) 19 | cacheDir = tempdir() 20 | setSharedCacheDir(cacheDir) 21 | simpleCacheShared("normSample", { rnorm(1e7, 0,1) }, recreate=TRUE) 22 | simpleCacheShared("normSample", { rnorm(1e7, 0,1) }) 23 | ``` 24 | 25 | ```{r Clean up} 26 | deleteCaches("normSample", cacheDir=cacheDir, force=TRUE) 27 | ``` 28 | -------------------------------------------------------------------------------- /vignettes/simpleCacheIntroduction.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "An introduction to simpleCache" 3 | author: "Nathan Sheffield" 4 | date: "`r Sys.Date()`" 5 | vignette: > 6 | %\VignetteEngine{knitr::rmarkdown} 7 | %\VignetteIndexEntry{An introduction to simpleCache} 8 | output: knitr:::html_vignette 9 | --- 10 | 11 | # An introduction to simpleCache 12 | 13 | ## Your first cache 14 | 15 | `simpleCache` has two main use cases: first, it can help you pick up where you left off in an R session, and second, it can help you parallelize code by enabling you to share results across R sessions. 16 | 17 | The workhorse of `simpleCache` is the eponymous `simpleCache()` function, which in the simplest case requires just two parameters: a cache name and a block of code. The cache name should be unique and its underlying object treated as immutable, while the block of code (or *instruction*) is the `R` code that generates the object you wish to cache. 18 | 19 | But before we start creating caches, it's important to tell `simpleCache` where to store them. `simpleCache` stores caches in the directory given by a global option (`RCACHE.DIR`) and provides a setter function (`setCacheDir()`) to change it. To get started, choose a cache directory and generate some random data.

```{r Try it out}
library(simpleCache)
cacheDir = tempdir()
setCacheDir(cacheDir)
simpleCache("normSamp", { rnorm(1e7, 0, 1) })
```

Now, watch what happens when we run the same function call again:

```{r}
simpleCache("normSamp", { rnorm(1e7, 0, 1) })
```

Notice that the second call to `simpleCache()` doesn't re-run the `rnorm` calculation. In fact, it doesn't even reload the cache, because it notices that the object is already in memory. If the object weren't already in memory, this call would load it from disk. This means you can put this code in multiple scripts and pull the same random data without redoing the computation.

By default, a call to `simpleCache()` will not reload an object that already exists in your environment. You can force a reload with the `reload` parameter. This is useful, for example, if you've loaded a cache, accidentally modified the object, and want to reset it:

```{r}
normSamp = NA # Oops, broke my object in memory.
# A regular call won't reload, because an object called normSamp already exists:
simpleCache("normSamp", { rnorm(1e7, 0, 1) })
# But reload=TRUE forces a reload from disk and gets it back:
simpleCache("normSamp", { rnorm(1e7, 0, 1) }, reload=TRUE)
```

What if we want to start over, discarding that cache and generating a new random sample? Use the `recreate` flag to ensure that the cache is produced and overwritten even if it already exists:

```{r}
simpleCache("normSamp", { rnorm(1e7, 0, 1) }, recreate=TRUE)
```

With just those parameters (cache name, instruction, `recreate`, and `reload`), you should be able to make good use of `simpleCache`. The essence is: if the object already exists in memory, do nothing. If it does not exist in memory but exists on disk, load it into memory.
If it exists neither in memory nor on disk, run the instruction to create it, then save it to disk and keep it in memory. Now you've got the basics.

But there's more if you want it: read on!

## Comparison to base R save() and load()

Of course, R has base functions that accomplish this (`save()` and `load()`), so what does `simpleCache` add? `simpleCache` is essentially a convenience wrapper around these base R functions, with three advantages. First, it requires only a single function: `simpleCache()` handles both saving and loading. Your script therefore does not need to be written differently depending on whether it's generating or loading a cache, because the same call does either, depending on whether the cache already exists. Second, caches are keyed by cache name instead of by file name: instead of passing a full path to an `.RData` file to `load()`, you pass a unique identifier for the cache, and `simpleCache` handles the rest. Third, `simpleCache` tries to be smart: if the object is already in memory, it won't reload it. For big caches, this can save time if you accidentally call `simpleCache()` multiple times on the same cache (or if you write functions that populate an R environment with a lot of pre-existing data).

Beyond that, `simpleCache` offers several options that make it easy to save and reload R objects. Let's look at these features in more detail.

## Cache names

By default, the object will be loaded into a variable with the same name as the cache.
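Putting the naming rule together with the memory/disk/create logic from the basics, here is a minimal base-R sketch of what a call like `simpleCache("normSamp", { rnorm(1e7, 0, 1) })` decides. Everything in it is hypothetical: `cacheSketch()`, its arguments, the `<cacheName>.RData` file layout, and the stored object name `cacheObject` are illustrative choices, not taken from `simpleCache`'s source. It defaults `assignTo` to the cache name, mirroring the behavior described above.

```r
# Hypothetical sketch of the memory -> disk -> create decision, using base R only.
# cacheSketch() and its arguments are illustrative names, not simpleCache's API.
cacheSketch = function(cacheName, instruction = NULL, cacheDir = tempdir(),
                       assignTo = cacheName, envir = globalenv()) {
  expr = substitute(instruction)  # capture the code block without running it
  callerEnv = parent.frame()      # evaluate the block where it was written
  cacheFile = file.path(cacheDir, paste0(cacheName, ".RData"))

  # 1. Already in memory: do nothing.
  if (exists(assignTo, envir = envir, inherits = FALSE)) {
    return(invisible("memory"))
  }
  # 2. Not in memory, but on disk: load it into the target variable.
  if (file.exists(cacheFile)) {
    tmp = new.env()
    load(cacheFile, envir = tmp)  # the file holds one object named cacheObject
    assign(assignTo, tmp$cacheObject, envir = envir)
    return(invisible("disk"))
  }
  # 3. Neither: run the instruction, save the result, and keep it in memory.
  if (is.null(expr)) {
    stop("Cache '", cacheName, "' not found and no instruction was given.")
  }
  cacheObject = eval(expr, envir = callerEnv)
  save(cacheObject, file = cacheFile)
  assign(assignTo, cacheObject, envir = envir)
  invisible("created")
}

# Usage: a separate environment stands in for the workspace.
ws = new.env()
sketchDir = tempfile("cacheSketch")
dir.create(sketchDir)
first  = cacheSketch("normSamp", { rnorm(10, 0, 1) }, cacheDir = sketchDir, envir = ws)
second = cacheSketch("normSamp", cacheDir = sketchDir, envir = ws)
third  = cacheSketch("normSamp", cacheDir = sketchDir, assignTo = "mySamp", envir = ws)
c(first, second, third)            # "created" "memory" "disk"
identical(ws$normSamp, ws$mySamp)  # TRUE
```

Returning a status string just makes the three branches easy to see; the base-R calls `exists()`, `load()`, `save()`, and `assign()` do the actual work, which is the bookkeeping that `simpleCache()` spares you from writing.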
You can load a cache into a differently named variable with the `assignTo` parameter:

```{r}
simpleCache("normSamp", { rnorm(1e7, 0, 1) }, assignTo="mySamp")
```

After running this command, we have both `normSamp` (from the previous calls, not from this one) and `mySamp` (loaded in this call) in the workspace, and these objects are identical:

```{r}
identical(normSamp, mySamp)
```

The `assignTo` parameter becomes especially useful when you create caches without loading them and then load them one at a time. Which leads us to...

## Creating but not loading caches

You may want to create several memory-intensive caches that you don't actually need in this R workspace at the same time. If you simply create each object and save it, you'll end up with all of those objects in memory at once. Instead, you can use the `noload` parameter, which creates the caches but does not load them into memory (the object is cached on disk but does not persist in this R environment). I use this frequently in a setup script to build caches that individual downstream scripts will each use later. Let's make 5 caches without loading them:

```{r}
for (i in 1:5) {
  cacheName = paste0("normSamp_", i)
  simpleCache(cacheName, { rnorm(1e6, 0, 1) }, recreate=TRUE, noload=TRUE)
}
```

We've now produced 5 different sample data caches. They exist on disk, but not in memory. This could, for example, be done in an initial data-generation or setup script.
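In base R, the `noload=TRUE` pattern amounts to computing each object, writing it to disk, and letting it go out of scope before the next one is built. The sketch below assumes `saveRDS()` files in a temporary directory as stand-ins for caches; `sketchDir`, the `.rds` file names, and `sampleData` are hypothetical names chosen for illustration, not how `simpleCache` lays out its files.

```r
# Hypothetical sketch: write each object to disk without keeping it in the workspace.
sketchDir = tempfile("noloadSketch")
dir.create(sketchDir)

for (i in 1:5) {
  sketchFile = file.path(sketchDir, paste0("normSamp_", i, ".rds"))
  local({
    sampleData = rnorm(1e5, 0, 1)  # bound only inside this local() call
    saveRDS(sampleData, sketchFile)
  })
}

exists("sampleData", inherits = FALSE)  # FALSE: no copy remains in the workspace
length(list.files(sketchDir))           # 5 files on disk
```

Wrapping each computation in `local()` is what keeps memory use flat here: `sampleData` is bound only in a throwaway environment, so each copy can be garbage-collected once it has been written.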
We may then want to use these same caches in several downstream scripts. There, we can iterate over them and use `assignTo` so that no more than one is loaded into memory at a time:

```{r}
overallMinimum = Inf # start higher than any possible value
for (i in 1:5) {
  cacheName = paste0("normSamp_", i)
  simpleCache(cacheName, assignTo="temp")
  overallMinimum = min(overallMinimum, temp)
}

message(overallMinimum)
```

In this code block, because each cache is assigned to the variable `temp`, only one is in memory at a time: each load overwrites the previous one, which is exactly what we want here. We track the running minimum across caches, effectively computing an overall minimum while holding only a single cache in memory at a time.

## Loading multiple caches

If you have a set of caches and you want them all in memory, you can load them all at once with this convenience function:

```{r}
loadCaches(paste0("normSamp_", 1:5))
```

The drawback of this approach is that you lose the advantage of using the single `simpleCache()` function for both saving and loading, but this may be desirable in some cases.

By the way, once a cache has been created, you no longer need to provide instructions:

```{r}
simpleCache("normSamp")
```

`simpleCache` will load the cache if it can; if it can't, it will raise an error saying that it requires an `instruction`.

## Timing cache creation

If you want to record how long it takes to create a new cache, you can set `timer=TRUE`.
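For comparison, you can time an arbitrary block of code by hand with base R's `system.time()`. This is only an illustration of the kind of measurement involved; it is not necessarily how `timer=TRUE` works internally, and `timing` and `timedSamp` are hypothetical names.

```r
# Illustration only: time an expensive block with base R.
timing = system.time({
  timedSamp = rnorm(1e6, 0, 1)  # the work being timed
})
message("Created in ", round(timing[["elapsed"]], 2), " seconds")
```

Within `simpleCache`, you get this measurement simply by passing `timer=TRUE`, as in the following call.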

```{r}
simpleCache("normSamp", { rnorm(1e6, 0, 1) }, recreate=TRUE, timer=TRUE)
```

## Complicated code

So far, our examples have cached the result of a very simple instruction: the `rnorm()` call that generates random numbers. But `simpleCache` can cache anything. The code block can be whatever you want; whatever it returns will be cached. For example, let's cache the result of a call to `t.test()`:

```{r}
simpleCache("tResult", {
  dat2 = rnorm(1e5, 0.05, 2)
  t.test(normSamp, dat2)
}, recreate=TRUE)

tResult
tResult$p.value
```

The point is that the code could be complicated and time-consuming. You may want to compute it only once and then reuse the result in another script, or in this same script the next time you run it. `simpleCache` makes that, well, simple.

That's the end of the basics. There are a few more advanced options as well, such as using a shared cache directory, submitting compute requests to a cluster using `batchtools`, tweaking the loading environment with the `loadEnvir` parameter (if you need to call `simpleCache()` from within a function), and tweaking the cache-building environment with the `buildEnvir` parameter. These options are more advanced and probably not needed for most `simpleCache` use cases. If you do need them, you can find further help in the other vignettes or in the detailed R function documentation (see `?simpleCache`).

```{r Clean up}
deleteCaches("normSamp", force=TRUE)
deleteCaches(paste0("normSamp_", 1:5), force=TRUE)
deleteCaches("tResult", force=TRUE)
```
--------------------------------------------------------------------------------