├── .Rbuildignore
├── .github
│   ├── CONTRIBUTING.md
│   ├── issue_template.md
│   └── pull_request_template.md
├── .gitignore
├── .travis.yml
├── CONDUCT.md
├── DESCRIPTION
├── LICENSE
├── NAMESPACE
├── NEWS.md
├── R
│   ├── cacheDirectories.R
│   ├── deleteCache.R
│   ├── examples
│   │   └── example.R
│   ├── listCaches.R
│   ├── loadCaches.R
│   ├── sharedCaches.R
│   ├── simpleCache.R
│   ├── storeCache.R
│   └── utility.R
├── README.md
├── _pkgdown.yaml
├── cran-comments.md
├── inst
│   ├── cache
│   │   └── existingCache.RData
│   └── templates
│       └── slurm-advanced.tmpl
├── man
│   ├── addCacheSearchEnvironment.Rd
│   ├── deleteCaches.Rd
│   ├── dot-tooOld.Rd
│   ├── getCacheDir.Rd
│   ├── listCaches.Rd
│   ├── loadCaches.Rd
│   ├── resetCacheSearchEnvironment.Rd
│   ├── secToTime.Rd
│   ├── setCacheBuildDir.Rd
│   ├── setCacheDir.Rd
│   ├── setSharedCacheDir.Rd
│   ├── simpleCache-package.Rd
│   ├── simpleCache.Rd
│   ├── simpleCacheGlobal.Rd
│   ├── simpleCacheOptions.Rd
│   ├── simpleCacheShared.Rd
│   ├── simpleCacheSharedGlobal.Rd
│   ├── storeCache.Rd
│   ├── tic.Rd
│   └── toc.Rd
├── paper
│   ├── paper.bib
│   └── paper.md
├── simpleCache.Rproj
├── tests
│   ├── testthat.R
│   └── testthat
│       ├── helper-lifespan.R
│       ├── test_all.R
│       └── test_cache_lifespan.R
└── vignettes
    ├── clusterCaches.Rmd
    ├── sharingCaches.Rmd
    └── simpleCacheIntroduction.Rmd
/.Rbuildignore:
--------------------------------------------------------------------------------
1 | ^.*\.Rproj$
2 | ^\.Rproj\.user$
3 | ^\.travis\.yml$
4 | ^cran-comments.md$
5 | ^CONDUCT\.md$
6 | ^paper$
7 | .github
8 | ^_pkgdown\.yaml$
9 | ^doc$
10 | ^Meta$
11 |
--------------------------------------------------------------------------------
/.github/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # CONTRIBUTING #
2 |
3 | ### Please contribute!
4 |
5 | We love collaboration.
6 |
7 | ### Bugs?
8 |
9 | * Submit an issue on the [Issues page](https://github.com/databio/simpleCache/issues)
10 |
11 | ### Code contributions
12 |
13 | * Fork this repo to your GitHub account
14 | * Clone your fork from your account to your machine, e.g., `git clone https://github.com/<your-username>/simpleCache.git`
15 | * Make sure to track progress upstream (i.e., on our version of `simpleCache` at `databio/simpleCache`) by doing `git remote add upstream https://github.com/databio/simpleCache.git`. Before making changes make sure to pull changes in from upstream by doing either `git fetch upstream` then merge later or `git pull upstream` to fetch and merge in one step
16 | * Make your changes (bonus points for making changes on a new feature branch)
17 | * Push up to your account
18 | * Submit a pull request to home base (likely master branch, but check to make sure) at `databio/simpleCache`
19 |
20 | ### Thanks for contributing!
--------------------------------------------------------------------------------
/.github/issue_template.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | Session Info
4 |
5 | ```r
6 |
7 | ```
8 |
--------------------------------------------------------------------------------
/.github/pull_request_template.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | ## Description
4 |
5 |
6 | ## Related Issue
7 |
10 |
11 | ## Example
12 |
14 |
15 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Compiled source #
2 | ###################
3 | *.com
4 | *.class
5 | *.dll
6 | *.exe
7 | *.o
8 | *.so
9 |
10 | # Packages #
11 | ############
12 | # it's better to unpack these files and commit the raw source
13 | # git has its own built in compression methods
14 | *.7z
15 | *.dmg
16 | *.gz
17 | *.iso
18 | *.jar
19 | *.rar
20 | *.tar
21 | *.zip
22 |
23 | # Logs and databases #
24 | ######################
25 | *.log
26 | *.sql
27 | *.sqlite
28 |
29 | # IDEs #
30 | ########
31 | .idea/
32 |
33 | # OS generated files #
34 | ######################
35 | .DS_Store
36 | .DS_Store?
37 | ._*
38 | .Spotlight-V100
39 | .Trashes
40 | ehthumbs.db
41 | Thumbs.db
42 |
43 | # Gedit temporary files #
44 | #########################
45 |
46 | *~
47 | ~
48 | .Rproj.user
49 | .Rhistory
50 | /doc/
51 | /Meta/
52 |
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
1 | language: R
2 | sudo: false
3 | cache: packages
4 | r_packages: batchtools
--------------------------------------------------------------------------------
/CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Contributor Code of Conduct
2 |
3 | As contributors and maintainers of this project, we pledge to respect all people who
4 | contribute through reporting issues, posting feature requests, updating documentation,
5 | submitting pull requests or patches, and other activities.
6 |
7 | We are committed to making participation in this project a harassment-free experience for
8 | everyone, regardless of level of experience, gender, gender identity and expression,
9 | sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
10 |
11 | Examples of unacceptable behavior by participants include the use of sexual language or
12 | imagery, derogatory comments or personal attacks, trolling, public or private harassment,
13 | insults, or other unprofessional conduct.
14 |
15 | Project maintainers have the right and responsibility to remove, edit, or reject comments,
16 | commits, code, wiki edits, issues, and other contributions that are not aligned to this
17 | Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed
18 | from the project team.
19 |
20 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by
21 | opening an issue or contacting one or more of the project maintainers.
22 |
23 | This Code of Conduct is adapted from the Contributor Covenant
24 | (http://contributor-covenant.org), version 1.0.0, available at
25 | http://contributor-covenant.org/version/1/0/0/
26 |
--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
1 | Package: simpleCache
2 | Version: 0.4.2
3 | Date: 2021-04-16
4 | Title: Simply Caching R Objects
5 | Description: Provides intuitive functions for caching R objects, encouraging
6 | reproducible, restartable, and distributed R analysis. The user selects a
7 | location to store caches, and then provides nothing more than a cache name
8 | and instructions (R code) for how to produce the R object. Also
9 | provides some advanced options like environment assignments, recreating or
10 | reloading caches, and cluster compute bindings (using the 'batchtools'
11 | package) making it flexible enough for use in large-scale data analysis
12 | projects.
13 | Authors@R: c(person("VP", "Nagraj", email = "vpnagraj@virginia.edu", role =
14 | c("aut")), person("Nathan", "Sheffield", email = "nathan@code.databio.org",
15 | role = c("aut", "cre")))
16 | Suggests:
17 | knitr,
18 | rmarkdown,
19 | testthat
20 | Enhances: batchtools
21 | VignetteBuilder: knitr
22 | License: BSD_2_clause + file LICENSE
23 | Encoding: UTF-8
24 | URL: https://github.com/databio/simpleCache
25 | BugReports: https://github.com/databio/simpleCache
26 | RoxygenNote: 7.1.1
27 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | YEAR: 2017
2 | COPYRIGHT HOLDER: Nathan Sheffield
--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
1 | # Generated by roxygen2: do not edit by hand
2 |
3 | export(addCacheSearchEnvironment)
4 | export(deleteCaches)
5 | export(getCacheDir)
6 | export(listCaches)
7 | export(loadCaches)
8 | export(resetCacheSearchEnvironment)
9 | export(setCacheBuildDir)
10 | export(setCacheDir)
11 | export(setSharedCacheDir)
12 | export(simpleCache)
13 | export(simpleCacheGlobal)
14 | export(simpleCacheOptions)
15 | export(simpleCacheShared)
16 | export(simpleCacheSharedGlobal)
17 | export(storeCache)
18 |
--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
1 | # Change log
2 | All notable changes to this project will be documented in this file.
3 |
4 | ## simpleCache [0.4.2] -- 2021-04-16
5 |
6 | - updates to accommodate latest knitr for vignettes
7 |
8 | ## simpleCache [0.4.1] -- 2019-02-26
9 |
10 | - fixes unit tests on windows
11 | - fixes lifespan bug that used creation time instead of modification time
12 | - allow arg-less directory setting to default to current working dir
13 |
14 | ## simpleCache [0.4.0] -- 2017-12-20
15 |
16 | - adds a lifespan arg to simpleCache() to create auto-expiring caches
17 | - remove unnecessary parse argument to simpleCache()
18 | - viewCacheDirs() renamed to simpleCacheOptions()
19 |
20 | ## simpleCache [0.3.1] -- 2017-08-21
21 |
22 | - fixed a bug in unit tests that left behind a test cache in user home dir.
23 | - changes cache building to happen in parent.frame()
24 | - repaired vignette so R code is displayed properly
25 | - added deleteCaches() function and docs
26 | - reduced size of unit test cache for speed increase
27 |
28 | ## simpleCache [0.3.0] -- 2017-08-21
29 |
30 | - switched default cache dir to tempdir()
31 | - changed availCaches() to listCaches()
32 | - changes cache building to happen in globalenv(), so that any loaded
33 | packages are available for cache building
34 |
35 |
36 | ## simpleCache [0.2.1] -- 2017-07-30
37 |
38 | - added examples
39 |
40 | ## simpleCache [0.2.0] -- 2017-07-30
41 |
42 | - support for batchjobs parallel processing
43 | - docs, prep for submission to CRAN
44 |
45 | ## simpleCache [0.0.1]
46 |
47 | - long-term stable version
48 |
--------------------------------------------------------------------------------
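The 0.4.0 and 0.4.1 entries above describe auto-expiring caches whose age is measured from the file's modification time. The sketch below illustrates one way such a check could look. It is not the package's own `.tooOld` implementation; the function name `isStaleSketch` is hypothetical, and the lifespan is assumed to be given in days.

```r
# Hypothetical staleness check; isStaleSketch() is not part of the package.
isStaleSketch = function(path, lifespan) {
    # No lifespan, or no file yet: nothing can be stale.
    if (is.null(lifespan) || !file.exists(path)) { return(FALSE) }
    # Age is measured from the last modification time, in days.
    ageDays = as.numeric(difftime(Sys.time(), file.mtime(path), units="days"))
    ageDays > lifespan
}

f = tempfile(fileext=".RData")
file.create(f)
fresh = isStaleSketch(f, lifespan=1)        # file was just created

# Pretend the file was last modified three days ago.
Sys.setFileTime(f, Sys.time() - 3 * 24 * 60 * 60)
stale = isStaleSketch(f, lifespan=1)
noLimit = isStaleSketch(f, lifespan=NULL)   # no lifespan given
```

Using the modification time means that rewriting a cache file resets its age, which matches the intent of a cache that expires some time after it was last built.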
/R/cacheDirectories.R:
--------------------------------------------------------------------------------
1 | ################################################################################
2 | # CACHE DIRECTORY FUNCTIONS
3 | ################################################################################
4 | # These are exported functions for interacting with global variables that
5 | # specify default directories for 2 cache types: project caches and shared
6 | # caches.
7 |
8 | #' Sets a global variable specifying the default cache directory for
9 | #' \code{\link{simpleCache}} calls.
10 | #'
11 | #' @param cacheDir Directory where caches should be stored
12 | #' @export
13 | #' @example
14 | #' R/examples/example.R
15 | setCacheDir = function(cacheDir=NULL) { .setDir("RCACHE.DIR", cacheDir) }
16 |
17 | #' Fetcher of the currently set cache directory.
18 | #'
19 | #' \code{getCacheDir} retrieves the value of the option that stores the currently
20 | #' set cache directory path.
21 | #'
22 | #' @return If the option is set, the path to the currently set cache directory; otherwise, \code{NULL}.
23 | #' @export
24 | getCacheDir = function() { getOption("RCACHE.DIR") }
25 |
26 | #' Set shared cache directory
27 | #'
28 | #' Sets global variable specifying the default cache directory for
29 | #' \code{\link{simpleCacheShared}} calls; this function is simply a helper alias for caching
30 | #' results that will be used across projects.
31 | #'
32 | #' @param sharedCacheDir Directory where shared caches should be stored
33 | #' @export
34 | setSharedCacheDir = function(sharedCacheDir=NULL) { .setDir("RESOURCES.RCACHE", sharedCacheDir) }
35 |
36 | #' Sets local cache build directory with scripts for building files.
37 | #'
38 | #' @param cacheBuildDir Directory where build scripts are stored.
39 | #' @export
40 | setCacheBuildDir = function(cacheBuildDir=NULL) { .setDir("RBUILD.DIR", cacheBuildDir) }
41 |
42 | #' View simpleCache options
43 | #'
44 | #' Views simpleCache global variables
45 | #' @export
46 | simpleCacheOptions = function() {
47 | message("RESOURCES.RCACHE:\t", getOption("RESOURCES.RCACHE"))
48 | message("RCACHE.DIR:\t", getCacheDir())
49 | message("RBUILD.DIR:\t", getOption("RBUILD.DIR"))
50 | message("SIMPLECACHE.ENV:\t", getOption("SIMPLECACHE.ENV"))
51 | }
52 |
53 | #' Add a cache search environment
54 | #'
55 | #' Append a new Environment name (a character string) to a global option
56 | #' which is a vector of such names. SimpleCache will search all of these
57 | #' environments to check if a cache is previously loaded, before reloading it.
58 | #'
59 | #' @param addEnv Environment to append to the shared cache search list
60 | #' @export
61 | addCacheSearchEnvironment = function(addEnv) {
62 | options(SIMPLECACHE.ENV=append(addEnv, getOption("SIMPLECACHE.ENV")))
63 | }
64 |
65 | #' Sets global option of cache search environments to \code{NULL}.
66 | #'
67 | #' @export
68 | resetCacheSearchEnvironment = function() {
69 | options(SIMPLECACHE.ENV=NULL)
70 | }
71 |
72 |
73 | .setDir = function(optname, dirpath=NULL) {
74 | diropts = list(ifelse(is.null(dirpath), getwd(), dirpath))
75 | names(diropts) = optname
76 | do.call(options, diropts)
77 | }
78 |
--------------------------------------------------------------------------------
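The `.setDir` helper above sets an option whose name is only known at run time by building a named list and handing it to `options()` through `do.call()`. A minimal standalone sketch of that pattern follows; `setDirSketch` and the option name `DEMO.CACHE.DIR` are hypothetical and used only for illustration.

```r
# Hypothetical helper mirroring the named-list + do.call(options, ...) pattern.
setDirSketch = function(optname, dirpath=NULL) {
    # Fall back to the working directory when no path is given.
    value = if (is.null(dirpath)) getwd() else dirpath
    opts = list(value)
    names(opts) = optname
    do.call(options, opts)
}

demoDir = file.path(tempdir(), "demoCache")
setDirSketch("DEMO.CACHE.DIR", demoDir)
explicitValue = getOption("DEMO.CACHE.DIR")   # the path just set

setDirSketch("DEMO.CACHE.DIR")                # no argument: uses getwd()
defaultValue = getOption("DEMO.CACHE.DIR")
```

The sketch uses a plain `if`/`else` rather than `ifelse()` because the condition is a single logical value; either form gives the same result for a single path.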
/R/deleteCache.R:
--------------------------------------------------------------------------------
1 | #' Deletes caches
2 | #'
3 | #' Given a cache name, this function will attempt to delete the cache of that
4 | #' name on disk.
5 | #' @param cacheNames Name(s) of the cache to delete
6 | #' @param cacheDir Directory where caches are kept
7 | #' @param force Force deletion without user prompt
8 | #' @export
9 | #' @example
10 | #' R/examples/example.R
11 | deleteCaches = function(cacheNames, cacheDir=getCacheDir(),
12 | force=FALSE) {
13 |
14 | if (force) {
15 | response = "y"
16 | } else {
17 | response = readline("Are you sure you want to delete this cache? [y/N]")
18 | }
19 |
20 | if (tolower(response) == "yes" || tolower(response) == "y") {
21 | for (cacheName in cacheNames) {
22 | cacheFile = file.path(cacheDir, paste0(cacheName, ".RData"))
23 | message("Deleting ", cacheFile)
24 | unlink(cacheFile)
25 | }
26 | } else {
27 | message("User aborted cache delete.")
28 | }
29 | }
--------------------------------------------------------------------------------
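`deleteCaches` above maps each cache name to a `<cacheName>.RData` file inside the cache directory and removes it with `unlink()`. The following standalone sketch shows only that name-to-path mapping and the deletion step, with no user prompt; the directory and cache names are made up for the example.

```r
# Standalone sketch: build cache file paths from names and remove them.
cacheDir = file.path(tempdir(), "deleteSketch")
dir.create(cacheDir, recursive=TRUE, showWarnings=FALSE)

# Create two placeholder cache files to delete.
cacheNames = c("cacheA", "cacheB")
cacheFiles = file.path(cacheDir, paste0(cacheNames, ".RData"))
for (f in cacheFiles) {
    placeholder = 1:3
    save(placeholder, file=f)
}

existedBefore = file.exists(cacheFiles)
unlink(cacheFiles)                      # unlink() accepts a vector of paths
existsAfter = file.exists(cacheFiles)
```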
/R/examples/example.R:
--------------------------------------------------------------------------------
1 | # choose location to store caches
2 | cacheDir = tempdir()
3 | cacheDir
4 | setCacheDir(cacheDir)
5 |
6 | # build some caches
7 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE)
8 | simpleCache("normSample", { rnorm(5e3, 0,1) })
9 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE)
10 |
11 | # storing a cache after-the-fact
12 | normSample2 = rnorm(10, 0, 1)
13 | storeCache("normSample2")
14 |
15 | # what's available?
16 | listCaches()
17 |
18 | # load a cache
19 | simpleCache("normSample")
20 |
21 | # load multiple caches
22 | loadCaches(c("normSample", "normSample2"), reload=TRUE)
23 |
--------------------------------------------------------------------------------
/R/listCaches.R:
--------------------------------------------------------------------------------
1 | #' Show available caches.
2 | #'
3 | #' Lists any cache files in the cache directory.
4 | #'
5 | #' @param cacheSubDir Optional parameter to specify a subdirectory of the cache folder.
6 | #' @return \code{character} vector in which each element is the path to a file that
7 | #' represents an available cache (within \code{getOption("RCACHE.DIR")})
8 | #' @export
9 | #' @example
10 | #' R/examples/example.R
11 | listCaches = function(cacheSubDir="") {
12 | cacheDirFiles = list.files(paste0(getCacheDir(), cacheSubDir))
13 | cacheDirFiles[endsWith(cacheDirFiles, ".RData")]
14 | }
15 |
16 |
--------------------------------------------------------------------------------
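`listCaches` above returns only the files in the cache directory whose names end in `.RData`. A small standalone sketch of that filtering step, including the empty-directory case, is shown here; the directory and file names are invented for the example.

```r
# Standalone sketch of listing only .RData files in a directory.
cacheDir = file.path(tempdir(), "listSketch")
dir.create(cacheDir, recursive=TRUE, showWarnings=FALSE)
file.create(file.path(cacheDir, c("a.RData", "b.RData", "notes.txt")))

allFiles = list.files(cacheDir)
# endsWith() is vectorized, so it can filter the whole vector at once.
cacheFiles = allFiles[endsWith(allFiles, ".RData")]

# For an empty directory the same filter yields character(0).
emptyDir = file.path(tempdir(), "listSketchEmpty")
dir.create(emptyDir, showWarnings=FALSE)
noFiles = list.files(emptyDir)
noCaches = noFiles[endsWith(noFiles, ".RData")]
```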
/R/loadCaches.R:
--------------------------------------------------------------------------------
1 | #' Loads pre-made caches
2 | #'
3 | #' This function just takes a list of caches, and loads them. It's designed
4 | #' for stuff you already cached previously, so it won't build any caches.
5 | #'
6 | #' @param cacheNames Vector of caches to load.
7 | #' @param loadEnvir Environment into which to load each cache.
8 | #' @param ... Additional parameters passed to simpleCache.
9 | #' @export
10 | #' @example
11 | #' R/examples/example.R
12 | loadCaches = function(cacheNames, loadEnvir=NULL, ...) {
13 | if (is.null(loadEnvir)) { loadEnvir = parent.frame(n=2) }
14 | for (i in seq_along(cacheNames)) {
15 | # By default, load these caches into the environment that
16 | # calls loadCaches (which is the grandparent, n=2, of the call to
17 | # simpleCache).
18 | simpleCache(cacheNames[i], loadEnvir=loadEnvir, ...)
19 | }
20 | }
21 |
--------------------------------------------------------------------------------
/R/sharedCaches.R:
--------------------------------------------------------------------------------
1 | ################################################################################
2 | # Helper aliases for common options
3 |
4 | #' Alias to default to a shared cache folder.
5 | #'
6 | #' Helper alias for caching across experiments/people.
7 | #' Just sets the cacheDir to the default SHARE directory
8 | #' (instead of the typical default PROJECT directory)
9 |
10 | #'
11 | #' @param ... Parameters passed to \code{\link{simpleCache}}.
12 | #' @export
13 | simpleCacheShared = function(...) {
14 | # Since this is a function calling this, I have to set the loadEnvir here,
15 | # otherwise by default simpleCache will load the data into *this* environment,
16 | # which then gets promptly discarded -- not good. The downside is that this
17 | # function now couldn't take a custom loadEnvir.
18 | simpleCache(..., cacheDir=getOption("RESOURCES.RCACHE"), loadEnvir=parent.frame())
19 | }
20 |
21 | #' Helper alias for loading caches into the global environment.
22 | #' simpleCache normally loads variables into the calling environment; this
23 | #' ensures that the variables are loaded in the global environment.
24 | #'
25 | #' @param ... Parameters passed to \code{\link{simpleCache}}.
26 | #' @export
27 | simpleCacheGlobal = function(...) {
28 | simpleCache(..., loadEnvir=globalenv())
29 | }
30 |
31 | #' Helper alias for loading shared caches into the global environment.
32 | #'
33 | #' @param ... Parameters passed to \code{\link{simpleCache}}.
34 | #' @export
35 | simpleCacheSharedGlobal = function(...) {
36 | simpleCache(..., cacheDir=getOption("RESOURCES.RCACHE"), loadEnvir=globalenv())
37 | }
38 |
39 |
--------------------------------------------------------------------------------
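The comment in `simpleCacheShared` above explains why a wrapper has to forward its caller's frame: otherwise the loaded object lands in the wrapper's own environment and is lost when the wrapper returns. The sketch below reproduces that effect with hypothetical functions (`loadSketch`, `badWrapper`, `goodWrapper`) that stand in for `simpleCache` and its aliases.

```r
# Hypothetical loader: assigns `value` under `name` in `envir`.
# The default envir is whichever frame called loadSketch().
loadSketch = function(name, value, envir=parent.frame()) {
    assign(name, value, envir=envir)
}

# Wrapper that does not forward an environment: the variable lands in the
# wrapper's own frame and disappears when the wrapper returns.
badWrapper = function(name, value) {
    loadSketch(name, value)
    invisible(NULL)
}

# Wrapper that forwards its caller's frame explicitly.
goodWrapper = function(name, value) {
    loadSketch(name, value, envir=parent.frame())
    invisible(NULL)
}

caller = function() {
    badWrapper("x1", 1)
    goodWrapper("x2", 2)
    # Check which variables actually arrived in this function's frame.
    c(x1=exists("x1", inherits=FALSE), x2=exists("x2", inherits=FALSE))
}
result = caller()
```

Only `x2` is visible in `caller()`, because only `goodWrapper` passed its caller's frame along.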
/R/simpleCache.R:
--------------------------------------------------------------------------------
1 | ## Package documentation
2 | #' Provides intuitive functions for caching R objects, encouraging faster
3 | #' reproducible and restartable R analysis
4 | #'
5 | #' @references \url{https://github.com/databio/simpleCache}
6 | #' @docType package
7 | #' @author Nathan Sheffield
8 | #' @aliases simpleCache-package
9 | "_PACKAGE"
10 |
11 | ################################################################################
12 |
13 | #' Create a new cache or load a previously created cache.
14 | #'
15 | #' Given a unique name for an R object, and instructions for how to make that
16 | #' object, use the simpleCache function to create and cache or load the object.
17 | #' This should be used for computations that take a long time and generate a
18 | #' table or something used repeatedly (in other scripts, for example). Because
19 | #' the cache is tied to the object name, there is some danger of causing
20 | #' troubles if you misuse the caching system. The object should be considered
21 | #' static.
22 | #'
23 | #' You should pass a bracketed R code snippet like \code{rnorm(500)} as the
24 | #' instruction, and simpleCache will create the object. Alternatively, if the
25 | #' code to create the cache is large, you can put an R script called object.R in
26 | #' the \code{\link[=setCacheBuildDir]{RBUILD.DIR}} (the name of the file *must* match the name of the object it
27 | #' creates *exactly*). If you don't provide an instruction, the function sources
28 |
29 | #' RBUILD.DIR/object.R and caches the result as the object. This source file
30 | #' *must* create an object with the same name of the object. If you already have
31 | #' an object with the name of the object to load in your current environment,
32 | #' this function will not try to reload the object; instead, it returns the
33 | #' local object. In essence, it assumes that this is a static object, which you
34 | #' will not change. You can force it to load the cached version instead with
35 | #' "reload".
36 | #'
37 | #' Because R uses lexical scope and not dynamic scope, you may need to pass some
38 | #' environment variables you use in your instruction code. You can do this
39 | #' using the parameter buildEnvir (just provide a list of named variables).
40 | #'
41 | #' @param cacheName A character vector for a unique name for the cache. Be careful.
42 | #' @param instruction R expression (in braces) to be evaluated. The returned value of this
43 | #' code is what will be cached under the cacheName.
44 | #' @param buildEnvir An environment (or list) providing additional variables
45 | #' necessary for evaluating the code in instruction.
46 | #' @param reload Logical indicating whether to force re-loading the cache,
47 | #' even if it exists in the env.
48 | #' @param recreate Logical indicating whether to force reconstruction of the
49 | #' cache
50 | #' @param noload Logical indicating whether to create but not load the cache.
51 | #' noload is useful when you want to create caches without loading them (as in a
52 | #' cache creation loop).
53 | #' @param cacheDir Character vector specifying the directory where caches are
54 | #' saved (and loaded from). Defaults to the variable set by
55 | #' \code{\link[=setCacheDir]{setCacheDir()}}.
56 | #' @param cacheSubDir Character vector specifying a subdirectory within the
57 | #' \code{cacheDir} variable. Defaults to \code{NULL}.
58 | #' @param assignToVariable Character vector for a variable name to load the
59 | #' cache into. By default, \code{simpleCache} assigns the cache to a
60 | #' variable named \code{cacheName}; you can overrule that here.
61 | #' @param loadEnvir An environment. Into which environment would you like to
62 | #' load the variable? Defaults to \code{\link[base]{parent.frame}}.
63 | #' @param searchEnvir a vector of environments to search for the already loaded
64 | #' cache.
65 | #' @param timer Logical indicating whether to report how long it took to create
66 | #' the cache.
67 | #' @param buildDir Location of build files (files with instructions, used if
68 | #' the instruction argument is not provided). Defaults to RBUILD.DIR
69 | #' global option.
70 | #' @param nofail By default, simpleCache throws an error if the instructions
71 | #' fail. Use this option to convert this error into a warning. No cache will
72 | #' be created, but simpleCache will not then hard-stop your processing. This
73 | #' is useful, for example, if you are creating a bunch of caches (for
74 | #' example using \code{lapply}) and it's ok if some of them do not complete.
75 | #' @param batchRegistry A \code{batchtools} registry object (built with
76 | #' \code{\link[batchtools]{makeRegistry}}). If provided, this cache will be created on
77 | #' the cluster using your batchtools configuration
78 | #' @param batchResources A list of variables to provide to batchtools for
79 | #' cluster resource managers. Used as the \code{res} argument to
80 | #' \code{\link[batchtools]{batchMap}}
81 | #' @param lifespan Numeric specifying the maximum age of cache, in days, to
82 | #' allow before automatically triggering \code{recreate=TRUE}.
83 | #' @param pepSettings Experimental untested feature.
84 | #' @param ignoreLock Internal parameter used for batch job submission; don't
85 | #' touch.
86 | #' @export
87 | #' @example
88 | #' R/examples/example.R
89 | simpleCache = function(cacheName, instruction=NULL, buildEnvir=NULL,
90 | reload=FALSE, recreate=FALSE, noload=FALSE,
91 | cacheDir=getCacheDir(), cacheSubDir=NULL, timer=FALSE,
92 | buildDir=getOption("RBUILD.DIR"), assignToVariable=NULL,
93 | loadEnvir=parent.frame(), searchEnvir=getOption("SIMPLECACHE.ENV"),
94 | nofail=FALSE, batchRegistry=NULL, batchResources=NULL, pepSettings=NULL,
95 | ignoreLock=FALSE, lifespan=NULL) {
96 |
97 | if (!"character" %in% class(cacheName)) {
98 | stop("simpleCache expects the cacheName variable to be a character vector.")
99 | }
100 |
101 | # Because R evaluates arguments lazily (only when they are used),
102 | # it will not evaluate the instruction if I first wrap it in a
103 | # primitive substitute call. Then I can evaluate conditionally
104 | # (if the cache needs to be recreated)
105 | instruction = substitute(instruction)
106 | if ("character" %in% class(instruction)) {
107 | message("Character instruction; consider wrapping in braces.")
108 | parse = TRUE
109 | } else { parse = FALSE }
110 |
111 | # Handle directory paths.
112 | if (!is.null(cacheSubDir)) { cacheDir = file.path(cacheDir, cacheSubDir) }
113 | if (is.null(cacheDir)) {
114 | message(strwrap("No cacheDir specified. You should set global option
115 | RCACHE.DIR with setCacheDir(), or specify a cacheDir parameter directly
116 | to simpleCache(). With no other option, simpleCache will use tempdir():
117 | ", initial="", prefix=" "), tempdir())
118 | cacheDir = tempdir()
119 | }
120 |
121 | if (!file.exists(cacheDir)) {
122 | dir.create(cacheDir, recursive=TRUE)
123 | }
124 | cacheFile = file.path(cacheDir, paste0(cacheName, ".RData"))
125 | lockFile = file.path(cacheDir, paste0(cacheName, ".lock"))
126 | if (ignoreLock) {
127 | # remove the lock file when this function call is complete.
128 | on.exit(file.remove(lockFile))
129 | }
130 | submitted = FALSE
131 | # Check if cache exists in any provided search environment.
132 | searchEnvir = append(searchEnvir, ".GlobalEnv") # Assume global env.
133 | cacheExists = FALSE
134 | cacheWhere = NULL
135 |
136 | for ( curEnv in searchEnvir ) {
137 | if(! ( exists(curEnv) && is.environment(get(curEnv))) ) {
138 | warning(curEnv, " is not an environment.")
139 | } else if( exists(cacheName, where=get(curEnv))) {
140 | cacheExists = TRUE
141 | cacheWhere = curEnv
142 | break
143 | }
144 | }
145 |
146 |
147 | ret = NULL # The default, in case the cache construction fails.
148 |
149 | if (.tooOld(cacheFile, lifespan)) {
150 | message(sprintf(
151 | "Stale cache: '%s' (age > %s day(s))", cacheFile, lifespan))
152 | recreate = TRUE
153 | }
154 |
155 | if(cacheExists & !reload & !recreate) {
156 | message("::Object exists (in ", cacheWhere, ")::\t", cacheName)
157 | #return(get(cacheName))
158 | #return()
159 | ret = get(cacheName, pos = get(cacheWhere))
160 | } else if (file.exists(lockFile) & !ignoreLock) {
161 | message("::Cache processing (lock file exists)::\t", lockFile)
162 | #check for slurm log...
163 |
164 | if (!is.null(batchRegistry)) {
165 | # Grabbing log from batchtools
166 | # 1 is the job id.
167 | message(paste(batchtools::getLog(1, reg=batchRegistry), collapse="\n"))
168 | }
169 | if (!is.null(pepSettings)) {
170 | # TODO: retrieve log
171 | stop("PEP settings submission is not yet implemented")
172 |
173 | }
174 |
175 | return()
176 |
177 | } else if(file.exists(cacheFile) & !recreate & !noload) {
178 | message("::Loading cache::\t", cacheFile)
179 | load(cacheFile)
180 | } else if(file.exists(cacheFile) & !recreate) {
181 | message("::Cache exists (no load)::\t", cacheFile)
182 | return(NULL)
183 | } else {
184 | message("::Creating cache::\t", cacheFile)
185 |
186 | tryCatch( { # Intercept any errors with creating this cache.
187 |
188 | if(is.null(instruction)) {
189 | if (is.null(buildDir)) {
190 | stop(strwrap("::Error::\tIf you do not provide an
191 | instruction argument, you must set global option RBUILD.DIR
192 | with setCacheBuildDir, or specify a buildDir parameter
193 | directly to simpleCache()."))
194 | }
195 | RBuildFile = file.path(buildDir, paste0(cacheName, ".R"))
196 |
197 | if (!file.exists(RBuildFile)) {
198 | stop("::Error::\tNo instruction or RBuild file provided.")
199 | }
200 |
201 | if (timer) { tic() }
202 | source(RBuildFile, local=FALSE)
203 | if (timer) { toc() }
204 | ret = get(cacheName)
205 | } else {
206 |
207 | if (is.null(buildEnvir)) {
208 | if (timer) { tic() }
209 | if ( ! is.null(batchRegistry) ) {
210 | # Submit to cluster using batchtools
211 |
212 | if (! requireNamespace("batchtools", quietly=TRUE)) {
213 | stop("Install batchtools for cluster submission...")
214 | }
215 | if (is.null(batchResources)) {
216 | stop("You must provide both batchRegistry and batchResources.")
217 | }
218 | message("Submitting job to cluster")
219 | # You have to wrap `instruction` in substitute() so it won't be evaluated,
220 | # then you have to wrap that in list so it won't be misinterpreted
221 | # by batchMap as multiple arguments, causing extra jobs.
222 | args = list(cacheName=cacheName,
223 | instruction=list(substitute(instruction)),
224 | cacheDir=cacheDir, ignoreLock=TRUE)
225 |
226 | ids = batchtools::batchMap(
227 | fun=simpleCache,
228 | args=args,
229 | reg=batchRegistry)
230 |
231 | # lock cache so it won't be loaded prematurely or double-written
232 | file.create(lockFile)
233 | batchtools::submitJobs(ids=ids, reg=batchRegistry, res=batchResources)
234 |
235 | message("Done submitting to cluster")
236 | submitted = "batch"
237 | } else if ( ! is.null(pepSettings) ) {
238 | stop("PEP settings submission is not yet implemented")
239 | # Build a simpleCache command
240 | #simpleCacheCode = paste0("simpleCache('", cacheName, "',
241 | # instruction='", paste0(deparse(instruction), collapse="\n"), "',
242 | # recreate=", recreate, ",
243 | # cacheDir='", cacheDir,"',
244 | # ignoreLock=TRUE)")
245 | #if (slurmParams$jobName=="test") { slurmParams$jobName=cacheName }
246 | #with(slurmParams, buildSlurmScript(
247 | # simpleCacheCode, preamble, submit, hpcFolder,
248 | # jobName, mem, cores, partition, timeLimit, sourceProjectInit))
249 | } else {
250 | # No cluster submission request, so just run it here!
251 | # "ret," for return, is the name the cacheName is stored under.
252 | if (parse) {
253 | ret = eval(parse(text=instruction), envir=parent.frame())
254 | } else {
255 | # Here we do the evaluation in the parent frame so that
256 | # it will have access to any packages the user has loaded
257 | # that may be required to run the code. Otherwise, it will
258 | # run in the simpleCache namespace which could lack these
259 | # packages (or have a different search path hierarchy),
260 | # leading to failures. The `substitute` call here ensures
261 | # the code isn't evaluated at argument stage, but is retained
262 | # until it makes it to the `eval` call.
263 | ret = eval(instruction, envir=parent.frame())
264 | }
265 | }
266 | if (timer) { toc() }
267 | } else {
268 | # Build environment was provided.
269 | # we must place the instruction in the environment to build from
270 | if (exists("instruction", buildEnvir)) {
271 | stop("Can't provide a variable named 'instruction' in buildEnvir")
272 | }
273 | buildEnvir$instruction = instruction
274 | be = as.environment(buildEnvir)
275 | # As described above, this puts global package functions into
276 | # scope so instructions can use them.
277 | parent.env(be) = parent.frame()
278 | if (timer) { tic() }
279 | if (parse) {
280 | ret = with(be, eval(parse(text=instruction)))
281 | } else {
282 | #ret = with(buildEnvir, evalq(instruction))
283 | ret = with(be, eval(instruction))
284 |
285 | }
286 | if (timer) { toc() }
287 | }
288 | }
289 |
290 | # tryCatch
291 | }, error = function(e) { if (nofail) warning(e) else stop(e) })
292 |
293 | if (submitted == "batch") {
294 | message("Job submitted, check for cache.")
295 | return()
296 | } else if (is.null(ret)) {
297 | message("NULL value returned, no cache created")
298 | return() #so we don't assign NULL to the object.
299 | } else {
300 | save(ret, file=cacheFile)
301 | }
302 | }
303 | if (noload) {
304 | rm(ret)
305 | gc()
306 | return()
307 | }
308 | if(is.null(assignToVariable)) {
309 | assignToVariable = cacheName
310 | }
311 | assign(assignToVariable, ret, envir=loadEnvir)
312 |
313 | #return() #used to return ret, but not any more
314 | }
315 |
--------------------------------------------------------------------------------
/R/storeCache.R:
--------------------------------------------------------------------------------
1 | #' Stores as a cache an already-produced R object
2 | #'
3 | #' Sometimes you use significant computational power to create an object, but
4 | #' you didn't cache it with \code{\link{simpleCache}}. Oops, maybe you wish you had, after the
5 | #' fact. This function lets you store an object from your environment so it can
6 | #' be loaded by future calls to \code{simpleCache}.
7 | #'
8 | #' This can be used in interactive sessions, but could also be used for another
9 | #' use case: you have a complicated set of instructions (too much to pass as the
10 | #' instruction argument to \code{simpleCache}), so you could just stick a call to
11 | #' \code{storeCache} at the end.
12 | #'
13 | #' @param cacheName Unique name for the cache (and R object to be cached).
14 | #' @param cacheDir The directory where caches are saved (and loaded from).
15 | #' Defaults to the global \code{\link[=setCacheDir]{RCACHE.DIR}} variable
16 | #' @param cacheSubDir You can specify a subdirectory within the cacheDir
17 | #' variable. Defaults to \code{NULL}.
18 | #' @param recreate Forces reconstruction of the cache
19 | #' @export
20 | #' @example
21 | #' R/examples/example.R
22 | storeCache = function(cacheName, cacheDir = getCacheDir(),
23 | cacheSubDir = NULL, recreate=FALSE) {
24 |
25 | if(!is.null(cacheSubDir)) {
26 | cacheDir = file.path(cacheDir, cacheSubDir)
27 |
28 | }
29 |
30 | if (is.null(cacheDir)) {
31 | message(strwrap("You must set global option RCACHE.DIR with
32 | setCacheDir(), or specify a cacheDir parameter directly to
33 | storeCache()."))
34 | return(NA);
35 | }
36 |
37 | if(! "character" %in% class(cacheName)) {
38 | stop(strwrap("storeCache expects the cacheName variable to be a
39 | character vector."))
40 | }
41 |
42 | if (!file.exists(cacheDir)) {
43 | dir.create(cacheDir, recursive=TRUE)
44 | }
45 | cacheFile = file.path(cacheDir, paste0(cacheName, ".RData"))
46 | if(file.exists(cacheFile) && !recreate) {
47 | message("::Cache already exists (use recreate to overwrite)::\t",
48 | cacheFile)
49 | return (NULL)
50 | } else if (!exists(cacheName)) {
51 | message("::Object does not exist::\t", cacheName)
52 | return(NULL)
53 | } else {
54 | message("::Creating cache::\t", cacheFile)
55 | ret = get(cacheName)
56 | save(ret, file=cacheFile)
57 | }
58 | }
59 |
--------------------------------------------------------------------------------
/R/utility.R:
--------------------------------------------------------------------------------
1 | ################################################################################
2 | # UTILITY FUNCTIONS
3 | ################################################################################
4 | # These are functions copied over from my repository of utilities used
5 | # by this package. They are repeated here simply for portability, so this
6 | # package can be deployed on systems without access to my utilities.
7 | # Any changes should probably be backported to the primary functions rather
8 | # than in these convenience duplications.
9 | #
10 | # These functions should probably remain interior to the package (not exported)
11 | #
12 | #' Determine if a cache file is sufficiently old to warrant refresh.
13 | #'
14 | #' \code{.tooOld} accepts a maximum cache age and checks for an option with
15 | #' that setting under \code{MAX.CACHE.AGE} if such an argument isn't passed.
16 | #' If the indicated file exists and is older than the threshold passed or
17 | #' set as an option, the file is deemed "stale." If an age threshold is
18 | #' provided, no check for an option is performed. If the file does not
19 | #' exist or there's not an age threshold directly passed or set as an option,
20 | #' the result is \code{FALSE}.
21 | #'
22 | #' @param pathCacheFile Path to file to ask about staleness.
23 | #' @param lifespan Maximum file age before it's "stale."
24 | #' @return \code{TRUE} if the file exists and its age exceeds
25 | #' \code{lifespan} if given or
26 | #' \code{getOption("MAX.CACHE.AGE")} if no age threshold is passed
27 | #' and that option exists; \code{FALSE} otherwise.
28 | .tooOld = function(pathCacheFile, lifespan=NULL) {
29 | if (!utils::file_test("-f", pathCacheFile)) { return(FALSE) }
30 | if (is.null(lifespan)) { lifespan = getOption("MAX.CACHE.AGE") }
31 | if (is.null(lifespan)) { return(FALSE) }
32 | cacheTime = file.info(pathCacheFile)$mtime
33 | cacheAge = difftime(Sys.time(), cacheTime, units="days")
34 | as.numeric(cacheAge) > as.numeric(lifespan)
35 | }
36 |
37 |
38 | # MATLAB-style timing functions to start/stop timer.
39 | # These functions were based on an idea by some helpful soul on
40 | # Stackoverflow that I can no longer recall...
41 |
42 | #' This function takes a time in seconds and converts it to a more
43 | #' human-readable format, showing hours, minutes, or seconds, depending
44 | #' on how long the time is. Used by my implementation of tic()/toc().
45 | #' @param timeInSec numeric value of time measured in seconds.
46 | secToTime = function(timeInSec) {
47 | hr = timeInSec %/% 3600 #hours
48 | min = timeInSec %% 3600 %/% 60 #minutes
49 | sec = timeInSec %% 60 #seconds
50 | return(paste0(sprintf("%02d", hr), "h ", sprintf("%02d", min), "m ",
51 | sprintf("%02.01f", signif(sec, 3)), "s"))
52 | }
53 |
54 | ticTocEnv = new.env()
55 |
56 | #' Start a timer
57 | #' @param gcFirst Garbage Collect before starting the timer?
58 | #' @param type Type of time to return,
59 | #' can be 'elapsed', 'user.self', or 'sys.self'
60 | tic = function(gcFirst = TRUE, type=c("elapsed", "user.self", "sys.self")) {
61 | type <- match.arg(type)
62 | assign(".type_simpleCache", type, envir=ticTocEnv)
63 | if(gcFirst) gc(FALSE)
64 | tic <- proc.time()[type]
65 | assign(".tic_simpleCache", tic, envir=ticTocEnv)
66 | invisible(tic)
67 | }
68 |
69 | #' Check the time since the current timer was started with tic()
70 | toc = function() {
71 | type <- get(".type_simpleCache", envir=ticTocEnv)
72 | toc <- proc.time()[type]
73 | tic <- get(".tic_simpleCache", envir=ticTocEnv)
74 | timeInSec = as.numeric(toc-tic);
75 | message("<", secToTime(timeInSec), ">", appendLF=FALSE)
76 | invisible(toc)
77 | }
78 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | simpleCache: R caching for restartable analysis
2 | -----------------------------------------------
3 |
4 | 
5 |
6 | `simpleCache` is an R package providing functions for caching R objects. Its
7 | purpose is to encourage writing reusable, restartable, and reproducible analysis
8 | pipelines for projects with massive data and computational requirements.
9 |
10 | As its name indicates, `simpleCache` is intended to be simple. You choose a
11 | location to store your caches, and then provide the function with nothing more
12 | than a cache name and instructions (R code) for how to produce the R object.
13 | While simple, `simpleCache` also provides some advanced options like environment
14 | assignments, recreating caches, reloading caches, and even cluster compute
15 | bindings (using the `batchtools` package) making it flexible enough for use in
16 | large-scale data analysis projects.
17 |
18 | --------------------------------------------------------------------------------
19 | ### Installing simpleCache
20 |
21 | `simpleCache` is on
22 | [CRAN](https://cran.r-project.org/package=simpleCache) and can
23 | be installed as usual:
24 |
25 | ```
26 | install.packages("simpleCache")
27 | ```
28 |
29 | --------------------------------------------------------------------------------
30 | ### Running simpleCache
31 |
32 | `simpleCache` comes with a single primary function (`simpleCache()`) that will do almost
33 | everything you need. In short, you run it with a few lines like this:
34 |
35 | ```
36 | library(simpleCache)
37 | setCacheDir(tempdir())
38 | simpleCache("normSample", { rnorm(1e7, 0,1) }, recreate=TRUE)
39 | simpleCache("normSample", { rnorm(1e7, 0,1) })
40 | ```
41 |
42 | `simpleCache` also interfaces with the `batchtools` package to let you build
43 | caches on any cluster resource manager.
44 |
45 | --------------------------------------------------------------------------------
46 | ### Highlights of exported functions
47 |
48 | - `simpleCache()`: Creates and caches or reloads cached results of provided R instruction code
49 | - `listCaches()`: Lists all of the caches available in the `cacheDir`
50 | - `deleteCaches()`: Deletes cache(s) from the `cacheDir`
51 | - `setCacheDir()`: Sets a global option for a cache directory so you don't have to specify one in each `simpleCache` call
52 | - `simpleCacheOptions()`: Views all of the `simpleCache` global options that have been set
53 |
54 | ### simpleCache Philosophy
55 |
56 | The use case I had in mind for `simpleCache` is that you find yourself
57 | constantly recalculating the same R object in several different scripts, or
58 | repeatedly in the same script, every time you open it and want to continue that
59 | project. `simpleCache` is well-suited for interactive analysis, allowing you to
60 | pick up right where you left off in a new R session, without having to
61 | recalculate everything. It is equally useful in automatic pipelines, where
62 | separate scripts may benefit from loading, instead of recalculating, the same R
63 | objects produced by other scripts.
64 |
65 | R provides some base functions (`save`, `serialize`, and `load`) to let you save
66 | and reload such objects, but these low-level functions are a bit cumbersome.
67 | `simpleCache` simply provides a convenient, user-friendly interface to these
68 | functions, streamlining the process. For example, a single `simpleCache` call
69 | will check for a cache and load it if it exists, or create it if it does not.
70 | With the base R `save` and `load` functions, you can't just write a single
71 | function call and then run the same thing every time you start the script --
72 | even this simple use case requires additional logic to check for an existing
73 | cache. `simpleCache` just does all this for you.
74 |
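To make the comparison concrete, here is a minimal sketch of the check-then-load-or-create logic you would otherwise write by hand with base R. The file name `normSample.RData` and the use of `tempdir()` are assumptions chosen for illustration; this is not the package's internal code.

```r
# Hand-rolled caching with base R only (illustrative sketch).
cacheFile = file.path(tempdir(), "normSample.RData")  # hypothetical location

if (file.exists(cacheFile)) {
    # Cache hit: load() restores the object under the name it was saved with.
    load(cacheFile)
} else {
    # Cache miss: compute the object, then save it for the next session.
    normSample = rnorm(1e4, 0, 1)
    save(normSample, file=cacheFile)
}
```

Once a cache directory has been set with `setCacheDir()`, the whole block collapses to the single call `simpleCache("normSample", { rnorm(1e4, 0, 1) })`.
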
75 | The thing to keep in mind with `simpleCache` is that **the cache name is
76 | paramount**. `simpleCache` assumes that your name for an object is a perfect
77 | identifier for that object; in other words, don't cache things that you plan to
78 | change.
79 |
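The consequence of keying on the name alone can be sketched with base R. The helper `cacheByName()` below is hypothetical and only mimics the name-based lookup described above; it is not how `simpleCache` is implemented.

```r
# Hypothetical helper: the cache is identified by its name only.
cacheByName = function(name, instruction, dir=tempdir()) {
    cacheFile = file.path(dir, paste0(name, ".rds"))
    if (file.exists(cacheFile)) {
        # On a cache hit the instruction is never evaluated (lazy evaluation).
        return(readRDS(cacheFile))
    }
    value = instruction  # forces evaluation of the instruction
    saveRDS(value, cacheFile)
    value
}

demoDir = file.path(tempdir(), "nameDemo")  # hypothetical scratch directory
unlink(demoDir, recursive=TRUE)             # start from an empty cache
dir.create(demoDir)

first = cacheByName("answer", 1 + 1, dir=demoDir)
# Same name, different instruction: the stored value is returned unchanged.
second = cacheByName("answer", 10 * 10, dir=demoDir)
```

Here `second` equals `first`, because the changed instruction was never run. In `simpleCache`, passing `recreate=TRUE` forces the cache to be rebuilt.
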
80 | ### Contributing
81 |
82 | `simpleCache` is licensed under the [2-Clause BSD License](https://opensource.org/licenses/BSD-2-Clause). Questions, feature requests and bug reports are welcome via the [issue queue](https://github.com/databio/simpleCache/issues). The maintainer will review pull requests and incorporate contributions at his discretion.
83 |
84 | For more information refer to the contributing document and pull request / issue templates in the [.github folder](https://github.com/databio/simpleCache/tree/master/.github) of this repository.
85 |
86 |
87 |
88 |
89 |
90 |
91 |
92 |
93 |
94 |
95 |
96 |
97 |
--------------------------------------------------------------------------------
/_pkgdown.yaml:
--------------------------------------------------------------------------------
1 |
2 | template:
3 | params:
4 | bootswatch: yeti
5 |
6 | navbar:
7 | left:
8 | - text: Vignettes
9 | icon: fa-play-circle
10 | href: articles/index.html
11 | - text: Documentation
12 | icon: fa-pencil
13 | href: reference/index.html
14 | - text: GitHub
15 | icon: fa-github fa-lg
16 | href: https://github.com/databio/simpleCache
17 |
18 | right:
19 | - text: Databio.org
20 | href: http://databio.org
21 | - text: Software & Data
22 | href: http://databio.org/software/
23 |
24 | articles:
25 | - title: Introduction
26 | contents:
27 | - simpleCacheIntroduction
28 | - title: Advanced features
29 | contents:
30 | - clusterCaches
31 | - sharingCaches
32 |
--------------------------------------------------------------------------------
/cran-comments.md:
--------------------------------------------------------------------------------
1 | ## Test environments
2 |
3 | * local OS X install, R 3.3.2
4 | * Ubuntu 12.04, R 3.3.2 via TRAVIS CI
5 | * win-builder
6 |
7 | ## R CMD check results
8 |
9 | There were no ERRORs or WARNINGs.
10 |
11 | There was 1 note:
12 |
13 | `checking CRAN incoming feasibility ...`
14 |
15 | ## Downstream dependencies
16 |
17 | None.
--------------------------------------------------------------------------------
/inst/cache/existingCache.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/databio/simpleCache/806f818dc2c61d5bd433eea540e2b18e84010416/inst/cache/existingCache.RData
--------------------------------------------------------------------------------
/inst/templates/slurm-advanced.tmpl:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | ## Job Resource Interface Definition
4 | ##
5 | ## ntasks [integer(1)]: Number of required tasks,
6 | ## Set larger than 1 if you want to further parallelize
7 | ## with MPI within your job.
8 | ## ncpus [integer(1)]: Number of required cpus per task,
9 | ## Set larger than 1 if you want to further parallelize
10 | ## with multicore/parallel within each task.
11 | ## walltime [integer(1)]: Walltime for this job, in seconds.
12 | ## Must be at least 60 seconds.
13 | ## memory [integer(1)]: Memory in megabytes for each cpu.
14 | ## Must be at least 100 (when I tried lower values my
15 | ## jobs did not start at all).
16 | ##
17 | ## Default resources can be set in your .batchtools.conf.R by defining the variable
18 | ## 'default.resources' as a named list.
19 |
20 | <%
21 | # relative paths are not handled well by Slurm
22 | log.file = normalizePath(log.file, winslash = "/", mustWork = FALSE)
23 | -%>
24 |
25 |
26 | #SBATCH --job-name=<%= job.hash %>
27 | #SBATCH --output=<%= log.file %>
28 | #SBATCH --error=<%= log.file %>
29 | #SBATCH --time=<%= ceiling(resources$walltime / 60) %>
30 | #SBATCH --ntasks=1
31 | #SBATCH --cpus-per-task=<%= resources$ncpus %>
32 | #SBATCH --mem-per-cpu=<%= resources$memory %>
33 | <%= if (!is.null(resources$partition)) sprintf(paste0("#SBATCH --partition='", resources$partition, "'")) %>
34 | <%= if (array.jobs) sprintf("#SBATCH --array=1-%i", nrow(jobs)) else "" %>
35 |
36 | ## Initialize work environment like
37 | ## source /etc/profile
38 | ## module add ...
39 |
40 | ## Export value of DEBUGME environment var to slave
41 | export DEBUGME=<%= Sys.getenv("DEBUGME") %>
42 |
43 | ## Run R:
44 | ## we merge R output with stdout from SLURM, which gets then logged via --output option
45 | Rscript -e 'batchtools::doJobCollection("<%= uri %>")'
46 |
--------------------------------------------------------------------------------
/man/addCacheSearchEnvironment.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/cacheDirectories.R
3 | \name{addCacheSearchEnvironment}
4 | \alias{addCacheSearchEnvironment}
5 | \title{Add a cache search environment}
6 | \usage{
7 | addCacheSearchEnvironment(addEnv)
8 | }
9 | \arguments{
10 | \item{addEnv}{Environment to append to the shared cache search list}
11 | }
12 | \description{
13 | Append a new Environment name (a character string) to a global option
14 | which is a vector of such names. SimpleCache will search all of these
15 | environments to check if a cache is previously loaded, before reloading it.
16 | }
17 |
--------------------------------------------------------------------------------
/man/deleteCaches.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/deleteCache.R
3 | \name{deleteCaches}
4 | \alias{deleteCaches}
5 | \title{Deletes caches}
6 | \usage{
7 | deleteCaches(cacheNames, cacheDir = getCacheDir(), force = FALSE)
8 | }
9 | \arguments{
10 | \item{cacheNames}{Name(s) of the cache to delete}
11 |
12 | \item{cacheDir}{Directory where caches are kept}
13 |
14 | \item{force}{Force deletion without user prompt}
15 | }
16 | \description{
17 | Given a cache name, this function will attempt to delete the cache of that
18 | name on disk.
19 | }
20 | \examples{
21 | # choose location to store caches
22 | cacheDir = tempdir()
23 | cacheDir
24 | setCacheDir(cacheDir)
25 |
26 | # build some caches
27 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE)
28 | simpleCache("normSample", { rnorm(5e3, 0,1) })
29 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE)
30 |
31 | # storing a cache after-the-fact
32 | normSample2 = rnorm(10, 0, 1)
33 | storeCache("normSample2")
34 |
35 | # what's available?
36 | listCaches()
37 |
38 | # load a cache
39 | simpleCache("normSample")
40 |
41 | # load multiple caches
42 | loadCaches(c("normSample", "normSample2"), reload=TRUE)
43 | }
44 |
--------------------------------------------------------------------------------
/man/dot-tooOld.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/utility.R
3 | \name{.tooOld}
4 | \alias{.tooOld}
5 | \title{Determine if a cache file is sufficiently old to warrant refresh.}
6 | \usage{
7 | .tooOld(pathCacheFile, lifespan = NULL)
8 | }
9 | \arguments{
10 | \item{pathCacheFile}{Path to file to ask about staleness.}
11 |
12 | \item{lifespan}{Maximum file age before it's "stale."}
13 | }
14 | \value{
15 | \code{TRUE} if the file exists and its age exceeds
16 | \code{lifespan} if given or
17 | \code{getOption("MAX.CACHE.AGE")} if no age threshold is passed
18 | and that option exists; \code{FALSE} otherwise.
19 | }
20 | \description{
21 | \code{.tooOld} accepts a maximum cache age and checks for an option with
22 | that setting under \code{MAX.CACHE.AGE} if such an argument isn't passed.
23 | If the indicated file exists and is older than the threshold passed or
24 | set as an option, the file is deemed "stale." If an age threshold is
25 | provided, no check for an option is performed. If the file does not
26 | exist or there's not an age threshold directly passed or set as an option,
27 | the result is \code{FALSE}.
28 | }
29 |
--------------------------------------------------------------------------------
/man/getCacheDir.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/cacheDirectories.R
3 | \name{getCacheDir}
4 | \alias{getCacheDir}
5 | \title{Fetcher of the currently set cache directory.}
6 | \usage{
7 | getCacheDir()
8 | }
9 | \value{
10 | If the option is set, the path to the currently set cache directory; otherwise, \code{NULL}.
11 | }
12 | \description{
13 | \code{getCacheDir} retrieves the value of the option that stores the currently
14 | set cache directory path.
15 | }
16 |
--------------------------------------------------------------------------------
/man/listCaches.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/listCaches.R
3 | \name{listCaches}
4 | \alias{listCaches}
5 | \title{Show available caches.}
6 | \usage{
7 | listCaches(cacheSubDir = "")
8 | }
9 | \arguments{
10 | \item{cacheSubDir}{Optional parameter to specify a subdirectory of the cache folder.}
11 | }
12 | \value{
13 | \code{character} vector in which each element is the path to a file that
14 | represents an available cache (within \code{getOption("RCACHE.DIR")})
15 | }
16 | \description{
17 | Lists any cache files in the cache directory.
18 | }
19 | \examples{
20 | # choose location to store caches
21 | cacheDir = tempdir()
22 | cacheDir
23 | setCacheDir(cacheDir)
24 |
25 | # build some caches
26 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE)
27 | simpleCache("normSample", { rnorm(5e3, 0,1) })
28 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE)
29 |
30 | # storing a cache after-the-fact
31 | normSample2 = rnorm(10, 0, 1)
32 | storeCache("normSample2")
33 |
34 | # what's available?
35 | listCaches()
36 |
37 | # load a cache
38 | simpleCache("normSample")
39 |
40 | # load multiple caches
41 | loadCaches(c("normSample", "normSample2"), reload=TRUE)
42 | }
43 |
--------------------------------------------------------------------------------
/man/loadCaches.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/loadCaches.R
3 | \name{loadCaches}
4 | \alias{loadCaches}
5 | \title{Loads pre-made caches}
6 | \usage{
7 | loadCaches(cacheNames, loadEnvir = NULL, ...)
8 | }
9 | \arguments{
10 | \item{cacheNames}{Vector of caches to load.}
11 |
12 | \item{loadEnvir}{Environment into which to load each cache.}
13 |
14 | \item{...}{Additional parameters passed to simpleCache.}
15 | }
16 | \description{
17 | This function just takes a list of caches, and loads them. It's designed
18 | for stuff you already cached previously, so it won't build any caches.
19 | }
20 | \examples{
21 | # choose location to store caches
22 | cacheDir = tempdir()
23 | cacheDir
24 | setCacheDir(cacheDir)
25 |
26 | # build some caches
27 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE)
28 | simpleCache("normSample", { rnorm(5e3, 0,1) })
29 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE)
30 |
31 | # storing a cache after-the-fact
32 | normSample2 = rnorm(10, 0, 1)
33 | storeCache("normSample2")
34 |
35 | # what's available?
36 | listCaches()
37 |
38 | # load a cache
39 | simpleCache("normSample")
40 |
41 | # load multiple caches
42 | loadCaches(c("normSample", "normSample2"), reload=TRUE)
43 | }
44 |
--------------------------------------------------------------------------------
/man/resetCacheSearchEnvironment.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/cacheDirectories.R
3 | \name{resetCacheSearchEnvironment}
4 | \alias{resetCacheSearchEnvironment}
5 | \title{Sets global option of cache search environments to \code{NULL}.}
6 | \usage{
7 | resetCacheSearchEnvironment()
8 | }
9 | \description{
10 | Sets global option of cache search environments to \code{NULL}.
11 | }
12 |
--------------------------------------------------------------------------------
/man/secToTime.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/utility.R
3 | \name{secToTime}
4 | \alias{secToTime}
5 | \title{This function takes a time in seconds and converts it to a more
6 | human-readable format, showing hours, minutes, or seconds, depending
7 | on how long the time is. Used by my implementation of tic()/toc().}
8 | \usage{
9 | secToTime(timeInSec)
10 | }
11 | \arguments{
12 | \item{timeInSec}{numeric value of time measured in seconds.}
13 | }
14 | \description{
15 | This function takes a time in seconds and converts it to a more
16 | human-readable format, showing hours, minutes, or seconds, depending
17 | on how long the time is. Used by my implementation of tic()/toc().
18 | }
19 |
--------------------------------------------------------------------------------
/man/setCacheBuildDir.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/cacheDirectories.R
3 | \name{setCacheBuildDir}
4 | \alias{setCacheBuildDir}
5 | \title{Sets local cache build directory with scripts for building files.}
6 | \usage{
7 | setCacheBuildDir(cacheBuildDir = NULL)
8 | }
9 | \arguments{
10 | \item{cacheBuildDir}{Directory where build scripts are stored.}
11 | }
12 | \description{
13 | Sets local cache build directory with scripts for building files.
14 | }
15 |
--------------------------------------------------------------------------------
/man/setCacheDir.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/cacheDirectories.R
3 | \name{setCacheDir}
4 | \alias{setCacheDir}
5 | \title{Sets a global variable specifying the default cache directory for
6 | \code{\link{simpleCache}} calls.}
7 | \usage{
8 | setCacheDir(cacheDir = NULL)
9 | }
10 | \arguments{
11 | \item{cacheDir}{Directory where caches should be stored}
12 | }
13 | \description{
14 | Sets a global variable specifying the default cache directory for
15 | \code{\link{simpleCache}} calls.
16 | }
17 | \examples{
18 | # choose location to store caches
19 | cacheDir = tempdir()
20 | cacheDir
21 | setCacheDir(cacheDir)
22 |
23 | # build some caches
24 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE)
25 | simpleCache("normSample", { rnorm(5e3, 0,1) })
26 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE)
27 |
28 | # storing a cache after-the-fact
29 | normSample2 = rnorm(10, 0, 1)
30 | storeCache("normSample2")
31 |
32 | # what's available?
33 | listCaches()
34 |
35 | # load a cache
36 | simpleCache("normSample")
37 |
38 | # load multiple caches
39 | loadCaches(c("normSample", "normSample2"), reload=TRUE)
40 | }
41 |
--------------------------------------------------------------------------------
/man/setSharedCacheDir.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/cacheDirectories.R
3 | \name{setSharedCacheDir}
4 | \alias{setSharedCacheDir}
5 | \title{Set shared cache directory}
6 | \usage{
7 | setSharedCacheDir(sharedCacheDir = NULL)
8 | }
9 | \arguments{
10 | \item{sharedCacheDir}{Directory where shared caches should be stored}
11 | }
12 | \description{
13 | Sets global variable specifying the default cache directory for
14 | \code{\link{simpleCacheShared}} calls; this function is simply a helper alias for caching
15 | results that will be used across projects.
16 | }
17 |
--------------------------------------------------------------------------------
/man/simpleCache-package.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/simpleCache.R
3 | \docType{package}
4 | \name{simpleCache-package}
5 | \alias{simpleCache-package}
6 | \alias{_PACKAGE}
7 | \title{Provides intuitive functions for caching R objects, encouraging faster
8 | reproducible and restartable R analysis}
9 | \description{
10 | Provides intuitive functions for caching R objects, encouraging
11 | reproducible, restartable, and distributed R analysis. The user selects a
12 | location to store caches, and then provides nothing more than a cache name
13 | and instructions (R code) for how to produce the R object. Also
14 | provides some advanced options like environment assignments, recreating or
15 | reloading caches, and cluster compute bindings (using the 'batchtools'
16 | package) making it flexible enough for use in large-scale data analysis
17 | projects.
18 | }
19 | \references{
20 | \url{https://github.com/databio/simpleCache}
21 | }
22 | \seealso{
23 | Useful links:
24 | \itemize{
25 | \item \url{https://github.com/databio/simpleCache}
26 | \item Report bugs at \url{https://github.com/databio/simpleCache}
27 | }
28 |
29 | }
30 | \author{
31 | Nathan Sheffield
32 | }
33 |
--------------------------------------------------------------------------------
/man/simpleCache.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/simpleCache.R
3 | \name{simpleCache}
4 | \alias{simpleCache}
5 | \title{Create a new cache or load a previously created cache.}
6 | \usage{
7 | simpleCache(
8 | cacheName,
9 | instruction = NULL,
10 | buildEnvir = NULL,
11 | reload = FALSE,
12 | recreate = FALSE,
13 | noload = FALSE,
14 | cacheDir = getCacheDir(),
15 | cacheSubDir = NULL,
16 | timer = FALSE,
17 | buildDir = getOption("RBUILD.DIR"),
18 | assignToVariable = NULL,
19 | loadEnvir = parent.frame(),
20 | searchEnvir = getOption("SIMPLECACHE.ENV"),
21 | nofail = FALSE,
22 | batchRegistry = NULL,
23 | batchResources = NULL,
24 | pepSettings = NULL,
25 | ignoreLock = FALSE,
26 | lifespan = NULL
27 | )
28 | }
29 | \arguments{
30 | \item{cacheName}{A character vector for a unique name for the cache. Be careful.}
31 |
32 | \item{instruction}{R expression (in braces) to be evaluated. The returned value of this
33 | code is what will be cached under the cacheName.}
34 |
35 | \item{buildEnvir}{An environment (or list) providing additional variables
36 | necessary for evaluating the code in instruction.}
37 |
38 | \item{reload}{Logical indicating whether to force re-loading the cache,
39 | even if it exists in the env.}
40 |
41 | \item{recreate}{Logical indicating whether to force reconstruction of the
42 | cache}
43 |
44 | \item{noload}{Logical indicating whether to create but not load the cache.
45 | Useful when you want to create caches without loading them (for example,
46 | in a cache-creation loop).}
47 |
48 | \item{cacheDir}{Character vector specifying the directory where caches are
49 | saved (and loaded from). Defaults to the variable set by
50 | \code{\link[=setCacheDir]{setCacheDir()}}.}
51 |
52 | \item{cacheSubDir}{Character vector specifying a subdirectory within the
53 | \code{cacheDir} variable. Defaults to \code{NULL}.}
54 |
55 | \item{timer}{Logical indicating whether to report how long it took to create
56 | the cache.}
57 |
58 | \item{buildDir}{Location of build files (files with instructions, used if
59 | the instruction argument is not provided). Defaults to RBUILD.DIR
60 | global option.}
61 |
62 | \item{assignToVariable}{Character vector for a variable name to load the
63 | cache into. By default, \code{simpleCache} assigns the cache to a
64 | variable named \code{cacheName}; you can overrule that here.}
65 |
66 | \item{loadEnvir}{An environment. Into which environment would you like to
67 | load the variable? Defaults to \code{\link[base]{parent.frame}}.}
68 |
69 | \item{searchEnvir}{a vector of environments to search for the already loaded
70 | cache.}
71 |
72 | \item{nofail}{By default, simpleCache throws an error if the instructions
73 | fail. Use this option to convert this error into a warning. No cache will
74 | be created, but simpleCache will not then hard-stop your processing. This
75 | is useful, for example, if you are creating a bunch of caches (for
76 | example using \code{lapply}) and it's ok if some of them do not complete.}
77 |
78 | \item{batchRegistry}{A \code{batchtools} registry object (built with
79 | \code{\link[batchtools]{makeRegistry}}). If provided, this cache will be created on
80 | the cluster using your batchtools configuration}
81 |
82 | \item{batchResources}{A list of variables to provide to batchtools for
83 | cluster resource managers. Used as the \code{res} argument to
84 | \code{\link[batchtools]{batchMap}}}
85 |
86 | \item{pepSettings}{Experimental untested feature.}
87 |
88 | \item{ignoreLock}{Internal parameter used for batch job submission; don't
89 | touch.}
90 |
91 | \item{lifespan}{Numeric specifying the maximum age of cache, in days, to
92 | allow before automatically triggering \code{recreate=TRUE}.}
93 | }
94 | \description{
95 | Given a unique name for an R object, and instructions for how to make that
96 | object, use the simpleCache function to create and cache or load the object.
97 | This should be used for computations that take a long time and generate a
98 | table or something used repeatedly (in other scripts, for example). Because
99 | the cache is tied to the object name, there is some danger of causing
100 | troubles if you misuse the caching system. The object should be considered
101 | static.
102 | }
103 | \details{
104 | You should pass a bracketed R code snippet like \code{rnorm(500)} as the
105 | instruction, and simpleCache will create the object. Alternatively, if the
106 | code to create the cache is large, you can put an R script called object.R in
107 | the \code{\link[=setCacheBuildDir]{RBUILD.DIR}} (the name of the file *must* match the name of the object it
108 | creates *exactly*). If you don't provide an instruction, the function sources
109 | RBUILD.DIR/object.R and caches the result as the object. This source file
110 | *must* create an object with the same name of the object. If you already have
111 | an object with the name of the object to load in your current environment,
112 | this function will not try to reload the object; instead, it returns the
113 | local object. In essence, it assumes that this is a static object, which you
114 | will not change. You can force it to load the cached version instead with
115 | "reload".
116 |
117 | Because R uses lexical scope and not dynamic scope, you may need to pass some
118 | environment variables you use in your instruction code. You can do this
119 | using the parameter buildEnvir (just provide a list of named variables).
120 | }
121 | \examples{
122 | # choose location to store caches
123 | cacheDir = tempdir()
124 | cacheDir
125 | setCacheDir(cacheDir)
126 |
127 | # build some caches
128 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE)
129 | simpleCache("normSample", { rnorm(5e3, 0,1) })
130 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE)
131 |
132 | # storing a cache after-the-fact
133 | normSample2 = rnorm(10, 0, 1)
134 | storeCache("normSample2")
135 |
136 | # what's available?
137 | listCaches()
138 |
139 | # load a cache
140 | simpleCache("normSample")
141 |
142 | # load multiple caches
143 | loadCaches(c("normSample", "normSample2"), reload=TRUE)
144 | }
145 |
--------------------------------------------------------------------------------
/man/simpleCacheGlobal.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/sharedCaches.R
3 | \name{simpleCacheGlobal}
4 | \alias{simpleCacheGlobal}
5 | \title{Helper alias for loading caches into the global environment.
6 | simpleCache normally loads variables into the calling environment; this
7 | ensures that the variables are loaded in the global environment.}
8 | \usage{
9 | simpleCacheGlobal(...)
10 | }
11 | \arguments{
12 | \item{...}{Parameters passed to \code{\link{simpleCache}}.}
13 | }
14 | \description{
15 | Helper alias for loading caches into the global environment.
16 | simpleCache normally loads variables into the calling environment; this
17 | ensures that the variables are loaded in the global environment.
18 | }
19 |
--------------------------------------------------------------------------------
/man/simpleCacheOptions.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/cacheDirectories.R
3 | \name{simpleCacheOptions}
4 | \alias{simpleCacheOptions}
5 | \title{View simpleCache options}
6 | \usage{
7 | simpleCacheOptions()
8 | }
9 | \description{
10 | Views simpleCache global variables
11 | }
12 |
--------------------------------------------------------------------------------
/man/simpleCacheShared.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/sharedCaches.R
3 | \name{simpleCacheShared}
4 | \alias{simpleCacheShared}
5 | \title{Alias to default to a shared cache folder.}
6 | \usage{
7 | simpleCacheShared(...)
8 | }
9 | \arguments{
10 | \item{...}{Parameters passed to \code{\link{simpleCache}}.}
11 | }
12 | \description{
13 | Helper alias for caching across experiments/people.
14 | Just sets the cacheDir to the default SHARE directory
15 | (instead of the typical default PROJECT directory)
16 | }
17 |
--------------------------------------------------------------------------------
/man/simpleCacheSharedGlobal.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/sharedCaches.R
3 | \name{simpleCacheSharedGlobal}
4 | \alias{simpleCacheSharedGlobal}
5 | \title{Helper alias for loading shared caches into the global environment.}
6 | \usage{
7 | simpleCacheSharedGlobal(...)
8 | }
9 | \arguments{
10 | \item{...}{Parameters passed to \code{\link{simpleCache}}.}
11 | }
12 | \description{
13 | Helper alias for loading shared caches into the global environment.
14 | }
15 |
--------------------------------------------------------------------------------
/man/storeCache.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/storeCache.R
3 | \name{storeCache}
4 | \alias{storeCache}
5 | \title{Stores as a cache an already-produced R object}
6 | \usage{
7 | storeCache(
8 | cacheName,
9 | cacheDir = getCacheDir(),
10 | cacheSubDir = NULL,
11 | recreate = FALSE
12 | )
13 | }
14 | \arguments{
15 | \item{cacheName}{Unique name for the cache (and R object to be cached).}
16 |
17 | \item{cacheDir}{The directory where caches are saved (and loaded from).
18 | Defaults to the global \code{\link[=setCacheDir]{RCACHE.DIR}} variable}
19 |
20 | \item{cacheSubDir}{You can specify a subdirectory within the cacheDir
21 | variable. Defaults to \code{NULL}.}
22 |
23 | \item{recreate}{Forces reconstruction of the cache}
24 | }
25 | \description{
26 | Sometimes you use significant computational power to create an object, but
27 | you didn't cache it with \code{\link{simpleCache}}. Oops, maybe you wish you had, after the
28 | fact. This function lets you store an object in the environment so it could
29 | be loaded by future calls to \code{simpleCache}.
30 | }
31 | \details{
32 | This can be used in interactive sessions, but could also be used for another
33 | use case: you have a complicated set of instructions (too much to pass as the
34 | instruction argument to \code{simpleCache}), so you could just stick a call to
35 | \code{storeCache} at the end.
36 | }
37 | \examples{
38 | # choose location to store caches
39 | cacheDir = tempdir()
40 | cacheDir
41 | setCacheDir(cacheDir)
42 |
43 | # build some caches
44 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE)
45 | simpleCache("normSample", { rnorm(5e3, 0,1) })
46 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE)
47 |
48 | # storing a cache after-the-fact
49 | normSample2 = rnorm(10, 0, 1)
50 | storeCache("normSample2")
51 |
52 | # what's available?
53 | listCaches()
54 |
55 | # load a cache
56 | simpleCache("normSample")
57 |
58 | # load multiple caches
59 | loadCaches(c("normSample", "normSample2"), reload=TRUE)
60 | }
61 |
--------------------------------------------------------------------------------
/man/tic.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/utility.R
3 | \name{tic}
4 | \alias{tic}
5 | \title{Start a timer}
6 | \usage{
7 | tic(gcFirst = TRUE, type = c("elapsed", "user.self", "sys.self"))
8 | }
9 | \arguments{
10 | \item{gcFirst}{Garbage Collect before starting the timer?}
11 |
12 | \item{type}{Type of time to return,
13 | can be 'elapsed', 'user.self', or 'sys.self'}
14 | }
15 | \description{
16 | Start a timer
17 | }
18 |
--------------------------------------------------------------------------------
/man/toc.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/utility.R
3 | \name{toc}
4 | \alias{toc}
5 | \title{Check the time since the current timer was started with tic()}
6 | \usage{
7 | toc()
8 | }
9 | \description{
10 | Check the time since the current timer was started with tic()
11 | }
12 |
--------------------------------------------------------------------------------
/paper/paper.bib:
--------------------------------------------------------------------------------
1 | @Manual{R,
2 | title = {R: A Language and Environment for Statistical Computing},
3 | author = {{R Core Team}},
4 | organization = {R Foundation for Statistical Computing},
5 | address = {Vienna, Austria},
6 | year = {2016},
7 | url = {https://www.R-project.org/},
8 | }
9 |
10 | @Article{batchtools,
11 | title = {batchtools: Tools for R to work on batch systems},
12 | author = {Michel Lang and Bernd Bischl and Dirk Surmann},
13 | journal = {The Journal of Open Source Software},
14 | year = {2017},
15 | month = {feb},
16 | volume = {2},
17 | number = {10},
18 | doi = {10.21105/joss.00135},
19 | url = {https://doi.org/10.21105/joss.00135},
20 | }
21 |
22 | @Article{RPIM,
23 | author="Sheffield, N. C. and Pierron, G. and Klughammer, J. and Datlinger, P. and Schonegger, A. and Schuster, M. and Hadler, J. and Surdez, D. and Guillemot, D. and Lapouble, E. and Freneaux, P. and Champigneulle, J. and Bouvier, R. and Walder, D. and Ambros, I. M. and Hutter, C. and Sorz, E. and Amaral, A. T. and de Alava, E. and Schallmoser, K. and Strunk, D. and Rinner, B. and Liegl-Atzwanger, B. and Huppertz, B. and Leithner, A. and de Pinieux, G. and Terrier, P. and Laurence, V. and Michon, J. and Ladenstein, R. and Holter, W. and Windhager, R. and Dirksen, U. and Ambros, P. F. and Delattre, O. and Kovar, H. and Bock, C. and Tomazou, E. M. ",
24 | title={DNA methylation heterogeneity defines a disease spectrum in Ewing sarcoma},
25 | journal={Nature Medicine},
26 | year={2017},
27 | volume={23},
28 | number={3},
29 | pages={386-395},
30 | month={Mar},
31 | doi = {10.1038/nm.4273}
32 | }
33 |
34 | @Article{LOLA,
35 | author={Sheffield, N. C. and Bock, C.},
36 | title={LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor},
37 | journal={Bioinformatics},
38 | year={2016},
39 | volume={32},
40 | number={4},
41 | pages={587-589},
42 | month={Feb},
43 | doi = {10.1093/bioinformatics/btv612}
44 | }
45 |
--------------------------------------------------------------------------------
/paper/paper.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: 'simpleCache: R caching for reproducible, distributed, large-scale projects'
3 | authors:
4 | - name: Nathan C. Sheffield
5 | orcid: 0000-0001-5643-4068
6 | affiliation: 1
7 | - name: VP Nagraj
8 | orcid: 0000-0003-0060-566X
9 | affiliation: 1
10 | - name: Vince Reuter
11 | orcid: 0000-0002-7967-976X
12 | affiliation: 1
13 | affiliations:
14 | - name: University of Virginia
15 | index: 1
16 | date: 28 October 2017
17 | bibliography: paper.bib
18 | ---
19 |
20 | # Summary
21 |
22 | `simpleCache` is an R[@R] package that provides functions for caching R objects. Its purpose is to encourage writing reusable, restartable, and reproducible analysis for projects with large data and computational requirements. As its name indicates, `simpleCache` is intended to be simple. Users specify a location to store caches, and then provide nothing more than a cache name and instructions (R code) for how to produce an R object. `simpleCache` either creates and saves or simply loads the result as necessary with just a single function call.
23 |
24 | In addition to this basic functionality, `simpleCache` has advanced options for assigning objects to specific environments, recreating caches, reloading caches, and even distributing caching operations to cluster computing resources via the `batchtools`[@batchtools] interface. These features make the package particularly useful for large-scale data analysis and research projects. `simpleCache` is most helpful for caching objects that are computationally expensive to create, but used in multiple scripts or by multiple users.
25 |
26 | `simpleCache` is also useful to enhance performance in a package that relies on large databases. For example, `simpleCache` has been incorporated with the LOLA R package[@LOLA] to more efficiently cache and retrieve genomic region databases. Similarly, `simpleCache` has been used to store cached baseline statistical tables for faster lookup to determine statistical differences on tables with hundreds of millions of data points [@RPIM].
27 |
28 | In summary, `simpleCache` provides a user-friendly interface to help the R programmer manage computationally intensive, repeated data analysis.
29 |
30 | # References
31 |
--------------------------------------------------------------------------------
/simpleCache.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 |
3 | RestoreWorkspace: Default
4 | SaveWorkspace: Default
5 | AlwaysSaveHistory: Default
6 |
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 |
15 | BuildType: Package
16 | PackageUseDevtools: Yes
17 | PackageInstallArgs: --no-multiarch --with-keep.source
18 |
--------------------------------------------------------------------------------
/tests/testthat.R:
--------------------------------------------------------------------------------
1 | library(testthat)
2 | library(simpleCache)
3 |
4 | test_check("simpleCache")
5 |
--------------------------------------------------------------------------------
/tests/testthat/helper-lifespan.R:
--------------------------------------------------------------------------------
1 | # Ancillary functions for cache lifespan tests.
2 |
3 | # Build a small data frame to cache.
4 | buildTestFrame = function() { data.frame(matrix(1:9, nrow=3)) }
5 |
6 | # Remove test case's temp cache folder.
7 | cleanLSTest = function() { unlink(lifespanTestsTmpdir(), recursive=TRUE) }
8 |
9 | # Count the number of items in the cache folder.
10 | countCacheItems = function() { length(list.files(getOption("RCACHE.DIR"))) }
11 |
12 | # Generate path to temp folder for test case.
13 | lifespanTestsTmpdir = function() { file.path(tempdir(), "lifespan") }
14 |
15 | # Establish a temp folder and set the cache home location to it.
16 | setupLSTest = function() {
17 | testdir = lifespanTestsTmpdir()
18 | if (!file_test("-d", testdir)) { dir.create(testdir) }
19 | setCacheDir(lifespanTestsTmpdir())
20 | }
21 |
--------------------------------------------------------------------------------
/tests/testthat/test_all.R:
--------------------------------------------------------------------------------
1 | library(simpleCache)
2 |
3 | context("error checking")
4 |
5 |
6 | # Map option name to its setter.
7 | kSetters = list(RCACHE.DIR=setCacheDir, RESOURCES.RCACHE=setSharedCacheDir, RBUILD.DIR=setCacheBuildDir)
8 |
9 |
10 | # Test a cache dir setting in managed context fashion, resetting before and after test.
11 | test_dir_default = function(cacheDirOptname) {
12 | resetCacheSearchEnvironment()
13 | test_that(sprintf("%s setter uses current folder for argument-less call", cacheDirOptname), {
14 | do.call(kSetters[[cacheDirOptname]], args=list())
15 | expect_equal(getwd(), getOption(cacheDirOptname))
16 | })
17 | resetCacheSearchEnvironment()
18 | }
19 |
20 |
21 | test_that("notifications and messages as expected", {
22 |
23 | # message if cache exists
24 | simpleCache("normSample", instruction = {rnorm(5e3, 0,1)}, cacheDir = tempdir(), recreate=TRUE)
25 | expect_message(simpleCache("normSample", instruction = {rnorm(5e3, 0,1)}, cacheDir = tempdir(), recreate=FALSE, noload = TRUE), "^::Cache exists")
26 | deleteCaches("normSample", force = TRUE)
27 |
28 | # storeCache should not accept non-character cacheName
29 | expect_error(storeCache(cacheName = normSample, recreate = TRUE, cacheDir = tempdir()), "storeCache expects the cacheName variable to be a character vector.")
30 |
31 | # message when cacheDir isn't defined
32 | expect_message(simpleCache("normSample", { rnorm(5e3, 0,1) }), regexp = "^No cacheDir specified.")
33 |
34 | # error when buildDir is empty without instruction
35 | expect_error(simpleCache("normSample", cacheDir = tempdir(), buildDir = tempdir(), recreate = TRUE), "::Error::\tNo instruction or RBuild file provided.")
36 |
37 | # error when buildEnvir includes "instruction"
38 | expect_error(simpleCache("normSample", { rnorm(5e3, 0,1) }, buildEnvir = list(instruction="foo"), recreate=TRUE, cacheDir = tempdir()), "Can't provide a variable named 'instruction' in buildEnvir")
39 |
40 | # error when instruction and buildDir are null
41 | expect_error(simpleCache("normSample", instruction = NULL, buildDir = NULL, cacheDir = tempdir(), recreate=TRUE))
42 |
43 | # error when cacheName is not character
44 | expect_error(simpleCache(12345, instruction = { rnorm(5e3, 0,1) }, buildDir = NULL, cacheDir = tempdir(), recreate=TRUE))
45 |
46 | # message when return is NULL
47 | expect_message(simpleCache("normSample", instruction = {normSample <- NULL}, recreate = TRUE, cacheDir = tempdir()), "NULL value returned, no cache created")
48 |
49 | # we must clean up any temporary caches we make
50 | deleteCaches("normSample", force=TRUE, cacheDir = tempdir())
51 |
52 | })
53 |
54 | test_that("Caching respects files existing", {
55 | setCacheDir(tempdir())
56 | set.seed(1)
57 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE)
58 | expect_equal(signif(normSample[1], 6), -0.626454)
59 |
60 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE)
61 | expect_equal(signif(normSample[1], 6), -1.51637)
62 |
63 | # Should not evaluate
64 | simpleCache("normSample", { rnorm(5e3, 0,1) })
65 | expect_equal(signif(normSample[1], 6), -1.51637)
66 |
67 |
68 | # These delete cache should force the reload to recreate cache
69 | deleteCaches("normSample", force=TRUE)
70 | simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE)
71 | expect_equal(signif(normSample[1], 6), -0.804332)
72 |
73 | # we must clean up any temporary caches we make
74 | deleteCaches("normSample", force=TRUE)
75 |
76 | })
77 |
78 | context("basic functionality")
79 |
80 | test_that("timer works", {
81 |
82 | setCacheDir(tempdir())
83 | timeout <- capture_messages(simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer = TRUE))[2]
84 | expect_match(timeout, "<[0-9][0-9]h [0-9][0-9]m [0-9].[0-9]s>")
85 |
86 | # we must clean up any temporary caches we make
87 | deleteCaches("normSample", force=TRUE)
88 |
89 | })
90 |
91 | test_that("cache can be created without loading", {
92 |
93 | expect_null(simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate = TRUE, noload = TRUE, cacheDir = tempdir()))
94 |
95 | expect_true("normSample.RData" %in% listCaches())
96 |
97 | # we must clean up any temporary caches we make
98 | deleteCaches("normSample", force=TRUE)
99 |
100 | })
101 |
102 | test_that("object can be stored as cache", {
103 |
104 | normSample2 <<- rnorm(5e3,0,1)
105 |
106 | expect_message(storeCache("normSample2", cacheDir = NULL, recreate = TRUE), "^You must set global option RCACHE.DIR")
107 |
108 | expect_message(storeCache("normSample2", cacheDir = tempdir(), recreate = TRUE), "^::Creating cache::")
109 |
110 | expect_message(storeCache("normSample2", cacheDir = tempdir(), recreate = FALSE), "^::Cache already exists")
111 |
112 | # we must clean up any temporary caches we make
113 | deleteCaches("normSample2", force=TRUE)
114 |
115 | })
116 |
117 | test_that("option setting works", {
118 |
119 | # set all options
120 | setCacheDir(tempdir())
121 | setSharedCacheDir(tempdir())
122 | setCacheBuildDir(tempdir())
123 | addCacheSearchEnvironment("cacheEnv")
124 |
125 | # Windows paths use backslashes, which grepl interprets as regex escapes;
126 | # this command doubles each backslash so that the pattern matches
127 | # the path literally.
128 | grep_tempdir = gsub("\\\\", "\\\\\\\\", tempdir())
129 | # capture output and check
130 | options_out <- capture_messages(simpleCacheOptions())
131 |
132 | expect_true(grepl(grep_tempdir, options_out[1]))
133 | expect_true(grepl(grep_tempdir, options_out[2]))
134 | expect_true(grepl(grep_tempdir, options_out[3]))
135 | expect_true(grepl("cacheEnv", options_out[4]))
136 |
137 | # reset the cache search option
138 | resetCacheSearchEnvironment()
139 |
140 | # check to make sure it is gone
141 | options_out <- capture_messages(simpleCacheOptions())
142 | expect_true(!grepl("cacheEnv", options_out[4]))
143 |
144 | })
145 |
146 | test_that("Cache dir fetch works", {
147 | options(RCACHE.DIR = NULL)
148 | expect_true(is.null(getCacheDir()))
149 | setCacheDir(tempdir())
150 | expect_false(is.null(getCacheDir()))
151 | expect_equal(getCacheDir(), tempdir())
152 | })
153 |
154 | # Test each cache directory option setter.
155 | for (optname in names(kSetters)) { test_dir_default(optname) }
156 |
157 |
158 | test_that("objects pass through in buildEnvir", {
159 |
160 | setCacheDir(tempdir())
161 |
162 | set.seed(1)
163 | simpleCache("piSample", { pi^x }, buildEnvir = list(x=2), recreate=TRUE, timer = TRUE)
164 | rm(piSample)
165 |
166 | simpleCache("piSample", reload = TRUE)
167 |
168 | expect_equal(signif(piSample, 3), 9.87)
169 |
170 | # we must clean up any temporary caches we make
171 | deleteCaches("piSample", force=TRUE)
172 |
173 | })
174 |
175 | test_that("caches can be loaded", {
176 |
177 | setCacheDir(tempdir())
178 |
179 | simpleCache("loadSample", { rnorm(5e3, 0,1) }, recreate=TRUE)
180 | loadCaches("loadSample")
181 |
182 | expect_true("loadSample" %in% ls())
183 |
184 | # we must clean up any temporary caches we make
185 | deleteCaches("loadSample", force=TRUE)
186 |
187 | })
188 | context("misc")
189 |
190 | test_that("listCaches returns name of given cache", {
191 |
192 | setCacheDir(tempdir())
193 | set.seed(1)
194 | simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE)
195 | expect_true("normSample.RData" %in% listCaches())
196 |
197 | # we must clean up any temporary caches we make
198 | deleteCaches("normSample", force=TRUE)
199 |
200 | })
201 |
202 |
--------------------------------------------------------------------------------
/tests/testthat/test_cache_lifespan.R:
--------------------------------------------------------------------------------
1 | # test_cache_lifespan.R
2 | # Tests for enforcing cache lifespan requirement.
3 |
4 | # The pattern/theme for each test is something like:
5 | # 1. Ensure the test case has a fresh, clean folder.
6 | # 2. Create a dummy cache.
7 | # 3. Check that only 1 file is in the cache.
8 | # 4. Grab the cache timestamp.
9 | # 5. Make another simpleCache call.
10 | # 6. Again check that there's a single cache file and grab the timestamp.
11 | # 7. Compare timestamps.
12 |
13 | context("lifespan")
14 |
15 | # Provide clean cache folder (pre-set) for each test case.
16 | my_test_that = function(description, instruction) {
17 | setupLSTest()
18 | test_that(description, instruction)
19 | cleanLSTest()
20 | }
21 |
22 | # Control loading behavior for these tests to focus on lifespan/recreate effects.
23 | mySimpleCache = function(...) { simpleCache(..., noload=TRUE) }
24 |
25 | # Negative control
26 | my_test_that("Cache file isn't replaced if no lifespan is specified and recreate=FALSE", {
27 | expect_equal(0, countCacheItems())
28 | mySimpleCache("testDF", recreate=FALSE, instruction={ buildTestFrame() })
29 | expect_equal(1, countCacheItems())
30 | fp = file.path(getOption("RCACHE.DIR"), "testDF.RData")
31 | t0 = file.info(fp)$ctime
32 | mySimpleCache("testDF", recreate=FALSE, instruction={ buildTestFrame() })
33 | expect_equal(1, countCacheItems())
34 | t1 = file.info(fp)$ctime
35 | expect_equal(t0, t1)
36 | })
37 |
38 | # Another sort of control
39 | my_test_that("Cache file is replaced if no lifespan is specified and recreate=TRUE", {
40 | expect_equal(0, countCacheItems())
41 | fp = file.path(getOption("RCACHE.DIR"), "testDF.RData")
42 | expect_false(file_test("-f", fp))
43 | mySimpleCache("testDF", instruction={ buildTestFrame() })
44 | expect_equal(1, countCacheItems())
45 | expect_true(file_test("-f", fp))
46 | t0 = file.info(fp)$mtime
47 | Sys.sleep(1) # Delay so that our time comparison can work.
48 | mySimpleCache("testDF", recreate=TRUE, instruction={ buildTestFrame() })
49 | expect_equal(1, countCacheItems())
50 | t1 = file.info(fp)$mtime
51 | expect_true(t1 > t0)
52 | })
53 |
54 | # Specificity
55 | my_test_that("Cache remains unchanged if younger than explicit lifespan", {
56 | expect_equal(0, countCacheItems())
57 | fp = file.path(getOption("RCACHE.DIR"), "testDF.RData")
58 | expect_false(file_test("-f", fp))
59 | mySimpleCache("testDF", instruction={ buildTestFrame() })
60 | expect_equal(1, countCacheItems())
61 | expect_true(file_test("-f", fp))
62 | t0 = file.info(fp)$mtime
63 | mySimpleCache("testDF", lifespan=0.5, instruction={ buildTestFrame() })
64 | expect_equal(1, countCacheItems())
65 | t1 = file.info(fp)$mtime
66 | expect_true(t1 == t0)
67 | })
68 |
69 | # Sensitivity
70 | my_test_that("Cache is replaced if older than explicit lifespan", {
71 | expect_equal(0, countCacheItems())
72 | fp = file.path(getOption("RCACHE.DIR"), "testDF.RData")
73 | expect_false(file_test("-f", fp))
74 | mySimpleCache("testDF", instruction={ buildTestFrame() })
75 | expect_equal(1, countCacheItems())
76 | expect_true(file_test("-f", fp))
77 | t0 = file.info(fp)$mtime
78 | Sys.sleep(1) # Time difference comparison reliability.
79 | mySimpleCache("testDF", lifespan=0, instruction={ buildTestFrame() })
80 | expect_equal(1, countCacheItems())
81 | t1 = file.info(fp)$mtime
82 | expect_true(t1 > t0)
83 | })
84 |
85 | # Explicit recreate argument trumps cache lifespan to determine recreation.
86 | my_test_that("Cache is replaced if recreate=TRUE even if cache is fresh", {
87 | expect_equal(0, countCacheItems())
88 | fp = file.path(getOption("RCACHE.DIR"), "testDF.RData")
89 | expect_false(file_test("-f", fp))
90 | mySimpleCache("testDF", instruction={ buildTestFrame() })
91 | expect_true(file_test("-f", fp))
92 | expect_equal(1, countCacheItems())
93 | t0 = file.info(fp)$mtime
94 | Sys.sleep(1) # Time difference comparison reliability.
95 | mySimpleCache("testDF", recreate=TRUE, lifespan=0, instruction={ buildTestFrame() })
96 | expect_equal(1, countCacheItems())
97 | t1 = file.info(fp)$mtime
98 | expect_true(t1 > t0)
99 | })
100 |
101 | my_test_that("simpleCache can pick up option specifying max cache age.", {
102 | options(MAX.CACHE.AGE = 0)
103 | fp = file.path(getOption("RCACHE.DIR"), "testDF.RData")
104 | expect_false(file_test("-f", fp))
105 | mySimpleCache("testDF", instruction={ buildTestFrame() })
106 | expect_true(file_test("-f", fp))
107 | t0 = file.info(fp)$mtime
108 | Sys.sleep(1) # Time difference comparison reliability.
109 | mySimpleCache("testDF", instruction={ buildTestFrame() })
110 | t1 = file.info(fp)$mtime
111 | expect_true(t1 > t0)
112 | })
113 |
114 | my_test_that("Direct lifespan specification is preferred to background option", {
115 | options(MAX.CACHE.AGE = 1)
116 | fp = file.path(getOption("RCACHE.DIR"), "testDF.RData")
117 | expect_false(file_test("-f", fp))
118 | mySimpleCache("testDF", instruction={ buildTestFrame() })
119 | expect_true(file_test("-f", fp))
120 | t0 = file.info(fp)$mtime
121 | Sys.sleep(1)
122 | mySimpleCache("testDF", instruction={ buildTestFrame() })
123 | expect_equal(t0, file.info(fp)$mtime) # Cache is fresh via MAX.CACHE.AGE.
124 | Sys.sleep(1) # Time difference comparison reliability.
125 | mySimpleCache("testDF", lifespan=0, instruction={ buildTestFrame() })
126 | t1 = file.info(fp)$mtime
127 | expect_true(t1 > t0) # Cache is stale via lifespan.
128 | })
129 |
--------------------------------------------------------------------------------
/vignettes/clusterCaches.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Generating caches on a cluster"
3 | author: "Nathan Sheffield"
4 | date: "`r Sys.Date()`"
5 | vignette: >
6 | %\VignetteEngine{knitr::rmarkdown}
7 | %\VignetteIndexEntry{Generating caches on a cluster}
8 | output: knitr:::html_vignette
9 | ---
10 |
11 | # Generating caches in parallel using batchtools
12 |
13 | By default, `simpleCache` creates caches in the R session you use to call it. If you need to make lots of caches, or very large caches, you may want instead to submit these as jobs to a cluster resource manager (like SLURM). `simpleCache` can do this using functionality from the `batchtools` package.
14 |
15 | This vignette is unevaluated because it relies on the `batchtools` package and a cluster environment.
16 |
17 | To do this, first, create a `batchtools` registry. You can follow more detailed documentation in the `batchtools` package, but here's some code to get you started:
18 |
19 | ```{r Try it out, eval=FALSE}
20 | library(simpleCache)
21 | setCacheDir(tempdir())
22 |
23 | registry = batchtools::makeRegistry(NA)
24 | templateFile = system.file("templates/slurm-advanced.tmpl", package = "simpleCache")
25 | registry$cluster.functions = batchtools::makeClusterFunctionsSlurm(
26 | template = templateFile)
27 | registry
28 | ```
29 |
30 | Notice that I'm using a custom SLURM template here. With a registry in hand, we next need to define the resources this cache job will require:
31 |
32 | ```{r}
33 | resources = list(ncpus=1, memory=1000, walltime=60, partition="serial")
34 | ```
35 |
36 | Then, we simply add these as arguments to `simpleCache()` like so:
37 | ```{r, eval=FALSE}
38 | simpleCache("testBatch", {
39 | rnorm(1e7, 0, 1)
40 | }, batchRegistry=registry, batchResources=resources)
41 | ```
42 |
43 | This will now create and submit a job script to the cluster. That job script will have R code to create your `testBatch` cache by calling the code in your `simpleCache` call, `rnorm(1e7, 0, 1)`. Next time you run this function, it will just load the cache without recreating it, as you would expect simpleCache to do. Now there's a bunch of other stuff you can use `batchtools` to do with these jobs:
44 |
45 | ```{r, eval=FALSE}
46 | batchtools::getJobTable(reg=registry)
47 | batchtools::getJobPars()
48 | batchtools::getStatus()
49 |
50 | batchtools::getJobTable(reg=registry)
51 | batchtools::getJobPars(1, reg=registry)
52 | batchtools::loadResult(1, reg=registry)
53 | # batchtools::testJob(1, reg=registry)
54 | # killJobs()
55 | ```
56 |
57 | When you're done, you may want to remove your temporary registry:
58 | ```{r, eval=FALSE}
59 | batchtools::removeRegistry(reg=registry)
60 | ```
61 |
62 | See `batchtools` documentation for more details on using registries.
--------------------------------------------------------------------------------
/vignettes/sharingCaches.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Sharing caches across projects"
3 | author: "Nathan Sheffield"
4 | date: "`r Sys.Date()`"
5 | vignette: >
6 | %\VignetteEngine{knitr::rmarkdown}
7 | %\VignetteIndexEntry{Sharing caches across projects}
8 | output: knitr:::html_vignette
9 | ---
10 |
11 | # Sharing caches across projects
12 |
13 | By default, `simpleCache` will store its caches for your project in the `RCACHE.DIR` global option. This is designed to be a project-specific directory, so I have a different RCache dir for each of my projects. Sometimes, though, I want to share caches across projects, and so it's useful to have a definition of a shared cache directory. I think of this as a general resource. For instance, I use this to store the location of all CpGs in the human genome, which I use repeatedly in many projects.
14 |
15 | To solve this problem, `simpleCache` uses a second global option, `SHARE.RCACHE.DIR`, which you can set with the convenience setter `setSharedCacheDir()`. Then, you use `simpleCache()` as normal but with the additional `cacheDir` parameter, or use the convenience alias `simpleCacheShared()`, as outlined below:
16 |
17 | ```{r Try it out}
18 | library(simpleCache)
19 | cacheDir = tempdir()
20 | setSharedCacheDir(cacheDir)
21 | simpleCacheShared("normSample", { rnorm(1e7, 0,1) }, recreate=TRUE)
22 | simpleCacheShared("normSample", { rnorm(1e7, 0,1) })
23 | ```
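
The explicit `cacheDir` form mentioned above can be sketched as follows. This is a minimal sketch, assuming the directory set by `setSharedCacheDir()` can be read back with `getOption("SHARE.RCACHE.DIR")`; under that assumption it should behave like the `simpleCacheShared()` call:

```r
library(simpleCache)
setSharedCacheDir(tempdir())

# Assumption: the shared directory is stored in the SHARE.RCACHE.DIR option
sharedDir = getOption("SHARE.RCACHE.DIR")

# Same cache name and instruction as before, but with an explicit cacheDir
simpleCache("normSample", { rnorm(1e7, 0,1) }, cacheDir=sharedDir)
```

Passing `cacheDir` explicitly can be handy when a single script mixes project-specific and shared caches.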
24 |
25 | ```{r Clean up}
26 | deleteCaches("normSample", cacheDir=cacheDir, force=TRUE)
27 | ```
28 |
--------------------------------------------------------------------------------
/vignettes/simpleCacheIntroduction.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "An introduction to simpleCache"
3 | author: "Nathan Sheffield"
4 | date: "`r Sys.Date()`"
5 | vignette: >
6 | %\VignetteEngine{knitr::rmarkdown}
7 | %\VignetteIndexEntry{An introduction to simpleCache}
8 | output: knitr:::html_vignette
9 | ---
10 |
11 | # An introduction to simpleCache
12 |
13 | ## Your first cache
14 |
15 | `simpleCache` has two main use cases: first, it can help you pick up where you left off in an R session, and second, it can help you parallelize code by enabling you to share results across R sessions.
16 |
17 | The workhorse of `simpleCache` is the eponymous `simpleCache()` function, which in the simplest case requires just two parameters: a cache name, and a block of code. The cache name should be considered unique and its underlying object immutable, while the block of code (or *instruction*) is the `R` code that generates the object you wish to cache.
18 |
19 | But before we start creating caches, it's important to tell `simpleCache` where to store the caches. `simpleCache` uses a global option (`RCACHE.DIR`) for the cache directory, and provides a setter function (`setCacheDir()`) to change this. To get started, choose a cache directory, and generate some random data.
20 |
21 | ```{r Try it out}
22 | library(simpleCache)
23 | cacheDir = tempdir()
24 | setCacheDir(cacheDir)
25 | simpleCache("normSamp", { rnorm(1e7, 0,1) })
26 | ```
27 |
28 | Now, watch what happens when we run that same function call again:
29 |
30 | ```{r}
31 | simpleCache("normSamp", { rnorm(1e7, 0,1) })
32 | ```
33 |
34 | Notice that the second call to `simpleCache()` doesn't re-run the `rnorm` calculation. In fact, it doesn't even re-load the cache, because it notices that it's already in memory. If the cache weren't already in memory, this call would load it from disk. This means you can put this code in multiple scripts and pull the same randomized data, without re-doing the compute work.
35 |
36 | You can also force a cache to reload using the `reload` option. This could be useful, for example, if you've loaded a cache and then accidentally changed it, and want to reset. By default, a call to `simpleCache()` will not reload an object that already exists in your environment. But you can always force it with the `reload` parameter:
37 |
38 | ```{r}
39 | normSamp = NA # Oops broke my object in memory.
40 | # Regular call won't reload because we have an object called normSamp already:
41 | simpleCache("normSamp", { rnorm(1e7, 0,1) })
42 | # But we can force reload and get it back with reload=TRUE
43 | simpleCache("normSamp", { rnorm(1e7, 0,1) }, reload=TRUE)
44 | ```
45 |
46 | What if we want to start over and blow away that cache, getting a new random set? Use the `recreate` flag if you want to ensure that the cache is produced and overwritten even if it already exists:
47 |
48 | ```{r}
49 | simpleCache("normSamp", { rnorm(1e7, 0,1) }, recreate=TRUE)
50 | ```
51 |
52 | With just those parameters (cache name, instruction, recreate, and reload), you should be able to make good use of `simpleCache`. The essence is: if the object already exists in memory, do nothing. If it does not exist in memory but exists on disk, load it into memory. If it exists neither in memory nor on disk, create it and store it to disk and memory. Now you've got the basics.
53 |
54 | But there's more if you want it: read on!
55 |
56 | ## Comparison to base R save() and load()
57 |
58 | Of course, R has base functions that accomplish this (`save()` and `load()`), so what does `simpleCache` add? Well, `simpleCache` is essentially a convenience wrapper around the base R functions. The first advantage is that we now require only a single function: `simpleCache()` handles both saving and loading. This means your script does not need to be written differently depending on whether it's generating or loading a cache, because the same function can do either, depending on whether the cache exists or not. The second advantage is that caches are keyed by cache name instead of by filename. So instead of putting a whole path to an RData file into `load()`, we just pass a unique identifier for the cache, and `simpleCache` handles the rest. Third, `simpleCache` tries to be smart: if you already have the object in memory, it won't re-load it. For big caches, this can save you time if you accidentally call `simpleCache()` multiple times on the same cache (or if you write functions to populate an R environment with a bunch of pre-existing data).
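
To make the comparison concrete, here is a minimal base R sketch of the same idea built directly on `save()` and `load()`. The function `toyCache()` and the names `demoDir` and `toySamp` are hypothetical and exist only for illustration; this is not `simpleCache`'s actual implementation, just the three-way logic (in memory, on disk, or neither) written out by hand:

```r
# Hypothetical helper for illustration only, NOT simpleCache's real code.
toyCache = function(cacheName, instruction, cacheDir = tempdir(),
                    envir = parent.frame()) {
  cacheFile = file.path(cacheDir, paste0(cacheName, ".RData"))
  # Case 1: an object with this name is already in memory, so do nothing
  if (exists(cacheName, envir = envir, inherits = FALSE)) {
    return(invisible("in memory"))
  }
  # Case 2: not in memory, but a cache file exists on disk, so load it
  if (file.exists(cacheFile)) {
    load(cacheFile, envir = envir)
    return(invisible("loaded from disk"))
  }
  # Case 3: neither, so evaluate the instruction (lazily, only here),
  # then store the result in memory and on disk under the cache name
  assign(cacheName, instruction, envir = envir)
  save(list = cacheName, file = cacheFile, envir = envir)
  invisible("created")
}

# Usage: a scratch directory keeps the demo away from real caches
demoDir = file.path(tempdir(), "toyCacheDemo")
dir.create(demoDir, showWarnings = FALSE)
unlink(file.path(demoDir, "toySamp.RData"))

s1 = toyCache("toySamp", rnorm(10), cacheDir = demoDir)  # computes and saves
s2 = toyCache("toySamp", rnorm(10), cacheDir = demoDir)  # already in memory
saved = toySamp
rm(toySamp)                                              # drop it from memory
s3 = toyCache("toySamp", rnorm(10), cacheDir = demoDir)  # reloads from disk
identical(saved, toySamp)
```

Because R evaluates function arguments lazily, the `rnorm(10)` instruction is only run in the third case; the second and third calls never regenerate the data.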
59 |
60 | Beyond that, `simpleCache` also offers several convenient options that just make it really easy to save and re-load R objects. Let's go into a bit more detail on these features.
61 |
62 | ## Cache names
63 |
64 | By default, the object will be loaded into a variable with the same name as the cache. You can change this behavior with the `assignTo` parameter:
65 |
66 | ```{r}
67 | simpleCache("normSamp", { rnorm(1e7, 0,1) }, assignTo="mySamp")
68 | ```
69 |
70 | After running this command, we have both `normSamp` (from the previous calls, not from this one) and `mySamp` (loaded in this call) in the workspace, and these objects are identical:
71 |
72 | ```{r}
73 | identical(normSamp, mySamp)
74 | ```
75 |
76 | This `assignTo` concept is useful if you want to create caches but not load them, or load caches one at a time. Which leads us to...
77 |
78 | ## Creating but not loading caches
79 |
80 | It may be that you want to create a bunch of caches that are quite memory intensive, and you don't actually need them all in this particular R workspace at the same time. If you just create each object and save it, you'll end up with all those objects in memory at the same time. Instead, you can use the `noload` parameter, which will create the caches but not load them into memory (so the object will be cached, but will not persist in this R environment). I use this frequently in a setup script to build caches that I will need later in individual scripts that will run on each one individually. Let's make 5 caches but not load them:
81 |
82 | ```{r}
83 | for (i in 1:5) {
84 | cacheName = paste0("normSamp_", i)
85 | simpleCache(cacheName, { rnorm(1e6, 0,1) }, recreate=TRUE, noload=TRUE)
86 | }
87 | ```
88 |
89 | We've now produced 5 different sample data caches. They exist on disk, but not in memory. This could, for example, be done in an initial data-generation or setup script. We then may be interested in using these (same) caches in several downstream scripts, and we could do some iterative operation on them and use `assignTo` to avoid loading more than 1 at a time into memory:
90 |
91 | ```{r}
92 | overallMinimum = 1e6 # pick some high number to start
93 | for (i in 1:5) {
94 | cacheName = paste0("normSamp_", i)
95 | simpleCache(cacheName, assignTo="temp")
96 | overallMinimum = min(overallMinimum, temp)
97 | }
98 |
99 | message(overallMinimum)
100 | ```
101 |
102 | In this code block, by assigning the caches to the variable `temp`, we only have 1 in memory at a time, because each cache load overwrites the previous one, which is exactly what we want in this case. We keep track of the minimum value of each one independently, and we've effectively calculated an overall minimum while loading only a single cache in memory at a time.
103 |
104 | ## Loading multiple caches
105 |
106 | If you've got a bunch of caches and you want them all in memory, you could just load all the caches into memory with this convenience alias:
107 | ```{r}
108 | loadCaches(paste0("normSamp_", 1:5))
109 | ```
110 |
111 | The disadvantage of doing it this way is that you've lost the advantage of using the single `simpleCache()` function for both saving and loading, but this may be desirable in some cases.
112 |
113 | By the way, once a cache is created, you no longer need to provide instructions:
114 |
115 | ```{r}
116 | simpleCache("normSamp")
117 | ```
118 |
119 | `simpleCache` will load it if it can; if not, it will give you an error saying it requires an `instruction`.
120 |
121 | ## Timing cache creation
122 |
123 | If you want to record how long it takes to create a new cache, you can set `timer=TRUE`.
124 |
125 | ```{r}
126 | simpleCache("normSamp", { rnorm(1e6, 0,1) }, recreate=TRUE, timer=TRUE)
127 | ```
128 |
129 | ## Complicated code
130 |
131 | So far, our examples have cached the result of a very simple instruction code block: the `rnorm` call to randomly generate some numbers. But really, simpleCache can be used to cache anything. The code block can be whatever you want; whatever it returns will be cached. For example, let's cache the result of a call to `t.test()`:
132 |
133 | ```{r}
134 | simpleCache("tResult", {
135 | dat2 = rnorm(1e5, 0.05,2)
136 | t.test(normSamp, dat2)
137 | }, recreate=TRUE)
138 |
139 | tResult
140 | tResult$p.value
141 | ```
142 |
143 | The point is that the code could be quite complicated and time-consuming. You may only want to calculate it once, and then re-use the result in another script -- or in this same script next time you run it. `simpleCache` makes that, well, simple.
144 |
145 | That's the end of the basics. There are a few more advanced options as well, such as using a shared cache directory, submitting compute requests to a cluster using `batchtools`, tweaking the loading environment with the `loadEnvir` parameter (if you need to call `simpleCache()` from within a function), and tweaking the cache building resources with the `buildEnvir` parameter. But these options are more advanced and probably not needed for 95% of `simpleCache` use cases. If you do need more information, you can find further help in the other vignettes or in the detailed R function documentation (see `?simpleCache`).
146 |
147 | ```{r Clean up}
148 | deleteCaches("normSamp", force=TRUE)
149 | deleteCaches(paste0("normSamp_", 1:5), force=TRUE)
150 | deleteCaches("tResult", force=TRUE)
151 | ```
--------------------------------------------------------------------------------