├── .Rbuildignore
├── DESCRIPTION
├── NAMESPACE
├── NEWS
├── R
    ├── BFCOption.R
    ├── BiocFileCache-class.R
    ├── httr.R
    ├── makeBiocFileCacheFromDataFrame.R
    ├── makeCachedActiveBinding.R
    ├── sql.R
    ├── sql_migration.R
    ├── utilities.R
    └── zzz.R
├── README.md
├── TODO.md
├── inst
    └── schema
    │   └── BiocFileCache.sql
├── man
    ├── BFCOption.Rd
    ├── BiocFileCache-class.Rd
    ├── makeBiocFileCacheFromDataFrame.Rd
    └── makeCachedActiveBinding.Rd
├── tests
    ├── testthat.R
    └── testthat
    │   ├── test_BiocFileCache_class.R
    │   ├── test_httr.R
    │   ├── test_makeBiocFileCacheFromDataFrame.R
    │   ├── test_sql.R
    │   ├── test_sql_migration.R
    │   └── test_utility.R
└── vignettes
    ├── BiocFileCache.Rmd
    ├── BiocFileCache_Troubleshooting.Rmd
    └── BiocFileCache_UseCases.Rmd


/.Rbuildignore:
--------------------------------------------------------------------------------
TODO.md
--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
Package: BiocFileCache
Title: Manage Files Across Sessions
Version: 2.99.5
Authors@R: c(person("Lori", "Shepherd",
        email = "lori.shepherd@roswellpark.org",
        role = c("aut", "cre")),
    person("Martin", "Morgan",
        email = "martin.morgan@roswellpark.org",
        role = "aut"))
Description: This package creates a persistent on-disk cache of files
    that the user can add, update, and retrieve. It is useful for
    managing resources (such as custom TxDb objects) that are costly
    or difficult to create, web resources, and data files used across
    sessions.
Depends:
    R (>= 3.4.0),
    dbplyr (>= 1.0.0)
Imports:
    methods,
    stats,
    utils,
    dplyr,
    RSQLite,
    DBI,
    filelock,
    curl,
    httr2
BugReports: https://github.com/Bioconductor/BiocFileCache/issues
DevelopmentURL: https://github.com/Bioconductor/BiocFileCache
License: Artistic-2.0
Encoding: UTF-8
RoxygenNote: 7.3.2
biocViews: DataImport
VignetteBuilder: knitr
Suggests:
    testthat,
    knitr,
    BiocStyle,
    rmarkdown,
    rtracklayer
--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
# Generated by roxygen2: do not edit by hand

export("bfcmeta<-")
export(BiocFileCache)
export(bfcadd)
export(bfccache)
export(bfccount)
export(bfcdownload)
export(bfcinfo)
export(bfcmeta)
export(bfcmetalist)
export(bfcmetaremove)
export(bfcneedsupdate)
export(bfcnew)
export(bfcpath)
export(bfcquery)
export(bfcquerycols)
export(bfcremove)
export(bfcrid)
export(bfcrpath)
export(bfcsync)
export(bfcupdate)
export(cleanbfc)
export(exportbfc)
export(getBFCOption)
export(importbfc)
export(makeBiocFileCacheFromDataFrame)
export(makeCachedActiveBinding)
export(removebfc)
export(setBFCOption)
exportMethods("[")
exportMethods("[[")
exportMethods("[[<-")
exportMethods("bfcmeta<-")
exportMethods(bfcadd)
exportMethods(bfccache)
exportMethods(bfccount)
exportMethods(bfcdownload)
exportMethods(bfcinfo)
exportMethods(bfcmeta)
exportMethods(bfcmetalist)
exportMethods(bfcmetaremove)
exportMethods(bfcneedsupdate)
exportMethods(bfcnew)
exportMethods(bfcpath)
exportMethods(bfcquery)
exportMethods(bfcquerycols)
exportMethods(bfcremove)
exportMethods(bfcrid)
exportMethods(bfcrpath)
exportMethods(bfcsync)
exportMethods(bfcupdate)
exportMethods(cleanbfc)
exportMethods(exportbfc)
exportMethods(importbfc)
exportMethods(length)
exportMethods(makeBiocFileCacheFromDataFrame)
exportMethods(removebfc)
exportMethods(show)
import(RSQLite)
import(dbplyr)
import(httr2)
import(methods)
importFrom(DBI,dbExecute)
importFrom(DBI,dbSendStatement)
importFrom(curl,curl_escape)
importFrom(dplyr,"%>%")
importFrom(dplyr,collect)
importFrom(dplyr,filter)
importFrom(dplyr,left_join)
importFrom(dplyr,mutate)
importFrom(dplyr,n)
importFrom(dplyr,select)
importFrom(dplyr,summarize)
importFrom(dplyr,tbl)
importFrom(filelock,lock)
importFrom(filelock,unlock)
importFrom(stats,setNames)
importFrom(tools,R_user_dir)
importFrom(utils,capture.output)
importFrom(utils,packageVersion)
importFrom(utils,tar)
importFrom(utils,untar)
importFrom(utils,unzip)
importFrom(utils,zip)
--------------------------------------------------------------------------------
/NEWS:
--------------------------------------------------------------------------------
CHANGES IN VERSION 2.99.0
----------------------

MAJOR UPDATES

    o Update downloading method from httr to httr2

BUG FIX

    o (2.99.1) Soft failure if HEAD does not work. Will still download and
      add to cache without caching information. Additionally fix bug so config can
      be passed down to HEAD call from bfcneedsupdate, bfcadd, and bfcrpath.

    o (2.99.4) Fix bug to pass proxy down to HEAD call

NEW FEATURE

    o (2.99.2) Add progress argument to bfcadd/bfcdownload to control if a
      progress bar is displayed during an interactive session

CHANGES IN VERSION 2.15
----------------------

BUG FIX

    o (2.15.1) Fix issue when trying to create multiple caches in a single R session

CHANGES IN VERSION 2.11
----------------------

BUG FIX

    o (2.11.1) Merge PR to fix dbplyr compatibility issue

CHANGES IN VERSION 2.9
----------------------

USER VISIBLE CHANGE

    o (2.9.1) Add documentation for operating behind a proxy

CHANGES IN VERSION 2.7
----------------------

USER VISIBLE CHANGE

    o (2.7.1) Remove rappdirs officially from package after deprecating
      functionality and moving to using default R tools caching location.

BUG FIX

    o (2.7.2) Fix mismatch of arguments from function to generic for bfcupdate

CHANGES IN VERSION 2.5
----------------------

ENHANCEMENT

    o (2.5.2) Add option to override unique identifier when adding files to the
      cache. This will allow exact match of original file name.

BUG FIX

    o (2.5.1) ERROR if user interactively decides not to removebfc

CHANGES IN VERSION 2.3
----------------------

ENHANCEMENT

    o (2.3.4) Add instructions for a shared cache across multiple users of a system
    o (2.3.2) Add direct SQL calls for certain retrieval functions to speed up
      access time. This will speed up the bfcquery function as well as any
      function that utilizes the underlying .sql_get_field, .get_all_rids or
      .get_all_web_rids.
    o (2.3.1) Add @LTLA solution for making bfcrpath thread safe

CHANGES IN VERSION 2.1
----------------------

MAJOR UPDATES

    o (2.1.1) Change caching location warning/deprecation to an ERROR in
      preparation for removal of dependency next release.

CHANGES IN VERSION 1.99
----------------------

MAJOR UPDATES

    o (1.99.0) The default caching location has changed. Instead of
      rappdirs::user_cache_dir, tools::R_user_dir is now used. To avoid
      conflicting caches, a user will have to manage an old cache location
      before proceeding. Information for handling an old cache location is
      provided in the vignette.
    o (1.99.0) Another major change: a default caching location is automatically
      created in a non-interactive session instead of using a temporary
      location. In an interactive session, a user is still prompted for
      permission.
    o (1.99.0) An environment variable may be set system wide or user wide to
      control the default caching location: BFC_CACHE. Note: do not use R
      variables or command line export to set this variable. It must be set system
      wide or user wide for reproducibility in future R sessions or else it must
      be specified upon every usage. It must be set before calling
      library(BiocFileCache) to take effect.
    o (1.99.0) Fixes partial argument matching error in SQL function SQLExecute
    o (1.15.1) Added file locking for thread-safe SQL operations. Thanks for the PR @LTLA

BUG FIX

    o (1.99.6) cleanbfc() incorrect format string; see
      https://github.com/Bioconductor/BiocFileCache/issues/31

CHANGES IN VERSION 1.11
----------------------

BUG FIX

    o (1.11.5) Update dplyr functions for dplyr_1.0.0 release
    o (1.11.2) Fixes run of examples. Results differ between manual and
      interactive runs, so a conditional allows the complete example to run when
      invoked by a user rather than automatically through the checks.

NEW FEATURES

    o (1.11.1) Add makeCachedActiveBinding(). Allows data to be cached during
      ActiveBinding call. Avoids callback function executing data retrieval
      every time.

    o (1.11.3) Only asks for / reports use of temporary cache once per
      session. https://github.com/Bioconductor/TENxBrainData/issues/7

CHANGES IN VERSION 1.7
----------------------

BUG FIX

    o (1.7.10) Fix date comparison to use as.POSIXlt instead of as.Date. This
      catches the rare edge case if a remote resource is updated twice in a day
      and a download happened in between the two.
    o (1.7.9) Make creation of database atomic commits
    o (1.7.8) Make adding resource to database atomic commits
    o (1.7.4) Limit where possible opening database in RW mode. Use read-only
      mode wherever possible to avoid database locking
    o (1.7.2) Fix bug in multi path addition through bfcadd/bfcrpaths. The
      function was assuming all files had same rtype and action. Now can
      vectorize
    o (1.7.1) Fix man documentation example. Interactive vs non-interactive
      session

CHANGES IN VERSION 1.5
----------------------

NEW FEATURES

    o (v. 1.5.1) Add 'exact = ' for exact matching in bfcquery(),
      bfcrpath(). (v. 1.5.2) defaults to TRUE for bfcrpath()

BUG FIX

    o (v. 1.5.2) bfcrpath() more robust when adding regular expression
      rnames.

USER-VISIBLE CHANGES

    o (v. 1.5.4) bfcpath() implementation change. Only displays rpath and can
      work with multiple rids. bfcpath only for rpath access while bfcrpath is the
      option to get or add.

CHANGES IN VERSION 1.3
----------------------

NEW FEATURES

    o (v. 1.3.35) Save a post download processed file to cache.
    o (v. 1.3.28) Add ask = TRUE argument to BiocFileCache().
    o (v. 1.3.24) Add function makeBiocFileCacheFromDataFrame to
      convert data.frame to BiocFileCache
    o (v. 1.3.19) etag now checked in addition to last modified time to
      determine if local version of file is current
    o (v. 1.3.16) importbfc to load output of exportbfc
    o (v. 1.3.15) exportbfc allows users to create exportable archive of bfc
      related files
    o (v. 1.3.11) bfcisrelative checks for rtype='local' in addition to relative
    o (v. 1.3.11) Helper function to check rtype for local but
      relative, and web and update if necessary
    o (v. 1.3.10) Add function to check portability of BiocFileCache:
      bfcisrelative
    o (v. 1.3.10) Add function to convert rpaths for portability: bfcrelative
    o (v. 1.3.9) Optionally download web resource when adding to cache
      bfcadd(download=TRUE)

SCHEMA CHANGE

    o (v. 1.3.36) expires added
    o (v. 1.3.19) etag added
    o (v. 1.3.9) Last modified time default is NA

USER-VISIBLE CHANGES

    o (v. 1.3.38) ... argument exposed and passed to GET; this includes bfcadd,
      whose original ... was passed to file.copy. This use of ... applies to the
      bfc functions bfcadd(), bfcupdate() and bfcdownload(), which could
      potentially download the file.
    o (v. 1.3.30) bfcnew(), bfcadd(), accept vector arguments;
      performance improvements and
    o (v. 1.3.25) bfcinfo and bfcquery will show full rpaths even if
      stored as relative
    o (v. 1.3.22) prompt user only once when using default cache
    o (v. 1.3.17) prompt user when overwriting existing file
    o (v. 1.3.16) add importbfc to extract output of exportbfc and
      load bfc object
    o (v. 1.3.16) added exportbfc to export bfc or subset of bfc
    o (v. 1.3.16) bfcisportable and bfcportable removed for exportbfc
    o (v. 1.3.14) upon bfc creation, option to update to most current schema
    o (v. 1.3.13) bfcportable operates over all offending ids instead
      of asking individually
    o (v. 1.3.12) bfcisrelative/bfcrelative changed to
      bfcisportable/bfcportable
    o (v. 1.3.11) web resource rpaths are stored as relative links
    o (v. 1.3.9) If web resource was not downloaded, bfcneedsupdate is TRUE
    o (v. 1.3.9) Local and non-downloaded web resources will have last modified
      time as NA
    o (v. 1.3.6) Update how default cache location is determined
    o (v. 1.3.1) Expose GET::config argument to web resource functions

BUG CORRECTION

    o (v. 1.3.40) correct tryCatch with cache_info. cache_info bug workaround
      originally output NA even if present, now manually grab values if present
    o (v. 1.3.37) correct which functions update access_time
    o (v. 1.3.26) patch for cache_info after returning etag and last_modified
    o (v. 1.3.4) patch for cache_info bug

CHANGES IN VERSION 1.1
----------------------

NEW FEATURES

    o (v. 1.1.7) bfcmeta() and friends allow arbitrary metadata on
      records and the ability to query bfcmeta data.

USER-VISIBLE CHANGES

    o (v. 1.1.16) if httr::HEAD fails don't try httr::GET, just report not
      available

    o (v. 1.1.7) bfcquery() syntax changed to grep(), rather than SQL
      'LIKE'.

    o (v. 1.1.7) bfcquery() supports user-specified 'fields='.

    o (v. 1.1.5) queryCount renamed to bfccount

    o (v. 1.1.2) Add user specified extension to file in cache

    o (v. 1.1.2) Use dbExecute/dbSendStatement rather than pasting query

    o Files in cache retain basename and extension of original file

CHANGES IN VERSION 0.99.0
-------------------------

SIGNIFICANT NEW FEATURES

    o First Bioconductor release.
--------------------------------------------------------------------------------
/R/BFCOption.R:
--------------------------------------------------------------------------------
.bfc_options <- new.env(parent=emptyenv())


.bfc_option_key <- function(key0=c("CACHE"))
    match.arg(key0)

#' BFCOption
#'
#' These functions help get and set an R variable CACHE that controls the
#' default caching location.
#' @details
#' Currently the only supported option is CACHE. This controls the default
#' location of the BiocFileCache caching directory. By default the value is
#' established by \code{tools::R_user_dir("BiocFileCache",which="cache")}. This
#' default can also be set through system-wide or user-wide environment
#' variables that are visible \emph{before} the package is loaded. The
#' variable to set, if used, is \dQuote{BFC_CACHE}.
#' @param arg character(1) option to get or set
#' @param value The value to be assigned to the designated option
#' @return The value of the requested option, or (invisibly) the value
#'     that was successfully set
#' @examples
#' origPath = getBFCOption('CACHE')
#' \donttest{setBFCOption('CACHE', "~/.myBFC") }
#' @name BFCOption
#' @author Lori Shepherd
#' @aliases setBFCOption
#' @aliases getBFCOption
#' @export getBFCOption
#' @export setBFCOption
setBFCOption <- function(arg, value)
{
    key <- .bfc_option_key(toupper(trimws(arg)))

    .bfc_options[[key]] <- switch(key, CACHE={
        value <- as.character(value)
        stopifnot(length(value)==1)
        value
    })
}

getBFCOption <- function(arg) {
    arg <- .bfc_option_key(toupper(arg))
    .bfc_options[[arg]]
}
--------------------------------------------------------------------------------
/R/httr.R:
--------------------------------------------------------------------------------
# taken/based on httr::parse_http_date
.fmt_http_date <- function(x, failure = structure(NA_real_, class = "Date")) {
    if (length(x) == 0)
        return(NULL)
    fmts <- c("%a, %d %b %Y %H:%M:%S", "%A, %d-%b-%y %H:%M:%S",
              "%a %b %d %H:%M:%S %Y")
    for (fmt in fmts) {
        parsed <- as.POSIXct(strptime(x, fmt, tz = "GMT"))
        if (all(!is.na(parsed)))
            return(parsed)
    }
    rep(failure, length(x))
}

# taken/based on httr::parse_cache_control
.fmt_cache_control <- function(cachecontrol){
    if (is.null(cachecontrol))
        return(list())
    pieces <- strsplit(cachecontrol, ",")[[1]]
    pieces <- gsub("^\\s+|\\s+$", "", pieces)
    pieces <- tolower(pieces)
    is_value <- grepl("=", pieces)
    flags <- pieces[!is_value]
    keyvalues <- strsplit(pieces[is_value], "\\s*=\\s*")
    keys <- c(rep("flags", length(flags)), lapply(keyvalues,
                  "[[", 1))
    values <- c(flags, lapply(keyvalues, "[[", 2))
    stats::setNames(values, keys)
}

# taken/based on httr::cache_info
.get_http_expires <- function(response){
    expires <- .fmt_http_date(resp_header(response, "expires"), Inf) %||% NULL
    control <- .fmt_cache_control(resp_header(response, "cache-control"))
    ## as.integer(NULL) is integer(0), not NULL, so check the length
    ## instead of relying on %||% to detect a missing max-age
    max_age <- as.integer(control$`max-age`)
    if (length(max_age) == 0L)
        max_age <- NULL
    if (!is.null(max_age)) {
        expires <- .fmt_http_date(resp_header(response, "date")) + max_age
    } else {
        if (!is.null(resp_header(response, "expires"))) {
            expires <- .fmt_http_date(resp_header(response, "expires"), -Inf)
        } else {
            expires <- NULL
        }
    }
    expires
}

.httr_get_cache_info <-
    function(link, proxy, config)
{
    if(missing(config))
        config <- NULL

    if((length(config)==0)){
        config <- NULL
    } else {
        stopifnot(is.list(config))
    }

    if (missing(proxy)){
        proxy <- ""
    }
    if (proxy == ""){
        proxy <- NULL
    }

    req <- request(link) %>%
        req_method("HEAD")

    # Apply the proxy if it's not NULL
    if (!is.null(proxy)) {
        req <- req %>% req_proxy(proxy)
    }

    # Apply the config if it's not NULL
    if (!is.null(config)) {
        req <- req %>% req_options(!!!config)
    }

    response <- tryCatch({
        ## tryCatch() handlers run after the call has unwound, so the
        ## 'muffleWarning' restart is only available to a calling handler
        withCallingHandlers(
            req %>% req_perform(),
            warning = function(w) invokeRestart("muffleWarning")
        )
    }, error = function(e){
        message("Error while performing HEAD request.\n",
                " Proceeding without cache information.")
        response()
    })

    tryCatch({
        etag <- ifelse(resp_header_exists(response, "etag"),
                       gsub("\"", "", as.character(resp_header(response, "etag"))),
                       NA_character_)
        last_mod <- ifelse(resp_header_exists(response, "last-modified"),
                           as.character(.fmt_http_date(resp_header(response, "last-modified"))),
                           NA_character_)
        expires <- .get_http_expires(response)
        expires <- ifelse(is.null(expires), NA_character_, as.character(expires))

        c(etag = etag, modified = last_mod, expires = expires)
    }, error = function(err) {

        if ("etag" %in% names(response$headers)){
            etag = as.character(response[["headers"]]["etag"])
        } else {
            etag = NA_character_
        }
        if ("last-modified" %in% names(response$headers)){
            modified = as.character(response[["headers"]]["last-modified"])
        } else {
            modified = NA_character_
        }
        if ("cache-control" %in% names(response$headers)){
            cachecontrol =
                .fmt_cache_control(as.character(response[["headers"]]["cache-control"]))
            max_age = as.integer(cachecontrol$`max-age`)
            if (length(max_age) == 0L)
                max_age = NULL
            if (!is.null(max_age)) {
                expires <- as.character(
                    .fmt_http_date(as.character(response[["headers"]]["date"])) + max_age)
            } else {
                if ("expires" %in% names(response$headers)){
                    expires = as.character(response[["headers"]]["expires"])
                } else {
                    expires = NA_character_
                }
            }
        } else {
            expires = NA_character_
        }

        c(etag = etag, modified = modified, expires = expires)
    })
}

#' @importFrom utils packageVersion
.httr_download <-
    function(websource, localfile, proxy, progress, config, ...)
{

    ## retrieve file from hub to cache
    tryCatch({

        if (proxy == ""){
            proxy <- NULL
        }

        if((length(config)==0)){
            config <- NULL
        } else {
            stopifnot(is.list(config))
        }

        if(missing(progress)){
            progress=TRUE
        }

        if (!all(file.exists(dirname(localfile))))
            dir.create(dirname(localfile), recursive=TRUE)

        # set up request using httr2
        req <- request(websource)
        # This enables progress bars in httr2
        if((interactive() && progress)){
            req <- req %>% req_progress()
        }
        # Apply the proxy if it's not NULL
        if (!is.null(proxy)) {
            req <- req %>% req_proxy(proxy)
        }
        # Apply the config if it's not NULL
        if (!is.null(config)) {
            req <- req %>% req_options(!!!config)
        }
        ## httr2 req_perform does not have an overwrite argument;
        ## assume overwrite will occur automatically
        ## Perform the request and capture the response
        response <- req_perform(req, path=localfile, ...)

        ## line break after the progress bar, only when one was displayed
        if (interactive() && progress)
            cat("\n")

        # No longer needed?
        # if (length(status_code(response))) {
        #     if (status_code(response) != 302L)
        #         stop_for_status(response)
        # }

        TRUE
    }, error = function(err) {
        warning("download failed",
                "\n web resource path: ", sQuote(websource),
                "\n local file path: ", sQuote(localfile),
                "\n reason: ", conditionMessage(err),
                call.=FALSE)
        FALSE
    })

}
--------------------------------------------------------------------------------
/R/makeBiocFileCacheFromDataFrame.R:
--------------------------------------------------------------------------------
#' Make BiocFileCache objects from an existing data.frame
#'
#' If many resources are being added this could take some time, but if
#' the cache is saved in a permanent location it should only have to be
#' run once. The original data.frame must have the required columns
#' 'rtype', 'fpath', and 'rpath'; see the vignette for more information
#' on the expected contents of these columns. Similarly, the optional
#' columns 'rname', 'etag', 'last_modified_time', and 'expires' may be
#' included. Any additional columns not listed as required or optional
#' will be kept as an additional metadata table in the BiocFileCache
#' database.
#'
#' @param df data.frame or tibble to convert
#' @param cache character(1) On-disk location (directory path) of
#'     cache. For default location see
#'     \code{\link[tools]{R_user_dir}}.
#' @param actionLocal Whether the local copy of a file should be moved,
#'     copied, or left in its original location. See the 'action'
#'     argument of bfcadd.
#' @param actionWeb If a local copy of a remote resource already
#'     exists, whether the file should be copied or moved to the
#'     cache. Locally downloaded remote resources must exist in the
#'     cache location.
#' @param metadataName If the original data.frame has columns besides
#'     the required BiocFileCache columns, this data will be added as a
#'     metadata table with this name.
#' @param ... additional arguments passed to `file.copy()`.
#' @param ask logical(1) Confirm creation of BiocFileCache.
#'
#' @return A BiocFileCache object
#' @export
setGeneric("makeBiocFileCacheFromDataFrame",
    function(df, cache,
             actionLocal=c("move","copy","asis"), actionWeb=c("move","copy"),
             metadataName,
             ..., ask = TRUE)
    standardGeneric("makeBiocFileCacheFromDataFrame"),
    signature = "df"
)

#' @rdname makeBiocFileCacheFromDataFrame
#' @aliases makeBiocFileCacheFromDataFrame,ANY-method
#' @exportMethod makeBiocFileCacheFromDataFrame
setMethod("makeBiocFileCacheFromDataFrame", "ANY",
    function(df, cache,
             actionLocal=c("move","copy","asis"), actionWeb=c("move","copy"),
             metadataName,
             ..., ask = TRUE)
{
    stopifnot(is.data.frame(df))
    DF <- as.data.frame(df, stringsAsFactors = FALSE)
    if (missing(cache))
        cache <- tools::R_user_dir("BiocFileCache", which="cache")
    stopifnot(
        is.character(cache), length(cache) == 1L, !is.na(cache),
        !dir.exists(cache)
    )
    actionLocal <- match.arg(actionLocal)
    actionWeb <- match.arg(actionWeb)

    .required <- c("rtype", "fpath", "rpath")
    .optional <- c("rname", "etag", "last_modified_time", "expires")
    .possible <- c(.required, .optional)
    if (!all(.required %in% names(DF))) {
        stop("One or more of the following required columns is not in the data.frame:",
             "\n ", paste(.required, collapse=", "),
             "\n Please insert into original data.frame")
    }
    .optional <- .optional[.optional %in% names(DF)]
    .available <- c(.required, .optional)
    metadata <- names(DF)[!names(DF) %in% .available]
    if (any(metadata %in% c("rid",.RESERVED$COLUMNS))) {
        nocols <- c("rid", setdiff(.RESERVED$COLUMNS, .possible))
        stop("The following are reserved column names:",
             "\n ", paste(nocols, collapse=", "),
             "\n Please rename offending column name.")
    }
    if (length(metadata) != 0)
        stopifnot(!missing(metadataName),
                  is.character(metadataName), length(metadataName) == 1L,
                  !is.na(metadataName), !(metadataName %in% .RESERVED$TABLES))

    # validity of .required columns
    stopifnot(is.character(DF[["rtype"]]),
              is.character(DF[["fpath"]]),
              is.character(DF[["rpath"]]))
    rtype <- DF[["rtype"]]
    fpath <- DF[["fpath"]]
    rpath <- DF[["rpath"]]

    stopifnot(all(rtype %in% c("web", "local")))
    web <- which(rtype == "web")
    if (length(web) != 0L) {
        webpaths <- fpath[web]
        test <- startsWith(webpaths, "http") | startsWith(webpaths, "ftp")
        if (!all(test))
            stop("Some source urls for files identified with 'rtype=web'\n",
                 " do not start with: http, https, or ftp")
    }
    nonweb <- which(rtype != "web")
    if (length(nonweb) != 0L && !all(file.exists(rpath[nonweb])))
        stop("Not all files identified as 'rtype=local' have existing files")

    # validity of .optional columns
    if (length(.optional) != 0L) {
        check <- vapply(.optional, FUN = function(x, df) {
            is.character(df[[x]])
        }, logical(1), df=DF)
        if (!all(check))
            ## report only the offending columns
            stop("The following columns must have entries of type 'character':",
                 "\n ", paste(.optional[!check], collapse=", "))
    }
    if ("last_modified_time" %in% .optional) {
        check <- tryCatch({
            as.Date(DF[["last_modified_time"]])
            TRUE
        }, error=function(e) {
            warning(conditionMessage(e))
            FALSE
        })
        if (!check) {
            stop("Column 'last_modified_time' must have entries of type ",
                 "'character' that can be converted to Date via 'as.Date()'")
        }
        modified <- DF[["last_modified_time"]]
    } else {
        modified <- rep(NA_character_, nrow(DF))
    }

    if ("rname" %in% .optional) {
        rname <- DF[["rname"]]
    } else {
        rname <- fpath
    }

    if ("etag" %in% .optional) {
        etag <- DF[["etag"]]
    } else {
        etag <- rep(NA_character_, nrow(DF))
    }

    if ("expires" %in% .optional) {
        expires <- DF[["expires"]]
    } else {
        expires <- rep(NA_character_, nrow(DF))
    }

    bfc <- BiocFileCache(cache, ask = ask)

    # add resources to cache
    for (i in seq_len(nrow(DF))) {

        if (rtype[i] == "web") {
            npath <- fpath[i]
            action <- actionWeb
        } else {
            npath <- rpath[i]
            action <- actionLocal
        }

        res <- bfcadd(bfc, rname=rname[i], fpath = npath, rtype = "auto",
                      action = action, download=FALSE, ...)
        rid <- names(res)
        .sql_set_last_modified(bfc, rid, modified[i])
        .sql_set_etag(bfc, rid, etag[i])
        .sql_set_expires(bfc, rid, expires[i])
    }

    # if local version of remote exists, copy or move
    for (i in web) {
        cpath <- bfcrpath(bfc, rids=paste0("BFC",i))
        opath <- rpath[i]
        if (file.exists(opath)) {
            switch(actionWeb,
                   copy = file.copy(opath, cpath, ...),
                   move = file.rename(opath, cpath)
                   )
        }
    }

    # create metadata
    if (length(metadata) != 0) {
        tbl <- cbind(rid=paste0("BFC",seq_len(nrow(DF))),
                     DF[,metadata,drop=FALSE])
        bfcmeta(bfc, name=metadataName) <- tbl
    }

    bfc
})
--------------------------------------------------------------------------------
/R/makeCachedActiveBinding.R:
--------------------------------------------------------------------------------
#' makeCachedActiveBinding
#'
#' Like \code{\link{makeActiveBinding}}, but the value of the active
#' binding is evaluated only once and then "remembered".
#'
#' @param sym See \code{\link{makeActiveBinding}} in the \pkg{base}
#'     package.
#' @param fun See \code{\link{makeActiveBinding}} in the \pkg{base}
#'     package.
#' @param env See \code{\link{makeActiveBinding}} in the \pkg{base}
#'     package.
#' @param verbose Set to TRUE to see caching in action (useful for
#'     troubleshooting).
#'
#' @name makeCachedActiveBinding
#' @aliases makeCachedActiveBinding
#' @export makeCachedActiveBinding
#'
#' @examples
#' makeCachedActiveBinding("x", function() runif(1), verbose=TRUE)
#' x
#' x
makeCachedActiveBinding <- function(sym, fun, env=.GlobalEnv, verbose=FALSE)
{
    caching_env <- new.env(parent=emptyenv())
    fun2 <- function(value) {
        if (!missing(value))
            stop("assignment to active binding '", sym, "' is not allowed")
        val <- try(get(sym, envir=caching_env, inherits=FALSE), silent=TRUE)
        if (inherits(val, "try-error")) {
            if (verbose)
                cat("evaluating and caching value ",
                    "for active binding '", sym, "' ... ", sep="")
            val <- fun()
            assign(sym, val, envir=caching_env)
            if (verbose)
                cat("OK\n")
        } else {
            if (verbose)
                cat("using cached value for active binding '", sym, "'\n",
                    sep="")
        }
        val
    }
    makeActiveBinding(sym, fun2, env=env)
}

--------------------------------------------------------------------------------
/R/sql.R:
--------------------------------------------------------------------------------
#' @import RSQLite
#' @importFrom DBI dbExecute dbSendStatement
#' @import dbplyr
#' @importFrom dplyr %>% tbl select collect summarize filter n left_join
#' @importFrom curl curl_escape
#' @importFrom filelock lock unlock

.formatID <- . %>% collect(n = Inf) %>% `[[`("rid")

lock.env <- new.env()
lock.env$status <- NA

.lock2 <- function(dbfile, exclusive) {
    if (is.na(lock.env$status)) {
        lock.env$status <- exclusive
        lock(.sql_lock_path(dbfile), exclusive = exclusive)
    } else if (lock.env$status || !exclusive) {
        # Exclusive lock held by a caller is compatible
        # with a subsequent request for a shared lock;
        # we're not escalating privileges here.
        NULL
    } else {
        stop("requested an exclusive lock when caller only holds a shared lock")
    }
}

.unlock2 <- function(loc) {
    if (!is.null(loc)) {
        lock.env$status <- NA
        unlock(loc)
    }
}

.sql_file <-
    function(bfc, file)
{
    file.path(bfccache(bfc), file)
}

.sql_dbfile <-
    function(bfc)
{
    .sql_file(bfc, .CACHE_FILE)
}

.sql_cmd <-
    function(cmd_name, add=FALSE, ...)
{
    sql_cmd_file <-
        system.file(package="BiocFileCache", "schema", "BiocFileCache.sql")
    sql_cmds <- readLines(sql_cmd_file)
    grps <- cumsum(grepl("^--", sql_cmds))
    cmds <- split(sql_cmds, grps)
    names <- vapply(cmds, "[[", character(1), 1)
    cmds <- paste(cmds[[which(names == cmd_name)]], collapse="\n")
    if (add)
        cmds <- sprintf(cmds, ...)
58 | cmds 59 | } 60 | 61 | .sql_lock_path <- 62 | function(dbfile) 63 | { 64 | paste0(dbfile, '.LOCK') 65 | } 66 | 67 | .sql_connect_RO <- 68 | function(dbfile) 69 | { 70 | ## See notes in AnnotationDbi::dbFileConnect 71 | ## did not want to import AnnotationDbi in BFC because it is large 72 | ## overkill for this function 73 | if (!file.exists(dbfile)) 74 | stop("DB file '", dbfile, "' not found") 75 | 76 | loc <- .lock2(dbfile, exclusive=FALSE) 77 | 78 | if (.Platform$OS.type == "unix") { 79 | con <- dbConnect(SQLite(), dbname=dbfile, cache_size=64000L, 80 | synchronous="off", flags=SQLITE_RO, vfs="unix-none") 81 | } else { 82 | ## Use default 'vfs' on Windows. 83 | con <- dbConnect(SQLite(), dbname=dbfile, cache_size=64000L, 84 | synchronous="off", flags=SQLITE_RO) 85 | } 86 | 87 | list(lock=loc, con=con) 88 | } 89 | 90 | .sql_connect_RW <- 91 | function(dbfile) 92 | { 93 | ## We also need a RW function to allow writing to the cache 94 | 95 | loc <- .lock2(dbfile, exclusive=TRUE) 96 | 97 | if (.Platform$OS.type == "unix") { 98 | con <- dbConnect(SQLite(), dbname=dbfile, cache_size=64000L, 99 | synchronous="off", vfs="unix-none") 100 | } else { 101 | ## Use default 'vfs' on Windows. 
102 | con <- dbConnect(SQLite(), dbname=dbfile, cache_size=64000L, 103 | synchronous="off") 104 | } 105 | 106 | list(lock=loc, con=con) 107 | } 108 | 109 | .sql_disconnect <- 110 | function(info) 111 | { 112 | dbDisconnect(info$con) 113 | .unlock2(info$lock) 114 | } 115 | 116 | .sql_schema_version <- 117 | function(bfc) 118 | { 119 | tryCatch({ 120 | info <- .sql_connect_RO(.sql_dbfile(bfc)) 121 | con <- info$con 122 | src <- src_dbi(con) 123 | tbl <- tbl(src, "metadata") %>% collect(n = Inf) 124 | }, finally={.sql_disconnect(info)}) 125 | tbl$value[tbl$key=="schema_version"] 126 | } 127 | 128 | ## R / RSQLite, DBI interface 129 | 130 | .sql_db_execute <- 131 | function(bfc, sql, ..., con) 132 | { 133 | param <- data.frame(..., stringsAsFactors = FALSE) 134 | if (nrow(param) == 0L) 135 | param <- NULL 136 | 137 | if (missing(con)) { 138 | info <- .sql_connect_RW(.sql_dbfile(bfc)) 139 | con <- info$con 140 | on.exit(.sql_disconnect(info)) 141 | } 142 | dbExecute(con, sql, params = param) 143 | } 144 | 145 | .sql_db_get_query <- 146 | function(bfc, sql, ..., con) 147 | { 148 | param <- data.frame(..., stringsAsFactors = FALSE) 149 | if (nrow(param) == 0L) 150 | param <- NULL 151 | 152 | if (missing(con)) { 153 | info <- .sql_connect_RO(.sql_dbfile(bfc)) 154 | con <- info$con 155 | on.exit(.sql_disconnect(info)) 156 | } 157 | dbGetQuery(con, sql, param) 158 | } 159 | 160 | ## BiocFileCache / RSQLite interface 161 | 162 | .sql_create_db <- 163 | function(bfc) 164 | { 165 | fl <- .sql_dbfile(bfc) 166 | if (!file.exists(fl)) { 167 | sql <- strsplit(.sql_cmd("-- CREATE_DB"), ";")[[1]] 168 | tryCatch({ 169 | info <- .sql_connect_RW(.sql_dbfile(bfc)) 170 | con <- info$con 171 | dbExecute(con, sql[[1]]) 172 | ## update metadata table 173 | .sql_db_execute(bfc, sql[[2]], con=con) 174 | package_version <- as.character(packageVersion("BiocFileCache")) 175 | .sql_db_execute( 176 | bfc, sql[[3]], 177 | key = c('schema_version', 'package_version'), 178 | value = 
c(.CURRENT_SCHEMA_VERSION, package_version), 179 | con=con) 180 | ## create new resource table 181 | .sql_db_execute(bfc, sql[[4]], con=con) 182 | dbExecute(con, sql[[5]]) 183 | }, finally={.sql_disconnect(info)}) 184 | } 185 | .sql_validate_version(bfc) 186 | fl 187 | } 188 | 189 | .sql_select_query <- 190 | function(bfc, where, ...) 191 | { 192 | sql <- .sql_cmd("-- SELECT_QUERY") 193 | cmd <- sprintf(sql, where) 194 | res <- .sql_db_get_query(bfc, cmd, ...) 195 | class(res) <- c("tbl_bfc", class(res)) 196 | res 197 | } 198 | 199 | .sql_add_resource <- 200 | function(bfc, rname, rtype, fpath, ext = NA_character_, fname = "unique") 201 | { 202 | # The connection attempt handles locking if another process is adding a 203 | # resource at the same time; by trying to connect first, we don't have to 204 | # worry about whether the choice of temporary file name is thread-safe. 205 | info <- .sql_connect_RW(.sql_dbfile(bfc)) 206 | on.exit(if (!is.null(info)) { .sql_disconnect(info) }) 207 | 208 | rpath <- rep(path.expand(tempfile("", bfccache(bfc))), length(fpath)) 209 | rtype <- unname(rtype) 210 | dx <- rtype == "relative" | rtype == "web" 211 | rpath[dx] <- basename(rpath[dx]) 212 | 213 | fpath[is.na(fpath)] <- rpath[is.na(fpath)] 214 | ext[is.na(ext)] <- "" 215 | bfname <- basename(fpath) 216 | bfname <- curl_escape(bfname) 217 | if (fname=="unique") { 218 | rpath <- sprintf("%s_%s%s", rpath, bfname, ext) 219 | } else { 220 | rpath <- sprintf("%s%s",bfname, ext) 221 | } 222 | sql <- strsplit(.sql_cmd("-- INSERT"), ";")[[1]] 223 | con <- info$con 224 | dbExecute(con, sql[[1]]) 225 | original_rid <- .sql_db_get_query(bfc, sql[[2]], con=con)[["rid"]] 226 | .sql_db_execute( 227 | bfc, sql[[3]], 228 | rname = rname, rtype = rtype, fpath = fpath, rpath = rpath, 229 | last_modified_time = as.Date(NA_character_), etag = NA_character_, 230 | expires = NA_character_, con=con 231 | ) 232 | .sql_db_execute(bfc, sql[[4]], con=con) 233 | rid <- .sql_db_get_query(bfc, sql[[2]], 
con=con)[["rid"]] 234 | dbExecute(con, sql[[5]]) 235 | 236 | # Free the file, as .sql_get_rpath() reacquires the lock internally. 237 | .sql_disconnect(info) 238 | info <- NULL 239 | 240 | .sql_get_rpath(bfc, setdiff(rid, original_rid)) 241 | } 242 | 243 | .sql_remove_resource <- 244 | function(bfc, rid) 245 | { 246 | sql <- .sql_cmd("-- REMOVE") 247 | cmd <- sprintf(sql, paste0("'", rid, "'", collapse = ",")) 248 | .sql_db_execute(bfc, cmd) 249 | } 250 | 251 | .sql_get_resource_table <- 252 | function(bfc, rids) 253 | { 254 | tryCatch({ 255 | info <- .sql_connect_RO(.sql_dbfile(bfc)) 256 | con <- info$con 257 | src <- src_dbi(con) 258 | tbl <- tbl(src, "resource") 259 | 260 | if (missing(rids)) { 261 | } else if (length(rids) == 0) { 262 | tbl <- tbl %>% dplyr::filter(rid == NA_character_) 263 | } else if (length(rids) == 1) { 264 | tbl <- tbl %>% dplyr::filter(rid == rids) 265 | } else { 266 | tbl <- tbl %>% dplyr::filter(rid %in% rids) 267 | } 268 | 269 | ## join metadata 270 | meta <- setdiff(dbListTables(con), .RESERVED$TABLES) 271 | for (m in meta) 272 | tbl <- left_join(tbl, tbl(src, m), by="rid") 273 | 274 | tbl <- tbl %>% collect 275 | }, finally={.sql_disconnect(info)}) 276 | class(tbl) <- c("tbl_bfc", class(tbl)) 277 | tbl %>% dplyr::select(-id) 278 | } 279 | 280 | .sql_get_nrows <- 281 | function(bfc) 282 | { 283 | summarize(bfc, n=n()) %>% collect %>% `[[`("n") 284 | } 285 | 286 | .sql_get_field <- 287 | function(bfc, id, field) 288 | { 289 | stopifnot(all(id %in% .get_all_rids(bfc))) 290 | sql <- .sql_cmd("-- SELECT_COLUMN") 291 | cmd <- sprintf(sql, field, paste0("'", id, "'", collapse = ",")) 292 | res <- .sql_db_get_query(bfc, cmd) 293 | setNames(res[[field]], res[["rid"]]) 294 | } 295 | 296 | .sql_get_rname <- 297 | function(bfc, rid) 298 | { 299 | .sql_get_field(bfc, rid, "rname") 300 | } 301 | 302 | .sql_get_rtype <- 303 | function(bfc, rid) 304 | { 305 | .sql_get_field(bfc, rid, "rtype") 306 | } 307 | 308 | .sql_get_fpath <- 309 | function(bfc, 
rid) 310 | { 311 | .sql_get_field(bfc, rid, "fpath") 312 | } 313 | 314 | .sql_get_rpath <- 315 | function(bfc, rid) 316 | { 317 | rtype <- .sql_get_rtype(bfc, rid) 318 | rpath <- .sql_get_field(bfc, rid, "rpath") 319 | idx <- rtype %in% c("relative", "web") 320 | rpath[idx] <- file.path(bfccache(bfc), rpath)[idx] 321 | rpath 322 | } 323 | 324 | .sql_set_rpath <- 325 | function(bfc, rid, rpath) 326 | { 327 | sql <- .sql_cmd("-- UPDATE_PATH") 328 | .sql_db_execute(bfc, sql, rid = rid, rpath = rpath) 329 | } 330 | 331 | .sql_set_time <- 332 | function(bfc, rid) 333 | { 334 | sql <- .sql_cmd("-- UPDATE_TIME") 335 | cmd <- sprintf(sql, paste0("'", rid, "'", collapse = ",")) 336 | .sql_db_execute(bfc, cmd) 337 | } 338 | 339 | .sql_set_rname <- 340 | function(bfc, rid, rname) 341 | 342 | { 343 | sql <- .sql_cmd("-- UPDATE_RNAME") 344 | .sql_db_execute(bfc, sql, rid = rid, rname = rname) 345 | } 346 | 347 | .sql_set_rtype <- 348 | function(bfc, rid, rtype) 349 | { 350 | sql <- .sql_cmd("-- UPDATE_RTYPE") 351 | .sql_db_execute(bfc, sql, rid = rid, rtype = rtype) 352 | } 353 | 354 | .sql_clean_cache <- 355 | function(bfc, days) 356 | { 357 | mytbl <- .sql_get_resource_table(bfc) %>% 358 | dplyr::select(rid, access_time) %>% collect(Inf) 359 | accessDate <- as.Date(as.character(mytbl$access_time)) 360 | diffTime <- Sys.Date() - accessDate 361 | mytbl[diffTime > days, 1] %>% .formatID 362 | } 363 | 364 | .get_all_rids <- 365 | function(bfc) 366 | { 367 | sql <- .sql_cmd("-- SELECT_IDS") 368 | .sql_db_get_query(bfc, sql)[,1] 369 | } 370 | 371 | .get_all_web_rids <- 372 | function(bfc) 373 | { 374 | sql <- .sql_cmd("-- SELECT_WEB") 375 | .sql_db_get_query(bfc, sql)[,1] 376 | } 377 | 378 | .sql_get_last_modified <- 379 | function(bfc, rid) 380 | { 381 | .sql_get_field(bfc, rid, "last_modified_time") 382 | } 383 | 384 | .sql_set_last_modified <- 385 | function(bfc, rid, last_modified_time) 386 | { 387 | sql <- .sql_cmd("-- UPDATE_MODIFIED") 388 | .sql_db_execute( 389 | bfc, sql, 
rid = rid, last_modified_time = last_modified_time 390 | ) 391 | } 392 | 393 | .sql_get_etag <- 394 | function(bfc, rid) 395 | { 396 | .sql_get_field(bfc, rid, "etag") 397 | } 398 | 399 | .sql_set_etag <- 400 | function(bfc, rid, etag) 401 | { 402 | sql <- .sql_cmd("-- UPDATE_ETAG") 403 | .sql_db_execute(bfc, sql, rid = rid, etag = etag) 404 | } 405 | .sql_get_expires <- 406 | function(bfc, rid) 407 | { 408 | .sql_get_field(bfc, rid, "expires") 409 | } 410 | 411 | .sql_set_expires <- 412 | function(bfc, rid, expires) 413 | { 414 | sql <- .sql_cmd("-- UPDATE_EXPIRES") 415 | .sql_db_execute(bfc, sql, rid = rid, expires = expires) 416 | } 417 | 418 | .sql_set_fpath <- 419 | function(bfc, rid, fpath) 420 | { 421 | sql <- .sql_cmd("-- UPDATE_FPATH") 422 | .sql_db_execute(bfc, sql, rid = rid, fpath = fpath) 423 | } 424 | 425 | .get_rid_filenotfound <- 426 | function(bfc) 427 | { 428 | allpaths <- bfcrpath(bfc) 429 | names(allpaths)[!file.exists(allpaths)] 430 | } 431 | 432 | .get_tbl_rid <- 433 | function(tbl) 434 | { 435 | tbl %>% .formatID 436 | } 437 | 438 | .get_all_colnames <- 439 | function(bfc) 440 | { 441 | colnames(.sql_get_resource_table(bfc)) 442 | } 443 | 444 | .get_nonrelative_ids <- 445 | function(bfc) 446 | { 447 | rpaths <- .sql_get_rpath(bfc, bfcrid(bfc)) 448 | res <- startsWith(rpaths, bfccache(bfc)) 449 | names(rpaths)[!res] 450 | } 451 | 452 | ## 453 | ## .sql_meta_* 454 | ## 455 | 456 | .sql_meta_gets <- 457 | function(bfc, name, value, ...) 458 | { 459 | tryCatch({ 460 | info <- .sql_connect_RW(.sql_dbfile(bfc)) 461 | con <- info$con 462 | dbWriteTable(con, name, value, ...) 463 | }, finally={.sql_disconnect(info)}) 464 | } 465 | 466 | .sql_meta_remove <- 467 | function(bfc, name, ...) 468 | { 469 | tryCatch({ 470 | info <- .sql_connect_RW(.sql_dbfile(bfc)) 471 | con <- info$con 472 | if (dbExistsTable(con, name)) 473 | dbRemoveTable(con, name, ...) 
474 | }, finally={.sql_disconnect(info)}) 475 | } 476 | 477 | .sql_meta <- 478 | function(bfc, name, ...) 479 | { 480 | tryCatch({ 481 | info <- .sql_connect_RO(.sql_dbfile(bfc)) 482 | con <- info$con 483 | if (!dbExistsTable(con, name)) 484 | stop("'", name, "' not found in database") 485 | dbReadTable(con, name, ...) 486 | }, finally={.sql_disconnect(info)}) 487 | } 488 | 489 | .sql_meta_list <- 490 | function(bfc) 491 | { 492 | tryCatch({ 493 | info <- .sql_connect_RO(.sql_dbfile(bfc)) 494 | con <- info$con 495 | res <- dbListTables(con) 496 | setdiff(res, .RESERVED$TABLES) 497 | }, finally={.sql_disconnect(info)}) 498 | } 499 | 500 | .sql_filter_metadata <- 501 | function(bfc, name, verbose) 502 | { 503 | df <- bfcmeta(bfc, name) 504 | rids <- bfcrid(bfc) 505 | check <- as.character(df$rid) %in% rids 506 | if (all(!check)) { 507 | bfcmetaremove(bfc, name) 508 | vl <- FALSE 509 | } else if (any(!check)) { 510 | df <- df[check,] 511 | bfcmeta(bfc, name, overwrite=TRUE) <- df 512 | vl <- FALSE 513 | } else { 514 | vl <- TRUE 515 | } 516 | vl 517 | } 518 | -------------------------------------------------------------------------------- /R/sql_migration.R: -------------------------------------------------------------------------------- 1 | .sql_validate_version <- 2 | function(bfc) 3 | { 4 | schema_version <- .sql_schema_version(bfc) 5 | 6 | if (!schema_version %in% .SUPPORTED_SCHEMA_VERSIONS) 7 | stop( 8 | "unsupported schema version ", 9 | "\n sqlite file: ", .sql_dbfile(bfc), 10 | "\n file schema version: '", schema_version, "'", 11 | "\n supported version(s): ", 12 | paste(sQuote(.SUPPORTED_SCHEMA_VERSIONS), collapse=" ") 13 | ) 14 | 15 | if (schema_version != .CURRENT_SCHEMA_VERSION) 16 | .sql_migration(bfc) 17 | 18 | schema_version 19 | } 20 | 21 | .sql_migration_update_schema_version <- 22 | function(bfc, schema_version) 23 | { 24 | ## update metadata table for package version and schema 25 | sql <- .sql_cmd("-- MIGRATION_UPDATE_METADATA") 26 | 
.sql_db_execute(bfc, sql, key = "schema_version", value = schema_version) 27 | .sql_db_execute( 28 | bfc, sql, key = "package_version", 29 | value = as.character(packageVersion("BiocFileCache")) 30 | ) 31 | 32 | schema_version 33 | } 34 | 35 | .sql_migration <- 36 | function(bfc) 37 | { 38 | schema_version <- .sql_schema_version(bfc) 39 | 40 | if (.biocfilecache_flags$get_update_asked()) 41 | return(schema_version) 42 | 43 | ## This migration is necessary for several changes from the old schema: 44 | ## web resource rpaths are now stored relative to the cache, since web 45 | ## resources are only ever downloaded into the cache location; and the 46 | ## default last_modified_time is now NA instead of Sys.Date for 47 | ## local/relative/non-downloaded web resources, since resources added 48 | ## without downloading cannot meaningfully default to Sys.Date. 49 | 50 | message("Current schema_version ", schema_version, " is out-of-date.\n\n", 51 | "Current version will NOT work as expected.\n", 52 | "Recommend updating to the latest schema_version.\n", 53 | "Notable Changes:\n", 54 | " 1. Web Resource 'rpath' stored as relative path\n", 55 | " 2. Default last_modified time for\n", 56 | " local/relative/nondownloaded/last_modified_notfound\n", 57 | " resources is NA not Sys.Date\n", 58 | " 3. Added etag to schema\n", 59 | " 4. Added expires to schema\n") 60 | doit <- .util_ask( 61 | "Update current BiocFileCache to be consistent with\n", 62 | " schema_version: ", .CURRENT_SCHEMA_VERSION, "\n", 63 | " This will be a permanent change but only necessary once.\n", 64 | " Continue?"
65 | ) 66 | .biocfilecache_flags$set_update_asked() 67 | 68 | if (!doit) { 69 | warning("BiocFileCache schema not updated\n", 70 | " bfccache(): ", bfccache(bfc)) 71 | return() 72 | } 73 | 74 | if (schema_version == "0.99.1") 75 | schema_version <- .sql_migration_0991_to_0992(bfc) 76 | 77 | if (schema_version == "0.99.2") 78 | schema_version <- .sql_migration_0992_to_0993(bfc) 79 | 80 | if (schema_version == "0.99.3") 81 | schema_version <- .sql_migration_0993_to_0994(bfc) 82 | 83 | schema_version 84 | } 85 | 86 | .sql_migration_0993_to_0994 <- 87 | function(bfc) 88 | { 89 | message("applying migration from 0.99.3 to 0.99.4") 90 | sql <- .sql_cmd("-- MIGRATION_0_99_3_to_0_99_4") 91 | .sql_db_execute(bfc, sql) 92 | .sql_migration_update_schema_version(bfc, "0.99.4") 93 | } 94 | 95 | .sql_migration_0992_to_0993 <- 96 | function(bfc) 97 | { 98 | message("applying migration from 0.99.2 to 0.99.3") 99 | sql <- .sql_cmd("-- MIGRATION_0_99_2_to_0_99_3") 100 | .sql_db_execute(bfc, sql) 101 | .sql_migration_update_schema_version(bfc, "0.99.3") 102 | } 103 | 104 | .sql_migration_0991_to_0992 <- 105 | function(bfc) 106 | { 107 | message("applying migration from 0.99.1 to 0.99.2") 108 | ## truncate rpaths of all web resources 109 | wid <- .get_all_web_rids(bfc) 110 | badpaths <- bfcrpath(bfc, rids=wid) 111 | pattern <- paste0(bfccache(bfc),"/", bfccache(bfc),"/") 112 | check <- startsWith(badpaths, pattern) 113 | if (any(!check)) { 114 | ids <- wid[!check] 115 | warning("Some web resources do not currently have rpath in cache.\n", 116 | " Bad paths: ", paste0("'", ids, "'", collapse=" "), "\n", 117 | " These resources will now be considered rtype='local'") 118 | .sql_set_rtype(bfc, ids, "local") 119 | } 120 | 121 | wid <- wid[check] 122 | badpaths <- badpaths[check] 123 | newpaths <- gsub(badpaths, pattern=pattern, replacement="") 124 | message("Updating rpath for the following web resources:\n", 125 | " ", paste0("'", wid, "'", collapse=" ")) 126 | for(i in seq_along(wid)){ 
127 | .sql_set_rpath(bfc, wid[i], newpaths[i]) 128 | } 129 | 130 | ## change local/relative lmt to NA 131 | nonweb <- setdiff(.get_all_rids(bfc), wid) 132 | if (length(nonweb) != 0) { 133 | message("Updating last modified time for the following\n", 134 | " non web resources:\n", 135 | " ", paste0("'", nonweb, "'", collapse=" ")) 136 | .sql_set_last_modified(bfc, nonweb, NA_character_) 137 | } 138 | 139 | ## check last_modified of all web 140 | for (i in seq_along(wid)) { 141 | fpath <- .sql_get_fpath(bfc, wid[i]) 142 | check_time <- .httr_get_cache_info(fpath)[["modified"]] 143 | if (is.na(check_time)) 144 | .sql_set_last_modified(bfc, wid[i], NA_character_) 145 | } 146 | 147 | .sql_migration_update_schema_version(bfc, "0.99.2") 148 | } 149 | -------------------------------------------------------------------------------- /R/utilities.R: -------------------------------------------------------------------------------- 1 | .CACHE_FILE <- "BiocFileCache.sqlite" 2 | .CACHE_FILE_LOCK <- "BiocFileCache.sqlite.LOCK" 3 | 4 | .CURRENT_SCHEMA_VERSION <- "0.99.4" 5 | 6 | .SUPPORTED_SCHEMA_VERSIONS <- c("0.99.1", "0.99.2", "0.99.3", "0.99.4") 7 | 8 | .RESERVED <- list( # dynamically, in .onLoad? 
9 | TABLES = c("metadata", "resource", "sqlite_sequence"), 10 | COLUMNS = c( 11 | "id", "rname", "create_time", "access_time", "rpath", "rtype", 12 | "fpath", "last_modified_time", "etag", "expires" 13 | ) 14 | ) 15 | 16 | .biocfilecache_flags <- local({ 17 | update_asked <- FALSE 18 | create_asked <- FALSE 19 | ## used for unit tests -- default response to '.util_ask' 20 | ask_response <- NULL 21 | list(get_update_asked = function() { 22 | update_asked 23 | }, set_update_asked = function() { 24 | update_asked <<- TRUE 25 | }, get_create_asked = function() { 26 | create_asked 27 | }, set_create_asked = function() { 28 | create_asked <<- TRUE 29 | }, get_ask_response = function() { 30 | ask_response 31 | }, set_ask_response = function(value) { 32 | #oresponse <- ask_response 33 | ask_response <<- value 34 | #invisible(oresponse) 35 | }) 36 | }) 37 | 38 | .util_standardize_rtype <- 39 | function(rtype, fpath, action) 40 | { 41 | stopifnot(length(rtype) == length(fpath), 42 | length(fpath) == length(action)) 43 | 44 | vapply(seq_along(fpath), 45 | function(i, rtype, fpath, action){ 46 | .util_standardize_rtype_helper(rtype[i], fpath[i], action[i]) 47 | }, 48 | character(1), USE.NAMES=FALSE, rtype=rtype, fpath=fpath, 49 | action=action) 50 | } 51 | 52 | .util_standardize_rtype_helper<- 53 | function(rtype, fpath, action) 54 | { 55 | if (identical(unname(rtype), "auto")) { 56 | test <- startsWith(fpath, "http") || startsWith(fpath, "ftp") 57 | if (test) 58 | rtype <- "web" 59 | else if (action == "asis") 60 | rtype <- "local" 61 | else 62 | rtype <- "relative" 63 | } else if (rtype != "local" && action == "asis") { 64 | warning( 65 | "action = 'asis' requires rtype = 'local'; ", 66 | "setting rtype = 'local'" 67 | ) 68 | rtype <- "local" 69 | } 70 | 71 | rtype 72 | } 73 | 74 | 75 | .util_ask <- 76 | function(..., .interactive = interactive()) 77 | { 78 | if (!.interactive) 79 | return(FALSE) 80 | txt <- paste0(..., " (yes/no): ") 81 | if 
(!is.null(.biocfilecache_flags$get_ask_response())) { 82 | ## unit tests only 83 | message(txt) 84 | return(.biocfilecache_flags$get_ask_response()) 85 | } 86 | repeat { 87 | response <- substr(tolower(readline(txt)), 1, 1) 88 | doit <- switch(response, y = TRUE, n = FALSE, NA) 89 | if (!is.na(doit)) 90 | break 91 | } 92 | doit 93 | } 94 | 95 | .util_unlink <- 96 | function(rpaths, ...) 97 | { 98 | gc() 99 | status <- unlink(rpaths, ..., force=TRUE) == 0L 100 | if (!all(status)) 101 | warning( 102 | "failed to unlink cache resource(s):", 103 | "\n ", paste(sQuote(rpaths[!status]), collapse="\n ") 104 | ) 105 | gc() 106 | status 107 | } 108 | 109 | .util_set_cache_info <- 110 | function(bfc, rid, fpath = .sql_get_fpath(bfc, rid), proxy, config) 111 | { 112 | if (length(rid) == 0L) 113 | return(bfc) 114 | 115 | cache_info <- .httr_get_cache_info(fpath, proxy, config) 116 | .sql_set_last_modified(bfc, rid, cache_info[["modified"]]) 117 | .sql_set_etag(bfc, rid, cache_info[["etag"]]) 118 | .sql_set_expires(bfc, rid, cache_info[["expires"]]) 119 | bfc 120 | } 121 | 122 | .util_download <- 123 | function(bfc, rid, proxy, progress, config, call, ...) 124 | { 125 | rpath <- .sql_get_rpath(bfc, rid) 126 | fpath <- .sql_get_fpath(bfc, rid) 127 | status <- Map( 128 | .httr_download, fpath, rpath, 129 | MoreArgs = list(proxy = proxy, progress = progress, config = config, ...) 130 | ) 131 | ok <- vapply(status, isTRUE, logical(1)) 132 | if (!all(ok)) { 133 | bfcremove(bfc, rid[!ok]) 134 | warning( 135 | call, " failed; resource removed", 136 | "\n rid: ", paste(rid[!ok], collapse = " "), 137 | "\n fpath: ", paste(sQuote(fpath[!ok]), collapse = "\n "), 138 | "\n reason: download failed", 139 | call.
= FALSE 140 | ) 141 | } 142 | .util_set_cache_info(bfc, rid[ok], proxy=proxy, config=config) 143 | 144 | if (!all(ok)) 145 | stop(call, " failed; see warnings()") 146 | } 147 | 148 | .util_download_and_rename <- 149 | function(bfc, rid, proxy, progress, config, call, fpath = .sql_get_fpath(bfc, rid), 150 | FUN, ...) 151 | { 152 | rpath <- .sql_get_rpath(bfc, rid) 153 | force(fpath) 154 | 155 | # The connection is not actually necessary - but we just use it to 156 | # handle thread-safe locking, specifically to avoid race conditions 157 | # from multiple threads choosing the same tempfile name during download. 158 | info <- .sql_connect_RW(.sql_dbfile(bfc)) 159 | on.exit(.sql_disconnect(info)) 160 | 161 | if (missing(FUN)) 162 | FUN <- file.rename 163 | 164 | status <- Map(function(rpath, fpath) { 165 | temppath <- tempfile(tmpdir=bfccache(bfc)) 166 | 167 | status <- .httr_download(fpath, temppath, proxy, progress, config, ...) 168 | if (!status) 169 | return("download failed") 170 | 171 | status <- tryCatch({ 172 | st <- FUN(temppath, rpath) 173 | if (file.exists(temppath)){ file.remove(temppath) } 174 | st 175 | }, error = function(err){ 176 | warning("FUN() failed", 177 | "\n reason: ", conditionMessage(err), 178 | call.=FALSE) 179 | FALSE 180 | }) 181 | 182 | status 183 | }, rpath, fpath) 184 | 185 | ok <- vapply(status, isTRUE, logical(1)) 186 | if (!all(ok)) 187 | warning( 188 | call, " failed", 189 | "\n rid: ", paste(rid[!ok], collapse=" "), 190 | "\n file: ", paste(sQuote(fpath)[!ok], collapse = "\n "), 191 | "\n reason: ", paste(unique(unlist(status[!ok])), collapse = ", "), 192 | call. 
= FALSE 193 | ) 194 | 195 | .util_set_cache_info(bfc, rid[ok], fpath[ok], proxy=proxy, config=config) 196 | 197 | if (!all(ok)) 198 | stop("download failed; see warnings()", call.=FALSE) 199 | } 200 | 201 | .util_export_file <- 202 | function(bfc, rid, dir) 203 | { 204 | 205 | rtype <- .sql_get_rtype(bfc, rid) 206 | rpath <- .sql_get_rpath(bfc, rid) 207 | loc <- file.exists(rpath) 208 | if (!loc) { 209 | if (identical(unname(rtype), "web")) { 210 | vl <- "web" 211 | } else { 212 | vl <- NA_character_ 213 | } 214 | } else { 215 | if (identical(unname(rtype), "local")) { 216 | vl <- "local" 217 | } else { 218 | newpath <- file.path(dir, basename(rpath)) 219 | file.copy(rpath, newpath) 220 | vl <- "relative" 221 | } 222 | } 223 | vl 224 | } 225 | -------------------------------------------------------------------------------- /R/zzz.R: -------------------------------------------------------------------------------- 1 | globalVariables(c("rid", "id", "control", "access_time")) 2 | 3 | .onLoad <- function(libname, pkgname, ...) 
{ 4 | ## CACHE precedence: Sys.getenv('BFC_CACHE'), then getOption('BFC_CACHE'), then the default 5 | if (is.null(getBFCOption("CACHE"))) { 6 | path <- tools::R_user_dir("BiocFileCache", which="cache") 7 | opt <- getOption("BFC_CACHE", path) 8 | opt <- Sys.getenv("BFC_CACHE", opt) 9 | setBFCOption("CACHE", opt) 10 | } 11 | } 12 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Bioconductor/BiocFileCache/e88fb8e4beb200304c4f72f2e07cbe8da6ad49a8/README.md -------------------------------------------------------------------------------- /TODO.md: -------------------------------------------------------------------------------- 1 | ######################### 2 | # 3 | # quick run 4 | # 5 | ######################### 6 | 7 | library(devtools) 8 | library(RSQLite) 9 | library(DBI) 10 | library(dplyr) 11 | library(httr) 12 | library(testthat) 13 | document() 14 | install() 15 | library(BiocFileCache) 16 | example("BiocFileCache-class") 17 | 18 | ## SCHEMA help 19 | library(devtools) 20 | library(RSQLite) 21 | library(dplyr) 22 | library(httr) 23 | library(DBI) 24 | install() 25 | library(BiocFileCache) 26 | source("R/utilities.R") 27 | source("R/sql.R") 28 | 29 | bfc = BiocFileCache() 30 | 31 | bfc0 <- BiocFileCache(tempfile()) 32 | path <- bfcnew(bfc0, "NewResource") 33 | 34 | 35 | sqlfile <- .sql_dbfile(bfc0) 36 | con <- dbConnect(SQLite(), sqlfile) 37 | dbReadTable(con, "resource") 38 | 39 | ## meta data testing 40 | x = BiocFileCache() 41 | 42 | bfcinfo() 43 | 44 | bfclistmeta() 45 | 46 | meta = as.data.frame(list(rid=c("BFC1", "BFC3"), info=c("something", "to add"), 47 | num=c(1,3))) 48 | meta2 = as.data.frame(list(rid=c("BFC5", "BFC6"), new=c("blah", "foo"), info=c(4,6))) 49 | 50 | bfcaddmeta(meta=meta) 51 | bfcaddmeta(meta=meta2) 52 | bfcaddmeta(meta=meta2, name="secondname") 53 | 54 | bfclistmeta() 55 | 56 | bfcinfo() 57 | 
bfcinfo(rids=c("BFC11", "BFC7")) 58 | bfcgetmeta(name="resourcedata") 59 | -------------------------------------------------------------------------------- /inst/schema/BiocFileCache.sql: -------------------------------------------------------------------------------- 1 | -- IF THE SCHEMA IS UPDATED, CHANGE THE SCHEMA VARIABLES IN utilities.R 2 | -- CREATE_DB 3 | BEGIN TRANSACTION; 4 | CREATE TABLE metadata ( 5 | key TEXT UNIQUE NOT NULL, 6 | value TEXT 7 | ); 8 | INSERT INTO metadata ( 9 | key, value 10 | ) VALUES (:key, :value); 11 | CREATE TABLE resource ( 12 | id INTEGER PRIMARY KEY AUTOINCREMENT, 13 | rid TEXT, 14 | rname TEXT, 15 | create_time DATETIME DEFAULT CURRENT_TIMESTAMP, 16 | access_time DATETIME DEFAULT CURRENT_TIMESTAMP, 17 | rpath TEXT, 18 | rtype TEXT, 19 | fpath TEXT, 20 | last_modified_time DATETIME DEFAULT NA, 21 | etag TEXT DEFAULT NA, 22 | expires DATETIME DEFAULT NA 23 | ); 24 | COMMIT; 25 | -- INSERT 26 | BEGIN TRANSACTION; 27 | SELECT rid FROM resource; 28 | INSERT INTO resource ( 29 | rname, rpath, rtype, fpath, last_modified_time, etag, expires 30 | ) VALUES ( 31 | :rname, :rpath, :rtype, :fpath, :last_modified_time, :etag, :expires 32 | ); 33 | UPDATE resource SET rid = "BFC" || id; 34 | COMMIT; 35 | -- REMOVE 36 | DELETE FROM resource WHERE rid IN (%s); 37 | -- SELECT_QUERY 38 | SELECT * FROM resource WHERE %s; 39 | -- SELECT_COLUMN 40 | SELECT rid, %s FROM resource WHERE rid IN (%s); 41 | -- SELECT_IDS 42 | SELECT rid FROM resource; 43 | -- SELECT_WEB 44 | SELECT rid FROM resource where rtype == 'web'; 45 | -- UPDATE_PATH 46 | UPDATE resource 47 | SET rpath = :rpath, access_time = CURRENT_TIMESTAMP 48 | WHERE rid = :rid; 49 | -- UPDATE_TIME 50 | UPDATE resource 51 | SET access_time = CURRENT_TIMESTAMP 52 | WHERE rid IN (%s); 53 | -- UPDATE_RNAME 54 | UPDATE resource 55 | SET rname = :rname, access_time = CURRENT_TIMESTAMP 56 | WHERE rid = :rid; 57 | -- UPDATE_RTYPE 58 | UPDATE resource 59 | SET rtype = :rtype, access_time = CURRENT_TIMESTAMP 60 | WHERE
rid = :rid; 61 | -- UPDATE_MODIFIED 62 | UPDATE resource 63 | SET last_modified_time = :last_modified_time, access_time = CURRENT_TIMESTAMP 64 | WHERE rid = :rid; 65 | -- UPDATE_FPATH 66 | UPDATE resource 67 | SET fpath = :fpath, access_time = CURRENT_TIMESTAMP 68 | WHERE rid = :rid; 69 | -- UPDATE_ETAG 70 | UPDATE resource 71 | SET etag = :etag, access_time = CURRENT_TIMESTAMP 72 | WHERE rid = :rid; 73 | -- UPDATE_EXPIRES 74 | UPDATE resource 75 | SET expires = :expires, access_time = CURRENT_TIMESTAMP 76 | WHERE rid = :rid; 77 | -- MIGRATION_0_99_1_to_0_99_2 78 | -- MIGRATION_0_99_2_to_0_99_3 79 | ALTER TABLE resource 80 | ADD etag TEXT; 81 | -- MIGRATION_0_99_3_to_0_99_4 82 | ALTER TABLE resource 83 | ADD expires DATETIME; 84 | -- MIGRATION_UPDATE_METADATA 85 | UPDATE metadata 86 | SET value = :value 87 | WHERE key = :key 88 | -------------------------------------------------------------------------------- /man/BFCOption.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/BFCOption.R 3 | \name{BFCOption} 4 | \alias{BFCOption} 5 | \alias{setBFCOption} 6 | \alias{getBFCOption} 7 | \title{BFCOption 8 | These functions help get and set an R variable CACHE that controls the 9 | default caching location.} 10 | \usage{ 11 | setBFCOption(arg, value) 12 | } 13 | \arguments{ 14 | \item{arg}{character(1) option to get or set} 15 | 16 | \item{value}{The value to be assigned to the designated option} 17 | } 18 | \value{ 19 | The value of the requested option, or the successfully set option (invisibly) 20 | } 21 | \description{ 22 | BFCOption 23 | These functions help get and set an R variable CACHE that controls the 24 | default caching location. 25 | } 26 | \details{ 27 | Currently the only supported option is CACHE. This controls the default 28 | location of the BiocFileCache caching directory.
By default the value is 29 | established by \code{tools::R_user_dir("BiocFileCache", which="cache")}. The 30 | default can be overridden by setting the \dQuote{BFC_CACHE} environment 31 | variable or R option \emph{before} the package is loaded; the environment 32 | variable takes precedence over the R option. 33 | } 34 | \examples{ 35 | origPath = getBFCOption('CACHE') 36 | \donttest{setBFCOption('CACHE', "~/.myBFC") } 37 | } 38 | \author{ 39 | Lori Shepherd 40 | } 41 | -------------------------------------------------------------------------------- /man/BiocFileCache-class.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/BiocFileCache-class.R 3 | \name{BiocFileCache-class} 4 | \alias{BiocFileCache-class} 5 | \alias{BiocFileCache} 6 | \alias{bfccache,BiocFileCacheBase-method} 7 | \alias{bfccache} 8 | \alias{bfccache,missing-method} 9 | \alias{length,BiocFileCacheBase-method} 10 | \alias{bfcrid} 11 | \alias{bfcrid,missing-method} 12 | \alias{bfcrid,BiocFileCacheReadOnly-method} 13 | \alias{bfcrid,BiocFileCache-method} 14 | \alias{[,BiocFileCache,character,missing-method} 15 | \alias{[,BiocFileCacheReadOnly,character,missing-method} 16 | \alias{[,BiocFileCache,missing,missing-method} 17 | \alias{[,BiocFileCacheReadOnly,missing,missing-method} 18 | \alias{[[,BiocFileCacheBase,character,missing-method} 19 | \alias{[[<-,BiocFileCache,character,missing,character-method} 20 | \alias{bfcnew,missing-method} 21 | \alias{bfcnew,BiocFileCache-method} 22 | \alias{bfcnew} 23 | \alias{bfcadd,missing-method} 24 | \alias{bfcadd,BiocFileCache-method} 25 | \alias{bfcadd} 26 | \alias{bfcinfo,missing-method} 27 | \alias{bfcinfo,BiocFileCacheBase-method} 28 | \alias{bfcinfo} 29 | \alias{bfcrid,tbl_bfc-method} 30 | \alias{bfcpath,missing-method} 31 | \alias{bfcpath,BiocFileCacheBase-method} 32 | \alias{bfcpath} 33 | \alias{bfcrpath,missing-method} 34 | 
\alias{bfcrpath,BiocFileCacheBase-method} 35 | \alias{bfcrpath} 36 | \alias{bfcupdate,missing-method} 37 | \alias{bfcupdate,BiocFileCache-method} 38 | \alias{bfcupdate} 39 | \alias{bfcmeta<-} 40 | \alias{bfcmeta<-,BiocFileCacheBase-method} 41 | \alias{bfcmetaremove,missing-method} 42 | \alias{bfcmetaremove,BiocFileCacheBase-method} 43 | \alias{bfcmetaremove} 44 | \alias{bfcmetalist,missing-method} 45 | \alias{bfcmetalist,BiocFileCacheBase-method} 46 | \alias{bfcmetalist} 47 | \alias{bfcmeta,missing-method} 48 | \alias{bfcmeta,BiocFileCacheBase-method} 49 | \alias{bfcmeta} 50 | \alias{bfcquerycols,missing-method} 51 | \alias{bfcquerycols,BiocFileCacheBase-method} 52 | \alias{bfcquerycols} 53 | \alias{bfcquery,missing-method} 54 | \alias{bfcquery,BiocFileCacheBase-method} 55 | \alias{bfcquery} 56 | \alias{bfccount,missing-method} 57 | \alias{bfccount,BiocFileCacheBase-method} 58 | \alias{bfccount} 59 | \alias{bfccount,tbl_bfc-method} 60 | \alias{bfcneedsupdate,missing-method} 61 | \alias{bfcneedsupdate,BiocFileCacheBase-method} 62 | \alias{bfcneedsupdate} 63 | \alias{bfcdownload,missing-method} 64 | \alias{bfcdownload,BiocFileCache-method} 65 | \alias{bfcdownload} 66 | \alias{bfcremove,missing-method} 67 | \alias{bfcremove,BiocFileCache-method} 68 | \alias{bfcremove} 69 | \alias{bfcsync,missing-method} 70 | \alias{bfcsync,BiocFileCache-method} 71 | \alias{bfcsync} 72 | \alias{exportbfc,missing-method} 73 | \alias{exportbfc,BiocFileCacheBase-method} 74 | \alias{exportbfc} 75 | \alias{importbfc,character-method} 76 | \alias{importbfc} 77 | \alias{cleanbfc,missing-method} 78 | \alias{cleanbfc,BiocFileCache-method} 79 | \alias{cleanbfc} 80 | \alias{removebfc,missing-method} 81 | \alias{removebfc,BiocFileCache-method} 82 | \alias{removebfc} 83 | \alias{show,BiocFileCacheBase-method} 84 | \title{BiocFileCache class} 85 | \usage{ 86 | BiocFileCache(cache = getBFCOption("CACHE"), ask = interactive()) 87 | 88 | \S4method{bfccache}{BiocFileCacheBase}(x) 89 | 90 | 
\S4method{bfccache}{missing}(x) 91 | 92 | \S4method{length}{BiocFileCacheBase}(x) 93 | 94 | bfcrid(x) 95 | 96 | \S4method{bfcrid}{missing}(x) 97 | 98 | \S4method{bfcrid}{BiocFileCacheReadOnly}(x) 99 | 100 | \S4method{bfcrid}{BiocFileCache}(x) 101 | 102 | \S4method{[}{BiocFileCache,character,missing}(x, i, j, ..., drop = TRUE) 103 | 104 | \S4method{[}{BiocFileCacheReadOnly,character,missing}(x, i, j, ..., drop = TRUE) 105 | 106 | \S4method{[}{BiocFileCache,missing,missing}(x, i, j, ..., drop = TRUE) 107 | 108 | \S4method{[}{BiocFileCacheReadOnly,missing,missing}(x, i, j, ..., drop = TRUE) 109 | 110 | \S4method{[[}{BiocFileCacheBase,character,missing}(x, i, j) 111 | 112 | \S4method{[[}{BiocFileCache,character,missing,character}(x, i, j, ...) <- value 113 | 114 | \S4method{bfcnew}{missing}( 115 | x, 116 | rname, 117 | rtype = c("relative", "local"), 118 | ext = NA_character_, 119 | fname = c("unique", "exact") 120 | ) 121 | 122 | \S4method{bfcnew}{BiocFileCache}( 123 | x, 124 | rname, 125 | rtype = c("relative", "local"), 126 | ext = NA_character_, 127 | fname = c("unique", "exact") 128 | ) 129 | 130 | \S4method{bfcadd}{missing}( 131 | x, 132 | rname, 133 | fpath = rname, 134 | rtype = c("auto", "relative", "local", "web"), 135 | action = c("copy", "move", "asis"), 136 | proxy = "", 137 | download = TRUE, 138 | progress = TRUE, 139 | config = list(), 140 | ext = NA_character_, 141 | fname = c("unique", "exact"), 142 | ... 143 | ) 144 | 145 | \S4method{bfcadd}{BiocFileCache}( 146 | x, 147 | rname, 148 | fpath = rname, 149 | rtype = c("auto", "relative", "local", "web"), 150 | action = c("copy", "move", "asis"), 151 | proxy = "", 152 | download = TRUE, 153 | progress = TRUE, 154 | config = list(), 155 | ext = NA_character_, 156 | fname = c("unique", "exact"), 157 | ... 
158 | ) 159 | 160 | \S4method{bfcinfo}{missing}(x, rids) 161 | 162 | \S4method{bfcinfo}{BiocFileCacheBase}(x, rids) 163 | 164 | \S4method{bfcrid}{tbl_bfc}(x) 165 | 166 | \S4method{bfcpath}{missing}(x, rids) 167 | 168 | \S4method{bfcpath}{BiocFileCacheBase}(x, rids) 169 | 170 | \S4method{bfcrpath}{missing}(x, rnames, ..., rids, exact = TRUE) 171 | 172 | \S4method{bfcrpath}{BiocFileCacheBase}(x, rnames, ..., rids, exact = TRUE) 173 | 174 | \S4method{bfcupdate}{missing}(x, rids, ...) 175 | 176 | \S4method{bfcupdate}{BiocFileCache}( 177 | x, 178 | rids, 179 | ..., 180 | rname = NULL, 181 | rpath = NULL, 182 | fpath = NULL, 183 | proxy = "", 184 | progress = TRUE, 185 | config = list(), 186 | ask = TRUE 187 | ) 188 | 189 | bfcmeta(x, name, ...) <- value 190 | 191 | \S4method{bfcmeta}{BiocFileCacheBase}(x, name, ...) <- value 192 | 193 | \S4method{bfcmetaremove}{missing}(x, name, ...) 194 | 195 | \S4method{bfcmetaremove}{BiocFileCacheBase}(x, name, ...) 196 | 197 | \S4method{bfcmetalist}{missing}(x) 198 | 199 | \S4method{bfcmetalist}{BiocFileCacheBase}(x) 200 | 201 | \S4method{bfcmeta}{missing}(x, name, ...) 202 | 203 | \S4method{bfcmeta}{BiocFileCacheBase}(x, name, ...) 
204 | 205 | \S4method{bfcquerycols}{missing}(x) 206 | 207 | \S4method{bfcquerycols}{BiocFileCacheBase}(x) 208 | 209 | \S4method{bfcquery}{missing}(x, query, field = c("rname", "rpath", "fpath"), ..., exact = FALSE) 210 | 211 | \S4method{bfcquery}{BiocFileCacheBase}(x, query, field = c("rname", "rpath", "fpath"), ..., exact = FALSE) 212 | 213 | \S4method{bfccount}{missing}(x) 214 | 215 | \S4method{bfccount}{BiocFileCacheBase}(x) 216 | 217 | \S4method{bfccount}{tbl_bfc}(x) 218 | 219 | \S4method{bfcneedsupdate}{missing}(x, rids, ..., proxy = "", config = list()) 220 | 221 | \S4method{bfcneedsupdate}{BiocFileCacheBase}(x, rids, ..., proxy = "", config = list()) 222 | 223 | \S4method{bfcdownload}{missing}( 224 | x, 225 | rid, 226 | proxy = "", 227 | progress = TRUE, 228 | config = list(), 229 | ask = TRUE, 230 | FUN, 231 | ... 232 | ) 233 | 234 | \S4method{bfcdownload}{BiocFileCache}( 235 | x, 236 | rid, 237 | proxy = "", 238 | progress = TRUE, 239 | config = list(), 240 | ask = TRUE, 241 | FUN, 242 | ... 243 | ) 244 | 245 | \S4method{bfcremove}{missing}(x, rids) 246 | 247 | \S4method{bfcremove}{BiocFileCache}(x, rids) 248 | 249 | \S4method{bfcsync}{missing}(x, verbose = TRUE, ask = TRUE) 250 | 251 | \S4method{bfcsync}{BiocFileCache}(x, verbose = TRUE, ask = TRUE) 252 | 253 | \S4method{exportbfc}{missing}( 254 | x, 255 | rids, 256 | outputFile = "BiocFileCacheExport.tar", 257 | outputMethod = c("tar", "zip"), 258 | verbose = TRUE, 259 | ... 260 | ) 261 | 262 | \S4method{exportbfc}{BiocFileCacheBase}( 263 | x, 264 | rids, 265 | outputFile = "BiocFileCacheExport.tar", 266 | outputMethod = c("tar", "zip"), 267 | verbose = TRUE, 268 | ... 269 | ) 270 | 271 | \S4method{importbfc}{character}(filename, archiveMethod = c("untar", "unzip"), exdir = ".", ...) 
272 | 273 | \S4method{cleanbfc}{missing}(x, days = 120, ask = TRUE) 274 | 275 | \S4method{cleanbfc}{BiocFileCache}(x, days = 120, ask = TRUE) 276 | 277 | \S4method{removebfc}{missing}(x, ask = TRUE) 278 | 279 | \S4method{removebfc}{BiocFileCache}(x, ask = TRUE) 280 | 281 | \S4method{show}{BiocFileCacheBase}(object) 282 | } 283 | \arguments{ 284 | \item{cache}{character(1) On-disk location (directory path) of the 285 | cache. For the default location, see 286 | \code{\link[tools]{R_user_dir}}.} 287 | 288 | \item{ask}{logical(1) Ask before creating, updating, overwriting, 289 | or removing cache or local file locations.} 290 | 291 | \item{x}{A \code{BiocFileCache} instance or, if missing, the result 292 | of \code{BiocFileCache()}.} 293 | 294 | \item{i}{character() 'rid' identifiers.} 295 | 296 | \item{j}{Ignored.} 297 | 298 | \item{...}{For 'bfcadd', 'bfcupdate' and 'bfcdownload': Additional 299 | arguments passed to internal download functions for use with 300 | \code{httr2::req_perform}. For 'bfcrpath': Additional arguments passed 301 | to 'bfcadd', or \code{exact} passed to 'bfcquery'. For 302 | 'bfcquery': Additional arguments passed to \code{grepl}. For 303 | 'exportbfc': Additional arguments to the selected outputMethod 304 | function. See \code{utils::tar} or \code{utils::zip} for more 305 | information. For 'importbfc': Additional arguments to the 306 | selected archiveMethod function. See \code{utils::untar} or 307 | \code{utils::unzip} for more information.} 308 | 309 | \item{drop}{Ignored.} 310 | 311 | \item{value}{character(1) Replacement file path.} 312 | 313 | \item{rname}{character(1) Name of object in file cache. For 314 | 'bfcupdate', a character vector of replacement rnames.} 315 | 316 | \item{rtype}{character(1) 'local', 'relative', or 'web' indicating 317 | if the resource is a local file, a relative path in the cache, 318 | or a web resource. For \code{bfcnew}, only 'local' or 'relative' are 319 | allowed.
For \code{bfcadd}, the default 'auto' creates 320 | relative or web paths, based on the path prefix.} 321 | 322 | \item{ext}{character(1) A file extension to add to the local 323 | copy of the file (e.g., \sQuote{sqlite}, \sQuote{txt}, 324 | \sQuote{tar.gz}).} 325 | 326 | \item{fname}{character(1). Options are \sQuote{unique} or 327 | \sQuote{exact}. \sQuote{unique} provides each bfc resource with a unique 328 | identifier when storing the file, allowing resources with the same name 329 | to be stored in the cache. \sQuote{exact} uses the exact file name of the 330 | resource; only one of foo/my.txt and bar/my.txt could be stored. Default 331 | is \sQuote{unique}.} 332 | 333 | \item{fpath}{For bfcadd(), character(1) path to current file 334 | location or remote web resource. If none is given, the rname is 335 | assumed to also be the path location. For bfcupdate(), a 336 | character() vector of replacement web resources.} 337 | 338 | \item{action}{character(1) How to handle the file: create a 339 | \code{copy} of \code{fpath} in the cache directory; \code{move} 340 | the file to the cache directory; or \code{asis}, leave the file 341 | in its current location but save the path in the cache. If 'rtype 342 | == "relative"', action cannot be "asis".} 343 | 344 | \item{proxy}{character(1) (Optional) proxy server passed to 345 | \code{httr2::req_proxy}} 346 | 347 | \item{download}{logical(1) If \code{rtype = "web"}, whether the remote 348 | resource should be downloaded locally immediately.} 349 | 350 | \item{progress}{logical(1) Whether to show a progress bar for downloads in 351 | interactive sessions.} 352 | 353 | \item{config}{list() passed as argument to \code{httr2::req_options}. The 354 | names of items should be valid curl options as defined in 355 | \code{curl::curl_options}.} 356 | 357 | \item{rids}{character() Vector of rids.} 358 | 359 | \item{rnames}{character() to match against rnames. Each element of 360 | \code{rnames} must match exactly one record. Use \code{exact = 
Use \code{exact = 361 | FALSE} to use regular expression matching.} 362 | 363 | \item{exact}{logical(1) when FALSE, treat \code{query} as a regular 364 | expression. When TRUE, use exact matching. For \code{bfcquery}, 365 | the default is \code{FALSE} (regular expression matching; for 366 | \code{bfcrpath}, the default is \code{TRUE} (exact matching).} 367 | 368 | \item{rpath}{character() vector of replacement rpaths.} 369 | 370 | \item{name}{character(1) name of metadata table.} 371 | 372 | \item{query}{character() Regular expression pattern(s) to match in 373 | resource. It will match the pattern against \code{fields}, 374 | using \code{&} logic across query element. By default, case 375 | sensitive. When \code{exact = TRUE}, \code{query} uses exact 376 | matching.} 377 | 378 | \item{field}{character() column names in resource to query, using 379 | \code{||} logic across multiple field elements. By default, 380 | matches pattern agains rname, rpath, and fpath. If exact 381 | matching, may only be a single value.} 382 | 383 | \item{rid}{character(1) Unique resource id.} 384 | 385 | \item{FUN}{A specialized implemented function designed by the user. This 386 | function can be used to perform and save the results of a post download 387 | processing step rather than direct output. The function should ONLY take in 388 | two file names: the first the raw downloaded file and the second the output 389 | file for saved results. The output of the function should be TRUE/FALSE if 390 | step was successful. See vignette section on Specialty Advance Use Case for 391 | more details.} 392 | 393 | \item{verbose}{logical(1) If descriptive message and list of issues 394 | should be included as output.} 395 | 396 | \item{outputFile}{character(1) The /basename for the 397 | output archive. 
Include the appropriate extension based on 398 | outputMethod and any additional parameters selected for 399 | \code{utils::tar} or \code{utils::zip}.} 400 | 401 | \item{outputMethod}{Either 'tar' or 'zip' for how the directory 402 | should be archived. Default is 'tar'.} 403 | 404 | \item{filename}{character(1) The name of the archive.} 405 | 406 | \item{archiveMethod}{Either 'untar' or 'unzip' for how the directory should 407 | be extracted. Default is 'untar'.} 408 | 409 | \item{exdir}{Directory to extract files to. See \code{utils::untar} or 410 | \code{utils::unzip} for more details.} 411 | 412 | \item{days}{integer(1) Number of days between accessDate and 413 | currentDate; if exceeded, the entry will be deleted.} 414 | 415 | \item{object}{A \code{BiocFileCache} instance.} 416 | } 417 | \value{ 418 | For 'BiocFileCache': a \code{BiocFileCache} instance. 419 | 420 | For 'bfccache': character(1) location of the directory 421 | containing the cache. 422 | 423 | For 'length': integer(1) Number of objects in the file 424 | cache. 425 | 426 | For '[': A subset of the BiocFileCache object. 427 | 428 | For '[[': named character(1) rpath for the given resource 429 | in the cache. 430 | 431 | For '[[<-': Updated BiocFileCache, invisibly. 432 | 433 | For 'bfcnew': named character(1), the path to save your 434 | object / file. The name of the return value is the unique rid 435 | for the resource. 436 | 437 | For 'bfcadd': named character(1), the path to save your 438 | object / file. The name of the character is the unique rid for 439 | the resource. 440 | 441 | For 'bfcinfo': A \code{tbl_bfc} of current resources in the 442 | database. 443 | 444 | For 'bfcpath': The file path location to load. 445 | 446 | For 'bfcrpath': The local file path location to load. 447 | 448 | For 'bfcupdate': an updated \code{BiocFileCache} object, 449 | invisibly.
450 | 451 | For 'bfcmeta<-': updated BiocFileCache, invisibly. 452 | 453 | For 'bfcmetaremove': updated BiocFileCache, invisibly. 454 | 455 | For 'bfcmetalist': returns a character() of all metadata tables 456 | currently in the database. If no metadata tables are available, returns 457 | character(0). 458 | 459 | For 'bfcmeta': returns a data.frame representation of the database 460 | table. 461 | 462 | For 'bfcquerycols': character() all columns in all database tables 463 | available for query. 464 | 465 | For 'bfcquery': A \code{tbl_bfc} of current resources in 466 | the database whose \code{field} matches \code{query}. If multiple 467 | values are given, the resource must match all of the 468 | patterns. A tbl with zero rows is returned when no resources 469 | match the query. 470 | 471 | For 'bfccount': integer(1) Number of objects in the cache 472 | or query. 473 | 474 | For 'bfcneedsupdate': named logical vector indicating whether each resource 475 | needs to be updated. The name is the resource 476 | 'rid'. \code{TRUE}: fpath \code{etag} or \code{modified} time of 477 | web resource more recent than in BiocFileCache; \code{FALSE}: fpath 478 | \code{etag} or \code{modified} time of web resource not more recent 479 | than in BiocFileCache; \code{NA}: web resource etag and modified time 480 | could not be determined. If the etag is available, the function uses 481 | it definitively and compares the last modified time only if the 482 | etag is not available. If an \code{expires} time is present, it is 483 | used first to determine whether the resource should be updated. 484 | 485 | For 'bfcdownload': character(1) path to downloaded resource 486 | in cache. 487 | 488 | For 'bfcremove': updated BiocFileCache object, invisibly. 489 | 490 | For 'bfcsync': logical(1) indicating whether the cache is 491 | in sync (\code{TRUE}) or not. 'verbose' is TRUE by default, so 492 | descriptive messages will also be included. 493 | 494 | For 'exportbfc': character(1) The outputFile path.
495 | 496 | For 'importbfc': A BiocFileCache object. 497 | 498 | For 'cleanbfc': updated BiocFileCache, invisibly. 499 | 500 | For 'removebfc': TRUE if successfully removed. 501 | } 502 | \description{ 503 | This class represents the location of files stored on disk. Use the 504 | return value to add and retrieve files that persist across 505 | sessions. 506 | } 507 | \details{ 508 | The package defines 'BiocFileCache', 'BiocFileCacheBase' and 509 | 'BiocFileCacheReadOnly' classes. 510 | 511 | Slots unique to 'BiocFileCache' and related classes: 512 | \describe{ 513 | \item{'cache': }{character(1) on-disk location (directory path) of the 514 | cache} 515 | \item{'rid': }{character() of unique rids in the cache. } 516 | } 517 | 518 | The cache creates an RSQLite database to keep track of local and remote 519 | resources. Each item in the database has the following 520 | information: 521 | \describe{ 522 | \item{'rid': }{resource id. A unique identifier 523 | automatically generated when a resource is added to the cache.} 524 | \item{'rname': }{resource name. This is given by the user when a 525 | resource is added to the cache. It does not have to be unique 526 | and can be updated at any time. We recommend descriptive key 527 | words and identifiers.} 528 | \item{'create_time': }{The date and time a resource is added to the cache.} 529 | \item{'access_time': }{The date and time a resource is utilized 530 | within the cache. The access time is updated when the resource 531 | is updated or accessed.} 532 | \item{'rpath': }{resource path. This is the path to the local 533 | (on-disk) file.} 534 | \item{'rtype': }{resource type. Either "relative", "local", or 535 | "web", indicating whether the resource is a relative path in the cache, a local file, or a remote resource.} 536 | \item{'fpath': }{If rtype is "web", this is the link to the 537 | remote resource. It will be utilized to download or update the
It will be utilized to download or update the 538 | remote data} 539 | \item{'last_modified_time': }{For a remote resource, the 540 | last_modified (if available) information for the local copy of 541 | the data. This information is checked against the remote 542 | resource to determine if the local copy is stale and needs to 543 | be updated} 544 | } 545 | 546 | All functions have a quick implementation where if the BiocFileCache object 547 | is not passed as an argument, the function uses default 'BiocFileCache()' for 548 | implementation. e.g 'bfcinfo()' can be used instead of 549 | 'bfcinfo(BiocFileCache())'. The only function this is not available for is 550 | 'bfcmeta()<-'; The BiocFileCache object must be defined as a varaible and 551 | passed as an argument. See vignette("BiocFileCache") for more details. 552 | } 553 | \section{Methods (by generic)}{ 554 | \itemize{ 555 | \item \code{bfccache(BiocFileCacheBase)}: Get the location of the on-disk cache. 556 | 557 | \item \code{length(BiocFileCacheBase)}: Get the number of objects in the file 558 | cache. 559 | 560 | \item \code{bfcrid(BiocFileCacheReadOnly)}: Get the rids of the object. 561 | 562 | \item \code{x[i}: Subset a BiocFileCache object. 563 | 564 | \item \code{x[[i}: Get a file path for select resources from 565 | the cache. 566 | 567 | \item \code{`[[`(x = BiocFileCache, i = character, j = missing) <- value}: Set the file path of selected resources 568 | from the cache. 569 | 570 | \item \code{bfcnew(BiocFileCache)}: Add a resource to the database 571 | 572 | \item \code{bfcadd(BiocFileCache)}: Add an existing resource to the database 573 | 574 | \item \code{bfcinfo(BiocFileCacheBase)}: list resources in database 575 | 576 | \item \code{bfcrid(tbl_bfc)}: Get the rids of the object 577 | 578 | \item \code{bfcpath(BiocFileCacheBase)}: display rpaths of resource. 579 | 580 | \item \code{bfcrpath(BiocFileCacheBase)}: display rpath of resource. 
If 'rnames' is 581 | in the cache the path is returned, if it is not it will try to 582 | add it to the cache with 'bfcadd' 583 | 584 | \item \code{bfcupdate(BiocFileCache)}: Update a resource in the cache 585 | 586 | \item \code{bfcmeta(BiocFileCacheBase) <- value}: add meta data table in database 587 | 588 | \item \code{bfcmetaremove(BiocFileCacheBase)}: remove meta data table in database 589 | 590 | \item \code{bfcmetalist(BiocFileCacheBase)}: retrieve listing of metadata tables 591 | 592 | \item \code{bfcmeta(BiocFileCacheBase)}: retrieve metadata table 593 | 594 | \item \code{bfcquerycols(BiocFileCacheBase)}: Get all the possible columns to query 595 | 596 | \item \code{bfcquery(BiocFileCacheBase)}: query resource 597 | 598 | \item \code{bfccount(BiocFileCacheBase)}: Get the number of objects in the file 599 | cache or query. 600 | 601 | \item \code{bfcneedsupdate(BiocFileCacheBase)}: check if a resource needs to be updated 602 | 603 | \item \code{bfcdownload(BiocFileCache)}: Redownload resource to location in cache 604 | 605 | \item \code{bfcremove(BiocFileCache)}: Remove a resource to the database. If 606 | the local file is located in \code{bfccache(x)}, the file will 607 | also be deleted. This will not delete information in any metadata 608 | table. 609 | 610 | \item \code{bfcsync(BiocFileCache)}: sync cache and resource. 611 | 612 | \item \code{exportbfc(BiocFileCacheBase)}: Create exportable file containing 613 | BiocFileCache. 614 | 615 | \item \code{importbfc(character)}: Import file created with exportbfc containing 616 | BiocFileCache. 617 | 618 | \item \code{cleanbfc(BiocFileCache)}: Remove old/unused files in 619 | BiocFileCache. If file to be removed is not in the bfccache 620 | location it will not be deleted. Setting \code{days=-Inf} 621 | will remove all cached files. 622 | 623 | \item \code{removebfc(BiocFileCache)}: Completely remove the BiocFileCache 624 | 625 | \item \code{show(BiocFileCacheBase)}: Display a \code{BiocFileCache} instance. 
626 | 627 | }} 628 | \examples{ 629 | # bfc <- BiocFileCache() # global cache 630 | # bfc 631 | bfc0 <- BiocFileCache(tempfile()) # temporary cache for examples 632 | bfccache(bfc0) 633 | length(bfc0) 634 | path <- bfcnew(bfc0, "NewResource") 635 | path 636 | fl1 <- tempfile(); file.create(fl1) 637 | bfcadd(bfc0, "Test1", fl1) # copy 638 | fl2 <- tempfile(); file.create(fl2) 639 | bfcadd(bfc0, "Test2", fl2, action="move") # move 640 | fl3 <- tempfile(); file.create(fl3) 641 | add3 <- bfcadd(bfc0, "Test3", fl3, rtype="local", action="asis") # reference 642 | rid3 <- names(add3) 643 | 644 | bfc0 645 | file.exists(fl1) # TRUE 646 | file.exists(fl2) # FALSE 647 | file.exists(fl3) # TRUE 648 | 649 | # add a remote resource 650 | url <- "https://httpbin.org/get" 651 | bfcadd(bfc0, "TestWeb", fpath=url) 652 | bfcinfo(bfc0) 653 | bfcpath(bfc0, rid3) 654 | bfcrpath(bfc0, rids = rid3) 655 | bfcupdate(bfc0, rid3, rpath=fl3, rname="NewRname") 656 | bfc0[[rid3]] = fl1 657 | bfcupdate(bfc0, "BFC5", fpath="http://google.com") 658 | meta = data.frame(list(rid = paste("BFC",seq_len(bfccount(bfc0)), sep=""), 659 | num=seq(bfccount(bfc0),1,-1), 660 | data=c(paste("Letter", 661 | letters[seq_len(bfccount(bfc0))]))), 662 | stringsAsFactors=FALSE) 663 | bfcmeta(bfc0, name="resourcedata") <- meta 664 | \dontrun{bfcmetaremove(bfc0, "resourcedata")} 665 | bfcmetalist(bfc0) 666 | tbl = bfcmeta(bfc0, "resourcedata") 667 | tbl 668 | bfcquerycols(bfc0) 669 | bfcquery(bfc0, "Test") 670 | bfcquery(bfc0, "^Test1$", field="rname") 671 | bfccount(bfc0) 672 | bfccount(bfcquery(bfc0, "test")) 673 | bfcneedsupdate(bfc0, "BFC5") 674 | bfcdownload(bfc0, "BFC5") 675 | bfcremove(bfc0, rid3) 676 | bfcinfo(bfc0) 677 | bfcsync(bfc0) 678 | 679 | if (!interactive()){ 680 | # in interactive mode, the bfcsync() call above 681 | # has probably already removed this resource; 682 | # non-interactive mode does not remove resources, 683 | # so remove it manually here 684 | bfcremove(bfc0, "BFC1") 685 | } 686 | bfcsync(bfc0, FALSE) 687
| \dontrun{exportbfc(bfc)} 688 | \dontrun{importbfc("BiocFileCacheExport.tar")} 689 | \dontrun{cleanbfc(bfc, ask=FALSE)} 690 | \dontrun{removebfc(bfc, ask=FALSE)} 691 | } 692 | -------------------------------------------------------------------------------- /man/makeBiocFileCacheFromDataFrame.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/makeBiocFileCacheFromDataFrame.R 3 | \name{makeBiocFileCacheFromDataFrame} 4 | \alias{makeBiocFileCacheFromDataFrame} 5 | \alias{makeBiocFileCacheFromDataFrame,ANY-method} 6 | \title{Make BiocFileCache objects from an existing data.frame} 7 | \usage{ 8 | makeBiocFileCacheFromDataFrame( 9 | df, 10 | cache, 11 | actionLocal = c("move", "copy", "asis"), 12 | actionWeb = c("move", "copy"), 13 | metadataName, 14 | ..., 15 | ask = TRUE 16 | ) 17 | 18 | \S4method{makeBiocFileCacheFromDataFrame}{ANY}( 19 | df, 20 | cache, 21 | actionLocal = c("move", "copy", "asis"), 22 | actionWeb = c("move", "copy"), 23 | metadataName, 24 | ..., 25 | ask = TRUE 26 | ) 27 | } 28 | \arguments{ 29 | \item{df}{data.frame or tibble to convert} 30 | 31 | \item{cache}{character(1) On-disk location (directory path) of the 32 | cache. For the default location, see 33 | \code{\link[tools]{R_user_dir}}.} 34 | 35 | \item{actionLocal}{Whether the local copy of a file should be moved, copied, or 36 | left in its original location. See the 'action' parameter of bfcadd.} 37 | 38 | \item{actionWeb}{If a local copy of a remote resource already 39 | exists, whether the file should be copied or moved to the 40 | cache. Locally downloaded remote resources must exist in the
Locally downloaded remote resources must exist in the 41 | cache location.} 42 | 43 | \item{metadataName}{If there are additional columns of data in the 44 | original data.frame besides required BiocFileCache columns, 45 | this data will be added as a metadata table with this name.} 46 | 47 | \item{...}{additional arguments passed to `file.copy()`.} 48 | 49 | \item{ask}{logical(1) Confirm creation of BiocFileCache.} 50 | } 51 | \value{ 52 | A BiocFileCache object 53 | } 54 | \description{ 55 | If there are a lot of resources being added this could take some 56 | time but if a cache is saved in a permanent location this should 57 | only have to be run once. The original data.frame must have the 58 | required columns 'rtype', 'fpath', and 'rpath'; See the vignette 59 | for more information on the expected information contained in these 60 | columns. Similarly, the optional columns 'rname', 'etag', 61 | 'last_modified_time', and 'expires' may be included. Any additional columns 62 | not listed as required or optional will be kept as an additional 63 | metadata table in the BiocFileCache database. 
64 | } 65 | -------------------------------------------------------------------------------- /man/makeCachedActiveBinding.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/makeCachedActiveBinding.R 3 | \name{makeCachedActiveBinding} 4 | \alias{makeCachedActiveBinding} 5 | \title{makeCachedActiveBinding} 6 | \usage{ 7 | makeCachedActiveBinding(sym, fun, env = .GlobalEnv, verbose = FALSE) 8 | } 9 | \arguments{ 10 | \item{sym}{See \code{\link{makeActiveBinding}} in the \pkg{base} 11 | package.} 12 | 13 | \item{fun}{See \code{\link{makeActiveBinding}} in the \pkg{base} 14 | package.} 15 | 16 | \item{env}{See \code{\link{makeActiveBinding}} in the \pkg{base} 17 | package.} 18 | 19 | \item{verbose}{Set to TRUE to see caching in action (useful for 20 | troubleshooting).} 21 | } 22 | \description{ 23 | Like \code{\link{makeActiveBinding}}, but the value of the active 24 | binding is evaluated only once and then "remembered".
25 | } 26 | \examples{ 27 | makeCachedActiveBinding("x", function() runif(1), verbose=TRUE) 28 | x 29 | x 30 | } 31 | -------------------------------------------------------------------------------- /tests/testthat.R: -------------------------------------------------------------------------------- 1 | library(testthat) 2 | library(BiocFileCache) 3 | 4 | test_check("BiocFileCache") 5 | -------------------------------------------------------------------------------- /tests/testthat/test_BiocFileCache_class.R: -------------------------------------------------------------------------------- 1 | context("BiocFileCache_class") 2 | 3 | test_that("BiocFileCache creation works", { 4 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 5 | expect_true(file.exists(bfccache(bfc))) 6 | 7 | # test that sql file also gets created 8 | expect_true(file.exists(file.path(bfccache(bfc), "BiocFileCache.sqlite"))) 9 | removebfc(bfc, ask=FALSE) 10 | 11 | fl <- tempfile() 12 | bfc <- BiocFileCache(fl, ask = FALSE) 13 | expect_true(file.exists(fl)) 14 | }) 15 | 16 | test_that("bfcadd and bfcnew works", { 17 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 18 | fl <- tempfile(); file.create(fl) 19 | expect_identical(length(bfc), 0L) 20 | expect_identical(bfccount(bfcinfo(bfc)), 0L) 21 | 22 | # test file add and copy 23 | rid <- bfcadd(bfc, 'test-1', fl) 24 | expect_identical(length(bfc), 1L) 25 | expect_identical(bfccount(bfcinfo(bfc)), 1L) 26 | expect_true(file.exists(fl)) 27 | # test that fname used unique identifier 28 | expect_false(basename(fl) == basename(rid)) 29 | 30 | # test file add and location not in cache 31 | path <- bfcadd(bfc, 'test-2', fl, rtype='local', action='asis') 32 | rid <- names(path) 33 | expect_identical(length(bfc), 2L) 34 | expect_true(file.exists(fl)) 35 | expect_identical(bfc[[rid]], setNames(fl, rid)) 36 | 37 | # test file add and move 38 | rid <- bfcadd(bfc, 'test-3', fl, action='move') 39 | expect_identical(length(bfc), 3L) 40 | expect_true(!file.exists(fl)) 41 | 
42 | # test add web resource 43 | url <- "https://httpbin.org/get" 44 | path <- bfcadd(bfc, 'test-4', url, rtype="web") 45 | rid <- names(path) 46 | expect_identical(length(bfc), 4L) 47 | expect_true(file.exists(bfc[[rid]])) 48 | 49 | # test add new (return path to save) 50 | path <- bfcnew(bfc, 'test-5') 51 | expect_identical(length(bfc), 5L) 52 | expect_identical(bfccount(bfcinfo(bfc)), 5L) 53 | expect_true(!file.exists(path)) 54 | expect_identical(bfc[[names(path)]], path) 55 | 56 | # test out of bounds and file not found 57 | expect_error(bfc[[7]]) 58 | suppressWarnings(expect_error(bfcadd( 59 | bfc, 'test-6', "https://httpbin.org/status/404", rtype="web" 60 | ))) 61 | expect_error(bfcadd(bfc, 'test-2', fl, rtype='local', action='asis')) 62 | 63 | # test no fpath given 64 | url <- "https://httpbin.org/get" 65 | path <- bfcadd(bfc, url) 66 | expect_identical(.sql_get_fpath(bfc,names(path)), 67 | .sql_get_rname(bfc,names(path))) 68 | 69 | # test web resource not download 70 | url <- "https://httpbin.org/get" 71 | path <- bfcadd(bfc, 'test-noDownload', url, rtype="web", download=FALSE) 72 | rid <- names(path) 73 | expect_identical(length(bfc), 7L) 74 | expect_false(file.exists(bfc[[rid]])) 75 | expect_true(is.na(.sql_get_last_modified(bfc, rid))) 76 | 77 | # test relative paths 78 | path <- bfcnew(bfc, "relative-test", "relative") 79 | expect_identical( 80 | .sql_get_rtype(bfc,names(path)), setNames("relative", names(path)) 81 | ) 82 | temp <- file.path(bfccache(bfc), .sql_get_field(bfc,names(path), "rpath")) 83 | expect_identical( 84 | .sql_get_rpath(bfc,names(path)), setNames(temp, names(path)) 85 | ) 86 | basename <- strsplit( 87 | .sql_get_field(bfc, names(path), "rpath"), 88 | split="_" 89 | )[[1]][2] 90 | expect_identical( 91 | .sql_get_fpath(bfc,names(path)), setNames(basename, names(path)) 92 | ) 93 | 94 | 95 | fl <- tempfile(); file.create(fl) 96 | path <- bfcadd(bfc, fl, rtype = "relative") 97 | expect_identical( 98 | .sql_get_rtype(bfc,names(path)), 
setNames("relative", names(path)) 99 | ) 100 | 101 | temp <- file.path(bfccache(bfc), .sql_get_field(bfc,names(path), "rpath")) 102 | expect_identical( 103 | .sql_get_rpath(bfc,names(path)), setNames(temp, names(path)) 104 | ) 105 | expect_true(file.exists(fl)) 106 | 107 | path <- bfcadd(bfc, fl, rtype = "relative", action="move") 108 | expect_identical( 109 | .sql_get_rtype(bfc,names(path)), setNames("relative", names(path)) 110 | ) 111 | 112 | temp <- file.path(bfccache(bfc), .sql_get_field(bfc,names(path), "rpath")) 113 | expect_identical( 114 | .sql_get_rpath(bfc,names(path)), setNames(temp, names(path)) 115 | ) 116 | expect_true(!file.exists(fl)) 117 | 118 | fl <- tempfile(); file.create(fl) 119 | expect_warning(bfcadd(bfc, fl, rtype = "relative", action="asis")) 120 | 121 | # test fname arguments 122 | fl_exact = tempfile(fileext=".bam"); file.create(fl_exact) 123 | rid <- bfcadd(bfc, fl_exact, fname="exact") 124 | expect_identical(basename(fl_exact), basename(rid)) 125 | rid2 <- bfcadd(bfc, fl_exact, fname="unique") 126 | expect_false(basename(fl_exact) == basename(rid2)) 127 | 128 | }) 129 | 130 | test_that("bfcadd() works for multiple inserts", { 131 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 132 | fpath <- replicate(6L, tempfile()) 133 | file.create(fpath) 134 | rname <- letters[seq_along(fpath)] 135 | 136 | rpath <- bfcadd(bfc, rname[1:2], fpath[1:2], action = "asis") 137 | expect_identical(rpath, setNames(fpath[1:2], names(rpath))) 138 | 139 | rpath <- bfcadd(bfc, rname[3], fpath[3], action = "asis") 140 | expect_identical(rpath, setNames(fpath[3], names(rpath))) 141 | 142 | rpath <- bfcadd(bfc, rname[4:5], fpath[4:5]) 143 | expect_identical(names(rpath), paste0("BFC", 4:5)) 144 | expect_true(all(file.exists(rpath))) 145 | 146 | rpath <- bfcadd(bfc, rname[6], fpath[6]) 147 | expect_identical(names(rpath), paste0("BFC", 6)) 148 | expect_true(all(file.exists(rpath))) 149 | }) 150 | 151 | test_that("bfcnew() works for multiple inserts", { 152 | bfc <- 
BiocFileCache(tempfile(), ask = FALSE) 153 | 154 | rnames <- paste0("foo", 1:2) 155 | rpath <- bfcnew(bfc, rnames) 156 | expect_identical(bfcinfo(bfc)$rname, rnames) 157 | 158 | rnames <- "foo3" 159 | rpath <- bfcnew(bfc, rnames, ext=".foo3") 160 | expect_identical(tools::file_ext(rpath), "foo3") 161 | 162 | rnames <- paste0("foo", 4:5) 163 | rpath <- bfcnew(bfc, rnames, ext=".foo4") 164 | expect_identical(tools::file_ext(rpath), rep("foo4", 2)) 165 | 166 | rnames <- paste0("foo", 6:7) 167 | ext <- paste0(".", rnames) 168 | rpath <- bfcnew(bfc, rnames, ext=ext) 169 | expect_identical(tools::file_ext(rpath), rnames) 170 | }) 171 | 172 | # 173 | # construct bfc for further test, avoiding construction in each 174 | # 175 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 176 | fl <- tempfile(); file.create(fl) 177 | add1 <- bfcadd(bfc, 'test-1', fl) 178 | rid1 <- names(add1) 179 | add2 <- bfcadd(bfc, 'test-2', fl, rtype='local', action='asis') 180 | rid2 <- names(add2) 181 | url <- "https://httpbin.org/get" 182 | add3 <- bfcadd(bfc, 'test-3', url, rtype="web") 183 | rid3 <- names(add3) 184 | path <- bfcnew(bfc, 'test-4') 185 | rid4 <- names(path) 186 | url <- "https://httpbin.org/get" 187 | add5 <- bfcadd(bfc, 'test-5', url, rtype="web", download=FALSE) 188 | rid5 <- names(add5) 189 | 190 | test_that("bfcinfo works", { 191 | # print all 192 | expect_identical(dim(as.data.frame(bfcinfo(bfc))), 193 | c(5L, 10L)) 194 | expect_is(bfcinfo(bfc), "tbl_df") 195 | # print subset 196 | expect_identical(dim(as.data.frame(bfcinfo(bfc, paste0("BFC", 1:3)))), 197 | c(3L, 10L)) 198 | # print one found and one not found 199 | expect_error(bfcinfo(bfc, c(1, 6))) 200 | 201 | # index not found 202 | expect_error(bfcinfo(bfc, 6)) 203 | 204 | # check rpaths updated 205 | expect_identical(bfcinfo(bfc)[["rpath"]], unname(bfcrpath(bfc))) 206 | }) 207 | 208 | test_that("bfcpath and bfcrpath works", { 209 | # local file 210 | expect_identical(length(bfcpath(bfc, rid1)), 1L) 211 | 
expect_identical(names(bfcpath(bfc, rid1)), as.character(rid1)) 212 | expect_identical(bfcpath(bfc, rid1), bfcrpath(bfc, rids=rid1)) 213 | 214 | # web file 215 | expect_identical(length(bfcpath(bfc, rid3)), 1L) 216 | expect_identical(names(bfcpath(bfc, rid3)), as.character(rid3)) 217 | expect_identical(bfcpath(bfc, rid3), bfcrpath(bfc, rids=rid3)) 218 | 219 | # index not found 220 | expect_error(bfcpath(bfc, 6)) 221 | expect_error(bfcrpath(bfc, rids=6)) 222 | 223 | # expect error 224 | expect_error(bfcrpath(bfc, rnames="testweb", rids="BFC5")) 225 | 226 | # multiple files 227 | expect_identical(length(bfcrpath(bfc, rids=paste0("BFC", 1:3))), 3L) 228 | expect_identical(length(bfcrpath(bfc)), 5L) 229 | expect_identical(length(bfcpath(bfc)), length(bfc)) 230 | expect_identical(length(bfcpath(bfc, rids=paste0("BFC", 1:3))), 3L) 231 | expect_identical(length(bfcpath(bfc)), length(bfcrpath(bfc))) 232 | 233 | # test bfcrpath with rname 234 | expect_identical(length(bfcrpath(bfc, c("test-1", "test-3"))), 2L) 235 | suppressWarnings(expect_error(bfcrpath(bfc, "test"))) 236 | url = "https://en.wikipedia.org/wiki/Bioconductor" 237 | suppressWarnings(expect_error(bfcrpath(bfc, c("test-1",url, "notworking")))) 238 | expect_identical(length(bfcrid(bfc)), 5L) 239 | expect_identical(length(bfcrpath(bfc, c("test-1", url, "test-3"))), 3L) 240 | expect_identical(length(bfcrid(bfc)), 6L) 241 | expect_identical(bfccount(bfcinfo(bfc)), 6L) 242 | expect_identical(unname(.sql_get_field(bfc, "BFC7", "rname")), url) 243 | expect_identical(unname(.sql_get_fpath(bfc, "BFC7")), url) 244 | }) 245 | 246 | test_that("bfcquery, bfcrpath allow regular expressions and exact matches", { 247 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 248 | file.create(fl <- tempfile()) 249 | fl1 <- bfcadd(bfc, "fl1", fl) 250 | fl10 <- bfcadd(bfc, "fl10", fl) 251 | ## bfcquery 252 | expect_identical(NROW(bfcquery(bfc, "fl1", "rname")), 2L) 253 | expect_identical(NROW(bfcquery(bfc, "fl1", "rname", exact = TRUE)), 
1L) 254 | expect_identical(NROW(bfcquery(bfc, "fl", "rname", exact = TRUE)), 0L) 255 | ## bfcrpath 256 | expect_error( 257 | suppressWarnings(bfcrpath(bfc, "fl1", exact = FALSE)), 258 | "not all 'rnames' found or unique." 259 | ) 260 | expect_identical(bfcrpath(bfc, "fl1", exact = TRUE), fl1) 261 | expect_identical(bfcrpath(bfc, "fl1$", exact = FALSE), fl1) 262 | }) 263 | 264 | test_that("check_rtype works", { 265 | fun <- .util_standardize_rtype_helper 266 | 267 | # test web types 268 | expect_identical(fun("auto", "http://somepath.com"), "web") 269 | expect_identical(fun("auto", "ftp://somepath.com"), "web") 270 | expect_identical(fun("local", "https://some.path", "web"), "local") 271 | expect_identical(fun("relative", "https://some.path", "web"), "relative") 272 | 273 | # test not web type 274 | expect_identical(fun("auto", "not/a/web/path", "copy"), "relative") 275 | expect_identical(fun("auto", "not/a/web/path", "move"), "relative") 276 | expect_identical(fun("auto", "not/a/web/path", "asis"), "local") 277 | 278 | # expect noopt 279 | expect_identical(fun("local", "some/path", "copy"), "local") 280 | expect_identical(fun("local", "some/path", "move"), "local") 281 | expect_identical(fun("local", "some/path", "asis"), "local") 282 | 283 | expect_identical(fun("relative", "some/path", "copy"), "relative") 284 | expect_identical(fun("relative", "some/path", "move"), "relative") 285 | expect_warning(fun("relative", "some/path", "asis")) 286 | suppressWarnings({ 287 | expect_identical(fun("relative", "some/path", "asis"), "local") 288 | }) 289 | }) 290 | 291 | test_that("subsetting works", { 292 | # out of bounds 293 | expect_error(bfc[3:5]) 294 | expect_error(bfc[10]) 295 | 296 | # empty 297 | bfcsub3 <- bfc[] 298 | expect_identical(length(bfcsub3), length(bfc)) 299 | subin <- as.data.frame( 300 | bfcinfo(bfcsub3)[,-which(names(bfcinfo(bfcsub3)) == "access_time")]) 301 | bfcin <- as.data.frame( 302 | bfcinfo(bfc)[,-which(names(bfcinfo(bfc)) == "access_time")]) 303 
| expect_identical(subin, bfcin) 304 | 305 | # test restricted methods on subset 306 | expect_error(bfcnew(bfcsub3)) 307 | expect_error(bfcadd(bfcsub3)) 308 | expect_error(bfcupdate(bfcsub3, rname="test")) 309 | fltemp <- tempfile(); file.create(fltemp) 310 | expect_error(bfcsub3[[2]] <- fltemp) 311 | }) 312 | 313 | test_that("bfcupdate works", { 314 | # test [[<-, only updates rpath 315 | fl2 <- tempfile(); file.create(fl2) 316 | bfc[[rid2]] <- fl2 317 | expect_identical(unname(bfcpath(bfc, rid2)), fl2) 318 | expect_error(bfc[[rid1]] <- "A/file/doesnt/work") 319 | 320 | # test errors, files not found 321 | expect_error(bfcupdate(bfc, rid2, fpath="rid2/local/notweb", ask=FALSE)) 322 | suppressWarnings(expect_error(bfcupdate( 323 | bfc, rid3, fpath="https://httpbin.org/status/404", ask=FALSE 324 | ))) 325 | expect_error(bfcupdate(bfc, rid2, rpath="path/not/valid", ask=FALSE)) 326 | 327 | # test update fpath and rname 328 | link = "https://en.wikipedia.org/wiki/Bioconductor" 329 | suppressWarnings(bfcupdate( 330 | bfc, rid3, fpath=link, rname="prepQuery", ask=FALSE 331 | )) 332 | vl <- as.character(unname(as.data.frame( 333 | bfcinfo(bfc,rid3))[c("rname", "fpath")])) 334 | expect_identical(vl, c("prepQuery", link)) 335 | time <- as.data.frame(bfcinfo(bfc,rid3))$last_modified_time 336 | expect_identical(time, 337 | .httr_get_cache_info(link)[["modified"]]) 338 | 339 | # test rpath update and give second query example 340 | suppressWarnings(bfcupdate(bfc, rid1, rpath=fl2, rname="prepQuery2")) 341 | expect_identical(unname(bfcpath(bfc, rid1)), fl2) 342 | 343 | # test error 344 | expect_error(bfcupdate(bfc, c(rid2, rid1), rname="oneName")) 345 | expect_error(bfcupdate(bfc, 1:7)) 346 | }) 347 | 348 | test_that("bfcmeta works", { 349 | meta <- data.frame( 350 | rid=paste("BFC", seq_len(bfccount(bfc)), sep=""), 351 | num=seq(bfccount(bfc), 1, -1), 352 | data=c(paste("Letter", letters[seq_len(bfccount(bfc))])), 353 | stringsAsFactors=FALSE 354 | ) 355 | 356 | # test no meta 
357 | expect_identical(bfcmetalist(bfc), character(0)) 358 | expect_identical(names(bfcinfo(bfc)),bfcquerycols(bfc)) 359 | 360 | # try add meta with bad rid 361 | expect_error(bfcmeta(bfc, name="resourcedata") <- meta) 362 | # add valid 363 | meta$rid[6] = "BFC7" 364 | metaOrig = meta 365 | bfcmeta(bfc, name="resourcedata") <- meta 366 | expect_identical(bfcmetalist(bfc),"resourcedata") 367 | expect_true("resourcedata" %in% bfcmetalist(bfc)) 368 | 369 | # add additional 370 | bfcmeta(bfc, name="table2") <- meta 371 | expect_identical(length(bfcmetalist(bfc)), 2L) 372 | expect_true("table2" %in% bfcmetalist(bfc)) 373 | 374 | # try and add same table name 375 | meta$num = seq(1, bfccount(bfc), 1) 376 | expect_error(bfcmeta(bfc, name="table2") <- meta) 377 | bfcmeta(bfc, name="table2", overwrite=TRUE) <- meta 378 | expect_identical(length(bfcmetalist(bfc)), 2L) 379 | 380 | # try and add reserved table name 381 | expect_error(bfcmeta(bfc, name="metadata") <- meta) 382 | 383 | # try and add with reserved col name 384 | names(meta)[2] = "rpath" 385 | expect_error(bfcmeta(bfc, name="table3") <- meta) 386 | 387 | # try add meta with missing column rid 388 | names(meta)[1:2] = c("id", "num") 389 | expect_error(bfcmeta(bfc, name="table3") <- meta) 390 | 391 | # remove table 392 | bfcmetaremove(bfc, "table2") 393 | expect_identical(length(bfcmetalist(bfc)), 1L) 394 | expect_true(!("table2" %in% bfcmetalist(bfc))) 395 | 396 | # try and remove reserved table 397 | expect_error(bfcmetaremove(bfc, "metadata")) 398 | 399 | # retrieve table 400 | metaGet <- bfcmeta(bfc, "resourcedata") 401 | expect_true(all(metaGet == metaOrig)) 402 | 403 | # retrieve bad table 404 | expect_error(bfcmeta(bfc, "table2")) 405 | 406 | # querycols should include meta columns 407 | expect_true(all(names(metaOrig) %in% bfcquerycols(bfc))) 408 | expect_identical(names(bfcinfo(bfc)),bfcquerycols(bfc)) 409 | 410 | }) 411 | 412 | test_that("bfcquery and bfccount works", { 413 | 414 | # test count 415 | 
expect_identical(bfccount(bfc), bfccount(bfcinfo(bfc))) 416 | expect_identical(bfccount(bfc), length(bfc)) 417 | 418 | # query found 419 | q1 <- as.data.frame(bfcquery(bfc, "prep")) 420 | expect_identical(dim(q1)[1], 2L) 421 | expect_identical(q1$rid, c(rid1,rid3)) 422 | 423 | # test query on fpath 424 | q2 <- as.data.frame(bfcquery(bfc, "wiki")) 425 | expect_identical(dim(q2)[1], 2L) 426 | q2b <- as.data.frame(bfcquery(bfc, "wiki", field="fpath")) 427 | q2 <- q2[,-which(names(q2) == "access_time")] 428 | q2b <- q2b[,-which(names(q2b) == "access_time")] 429 | expect_true(all(q2 == q2b, na.rm=TRUE)) 430 | 431 | # query not found 432 | expect_identical(bfccount(bfcquery(bfc, "nothere")), 0L) 433 | 434 | # multiple value all found 435 | path <- file.path(bfccache(bfc), "myFile") 436 | file.create(path) 437 | bfc[[rid2]] <- path 438 | q3 <- as.data.frame(bfcquery(bfc, c("test-2", "myF"))) 439 | expect_identical(dim(q3)[1], 1L) 440 | expect_identical(q3$rid, rid2) 441 | 442 | # multi value some not found 443 | expect_identical(bfccount(bfcquery(bfc, c("prep", "not"))), 0L) 444 | 445 | # test case sensitive 446 | q3 <- as.data.frame(bfcquery(bfc, c("test-2", "myf"))) 447 | expect_identical(dim(q3)[1], 0L) 448 | q3 <- as.data.frame(bfcquery(bfc, c("test-2", "myf"), ignore.case=TRUE)) 449 | expect_identical(dim(q3)[1], 1L) 450 | 451 | # test exact 452 | q4 <- as.data.frame(bfcquery(bfc, "^test-4$")) 453 | expect_identical(dim(q4)[1], 1L) 454 | 455 | }) 456 | 457 | test_that("bfcneedsupdate works", { 458 | # test not web source 459 | expect_error(bfcneedsupdate(bfc, rid4)) 460 | # test out of bounds 461 | expect_error(bfcneedsupdate(bfc, 7)) 462 | 463 | # test expires and last modified not available 464 | link = "https://httpbin.org/get" 465 | bfcupdate(bfc, rid3, fpath=link, ask=FALSE) 466 | expect_true(is.na(bfcneedsupdate(bfc, rid3))) 467 | expect_true(is.na(as.data.frame(bfcinfo(bfc,rid3))$last_modified_time)) 468 | 
expect_true(is.na(as.data.frame(bfcinfo(bfc,rid3))$expires)) 469 | 470 | # remove those that aren't web 471 | expect_identical( 472 | length(bfcneedsupdate(bfc)), 473 | length(.get_all_web_rids(bfc)) 474 | ) 475 | expect_identical( 476 | names(bfcneedsupdate(bfc)), 477 | as.character(.get_all_web_rids(bfc)) 478 | ) 479 | 480 | # test that a not-yet-downloaded resource returns TRUE 481 | expect_true(bfcneedsupdate(bfc, rid5)) 482 | 483 | # test etag available and check order 484 | link = "https://www.wikipedia.org/" 485 | bfcupdate(bfc, rid3, fpath=link, ask=FALSE) 486 | cache_info <- .httr_get_cache_info(link) 487 | expect_identical(as.data.frame(bfcinfo(bfc,rid3))$last_modified_time, 488 | cache_info[["modified"]]) 489 | expect_identical(as.data.frame(bfcinfo(bfc,rid3))$etag, 490 | cache_info[["etag"]]) 491 | expect_true(!is.na(bfcneedsupdate(bfc, rid3))) 492 | expect_true(!is.na(as.data.frame(bfcinfo(bfc,rid3))$etag)) 493 | # wiki has expires so manually set to NA for testing 494 | .sql_set_expires(bfc, rid3, NA_character_) 495 | expect_false(bfcneedsupdate(bfc, rid3)) 496 | .sql_set_etag(bfc, rid3, "somethingElse") 497 | expect_true(bfcneedsupdate(bfc, rid3)) 498 | .sql_set_etag(bfc, rid3, NA_character_) 499 | expect_false(bfcneedsupdate(bfc, rid3)) 500 | .sql_set_last_modified(bfc, rid3, 501 | as.character(as.Date(.sql_get_last_modified(bfc, rid3)) - 1)) 502 | expect_true(bfcneedsupdate(bfc, rid3)) 503 | 504 | # manually test expires 505 | .sql_set_expires(bfc, rid3, as.character(as.Date(Sys.time()) + 2)) 506 | .sql_set_etag(bfc, rid3, NA_character_) 507 | .sql_set_last_modified(bfc, rid3,NA_character_) 508 | expect_true(is.na(bfcneedsupdate(bfc, rid3))) 509 | .sql_set_expires(bfc, rid3, as.character(as.Date(Sys.time()) -1)) 510 | expect_true(bfcneedsupdate(bfc, rid3)) 511 | }) 512 | 513 | test_that("bfcdownload works", { 514 | response <- .biocfilecache_flags$set_ask_response(FALSE) 515 | 516 | time1 <- file.info(.sql_get_rpath(bfc, rid3))[["ctime"]] 517 | temp <- bfcdownload(bfc,
rid3, ask=TRUE) 518 | time2 <- file.info(.sql_get_rpath(bfc, rid3))[["ctime"]] 519 | expect_identical(time1, time2) 520 | 521 | temp <- bfcdownload(bfc, rid3, ask=FALSE) 522 | time3 <- file.info(.sql_get_rpath(bfc, rid3))[["ctime"]] 523 | expect_true(time1 < time3) 524 | expect_error(bfcdownload(bfc, rid1)) 525 | 526 | url <- "http://bioconductor.org/packages/stats/bioc/BiocFileCache/BiocFileCache_stats.tab" 527 | headFile <- 528 | function(url, file) 529 | { 530 | dat <- readLines(url) 531 | dat <- head(dat, n=3L) 532 | writeLines(dat, file) 533 | TRUE 534 | } 535 | rid <- names(bfcadd(bfc, rname="testFun", fpath=url, download=FALSE)) 536 | temp <- bfcdownload(bfc, rid, FUN=headFile) 537 | file <- readLines(temp) 538 | expect_identical(length(file), 3L) 539 | 540 | expect_error(bfcdownload(bfc, rid, ask=FALSE, FUN=rnorm)) 541 | 542 | .biocfilecache_flags$set_ask_response(response) 543 | }) 544 | 545 | test_that("exportbfc and importbfc works",{ 546 | 547 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 548 | fl <- tempfile(); file.create(fl) 549 | add1 <- bfcadd(bfc, 'relative', fl) 550 | rid1 <- names(add1) 551 | add2 <- bfcadd(bfc, 'local', fl, rtype='local', action='asis') 552 | rid2 <- names(add2) 553 | url <- "https://httpbin.org/get" 554 | add3 <- bfcadd(bfc, 'web', url, rtype="web") 555 | rid3 <- names(add3) 556 | path <- bfcnew(bfc, 'notfound') 557 | rid4 <- names(path) 558 | url <- "https://httpbin.org/get" 559 | add5 <- bfcadd(bfc, 'webno', url, rtype="web", download=FALSE) 560 | rid5 <- names(add5) 561 | 562 | dirloc <- dirname(bfccache(bfc)) 563 | temploc <- file.path(dirloc, "ExportTest") 564 | dir.create(temploc) 565 | ids <- bfcrid(bfc) 566 | res <- vapply(ids, .util_export_file, character(1), 567 | bfc=bfc, dir=temploc) 568 | expect_identical(length(list.files(temploc)), 2L) 569 | expect_identical(unname(res), 570 | c("relative", "local", "relative", NA_character_, "web")) 571 | .util_unlink(temploc, recursive=TRUE) 572 | 
expect_false(file.exists(file.path(dirloc, "BFCExport.tar"))) 573 | file <- exportbfc(bfc, outputFile=file.path(dirloc, "BFCExport.tar"), 574 | verbose=FALSE) 575 | expect_true(file.exists(file.path(dirloc, "BFCExport.tar"))) 576 | expect_false(dir.exists("BiocFileCacheExport")) 577 | bfc2 <- importbfc(file, exdir=dirloc) 578 | expect_true(dir.exists(file.path(dirloc,"BiocFileCacheExport"))) 579 | expect_identical(bfccount(bfc2), 4L) 580 | locpath <- file.path(dirloc, "BiocFileCacheExport") 581 | expect_true(file.exists(file.path(locpath,"BiocFileCache.sqlite"))) 582 | expect_identical(length(list.files(locpath)), 4L) 583 | sub <- bfc[c(rid1,rid2)] 584 | .util_unlink(locpath, recursive=TRUE) 585 | file.remove(file) 586 | file <- exportbfc(sub, outputFile=file.path(dirloc, "SubExport.zip"), 587 | verbose=FALSE, outputMethod="zip") 588 | expect_true(file.exists(file.path(dirloc, "SubExport.zip"))) 589 | bfc3 <- importbfc(file, exdir=dirloc, archiveMethod="unzip") 590 | expect_identical(bfccount(bfc3), 2L) 591 | .util_unlink(locpath, recursive=TRUE) 592 | file.remove(file) 593 | removebfc(bfc, ask=FALSE) 594 | }) 595 | 596 | test_that("bfcsync and bfcremove works", { 597 | response <- .biocfilecache_flags$set_ask_response(FALSE) 598 | ## setup 599 | bfc2 <- BiocFileCache(tempfile(), ask = FALSE) 600 | fl <- tempfile(); file.create(fl) 601 | add1 <- bfcadd(bfc2, 'test-1', fl) 602 | rid1 <- names(add1) 603 | add2 <- bfcadd(bfc2, 'test-2', fl, rtype='local', action='asis') 604 | rid2 <- names(add2) 605 | url <- "https://httpbin.org/get" 606 | add3 <- bfcadd(bfc2, 'test-3', url, rtype="web") 607 | rid3 <- names(add3) 608 | path <- bfcnew(bfc2, 'test-4') 609 | rid4 <- names(path) 610 | suppressWarnings(bfcupdate(bfc2, rid1, rpath=add3)) 611 | add5 <- bfcnew(bfc2, "test-5", rtype="relative") 612 | rid5 <- names(add5) 613 | add6 <- bfcadd(bfc2, "test-6", fl, rtype="relative") 614 | rid6 <- names(add6) 615 | 616 | # test sync 617 | expect_message(bfcsync(bfc2)) 618 | 
expect_false(bfcsync(bfc2, FALSE)) 619 | 620 | bfcremove(bfc2, rid4) 621 | bfcremove(bfc2, rid5) 622 | files <- file.path( 623 | bfccache(bfc2), 624 | setdiff(list.files(bfccache(bfc2)), c("BiocFileCache.sqlite", "BiocFileCache.sqlite.LOCK")) 625 | ) 626 | # normalizePath() only on Windows; it can't be applied across platforms 627 | # because it is a no-op on Linux but prepends the hidden /private 628 | # directory on macOS 629 | paths <- .sql_get_rpath(bfc2, bfcrid(bfc2)) 630 | if (tolower(.Platform$OS.type) == "windows"){ 631 | files = normalizePath(files) 632 | paths = normalizePath(paths) 633 | } 634 | untracked <- setdiff(files, paths) 635 | .util_unlink(untracked) 636 | expect_true(bfcsync(bfc2, FALSE)) 637 | 638 | # test that remove deletes the file if it is in the cache 639 | path <- .sql_get_rpath(bfc2, rid3) 640 | expect_true(file.exists(path)) 641 | bfcremove(bfc2, rid3) 642 | expect_false(file.exists(path)) 643 | 644 | # test that remove leaves the file if it is not in the cache 645 | path <- .sql_get_rpath(bfc2, rid2) 646 | expect_true(file.exists(path)) 647 | bfcremove(bfc2, rid2) 648 | expect_true(file.exists(path)) 649 | 650 | .biocfilecache_flags$set_ask_response(response) 651 | }) 652 | 653 | test_that("cleanbfc works", { 654 | # can't test the function directly, but test its helper 655 | expect_true(length(.sql_clean_cache(bfc, 1)) == 0L) 656 | 657 | # manually change access_time so it is older than a day 658 | sql<- "UPDATE resource SET access_time = '2016-01-01' WHERE rid = :rid" 659 | .sql_db_execute(bfc, sql, rid = rid1) 660 | expect_identical(.sql_clean_cache(bfc, 1), rid1) 661 | 662 | ## cleanbfc() works on an empty cache 663 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 664 | expect_identical(character(0), .sql_clean_cache(bfc, 1L)) 665 | }) 666 | 667 | test_that("removebfc works", { 668 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 669 | path <- bfccache(bfc) 670 | expect_true(file.exists(path)) 671 | expect_true(removebfc(bfc, ask=FALSE)) 672 | expect_false(file.exists(path)) 673 | }) 674 |
-------------------------------------------------------------------------------- /tests/testthat/test_httr.R: -------------------------------------------------------------------------------- 1 | context("httr2") 2 | 3 | test_that("internal .httr_get_cache_info works", { 4 | 5 | # example neither 6 | url <- "https://httpbin.org/get" 7 | info <- .httr_get_cache_info(url) 8 | expect_true(all(is.na(info))) 9 | expect_identical(length(info), 3L) 10 | expect_identical(names(info), c("etag", "modified", "expires")) 11 | 12 | # example both 13 | url <- "https://www.wikipedia.org/" 14 | info <- .httr_get_cache_info(url) 15 | expect_true(all(!is.na(info))) 16 | expect_identical(length(info), 3L) 17 | expect_identical(names(info), c("etag", "modified", "expires")) 18 | 19 | # example only time 20 | url <- "https://en.wikipedia.org/wiki/Bioconductor" 21 | info <- .httr_get_cache_info(url) 22 | expect_true(is.na(info[["etag"]])) 23 | expect_true(!is.na(info[["modified"]])) 24 | expect_true(!is.na(info[["expires"]])) 25 | expect_identical(length(info), 3L) 26 | expect_identical(names(info), c("etag", "modified", "expires")) 27 | # more tests in bfcneedsupdate 28 | 29 | }) 30 | -------------------------------------------------------------------------------- /tests/testthat/test_makeBiocFileCacheFromDataFrame.R: -------------------------------------------------------------------------------- 1 | context("makeBiocFileCacheFromDataFrame") 2 | 3 | test_that("makeBiocFileCacheFromDataFrame works",{ 4 | 5 | bfc2 <- BiocFileCache(tempfile(), ask = FALSE) 6 | fl <- tempfile(); file.create(fl) 7 | add1 <- bfcadd(bfc2, 'relative', fl) 8 | add2 <- bfcadd(bfc2, 'local', fl, rtype='local', action='asis') 9 | url <- "https://httpbin.org/get" 10 | add3 <- bfcadd(bfc2, 'noDown', url, rtype="web", download=FALSE) 11 | url <- "https://www.wikipedia.org/" 12 | add4 <- bfcadd(bfc2, 'web', url, rtype="web") 13 | fl <- tempfile(); file.create(fl) 14 | add5 <- bfcadd(bfc2, 'localnoexist', fl, 
rtype='local', action='asis') 15 | file.remove(fl) 16 | 17 | temp = bfcinfo(bfc2) 18 | 19 | # error directory already exists 20 | expect_error(makeBiocFileCacheFromDataFrame( 21 | temp, cache=bfccache(bfc2), actionLocal="copy", actionWeb="copy", 22 | metadataName="resourceMetadata", ask = FALSE 23 | ), "!dir.exists\\(cache\\) is not TRUE") 24 | 25 | newcache <- file.path(dirname(bfccache(bfc2)), "testNEW") 26 | 27 | # error reserved column names 28 | expect_error(makeBiocFileCacheFromDataFrame( 29 | temp, cache=newcache, actionLocal="copy", actionWeb="copy", ask = FALSE 30 | ), "The following are reserved column names:.*") 31 | 32 | names(temp)[1] = "origID" 33 | names(temp)[3] = "origTimeC" 34 | names(temp)[4] = "origTimeA" 35 | 36 | # expect error metadataName missing without default 37 | expect_error(makeBiocFileCacheFromDataFrame( 38 | temp, cache=newcache, actionLocal="copy", actionWeb="copy", ask = FALSE 39 | ), "!missing\\(metadataName\\) is not TRUE") 40 | 41 | 42 | # error relative path 43 | expect_error(makeBiocFileCacheFromDataFrame( 44 | temp, cache=newcache, actionLocal="copy", actionWeb="copy", 45 | metadataName="resourceMetadata", ask = FALSE 46 | )) 47 | 48 | temp= temp[which(temp$rtype != "relative"),] 49 | 50 | # error local file doesn't exist 51 | expect_error(makeBiocFileCacheFromDataFrame( 52 | temp,cache=newcache, actionLocal="copy", actionWeb="copy", 53 | metadataName="resourceMetadata", ask = FALSE 54 | )) 55 | 56 | temp = temp[-which(temp$origID == names(add5)),] 57 | temp$rpath = unname(bfcrpath(bfc2, 58 | rids=as.character(temp$origID))) 59 | 60 | removebfc(bfc2, ask=FALSE) 61 | newbfc <- makeBiocFileCacheFromDataFrame( 62 | temp, cache=newcache, actionLocal="copy", actionWeb="copy", 63 | metadataName="resourceMetadata", ask = FALSE 64 | ) 65 | 66 | expect_identical(length(newbfc), 3L) 67 | expect_identical(length(.get_all_web_rids(newbfc)), 2L) 68 | expect_identical(length(.get_nonrelative_ids(newbfc)), 0L) 69 | # neither web file will 
be found, only local and sqlite 70 | expect_identical(length(list.files(bfccache(newbfc))), 2L) 71 | expect_identical(ncol(bfcinfo(newbfc)), 13L) 72 | expect_identical(length(bfcmetalist(newbfc)), 1L) 73 | expect_identical(bfcinfo(newbfc)$origID, temp$origID) 74 | expect_identical(bfcinfo(newbfc)$etag, temp$etag) 75 | expect_identical(bfcinfo(newbfc)$fpath, temp$fpath) 76 | expect_true(all(bfcinfo(newbfc)$rpath != temp$rpath)) 77 | 78 | 79 | removebfc(newbfc, ask=FALSE) 80 | names(temp)[2] = "origRname" 81 | names(temp)[8] = "origlmt" 82 | names(temp)[9] = "origetag" 83 | newbfc <- makeBiocFileCacheFromDataFrame( 84 | temp,cache=newcache, actionLocal="copy", actionWeb="copy", 85 | metadataName="resourceMetadata", ask = FALSE 86 | ) 87 | expect_identical(ncol(bfcinfo(newbfc)), 16L) 88 | 89 | removebfc(newbfc, ask=FALSE) 90 | # fails because a required column is not available 91 | names(temp)[5] = "origrpath" 92 | expect_error(makeBiocFileCacheFromDataFrame( 93 | temp, cache=newcache, actionLocal="copy", actionWeb="copy", ask = FALSE 94 | )) 95 | names(temp)[5] = "rpath" 96 | names(temp)[6] = "origrtype" 97 | expect_error(makeBiocFileCacheFromDataFrame(temp,cache=newcache, 98 | actionLocal="copy", actionWeb="copy", ask = FALSE)) 99 | names(temp)[6] = "rtype" 100 | names(temp)[7] = "origfpath" 101 | expect_error(makeBiocFileCacheFromDataFrame( 102 | temp, cache=newcache, actionLocal="copy", actionWeb="copy", ask = FALSE 103 | )) 104 | names(temp)[7] = "fpath" 105 | temp <- temp[,c("fpath","rpath","rtype")] 106 | newbfc <- makeBiocFileCacheFromDataFrame( 107 | temp, cache=newcache, actionLocal="copy", actionWeb="copy", ask = FALSE 108 | ) 109 | expect_identical(ncol(bfcinfo(newbfc)), 10L) 110 | expect_identical(length(bfcmetalist(newbfc)), 0L) 111 | removebfc(newbfc, ask=FALSE) 112 | }) 113 | -------------------------------------------------------------------------------- /tests/testthat/test_sql.R: -------------------------------------------------------------------------------- 1 |
context("sql") 2 | 3 | test_that("schema versioning works", { 4 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 5 | expect_identical(.sql_schema_version(bfc), .CURRENT_SCHEMA_VERSION) 6 | expect_identical(.sql_validate_version(bfc), .CURRENT_SCHEMA_VERSION) 7 | 8 | .sql_migration_update_schema_version(bfc, "0.99.1") 9 | expect_identical(.sql_schema_version(bfc), "0.99.1") 10 | 11 | .sql_migration_update_schema_version(bfc, "0.0.1") 12 | expect_error(.sql_validate_version(bfc), "unsupported schema version") 13 | }) 14 | 15 | test_that(".sql_add_resource() works", { 16 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 17 | expect_identical(.sql_get_nrows(bfcinfo(bfc)), 0L) 18 | 19 | rpath <- .sql_add_resource(bfc, "foo1", "rtype", "fpath", NA) 20 | expect_identical(names(rpath), "BFC1" ) 21 | 22 | rpath <- .sql_add_resource(bfc, c("foo2", "foo3"), "rtype", "fpath", NA) 23 | expect_identical(names(rpath), c("BFC2", "BFC3")) 24 | 25 | .sql_remove_resource(bfc, "BFC3") 26 | rpath <- .sql_add_resource(bfc, "foo4", "rtype", "fpath", NA) 27 | expect_identical(names(rpath), "BFC4") 28 | }) 29 | 30 | test_that(".sql_add_resource(ext=.) 
works", { 31 | getext <- function(rpath) { 32 | ext <- tools::file_ext(rpath) 33 | ext[nzchar(ext)] <- sprintf(".%s", ext)[nzchar(ext)] 34 | ext 35 | } 36 | 37 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 38 | 39 | ext <- "" 40 | rpath <- .sql_add_resource(bfc, "foo1", "rtype", "fpath") 41 | expect_identical(getext(rpath), ext) 42 | 43 | ext <- ".ext2" 44 | rpath <- .sql_add_resource(bfc, "foo2", "rtype", "fpath", ext) 45 | expect_identical(getext(rpath), ext) 46 | 47 | ext <- ".ext3" 48 | rpath <- .sql_add_resource(bfc, c("foo3", "foo4"), "rtype", "fpath", ext) 49 | expect_identical(getext(rpath), c(ext, ext)) 50 | 51 | ext <- c(".ext5", ".ext6") 52 | rpath <- .sql_add_resource(bfc, c("foo5", "foo6"), "rtype", "fpath", ext) 53 | expect_identical(getext(rpath), ext) 54 | }) 55 | 56 | test_that(".sql_add_resource() sets last_modified_time to NA", { 57 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 58 | 59 | rid <- names(.sql_add_resource(bfc, "foo1", "rtype", "fpath")) 60 | expected <- setNames(NA_real_, "BFC1") 61 | expect_identical(.sql_get_last_modified(bfc, rid), expected) 62 | expected <- setNames(NA_character_, "BFC1") 63 | expect_identical(.sql_get_etag(bfc, rid), expected) 64 | }) 65 | 66 | test_that(".sql_remove_resource() works", { 67 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 68 | expect_identical(.sql_remove_resource(bfc, "BFC1"), 0L) 69 | 70 | rpath <- .sql_add_resource(bfc, paste0("foo", 1:4), "rtype", "fpath", NA) 71 | expect_identical(.sql_remove_resource(bfc, "BFC1"), 1L) 72 | expect_identical(.sql_remove_resource(bfc, paste0("BFC", c(2, 4))), 2L) 73 | expect_identical(bfccount(bfc), 1L) 74 | }) 75 | 76 | test_that(".sql_get_field() works", { 77 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 78 | 79 | rname <- "foo1" 80 | rid <- names(.sql_add_resource(bfc, rname, "local", "fpath")) 81 | expect_identical(.sql_get_field(bfc, rid, "rname"), setNames(rname, rid)) 82 | 83 | rname <- c("foo2", "foo3") 84 | rid <- names(.sql_add_resource(bfc, 
rname, "local", "fpath")) 85 | expect_identical(.sql_get_field(bfc, rid, "rname"), setNames(rname, rid)) 86 | }) 87 | 88 | test_that(".sql_set_rpath() works", { 89 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 90 | rid <- names(.sql_add_resource(bfc, paste0("foo", 1:3), "local", "fpath")) 91 | 92 | rpath <- "bar1" 93 | .sql_set_rpath(bfc, rid[1], rpath) 94 | expect_identical(.sql_get_rpath(bfc, rid[1]), setNames(rpath, rid[1])) 95 | 96 | rpath <- paste0("bar", 2:3) 97 | .sql_set_rpath(bfc, rid[2:3], rpath) 98 | expect_identical(.sql_get_rpath(bfc, rid[2:3]), setNames(rpath, rid[2:3])) 99 | }) 100 | 101 | test_that(".sql_get_rpath() works", { 102 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 103 | 104 | fpath <- "fpath" 105 | rid <- names(.sql_add_resource(bfc, "foo1", "local", fpath)) 106 | .sql_set_rpath(bfc, rid, fpath) 107 | expect_identical(.sql_get_rpath(bfc, rid), setNames(fpath, rid)) 108 | 109 | rid <- names(.sql_add_resource(bfc, "foo2", "relative", fpath)) 110 | .sql_set_rpath(bfc, rid, fpath) 111 | expected <- setNames(file.path(bfccache(bfc), fpath), rid) 112 | expect_identical(.sql_get_rpath(bfc, rid), expected) 113 | 114 | rid <- names(.sql_add_resource(bfc, "foo3", "web", fpath)) 115 | .sql_set_rpath(bfc, rid, fpath) 116 | expected <- setNames(file.path(bfccache(bfc), fpath), rid) 117 | expect_identical(.sql_get_rpath(bfc, rid), expected) 118 | 119 | rid <- paste0("BFC", 1:3) 120 | expected <- setNames(c(fpath, expected, expected), rid) 121 | expect_identical(.sql_get_rpath(bfc, rid), expected) 122 | }) 123 | 124 | test_that(".sql_add_resource() changes remote special", { 125 | 126 | url = 127 | "https://bioconductorhubs.blob.core.windows.net/annotationhub/ncbi/uniprot/3.7/org.'Caballeronia_concitans'.eg.sqlite" 128 | bfc <- BiocFileCache(tempfile(), ask = FALSE) 129 | rpath <- path.expand(tempfile("", bfccache(bfc))) 130 | ext <- "" 131 | bfname <- basename(url) 132 | bfname <- curl::curl_escape(bfname) 133 | rpath <- sprintf("%s_%s%s", rpath, 
bfname, ext) 134 | id1 <- bfcadd(bfc, url) 135 | expect_identical(unname(.sql_get_rname(bfc, names(id1))), url) 136 | expect_identical(unname(.sql_get_fpath(bfc, names(id1))), url) 137 | # can't use identical() because the rpath contains a random identifier 138 | expect_true(grepl(bfname, basename(unname(.sql_get_rpath(bfc, names(id1)))))) 139 | 140 | }) 141 | -------------------------------------------------------------------------------- /tests/testthat/test_sql_migration.R: -------------------------------------------------------------------------------- 1 | context("sql_migration") 2 | 3 | ## TODO: migration tests 4 | -------------------------------------------------------------------------------- /tests/testthat/test_utility.R: -------------------------------------------------------------------------------- 1 | context("utility") 2 | 3 | test_that("utility works", { 4 | response <- .util_ask("", .interactive = FALSE) 5 | expect_identical(response, FALSE) 6 | }) 7 | -------------------------------------------------------------------------------- /vignettes/BiocFileCache.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "BiocFileCache: Managing File Resources Across Sessions" 3 | author: Lori Shepherd 4 | output: 5 | BiocStyle::html_document: 6 | toc: true 7 | toc_depth: 2 8 | vignette: > 9 | %\VignetteEngine{knitr::rmarkdown} 10 | %\VignetteIndexEntry{1. BiocFileCache Overview: Managing File Resources Across Sessions} 11 | %\VignetteEncoding{UTF-8} 12 | %\VignetteDepends{rtracklayer} 13 | --- 14 | 15 | ```{r setup, echo=FALSE} 16 | knitr::opts_chunk$set(collapse=TRUE) 17 | ``` 18 | 19 | # Overview 20 | 21 | Organization of files on a local machine can be cumbersome. This is especially 22 | true for local copies of remote resources that may periodically require a new 23 | download to have the most up-to-date information available.
[BiocFileCache][] is 24 | designed to help manage local and remote resource files that are stored locally. It 25 | provides a convenient location to organize files and, once files are under cache 26 | management, provides functions to determine whether remote resources are 27 | out of date and require a new download. 28 | 29 | ## Installation and Loading 30 | 31 | `BiocFileCache` is a _Bioconductor_ package and can be installed through 32 | `BiocManager::install()`. 33 | 34 | ```{r, eval = FALSE} 35 | if (!"BiocManager" %in% rownames(installed.packages())) 36 | install.packages("BiocManager") 37 | BiocManager::install("BiocFileCache", dependencies=TRUE) 38 | ``` 39 | 40 | After the package is installed, it can be loaded into the _R_ workspace with 41 | 42 | ```{r, library, results='hide', warning=FALSE, message=FALSE} 43 | library(BiocFileCache) 44 | ``` 45 | 46 | ## Creating / Loading the Cache 47 | 48 | The first step in using [BiocFileCache][] to manage files is to create a 49 | cache object at a specified location. We will create a temporary directory for use 50 | with the examples in this vignette. If a path is not specified upon creation, the 51 | default location is a directory `~/.BiocFileCache` in the typical user cache 52 | directory as defined by `tools::R_user_dir("", which="cache")`. 53 | 54 | ```{r, create} 55 | path <- tempfile() 56 | bfc <- BiocFileCache(path, ask = FALSE) 57 | ``` 58 | 59 | If the path location exists and has previously been used to store files, the 60 | existing cache will be loaded along with any files saved to it. If the path 61 | location does not exist, the user will be prompted to create the new directory. 62 | If the session is not interactive (so the user cannot be prompted) or the user declines to 63 | create the directory, a temporary directory will be used. 
64 | 65 | Some utility functions to examine the cache are: 66 | 67 | * `bfccache(bfc)` 68 | * `length(bfc)` 69 | * `show(bfc)` 70 | * `bfcinfo(bfc)` 71 | 72 | `bfccache()` will show the cache path. **NOTE**: Because we are using temporary 73 | directories, your path location will be different from what is shown. 74 | 75 | ```{r, cacheloc} 76 | bfccache(bfc) 77 | length(bfc) 78 | ``` 79 | 80 | `length()` on a BiocFileCache will show the number of files currently being 81 | tracked by the `BiocFileCache`. For more detailed information on what is stored 82 | in the `BiocFileCache` object, there is a show method which will display the 83 | object, object class, cache path, and number of items currently being tracked. 84 | 85 | ```{r, bfcshow} 86 | bfc 87 | ``` 88 | 89 | `bfcinfo()` will list a table of `BiocFileCache` resource files being tracked in 90 | the cache. It returns a [dplyr][] object of class `tbl_sqlite`. 91 | 92 | ```{r, bfcinfo} 93 | bfcinfo(bfc) 94 | ``` 95 | 96 | The table of resource files includes the following information: 97 | 98 | * `rid`: resource id. Autogenerated. This is a unique identifier automatically 99 | generated when a resource is added to the cache. 100 | * `rname`: resource name. This is given by the user when a resource is added to 101 | the cache. It does not have to be unique and can be updated at any time. We 102 | recommend descriptive key words and identifiers. 103 | * `create_time`: The date and time a resource is added to the cache. 104 | * `access_time`: The date and time a resource is utilized within the cache. The 105 | access time is updated when the resource is updated or downloaded. 106 | * `rpath`: resource path. This is the path to the local file. 107 | * `rtype`: resource type. One of "local", "relative", or "web", indicating whether the resource 108 | has a remote origin. 109 | * `fpath`: If rtype is "web", this is the link to the remote resource. It will 110 | be utilized to download the remote data. 
111 | * `last_modified_time`: For a remote resource, the last_modified (if available) 112 | information for the local copy of the data. This information is checked 113 | against the remote resource to determine if the local copy is stale and needs 114 | to be updated. If it is not available or your resource is not a remote 115 | resource, the last modified time will be marked as NA. 116 | * `etag`: For a remote resource, the etag (if available) information for the 117 | local copy of the data. This information is checked against the remote 118 | resource to determine if the local copy is stale and needs to be updated. If 119 | it is not available or your resource is not a remote resource, the etag will 120 | be marked as NA. 121 | * `expires`: For a remote resource, the expires (if available) information for 122 | the local copy of the data. This information is checked against 123 | `Sys.time()` to determine if the local copy needs to be updated. If it is not 124 | available or your resource is not a remote resource, the expires value will be 125 | marked as NA. 126 | 127 | Now that we have created the cache object and location, let's explore adding 128 | files that the cache will manage! 129 | 130 | ## Adding / Tracking Resources 131 | 132 | Now that a `BiocFileCache` object and cache location have been created, files can 133 | be added to the cache for tracking. There are two functions to add a resource to 134 | the cache: 135 | 136 | * `bfcnew()` 137 | * `bfcadd()` 138 | 139 | The difference between the two: `bfcnew()` creates an entry for a resource 140 | and returns a file path to save to. As there are many types of data that can be 141 | saved in many different ways, `bfcnew()` allows you to save any _R_ data object 142 | in the appropriate manner and still be able to track the saved file. `bfcadd()` 143 | should be used when a file already exists or a remote resource is being 144 | accessed. 
145 | 146 | `bfcnew` takes the `BiocFileCache` object and a user-specified `rname` and 147 | returns a path location to save data to. Optionally, you can add the file 148 | extension if you know the type of file that will be saved: 149 | 150 | ```{r, bfcnew} 151 | savepath <- bfcnew(bfc, "NewResource", ext=".RData") 152 | savepath 153 | 154 | ## now we can use that path in any save function 155 | m <- matrix(1:12, nrow=3) 156 | save(m, file=savepath) 157 | 158 | ## and that file will be tracked in the cache 159 | bfcinfo(bfc) 160 | ``` 161 | 162 | `bfcadd()` is for existing files or remote resources. The user still 163 | specifies an `rname` of their choosing but must also specify the path to a local file 164 | or web resource as `fpath`. If no `fpath` is given, the `rname` is assumed 165 | to also be the path location. If `fpath` is a local file, the `action` argument 166 | determines how it is handled: `copy` the existing file into the cache directory, 167 | `move` the existing file into the cache directory, or leave the file (`asis`) wherever 168 | it is on the local system while still tracking it through the cache object. `copy` 169 | and `move` rename the file to the generated cache file path. If `fpath` 170 | is a remote source, BiocFileCache attempts to download it; if the download succeeds, the file 171 | is saved in the cache location and tracked in the cache object, and the original 172 | source is recorded in the cache information as `fpath`. If the user does not 173 | want the remote resource to be downloaded initially, the argument 174 | `download=FALSE` may be used to add the resource to the cache but delay the 175 | download. Relative path locations may also be used, specified with 176 | `rtype = "relative"`. This stores a relative location for the file within 177 | the cache; only actions `copy` and `move` are 178 | available for relative paths. 
179 | 180 | First let's use local files: 181 | 182 | ```{r, bfcadd} 183 | fl1 <- tempfile(); file.create(fl1) 184 | add2 <- bfcadd(bfc, "Test_addCopy", fl1) # copy 185 | # returns filepath being tracked in cache 186 | add2 187 | # the name is the unique rid in the cache 188 | rid2 <- names(add2) 189 | 190 | fl2 <- tempfile(); file.create(fl2) 191 | add3 <- bfcadd(bfc, "Test2_addMove", fl2, action="move") # move 192 | rid3 <- names(add3) 193 | 194 | fl3 <- tempfile(); file.create(fl3) 195 | add4 <- bfcadd(bfc, "Test3_addAsis", fl3, rtype="local", 196 | action="asis") # reference 197 | rid4 <- names(add4) 198 | 199 | file.exists(fl1) # TRUE - copied from original location 200 | file.exists(fl2) # FALSE - moved from original location 201 | file.exists(fl3) # TRUE - left asis, original location tracked 202 | ``` 203 | 204 | Now let's add some examples with remote sources: 205 | 206 | ```{r, bfcaddremote} 207 | url <- "https://httpbin.org/get" 208 | add5 <- bfcadd(bfc, "TestWeb", fpath=url) 209 | rid5 <- names(add5) 210 | 211 | url2<- "https://bioconductor.org/packages/stats/bioc/BiocFileCache/BiocFileCache_2024_stats.tab" 212 | add6 <- bfcadd(bfc, "TestWeb", fpath=url2) 213 | rid6 <- names(add6) 214 | 215 | # add a remote resource but don't initially download 216 | add7 <- bfcadd(bfc, "TestNoDweb", fpath=url2, download=FALSE) 217 | rid7 <- names(add7) 218 | # let's look at our BiocFileCache object now 219 | bfc 220 | bfcinfo(bfc) 221 | ``` 222 | 223 | Now that we are tracking resources, let's explore accessing their information! 224 | 225 | ### Caveat 226 | 227 | Files will by default have a unique identifier added to the start of the 228 | original file name (identifier_originalName) when added to the cache to allow 229 | for multiple versions of the same file name. There is an option to override this 230 | default behavior by using the `fname` argument of `bfcadd` or `bfcnew`. `fname` 231 | takes one of two options: `unique` or `exact`. 
The `unique` option is the 232 | default and adds a unique identifier to the original file name. The `exact` 233 | option overrides this: no unique identifier is added, and the file keeps exactly the 234 | original file name. 235 | 236 | 237 | ## Investigating / Accessing Resources 238 | 239 | Before exploring individual resources, we introduce a helper function. Most of 240 | the functions provided require the unique rid(s) assigned to a resource. 241 | `bfcadd` and `bfcnew` return the path as a named character vector whose names 242 | are the rids. However, you may want to access a resource 243 | that you added some time ago. 244 | 245 | * `bfcquery()` 246 | 247 | `bfcquery()` takes a keyword and searches across the `rname`, `rpath`, and 248 | `fpath` for any matching entries. The columns that are searched can be 249 | controlled with the argument `field`. 250 | 251 | ```{r, bfcquery} 252 | bfcquery(bfc, "Web") 253 | 254 | bfcquery(bfc, "copy") 255 | 256 | q1 <- bfcquery(bfc, "BiocFileCache") 257 | q1 258 | class(q1) 259 | ``` 260 | 261 | As you can see above, `bfcquery()` returns an object of class `tbl_sql` that can 262 | be investigated further using methods for these classes, such as those in the 263 | `dplyr` package. The `rid`, shown in the first column of the table, can be 264 | used in other functions. To get a quick count of how many objects in the cache 265 | matched the query, use `bfccount()`. 266 | 267 | ```{r, bfccount} 268 | bfccount(q1) 269 | ``` 270 | 271 | * `[` 272 | 273 | `[` allows for subsetting of the BiocFileCache object. The output will be a 274 | BiocFileSubCache object. Users will still be able to query, remove (from the 275 | subset object only), and access resources of the subset; however, the resources 276 | cannot be updated. 
277 | 278 | ```{r, bfcsubset} 279 | bfcsubWeb <- bfc[paste0("BFC", 5:6)] 280 | bfcsubWeb 281 | bfcinfo(bfcsubWeb) 282 | ``` 283 | 284 | There are three methods for retrieving the `BiocFileCache` resource path 285 | location. 286 | 287 | * `[[` 288 | * `bfcpath()` 289 | * `bfcrpath()` 290 | 291 | `[[` accesses the `rpath` saved in the `BiocFileCache`. Retrieving this 292 | location returns the path to the local version of the resource, allowing the 293 | user to then use this path in whichever load/read method is most appropriate for the 294 | resource. `bfcpath()` and `bfcrpath()` both return a named character vector 295 | that also displays the local file that can be used for retrieval. `bfcpath()` 296 | requires `rids`, while `bfcrpath()` can use `rids` or `rnames` (but not 297 | both). `bfcrpath()` can also be used to add a resource to the cache when `rnames` 298 | are specified: if an element of `rnames` is not found, it will try to add it to 299 | the cache with `bfcadd()`. 300 | 301 | 302 | ```{r, bfcbracket} 303 | bfc[["BFC2"]] 304 | bfcpath(bfc, "BFC2") 305 | bfcpath(bfc, "BFC5") 306 | bfcrpath(bfc, rids="BFC5") 307 | bfcrpath(bfc) 308 | bfcrpath(bfc, c("https://httpbin.org/get","Test3_addAsis")) 309 | ``` 310 | 311 | Managing remote resources locally involves knowing when to update the local copy 312 | of the data. 313 | 314 | * `bfcneedsupdate()` 315 | 316 | `bfcneedsupdate()` is a method that compares the etag and last_modified time 317 | of the local copy of the data to the etag and last_modified time of the remote 318 | resource, and also checks an expires time. The cache saves this information when the 319 | web resource is initially added. The expires time is checked against the current 320 | `Sys.time()` to see whether the local resource has expired. If it has, the resource is deemed 321 | to need an update; if the expires time is unavailable or has not passed, the etag and 322 | last_modified_time are checked. 
The etag information is used definitively if it is 323 | available; if it is not available, the last_modified time is checked. If the 324 | resource does not have a last_modified time either, the result is undetermined (`NA`). If the 325 | resource has not been downloaded yet, the result is `TRUE`. 326 | 327 | **Note:** This function does not automatically download the remote source if it 328 | is out of date. Please see `bfcdownload()`. 329 | 330 | ```{r, bfcneedsupdate} 331 | bfcneedsupdate(bfc, "BFC5") 332 | bfcneedsupdate(bfc, "BFC6") 333 | bfcneedsupdate(bfc) 334 | ``` 335 | 336 | ## Updating Resource Entries or Local Copy of Remote Data 337 | 338 | Just as you could access the `rpath`, the local resource path can be set with 339 | 340 | * `[[<-` 341 | 342 | The file must exist in order to be replaced in the `BiocFileCache`. If the user 343 | wishes to rename, they must first copy (or touch) the file. 344 | 345 | ```{r, bfcrename} 346 | fileBeingReplaced <- bfc[[rid3]] 347 | fileBeingReplaced 348 | 349 | # fl3 was created when we were adding resources 350 | fl3 351 | 352 | bfc[[rid3]] <- fl3 353 | bfc[[rid3]] 354 | ``` 355 | 356 | The user may also wish to change the `rname` or `fpath` associated with a 357 | resource in addition to the `rpath`. This can be done with 358 | 359 | * `bfcupdate()` 360 | 361 | Again, if changing the `rpath`, the file must exist. If `fpath` is being 362 | updated, the data will be downloaded and the user will be prompted to overwrite 363 | the current file specified in `rpath`. If the user does not want to be prompted 364 | about overwriting files, `ask=FALSE` may be used. 
365 | 366 | ```{r, bfcupdate} 367 | bfcinfo(bfc, "BFC1") 368 | bfcupdate(bfc, "BFC1", rname="FirstEntry") 369 | bfcinfo(bfc, "BFC1") 370 | ``` 371 | 372 | Now let's update a web resource: 373 | 374 | ```{r, bfcupdateremote} 375 | suppressPackageStartupMessages({ 376 | library(dplyr) 377 | }) 378 | bfcinfo(bfc, "BFC6") %>% select(rid, rpath, fpath) 379 | bfcupdate(bfc, "BFC6", fpath=url, rname="Duplicate", ask=FALSE) 380 | bfcinfo(bfc, "BFC6") %>% select(rid, rpath, fpath) 381 | ``` 382 | 383 | Lastly, remote resources may require an update if the data are out of date (see 384 | `bfcneedsupdate()`). The `bfcdownload` function will attempt to download from 385 | the original resource saved in the cache as `fpath` and overwrite the out-of-date 386 | local file at `rpath`. 387 | 388 | * `bfcdownload()` 389 | 390 | The following checks whether a resource needs updating and, if so, performs the update: 391 | 392 | ```{r, bfcdownload} 393 | rid <- "BFC5" 394 | test <- !identical(unname(bfcneedsupdate(bfc, rid)), FALSE) # 'TRUE' or 'NA' 395 | if (test) 396 | bfcdownload(bfc, rid, ask=FALSE) 397 | ``` 398 | 399 | ## Adding MetaData 400 | 401 | The following functions are provided for metadata: 402 | 403 | * `bfcmeta()<-` 404 | * `bfcmeta()` 405 | * `bfcmetalist()` 406 | * `bfcmetaremove()` 407 | 408 | Additional metadata can be added as `data.frame`s that become tables in the SQL 409 | database. The `data.frame` must contain a column `rid` that matches the `rid` 410 | column in the cache. Any metadata added will then be displayed when accessing 411 | the cache. Metadata is added with `bfcmeta()<-`. A table `name` must be provided 412 | as an argument. Users can add multiple metadata tables as long as the names are 413 | unique. Tables may be appended or overwritten using the additional arguments 414 | `append=TRUE` or `overwrite=TRUE`. 
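Before assigning with `bfcmeta()<-`, a candidate table can be checked against the requirements just described. The following is a minimal base-R sketch, not the package's own validation: `check_bfc_metadata()` is a hypothetical helper written for illustration, and the cache rids are passed in as a plain character vector (with a real cache they could come from `bfcrid(bfc)`).

```r
# Hypothetical helper, not part of BiocFileCache: check that a candidate
# metadata table has an `rid` column whose values all exist in the cache.
check_bfc_metadata <- function(meta, cache_rids) {
    if (!is.data.frame(meta))
        stop("metadata must be a data.frame")
    if (!"rid" %in% names(meta))
        stop("metadata must contain an 'rid' column")
    # any rid in the table that the cache does not know about
    unknown <- setdiff(as.character(meta$rid), cache_rids)
    if (length(unknown) > 0L)
        stop("rids not found in cache: ", paste(unknown, collapse = ", "))
    invisible(TRUE)
}

# rids supplied by hand here; with a real cache use bfcrid(bfc)
meta <- data.frame(rid = c("BFC1", "BFC2"), idx = 1:2)
check_bfc_metadata(meta, cache_rids = c("BFC1", "BFC2", "BFC3"))
```

The check fails early with a readable message instead of leaving the problem to surface later when the table is joined to the cache.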
415 | 416 | ```{r, bfcmetadata} 417 | names(bfcinfo(bfc)) 418 | meta <- as.data.frame(list(rid=bfcrid(bfc)[1:3], idx=1:3)) 419 | bfcmeta(bfc, name="resourceData") <- meta 420 | names(bfcinfo(bfc)) 421 | ``` 422 | The metadata tables that exist can be listed with `bfcmetalist()` and can be 423 | retrieved with `bfcmeta()`. 424 | 425 | ```{r, bfcmetalist} 426 | bfcmetalist(bfc) 427 | bfcmeta(bfc, name="resourceData") 428 | ``` 429 | 430 | Lastly, metadata can be removed with `bfcmetaremove()`. 431 | 432 | ```{r, bfcmetaremove} 433 | bfcmetaremove(bfc, name="resourceData") 434 | ``` 435 | 436 | **Note:** 437 | 438 | While the other functions can be called without a BiocFileCache object, in 439 | which case they operate on the default cache `BiocFileCache()`, 440 | this option is not available for `bfcmeta()<-`. This function must always be 441 | given a BiocFileCache object, by first assigning the object to a variable and then passing 442 | that variable into the function. 443 | 444 | Example of ERROR: 445 | ```{r eval=FALSE} 446 | bfcmeta(name="resourceData") <- meta 447 | ## Error in bfcmeta(name = "resourceData") <- meta : 448 | ## target of assignment expands to non-language object 449 | ``` 450 | Correct implementation: 451 | ```{r eval=FALSE} 452 | bfc <- BiocFileCache() 453 | bfcmeta(bfc, name="resourceData") <- meta 454 | ``` 455 | All other functions have a default: if the BiocFileCache object is missing, they 456 | operate on the default cache `BiocFileCache()`. 457 | 458 | ## Removing Resources 459 | 460 | Now that we have added resources, it is also possible to remove a resource. 461 | 462 | * `bfcremove()` 463 | 464 | When you remove a resource from the cache, it will also delete the local file, 465 | but only if the file is stored in the cache directory given by `bfccache(bfc)`. If 466 | it is a path to a file somewhere else on the user's system, the resource is only 467 | removed from the `BiocFileCache` object and the file is not deleted. 
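The deletion rule described above (a file is deleted only when it lies inside `bfccache(bfc)`) can be illustrated with a small base-R sketch. `is_in_cache_dir()` is a hypothetical helper and not the code `bfcremove()` actually runs; it simply compares normalized paths against the cache directory prefix, using throwaway temporary files in place of a real cache.

```r
# Hypothetical helper, not part of BiocFileCache: TRUE for each path that
# lies inside `cache_dir`, i.e. a file that removal would also delete.
is_in_cache_dir <- function(rpath, cache_dir) {
    # resolve symlinks and relative segments so the prefix test is fair
    rpath <- normalizePath(rpath, winslash = "/", mustWork = FALSE)
    cache_dir <- normalizePath(cache_dir, winslash = "/", mustWork = FALSE)
    # require a trailing "/" so "/cache2/x" is not treated as inside "/cache"
    prefix <- paste0(sub("/+$", "", cache_dir), "/")
    startsWith(rpath, prefix)
}

# throwaway directories standing in for a cache and an external file
cache <- tempfile()
dir.create(cache)
inside <- file.path(cache, "BFC1_data.rds")
outside <- tempfile()
invisible(file.create(c(inside, outside)))
is_in_cache_dir(c(inside, outside), cache)  # TRUE FALSE
```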
468 | 469 | ```{r, bfcremove} 470 | # let's remind ourselves of our object 471 | bfc 472 | 473 | bfcremove(bfc, "BFC6") 474 | bfcremove(bfc, "BFC1") 475 | 476 | # let's look at our BiocFileCache object now 477 | bfc 478 | ``` 479 | 480 | There is another helper function that may be of use: 481 | 482 | * `bfcsync()` 483 | 484 | This function checks two things: 485 | 486 | 1. Whether any `rpath` cannot be found (this would occur if `bfcnew()` is used and 487 | the returned path was never used to save an object) 488 | 2. Whether there are files in the cache directory (`bfccache(bfc)`) that are not 489 | being tracked by the `BiocFileCache` object 490 | 491 | ```{r, bfcsync} 492 | # create a new entry that hasn't been used 493 | path <- bfcnew(bfc, "UseMe") 494 | rmMe <- names(path) 495 | # We also have a file not being tracked because we updated rpath 496 | 497 | bfcsync(bfc) 498 | 499 | # you can suppress the messages and just have a TRUE/FALSE 500 | bfcsync(bfc, FALSE) 501 | 502 | # 503 | # Let's do some cleaning to have a synced object 504 | # 505 | bfcremove(bfc, rmMe) 506 | unlink(fileBeingReplaced) 507 | 508 | bfcsync(bfc) 509 | ``` 510 | 511 | ## Exporting and Importing Cache 512 | 513 | There is a helper function to export a BiocFileCache and its associated files as a 514 | tar or zip archive, as well as a corresponding import function. 515 | 516 | * `exportbfc()` 517 | * `importbfc()` 518 | 519 | The `exportbfc` function takes a BiocFileCache object or subsetted object 520 | and creates a tar or zip archive that can then be shared with collaborators 521 | on other computer systems. The user can choose where the archive is created 522 | with `outputFile`; by default, the archive is written to the current working directory with the name 523 | `BiocFileCacheExport.tar`. By default a tar archive is 524 | created, but the user can create a zip archive instead using the argument 525 | `outputMethod="zip"`. 
Any additional arguments to `utils::zip` or 526 | `utils::tar` may also be supplied. 527 | 528 | The following are some example calls: 529 | ```{r eval=FALSE} 530 | # export entire biocfilecache 531 | exportbfc(bfc) 532 | 533 | # export the first 4 entries of biocfilecache 534 | # as a compressed tar 535 | exportbfc(bfc, rids=paste0("BFC", 1:4), 536 | outputFile="BiocFileCacheExport.tar.gz", compression="gzip") 537 | 538 | # export the subsetted object of web resources as zip 539 | sub1 <- bfc[bfcrid(bfcquery(bfc, "web", field='rtype'))] 540 | exportbfc(sub1, outputFile = "BiocFileCacheExportWeb.zip", 541 | outputMethod="zip") 542 | ``` 543 | 544 | Once extracted on a user's system, the archive provides a fully functional copy of 545 | the shared cache. The archive can be extracted manually and the path passed to the 546 | constructor `BiocFileCache()`, or, for convenience, the function `importbfc` may be 547 | used. The `importbfc` function takes a path to the tar or zip 548 | file, the argument `archiveMethod` indicating whether `untar` or `unzip` should be 549 | used (the default is `untar`), the directory the archive should be extracted to 550 | as `exdir`, and any additional arguments to `utils::untar` or 551 | `utils::unzip`. The function extracts the files and loads the 552 | associated BiocFileCache object into the R session. 
553 | 554 | The following are example calls to load the archives exported in the examples above: 555 | ```{r eval=FALSE} 556 | 557 | bfc <- importbfc("BiocFileCacheExport.tar") 558 | 559 | bfc2 <- importbfc("BiocFileCacheExport.tar.gz", compressed="gzip") 560 | 561 | bfc3 <- importbfc("BiocFileCacheExportWeb.zip", archiveMethod="unzip") 562 | ``` 563 | 564 | ## Creating a Cache from Existing Data 565 | 566 | The following helper function converts existing data to a 567 | BiocFileCache: 568 | 569 | * `makeBiocFileCacheFromDataFrame` 570 | 571 | This function may take a while to run if there are many resources; however, 572 | if the BiocFileCache is stored in a permanent location, it only needs to be 573 | run once. 574 | 575 | ### Create a BiocFileCache from an Existing data.frame 576 | 577 | `makeBiocFileCacheFromDataFrame` takes an existing data.frame and creates a 578 | BiocFileCache object. The cache location can be specified by the `cache` 579 | argument. The `cache` must not already exist, and the user will be prompted to 580 | create the location. If the user opts 'N', the cache will be created in a 581 | temporary directory and this function will have to be run again in a new R 582 | session. The original data.frame must contain the required BiocFileCache columns 583 | `rtype`, `rpath`, and `fpath` as described in section 1.2, "Creating / 584 | Loading the Cache". The optional columns `rname`, `last_modified_time`, `etag`, 585 | and `expires` may also be specified in the original data.frame but are not 586 | required and will be populated with defaults if missing. For resources with 587 | `rtype="local"`, `actionLocal` controls whether the local copy of the file is 588 | copied or moved to the cache location or left as-is on the local 589 | system; a local copy of the file must exist if the resource is identified as 590 | `rtype="local"`. 
For resources with `rtype="web"`, `actionWeb` controls whether the 591 | local copy of the remote file is copied or moved to the cache location. BiocFileCache 592 | requires that all remote resources store their local copy 593 | in the cache location. A local copy of the file does not have to exist yet and can 594 | be downloaded into the cache at a later time. Any additional columns of the 595 | original data.frame, besides the required or optional BiocFileCache columns, 596 | are separated out and added to the BiocFileCache as a metadata table with the name 597 | given as `metadataName`. See section 1.6, "Adding Metadata". 598 | 599 | The following is an example data.frame with the minimal columns 'rtype', 'rpath', 600 | and 'fpath' and one additional column, 'keywords', that will become metadata. The 601 | 'rpath' can be `NA` as these are remote resources (`rtype='web'`) that have not 602 | been downloaded yet. 603 | 604 | ```{r, mock} 605 | tbl <- data.frame(rtype=c("web","web"), 606 | rpath=c(NA_character_,NA_character_), 607 | fpath=c("https://httpbin.org/get", 608 | "https://en.wikipedia.org/wiki/Bioconductor"), 609 | keywords = c("httpbin", "wiki"), stringsAsFactors=FALSE) 610 | tbl 611 | ``` 612 | 613 | ```{r eval=FALSE} 614 | 615 | newbfc <- makeBiocFileCacheFromDataFrame(tbl, 616 | cache=file.path(tempdir(),"BFC"), 617 | actionWeb="copy", 618 | actionLocal="copy", 619 | metadataName="resourceMetadata") 620 | 621 | ``` 622 | 623 | ## Cleaning or Removing Cache 624 | 625 | Finally, there are two functions involved in cleaning or deleting the cache: 626 | 627 | * `cleanbfc()` 628 | * `removebfc()` 629 | 630 | `cleanbfc()` will evaluate the resources in the `BiocFileCache` object and 631 | determine which, if any, have not been created, redownloaded, or updated within a 632 | specified number of days. 
If `ask=TRUE`, the user is asked, for each entry above that threshold, 633 | whether it should be removed from the cache object and the file deleted 634 | (the file is only deleted if it is in the `bfccache(bfc)` location). If `ask=FALSE`, the function does not ask 635 | about each file and automatically removes the entry and deletes the file. The default 636 | number of days is 120. If a resource has simply not needed any updates, this function 637 | could give a false positive. It also does not take into account how many times 638 | the resource was loaded by retrieving its path (i.e., via `[[`, `bfcpath()`, or `bfcrpath()`), 639 | so it may not be an accurate indication of how often the resource is 640 | used. Please use this function with caution. 641 | 642 | ```{r eval=FALSE} 643 | cleanbfc(bfc) 644 | ``` 645 | 646 | `removebfc()` will remove the `BiocFileCache` completely from the system. Any 647 | files saved in the `bfccache(bfc)` directory will also be deleted. 648 | 649 | ```{r eval=FALSE} 650 | removebfc(bfc) 651 | ``` 652 | **Note** Use with caution! 653 | 654 | # Access Behind a Proxy 655 | 656 | BiocFileCache uses functions from the CRAN package `httr2` to access and download 657 | web resources. This can be problematic when operating behind a 658 | proxy. Unfortunately, unlike the previously used `httr::set_config`, there is no 659 | option to globally set the proxy for httr2 requests. Instead, you can pass proxy 660 | information to the `bfcadd`, `bfcupdate`, `bfcneedsupdate`, and `bfcdownload` 661 | functions. 662 | 663 | ```{r eval=FALSE} 664 | proxy <- "http://my_user:my_password@myproxy:8080" 665 | bfcadd(bfc, rname="uniquename", fpath="https://remoteresource", proxy=proxy) 666 | ``` 667 | You can also set an `https_proxy` environment variable. Its value should be a string 668 | in the same format as above. 669 | 670 | # Other configuration options for resource downloading 671 | 672 | As mentioned previously, there is no global option like `httr::set_config` to 673 | set configuration options when using httr2. 
A `config` argument may be passed to the 674 | `bfcadd`, `bfcupdate`, `bfcneedsupdate`, and `bfcdownload` functions. This argument is an 675 | R list that will be passed to `httr2::req_options`. The names of the 676 | list elements should be valid curl options as listed by `curl::curl_options()`. 677 | 678 | ```{r eval=FALSE} 679 | ssl_opts <- list(verbose = 1L, ssl_verifypeer = 0L, ssl_verifyhost = 0L) 680 | bfcadd(bfc, rname="uniquename", fpath="https://remoteresource", config=ssl_opts) 681 | ``` 682 | # Group Cache Access 683 | 684 | A cache may need to be shared across multiple 685 | users on a system, which can lead to permission errors. To allow access for 686 | multiple users, create a group that the users belong to and that the cache 687 | belongs to. Depending on what you would like individuals to be able to 688 | accomplish with the cache, the permissions of up to two files need to be altered. A 689 | read-only cache requires manually changing the permissions of 690 | BiocFileCache.sqlite.LOCK so that the group permissions are `g+rw`. To allow 691 | users to download files to the shared cache, both the 692 | BiocFileCache.sqlite.LOCK file and the BiocFileCache.sqlite file need group 693 | permissions of `g+rw`. Please search online for how to create a user group on your system 694 | of interest. To find the location of the cache so that you can change the group 695 | and file permissions, you may run the following in R if you used the default location: 696 | `tools::R_user_dir("BiocFileCache", which="cache")`, or, if you created a custom 697 | location, something like the following: `bfc = 698 | BiocFileCache(cache="someUniquelocation"); bfccache(bfc)`. For quick reference, 699 | on Linux you can use `chown currentuser:newgroup` to change the group and 700 | `chmod` to change the file permissions: `chmod 660` or `chmod g+rw` should 701 | set the correct permissions. 
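The `chmod 660` step can also be done from within R using base `Sys.chmod()`. The sketch below is a hypothetical helper (not part of BiocFileCache) that applies mode `660` to the LOCK file only, or to both the LOCK and sqlite files when downloads should be allowed. It assumes both files sit directly in the cache directory under the names given above, and it does not change the group owner, which still requires `chown`/`chgrp`.

```r
# Hypothetical helper, not part of BiocFileCache: apply the group
# read/write permissions described above to a cache directory.
share_cache_with_group <- function(cache_dir, allow_downloads = FALSE) {
    lock_file <- file.path(cache_dir, "BiocFileCache.sqlite.LOCK")
    db_file <- file.path(cache_dir, "BiocFileCache.sqlite")
    # read-only sharing needs only the LOCK file; downloads need both files
    targets <- if (allow_downloads) c(lock_file, db_file) else lock_file
    targets <- targets[file.exists(targets)]
    # mode "660" = owner and group read/write, the same as `chmod 660`
    ok <- Sys.chmod(targets, mode = "660", use_umask = FALSE)
    setNames(ok, basename(targets))
}

# demonstration on a throwaway directory standing in for a real cache
demo <- tempfile()
dir.create(demo)
invisible(file.create(file.path(demo, c("BiocFileCache.sqlite",
                                        "BiocFileCache.sqlite.LOCK"))))
share_cache_with_group(demo, allow_downloads = TRUE)
```

With a real cache, the directory would come from `bfccache(bfc)`, for example `share_cache_with_group(bfccache(bfc), allow_downloads = TRUE)`.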
702 | 703 | # Summary 704 | 705 | It is our hope that this package allows for easier management of local and 706 | remote resources. 707 | 708 | # SessionInfo 709 | 710 | ```{r, sessioninfo} 711 | sessionInfo() 712 | 713 | ``` 714 | 715 | 716 | 717 | [BiocFileCache]: https://bioconductor.org/packages/BiocFileCache 718 | [dplyr]: https://cran.r-project.org/package=dplyr 719 | -------------------------------------------------------------------------------- /vignettes/BiocFileCache_Troubleshooting.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "BiocFileCache Troubleshooting" 3 | author: Lori Shepherd 4 | output: 5 | BiocStyle::html_document: 6 | toc: true 7 | toc_depth: 2 8 | vignette: > 9 | %\VignetteEngine{knitr::rmarkdown} 10 | %\VignetteIndexEntry{3. BiocFileCache Troubleshooting} 11 | %\VignetteEncoding{UTF-8} 12 | %\VignetteDepends{rtracklayer} 13 | --- 14 | 15 | ```{r setup, echo=FALSE} 16 | knitr::opts_chunk$set(collapse=TRUE) 17 | ``` 18 | 19 | # Overview 20 | 21 | Organization of files on a local machine can be cumbersome. This is especially 22 | true for local copies of remote resources that may periodically require a new 23 | download to have the most updated information available. [BiocFileCache][] is 24 | designed to help manage local and remote resource files that are stored locally. It 25 | provides a convenient location to organize files and, once files are under cache 26 | management, provides functions to determine whether remote resources are 27 | out of date and require a new download. 28 | 29 | ## Installation and Loading 30 | 31 | `BiocFileCache` is a _Bioconductor_ package and can be installed through 32 | `BiocManager::install()`. 
33 | 34 | ```{r, eval = FALSE} 35 | if (!"BiocManager" %in% rownames(installed.packages())) 36 | install.packages("BiocManager") 37 | BiocManager::install("BiocFileCache", dependencies=TRUE) 38 | ``` 39 | 40 | After the package is installed, it can be loaded into the _R_ workspace with 41 | 42 | ```{r, results='hide', warning=FALSE, message=FALSE} 43 | library(BiocFileCache) 44 | ``` 45 | 46 | ## Creating / Loading the Cache 47 | 48 | The first step in using [BiocFileCache][] to manage files is to create a 49 | cache object at a specified location. We will create a temporary directory for use 50 | with the examples in this vignette. If a path is not specified upon creation, the 51 | default location is a directory `~/.BiocFileCache` in the typical user cache 52 | directory as defined by `tools::R_user_dir("", which="cache")`. 53 | 54 | ```{r, create} 55 | path <- tempfile() 56 | bfc <- BiocFileCache(path, ask = FALSE) 57 | ``` 58 | 59 | # Access Behind a Proxy 60 | 61 | BiocFileCache uses functions from the CRAN package `httr2` to access and download 62 | web resources. This can be problematic when operating behind a 63 | proxy. Unfortunately, unlike the previously used `httr::set_config`, there is no 64 | option to globally set the proxy for httr2 requests. Instead, you can pass proxy 65 | information to the `bfcadd`, `bfcupdate`, `bfcneedsupdate`, and `bfcdownload` 66 | functions. 67 | 68 | ```{r eval=FALSE} 69 | proxy <- "http://my_user:my_password@myproxy:8080" 70 | bfcadd(bfc, rname="uniquename", fpath="https://remoteresource", proxy=proxy) 71 | ``` 72 | You can also set an `https_proxy` environment variable. Its value should be a string 73 | in the same format as above. 74 | 75 | # Other configuration options for resource downloading 76 | 77 | As mentioned previously, there is no global option like `httr::set_config` to 78 | set configuration options when using httr2. A `config` argument may be passed to the 79 | `bfcadd`, `bfcupdate`, `bfcneedsupdate`, and `bfcdownload` functions. 
This argument is an 80 | R list that will be passed to `httr2::req_options`. The names of the 81 | items should be valid curl options as listed by `curl::curl_options()`. 82 | 83 | ```{r eval=FALSE} 84 | ssl_opts <- list(verbose = 1L, ssl_verifypeer = 0L, ssl_verifyhost = 0L) 85 | bfcadd(bfc, rname="uniquename", fpath="https://remoteresource", config=ssl_opts) 86 | ``` 87 | 88 | # Group Cache Access 89 | 90 | A cache may need to be shared across multiple 91 | users on a system, which typically leads to permission errors. To allow access by 92 | multiple users, create a group that the users belong to and assign the cache 93 | to that group. Up to two files need their permissions altered, depending 94 | on what you would like individuals to be able to do with the cache. A 95 | read-only cache requires manually changing the permissions of 96 | BiocFileCache.sqlite.LOCK so that the group permissions are `g+rw`. To allow 97 | users to download files to the shared cache, both the 98 | BiocFileCache.sqlite.LOCK file and the BiocFileCache.sqlite file need group 99 | permissions of `g+rw`. Please consult your system's documentation on creating a user group. 100 | To find the location of the cache in order to change the group 101 | and file permissions, run the following in R if you used the default location: 102 | `tools::R_user_dir("BiocFileCache", which="cache")`; if you created a custom 103 | location, use something like the following: `bfc = 104 | BiocFileCache(cache="someUniquelocation"); bfccache(bfc)`. For quick reference, 105 | on Linux use `chown currentuser:newgroup` to change the group and 106 | `chmod` to change the file permissions: `chmod 660` or `chmod g+rw` should 107 | set the correct permissions. 108 | 109 | 110 | # Lock file Troubleshooting 111 | 112 | Two issues have been commonly reported regarding the lock file.
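Both issues can involve the lock file's permissions. As a minimal sketch of the permission changes described in the Group Cache Access section, the hypothetical helper `shareCache()` below (not part of BiocFileCache) applies `chmod 660` from within R. It assumes the lock file and database are named `BiocFileCache.sqlite.LOCK` and `BiocFileCache.sqlite` inside the cache directory, and it cannot change the group itself, which still requires `chown`.

```{r, eval=FALSE}
## Hypothetical helper, not part of BiocFileCache: grant the group
## read/write access (chmod 660) to the cache files described in
## "Group Cache Access". Changing the group itself still needs chown.
shareCache <- function(cachedir, allowDownloads = FALSE) {
    ## the lock file is needed even for a read-only shared cache
    files <- file.path(cachedir, "BiocFileCache.sqlite.LOCK")
    ## the database must also be group-writable if users will add files
    if (allowDownloads)
        files <- c(files, file.path(cachedir, "BiocFileCache.sqlite"))
    files <- files[file.exists(files)]
    ## use_umask = FALSE so a umask such as 022 cannot drop group write
    changed <- Sys.chmod(files, mode = "660", use_umask = FALSE)
    data.frame(file = basename(files), changed = changed,
               mode = format(file.info(files)$mode))
}
```

For example, `shareCache(bfccache(bfc), allowDownloads = TRUE)` would prepare the cache created above for users who also need to download files; omitting `allowDownloads` only opens up the lock file, matching the read-only case.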
113 | 114 | ## Permissions 115 | 116 | Permission errors can occur regarding group and public access. See the 117 | previous `Group Cache Access` section. 118 | 119 | ## Cannot lock file / no lock available 120 | 121 | This is an issue with filelock on particular systems: some partitions and 122 | non-standard file systems may not support file locking. The solution is to create 123 | the cache in a different location on the system. The easiest way to define a 124 | new cache location is with environment variables. 125 | 126 | In R: 127 | 128 | `Sys.setenv(BFC_CACHE=)` 129 | 130 | Alternatively, you can set the environment variable globally to avoid having to 131 | set it in each R session. Please consult your operating system's documentation for specific instructions on 132 | setting environment variables globally. 133 | 134 | Packages that use filelock and have specific environment 135 | variables to control the cache location include: 136 | 137 | * BiocFileCache: BFC_CACHE 138 | * ExperimentHub: EXPERIMENT_HUB_CACHE 139 | * AnnotationHub: ANNOTATION_HUB_CACHE 140 | * biomaRt: BIOMART_CACHE 141 | 142 | # Default Caching Location Update 143 | 144 | As of BiocFileCache version > 1.15.1, the default caching location has 145 | changed. The default cache is now controlled by the function `tools::R_user_dir` 146 | instead of `rappdirs::user_cache_dir`. To keep using a cache created in the old default 147 | location, users must either move the cache and its 148 | files to the new default location or point BiocFileCache at the old location; otherwise they can delete the old cache and re-download 149 | any previously cached files. 150 | 151 | ## Option 1: Moving Files 152 | 153 | The following steps can be used to move the files to the new location: 154 | 155 | 1. Determine the old location by running the following in R: 156 | `rappdirs::user_cache_dir(appname="BiocFileCache")` 157 | 158 | 2.
Determine the new location by running the following in R: 159 | `tools::R_user_dir("BiocFileCache", which="cache")` 160 | 161 | 3. Move the files to the new location. You can do this manually or with the 162 | following steps in R. If you have many cached files, this may take 163 | a while, and you will need permissions on all the files in order to move them. 164 | 165 | ```{r, eval=FALSE} 166 | # make sure you have permissions on the cache/files 167 | # use at your own risk 168 | 169 | moveFiles <- function(package) { 170 | olddir <- path.expand(rappdirs::user_cache_dir(appname=package)) 171 | newdir <- tools::R_user_dir(package, which="cache") 172 | dir.create(path=newdir, recursive=TRUE, showWarnings=FALSE) 173 | files <- list.files(olddir, full.names=TRUE) 174 | moveres <- vapply(files, 175 | FUN=function(fl) { 176 | newname <- file.path(newdir, basename(fl)) 177 | file.rename(fl, newname) # TRUE if the move succeeded 178 | }, 179 | FUN.VALUE = logical(1)) 180 | # only remove the old directory if every file was moved 181 | if (all(moveres)) unlink(olddir, recursive=TRUE) 182 | } 183 | 184 | 185 | package <- "BiocFileCache" 186 | moveFiles(package) 187 | 188 | ``` 189 | 190 | 191 | ## Option 2: Specify a Cache Location Explicitly 192 | 193 | Users may always specify a custom caching location by providing the `cache` argument to the BiocFileCache 194 | constructor; however, they must then always specify this location, as it will not be 195 | recognized by default in subsequent sessions. 196 | 197 | Alternatively, the default caching location may also be controlled by a 198 | user-level or system-wide environment variable. Users may set the environment 199 | variable `BFC_CACHE` to the old location to continue using it as the default location. 200 | 201 | 202 | ## Option 3: Delete the old cache 203 | 204 | Lastly, if a user does not need the existing default cache, the 205 | old location may be deleted to move forward with the new default location. This 206 | option should be used with caution.
Once deleted, old cached resources will no 207 | longer be available and will have to be re-downloaded. 208 | 209 | One can do this manually by navigating to the location indicated in the ERROR 210 | message as `Problematic cache:` and deleting the folder and all its contents. 211 | 212 | 213 | Alternatively, the old cache can be deleted with R code: 214 | 215 | **CAUTION** This will remove the old cache and all downloaded resources. 216 | 217 | 218 | ```{r, eval=FALSE} 219 | library(BiocFileCache) 220 | 221 | 222 | package = "BiocFileCache" 223 | 224 | BFC_CACHE = rappdirs::user_cache_dir(appname=package) 225 | Sys.setenv(BFC_CACHE = BFC_CACHE) 226 | bfc = BiocFileCache(BFC_CACHE) 227 | ## CAUTION: This removes the cache and all downloaded resources 228 | removebfc(bfc, ask=FALSE) 229 | Sys.unsetenv("BFC_CACHE") # stop pointing BFC_CACHE at the old location 230 | ## create new empty cache in new default location 231 | bfc = BiocFileCache(ask=FALSE) 232 | 233 | ``` 234 | 235 | # SessionInfo 236 | 237 | ```{r, sessioninfo} 238 | sessionInfo() 239 | 240 | ``` 241 | 242 | 243 | 244 | [BiocFileCache]: https://bioconductor.org/packages/BiocFileCache 245 | [dplyr]: https://cran.r-project.org/package=dplyr 246 | -------------------------------------------------------------------------------- /vignettes/BiocFileCache_UseCases.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "BiocFileCache Use Cases" 3 | author: Lori Shepherd 4 | output: 5 | BiocStyle::html_document: 6 | toc: true 7 | toc_depth: 2 8 | vignette: > 9 | %\VignetteEngine{knitr::rmarkdown} 10 | %\VignetteIndexEntry{2. BiocFileCache: Use Cases} 11 | %\VignetteEncoding{UTF-8} 12 | %\VignetteDepends{rtracklayer} 13 | --- 14 | 15 | ```{r setup, echo=FALSE} 16 | knitr::opts_chunk$set(collapse=TRUE) 17 | ``` 18 | 19 | # Overview 20 | 21 | Organization of files on a local machine can be cumbersome.
This is especially 22 | true for local copies of remote resources that may periodically require a new 23 | download to have the most up-to-date information available. [BiocFileCache][] is 24 | designed to help manage local and remote resource files stored locally. It 25 | provides a convenient location to organize files and, once files are added to the 26 | cache, provides functions to determine whether remote resources are 27 | out of date and require a new download. 28 | 29 | ## Installation and Loading 30 | 31 | `BiocFileCache` is a _Bioconductor_ package and can be installed through 32 | `BiocManager::install()`. 33 | 34 | ```{r, eval = FALSE} 35 | if (!"BiocManager" %in% rownames(installed.packages())) 36 | install.packages("BiocManager") 37 | BiocManager::install("BiocFileCache", dependencies=TRUE) 38 | ``` 39 | 40 | After the package is installed, it can be loaded into the _R_ workspace with 41 | 42 | ```{r, results='hide', warning=FALSE, message=FALSE} 43 | library(BiocFileCache) 44 | ``` 45 | 46 | ## Creating / Loading the Cache 47 | 48 | The first step in using [BiocFileCache][] to manage files is to create a 49 | cache object specifying a location. We will create a temporary directory for use 50 | with examples in this vignette. If a path is not specified upon creation, the 51 | default location is the `BiocFileCache` directory in the typical user cache 52 | directory, as given by `tools::R_user_dir("BiocFileCache", which="cache")`. 53 | 54 | ```{r} 55 | path <- tempfile() 56 | bfc <- BiocFileCache(path, ask = FALSE) 57 | ``` 58 | 59 | # Use Cases 60 | 61 | ## Local cache of an internet resource 62 | 63 | One use for [BiocFileCache][] is to save local copies of remote 64 | resources. The benefits of this approach include reproducibility, 65 | faster access, and access (once cached) without need for an internet 66 | connection.
An example is an Ensembl GTF file (also available via 67 | [AnnotationHub][]). 68 | 69 | ```{r, url} 70 | ## paste to avoid long line in vignette 71 | url <- paste( 72 | "ftp://ftp.ensembl.org/pub/release-71/gtf", 73 | "homo_sapiens/Homo_sapiens.GRCh37.71.gtf.gz", 74 | sep="/") 75 | ``` 76 | 77 | For the default (per-user) cache, simply load the [BiocFileCache][] package and 78 | ask for the local resource path (`rpath`) of the resource. 79 | 80 | ```{r, eval=FALSE} 81 | library(BiocFileCache) 82 | bfc <- BiocFileCache() 83 | path <- bfcrpath(bfc, url) 84 | ``` 85 | 86 | Use the path returned by `bfcrpath()` as usual, e.g., 87 | 88 | ```{r, eval=FALSE} 89 | gtf <- rtracklayer::import.gff(path) 90 | ``` 91 | 92 | A more compact form, which works the first time or any later time, is 93 | 94 | ```{r, eval=FALSE} 95 | gtf <- rtracklayer::import.gff(bfcrpath(BiocFileCache(), url)) 96 | ``` 97 | 98 | Ensembl releases do not change with time, so there is no need to check 99 | whether the cached resource needs to be updated. 100 | 101 | ## Cache of experimental computations 102 | 103 | One might use [BiocFileCache][] to cache results from experimental 104 | analysis. The `rname` field offers a place for 105 | descriptive metadata to help manage collections of resources, without 106 | relying on cryptic file naming conventions. 107 | 108 | Here we create or use a local file cache in the directory in which we are 109 | doing our analysis. 110 | 111 | ```{r, eval=FALSE} 112 | library(BiocFileCache) 113 | bfc <- BiocFileCache("~/my-experiment/results") 114 | ``` 115 | 116 | We perform our analysis... 117 | 118 | ```{r, eval=FALSE} 119 | suppressPackageStartupMessages({ 120 | library(DESeq2) 121 | library(airway) 122 | }) 123 | data(airway) 124 | dds <- DESeqDataSet(airway, design = ~ cell + dex) 125 | result <- DESeq(dds) 126 | ``` 127 | 128 | ...and then save our result in a location provided by 129 | [BiocFileCache][].
130 | 131 | ```{r, eval=FALSE} 132 | saveRDS(result, bfcnew(bfc, "airway / DESeq standard analysis")) 133 | ``` 134 | 135 | Retrieve the result at a later date: 136 | 137 | ```{r, eval=FALSE} 138 | result <- readRDS(bfcrpath(bfc, "airway / DESeq standard analysis")) 139 | ``` 140 | 141 | One might imagine the following workflow: 142 | 143 | ```{r eval=FALSE} 144 | suppressPackageStartupMessages({ 145 | library(BiocFileCache) 146 | library(rtracklayer) 147 | }) 148 | 149 | # load the cache 150 | path <- file.path(tempdir(), "tempCacheDir") 151 | bfc <- BiocFileCache(path) 152 | 153 | # the web resource of interest 154 | url <- "ftp://ftp.ensembl.org/pub/release-71/gtf/homo_sapiens/Homo_sapiens.GRCh37.71.gtf.gz" 155 | 156 | # check if url is being tracked 157 | res <- bfcquery(bfc, url, exact=TRUE) 158 | 159 | if (bfccount(res) == 0L) { 160 | 161 | # if it is not in cache, add 162 | ans <- bfcadd(bfc, rname="ensembl, homo sapien", fpath=url) 163 | 164 | } else { 165 | 166 | # if it is in cache, get path to load 167 | rid <- res$rid 168 | ans <- bfcrpath(bfc, rids = rid) 169 | 170 | # check to see if the resource needs to be updated 171 | check <- bfcneedsupdate(bfc, rid) 172 | # check is NA if it cannot be determined; here we choose to re-download 173 | if (is.na(check)) check <- TRUE 174 | if (check) { 175 | ans <- bfcdownload(bfc, rid) 176 | } 177 | } 178 | 179 | # ans is the path of the file to load 180 | ans 181 | 182 | # we know the file is a .gtf.gz because we searched for the url; 183 | # if we searched on other terms, 'bfcpath' shows the original 184 | # fpath, which indicates the appropriate load/read/import method 185 | bfcpath(bfc, names(ans)) 186 | 187 | temp = GTFFile(ans) 188 | info = import(temp) 189 | ``` 190 | 191 | ```{r, ensemblremote, eval=TRUE} 192 | 193 | # 194 | # A simpler test to see if something is in the cache 195 | # and if not start tracking it is using `bfcrpath` 196 | # 197 | 198 | suppressPackageStartupMessages({ 199 | library(BiocFileCache)
200 | library(rtracklayer) 201 | }) 202 | 203 | # load the cache 204 | path <- file.path(tempdir(), "tempCacheDir") 205 | bfc <- BiocFileCache(path, ask=FALSE) 206 | 207 | # the web resources of interest 208 | url <- "ftp://ftp.ensembl.org/pub/release-71/gtf/homo_sapiens/Homo_sapiens.GRCh37.71.gtf.gz" 209 | 210 | url2 <- "ftp://ftp.ensembl.org/pub/release-71/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_5.0.71.gtf.gz" 211 | 212 | # if not in cache, download and create a new entry 213 | pathsToLoad <- bfcrpath(bfc, c(url, url2)) 214 | 215 | pathsToLoad 216 | 217 | # now load the files as you see fit 218 | info = import(GTFFile(pathsToLoad[1])) 219 | class(info) 220 | summary(info) 221 | ``` 222 | 223 | ```{r eval=FALSE} 224 | # 225 | # One could also imagine the following: 226 | # 227 | 228 | library(BiocFileCache) 229 | 230 | # load the cache 231 | bfc <- BiocFileCache() 232 | 233 | # 234 | # Do some work! 235 | # 236 | 237 | # add a location in the cache 238 | filepath <- bfcnew(bfc, "R workspace") 239 | 240 | save(list = ls(), file=filepath) 241 | 242 | # now the R workspace is being tracked in the cache 243 | ``` 244 | 245 | ## Cache to manage package data 246 | 247 | A package may want to use BiocFileCache to manage remote data. The following 248 | example code illustrates some best-practice guidelines. 249 | 250 | 1. Creating the cache 251 | 252 | Presumably, the cache will be accessed in a variety of places within 253 | code, examples, and vignettes, so it is desirable to have a wrapper around the 254 | BiocFileCache constructor. The following is a suggested example for a package 255 | called `MyNewPackage`: 256 | 257 | ```{r, eval=FALSE} 258 | .get_cache <- 259 | function() 260 | { 261 | cache <- tools::R_user_dir("MyNewPackage", which="cache") 262 | BiocFileCache::BiocFileCache(cache) 263 | } 264 | ``` 265 | Essentially this will create a unique cache for the package.
If run 266 | interactively, the user will have the option to permanently create the package 267 | cache; otherwise a temporary directory will be used. 268 | 269 | 2. Resources in the cache 270 | 271 | Managing remote resources then involves a function that queries whether the 272 | resource has already been added; if not, it adds the resource to the cache, and if so, it 273 | checks whether the file needs to be updated. 274 | 275 | ```{r, eval=FALSE} 276 | download_data_file <- 277 | function(verbose = FALSE) 278 | { 279 | fileURL <- "http://a_path_to/someremotefile.tsv.gz" 280 | 281 | bfc <- .get_cache() 282 | rid <- bfcquery(bfc, "geneFileV2", "rname")$rid 283 | if (!length(rid)) { 284 | if (verbose) 285 | message("Downloading GENE file") 286 | rid <- names(bfcadd(bfc, "geneFileV2", fileURL)) 287 | } 288 | if (!isFALSE(bfcneedsupdate(bfc, rid))) 289 | bfcdownload(bfc, rid, ask = FALSE) 290 | 291 | bfcrpath(bfc, rids = rid) 292 | } 293 | ``` 294 | 295 | ## Processing web resources before caching 296 | 297 | In some cases it may be desirable to process 298 | web-based resources before saving them in the 299 | cache. This can be done through specific options of the `bfcadd()` and 300 | `bfcdownload()` functions: 301 | 302 | 1. Add the resource with `bfcadd()` using the `download=FALSE` argument. 303 | 2. Download the resource with `bfcdownload()` using the `FUN` argument. 304 | 305 | The `FUN` argument is a function to be applied before 306 | saving the downloaded file into the cache. The default is 307 | `file.rename`, which simply moves the downloaded file into the cache. A 308 | user-supplied function must take ONLY two arguments. When invoked, the 309 | arguments will be: 310 | 311 | 1. `character(1)` A temporary file containing the resource as 312 | retrieved from the web. 313 | 2. `character(1)` The BiocFileCache location where the processed file 314 | should be saved.
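As a further sketch of the two-argument contract just described, the hypothetical function `gunzipFile()` below (not part of BiocFileCache) stores a decompressed copy of a gzipped download. It assumes the resource is a text file, since it reads and writes lines, and it returns `TRUE` to signal success.

```{r, eval=FALSE}
## Hypothetical FUN for bfcdownload(): store a decompressed copy of a
## gzipped text download. 'from' is the temporary download and 'to' is
## the location inside the cache where the result should be written.
gunzipFile <- function(from, to) {
    con <- gzfile(from, open = "rt")  # gzfile also reads uncompressed input
    on.exit(close(con))
    writeLines(readLines(con), to)
    TRUE                              # signal success to bfcdownload()
}
```

It would be supplied as `FUN = gunzipFile` in a call to `bfcdownload()`, in the same way as `headFile` in the example that follows. Because the cached copy is no longer compressed, the `bfcadd()` argument `ext` may help label the stored file.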
315 | 316 | The function should return `TRUE` on success or a `character(1)` 317 | description of the failure on error. As an example: 318 | 319 | ```{r, preprocess} 320 | url <- "http://bioconductor.org/packages/stats/bioc/BiocFileCache/BiocFileCache_stats.tab" 321 | 322 | headFile <- # how to process file before caching 323 | function(from, to) 324 | { 325 | dat <- readLines(from) 326 | writeLines(head(dat), to) 327 | TRUE 328 | } 329 | 330 | rid <- bfcquery(bfc, url, "fpath")$rid 331 | if (!length(rid)) # not in cache, add but do not download 332 | rid <- names(bfcadd(bfc, url, download = FALSE)) 333 | 334 | update <- bfcneedsupdate(bfc, rid) # TRUE if newly added or stale 335 | if (!isFALSE(update)) # download & process 336 | bfcdownload(bfc, rid, ask = FALSE, FUN = headFile) 337 | 338 | rpath <- bfcrpath(bfc, rids=rid) # path to processed result 339 | readLines(rpath) # read processed result 340 | ``` 341 | 342 | Note: By default, bfcadd uses the web file name as the name of the saved local file. If the 343 | processing step saves the data in a different format, use the 344 | bfcadd argument `ext` to assign an extension that identifies the type of file that 345 | was saved. 346 | For example: 347 | ``` 348 | url = "https://httpbin.org/get" 349 | bfcadd(bfc, "myfile", url, download=FALSE) 350 | # would save a file `_get` in the cache 351 | bfcadd(bfc, "myfile", url, download=FALSE, ext=".Rdata") 352 | # would save a file `_get.Rdata` in the cache 353 | ``` 354 | 355 | # Access Behind a Proxy 356 | 357 | BiocFileCache uses functions from the CRAN package `httr2` to access and download 358 | web resources. This can be problematic when operating behind a 359 | proxy. Unfortunately, unlike the previously used `httr::set_config`, there is no 360 | option to set the proxy globally for httr2 requests. Instead, pass proxy 361 | information to the `bfcadd`, `bfcupdate`, `bfcneedsupdate`, and `bfcdownload` 362 | functions through their `proxy` argument.
363 | 364 | ```{r eval=FALSE} 365 | proxy <- "http://my_user:my_password@myproxy:8080" 366 | bfcadd(bfc, rname="uniquename", fpath="https://remoteresource", proxy=proxy) 367 | ``` 368 | You can also set the `https_proxy` environment variable. Its value should be a 369 | string in the same format as above. 370 | 371 | # Other configuration options for resource downloading 372 | 373 | As mentioned previously, there is no global option like `httr::set_config` to 374 | set configuration options when using httr2. Instead, a `config` argument may be passed to the 375 | `bfcadd`, `bfcupdate`, `bfcneedsupdate`, and `bfcdownload` functions. This argument is an 376 | R list that will be passed to `httr2::req_options`. The names of the 377 | items should be valid curl options as listed by `curl::curl_options()`. 378 | 379 | ```{r eval=FALSE} 380 | ssl_opts <- list(verbose = 1L, ssl_verifypeer = 0L, ssl_verifyhost = 0L) 381 | bfcadd(bfc, rname="uniquename", fpath="https://remoteresource", config=ssl_opts) 382 | ``` 383 | 384 | # Group Cache Access 385 | 386 | A cache may need to be shared across multiple 387 | users on a system, which typically leads to permission errors. To allow access by 388 | multiple users, create a group that the users belong to and assign the cache 389 | to that group. Up to two files need their permissions altered, depending 390 | on what you would like individuals to be able to do with the cache. A 391 | read-only cache requires manually changing the permissions of 392 | BiocFileCache.sqlite.LOCK so that the group permissions are `g+rw`. To allow 393 | users to download files to the shared cache, both the 394 | BiocFileCache.sqlite.LOCK file and the BiocFileCache.sqlite file need group 395 | permissions of `g+rw`. Please consult your system's documentation on creating a user group. 396 |
To find the location of the cache in order to change the group 397 | and file permissions, run the following in R if you used the default location: 398 | `tools::R_user_dir("BiocFileCache", which="cache")`; if you created a custom 399 | location, use something like the following: `bfc = 400 | BiocFileCache(cache="someUniquelocation"); bfccache(bfc)`. For quick reference, 401 | on Linux use `chown currentuser:newgroup` to change the group and 402 | `chmod` to change the file permissions: `chmod 660` or `chmod g+rw` should 403 | set the correct permissions. 404 | 405 | # Summary 406 | 407 | It is our hope that this package allows for easier management of local and 408 | remote resources. 409 | 410 | 411 | # SessionInfo 412 | 413 | ```{r, sessioninfo} 414 | sessionInfo() 415 | 416 | ``` 417 | 418 | 419 | [AnnotationHub]: https://bioconductor.org/packages/AnnotationHub 420 | [BiocFileCache]: https://bioconductor.org/packages/BiocFileCache 421 | [dplyr]: https://cran.r-project.org/package=dplyr 422 | --------------------------------------------------------------------------------