├── .Rbuildignore ├── .gitignore ├── BatchGetSymbols.Rproj ├── CRAN-RELEASE ├── CRAN-SUBMISSION ├── DESCRIPTION ├── NAMESPACE ├── NEWS.md ├── R ├── BatchGetSymbols.R ├── GetFTSE100Stocks.R ├── GetSP500Stocks.R ├── Get_Ibov_Stocks.R ├── Utils.R └── myGetSymbols.R ├── README.md ├── _pkgdown.yml ├── docs ├── 404.html ├── articles │ ├── BatchGetSymbols-vignette.html │ ├── BatchGetSymbols-vignette_files │ │ └── figure-html │ │ │ └── plot.prices-1.png │ └── index.html ├── authors.html ├── bootstrap-toc.css ├── bootstrap-toc.js ├── docsearch.css ├── docsearch.js ├── index.html ├── link.svg ├── news │ └── index.html ├── pkgdown.css ├── pkgdown.js ├── pkgdown.yml ├── reference │ ├── BatchGetSymbols.html │ ├── GetFTSE100Stocks.html │ ├── GetIbovStocks.html │ ├── GetSP500Stocks.html │ ├── Rplot001.png │ ├── calc.ret.html │ ├── df.fill.na.html │ ├── fix.ticker.name.html │ ├── get.clean.data.html │ ├── index.html │ ├── myGetSymbols.html │ └── reshape.wide.html └── sitemap.xml ├── inst └── extdata │ └── ExampleData.rds ├── man ├── BatchGetSymbols.Rd ├── GetFTSE100Stocks.Rd ├── GetIbovStocks.Rd ├── GetSP500Stocks.Rd ├── calc.ret.Rd ├── df.fill.na.Rd ├── fix.ticker.name.Rd ├── get.clean.data.Rd ├── myGetSymbols.Rd └── reshape.wide.Rd ├── tests ├── testthat.R └── testthat │ └── test_BatchGetSymbols.R └── vignettes ├── BatchGetSymbols-vignette.R ├── BatchGetSymbols-vignette.Rmd └── BatchGetSymbols-vignette.html /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^CRAN-RELEASE$ 2 | ^docs$ 3 | ^_pkgdown\.yml$ 4 | ^.*\.Rproj$ 5 | ^\.Rproj\.user$ 6 | ^CRAN-SUBMISSION$ 7 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | -------------------------------------------------------------------------------- /BatchGetSymbols.Rproj: 
-------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: No 4 | SaveWorkspace: No 5 | AlwaysSaveHistory: No 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | 18 | BuildType: Package 19 | PackageUseDevtools: Yes 20 | PackageInstallArgs: --no-multiarch --with-keep.source 21 | PackageRoxygenize: rd,collate,namespace,vignette 22 | -------------------------------------------------------------------------------- /CRAN-RELEASE: -------------------------------------------------------------------------------- 1 | This package was submitted to CRAN on 2020-11-22. 2 | Once it is accepted, delete this file and tag the release (commit 52c8b89). 3 | -------------------------------------------------------------------------------- /CRAN-SUBMISSION: -------------------------------------------------------------------------------- 1 | Version: 2.6.4 2 | Date: 2022-05-01 14:54:02 UTC 3 | SHA: ccf89f21a0222b7d665579510c63bf1745e125c8 4 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: BatchGetSymbols 2 | Title: Downloads and Organizes Financial Data for Multiple Tickers 3 | Version: 2.6.4 4 | Authors@R: person("Marcelo", "Perlin", email = "marceloperlin@gmail.com", role = c("aut", "cre")) 5 | Description: Makes it easy to download financial data from Yahoo Finance . 
6 | Depends: 7 | R (>= 3.4.0), rvest, dplyr 8 | Imports: stringr, curl, quantmod, XML, tidyr, 9 | lubridate, scales, furrr, purrr, future, tibble, zoo, crayon, 10 | cli, lifecycle 11 | License: GPL-2 12 | LazyData: true 13 | RoxygenNote: 7.1.2 14 | Suggests: knitr, 15 | rmarkdown, 16 | testthat, 17 | ggplot2 18 | VignetteBuilder: knitr 19 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | export(BatchGetSymbols) 4 | export(GetFTSE100Stocks) 5 | export(GetIbovStocks) 6 | export(GetSP500Stocks) 7 | export(calc.ret) 8 | export(df.fill.na) 9 | export(fix.ticker.name) 10 | export(get.clean.data) 11 | export(myGetSymbols) 12 | export(reshape.wide) 13 | import(dplyr) 14 | import(rvest) 15 | -------------------------------------------------------------------------------- /NEWS.md: -------------------------------------------------------------------------------- 1 | ## Version 2.6.5 (2022-07-01) 2 | 3 | - Again improved the deprecation message 4 | 5 | ## Version 2.6.4 (2022-05-01) 6 | 7 | - Improved the deprecation message using lifecycle 8 | - Added a message for do.parallel = TRUE (Yahoo Finance added API call limits) (see issue [27](https://github.com/msperlin/BatchGetSymbols/issues/27)) 9 | - Improved tests 10 | 11 | ## Version 2.6.3 (2022-03-30) 12 | 13 | - Added deprecation message 14 | 15 | ## Version 2.6.2 (2022-03-08) 16 | 17 | - Added message for ibov composition 18 | 19 | ## Version 2.6.1 (2020-11-27) 20 | 21 | - Fixed [issue 21](https://github.com/msperlin/BatchGetSymbols/issues/21), which only happened in Windows. 22 | - Changed the default cache folder for the ticker-grabbing functions 23 | 24 | ## Version 2.6 (2020-11-22) 25 | 26 | - The cache system is now session-persistent with `cache.folder = file.path(tempdir(), 'BGS_Cache')`.
This solves the problem of mismatched price series from cached data when a split or dividend happens between cache files. A new warning is issued whenever the user sets a cache.folder different from tempdir() 27 | - removed dplyr grouping message 28 | 29 | ## Version 2.5.9 (2020-11-17) 30 | 31 | - removed rownames from output ([issue 18](https://github.com/msperlin/BatchGetSymbols/issues/18)) 32 | - Made sure that, in "weekly" mode, the first day of the week is a Monday ([issue 19](https://github.com/msperlin/BatchGetSymbols/issues/19)) 33 | - Added input how.to.aggregate, which allows the user to aggregate the data by the last or first prices of each interval ([issue 19](https://github.com/msperlin/BatchGetSymbols/issues/19)) 34 | 35 | ## Version 2.5.8 (2020-05-08) 36 | 37 | - Fixed bug in NA price 38 | 39 | ## Version 2.5.7 (2020-04-21) 40 | 41 | - Fixed bug in sp500 fct 42 | - Fixed bug with repeated rows of data ([git issue 16](https://github.com/msperlin/BatchGetSymbols/issues/16)) 43 | 44 | ## Version 2.5.6 (2020-02-25) 45 | 46 | - Improved startup message for 2020 book (link to online version) 47 | 48 | ## Version 2.5.5 (2019-02-17) 49 | 50 | - Fixed bug in cleaning function 51 | - Added startup message for 2020 book 52 | 53 | ## Version 2.5.4 (2019-10-12) 54 | 55 | - Fixed small bug in vignette 56 | 57 | ## Version 2.5.3 (2019-07-05) 58 | 59 | - Added option for keeping function quiet (be.quiet) 60 | - Once again fixed function for grabbing stocks from the SP500 index 61 | 62 | ## Version 2.5.2 (2019-04-24) 63 | 64 | - Fixed bug in stock index grabbing functions 65 | 66 | ## Version 2.5.1 (2019-04-13) 67 | 68 | - New function for FTSE100 Stocks (thanks samprohaska) 69 | 70 | ## Version 2.5 (2019-04-13) 71 | 72 | - Implemented option for parallel computations with BatchGetSymbols 73 | - Added caching system for index grabbing functions (SP500 and IBOV) 74 | 75 | ## Version 2.4 (2019-03-23) 76 | 77 | - Fixed bug in function that downloads SP500 data.
78 | 79 | ## Version 2.3 (2018-11-25) 80 | 81 | - User can now choose to return a full balanced price dataset by replacing NA values or rows with volume == 0 with their closest prices. 82 | - Fixed small bug in cache system when an empty dataframe is returned 83 | - Fixed bug in function that downloads ibovespa's composition 84 | 85 | ## Version 2.2 (2018-10-10) 86 | 87 | - Users can now set frequency of data (daily, weekly, monthly or yearly) 88 | 89 | ## Version 2.1 (2018-05-08) 90 | 91 | Small update: 92 | 93 | - added startup message with book link 94 | 95 | ## Version 2.0 (2018-01-22) 96 | 97 | Major update: 98 | 99 | - New function GetIbovStocks for downloading the current composition of Ibovespa 100 | - New function for converting data to the wide format 101 | - Now the output includes returns and not only prices 102 | - removed annoying startup message 103 | - implemented clever caching system for financial data 104 | - simplified input dates (no need for Date class any more) 105 | 106 | 107 | ## Version 1.2 (2017-06-26) 108 | 109 | - Fixed issue with GetSP500(). The wikipedia page changed its code. 110 | 111 | ## Version 1.1 (2016-12-05) 112 | 113 | - Fixed CRAN NOTE (stats not used in imports) 114 | - Fixed typos and improved vignette 115 | - Couple of fixes in documentation 116 | - Added citation message on startup 117 | 118 | ## Version 1.0 (2016-11-06) 119 | 120 | - First commit 121 | -------------------------------------------------------------------------------- /R/BatchGetSymbols.R: -------------------------------------------------------------------------------- 1 | #' Function to download financial data 2 | #' 3 | #' This function downloads financial data from Yahoo Finance using \code{\link[quantmod]{getSymbols}}. 4 | #' Based on a set of tickers and a time period, the function will download the data for each ticker and return a report of the process, along with the actual data in the long dataframe format.
5 | #' The main advantage of the function is that it automatically recognizes the source of the dataset from the ticker and structures the resulting data from different sources in the long format. 6 | #' A caching system is also available, which makes repeated calls very fast. 7 | #' 8 | #' @section Warning: 9 | #' 10 | #' Do notice that Google Finance no longer provides price data. Tickers in the Google format (containing ':') 11 | #' make the function stop with an error; use Yahoo Finance tickers instead. 12 | #' 13 | #' Also, be aware that when using the cache system in a local folder (and not the default tempdir()), the aggregate price series might not match if 14 | #' a split or dividend event happens in between cache files. 15 | #' 16 | #' @param tickers A vector of Yahoo Finance tickers. If you are not sure whether a ticker is available, check the Yahoo Finance website. 17 | #' Tickers in the Google format (containing ':') are no longer supported. 18 | #' @param first.date The first date to download data (date or char as YYYY-MM-DD) 19 | #' @param last.date The last date to download data (date or char as YYYY-MM-DD) 20 | #' @param bench.ticker The ticker of the benchmark asset used to compare dates. My suggestion is to use the main stock index of the market the data comes from (default = ^GSPC (SP500, US market)) 21 | #' @param type.return Type of price return to calculate: 'arit' (default) - arithmetic, 'log' - log returns. 22 | #' @param freq.data Frequency of financial data ('daily', 'weekly', 'monthly', 'yearly') 23 | #' @param how.to.aggregate Defines whether to aggregate the data using the first or the last observation of each aggregating period ('first', 'last'). 24 | #' For example, if freq.data = 'yearly' and how.to.aggregate = 'last', the last available day of the year is used for price.close and price.adjusted. 25 | #' In both cases, price.open uses the first observation, price.high and price.low the period maximum and minimum, and volume the period sum.
26 | #' @param thresh.bad.data A proportion threshold (between 0 and 1) for defining bad data. The dates of the benchmark ticker are compared to each asset. If the proportion of non-missing dates 27 | #' with respect to the benchmark ticker is lower than thresh.bad.data, the function will ignore the asset (default = 0.75) 28 | #' @param do.complete.data Return a complete/balanced dataset? If TRUE, all missing ticker-date pairs will be replaced by NA or the closest price (see input do.fill.missing.prices). Default = FALSE. 29 | #' @param do.fill.missing.prices Finds all missing prices and replaces them with their closest price, with preference for the previous price. This ensures a balanced dataset for all assets, without any NA. Default = TRUE. 30 | #' @param do.cache Use cache system? (default = TRUE) 31 | #' @param cache.folder Where to save cache files? (default = file.path(tempdir(), 'BGS_Cache') ) 32 | #' @param do.parallel Flag for using parallel or not (default = FALSE). Since Yahoo Finance started limiting API calls in 2022, setting do.parallel = TRUE prints an error message and returns an empty dataframe.
33 | #' @param be.quiet If TRUE, suppresses the progress messages printed to the console (default = FALSE) 34 | #' @return A list with the following items: \describe{ 35 | #' \item{df.control }{A dataframe containing the results of the download process for each asset} 36 | #' \item{df.tickers}{A dataframe with the financial data for all valid tickers} } 37 | #' @export 38 | #' @import dplyr 39 | #' 40 | #' @seealso \link[quantmod]{getSymbols} 41 | #' 42 | #' @examples 43 | #' tickers <- c('META','MMM') 44 | #' 45 | #' first.date <- Sys.Date()-30 46 | #' last.date <- Sys.Date() 47 | #' 48 | #' l.out <- BatchGetSymbols(tickers = tickers, 49 | #' first.date = first.date, 50 | #' last.date = last.date, do.cache=FALSE) 51 | #' 52 | #' print(l.out$df.control) 53 | #' print(l.out$df.tickers) 54 | BatchGetSymbols <- function(tickers, 55 | first.date = Sys.Date()-30, 56 | last.date = Sys.Date(), 57 | thresh.bad.data = 0.75, 58 | bench.ticker = '^GSPC', 59 | type.return = 'arit', 60 | freq.data = 'daily', 61 | how.to.aggregate = 'last', 62 | do.complete.data = FALSE, 63 | do.fill.missing.prices = TRUE, 64 | do.cache = TRUE, 65 | cache.folder = file.path(tempdir(), 66 | 'BGS_Cache'), 67 | do.parallel = FALSE, 68 | be.quiet = FALSE) { 69 | 70 | # 20220701 DEPRECATION 71 | my_message <- stringr::str_glue( 72 | "2022-07-01: BatchGetSymbols is being **replaced** by package yfR, ", 73 | "a better and more comprehensive module for fetching Yahoo Finance data.\n", 74 | "More details about the change are available at github ", 75 | "\nYou can install yfR with the following code:\n\n", 76 | "install.packages('yfR')\n\n", 77 | "and fetch data with function yf_get()" 78 | ) 79 | lifecycle::deprecate_soft(when = "v2.6.4 (2022-07-01)", 80 | what = "BatchGetSymbols::BatchGetSymbols()", 81 | details = c(i = my_message) ) 82 | 83 | # check for internet 84 | test.internet <- curl::has_internet() 85 | if (!test.internet) { 86 | stop('No internet connection found...') 87 | } 88 | 89 | # check cache folder 90 | if (
(do.cache) && (!dir.exists(cache.folder))) dir.create(cache.folder) 91 | 92 | # check options 93 | possible.values <- c('arit', 'log') 94 | if (!any(type.return %in% possible.values)) { 95 | stop(paste0('Input type.return should be one of:\n\n', paste0(possible.values, collapse = '\n'))) 96 | } 97 | 98 | possible.values <- c('first', 'last') 99 | if (!any(how.to.aggregate %in% possible.values)) { 100 | stop(paste0('Input how.to.aggregate should be one of:\n\n', paste0(possible.values, collapse = '\n'))) 101 | } 102 | 103 | # check for NA 104 | if (any(is.na(tickers))) { 105 | my.msg <- paste0('Found NA value in ticker vector. ', 106 | 'You need to remove it before running BatchGetSymbols.') 107 | stop(my.msg) 108 | } 109 | 110 | possible.values <- c('daily', 'weekly', 'monthly', 'yearly') 111 | if (!any(freq.data %in% possible.values)) { 112 | stop(paste0('Input freq.data should be one of:\n\n', paste0(possible.values, collapse = '\n'))) 113 | } 114 | 115 | # check date class 116 | first.date <- as.Date(first.date) 117 | last.date <- as.Date(last.date) 118 | 119 | if (!inherits(first.date, 'Date')) { 120 | stop('ERROR: Input first.date should be of class Date') 121 | } 122 | 123 | if (!inherits(last.date, 'Date')) { 124 | stop('ERROR: Input last.date should be of class Date') 125 | } 126 | 127 | if (last.date<=first.date){ 128 | stop('The last.date is lower (less recent) or equal to first.date.
Check your dates!') 129 | } 130 | 131 | 132 | # check tickers 133 | if (!is.null(tickers)){ 134 | tickers <- as.character(tickers) 135 | 136 | if (!is.character(tickers)){ 137 | stop('The input tickers should be a character object.') 138 | } 139 | } 140 | 141 | # check threshold 142 | if ( (thresh.bad.data<0)|(thresh.bad.data>1)){ 143 | stop('Input thresh.bad.data should be a proportion between 0 and 1') 144 | } 145 | 146 | # build tickers.src (google tickers have : in their name) 147 | tickers.src <- ifelse(stringr::str_detect(tickers,':'),'google','yahoo') 148 | 149 | if (any(tickers.src == 'google')) { 150 | my.msg <- 'Google is no longer providing price data. 151 | Please use Yahoo Finance tickers instead.' 152 | stop(my.msg) 153 | } 154 | 155 | # fix for dates with google finance data 156 | # details: http://stackoverflow.com/questions/20472376/quantmod-empty-dates-in-getsymbols-from-google 157 | 158 | if(any(tickers.src=='google')){ 159 | suppressWarnings({ 160 | invisible(Sys.setlocale("LC_MESSAGES", "C")) 161 | invisible(Sys.setlocale("LC_TIME", "C")) 162 | }) 163 | } 164 | 165 | # check if using do.parallel = TRUE 166 | # 20220501: Yahoo Finance started limiting API calls, which 167 | # invalidates the use of any parallel computation 168 | if (do.parallel) { 169 | my_message <- stringr::str_glue( 170 | "Since 2022-04-25, Yahoo Finance has been limiting API calls, ", 171 | "resulting in 401 errors. When using parallel computations for fetching ", 172 | "data, the limit is reached easily. For that reason, the parallel option is now", 173 | " disabled.
Please set do.parallel = FALSE to use this function.", 174 | "\n\n", 175 | "Returning empty dataframe.") 176 | 177 | cli::cli_alert_danger(my_message) 178 | return(data.frame()) 179 | 180 | } 181 | 182 | # disable dplyr group message 183 | options(dplyr.summarise.inform = FALSE) 184 | 185 | # first screen msgs 186 | 187 | if (!be.quiet) { 188 | message('\nRunning BatchGetSymbols for:', appendLF = FALSE ) 189 | message('\n tickers = ', paste0(tickers, collapse = ', '), appendLF = FALSE ) 190 | message('\n Downloading data for benchmark ticker', appendLF = FALSE ) 191 | } 192 | 193 | # detect if bench.src is google or yahoo (google tickers have : in their name) 194 | bench.src <- ifelse(stringr::str_detect(bench.ticker,':'),'google','yahoo') 195 | 196 | df.bench <- myGetSymbols(ticker = bench.ticker, 197 | i.ticker = 1, 198 | length.tickers = 1, 199 | src = bench.src, 200 | first.date = first.date, 201 | last.date = last.date, 202 | do.cache = do.cache, 203 | cache.folder = cache.folder, 204 | be.quiet = be.quiet) 205 | 206 | # run fetching function for all tickers 207 | 208 | l.args <- list(ticker = tickers, 209 | i.ticker = seq_along(tickers), 210 | length.tickers = length(tickers), 211 | src = tickers.src, 212 | first.date = first.date, 213 | last.date = last.date, 214 | do.cache = do.cache, 215 | cache.folder = cache.folder, 216 | df.bench = rep(list(df.bench), length(tickers)), 217 | thresh.bad.data = thresh.bad.data, 218 | be.quiet = be.quiet) 219 | 220 | if (!do.parallel) { 221 | 222 | my.l <- purrr::pmap(.l = l.args, 223 | .f = myGetSymbols) 224 | 225 | } else { 226 | 227 | # find number of used cores 228 | formals.parallel <- formals(future::plan()) 229 | used.workers <- formals.parallel$workers 230 | 231 | available.cores <- future::availableCores() 232 | 233 | if (!be.quiet) { 234 | message(paste0('\nRunning parallel BatchGetSymbols with ', used.workers, ' cores (', 235 | available.cores, ' available)'), appendLF = FALSE ) 236 | message('\n\n', appendLF =
FALSE ) 237 | } 238 | 239 | 240 | # test if plan() was called 241 | msg <- utils::capture.output(future::plan()) 242 | 243 | flag <- stringr::str_detect(msg[1], 'sequential') 244 | 245 | if (flag) { 246 | stop(paste0('When using do.parallel = TRUE, you need to call future::plan() to configure your parallel settings. \n', 247 | 'For example, add the following line:\n\n', 248 | 'future::plan(future::multisession, workers = floor(parallel::detectCores()/2))', 249 | '\n\n', 250 | 'This line should be placed just before calling BatchGetSymbols. ', 251 | 'Notice it will use half of your available cores so that your OS has some room to breathe.')) 252 | } 253 | 254 | 255 | my.l <- furrr::future_pmap(.l = l.args, 256 | .f = myGetSymbols, 257 | .progress = TRUE) 258 | 259 | } 260 | 261 | df.tickers <- dplyr::bind_rows(purrr::map(my.l, 1)) 262 | df.control <- dplyr::bind_rows(purrr::map(my.l, 2)) 263 | 264 | # remove tickers with bad data 265 | tickers.to.keep <- df.control$ticker[df.control$threshold.decision=='KEEP'] 266 | idx <- df.tickers$ticker %in% tickers.to.keep 267 | df.tickers <- df.tickers[idx, ] 268 | 269 | # do data manipulations 270 | if (do.complete.data) { 271 | ticker <- ref.date <- NULL # for cran check: "no visible binding for global..."
272 | df.tickers <- tidyr::complete(df.tickers, ticker, ref.date) 273 | 274 | l.out <- lapply(split(df.tickers, f = df.tickers$ticker), 275 | df.fill.na) 276 | 277 | df.tickers <- dplyr::bind_rows(l.out) 278 | 279 | } 280 | 281 | # change frequency of data 282 | if (freq.data != 'daily') { 283 | 284 | str.freq <- switch(freq.data, 285 | 'weekly' = '1 week', 286 | 'monthly' = '1 month', 287 | 'yearly' = '1 year') 288 | 289 | # find the first monday (see issue #19) 290 | # https://github.com/msperlin/BatchGetSymbols/issues/19 291 | temp_dates <- seq(as.Date(paste0(lubridate::year(min(df.tickers$ref.date)), '-01-01')), 292 | as.Date(paste0(lubridate::year(max(df.tickers$ref.date))+1, '-12-31')), 293 | by = '1 day') 294 | 295 | temp_weekdays <- lubridate::wday(temp_dates, week_start = 1) 296 | first_idx <- min(which(temp_weekdays == 1)) 297 | first_monday <- temp_dates[first_idx] 298 | 299 | if (freq.data == 'weekly') { 300 | # make sure it starts on a monday 301 | week.vec <- seq(first_monday, 302 | as.Date(paste0(lubridate::year(max(df.tickers$ref.date))+1, '-12-31')), 303 | by = str.freq) 304 | } else { 305 | # every other case 306 | week.vec <- seq(as.Date(paste0(lubridate::year(min(df.tickers$ref.date)), '-01-01')), 307 | as.Date(paste0(lubridate::year(max(df.tickers$ref.date))+1, '-12-31')), 308 | by = str.freq) 309 | } 310 | 311 | 312 | df.tickers$time.groups <- cut(x = df.tickers$ref.date, breaks = week.vec, right = FALSE) 313 | 314 | # set NULL vars for CRAN check: "no visible binding..." 
315 | time.groups <- volume <- price.open <- price.close <- price.adjusted <- NULL 316 | price.high <- price.low <- NULL 317 | 318 | if (how.to.aggregate == 'first') { 319 | 320 | df.tickers <- df.tickers %>% 321 | group_by(time.groups, ticker) %>% 322 | summarise(ref.date = min(ref.date), 323 | volume = sum(volume, na.rm = TRUE), 324 | price.open = first(price.open), 325 | price.high = max(price.high), 326 | price.low = min(price.low), 327 | price.close = first(price.close), 328 | price.adjusted = first(price.adjusted)) %>% 329 | ungroup() %>% 330 | #select(-time.groups) %>% 331 | arrange(ticker, ref.date) 332 | 333 | 334 | } else if (how.to.aggregate == 'last') { 335 | 336 | df.tickers <- df.tickers %>% 337 | group_by(time.groups, ticker) %>% 338 | summarise(ref.date = min(ref.date), 339 | volume = sum(volume, na.rm = TRUE), 340 | price.open = first(price.open), 341 | price.high = max(price.high), 342 | price.low = min(price.low), 343 | price.close = last(price.close), 344 | price.adjusted = last(price.adjusted) ) %>% 345 | ungroup() %>% 346 | #select(-time.groups) %>% 347 | arrange(ticker, ref.date) 348 | } 349 | 350 | 351 | df.tickers$time.groups <- NULL 352 | } 353 | 354 | 355 | # fix for issue with repeated rows (see git issue 16); deduplicate before 356 | # computing returns, otherwise repeated rows get different returns and survive unique() 357 | # https://github.com/msperlin/BatchGetSymbols/issues/16 358 | df.tickers <- unique(df.tickers) 359 | # calculate returns 360 | df.tickers$ret.adjusted.prices <- calc.ret(df.tickers$price.adjusted, 361 | df.tickers$ticker, 362 | type.return) 363 | df.tickers$ret.closing.prices <- calc.ret(df.tickers$price.close, 364 | df.tickers$ticker, 365 | type.return) 366 | 367 | # remove rownames from output (see git issue #18) 368 | # https://github.com/msperlin/BatchGetSymbols/issues/18 369 | rownames(df.tickers) <- NULL 370 | 371 | my.l <- list(df.control = df.control, 372 | df.tickers = df.tickers) 373 | 374 | # check if cache folder is tempdir() 375 | flag <- stringr::str_detect(cache.folder, 376 | pattern =
stringr::fixed(tempdir())) 377 | 378 | if (!flag) { 379 | warning(stringr::str_glue('\nIt seems you are using a non-default cache folder at {cache.folder}. ', 380 | 'Be aware that if any stock event -- split or dividend -- happens ', 381 | 'in between cache files, the resulting aggregate cache data will not correspond to reality as ', 382 | 'some part of the price data will not be adjusted to the event. ', 383 | 'For safety and reproducibility, my suggestion is to use cache system only ', 384 | 'for the current session with tempdir(), which is the default option.') ) 385 | } 386 | 387 | # enable dplyr group message 388 | options(dplyr.summarise.inform = TRUE) 389 | 390 | return(my.l) 391 | } 392 | -------------------------------------------------------------------------------- /R/GetFTSE100Stocks.R: -------------------------------------------------------------------------------- 1 | #' Function to download the current components of the FTSE100 index from Wikipedia 2 | #' 3 | #' This function scrapes the stocks that constitute the FTSE100 index from the wikipedia page at . 
4 | #' 5 | #' @inheritParams BatchGetSymbols 6 | #' 7 | #' @return A dataframe that includes a column with the list of tickers of companies that belong to the FTSE100 index 8 | #' @export 9 | #' @import rvest 10 | #' @examples 11 | #' \dontrun{ 12 | #' df.FTSE100 <- GetFTSE100Stocks() 13 | #' print(df.FTSE100$tickers) 14 | #' } 15 | GetFTSE100Stocks <- function(do.cache = TRUE, 16 | cache.folder = file.path(tempdir(), 17 | 'BGS_Cache')){ 18 | 19 | cache.file <- file.path(cache.folder, 20 | paste0('FTSE100_Composition_', Sys.Date(), '.rds') ) 21 | 22 | if (do.cache) { 23 | # check if file exists 24 | flag <- file.exists(cache.file) 25 | 26 | if (flag) { 27 | df.FTSE100Stocks <- readRDS(cache.file) 28 | return(df.FTSE100Stocks) 29 | } 30 | } 31 | 32 | my.url <- 'https://en.wikipedia.org/wiki/FTSE_100_Index' 33 | 34 | read_html <- 0 # fix for global variable nagging from BUILD 35 | my.xpath <- '//*[@id="mw-content-text"]/div/table[2]' # old xpath 36 | my.xpath <- '//*[@id="constituents"]' 37 | df.FTSE100Stocks <- my.url %>% 38 | read_html() %>% 39 | html_nodes(xpath = my.xpath) %>% 40 | html_table() 41 | 42 | df.FTSE100Stocks <- df.FTSE100Stocks[[1]] 43 | 44 | colnames(df.FTSE100Stocks) <- c('company','tickers','ICB.sector') 45 | 46 | if (do.cache) { 47 | 48 | if (!dir.exists(cache.folder)) dir.create(cache.folder) 49 | 50 | saveRDS(df.FTSE100Stocks, cache.file) 51 | } 52 | 53 | return(df.FTSE100Stocks) 54 | } 55 | -------------------------------------------------------------------------------- /R/GetSP500Stocks.R: -------------------------------------------------------------------------------- 1 | #' Function to download the current components of the SP500 index from Wikipedia 2 | #' 3 | #' This function scrapes the stocks that constitute the SP500 index from the wikipedia page at https://en.wikipedia.org/wiki/List_of_S%26P_500_companies. 
4 | #' 5 | #' @inheritParams BatchGetSymbols 6 | #' 7 | #' @return A dataframe that includes a column with the list of tickers of companies that belong to the SP500 index 8 | #' @export 9 | #' @import rvest 10 | #' @examples 11 | #' \dontrun{ 12 | #' df.SP500 <- GetSP500Stocks() 13 | #' print(df.SP500$Tickers) 14 | #' } 15 | GetSP500Stocks <- function(do.cache = TRUE, 16 | cache.folder = file.path(tempdir(), 17 | 'BGS_Cache')){ 18 | 19 | cache.file <- file.path(cache.folder, 20 | paste0('SP500_Composition_', Sys.Date(), '.rds') ) 21 | 22 | if (do.cache) { 23 | # check if file exists 24 | flag <- file.exists(cache.file) 25 | 26 | if (flag) { 27 | df.SP500Stocks <- readRDS(cache.file) 28 | return(df.SP500Stocks) 29 | } 30 | } 31 | 32 | my.url <- 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies' 33 | 34 | read_html <- 0 # fix for global variable nagging from BUILD 35 | my.xpath <- '//*[@id="constituents"]' 36 | df.SP500Stocks <- my.url %>% 37 | read_html() %>% 38 | html_nodes(xpath = my.xpath) %>% 39 | html_table(fill = TRUE) 40 | 41 | df.SP500Stocks <- df.SP500Stocks[[1]] 42 | 43 | colnames(df.SP500Stocks) <- c('Tickers','Company','SEC.filings','GICS.Sector', 44 | 'GICS.Sub.Industry','HQ.Location','Date.First.Added','CIK', 'Founded') 45 | 46 | if (do.cache) { 47 | if (!dir.exists(cache.folder)) dir.create(cache.folder) 48 | 49 | saveRDS(df.SP500Stocks, cache.file) 50 | 51 | } 52 | 53 | return(df.SP500Stocks) 54 | } 55 | -------------------------------------------------------------------------------- /R/Get_Ibov_Stocks.R: -------------------------------------------------------------------------------- 1 | #' Function to download the current components of the Ibovespa index from the Bovespa website 2 | #' 3 | #' This function scrapes the stocks that constitute the Ibovespa index from the B3 (Bovespa) website at http://bvmf.bmfbovespa.com.br/indices/ResumoCarteiraTeorica.aspx?Indice=IBOV&idioma=pt-br.
4 | #' 5 | #' @param max.tries Maximum number of attempts to download the data 6 | #' @inheritParams BatchGetSymbols 7 | #' 8 | #' @return A dataframe that includes a column with the list of tickers of companies that belong to the Ibovespa index 9 | #' @export 10 | #' @examples 11 | #' \dontrun{ 12 | #' df.ibov <- GetIbovStocks() 13 | #' print(df.ibov$tickers) 14 | #' } 15 | GetIbovStocks <- function(do.cache = TRUE, 16 | cache.folder = file.path(tempdir(), 17 | 'BGS_Cache'), 18 | max.tries = 10){ 19 | 20 | # warning note: 21 | # https://github.com/msperlin/BatchGetSymbols/issues/25 22 | 23 | warning(paste0("IBOV data is no longer available from the exchange site. ", 24 | "If you know a different and RELIABLE source of Ibov composition, let me know at . ", 25 | "Also, you can manually download a csv file with the current composition from B3 in ")) 26 | 27 | cache.file <- file.path(cache.folder, 28 | paste0('Ibov_Composition_', Sys.Date(), '.rds') ) 29 | 30 | # get list of ibovespa's tickers from website 31 | 32 | if (do.cache) { 33 | # check if file exists 34 | flag <- file.exists(cache.file) 35 | 36 | if (flag) { 37 | df.ibov.comp <- readRDS(cache.file) 38 | return(df.ibov.comp) 39 | } 40 | } 41 | 42 | for (i.try in seq(max.tries)) { 43 | myUrl <- 'http://bvmf.bmfbovespa.com.br/indices/ResumoCarteiraTeorica.aspx?Indice=IBOV&idioma=pt-br' 44 | #df.ibov.comp <- XML::readHTMLTable(myUrl)[[1]] 45 | df.ibov.comp <- as.data.frame(XML::readHTMLTable(myUrl)) 46 | 47 | Sys.sleep(0.5) 48 | 49 | if (nrow(df.ibov.comp) > 0) break() 50 | 51 | } 52 | 53 | names(df.ibov.comp) <- c('tickers', 'ticker.desc', 'type.stock', 'quantity', 'percentage.participation') 54 | 55 | df.ibov.comp$quantity <- as.numeric(stringr::str_replace_all(df.ibov.comp$quantity, 56 | stringr::fixed('.'), '')) 57 | df.ibov.comp$percentage.participation <- as.numeric(stringr::str_replace_all(df.ibov.comp$percentage.participation, 58 | stringr::fixed(','), '.')) 59 | 60 | df.ibov.comp$ref.date <- Sys.Date() 61
| df.ibov.comp$tickers <- as.character(df.ibov.comp$tickers) 62 | 63 | if (do.cache) { 64 | 65 | if (!dir.exists(cache.folder)) dir.create(cache.folder) 66 | 67 | saveRDS(df.ibov.comp, cache.file) 68 | } 69 | 70 | return(df.ibov.comp) 71 | } 72 | 73 | -------------------------------------------------------------------------------- /R/Utils.R: -------------------------------------------------------------------------------- 1 | #' Fix name of ticker 2 | #' 3 | #' Removes the characters '.' and '^' from ticker names. This is useful for naming the files of the cache system. 4 | #' 5 | #' @param ticker.in A ticker name, possibly containing '.' or '^' (e.g. '^GSPC') 6 | #' @return The ticker name without '.' and '^' (e.g. 'GSPC') 7 | #' @export 8 | #' @examples 9 | #' bad.ticker <- '^GSPC' 10 | #' good.ticker <- fix.ticker.name(bad.ticker) 11 | #' good.ticker 12 | fix.ticker.name <- function(ticker.in){ 13 | 14 | ticker.in <- stringr::str_replace_all(ticker.in, stringr::fixed('.'), '') 15 | ticker.in <- stringr::str_replace_all(ticker.in, stringr::fixed('^'), '') 16 | 17 | return(ticker.in) 18 | } 19 | 20 | 21 | #' Get clean data from Yahoo/Google 22 | #' 23 | #' @param src Source of data (yahoo or google) 24 | #' @inheritParams BatchGetSymbols 25 | #' 26 | #' @return A dataframe with the cleaned data 27 | #' @export 28 | #' 29 | #' @examples 30 | #' df.sp500 <- get.clean.data('^GSPC', 31 | #' first.date = as.Date('2010-01-01'), 32 | #' last.date = as.Date('2010-02-01')) 33 | get.clean.data <- function(tickers, 34 | src = 'yahoo', 35 | first.date, 36 | last.date) { 37 | 38 | # don't push your luck with Yahoo servers 39 | # No problems in my tests so far.
You can safely leave it unrestricted. 40 | #Sys.sleep(0.5) 41 | 42 | # set empty df for errors 43 | df.out <- data.frame() 44 | 45 | suppressMessages({ 46 | suppressWarnings({ 47 | try(df.out <- quantmod::getSymbols(Symbols = tickers, 48 | src = src, 49 | from = first.date, 50 | to = last.date, 51 | auto.assign = FALSE), 52 | silent = TRUE) 53 | }) }) 54 | 55 | if (nrow(df.out) == 0) return(df.out) 56 | 57 | df.out <- as.data.frame(df.out[!duplicated(zoo::index(df.out))]) 58 | 59 | # column names differ between yahoo and google; standardize them 60 | if (src=='google'){ 61 | 62 | colnames(df.out) <- c('price.open','price.high','price.low','price.close','volume') 63 | df.out$price.adjusted <- NA 64 | 65 | } else { 66 | 67 | colnames(df.out) <- c('price.open','price.high','price.low','price.close','volume','price.adjusted') 68 | } 69 | 70 | # get a nice column for dates and tickers 71 | df.out$ref.date <- as.Date(rownames(df.out)) 72 | df.out$ticker <- tickers 73 | 74 | # remove rownames 75 | rownames(df.out) <- NULL 76 | 77 | # remove rows with NA adjusted prices 78 | idx <- !is.na(df.out$price.adjusted) 79 | df.out <- df.out[idx, ] 80 | 81 | if (nrow(df.out) == 0) return(data.frame()) # empty df, as for a failed download, so callers can test nrow() 82 | 83 | return(df.out) 84 | } 85 | 86 | 87 | #' Transforms a dataframe in the long format into a list of dataframes in the wide format (one per variable) 88 | #' 89 | #' @param df.tickers Dataframe in the long format 90 | #' 91 | #' @return A list with dataframes in the wide format 92 | #' @export 93 | #' 94 | #' @examples 95 | #' 96 | #' my.f <- system.file( 'extdata/ExampleData.rds', package = 'BatchGetSymbols' ) 97 | #' df.tickers <- readRDS(my.f) 98 | #' l.wide <- reshape.wide(df.tickers) 99 | #' l.wide 100 | reshape.wide <- function(df.tickers) { 101 | 102 | cols.to.keep <- c('ref.date', 'ticker') 103 | 104 | my.cols <- setdiff(names(df.tickers), cols.to.keep) 105 | 106 | fct.format.wide <- function(name.in, df.tickers) { 107 | 108 | temp.df <- df.tickers[, c('ref.date', 'ticker', name.in)] 109 | 110 | ticker <- NULL #
fix for CHECK: "no visible binding..." 111 | temp.df.wide <- tidyr::spread(temp.df, ticker, name.in) 112 | return(temp.df.wide) 113 | 114 | } 115 | 116 | l.out <- lapply(my.cols, fct.format.wide, df.tickers = df.tickers) 117 | names(l.out) <- my.cols 118 | 119 | return(l.out) 120 | 121 | } 122 | 123 | 124 | #' Function to calculate returns from a price and ticker vector 125 | #' 126 | #' Designed to add a return column to a dataframe of prices in the long (tidy) format. 127 | #' 128 | #' @param P Price vector 129 | #' @param tickers Ticker of each price (useful with a long dataframe: the first return of each ticker is set to NA, so returns are never computed across tickers) 130 | #' @inheritParams BatchGetSymbols 131 | #' 132 | #' @return A vector of returns 133 | #' @export 134 | #' 135 | #' @examples 136 | #' P <- c(1,2,3) 137 | #' R <- calc.ret(P) 138 | calc.ret <- function(P, 139 | tickers = rep('ticker', length(P)), 140 | type.return = 'arit') { 141 | 142 | my.length <- length(P) 143 | 144 | ret <- switch(type.return, 145 | 'arit' = P/dplyr::lag(P) - 1, 146 | 'log' = log(P/dplyr::lag(P)) ) 147 | 148 | idx <- (tickers != dplyr::lag(tickers)) 149 | ret[idx] <- NA 150 | 151 | return(ret) 152 | } 153 | 154 | #' Replaces NA values in a dataframe with the closest previous valid price 155 | #' 156 | #' Helper function for BatchGetSymbols. Replaces NA prices (and prices on zero-volume days) with the closest previous valid price (or the first valid price, for leading NAs) and returns the fixed dataframe. 157 | #' 158 | #' @param df.in Dataframe to be fixed 159 | #' 160 | #' @return A fixed dataframe.
161 | #' @export 162 | #' 163 | #' @examples 164 | #' 165 | #' df <- data.frame(price.adjusted = c(NA, 10, 11, NA, 12, 12.5, NA ), volume = c(1,10, 0, 2, 0, 1, 5)) 166 | #' 167 | #' df.fixed.na <- df.fill.na(df) 168 | #' 169 | df.fill.na <- function(df.in) { 170 | 171 | 172 | # find NAs or volume == 0 173 | idx.na <- which(is.na(df.in$price.adjusted) | 174 | df.in$volume == 0) 175 | 176 | if (length(idx.na) == 0) return(df.in) 177 | 178 | idx.not.na <- which(!is.na(df.in$price.adjusted)) 179 | 180 | cols.to.adjust <- c("price.open", "price.high", "price.low", 181 | "price.close", "price.adjusted") 182 | 183 | print(unique(df.in$ticker)) 184 | 185 | cols.to.adjust <- cols.to.adjust[cols.to.adjust %in% names(df.in)] 186 | 187 | # returns the index of the closest previous non-NA row (or of the first non-NA row, if x precedes all of them) 188 | fct.find.min.dist <- function(x, vec.comp) { 189 | 190 | if (x < min(vec.comp)) return(min(vec.comp)) 191 | 192 | my.dist <- x - vec.comp 193 | my.dist <- my.dist[my.dist > 0] 194 | idx <- which.min(my.dist)[1] 195 | 196 | return(vec.comp[idx]) 197 | 198 | } 199 | 200 | for (i.col in cols.to.adjust) { 201 | 202 | # adjust for NA by replacing values 203 | idx.to.use <- sapply(idx.na, 204 | fct.find.min.dist, 205 | vec.comp = idx.not.na) 206 | 207 | df.in[idx.na, i.col] <- unlist(df.in[idx.to.use, i.col]) 208 | 209 | } 210 | 211 | # set volume to zero in all replaced rows 212 | df.in$volume[idx.na] <- 0 213 | 214 | return(df.in) 215 | 216 | } 217 | 218 | 219 | .onAttach <- function(libname, pkgname) { 220 | 221 | do_color <- crayon::make_style("#FF4141") 222 | this_pkg <- 'BatchGetSymbols' 223 | 224 | if (interactive()) { 225 | msg <- paste0('\nWant to learn more about ', 226 | do_color(this_pkg), ' and other R packages for Finance and Economics?', 227 | '\nThe second edition (2020) of ', 228 | do_color('Analyzing Financial and Economic Data with R'), ' is available at\n', 229 | do_color('https://www.msperlin.com/afedR/'), 230 | "\n\n", 231 | "WARNING - Package BatchGetSymbols is **soft-deprecated** and will soon be
replaced ", 232 | "by yfR . You can still use BatchGetSymbols, ", 233 | "but be aware that it will be removed from CRAN once yfR reaches a stable version and ", 234 | "is submitted to CRAN. If you can, start using ", 235 | "yfR in your new projects.", 236 | '\n\n') 237 | } else { 238 | msg <- '' 239 | } 240 | 241 | packageStartupMessage(msg) 242 | 243 | } 244 | 245 | -------------------------------------------------------------------------------- /R/myGetSymbols.R: -------------------------------------------------------------------------------- 1 | #' An improved version of the function \code{\link[quantmod]{getSymbols}} from quantmod 2 | #' 3 | #' This is a helper function to \code{\link{BatchGetSymbols}} and it should normally not be called directly. The purpose of this function is to download financial data based on a ticker and a time period. 4 | #' The main difference from \code{\link[quantmod]{getSymbols}} is that it imports the data as a dataframe with properly named columns and saves data locally with the caching system.
5 | #' 6 | #' @param ticker A single ticker to download data for 7 | #' @param src The source of the data ('google' or 'yahoo') 8 | #' @param i.ticker An index for the stock being downloaded (used in progress messages) 9 | #' @param length.tickers Total number of stocks being downloaded (also used in progress messages) 10 | #' @param df.bench Data for the benchmark ticker 11 | #' @inheritParams BatchGetSymbols 12 | #' 13 | #' @return A dataframe with the financial data or, when \code{df.bench} is provided, a list with elements \code{df.tickers} and \code{df.control} 14 | #' 15 | #' @export 16 | #' @seealso \link[quantmod]{getSymbols} for the base function 17 | #' 18 | #' @examples 19 | #' ticker <- 'FB' 20 | #' 21 | #' first.date <- Sys.Date()-30 22 | #' last.date <- Sys.Date() 23 | #' 24 | #' \dontrun{ 25 | #' df.ticker <- myGetSymbols(ticker, i.ticker = 1, length.tickers = 1, 26 | #' first.date = first.date, 27 | #' last.date = last.date) 28 | #' } 29 | myGetSymbols <- function(ticker, 30 | i.ticker, 31 | length.tickers, 32 | src = 'yahoo', 33 | first.date, 34 | last.date, 35 | do.cache = TRUE, 36 | cache.folder = file.path(tempdir(),'BGS_Cache'), 37 | df.bench = NULL, 38 | be.quiet = FALSE, 39 | thresh.bad.data) { 40 | 41 | 42 | if (!be.quiet) { 43 | message(paste0('\n', ticker, 44 | ' | ', src, ' (', i.ticker,'|', 45 | length.tickers,')'), appendLF = FALSE ) 46 | } 47 | 48 | 49 | 50 | # do cache 51 | if (do.cache) { 52 | 53 | # check if data is in cache files 54 | my.cache.files <- list.files(cache.folder, full.names = TRUE) 55 | 56 | if (length(my.cache.files) > 0) { 57 | l.out <- stringr::str_split(tools::file_path_sans_ext(basename(my.cache.files)), 58 | '_') 59 | 60 | df.cache.files <- dplyr::tibble(f.name = my.cache.files, 61 | ticker = sapply(l.out, function(x) x[1]), 62 | src = sapply(l.out, function(x) x[2]), 63 | first.date = as.Date(sapply(l.out, function(x) x[3])), 64 | last.date = as.Date(sapply(l.out, function(x) x[4]))) 65 | 66 | } else { 67 | # empty df 68 | df.cache.files <- dplyr::tibble(f.name = '', 69 | ticker = '', 70 | src = '', 71 | first.date = first.date, 72 | last.date = last.date) 73
| 74 | } 75 | 76 | # find the cache file for this ticker and source 77 | fixed.ticker <- fix.ticker.name(ticker) 78 | 79 | temp.cache <- dplyr::filter(df.cache.files, 80 | ticker == fixed.ticker, 81 | src == !!src) # !! injects the function argument; a bare 'src == src' compares the column with itself 82 | 83 | if (nrow(temp.cache) > 1) { 84 | stop(paste0('Found more than one file in cache for ', ticker, 85 | '\nYou must manually remove one of:\n\n', paste0(temp.cache$f.name, collapse = '\n'))) 86 | } 87 | 88 | if (nrow(temp.cache) != 0) { 89 | 90 | df.cache <- data.frame() 91 | flag.dates <- TRUE 92 | 93 | if (!be.quiet) { 94 | message(' | Found cache file', appendLF = FALSE ) 95 | } 96 | 97 | df.cache <- readRDS(temp.cache$f.name) 98 | 99 | # check whether the cached dates cover the requested period 100 | 101 | max.diff.dates <- 0 102 | flag.dates <- ((first.date - temp.cache$first.date) < - max.diff.dates )| 103 | ((last.date - temp.cache$last.date) > max.diff.dates) 104 | 105 | df.out <- data.frame() 106 | if (flag.dates) { 107 | 108 | if (!be.quiet) { 109 | message(' | Need new data', appendLF = FALSE ) 110 | } 111 | 112 | flag.date.bef <- ((first.date - temp.cache$first.date) < - max.diff.dates ) 113 | df.out.bef <- data.frame() 114 | if (flag.date.bef) { 115 | df.out.bef <- get.clean.data(ticker, 116 | src, 117 | first.date, 118 | temp.cache$first.date) 119 | } 120 | 121 | flag.date.aft <- ((last.date - temp.cache$last.date) > max.diff.dates) 122 | df.out.aft <- data.frame() 123 | if (flag.date.aft) { 124 | df.out.aft <- get.clean.data(ticker, 125 | src, 126 | temp.cache$last.date, 127 | last.date) 128 | } 129 | 130 | df.out <- rbind(df.out.bef, df.out.aft) 131 | } 132 | 133 | # merge with cache 134 | df.out <- unique(rbind(df.cache, df.out)) 135 | 136 | # sort it 137 | if (nrow(df.out) > 0 ) { 138 | idx <- order(df.out$ticker, df.out$ref.date) 139 | df.out <- df.out[idx, ] 140 | } 141 | 142 | 143 | # remove old file 144 | file.remove(temp.cache$f.name) 145 | 146 | my.f.out <- paste0(fixed.ticker, '_', 147 | src, '_', 148 | min(c(temp.cache$first.date, first.date)), '_', 149 | max(c(temp.cache$last.date, last.date)),
'.rds') 150 | 151 | saveRDS(df.out, file = file.path(cache.folder, my.f.out)) 152 | 153 | # keep only the requested dates 154 | ref.date <- NULL 155 | df.out <- dplyr::filter(df.out, 156 | ref.date >= first.date, 157 | ref.date <= last.date) 158 | 159 | } else { 160 | if (!be.quiet) { 161 | message(' | Not Cached', appendLF = FALSE ) 162 | } 163 | 164 | my.f.out <- paste0(fixed.ticker, '_', 165 | src, '_', 166 | first.date, '_', 167 | last.date, '.rds') 168 | 169 | df.out <- get.clean.data(ticker, 170 | src, 171 | first.date, 172 | last.date) 173 | 174 | # only saves if there is data 175 | if (nrow(df.out) > 0) { 176 | if (!be.quiet) { 177 | message(' | Saving cache', appendLF = FALSE ) 178 | } 179 | saveRDS(df.out, file = file.path(cache.folder, my.f.out)) 180 | } 181 | } 182 | 183 | } else { 184 | df.out <- get.clean.data(ticker, 185 | src, 186 | first.date, 187 | last.date) 188 | } 189 | 190 | # control for errors in download 191 | if (nrow(df.out) == 0 ){ 192 | download.status <- 'NOT OK' 193 | total.obs <- 0 194 | perc.benchmark.dates <- 0 195 | threshold.decision <- 'OUT' 196 | 197 | df.out <- data.frame() 198 | if (!be.quiet) { 199 | message(' - Error in download.', appendLF = FALSE ) 200 | } 201 | } else { 202 | 203 | # when importing the benchmark ticker itself (df.bench is NULL), return the data directly 204 | if (is.null(df.bench)) return(df.out) 205 | 206 | download.status <- 'OK' 207 | total.obs <- nrow(df.out) 208 | perc.benchmark.dates <- sum(df.out$ref.date %in% df.bench$ref.date)/length(df.bench$ref.date) 209 | 210 | if (perc.benchmark.dates >= thresh.bad.data){ 211 | threshold.decision <- 'KEEP' 212 | } else { 213 | threshold.decision <- 'OUT' 214 | } 215 | 216 | morale.boost <- c(rep(c('OK!', 'Got it!','Nice!','Good stuff!', 217 | 'Looking good!', 'Good job!', 'Well done!', 218 | 'Feels good!', 'You got it!', "You're doing well!"), 10), 219 | 'Boa!', 'Mas bah tche, que coisa linda!', # gaucho Portuguese: "Nice!", "Wow, what a beautiful thing!" 220 | 'Mais contente que cusco de cozinheira!', # "Happier than a cook's mutt!" 221 | 'Feliz que nem lambari de sanga!', # "Happy as a lambari fish in a creek!" 222 | 'Mais faceiro que guri
de bombacha nova!') # "Happier than a kid in brand-new bombachas!" 223 | 224 | if (!be.quiet) { 225 | if (threshold.decision == 'KEEP') { 226 | message(paste0(' - ', 'Got ', scales::percent(perc.benchmark.dates), ' of valid prices | ', 227 | sample(morale.boost, 1)), appendLF = FALSE ) 228 | } else { 229 | message(paste0(' - ', 'Got ', scales::percent(perc.benchmark.dates), ' of valid prices | ', 230 | 'OUT: not enough data (thresh.bad.data = ', scales::percent(thresh.bad.data), ')'), 231 | appendLF = FALSE ) 232 | 233 | } 234 | } 235 | 236 | df.control <- tibble::tibble(ticker = ticker, 237 | src = src, 238 | download.status, 239 | total.obs, 240 | perc.benchmark.dates, 241 | threshold.decision) 242 | 243 | l.out <- list(df.tickers = df.out, df.control = df.control) 244 | 245 | return(l.out) 246 | 247 | 248 | } 249 | } 250 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | > :warning: Package `BatchGetSymbols` is soft-deprecated in favor of [`yfR`](https://github.com/msperlin/yfR). See this [Readme.md](https://github.com/msperlin/yfR) for my motivation in writing a new R package. In practice, this means that `BatchGetSymbols` is no longer maintained beyond fixes for major bugs. All development effort goes into `yfR`. While `BatchGetSymbols` will remain available on CRAN for the near future, my plan is to remove it from CRAN and archive it on GitHub once `yfR` becomes more stable.
2 | 3 | ## Motivation 4 | 5 | [![](https://cranlogs.r-pkg.org/badges/BatchGetSymbols)](https://CRAN.R-project.org/package=BatchGetSymbols) 6 | [![CRAN Status 7 | Badge](http://www.r-pkg.org/badges/version/BatchGetSymbols)](https://cran.r-project.org/package=BatchGetSymbols) 8 | [![CRAN Total 9 | Downloads](http://cranlogs.r-pkg.org/badges/grand-total/BatchGetSymbols)](https://cran.r-project.org/package=BatchGetSymbols) 10 | 11 | BatchGetSymbols is an R package for large-scale download of financial data from Yahoo Finance. Based on a set of tickers and date ranges, the package downloads and organizes the financial data in the tidy/long format. 12 | 13 | ## Warnings 14 | 15 | - Yahoo Finance data is far from perfect or reliable, especially for individual stocks. In my experience, using it for research code with stock **indices** is fine and I can match it with other data sources. But adjusted stock prices for **individual assets** are messy, as corporate events such as splits or dividends are not properly registered. I was never able to match them with other data sources. My advice is to never use the data of individual stocks in production. 16 | 17 | - Since version 2.6, the cache system is session-persistent by default, meaning that whenever you restart your R session, you lose all your cached data. This is a safety feature against mismatched prices due to corporate events. 18 | 19 | ## Main features: 20 | 21 | - Organizes data in a tabular/long or wide format, returning prices and returns (arithmetic or logarithmic) 22 | - A cache system was implemented in version 2.0 (session-persistent by default since version 2.6), meaning that the data is saved locally and only missing portions of the data are downloaded, if needed. 23 | - All dates are compared to a benchmark ticker such as the S&P 500 and, whenever an individual asset does not have a sufficient number of dates, the software drops it from the output. This means you can choose to ignore tickers with a high number of missing dates.
24 | - Allows the choice of the wide format, with tickers as columns 25 | - Users can choose the frequency of the resulting dataset (daily, weekly, monthly, yearly) 26 | - Option for parallel computing, speeding up the data import process 27 | 28 | 29 | 30 | ## Installation 31 | 32 | ``` 33 | # CRAN (official release) 34 | install.packages('BatchGetSymbols') 35 | 36 | # GitHub (dev version) 37 | devtools::install_github('msperlin/BatchGetSymbols') 38 | ``` 39 | 40 | ## A simple example 41 | 42 | See the [vignette](https://CRAN.R-project.org/package=BatchGetSymbols). 43 | -------------------------------------------------------------------------------- /_pkgdown.yml: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/msperlin/BatchGetSymbols/689c84c0fe5177d50afbe6473a7aa1abbf53c29e/_pkgdown.yml -------------------------------------------------------------------------------- /docs/404.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Page not found (404) • BatchGetSymbols 9 | 10 | 11 | 12 | 13 | 14 | 15 | 19 | 20 | 21 | 22 | 23 |
24 |
66 | 67 | 68 | 69 | 70 |
71 |
72 | 75 | 76 | Content not found. Please use links in the navbar. 77 | 78 |
79 | 80 | 84 | 85 |
86 | 87 | 88 | 89 |
93 | 94 |
95 |

96 |

Site built with pkgdown 97 | 2.0.2.

98 |
99 | 100 |
101 |
102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | -------------------------------------------------------------------------------- /docs/articles/BatchGetSymbols-vignette_files/figure-html/plot.prices-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/msperlin/BatchGetSymbols/689c84c0fe5177d50afbe6473a7aa1abbf53c29e/docs/articles/BatchGetSymbols-vignette_files/figure-html/plot.prices-1.png -------------------------------------------------------------------------------- /docs/articles/index.html: -------------------------------------------------------------------------------- 1 | 2 | Articles • BatchGetSymbols 6 | 7 | 8 |
9 |
44 | 45 | 46 | 47 |
48 |
49 | 52 | 53 |
54 |

All vignettes

55 |

56 | 57 |
How to use BatchGetSymbols
58 |
59 |
60 |
61 |
62 | 63 | 64 |
67 | 68 |
69 |

Site built with pkgdown 70 | 2.0.2.

71 |
72 | 73 |
74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | -------------------------------------------------------------------------------- /docs/authors.html: -------------------------------------------------------------------------------- 1 | 2 | Authors and Citation • BatchGetSymbols 6 | 7 | 8 |
9 |
44 | 45 | 46 | 47 |
48 |
49 |
50 | 53 | 54 | 55 |
  • 56 |

    Marcelo Perlin. Author, maintainer. 57 |

    58 |
  • 59 |
60 |
61 |
62 |

Citation

63 | 64 |
65 |
66 | 67 | 68 |

Perlin M (2022). 69 | BatchGetSymbols: Downloads and Organizes Financial Data for Multiple Tickers. 70 | R package version 2.6.3. 71 |

72 |
@Manual{,
 73 |   title = {BatchGetSymbols: Downloads and Organizes Financial Data for Multiple Tickers},
 74 |   author = {Marcelo Perlin},
 75 |   year = {2022},
 76 |   note = {R package version 2.6.3},
 77 | }
78 | 79 |
80 | 81 |
82 | 83 | 84 | 85 |
88 | 89 |
90 |

Site built with pkgdown 91 | 2.0.2.

92 |
93 | 94 |
95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | -------------------------------------------------------------------------------- /docs/bootstrap-toc.css: -------------------------------------------------------------------------------- 1 | /*! 2 | * Bootstrap Table of Contents v0.4.1 (http://afeld.github.io/bootstrap-toc/) 3 | * Copyright 2015 Aidan Feldman 4 | * Licensed under MIT (https://github.com/afeld/bootstrap-toc/blob/gh-pages/LICENSE.md) */ 5 | 6 | /* modified from https://github.com/twbs/bootstrap/blob/94b4076dd2efba9af71f0b18d4ee4b163aa9e0dd/docs/assets/css/src/docs.css#L548-L601 */ 7 | 8 | /* All levels of nav */ 9 | nav[data-toggle='toc'] .nav > li > a { 10 | display: block; 11 | padding: 4px 20px; 12 | font-size: 13px; 13 | font-weight: 500; 14 | color: #767676; 15 | } 16 | nav[data-toggle='toc'] .nav > li > a:hover, 17 | nav[data-toggle='toc'] .nav > li > a:focus { 18 | padding-left: 19px; 19 | color: #563d7c; 20 | text-decoration: none; 21 | background-color: transparent; 22 | border-left: 1px solid #563d7c; 23 | } 24 | nav[data-toggle='toc'] .nav > .active > a, 25 | nav[data-toggle='toc'] .nav > .active:hover > a, 26 | nav[data-toggle='toc'] .nav > .active:focus > a { 27 | padding-left: 18px; 28 | font-weight: bold; 29 | color: #563d7c; 30 | background-color: transparent; 31 | border-left: 2px solid #563d7c; 32 | } 33 | 34 | /* Nav: second level (shown on .active) */ 35 | nav[data-toggle='toc'] .nav .nav { 36 | display: none; /* Hide by default, but at >768px, show it */ 37 | padding-bottom: 10px; 38 | } 39 | nav[data-toggle='toc'] .nav .nav > li > a { 40 | padding-top: 1px; 41 | padding-bottom: 1px; 42 | padding-left: 30px; 43 | font-size: 12px; 44 | font-weight: normal; 45 | } 46 | nav[data-toggle='toc'] .nav .nav > li > a:hover, 47 | nav[data-toggle='toc'] .nav .nav > li > a:focus { 48 | padding-left: 29px; 49 | } 50 | nav[data-toggle='toc'] .nav .nav > .active > a, 51 | nav[data-toggle='toc'] .nav .nav > .active:hover > a, 52 | 
nav[data-toggle='toc'] .nav .nav > .active:focus > a { 53 | padding-left: 28px; 54 | font-weight: 500; 55 | } 56 | 57 | /* from https://github.com/twbs/bootstrap/blob/e38f066d8c203c3e032da0ff23cd2d6098ee2dd6/docs/assets/css/src/docs.css#L631-L634 */ 58 | nav[data-toggle='toc'] .nav > .active > ul { 59 | display: block; 60 | } 61 | -------------------------------------------------------------------------------- /docs/bootstrap-toc.js: -------------------------------------------------------------------------------- 1 | /*! 2 | * Bootstrap Table of Contents v0.4.1 (http://afeld.github.io/bootstrap-toc/) 3 | * Copyright 2015 Aidan Feldman 4 | * Licensed under MIT (https://github.com/afeld/bootstrap-toc/blob/gh-pages/LICENSE.md) */ 5 | (function() { 6 | 'use strict'; 7 | 8 | window.Toc = { 9 | helpers: { 10 | // return all matching elements in the set, or their descendants 11 | findOrFilter: function($el, selector) { 12 | // http://danielnouri.org/notes/2011/03/14/a-jquery-find-that-also-finds-the-root-element/ 13 | // http://stackoverflow.com/a/12731439/358804 14 | var $descendants = $el.find(selector); 15 | return $el.filter(selector).add($descendants).filter(':not([data-toc-skip])'); 16 | }, 17 | 18 | generateUniqueIdBase: function(el) { 19 | var text = $(el).text(); 20 | var anchor = text.trim().toLowerCase().replace(/[^A-Za-z0-9]+/g, '-'); 21 | return anchor || el.tagName.toLowerCase(); 22 | }, 23 | 24 | generateUniqueId: function(el) { 25 | var anchorBase = this.generateUniqueIdBase(el); 26 | for (var i = 0; ; i++) { 27 | var anchor = anchorBase; 28 | if (i > 0) { 29 | // add suffix 30 | anchor += '-' + i; 31 | } 32 | // check if ID already exists 33 | if (!document.getElementById(anchor)) { 34 | return anchor; 35 | } 36 | } 37 | }, 38 | 39 | generateAnchor: function(el) { 40 | if (el.id) { 41 | return el.id; 42 | } else { 43 | var anchor = this.generateUniqueId(el); 44 | el.id = anchor; 45 | return anchor; 46 | } 47 | }, 48 | 49 | createNavList: function() { 50 | 
return $(''); 51 | }, 52 | 53 | createChildNavList: function($parent) { 54 | var $childList = this.createNavList(); 55 | $parent.append($childList); 56 | return $childList; 57 | }, 58 | 59 | generateNavEl: function(anchor, text) { 60 | var $a = $(''); 61 | $a.attr('href', '#' + anchor); 62 | $a.text(text); 63 | var $li = $('
  • '); 64 | $li.append($a); 65 | return $li; 66 | }, 67 | 68 | generateNavItem: function(headingEl) { 69 | var anchor = this.generateAnchor(headingEl); 70 | var $heading = $(headingEl); 71 | var text = $heading.data('toc-text') || $heading.text(); 72 | return this.generateNavEl(anchor, text); 73 | }, 74 | 75 | // Find the first heading level (`

    `, then `

    `, etc.) that has more than one element. Defaults to 1 (for `

    `). 76 | getTopLevel: function($scope) { 77 | for (var i = 1; i <= 6; i++) { 78 | var $headings = this.findOrFilter($scope, 'h' + i); 79 | if ($headings.length > 1) { 80 | return i; 81 | } 82 | } 83 | 84 | return 1; 85 | }, 86 | 87 | // returns the elements for the top level, and the next below it 88 | getHeadings: function($scope, topLevel) { 89 | var topSelector = 'h' + topLevel; 90 | 91 | var secondaryLevel = topLevel + 1; 92 | var secondarySelector = 'h' + secondaryLevel; 93 | 94 | return this.findOrFilter($scope, topSelector + ',' + secondarySelector); 95 | }, 96 | 97 | getNavLevel: function(el) { 98 | return parseInt(el.tagName.charAt(1), 10); 99 | }, 100 | 101 | populateNav: function($topContext, topLevel, $headings) { 102 | var $context = $topContext; 103 | var $prevNav; 104 | 105 | var helpers = this; 106 | $headings.each(function(i, el) { 107 | var $newNav = helpers.generateNavItem(el); 108 | var navLevel = helpers.getNavLevel(el); 109 | 110 | // determine the proper $context 111 | if (navLevel === topLevel) { 112 | // use top level 113 | $context = $topContext; 114 | } else if ($prevNav && $context === $topContext) { 115 | // create a new level of the tree and switch to it 116 | $context = helpers.createChildNavList($prevNav); 117 | } // else use the current $context 118 | 119 | $context.append($newNav); 120 | 121 | $prevNav = $newNav; 122 | }); 123 | }, 124 | 125 | parseOps: function(arg) { 126 | var opts; 127 | if (arg.jquery) { 128 | opts = { 129 | $nav: arg 130 | }; 131 | } else { 132 | opts = arg; 133 | } 134 | opts.$scope = opts.$scope || $(document.body); 135 | return opts; 136 | } 137 | }, 138 | 139 | // accepts a jQuery object, or an options object 140 | init: function(opts) { 141 | opts = this.helpers.parseOps(opts); 142 | 143 | // ensure that the data attribute is in place for styling 144 | opts.$nav.attr('data-toggle', 'toc'); 145 | 146 | var $topContext = this.helpers.createChildNavList(opts.$nav); 147 | var topLevel = 
this.helpers.getTopLevel(opts.$scope); 148 | var $headings = this.helpers.getHeadings(opts.$scope, topLevel); 149 | this.helpers.populateNav($topContext, topLevel, $headings); 150 | } 151 | }; 152 | 153 | $(function() { 154 | $('nav[data-toggle="toc"]').each(function(i, el) { 155 | var $nav = $(el); 156 | Toc.init($nav); 157 | }); 158 | }); 159 | })(); 160 | -------------------------------------------------------------------------------- /docs/docsearch.css: -------------------------------------------------------------------------------- 1 | /* Docsearch -------------------------------------------------------------- */ 2 | /* 3 | Source: https://github.com/algolia/docsearch/ 4 | License: MIT 5 | */ 6 | 7 | .algolia-autocomplete { 8 | display: block; 9 | -webkit-box-flex: 1; 10 | -ms-flex: 1; 11 | flex: 1 12 | } 13 | 14 | .algolia-autocomplete .ds-dropdown-menu { 15 | width: 100%; 16 | min-width: none; 17 | max-width: none; 18 | padding: .75rem 0; 19 | background-color: #fff; 20 | background-clip: padding-box; 21 | border: 1px solid rgba(0, 0, 0, .1); 22 | box-shadow: 0 .5rem 1rem rgba(0, 0, 0, .175); 23 | } 24 | 25 | @media (min-width:768px) { 26 | .algolia-autocomplete .ds-dropdown-menu { 27 | width: 175% 28 | } 29 | } 30 | 31 | .algolia-autocomplete .ds-dropdown-menu::before { 32 | display: none 33 | } 34 | 35 | .algolia-autocomplete .ds-dropdown-menu [class^=ds-dataset-] { 36 | padding: 0; 37 | background-color: rgb(255,255,255); 38 | border: 0; 39 | max-height: 80vh; 40 | } 41 | 42 | .algolia-autocomplete .ds-dropdown-menu .ds-suggestions { 43 | margin-top: 0 44 | } 45 | 46 | .algolia-autocomplete .algolia-docsearch-suggestion { 47 | padding: 0; 48 | overflow: visible 49 | } 50 | 51 | .algolia-autocomplete .algolia-docsearch-suggestion--category-header { 52 | padding: .125rem 1rem; 53 | margin-top: 0; 54 | font-size: 1.3em; 55 | font-weight: 500; 56 | color: #00008B; 57 | border-bottom: 0 58 | } 59 | 60 | .algolia-autocomplete 
.algolia-docsearch-suggestion--wrapper { 61 | float: none; 62 | padding-top: 0 63 | } 64 | 65 | .algolia-autocomplete .algolia-docsearch-suggestion--subcategory-column { 66 | float: none; 67 | width: auto; 68 | padding: 0; 69 | text-align: left 70 | } 71 | 72 | .algolia-autocomplete .algolia-docsearch-suggestion--content { 73 | float: none; 74 | width: auto; 75 | padding: 0 76 | } 77 | 78 | .algolia-autocomplete .algolia-docsearch-suggestion--content::before { 79 | display: none 80 | } 81 | 82 | .algolia-autocomplete .ds-suggestion:not(:first-child) .algolia-docsearch-suggestion--category-header { 83 | padding-top: .75rem; 84 | margin-top: .75rem; 85 | border-top: 1px solid rgba(0, 0, 0, .1) 86 | } 87 | 88 | .algolia-autocomplete .ds-suggestion .algolia-docsearch-suggestion--subcategory-column { 89 | display: block; 90 | padding: .1rem 1rem; 91 | margin-bottom: 0.1; 92 | font-size: 1.0em; 93 | font-weight: 400 94 | /* display: none */ 95 | } 96 | 97 | .algolia-autocomplete .algolia-docsearch-suggestion--title { 98 | display: block; 99 | padding: .25rem 1rem; 100 | margin-bottom: 0; 101 | font-size: 0.9em; 102 | font-weight: 400 103 | } 104 | 105 | .algolia-autocomplete .algolia-docsearch-suggestion--text { 106 | padding: 0 1rem .5rem; 107 | margin-top: -.25rem; 108 | font-size: 0.8em; 109 | font-weight: 400; 110 | line-height: 1.25 111 | } 112 | 113 | .algolia-autocomplete .algolia-docsearch-footer { 114 | width: 110px; 115 | height: 20px; 116 | z-index: 3; 117 | margin-top: 10.66667px; 118 | float: right; 119 | font-size: 0; 120 | line-height: 0; 121 | } 122 | 123 | .algolia-autocomplete .algolia-docsearch-footer--logo { 124 | background-image: url("data:image/svg+xml;utf8,"); 125 | background-repeat: no-repeat; 126 | background-position: 50%; 127 | background-size: 100%; 128 | overflow: hidden; 129 | text-indent: -9000px; 130 | width: 100%; 131 | height: 100%; 132 | display: block; 133 | transform: translate(-8px); 134 | } 135 | 136 | .algolia-autocomplete 
.algolia-docsearch-suggestion--highlight { 137 | color: #FF8C00; 138 | background: rgba(232, 189, 54, 0.1) 139 | } 140 | 141 | 142 | .algolia-autocomplete .algolia-docsearch-suggestion--text .algolia-docsearch-suggestion--highlight { 143 | box-shadow: inset 0 -2px 0 0 rgba(105, 105, 105, .5) 144 | } 145 | 146 | .algolia-autocomplete .ds-suggestion.ds-cursor .algolia-docsearch-suggestion--content { 147 | background-color: rgba(192, 192, 192, .15) 148 | } 149 | -------------------------------------------------------------------------------- /docs/docsearch.js: -------------------------------------------------------------------------------- 1 | $(function() { 2 | 3 | // register a handler to move the focus to the search bar 4 | // upon pressing shift + "/" (i.e. "?") 5 | $(document).on('keydown', function(e) { 6 | if (e.shiftKey && e.keyCode == 191) { 7 | e.preventDefault(); 8 | $("#search-input").focus(); 9 | } 10 | }); 11 | 12 | $(document).ready(function() { 13 | // do keyword highlighting 14 | /* modified from https://jsfiddle.net/julmot/bL6bb5oo/ */ 15 | var mark = function() { 16 | 17 | var referrer = document.URL ; 18 | var paramKey = "q" ; 19 | 20 | if (referrer.indexOf("?") !== -1) { 21 | var qs = referrer.substr(referrer.indexOf('?') + 1); 22 | var qs_noanchor = qs.split('#')[0]; 23 | var qsa = qs_noanchor.split('&'); 24 | var keyword = ""; 25 | 26 | for (var i = 0; i < qsa.length; i++) { 27 | var currentParam = qsa[i].split('='); 28 | 29 | if (currentParam.length !== 2) { 30 | continue; 31 | } 32 | 33 | if (currentParam[0] == paramKey) { 34 | keyword = decodeURIComponent(currentParam[1].replace(/\+/g, "%20")); 35 | } 36 | } 37 | 38 | if (keyword !== "") { 39 | $(".contents").unmark({ 40 | done: function() { 41 | $(".contents").mark(keyword); 42 | } 43 | }); 44 | } 45 | } 46 | }; 47 | 48 | mark(); 49 | }); 50 | }); 51 | 52 | /* Search term highlighting ------------------------------*/ 53 | 54 | function matchedWords(hit) { 55 | var words = []; 56 | 57 | var 
hierarchy = hit._highlightResult.hierarchy; 58 | // loop to fetch from lvl0, lvl1, etc. 59 | for (var idx in hierarchy) { 60 | words = words.concat(hierarchy[idx].matchedWords); 61 | } 62 | 63 | var content = hit._highlightResult.content; 64 | if (content) { 65 | words = words.concat(content.matchedWords); 66 | } 67 | 68 | // return unique words 69 | var words_uniq = [...new Set(words)]; 70 | return words_uniq; 71 | } 72 | 73 | function updateHitURL(hit) { 74 | 75 | var words = matchedWords(hit); 76 | var url = ""; 77 | 78 | if (hit.anchor) { 79 | url = hit.url_without_anchor + '?q=' + escape(words.join(" ")) + '#' + hit.anchor; 80 | } else { 81 | url = hit.url + '?q=' + escape(words.join(" ")); 82 | } 83 | 84 | return url; 85 | } 86 | -------------------------------------------------------------------------------- /docs/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Downloads and Organizes Financial Data for Multiple Tickers • BatchGetSymbols 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 20 | 21 | 22 | 23 | 24 |
    25 |
    67 | 68 | 69 | 70 | 71 |
    72 |
    73 | 74 |
    75 |

    Deprecated! 76 |

    77 |

    Package BatchGetSymbols is soft-deprecated in favor of 78 | yfR. See this Readme.md for my motivation 79 | in writing a new R package. In practice, this means that 80 | BatchGetSymbols is no longer maintained beyond the 81 | correction of major bugs. All effort goes to the development of 82 | yfR.

    83 |

    While BatchGetSymbols will remain available on CRAN for 84 | the near future, my plan is to remove it from CRAN and archive it on 85 | GitHub once yfR becomes more stable.

    86 |
    87 |
    88 |

    Motivation 89 |

    90 |

    91 |

    BatchGetSymbols is an R package for large-scale downloads of financial 92 | data from Yahoo Finance. Based on a set of tickers and date ranges, the 93 | package will download and organize the financial data in the tidy/long 94 | format.

    95 |
    96 |
    97 |

    Warnings 98 |

    99 |
      100 |
    • Yahoo Finance data is far from perfect or reliable, especially for 101 | individual stocks. In my experience, using it in research code with 102 | stock indices is fine, and I can match it with other 103 | data sources. However, adjusted prices for individual 104 | stocks are messy, as corporate events such as splits or dividends are 105 | not properly registered. I was never able to match them with other data 106 | sources. My advice is to never use data for individual stocks in 107 | production.

    • 108 |
    • Since version 2.6, the cache system is session-persistent by 109 | default, meaning that whenever you restart your R session, you lose all 110 | your cached data. This is a safeguard against mismatched prices caused by 111 | corporate events.

    • 112 |
    113 |
    114 |
    115 |
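The caching behaviour described above (data saved locally, only missing portions downloaded) can be illustrated with a minimal base-R sketch. It assumes, hypothetically, that a cached series is summarised by its vector of dates and that the cached range is contiguous; `missing_ranges()` is an illustrative helper, not a BatchGetSymbols function, and the package's actual logic may differ. The `cache.folder` line mirrors the documented default `file.path(tempdir(), 'BGS_Cache')`, which R deletes when the session ends.

```r
# Documented default cache location: it lives under tempdir(), so R removes
# it when the session ends (this is what "session-persistent" means here).
cache.folder <- file.path(tempdir(), "BGS_Cache")

# Hypothetical helper (not part of BatchGetSymbols): given the dates already
# cached for a ticker, return the date ranges that still need downloading.
# Assumes the cached dates form one contiguous block.
missing_ranges <- function(cached.dates, first.date, last.date) {
  first.date <- as.Date(first.date)
  last.date <- as.Date(last.date)
  if (length(cached.dates) == 0) {
    return(list(c(first.date, last.date)))  # nothing cached: fetch everything
  }
  cache.start <- min(cached.dates)
  cache.end <- max(cached.dates)
  out <- list()
  if (first.date < cache.start) {
    out <- c(out, list(c(first.date, cache.start - 1)))  # gap before the cache
  }
  if (last.date > cache.end) {
    out <- c(out, list(c(cache.end + 1, last.date)))     # gap after the cache
  }
  out
}

cached <- seq(as.Date("2020-01-10"), as.Date("2020-01-20"), by = "day")
missing_ranges(cached, "2020-01-01", "2020-01-31")
```

A request fully inside the cached block returns an empty list, so no download would be needed in that case.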

    Main features: 116 |

    117 |
      118 |
    • Organizes data in a tabular/long or wide format, returning prices 119 | and returns (arithmetic or logarithmic)
    • 120 |
    • A cache system was implemented in version 2.0 (session-persistent 121 | since version 2.6): data is saved locally and only missing portions of the 122 | data are downloaded, if needed.
    • 123 |
    • All dates are compared to a benchmark ticker such as SP500 and, 124 | whenever an individual asset does not have a sufficient number of dates, 125 | the software drops it from the output. This means you can choose to 126 | ignore tickers with a high number of missing dates.
    • 127 |
    • Allows the choice of a wide format, with tickers as columns
    • 128 |
    • Users can choose the frequency of the resulting dataset (daily, 129 | weekly, monthly, yearly)
    • 130 |
    • Option for parallel computing, speeding up the data import 131 | process
    • 132 |
    133 |
    134 |
    135 |
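The benchmark comparison in the feature list can be sketched as follows. This is a hypothetical illustration, not the package's code: it assumes a long dataframe with `ticker`, `ref.date` and `price.adjusted` columns, and it keeps a ticker only if it has valid prices on at least a share `thresh.bad.data` of the benchmark's dates (the parameter name comes from `myGetSymbols()`; the value 0.75 here is only illustrative).

```r
# Hypothetical sketch of benchmark-based filtering (not BatchGetSymbols code).
# df: long dataframe with columns ticker, ref.date, price.adjusted
# bench.dates: trading dates of the benchmark ticker (e.g. an index)
filter_by_benchmark <- function(df, bench.dates, thresh.bad.data = 0.75) {
  keep <- vapply(split(df, df$ticker), function(d) {
    valid.dates <- d$ref.date[!is.na(d$price.adjusted)]
    # share of benchmark dates for which this ticker has a valid price
    mean(bench.dates %in% valid.dates) >= thresh.bad.data
  }, logical(1))
  df[df$ticker %in% names(keep)[keep], , drop = FALSE]
}

bench <- as.Date("2020-01-01") + 0:3
df <- data.frame(ticker = c(rep("A", 4), rep("B", 2)),
                 ref.date = c(bench, bench[1:2]),
                 price.adjusted = c(1, 2, 3, 4, 5, 6))
filter_by_benchmark(df, bench)  # ticker B covers only 50% of dates and is dropped
```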

    Installation 136 |

    137 |
    # CRAN (official release)
    138 | install.packages('BatchGetSymbols')
    139 | 
    140 | # Github (dev version)
    141 | devtools::install_github('msperlin/BatchGetSymbols')
    142 |
    143 |
    144 |

    A simple example 145 |

    146 |

    See vignette.

    147 |
    148 | 149 |
    150 | 151 | 184 |
    185 | 186 | 187 |
    191 | 192 |
    193 |

    194 |

    Site built with pkgdown 195 | 2.0.2.

    196 |
    197 | 198 |
    199 |
    200 | 201 | 202 | 203 | 204 | 205 | 206 | 207 | 208 | -------------------------------------------------------------------------------- /docs/link.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 5 | 8 | 12 | 13 | -------------------------------------------------------------------------------- /docs/news/index.html: -------------------------------------------------------------------------------- 1 | 2 | Changelog • BatchGetSymbols 6 | 7 | 8 |
    9 |
    44 | 45 | 46 | 47 |
    48 |
    49 | 53 | 54 |
    55 | 56 |
    • Added deprecation message
    • 57 |
    58 |
    59 | 60 |
    • Added message for ibov composition
    • 61 |
    62 |
    63 | 64 |
    • Fixed issue 65 | 21, which only happened on Windows.
    • 66 |
    • changed default cache dir for ticker grabbing function
    • 67 |
    68 |
    69 | 70 |
    • The cache system is now session-persistent with 71 | cache.dir = file.path(tempdir(), 'BGS_Cache'). This solves 72 | the problem of mismatched price series in cached data across 73 | splits or dividends. A new warning is issued whenever the user sets a 74 | cache.dir different from tempdir()
    • 75 |
    • removed dplyr grouping message
    • 76 |
    77 |
    78 | 79 |
    • removed rownames from output (issue 80 | 18)
    • 81 |
    • Made sure that, in “weekly” mode, the first day of the week is a Monday 82 | (issue 83 | 19)
    • 84 |
    • Added input how.to.aggregate, which allows the user to aggregate 85 | the data by the last or first prices of intervals (issue 86 | 19)
    • 87 |
    88 |
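The weekly aggregation described in the entries above (weeks starting on Monday, keeping the first or last price of each interval) can be sketched in base R. This is an illustrative sketch, not the package's implementation: `aggregate_prices()` is a hypothetical name, and it assumes a long dataframe with `ticker`, `ref.date` and `price.adjusted` columns. It relies on `cut.Date()`, whose `"week"` breaks start on Monday by default.

```r
# Hypothetical sketch of weekly aggregation (not BatchGetSymbols code).
aggregate_prices <- function(df, how.to.aggregate = c("last", "first")) {
  how.to.aggregate <- match.arg(how.to.aggregate)
  df <- df[order(df$ticker, df$ref.date), ]
  # cut.Date() with breaks = "week" starts weeks on Monday by default
  week <- as.Date(cut(df$ref.date, breaks = "week"))
  key <- paste(df$ticker, week)
  pick <- if (how.to.aggregate == "last") {
    !duplicated(key, fromLast = TRUE)  # last row of each ticker-week
  } else {
    !duplicated(key)                   # first row of each ticker-week
  }
  out <- df[pick, ]
  out$week.start <- week[pick]
  rownames(out) <- NULL
  out
}

d <- as.Date(c("2022-01-03", "2022-01-04", "2022-01-05",
               "2022-01-07", "2022-01-10", "2022-01-11"))
df <- data.frame(ticker = "X", ref.date = d, price.adjusted = 1:6)
aggregate_prices(df, "last")   # prices 4 and 6, weeks of Jan 3 and Jan 10
```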
    89 | 90 |
    • Fixed bug in NA price
    • 91 |
    92 |
    93 | 94 |
    • Fixed bug in SP500 function
    • 95 |
    • Fixed bug with repeated rows of data (git issue 96 | 16)
    • 97 |
    98 |
    99 | 100 |
    • Improved startup message for 2020 book (link to online version)
    • 101 |
    102 |
    103 | 104 |
    • Fixed bug in cleaning function
    • 105 |
    • Added startup message for 2020 book
    • 106 |
    107 |
    108 | 109 |
    • Fixed small bug in vignette
    • 110 |
    111 |
    112 | 113 |
    • Added option for keeping function quiet (be.quiet)
    • 114 |
    • Once again fixed function for grabbing stocks from the SP500 115 | index
    • 116 |
    117 |
    118 | 119 |
    • Fixed bug in stock index grabbing functions
    • 120 |
    121 |
    122 | 123 |
    • New function for FTSE100 Stocks (thanks samprohaska)
    • 124 |
    125 |
    126 | 127 |
    • Implemented option for parallel computations with 128 | BatchGetSymbols
    • 129 |
    • Added caching system for index grabbing functions (SP500 and 130 | IBOV)
    • 131 |
    132 |
    133 | 134 |
    • Fixed bug in function that downloads SP500 data.
    • 135 |
    136 |
    137 | 138 |
    • Users can now choose to return a fully balanced price dataset by 139 | replacing NA values, or prices with volume == 0, with their closest prices.
    • 140 |
    • Fixed small bug in cache system when an empty dataframe is 141 | returned
    • 142 |
    • Fixed bug in function that downloads Ibovespa’s composition
    • 143 |
    144 |
    145 | 146 |
    • Users can now set frequency of data (daily, weekly, monthly or 147 | yearly)
    • 148 |
    149 |
    150 | 151 |

    Small update:

    152 |
    • added startup message with book link
    • 153 |
    154 |
    155 | 156 |

    Major update:

    157 |
    • New function GetIbovStocks for downloading the current composition of 158 | Ibovespa
    • 159 |
    • New fct for changing to wide format
    • 160 |
    • Now the output includes returns and not only prices
    • 161 |
    • removed annoying startup message
    • 162 |
    • implemented clever caching system for financial data
    • 163 |
    • simplified input dates (no need for Date class any more)
    • 164 |
    165 |
    166 | 167 |
    • Fixed issue with GetSP500(). The Wikipedia page changed its 168 | code.
    • 169 |
    170 |
    171 | 172 |
    • Fixed CRAN NOTE (stats not used in imports)
    • 173 |
    • Fixed typos and improved vignette
    • 174 |
    • Couple of fixes in documentation
    • 175 |
    • Added citation message on startup
    • 176 |
    177 |
    178 | 179 |
    • First commit
    • 180 |
    181 |
    182 | 183 | 186 | 187 |
    188 | 189 | 190 |
    193 | 194 |
    195 |

    Site built with pkgdown 196 | 2.0.2.

    197 |
    198 | 199 |
    200 | 201 | 202 | 203 | 204 | 205 | 206 | 207 | 208 | -------------------------------------------------------------------------------- /docs/pkgdown.css: -------------------------------------------------------------------------------- 1 | /* Sticky footer */ 2 | 3 | /** 4 | * Basic idea: https://philipwalton.github.io/solved-by-flexbox/demos/sticky-footer/ 5 | * Details: https://github.com/philipwalton/solved-by-flexbox/blob/master/assets/css/components/site.css 6 | * 7 | * .Site -> body > .container 8 | * .Site-content -> body > .container .row 9 | * .footer -> footer 10 | * 11 | * Key idea seems to be to ensure that .container and __all its parents__ 12 | * have height set to 100% 13 | * 14 | */ 15 | 16 | html, body { 17 | height: 100%; 18 | } 19 | 20 | body { 21 | position: relative; 22 | } 23 | 24 | body > .container { 25 | display: flex; 26 | height: 100%; 27 | flex-direction: column; 28 | } 29 | 30 | body > .container .row { 31 | flex: 1 0 auto; 32 | } 33 | 34 | footer { 35 | margin-top: 45px; 36 | padding: 35px 0 36px; 37 | border-top: 1px solid #e5e5e5; 38 | color: #666; 39 | display: flex; 40 | flex-shrink: 0; 41 | } 42 | footer p { 43 | margin-bottom: 0; 44 | } 45 | footer div { 46 | flex: 1; 47 | } 48 | footer .pkgdown { 49 | text-align: right; 50 | } 51 | footer p { 52 | margin-bottom: 0; 53 | } 54 | 55 | img.icon { 56 | float: right; 57 | } 58 | 59 | /* Ensure in-page images don't run outside their container */ 60 | .contents img { 61 | max-width: 100%; 62 | height: auto; 63 | } 64 | 65 | /* Fix bug in bootstrap (only seen in firefox) */ 66 | summary { 67 | display: list-item; 68 | } 69 | 70 | /* Typographic tweaking ---------------------------------*/ 71 | 72 | .contents .page-header { 73 | margin-top: calc(-60px + 1em); 74 | } 75 | 76 | dd { 77 | margin-left: 3em; 78 | } 79 | 80 | /* Section anchors ---------------------------------*/ 81 | 82 | a.anchor { 83 | display: none; 84 | margin-left: 5px; 85 | width: 20px; 86 | height: 20px; 87 | 88 | 
background-image: url(./link.svg); 89 | background-repeat: no-repeat; 90 | background-size: 20px 20px; 91 | background-position: center center; 92 | } 93 | 94 | h1:hover .anchor, 95 | h2:hover .anchor, 96 | h3:hover .anchor, 97 | h4:hover .anchor, 98 | h5:hover .anchor, 99 | h6:hover .anchor { 100 | display: inline-block; 101 | } 102 | 103 | /* Fixes for fixed navbar --------------------------*/ 104 | 105 | .contents h1, .contents h2, .contents h3, .contents h4 { 106 | padding-top: 60px; 107 | margin-top: -40px; 108 | } 109 | 110 | /* Navbar submenu --------------------------*/ 111 | 112 | .dropdown-submenu { 113 | position: relative; 114 | } 115 | 116 | .dropdown-submenu>.dropdown-menu { 117 | top: 0; 118 | left: 100%; 119 | margin-top: -6px; 120 | margin-left: -1px; 121 | border-radius: 0 6px 6px 6px; 122 | } 123 | 124 | .dropdown-submenu:hover>.dropdown-menu { 125 | display: block; 126 | } 127 | 128 | .dropdown-submenu>a:after { 129 | display: block; 130 | content: " "; 131 | float: right; 132 | width: 0; 133 | height: 0; 134 | border-color: transparent; 135 | border-style: solid; 136 | border-width: 5px 0 5px 5px; 137 | border-left-color: #cccccc; 138 | margin-top: 5px; 139 | margin-right: -10px; 140 | } 141 | 142 | .dropdown-submenu:hover>a:after { 143 | border-left-color: #ffffff; 144 | } 145 | 146 | .dropdown-submenu.pull-left { 147 | float: none; 148 | } 149 | 150 | .dropdown-submenu.pull-left>.dropdown-menu { 151 | left: -100%; 152 | margin-left: 10px; 153 | border-radius: 6px 0 6px 6px; 154 | } 155 | 156 | /* Sidebar --------------------------*/ 157 | 158 | #pkgdown-sidebar { 159 | margin-top: 30px; 160 | position: -webkit-sticky; 161 | position: sticky; 162 | top: 70px; 163 | } 164 | 165 | #pkgdown-sidebar h2 { 166 | font-size: 1.5em; 167 | margin-top: 1em; 168 | } 169 | 170 | #pkgdown-sidebar h2:first-child { 171 | margin-top: 0; 172 | } 173 | 174 | #pkgdown-sidebar .list-unstyled li { 175 | margin-bottom: 0.5em; 176 | } 177 | 178 | /* bootstrap-toc 
tweaks ------------------------------------------------------*/ 179 | 180 | /* All levels of nav */ 181 | 182 | nav[data-toggle='toc'] .nav > li > a { 183 | padding: 4px 20px 4px 6px; 184 | font-size: 1.5rem; 185 | font-weight: 400; 186 | color: inherit; 187 | } 188 | 189 | nav[data-toggle='toc'] .nav > li > a:hover, 190 | nav[data-toggle='toc'] .nav > li > a:focus { 191 | padding-left: 5px; 192 | color: inherit; 193 | border-left: 1px solid #878787; 194 | } 195 | 196 | nav[data-toggle='toc'] .nav > .active > a, 197 | nav[data-toggle='toc'] .nav > .active:hover > a, 198 | nav[data-toggle='toc'] .nav > .active:focus > a { 199 | padding-left: 5px; 200 | font-size: 1.5rem; 201 | font-weight: 400; 202 | color: inherit; 203 | border-left: 2px solid #878787; 204 | } 205 | 206 | /* Nav: second level (shown on .active) */ 207 | 208 | nav[data-toggle='toc'] .nav .nav { 209 | display: none; /* Hide by default, but at >768px, show it */ 210 | padding-bottom: 10px; 211 | } 212 | 213 | nav[data-toggle='toc'] .nav .nav > li > a { 214 | padding-left: 16px; 215 | font-size: 1.35rem; 216 | } 217 | 218 | nav[data-toggle='toc'] .nav .nav > li > a:hover, 219 | nav[data-toggle='toc'] .nav .nav > li > a:focus { 220 | padding-left: 15px; 221 | } 222 | 223 | nav[data-toggle='toc'] .nav .nav > .active > a, 224 | nav[data-toggle='toc'] .nav .nav > .active:hover > a, 225 | nav[data-toggle='toc'] .nav .nav > .active:focus > a { 226 | padding-left: 15px; 227 | font-weight: 500; 228 | font-size: 1.35rem; 229 | } 230 | 231 | /* orcid ------------------------------------------------------------------- */ 232 | 233 | .orcid { 234 | font-size: 16px; 235 | color: #A6CE39; 236 | /* margins are required by official ORCID trademark and display guidelines */ 237 | margin-left:4px; 238 | margin-right:4px; 239 | vertical-align: middle; 240 | } 241 | 242 | /* Reference index & topics ----------------------------------------------- */ 243 | 244 | .ref-index th {font-weight: normal;} 245 | 246 | .ref-index 
td {vertical-align: top; min-width: 100px} 247 | .ref-index .icon {width: 40px;} 248 | .ref-index .alias {width: 40%;} 249 | .ref-index-icons .alias {width: calc(40% - 40px);} 250 | .ref-index .title {width: 60%;} 251 | 252 | .ref-arguments th {text-align: right; padding-right: 10px;} 253 | .ref-arguments th, .ref-arguments td {vertical-align: top; min-width: 100px} 254 | .ref-arguments .name {width: 20%;} 255 | .ref-arguments .desc {width: 80%;} 256 | 257 | /* Nice scrolling for wide elements --------------------------------------- */ 258 | 259 | table { 260 | display: block; 261 | overflow: auto; 262 | } 263 | 264 | /* Syntax highlighting ---------------------------------------------------- */ 265 | 266 | pre, code, pre code { 267 | background-color: #f8f8f8; 268 | color: #333; 269 | } 270 | pre, pre code { 271 | white-space: pre-wrap; 272 | word-break: break-all; 273 | overflow-wrap: break-word; 274 | } 275 | 276 | pre { 277 | border: 1px solid #eee; 278 | } 279 | 280 | pre .img, pre .r-plt { 281 | margin: 5px 0; 282 | } 283 | 284 | pre .img img, pre .r-plt img { 285 | background-color: #fff; 286 | } 287 | 288 | code a, pre a { 289 | color: #375f84; 290 | } 291 | 292 | a.sourceLine:hover { 293 | text-decoration: none; 294 | } 295 | 296 | .fl {color: #1514b5;} 297 | .fu {color: #000000;} /* function */ 298 | .ch,.st {color: #036a07;} /* string */ 299 | .kw {color: #264D66;} /* keyword */ 300 | .co {color: #888888;} /* comment */ 301 | 302 | .error {font-weight: bolder;} 303 | .warning {font-weight: bolder;} 304 | 305 | /* Clipboard --------------------------*/ 306 | 307 | .hasCopyButton { 308 | position: relative; 309 | } 310 | 311 | .btn-copy-ex { 312 | position: absolute; 313 | right: 0; 314 | top: 0; 315 | visibility: hidden; 316 | } 317 | 318 | .hasCopyButton:hover button.btn-copy-ex { 319 | visibility: visible; 320 | } 321 | 322 | /* headroom.js ------------------------ */ 323 | 324 | .headroom { 325 | will-change: transform; 326 | transition: transform 
200ms linear; 327 | } 328 | .headroom--pinned { 329 | transform: translateY(0%); 330 | } 331 | .headroom--unpinned { 332 | transform: translateY(-100%); 333 | } 334 | 335 | /* mark.js ----------------------------*/ 336 | 337 | mark { 338 | background-color: rgba(255, 255, 51, 0.5); 339 | border-bottom: 2px solid rgba(255, 153, 51, 0.3); 340 | padding: 1px; 341 | } 342 | 343 | /* vertical spacing after htmlwidgets */ 344 | .html-widget { 345 | margin-bottom: 10px; 346 | } 347 | 348 | /* fontawesome ------------------------ */ 349 | 350 | .fab { 351 | font-family: "Font Awesome 5 Brands" !important; 352 | } 353 | 354 | /* don't display links in code chunks when printing */ 355 | /* source: https://stackoverflow.com/a/10781533 */ 356 | @media print { 357 | code a:link:after, code a:visited:after { 358 | content: ""; 359 | } 360 | } 361 | 362 | /* Section anchors --------------------------------- 363 | Added in pandoc 2.11: https://github.com/jgm/pandoc-templates/commit/9904bf71 364 | */ 365 | 366 | div.csl-bib-body { } 367 | div.csl-entry { 368 | clear: both; 369 | } 370 | .hanging-indent div.csl-entry { 371 | margin-left:2em; 372 | text-indent:-2em; 373 | } 374 | div.csl-left-margin { 375 | min-width:2em; 376 | float:left; 377 | } 378 | div.csl-right-inline { 379 | margin-left:2em; 380 | padding-left:1em; 381 | } 382 | div.csl-indent { 383 | margin-left: 2em; 384 | } 385 | -------------------------------------------------------------------------------- /docs/pkgdown.js: -------------------------------------------------------------------------------- 1 | /* http://gregfranko.com/blog/jquery-best-practices/ */ 2 | (function($) { 3 | $(function() { 4 | 5 | $('.navbar-fixed-top').headroom(); 6 | 7 | $('body').css('padding-top', $('.navbar').height() + 10); 8 | $(window).resize(function(){ 9 | $('body').css('padding-top', $('.navbar').height() + 10); 10 | }); 11 | 12 | $('[data-toggle="tooltip"]').tooltip(); 13 | 14 | var cur_path = paths(location.pathname); 15 | var 
links = $("#navbar ul li a"); 16 | var max_length = -1; 17 | var pos = -1; 18 | for (var i = 0; i < links.length; i++) { 19 | if (links[i].getAttribute("href") === "#") 20 | continue; 21 | // Ignore external links 22 | if (links[i].host !== location.host) 23 | continue; 24 | 25 | var nav_path = paths(links[i].pathname); 26 | 27 | var length = prefix_length(nav_path, cur_path); 28 | if (length > max_length) { 29 | max_length = length; 30 | pos = i; 31 | } 32 | } 33 | 34 | // Add class to parent
  • , and enclosing
  • if in dropdown 35 | if (pos >= 0) { 36 | var menu_anchor = $(links[pos]); 37 | menu_anchor.parent().addClass("active"); 38 | menu_anchor.closest("li.dropdown").addClass("active"); 39 | } 40 | }); 41 | 42 | function paths(pathname) { 43 | var pieces = pathname.split("/"); 44 | pieces.shift(); // always starts with / 45 | 46 | var end = pieces[pieces.length - 1]; 47 | if (end === "index.html" || end === "") 48 | pieces.pop(); 49 | return(pieces); 50 | } 51 | 52 | // Returns -1 if not found 53 | function prefix_length(needle, haystack) { 54 | if (needle.length > haystack.length) 55 | return(-1); 56 | 57 | // Special case for length-0 haystack, since for loop won't run 58 | if (haystack.length === 0) { 59 | return(needle.length === 0 ? 0 : -1); 60 | } 61 | 62 | for (var i = 0; i < haystack.length; i++) { 63 | if (needle[i] != haystack[i]) 64 | return(i); 65 | } 66 | 67 | return(haystack.length); 68 | } 69 | 70 | /* Clipboard --------------------------*/ 71 | 72 | function changeTooltipMessage(element, msg) { 73 | var tooltipOriginalTitle=element.getAttribute('data-original-title'); 74 | element.setAttribute('data-original-title', msg); 75 | $(element).tooltip('show'); 76 | element.setAttribute('data-original-title', tooltipOriginalTitle); 77 | } 78 | 79 | if(ClipboardJS.isSupported()) { 80 | $(document).ready(function() { 81 | var copyButton = ""; 82 | 83 | $("div.sourceCode").addClass("hasCopyButton"); 84 | 85 | // Insert copy buttons: 86 | $(copyButton).prependTo(".hasCopyButton"); 87 | 88 | // Initialize tooltips: 89 | $('.btn-copy-ex').tooltip({container: 'body'}); 90 | 91 | // Initialize clipboard: 92 | var clipboardBtnCopies = new ClipboardJS('[data-clipboard-copy]', { 93 | text: function(trigger) { 94 | return trigger.parentNode.textContent.replace(/\n#>[^\n]*/g, ""); 95 | } 96 | }); 97 | 98 | clipboardBtnCopies.on('success', function(e) { 99 | changeTooltipMessage(e.trigger, 'Copied!'); 100 | e.clearSelection(); 101 | }); 102 | 103 | 
clipboardBtnCopies.on('error', function(e) { 104 | changeTooltipMessage(e.trigger,'Press Ctrl+C or Command+C to copy'); 105 | }); 106 | }); 107 | } 108 | })(window.jQuery || window.$) 109 | -------------------------------------------------------------------------------- /docs/pkgdown.yml: -------------------------------------------------------------------------------- 1 | pandoc: 2.17.1.1 2 | pkgdown: 2.0.2 3 | pkgdown_sha: ~ 4 | articles: 5 | BatchGetSymbols-vignette: BatchGetSymbols-vignette.html 6 | last_built: 2022-03-30T18:15Z 7 | 8 | -------------------------------------------------------------------------------- /docs/reference/GetFTSE100Stocks.html: -------------------------------------------------------------------------------- 1 | 2 | Function to download the current components of the FTSE100 index from Wikipedia — GetFTSE100Stocks • BatchGetSymbols 6 | 7 | 8 |
    9 |
    44 | 45 | 46 | 47 |
    48 |
    49 | 54 | 55 |
    56 |

    This function scrapes the stocks that constitute the FTSE100 index from the Wikipedia page at <https://en.wikipedia.org/wiki/FTSE_100_Index#List_of_FTSE_100_companies>.

    57 |
    58 | 59 |
    60 |
    GetFTSE100Stocks(
     61 |   do.cache = TRUE,
     62 |   cache.folder = file.path(tempdir(), "BGS_Cache")
     63 | )
    64 |
    65 | 66 |
    67 |

    Arguments

    68 |
    do.cache
    69 |

    Use cache system? (default = TRUE)

    70 |
    cache.folder
    71 |

    Where to save cache files? (default = file.path(tempdir(), 'BGS_Cache') )

    72 |
    73 |
    74 |

    Value

    75 |

    A dataframe that includes a column with the list of tickers of companies that belong to the FTSE100 index

    76 |
    77 | 78 |
    79 |

    Examples

    80 |
    if (FALSE) {
     81 | df.FTSE100 <- GetFTSE100Stocks()
     82 | print(df.FTSE100$tickers)
     83 | }
     84 | 
    85 |
    86 |
    87 | 90 |
    91 | 92 | 93 |
    96 | 97 |
    98 |

    Site built with pkgdown 99 | 2.0.2.

    100 |
    101 | 102 |
    103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | -------------------------------------------------------------------------------- /docs/reference/GetIbovStocks.html: -------------------------------------------------------------------------------- 1 | 2 | Function to download the current components of the Ibovespa index from Bovespa website — GetIbovStocks • BatchGetSymbols 6 | 7 | 8 |
    9 |
    44 | 45 | 46 | 47 |
    48 |
    49 | 54 | 55 |
    56 |

    This function scrapes the stocks that constitute the Ibovespa index from the Bovespa website at http://bvmf.bmfbovespa.com.br/indices/ResumoCarteiraTeorica.aspx?Indice=IBOV&idioma=pt-br.

    57 |
    58 | 59 |
    60 |
    GetIbovStocks(
     61 |   do.cache = TRUE,
     62 |   cache.folder = file.path(tempdir(), "BGS_Cache"),
     63 |   max.tries = 10
     64 | )
    65 |
    66 | 67 |
    68 |

    Arguments

    69 |
    do.cache
    70 |

    Use cache system? (default = TRUE)

    71 |
    cache.folder
    72 |

    Where to save cache files? (default = file.path(tempdir(), 'BGS_Cache') )

    73 |
    max.tries
    74 |

    Maximum number of attempts to download the data

    75 |
    76 |
    77 |

    Value

    78 |

    A dataframe that includes a column with the list of tickers of companies that belong to the Ibovespa index

    79 |
    80 | 81 |
    82 |

    Examples

    83 |
    if (FALSE) {
     84 | df.ibov <- GetIbovStocks()
     85 | print(df.ibov$tickers)
     86 | }
     87 | 
    88 |
    89 |
    90 | 93 |
    94 | 95 | 96 |
    99 | 100 |
    101 |

    Site built with pkgdown 102 | 2.0.2.

    103 |
    104 | 105 |
    106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | -------------------------------------------------------------------------------- /docs/reference/GetSP500Stocks.html: -------------------------------------------------------------------------------- 1 | 2 | Function to download the current components of the SP500 index from Wikipedia — GetSP500Stocks • BatchGetSymbols 6 | 7 | 8 |
    9 |
    44 | 45 | 46 | 47 |
    48 |
    49 | 54 | 55 |
    56 |

    This function scrapes the stocks that constitute the SP500 index from the Wikipedia page at https://en.wikipedia.org/wiki/List_of_S

    57 |
    58 | 59 |
    60 |
    GetSP500Stocks(
     61 |   do.cache = TRUE,
     62 |   cache.folder = file.path(tempdir(), "BGS_Cache")
     63 | )
    64 |
    65 | 66 |
    67 |

    Arguments

    68 |
    do.cache
    69 |

    Use cache system? (default = TRUE)

    70 |
    cache.folder
    71 |

    Where to save cache files? (default = file.path(tempdir(), 'BGS_Cache') )

    72 |
    73 |
    74 |

    Value

    75 |

    A dataframe that includes a column with the list of tickers of companies that belong to the SP500 index

    76 |
    77 | 78 |
    79 |

    Examples

    80 |
    if (FALSE) {
     81 | df.SP500 <- GetSP500Stocks()
     82 | print(df.SP500$tickers)
     83 | }
     84 | 
    85 |
    86 |
    87 | 90 |
    91 | 92 | 93 |
    96 | 97 |
    98 |

    Site built with pkgdown 99 | 2.0.2.

    100 |
    101 | 102 |
    103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | -------------------------------------------------------------------------------- /docs/reference/Rplot001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/msperlin/BatchGetSymbols/689c84c0fe5177d50afbe6473a7aa1abbf53c29e/docs/reference/Rplot001.png -------------------------------------------------------------------------------- /docs/reference/calc.ret.html: -------------------------------------------------------------------------------- 1 | 2 | Function to calculate returns from a price and ticker vector — calc.ret • BatchGetSymbols 6 | 7 | 8 |
    9 |
    44 | 45 | 46 | 47 |
    48 |
    49 | 54 | 55 |
    56 |

    Created so that a return column is added to a dataframe with prices in the long (tidy) format.

    57 |
    58 | 59 |
    60 |
    calc.ret(P, tickers = rep("ticker", length(P)), type.return = "arit")
    61 |
    62 | 63 |
    64 |

    Arguments

    65 |
    P
    66 |

    Price vector

    67 |
    tickers
    68 |

    Ticker symbols (useful when working with a long dataframe)

    69 |
    type.return
    70 |

    Type of price return to calculate: 'arit' (default) for arithmetic returns, 'log' for log returns.

    71 |
    72 |
    73 |

    Value

    74 |

    A vector of returns

    75 |
    76 | 77 |
    78 |

    Examples

    79 |
    P <- c(1,2,3)
     80 | R <- calc.ret(P)
     81 | 
    82 |
    83 |
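The per-ticker return calculation documented for `calc.ret()` can be sketched in base R. This is a hypothetical re-implementation for illustration, not the package's source: `calc_ret_sketch()` mirrors the documented arguments, assumes rows are sorted by date within each ticker, and assumes the first observation of each ticker gets `NA` (no previous price). Arithmetic returns are P_t / P_{t-1} - 1; log returns are log(P_t) - log(P_{t-1}).

```r
# Hypothetical sketch mirroring calc.ret()'s documented arguments.
calc_ret_sketch <- function(P, tickers = rep("ticker", length(P)),
                            type.return = "arit") {
  ret <- rep(NA_real_, length(P))
  for (tk in unique(tickers)) {
    idx <- which(tickers == tk)       # rows of this ticker (assumed date-sorted)
    if (length(idx) > 1) {
      p <- P[idx]
      r <- if (type.return == "log") {
        diff(log(p))                  # log returns
      } else {
        p[-1] / p[-length(p)] - 1     # arithmetic returns
      }
      ret[idx[-1]] <- r               # first row of each ticker stays NA
    }
  }
  ret
}

calc_ret_sketch(c(1, 2, 3))                                   # NA 1.0 0.5
calc_ret_sketch(c(10, 11, 20, 18), tickers = c("A", "A", "B", "B"))
```

Resetting at each ticker boundary is what keeps a return from being computed between the last price of one ticker and the first price of the next in a long dataframe.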
    84 | 87 |
    88 | 89 | 90 |
    93 | 94 |
    95 |

    Site built with pkgdown 96 | 2.0.2.

    97 |
    98 | 99 |
    100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | -------------------------------------------------------------------------------- /docs/reference/df.fill.na.html: -------------------------------------------------------------------------------- 1 | 2 | Replaces NA values in dataframe for closest price — df.fill.na • BatchGetSymbols 6 | 7 | 8 |
    9 |
    44 | 45 | 46 | 47 |
    48 |
    49 | 54 | 55 |
    56 |

    Helper function for BatchGetSymbols. Replaces NA values and returns fixed dataframe.

    57 |
    58 | 59 |
    60 |
    df.fill.na(df.in)
    61 |
    62 | 63 |
    64 |

    Arguments

    65 |
    df.in
    66 |

    Dataframe to be fixed

    67 |
    68 |
    69 |

    Value

    70 |

    A fixed dataframe.

    71 |
    72 | 73 |
    74 |

    Examples

    75 |
    
     76 | df <- data.frame(price.adjusted = c(NA, 10, 11, NA, 12, 12.5, NA ), volume = c(1,10, 0, 2, 0, 1, 5))
     77 | 
     78 | df.fixed.na <- df.fill.na(df)
     79 | #> NULL
     80 | 
     81 | 
    82 |
    83 |
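A possible approach to the "replace NA values with the closest price" behaviour is sketched below. It is an assumption-laden illustration, not `df.fill.na()`'s source: `fill_na_sketch()` is hypothetical, it treats zero-volume rows as missing (following the changelog note about `volume == 0`), carries the last valid `price.adjusted` forward, and back-fills any leading gap with the first valid price. The package may define "closest" differently.

```r
# Hypothetical sketch (not df.fill.na()'s source).
fill_na_sketch <- function(df.in) {
  p <- df.in$price.adjusted
  # treat zero-volume rows as missing prices as well
  p[!is.na(df.in$volume) & df.in$volume == 0] <- NA
  # forward fill: carry the last valid price forward
  for (i in seq_along(p)[-1]) {
    if (is.na(p[i])) p[i] <- p[i - 1]
  }
  # back-fill leading NAs with the first valid price
  first.valid <- which(!is.na(p))[1]
  if (!is.na(first.valid) && first.valid > 1) {
    p[seq_len(first.valid - 1)] <- p[first.valid]
  }
  df.in$price.adjusted <- p
  df.in
}

df <- data.frame(price.adjusted = c(NA, 10, 11, NA, 12, 12.5, NA),
                 volume = c(1, 10, 0, 2, 0, 1, 5))
fill_na_sketch(df)$price.adjusted  # 10 10 10 10 10 12.5 12.5
```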
    84 | 87 |
    88 | 89 | 90 |
    93 | 94 |
    95 |

    Site built with pkgdown 96 | 2.0.2.

    97 |
    98 | 99 |
    100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | -------------------------------------------------------------------------------- /docs/reference/fix.ticker.name.html: -------------------------------------------------------------------------------- 1 | 2 | Fix name of ticker — fix.ticker.name • BatchGetSymbols 6 | 7 | 8 |
    9 |
    44 | 45 | 46 | 47 |
    48 |
    49 | 54 | 55 |
    56 |

    Removes bad symbols from names of tickers. This is useful for naming files with cache system.

    57 |
    58 | 59 |
    60 |
    fix.ticker.name(ticker.in)
    61 |
    62 | 63 |
    64 |

    Arguments

    65 |
    ticker.in
    66 |

    A bad ticker name

    67 |
    68 |
    69 |

    Value

    70 |

    A good ticker name

    71 |
    72 | 73 |
    74 |

    Examples

    75 |
    bad.ticker <- '^GSPC'
     76 | good.ticker <- fix.ticker.name(bad.ticker)
     77 | good.ticker
     78 | #> [1] "GSPC"
     79 | 
    80 |
    81 |
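The cleanup that `fix.ticker.name()` performs for cache file names can be approximated with a single regular expression. This is a hypothetical sketch: the exact character set the package removes is not documented here, so `fix_ticker_name_sketch()` simply assumes that anything other than letters, digits, `.`, `_` and `-` is unsafe in a file name. It reproduces the documented `'^GSPC'` to `"GSPC"` example.

```r
# Hypothetical sketch: drop characters assumed unsafe in file names.
fix_ticker_name_sketch <- function(ticker.in) {
  gsub("[^A-Za-z0-9._-]", "", ticker.in)
}

fix_ticker_name_sketch(c("^GSPC", "PETR4.SA", "BRK-B"))  # "GSPC" "PETR4.SA" "BRK-B"
```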
    82 | 85 |
    86 | 87 | 88 |
    91 | 92 |
    93 |

    Site built with pkgdown 94 | 2.0.2.

    95 |
    96 | 97 |
    98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | -------------------------------------------------------------------------------- /docs/reference/get.clean.data.html: -------------------------------------------------------------------------------- 1 | 2 | Get clean data from yahoo/google — get.clean.data • BatchGetSymbols 6 | 7 | 8 |
    9 |
    44 | 45 | 46 | 47 |
    48 |
    49 | 54 | 55 |
    56 |

    Get clean data from yahoo/google

    57 |
    58 | 59 |
    60 |
    get.clean.data(tickers, src = "yahoo", first.date, last.date)
    61 |
    62 | 63 |
    64 |

    Arguments

    65 |
    tickers
    66 |

    A vector of tickers. If not sure whether the ticker is available, check the websites of Google and Yahoo Finance. The source for downloading 67 | the data can be either Google or Yahoo. The function automatically selects the source webpage based on the input ticker.

    68 |
    src
    69 |

    Source of data (yahoo or google)

    70 |
    first.date
    71 |

    The first date to download data (date or char as YYYY-MM-DD)

    72 |
    last.date
    73 |

    The last date to download data (date or char as YYYY-MM-DD)

    74 |
    75 |
    76 |

    Value

    77 |

    A dataframe with the cleaned data

    78 |
    79 | 80 |
    81 |

    Examples

    82 |
    df.sp500 <- get.clean.data('^GSPC',
     83 |                            first.date = as.Date('2010-01-01'),
     84 |                            last.date = as.Date('2010-02-01'))
     85 | 
    86 |
    87 |
    88 | 91 |
    92 | 93 | 94 |
    97 | 98 |
    99 |

    Site built with pkgdown 100 | 2.0.2.

    101 |
    102 | 103 |
    104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | -------------------------------------------------------------------------------- /docs/reference/index.html: -------------------------------------------------------------------------------- 1 | 2 | Function reference • BatchGetSymbols 6 | 7 | 8 |
    9 |
    44 | 45 | 46 | 47 |
    48 |
    49 | 52 | 53 | 57 | 60 | 61 | 64 | 65 | 68 | 69 | 72 | 73 | 76 | 77 | 80 | 81 | 84 | 85 | 88 | 89 | 92 | 93 | 96 | 97 |
    54 |

    All functions

    55 |

    56 |
    58 |

    BatchGetSymbols()

    59 |

    Function to download financial data

    62 |

    GetFTSE100Stocks()

    63 |

    Function to download the current components of the FTSE100 index from Wikipedia

    66 |

    GetIbovStocks()

    67 |

    Function to download the current components of the Ibovespa index from Bovespa website

    70 |

    GetSP500Stocks()

    71 |

    Function to download the current components of the SP500 index from Wikipedia

    74 |

    calc.ret()

    75 |

    Function to calculate returns from a price and ticker vector

    78 |

    df.fill.na()

    79 |

    Replaces NA values in dataframe for closest price

    82 |

    fix.ticker.name()

    83 |

    Fix name of ticker

    86 |

    get.clean.data()

    87 |

    Get clean data from yahoo/google

    myGetSymbols()

    An improved version of function getSymbols from quantmod

    reshape.wide()

    Transforms a dataframe in the long format to a list of dataframes in the wide format
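
The calc.ret() entry in the list above computes returns from a price vector and a matching ticker vector. The following base-R sketch is for illustration only. The function name calc_ret_sketch is hypothetical. Two behaviors are assumptions and are not taken from the package source: returns are computed within each ticker, and each ticker's first observation gets NA.

```r
# Hypothetical sketch of per-ticker returns; not the package's calc.ret() source.
calc_ret_sketch <- function(P, tickers = rep("ticker", length(P)),
                            type.return = c("arit", "log")) {
  type.return <- match.arg(type.return)
  ret <- rep(NA_real_, length(P))

  for (tk in unique(tickers)) {
    idx <- which(tickers == tk)  # rows of this ticker, assumed sorted by date
    if (length(idx) < 2) next    # a single price has no return

    p <- P[idx]
    r <- if (type.return == "arit") {
      p[-1] / p[-length(p)] - 1  # arithmetic return: P_t / P_{t-1} - 1
    } else {
      diff(log(p))               # log return: log(P_t) - log(P_{t-1})
    }
    ret[idx] <- c(NA_real_, r)   # first observation of each ticker stays NA
  }
  ret
}

# Long-format example with two tickers
calc_ret_sketch(P = c(1, 2, 3, 10, 11),
                tickers = c("A", "A", "A", "B", "B"))
```

For a single ticker, calc_ret_sketch(c(1, 2, 3)) gives NA, 1 and 0.5. Grouping by ticker matters in a long dataframe. Without it, the first price of one asset would be divided by the last price of the previous asset.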

    -------------------------------------------------------------------------------- /docs/reference/myGetSymbols.html: -------------------------------------------------------------------------------- An improved version of function getSymbols from quantmod — myGetSymbols • BatchGetSymbols

    This is a helper function for BatchGetSymbols and should normally not be called directly. Its purpose is to download financial data for a single ticker over a given time period. The main difference from getSymbols is that it imports the data as a dataframe with properly named columns and saves the data locally using the caching system.

    myGetSymbols(
      ticker,
      i.ticker,
      length.tickers,
      src = "yahoo",
      first.date,
      last.date,
      do.cache = TRUE,
      cache.folder = file.path(tempdir(), "BGS_Cache"),
      df.bench = NULL,
      be.quiet = FALSE,
      thresh.bad.data
    )

    Arguments

    ticker

    A single ticker for which to download data

    i.ticker

    An index for the stock being downloaded (used in cat() progress messages)

    length.tickers

    Total number of stocks being downloaded (also used in cat() progress messages)

    src

    The source of the data ('google' or 'yahoo')

    first.date

    The first date to download data (date or char as YYYY-MM-DD)

    last.date

    The last date to download data (date or char as YYYY-MM-DD)

    do.cache

    Use cache system? (default = TRUE)

    cache.folder

    Where to save cache files? (default = file.path(tempdir(), 'BGS_Cache') )

    df.bench

    Data for the benchmark ticker

    be.quiet

    Logical; if TRUE, status messages are not printed (default = FALSE)

    thresh.bad.data

    A threshold, given as a proportion (e.g. 0.75 for 75%), for defining bad data. The dates of each asset are compared to the dates of the benchmark ticker. If the share of benchmark dates for which the asset has non-missing data is lower than thresh.bad.data, the function ignores the asset (default = 0.75 in BatchGetSymbols; myGetSymbols itself sets no default for this argument)
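
The following minimal base-R sketch makes the thresh.bad.data rule concrete. It follows the reading given above: count the benchmark dates on which the asset has a non-missing price, then drop the asset if that share falls below the threshold. The function is_good_ticker and its arguments are hypothetical. They do not mirror myGetSymbols internals.

```r
# Hypothetical sketch of the thresh.bad.data rule; not myGetSymbols() internals.
is_good_ticker <- function(asset.dates, bench.dates, asset.prices,
                           thresh.bad.data = 0.75) {
  # Dates on which the asset actually has a price
  valid.dates <- asset.dates[!is.na(asset.prices)]

  # Share of benchmark dates covered by the asset's valid observations
  coverage <- mean(bench.dates %in% valid.dates)

  # Keep the asset unless coverage is lower than the threshold
  coverage >= thresh.bad.data
}

bench <- seq(as.Date("2010-01-01"), by = "day", length.out = 10)
is_good_ticker(bench[1:8], bench, rep(1, 8))  # coverage 0.8
is_good_ticker(bench[1:7], bench, rep(1, 7))  # coverage 0.7
```

Suppose there are ten benchmark dates and the threshold is 0.75. An asset priced on eight of them (coverage 0.8) is kept. An asset priced on only seven (coverage 0.7) is dropped.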

    Value

    A dataframe with the financial data

    See also

    getSymbols for the base function

    Examples

    ticker <- 'FB'

    first.date <- Sys.Date() - 30
    last.date <- Sys.Date()

    if (FALSE) {
      df.ticker <- myGetSymbols(ticker,
                                first.date = first.date,
                                last.date = last.date)
    }
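
The description above says myGetSymbols saves downloaded data locally through a caching system. The fix.ticker.name() documentation also notes that ticker names are cleaned before they are used to name cache files. The sketch below shows that general read-through cache pattern in base R. Everything in it is a hypothetical illustration: cached_download, the stand-in fake_download, the folder name and the file-naming scheme. The package's real cache is more elaborate; per its vignette, it downloads only the missing portions of a date range.

```r
# Hypothetical read-through file cache; naming scheme, folder and
# download_fun are assumptions, not BatchGetSymbols internals.
cached_download <- function(ticker, first.date, last.date, download_fun,
                            cache.folder = file.path(tempdir(), "BGS_Cache_sketch")) {
  dir.create(cache.folder, showWarnings = FALSE, recursive = TRUE)

  # Drop characters such as '^' (as in '^GSPC') that are awkward in file names
  safe.ticker <- gsub("[^[:alnum:]._-]", "", ticker)
  cache.file <- file.path(cache.folder,
                          paste0(safe.ticker, "_", first.date, "_", last.date, ".rds"))

  if (file.exists(cache.file)) {
    return(readRDS(cache.file))  # cache hit: reuse the stored data frame
  }

  df <- download_fun(ticker, first.date, last.date)  # cache miss: fetch data
  saveRDS(df, cache.file)
  df
}

# Stand-in for a real download that counts how often it is called
n.calls <- 0
fake_download <- function(ticker, first.date, last.date) {
  n.calls <<- n.calls + 1
  data.frame(ticker = ticker, ref.date = first.date, price.close = 100)
}

d1 <- cached_download("^GSPC", as.Date("2010-01-01"), as.Date("2010-02-01"), fake_download)
d2 <- cached_download("^GSPC", as.Date("2010-01-01"), as.Date("2010-02-01"), fake_download)
n.calls  # the second call is read from the cache file, so fake_download ran once
```

Keying the file on ticker and dates keeps the example simple. It also means a slightly different date range triggers a fresh download instead of reusing overlapping data.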

    143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | -------------------------------------------------------------------------------- /docs/sitemap.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | /404.html 5 | 6 | 7 | /articles/BatchGetSymbols-vignette.html 8 | 9 | 10 | /articles/index.html 11 | 12 | 13 | /authors.html 14 | 15 | 16 | /index.html 17 | 18 | 19 | /news/index.html 20 | 21 | 22 | /reference/BatchGetSymbols.html 23 | 24 | 25 | /reference/GetFTSE100Stocks.html 26 | 27 | 28 | /reference/GetIbovStocks.html 29 | 30 | 31 | /reference/GetSP500Stocks.html 32 | 33 | 34 | /reference/calc.ret.html 35 | 36 | 37 | /reference/df.fill.na.html 38 | 39 | 40 | /reference/fix.ticker.name.html 41 | 42 | 43 | /reference/get.clean.data.html 44 | 45 | 46 | /reference/index.html 47 | 48 | 49 | /reference/myGetSymbols.html 50 | 51 | 52 | /reference/reshape.wide.html 53 | 54 | 55 | -------------------------------------------------------------------------------- /inst/extdata/ExampleData.rds: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/msperlin/BatchGetSymbols/689c84c0fe5177d50afbe6473a7aa1abbf53c29e/inst/extdata/ExampleData.rds -------------------------------------------------------------------------------- /man/BatchGetSymbols.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/BatchGetSymbols.R 3 | \name{BatchGetSymbols} 4 | \alias{BatchGetSymbols} 5 | \title{Function to download financial data} 6 | \usage{ 7 | BatchGetSymbols( 8 | tickers, 9 | first.date = Sys.Date() - 30, 10 | last.date = Sys.Date(), 11 | thresh.bad.data = 0.75, 12 | bench.ticker = "^GSPC", 13 | type.return = "arit", 14 | freq.data = "daily", 15 | how.to.aggregate = "last", 16 | do.complete.data = FALSE, 17 | do.fill.missing.prices = TRUE, 18 | do.cache = TRUE, 19 
| cache.folder = file.path(tempdir(), "BGS_Cache"), 20 | do.parallel = FALSE, 21 | be.quiet = FALSE 22 | ) 23 | } 24 | \arguments{ 25 | \item{tickers}{A vector of tickers. If not sure whether the ticker is available, check the websites of Google and Yahoo Finance. The source for downloading 26 | the data can either be Google or Yahoo. The function automatically selects the source webpage based on the input ticker.} 27 | 28 | \item{first.date}{The first date to download data (date or char as YYYY-MM-DD)} 29 | 30 | \item{last.date}{The last date to download data (date or char as YYYY-MM-DD)} 31 | 32 | \item{thresh.bad.data}{A percentage threshold for defining bad data. The dates of the benchmark ticker are compared to each asset. If the percentage of non-missing dates 33 | with respect to the benchmark ticker is lower than thresh.bad.data, the function will ignore the asset (default = 0.75)} 34 | 35 | \item{bench.ticker}{The ticker of the benchmark asset used to compare dates. My suggestion is to use the main stock index of the market the data comes from (default = ^GSPC (SP500, US market))} 36 | 37 | \item{type.return}{Type of price return to calculate: 'arit' (default) - arithmetic, 'log' - log returns.} 38 | 39 | \item{freq.data}{Frequency of financial data ('daily', 'weekly', 'monthly', 'yearly')} 40 | 41 | \item{how.to.aggregate}{Defines whether to aggregate the data using the first or the last observation of the aggregating period ('first', 'last'). 42 | For example, if freq.data = 'yearly' and how.to.aggregate = 'last', the last available day of the year will be used for all 43 | aggregated values such as price.adjusted.} 44 | 45 | \item{do.complete.data}{Return a complete/balanced dataset? If TRUE, all missing ticker-date pairs will be replaced by NA or the closest price (see input do.fill.missing.prices).
Default = FALSE.} 46 | 47 | \item{do.fill.missing.prices}{Finds all missing prices and replaces them with their closest price, with preference for the previous price. This ensures a balanced dataset for all assets, without any NA. Default = TRUE.} 48 | 49 | \item{do.cache}{Use cache system? (default = TRUE)} 50 | 51 | \item{cache.folder}{Where to save cache files? (default = file.path(tempdir(), 'BGS_Cache') )} 52 | 53 | \item{do.parallel}{Flag for using parallel processing (default = FALSE). Before using parallel, make sure you call function future::plan() first.} 54 | 55 | \item{be.quiet}{Logical; if TRUE, status messages are not printed (default = FALSE)} 56 | } 57 | \value{ 58 | A list with the following items: \describe{ 59 | \item{df.control }{A dataframe containing the results of the download process for each asset} 60 | \item{df.tickers}{A dataframe with the financial data for all valid tickers} } 61 | } 62 | \description{ 63 | This function downloads financial data from Yahoo Finance using \code{\link[quantmod]{getSymbols}}. 64 | Based on a set of tickers and a time period, the function will download the data for each ticker and return a report of the process, along with the actual data in the long dataframe format. 65 | The main advantage of the function is that it automatically recognizes the source of the dataset from the ticker and structures the resulting data from different sources in the long format. 66 | A caching system is also available, which makes repeated downloads much faster. 67 | } 68 | \section{Warning}{ 69 | 70 | 71 | Note that since 2019, adjusted prices are no longer available from Google Finance. 72 | When using this source, the function will output NA values for this column. 73 | 74 | Also, be aware that when using the cache system in a local folder (and not the default tempdir()), the aggregate price series might not match if 75 | a split or dividend event happens in between cache files.
76 | } 77 | 78 | \examples{ 79 | tickers <- c('FB','MMM') 80 | 81 | first.date <- Sys.Date()-30 82 | last.date <- Sys.Date() 83 | 84 | l.out <- BatchGetSymbols(tickers = tickers, 85 | first.date = first.date, 86 | last.date = last.date, do.cache=FALSE) 87 | 88 | print(l.out$df.control) 89 | print(l.out$df.tickers) 90 | } 91 | \seealso{ 92 | \link[quantmod]{getSymbols} 93 | } 94 | -------------------------------------------------------------------------------- /man/GetFTSE100Stocks.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/GetFTSE100Stocks.R 3 | \name{GetFTSE100Stocks} 4 | \alias{GetFTSE100Stocks} 5 | \title{Function to download the current components of the FTSE100 index from Wikipedia} 6 | \usage{ 7 | GetFTSE100Stocks( 8 | do.cache = TRUE, 9 | cache.folder = file.path(tempdir(), "BGS_Cache") 10 | ) 11 | } 12 | \arguments{ 13 | \item{do.cache}{Use cache system? (default = TRUE)} 14 | 15 | \item{cache.folder}{Where to save cache files? (default = file.path(tempdir(), 'BGS_Cache') )} 16 | } 17 | \value{ 18 | A dataframe that includes a column with the list of tickers of companies that belong to the FTSE100 index 19 | } 20 | \description{ 21 | This function scrapes the stocks that constitute the FTSE100 index from the wikipedia page at . 
22 | } 23 | \examples{ 24 | \dontrun{ 25 | df.FTSE100 <- GetFTSE100Stocks() 26 | print(df.FTSE100$tickers) 27 | } 28 | } 29 | -------------------------------------------------------------------------------- /man/GetIbovStocks.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Get_Ibov_Stocks.R 3 | \name{GetIbovStocks} 4 | \alias{GetIbovStocks} 5 | \title{Function to download the current components of the Ibovespa index from the Bovespa website} 6 | \usage{ 7 | GetIbovStocks( 8 | do.cache = TRUE, 9 | cache.folder = file.path(tempdir(), "BGS_Cache"), 10 | max.tries = 10 11 | ) 12 | } 13 | \arguments{ 14 | \item{do.cache}{Use cache system? (default = TRUE)} 15 | 16 | \item{cache.folder}{Where to save cache files? (default = file.path(tempdir(), 'BGS_Cache') )} 17 | 18 | \item{max.tries}{Maximum number of attempts to download the data} 19 | } 20 | \value{ 21 | A dataframe that includes a column with the list of tickers of companies that belong to the Ibovespa index 22 | } 23 | \description{ 24 | This function scrapes the stocks that constitute the Ibovespa index from the Bovespa website at http://bvmf.bmfbovespa.com.br/indices/ResumoCarteiraTeorica.aspx?Indice=IBOV&idioma=pt-br.
25 | } 26 | \examples{ 27 | \dontrun{ 28 | df.ibov <- GetIbovStocks() 29 | print(df.ibov$tickers) 30 | } 31 | } 32 | -------------------------------------------------------------------------------- /man/GetSP500Stocks.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/GetSP500Stocks.R 3 | \name{GetSP500Stocks} 4 | \alias{GetSP500Stocks} 5 | \title{Function to download the current components of the SP500 index from Wikipedia} 6 | \usage{ 7 | GetSP500Stocks( 8 | do.cache = TRUE, 9 | cache.folder = file.path(tempdir(), "BGS_Cache") 10 | ) 11 | } 12 | \arguments{ 13 | \item{do.cache}{Use cache system? (default = TRUE)} 14 | 15 | \item{cache.folder}{Where to save cache files? (default = file.path(tempdir(), 'BGS_Cache') )} 16 | } 17 | \value{ 18 | A dataframe that includes a column with the list of tickers of companies that belong to the SP500 index 19 | } 20 | \description{ 21 | This function scrapes the stocks that constitute the SP500 index from the wikipedia page at https://en.wikipedia.org/wiki/List_of_S%26P_500_companies. 
22 | } 23 | \examples{ 24 | \dontrun{ 25 | df.SP500 <- GetSP500Stocks() 26 | print(df.SP500$tickers) 27 | } 28 | } 29 | -------------------------------------------------------------------------------- /man/calc.ret.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Utils.R 3 | \name{calc.ret} 4 | \alias{calc.ret} 5 | \title{Function to calculate returns from a price and ticker vector} 6 | \usage{ 7 | calc.ret(P, tickers = rep("ticker", length(P)), type.return = "arit") 8 | } 9 | \arguments{ 10 | \item{P}{Price vector} 11 | 12 | \item{tickers}{Vector of tickers matching P (useful when working with a long dataframe)} 13 | 14 | \item{type.return}{Type of price return to calculate: 'arit' (default) - arithmetic, 'log' - log returns.} 15 | } 16 | \value{ 17 | A vector of returns 18 | } 19 | \description{ 20 | Designed for adding a return column to a dataframe of prices in the long (tidy) format. 21 | } 22 | \examples{ 23 | P <- c(1,2,3) 24 | R <- calc.ret(P) 25 | } 26 | -------------------------------------------------------------------------------- /man/df.fill.na.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Utils.R 3 | \name{df.fill.na} 4 | \alias{df.fill.na} 5 | \title{Replaces NA values in a dataframe with the closest price} 6 | \usage{ 7 | df.fill.na(df.in) 8 | } 9 | \arguments{ 10 | \item{df.in}{Dataframe to be fixed} 11 | } 12 | \value{ 13 | A fixed dataframe. 14 | } 15 | \description{ 16 | Helper function for BatchGetSymbols. Replaces NA values and returns the fixed dataframe.
17 | } 18 | \examples{ 19 | 20 | df <- data.frame(price.adjusted = c(NA, 10, 11, NA, 12, 12.5, NA ), volume = c(1,10, 0, 2, 0, 1, 5)) 21 | 22 | df.fixed.na <- df.fill.na(df) 23 | 24 | } 25 | -------------------------------------------------------------------------------- /man/fix.ticker.name.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Utils.R 3 | \name{fix.ticker.name} 4 | \alias{fix.ticker.name} 5 | \title{Fix name of ticker} 6 | \usage{ 7 | fix.ticker.name(ticker.in) 8 | } 9 | \arguments{ 10 | \item{ticker.in}{A bad ticker name} 11 | } 12 | \value{ 13 | A good ticker name 14 | } 15 | \description{ 16 | Removes bad symbols from ticker names. This is useful for naming files in the cache system. 17 | } 18 | \examples{ 19 | bad.ticker <- '^GSPC' 20 | good.ticker <- fix.ticker.name(bad.ticker) 21 | good.ticker 22 | } 23 | -------------------------------------------------------------------------------- /man/get.clean.data.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Utils.R 3 | \name{get.clean.data} 4 | \alias{get.clean.data} 5 | \title{Get clean data from yahoo/google} 6 | \usage{ 7 | get.clean.data(tickers, src = "yahoo", first.date, last.date) 8 | } 9 | \arguments{ 10 | \item{tickers}{A vector of tickers. If not sure whether the ticker is available, check the websites of Google and Yahoo Finance. The source for downloading 11 | the data can either be Google or Yahoo.
The function automatically selects the source webpage based on the input ticker.} 12 | 13 | \item{src}{Source of data (yahoo or google)} 14 | 15 | \item{first.date}{The first date to download data (date or char as YYYY-MM-DD)} 16 | 17 | \item{last.date}{The last date to download data (date or char as YYYY-MM-DD)} 18 | } 19 | \value{ 20 | A dataframe with the cleaned data 21 | } 22 | \description{ 23 | Get clean data from yahoo/google 24 | } 25 | \examples{ 26 | df.sp500 <- get.clean.data('^GSPC', 27 | first.date = as.Date('2010-01-01'), 28 | last.date = as.Date('2010-02-01')) 29 | } 30 | -------------------------------------------------------------------------------- /man/myGetSymbols.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/myGetSymbols.R 3 | \name{myGetSymbols} 4 | \alias{myGetSymbols} 5 | \title{An improved version of function \code{\link[quantmod]{getSymbols}} from quantmod} 6 | \usage{ 7 | myGetSymbols( 8 | ticker, 9 | i.ticker, 10 | length.tickers, 11 | src = "yahoo", 12 | first.date, 13 | last.date, 14 | do.cache = TRUE, 15 | cache.folder = file.path(tempdir(), "BGS_Cache"), 16 | df.bench = NULL, 17 | be.quiet = FALSE, 18 | thresh.bad.data 19 | ) 20 | } 21 | \arguments{ 22 | \item{ticker}{A single ticker for which to download data} 23 | 24 | \item{i.ticker}{An index for the stock being downloaded (used in cat() progress messages)} 25 | 26 | \item{length.tickers}{Total number of stocks being downloaded (also used in cat() progress messages)} 27 | 28 | \item{src}{The source of the data ('google' or 'yahoo')} 29 | 30 | \item{first.date}{The first date to download data (date or char as YYYY-MM-DD)} 31 | 32 | \item{last.date}{The last date to download data (date or char as YYYY-MM-DD)} 33 | 34 | \item{do.cache}{Use cache system? (default = TRUE)} 35 | 36 | \item{cache.folder}{Where to save cache files?
(default = file.path(tempdir(), 'BGS_Cache') )} 37 | 38 | \item{df.bench}{Data for the benchmark ticker} 39 | 40 | \item{be.quiet}{Logical; if TRUE, status messages are not printed (default = FALSE)} 41 | 42 | \item{thresh.bad.data}{A percentage threshold for defining bad data. The dates of the benchmark ticker are compared to each asset. If the percentage of non-missing dates 43 | with respect to the benchmark ticker is lower than thresh.bad.data, the function will ignore the asset (default = 0.75)} 44 | } 45 | \value{ 46 | A dataframe with the financial data 47 | } 48 | \description{ 49 | This is a helper function for \code{\link{BatchGetSymbols}} and should normally not be called directly. The purpose of this function is to download financial data based on a ticker and a time period. 50 | The main difference from \code{\link[quantmod]{getSymbols}} is that it imports the data as a dataframe with properly named columns and saves data locally with the caching system. 51 | } 52 | \examples{ 53 | ticker <- 'FB' 54 | 55 | first.date <- Sys.Date()-30 56 | last.date <- Sys.Date() 57 | 58 | \dontrun{ 59 | df.ticker <- myGetSymbols(ticker, 60 | first.date = first.date, 61 | last.date = last.date) 62 | } 63 | } 64 | \seealso{ 65 | \link[quantmod]{getSymbols} for the base function 66 | } 67 | -------------------------------------------------------------------------------- /man/reshape.wide.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/Utils.R 3 | \name{reshape.wide} 4 | \alias{reshape.wide} 5 | \title{Transforms a dataframe in the long format to a list of dataframes in the wide format} 6 | \usage{ 7 | reshape.wide(df.tickers) 8 | } 9 | \arguments{ 10 | \item{df.tickers}{Dataframe in the long format} 11 | } 12 | \value{ 13 | A list with dataframes in the wide format 14 | } 15 | \description{ 16 | Transforms a dataframe in the long format to a list of dataframes in the wide
format 17 | } 18 | \examples{ 19 | 20 | my.f <- system.file( 'extdata/ExampleData.rds', package = 'BatchGetSymbols' ) 21 | df.tickers <- readRDS(my.f) 22 | l.wide <- reshape.wide(df.tickers) 23 | l.wide 24 | } 25 | -------------------------------------------------------------------------------- /tests/testthat.R: -------------------------------------------------------------------------------- 1 | library(testthat) 2 | library(BatchGetSymbols) 3 | 4 | test_check("BatchGetSymbols") 5 | -------------------------------------------------------------------------------- /tests/testthat/test_BatchGetSymbols.R: -------------------------------------------------------------------------------- 1 | library(testthat) 2 | library(BatchGetSymbols) 3 | 4 | test_that(desc = 'Test of read function',{ 5 | 6 | first.date <- Sys.Date() - 30 7 | last.date <- Sys.Date() 8 | 9 | my.tickers <- c('MMM') 10 | 11 | l.out <- BatchGetSymbols(tickers = my.tickers, 12 | first.date = first.date, 13 | last.date = last.date) 14 | 15 | expect_true(nrow(l.out$df.tickers) > 0) 16 | } ) 17 | 18 | 19 | -------------------------------------------------------------------------------- /vignettes/BatchGetSymbols-vignette.R: -------------------------------------------------------------------------------- 1 | ## ----example1----------------------------------------------------------------- 2 | if (!require(BatchGetSymbols)) install.packages('BatchGetSymbols') 3 | 4 | library(BatchGetSymbols) 5 | 6 | # set dates 7 | first.date <- Sys.Date() - 60 8 | last.date <- Sys.Date() 9 | freq.data <- 'daily' 10 | # set tickers 11 | tickers <- c('FB','MMM','PETR4.SA','abcdef') 12 | 13 | l.out <- BatchGetSymbols(tickers = tickers, 14 | first.date = first.date, 15 | last.date = last.date, 16 | freq.data = freq.data, 17 | cache.folder = file.path(tempdir(), 18 | 'BGS_Cache') ) # cache in tempdir() 19 | 20 | 21 | ## ----example2----------------------------------------------------------------- 22 | print(l.out$df.control) 23 | 24 
| 25 | ## ----plot.prices, fig.width=7, fig.height=2.5--------------------------------- 26 | library(ggplot2) 27 | 28 | p <- ggplot(l.out$df.tickers, aes(x = ref.date, y = price.close)) 29 | p <- p + geom_line() 30 | p <- p + facet_wrap(~ticker, scales = 'free_y') 31 | print(p) 32 | 33 | ## ----example3,eval=FALSE------------------------------------------------------ 34 | # library(BatchGetSymbols) 35 | # 36 | # first.date <- Sys.Date()-365 37 | # last.date <- Sys.Date() 38 | # 39 | # df.SP500 <- GetSP500Stocks() 40 | # tickers <- df.SP500$Tickers 41 | # 42 | # l.out <- BatchGetSymbols(tickers = tickers, 43 | # first.date = first.date, 44 | # last.date = last.date) 45 | # 46 | # print(l.out$df.control) 47 | # print(l.out$df.tickers) 48 | # 49 | 50 | -------------------------------------------------------------------------------- /vignettes/BatchGetSymbols-vignette.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "How to use BatchGetSymbols" 3 | author: "Marcelo Perlin" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{How to use BatchGetSymbols} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | ## Motivation 13 | 14 | One of the great things about working in finance is that financial datasets from capital markets are freely available from sources such as Yahoo Finance. This is an excellent feature for building up-to-date content for classes and conducting academic research. 15 | 16 | In the past I have used function getSymbols from the CRAN package [quantmod](https://cran.r-project.org/package=quantmod) in order to download end-of-day trade data for several stocks in the financial market. The problem with using getSymbols is that it does not aggregate or clean the financial data for several tickers.
When using getSymbols, each stock will have its own `xts` object with different column names, and this makes it harder to store data from several tickers in a single dataframe. 17 | 18 | Package BatchGetSymbols is my solution to this problem. Based on a list of tickers and a time period, BatchGetSymbols will download price data from Yahoo Finance and organize it so that you don't need to worry about cleaning it yourself. 19 | 20 | ## Main features: 21 | 22 | - Organizes data in a tabular format, returning prices and returns 23 | - A cache system was implemented in version 2.0, meaning that the data is saved locally and only missing portions of the data are downloaded, if needed. 24 | - All dates are compared to a benchmark ticker such as SP500. You can choose to ignore tickers with a high number of missing dates. 25 | - Users can choose a complete/balanced dataset output. The package uses a benchmark ticker for date comparison (e.g. SP500 - ^GSPC). Days with missing prices and traded volume equal to zero are found and prices are either set to NA or replaced by the closest available value. 26 | - Allows the choice of the wide format, with tickers as columns 27 | - Users can choose the frequency of the resulting dataset (daily, weekly, monthly, yearly) 28 | 29 | 30 | ## A simple example 31 | 32 | As a simple exercise, let's download data for three stocks, Facebook (FB), 3M (MMM) and PETROBRAS (PETR4.SA), plus abcdef, a ticker I just made up.
33 | 34 | ```{r example1} 35 | if (!require(BatchGetSymbols)) install.packages('BatchGetSymbols') 36 | 37 | library(BatchGetSymbols) 38 | 39 | # set dates 40 | first.date <- Sys.Date() - 60 41 | last.date <- Sys.Date() 42 | freq.data <- 'daily' 43 | # set tickers 44 | tickers <- c('FB','MMM','PETR4.SA','abcdef') 45 | 46 | l.out <- BatchGetSymbols(tickers = tickers, 47 | first.date = first.date, 48 | last.date = last.date, 49 | freq.data = freq.data, 50 | cache.folder = file.path(tempdir(), 51 | 'BGS_Cache') ) # cache in tempdir() 52 | 53 | ``` 54 | 55 | 56 | After downloading the data, we can check the success of the process for each ticker. Notice that the last ticker does not exist in yahoo finance and therefore results in an error. All information regarding the download process is provided in the dataframe df.control: 57 | 58 | ```{r example2} 59 | print(l.out$df.control) 60 | 61 | ``` 62 | 63 | Moreover, we can plot the daily closing prices using ggplot2: 64 | 65 | 66 | ```{r plot.prices, fig.width=7, fig.height=2.5} 67 | library(ggplot2) 68 | 69 | p <- ggplot(l.out$df.tickers, aes(x = ref.date, y = price.close)) 70 | p <- p + geom_line() 71 | p <- p + facet_wrap(~ticker, scales = 'free_y') 72 | print(p) 73 | ``` 74 | 75 | 76 | 77 | ## Downloading data for all tickers in the SP500 index 78 | 79 | The package was designed for large scale download of financial data. An example is downloading all stocks in the current composition of the SP500 stock index. The package also includes a function that downloads the current composition of the SP500 index from the internet. By using this function along with BatchGetSymbols, we can easily import end-of-day data for all assets in the index. 80 | 81 | In the following code we download data for the SP500 stocks for the last year. The code is not executed in this vignette given its time duration, but you can just copy and paste on its own R script in order to check the results. 
On my computer it takes around 5 minutes to download the whole dataset. 82 | 83 | ```{r example3,eval=FALSE} 84 | library(BatchGetSymbols) 85 | 86 | first.date <- Sys.Date()-365 87 | last.date <- Sys.Date() 88 | 89 | df.SP500 <- GetSP500Stocks() 90 | tickers <- df.SP500$Tickers 91 | 92 | l.out <- BatchGetSymbols(tickers = tickers, 93 | first.date = first.date, 94 | last.date = last.date) 95 | 96 | print(l.out$df.control) 97 | print(l.out$df.tickers) 98 | 99 | ``` 100 | --------------------------------------------------------------------------------
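
The package documents a NA-filling rule in two places. df.fill.na() "replaces NA values ... for closest price", and the do.fill.missing.prices argument of BatchGetSymbols uses the "closest price with preference for the previous price". That rule can be sketched in base R as a forward fill followed by a backward fill for any leading gap. This is an assumed reading of the rule, not the package's code. fill_closest_price is a hypothetical name. It works on a single price vector and ignores the zero-volume condition mentioned in the vignette's feature list.

```r
# Hypothetical sketch of "closest price, preferring the previous price";
# not the package's df.fill.na() source.
fill_closest_price <- function(x) {
  # Carry the last non-missing value forward over NAs
  fill_forward <- function(v) {
    last <- NA
    for (i in seq_along(v)) {
      if (is.na(v[i])) v[i] <- last else last <- v[i]
    }
    v
  }
  out <- fill_forward(x)       # first choice: the previous price
  rev(fill_forward(rev(out)))  # leading NAs: fall back to the next price
}

fill_closest_price(c(NA, 10, 11, NA, 12, 12.5, NA))
```

Consider the price.adjusted values from the df.fill.na example, c(NA, 10, 11, NA, 12, 12.5, NA). The sketch returns 10, 10, 11, 11, 12, 12.5, 12.5. Interior and trailing gaps take the previous price. Only the leading gap borrows the next one.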