├── .Rbuildignore ├── .gitignore ├── CRAN-RELEASE ├── DESCRIPTION ├── GetHFData.Rproj ├── NAMESPACE ├── NEWS.md ├── R ├── OnAttach.R ├── ghfd_build_lob.R ├── ghfd_download_file.R ├── ghfd_get_HF_data.R ├── ghfd_get_available_tickers_from_file.R ├── ghfd_get_available_tickers_from_ftp.R ├── ghfd_get_ftp_contents.R ├── ghfd_lob_fcts.R └── ghfd_read_file.R ├── README.md ├── docs ├── articles │ ├── ghfd-vignette-LOB.html │ ├── ghfd-vignette-Orders.html │ ├── ghfd-vignette-Trades.html │ ├── ghfd-vignette-Trades_files │ │ └── figure-html │ │ │ └── plot.prices-1.png │ └── index.html ├── authors.html ├── docsearch.css ├── docsearch.js ├── index.html ├── jquery.sticky-kit.min.js ├── link.svg ├── news │ └── index.html ├── pkgdown.css ├── pkgdown.js ├── pkgdown.yml └── reference │ ├── add.order.html │ ├── ghfd_build_lob.html │ ├── ghfd_download_file.html │ ├── ghfd_get_HF_data.html │ ├── ghfd_get_available_tickers_from_file.html │ ├── ghfd_get_available_tickers_from_ftp.html │ ├── ghfd_get_ftp_contents.html │ ├── ghfd_read_file.html │ ├── ghfd_read_file.orders.html │ ├── ghfd_read_file.trades.html │ ├── index.html │ ├── organize.lob.html │ ├── print.lob.html │ └── process.lob.from.df.html ├── inst ├── CITATION └── extdata │ ├── Example_Orders.RData │ └── NEG_OPCOES_20151126.zip ├── man ├── add.order.Rd ├── ghfd_build_lob.Rd ├── ghfd_download_file.Rd ├── ghfd_get_HF_data.Rd ├── ghfd_get_available_tickers_from_file.Rd ├── ghfd_get_available_tickers_from_ftp.Rd ├── ghfd_get_ftp_contents.Rd ├── ghfd_read_file.Rd ├── ghfd_read_file.orders.Rd ├── ghfd_read_file.trades.Rd ├── organize.lob.Rd ├── print.lob.Rd └── process.lob.from.df.Rd ├── tests ├── testthat.R └── testthat │ └── test_ghfd.R └── vignettes ├── ghfd-vignette-LOB.R ├── ghfd-vignette-LOB.Rmd ├── ghfd-vignette-LOB.html ├── ghfd-vignette-Orders.R ├── ghfd-vignette-Orders.Rmd ├── ghfd-vignette-Orders.html ├── ghfd-vignette-Trades.R ├── ghfd-vignette-Trades.Rmd └── ghfd-vignette-Trades.html /.Rbuildignore: 
-------------------------------------------------------------------------------- 1 | ^.*\.Rproj$ 2 | ^\.Rproj\.user$ 3 | ^docs$ 4 | ^CRAN-RELEASE$ 5 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | -------------------------------------------------------------------------------- /CRAN-RELEASE: -------------------------------------------------------------------------------- 1 | This package was submitted to CRAN on 2019-04-08. 2 | Once it is accepted, delete this file and tag the release (commit 24ba99cc6a). 3 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: GetHFData 2 | Title: Download and Aggregate High Frequency Trading Data from Bovespa 3 | Version: 1.7.2 4 | Date: 2020-05-24 5 | Authors@R: c(person("Marcelo", "Perlin", email = "marceloperlin@gmail.com", role = c("aut", "cre")), 6 | person("Henrique", "Ramos", email = "hpramos4@gmail.com", role = c("ctb")) ) 7 | Description: Downloads and aggregates high frequency trading data for Brazilian instruments directly from Bovespa ftp site . 
8 | Depends: 9 | R (>= 3.3.0) 10 | Imports: stringr,stats,RCurl, lubridate, readr, utils, curl,dplyr, archive 11 | License: GPL-2 12 | BugReports: https://github.com/msperlin/GetHFData/issues 13 | URL: https://github.com/msperlin/GetHFData/ 14 | LazyData: true 15 | RoxygenNote: 7.1.0 16 | Suggests: knitr, 17 | rmarkdown, 18 | testthat, 19 | ggplot2, 20 | R.utils 21 | VignetteBuilder: knitr 22 | remotes: git::https://github.com/jimhester/archive.git 23 | -------------------------------------------------------------------------------- /GetHFData.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: No 4 | SaveWorkspace: No 5 | AlwaysSaveHistory: No 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | 18 | BuildType: Package 19 | PackageUseDevtools: Yes 20 | PackageInstallArgs: --no-multiarch --with-keep.source 21 | PackageRoxygenize: rd,collate,namespace,vignette 22 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | export(ghfd_build_lob) 4 | export(ghfd_download_file) 5 | export(ghfd_get_HF_data) 6 | export(ghfd_get_available_tickers_from_file) 7 | export(ghfd_get_available_tickers_from_ftp) 8 | export(ghfd_get_ftp_contents) 9 | export(ghfd_read_file) 10 | -------------------------------------------------------------------------------- /NEWS.md: -------------------------------------------------------------------------------- 1 | ## Version 1.7.2 (2020-05-25) 2 | 3 | Minor update: 4 | 5 | * Fixed issue with 0 rows dataframes (filter badly placed for cancelled trades) 6 | 7 | ## Version 1.7.1 (2019-04-30) 8 | 9 | Minor update: 10 | 11 | * Fixed bug 
regarding silent argument at build_lob (see [issue 7](https://github.com/msperlin/GetHFData/issues/7)) 12 | 13 | 14 | ## Version 1.7 (2019-04-08) 15 | 16 | Minor update: 17 | 18 | * Fixed bug regarding files at ftp (see [issue 5](https://github.com/msperlin/GetHFData/issues/5)) 19 | 20 | ## Version 1.6 (2018-10-10) 21 | 22 | Minor update: 23 | 24 | * Fixed bug in ghfd_get_ftp_contents for 'equity' option 25 | 26 | ## Version 1.5 (2017-11-27) 27 | 28 | Minor update: 29 | 30 | * Added support for milliseconds in LOB 31 | 32 | ## Version 1.4 (2017-09-10) 33 | 34 | Major update: 35 | 36 | * Users can now recreate the LOB (limit order book) using order data from Bovespa 37 | * Fixed bug for only.dl = TRUE 38 | 39 | ## Version 1.3 (2017-05-29) 40 | 41 | Major update: 42 | 43 | * Users can now download and aggregate order files (input type.data) 44 | * Fixed link to paper 45 | * Partial matching for assets is now possible (e.g. use PETR for all stocks or options related to Petrobras) 46 | * Implemented an option for only downloading files (this is helpful if you are dealing with order data and will process the files in another R session or software) 47 | * Muted the message "Using ',' as decimal and '.' as grouping mark. Use read_delim() for more control." 48 | 49 | ## Version 1.2.4 (2017-01-30) 50 | 51 | Minor update: 52 | 53 | * Fixed bug in msg output when length(my.assets) > 2 54 | 55 | ## Version 1.2.3 (2017-01-13) 56 | 57 | Minor update: 58 | 59 | * Fixed bug for non existing assets in first date of download process 60 | * Changed input Date for simpler format (e.g. 
'2016-01-01' and not as.Date('2016-01-01')) 61 | 62 | ## Version 1.2.2 (2016-12-05) 63 | 64 | Minor update: 65 | 66 | * Revised apa citation on attach 67 | * Fixed some typos in vignette and added link to SSRN paper 68 | 69 | ## Version 1.2.1 (2016-11-07) 70 | 71 | Minor update with the following changes: 72 | 73 | * The user can now download data from the odd lots equity market (type.market='equity-odds') 74 | * Added Henrique Ramos as a contributor 75 | * Other minor changes 76 | 77 | ## Version 1.2.0 (2016-10-14) 78 | 79 | Minor update with the following changes: 80 | 81 | * The function ghfd_get_HF_data now allows for partial matching of asset names and also the download of all assets available in ftp files 82 | * Function ghfd_get_available_tickers_from_ftp also returns the type of market in data.frame 83 | 84 | ## Version 1.1.0 (2016-08-15) 85 | 86 | Major update from initial version with the following changes: 87 | 88 | * The function for finding tickers in the ftp now looks for the closest date in the case that the actual date is missing from the ftp 89 | * The function for finding tickers now returns a dataframe with the tickers and number of trades 90 | * Added control for bad files 91 | * The output for raw and agg type of output were revised 92 | * The vignette is revised 93 | 94 | ## Version 1.0.0 - First commit (2016-07-21) 95 | -------------------------------------------------------------------------------- /R/OnAttach.R: -------------------------------------------------------------------------------- 1 | .onAttach <- function(libname, pkgname) { 2 | 3 | citation.apa <- 'Perlin, M., Ramos, H. (2016). GetHFData: A R Package for Downloading and Aggregating High Frequency Trading Data from Bovespa. Brazilian Review of Finance, V. 14, N. 3.' 
4 | citation.bibtex <- '@article{perlin2016gethfdata, 5 | title={GetHFData: A R Package for Downloading and Aggregating High Frequency Trading Data from Bovespa}, 6 | author={Perlin, Marcelo and Ramos, Henrique}, 7 | journal={Brazilian Review of Finance}, 8 | volume={14}, 9 | number={3}, 10 | year={2016}, 11 | publisher={Brazilian Society of Finance} 12 | }' 13 | my.message <- paste('Thank you for using GetHFData! More details about the package can be found in:\n\n', 14 | 'http://bibliotecadigital.fgv.br/ojs/index.php/rbfin/article/view/64587/65702','\n\n', 15 | 'If applicable, please use the following citations in your research report. Thanks!', 16 | '\n\nAPA:\n',citation.apa, 17 | '\n\nbibtex:\n',citation.bibtex ) 18 | packageStartupMessage(my.message) 19 | } 20 | -------------------------------------------------------------------------------- /R/ghfd_build_lob.R: -------------------------------------------------------------------------------- 1 | #' Building LOB (limit order book) from orders 2 | #' 3 | #' @param df.orders A dataframe, output from ghfd_get_HF_data 4 | #' @param silent Should the function run without printing detailed progress? 
(TRUE (default) or FALSE) 5 | #' 6 | #' @return A dataframe with information about LOB 7 | #' @export 8 | #' 9 | #' @examples 10 | #' \dontrun{ 11 | #' library(GetHFData) 12 | #' first.time <- '11:00:00' 13 | #' last.time <- '17:00:00' 14 | #' first.date <- as.Date('2015-11-03') 15 | #' last.date <- as.Date('2015-11-03') 16 | #' type.output <- 'raw' 17 | #' type.data <- 'orders' 18 | #' type.market <- 'equity-odds' 19 | #' 20 | #' df.out <- ghfd_get_HF_data(my.assets = my.assets, 21 | #' type.market = type.market, 22 | #' type.data = type.data, 23 | #' first.date = first.date, 24 | #' last.date = last.date, 25 | #' first.time = first.time, 26 | #' last.time = last.time, 27 | #' type.output = type.output) 28 | #' 29 | #' df.lob <- ghfd_build_lob(df.out) 30 | #' } 31 | ghfd_build_lob <- function(df.orders, silent = TRUE) { 32 | 33 | # check inputs (is.data.frame also accepts tibbles, whose class vector has length > 1) 34 | if (!is.data.frame(df.orders)){ 35 | stop('Input df.orders is not a dataframe..') 36 | } 37 | 38 | unique.assets <- unique(df.orders$InstrumentSymbol) 39 | 40 | df.lob <- data.frame() 41 | for (i.asset in unique.assets) { 42 | 43 | temp.df <- df.orders[df.orders$InstrumentSymbol == i.asset, ] 44 | cat(paste0('\nBuilding LOB for ', i.asset, ' - ', nrow(temp.df), ' orders') ) 45 | 46 | temp.lob <- process.lob.from.df(temp.df, silent) 47 | 48 | df.lob <- dplyr::bind_rows(df.lob, temp.lob) 49 | 50 | } 51 | 52 | return(df.lob) 53 | } 54 | -------------------------------------------------------------------------------- /R/ghfd_download_file.R: -------------------------------------------------------------------------------- 1 | #' Downloads a single file from Bovespa ftp 2 | #' 3 | #' This function will take as input an ftp address and the name of the downloaded file in the local drive, 4 | #' and it will download the corresponding file. Returns TRUE if it worked and FALSE otherwise. 
5 | #' 6 | #' @param my.ftp A complete ftp address, including the file name, to download the file from 7 | #' @param out.file Name of downloaded file with HFT data from Bovespa 8 | #' @inheritParams ghfd_get_HF_data 9 | #' 10 | #' @return TRUE if successful, FALSE if not 11 | #' @export 12 | #' 13 | #' @examples 14 | #' 15 | #' my.ftp <- 'ftp://ftp.bmf.com.br/MarketData/Bovespa-Opcoes/NEG_OPCOES_20151229.zip' 16 | #' out.file <- 'temp.zip' 17 | #' 18 | #' \dontrun{ 19 | #' ghfd_download_file(my.ftp = my.ftp, out.file=out.file) 20 | #' } 21 | ghfd_download_file <- function(my.ftp, 22 | out.file, 23 | dl.dir = 'Dl Files', 24 | max.dl.tries = 10){ 25 | 26 | if (length(my.ftp)!=1){ 27 | stop('ERROR: input my.ftp should have length 1') 28 | } 29 | 30 | i.try <- 1 31 | 32 | while (TRUE){ 33 | cat(paste0(' Attempt ', i.try)) 34 | 35 | if (file.exists(out.file) && (file.size(out.file)>100)){ 36 | cat(' - File exists, skipping dl') 37 | return(TRUE) 38 | 39 | } else { 40 | # DO TRIES for download 41 | 42 | try({ 43 | utils::download.file(url = my.ftp , 44 | method = 'auto', 45 | destfile = out.file , 46 | quiet = TRUE) 47 | }) 48 | 49 | # a failed download may leave no file at all, so test existence before size 50 | if (!file.exists(out.file) || file.size(out.file) < 100 ){ 51 | cat(' - Error in downloading. Trying again..') 52 | } else { 53 | return(TRUE) 54 | 55 | } 56 | } 57 | 58 | if (i.try==max.dl.tries){ 59 | warning('Reached maximum number of attempts to read ftp content. 
Exiting now...') 60 | return(FALSE) 61 | } 62 | 63 | i.try <- i.try + 1 64 | 65 | Sys.sleep(1) 66 | } 67 | 68 | 69 | 70 | } 71 | -------------------------------------------------------------------------------- /R/ghfd_get_available_tickers_from_file.R: -------------------------------------------------------------------------------- 1 | #' Function to get available tickers from downloaded zip file 2 | #' 3 | #' This function will read the zip file downloaded from Bovespa and output 4 | #' a dataframe with the different tickers found in the file 5 | #' and the number of trades for each ticker 6 | #' 7 | #' @inheritParams ghfd_download_file 8 | #' 9 | #' @return A dataframe with the number of trades for each ticker found in file 10 | #' @export 11 | #' 12 | #' @examples 13 | #' 14 | #' ## get file from package (usually this would have been downloaded from the ftp) 15 | #' out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData") 16 | #' 17 | #' df.tickers <- ghfd_get_available_tickers_from_file(out.file) 18 | #' 19 | #' print(head(df.tickers)) 20 | ghfd_get_available_tickers_from_file <- function(out.file){ 21 | 22 | if (length(out.file)!=1){ 23 | stop('ERROR: input out.file should have length 1') 24 | 25 | } 26 | 27 | suppressWarnings(suppressMessages( 28 | my.df <- readr::read_csv2(file = out.file, 29 | skip = 1, 30 | progress = FALSE, 31 | col_names = FALSE, 32 | col_types = readr::cols() ) 33 | )) 34 | 35 | out <- sort(table(my.df$X2), decreasing = TRUE) 36 | 37 | df.out <- data.frame(tickers = names(out), 38 | n.obs = as.numeric(out), 39 | f.name = out.file) 40 | return(df.out) 41 | 42 | 43 | } 44 | 45 | -------------------------------------------------------------------------------- /R/ghfd_get_available_tickers_from_ftp.R: -------------------------------------------------------------------------------- 1 | #' Function to get available tickers from ftp 2 | #' 3 | #' This function will read the Bovespa 
ftp for a given market/date and output 4 | #' a numeric vector where the names of the elements represents the different tickers 5 | #' and the numeric values as the number of trades for each ticker 6 | #' 7 | #' @param my.date A single date to check tickers in ftp (e.g. '2015-11-03') 8 | #' @inheritParams ghfd_get_HF_data 9 | #' 10 | #' @return A data.frame with the tickers, number of found trades and file name 11 | #' @export 12 | #' 13 | #' @examples 14 | #' 15 | #' \dontrun{ 16 | #' df.tickers <- ghfd_get_available_tickers_from_ftp(my.date = '2015-11-03', 17 | #' type.market = 'BMF') 18 | #' 19 | #' print(head(df.tickers)) 20 | #' } 21 | ghfd_get_available_tickers_from_ftp <- function(my.date = '2015-11-03', 22 | type.market = 'equity', 23 | type.data = 'trades', 24 | dl.dir = 'ftp files', 25 | max.dl.tries = 10){ 26 | 27 | if (length(my.date)!=1){ 28 | stop('ERROR: input my.date should have length = 1') 29 | } 30 | 31 | # check date class 32 | my.date <- as.Date(my.date) 33 | if (class(my.date) != 'Date') { 34 | stop('ERROR: Input my.date can either be a Date object or a string with the standard data format (YYYY-MM-DD)') 35 | } 36 | 37 | if (!dir.exists(dl.dir)) { 38 | dir.create(dl.dir) 39 | } 40 | 41 | # check type.market 42 | # check type.market 43 | possible.names <- c('equity','equity-odds','options','BMF') 44 | 45 | idx <- type.market %in% possible.names 46 | 47 | if (!any(idx)){ 48 | stop(paste(c('Input type.market not valid. 
It should be one of the following: ', possible.names), collapse = ', ')) 49 | } 50 | 51 | # test for internet 52 | test.internet <- curl::has_internet() 53 | 54 | # set ftp site 55 | if (type.market == 'equity') 56 | my.ftp <- "ftp://ftp.bmf.com.br/marketdata/Bovespa-Vista/" 57 | if (type.market == 'equity-odds') 58 | my.ftp <- "ftp://ftp.bmf.com.br/marketdata/Bovespa-Vista/" 59 | if (type.market == 'options') 60 | my.ftp <- "ftp://ftp.bmf.com.br/MarketData/Bovespa-Opcoes/" 61 | if (type.market == 'BMF') 62 | my.ftp <- "ftp://ftp.bmf.com.br/marketdata/BMF/" 63 | 64 | # get contents 65 | df.ftp <- ghfd_get_ftp_contents(type.market = type.market, 66 | type.data = type.data, 67 | max.dl.tries = max.dl.tries) 68 | 69 | idx <- which(df.ftp$dates == my.date) 70 | 71 | if (length(idx)==0){ 72 | 73 | closest.date <- df.ftp$dates[which.min(abs(df.ftp$dates - my.date))] 74 | 75 | warning(paste('Cannot find date', my.date,' in ftp. Selecting the closest date,',closest.date)) 76 | 77 | idx <- which(df.ftp$dates == closest.date) 78 | 79 | } 80 | 81 | files.to.dl <- df.ftp$files[idx] 82 | 83 | my.links <- paste0(my.ftp, files.to.dl) 84 | 85 | my.url <- my.links[1] 86 | out.file <- paste0(dl.dir, '/', files.to.dl[1]) 87 | 88 | ghfd_download_file(my.url, out.file, max.dl.tries = max.dl.tries) 89 | 90 | suppressWarnings(suppressMessages( 91 | my.df <- readr::read_csv2(file = out.file, 92 | skip = 1, 93 | progress = FALSE, 94 | col_names = FALSE, 95 | col_types = readr::cols() ) 96 | )) 97 | 98 | 99 | out <- sort(table(my.df$X2), decreasing = TRUE) 100 | 101 | df.out <- data.frame(tickers = names(out), 102 | n.obs = as.numeric(out), 103 | type.market = type.market, 104 | f.name = out.file) 105 | return(df.out) 106 | 107 | } 108 | 109 | -------------------------------------------------------------------------------- /R/ghfd_get_ftp_contents.R: -------------------------------------------------------------------------------- 1 | #' Gets the contents of Bovespa ftp 2 | #' 3 | #' This function will access the 
Bovespa ftp and return a vector with all files related to trades (all others are ignored) 4 | #' 5 | #' @inheritParams ghfd_get_HF_data 6 | #' 7 | #' @return A list with all files from the ftp that are related to executed trades 8 | #' @export 9 | #' 10 | #' @examples 11 | #' 12 | #' \dontrun{ 13 | #' ftp.files <- ghfd_get_ftp_contents(type.market = 'equity') 14 | #' print(ftp.files) 15 | #' } 16 | ghfd_get_ftp_contents <- function(type.market = 'equity', 17 | max.dl.tries = 10, 18 | type.data = 'trades'){ 19 | 20 | # check type.market 21 | possible.names <- c('equity','equity-odds','options','BMF') 22 | idx <- type.market %in% possible.names 23 | 24 | if (!any(idx)){ 25 | stop(paste(c('Input type.market not valid. It should be one of the following: ', possible.names), collapse = ', ')) 26 | } 27 | 28 | # check type.data 29 | possible.names <- c('trades','orders') 30 | idx <- type.data %in% possible.names 31 | 32 | if (!any(idx)){ 33 | stop(paste(c('Input type.data not valid. It should be one of the following: ', possible.names), collapse = ', ')) 34 | } 35 | 36 | # test for internet 37 | test.internet <- curl::has_internet() 38 | 39 | if (!test.internet){ 40 | stop('No internet connection found...') 41 | } 42 | 43 | # set ftp site 44 | if (type.market == 'equity') my.ftp <- "ftp://ftp.bmf.com.br/marketdata/Bovespa-Vista/" 45 | if (type.market == 'equity-odds') my.ftp <- "ftp://ftp.bmf.com.br/marketdata/Bovespa-Vista/" 46 | if (type.market == 'options') my.ftp <- "ftp://ftp.bmf.com.br/MarketData/Bovespa-Opcoes/" 47 | if (type.market == 'BMF') my.ftp <- "ftp://ftp.bmf.com.br/marketdata/BMF/" 48 | 49 | # set time stop (ftp seems to give wrong files sometimes..) 
50 | Sys.sleep(1) 51 | 52 | i.try <- 1 53 | while (TRUE){ 54 | cat(paste('\nReading ftp contents for ',type.market, '(',type.data,')', 55 | ' (attempt = ', i.try,'|',max.dl.tries,')',sep = '')) 56 | files.at.ftp <- NULL 57 | try({ 58 | files.at.ftp <- RCurl::getURL(my.ftp, 59 | verbose=F, 60 | ftp.use.epsv=FALSE, 61 | dirlistonly = TRUE) 62 | }) 63 | 64 | if (type.data == 'trades') { 65 | # filter ftp files for trades 66 | # pattern.files <- 'NEG_(.*).zip' 67 | 68 | # Fix for issue 5: https://github.com/msperlin/GetHFData/issues/5 69 | 70 | pattern.files <- 'NEG_(.*)' 71 | } 72 | else if (type.data == 'orders') { 73 | # pattern.files <- 'OFER_(.*).zip' 74 | # Fix for issue 5: https://github.com/msperlin/GetHFData/issues/5 75 | pattern.files <- 'OFER_(.*)' 76 | } 77 | 78 | files.at.ftp <- stringr::str_extract_all(files.at.ftp, 79 | pattern = pattern.files )[[1]] 80 | 81 | # remove or not FRAC market files 82 | idx <- stringr::str_detect(files.at.ftp, pattern = stringr::fixed('FRAC')) 83 | 84 | if (type.market=='equity-odds'){ 85 | files.at.ftp <- files.at.ftp[idx] 86 | } else { 87 | 88 | files.at.ftp <- files.at.ftp[!idx] 89 | } 90 | 91 | # remove BMF files in Bovespa equity (why are these files there??) 92 | 93 | if ((type.market =='equity')|(type.market=='equity-odds')){ 94 | idx <- stringr::str_detect(files.at.ftp, pattern = stringr::fixed('BMF')) 95 | files.at.ftp <- files.at.ftp[!idx] 96 | 97 | idx <- stringr::str_detect(files.at.ftp, pattern = stringr::fixed('OPCOES')) 98 | files.at.ftp <- files.at.ftp[!idx] 99 | } 100 | 101 | # remove larger zip files with several txt files (only a couple of months) 102 | # DEPRECATED: THESE FILES WITH LARGE NAMES ARE NO LONGER IN THE FTP 103 | #idx <- sapply(files.at.ftp, FUN = function(x) return(stringr::str_count(x,pattern = '_')))<3 104 | #files.at.ftp <- files.at.ftp[idx] 105 | 106 | # check if html.code and size makes sense. 
If not, download it again 107 | 108 | if ( is.null(files.at.ftp) || (length(files.at.ftp)<50) ){ 109 | cat(' - Error in reading ftp contents. Trying again..') 110 | } else { 111 | break() 112 | } 113 | 114 | if (i.try==max.dl.tries){ 115 | stop('Reached maximum number of attempts to read ftp content. Exiting now...') 116 | } 117 | 118 | i.try <- i.try + 1 119 | 120 | Sys.sleep(2) 121 | } 122 | 123 | # find dates from file names 124 | 125 | ftp.dates <- unlist(stringr::str_extract_all(files.at.ftp, 126 | pattern = paste0(rep('[0-9]',8), 127 | collapse = ''))) 128 | ftp.dates <- as.Date(ftp.dates,format = '%Y%m%d') 129 | 130 | 131 | 132 | df.ftp <- data.frame(files = as.character(files.at.ftp), 133 | dates = ftp.dates, 134 | link = as.character(paste0(my.ftp,as.character(files.at.ftp)))) 135 | 136 | if (type.data == 'orders') { 137 | df.ftp$type.order <- ifelse(stringr::str_detect(files.at.ftp, 138 | stringr::fixed('VDA')), 'Sell','Buy') 139 | } 140 | 141 | return(df.ftp) 142 | 143 | } 144 | -------------------------------------------------------------------------------- /R/ghfd_lob_fcts.R: -------------------------------------------------------------------------------- 1 | #' Organizes LOB (internal function) 2 | #' 3 | #' This internal recursive function organizes the lob by making sure that all prices and times are ordered. 4 | #' Every time the best bid price meets or exceeds the best ask price, it executes a trade and updates the lob accordingly. 
5 | #' 6 | #' @param my.lob A LOB (order book) 7 | #' @inheritParams ghfd_build_lob 8 | #' 9 | #' @return An organized LOB 10 | #' 11 | #' @examples 12 | #' 13 | #' # no examples (internal) 14 | organize.lob <- function(my.lob, silent = TRUE) { 15 | 16 | if (is.na(my.lob$last.update)){ 17 | return(my.lob) 18 | } 19 | 20 | # order by price and time 21 | idx.ask <- order(my.lob$ask.price, my.lob$ask.time) 22 | my.lob$ask.price <- my.lob$ask.price[idx.ask] 23 | my.lob$ask.vol <- my.lob$ask.vol[idx.ask] 24 | my.lob$ask.time <- my.lob$ask.time[idx.ask] 25 | 26 | idx.bid <- order(my.lob$bid.price, my.lob$bid.time, decreasing = TRUE) 27 | my.lob$bid.price <- my.lob$bid.price[idx.bid] 28 | my.lob$bid.vol <- my.lob$bid.vol[idx.bid] 29 | my.lob$bid.time <- my.lob$bid.time[idx.bid] 30 | 31 | # fix for empty lob 32 | if (length(my.lob$bid.price) ==0 ) return(my.lob) 33 | if (length(my.lob$ask.price) ==0 ) return(my.lob) 34 | 35 | if (any(is.na(c(my.lob$bid.price[1], my.lob$ask.price[1])))) return(my.lob) 36 | #browser() 37 | 38 | # check trades in top of lob 39 | if (my.lob$bid.price[1] >= my.lob$ask.price[1]){ 40 | 41 | if (!silent) cat('\tFound trades!') 42 | 43 | diff.vol <- my.lob$bid.vol[1] - my.lob$ask.vol[1] 44 | 45 | if (diff.vol < 0 ){ 46 | my.lob$ask.vol[1] = abs(diff.vol) 47 | 48 | my.lob$bid.price <- my.lob$bid.price[-1] 49 | my.lob$bid.vol <- my.lob$bid.vol[-1] 50 | my.lob$bid.time <- my.lob$bid.time[-1] 51 | my.lob$bid.id <- my.lob$bid.id[-1] 52 | 53 | 54 | } else if (diff.vol > 0) { 55 | my.lob$bid.vol[1] = abs(diff.vol) 56 | 57 | my.lob$ask.price <- my.lob$ask.price[-1] 58 | my.lob$ask.vol <- my.lob$ask.vol[-1] 59 | my.lob$ask.time <- my.lob$ask.time[-1] 60 | my.lob$ask.id <- my.lob$ask.id[-1] 61 | } else if (diff.vol ==0){ 62 | 63 | my.lob$bid.price <- my.lob$bid.price[-1] 64 | my.lob$bid.vol <- my.lob$bid.vol[-1] 65 | my.lob$bid.time <- my.lob$bid.time[-1] 66 | my.lob$bid.id <- my.lob$bid.id[-1] 67 | 68 | my.lob$ask.price <- my.lob$ask.price[-1] 69 | 
my.lob$ask.vol <- my.lob$ask.vol[-1] 70 | my.lob$ask.time <- my.lob$ask.time[-1] 71 | my.lob$ask.id <- my.lob$ask.id[-1] 72 | 73 | } 74 | 75 | my.lob <- organize.lob(my.lob, silent) 76 | } 77 | 78 | #print.lob(my.lob) 79 | 80 | return(my.lob) 81 | } 82 | 83 | #' Adds an order to the LOB 84 | #' 85 | #' @inheritParams organize.lob 86 | #' @param order.in An order from the data 87 | #' 88 | #' @return An LOB with the new order 89 | #' 90 | #' @examples 91 | #' # no example (internal) 92 | add.order <- function(my.lob, order.in, silent = TRUE) { 93 | 94 | my.lob$last.update <- order.in$time 95 | 96 | if (!silent) cat('\t', order.in$type.order) 97 | 98 | # new order 99 | if (order.in$type.order == 'New' ) { 100 | 101 | if (order.in$side == 'Buy') { 102 | my.lob$bid.price <- c(my.lob$bid.price, order.in$price) 103 | my.lob$bid.vol <- c(my.lob$bid.vol , order.in$vol) 104 | my.lob$bid.id <- c(my.lob$bid.id , order.in$id) 105 | my.lob$bid.time <- c(my.lob$bid.time , order.in$time) 106 | } 107 | 108 | if (order.in$side == 'Sell') { 109 | my.lob$ask.price <- c(my.lob$ask.price, order.in$price) 110 | my.lob$ask.vol <- c(my.lob$ask.vol , order.in$vol) 111 | my.lob$ask.id <- c(my.lob$ask.id , order.in$id) 112 | my.lob$ask.time <- c(my.lob$ask.time , order.in$time) 113 | 114 | } 115 | } 116 | 117 | # cancel order 118 | if (order.in$type.order == 'Cancel' ) { 119 | 120 | if (order.in$side == 'Buy') { 121 | idx <- which(my.lob$bid.id == order.in$id) 122 | #browser() 123 | if (length(idx) == 0){ 124 | if (!silent) cat('\tCannot match id for cancel order..') 125 | return(my.lob) 126 | } 127 | 128 | my.lob$bid.price <- my.lob$bid.price[-idx] 129 | my.lob$bid.vol <- my.lob$bid.vol[-idx] 130 | my.lob$bid.id <- my.lob$bid.id[-idx] 131 | my.lob$bid.time <- my.lob$bid.time[-idx] 132 | } 133 | 134 | if (order.in$side == 'Sell') { 135 | idx <- which(my.lob$ask.id == order.in$id) 136 | 137 | if (length(idx) == 0){ 138 | if (!silent) cat('\tCannot match id for cancel order..') 139 | return(my.lob) 140 | 
} 141 | 142 | my.lob$ask.price <- my.lob$ask.price[-idx] 143 | my.lob$ask.vol <- my.lob$ask.vol[-idx] 144 | my.lob$ask.id <- my.lob$ask.id[-idx] 145 | my.lob$ask.time <- my.lob$ask.time[-idx] 146 | 147 | } 148 | 149 | } 150 | 151 | # update order 152 | if (order.in$type.order == 'Update' ) { 153 | 154 | if (order.in$side == 'Buy') { 155 | idx <- which(my.lob$bid.id == order.in$id) 156 | 157 | if (length(idx) == 0){ 158 | if (!silent) cat('\tCannot match id for Update order..') 159 | return(my.lob) 160 | } 161 | 162 | my.lob$bid.price[idx] <- order.in$price 163 | my.lob$bid.vol[idx] <- order.in$vol 164 | my.lob$bid.id[idx] <- order.in$id 165 | my.lob$bid.time[idx] <- order.in$time 166 | } 167 | 168 | if (order.in$side == 'Sell') { 169 | idx <- which(my.lob$ask.id == order.in$id) 170 | 171 | if (length(idx) == 0){ 172 | if (!silent) cat('\tCannot match id for Update order..') 173 | return(my.lob) 174 | } 175 | 176 | my.lob$ask.price[idx] <- order.in$price 177 | my.lob$ask.vol[idx] <- order.in$vol 178 | my.lob$ask.id[idx] <- order.in$id 179 | my.lob$ask.time[idx] <- order.in$time 180 | 181 | } 182 | 183 | } 184 | 185 | 186 | my.lob <- organize.lob(my.lob, silent = silent) 187 | 188 | 189 | return(my.lob) 190 | 191 | } 192 | 193 | #' Prints the LOB 194 | #' 195 | #' @inheritParams organize.lob 196 | #' @param max.level Max level of lob to print 197 | #' 198 | #' @return nothing 199 | #' 200 | #' @examples 201 | #' # no example (internal) 202 | print.lob <- function(my.lob, max.level = 3) { 203 | 204 | cat(paste0('Last update: ', my.lob$last.update, '\n') ) 205 | 206 | cat('\nASK price: ', paste0(format(my.lob$ask.price,digits = 4), collapse = '\t')) 207 | cat('\nBID price: ', paste0(format(my.lob$bid.price, digits = 4), collapse = '\t')) 208 | 209 | cat('\nASK vol: ', paste0(format(my.lob$ask.vol,digits = 4), collapse = '\t')) 210 | cat('\nBID vol: ', paste0(format(my.lob$bid.vol, digits = 4), collapse = '\t')) 211 | 212 | } 213 | 214 | #' Process LOB from asset dataframe 215 | #' 216 | #' 
@param asset.df A dataframe with orders for a single asset 217 | #' @inheritParams ghfd_build_lob 218 | #' 219 | #' @return The lob for the single asset 220 | #' 221 | #' @examples 222 | #' # no example (internal) 223 | process.lob.from.df <- function(asset.df, silent = TRUE) { 224 | 225 | # sort df by priority time 226 | asset.df <- asset.df[order(asset.df$PriorityDateTime), ] 227 | 228 | # get first new order to fill book 229 | idx.bid <- sort(which(asset.df$OrderSide == 'Buy'& asset.df$ExecutionType == 'Trade'))[1:3] 230 | idx.ask <- sort(which(asset.df$OrderSide == 'Sell'& asset.df$ExecutionType == 'Trade'))[1:3] 231 | 232 | if (any(is.na(c(idx.bid,idx.ask)))){ 233 | idx.bid <- sort(which(asset.df$OrderSide == 'Buy'& asset.df$ExecutionType == 'New'))[1:3] 234 | idx.ask <- sort(which(asset.df$OrderSide == 'Sell'& asset.df$ExecutionType == 'New'))[1:3] 235 | 236 | } 237 | 238 | my.lob <- list(bid.price = asset.df$OrderPrice[idx.bid], 239 | ask.price = asset.df$OrderPrice[idx.ask], 240 | bid.id = asset.df$SequentialOrderNumber[idx.bid], 241 | ask.id = asset.df$SequentialOrderNumber[idx.ask], 242 | bid.vol = asset.df$TotalQuantity[idx.bid], 243 | ask.vol = asset.df$TotalQuantity[idx.ask], 244 | bid.time = asset.df$PriorityDateTime[idx.bid], 245 | ask.time = asset.df$PriorityDateTime[idx.ask], 246 | last.update = NA) 247 | 248 | delta.sec <- 5 # in sec 249 | lob.info <- data.frame() 250 | my.l <- list() 251 | for (i.row in seq(1, nrow(asset.df))) { 252 | 253 | i.df <- asset.df[i.row, ] 254 | 255 | my.asset <- unique(asset.df$InstrumentSymbol) 256 | 257 | if (!silent) { 258 | cat(paste0("\n\tProcessing ", my.asset,'\t',i.df$PriorityDateTime, '\t(',i.row,'|', nrow(asset.df),')' ) ) 259 | } 260 | 261 | 262 | #if (i.row ==6) browser() 263 | #browser() 264 | if (i.row == 1) my.lob <- organize.lob(my.lob, silent) 265 | 266 | order.in <- list() 267 | order.in$price = i.df$OrderPrice 268 | order.in$vol = i.df$TotalQuantity[1] 269 | order.in$side = i.df$OrderSide[1] 270 | 
order.in$type.order = i.df$ExecutionType[1] 271 | order.in$id = i.df$SequentialOrderNumber[1] 272 | order.in$time = i.df$PriorityDateTime 273 | 274 | #print(as.character(order.in$type.order)) 275 | 276 | my.lob <- add.order(my.lob, order.in, silent = silent) 277 | 278 | df.lob <- data.frame(InstrumentSymbol = my.asset, 279 | best.ask = my.lob$ask.price[1], 280 | best.bid = my.lob$bid.price[1], 281 | mid.quote = (my.lob$ask.price[1] + my.lob$bid.price[1])/2, 282 | spread = my.lob$ask.price[1] - my.lob$bid.price[1], 283 | update.time = my.lob$last.update) 284 | 285 | # create list, later rbind it 286 | my.l <- c( my.l, list(df.lob)) 287 | 288 | # OLD code with rbind (slower) 289 | 290 | #lob.info <- rbind(lob.info, data.frame(InstrumentSymbol = my.asset, 291 | # best.ask = my.lob$ask.price[1], 292 | # best.bid = my.lob$bid.price[1], 293 | # mid.quote = (my.lob$ask.price[1] + my.lob$bid.price[1])/2, 294 | # spread = my.lob$ask.price[1] - my.lob$bid.price[1], 295 | # update.time = my.lob$last.update)) 296 | #print.lob(my.lob) 297 | 298 | } 299 | 300 | lob.info <- do.call(what = dplyr::bind_rows, args = my.l) 301 | 302 | return(lob.info) 303 | 304 | } 305 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Download and Aggregate High Frequency Trading Data from Bovespa 2 | 3 | --- 4 | 5 | **2020-06-30 IMPORTANT: B3 closed the public access to the ftp site and, therefore, the package is not working. As far as I know, there is no expectation of any change in that status in the near future. I'm also not aware of any other source of HF data.** 6 | 7 | --- 8 | 9 | Recently, Bovespa, the Brazilian financial exchange company, allowed external access to its [ftp site](ftp://ftp.bmf.com.br/). 
At this address one can find a variety of information regarding the Brazilian financial system, including datasets with high frequency (tick by tick) trading data for three different markets: equity, options and BMF.

Downloading and processing these files, however, can be exhausting. The dataset is composed of zip files with the whole trading data, separated by day and market. These files are huge, and processing or aggregating them in a useful manner requires specific knowledge of the structure of the dataset.

The package GetHFData makes it easy to access this dataset directly by allowing easy import and aggregation of the data. With this package the user can:

* Access the contents of the Bovespa ftp using function `ghfd_get_ftp_contents`
* Get the list of available tickers in the trading data using `ghfd_get_available_tickers_from_ftp`
* Download individual files using `ghfd_download_file`
* Download and process a batch of dates and asset codes with `ghfd_get_HF_data`

In the next example we will only use a local file from the package. Given the size of the files in the ftp and the CHECK process of CRAN, it makes sense to keep this vignette compact and fast to run. More details about the usage of the package can be found in my [RBFIN paper](http://bibliotecadigital.fgv.br/ojs/index.php/rbfin/article/view/64587/65702).

## Installation

You can install the development version from github:

```
devtools::install_github('msperlin/GetHFData')
```

The stable version is available in CRAN:

```
install.packages('GetHFData')
```

## Downloading and aggregating TRADE data

Package GetHFData supports batch downloads and processing of several different tickers using start and end dates. In this vignette we are not running the code given the large size of the downloaded files.
You should try the next example on your own computer (just copy, paste and run the code in R).

In this example we will download files from the ftp for all stocks related to Petrobras (PETR) and Vale do Rio Doce (VALE). The data will be processed, resulting in a dataframe with aggregated data.

```
library(GetHFData)

first.time <- '11:00:00'
last.time <- '17:00:00'

first.date <- '2015-11-01'
last.date <- '2015-11-10'
type.output <- 'agg'
type.data <- 'trades'
agg.diff <- '15 min'

# partial matching is available
my.assets <- c('PETR','VALE')
type.matching <- 'partial'
type.market <- 'equity'

df.out <- ghfd_get_HF_data(my.assets = my.assets,
                           type.matching = type.matching,
                           type.market = type.market,
                           type.data = type.data,
                           first.date = first.date,
                           last.date = last.date,
                           first.time = first.time,
                           last.time = last.time,
                           type.output = type.output,
                           agg.diff = agg.diff)

```

## Downloading and aggregating ORDER data

Version 1.3 of `GetHFData` makes it possible to download and aggregate order data from Bovespa. The data comprises buy and sell orders sent by market operators. Tabular data includes type of orders (buy or sell, new/update/cancel/...), date/time of submission, priority time, prices and order quantity, among other fields.

**Be aware that these are very large files.** One day of buy and sell orders in the equity market is around 100 MB zipped and close to 1 GB unzipped. If your computer does not have enough memory to hold this data, **it will crash**.

Here's an example of usage that will download and aggregate order data for all option contracts related to Petrobras (PETR):

```
library(GetHFData)

first.time <- '10:00:00'
last.time <- '17:00:00'

first.date <- '2015-08-18'
last.date <- '2015-08-18'

type.output <- 'agg' # aggregates data
agg.diff <- '5 min' # interval for aggregation

my.assets <- 'PETR' # all options related to Petrobras (partial matching)
type.matching <- 'partial' # finds tickers from my.assets using partial matching
type.market <- 'options' # option market
type.data <- 'orders' # order data

df.out <- ghfd_get_HF_data(my.assets = my.assets,
                           type.data = type.data,
                           type.matching = type.matching,
                           type.market = type.market,
                           first.date = first.date,
                           last.date = last.date,
                           first.time = first.time,
                           last.time = last.time,
                           type.output = type.output,
                           agg.diff = agg.diff)

```

--------------------------------------------------------------------------------
/docs/articles/ghfd-vignette-LOB.html:
--------------------------------------------------------------------------------
Recreating the LOB (limit order book) • GetHFData
22 |
82 | 83 | 84 | 85 |
86 |
87 | 97 | 98 | 99 | 100 |

Version 1.4 of GetHFData adds functions for recreating the LOB (limit order book) from the order data. The LOB is recreated by sorting all trading orders (buy and sell) and matching them whenever there is a match of prices.

101 |

Simulating the LOB is a recursive and computationally intensive problem. The current code is not optimized for speed, and it may take a long time to process even a small set of financial orders.
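
To make the matching idea more concrete, the following is a minimal sketch in base R. It is not the package's implementation: the helper name `match_top_of_book` and the toy price vectors are assumptions made for illustration, and every order is treated as a single unit. The sketch removes the best bid and best ask for as long as their prices cross, then reports the best quotes that remain.

```r
# Toy illustration only (hypothetical helper, not part of GetHFData).
# bids/asks are numeric vectors of resting limit prices, one unit each.
match_top_of_book <- function(bids, asks) {
  bids <- sort(bids, decreasing = TRUE)  # best (highest) bid first
  asks <- sort(asks)                     # best (lowest) ask first

  # while the best bid meets or exceeds the best ask, the two orders match
  while (length(bids) > 0 && length(asks) > 0 && bids[1] >= asks[1]) {
    bids <- bids[-1]
    asks <- asks[-1]
  }

  list(best.bid = if (length(bids) > 0) bids[1] else NA_real_,
       best.ask = if (length(asks) > 0) asks[1] else NA_real_)
}

book <- match_top_of_book(bids = c(10.00, 10.05, 9.95),
                          asks = c(10.02, 10.10, 10.04))
print(book)
```

A real order book also has to track order identifiers, quantities, priority times and updates or cancellations, which is a large part of why the full reconstruction is slow.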

102 |

Here’s an example of usage:

103 |
library(GetHFData)

first.time <- '10:00:00'
last.time <- '17:00:00'

first.date <- '2016-08-18'
last.date <- '2016-08-18'

type.output <- 'raw' # raw (non-aggregated) order data

my.assets <- 'PETR4F'
type.matching <- 'exact'
type.market <- 'equity-odds'
type.data <- 'orders' # order data

df.out <- ghfd_get_HF_data(my.assets = my.assets,
                           type.data = type.data,
                           type.matching = type.matching,
                           type.market = type.market,
                           first.date = first.date,
                           last.date = last.date,
                           first.time = first.time,
                           last.time = last.time,
                           type.output = type.output)

df.lob <- ghfd_build_lob(df.out)
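
Once a LOB table is available, it can be summarised with ordinary data frame operations. The sketch below is an assumption-laden illustration: it uses a hand-made data frame (`toy.lob`, with invented values) in place of real output, since the ftp source is no longer publicly reachable according to the README. The column names follow those assembled in `process.lob.from.df` in this repository (`best.ask`, `best.bid`, `mid.quote`, `spread`, `update.time`); the relative spread column is an addition for the example.

```r
# Toy stand-in for a LOB table; all values are made up.
toy.lob <- data.frame(
  InstrumentSymbol = 'PETR4F',
  best.ask = c(10.04, 10.03, 10.05),
  best.bid = c(10.00, 10.01, 10.01),
  update.time = as.POSIXct('2016-08-18 10:00:00', tz = 'UTC') + c(0, 5, 10)
)

# same definitions used in process.lob.from.df
toy.lob$mid.quote <- (toy.lob$best.ask + toy.lob$best.bid) / 2
toy.lob$spread    <- toy.lob$best.ask - toy.lob$best.bid

# relative spread in basis points of the mid quote (added for illustration)
toy.lob$rel.spread.bps <- 1e4 * toy.lob$spread / toy.lob$mid.quote

avg.spread <- mean(toy.lob$spread)
print(toy.lob)
```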
129 |
130 | 131 | 133 | 134 |
135 | 136 | 137 |
140 | 141 |
142 |

Site built with pkgdown.

143 |
144 | 145 |
146 |
147 | 148 | 149 | 150 | 151 | 152 | -------------------------------------------------------------------------------- /docs/articles/ghfd-vignette-Orders.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Downloading and aggregating order data from Bovespa • GetHFData 9 | 10 | 11 | 12 | 13 | 14 | 15 | 19 | 20 | 21 |
22 |
82 | 83 | 84 | 85 |
86 |
87 | 97 | 98 | 99 | 100 |

Version 1.3 of GetHFData makes it possible to download and aggregate order data from Bovespa. The data comprises buy and sell orders sent by market operators. Tabular data includes type of orders (buy or sell, new/update/cancel/...), date/time of submission, priority time, prices and order quantity, among other fields.

101 |

Be aware that these are very large files. One day of buy and sell orders in the equity market is around 100 MB zipped and close to 1 GB unzipped. If your computer does not have enough memory to hold this data, it will crash.

102 |

Here’s an example of usage that will download and aggregate order data for all option contracts related to Petrobras (PETR):

103 |
library(GetHFData)

first.time <- '10:00:00'
last.time <- '17:00:00'

first.date <- '2015-08-18'
last.date <- '2015-08-18'

type.output <- 'agg' # aggregates data
agg.diff <- '5 min' # interval for aggregation

my.assets <- 'PETR' # all options related to Petrobras (partial matching)
type.matching <- 'partial' # finds tickers from my.assets using partial matching
type.market <- 'options' # option market
type.data <- 'orders' # order data

df.out <- ghfd_get_HF_data(my.assets = my.assets,
                           type.data = type.data,
                           type.matching = type.matching,
                           type.market = type.market,
                           first.date = first.date,
                           last.date = last.date,
                           first.time = first.time,
                           last.time = last.time,
                           type.output = type.output,
                           agg.diff = agg.diff)
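
To illustrate what an aggregation interval such as `'5 min'` means, here is a small base R sketch on invented data. It is not how the package aggregates internally, and the column names (`time`, `price`, `vol`, `n.orders`, `total.vol`, `last.price`) are assumptions for this example only; the columns returned by `ghfd_get_HF_data` may differ.

```r
# Toy order data; values and column names are illustrative assumptions.
toy.orders <- data.frame(
  time  = as.POSIXct('2015-08-18 10:00:00', tz = 'UTC') + c(30, 200, 310, 500, 650),
  price = c(1.00, 1.02, 1.01, 1.05, 1.03),
  vol   = c(100, 200, 100, 300, 100)
)

# assign each row to a 5 minute bucket
toy.orders$bucket <- cut(toy.orders$time, breaks = '5 min')

# one row per bucket: number of orders, total volume, last price
by.bucket <- split(toy.orders, toy.orders$bucket, drop = TRUE)
agg <- do.call(rbind, lapply(by.bucket, function(d) {
  data.frame(bucket = d$bucket[1],
             n.orders = nrow(d),
             total.vol = sum(d$vol),
             last.price = d$price[nrow(d)])
}))

print(agg)
```

A wider interval yields fewer, coarser rows, while a narrower one keeps more of the intraday detail at the cost of a larger table.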
129 |
130 | 131 | 133 | 134 |
135 | 136 | 137 |
140 | 141 |
142 |

Site built with pkgdown.

143 |
144 | 145 |
146 |
147 | 148 | 149 | 150 | 151 | 152 | -------------------------------------------------------------------------------- /docs/articles/ghfd-vignette-Trades_files/figure-html/plot.prices-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/msperlin/GetHFData/33328a5920a1087fc3729893d17f532d5970f349/docs/articles/ghfd-vignette-Trades_files/figure-html/plot.prices-1.png -------------------------------------------------------------------------------- /docs/articles/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Articles • GetHFData 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 44 | 45 | 46 | 47 | 48 | 49 |
50 |
51 | 111 | 112 | 113 |
114 | 115 |
116 |
117 | 120 | 121 | 131 |
132 |
133 | 134 |
135 | 138 | 139 |
140 |

Site built with pkgdown.

141 |
142 | 143 |
144 |
145 | 146 | 147 | 148 | 149 | 150 | 151 | -------------------------------------------------------------------------------- /docs/authors.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Citation and Authors • GetHFData 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 44 | 45 | 46 | 47 | 48 | 49 |
50 |
51 | 111 | 112 | 113 |
114 | 115 |
116 |
117 | 120 | 121 |

Perlin M, Ramos H (2016). GetHFData: A R Package for Downloading and Aggregating High Frequency Trading Data from Bovespa. https://ssrn.com/abstract=2824058.

125 |
@Manual{,
  title = {GetHFData: A R Package for Downloading and Aggregating High Frequency Trading Data from Bovespa},
  author = {Marcelo Perlin and Henrique Ramos},
  year = {2016},
  journal = {Available at SSRN},
  url = {https://ssrn.com/abstract=2824058},
}
132 | 135 | 136 |
    137 |
  • 138 |

    Marcelo Perlin. Author, maintainer. 139 |

    140 |
  • 141 |
  • 142 |

    Henrique Ramos. Contributor. 143 |

    144 |
  • 145 |
146 | 147 |
148 | 149 |
150 | 151 | 152 |
153 | 156 | 157 |
158 |

Site built with pkgdown.

159 |
160 | 161 |
162 |
163 | 164 | 165 | 166 | 167 | 168 | 169 | -------------------------------------------------------------------------------- /docs/docsearch.js: -------------------------------------------------------------------------------- 1 | $(function() { 2 | 3 | // register a handler to move the focus to the search bar 4 | // upon pressing shift + "/" (i.e. "?") 5 | $(document).on('keydown', function(e) { 6 | if (e.shiftKey && e.keyCode == 191) { 7 | e.preventDefault(); 8 | $("#search-input").focus(); 9 | } 10 | }); 11 | 12 | $(document).ready(function() { 13 | // do keyword highlighting 14 | /* modified from https://jsfiddle.net/julmot/bL6bb5oo/ */ 15 | var mark = function() { 16 | 17 | var referrer = document.URL ; 18 | var paramKey = "q" ; 19 | 20 | if (referrer.indexOf("?") !== -1) { 21 | var qs = referrer.substr(referrer.indexOf('?') + 1); 22 | var qs_noanchor = qs.split('#')[0]; 23 | var qsa = qs_noanchor.split('&'); 24 | var keyword = ""; 25 | 26 | for (var i = 0; i < qsa.length; i++) { 27 | var currentParam = qsa[i].split('='); 28 | 29 | if (currentParam.length !== 2) { 30 | continue; 31 | } 32 | 33 | if (currentParam[0] == paramKey) { 34 | keyword = decodeURIComponent(currentParam[1].replace(/\+/g, "%20")); 35 | } 36 | } 37 | 38 | if (keyword !== "") { 39 | $(".contents").unmark({ 40 | done: function() { 41 | $(".contents").mark(keyword); 42 | } 43 | }); 44 | } 45 | } 46 | }; 47 | 48 | mark(); 49 | }); 50 | }); 51 | 52 | /* Search term highlighting ------------------------------*/ 53 | 54 | function matchedWords(hit) { 55 | var words = []; 56 | 57 | var hierarchy = hit._highlightResult.hierarchy; 58 | // loop to fetch from lvl0, lvl1, etc. 
59 | for (var idx in hierarchy) { 60 | words = words.concat(hierarchy[idx].matchedWords); 61 | } 62 | 63 | var content = hit._highlightResult.content; 64 | if (content) { 65 | words = words.concat(content.matchedWords); 66 | } 67 | 68 | // return unique words 69 | var words_uniq = [...new Set(words)]; 70 | return words_uniq; 71 | } 72 | 73 | function updateHitURL(hit) { 74 | 75 | var words = matchedWords(hit); 76 | var url = ""; 77 | 78 | if (hit.anchor) { 79 | url = hit.url_without_anchor + '?q=' + escape(words.join(" ")) + '#' + hit.anchor; 80 | } else { 81 | url = hit.url + '?q=' + escape(words.join(" ")); 82 | } 83 | 84 | return url; 85 | } 86 | -------------------------------------------------------------------------------- /docs/jquery.sticky-kit.min.js: -------------------------------------------------------------------------------- 1 | /* 2 | Sticky-kit v1.1.2 | WTFPL | Leaf Corcoran 2015 | http://leafo.net 3 | */ 4 | (function(){var b,f;b=this.jQuery||window.jQuery;f=b(window);b.fn.stick_in_parent=function(d){var A,w,J,n,B,K,p,q,k,E,t;null==d&&(d={});t=d.sticky_class;B=d.inner_scrolling;E=d.recalc_every;k=d.parent;q=d.offset_top;p=d.spacer;w=d.bottoming;null==q&&(q=0);null==k&&(k=void 0);null==B&&(B=!0);null==t&&(t="is_stuck");A=b(document);null==w&&(w=!0);J=function(a,d,n,C,F,u,r,G){var v,H,m,D,I,c,g,x,y,z,h,l;if(!a.data("sticky_kit")){a.data("sticky_kit",!0);I=A.height();g=a.parent();null!=k&&(g=g.closest(k)); 5 | if(!g.length)throw"failed to find stick parent";v=m=!1;(h=null!=p?p&&a.closest(p):b("
"))&&h.css("position",a.css("position"));x=function(){var c,f,e;if(!G&&(I=A.height(),c=parseInt(g.css("border-top-width"),10),f=parseInt(g.css("padding-top"),10),d=parseInt(g.css("padding-bottom"),10),n=g.offset().top+c+f,C=g.height(),m&&(v=m=!1,null==p&&(a.insertAfter(h),h.detach()),a.css({position:"",top:"",width:"",bottom:""}).removeClass(t),e=!0),F=a.offset().top-(parseInt(a.css("margin-top"),10)||0)-q, 6 | u=a.outerHeight(!0),r=a.css("float"),h&&h.css({width:a.outerWidth(!0),height:u,display:a.css("display"),"vertical-align":a.css("vertical-align"),"float":r}),e))return l()};x();if(u!==C)return D=void 0,c=q,z=E,l=function(){var b,l,e,k;if(!G&&(e=!1,null!=z&&(--z,0>=z&&(z=E,x(),e=!0)),e||A.height()===I||x(),e=f.scrollTop(),null!=D&&(l=e-D),D=e,m?(w&&(k=e+u+c>C+n,v&&!k&&(v=!1,a.css({position:"fixed",bottom:"",top:c}).trigger("sticky_kit:unbottom"))),eb&&!v&&(c-=l,c=Math.max(b-u,c),c=Math.min(q,c),m&&a.css({top:c+"px"})))):e>F&&(m=!0,b={position:"fixed",top:c},b.width="border-box"===a.css("box-sizing")?a.outerWidth()+"px":a.width()+"px",a.css(b).addClass(t),null==p&&(a.after(h),"left"!==r&&"right"!==r||h.append(a)),a.trigger("sticky_kit:stick")),m&&w&&(null==k&&(k=e+u+c>C+n),!v&&k)))return v=!0,"static"===g.css("position")&&g.css({position:"relative"}), 8 | a.css({position:"absolute",bottom:d,top:"auto"}).trigger("sticky_kit:bottom")},y=function(){x();return l()},H=function(){G=!0;f.off("touchmove",l);f.off("scroll",l);f.off("resize",y);b(document.body).off("sticky_kit:recalc",y);a.off("sticky_kit:detach",H);a.removeData("sticky_kit");a.css({position:"",bottom:"",top:"",width:""});g.position("position","");if(m)return null==p&&("left"!==r&&"right"!==r||a.insertAfter(h),h.remove()),a.removeClass(t)},f.on("touchmove",l),f.on("scroll",l),f.on("resize", 9 | y),b(document.body).on("sticky_kit:recalc",y),a.on("sticky_kit:detach",H),setTimeout(l,0)}};n=0;for(K=this.length;n 2 | 3 | 5 | 8 | 12 | 13 | 
-------------------------------------------------------------------------------- /docs/news/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Changelog • GetHFData 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 44 | 45 | 46 | 47 | 48 | 49 |
50 |
51 | 111 | 112 | 113 |
114 | 115 |
116 |
117 | 121 | 122 |
123 |

124 | Version 1.6 (2018-10-10)

125 |

Minor update:

126 |
    127 |
  • Fixed bug in ghfd_get_ftp_contents for ‘equity’ option
  • 128 |
129 |
130 |
131 |

132 | Version 1.5 (2017-11-27)

133 |

Minor update:

134 |
    135 |
  • Added support for milliseconds in LOB
  • 136 |
137 |
138 |
139 |

140 | Version 1.4 (2017-09-10)

141 |

Major update:

142 |
    143 |
  • Users can now recreate the LOB (limit order book) using order data from Bovespa
  • 144 |
  • Fixed bug for only.dl = TRUE
  • 145 |
146 |
147 |
148 |

149 | Version 1.3 (2017-05-29)

150 |

Major update:

151 |
    152 |
  • Users can now download and aggregate order files (input type.data)
  • 153 |
  • Fixed link to paper
  • 154 |
  • Partial matching for assets is now possible (e.g. use PETR for all stocks or options related to Petrobras)
  • 155 |
  • Implemented an option for only downloading files (this is helpful if you are dealing with order data and will process the files in another R session or other software)
  • 156 |
  • muted message “Using ‘,’ as decimal and ‘.’ as grouping mark. Use read_delim() for more control.”
  • 157 |
158 |
159 |
160 |

161 | Version 1.2.4 (2017-01-30)

162 |

Minor update:

163 |
    164 |
  • Fixed bug in msg output when length(my.assets) > 2
  • 165 |
166 |
167 |
168 |

169 | Version 1.2.3 (2017-01-13)

170 |

Minor update:

171 |
    172 |
  • Fixed bug for nonexistent assets in first date of download process
  • 173 |
  • Changed date inputs to a simpler format (e.g. ‘2016-01-01’ instead of as.Date(‘2016-01-01’))
  • 174 |
175 |
176 |
177 |

178 | Version 1.2.2 (2016-12-05)

179 |

Minor update:

180 |
    181 |
  • Revised apa citation on attach
  • 182 |
  • Fixed some typos in vignette and added link to SSRN paper
  • 183 |
184 |
185 |
186 |

187 | Version 1.2.1 (2016-11-07)

188 |

Minor update with the following changes:

189 |
    190 |
  • The user can now download data from the odd lots equity market (type.market=‘equity-odds’)
  • 191 |
  • Added Henrique Ramos as a contributor
  • 192 |
  • Other minor changes
  • 193 |
194 |
195 |
196 |

197 | Version 1.2.0 (2016-10-14)

198 |

Minor update with the following changes:

199 |
    200 |
  • The function ghfd_get_HF_data now allows for partial matching of asset names and also the download of all assets available in ftp files
  • 201 |
  • Function ghfd_get_available_tickers_from_ftp also returns the type of market in data.frame
  • 202 |
203 |
204 |
205 |

206 | Version 1.1.0 (2016-08-15)

207 |

Major update from initial version with the following changes:

208 |
    209 |
  • The function for finding tickers in the ftp now looks for the closest date in the case that the actual date is missing from the ftp
  • 210 |
  • The function for finding tickers now returns a dataframe with the tickers and number of trades
  • 211 |
  • Added control for bad files
  • 212 |
  • The raw and agg output types were revised
  • 213 |
  • The vignette was revised
  • 214 |
215 |
216 |
217 |

218 | Version 1.0.0 - First commit (2016-07-21)

219 |
220 |
221 | 222 | 240 | 241 |
242 | 243 |
244 | 247 | 248 |
249 |

Site built with pkgdown.

250 |
251 | 252 |
253 |
254 | 255 | 256 | 257 | 258 | 259 | 260 | -------------------------------------------------------------------------------- /docs/pkgdown.css: -------------------------------------------------------------------------------- 1 | /* Sticky footer */ 2 | 3 | /** 4 | * Basic idea: https://philipwalton.github.io/solved-by-flexbox/demos/sticky-footer/ 5 | * Details: https://github.com/philipwalton/solved-by-flexbox/blob/master/assets/css/components/site.css 6 | * 7 | * .Site -> body > .container 8 | * .Site-content -> body > .container .row 9 | * .footer -> footer 10 | * 11 | * Key idea seems to be to ensure that .container and __all its parents__ 12 | * have height set to 100% 13 | * 14 | */ 15 | 16 | html, body { 17 | height: 100%; 18 | } 19 | 20 | body > .container { 21 | display: flex; 22 | height: 100%; 23 | flex-direction: column; 24 | 25 | padding-top: 60px; 26 | } 27 | 28 | body > .container .row { 29 | flex: 1 0 auto; 30 | } 31 | 32 | footer { 33 | margin-top: 45px; 34 | padding: 35px 0 36px; 35 | border-top: 1px solid #e5e5e5; 36 | color: #666; 37 | display: flex; 38 | flex-shrink: 0; 39 | } 40 | footer p { 41 | margin-bottom: 0; 42 | } 43 | footer div { 44 | flex: 1; 45 | } 46 | footer .pkgdown { 47 | text-align: right; 48 | } 49 | footer p { 50 | margin-bottom: 0; 51 | } 52 | 53 | img.icon { 54 | float: right; 55 | } 56 | 57 | img { 58 | max-width: 100%; 59 | } 60 | 61 | /* Typographic tweaking ---------------------------------*/ 62 | 63 | .contents h1.page-header { 64 | margin-top: calc(-60px + 1em); 65 | } 66 | 67 | /* Section anchors ---------------------------------*/ 68 | 69 | a.anchor { 70 | margin-left: -30px; 71 | display:inline-block; 72 | width: 30px; 73 | height: 30px; 74 | visibility: hidden; 75 | 76 | background-image: url(./link.svg); 77 | background-repeat: no-repeat; 78 | background-size: 20px 20px; 79 | background-position: center center; 80 | } 81 | 82 | .hasAnchor:hover a.anchor { 83 | visibility: visible; 84 | } 85 | 86 | @media (max-width: 
767px) { 87 | .hasAnchor:hover a.anchor { 88 | visibility: hidden; 89 | } 90 | } 91 | 92 | 93 | /* Fixes for fixed navbar --------------------------*/ 94 | 95 | .contents h1, .contents h2, .contents h3, .contents h4 { 96 | padding-top: 60px; 97 | margin-top: -40px; 98 | } 99 | 100 | /* Static header placement on mobile devices */ 101 | @media (max-width: 767px) { 102 | .navbar-fixed-top { 103 | position: absolute; 104 | } 105 | .navbar { 106 | padding: 0; 107 | } 108 | } 109 | 110 | 111 | /* Sidebar --------------------------*/ 112 | 113 | #sidebar { 114 | margin-top: 30px; 115 | } 116 | #sidebar h2 { 117 | font-size: 1.5em; 118 | margin-top: 1em; 119 | } 120 | 121 | #sidebar h2:first-child { 122 | margin-top: 0; 123 | } 124 | 125 | #sidebar .list-unstyled li { 126 | margin-bottom: 0.5em; 127 | } 128 | 129 | .orcid { 130 | height: 16px; 131 | vertical-align: middle; 132 | } 133 | 134 | /* Reference index & topics ----------------------------------------------- */ 135 | 136 | .ref-index th {font-weight: normal;} 137 | 138 | .ref-index td {vertical-align: top;} 139 | .ref-index .alias {width: 40%;} 140 | .ref-index .title {width: 60%;} 141 | 142 | .ref-index .alias {width: 40%;} 143 | .ref-index .title {width: 60%;} 144 | 145 | .ref-arguments th {text-align: right; padding-right: 10px;} 146 | .ref-arguments th, .ref-arguments td {vertical-align: top;} 147 | .ref-arguments .name {width: 20%;} 148 | .ref-arguments .desc {width: 80%;} 149 | 150 | /* Nice scrolling for wide elements --------------------------------------- */ 151 | 152 | table { 153 | display: block; 154 | overflow: auto; 155 | } 156 | 157 | /* Syntax highlighting ---------------------------------------------------- */ 158 | 159 | pre { 160 | word-wrap: normal; 161 | word-break: normal; 162 | border: 1px solid #eee; 163 | } 164 | 165 | pre, code { 166 | background-color: #f8f8f8; 167 | color: #333; 168 | } 169 | 170 | pre code { 171 | overflow: auto; 172 | word-wrap: normal; 173 | white-space: pre; 174 | 
} 175 | 176 | pre .img { 177 | margin: 5px 0; 178 | } 179 | 180 | pre .img img { 181 | background-color: #fff; 182 | display: block; 183 | height: auto; 184 | } 185 | 186 | code a, pre a { 187 | color: #375f84; 188 | } 189 | 190 | a.sourceLine:hover { 191 | text-decoration: none; 192 | } 193 | 194 | .fl {color: #1514b5;} 195 | .fu {color: #000000;} /* function */ 196 | .ch,.st {color: #036a07;} /* string */ 197 | .kw {color: #264D66;} /* keyword */ 198 | .co {color: #888888;} /* comment */ 199 | 200 | .message { color: black; font-weight: bolder;} 201 | .error { color: orange; font-weight: bolder;} 202 | .warning { color: #6A0366; font-weight: bolder;} 203 | 204 | /* Clipboard --------------------------*/ 205 | 206 | .hasCopyButton { 207 | position: relative; 208 | } 209 | 210 | .btn-copy-ex { 211 | position: absolute; 212 | right: 0; 213 | top: 0; 214 | visibility: hidden; 215 | } 216 | 217 | .hasCopyButton:hover button.btn-copy-ex { 218 | visibility: visible; 219 | } 220 | 221 | /* mark.js ----------------------------*/ 222 | 223 | mark { 224 | background-color: rgba(255, 255, 51, 0.5); 225 | border-bottom: 2px solid rgba(255, 153, 51, 0.3); 226 | padding: 1px; 227 | } 228 | 229 | /* vertical spacing after htmlwidgets */ 230 | .html-widget { 231 | margin-bottom: 10px; 232 | } 233 | -------------------------------------------------------------------------------- /docs/pkgdown.js: -------------------------------------------------------------------------------- 1 | /* http://gregfranko.com/blog/jquery-best-practices/ */ 2 | (function($) { 3 | $(function() { 4 | 5 | $("#sidebar") 6 | .stick_in_parent({offset_top: 40}) 7 | .on('sticky_kit:bottom', function(e) { 8 | $(this).parent().css('position', 'static'); 9 | }) 10 | .on('sticky_kit:unbottom', function(e) { 11 | $(this).parent().css('position', 'relative'); 12 | }); 13 | 14 | $('body').scrollspy({ 15 | target: '#sidebar', 16 | offset: 60 17 | }); 18 | 19 | $('[data-toggle="tooltip"]').tooltip(); 20 | 21 | var 
cur_path = paths(location.pathname); 22 | var links = $("#navbar ul li a"); 23 | var max_length = -1; 24 | var pos = -1; 25 | for (var i = 0; i < links.length; i++) { 26 | if (links[i].getAttribute("href") === "#") 27 | continue; 28 | var path = paths(links[i].pathname); 29 | 30 | var length = prefix_length(cur_path, path); 31 | if (length > max_length) { 32 | max_length = length; 33 | pos = i; 34 | } 35 | } 36 | 37 | // Add class to parent
  • , and enclosing
  • if in dropdown 38 | if (pos >= 0) { 39 | var menu_anchor = $(links[pos]); 40 | menu_anchor.parent().addClass("active"); 41 | menu_anchor.closest("li.dropdown").addClass("active"); 42 | } 43 | }); 44 | 45 | function paths(pathname) { 46 | var pieces = pathname.split("/"); 47 | pieces.shift(); // always starts with / 48 | 49 | var end = pieces[pieces.length - 1]; 50 | if (end === "index.html" || end === "") 51 | pieces.pop(); 52 | return(pieces); 53 | } 54 | 55 | function prefix_length(needle, haystack) { 56 | if (needle.length > haystack.length) 57 | return(0); 58 | 59 | // Special case for length-0 haystack, since for loop won't run 60 | if (haystack.length === 0) { 61 | return(needle.length === 0 ? 1 : 0); 62 | } 63 | 64 | for (var i = 0; i < haystack.length; i++) { 65 | if (needle[i] != haystack[i]) 66 | return(i); 67 | } 68 | 69 | return(haystack.length); 70 | } 71 | 72 | /* Clipboard --------------------------*/ 73 | 74 | function changeTooltipMessage(element, msg) { 75 | var tooltipOriginalTitle=element.getAttribute('data-original-title'); 76 | element.setAttribute('data-original-title', msg); 77 | $(element).tooltip('show'); 78 | element.setAttribute('data-original-title', tooltipOriginalTitle); 79 | } 80 | 81 | if(Clipboard.isSupported()) { 82 | $(document).ready(function() { 83 | var copyButton = ""; 84 | 85 | $(".examples, div.sourceCode").addClass("hasCopyButton"); 86 | 87 | // Insert copy buttons: 88 | $(copyButton).prependTo(".hasCopyButton"); 89 | 90 | // Initialize tooltips: 91 | $('.btn-copy-ex').tooltip({container: 'body'}); 92 | 93 | // Initialize clipboard: 94 | var clipboardBtnCopies = new Clipboard('[data-clipboard-copy]', { 95 | text: function(trigger) { 96 | return trigger.parentNode.textContent; 97 | } 98 | }); 99 | 100 | clipboardBtnCopies.on('success', function(e) { 101 | changeTooltipMessage(e.trigger, 'Copied!'); 102 | e.clearSelection(); 103 | }); 104 | 105 | clipboardBtnCopies.on('error', function() { 106 | 
changeTooltipMessage(e.trigger,'Press Ctrl+C or Command+C to copy'); 107 | }); 108 | }); 109 | } 110 | })(window.jQuery || window.$) 111 | -------------------------------------------------------------------------------- /docs/pkgdown.yml: -------------------------------------------------------------------------------- 1 | pandoc: 1.19.2.1 2 | pkgdown: 1.1.0 3 | pkgdown_sha: ~ 4 | articles: 5 | ghfd-vignette-LOB: ghfd-vignette-LOB.html 6 | ghfd-vignette-Orders: ghfd-vignette-Orders.html 7 | ghfd-vignette-Trades: ghfd-vignette-Trades.html 8 | 9 | -------------------------------------------------------------------------------- /docs/reference/add.order.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Adds an order to the LOB — add.order • GetHFData 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 47 | 48 | 49 | 50 | 51 | 52 |
    53 |
    54 | 114 | 115 | 116 |
    117 | 118 |
    119 |
    120 | 125 | 126 |
    127 | 128 |

    Adds an order to the LOB

    129 | 130 |
    131 | 132 |
    add.order(my.lob, order.in, silent = TRUE)
    133 | 134 |

    Arguments

    135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 |
    my.lob

    A LOB (order book)

    order.in

    An order from the data

    silent

    Should the function run silently, without printing progress? (TRUE (default) or FALSE)

    150 | 151 |

    Value

    152 | 153 |

    An LOB with the new order

    154 | 155 | 156 |

    Examples

    157 |
    # no example (internal) 158 |
    159 |
    160 | 171 |
    172 | 173 |
    174 | 177 | 178 |
    179 |

    Site built with pkgdown.

    180 |
    181 | 182 |
    183 |
    184 | 185 | 186 | 187 | 188 | 189 | 190 | -------------------------------------------------------------------------------- /docs/reference/ghfd_build_lob.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Building LOB (limit order book) from orders — ghfd_build_lob • GetHFData 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 47 | 48 | 49 | 50 | 51 | 52 |
    53 |
    54 | 114 | 115 | 116 |
    117 | 118 |
    119 |
    120 | 125 | 126 |
    127 | 128 |

    Building LOB (limit order book) from orders

    129 | 130 |
    131 | 132 |
    ghfd_build_lob(df.orders, silent = TRUE)
    133 | 134 |

    Arguments

    135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 |
    df.orders

    A dataframe, output from ghfd_get_HF_data

    silent

    Should the function run silently, without printing progress? (TRUE (default) or FALSE)

    146 | 147 |

    Value

    148 | 149 |

    A dataframe with information about LOB

    150 | 151 | 152 |

    Examples

    153 |
    # NOT RUN {
    library(GetHFData)
    first.time <- '11:00:00'
    last.time <- '17:00:00'
    first.date <- as.Date('2015-11-03')
    last.date <- as.Date('2015-11-03')
    type.output <- 'raw'
    type.data <- 'orders'
    type.market <- 'equity-odds'
    my.assets <- 'PETR4F' # odd-lot ticker; an assumed example value

    df.out <- ghfd_get_HF_data(my.assets = my.assets,
                               type.market = type.market,
                               type.data = type.data,
                               first.date = first.date,
                               last.date = last.date,
                               first.time = first.time,
                               last.time = last.time,
                               type.output = type.output)

    df.lob <- ghfd_build_lob(df.out)
    # }
    174 |
    175 | 186 |
    187 | 188 |
    189 | 192 | 193 |
    194 |

    --------------------------------------------------------------------------------
    /docs/reference/ghfd_download_file.html:
    --------------------------------------------------------------------------------
    Downloads a single file from Bovespa ftp — ghfd_download_file • GetHFData

    This function takes as input an ftp address and the name of the downloaded file on the local drive, and downloads the corresponding file. Returns TRUE if it worked and FALSE otherwise.

    ghfd_download_file(my.ftp, out.file, dl.dir = "Dl Files",
      max.dl.tries = 10)

    Arguments

    my.ftp

    The complete ftp address, including the file name, to download the file from

    out.file

    Name of downloaded file with HFT data from Bovespa

    dl.dir

    The folder to download the zip files (default = 'Dl Files')

    max.dl.tries

    Maximum attempts to download the files from ftp

    Value

    TRUE if successful, FALSE otherwise
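The retry behaviour controlled by max.dl.tries can be pictured with a small sketch. The helper name try_download and its dl.fct argument are hypothetical, introduced only for illustration; this is not the package's implementation, just one way a bounded retry loop returning TRUE or FALSE might look.

```r
# Hypothetical helper (not part of GetHFData): call dl.fct() until it
# succeeds or max.dl.tries attempts have been made.
try_download <- function(dl.fct, max.dl.tries = 10) {
  for (i.try in seq_len(max.dl.tries)) {
    # An error raised by dl.fct() counts as a failed attempt
    ok <- tryCatch({
      dl.fct()
      TRUE
    }, error = function(e) FALSE)
    if (ok) return(TRUE)
  }
  FALSE
}

# Stand-in downloader (an assumption for the demo): fails twice, then succeeds
n.calls <- 0
flaky.dl <- function() {
  n.calls <<- n.calls + 1
  if (n.calls < 3) stop('connection failed')
  invisible(NULL)
}

worked <- try_download(flaky.dl, max.dl.tries = 5)
print(worked)
```

With a real file, dl.fct could wrap a call such as utils::download.file(my.ftp, destfile = out.file, mode = 'wb').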

    Examples

    my.ftp <- 'ftp://ftp.bmf.com.br/MarketData/Bovespa-Opcoes/NEG_OPCOES_20151229.zip'
    out.file <- 'temp.zip'

    # NOT RUN {
    ghfd_download_file(my.ftp = my.ftp, out.file = out.file)
    # }

    --------------------------------------------------------------------------------
    /docs/reference/ghfd_get_available_tickers_from_file.html:
    --------------------------------------------------------------------------------
    Function to get available tickers from downloaded zip file — ghfd_get_available_tickers_from_file • GetHFData

    This function reads the zip file downloaded from Bovespa and outputs a dataframe with the tickers found in the file, the number of trades for each ticker and the file name

    ghfd_get_available_tickers_from_file(out.file)

    Arguments

    out.file

    Name of downloaded file with HFT data from Bovespa

    Value

    A dataframe with the number of trades for each ticker found in file

    Examples

    ## get file from package (usually this would have been downloaded from the ftp)
    out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData")

    df.tickers <- ghfd_get_available_tickers_from_file(out.file)

    print(head(df.tickers))
    #>   tickers n.obs
    #> 1 PETRL80  2882
    #> 2  PETRL9   985
    #> 3 VALEL14   754
    #> 4 VALEL43   679
    #> 5 PETRL70   514
    #> 6 PETRX80   507
    #>                                                                  f.name
    #> 1 /home/msperlin/GitRepo/GetHFData/inst/extdata/NEG_OPCOES_20151126.zip
    #> 2 /home/msperlin/GitRepo/GetHFData/inst/extdata/NEG_OPCOES_20151126.zip
    #> 3 /home/msperlin/GitRepo/GetHFData/inst/extdata/NEG_OPCOES_20151126.zip
    #> 4 /home/msperlin/GitRepo/GetHFData/inst/extdata/NEG_OPCOES_20151126.zip
    #> 5 /home/msperlin/GitRepo/GetHFData/inst/extdata/NEG_OPCOES_20151126.zip
    #> 6 /home/msperlin/GitRepo/GetHFData/inst/extdata/NEG_OPCOES_20151126.zip
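A table of tickers and trade counts like the one printed above could be produced from a raw ticker column roughly as follows. The input vector tickers.raw is made-up data and the steps are an assumption about the general approach, not the package's actual code; only the column names tickers and n.obs are taken from the output above.

```r
# Hypothetical ticker column, one entry per trade (made-up data)
tickers.raw <- c('PETRL80', 'PETRL9', 'PETRL80', 'VALEL14', 'PETRL80', 'PETRL9')

# Count trades per ticker and sort from most to least traded
tab <- sort(table(tickers.raw), decreasing = TRUE)

# Same column names as in the printed example (tickers, n.obs)
df.tickers <- data.frame(tickers = names(tab),
                         n.obs = as.integer(tab),
                         stringsAsFactors = FALSE)

print(df.tickers)
```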

    --------------------------------------------------------------------------------
    /docs/reference/ghfd_get_available_tickers_from_ftp.html:
    --------------------------------------------------------------------------------
    Function to get available tickers from ftp — ghfd_get_available_tickers_from_ftp • GetHFData

    This function reads the Bovespa ftp for a given market/date and outputs a data.frame with the available tickers, the number of trades for each ticker and the file name

    ghfd_get_available_tickers_from_ftp(my.date = "2015-11-03",
      type.market = "equity", type.data = "trades", dl.dir = "ftp files",
      max.dl.tries = 10)

    Arguments

    my.date

    A single date to check tickers in ftp (e.g. '2015-11-03')

    type.market

    The type of market to download data from ('equity', 'equity-odds', 'options', 'BMF').

    type.data

    The type of financial data to download and aggregate ('trades' or 'orders').

    dl.dir

    The folder to download the zip files (default = 'ftp files')

    max.dl.tries

    Maximum attempts to download the files from ftp

    Value

    A data.frame with the tickers, number of found trades and file name

    Examples

    # NOT RUN {
    df.tickers <- ghfd_get_available_tickers_from_ftp(my.date = '2015-11-03',
                                                      type.market = 'BMF')

    print(head(df.tickers))
    # }

    --------------------------------------------------------------------------------
    /docs/reference/ghfd_get_ftp_contents.html:
    --------------------------------------------------------------------------------
    Gets the contents of Bovespa ftp — ghfd_get_ftp_contents • GetHFData

    This function will access the Bovespa ftp and return a vector with all files related to trades (all others are ignored)

    ghfd_get_ftp_contents(type.market = "equity", max.dl.tries = 10,
      type.data = "trades")

    Arguments

    type.market

    The type of market to download data from ('equity', 'equity-odds', 'options', 'BMF').

    max.dl.tries

    Maximum attempts to download the files from ftp

    type.data

    The type of financial data to download and aggregate ('trades' or 'orders').

    Value

    A list with all files from the ftp that are related to executed trades

    Examples

    # NOT RUN {
    ftp.files <- ghfd_get_ftp_contents(type.market = 'equity')
    print(ftp.files)
    # }

    --------------------------------------------------------------------------------
    /docs/reference/ghfd_read_file.orders.html:
    --------------------------------------------------------------------------------
    Reads zip file downloaded from Bovespa ftp (orders) - INTERNAL USE — ghfd_read_file.orders • GetHFData

    Reads zip file downloaded from Bovespa ftp (orders) - INTERNAL USE

    ghfd_read_file.orders(out.file, my.assets = NULL, type.matching = NULL,
      first.time = "10:00:00", last.time = "17:00:00",
      type.output = "agg", agg.diff = "15 min")

    Arguments

    out.file

    Name of zip file

    my.assets

    The tickers (symbols) of the desired assets to import data (e.g. c('PETR4', 'VALE5')). The function allows partial matching (e.g. 'PETR' for all assets related to Petrobras). Default is set to NULL (download all available tickers)

    type.matching

    Type of matching for asset names in data ('exact' or 'partial')

    first.time

    The first intraday period to import the data. All trades/orders before this time of day are ignored. As character, e.g. '10:00:00'.

    last.time

    The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. '18:00:00'.

    type.output

    Defines the type of output of the data. The choice 'agg' outputs aggregated data for time intervals defined in agg.diff. The choice 'raw' outputs the raw, tick by tick/order by order, data from the zip files.

    agg.diff

    The time interval used in the aggregation of data. Only used for type.output='agg'. It should contain an integer followed by a time unit ('sec' or 'secs', 'min' or 'mins', 'hour' or 'hours', 'day' or 'days'). Example: agg.diff = '15 mins', agg.diff = '1 hour'.
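To make the agg.diff argument concrete, the sketch below bins made-up trade timestamps into 15-minute intervals with base R's cut(), which accepts interval strings of the same shape ('15 min', '1 hour'). The data and the summary columns are assumptions chosen for illustration; this is not how the package itself aggregates.

```r
# Made-up tick data: five trades between 10:00 and 10:30
trade.time <- as.POSIXct('2015-11-03 10:00:00', tz = 'UTC') +
  c(0, 300, 900, 1200, 1800)
price <- c(10.0, 10.2, 10.1, 10.4, 10.3)

agg.diff <- '15 min'

# cut() on date-times accepts an integer followed by a time unit
interval <- cut(trade.time, breaks = agg.diff)

# Per-interval summaries: number of trades and last traded price
n.trades <- tapply(price, interval, length)
last.price <- tapply(price, interval, function(x) x[length(x)])

print(data.frame(n.trades = as.integer(n.trades),
                 last.price = as.numeric(last.price)))
```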

    Value

    A dataframe with trade data (aggregated or raw)

    Examples

    # no example

    --------------------------------------------------------------------------------
    /docs/reference/ghfd_read_file.trades.html:
    --------------------------------------------------------------------------------
    Reads zip file downloaded from Bovespa ftp (trades) - INTERNAL USE — ghfd_read_file.trades • GetHFData

    Reads zip file downloaded from Bovespa ftp (trades) - INTERNAL USE

    ghfd_read_file.trades(out.file, my.assets = NULL, type.matching = NULL,
      first.time = "10:00:00", last.time = "17:00:00",
      type.output = "agg", agg.diff = "15 min")

    Arguments

    out.file

    Name of zip file

    my.assets

    The tickers (symbols) of the desired assets to import data (e.g. c('PETR4', 'VALE5')). The function allows partial matching (e.g. 'PETR' for all assets related to Petrobras). Default is set to NULL (download all available tickers)

    type.matching

    Type of matching for asset names in data ('exact' or 'partial')

    first.time

    The first intraday period to import the data. All trades/orders before this time of day are ignored. As character, e.g. '10:00:00'.

    last.time

    The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. '18:00:00'.

    type.output

    Defines the type of output of the data. The choice 'agg' outputs aggregated data for time intervals defined in agg.diff. The choice 'raw' outputs the raw, tick by tick/order by order, data from the zip files.

    agg.diff

    The time interval used in the aggregation of data. Only used for type.output='agg'. It should contain an integer followed by a time unit ('sec' or 'secs', 'min' or 'mins', 'hour' or 'hours', 'day' or 'days'). Example: agg.diff = '15 mins', agg.diff = '1 hour'.
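The difference between 'exact' and 'partial' matching of my.assets, described in the arguments above, can be sketched as follows. The helper match_tickers is hypothetical and treats a partial match as a literal substring match, which is an assumption; the package may match differently.

```r
# Hypothetical helper (not part of GetHFData)
match_tickers <- function(all.tickers, my.assets = NULL, type.matching = 'exact') {
  # NULL means no filtering: keep every ticker
  if (is.null(my.assets)) return(all.tickers)

  if (type.matching == 'exact') {
    idx <- all.tickers %in% my.assets
  } else {
    # fixed = TRUE treats each entry of my.assets as a literal substring;
    # a ticker is kept if it contains any of them
    idx <- Reduce(`|`, lapply(my.assets, grepl, x = all.tickers, fixed = TRUE))
  }
  all.tickers[idx]
}

all.tickers <- c('PETR4', 'PETR3', 'VALE5', 'PETRL80')

exact.out <- match_tickers(all.tickers, 'PETR4', type.matching = 'exact')
partial.out <- match_tickers(all.tickers, 'PETR', type.matching = 'partial')

print(exact.out)
print(partial.out)
```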

    Value

    A dataframe with trade data (aggregated or raw)

    Examples

    # no example

    --------------------------------------------------------------------------------
    /docs/reference/index.html:
    --------------------------------------------------------------------------------
    Function reference • GetHFData
    All functions

    add.order()

    Adds an order to the LOB

    ghfd_build_lob()

    Building LOB (limit order book) from orders

    ghfd_download_file()

    Downloads a single file from Bovespa ftp

    ghfd_get_HF_data()

    Downloads and aggregates high frequency trading data directly from the Bovespa ftp

    ghfd_get_available_tickers_from_file()

    Function to get available tickers from downloaded zip file

    ghfd_get_available_tickers_from_ftp()

    Function to get available tickers from ftp

    ghfd_get_ftp_contents()

    Gets the contents of Bovespa ftp

    ghfd_read_file()

    Reads zip file downloaded from Bovespa ftp (trades or orders)

    ghfd_read_file.orders()

    Reads zip file downloaded from Bovespa ftp (orders) - INTERNAL USE

    ghfd_read_file.trades()

    Reads zip file downloaded from Bovespa ftp (trades) - INTERNAL USE

    organize.lob()

    Organizes LOB (internal function)

    print(<lob>)

    Prints the LOB

    process.lob.from.df()

    Process LOB from asset dataframe

    --------------------------------------------------------------------------------
    /docs/reference/organize.lob.html:
    --------------------------------------------------------------------------------
    Organizes LOB (internal function) — organize.lob • GetHFData

    This internal recursive function organizes the LOB by making sure that all prices and times are ordered. Every time prices on the bid and ask sides match, it creates a trade and modifies the LOB accordingly.
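The description above can be illustrated with a minimal recursive sketch. Everything here is an assumption made for the example: the LOB is represented as a list with bid, ask and trades data frames, a "match" is taken to mean the best bid is at or above the best ask, and the trade is recorded at the ask price. The package's internal layout and matching rules may differ.

```r
# Hypothetical sketch (not the package's organize.lob)
organize_lob_sketch <- function(lob) {
  # Bids: highest price first; asks: lowest price first; ties broken by time
  lob$bid <- lob$bid[order(-lob$bid$price, lob$bid$time), ]
  lob$ask <- lob$ask[order(lob$ask$price, lob$ask$time), ]

  # An empty side or no crossing prices: the book is already organized
  if (nrow(lob$bid) == 0 || nrow(lob$ask) == 0 ||
      lob$bid$price[1] < lob$ask$price[1]) {
    return(lob)
  }

  # Best bid meets best ask: trade the smaller of the two quantities
  qty <- min(lob$bid$qty[1], lob$ask$qty[1])
  lob$trades <- rbind(lob$trades,
                      data.frame(price = lob$ask$price[1], qty = qty))
  lob$bid$qty[1] <- lob$bid$qty[1] - qty
  lob$ask$qty[1] <- lob$ask$qty[1] - qty

  # Drop fully filled orders, then recurse until the book no longer crosses
  lob$bid <- lob$bid[lob$bid$qty > 0, ]
  lob$ask <- lob$ask[lob$ask$qty > 0, ]
  organize_lob_sketch(lob)
}

# Made-up book in which the best bid (10.2) crosses the best ask (10.1)
lob <- list(
  bid = data.frame(price = c(10.0, 10.2), qty = c(100, 50), time = c(1, 2)),
  ask = data.frame(price = c(10.3, 10.1), qty = c(80, 70), time = c(3, 4)),
  trades = NULL
)

out <- organize_lob_sketch(lob)
print(out$trades)
```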

    organize.lob(my.lob, silent = TRUE)

    Arguments

    my.lob

    A LOB (order book)

    silent

    Should the function run silently, without printing progress? (TRUE (default) or FALSE)

    Value

    An organized LOB

    Examples

    # no examples (internal)

    --------------------------------------------------------------------------------
    /docs/reference/print.lob.html:
    --------------------------------------------------------------------------------
    Prints the LOB — print.lob • GetHFData

    Prints the LOB

    # S3 method for lob
    print(my.lob, max.level = 3)

    Arguments

    my.lob

    A LOB (order book)

    max.level

    Max level of lob to print

    Value

    nothing

    Examples

    # no example (internal)

    --------------------------------------------------------------------------------
    /docs/reference/process.lob.from.df.html:
    --------------------------------------------------------------------------------
    Process LOB from asset dataframe — process.lob.from.df • GetHFData

    Process LOB from asset dataframe

    process.lob.from.df(asset.df, silent = T)

    Arguments

    asset.df

    A dataframe with orders for a single asset

    silent

    Should the function run silently, without printing progress? (TRUE (default) or FALSE)

    Value

    The lob for the single asset

    Examples

    # no example (internal)
    --------------------------------------------------------------------------------
    /inst/CITATION:
    --------------------------------------------------------------------------------
    bibentry(bibtype = "Manual",
             title = "GetHFData: A R Package for Downloading and Aggregating High Frequency Trading Data from Bovespa",
             author = c(person("Marcelo", "Perlin"),
                        person("Henrique", "Ramos")),
             year = 2016,
             journal = 'Available at SSRN',
             url = "https://ssrn.com/abstract=2824058")
    --------------------------------------------------------------------------------
    /inst/extdata/Example_Orders.RData:
    --------------------------------------------------------------------------------
    https://raw.githubusercontent.com/msperlin/GetHFData/33328a5920a1087fc3729893d17f532d5970f349/inst/extdata/Example_Orders.RData
    --------------------------------------------------------------------------------
    /inst/extdata/NEG_OPCOES_20151126.zip:
    --------------------------------------------------------------------------------
    https://raw.githubusercontent.com/msperlin/GetHFData/33328a5920a1087fc3729893d17f532d5970f349/inst/extdata/NEG_OPCOES_20151126.zip
    --------------------------------------------------------------------------------
    /man/add.order.Rd:
    --------------------------------------------------------------------------------
    % Generated by roxygen2: do not edit by hand
    % Please edit documentation in R/ghfd_lob_fcts.R
    \name{add.order}
    \alias{add.order}
    \title{Adds an order to the LOB}
    \usage{
    add.order(my.lob, order.in, silent = TRUE)
    }
    \arguments{
    \item{my.lob}{A LOB (order book)}

    \item{order.in}{An order from the data}

    \item{silent}{Should the function run silently, without printing progress? (TRUE (default) or FALSE)}
    }
    \value{
    An LOB with the new order
    }
    \description{
    Adds an order to the LOB
    }
    \examples{
    # no example (internal)
    }
    --------------------------------------------------------------------------------
    /man/ghfd_build_lob.Rd:
    --------------------------------------------------------------------------------
    % Generated by roxygen2: do not edit by hand
    % Please edit documentation in R/ghfd_build_lob.R
    \name{ghfd_build_lob}
    \alias{ghfd_build_lob}
    \title{Building LOB (limit order book) from orders}
    \usage{
    ghfd_build_lob(df.orders, silent = TRUE)
    }
    \arguments{
    \item{df.orders}{A dataframe, output from ghfd_get_HF_data}

    \item{silent}{Should the function run silently, without printing progress? (TRUE (default) or FALSE)}
    }
    \value{
    A dataframe with information about LOB
    }
    \description{
    Building LOB (limit order book) from orders
    }
    \examples{
    \dontrun{
    library(GetHFData)

    # my.assets was undefined in the original example; NULL (the documented
    # default) imports all available tickers
    my.assets <- NULL
    first.time <- '11:00:00'
    last.time <- '17:00:00'
    first.date <- as.Date('2015-11-03')
    last.date <- as.Date('2015-11-03')
    type.output <- 'raw'
    type.data <- 'orders'
    type.market <- 'equity-odds'

    df.out <- ghfd_get_HF_data(my.assets = my.assets,
                               type.market = type.market,
                               type.data = type.data,
                               first.date = first.date,
                               last.date = last.date,
                               first.time = first.time,
                               last.time = last.time,
                               type.output = type.output)

    df.lob <- ghfd_build_lob(df.out)
    }
    }
    --------------------------------------------------------------------------------
    /man/ghfd_download_file.Rd:
    --------------------------------------------------------------------------------
    % Generated by roxygen2: do not edit by hand
    % Please edit documentation in R/ghfd_download_file.R
    \name{ghfd_download_file}
    \alias{ghfd_download_file}
    \title{Downloads a single file from Bovespa ftp}
    \usage{
    ghfd_download_file(my.ftp, out.file, dl.dir = "Dl Files", max.dl.tries = 10)
    }
    \arguments{
    \item{my.ftp}{The complete ftp address, including the file name, to download the file from}

    \item{out.file}{Name of downloaded file with HFT data from Bovespa}

    \item{dl.dir}{The folder to download the zip files (default = 'Dl Files')}

    \item{max.dl.tries}{Maximum attempts to download the files from ftp}
    }
    \value{
    TRUE if successful, FALSE otherwise
    }
    \description{
    This function takes as input an ftp address and the name of the downloaded file on the local drive,
    and downloads the corresponding file. Returns TRUE if it worked and FALSE otherwise.
    }
    \examples{

    my.ftp <- 'ftp://ftp.bmf.com.br/MarketData/Bovespa-Opcoes/NEG_OPCOES_20151229.zip'
    out.file <- 'temp.zip'

    \dontrun{
    ghfd_download_file(my.ftp = my.ftp, out.file = out.file)
    }
    }
    --------------------------------------------------------------------------------
    /man/ghfd_get_HF_data.Rd:
    --------------------------------------------------------------------------------
    % Generated by roxygen2: do not edit by hand
    % Please edit documentation in R/ghfd_get_HF_data.R
    \name{ghfd_get_HF_data}
    \alias{ghfd_get_HF_data}
    \title{Downloads and aggregates high frequency trading data directly from the Bovespa ftp}
    \usage{
    ghfd_get_HF_data(
      my.assets = NULL,
      type.matching = "exact",
      type.market = "equity",
      type.data = "trades",
      first.date = "2016-01-01",
      last.date = "2016-01-05",
      first.time = NULL,
      last.time = NULL,
      type.output = "agg",
      agg.diff = "15 min",
      dl.dir = "ftp files",
      max.dl.tries = 10,
      clean.files = FALSE,
      only.dl = FALSE
    )
    }
    \arguments{
    \item{my.assets}{The tickers (symbols) of the desired assets to import data (e.g. c('PETR4', 'VALE5')).
    The function allows partial matching (e.g. 'PETR' for all assets related to Petrobras). Default is set to NULL (download all available tickers)}

    \item{type.matching}{Type of matching for asset names in data ('exact' or 'partial')}

    \item{type.market}{The type of market to download data from ('equity', 'equity-odds', 'options', 'BMF').}

    \item{type.data}{The type of financial data to download and aggregate ('trades' or 'orders').}

    \item{first.date}{The first date of the imported data (e.g. '2016-01-01')}

    \item{last.date}{The last date of the imported data (e.g. '2016-01-05')}

    \item{first.time}{The first intraday period to import the data. All trades/orders before this time of day are ignored. As character, e.g. '10:00:00'.}

    \item{last.time}{The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. '18:00:00'.}

    \item{type.output}{Defines the type of output of the data. The choice 'agg' outputs aggregated data for time intervals defined in agg.diff.
    The choice 'raw' outputs the raw, tick by tick/order by order, data from the zip files.}

    \item{agg.diff}{The time interval used in the aggregation of data. Only used for type.output='agg'. It should contain an integer followed by a time unit ('sec' or 'secs', 'min' or 'mins', 'hour' or 'hours', 'day' or 'days').
    Example: agg.diff = '15 mins', agg.diff = '1 hour'.}

    \item{dl.dir}{The folder to download the zip files (default = 'ftp files')}

    \item{max.dl.tries}{Maximum attempts to download the files from ftp}

    \item{clean.files}{Logical. Should the files be removed after reading them? (TRUE or FALSE)}

    \item{only.dl}{Logical. Should the function only download the files? (TRUE or FALSE).
    This is useful if you just want the file for later analysis}
    }
    \value{
    A dataframe with the financial data in the raw format (tick by tick) or aggregated
    }
    \description{
    This function downloads zip files containing trades from Bovespa's ftp (ftp://ftp.bmf.com.br/MarketData/) and imports them into R.
    See the vignette and examples for more details on how to use the function.
    }
    \examples{

    my.assets <- 'ABEVA69'
    type.market <- 'options'
    first.date <- as.Date('2015-12-29')
    last.date <- as.Date('2015-12-29')

    \dontrun{
    # Arguments are passed by name: type.market, first.date and last.date are
    # not the 2nd to 4th formal arguments of ghfd_get_HF_data
    df.out <- ghfd_get_HF_data(my.assets = my.assets,
                               type.market = type.market,
                               first.date = first.date,
                               last.date = last.date)
    }
    }
    --------------------------------------------------------------------------------
    /man/ghfd_get_available_tickers_from_file.Rd:
    --------------------------------------------------------------------------------
    % Generated by roxygen2: do not edit by hand
    % Please edit documentation in R/ghfd_get_available_tickers_from_file.R
    \name{ghfd_get_available_tickers_from_file}
    \alias{ghfd_get_available_tickers_from_file}
    \title{Function to get available tickers from downloaded zip file}
    \usage{
    ghfd_get_available_tickers_from_file(out.file)
    }
    \arguments{
    \item{out.file}{Name of downloaded file with HFT data from Bovespa}
    }
    \value{
    A dataframe with the number of trades for each ticker found in file
    }
    \description{
    This function reads the zip file downloaded from Bovespa and outputs
    a dataframe with the tickers found in the file, the number of trades
    for each ticker and the file name
    }
    \examples{

    ## get file from package (usually this would have been downloaded from the ftp)
    out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData")

    df.tickers <- ghfd_get_available_tickers_from_file(out.file)

    print(head(df.tickers))
    }
    --------------------------------------------------------------------------------
    /man/ghfd_get_available_tickers_from_ftp.Rd:
    --------------------------------------------------------------------------------
    % Generated by roxygen2: do not edit by hand
    % Please edit documentation in R/ghfd_get_available_tickers_from_ftp.R
    \name{ghfd_get_available_tickers_from_ftp}
    \alias{ghfd_get_available_tickers_from_ftp}
    \title{Function to get available tickers from ftp}
    \usage{
    ghfd_get_available_tickers_from_ftp(
      my.date = "2015-11-03",
      type.market = "equity",
      type.data = "trades",
      dl.dir = "ftp files",
      max.dl.tries = 10
    )
    }
    \arguments{
    \item{my.date}{A single date to check tickers in ftp (e.g. '2015-11-03')}

    \item{type.market}{The type of market to download data from ('equity', 'equity-odds', 'options', 'BMF').}

    \item{type.data}{The type of financial data to download and aggregate ('trades' or 'orders').}

    \item{dl.dir}{The folder to download the zip files (default = 'ftp files')}

    \item{max.dl.tries}{Maximum attempts to download the files from ftp}
    }
    \value{
    A data.frame with the tickers, number of found trades and file name
    }
    \description{
    This function reads the Bovespa ftp for a given market/date and outputs
    a data.frame with the available tickers, the number of trades
    for each ticker and the file name
    }
    \examples{

    \dontrun{
    df.tickers <- ghfd_get_available_tickers_from_ftp(my.date = '2015-11-03',
                                                      type.market = 'BMF')

    print(head(df.tickers))
    }
    }
    --------------------------------------------------------------------------------
    /man/ghfd_get_ftp_contents.Rd:
    --------------------------------------------------------------------------------
    % Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ghfd_get_ftp_contents.R 3 | \name{ghfd_get_ftp_contents} 4 | \alias{ghfd_get_ftp_contents} 5 | \title{Gets the contents of Bovespa ftp} 6 | \usage{ 7 | ghfd_get_ftp_contents( 8 | type.market = "equity", 9 | max.dl.tries = 10, 10 | type.data = "trades" 11 | ) 12 | } 13 | \arguments{ 14 | \item{type.market}{The type of market to download data from ('equity', 'equity-odds','options', 'BMF' ).} 15 | 16 | \item{max.dl.tries}{Maximum attempts to download the files from ftp} 17 | 18 | \item{type.data}{The type of financial data to download and aggregate ('trades' or 'orders').} 19 | } 20 | \value{ 21 | A list with all files from the ftp that are related to executed trades 22 | } 23 | \description{ 24 | This function will access the Bovespa ftp and return a vector with all files related to trades (all others are ignored) 25 | } 26 | \examples{ 27 | 28 | \dontrun{ 29 | ftp.files <- ghfd_get_ftp_contents(type.market = 'equity') 30 | print(ftp.files) 31 | } 32 | } 33 | -------------------------------------------------------------------------------- /man/ghfd_read_file.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/ghfd_read_file.R 3 | \name{ghfd_read_file} 4 | \alias{ghfd_read_file} 5 | \title{Reads zip file downloaded from Bovespa ftp (trades or orders)} 6 | \usage{ 7 | ghfd_read_file( 8 | out.file, 9 | my.assets = NULL, 10 | type.matching = "exact", 11 | type.data = "trades", 12 | first.time = "10:00:00", 13 | last.time = "17:00:00", 14 | type.output = "agg", 15 | agg.diff = "15 min" 16 | ) 17 | } 18 | \arguments{ 19 | \item{out.file}{Name of zip file} 20 | 21 | \item{my.assets}{The tickers (symbols) of the desired assets to import data (e.g. c('PETR4', 'VALE5')). The function allows for partial matching (e.g. 'PETR' for all assets related to Petrobras). 
Default is set to NULL (download all available tickers)} 22 | 23 | \item{type.matching}{Type of matching for asset names in data ('exact' or 'partial')} 24 | 25 | \item{type.data}{The type of financial data to download and aggregate ('trades' or 'orders').} 26 | 27 | \item{first.time}{The first intraday period to import the data. All trades/orders before this time of day are ignored. As character, e.g. '10:00:00'.} 28 | 29 | \item{last.time}{The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. '18:00:00'.} 30 | 31 | \item{type.output}{Defines the type of output of the data. The choice 'agg' outputs aggregated data for time intervals defined in agg.diff. 32 | The choice 'raw' outputs the raw, tick by tick/order by order, data from the zip files.} 33 | 34 | \item{agg.diff}{The time interval used in the aggregation of data. Only used for type.output='agg'. It should contain an integer followed by a time unit ('sec' or 'secs', 'min' or 'mins', 'hour' or 'hours', 'day' or 'days'). 
35 | Example: agg.diff = '15 mins', agg.diff = '1 hour'.} 36 | } 37 | \value{ 38 | A dataframe with the trade or order data (aggregated or raw, depending on type.output) 39 | } 40 | \description{ 41 | Reads zip file downloaded from Bovespa ftp (trades or orders) 42 | } 43 | \examples{ 44 | 45 | my.assets <- c('ABEVA20', 'PETRL78') 46 | 47 | ## getting data from local file (in practice it would be downloaded from ftp) 48 | out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData") 49 | 50 | df.out <- ghfd_read_file(out.file, my.assets) 51 | print(head(df.out)) 52 | } 53 | -------------------------------------------------------------------------------- /man/ghfd_read_file.orders.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/ghfd_read_file.R 3 | \name{ghfd_read_file.orders} 4 | \alias{ghfd_read_file.orders} 5 | \title{Reads zip file downloaded from Bovespa ftp (orders) - INTERNAL USE} 6 | \usage{ 7 | ghfd_read_file.orders( 8 | out.file, 9 | my.assets = NULL, 10 | type.matching = NULL, 11 | first.time = "10:00:00", 12 | last.time = "17:00:00", 13 | type.output = "agg", 14 | agg.diff = "15 min" 15 | ) 16 | } 17 | \arguments{ 18 | \item{out.file}{Name of zip file} 19 | 20 | \item{my.assets}{The tickers (symbols) of the desired assets to import data (e.g. c('PETR4', 'VALE5')). The function allows for partial matching (e.g. 'PETR' for all assets related to Petrobras). Default is set to NULL (download all available tickers)} 21 | 22 | \item{type.matching}{Type of matching for asset names in data ('exact' or 'partial')} 23 | 24 | \item{first.time}{The first intraday period to import the data. All trades/orders before this time of day are ignored. As character, e.g. '10:00:00'.} 25 | 26 | \item{last.time}{The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. 
'18:00:00'.} 27 | 28 | \item{type.output}{Defines the type of output of the data. The choice 'agg' outputs aggregated data for time intervals defined in agg.diff. 29 | The choice 'raw' outputs the raw, tick by tick/order by order, data from the zip files.} 30 | 31 | \item{agg.diff}{The time interval used in the aggregation of data. Only used for type.output='agg'. It should contain an integer followed by a time unit ('sec' or 'secs', 'min' or 'mins', 'hour' or 'hours', 'day' or 'days'). 32 | Example: agg.diff = '15 mins', agg.diff = '1 hour'.} 33 | } 34 | \value{ 35 | A dataframe with order data (aggregated or raw) 36 | } 37 | \description{ 38 | Reads zip file downloaded from Bovespa ftp (orders) - INTERNAL USE 39 | } 40 | \examples{ 41 | 42 | # no example 43 | } 44 | -------------------------------------------------------------------------------- /man/ghfd_read_file.trades.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/ghfd_read_file.R 3 | \name{ghfd_read_file.trades} 4 | \alias{ghfd_read_file.trades} 5 | \title{Reads zip file downloaded from Bovespa ftp (trades) - INTERNAL USE} 6 | \usage{ 7 | ghfd_read_file.trades( 8 | out.file, 9 | my.assets = NULL, 10 | type.matching = NULL, 11 | first.time = "10:00:00", 12 | last.time = "17:00:00", 13 | type.output = "agg", 14 | agg.diff = "15 min" 15 | ) 16 | } 17 | \arguments{ 18 | \item{out.file}{Name of zip file} 19 | 20 | \item{my.assets}{The tickers (symbols) of the desired assets to import data (e.g. c('PETR4', 'VALE5')). The function allows for partial matching (e.g. 'PETR' for all assets related to Petrobras). Default is set to NULL (download all available tickers)} 21 | 22 | \item{type.matching}{Type of matching for asset names in data ('exact' or 'partial')} 23 | 24 | \item{first.time}{The first intraday period to import the data. All trades/orders before this time of day are ignored. 
As character, e.g. '10:00:00'.} 25 | 26 | \item{last.time}{The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. '18:00:00'.} 27 | 28 | \item{type.output}{Defines the type of output of the data. The choice 'agg' outputs aggregated data for time intervals defined in agg.diff. 29 | The choice 'raw' outputs the raw, tick by tick/order by order, data from the zip files.} 30 | 31 | \item{agg.diff}{The time interval used in the aggregation of data. Only used for type.output='agg'. It should contain an integer followed by a time unit ('sec' or 'secs', 'min' or 'mins', 'hour' or 'hours', 'day' or 'days'). 32 | Example: agg.diff = '15 mins', agg.diff = '1 hour'.} 33 | } 34 | \value{ 35 | A dataframe with trade data (aggregated or raw) 36 | } 37 | \description{ 38 | Reads zip file downloaded from Bovespa ftp (trades) - INTERNAL USE 39 | } 40 | \examples{ 41 | 42 | # no example 43 | } 44 | -------------------------------------------------------------------------------- /man/organize.lob.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/ghfd_lob_fcts.R 3 | \name{organize.lob} 4 | \alias{organize.lob} 5 | \title{Organizes LOB (internal function)} 6 | \usage{ 7 | organize.lob(my.lob, silent = TRUE) 8 | } 9 | \arguments{ 10 | \item{my.lob}{A LOB (order book)} 11 | 12 | \item{silent}{Should the function print progress? (TRUE (default) or FALSE)} 13 | } 14 | \value{ 15 | An organized LOB 16 | } 17 | \description{ 18 | This internal recursive function organizes the lob by making sure that all prices and times are ordered. 19 | Every time the prices in the bid and ask match, it will create a trade and modify the lob accordingly. 
20 | } 21 | \examples{ 22 | 23 | # no examples (internal) 24 | } 25 | -------------------------------------------------------------------------------- /man/print.lob.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/ghfd_lob_fcts.R 3 | \name{print.lob} 4 | \alias{print.lob} 5 | \title{Prints the LOB} 6 | \usage{ 7 | \method{print}{lob}(my.lob, max.level = 3) 8 | } 9 | \arguments{ 10 | \item{my.lob}{A LOB (order book)} 11 | 12 | \item{max.level}{Max level of lob to print} 13 | } 14 | \value{ 15 | nothing 16 | } 17 | \description{ 18 | Prints the LOB 19 | } 20 | \examples{ 21 | # no example (internal) 22 | } 23 | -------------------------------------------------------------------------------- /man/process.lob.from.df.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/ghfd_lob_fcts.R 3 | \name{process.lob.from.df} 4 | \alias{process.lob.from.df} 5 | \title{Process LOB from asset dataframe} 6 | \usage{ 7 | process.lob.from.df(asset.df, silent = TRUE) 8 | } 9 | \arguments{ 10 | \item{asset.df}{A dataframe with orders for a single asset} 11 | 12 | \item{silent}{Should the function print progress? 
(TRUE (default) or FALSE)} 13 | } 14 | \value{ 15 | The lob for the single asset 16 | } 17 | \description{ 18 | Process LOB from asset dataframe 19 | } 20 | \examples{ 21 | # no example (internal) 22 | } 23 | -------------------------------------------------------------------------------- /tests/testthat.R: -------------------------------------------------------------------------------- 1 | library(testthat) 2 | library(GetHFData) 3 | 4 | test_check("GetHFData") 5 | -------------------------------------------------------------------------------- /tests/testthat/test_ghfd.R: -------------------------------------------------------------------------------- 1 | library(testthat) 2 | library(GetHFData) 3 | 4 | #test_that(desc = 'Test of download function',{ 5 | # expect_equal(1, 1) } ) 6 | 7 | my.assets <- c('ABEVA20', 'PETRL78') 8 | out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData") 9 | 10 | df.out <- ghfd_read_file(out.file, my.assets) 11 | 12 | test_that(desc = 'Test of read function',{ 13 | expect_true(nrow(df.out)>0) 14 | } ) 15 | 16 | #cat('\nDeleting test folder') 17 | #unlink(dl.folder, recursive = T) 18 | 19 | -------------------------------------------------------------------------------- /vignettes/ghfd-vignette-LOB.R: -------------------------------------------------------------------------------- 1 | ## ----notrun, eval=FALSE------------------------------------------------------- 2 | # library(GetHFData) 3 | # 4 | # first.time <- '10:00:00' 5 | # last.time <- '17:00:00' 6 | # 7 | # first.date <- '2016-08-18' 8 | # last.date <- '2016-08-18' 9 | # 10 | # type.output <- 'raw' # raw (order by order) data, no aggregation 11 | # 12 | # my.assets <- 'PETR4F' 13 | # type.matching <- 'exact' 14 | # type.market = 'equity-odds' 15 | # type.data <- 'orders' # order data 16 | # 17 | # df.out <- ghfd_get_HF_data(my.assets =my.assets, 18 | # type.data= type.data, 19 | # type.matching = type.matching, 20 | # type.market = type.market, 21 | # 
first.date = 
first.date, 22 | # last.date = last.date, 23 | # first.time = first.time, 24 | # last.time = last.time, 25 | # type.output = type.output) 26 | # 27 | # df.lob <- ghfd_build_lob(df.out) 28 | # 29 | # 30 | 31 | -------------------------------------------------------------------------------- /vignettes/ghfd-vignette-LOB.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Recreating the LOB (limit order book)" 3 | author: "Marcelo Perlin" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{Recreating the LOB (limit order book)} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | Version 1.4 of `GetHFData` adds functions for recreating the LOB (limit order book) from the order data. The LOB is recreated by sorting all trading orders (buy and sell) and matching them whenever their prices match. 13 | 14 | Simulating the LOB is a recursive and computationally intensive problem. The current code is not optimized for speed and it may take a long time to process even a small set of financial orders. 
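
To make the idea of sorting and matching concrete, here is a minimal sketch in base R. It is an illustration only and is not the algorithm implemented in `ghfd_build_lob`: the function name `match_orders` and the columns `side`, `price`, `qty` and `time` are hypothetical, and all orders are matched in one batch rather than replayed in arrival order.

```r
# Illustrative sketch only: NOT the code used by ghfd_build_lob().
# The function name and the columns (side, price, qty, time) are hypothetical.
match_orders <- function(orders) {
  bids <- orders[orders$side == "buy", ]
  asks <- orders[orders$side == "sell", ]

  # Price-time priority: best price first, earliest order first within a price
  bids <- bids[order(-bids$price, bids$time), ]
  asks <- asks[order(asks$price, asks$time), ]

  trades <- list()

  # Keep matching while the best bid is at or above the best ask
  while (nrow(bids) > 0 && nrow(asks) > 0 && bids$price[1] >= asks$price[1]) {
    qty <- min(bids$qty[1], asks$qty[1])

    # Simplifying assumption: the trade prints at the price of the older order
    px <- if (bids$time[1] < asks$time[1]) bids$price[1] else asks$price[1]
    trades[[length(trades) + 1]] <- data.frame(price = px, qty = qty)

    # Reduce the remaining quantity on both sides
    bids$qty[1] <- bids$qty[1] - qty
    asks$qty[1] <- asks$qty[1] - qty

    # Drop fully executed orders from the top of each side
    if (bids$qty[1] == 0) bids <- bids[-1, ]
    if (asks$qty[1] == 0) asks <- asks[-1, ]
  }

  list(trades = do.call(rbind, trades), bids = bids, asks = asks)
}

orders <- data.frame(
  side  = c("buy", "buy", "sell", "sell"),
  price = c(10.02, 10.00, 10.01, 10.05),
  qty   = c(300, 100, 200, 400),
  time  = 1:4
)

res <- match_orders(orders)
print(res$trades)  # executed trades
print(res$bids)    # resting buy orders
print(res$asks)    # resting sell orders
```

Even in this toy version the book has to be re-checked after every fill, which gives some intuition for why rebuilding a full day of orders can be slow.
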
15 | 16 | Here's an example of usage: 17 | 18 | ```{r notrun, eval=FALSE} 19 | library(GetHFData) 20 | 21 | first.time <- '10:00:00' 22 | last.time <- '17:00:00' 23 | 24 | first.date <- '2016-08-18' 25 | last.date <- '2016-08-18' 26 | 27 | type.output <- 'raw' # raw (order by order) data, no aggregation 28 | 29 | my.assets <- 'PETR4F' 30 | type.matching <- 'exact' 31 | type.market <- 'equity-odds' 32 | type.data <- 'orders' # order data 33 | 34 | df.out <- ghfd_get_HF_data(my.assets = my.assets, 35 | type.data = type.data, 36 | type.matching = type.matching, 37 | type.market = type.market, 38 | first.date = first.date, 39 | last.date = last.date, 40 | first.time = first.time, 41 | last.time = last.time, 42 | type.output = type.output) 43 | 44 | df.lob <- ghfd_build_lob(df.out) 45 | 46 | 47 | ``` 48 | -------------------------------------------------------------------------------- /vignettes/ghfd-vignette-Orders.R: -------------------------------------------------------------------------------- 1 | ## ----notrun, eval=FALSE------------------------------------------------------- 2 | # library(GetHFData) 3 | # 4 | # first.time <- '10:00:00' 5 | # last.time <- '17:00:00' 6 | # 7 | # first.date <- '2015-08-18' 8 | # last.date <- '2015-08-18' 9 | # 10 | # type.output <- 'agg' # aggregates data 11 | # agg.diff <- '5 min' # interval for aggregation 12 | # 13 | # my.assets <- 'PETR' # all options related to Petrobras (partial matching) 14 | # type.matching <- 'partial' # finds tickers from my.assets using partial matching 15 | # type.market = 'options' # option market 16 | # type.data <- 'orders' # order data 17 | # 18 | # df.out <- ghfd_get_HF_data(my.assets =my.assets, 19 | # type.data= type.data, 20 | # type.matching = type.matching, 21 | # type.market = type.market, 22 | # first.date = first.date, 23 | # last.date = last.date, 24 | # first.time = first.time, 25 | # last.time = last.time, 26 | # type.output = type.output, 27 | # agg.diff = agg.diff) 28 | # 29 | 30 | 
-------------------------------------------------------------------------------- /vignettes/ghfd-vignette-Orders.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Downloading and aggregating order data from Bovespa" 3 | author: "Marcelo Perlin" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{Downloading and aggregating order data} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | Version 1.3 of `GetHFData` makes it possible to download and aggregate order data from Bovespa. The data comprises buy and sell orders sent by market operators. The tabular data includes the type of order (buy or sell, new/update/cancel/..), date/time of submission, priority time, prices and order quantity, among other information. 13 | 14 | **Be aware that these are very large files.** One day of buy and sell orders in the equity market is around 100 MB zipped and close to 1 GB unzipped. If your computer is not suited to store this data in its memory, **it will crash**. 
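
One way to act on this warning is to look at the uncompressed size of a downloaded zip file before reading it. The sketch below relies only on base R (`utils::unzip` with `list = TRUE`, which returns a data frame with a `Length` column in bytes). The helper names, the file path and the 2000 MB budget are hypothetical, and the size of the parsed data in memory can differ from the size of the file on disk, so treat the result as a rough guide.

```r
# Sketch: estimate the unzipped size of a file before loading it.
# Helper names, the path and the budget below are hypothetical.

# Total uncompressed size (in MB) from a listing as returned by
# utils::unzip(zip.path, list = TRUE)
uncompressed_mb <- function(zip.listing) {
  sum(zip.listing$Length) / 1024^2
}

# TRUE if the unzipped data is within the chosen budget (in MB)
safe_to_read <- function(zip.listing, budget.mb = 2000) {
  uncompressed_mb(zip.listing) <= budget.mb
}

# With a real file, the listing comes from the zip itself
zip.path <- file.path("ftp files", "some_downloaded_file.zip")  # hypothetical
if (file.exists(zip.path)) {
  listing <- utils::unzip(zip.path, list = TRUE)
  cat("Unzipped size (MB):", round(uncompressed_mb(listing)), "\n")
}

# Stand-in listing with the same columns, so the helpers can be tried offline
demo.listing <- data.frame(
  Name   = "orders_file.txt",
  Length = 1000 * 1024^2,  # 1000 MB
  Date   = as.POSIXct("2015-08-18", tz = "UTC")
)

print(uncompressed_mb(demo.listing))
print(safe_to_read(demo.listing, budget.mb = 2000))
print(safe_to_read(demo.listing, budget.mb = 500))
```
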
15 | 16 | Here's an example of usage that will download and aggregate order data for all option contracts related to Petrobras (PETR): 17 | 18 | ```{r notrun, eval=FALSE} 19 | library(GetHFData) 20 | 21 | first.time <- '10:00:00' 22 | last.time <- '17:00:00' 23 | 24 | first.date <- '2015-08-18' 25 | last.date <- '2015-08-18' 26 | 27 | type.output <- 'agg' # aggregates data 28 | agg.diff <- '5 min' # interval for aggregation 29 | 30 | my.assets <- 'PETR' # all options related to Petrobras (partial matching) 31 | type.matching <- 'partial' # finds tickers from my.assets using partial matching 32 | type.market <- 'options' # option market 33 | type.data <- 'orders' # order data 34 | 35 | df.out <- ghfd_get_HF_data(my.assets = my.assets, 36 | type.data = type.data, 37 | type.matching = type.matching, 38 | type.market = type.market, 39 | first.date = first.date, 40 | last.date = last.date, 41 | first.time = first.time, 42 | last.time = last.time, 43 | type.output = type.output, 44 | agg.diff = agg.diff) 45 | 46 | ``` 47 | -------------------------------------------------------------------------------- /vignettes/ghfd-vignette-Trades.R: -------------------------------------------------------------------------------- 1 | ## ----example1----------------------------------------------------------------- 2 | library(GetHFData) 3 | 4 | out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData") 5 | df.tickers <- ghfd_get_available_tickers_from_file(out.file) 6 | print(head(df.tickers)) # show only the first rows 7 | 8 | ## ----example2----------------------------------------------------------------- 9 | 10 | my.assets <- df.tickers$tickers[1:3] # ticker to find in zip file 11 | 12 | type.matching <- 'exact' # defines how to match assets in dataset 13 | first.time <- '10:00:00' # defines first time period of day 14 | last.time <- '17:00:00' # defines last time period of day 15 | 16 | my.df <- ghfd_read_file(out.file, 17 | type.matching = type.matching, 18 | my.assets 
= my.assets, 19 | first.time = '10:00:00', 20 | last.time = '17:00:00', 21 | type.output = 'raw', 22 | agg.diff = '15 min') 23 | 24 | 25 | ## ----------------------------------------------------------------------------- 26 | head(my.df) 27 | 28 | ## ----------------------------------------------------------------------------- 29 | names(my.df) 30 | 31 | ## ----plot.prices, fig.width=7, fig.height=2.5--------------------------------- 32 | library(ggplot2) 33 | 34 | p <- ggplot(my.df, aes(x = TradeDateTime, y = TradePrice, color = InstrumentSymbol)) 35 | p <- p + geom_line() 36 | print(p) 37 | 38 | ## ----notrun, eval=FALSE------------------------------------------------------- 39 | # library(GetHFData) 40 | # 41 | # first.time <- '11:00:00' 42 | # last.time <- '17:00:00' 43 | # 44 | # first.date <- '2015-11-01' 45 | # last.date <- '2015-11-10' 46 | # type.output <- 'agg' 47 | # type.data <- 'trades' 48 | # agg.diff <- '15 min' 49 | # 50 | # # partial matching is available 51 | # my.assets <- c('PETR','VALE') 52 | # type.matching <- 'partial' 53 | # type.market <- 'equity' 54 | # 55 | # df.out <- ghfd_get_HF_data(my.assets =my.assets, 56 | # type.matching = type.matching, 57 | # type.market = type.market, 58 | # type.data = type.data, 59 | # first.date = first.date, 60 | # last.date = last.date, 61 | # first.time = first.time, 62 | # last.time = last.time, 63 | # type.output = type.output, 64 | # agg.diff = agg.diff) 65 | # 66 | 67 | -------------------------------------------------------------------------------- /vignettes/ghfd-vignette-Trades.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Downloading and aggregating trade data from Bovespa" 3 | author: "Marcelo Perlin" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{Downloading and aggregating trade data} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | Recently, 
Bovespa, the Brazilian financial exchange company, allowed external access to its [ftp site](ftp://ftp.bmf.com.br/). At this address one can find a range of information regarding the Brazilian financial system, including datasets with high frequency (tick by tick) trading data for three different markets: equity, options and BMF. 13 | 14 | Downloading and processing these files, however, can be exhausting. The dataset is composed of zip files with the whole trading data, separated by day and market. These files are huge, and processing or aggregating them in a useful manner requires specific knowledge of the structure of the dataset. 15 | 16 | The package GetHFData makes it easy to access this dataset directly by allowing easy import and aggregation of it. With this package the user can: 17 | 18 | * Access the contents of the Bovespa ftp using function `ghfd_get_ftp_contents` 19 | * Get the list of available tickers in the trading data using `ghfd_get_available_tickers_from_ftp` 20 | * Download individual files using `ghfd_download_file` 21 | * Download and process a batch of dates and asset codes with `ghfd_get_HF_data` 22 | 23 | In the next example we will only use a local file from the package. Given the size of the files in the ftp and the CHECK process of CRAN, it makes sense to keep this vignette compact and fast to run. More details about the usage of the package can be found in my [RBFIN paper](http://bibliotecadigital.fgv.br/ojs/index.php/rbfin/article/view/64587/65702). 24 | 25 | 26 | ## Reading trading data from local file (1 date) 27 | 28 | Let's assume you need to analyze high frequency trading data for option contracts on a given date (2015-11-26). This file could be downloaded from the ftp using function `ghfd_download_file`, but it is already available locally within the package. 
29 | 30 | The first step is to check the available tickers in the zip file: 31 | 32 | ```{r example1} 33 | library(GetHFData) 34 | 35 | out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData") 36 | df.tickers <- ghfd_get_available_tickers_from_file(out.file) 37 | print(head(df.tickers)) # show only the first rows 38 | ``` 39 | 40 | In `df.tickers` one can find the symbols available in the file and also the number of trades for each. Now, let's take the 3 most traded instruments on that day and check the result of the import process: 41 | 42 | ```{r example2} 43 | 44 | my.assets <- df.tickers$tickers[1:3] # ticker to find in zip file 45 | 46 | type.matching <- 'exact' # defines how to match assets in dataset 47 | first.time <- '10:00:00' # defines first time period of day 48 | last.time <- '17:00:00' # defines last time period of day 49 | 50 | my.df <- ghfd_read_file(out.file, 51 | type.matching = type.matching, 52 | my.assets = my.assets, 53 | first.time = first.time, 54 | last.time = last.time, 55 | type.output = 'raw', 56 | agg.diff = '15 min') 57 | 58 | ``` 59 | 60 | Let's see the first part of the imported dataframe. 61 | 62 | ```{r} 63 | head(my.df) 64 | ``` 65 | 66 | The column names are self-explanatory: 67 | 68 | ```{r} 69 | names(my.df) 70 | ``` 71 | 72 | Now let's plot the prices of all instruments: 73 | 74 | ```{r plot.prices, fig.width=7, fig.height=2.5} 75 | library(ggplot2) 76 | 77 | p <- ggplot(my.df, aes(x = TradeDateTime, y = TradePrice, color = InstrumentSymbol)) 78 | p <- p + geom_line() 79 | print(p) 80 | ``` 81 | 82 | As we can see, this was a fairly stable day for the price of these option contracts. 83 | 84 | ## Downloading and reading trading data for several dates 85 | 86 | In the last example we only used one date. The package GetHFData also supports batch downloads and processing of several different tickers using start and end dates. 
In this vignette we are not running the code given the large size of the downloaded files. You should try the next example on your own computer (just copy, paste and run the code in R). 87 | 88 | In this example we will download files from the ftp for all stocks related to Petrobras (PETR) and Vale do Rio Doce (VALE). The data will be processed, resulting in a dataframe with aggregated data. 89 | 90 | ```{r notrun, eval=FALSE} 91 | library(GetHFData) 92 | 93 | first.time <- '11:00:00' 94 | last.time <- '17:00:00' 95 | 96 | first.date <- '2015-11-01' 97 | last.date <- '2015-11-10' 98 | type.output <- 'agg' 99 | type.data <- 'trades' 100 | agg.diff <- '15 min' 101 | 102 | # partial matching is available 103 | my.assets <- c('PETR','VALE') 104 | type.matching <- 'partial' 105 | type.market <- 'equity' 106 | 107 | df.out <- ghfd_get_HF_data(my.assets = my.assets, 108 | type.matching = type.matching, 109 | type.market = type.market, 110 | type.data = type.data, 111 | first.date = first.date, 112 | last.date = last.date, 113 | first.time = first.time, 114 | last.time = last.time, 115 | type.output = type.output, 116 | agg.diff = agg.diff) 117 | 118 | ``` 119 | --------------------------------------------------------------------------------