Version 1.4 of GetHFData adds functions for recreating the LOB (limit order book) from the order data. The LOB is recreated by sorting all trading orders (buy and sell) and matching them whenever bid and ask prices match.

Simulating the LOB is a recursive and computationally intensive problem. The current code is not optimized for speed and may take a long time to process even a small set of financial orders.
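
To make the sort-and-match idea concrete, here is a minimal base R sketch. It is not the package's implementation: the function name `match_orders`, the column names (`price`, `qty`, `time`) and the rule that a trade is recorded at the ask price are all assumptions made for illustration, and a plain loop is used instead of recursion.

```r
# Toy limit order book: illustrative only, not GetHFData's internal code.
# All names (match_orders, bids, asks, price, qty, time) are hypothetical.
match_orders <- function(bids, asks) {
  # Price-time priority: best bid = highest price, best ask = lowest price;
  # ties are broken by arrival time.
  bids <- bids[order(-bids$price, bids$time), ]
  asks <- asks[order(asks$price, asks$time), ]
  trades <- data.frame(price = numeric(0), qty = numeric(0))

  # Keep matching while the best bid meets or crosses the best ask.
  while (nrow(bids) > 0 && nrow(asks) > 0 && bids$price[1] >= asks$price[1]) {
    traded.qty <- min(bids$qty[1], asks$qty[1])
    # Assumption: the trade is recorded at the best ask price.
    trades <- rbind(trades, data.frame(price = asks$price[1], qty = traded.qty))

    bids$qty[1] <- bids$qty[1] - traded.qty
    asks$qty[1] <- asks$qty[1] - traded.qty
    # Drop fully executed orders from the book.
    bids <- bids[bids$qty > 0, ]
    asks <- asks[asks$qty > 0, ]
  }
  list(bids = bids, asks = asks, trades = trades)
}

bids <- data.frame(price = c(10.00, 10.05), qty = c(100, 50), time = c(1, 2))
asks <- data.frame(price = c(10.05, 10.10), qty = c(80, 40), time = c(1, 2))
book <- match_orders(bids, asks)
print(book$trades)
```

Each pass through the loop re-checks the top of the book after the previous trade. This repeated re-evaluation is what makes rebuilding a full day of orders slow.
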
Version 1.3 of GetHFData makes it possible to download and aggregate order data from Bovespa. The data comprises buy and sell orders sent by market operators. The tabular data includes the type of order (buy or sell, new/update/cancel/...), date/time of submission, priority time, prices and order quantity, among other information.

Be aware that these are very large files. One day of buy and sell orders in the equity market is around 100 MB zipped and close to 1 GB unzipped. If your computer does not have enough memory to hold this data, your R session will likely crash.

Here’s an example of usage that will download and aggregate order data for all option contracts related to Petrobras (PETR):
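
A call of roughly the following shape would do this. It is a sketch rather than the vignette's own code: the argument names follow the documented signature of `ghfd_get_HF_data`, while the dates, times and the `run.download` switch are assumptions chosen for illustration.

```r
# Argument names follow the documented signature of ghfd_get_HF_data();
# the dates, times and the run.download switch are illustrative assumptions.
dl.args <- list(my.assets     = 'PETR',      # partial ticker: assets related to Petrobras
                type.matching = 'partial',
                type.market   = 'options',
                type.data     = 'orders',
                first.date    = '2016-08-18',
                last.date     = '2016-08-18',
                first.time    = '10:00:00',
                last.time     = '17:00:00',
                type.output   = 'agg',
                agg.diff      = '15 min')

# Order files are large, so the download is switched off by default.
run.download <- FALSE

if (run.download && requireNamespace('GetHFData', quietly = TRUE)) {
  df.out <- do.call(GetHFData::ghfd_get_HF_data, dl.args)
  print(head(df.out))
}
```

Setting `run.download <- TRUE` triggers the actual download, which needs access to the Bovespa ftp and enough disk space and memory for the order files.
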
Perlin M, Ramos H (2016).
GetHFData: A R Package for Downloading and Aggregating High Frequency Trading Data from Bovespa.
https://ssrn.com/abstract=2824058.

@Manual{,
  title = {GetHFData: A R Package for Downloading and Aggregating High Frequency Trading Data from Bovespa},
  author = {Marcelo Perlin and Henrique Ramos},
  year = {2016},
  journal = {Available at SSRN},
  url = {https://ssrn.com/abstract=2824058},
}

Fixed bug in ghfd_get_ftp_contents for ‘equity’ option

Version 1.5 (2017-11-27)

Minor update:

Added support for milliseconds in LOB

Version 1.4 (2017-09-10)

Major update:

Users can now recreate the LOB (limit order book) using order data from Bovespa
Fixed bug for only.dl = TRUE

Version 1.3 (2017-05-29)

Major update:

Users can now download and aggregate order files (input type.data)
Fixed link to paper
Partial matching for assets is now possible (e.g. use PETR for all stocks or options related to Petrobras)
Implemented option for only downloading files (this is helpful if you are dealing with order data and will process the files in another R session or software)
Muted message “Using ‘,’ as decimal and ‘.’ as grouping mark. Use read_delim() for more control.”

Version 1.2.4 (2017-01-30)

Minor update:

Fixed bug in msg output when length(my.assets) > 2

Version 1.2.3 (2017-01-13)

Minor update:

Fixed bug for non-existing assets on the first date of the download process
Changed date inputs to a simpler format (e.g. ‘2016-01-01’ instead of as.Date(‘2016-01-01’))

Version 1.2.2 (2016-12-05)

Minor update:

Revised APA citation on attach
Fixed some typos in vignette and added link to SSRN paper

Version 1.2.1 (2016-11-07)

Minor update with the following changes:

The user can now download data from the odd lots equity market (type.market=‘equity-odds’)
Added Henrique Ramos as a contributor
Other minor changes

Version 1.2.0 (2016-10-14)

Minor update with the following changes:

The function ghfd_get_HF_data now allows for partial matching of asset names and also the download of all assets available in ftp files
Function ghfd_get_available_tickers_from_ftp also returns the type of market in data.frame

Version 1.1.0 (2016-08-15)

Major update from initial version with the following changes:

The function for finding tickers in the ftp now looks for the closest date when the actual date is missing from the ftp
The function for finding tickers now returns a dataframe with the tickers and number of trades
Added control for bad files
The raw and agg output types were revised

This function takes as input an ftp address and the name of the downloaded file in the local drive,
and downloads the corresponding file. Returns TRUE if it worked and FALSE otherwise.
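
The retry-and-report behaviour described above can be sketched in base R as follows. This is not the code of `ghfd_download_file`: the helper name `download_with_retries` and its `wait.secs` argument are hypothetical, and the demonstration uses `file://` URLs with Unix-style paths so that it runs without ftp access.

```r
# Illustrative helper, not the package's ghfd_download_file(); the name
# download_with_retries and the wait.secs argument are hypothetical.
download_with_retries <- function(url, dest.file, max.tries = 10, wait.secs = 1) {
  for (i.try in seq_len(max.tries)) {
    ok <- tryCatch({
      # download.file() returns 0 (invisibly) on success
      status <- utils::download.file(url, destfile = dest.file,
                                     mode = 'wb', quiet = TRUE)
      status == 0 && file.exists(dest.file)
    },
    warning = function(w) FALSE,
    error   = function(e) FALSE)

    if (ok) return(TRUE)
    Sys.sleep(wait.secs)  # brief pause before the next attempt
  }
  FALSE
}

# Local demonstration with file:// URLs (assumes Unix-style temp paths)
src.file <- tempfile(fileext = '.txt')
writeLines('dummy content', src.file)
dest.file <- tempfile(fileext = '.txt')

ok.dl <- download_with_retries(paste0('file://', src.file), dest.file)
print(ok.dl)

# A missing source file exhausts the attempts and yields FALSE
bad.dl <- download_with_retries(paste0('file://', src.file, '.missing'),
                                tempfile(), max.tries = 2, wait.secs = 0)
print(bad.dl)
```

Returning a logical instead of raising an error lets a caller loop over many dates and skip those whose file could not be fetched.
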

This function will read the zip file downloaded from Bovespa and output
a dataframe with the tickers found in the file and the number of trades for each ticker.

ghfd_get_available_tickers_from_file(out.file)

Arguments

out.file
Name of downloaded file with HFT data from Bovespa

Value

A dataframe with the number of trades for each ticker found in file

Examples

## get file from package (usually this would have been downloaded from the ftp)
out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData")

df.tickers <- ghfd_get_available_tickers_from_file(out.file)

print(head(df.tickers))

This function will read the Bovespa ftp for a given market/date and output
a dataframe with the tickers and the number of trades for each ticker.

The tickers (symbols) of the desired assets to import data (e.g. c('PETR4', 'VALE5')). The function allows for partial matching (e.g. 'PETR' for all assets related to Petrobras). Default is set to NULL (download all available tickers)

type.matching
Type of matching for asset names in data ('exact' or 'partial')

first.time
The first intraday period to import the data. All trades/orders before this time of day are ignored. As character, e.g. '10:00:00'.

last.time
The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. '18:00:00'.

type.output
Defines the type of output of the data. The choice 'agg' outputs aggregated data for time intervals defined in agg.diff.
The choice 'raw' outputs the raw, tick by tick/order by order, data from the zip files.

agg.diff
The time interval used in the aggregation of data. Only used for type.output='agg'. It should contain an integer followed by a time unit ('sec' or 'secs', 'min' or 'mins', 'hour' or 'hours', 'day' or 'days').
Example: agg.diff = '15 mins', agg.diff = '1 hour'.
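
As an illustration of what an interval string such as '15 min' means for aggregation, the base R sketch below groups toy tick data with `cut()`. It is only an assumption about how such aggregation can be done, not the package's code, and the column names (`time`, `price`, `qty`) are hypothetical.

```r
# Toy tick data; column names (time, price, qty) are hypothetical and do not
# reflect the columns returned by GetHFData.
ticks <- data.frame(
  time  = as.POSIXct('2016-08-18 10:00:00', tz = 'UTC') + c(0, 300, 1000, 1700, 2000),
  price = c(10.00, 10.02, 10.01, 10.05, 10.04),
  qty   = c(100, 200, 100, 300, 100)
)

agg.diff <- '15 min'

# cut() on date-times accepts "<integer> <unit>" strings as breaks
ticks$interval <- cut(ticks$time, breaks = agg.diff)

# One row per interval: last price, traded quantity and number of ticks
agg <- do.call(rbind, lapply(split(ticks, ticks$interval, drop = TRUE), function(x) {
  data.frame(interval   = as.character(x$interval[1]),
             last.price = x$price[nrow(x)],
             sum.qty    = sum(x$qty),
             n.ticks    = nrow(x))
}))
rownames(agg) <- NULL
print(agg)
```

A wider interval such as '1 hour' would collapse these five ticks into a single row.
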

This internal recursive function organizes the LOB by making sure that all prices and times are ordered.
Every time that prices in the bid and ask match, it creates a trade and modifies the LOB accordingly.

organize.lob(my.lob, silent=TRUE)

Arguments

my.lob
A LOB (order book)

silent
Should the function print progress ? (TRUE (default) or FALSE)

--------------------------------------------------------------------------------
/inst/CITATION:
--------------------------------------------------------------------------------
1 | bibentry(bibtype = "Manual",
2 | title = "GetHFData: A R Package for Downloading and Aggregating High Frequency Trading Data from Bovespa",
3 | author = c(person("Marcelo", "Perlin"),
4 | person("Henrique", "Ramos")),
5 | year = 2016,
6 | journal = 'Available at SSRN',
7 | url = "https://ssrn.com/abstract=2824058")
8 |
--------------------------------------------------------------------------------
/inst/extdata/Example_Orders.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/msperlin/GetHFData/33328a5920a1087fc3729893d17f532d5970f349/inst/extdata/Example_Orders.RData
--------------------------------------------------------------------------------
/inst/extdata/NEG_OPCOES_20151126.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/msperlin/GetHFData/33328a5920a1087fc3729893d17f532d5970f349/inst/extdata/NEG_OPCOES_20151126.zip
--------------------------------------------------------------------------------
/man/add.order.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/ghfd_lob_fcts.R
3 | \name{add.order}
4 | \alias{add.order}
5 | \title{Adds an order to the LOB}
6 | \usage{
7 | add.order(my.lob, order.in, silent = TRUE)
8 | }
9 | \arguments{
10 | \item{my.lob}{A LOB (order book)}
11 |
12 | \item{order.in}{An order from the data}
13 |
14 | \item{silent}{Should the function print progress ? (TRUE (default) or FALSE)}
15 | }
16 | \value{
17 | An LOB with the new order
18 | }
19 | \description{
20 | Adds an order to the LOB
21 | }
22 | \examples{
23 | # no example (internal)
24 | }
25 |
--------------------------------------------------------------------------------
/man/ghfd_build_lob.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/ghfd_build_lob.R
3 | \name{ghfd_build_lob}
4 | \alias{ghfd_build_lob}
5 | \title{Building LOB (limit order book) from orders}
6 | \usage{
7 | ghfd_build_lob(df.orders, silent = TRUE)
8 | }
9 | \arguments{
10 | \item{df.orders}{A dataframe, output from ghfd_GetHFData}
11 |
12 | \item{silent}{Should the function print progress ? (TRUE (default) or FALSE)}
13 | }
14 | \value{
15 | A dataframe with information about LOB
16 | }
17 | \description{
18 | Building LOB (limit order book) from orders
19 | }
20 | \examples{
21 | \dontrun{
22 | library(GetHFData)
23 | first.time <- '11:00:00'
24 | last.time <- '17:00:00'
25 | first.date <- as.Date('2015-11-03')
26 | last.date <- as.Date('2015-11-03')
27 | type.output <- 'raw'
28 | type.data <- 'orders'
29 | type.market <- 'equity-odds'
30 | my.assets <- 'PETR4F'  # example odd-lot ticker, as used in the LOB vignette
31 | df.out <- ghfd_get_HF_data(my.assets = my.assets,
32 | type.market = type.market,
33 | type.data = type.data,
34 | first.date = first.date,
35 | last.date = last.date,
36 | first.time = first.time,
37 | last.time = last.time,
38 | type.output = type.output)
39 |
40 | df.lob <- ghfd_build_lob(df.out)
41 | }
42 | }
43 |
--------------------------------------------------------------------------------
/man/ghfd_download_file.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/ghfd_download_file.R
3 | \name{ghfd_download_file}
4 | \alias{ghfd_download_file}
5 | \title{Downloads a single file from Bovespa ftp}
6 | \usage{
7 | ghfd_download_file(my.ftp, out.file, dl.dir = "Dl Files", max.dl.tries = 10)
8 | }
9 | \arguments{
10 | \item{my.ftp}{A complete, including file name, ftp address to download the file from}
11 |
12 | \item{out.file}{Name of downloaded file with HFT data from Bovespa}
13 |
14 | \item{dl.dir}{The folder to download the zip files (default = 'Dl Files')}
15 |
16 | \item{max.dl.tries}{Maximum attempts to download the files from ftp}
17 | }
18 | \value{
19 | TRUE if successful, FALSE if not
20 | }
21 | \description{
22 | This function takes as input an ftp address and the name of the downloaded file in the local drive,
23 | and downloads the corresponding file. Returns TRUE if it worked and FALSE otherwise.
24 | }
25 | \examples{
26 |
27 | my.ftp <- 'ftp://ftp.bmf.com.br/MarketData/Bovespa-Opcoes/NEG_OPCOES_20151229.zip'
28 | out.file <- 'temp.zip'
29 |
30 | \dontrun{
31 | ghfd_download_file(my.ftp = my.ftp, out.file=out.file)
32 | }
33 | }
34 |
--------------------------------------------------------------------------------
/man/ghfd_get_HF_data.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/ghfd_get_HF_data.R
3 | \name{ghfd_get_HF_data}
4 | \alias{ghfd_get_HF_data}
5 | \title{Downloads and aggregates high frequency trading data directly from the Bovespa ftp}
6 | \usage{
7 | ghfd_get_HF_data(
8 | my.assets = NULL,
9 | type.matching = "exact",
10 | type.market = "equity",
11 | type.data = "trades",
12 | first.date = "2016-01-01",
13 | last.date = "2016-01-05",
14 | first.time = NULL,
15 | last.time = NULL,
16 | type.output = "agg",
17 | agg.diff = "15 min",
18 | dl.dir = "ftp files",
19 | max.dl.tries = 10,
20 | clean.files = FALSE,
21 | only.dl = FALSE
22 | )
23 | }
24 | \arguments{
25 | \item{my.assets}{The tickers (symbols) of the desired assets to import data (e.g. c('PETR4', 'VALE5')). The function allows for partial matching (e.g. 'PETR' for all assets related to Petrobras). Default is set to NULL (download all available tickers)}
26 |
27 | \item{type.matching}{Type of matching for asset names in data ('exact' or 'partial')}
28 |
29 | \item{type.market}{The type of market to download data from ('equity', 'equity-odds','options', 'BMF' ).}
30 |
31 | \item{type.data}{The type of financial data to download and aggregate ('trades' or 'orders').}
32 |
33 | \item{first.date}{The first date of the imported data (e.g. '2016-01-01')}
34 |
35 | \item{last.date}{The last date of the imported data (e.g. '2016-01-05')}
36 |
37 | \item{first.time}{The first intraday period to import the data. All trades/orders before this time of day are ignored. As character, e.g. '10:00:00'.}
38 |
39 | \item{last.time}{The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. '18:00:00'.}
40 |
41 | \item{type.output}{Defines the type of output of the data. The choice 'agg' outputs aggregated data for time intervals defined in agg.diff.
42 | The choice 'raw' outputs the raw, tick by tick/order by order, data from the zip files.}
43 |
44 | \item{agg.diff}{The time interval used in the aggregation of data. Only used for type.output='agg'. It should contain an integer followed by a time unit ('sec' or 'secs', 'min' or 'mins', 'hour' or 'hours', 'day' or 'days').
45 | Example: agg.diff = '15 mins', agg.diff = '1 hour'.}
46 |
47 | \item{dl.dir}{The folder to download the zip files (default = 'ftp files')}
48 |
49 | \item{max.dl.tries}{Maximum attempts to download the files from ftp}
50 |
51 | \item{clean.files}{Logical. Should the files be removed after reading it? (TRUE or FALSE)}
52 |
53 | \item{only.dl}{Logical. Should the function only download the files? (TRUE or FALSE). This is useful if you just want the files for later analysis}
54 | }
55 | \value{
56 | A dataframe with the financial data in the raw format (tick by tick) or aggregated
57 | }
58 | \description{
59 | This function downloads zip files containing trades or orders from Bovespa's ftp (ftp://ftp.bmf.com.br/MarketData/) and imports them into R.
60 | See the vignette and examples for more details on how to use the function.
61 | }
62 | \examples{
63 |
64 | my.assets <- 'ABEVA69'
65 | type.market <- 'options'
66 | first.date <- as.Date('2015-12-29')
67 | last.date <- as.Date('2015-12-29')
68 |
69 | \dontrun{
70 | df.out <- ghfd_get_HF_data(my.assets = my.assets, type.market = type.market, first.date = first.date, last.date = last.date)
71 | }
72 | }
73 |
--------------------------------------------------------------------------------
/man/ghfd_get_available_tickers_from_file.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/ghfd_get_available_tickers_from_file.R
3 | \name{ghfd_get_available_tickers_from_file}
4 | \alias{ghfd_get_available_tickers_from_file}
5 | \title{Function to get available tickers from downloaded zip file}
6 | \usage{
7 | ghfd_get_available_tickers_from_file(out.file)
8 | }
9 | \arguments{
10 | \item{out.file}{Name of downloaded file with HFT data from Bovespa}
11 | }
12 | \value{
13 | A dataframe with the number of trades for each ticker found in file
14 | }
15 | \description{
16 | This function will read the zip file downloaded from Bovespa and output
17 | a dataframe with the tickers found in the file
18 | and the number of trades for each ticker
19 | }
20 | \examples{
21 |
22 | ## get file from package (usually this would have been downloaded from the ftp)
23 | out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData")
24 |
25 | df.tickers <- ghfd_get_available_tickers_from_file(out.file)
26 |
27 | print(head(df.tickers))
28 | }
29 |
--------------------------------------------------------------------------------
/man/ghfd_get_available_tickers_from_ftp.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/ghfd_get_available_tickers_from_ftp.R
3 | \name{ghfd_get_available_tickers_from_ftp}
4 | \alias{ghfd_get_available_tickers_from_ftp}
5 | \title{Function to get available tickers from ftp}
6 | \usage{
7 | ghfd_get_available_tickers_from_ftp(
8 | my.date = "2015-11-03",
9 | type.market = "equity",
10 | type.data = "trades",
11 | dl.dir = "ftp files",
12 | max.dl.tries = 10
13 | )
14 | }
15 | \arguments{
16 | \item{my.date}{A single date to check tickers in ftp (e.g. '2015-11-03')}
17 |
18 | \item{type.market}{The type of market to download data from ('equity', 'equity-odds','options', 'BMF' ).}
19 |
20 | \item{type.data}{The type of financial data to download and aggregate ('trades' or 'orders').}
21 |
22 | \item{dl.dir}{The folder to download the zip files (default = 'ftp files')}
23 |
24 | \item{max.dl.tries}{Maximum attempts to download the files from ftp}
25 | }
26 | \value{
27 | A data.frame with the tickers, number of found trades and file name
28 | }
29 | \description{
30 | This function will read the Bovespa ftp for a given market/date and output
31 | a data.frame with the tickers available on that date
32 | and the number of trades for each ticker
33 | }
34 | \examples{
35 |
36 | \dontrun{
37 | df.tickers <- ghfd_get_available_tickers_from_ftp(my.date = '2015-11-03',
38 | type.market = 'BMF')
39 |
40 | print(head(df.tickers))
41 | }
42 | }
43 |
--------------------------------------------------------------------------------
/man/ghfd_get_ftp_contents.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/ghfd_get_ftp_contents.R
3 | \name{ghfd_get_ftp_contents}
4 | \alias{ghfd_get_ftp_contents}
5 | \title{Gets the contents of Bovespa ftp}
6 | \usage{
7 | ghfd_get_ftp_contents(
8 | type.market = "equity",
9 | max.dl.tries = 10,
10 | type.data = "trades"
11 | )
12 | }
13 | \arguments{
14 | \item{type.market}{The type of market to download data from ('equity', 'equity-odds','options', 'BMF' ).}
15 |
16 | \item{max.dl.tries}{Maximum attempts to download the files from ftp}
17 |
18 | \item{type.data}{The type of financial data to download and aggregate ('trades' or 'orders').}
19 | }
20 | \value{
21 | A list with all files from the ftp that are related to executed trades
22 | }
23 | \description{
24 | This function will access the Bovespa ftp and return a vector with all files related to trades (all others are ignored)
25 | }
26 | \examples{
27 |
28 | \dontrun{
29 | ftp.files <- ghfd_get_ftp_contents(type.market = 'equity')
30 | print(ftp.files)
31 | }
32 | }
33 |
--------------------------------------------------------------------------------
/man/ghfd_read_file.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/ghfd_read_file.R
3 | \name{ghfd_read_file}
4 | \alias{ghfd_read_file}
5 | \title{Reads zip file downloaded from Bovespa ftp (trades or orders)}
6 | \usage{
7 | ghfd_read_file(
8 | out.file,
9 | my.assets = NULL,
10 | type.matching = "exact",
11 | type.data = "trades",
12 | first.time = "10:00:00",
13 | last.time = "17:00:00",
14 | type.output = "agg",
15 | agg.diff = "15 min"
16 | )
17 | }
18 | \arguments{
19 | \item{out.file}{Name of zip file}
20 |
21 | \item{my.assets}{The tickers (symbols) of the desired assets to import data (e.g. c('PETR4', 'VALE5')). The function allows for partial matching (e.g. 'PETR' for all assets related to Petrobras). Default is set to NULL (download all available tickers)}
22 |
23 | \item{type.matching}{Type of matching for asset names in data ('exact' or 'partial')}
24 |
25 | \item{type.data}{The type of financial data to download and aggregate ('trades' or 'orders').}
26 |
27 | \item{first.time}{The first intraday period to import the data. All trades/orders before this time of day are ignored. As character, e.g. '10:00:00'.}
28 |
29 | \item{last.time}{The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. '18:00:00'.}
30 |
31 | \item{type.output}{Defines the type of output of the data. The choice 'agg' outputs aggregated data for time intervals defined in agg.diff.
32 | The choice 'raw' outputs the raw, tick by tick/order by order, data from the zip files.}
33 |
34 | \item{agg.diff}{The time interval used in the aggregation of data. Only used for type.output='agg'. It should contain an integer followed by a time unit ('sec' or 'secs', 'min' or 'mins', 'hour' or 'hours', 'day' or 'days').
35 | Example: agg.diff = '15 mins', agg.diff = '1 hour'.}
36 | }
37 | \value{
38 | A dataframe with the raw (tick by tick/order by order) dataset
39 | }
40 | \description{
41 | Reads zip file downloaded from Bovespa ftp (trades or orders)
42 | }
43 | \examples{
44 |
45 | my.assets <- c('ABEVA20', 'PETRL78')
46 |
47 | ## getting data from local file (in practice it would be downloaded from ftp)
48 | out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData")
49 |
50 | df.out <- ghfd_read_file(out.file, my.assets)
51 | print(head(df.out))
52 | }
53 |
--------------------------------------------------------------------------------
/man/ghfd_read_file.orders.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/ghfd_read_file.R
3 | \name{ghfd_read_file.orders}
4 | \alias{ghfd_read_file.orders}
5 | \title{Reads zip file downloaded from Bovespa ftp (orders) - INTERNAL USE}
6 | \usage{
7 | ghfd_read_file.orders(
8 | out.file,
9 | my.assets = NULL,
10 | type.matching = NULL,
11 | first.time = "10:00:00",
12 | last.time = "17:00:00",
13 | type.output = "agg",
14 | agg.diff = "15 min"
15 | )
16 | }
17 | \arguments{
18 | \item{out.file}{Name of zip file}
19 |
20 | \item{my.assets}{The tickers (symbols) of the desired assets to import data (e.g. c('PETR4', 'VALE5')). The function allows for partial matching (e.g. 'PETR' for all assets related to Petrobras). Default is set to NULL (download all available tickers)}
21 |
22 | \item{type.matching}{Type of matching for asset names in data ('exact' or 'partial')}
23 |
24 | \item{first.time}{The first intraday period to import the data. All trades/orders before this time of day are ignored. As character, e.g. '10:00:00'.}
25 |
26 | \item{last.time}{The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. '18:00:00'.}
27 |
28 | \item{type.output}{Defines the type of output of the data. The choice 'agg' outputs aggregated data for time intervals defined in agg.diff.
29 | The choice 'raw' outputs the raw, tick by tick/order by order, data from the zip files.}
30 |
31 | \item{agg.diff}{The time interval used in the aggregation of data. Only used for type.output='agg'. It should contain an integer followed by a time unit ('sec' or 'secs', 'min' or 'mins', 'hour' or 'hours', 'day' or 'days').
32 | Example: agg.diff = '15 mins', agg.diff = '1 hour'.}
33 | }
34 | \value{
35 | A dataframe with order data (aggregated or raw)
36 | }
37 | \description{
38 | Reads zip file downloaded from Bovespa ftp (orders) - INTERNAL USE
39 | }
40 | \examples{
41 |
42 | # no example
43 | }
44 |
--------------------------------------------------------------------------------
/man/ghfd_read_file.trades.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/ghfd_read_file.R
3 | \name{ghfd_read_file.trades}
4 | \alias{ghfd_read_file.trades}
5 | \title{Reads zip file downloaded from Bovespa ftp (trades) - INTERNAL USE}
6 | \usage{
7 | ghfd_read_file.trades(
8 | out.file,
9 | my.assets = NULL,
10 | type.matching = NULL,
11 | first.time = "10:00:00",
12 | last.time = "17:00:00",
13 | type.output = "agg",
14 | agg.diff = "15 min"
15 | )
16 | }
17 | \arguments{
18 | \item{out.file}{Name of zip file}
19 |
20 | \item{my.assets}{The tickers (symbols) of the desired assets to import data (e.g. c('PETR4', 'VALE5')). The function allows for partial matching (e.g. 'PETR' for all assets related to Petrobras). Default is set to NULL (download all available tickers)}
21 |
22 | \item{type.matching}{Type of matching for asset names in data ('exact' or 'partial')}
23 |
24 | \item{first.time}{The first intraday period to import the data. All trades/orders before this time of day are ignored. As character, e.g. '10:00:00'.}
25 |
26 | \item{last.time}{The last intraday period to import the data. All trades/orders after this time of day are ignored. As character, e.g. '18:00:00'.}
27 |
28 | \item{type.output}{Defines the type of output of the data. The choice 'agg' outputs aggregated data for time intervals defined in agg.diff.
29 | The choice 'raw' outputs the raw, tick by tick/order by order, data from the zip files.}
30 |
31 | \item{agg.diff}{The time interval used in the aggregation of data. Only used for type.output='agg'. It should contain an integer followed by a time unit ('sec' or 'secs', 'min' or 'mins', 'hour' or 'hours', 'day' or 'days').
32 | Example: agg.diff = '15 mins', agg.diff = '1 hour'.}
33 | }
34 | \value{
35 | A dataframe with trade data (aggregated or raw)
36 | }
37 | \description{
38 | Reads zip file downloaded from Bovespa ftp (trades) - INTERNAL USE
39 | }
40 | \examples{
41 |
42 | # no example
43 | }
44 |
--------------------------------------------------------------------------------
/man/organize.lob.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/ghfd_lob_fcts.R
3 | \name{organize.lob}
4 | \alias{organize.lob}
5 | \title{Organizes LOB (internal function)}
6 | \usage{
7 | organize.lob(my.lob, silent = TRUE)
8 | }
9 | \arguments{
10 | \item{my.lob}{A LOB (order book)}
11 |
12 | \item{silent}{Should the function print progress ? (TRUE (default) or FALSE)}
13 | }
14 | \value{
15 | An organized LOB
16 | }
17 | \description{
18 | This internal recursive function organizes the LOB by making sure that all prices and times are ordered.
19 | Every time that prices in the bid and ask match, it creates a trade and modifies the LOB accordingly.
20 | }
21 | \examples{
22 |
23 | # no examples (internal)
24 | }
25 |
--------------------------------------------------------------------------------
/man/print.lob.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/ghfd_lob_fcts.R
3 | \name{print.lob}
4 | \alias{print.lob}
5 | \title{Prints the LOB}
6 | \usage{
7 | \method{print}{lob}(my.lob, max.level = 3)
8 | }
9 | \arguments{
10 | \item{my.lob}{A LOB (order book)}
11 |
12 | \item{max.level}{Max level of lob to print}
13 | }
14 | \value{
15 | nothing
16 | }
17 | \description{
18 | Prints the LOB
19 | }
20 | \examples{
21 | # no example (internal)
22 | }
23 |
--------------------------------------------------------------------------------
/man/process.lob.from.df.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/ghfd_lob_fcts.R
3 | \name{process.lob.from.df}
4 | \alias{process.lob.from.df}
5 | \title{Process LOB from asset dataframe}
6 | \usage{
7 | process.lob.from.df(asset.df, silent = TRUE)
8 | }
9 | \arguments{
10 | \item{asset.df}{A dataframe with orders for a single asset}
11 |
12 | \item{silent}{Should the function print progress ? (TRUE (default) or FALSE)}
13 | }
14 | \value{
15 | The lob for the single asset
16 | }
17 | \description{
18 | Process LOB from asset dataframe
19 | }
20 | \examples{
21 | # no example (internal)
22 | }
23 |
--------------------------------------------------------------------------------
/tests/testthat.R:
--------------------------------------------------------------------------------
1 | library(testthat)
2 | library(GetHFData)
3 |
4 | test_check("GetHFData")
5 |
--------------------------------------------------------------------------------
/tests/testthat/test_ghfd.R:
--------------------------------------------------------------------------------
1 | library(testthat)
2 | library(GetHFData)
3 |
4 | #test_that(desc = 'Test of download function',{
5 | # expect_equal(1, 1) } )
6 |
7 | my.assets <- c('ABEVA20', 'PETRL78')
8 | out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData")
9 |
10 | df.out <- ghfd_read_file(out.file, my.assets)
11 |
12 | test_that(desc = 'Test of read function',{
13 | expect_true(nrow(df.out)>0)
14 | } )
15 |
16 | #cat('\nDeleting test folder')
17 | #unlink(dl.folder, recursive = T)
18 |
19 |
--------------------------------------------------------------------------------
/vignettes/ghfd-vignette-LOB.R:
--------------------------------------------------------------------------------
1 | ## ----notrun, eval=FALSE-------------------------------------------------------
2 | # library(GetHFData)
3 | #
4 | # first.time <- '10:00:00'
5 | # last.time <- '17:00:00'
6 | #
7 | # first.date <- '2016-08-18'
8 | # last.date <- '2016-08-18'
9 | #
10 | # type.output <- 'raw' # raw (order by order) data, no aggregation
11 | #
12 | # my.assets <- 'PETR4F'
13 | # type.matching <- 'exact'
14 | # type.market <- 'equity-odds'
15 | # type.data <- 'orders' # order data
16 | #
17 | # df.out <- ghfd_get_HF_data(my.assets =my.assets,
18 | # type.data= type.data,
19 | # type.matching = type.matching,
20 | # type.market = type.market,
21 | # first.date = first.date,
22 | # last.date = last.date,
23 | # first.time = first.time,
24 | # last.time = last.time,
25 | # type.output = type.output)
26 | #
27 | # df.lob <- ghfd_build_lob(df.out)
28 | #
29 | #
30 |
31 |
--------------------------------------------------------------------------------
/vignettes/ghfd-vignette-LOB.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Recreating the LOB (limit order book)"
3 | author: "Marcelo Perlin"
4 | date: "`r Sys.Date()`"
5 | output: rmarkdown::html_vignette
6 | vignette: >
7 | %\VignetteIndexEntry{Recreating the LOB (limit order book)}
8 | %\VignetteEngine{knitr::rmarkdown}
9 | %\VignetteEncoding{UTF-8}
10 | ---
11 |
12 | Version 1.4 of `GetHFData` adds functions for recreating the LOB (limit order book) from the order data. The LOB is recreated by sorting all trading orders (buy and sell) and matching them whenever their prices match.
13 |
14 | Simulating the LOB is a recursive and computationally intensive problem. The current code is not optimized for speed, and it may take a long time to process even a small set of financial orders.
15 |
16 | Here's an example of usage:
17 |
18 | ```{r notrun, eval=FALSE}
19 | library(GetHFData)
20 |
21 | first.time <- '10:00:00'
22 | last.time <- '17:00:00'
23 |
24 | first.date <- '2016-08-18'
25 | last.date <- '2016-08-18'
26 |
27 | type.output <- 'raw' # raw (non-aggregated) data
28 |
29 | my.assets <- 'PETR4F'
30 | type.matching <- 'exact'
31 | type.market <- 'equity-odds'
32 | type.data <- 'orders' # order data
33 |
34 | df.out <- ghfd_get_HF_data(my.assets =my.assets,
35 | type.data= type.data,
36 | type.matching = type.matching,
37 | type.market = type.market,
38 | first.date = first.date,
39 | last.date = last.date,
40 | first.time = first.time,
41 | last.time = last.time,
42 | type.output = type.output)
43 |
44 | df.lob <- ghfd_build_lob(df.out)
45 |
46 |
47 | ```
48 |
--------------------------------------------------------------------------------
/vignettes/ghfd-vignette-Orders.R:
--------------------------------------------------------------------------------
1 | ## ----notrun, eval=FALSE-------------------------------------------------------
2 | # library(GetHFData)
3 | #
4 | # first.time <- '10:00:00'
5 | # last.time <- '17:00:00'
6 | #
7 | # first.date <- '2015-08-18'
8 | # last.date <- '2015-08-18'
9 | #
10 | # type.output <- 'agg' # aggregates data
11 | # agg.diff <- '5 min' # interval for aggregation
12 | #
13 | # my.assets <- 'PETR' # all options related to Petrobras (partial matching)
14 | # type.matching <- 'partial' # finds tickers from my.assets using partial matching
15 | # type.market <- 'options' # option market
16 | # type.data <- 'orders' # order data
17 | #
18 | # df.out <- ghfd_get_HF_data(my.assets =my.assets,
19 | # type.data= type.data,
20 | # type.matching = type.matching,
21 | # type.market = type.market,
22 | # first.date = first.date,
23 | # last.date = last.date,
24 | # first.time = first.time,
25 | # last.time = last.time,
26 | # type.output = type.output,
27 | # agg.diff = agg.diff)
28 | #
29 |
30 |
--------------------------------------------------------------------------------
/vignettes/ghfd-vignette-Orders.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Downloading and aggregating order data from Bovespa"
3 | author: "Marcelo Perlin"
4 | date: "`r Sys.Date()`"
5 | output: rmarkdown::html_vignette
6 | vignette: >
7 | %\VignetteIndexEntry{Downloading and aggregating order data}
8 | %\VignetteEngine{knitr::rmarkdown}
9 | %\VignetteEncoding{UTF-8}
10 | ---
11 |
12 | Version 1.3 of `GetHFData` makes it possible to download and aggregate order data from Bovespa. The data comprises buy and sell orders sent by market operators. The tabular data includes the type of order (buy or sell, new/update/cancel/..), date/time of submission, priority time, prices, and order quantity, among other information.
13 |
14 | **Be aware that these are very large files.** One day of buy and sell orders in the equity market is around 100 MB zipped and close to 1 GB unzipped. If your computer does not have enough memory to hold this data, **it will crash**.
15 |
16 | Here's an example of usage that will download and aggregate order data for all option contracts related to Petrobras (PETR):
17 |
18 | ```{r notrun, eval=FALSE}
19 | library(GetHFData)
20 |
21 | first.time <- '10:00:00'
22 | last.time <- '17:00:00'
23 |
24 | first.date <- '2015-08-18'
25 | last.date <- '2015-08-18'
26 |
27 | type.output <- 'agg' # aggregates data
28 | agg.diff <- '5 min' # interval for aggregation
29 |
30 | my.assets <- 'PETR' # all options related to Petrobras (partial matching)
31 | type.matching <- 'partial' # finds tickers from my.assets using partial matching
32 | type.market <- 'options' # option market
33 | type.data <- 'orders' # order data
34 |
35 | df.out <- ghfd_get_HF_data(my.assets =my.assets,
36 | type.data= type.data,
37 | type.matching = type.matching,
38 | type.market = type.market,
39 | first.date = first.date,
40 | last.date = last.date,
41 | first.time = first.time,
42 | last.time = last.time,
43 | type.output = type.output,
44 | agg.diff = agg.diff)
45 |
46 | ```
47 |
--------------------------------------------------------------------------------
/vignettes/ghfd-vignette-Trades.R:
--------------------------------------------------------------------------------
1 | ## ----example1-----------------------------------------------------------------
2 | library(GetHFData)
3 |
4 | out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData")
5 | df.tickers <- ghfd_get_available_tickers_from_file(out.file)
6 | print(head(df.tickers)) # show only the first 6 rows
7 |
8 | ## ----example2-----------------------------------------------------------------
9 |
10 | my.assets <- df.tickers$tickers[1:3] # tickers to find in zip file
11 |
12 | type.matching <- 'exact' # defines how to match assets in dataset
13 | first.time <- '10:00:00' # defines first time period of day
14 | last.time <- '17:00:00' # defines last time period of day
15 |
16 | my.df <- ghfd_read_file(out.file,
17 | type.matching = type.matching,
18 | my.assets = my.assets,
19 | first.time = first.time,
20 | last.time = last.time,
21 | type.output = 'raw',
22 | agg.diff = '15 min')
23 |
24 |
25 | ## -----------------------------------------------------------------------------
26 | head(my.df)
27 |
28 | ## -----------------------------------------------------------------------------
29 | names(my.df)
30 |
31 | ## ----plot.prices, fig.width=7, fig.height=2.5---------------------------------
32 | library(ggplot2)
33 |
34 | p <- ggplot(my.df, aes(x = TradeDateTime, y = TradePrice, color = InstrumentSymbol))
35 | p <- p + geom_line()
36 | print(p)
37 |
38 | ## ----notrun, eval=FALSE-------------------------------------------------------
39 | # library(GetHFData)
40 | #
41 | # first.time <- '11:00:00'
42 | # last.time <- '17:00:00'
43 | #
44 | # first.date <- '2015-11-01'
45 | # last.date <- '2015-11-10'
46 | # type.output <- 'agg'
47 | # type.data <- 'trades'
48 | # agg.diff <- '15 min'
49 | #
50 | # # partial matching is available
51 | # my.assets <- c('PETR','VALE')
52 | # type.matching <- 'partial'
53 | # type.market <- 'equity'
54 | #
55 | # df.out <- ghfd_get_HF_data(my.assets =my.assets,
56 | # type.matching = type.matching,
57 | # type.market = type.market,
58 | # type.data = type.data,
59 | # first.date = first.date,
60 | # last.date = last.date,
61 | # first.time = first.time,
62 | # last.time = last.time,
63 | # type.output = type.output,
64 | # agg.diff = agg.diff)
65 | #
66 |
67 |
--------------------------------------------------------------------------------
/vignettes/ghfd-vignette-Trades.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Downloading and aggregating trade data from Bovespa"
3 | author: "Marcelo Perlin"
4 | date: "`r Sys.Date()`"
5 | output: rmarkdown::html_vignette
6 | vignette: >
7 | %\VignetteIndexEntry{Downloading and aggregating trade data}
8 | %\VignetteEngine{knitr::rmarkdown}
9 | %\VignetteEncoding{UTF-8}
10 | ---
11 |
12 | Recently, Bovespa, the Brazilian financial exchange company, allowed external access to its [ftp site](ftp://ftp.bmf.com.br/). At this address one can find a variety of information regarding the Brazilian financial system, including datasets with high frequency (tick by tick) trading data for three different markets: equity, options and BMF.
13 |
14 | Downloading and processing these files, however, can be exhausting. The dataset is composed of zip files with the whole trading data, separated by day and market. These files are huge, and processing or aggregating them in a useful manner requires specific knowledge of the structure of the dataset.
15 |
16 | The package GetHFData makes it easy to access this dataset directly by handling the import and aggregation of the data. With this package the user can:
17 |
18 | * Access the contents of the Bovespa ftp using function `ghfd_get_ftp_contents`
19 | * Get the list of available tickers in the trading data using `ghfd_get_available_tickers_from_ftp`
20 | * Download individual files using `ghfd_download_file`
21 | * Download and process a batch of dates and assets codes with `ghfd_get_HF_data`
22 |
23 | In the next example we will only use a local file from the package. Given the size of the files on the ftp and the CHECK process of CRAN, it makes sense to keep this vignette compact and fast to run. More details about the usage of the package can be found in my [RBFIN paper](http://bibliotecadigital.fgv.br/ojs/index.php/rbfin/article/view/64587/65702).
24 |
25 |
26 | ## Reading trading data from local file (1 date)
27 |
28 | Let's assume you need to analyze high frequency trading data for option contracts on a given date (2015-11-26). This file could be downloaded from the ftp using function `ghfd_download_file`, but it is already available locally within the package.
29 |
30 | The first step is to check the available tickers in the zip file:
31 |
32 | ```{r example1}
33 | library(GetHFData)
34 |
35 | out.file <- system.file("extdata", 'NEG_OPCOES_20151126.zip', package = "GetHFData")
36 | df.tickers <- ghfd_get_available_tickers_from_file(out.file)
37 | print(head(df.tickers)) # show only the first 6 rows
38 | ```
39 |
40 | In `df.tickers` one can find the symbols available in the file and also the number of trades for each. Now, let's take the 3 most traded instruments on that day and check the result of the import process:
41 |
42 | ```{r example2}
43 |
44 | my.assets <- df.tickers$tickers[1:3] # tickers to find in zip file
45 |
46 | type.matching <- 'exact' # defines how to match assets in dataset
47 | first.time <- '10:00:00' # defines first time period of day
48 | last.time <- '17:00:00' # defines last time period of day
49 |
50 | my.df <- ghfd_read_file(out.file,
51 | type.matching = type.matching,
52 | my.assets = my.assets,
53 | first.time = first.time,
54 | last.time = last.time,
55 | type.output = 'raw',
56 | agg.diff = '15 min')
57 |
58 | ```
59 |
60 | Let's see the first part of the imported dataframe.
61 |
62 | ```{r}
63 | head(my.df)
64 | ```
65 |
66 | The column names are self-explanatory:
67 |
68 | ```{r}
69 | names(my.df)
70 | ```
71 |
72 | Now let's plot the prices of all instruments:
73 |
74 | ```{r plot.prices, fig.width=7, fig.height=2.5}
75 | library(ggplot2)
76 |
77 | p <- ggplot(my.df, aes(x = TradeDateTime, y = TradePrice, color = InstrumentSymbol))
78 | p <- p + geom_line()
79 | print(p)
80 | ```
81 |
82 | As we can see, this was a fairly stable day for the price of these option contracts.
83 |
84 | ## Downloading and reading trading data for several dates
85 |
86 | In the last example we only used one date. The package GetHFData also supports batch downloads and processing of several different tickers using start and end dates. In this vignette we are not running the code given the large size of the downloaded files. You should try the next example on your own computer (just copy, paste and run the code in R).
87 |
88 | In this example we will download files from the ftp for all stocks related to Petrobras (PETR) and Vale do Rio Doce (VALE). The data will be processed, resulting in a dataframe with aggregated data.
89 |
90 | ```{r notrun, eval=FALSE}
91 | library(GetHFData)
92 |
93 | first.time <- '11:00:00'
94 | last.time <- '17:00:00'
95 |
96 | first.date <- '2015-11-01'
97 | last.date <- '2015-11-10'
98 | type.output <- 'agg'
99 | type.data <- 'trades'
100 | agg.diff <- '15 min'
101 |
102 | # partial matching is available
103 | my.assets <- c('PETR','VALE')
104 | type.matching <- 'partial'
105 | type.market <- 'equity'
106 |
107 | df.out <- ghfd_get_HF_data(my.assets =my.assets,
108 | type.matching = type.matching,
109 | type.market = type.market,
110 | type.data = type.data,
111 | first.date = first.date,
112 | last.date = last.date,
113 | first.time = first.time,
114 | last.time = last.time,
115 | type.output = type.output,
116 | agg.diff = agg.diff)
117 |
118 | ```
119 |
--------------------------------------------------------------------------------