YEAR: 2020
COPYRIGHT HOLDER: Thierry Warin ; Romain Le Duc

For people less comfortable with R, and to make the package accessible to more users, we have also developed a Shiny application. It relies on the same logic as the package, so researchers can retrieve data from Statistics Canada through it.

statcanR ExploR is available [here](https://warin.ca/shiny/statcanr/).

## Installation

The statcanR package can be installed from CRAN, or from GitHub with devtools.

``` r
# Install from CRAN
install.packages("statcanR")

# Or install from GitHub with devtools
install.packages("devtools")
devtools::install_github('warint/statcanR')
```

## Example

This section presents an example of how to use the `statcanR` R package and its functions `statcan_data()` and `statcan_download_data()`.

The example consists of collecting descriptive statistics about the Canadian labour force at the federal, provincial and industrial levels, on a monthly basis.

With a simple web search such as ‘statistics canada wages by industry metropolitan area monthly’, the table number can easily be found on Statistics Canada’s webpage. For instance, ‘27-10-0014-01’ is the table number for the Federal expenditures on science and technology, by socio-economic objectives.

Once the table number is identified, the `statcan_data()` function collects the data as follows:

``` r
library(statcanR)
mydata <- statcan_data("27-10-0014-01", "eng")
```

The `statcan_download_data()` function is used in exactly the same way. The only difference is that it also saves the data to a csv file, in addition to loading it into your environment.

``` r
library(statcanR)
mydata <- statcan_download_data("27-10-0014-01", "eng")
```

### Video Tutorial

Tutorial made by Professor Charles Saunders, Director of the Master of Financial Economics Program at Western University ([biography](https://economics.uwo.ca/people/faculty/saunders.html)).

Thanks!

All functions

| Function | Description |
|---|---|
| `statcan_data()` | statcanR |
| `statcan_download_data()` | statcanR download data function |
| `statcan_search()` | Searching function for statcanR |

Copyright (c) 2020 Thierry Warin ; Romain Le Duc

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Easily connect to Statistics Canada's Web Data Service with R. Open economic data (formerly known as CANSIM tables, now identified by Product IDs (PID)) are accessible as a data frame, directly in the user's R environment.

Usage: `statcan_data(tableNumber, lang)`

Arguments:

- `tableNumber`: The table number of the Statistics Canada data table
- `lang`: The language wanted

Value: The output will be a data table representing the data associated with the chosen table number.

The `statcan_data()` function has 2 arguments to fill in to get data: `tableNumber` and `lang`.

The `tableNumber` argument simply refers to the table number of the Statistics Canada data table a user wants to collect, such as '27-10-0014-01' for the Federal expenditures on science and technology, by socio-economic objectives.

To get the table number: https://www150.statcan.gc.ca/n1/en/type/data.

The second argument, `lang`, refers to the language. As Canada is a bilingual country, Statistics Canada displays all the economic data in both languages. Therefore, users can choose to collect statistics data tables in French or English by setting the `lang` argument to 'fra' or 'eng'.

Example: `mydata <- statcan_data('27-10-0014-01', 'eng')`

## Example

This section presents an example of how to use the `statcanR` R package and its functions: `statcan_search()`, `statcan_data()`, and `statcan_download_data()`.

The example consists of collecting descriptive statistics about the Canadian labour force at the federal, provincial and industrial levels, on a monthly basis.

To identify a relevant table, call `statcan_search()` with a keyword or a set of keywords and the language in which the data will be presented (English or French). The example below lists the data tables we could be interested in:

``` r
library(statcanR)
statcan_search(c("federal", "expenditures", "objectives"), "eng")
```

Notice that the unique table number identifier is also shown for each matching table. Let's focus on the first of the two tables that appear, which contains data on Federal expenditures on science and technology, by socio-economic objectives. Once this table number is identified (‘27-10-0014-01’), the `statcan_data()` function collects the data as follows:

``` r
library(statcanR)
mydata <- statcan_data("27-10-0014-01", "eng")
```

statcanR download data function

Usage: `statcan_download_data(tableNumber, lang)`

Arguments:

- `tableNumber`: The table number of the Statistics Canada data table
- `lang`: The language wanted

Value: The output will be a data table and csv file representing the data associated with the chosen table number.

Example:

    mydata <- statcan_data('27-10-0014-01', 'eng')
    #> statcanR: downloading remote table.
    #> Warning: One or more parsing issues, call `problems()` on your data frame for details,
    #> e.g.:
    #>   dat <- vroom(...)
    #>   problems(dat)
    #> Rows: 61 Columns: 10
    #> ── Column specification ────────────────────────────────────────────────────────
    #> Delimiter: ","
    #> chr  (8): Cube Title, Product Id, CANSIM Id, URL, Cube Notes, Archive Status...
    #> num  (1): Total number of dimensions
    #> date (1): End Reference Period
    #>
    #> ℹ Use `spec()` to retrieve the full column specification for this data.
    #> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The `statcan_search()` function has 2 arguments to fill in to find a data table: `keywords` and `lang`.

The `keywords` argument refers to words that can be found in either the title or the description of the data table. For example, the keywords "economy", "export", and "link" will bring up the title, table id, description, and release date of the data tables that include these keywords. In this case, only one data table ("Supply and use tables, link-1997 level") is returned, as it is the only data table containing all three words.

Usage: `statcan_search(keywords, lang)`

Arguments:

- `keywords`: The words that appear in the title or description of the data table
- `lang`: The language wanted

Value: The output will be the title, id, description, and release date of a table

Easily connect to Statistics Canada's Web Data Service with R. Open economic data (formerly known as CANSIM tables, now identified by Product IDs (PID)) are accessible as a data frame, directly in the user's R environment.

Example: `statcan_search(c("economy","export","link"),"eng")`

vignettes/statCanR.Rmd

statcanR provides the R user with a consistent process to identify and collect data from Statistics Canada’s data portal. It allows users to search for and access all of Statistics Canada’s open economic data (formerly known as CANSIM tables), now identified by product IDs (PID) in Statistics Canada’s new Web Data Service: https://www150.statcan.gc.ca/n1/en/type/data.

First, install statcanR:

`devtools::install_github("warint/statcanR")`

Next, call statcanR to make sure everything is installed correctly.

This section presents an example of how to use the statcanR R package and its functions: `statcan_search()` and `statcan_data()`.

The following example is provided to illustrate how to use these functions. It consists of collecting some descriptive statistics about the Canadian labour force at the federal, provincial and industrial levels, on a monthly basis.

To identify a relevant table, call `statcan_search()` with a keyword or set of keywords and the language to search in (English or French). The call below lists the data tables we could be interested in:

`statcan_search(c("federal","expenditures"),"eng")`

Notice that the unique table number identifier is also shown for each matching table. Let’s focus on the first of the two tables that appear, which contains data on Federal expenditures on science and technology, by socio-economic objectives. Once this table number is identified (‘27-10-0014-01’), the data can be collected with `statcan_data()` or, as in the call below, with `statcan_download_data()`:

`mydata <- statcan_download_data("27-10-0014-01", "eng")`

Should you want the same data in French, just replace the last argument of the function with “fra”.

This section describes the code architecture of the statcanR package.

The `statcan_search()` function has 2 arguments:

- `keywords`
- `lang`

The `keywords` argument is useful for identifying what kind of open economic data Statistics Canada has available for users. It can take as input either a single word (e.g., `statcan_search("expenditures")`) or a vector of multiple keywords (such as `statcan_search(c("expenditures","federal","objectives"))`).

The second argument, `lang`, refers to the language. As Canada is a bilingual country, Statistics Canada displays all the economic data in both languages. Therefore, users can see what data tables are available in French or English by setting the `lang` argument to “fra” or “eng”.

For all data tables that contain these words in either their title or description, the resulting output includes four pieces of information: the name of the data table, its unique table number identifier, the description, and the release date.

The `statcan_download_data()` function has 2 arguments:

- `tableNumber`
- `lang`

The `tableNumber` argument simply refers to the table number of the Statistics Canada data table a user wants to collect, such as ‘27-10-0014-01’ for the Federal expenditures on science and technology, by socio-economic objectives.

Just as in `statcan_search()`, `lang` refers to the language and can be set to “eng” for English or “fra” for French, so users can choose to collect statistics data tables in either language.

The code architecture of the `statcan_download_data()` function is as follows. The first step is to clean the table number in order to align it with the official typology of Statistics Canada’s Web Data Service. The second step is to create a temporary folder where all the data wrangling operations are performed. The third step is to check and select the correct language. The fourth step is to define the right URL where the actual data table is stored and then to download the .zip container. The fifth step is to unzip the previously downloaded .zip file to extract the data and metadata from the .csv files. The final step is to load the statistics data into a data frame called ‘data’ and to add the official table indicator name in the new ‘INDICATOR’ column.

To be more precise about the functions, below is some further code description:

`keyword_regex <- paste0("(", paste(keywords, collapse = "|"), ")", collapse = ".*")`: The first step is to paste together the keywords the user is interested in (whether a single keyword or a vector of keywords).

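As a quick illustration of what that expression evaluates to, here is a small sketch using two example keywords. The keyword values and the test strings are chosen for illustration only. Because the inner `paste()` already returns a single string, the outer `collapse = ".*"` has nothing left to join.

``` r
# Example keywords, as a user might pass them to statcan_search()
keywords <- c("federal", "expenditures")

# Same expression as the one quoted from the package above
keyword_regex <- paste0("(", paste(keywords, collapse = "|"), ")", collapse = ".*")
print(keyword_regex)
# "(federal|expenditures)"

# An alternation matches any string that contains at least one keyword
# (grepl is case-sensitive by default)
hits <- grepl(keyword_regex, c("federal budget", "provincial wages"))
print(hits)
```
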
`matches <- apply(statcan_data, 1, function(row) { all(sapply(keywords, function(x) { grepl(x, paste(as.character(row), collapse = " ")) })) })`: Next, the matching relies on the sapply function, which checks every keyword against all the character strings of a row pasted together. Therefore, a keyword could be found in the title column or the description column (technically, it could even be in the “release date” or “id” columns).

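The row-wise matching can be tried on a toy table. The sketch below reuses the quoted logic on a made-up data frame: `toy_tables`, its titles, ids and descriptions are hypothetical and only stand in for the table list the package searches.

``` r
# Hypothetical stand-in for the table list; all values are made up
toy_tables <- data.frame(
  title = c("Federal expenditures on science", "Provincial wages", "Federal wages"),
  id = c("00-00-0001-01", "00-00-0002-01", "00-00-0003-01"),
  description = c("expenditures by objectives", "wages by industry", "wages of federal employees"),
  stringsAsFactors = FALSE
)
keywords <- c("Federal", "expenditures")

# A row matches only if ALL keywords occur somewhere in its pasted text
matches <- apply(toy_tables, 1, function(row) {
  row_text <- paste(as.character(row), collapse = " ")
  all(sapply(keywords, function(x) grepl(x, row_text)))
})

# Keep only the matching rows
filtered_data <- toy_tables[matches, ]
print(filtered_data$title)
```

Only the first row contains both keywords, so it is the only one kept. Since `grepl()` is case-sensitive by default, "federal" in the third description does not count as a match for "Federal".
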
`filtered_data <- statcan_data[matches, ]`: The third step keeps only the rows where matches were found.

`datatable(filtered_data, options = list(pageLength = 10))`: Finally, the datatable command presents the filtered data in a clean table.

`tableNumber <- gsub("-", "", substr(tableNumber, 1, nchar(tableNumber)-2))`: The first step is to clean the table number provided by the user in order to collect the overall data table related to the specific indicator the user is interested in. In fact, each indicator is an excerpt of the overall table. In Statistics Canada’s new Web Data Service, the csv files are stored under URLs that use the table number without any ‘-’. Also, the last 2 digits after the last ‘-’ refer to the specific excerpt of the original table. Therefore, following the Web Data Service’s typology, the function first removes the ‘-’ and the last 2 digits from the user’s selection.

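The cleaning step is easy to check in isolation. The sketch below wraps the quoted expression in a helper named `clean_table_number()`, a hypothetical name used here for illustration only, and applies it to the table number from the example.

``` r
# Hypothetical helper wrapping the expression quoted above
clean_table_number <- function(tableNumber) {
  # Drop the last two characters (the excerpt digits), then remove every "-"
  gsub("-", "", substr(tableNumber, 1, nchar(tableNumber) - 2))
}

cleaned <- clean_table_number("27-10-0014-01")
print(cleaned)
# "27100014"
```
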
`if(lang == "eng")` or `if(lang == "fra")`: The second step is the ‘if statement’ that selects the data in the correct language.

`urlFra <- paste0("https://www150.statcan.gc.ca/n1/fr/tbl/csv/", tableNumber, "-fra.zip")`: The third step is to create the correct URL in order to download the respective .zip file from the Statistics Canada Web Data Service.

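The language branch and the URL construction can be sketched together as follows. The helper name `build_zip_url()` is hypothetical, and the English URL pattern is an assumption made by analogy with the French one quoted above. Only the French pattern comes from the vignette.

``` r
# Hypothetical helper; not part of statcanR
build_zip_url <- function(tableNumber, lang) {
  if (lang == "eng") {
    # ASSUMPTION: English pattern mirrors the French one ("en" / "-eng")
    paste0("https://www150.statcan.gc.ca/n1/en/tbl/csv/", tableNumber, "-eng.zip")
  } else if (lang == "fra") {
    # Pattern quoted in the vignette
    paste0("https://www150.statcan.gc.ca/n1/fr/tbl/csv/", tableNumber, "-fra.zip")
  } else {
    stop("lang must be 'eng' or 'fra'")
  }
}

url_fra <- build_zip_url("27100014", "fra")
print(url_fra)
```

Stopping on any other value of `lang` is a design choice of this sketch, so that a typo fails early instead of producing an invalid URL.
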
`download(urlFra, download_dir, mode = "wb")`: The fourth step is a simple downloading function that fetches the .zip file and saves it into a temporary folder.

`unzip(zipfile = download_dir, exdir = unzip_dir)`: The fifth step consists in unzipping the .zip file. The unzipping process gives access to two different .csv files: the overall data table and the metadata table.

`data.table::fread(csv_file)`: The sixth step consists in loading the data table into a single data frame. The fread() function from the data.table package is used for its higher performance.

`data$INDICATOR <- as.character(0)` and `data$INDICATOR <- as.character(read.csv(paste0(path,"/temp/", tableNumber, "_MetaData.csv"))[1,1])`: The seventh step of the statcan_data() function consists in adding the name of the table, taken from the metadata table.

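To make the seventh step concrete, here is a minimal offline sketch of adding the table name from the metadata. The two inline CSV strings are made-up stand-ins for the unzipped data and metadata files, and base `read.csv()` is used instead of `fread()` to keep the sketch dependency-free.

``` r
# Made-up stand-ins for the two unzipped .csv files
data_csv <- "REF_DATE,GEO,VALUE\n2019,Canada,1\n2020,Canada,2"
meta_csv <- "Cube Title,Product Id\nExample table title,12345678"

data <- read.csv(text = data_csv, stringsAsFactors = FALSE)
meta <- read.csv(text = meta_csv, stringsAsFactors = FALSE)

# The first cell of the metadata holds the table title;
# it is recycled to every row of the data
data$INDICATOR <- as.character(meta[1, 1])
print(data)
```
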
`unlink(tempdir())`: The eighth step deletes the temporary folder used to download, unzip, extract and load the data.

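One detail worth knowing about this step: in base R, `unlink()` only removes a directory when `recursive = TRUE` is set. The sketch below shows one conservative cleanup pattern, not the package's own code: work in a dedicated subfolder of the session's temporary directory and remove just that subfolder on exit. The helper name `with_scratch_dir()` is hypothetical.

``` r
# Hypothetical helper; not part of statcanR
with_scratch_dir <- function(work) {
  scratch <- tempfile("statcan_")  # unique path inside tempdir()
  dir.create(scratch)
  # Remove the scratch folder and its contents when the function exits
  on.exit(unlink(scratch, recursive = TRUE), add = TRUE)
  work(scratch)
}

# Write a placeholder file in the scratch folder and return the folder path
leftover <- with_scratch_dir(function(dir) {
  writeLines("placeholder", file.path(dir, "table.csv"))
  dir
})

# The scratch folder is gone once the helper has returned
print(dir.exists(leftover))
```
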
`return(Data)`: Finally, the last step of the statcan_data() function returns the value into the user’s environment.