├── .Rbuildignore ├── .gitignore ├── tests ├── testthat.R └── testthat │ └── test-fodrdataset.R ├── inst └── images │ ├── Screenshot 2018-12-28 15-42-00.png │ └── Screenshot 2018-12-28 15-43-36.png ├── R ├── list_portals.R ├── fodr.R ├── doc_get_attachments.R ├── doc_portal_search.R ├── doc_get_records.R ├── FODRPortal.R ├── utils.R └── FODRDataset.R ├── NAMESPACE ├── man ├── list_portals.Rd ├── fodr_portal.Rd ├── fodr.Rd ├── fodr_dataset.Rd ├── get_attachments.Rd ├── FODRPortal.Rd ├── portal_search.Rd ├── FODRDataset.Rd └── get_records.Rd ├── fodr.Rproj ├── DESCRIPTION ├── LICENSE ├── README.Rmd └── README.md /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^README\.Rmd$ 2 | ^.*\.Rproj$ 3 | ^\.Rproj\.user$ 4 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | .Rprofile 6 | -------------------------------------------------------------------------------- /tests/testthat.R: -------------------------------------------------------------------------------- 1 | library(testthat) 2 | library(fodr) 3 | 4 | test_check("fodr") 5 | -------------------------------------------------------------------------------- /inst/images/Screenshot 2018-12-28 15-42-00.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Tutuchan/fodr/HEAD/inst/images/Screenshot 2018-12-28 15-42-00.png -------------------------------------------------------------------------------- /inst/images/Screenshot 2018-12-28 15-43-36.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Tutuchan/fodr/HEAD/inst/images/Screenshot 2018-12-28 15-43-36.png -------------------------------------------------------------------------------- /R/list_portals.R: -------------------------------------------------------------------------------- 1 | #' list the available portals 2 | #' 3 | #' This function displays a data.frame of the available portals. 4 | #' 5 | #' @export 6 | list_portals <- function(){ 7 | portals() 8 | } 9 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | export(FODRDataset) 4 | export(FODRPortal) 5 | export(fodr_dataset) 6 | export(fodr_portal) 7 | export(list_portals) 8 | importFrom(magrittr,"%$%") 9 | importFrom(magrittr,"%>%") 10 | -------------------------------------------------------------------------------- /man/list_portals.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/list_portals.R 3 | \name{list_portals} 4 | \alias{list_portals} 5 | \title{list the available portals} 6 | \usage{ 7 | list_portals() 8 | } 9 | \description{ 10 | This function displays a data.frame of the available portals. 
11 | } 12 | -------------------------------------------------------------------------------- /fodr.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | BuildType: Package 16 | PackageUseDevtools: Yes 17 | PackageInstallArgs: --no-multiarch --with-keep.source 18 | PackageRoxygenize: rd,collate,namespace 19 | -------------------------------------------------------------------------------- /man/fodr_portal.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/FODRPortal.R 3 | \name{fodr_portal} 4 | \alias{fodr_portal} 5 | \title{initialize a portal} 6 | \usage{ 7 | fodr_portal(portal) 8 | } 9 | \arguments{ 10 | \item{portal}{a character} 11 | } 12 | \description{ 13 | A wrapper around \code{FODRPortal$new(portal)} for convenience. 14 | } 15 | \examples{ 16 | \dontrun{ 17 | portal <- fodr_portal("paris") 18 | portal 19 | } 20 | 21 | } 22 | -------------------------------------------------------------------------------- /R/fodr.R: -------------------------------------------------------------------------------- 1 | #' fodr: fetch French Open Data with R 2 | #' 3 | #' Fetch data from various French Open Data portals. Use 4 | #' \code{\link{fodr_dataset}} and \code{\link{fodr_portal}} to retrieve records. 5 | #' 6 | #' @section Constants: 7 | #' \describe{ 8 | #' \item{MAX_API_RECORDS = 10000}{the OpenDataSoft \code{search} API has a limit for the number of rows that can be returned} 9 | #' } 10 | #' 11 | #' @importFrom magrittr %>% %$% 12 | #' 13 | #' @name fodr 14 | globalVariables(c("lat", "lon", "polygon")) 15 | -------------------------------------------------------------------------------- /man/fodr.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/fodr.R 3 | \name{fodr} 4 | \alias{fodr} 5 | \title{fodr: fetch French Open Data with R} 6 | \description{ 7 | Fetch data from various French Open Data portals. Use 8 | \code{\link{fodr_dataset}} and \code{\link{fodr_portal}} to retrieve records. 9 | } 10 | \section{Constants}{ 11 | 12 | \describe{ 13 | \item{MAX_API_RECORDS = 10000}{the OpenDataSoft \code{search} API has a limit for the number of rows that can be returned} 14 | } 15 | } 16 | 17 | -------------------------------------------------------------------------------- /man/fodr_dataset.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/FODRDataset.R 3 | \name{fodr_dataset} 4 | \alias{fodr_dataset} 5 | \title{initialize a dataset} 6 | \usage{ 7 | fodr_dataset(portal, id) 8 | } 9 | \arguments{ 10 | \item{portal}{a character in \code{\link{list_portals}}} 11 | 12 | \item{id}{a character} 13 | } 14 | \description{ 15 | A wrapper around \code{FODRDataset$new(portal, id)} for convenience. 
16 | } 17 | \examples{ 18 | \dontrun{ 19 | votes <- fodr_dataset("paris", "resultats-des-votes-budget-participatif-2016") 20 | votes 21 | } 22 | 23 | } 24 | -------------------------------------------------------------------------------- /R/doc_get_attachments.R: -------------------------------------------------------------------------------- 1 | #' fetch dataset attachments 2 | #' 3 | #' This method is used to retrieve attachments from a dataset. 4 | #' 5 | #' @param fname a character, the title of the file in the \code{attachments} element of the \code{info} field 6 | #' @param output a character, the destination file name, if NULL (the default) it will be the same as \code{fname} 7 | #' 8 | #' @name get_attachments 9 | #' @examples 10 | #' \donttest{ 11 | #' horodateurs <- fodr_dataset("paris", "horodateurs-transactions-de-paiement") 12 | #' horodateurs$get_attachments(fname = "NOTICE_horodateurs.pdf") 13 | #' } 14 | NULL 15 | -------------------------------------------------------------------------------- /tests/testthat/test-fodrdataset.R: -------------------------------------------------------------------------------- 1 | context("test-fodrdataset") 2 | 3 | test_that("fodr_dataset works", { 4 | expect_is(fodr_dataset("ods", "correspondance-code-insee-code-postal"), "FODRDataset") 5 | }) 6 | 7 | test_that("get_records works", { 8 | dts1 <- fodr_dataset("ods", "correspondance-code-insee-code-postal") 9 | dts2 <- fodr_dataset("ods", "geoflar-departements") 10 | expect_is(df <- dts1$get_records(nrows = 10), "tbl_df") 11 | expect_equal(nrow(df), 10) 12 | expect_true("geo_shape" %in% names(dts2$get_records(nrow = 10))) 13 | if (requireNamespace("sf", quietly = TRUE)) { 14 | expect_is(df[["geo_shape"]][[1]], "sfg") 15 | } 16 | }) -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: fodr 2 | Type: Package 3 | Title: Interact with French open data services 4 | Version: 0.2.0.9001 5 | Authors@R: as.person(c( 6 | "Pierre Formont [aut, cre]", 7 | "Denis Roussel [aut]" 8 | )) 9 | Description: Connect to various French open data providers and fetch the 10 | available data. 11 | URL: https://github.com/Tutuchan/fodr 12 | BugReports: https://github.com/Tutuchan/fodr/issues 13 | License: MIT + file LICENSE 14 | LazyData: TRUE 15 | Imports: 16 | dplyr, 17 | httr, 18 | jsonlite, 19 | magrittr, 20 | purrr, 21 | R6, 22 | tibble, 23 | tidyr 24 | RoxygenNote: 6.1.1 25 | Suggests: 26 | testthat, 27 | sf 28 | Date: 2018-12-28 29 | -------------------------------------------------------------------------------- /man/get_attachments.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/doc_get_attachments.R 3 | \name{get_attachments} 4 | \alias{get_attachments} 5 | \title{fetch dataset attachments} 6 | \arguments{ 7 | \item{fname}{a character, the title of the file in the \code{attachments} element of the \code{info} field} 8 | 9 | \item{output}{a character, the destination file name, if NULL (the default) it will be the same as \code{fname}} 10 | } 11 | \description{ 12 | This method is used to retrieve attachments from a dataset. 
13 | } 14 | \examples{ 15 | \donttest{ 16 | horodateurs <- fodr_dataset("paris", "horodateurs-transactions-de-paiement") 17 | horodateurs$get_attachments(fname = "NOTICE_horodateurs.pdf") 18 | } 19 | } 20 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License 2 | 3 | Copyright (c) 2016 Pierre Formont 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 6 | 7 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 10 | -------------------------------------------------------------------------------- /man/FODRPortal.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/FODRPortal.R 3 | \docType{class} 4 | \name{FODRPortal} 5 | \alias{FODRPortal} 6 | \title{main portal class} 7 | \format{An object of class \code{R6ClassGenerator} of length 24.} 8 | \value{ 9 | An object of class \code{\link{FODRPortal}} with methods designed to retrieve datasets from an open data portal. 10 | } 11 | \description{ 12 | This is the entry point to retrieve datasets from a portal. Initialize a \code{FODRPortal} with a \code{portal} using the \code{\link{fodr_portal}} wrapper. 13 | } 14 | \section{Fields}{ 15 | 16 | \describe{ 17 | \item{\code{portal}}{a character, the portal name} 18 | 19 | \item{\code{data}}{a list of \code{\link{FODRDataset}} objects} 20 | 21 | \item{\code{facets}}{a character vector of variables that can be used to filter results. For a 22 | \code{\link{FODRPortal}}, these are constant.} 23 | 24 | \item{\code{sortables}}{a character vector, indicates on which fields sorting is allowed. For a 25 | \code{\link{FODRPortal}}, these are constant.} 26 | 27 | \item{\code{themes}}{a character vector, the unique themes datasets on the portals can be associated with} 28 | }} 29 | 30 | \section{Methods}{ 31 | 32 | \describe{ 33 | \item{\code{\link[=portal_search]{search}}}{This method retrieves datasets from the portal.}} 34 | } 35 | 36 | \keyword{datasets} 37 | -------------------------------------------------------------------------------- /R/doc_portal_search.R: -------------------------------------------------------------------------------- 1 | #' fetch datasets on a portal 2 | #' 3 | #' This method is used to retrieve datasets from a portal. 4 | #' 5 | #' \code{refine} and \code{exclude}, if set, must be named lists where the names are the 6 | #' facets to use and the values of the list, the values to pick or exclude. 
7 | #' 8 | #' \code{sort} takes a character in the \code{sortables} element of the portal object and sorts 9 | #' the results according to its value. Add a \code{-} in front in order to sort in descending 10 | #' order (e.g. \code{sort = "-commune"}). 11 | #' 12 | #' \code{q} is used to perform a full text-search in all elements of the dataset. To search for all 13 | #' datasets containing the word "Paris", use \code{q = "Paris"}. See 14 | #' \href{https://docs.opendatasoft.com/en/api/query_language_and_geo_filtering.html#query-language}{here} for more information. 15 | #' 16 | #' \code{lang} can be set to use language-specific functions on the elements passed to the \code{q} 17 | #' parameter but is not implemented yet. 18 | #' 19 | #' \code{theme} can be set to filter only datasets with a specific theme. 20 | #' 21 | #' @param nrows an integer, indicates the number of records to fetch (defaults to NULL, i.e. all matching records are fetched) 22 | #' @param refine a named list 23 | #' @param exclude a named list 24 | #' @param sort a character 25 | #' @param q a character, used to do full-text search 26 | #' @param lang a character, the language used in the \code{q} parameter 27 | #' @param theme a character, one of the themes of the portal 28 | #' @name portal_search 29 | #' @examples 30 | #' \donttest{ 31 | #' portal$search(nrows = NULL, refine = NULL, exclude = NULL, sort = NULL, 32 | #' q = NULL, lang = NULL, theme = NULL) 33 | #' } 34 | NULL 35 | -------------------------------------------------------------------------------- /man/portal_search.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/doc_portal_search.R 3 | \name{portal_search} 4 | \alias{portal_search} 5 | \title{fetch datasets on a portal} 6 | \arguments{ 7 | \item{nrows}{an integer, indicates the number of records to fetch (defaults to NULL, i.e. all matching records are fetched)} 8 | 9 | \item{refine}{a named list} 10 | 11 | \item{exclude}{a named list} 12 | 13 | \item{sort}{a character} 14 | 15 | \item{q}{a character, used to do full-text search} 16 | 17 | \item{lang}{a character, the language used in the \code{q} parameter} 18 | 19 | \item{theme}{a character, one of the themes of the portal} 20 | } 21 | \description{ 22 | This method is used to retrieve datasets from a portal. 23 | } 24 | \details{ 25 | \code{refine} and \code{exclude}, if set, must be named lists where the names are the 26 | facets to use and the values of the list, the values to pick or exclude. 27 | 28 | \code{sort} takes a character in the \code{sortables} element of the portal object and sorts 29 | the results according to its value. Add a \code{-} in front in order to sort in descending 30 | order (e.g. \code{sort = "-commune"}). 31 | 32 | \code{q} is used to perform a full text-search in all elements of the dataset. To search for all 33 | datasets containing the word "Paris", use \code{q = "Paris"}. See 34 | \href{https://docs.opendatasoft.com/en/api/query_language_and_geo_filtering.html#query-language}{here} for more information. 35 | 36 | \code{lang} can be set to use language-specific functions on the elements passed to the \code{q} 37 | parameter but is not implemented yet. 38 | 39 | \code{theme} can be set to filter only datasets with a specific theme. 
40 | } 41 | \examples{ 42 | \donttest{ 43 | portal$search(nrows = NULL, refine = NULL, exclude = NULL, sort = NULL, 44 | q = NULL, lang = NULL, theme = NULL) 45 | } 46 | } 47 | -------------------------------------------------------------------------------- /man/FODRDataset.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/FODRDataset.R 3 | \docType{class} 4 | \name{FODRDataset} 5 | \alias{FODRDataset} 6 | \title{main dataset class} 7 | \format{An object of class \code{R6ClassGenerator} of length 24.} 8 | \value{ 9 | An object of class \code{\link{FODRDataset}} with methods designed to retrieve data from an open dataset. 10 | } 11 | \description{ 12 | This is the entry point to retrieve records from a dataset. 13 | Initialize a \code{FODRDataset} with a \code{portal} and an \code{id} using the \code{\link{fodr_dataset}} wrapper. 14 | } 15 | \section{Fields}{ 16 | 17 | \describe{ 18 | \item{\code{portal}}{a character, must be one of the available portals} 19 | 20 | \item{\code{data}}{a data.frame returned by the \code{\link[=get_records]{get_records}} method} 21 | 22 | \item{\code{fields}}{a character vector} 23 | 24 | \item{\code{facets}}{a character vector or variables that can be used to filter results} 25 | 26 | \item{\code{id}}{a character, the dataset id} 27 | 28 | \item{\code{info}}{a list of five elements: 29 | \itemize{ 30 | \item features: a character vector of available services on this dataset, \emph{unused} 31 | \item metas: a list of meta-information about the dataset (publisher, language, theme, etc.) 32 | \item attachments: a list of downloadable files related to this dataset, if any, \emph{not downloadable from R} 33 | \item alternative_exports: \emph{unknown purpose} 34 | \item billing_plans: \emph{unknown purpose} 35 | }} 36 | 37 | \item{\code{sortables}}{a character vector containing a subset of \strong{fields}, indicates on which fields sorting is allowed} 38 | 39 | \item{\code{url}}{a character, the actual url sent to the API} 40 | }} 41 | 42 | \section{Methods}{ 43 | 44 | \describe{ 45 | \item{\code{\link{get_attachments}}}{This method retrieves attachments from the dataset.} 46 | \item{\code{\link{get_records}}}{This method retrieves records from the dataset.} 47 | } 48 | } 49 | 50 | \keyword{datasets} 51 | -------------------------------------------------------------------------------- /R/doc_get_records.R: -------------------------------------------------------------------------------- 1 | #' fetch dataset records 2 | #' 3 | #' This method is used to retrieve records for a specific dataset. If the number of rows to return 4 | #' is higher than the maximum allowed by the \code{search} API (see the constants section in \code{\link{fodr}}), 5 | #' the \code{download} API is used. 6 | #' 7 | #' \code{refine} and \code{exclude}, if set, must be named lists where the names are the 8 | #' facets to use and the values of the list, the values to pick or exclude. For example, 9 | #' if a dataset has the \code{type_dossier} facet and you want to keep only the \code{DP} 10 | #' types, you should set \code{refine = list(type_dossier = "DP")}. 11 | #' 12 | #' \code{sort} takes a character in the \code{sortables} element of the dataset and sorts 13 | #' the results according to its value. Add a \code{-} in front in order to sort in descending 14 | #' order (e.g. \code{sort = "-commune"}). 
15 | #' 16 | #' \code{q} is used to perform a full text-search in all elements of the dataset. To search for all 17 | #' records containing the word "Paris", use \code{q = "Paris"}. See 18 | #' \href{https://docs.opendatasoft.com/en/api/query_language_and_geo_filtering.html#query-language}{here} for more information. 19 | #' 20 | #' \code{lang} can be set to use language-specific functions on the elements passed to the \code{q} 21 | #' parameter but is not implemented yet. 22 | #' 23 | #' \code{geofilter.distance} can be used to retrieve only the records that are within the 24 | #' specified distance from the specified point, if applicable. 25 | #' 26 | #' \code{geofilter.polygon} can be used to retrieve only the records that are within the 27 | #' specified polygon, if applicable. 28 | #' 29 | #' @param nrows an integer, indicates the number of records to fetch (defaults to 10) 30 | #' @param refine a named list 31 | #' @param exclude a named list 32 | #' @param sort a character 33 | #' @param q a character, used to do full-text search 34 | #' @param lang a character, the language used in the \code{q} parameter 35 | #' @param geofilter.distance a numeric vector of three elements in the \code{(latitude, longitude, distance (in meters))} 36 | #' format (e.g. \code{c(48.57, 2.24, 500)}) 37 | #' @param geofilter.polygon a data.frame with two columns named \code{lat} and \code{lon} 38 | #' @param debug a logical, if TRUE, prints the url sent to the portal 39 | #' @param quiet a logical, if FALSE, information will be printed when using the \code{download} API 40 | #' 41 | #' @examples 42 | #' \donttest{ 43 | #' votes <- fodr_dataset("paris", "resultats-des-votes-budget-participatif-2016") 44 | #' votes$get_records() 45 | #' } 46 | #' @name get_records 47 | NULL 48 | -------------------------------------------------------------------------------- /man/get_records.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/doc_get_records.R 3 | \name{get_records} 4 | \alias{get_records} 5 | \title{fetch dataset records} 6 | \arguments{ 7 | \item{nrows}{an integer, indicates the number of records to fetch (defaults to 10)} 8 | 9 | \item{refine}{a named list} 10 | 11 | \item{exclude}{a named list} 12 | 13 | \item{sort}{a character} 14 | 15 | \item{q}{a character, used to do full-text search} 16 | 17 | \item{lang}{a character, the language used in the \code{q} parameter} 18 | 19 | \item{geofilter.distance}{a numeric vector of three elements in the \code{(latitude, longitude, distance (in meters))} 20 | format (e.g. \code{c(48.57, 2.24, 500)})} 21 | 22 | \item{geofilter.polygon}{a data.frame with two columns named \code{lat} and \code{lon}} 23 | 24 | \item{debug}{a logical, if TRUE, prints the url sent to the portal} 25 | 26 | \item{quiet}{a logical, if FALSE, information will be printed when using the \code{download} API} 27 | } 28 | \description{ 29 | This method is used to retrieve records for a specific dataset. If the number of rows to return 30 | is higher than the maximum allowed by the \code{search} API (see the constants section in \code{\link{fodr}}), 31 | the \code{download} API is used. 32 | } 33 | \details{ 34 | \code{refine} and \code{exclude}, if set, must be named lists where the names are the 35 | facets to use and the values of the list, the values to pick or exclude. 
For example, 36 | if a dataset has the \code{type_dossier} facet and you want to keep only the \code{DP} 37 | types, you should set \code{refine = list(type_dossier = "DP")}. 38 | 39 | \code{sort} takes a character in the \code{sortables} element of the dataset and sorts 40 | the results according to its value. Add a \code{-} in front in order to sort in descending 41 | order (e.g. \code{sort = "-commune"}). 42 | 43 | \code{q} is used to perform a full text-search in all elements of the dataset. To search for all 44 | records containing the word "Paris", use \code{q = "Paris"}. See 45 | \href{https://docs.opendatasoft.com/en/api/query_language_and_geo_filtering.html#query-language}{here} for more information. 46 | 47 | \code{lang} can be set to use language-specific functions on the elements passed to the \code{q} 48 | parameter but is not implemented yet. 49 | 50 | \code{geofilter.distance} can be used to retrieve only the records that are within the 51 | specified distance from the specified point, if applicable. 52 | 53 | \code{geofilter.polygon} can be used to retrieve only the records that are within the 54 | specified polygon, if applicable. 55 | } 56 | \examples{ 57 | \donttest{ 58 | votes <- fodr_dataset("paris", "resultats-des-votes-budget-participatif-2016") 59 | votes$get_records() 60 | } 61 | } 62 | -------------------------------------------------------------------------------- /R/FODRPortal.R: -------------------------------------------------------------------------------- 1 | #' main portal class 2 | #' 3 | #' This is the entry point to retrieve datasets from a portal. Initialize a \code{FODRPortal} with a \code{portal} using the \code{\link{fodr_portal}} wrapper. 4 | #' 5 | #' @docType class 6 | #' @field portal a character, the portal name 7 | #' @field data a list of \code{\link{FODRDataset}} objects 8 | #' @field facets a character vector of variables that can be used to filter results. For a 9 | #' \code{\link{FODRPortal}}, these are constant. 10 | #' @field sortables a character vector, indicates on which fields sorting is allowed. For a 11 | #' \code{\link{FODRPortal}}, these are constant. 12 | #' @field themes a character vector, the unique themes datasets on the portals can be associated with 13 | #' 14 | #' @return An object of class \code{\link{FODRPortal}} with methods designed to retrieve datasets from an open data portal. 
15 | #' 16 | #' @section Methods: 17 | #' \describe{ 18 | #' \item{\code{\link[=portal_search]{search}}}{This method retrieves datasets from the portal.}} 19 | #' @usage NULL 20 | #' @export 21 | FODRPortal <- R6::R6Class( 22 | "FODRPortal", 23 | public = list( 24 | data = NULL, 25 | portal = NULL, 26 | facets = NULL, 27 | n_datasets = NULL, 28 | sortables = NULL, 29 | themes = NULL, 30 | initialize = function(portal){ 31 | self$portal <- portal 32 | self$n_datasets <- search_datasets(portal = self$portal)$data$nhits 33 | self$facets <- datasets_facets() 34 | self$sortables <- datasets_sortables() 35 | self$themes <- private$get_themes() 36 | }, 37 | search = function( 38 | nrows = NULL, 39 | refine = NULL, 40 | exclude = NULL, 41 | sort = NULL, 42 | q = NULL, 43 | lang = NULL, 44 | theme = NULL 45 | ) { 46 | if (is.null(nrows)) nrows <- self$n_datasets 47 | 48 | n_datasets <- if (is.null(theme)) nrows else 49 | private$themes_freq$Freq[private$themes_freq$themes == theme] 50 | 51 | listDatasets <- search_datasets( 52 | portal = self$portal, 53 | nrows, refine, 54 | exclude, 55 | sort, 56 | q, 57 | lang 58 | )$data$datasets 59 | 60 | cat(paste(length(listDatasets), "datasets found ..."), "\n") 61 | 62 | self$data <- lapply(listDatasets, function(dataset){ 63 | if (!is.null(theme)) if (all(dataset$metas$theme != theme)) return(NULL) 64 | FODRDataset$new(self$portal, dataset$datasetid) 65 | }) %>% 66 | clean_list() 67 | self$data 68 | }, 69 | print = function() { 70 | cat("FODRPortal object\n") 71 | cat("---------------------------------------------------------------\n") 72 | cat(paste("Portal:", self$portal, "\n")) 73 | cat(paste("Number of datasets:", self$n_datasets, "\n")) 74 | cat(paste("Themes:\n -", paste(self$themes, collapse = "\n - "), "\n")) 75 | cat("---------------------------------------------------------------\n") 76 | } 77 | ), 78 | private = list( 79 | get_themes = function(){ 80 | themes <- lapply( 81 | search_datasets( 82 | portal = self$portal, 83 | nrows = self$n_datasets 84 | )$data$datasets, function(dataset) { 85 | dataset$metas$theme 86 | }) %>% 87 | unlist() 88 | private$themes_freq <- as.data.frame(table(themes)) 89 | themes %>% 90 | unique %>% 91 | sort 92 | }, 93 | themes_freq = NULL 94 | ) 95 | ) 96 | 97 | 98 | #' @title initialize a portal 99 | #' 100 | #' @description A wrapper around \code{FODRPortal$new(portal)} for convenience. 101 | #' 102 | #' @param portal a character 103 | #' 104 | #' @examples 105 | #' \dontrun{ 106 | #' portal <- fodr_portal("paris") 107 | #' portal 108 | #' } 109 | #' 110 | #' @name fodr_portal 111 | #' 112 | #' @export 113 | fodr_portal <- function(portal){ 114 | FODRPortal$new(portal) 115 | } -------------------------------------------------------------------------------- /README.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | output: github_document 3 | --- 4 | 5 | 6 | 7 | ```{r setup, include = FALSE} 8 | knitr::opts_chunk$set( 9 | collapse = TRUE, 10 | comment = "#>", 11 | fig.path = "man/figures/README-", 12 | out.width = "100%" 13 | ) 14 | ``` 15 | # fodr 16 | 17 | `fodr` is an R package to access various French Open Data portals. 18 | 19 | Many of those portals use the OpenDataSoft platform to make their data available and this platform can be accessed with the [OpenDataSoft APIs](https://docs.opendatasoft.com/en/api/catalog_api.html). 20 | 21 | `fodr` wraps this API to make it easier to retrieve data directly in R. 
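To give a quick feel for the workflow (a minimal sketch — it reuses the "paris" portal and the *vote* search from the examples below, and needs network access):

```{r, eval=FALSE}
library(fodr)
# pick a portal, search its datasets, then fetch a few records from the first hit
portal <- fodr_portal("paris")
list_datasets <- portal$search(q = "vote")
list_datasets[[1]]$get_records(nrows = 10)
```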
22 | 23 | ## Installation 24 | 25 | The `devtools` package is needed to install `fodr`: 26 | 27 | ```r 28 | devtools::install_github("tutuchan/fodr") 29 | ``` 30 | 31 | ## Portals 32 | 33 | #### Available portals 34 | 35 | The following portals are currently available with `fodr`: 36 | 37 | ```{r} 38 | library(fodr) 39 | list_portals() 40 | ``` 41 | 42 | The portals have been identified from the [Open Data Inception](http://opendatainception.io) website. Many of these portals do not actually contain data and a large number of them are available *via* the ArcGIS Open platform. This API will be supported in a future release. 43 | 44 | #### Retrieve datasets on a portal 45 | 46 | Use the `fodr_portal` function with the corresponding **fodr slug** to create a `FODRPortal` object: 47 | 48 | ```{r} 49 | library(fodr) 50 | portal <- fodr_portal("paris") 51 | portal 52 | ``` 53 | 54 | The `search` method allows you to find datasets on this portal (see the function documentation for more information). By default, and contrary to the OpenDataSoft API, all elements satisfying the search are returned. 55 | 56 | Let's look at the datasets that contain the word *vote*: 57 | 58 | ```{r} 59 | list_datasets <- portal$search(q = "vote") 60 | list_datasets[[1]] 61 | ``` 62 | 63 | #### Retrieve datasets by theme 64 | 65 | ```{r} 66 | library(magrittr) 67 | list_culture_datasets <- portal$search(theme = "Culture") 68 | lapply(list_culture_datasets, function(dataset) dataset$info$metas$theme) %>% 69 | unlist() %>% 70 | unique() %>% 71 | sort() 72 | ``` 73 | 74 | ## Datasets 75 | 76 | #### Retrieve records on a dataset 77 | 78 | ```{r} 79 | dts <- list_datasets[[1]] 80 | dts$get_records() 81 | ``` 82 | 83 | #### Filter records 84 | 85 | ```{r} 86 | dts <- list_datasets[[1]] 87 | dts$get_records(nrows = dts$info$metas$records_count, refine = list(validite = "oui")) 88 | ``` 89 | 90 | #### Download attachments 91 | 92 | Some datasets have attached files in a pdf, docx, xlsx, ... format. These can be retrieved using the `get_attachments` method: 93 | 94 | ```{r, eval=FALSE} 95 | dts <- fodr_dataset("erdf", "coefficients-des-profils") 96 | dts$get_attachments("DictionnaireProfils_1JUIL18.xlsb") 97 | ``` 98 | 99 | ## GIS data 100 | 101 | Some datasets have geographical information on each data point. 102 | 103 | For these datasets, two additional columns will be present when fetching records: `lng` and `lat`, which correspond to the longitude and latitude of the coordinates of the data point. Additionally, if there are shapes associated with data points (polygons or linestrings for example), they will be stored in the `geo_shape` column, either as a list of `data.frame`s with the same two columns `lng` and `lat`, or as a list of `sf` objects if the `sf` package is installed. 104 | The latter provides a straightforward way to plot geometric data.
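Records can also be restricted geographically at query time through the `geofilter.distance` and `geofilter.polygon` arguments of `get_records` (see its documentation). A minimal sketch, assuming the dataset carries geographic fields — the point and radius below are arbitrary:

```{r, eval=FALSE}
dts <- fodr_dataset("stif", "gares-routieres-idf")
# keep only records within 1000 m of the given point: c(latitude, longitude, distance in meters)
dts$get_records(geofilter.distance = c(48.8566, 2.3522, 1000))
```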
105 | 106 | See for example the following dataset: 107 | 108 | ```{r} 109 | dts <- fodr_dataset("stif", "gares-routieres-idf") 110 | dfRecords <- dts$get_records(nrows = 10) 111 | dfRecords 112 | ``` 113 | 114 | You can then use [leaflet](http://rstudio.github.io/leaflet/) to easily plot this data on a map, either using data points: 115 | 116 | ```{r, eval=FALSE} 117 | library(leaflet) 118 | leaflet(dfRecords) %>% 119 | addProviderTiles("CartoDB.Positron") %>% 120 | addMarkers(popup = ~gare_nom) 121 | ``` 122 | 123 | ![leaflet example](inst/images/Screenshot 2018-12-28 15-42-00.png?raw=true "Screenshot leaflet example") 124 | 125 | or using the `geo_shape` column: 126 | 127 | ```{r, eval=FALSE} 128 | library(sf) 129 | dfRecords[1,] %>% 130 | st_as_sf() %>% 131 | leaflet() %>% 132 | addProviderTiles("CartoDB.Positron") %>% 133 | addPolygons(label = ~gare_nom) 134 | ``` 135 | 136 | ![leaflet example](inst/images/Screenshot 2018-12-28 15-43-36.png?raw=true "Screenshot leaflet example") 137 | 138 | ## License of the data 139 | 140 | Most of the data is available under the [Open Licence](https://www.etalab.gouv.fr/licence-ouverte-open-licence) ([english PDF version](https://www.etalab.gouv.fr/wp-content/uploads/2014/05/Open_Licence.pdf)) but double check if you are unsure. 141 | 142 | ## TODO 143 | 144 | + handle portals that require authentification, 145 | + handle ArcGIS-powered portals, 146 | + possibly handle navitia.io portals, 147 | + ? 148 | -------------------------------------------------------------------------------- /R/utils.R: -------------------------------------------------------------------------------- 1 | from_json <- function(url, ...) { 2 | jsonlite::fromJSON( 3 | url, 4 | ..., 5 | simplifyVector = FALSE, 6 | flatten = FALSE 7 | ) 8 | } 9 | 10 | # Get base portal API url 11 | get_portal_url <- function(portal, endpoint){ 12 | stopifnot(portal %in% portals()$portals) 13 | paste0(get_base_url(portal), "/api/", endpoint, "/1.0/") 14 | } 15 | 16 | # Search for datasets on a portal 17 | search_datasets <- function( 18 | portal, 19 | nrows = NULL, 20 | refine = NULL, 21 | exclude = NULL, 22 | sort = NULL, 23 | q = NULL, 24 | lang = NULL 25 | ) { 26 | url <- get_portal_url(portal, "datasets") %>% 27 | paste0("search/") %>% 28 | add_parameters_to_url( 29 | nrows = nrows, 30 | refine = refine, 31 | exclude = exclude, 32 | sort = sort, 33 | q = q, 34 | lang = lang 35 | ) 36 | list( 37 | data = from_json(url), 38 | url = url 39 | ) 40 | } 41 | 42 | # Get dataset meta data 43 | get_dataset <- function(portal, id) { 44 | url <- get_portal_url(portal, "datasets") %>% 45 | paste0(id, "/") 46 | list( 47 | data = from_json(url), 48 | url = url 49 | ) 50 | } 51 | 52 | get_facets <- function(fields){ 53 | lapply(fields, function(field) { 54 | if (!"annotations" %in% names(field)) return(NULL) else { 55 | annotations <- field$annotations 56 | res <- lapply(annotations, function(annotation) { 57 | annotation$name == "facet" 58 | }) %>% 59 | unlist() %>% 60 | any() 61 | if (res) field$name else NULL 62 | } 63 | }) %>% unlist() 64 | } 65 | 66 | get_sortables <- function(fields){ 67 | lapply(fields, function(field) { 68 | if (field$type == "int") field$name else NULL 69 | }) %>% unlist() 70 | } 71 | 72 | # Transform Polygon elements in the geo_shape column 73 | tidy_polygon <- function(x) { 74 | y <- x[[1]] %>% 75 | purrr::transpose() 76 | if (requireNamespace("sf", quietly = TRUE)) { 77 | mat <- matrix(unlist(y), ncol = 2) 78 | colnames(mat) <- c("lng", "lat") 79 | sf::st_polygon(list(mat)) 80 | } 
else { 81 | tibble::tibble( 82 | lng = unlist(y[[1]]), 83 | lat = unlist(y[[2]]) 84 | ) 85 | } 86 | } 87 | 88 | # Transform MultiPolygon elements in the geo_shape column 89 | tidy_multipolygon <- function(x) { 90 | y <- lapply(x, function(xx) purrr::transpose(xx[[1]])) 91 | if (requireNamespace("sf", quietly = TRUE)) { 92 | mat <- lapply(y, function(yy) { 93 | res <- matrix(unlist(yy), ncol = 2) 94 | colnames(res) <- c("lng", "lat") 95 | res 96 | }) 97 | sf::st_polygon(mat) 98 | } else { 99 | lapply(y, function(yy) { 100 | tibble::tibble( 101 | lng = unlist(yy[[1]]), 102 | lat = unlist(yy[[2]]) 103 | ) 104 | }) 105 | } 106 | } 107 | 108 | # Transform LineString elements in the geo_shape column 109 | tidy_line_string <- function(x) { 110 | y <- x %>% 111 | purrr::transpose() 112 | if (requireNamespace("sf", quietly = TRUE)) { 113 | x <- matrix(unlist(y), ncol = 2) 114 | colnames(x) <- c("lng", "lat") 115 | sf::st_linestring(x) 116 | } else { 117 | tibble::tibble( 118 | lng = unlist(y[[1]]), 119 | lat = unlist(y[[2]]) 120 | ) 121 | } 122 | } 123 | 124 | # Transform LineString elements in the geo_shape column 125 | tidy_multiline_string <- function(x) { 126 | y <- lapply(x, function(xx) purrr::transpose(xx)) 127 | if (requireNamespace("sf", quietly = TRUE)) { 128 | mat <- lapply(y, function(yy) { 129 | res <- matrix(unlist(yy), ncol = 2) 130 | colnames(res) <- c("lng", "lat") 131 | res 132 | }) 133 | sf::st_multilinestring(list(mat)) 134 | } else { 135 | lapply(y, function(yy) { 136 | tibble::tibble( 137 | lng = unlist(yy[[1]]), 138 | lat = unlist(yy[[2]]) 139 | ) 140 | }) 141 | } 142 | } 143 | 144 | # Add additional parameters to the url 145 | add_parameters_to_url <- function( 146 | url, 147 | nrows = NULL, 148 | refine = NULL, 149 | exclude = NULL, 150 | sort = NULL, 151 | q = NULL, 152 | lang = NULL, 153 | geofilter.distance = NULL, 154 | geofilter.polygon = NULL, 155 | format = NULL, 156 | callback = NULL, 157 | debug = FALSE, 158 | ... 
159 | ) { 160 | if ( 161 | all( 162 | is.null(nrows), 163 | is.null(refine), 164 | is.null(exclude), 165 | is.null(sort), 166 | is.null(q), 167 | is.null(lang), 168 | is.null(geofilter.distance), 169 | is.null(geofilter.polygon), 170 | is.null(format), 171 | is.null(callback) 172 | ) 173 | ) return(url) else additional_url <- c() 174 | 175 | # Handle nrows 176 | if (!is.null(nrows)) additional_url <- c(additional_url, rows = nrows) 177 | 178 | # Handle refine 179 | if (!is.null(refine)) for (i in seq_along(refine)) { 180 | facet <- names(refine)[i] 181 | val <- refine[[i]] 182 | names(val) <- paste0("refine.", facet) 183 | additional_url <- c(additional_url, facet = facet, val) 184 | } 185 | 186 | # Handle exclude 187 | if (!is.null(exclude)) for (i in seq_along(exclude)) { 188 | facet <- names(exclude)[i] 189 | val <- exclude[[i]] 190 | names(val) <- paste0("exclude.", facet) 191 | additional_url <- c(additional_url, facet = facet, val) 192 | } 193 | 194 | # Handle sort 195 | if (!is.null(sort)) additional_url <- c(additional_url, sort = sort) 196 | 197 | # Handle q 198 | if (!is.null(q)) additional_url <- c(additional_url, q = q) 199 | 200 | # Handle geofilter.distance 201 | if (!is.null(geofilter.distance)) additional_url <- c( 202 | additional_url, 203 | geofilter.distance = toString(geofilter.distance) 204 | ) 205 | 206 | # Handle geofilter.polygon 207 | if (!is.null(geofilter.polygon)) { 208 | geofilter.polygon <- 209 | tidyr::unite(geofilter.polygon, polygon, lat, lon, sep = ",") %>% 210 | dplyr::mutate(polygon = paste0("(", polygon, ")")) %$% 211 | polygon %>% 212 | toString() 213 | additional_url <- c( 214 | additional_url, 215 | geofilter.polygon = geofilter.polygon 216 | ) 217 | } 218 | 219 | # Handle format 220 | if (!is.null(format)) additional_url <- c(additional_url, format = format) 221 | 222 | sep <- if (grepl("?", url, fixed = TRUE)) "&" else "?" 
223 | url <- paste0( 224 | url, 225 | sep, 226 | paste( 227 | names(additional_url), 228 | additional_url, 229 | sep = "=", 230 | collapse = "&") 231 | ) 232 | if (debug) print(url) 233 | url 234 | } 235 | 236 | clean_list <- function(l) { 237 | l[!vapply(l, is.null, logical(1))] 238 | } 239 | 240 | 241 | get_base_url <- function(portal){ 242 | (portals() %>% 243 | dplyr::filter(portals == portal) 244 | )$base_urls 245 | } 246 | 247 | # Constants ------------------------------------------------------------------------------------------------------- 248 | 249 | portals <- function(){ 250 | tibble::tibble( 251 | name = c( 252 | "RATP", 253 | "R\u00E9gion Ile-de-France", 254 | "Infogreffe", 255 | "Toulouse M\u00E9tropole", 256 | "STAR", 257 | "Issy-les-Moulineaux", 258 | "STIF", 259 | "Paris", 260 | "Tourisme Alpes-Maritimes", 261 | "Tourisme Pas-de-Calais", 262 | "D\u00E9partement des Hauts-de-Seine", 263 | "Minist\u00E8re de l'Education Nationale, de l'Enseignement sup\u00E9rieur et de la Recherche", 264 | "ERDF", 265 | "RTE", 266 | "OpenDataSoft Public" 267 | ), 268 | portals = c( 269 | "ratp", 270 | "iledefrance", 271 | "infogreffe", 272 | "toulouse", 273 | "star", 274 | "issy", 275 | "stif", 276 | "paris", 277 | "04", 278 | "62", 279 | "92", 280 | "enesr", 281 | "erdf", 282 | "rte", 283 | "ods" 284 | ), 285 | base_urls = c( 286 | "http://data.ratp.fr", 287 | "http://data.iledefrance.fr", 288 | "http://datainfogreffe.fr", 289 | "https://data.toulouse-metropole.fr", 290 | "https://data.explore.star.fr", 291 | "http://data.issy.com", 292 | "http://opendata.stif.info", 293 | "http://opendata.paris.fr", 294 | "http://tourisme04.opendatasoft.com", 295 | "http://tourisme62.opendatasoft.com", 296 | "https://opendata.hauts-de-seine.fr", 297 | "http://data.enseignementsup-recherche.gouv.fr", 298 | "https://data.erdf.fr", 299 | "https://opendata.rte-france.com", 300 | "https://public.opendatasoft.com" 301 | ) 302 | ) 303 | } 304 | 305 | datasets_facets <- function(){ 306 | c("modified", 307 | "published", 308 | "issued", 309 | "accrualperiodicity", 310 | "language", 311 | "license", 312 | "granularity", 313 | "dataquality", 314 | "theme", 315 | "keyword", 316 | "created", 317 | "creator", 318 | "contributor") 319 | } 320 | 321 | datasets_sortables <- function(){ 322 | c("modified", 323 | "issued", 324 | "created") 325 | } 326 | 327 | MAX_API_RECORDS <- 10000 328 | -------------------------------------------------------------------------------- /R/FODRDataset.R: -------------------------------------------------------------------------------- 1 | #' main dataset class 2 | #' 3 | #' @description This is the entry point to retrieve records from a dataset. 4 | #' Initialize a \code{FODRDataset} with a \code{portal} and an \code{id} using the \code{\link{fodr_dataset}} wrapper. 5 | #' 6 | #' @docType class 7 | #' 8 | #' @field portal a character, must be one of the available portals 9 | #' @field data a data.frame returned by the \code{\link[=get_records]{get_records}} method 10 | #' @field fields a character vector 11 | #' @field facets a character vector or variables that can be used to filter results 12 | #' @field id a character, the dataset id 13 | #' @field info a list of five elements: 14 | #' \itemize{ 15 | #' \item features: a character vector of available services on this dataset, \emph{unused} 16 | #' \item metas: a list of meta-information about the dataset (publisher, language, theme, etc.) 
17 | #' \item attachments: a list of downloadable files related to this dataset, if any, \emph{not downloadable from R} 18 | #' \item alternative_exports: \emph{unknown purpose} 19 | #' \item billing_plans: \emph{unknown purpose} 20 | #' } 21 | #' 22 | #' @field sortables a character vector containing a subset of \strong{fields}, indicates on which fields sorting is allowed 23 | #' @field url a character, the actual url sent to the API 24 | #' @return An object of class \code{\link{FODRDataset}} with methods designed to retrieve data from an open dataset. 25 | #' 26 | #' @section Methods: 27 | #' \describe{ 28 | #' \item{\code{\link{get_attachments}}}{This method retrieves attachments from the dataset.} 29 | #' \item{\code{\link{get_records}}}{This method retrieves records from the dataset.} 30 | #' } 31 | #' 32 | #' @usage NULL 33 | #' 34 | #' @export 35 | FODRDataset <- R6::R6Class( 36 | "FODRDataset", 37 | public = list( 38 | portal = NULL, 39 | data = NULL, 40 | facets = NULL, 41 | fields = NULL, 42 | id = NULL, 43 | info = NULL, 44 | sortables = NULL, 45 | url = NULL, 46 | initialize = function(portal, id){ 47 | raw_data <- get_dataset(portal, id) 48 | 49 | self$portal <- portal 50 | self$id <- id 51 | 52 | self$url <- raw_data$url 53 | dataset <- raw_data$data 54 | 55 | 56 | self$info$features <- unlist(dataset$features) 57 | dataset$metas$keywords <- unlist(dataset$metas$keyword) 58 | dataset$metas$keyword <- NULL 59 | self$info <- c( 60 | self$info, 61 | dataset[c( 62 | "metas", 63 | "attachments", 64 | "alternative_exports", 65 | "billing_plans" 66 | )] 67 | ) 68 | self$info$attachments <- self$info$attachments %>% 69 | purrr::transpose() %>% 70 | lapply(unlist) 71 | self$fields <- dataset$fields 72 | self$facets <- get_facets(self$fields) 73 | self$sortables <- get_sortables(self$fields) 74 | }, 75 | get_attachments = function(fname, output = NULL){ 76 | attachments <- self$info$attachments 77 | id <- attachments$id[which(attachments$title == fname)] 78 | url <- paste0(self$url, "attachments/", id) 79 | if (is.null(output)) output <- fname 80 | curl::curl_download(url = url, destfile = output) 81 | }, 82 | get_records = function( 83 | nrows = NULL, 84 | refine = NULL, 85 | exclude = NULL, 86 | sort = NULL, 87 | q = NULL, 88 | lang = NULL, 89 | geofilter.distance = NULL, 90 | geofilter.polygon = NULL, 91 | quiet = TRUE, 92 | debug = FALSE, 93 | ... 94 | ) { 95 | if (is.null(nrows)) nrows <- self$info$metas$records_count 96 | 97 | if (nrows > MAX_API_RECORDS) { 98 | if (!quiet) cat( 99 | "Too many rows for direct call to API, downloading file ...\n" 100 | ) 101 | url <- get_portal_url(self$portal, "records") %>% 102 | paste0("download?dataset=", self$id) %>% 103 | add_parameters_to_url( 104 | refine = refine, 105 | exclude = exclude, 106 | q = q, 107 | lang = lang, 108 | geofilter.distance = geofilter.distance, 109 | geofilter.polygon = geofilter.polygon, 110 | format = "json", 111 | debug = debug 112 | ) 113 | response <- if (!quiet) httr::GET(url, httr::progress()) else 114 | httr::GET(url) 115 | if (!quiet) cat("\nFile downloaded, now parsing ...") 116 | res <- httr::content(response) 117 | } else { 118 | url <- get_portal_url(self$portal, "records") %>% 119 | paste0("search?dataset=", self$id) %>% 120 | add_parameters_to_url( 121 | nrows = nrows, 122 | refine = refine, 123 | exclude = exclude, 124 | sort = sort, 125 | q = q, 126 | lang = lang, 127 | geofilter.distance = geofilter.distance, 128 | geofilter.polygon = geofilter.polygon, 129 | debug = debug, 130 | ... 
131 | ) 132 | res <- from_json(url)$records 133 | } 134 | 135 | 136 | out <- if (length(res) > 0) { 137 | nrows <- length(res) 138 | # Find all fields 139 | tres <- res %>% 140 | purrr::transpose() 141 | fields <- suppressWarnings( 142 | tres$fields %>% 143 | purrr::transpose() 144 | ) 145 | 146 | # Check if geo_shape field for GIS processing 147 | # The condition can't be based on the name because name can change (sometime geo_shape, sometime geo) 148 | # The solution here is to check for the first element of every fields and see if it's a list with 149 | # the correct format. 150 | geo_shape <- purrr::compact(purrr::map(fields, function(f) if (all(c("coordinates", "type") %in% names(f[[1]]))) f))[[1]] 151 | 152 | # Remove fields that have too many elements 153 | lfields <- lapply(fields, function(x) length(unlist(x))) 154 | fields <- fields[lfields <= nrows] 155 | 156 | records <- fields %>% 157 | lapply(function(x) { 158 | x[vapply(x, is.null, logical(1))] <- NA 159 | unlist(x)}) %>% 160 | tibble::as_tibble() 161 | 162 | # Handle GIS information 163 | geometry <- tres$geometry 164 | if (!is.null(geometry)) { 165 | geometry <- geometry %>% 166 | purrr::transpose() 167 | geometry$type <- unlist(geometry$type) 168 | dfLonlat <- lapply(geometry$coordinates, function(x) { 169 | tibble::tibble(lng = x[[1]], lat = x[[2]]) 170 | }) %>% 171 | dplyr::bind_rows() 172 | records <- dplyr::bind_cols(records, dfLonlat) 173 | } 174 | 175 | if (length(geo_shape) > 0) { 176 | geo_shape <- geo_shape %>% 177 | purrr::transpose() 178 | geo_shape$type <- unlist(geo_shape$type) 179 | 180 | # Can have LineString or MultiLineString, Polygon or MultiPolygon 181 | dfGeoShape <- dplyr::data_frame( 182 | geo_shape = lapply(seq_along(geo_shape$type), function(i) { 183 | coords <- geo_shape$coordinates[[i]] 184 | switch( 185 | geo_shape$type[i], 186 | LineString = tidy_line_string(coords), 187 | MultiLineString = tidy_multiline_string(coords), 188 | Polygon = tidy_polygon(coords), 189 | MultiPolygon = tidy_multipolygon(coords)) 190 | })) 191 | records <- dplyr::bind_cols(records, dfGeoShape) 192 | } 193 | records 194 | } else dplyr::data_frame() 195 | 196 | self$data <- out 197 | self$data 198 | }, 199 | 200 | print = function() { 201 | cat("FODRDataset object\n") 202 | cat("---------------------------------------------------------------\n") 203 | cat(paste("Dataset id:", self$id, "\n")) 204 | cat(paste("Theme:", toString(self$info$meta$theme), "\n")) 205 | cat(paste("Keywords:", toString(self$info$meta$keywords), "\n")) 206 | cat(paste("Publisher:", self$info$meta$publisher, "\n")) 207 | cat("---------------------------------------------------------------\n") 208 | cat(paste("Number of records:", self$info$meta$records_count, "\n")) 209 | if (is.null(nfiles <- nrow(self$info$attachments))) nfiles <- 0 210 | cat(paste("Number of files:", nfiles, "\n")) 211 | cat(paste("Modified:", as.Date(self$info$meta$modified), "\n")) 212 | if (!is.null(self$facets)) 213 | cat(paste("Facets:", toString(self$facets), "\n")) 214 | if (!is.null(self$sortables)) 215 | cat(paste("Sortables:", toString(self$sortables), "\n")) 216 | cat("---------------------------------------------------------------\n") 217 | cat("Description:\n") 218 | self$info$metas$description %>% 219 | gsub("
<br/>|<br>
", "\n", .) %>% 220 | gsub("<.*?>", "", .) %>% 221 | gsub("\\t", "", .) %>% 222 | trimws() %>% 223 | cat() 224 | cat("\n---------------------------------------------------------------\n") 225 | } 226 | ) 227 | ) 228 | 229 | #' @title initialize a dataset 230 | #' 231 | #' @description A wrapper around \code{FODRDataset$new(portal, id)} for convenience. 232 | #' 233 | #' @param portal a character in \code{\link{list_portals}} 234 | #' @param id a character 235 | #' 236 | #' @examples 237 | #' \dontrun{ 238 | #' votes <- fodr_dataset("paris", "resultats-des-votes-budget-participatif-2016") 239 | #' votes 240 | #' } 241 | #' 242 | #' @name fodr_dataset 243 | #' @export 244 | fodr_dataset <- function(portal, id){ 245 | FODRDataset$new(portal, id) 246 | } 247 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | # fodr 5 | 6 | `fodr` is an R package to access various French Open Data portals. 7 | 8 | Many of those portals use the OpenDataSoft platform to make their data 9 | available and this platform can be accessed with the [OpenDataSoft 10 | APIs](https://docs.opendatasoft.com/en/api/catalog_api.html). 11 | 12 | `fodr` wraps this API to make it easier to retrieve data directly in R. 13 | 14 | ## Installation 15 | 16 | The `devtools` package is needed to install `fodr`: 17 | 18 | ``` r 19 | devtools::install_github("tutuchan/fodr") 20 | ``` 21 | 22 | ## Portals 23 | 24 | #### Available portals 25 | 26 | The following portals are currently available with `fodr`: 27 | 28 | ``` r 29 | library(fodr) 30 | list_portals() 31 | #> # A tibble: 15 x 3 32 | #> name portals base_urls 33 | #> 34 | #> 1 RATP ratp http://data.ratp.fr 35 | #> 2 Région Ile-de-France iledefra… http://data.iledefrance.… 36 | #> 3 Infogreffe infogref… http://datainfogreffe.fr 37 | #> 4 Toulouse Métropole toulouse https://data.toulouse-me… 38 | #> 5 STAR star https://data.explore.sta… 39 | #> 6 Issy-les-Moulineaux issy http://data.issy.com 40 | #> 7 STIF stif http://opendata.stif.info 41 | #> 8 Paris paris http://opendata.paris.fr 42 | #> 9 Tourisme Alpes-Maritimes 04 http://tourisme04.openda… 43 | #> 10 Tourisme Pas-de-Calais 62 http://tourisme62.openda… 44 | #> 11 Département des Hauts-de-Seine 92 https://opendata.hauts-d… 45 | #> 12 Ministère de l'Education Nationale,… enesr http://data.enseignement… 46 | #> 13 ERDF erdf https://data.erdf.fr 47 | #> 14 RTE rte https://opendata.rte-fra… 48 | #> 15 OpenDataSoft Public ods https://public.opendatas… 49 | ``` 50 | 51 | The portals have been identified from the [Open Data 52 | Inception](http://opendatainception.io) website. Many of these portals 53 | do not actually contain data and a large number of them are available 54 | *via* the ArcGIS Open platform. This API will be supported in a future 55 | release. 
56 | 57 | #### Retrieve datasets on a portal 58 | 59 | Use the `fodr_portal` function with the corresponding **fodr slug** to 60 | create a `FODRPortal` object: 61 | 62 | ``` r 63 | library(fodr) 64 | portal <- fodr_portal("paris") 65 | portal 66 | #> FODRPortal object 67 | #> --------------------------------------------------------------- 68 | #> Portal: paris 69 | #> Number of datasets: 249 70 | #> Themes: 71 | #> - Administration et Finances Publiques 72 | #> - Citoyenneté 73 | #> - Commerces 74 | #> - Culture 75 | #> - Environnement 76 | #> - Equipements, Services, Social 77 | #> - Mobilité et Espace Public 78 | #> - Services 79 | #> - Urbanisme et Logements 80 | #> --------------------------------------------------------------- 81 | ``` 82 | 83 | The `search` method allows you to find datasets on this portal (see the 84 | function documentation for more information). By default, and contrary 85 | to the Open Data Soft API, all elements satisfying the search are 86 | returned. 87 | 88 | Let’s look at the datasets that contain the word *vote*: 89 | 90 | ``` r 91 | list_datasets <- portal$search(q = "vote") 92 | #> 36 datasets found ... 93 | list_datasets[[1]] 94 | #> FODRDataset object 95 | #> --------------------------------------------------------------- 96 | #> Dataset id: secteurs-des-bureaux-de-vote 97 | #> Theme: Citoyenneté 98 | #> Keywords: bureau de vote, elections, votes, suffrages 99 | #> Publisher: Mairie de Paris / Direction de la Démocratie, des Citoyens et des Territoires 100 | #> --------------------------------------------------------------- 101 | #> Number of records: 896 102 | #> Number of files: 0 103 | #> Modified: 2017-03-30 104 | #> Sortables: objectid, nbr_elect_f, nbr_elect_e_m, nbr_elect_e_e, nbr_elect_l12, arrondissement, num_bv 105 | #> --------------------------------------------------------------- 106 | #> Description: 107 | #> Sectionnement des bureaux de vote en vigueur à partir du 01 mars 2017Donnée initialement en NTF Lambert Zone I(EPSG : 27561)et reprojetée en RGF 93 Lambert 93(EPSG : 2154) Représentation du sectionnement des bureaux de vote, applicable à partir du 1emars 2017. La représentation du sectionnement ne se calque pas sur le bâti ou sur le parcellaire car c’est le rattachement au point-adresse qui est pris en considération. 108 | #> --------------------------------------------------------------- 109 | ``` 110 | 111 | #### Retrieve datasets by theme 112 | 113 | ``` r 114 | library(magrittr) 115 | list_culture_datasets <- portal$search(theme = "Culture") 116 | #> 249 datasets found ... 117 | lapply(list_culture_datasets, function(dataset) dataset$info$metas$theme) %>% 118 | unlist() %>% 119 | unique()%>% 120 | sort() 121 | #> [1] "Culture" 122 | ``` 123 | 124 | ## Datasets 125 | 126 | #### Retrieve records on a dataset 127 | 128 | ``` r 129 | dts <- list_datasets[[1]] 130 | dts$get_records() 131 | #> # A tibble: 896 x 14 132 | #> shape_area objectid nbr_elect_l12 arrondissement validite shape_len 133 | #> 134 | #> 1 0 4 0 19 oui 0 135 | #> 2 0 5 0 19 oui 0 136 | #> 3 0 9 0 19 oui 0 137 | #> 4 0 10 0 19 oui 0 138 | #> 5 0 12 0 19 oui 0 139 | #> 6 0 17 0 19 oui 0 140 | #> 7 0 19 0 19 oui 0 141 | #> 8 0 21 0 19 oui 0 142 | #> 9 0 22 0 19 non 0 143 | #> 10 0 29 0 19 non 0 144 | #> # ... 
with 886 more rows, and 8 more variables: nbr_elect_f , 145 | #> # nbr_elect_e_m , nbr_elect_e_e , num_bv , id_bv , 146 | #> # lng , lat , geo_shape 147 | ``` 148 | 149 | #### Filter records 150 | 151 | ``` r 152 | dts <- list_datasets[[1]] 153 | dts$get_records(nrows = dts$info$metas$records_count, refine = list(validite = "oui")) 154 | #> # A tibble: 54 x 14 155 | #> shape_area objectid nbr_elect_l12 arrondissement validite shape_len 156 | #> 157 | #> 1 0 4 0 19 oui 0 158 | #> 2 0 5 0 19 oui 0 159 | #> 3 0 9 0 19 oui 0 160 | #> 4 0 10 0 19 oui 0 161 | #> 5 0 12 0 19 oui 0 162 | #> 6 0 17 0 19 oui 0 163 | #> 7 0 19 0 19 oui 0 164 | #> 8 0 21 0 19 oui 0 165 | #> 9 0 33 0 19 oui 0 166 | #> 10 0 45 0 19 oui 0 167 | #> # ... with 44 more rows, and 8 more variables: nbr_elect_f , 168 | #> # nbr_elect_e_m , nbr_elect_e_e , num_bv , id_bv , 169 | #> # lng , lat , geo_shape 170 | ``` 171 | 172 | #### Download attachments 173 | 174 | Some datasets have attached files in a pdf, docx, xlsx, … format. These 175 | can be retrieved using the `get_attachments` method: 176 | 177 | ``` r 178 | dts <- fodr_dataset("erdf", "coefficients-des-profils") 179 | dts$get_attachments("DictionnaireProfils_1JUIL18.xlsb") 180 | ``` 181 | 182 | ## GIS data 183 | 184 | Some datasets have geographical information on each data point. 185 | 186 | For these datasets, two additional columns will be present when fetching 187 | records: `lng` and `lat` that correspond to the longitude and latitude 188 | of the coordinates of the data point. Additionally, if there are shapes 189 | associated to data points (polygons or linestrings for example), they 190 | will be stored in the `geo_shape` column either as a list of 191 | `data.frame`s with the same two columns `lng` and `lat` or in a list 192 | `sf` objects if `sf` package is already installed. The latter allows a 193 | straigtforward way to plot geometric data. 194 | 195 | See for example the following dataset: 196 | 197 | ``` r 198 | dts <- fodr_dataset("stif", "gares-routieres-idf") 199 | dfRecords <- dts$get_records(nrows = 10) 200 | dfRecords 201 | #> # A tibble: 10 x 14 202 | #> gr_id dpt_id lda_nom gare_nom zdl_id insee_txt gr_nom comm_nom zdl_nom 203 | #> 204 | #> 1 22 95 Argent… ARGENTE… 47875 95018 Argen… Argente… Argent… 205 | #> 2 183 77 Combs-… COMBS-L… 45771 77122 Combs… Combs-l… Combs-… 206 | #> 3 567 77 Nemour… NEMOURS… 43245 77431 Nemou… Saint-P… Nemour… 207 | #> 4 338 78 Houill… HOUILLE… 47439 78311 Houil… Houilles Houill… 208 | #> 5 854 91 Vigneu… VIGNEUX… 45735 91657 Vigne… Vigneux… Vigneu… 209 | #> 6 571 94 Nogent… NOGENT-… 46552 94058 Nogen… Le Perr… Nogent… 210 | #> 7 615 95 Persan… PERSAN-… 43178 95487 Persa… Persan Persan… 211 | #> 8 865 77 Villep… VILLEPA… 46725 77294 Ville… Mitry-M… Villep… 212 | #> 9 758 93 Saint-… SAINT-O… 43203 93070 Saint… Saint-O… Saint-… 213 | #> 10 690 77 Provin… PROVINS 47181 77379 Provi… Provins Provin… 214 | #> # ... 
with 5 more variables: acces_pmr , lda_id , lng , 215 | #> # lat , geo_shape 216 | ``` 217 | 218 | You can then use [leaflet](http://rstudio.github.io/leaflet/) to easily 219 | plot this data on a map, either using data points: 220 | 221 | ``` r 222 | library(leaflet) 223 | leaflet(dfRecords) %>% 224 | addProviderTiles("CartoDB.Positron") %>% 225 | addMarkers(popup = ~gare_nom) 226 | ``` 227 | 228 | ![leaflet 229 | example](inst/images/Screenshot%202018-12-28%2015-42-00.png?raw=true 230 | "Screenshot leaflet example") 231 | 232 | or using the `geo_shape` column: 233 | 234 | ``` r 235 | library(sf) 236 | dfRecords[1,] %>% 237 | st_as_sf() %>% 238 | leaflet() %>% 239 | addProviderTiles("CartoDB.Positron") %>% 240 | addPolygons(label = ~gare_nom) 241 | ``` 242 | 243 | ![leaflet 244 | example](inst/images/Screenshot%202018-12-28%2015-43-36.png?raw=true 245 | "Screenshot leaflet example") 246 | 247 | ## License of the data 248 | 249 | Most of the data is available under the [Open 250 | Licence](https://www.etalab.gouv.fr/licence-ouverte-open-licence) 251 | ([english PDF 252 | version](https://www.etalab.gouv.fr/wp-content/uploads/2014/05/Open_Licence.pdf)) 253 | but double check if you are unsure. 254 | 255 | ## TODO 256 | 257 | - handle portals that require authentification, 258 | - handle ArcGIS-powered portals, 259 | - possibly handle navitia.io portals, 260 | - ? 261 | --------------------------------------------------------------------------------