├── .Rbuildignore ├── .github ├── CONTRIBUTING.md ├── ISSUE_TEMPLATE.md └── PULL_REQUEST_TEMPLATE.md ├── .gitignore ├── .travis.yml ├── DESCRIPTION ├── Makefile ├── NAMESPACE ├── NEWS.md ├── R ├── colclass_dict.R ├── csvy-package.R ├── detect_metadata.R ├── get_yaml_header.R ├── read_csvy.R ├── read_metadata.R ├── write_csvy.R └── write_metadata.R ├── README.Rmd ├── README.md ├── appveyor.yml ├── docs ├── CONTRIBUTING.html ├── ISSUE_TEMPLATE.html ├── PULL_REQUEST_TEMPLATE.html ├── authors.html ├── index.html ├── jquery.sticky-kit.min.js ├── link.svg ├── news │ └── index.html ├── pkgdown.css ├── pkgdown.js ├── pkgdown.yml └── reference │ ├── csvy.html │ ├── index.html │ ├── read_csvy.html │ └── write_csvy.html ├── inst ├── CITATION └── examples │ ├── example1.csvy │ └── example2.csvy ├── man ├── colclass_dict.Rd ├── csvy.Rd ├── get_yaml_header.Rd ├── read_csvy.Rd ├── read_metadata.Rd ├── write_csvy.Rd └── write_metadata.Rd └── tests ├── test-all.R └── testthat ├── test-metadata.R ├── test-read_csvy.R ├── test-write_csvy.R └── test-write_metadata.R /.Rbuildignore: -------------------------------------------------------------------------------- 1 | .github/* 2 | ^\.travis\.yml$ 3 | ^appveyor\.yml$ 4 | ^travis-tool\.sh$ 5 | ^Makefile$ 6 | ^README\.Rmd$ 7 | ^README\.html$ 8 | ^README_files$ 9 | ^README_files/.+$ 10 | ^CONTRIBUTING\.md$ 11 | ^inst/standarderrors\.pdf$ 12 | ^figure$ 13 | ^figure/.+$ 14 | ^cache/.+$ 15 | ^docs$ 16 | ^docs/.+$ 17 | ^ignore$ 18 | ^inst/doc/.+\.log$ 19 | ^inst/doc/.+\.Rmd$ 20 | ^vignettes/figure$ 21 | ^vignettes/figure/.+$ 22 | ^vignettes/.+\.aux$ 23 | ^vignettes/.+\.bbl$ 24 | ^vignettes/.+\.blg$ 25 | ^vignettes/.+\.dvi$ 26 | ^vignettes/.+\.log$ 27 | ^vignettes/.+\.out$ 28 | ^vignettes/.+\.pdf$ 29 | ^vignettes/.+\.sty$ 30 | ^vignettes/.+\.tex$ 31 | ^data-raw$ 32 | ^revdep$ 33 | -------------------------------------------------------------------------------- /.github/CONTRIBUTING.md: 
-------------------------------------------------------------------------------- 1 | Contributions to **csvy** are welcome from anyone and are best sent as pull requests on [the GitHub repository](https://github.com/leeper/csvy/). This page provides some instructions to potential contributors about how to add to the package. 2 | 3 | 1. Contributions can be submitted as [a pull request](https://help.github.com/articles/creating-a-pull-request/) on GitHub by forking or cloning the [repo](https://github.com/leeper/csvy/), making changes and submitting the pull request. 4 | 5 | 2. Pull requests should involve only one commit per substantive change. This means if you change multiple files (e.g., code and documentation), these changes should be committed together. If you don't know how to do this (e.g., you are making changes in the GitHub web interface) just submit anyway and the maintainer will clean things up. 6 | 7 | 3. All contributions must be submitted consistent with the package license ([GPL-2](http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)). 8 | 9 | 4. All contributions need to be noted in the `Authors@R` field in the [DESCRIPTION](https://github.com/leeper/csvy/blob/master/DESCRIPTION). Just follow the format of the existing entries to add your name (and, optionally, email address). Substantial contributions should also be noted in [`inst/CITATION`](https://github.com/leeper/csvy/blob/master/inst/CITATION). 10 | 11 | 5. Please run `R CMD build csvy` and `R CMD check csvy_VERSION.tar.gz` before submitting the pull request to check for any errors. 12 | 13 | Some specific types of changes that you might make are: 14 | 15 | 1. Documentation-only changes (e.g., to Rd files, README, vignettes). This is great! All contributions are welcome. 16 | 17 | 2. Changes requiring a new package dependency should be discussed on the GitHub issues page before submitting a pull request. 18 | 19 | 3. Message translations. These are much appreciated! 
The format is a pain, but if you're doing this I'm assuming you're already familiar with it. 20 | 21 | Any questions you have can be opened as GitHub issues or directed to thosjleeper (at) gmail.com. 22 | 23 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | Please specify whether your issue is about: 2 | 3 | - [ ] a possible bug 4 | - [ ] a question about package functionality 5 | - [ ] a suggested code or documentation change, improvement to the code, or feature request 6 | 7 | If you are reporting (1) a bug or (2) a question about code, please supply: 8 | 9 | - [a fully reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) using a publicly available dataset (or provide your data) 10 | - if an error is occurring, include the output of `traceback()` run immediately after the error occurs 11 | - the output of `sessionInfo()` 12 | 13 | Put your code here: 14 | 15 | ```R 16 | ## load package 17 | library("csvy") 18 | 19 | ## code goes here 20 | 21 | 22 | ## session info for your system 23 | sessionInfo() 24 | ``` 25 | 26 | -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | Please ensure the following before submitting a PR: 2 | 3 | - [ ] if suggesting code changes or improvements, [open an issue](https://github.com/leeper/csvy/issues/new) first 4 | - [ ] for all but trivial changes (e.g., typo fixes), add your name to [DESCRIPTION](https://github.com/leeper/csvy/blob/master/DESCRIPTION) 5 | - [ ] for all but trivial changes (e.g., typo fixes), document your change in [NEWS.md](https://github.com/leeper/csvy/blob/master/NEWS.md) with a parenthetical reference to the issue number being addressed 6 | - [ ] if changing 
documentation, edit files in `/R` not `/man` and run `devtools::document()` to update documentation 7 | - [ ] add code or new test files to [`/tests`](https://github.com/leeper/csvy/tree/master/tests/testthat) for any new functionality or bug fix 8 | - [ ] make sure `R CMD check` runs without error before submitting the PR 9 | 10 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.DS_Store 2 | .Rhistory 3 | iris.csvy 4 | revdep/* 5 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: r 2 | sudo: false 3 | r_packages: 4 | - covr 5 | after_success: 6 | - Rscript -e 'library("covr");codecov()' 7 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: csvy 2 | Type: Package 3 | Title: Import and Export CSV Data with a YAML Metadata Header 4 | Version: 0.3.0 5 | Date: 2018-07-31 6 | Authors@R: c(person("Thomas J.", "Leeper", 7 | role = c("aut", "cre"), 8 | email = "thosjleeper@gmail.com", 9 | comment = c(ORCID = "0000-0003-4097-6326")), 10 | person("Alexey N.", "Shiklomanov", 11 | role = c("aut"), 12 | email = "alexey.shiklomanov@gmail.com", 13 | comment = c(ORCID = "0000-0003-4022-5979")), 14 | person("Jonathan", "Carroll", 15 | email = "rpkg@jcarroll.com.au", 16 | role = c("aut"), 17 | comment = c(ORCID = "0000-0002-1404-5264")) 18 | ) 19 | Description: Support for import from and export to the CSVY file format. CSVY is a file format that combines the simplicity of CSV (comma-separated values) with the metadata of other plain text and binary formats (JSON, XML, Stata, etc.) by placing a YAML header on top of a regular CSV. 
20 | URL: https://github.com/leeper/csvy 21 | BugReports: https://github.com/leeper/csvy/issues 22 | Imports: 23 | tools, 24 | data.table, 25 | jsonlite, 26 | yaml 27 | Suggests: 28 | testthat, 29 | datasets 30 | License: GPL-2 31 | RoxygenNote: 6.0.1 32 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | pkg = $(shell basename $(CURDIR)) 2 | 3 | all: build 4 | 5 | NAMESPACE: R/* 6 | Rscript -e "devtools::document()" 7 | 8 | README.md: README.Rmd 9 | Rscript -e "knitr::knit('README.Rmd')" 10 | 11 | README.html: README.md 12 | pandoc -o README.html README.md 13 | 14 | ../$(pkg)*.tar.gz: DESCRIPTION NAMESPACE README.md 15 | cd ../ && R CMD build $(pkg) 16 | 17 | build: ../$(pkg)*.tar.gz 18 | 19 | check: ../$(pkg)*.tar.gz 20 | cd ../ && R CMD check $(pkg)*.tar.gz 21 | rm ../$(pkg)*.tar.gz 22 | 23 | revdep: ../$(pkg)*.tar.gz 24 | Rscript -e "devtools::revdep_check()" 25 | 26 | install: ../$(pkg)*.tar.gz 27 | cd ../ && R CMD INSTALL $(pkg)*.tar.gz 28 | rm ../$(pkg)*.tar.gz 29 | 30 | website: R/* README.md DESCRIPTION 31 | Rscript -e "pkgdown::build_site()" 32 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | export(get_yaml_header) 4 | export(read_csvy) 5 | export(read_metadata) 6 | export(write_csvy) 7 | export(write_metadata) 8 | importFrom(data.table,fread) 9 | importFrom(data.table,fwrite) 10 | importFrom(jsonlite,fromJSON) 11 | importFrom(jsonlite,write_json) 12 | importFrom(stats,setNames) 13 | importFrom(tools,file_ext) 14 | importFrom(yaml,as.yaml) 15 | importFrom(yaml,yaml.load) 16 | -------------------------------------------------------------------------------- /NEWS.md: -------------------------------------------------------------------------------- 1 | 
# csvy 0.3.0 2 | 3 | - Updated support to current CSVY specifications. (#13, h/t Michael Chirico) 4 | - Argument `sep2` in `write_csvy()` has been corrected to `dec`. 5 | - Fixed an unclosed connection bug. (#23) 6 | 7 | # csvy 0.2.2 8 | 9 | - If reading a file `data.csvy` without a metadata header, and a `data.[yaml|yml|json]` file is present (in the same directory), that will be automatically read-in as the metadata (completes requests for #10, h/t @jonocarroll) 10 | 11 | # csvy 0.2.1 12 | 13 | * Expanded test suite and fixed some small bugs in the process. 14 | * Parse YAML header file first, then pass column classes to `data.table::fread` to improve performance (#9, Alexey Shiklomanov) 15 | 16 | # csvy 0.2.0 17 | 18 | - Removed support for `utils::read.csv()` and `readr::read_csv()` for simplicity. 19 | - Updated support to current CSVY specifications. (#13, h/t Michael Chirico) 20 | - Substantially changed internal code and added markup. 21 | - Changed example files. 22 | - Added option to output metadata to separate YAML or JSON file. (#10, h/t Hadley Wickham) 23 | 24 | # csvy 0.1.2 25 | 26 | - Address header that is not in the same order as data columns. (#1) 27 | - Support for `readr::read_csv()` and `utils::read.csv()`. 
(#2) 28 | 29 | # csvy 0.1.1 30 | 31 | - Initial release 32 | -------------------------------------------------------------------------------- /R/colclass_dict.R: -------------------------------------------------------------------------------- 1 | #' Dictionary of column classes for reading data 2 | colclass_dict <- c( 3 | "string" = "character", 4 | "integer" = "integer", 5 | "number" = "numeric", 6 | "factor" = "character", # Convert to factor afterwards -- fread doesn't do factors 7 | "date" = "Date", 8 | "datetime" = "POSIXct", 9 | "boolean" = "logical" 10 | ) 11 | -------------------------------------------------------------------------------- /R/csvy-package.R: -------------------------------------------------------------------------------- 1 | #' @name csvy 2 | #' @docType package 3 | #' @aliases csvy csvy-package 4 | #' @title Import and Export CSV Data With a YAML Metadata Header 5 | #' @description CSVY is a file format that combines the simplicity of CSV (comma-separated values) with the metadata of other plain text and binary formats (JSON, XML, Stata, etc.). The \href{http://csvy.org/}{CSVY file specification} is simple: place a YAML header on top of a regular CSV. The csvy package implements this format using two functions: \code{\link{write_csvy}} and \code{\link{read_csvy}}. 
6 | NULL 7 | -------------------------------------------------------------------------------- /R/detect_metadata.R: -------------------------------------------------------------------------------- 1 | detect_metadata <- function(file) { 2 | filedir <- dirname(file) 3 | possible_metadata <- dir(filedir, pattern = "\\.[Jj]{1}[Ss]{1}[Oo]{1}[Nn]{1}$|\\.[Yy]{1}[Aa]?[Mm]{1}[Ll]{1}$", full.names = TRUE) 4 | if (length(possible_metadata) > 1) { 5 | # too many potential metadata files found 6 | stop("More than one yaml/yml/json file detected in same directory as data") 7 | } else if (length(possible_metadata) == 0) { 8 | # no metadata file found, so just read file 9 | return(NULL) 10 | } 11 | # one file found 12 | message(sprintf("Attempting to read metadata from auto-detected file: %s", basename(possible_metadata))) 13 | return(read_metadata(possible_metadata)) 14 | } 15 | -------------------------------------------------------------------------------- /R/get_yaml_header.R: -------------------------------------------------------------------------------- 1 | #' Retrieve YAML header from file 2 | #' 3 | #' Note that this assumes only one YAML header, starting on the first line of the file. 4 | #' 5 | #' @inheritParams read_csvy 6 | #' @param yaml_rxp Regular expression for parsing YAML header 7 | #' @param verbose Logical. If \code{TRUE}, print warning if no header found. 8 | #' @return Character vector of lines containing YAML header, or \code{NULL} if no YAML header found. 
9 | #' @export 10 | get_yaml_header <- function(file, yaml_rxp = "^\\#*---[[:space:]]*$", verbose = TRUE) { 11 | # read first line to check for header 12 | con <- file(file, "r") 13 | on.exit(close(con)) 14 | first_line <- readLines(con, n = 1L) 15 | if (!length(first_line) || !grepl(yaml_rxp, first_line)) { 16 | if (isTRUE(verbose)) { 17 | warning("No YAML header found.") 18 | } 19 | return(NULL) 20 | } 21 | 22 | # if header, read it in until "---" found 23 | iline <- 1L 24 | closing_tag <- FALSE 25 | out <- character() 26 | while (!isTRUE(closing_tag)) { 27 | out[iline] <- readLines(con, n = 1L) 28 | if (grepl(yaml_rxp, out[iline])) { 29 | closing_tag <- TRUE 30 | } else { 31 | iline <- iline + 1L 32 | } 33 | } 34 | 35 | # remove leading comment character, if present 36 | if (all(grepl("^#", out))) { 37 | out <- gsub("^#", "", out) 38 | } 39 | return(out) 40 | } 41 | -------------------------------------------------------------------------------- /R/read_csvy.R: -------------------------------------------------------------------------------- 1 | #' @title Import CSVY data 2 | #' @description Import CSVY data as a data.frame 3 | #' @param file A character string or R connection specifying a file. 4 | #' @param metadata Optionally, a character string specifying a YAML (\dQuote{.yaml}) or JSON (\dQuote{.json}) file containing metadata (in lieu of including it in the header of the file). 5 | #' @param stringsAsFactors A logical specifying whether to treat character columns as factors. Passed to \code{\link[data.table]{fread}}. May also be \code{"conditional"}, in which case character columns are converted to factors only when the metadata supplies \code{levels} for them. 6 | #' @param detect_metadata A logical specifying whether to auto-detect a metadata file if none is specified (and if no header is found). 7 | #' @param \dots Additional arguments passed to \code{\link[data.table]{fread}}. 
8 | #' @examples 9 | #' read_csvy(system.file("examples", "example1.csvy", package = "csvy")) 10 | #' 11 | #' @importFrom tools file_ext 12 | #' @importFrom jsonlite fromJSON 13 | #' @importFrom data.table fread 14 | #' @importFrom yaml yaml.load 15 | #' @export 16 | #' @seealso \code{\link{write_csvy}} 17 | read_csvy <- 18 | function( 19 | file, 20 | metadata = NULL, 21 | stringsAsFactors = FALSE, 22 | detect_metadata = TRUE, 23 | ... 24 | ) { 25 | 26 | # setup factor coercion conditional on presence of 'levels' metadata field 27 | if (isTRUE(stringsAsFactors)) { 28 | try_to_factorize <- "always" 29 | } else if (stringsAsFactors == "conditional") { 30 | stringsAsFactors <- FALSE 31 | try_to_factorize <- "conditional" 32 | } else { 33 | try_to_factorize <- "never" 34 | } 35 | 36 | if (is.null(metadata)) { 37 | metadata_raw <- get_yaml_header(file, verbose = FALSE) 38 | if (is.null(metadata_raw) & !detect_metadata) { 39 | # no metadata found in file and no auto-detection requested 40 | message("No metadata header found. Reading file as CSV.") 41 | out <- data.table::fread(input = file, sep = "auto", header = "auto", 42 | stringsAsFactors = stringsAsFactors, 43 | data.table = FALSE, ...) 44 | return(out) 45 | } else if (is.null(metadata_raw) & isTRUE(detect_metadata)) { 46 | # no metadata found in file but auto-detection requested 47 | message("No metadata header found in file, so attempting to auto-detect metadata file.") 48 | skip_lines <- 0L 49 | metadata_list <- detect_metadata(file) 50 | if (is.null(metadata_list)) { 51 | message("No metadata file found. Reading file as CSV.") 52 | out <- data.table::fread(input = file, sep = "auto", header = "auto", 53 | stringsAsFactors = stringsAsFactors, 54 | data.table = FALSE, ...) 
55 | return(out) 56 | } 57 | } else { 58 | # metadata found in file 59 | skip_lines <- length(metadata_raw) + 1L # Including opening and closing "---" 60 | metadata_list <- yaml::yaml.load(paste(metadata_raw, collapse = "\n")) 61 | } 62 | } else { 63 | skip_lines <- 0L 64 | metadata_list <- read_metadata(metadata) 65 | } 66 | 67 | # find variable-level metadata 'fields' 68 | if ("fields" %in% names(metadata_list)) { 69 | fields <- metadata_list$fields 70 | col_classes <- NULL 71 | } else if ("schema" %in% names(metadata_list)) { 72 | fields <- metadata_list$schema$fields 73 | field_types <- vapply(fields, "[[", character(1), "type") 74 | col_classes <- colclass_dict[field_types] 75 | names(col_classes) <- vapply(fields, "[[", character(1), "name") 76 | } else { 77 | fields <- NULL 78 | col_classes <- NULL 79 | } 80 | 81 | # find 'dialect' to use for importing, if available 82 | if ("dialect" %in% names(metadata_list)) { 83 | ## delimiter 84 | sep <- metadata_list$dialect$delimiter 85 | if (is.null(sep)) sep <- "auto" 86 | ## header (check for NULL before coercing: as.logical(NULL) is logical(0), not NULL) 87 | header <- metadata_list$dialect$header 88 | if (is.null(header)) { 89 | header <- "auto" 90 | } else { header <- as.logical(header) } 91 | ## there are other args here but we really don't need them 92 | ## need to decide how to use them 93 | } else { 94 | sep <- "auto" 95 | header <- "auto" 96 | } 97 | 98 | # load the data 99 | out <- data.table::fread( 100 | file = file, 101 | sep = sep, 102 | header = header, 103 | stringsAsFactors = stringsAsFactors, 104 | data.table = FALSE, 105 | colClasses = col_classes, 106 | skip = skip_lines, 107 | ... 
108 | ) 109 | 110 | # add data frame-level metadata to data 111 | out <- add_dataset_metadata(data_frame = out, metadata_list = metadata_list) 112 | 113 | # add variable-level metadata to data 114 | out <- add_variable_metadata(data = out, fields = fields, try_to_factorize = try_to_factorize) 115 | 116 | return(out) 117 | } 118 | 119 | check_variable_metadata <- function(data, fields) { 120 | if (is.null(fields)) { 121 | return(NULL) 122 | } 123 | 124 | hnames <- lapply(fields, `[[`, "name") 125 | 126 | missing_from_metadata <- names(data)[!names(data) %in% hnames] 127 | if (length(missing_from_metadata)) { 128 | warning("Metadata is missing for ", 129 | ngettext(length(missing_from_metadata), "variable", "variables"), 130 | " listed in data: ", paste(missing_from_metadata, collapse = ", ")) 131 | } 132 | 133 | missing_from_data <- unlist(hnames)[!unlist(hnames) %in% names(data)] 134 | if (length(missing_from_data)) { 135 | warning("Data is missing for ", 136 | ngettext(length(missing_from_data), "variable", "variables"), 137 | " listed in frontmatter: ", paste(missing_from_data, collapse = ", ")) 138 | } 139 | 140 | duplicated_metadata <- unlist(hnames)[duplicated(unlist(hnames))] 141 | if (length(duplicated_metadata)) { 142 | warning("Duplicate metadata entries for ", 143 | ngettext(length(duplicated_metadata), "variable", "variables"), 144 | " listed in frontmatter: ", paste(duplicated_metadata, collapse = ", ")) 145 | } 146 | 147 | duplicated_columns <- names(data)[duplicated(names(data))] 148 | if (length(duplicated_columns)) { 149 | warning("Duplicate column names for ", 150 | ngettext(length(duplicated_columns), "variable", "variables"), 151 | ": ", paste(duplicated_columns, collapse = ", ")) 152 | } 153 | 154 | NULL 155 | } 156 | 157 | add_variable_metadata <- function(data, fields, try_to_factorize = "never") { 158 | 159 | # check metadata against header row 160 | check_variable_metadata(data = data, fields = fields) 161 | 162 | # add metadata 
to data, iterating across metadata list 163 | metadata_names <- lapply(fields, `[[`, "name") 164 | for (i in seq_along(fields)) { 165 | # grab attributes for this variable 166 | fields_this_col <- fields[[i]] 167 | 168 | # add 'title' field 169 | if ("title" %in% names(fields_this_col)) { 170 | attr(data[[i]], "label") <- fields_this_col[["title"]] 171 | } 172 | # add 'description' field 173 | if ("description" %in% names(fields_this_col)) { 174 | attr(data[[i]], "description") <- fields_this_col[["description"]] 175 | } 176 | ## store attributes already calculated 177 | dat_attributes <- attributes(data[[i]]) 178 | 179 | # handle 'type' and 'format' fields 180 | ## 'type' 181 | if ("type" %in% names(fields_this_col)) { 182 | if (fields_this_col[["type"]] == "string") { 183 | ## character/factor 184 | if (try_to_factorize == "always") { 185 | # convert all character to factor 186 | if (is.null(fields_this_col[["levels"]])) { 187 | try(data[[i]] <- as.factor(data[[i]])) 188 | } else { 189 | try(data[[i]] <- factor(data[[i]], levels = fields_this_col[["levels"]])) 190 | } 191 | } else if (try_to_factorize == "conditional") { 192 | # convert character to factor if levels are present 193 | if (is.null(fields_this_col[["levels"]])) { 194 | try(data[[i]] <- as.character(data[[i]])) 195 | } else { 196 | try(data[[i]] <- factor(data[[i]], levels = fields_this_col[["levels"]])) 197 | } 198 | } else { 199 | # do not convert character to factor 200 | try(data[[i]] <- as.character(data[[i]])) 201 | } 202 | } else if (fields_this_col[["type"]] == "date") { 203 | try(data[[i]] <- as.Date(data[[i]])) 204 | } else if (fields_this_col[["type"]] == "datetime") { 205 | try(data[[i]] <- as.POSIXct(data[[i]])) 206 | } else if (fields_this_col[["type"]] == "boolean") { 207 | try(data[[i]] <- as.logical(data[[i]])) 208 | } else if (fields_this_col[["type"]] == "number") { 209 | try(data[[i]] <- as.numeric(data[[i]])) 210 | } 211 | ## replace attributes 212 | attributes(data[[i]]) <- 
dat_attributes 213 | } 214 | ## 'format' (just added as an attribute for now) 215 | if ("format" %in% names(fields_this_col)) { 216 | attr(data[[i]], "format") <- fields_this_col[["format"]] 217 | } 218 | ## add 'levels' (if not added above during factor coercion) 219 | if ("levels" %in% names(fields_this_col) && (!"levels" %in% names(attributes(data[[i]])))) { 220 | attr(data[[i]], "levels") <- fields_this_col[["levels"]] 221 | } 222 | ## add 'labels' (not in schema but useful) 223 | if ("labels" %in% names(fields_this_col) && (!"labels" %in% names(attributes(data[[i]])))) { 224 | attr(data[[i]], "labels") <- fields_this_col[["labels"]] 225 | } 226 | rm(fields_this_col) 227 | } 228 | 229 | return(data) 230 | } 231 | 232 | add_dataset_metadata <- function(data_frame, metadata_list) { 233 | if ("profile" %in% names(metadata_list)) { 234 | attr(data_frame, "profile") <- metadata_list[["profile"]] 235 | } 236 | if ("title" %in% names(metadata_list)) { 237 | attr(data_frame, "title") <- metadata_list[["title"]] 238 | } 239 | if ("description" %in% names(metadata_list)) { 240 | attr(data_frame, "description") <- metadata_list[["description"]] 241 | } 242 | if ("name" %in% names(metadata_list)) { 243 | attr(data_frame, "name") <- metadata_list[["name"]] 244 | } 245 | if ("format" %in% names(metadata_list)) { 246 | attr(data_frame, "format") <- metadata_list[["format"]] 247 | } 248 | if ("sources" %in% names(metadata_list)) { 249 | attr(data_frame, "sources") <- metadata_list[["sources"]] 250 | } 251 | if ("licenses" %in% names(metadata_list)) { 252 | attr(data_frame, "licenses") <- metadata_list[["licenses"]] 253 | } 254 | return(data_frame) 255 | } 256 | -------------------------------------------------------------------------------- /R/read_metadata.R: -------------------------------------------------------------------------------- 1 | #' @title Read metadata 2 | #' @md 3 | #' @description Read csvy metadata from an external `.yml/.yaml` or `.json` file 4 | #' 5 | #' @param file full 
path of file from which to read the metadata. 6 | #' 7 | #' @return the metadata as a list 8 | #' 9 | #' @importFrom yaml yaml.load 10 | #' @importFrom jsonlite fromJSON 11 | #' @importFrom tools file_ext 12 | #' 13 | #' @export 14 | read_metadata <- function(file) { 15 | ext <- tolower(tools::file_ext(file)) 16 | if (ext %in% c("yaml", "yml")) { 17 | metadata_list <- yaml::yaml.load(paste(readLines(file), collapse = "\n")) 18 | } else if (ext == "json") { 19 | metadata_list <- jsonlite::fromJSON(file, simplifyDataFrame = FALSE) 20 | } else { 21 | stop("'metadata' should be either a .json or .yaml file.") ## should fail 22 | } 23 | return(metadata_list) 24 | } 25 | -------------------------------------------------------------------------------- /R/write_csvy.R: -------------------------------------------------------------------------------- 1 | #' @title Export CSVY data 2 | #' @description Export data.frame to CSVY 3 | #' @param x A data.frame. 4 | #' @param file A character string or R connection specifying a file. 5 | #' @param metadata Optionally, a character string specifying a YAML (\dQuote{.yaml}) or JSON (\dQuote{.json}) file to write the metadata (in lieu of including it in the header of the file). 6 | #' @param sep A character string specifying a between-field separator. Passed to \code{\link[data.table]{fwrite}}. 7 | #' @param dec A character string specifying the decimal separator. Passed to \code{\link[data.table]{fwrite}}. 8 | #' @param comment_header A logical indicating whether to comment the lines containing the YAML front matter. Default is \code{TRUE}. 9 | #' @param metadata_only A logical indicating whether only the metadata should be produced (no CSV component). 10 | #' @param name A character string specifying a name for the dataset. 11 | #' @param \dots Additional arguments passed to \code{\link[data.table]{fwrite}}. 
12 | #' @examples 13 | #' library("datasets") 14 | #' write_csvy(head(iris)) 15 | #' 16 | #' # write YAML w/o comment characters 17 | #' write_csvy(head(iris), comment_header = FALSE) 18 | #' 19 | #' @importFrom stats setNames 20 | #' @importFrom data.table fwrite 21 | #' @importFrom yaml as.yaml 22 | #' @importFrom jsonlite write_json 23 | #' @export 24 | #' @seealso \code{\link{read_csvy}} 25 | write_csvy <- 26 | function( 27 | x, 28 | file, 29 | metadata = NULL, 30 | sep = ",", 31 | dec = ".", 32 | comment_header = if (is.null(metadata)) TRUE else FALSE, 33 | name = deparse(substitute(x)), 34 | metadata_only = FALSE, 35 | ... 36 | ) { 37 | 38 | ## data-level metadata 39 | metadata_list <- list(profile = "tabular-data-package", 40 | name = name) 41 | att <- attributes(x) 42 | metadata_list <- c(metadata_list, att[!names(att) %in% c("names", "class", "row.names")]) 43 | 44 | ## build variable-specific metadata list 45 | fields <- list() 46 | for (i in seq_along(x)) { 47 | # grab attributes for this variable 48 | fields_this_col <- attributes(x[[i]]) 49 | 50 | # initialize metadata list for this variable 51 | fields[[i]] <- list() 52 | 53 | # add 'name' field 54 | fields[[i]][["name"]] <- names(x)[i] 55 | # add 'title' field 56 | if ("label" %in% names(fields_this_col)) { 57 | fields[[i]][["title"]] <- fields_this_col[["label"]] 58 | } 59 | # R has no canonical analogue to 'description' field, but if it's there add it 60 | if ("description" %in% names(fields_this_col)) { 61 | fields[[i]][["description"]] <- fields_this_col[["description"]] 62 | } 63 | # add 'type' field 64 | ## default is 'string' unless specified otherwise 65 | fields[[i]][["type"]] <- switch(class(x[[i]])[1L], 66 | character = "string", 67 | Date = "date", 68 | integer = "integer", 69 | logical = "boolean", 70 | numeric = "number", 71 | POSIXct = "datetime", 72 | "string") 73 | if ("labels" %in% names(fields_this_col)) { 74 | fields[[i]][["labels"]] <- 75 | 
setNames(as.list(unname(fields_this_col$labels)), names(fields_this_col$labels)) 76 | } 77 | if ("levels" %in% names(fields_this_col)) { 78 | fields[[i]][["levels"]] <- 79 | setNames(as.list(unname(fields_this_col$levels)), names(fields_this_col$levels)) 80 | } 81 | rm(fields_this_col) 82 | } 83 | metadata_list[["fields"]] <- fields 84 | 85 | if (!is.null(metadata)) { 86 | ## write metadata to separate file 87 | write_metadata(metadata_list, metadata) 88 | ## don't write the csv component if metadata_only is TRUE 89 | if (!metadata_only) { 90 | # write CSV 91 | data.table::fwrite(x = x, file = file, sep = sep, dec = dec, ...) 92 | } 93 | } else { 94 | # write metadata to file 95 | y <- paste0("---\n", yaml::as.yaml(metadata_list), "---\n") 96 | if (isTRUE(comment_header)){ 97 | con <- textConnection(y) 98 | on.exit(close(con)) 99 | m <- readLines(con) 100 | y <- paste0("#", m[-length(m)],collapse = "\n") 101 | y <- c(y, "\n") 102 | } 103 | # write data to file 104 | if (missing(file)) { 105 | cat(y) 106 | data.table::fwrite(x = x, file = "", sep = sep, dec = dec, append = TRUE, col.names = TRUE, ...) 107 | } else { 108 | cat(y, file = file) 109 | # append CSV to file 110 | data.table::fwrite(x = x, file = file, sep = sep, dec = dec, append = TRUE, col.names = TRUE, ...) 111 | } 112 | } 113 | invisible(x) 114 | } 115 | -------------------------------------------------------------------------------- /R/write_metadata.R: -------------------------------------------------------------------------------- 1 | #' @title Write csvy metadata 2 | #' @md 3 | #' @description Write csvy metadata to an external `.yml/.yaml` or `.json` file 4 | #' 5 | #' @param metadata_list metadata to be stored. Must be valid as per 6 | #' [yaml::as.yaml()] or [jsonlite::write_json()] for that particular output 7 | #' type. 8 | #' @param file full path of file in which to save the metadata. 
9 | #' 10 | #' @importFrom yaml as.yaml 11 | #' @importFrom jsonlite write_json 12 | #' @importFrom tools file_ext 13 | #' 14 | #' @return `NULL` (invisibly) 15 | #' @export 16 | write_metadata <- function(metadata_list = NULL, file = NULL) { 17 | 18 | if (is.null(metadata_list) || !is.list(metadata_list)) stop("must provide metadata_list as a list") 19 | if (is.null(file) || !is.character(file)) stop("metadata (filename) must be provided") 20 | 21 | ## get file extension 22 | ext <- tolower(tools::file_ext(file)) 23 | 24 | # write metadata to separate metadata file 25 | if (ext %in% c("yml", "yaml")) { 26 | cat(yaml::as.yaml(metadata_list), file = file) 27 | } else if (ext == "json") { 28 | jsonlite::write_json(metadata_list, path = file) 29 | } else { 30 | warning("'metadata' should be either a .json or .yaml file.") ## TODO stop? 31 | } 32 | 33 | return(invisible(NULL)) 34 | 35 | } 36 | -------------------------------------------------------------------------------- /README.Rmd: -------------------------------------------------------------------------------- 1 | # Import and Export CSV Data With a YAML Metadata Header 2 | 3 | CSVY is a file format that combines the simplicity of CSV (comma-separated values) with the metadata of other plain text and binary formats (JSON, XML, Stata, etc.). The [CSVY file specification](http://csvy.org/) is simple: place a YAML header on top of a regular CSV. The yaml header is formatted according to the [Table Schema](https://frictionlessdata.io/specs/table-schema/) of a [Tabular Data Package](https://frictionlessdata.io/specs/tabular-data-package/). 4 | 5 | A CSVY file looks like this: 6 | 7 | ``` 8 | #--- 9 | #profile: tabular-data-resource 10 | #name: my-dataset 11 | #path: https://raw.githubusercontent.com/csvy/csvy.github.io/master/examples/example.csvy 12 | #title: Example file of csvy 13 | #description: Show a csvy sample file. 
14 | #format: csvy 15 | #mediatype: text/vnd.yaml 16 | #encoding: utf-8 17 | #schema: 18 | # fields: 19 | # - name: var1 20 | # type: string 21 | # - name: var2 22 | # type: integer 23 | # - name: var3 24 | # type: number 25 | #dialect: 26 | # csvddfVersion: 1.0 27 | # delimiter: "," 28 | # doubleQuote: false 29 | # lineTerminator: "\r\n" 30 | # quoteChar: "\"" 31 | # skipInitialSpace: true 32 | # header: true 33 | #sources: 34 | #- title: The csvy specifications 35 | # path: http://csvy.org/ 36 | # email: '' 37 | #licenses: 38 | #- name: CC-BY-4.0 39 | # title: Creative Commons Attribution 4.0 40 | # path: https://creativecommons.org/licenses/by/4.0/ 41 | #--- 42 | var1,var2,var3 43 | A,1,2.0 44 | B,3,4.3 45 | ``` 46 | 47 | Which we can read into R like this: 48 | 49 | 50 | ```{r} 51 | library("csvy") 52 | str(read_csvy(system.file("examples", "example1.csvy", package = "csvy"))) 53 | ``` 54 | 55 | Optional comment characters on the YAML lines make the data readable with any standard CSV parser while retaining the ability to import and export variable- and file-level metadata. The CSVY specification does not use these, but the csvy package for R does so that you (and other users) can continue to rely on `utils::read.csv()` or `readr::read_csv()` as usual. The `import()` function in [rio](https://cran.r-project.org/package=rio) supports CSVY natively. 56 | 57 | ### Export 58 | 59 | To create a CSVY file from R, just do: 60 | 61 | ```{r} 62 | library("csvy") 63 | library("datasets") 64 | write_csvy(iris, "iris.csvy") 65 | ``` 66 | 67 | It is also possible to export the metadata to separate YAML or JSON file (and then also possible to import from those separate files) by specifying the `metadata` field in `write_csvy()` and `read_csvy()`. 
68 | 69 | ### Import 70 | 71 | To read a CSVY into R, just do: 72 | 73 | ```{r} 74 | d1 <- read_csvy("iris.csvy") 75 | str(d1) 76 | ``` 77 | 78 | or use any other appropriate data import function to ignore the YAML metadata: 79 | 80 | ```{r} 81 | d2 <- utils::read.table("iris.csvy", sep = ",", header = TRUE) 82 | str(d2) 83 | ``` 84 | 85 | ```{r, echo=FALSE} 86 | unlink("iris.csvy") 87 | ``` 88 | 89 | ## Package Installation 90 | 91 | The package is available on [CRAN](https://cran.r-project.org/package=csvy) and can be installed directly in R using: 92 | 93 | ```R 94 | install.packages("csvy") 95 | ``` 96 | 97 | The latest development version on GitHub can be installed using **devtools**: 98 | 99 | ```R 100 | if(!require("remotes")){ 101 | install.packages("remotes") 102 | } 103 | remotes::install_github("leeper/csvy") 104 | ``` 105 | 106 | [](https://cran.r-project.org/package=csvy) 107 |  108 | [](https://travis-ci.org/leeper/csvy) 109 | [](https://ci.appveyor.com/project/leeper/csvy) 110 | [](http://codecov.io/github/leeper/csvy?branch=master) 111 | 112 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Import and Export CSV Data With a YAML Metadata Header 2 | 3 | CSVY is a file format that combines the simplicity of CSV (comma-separated values) with the metadata of other plain text and binary formats (JSON, XML, Stata, etc.). The [CSVY file specification](http://csvy.org/) is simple: place a YAML header on top of a regular CSV. The yaml header is formatted according to the [Table Schema](https://frictionlessdata.io/specs/table-schema/) of a [Tabular Data Package](https://frictionlessdata.io/specs/tabular-data-package/). 
4 | 5 | A CSVY file looks like this: 6 | 7 | ``` 8 | #--- 9 | #profile: tabular-data-resource 10 | #name: my-dataset 11 | #path: https://raw.githubusercontent.com/csvy/csvy.github.io/master/examples/example.csvy 12 | #title: Example file of csvy 13 | #description: Show a csvy sample file. 14 | #format: csvy 15 | #mediatype: text/vnd.yaml 16 | #encoding: utf-8 17 | #schema: 18 | # fields: 19 | # - name: var1 20 | # type: string 21 | # - name: var2 22 | # type: integer 23 | # - name: var3 24 | # type: number 25 | #dialect: 26 | # csvddfVersion: 1.0 27 | # delimiter: "," 28 | # doubleQuote: false 29 | # lineTerminator: "\r\n" 30 | # quoteChar: "\"" 31 | # skipInitialSpace: true 32 | # header: true 33 | #sources: 34 | #- title: The csvy specifications 35 | # path: http://csvy.org/ 36 | # email: '' 37 | #licenses: 38 | #- name: CC-BY-4.0 39 | # title: Creative Commons Attribution 4.0 40 | # path: https://creativecommons.org/licenses/by/4.0/ 41 | #--- 42 | var1,var2,var3 43 | A,1,2.0 44 | B,3,4.3 45 | ``` 46 | 47 | Which we can read into R like this: 48 | 49 | 50 | 51 | ```r 52 | library("csvy") 53 | str(read_csvy(system.file("examples", "example1.csvy", package = "csvy"))) 54 | ``` 55 | 56 | ``` 57 | ## 'data.frame': 2 obs. of 3 variables: 58 | ## $ var1: chr "A" "B" 59 | ## $ var2: int 1 3 60 | ## $ var3: num 2 4.3 61 | ## - attr(*, "profile")= chr "tabular-data-resource" 62 | ## - attr(*, "title")= chr "Example file of csvy" 63 | ## - attr(*, "description")= chr "Show a csvy sample file." 64 | ## - attr(*, "name")= chr "my-dataset" 65 | ## - attr(*, "format")= chr "csvy" 66 | ## - attr(*, "sources")=List of 1 67 | ## ..$ :List of 3 68 | ## .. ..$ name : chr "CC-BY-4.0" 69 | ## .. ..$ title: chr "Creative Commons Attribution 4.0" 70 | ## .. 
..$ path : chr "https://creativecommons.org/licenses/by/4.0/" 71 | ``` 72 | 73 | Optional comment characters on the YAML lines make the data readable with any standard CSV parser while retaining the ability to import and export variable- and file-level metadata. The CSVY specification does not use these, but the csvy package for R does so that you (and other users) can continue to rely on `utils::read.csv()` or `readr::read_csv()` as usual. The `import()` function in [rio](https://cran.r-project.org/package=rio) supports CSVY natively. 74 | 75 | ### Export 76 | 77 | To create a CSVY file from R, just do: 78 | 79 | 80 | ```r 81 | library("csvy") 82 | library("datasets") 83 | write_csvy(iris, "iris.csvy") 84 | ``` 85 | 86 | It is also possible to export the metadata to separate YAML or JSON file (and then also possible to import from those separate files) by specifying the `metadata` field in `write_csvy()` and `read_csvy()`. 87 | 88 | ### Import 89 | 90 | To read a CSVY into R, just do: 91 | 92 | 93 | ```r 94 | d1 <- read_csvy("iris.csvy") 95 | str(d1) 96 | ``` 97 | 98 | ``` 99 | ## 'data.frame': 150 obs. of 5 variables: 100 | ## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ... 101 | ## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ... 102 | ## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ... 103 | ## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ... 104 | ## $ Species : chr "setosa" "setosa" "setosa" "setosa" ... 105 | ## ..- attr(*, "levels")= chr "setosa" "versicolor" "virginica" 106 | ## - attr(*, "profile")= chr "tabular-data-package" 107 | ## - attr(*, "name")= chr "iris" 108 | ``` 109 | 110 | or use any other appropriate data import function to ignore the YAML metadata: 111 | 112 | 113 | ```r 114 | d2 <- utils::read.table("iris.csvy", sep = ",", header = TRUE) 115 | str(d2) 116 | ``` 117 | 118 | ``` 119 | ## 'data.frame': 150 obs. 
of 5 variables: 120 | ## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ... 121 | ## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ... 122 | ## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ... 123 | ## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ... 124 | ## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ... 125 | ``` 126 | 127 | 128 | 129 | ## Package Installation 130 | 131 | The package is available on [CRAN](https://cran.r-project.org/package=csvy) and can be installed directly in R using: 132 | 133 | ```R 134 | install.packages("csvy") 135 | ``` 136 | 137 | The latest development version on GitHub can be installed using **devtools**: 138 | 139 | ```R 140 | if(!require("remotes")){ 141 | install.packages("remotes") 142 | } 143 | remotes::install_github("leeper/csvy") 144 | ``` 145 | 146 | [](https://cran.r-project.org/package=csvy) 147 |  148 | [](https://travis-ci.org/leeper/csvy) 149 | [](https://ci.appveyor.com/project/leeper/csvy) 150 | [](http://codecov.io/github/leeper/csvy?branch=master) 151 | 152 | -------------------------------------------------------------------------------- /appveyor.yml: -------------------------------------------------------------------------------- 1 | # Download script file from GitHub 2 | init: 3 | ps: | 4 | $ErrorActionPreference = "Stop" 5 | Invoke-WebRequest http://raw.github.com/krlmlr/r-appveyor/master/scripts/appveyor-tool.ps1 -OutFile "..\appveyor-tool.ps1" 6 | Import-Module '..\appveyor-tool.ps1' 7 | 8 | environment: 9 | global: 10 | USE_RTOOLS: true 11 | 12 | install: 13 | ps: Bootstrap 14 | 15 | build_script: 16 | - travis-tool.sh install_deps 17 | 18 | test_script: 19 | - travis-tool.sh run_tests 20 | 21 | artifacts: 22 | - path: '*.Rcheck\**\*.log' 23 | name: Logs 24 | 25 | - path: '*.Rcheck\**\*.out' 26 | name: Logs 27 | 28 | - path: '*.Rcheck\**\*.fail' 29 | name: Logs 30 | 31 | - path: '*.Rcheck\**\*.Rout' 32 | name: Logs 33 | 34 | - 
path: '\*_*.tar.gz' 35 | name: Bits 36 | 37 | - path: '\*_*.zip' 38 | name: Bits 39 | -------------------------------------------------------------------------------- /docs/CONTRIBUTING.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 |
5 | 6 | 7 | 8 | 9 |Contributions to csvy are welcome from anyone and are best sent as pull requests on the GitHub repository. This page provides some instructions to potential contributors about how to add to the package.
92 |Contributions can be submitted as a pull request on GitHub by forking or cloning the repo, making changes and submitting the pull request.
Pull requests should involve only one commit per substantive change. This means if you change multiple files (e.g., code and documentation), these changes should be committed together. If you don’t know how to do this (e.g., you are making changes in the GitHub web interface) just submit anyway and the maintainer will clean things up.
All contributions must be submitted consistent with the package license (GPL-2).
All contributions need to be noted in the Authors@R field in the DESCRIPTION. Just follow the format of the existing entries to add your name (and, optionally, email address). Substantial contributions should also be noted in inst/CITATION.
Please run R CMD build csvy and R CMD check csvy_VERSION.tar.gz before submitting the pull request to check for any errors.
Some specific types of changes that you might make are:
100 |Documentation-only changes (e.g., to Rd files, README, vignettes). This is great! All contributions are welcome.
Changes requiring a new package dependency should be discussed on the GitHub issues page before submitting a pull request.
Message translations. These are much appreciated! The format is a pain, but if you’re doing this I’m assuming you’re already familiar with it.
Any questions you have can be opened as GitHub issues or directed to thosjleeper (at) gmail.com.
106 | 107 | 108 |Please specify whether your issue is about:
92 |If you are reporting (1) a bug or (2) a question about code, please supply:
 98 |traceback() run immediately after the error occurs
sessionInfo()
103 | Put your code here:
106 |## load package
107 | library("csvy")
108 |
109 | ## code goes here
110 |
111 |
112 | ## session info for your system
113 | sessionInfo()
Please ensure the following before submitting a PR:
 92 |/R not /man and run devtools::document() to update documentation
/tests for any new functionality or bug fix
R CMD check
runs without error before submitting the PRCSVY is a file format that combines the simplicity of CSV (comma-separated values) with the metadata of other plain text and binary formats (JSON, XML, Stata, etc.). The CSVY file specification is simple: place a YAML header on top of a regular CSV. The yaml header is formatted according to the Table Schema of a Tabular Data Package.
74 |A CSVY file looks like this:
75 |#---
76 | #name: my-dataset
77 | #resources:
78 | #- order: 1
79 | # schema:
80 | # fields:
81 | # - name: var1
82 | # type: string
83 | # - name: var2
84 | # type: integer
85 | # - name: var3
86 | # type: number
87 | # dialect:
88 | # csvddfVersion: 1.0
89 | # delimiter: ","
90 | # doubleQuote: false
91 | # lineTerminator: "\r\n"
92 | # quoteChar: "\""
93 | # skipInitialSpace: true
94 | # header: true
95 | ---
96 | var1,var2,var3
97 | A,1,2.0
98 | B,3,4.3
99 | Which we can read into R like this:
100 |library("csvy")
101 | str(read_csvy(system.file("examples", "example3.csvy", package = "csvy")))
## 'data.frame': 2 obs. of 3 variables:
103 | ## $ var1: chr "A" "B"
104 | ## $ var2: int 1 3
105 | ## $ var3: num 2 4.3
 106 |Optional comment characters on the YAML lines make the data readable with any standard CSV parser while retaining the ability to import and export variable- and file-level metadata. The CSVY specification does not use these, but the csvy package for R does so that you (and other users) can continue to rely on utils::read.csv() or readr::read_csv(), provided the parser is told to treat "#" as a comment character (for example, comment.char = "#" in read.csv()). The import() function in rio supports CSVY natively.
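As a minimal sketch of the point about comment characters, the following uses only base R to read a file whose YAML header lines are prefixed with `#`. The file contents and the names `csvy_lines`, `path` and `d` are illustrative assumptions, not taken from the package's example files.

```r
# Build a small CSVY-style file with a commented YAML header (illustrative content).
csvy_lines <- c(
  "#---",
  "#name: my-dataset",
  "#fields:",
  "#- name: var1",
  "#  type: string",
  "#- name: var2",
  "#  type: integer",
  "#---",
  "var1,var2",
  "A,1",
  "B,3"
)
path <- tempfile(fileext = ".csvy")
writeLines(csvy_lines, path)

# read.csv() defaults to comment.char = "", so set it explicitly
# to skip the commented header lines.
d <- utils::read.csv(path, comment.char = "#", stringsAsFactors = FALSE)
str(d)

unlink(path)
```

Read this way, the metadata is simply ignored; only `read_csvy()` applies it to the resulting data frame.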
To create a CSVY file from R, just do:
111 |library("csvy")
112 | library("datasets")
113 | write_csvy(iris, "iris.csvy")
It is also possible to export the metadata to a separate YAML or JSON file (and then to import from those separate files) by specifying the metadata argument in write_csvy() and read_csvy().
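A hedged sketch of the separate-metadata workflow described above. It assumes the csvy package is installed and uses only the `file` and `metadata` arguments documented for `write_csvy()` and `read_csvy()`. The file names are illustrative.

```r
# Sketch only: runs if the csvy package is available; file names are illustrative.
if (requireNamespace("csvy", quietly = TRUE)) {
  csv_file  <- file.path(tempdir(), "iris.csv")
  meta_file <- file.path(tempdir(), "iris.yaml")

  # Write a plain CSV plus a separate YAML metadata file
  # (a ".json" extension would request JSON instead).
  csvy::write_csvy(iris, file = csv_file, metadata = meta_file)

  # Read the CSV back, taking the metadata from the YAML file.
  d <- csvy::read_csvy(csv_file, metadata = meta_file)
  str(d)

  unlink(c(csv_file, meta_file))
}
```

Because the metadata lives in its own file here, the CSV itself carries no header block and can be opened by tools that do not understand comment lines.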
To read a CSVY into R, just do:
120 |d1 <- read_csvy("iris.csvy")
121 | str(d1)
## 'data.frame': 150 obs. of 5 variables:
123 | ## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
124 | ## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
125 | ## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
126 | ## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
127 | ## $ Species : atomic setosa setosa setosa setosa ...
128 | ## ..- attr(*, "levels")= chr "setosa" "versicolor" "virginica"
129 | or use any other appropriate data import function to ignore the YAML metadata:
130 |d2 <- utils::read.table("iris.csvy", sep = ",", header = TRUE)
131 | str(d2)
## 'data.frame': 150 obs. of 5 variables:
133 | ## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
134 | ## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
135 | ## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
136 | ## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
137 | ## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
138 | The package is available on CRAN and can be installed directly in R using:
143 |install.packages("csvy")
The latest development version on GitHub can be installed using ghit:
145 |if(!require("ghit")){
146 | install.packages("ghit")
147 | }
148 | ghit::install_github("leeper/csvy")
NEWS.md
90 | R/csvy-package.R
 92 | CSVY is a file format that combines the simplicity of CSV (comma-separated values) with the metadata of other plain text and binary formats (JSON, XML, Stata, etc.). The CSVY file specification is simple: place a YAML header on top of a regular CSV. The csvy package implements this format using two functions: write_csvy and read_csvy.
R/read_csvy.R
92 | Import CSVY data as a data.frame
read_csvy(file, metadata = NULL, stringsAsFactors = FALSE, ...)

| Argument | Description |
|---|---|
| file | A character string or R connection specifying a file. |
| metadata | Optionally, a character string specifying a YAML (“.yaml”) or JSON (“.json”) file containing metadata (in lieu of including it in the header of the file). |
| stringsAsFactors | A logical specifying whether to treat character columns as factors. Passed to |
| … | Additional arguments passed to |
read_csvy(system.file("examples", "example3.csvy", package = "csvy"))
#>   var1 var2 var3
#> 1    A    1  2.0
#> 2    B    3  4.3
R/write_csvy.R
92 | Export data.frame to CSVY
write_csvy(x, file, metadata = NULL, sep = ",", sep2 = ".",
  comment_header = if (is.null(metadata)) TRUE else FALSE,
  name = as.character(substitute(x)), ...)

| Argument | Description |
|---|---|
| x | A data.frame. |
| file | A character string or R connection specifying a file. |
| metadata | Optionally, a character string specifying a YAML (“.yaml”) or JSON (“.json”) file to write the metadata (in lieu of including it in the header of the file). |
| sep | A character string specifying a between-field separator. Passed to |
| sep2 | A character string specifying a within-field separator. Passed to |
| comment_header | A logical indicating whether to comment the lines containing the YAML front matter. Default is |
| name | A character string specifying a name for the dataset. |
| … | Additional arguments passed to |
write_csvy
230 |library("datasets") 146 | write_csvy(head(iris))#> #--- 147 | #> #profile: tabular-data-package 148 | #> #name: 149 | #> #- head 150 | #> #- iris 151 | #> #resources: 152 | #> #- order: 1 153 | #> # schema: 154 | #> # fields: 155 | #> # - name: Sepal.Length 156 | #> # type: number 157 | #> # - name: Sepal.Width 158 | #> # type: number 159 | #> # - name: Petal.Length 160 | #> # type: number 161 | #> # - name: Petal.Width 162 | #> # type: number 163 | #> # - name: Species 164 | #> # type: string 165 | #> # levels: 166 | #> # - setosa 167 | #> # - versicolor 168 | #> # - virginica 169 | #> # dialect: 170 | #> # csvddfVersion: 1.0 171 | #> # delimiter: ',' 172 | #> # doubleQuote: no 173 | #> # lineTerminator: \n 174 | #> # escapeChar: \ 175 | #> # quoteChar: '"' 176 | #> # skipInitialSpace: yes 177 | #> # header: yes 178 | #> # caseSensitiveHeader: yes 179 | #> #--- 180 | #> Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species 181 | #> 5.1,3.5,1.4,0.2,setosa 182 | #> 4.9,3,1.4,0.2,setosa 183 | #> 4.7,3.2,1.3,0.2,setosa 184 | #> 4.6,3.1,1.5,0.2,setosa 185 | #> 5,3.6,1.4,0.2,setosa 186 | #> 5.4,3.9,1.7,0.4,setosa187 | # write yaml w/o comment charaters 188 | write_csvy(head(iris), comment_header = FALSE)#> --- 189 | #> profile: tabular-data-package 190 | #> name: 191 | #> - head 192 | #> - iris 193 | #> resources: 194 | #> - order: 1 195 | #> schema: 196 | #> fields: 197 | #> - name: Sepal.Length 198 | #> type: number 199 | #> - name: Sepal.Width 200 | #> type: number 201 | #> - name: Petal.Length 202 | #> type: number 203 | #> - name: Petal.Width 204 | #> type: number 205 | #> - name: Species 206 | #> type: string 207 | #> levels: 208 | #> - setosa 209 | #> - versicolor 210 | #> - virginica 211 | #> dialect: 212 | #> csvddfVersion: 1.0 213 | #> delimiter: ',' 214 | #> doubleQuote: no 215 | #> lineTerminator: \n 216 | #> escapeChar: \ 217 | #> quoteChar: '"' 218 | #> skipInitialSpace: yes 219 | #> header: yes 220 | #> caseSensitiveHeader: yes 221 | #> --- 222 | 
#> Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species 223 | #> 5.1,3.5,1.4,0.2,setosa 224 | #> 4.9,3,1.4,0.2,setosa 225 | #> 4.7,3.2,1.3,0.2,setosa 226 | #> 4.6,3.1,1.5,0.2,setosa 227 | #> 5,3.6,1.4,0.2,setosa 228 | #> 5.4,3.9,1.7,0.4,setosa229 |