├── .gitignore
├── .travis.yml
├── LICENSE.md
├── README.md
├── bin
    └── csv-test
├── index.js
├── package.json
└── test
    ├── cli.test.js
    └── fixtures
        ├── customValidators.csv
        ├── customValidators.js
        ├── customValidators.yml
        ├── test.csv
        ├── test.yml
        └── undefinedFields.yml


/.gitignore:
--------------------------------------------------------------------------------
# Logs
logs
*.log
npm-debug.log*

# Runtime data
pids
*.pid
*.seed

# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov

# Coverage directory used by tools like istanbul
coverage

# Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files)
.grunt

# node-waf configuration
.lock-wscript

# Compiled binary addons (http://nodejs.org/api/addons.html)
build/Release

# Dependency directory
# https://docs.npmjs.com/misc/faq#should-i-check-my-node-modules-folder-into-git
node_modules

.DS_Store

--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
language: node_js
node_js:
  - stable

--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
This project is in the public domain within the United States.

Additionally, we waive copyright and related rights in the work
worldwide through the CC0 1.0 Universal public domain dedication.

## CC0 1.0 Universal Summary

This is a human-readable summary of the [Legal Code (read the full text)](https://creativecommons.org/publicdomain/zero/1.0/legalcode).

### No Copyright

The person who associated a work with this deed has dedicated the work to
the public domain by waiving all of his or her rights to the work worldwide
under copyright law, including all related and neighboring rights, to the
extent allowed by law.

You can copy, modify, distribute and perform the work, even for commercial
purposes, all without asking permission.

### Other Information

In no way are the patent or trademark rights of any person affected by CC0,
nor are the rights that other persons may have in the work or in how the
work is used, such as publicity or privacy rights.

Unless expressly stated otherwise, the person who associated a work with
this deed makes no warranties about the work, and disclaims liability for
all uses of the work, to the fullest extent permitted by applicable law.
When using or citing the work, you should not imply endorsement by the
author or the affirmer.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# csv-test [![Build Status](https://travis-ci.org/dhcole/csv-test.svg)](https://travis-ci.org/dhcole/csv-test)

A command line application for validating CSV files. Use it for verifying that data files meet defined criteria, like "`age` is a number and greater than 10" or "`email` is a valid email address".

It works by looping through each field of each row and testing the data value against a set of rules. It provides a log of any validation errors it finds.

It uses Node.js streams, so it can process large files with little memory. In one test, a 3,400,256-row CSV file was validated in 1m49.886s using an average of 53 MB of memory.
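
The loop described above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the project's streaming code: the `testRows` function and the shape of `rules` (a map from field name to named predicate functions) are assumptions made for this example, and the two regex predicates are crude stand-ins for the real validation rules.

```js
// Hypothetical sketch: test every field of every row against its rules.
// `rules` maps a field name to an object of { ruleName: predicate }.
function testRows(rows, rules) {
  var errors = [];
  var index = 0;
  for (var row of rows) { // any iterable works, so rows can be produced lazily
    index++;
    Object.keys(row).forEach(function (field) {
      var fieldRules = rules[field] || {}; // fields without rules are skipped
      Object.keys(fieldRules).forEach(function (name) {
        if (!fieldRules[name](row[field])) {
          errors.push('✗ [row ' + index + ', field ' + field + '] `' + name + '` failed.');
        }
      });
    });
  }
  return { rowsTested: index, errors: errors };
}

var result = testRows(
  [
    { name: 'dave', age: '30', email: 'dave@example.com' },
    { name: 'itir', age: '30', email: 'not-an-email' }
  ],
  {
    // Stand-in predicates, much looser than the real rules.
    age: { isInt: function (str) { return /^-?\d+$/.test(str); } },
    email: { isEmail: function (str) { return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(str); } }
  }
);

console.log(result.errors.join('\n'));
console.log(result.rowsTested + ' rows tested');
```

Because `name` has no rules in this example, it is never tested, which matches the behavior described for fields without rules.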

## install

At the command line / terminal, run the following:

```sh
npm install -g csv-test
```

## run

Run the `csv-test` command by passing it a configuration file and a CSV file:

```sh
csv-test path/to/config.yml path/to/data.csv
```

You will see output about your tests that looks like this:

```sh
✗ [row 2, field email] `isEmail` failed.
2 rows tested
1 error found
```

## configuration

`csv-test` runs by testing your CSV against a configuration file. The configuration file supports [YAML](https://en.wikipedia.org/wiki/YAML), which is a human-readable data format. Here is an example configuration file:

```yml
fields:
  name:
    - isLength:
      - 1
      - 10
  age:
    - isInt:
        max: 90
        min: 10
  email: isEmail
```

Start your configuration file with a `fields` key to define the validation settings for each field. Then specify a key for each field you'd like to validate. Fields that do not have rules specified will be skipped.

Field validation settings can take a number of forms. For fields with only one rule, they can be strings, such as `age: isInt`.

For fields with multiple rules or rules that take named options, use the array form:

```yaml
age:
  - isNumeric
  - isInt:
      max: 90
      min: 10
```

Some rules take options as an array, such as `isLength`. Those rules should list options as an array:

```yml
name:
  - isLength:
    - 1
    - 10
```

Rules that only take one option can be written as follows:

```yml
state:
  contains: NJ
```

The configuration file also supports a `settings` key at the top level, which will configure how the CSV file should be parsed. For instance, `delimiter: ";"` tells the parser to parse on semicolons instead of commas.
See the [node-csv-parse documentation](http://csv.adaltas.com/parse/#parser-options) for all available parse options.

## validation options

`csv-test` uses the [validator.js](https://github.com/chriso/validator.js) library for validating data. Available validation rules include the following:

### Validators

- **contains(str, seed)** - check if the string contains the seed.
- **equals(str, comparison)** - check if the string matches the comparison.
- **isAfter(str [, date])** - check if the string is a date that's after the specified date (defaults to now).
- **isAlpha(str)** - check if the string contains only letters (a-zA-Z).
- **isAlphanumeric(str)** - check if the string contains only letters and numbers.
- **isAscii(str)** - check if the string contains ASCII chars only.
- **isBase64(str)** - check if a string is base64 encoded.
- **isBefore(str [, date])** - check if the string is a date that's before the specified date.
- **isBoolean(str)** - check if a string is a boolean.
- **isByteLength(str, min [, max])** - check if the string's length (in bytes) falls in a range.
- **isCreditCard(str)** - check if the string is a credit card.
- **isCurrency(str, options)** - check if the string is a valid currency amount. `options` is an object which defaults to `{symbol: '$', require_symbol: false, allow_space_after_symbol: false, symbol_after_digits: false, allow_negatives: true, parens_for_negatives: false, negative_sign_before_digits: false, negative_sign_after_digits: false, allow_negative_sign_placeholder: false, thousands_separator: ',', decimal_separator: '.', allow_space_after_digits: false }`.
- **isDate(str)** - check if the string is a date.
- **isDecimal(str)** - check if the string represents a decimal number, such as 0.1, .3, 1.1, 1.00003, 4.0, etc.
- **isDivisibleBy(str, number)** - check if the string is a number that's divisible by another.
- **isEmail(str [, options])** - check if the string is an email. `options` is an object which defaults to `{ allow_display_name: false, allow_utf8_local_part: true, require_tld: true }`. If `allow_display_name` is set to true, the validator will also match `Display Name <email-address>`. If `allow_utf8_local_part` is set to false, the validator will not allow any non-English UTF8 character in the email address's local part. If `require_tld` is set to false, email addresses without a TLD in their domain will also be matched.
- **isFQDN(str [, options])** - check if the string is a fully qualified domain name (e.g. domain.com). `options` is an object which defaults to `{ require_tld: true, allow_underscores: false, allow_trailing_dot: false }`.
- **isFloat(str [, options])** - check if the string is a float. `options` is an object which can contain the keys `min` and/or `max` to validate the float is within boundaries (e.g. `{ min: 7.22, max: 9.55 }`).
- **isFullWidth(str)** - check if the string contains any full-width chars.
- **isHalfWidth(str)** - check if the string contains any half-width chars.
- **isHexColor(str)** - check if the string is a hexadecimal color.
- **isHexadecimal(str)** - check if the string is a hexadecimal number.
- **isIP(str [, version])** - check if the string is an IP (version 4 or 6).
- **isISBN(str [, version])** - check if the string is an ISBN (version 10 or 13).
- **isISIN(str)** - check if the string is an [ISIN][ISIN] (stock/security identifier).
- **isISO8601(str)** - check if the string is a valid [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) date.
- **isIn(str, values)** - check if the string is in an array of allowed values.
- **isInt(str [, options])** - check if the string is an integer. `options` is an object which can contain the keys `min` and/or `max` to check the integer is within boundaries (e.g. `{ min: 10, max: 99 }`).
- **isJSON(str)** - check if the string is valid JSON (note: uses JSON.parse).
- **isLength(str, min [, max])** - check if the string's length falls in a range. Note: this function takes into account surrogate pairs.
- **isLowercase(str)** - check if the string is lowercase.
- **isMobilePhone(str, locale)** - check if the string is a mobile phone number (locale is one of `['zh-CN', 'en-ZA', 'en-AU', 'en-HK', 'pt-PT', 'fr-FR', 'el-GR', 'en-GB', 'en-US', 'en-ZM', 'ru-RU']`).
- **isMongoId(str)** - check if the string is a valid hex-encoded representation of a [MongoDB ObjectId][mongoid].
- **isMultibyte(str)** - check if the string contains one or more multibyte chars.
- **isNull(str)** - check if the string is null.
- **isNumeric(str)** - check if the string contains only numbers.
- **isSurrogatePair(str)** - check if the string contains any surrogate pair chars.
- **isURL(str [, options])** - check if the string is a URL. `options` is an object which defaults to `{ protocols: ['http','https','ftp'], require_tld: true, require_protocol: false, require_valid_protocol: true, allow_underscores: false, host_whitelist: false, host_blacklist: false, allow_trailing_dot: false, allow_protocol_relative_urls: false }`.
- **isUUID(str [, version])** - check if the string is a UUID (version 3, 4 or 5).
- **isUppercase(str)** - check if the string is uppercase.
- **isVariableWidth(str)** - check if the string contains a mixture of full and half-width chars.
- **matches(str, pattern [, modifiers])** - check if string matches the pattern. For example: `matches('foo', 'foo', 'i')`.
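
To relate these signatures to the configuration file: judging from `index.js` in this repository, the CSV value is passed as the `str` argument, and any options from the configuration follow it. The sketch below mirrors that mapping in plain JavaScript. The `toCalls` helper is a hypothetical name used only for illustration; it is not part of csv-test or validator.js.

```js
// Hypothetical helper: turn one field's configuration into a list of
// { rule, args } calls, where args[0] is always the CSV value under test.
function toCalls(fieldConfig, value) {
  var entries = [];
  if (typeof fieldConfig === 'string') {
    entries.push([fieldConfig, undefined]); // email: isEmail
  } else if (Array.isArray(fieldConfig)) {
    fieldConfig.forEach(function (item) {
      if (typeof item === 'string') {
        entries.push([item, undefined]); // - isNumeric
      } else {
        var name = Object.keys(item)[0];
        entries.push([name, item[name]]); // - isInt: { min: 10, max: 90 }
      }
    });
  } else if (fieldConfig && typeof fieldConfig === 'object') {
    Object.keys(fieldConfig).forEach(function (name) {
      entries.push([name, fieldConfig[name]]); // state: { contains: NJ }
    });
  }
  return entries.map(function (entry) {
    var options = entry[1];
    // Array options become positional arguments; anything else is passed
    // as a single second argument.
    var args = Array.isArray(options) ? [value].concat(options) : [value, options];
    return { rule: entry[0], args: args };
  });
}

var calls = toCalls(
  [{ isLength: [1, 10] }, { isInt: { min: 10, max: 90 } }, 'isAlpha'],
  '42'
);
console.log(JSON.stringify(calls));
```

Under this reading, `isLength: [1, 10]` would correspond to a call like `isLength(str, 1, 10)`, while `isInt: { min: 10, max: 90 }` would correspond to `isInt(str, { min: 10, max: 90 })`.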

[mongoid]: http://docs.mongodb.org/manual/reference/object-id/
[ISIN]: https://en.wikipedia.org/wiki/International_Securities_Identification_Number

source: [validator.js documentation](https://github.com/chriso/validator.js/blob/master/README.md#validators)

### custom validators

You can test against custom validators by defining your own validation functions in a file passed as a third argument: `csv-test path/to/config.yml path/to/data.csv path/to/customValidators.js`

Your custom validators file should export an object whose keys are validator names and whose values are functions. Each function receives the value under test as a string in its first argument, followed by any options specified in the configuration file. The field name, field value, and row for the current test are also set on the function's context object; access them in the validation function with `this.field`, `this.value`, and `this.row`.

An example file of custom validators might look like this:

```js
module.exports = {
  startsWith: function(str, seed) {
    return str.indexOf(seed) === 0;
  },
  sumFields: function(str) {
    var cols = Array.prototype.slice.call(arguments);
    cols.shift();
    var sum = cols.reduce(function(memo, col) {
      return memo + +this.row[col];
    }, 0);
    return sum === +str;
  }
};
```

And these rules would be used in a configuration file like this:

```yml
fields:
  id:
    - startsWith: n
    - isLength: 1
  cost: isInt
  fees: isInt
  sum:
    - sumFields:
      - cost
      - fees
```

## public domain

This project is in the worldwide [public domain](LICENSE.md).

> This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the [CC0 1.0 Universal public domain dedication](https://creativecommons.org/publicdomain/zero/1.0/).
>
> All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.

--------------------------------------------------------------------------------
/bin/csv-test:
--------------------------------------------------------------------------------
#!/usr/bin/env node

var fs = require('fs'),
    path = require('path'),
    args = process.argv.slice(2),
    config = fs.readFileSync(path.resolve(process.cwd(), args[0]), 'utf8'),
    stream = fs.createReadStream(path.resolve(process.cwd(), args[1]), 'utf8'),
    validators = args[2] && require(path.resolve(process.cwd(), args[2]));

require('..')(config, stream, validators);

--------------------------------------------------------------------------------
/index.js:
--------------------------------------------------------------------------------
var _ = require('lodash'),
    chalk = require('chalk'),
    yaml = require('js-yaml'),
    parse = require('csv-parse'),
    validator = require('validator'),
    transform = require('stream-transform');

module.exports = function(yml, stream, validators) {

  var index = 0,
      errorCount = 0,
      config = yaml.safeLoad(yml),
      settings = _.extend({ columns: true }, config.settings),
      parser = parse(settings),
      transformer = transform(testRow);

  if (validators && _.isObject(validators)) addValidators(validators);

  stream.pipe(parser).pipe(transformer);
  stream.on('end', finish);

  function testRow(row) {
    index++;
    _.each(row, function(value, field) {
      var error = testField(value, field, row);
      if (error)
        console.error(chalk.red(error));
    });
  }

  function testField(value, field, row) {
    var fieldIndex = '✗ [row ' + index + ', field ' + field + '] ',
        rules = config.fields[field],
        output = [];

    rules = _.isString(rules) ? [rules] : rules;

    _.each(rules, function(options, rule) {
      if (_.isArray(rules)) {
        rule = options;
        options = undefined;
      }

      if (_.isObject(rule)) {
        options = _.values(rule)[0];
        rule = _.keys(rule)[0];
      }

      var args = _.isArray(options) ?
            [value].concat(options) : [value, options],
          test = validator[rule];

      if (!test) throw new Error('`' + rule + '` is not a valid rule.');

      this.row = row;
      this.field = field;
      this.value = value;

      if (!test.apply(this, args)) {
        errorCount = errorCount + 1;
        output.push(fieldIndex + '`' + rule + '` failed.');
      }
    });

    return output.join('\n');
  }

  function finish() {
    console.log(index + ' rows tested');
    if (errorCount) {
      var label = errorCount === 1 ?
        ' error' : ' errors';
      console.log(chalk.red(errorCount + label + ' found'));
      process.exit(1);
    } else {
      console.log(chalk.green('no errors found'));
      process.exit(0);
    }
  }

  function addValidators(validators) {
    _.each(validators, function(func, key) {
      validator.extend(key, func);
    });
  }

};

--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
{
  "name": "csv-test",
  "version": "0.1.0",
  "description": "A command line application for validating CSV files",
  "main": "index.js",
  "bin": "./bin/csv-test",
  "scripts": {
    "test": "tape test/*.test.js"
  },
  "author": "dhcole",
  "license": "public domain",
  "repository": {
    "type": "git",
    "url": "https://github.com/dhcole/csv-test.git"
  },
  "bugs": {
    "url": "https://github.com/dhcole/csv-test/issues"
  },
  "dependencies": {
    "chalk": "^1.1.0",
    "csv-parse": "^1.0.0",
    "js-yaml": "^3.3.1",
    "lodash": "^3.10.1",
    "stream-transform": "^0.1.0",
    "validator": "^4.0.5"
  },
  "devDependencies": {
    "tape": "^4.2.0"
  }
}

--------------------------------------------------------------------------------
/test/cli.test.js:
--------------------------------------------------------------------------------
var tape = require('tape'),
    exec = require('child_process').exec;

tape('execute CLI script', function(t) {
  var command = 'bin/csv-test test/fixtures/test.yml test/fixtures/test.csv';
  exec(command, function(error, stdout, stderr) {
    if (error) console.error(error);
    t.plan(1);
    t.assert(!error, 'runs without error');
  });
});

tape('allow fields without rules', function(t) {
  var command = 'bin/csv-test test/fixtures/undefinedFields.yml test/fixtures/test.csv';
  exec(command, function(error, stdout, stderr) {
    if (error) console.error(error);
    t.plan(1);
    t.assert(!error, 'runs without error');
  });
});

tape('support custom rules', function(t) {
  var command = 'bin/csv-test test/fixtures/customValidators.yml test/fixtures/customValidators.csv test/fixtures/customValidators.js';
  exec(command, function(error, stdout, stderr) {
    if (error) console.error(error);
    t.plan(1);
    t.assert(!error, 'runs without error');
  });
});

--------------------------------------------------------------------------------
/test/fixtures/customValidators.csv:
--------------------------------------------------------------------------------
id,cost,fees,sum
n1,10,6,16
n2,20,12,32

--------------------------------------------------------------------------------
/test/fixtures/customValidators.js:
--------------------------------------------------------------------------------
module.exports = {
  startsWith: function(str, seed) {
    return str.indexOf(seed) === 0;
  },
  sumFields: function(str) {
    var cols = Array.prototype.slice.call(arguments);
    cols.shift();
    var sum = cols.reduce(function(memo, col) {
      return memo + +this.row[col];
    }, 0);
    return sum === +str;
  }
};

--------------------------------------------------------------------------------
/test/fixtures/customValidators.yml:
--------------------------------------------------------------------------------
fields:
  id:
    - startsWith: n
    - isLength: 1
  cost: isInt
  fees: isInt
  sum:
    - sumFields:
      - cost
      - fees

--------------------------------------------------------------------------------
/test/fixtures/test.csv:
--------------------------------------------------------------------------------
name,age,email
dave,30,dave@example.com
itir,30,itir@example.com

--------------------------------------------------------------------------------
/test/fixtures/test.yml:
--------------------------------------------------------------------------------
fields:
  name:
    - isLength:
      - 1
      - 10
  age:
    - isInt:
        max: 90
        min: 10
  email: isEmail

--------------------------------------------------------------------------------
/test/fixtures/undefinedFields.yml:
--------------------------------------------------------------------------------
fields:
  name:
    - isLength:
      - 1
      - 10
  email: isEmail

--------------------------------------------------------------------------------