├── .gitignore
├── Pipfile
├── README.rst
├── safeyaml.md
├── safeyaml.py
├── tests.py
├── tests
│   ├── fix
│   │   ├── nospace.yaml
│   │   ├── nospace.yaml.output
│   │   ├── unquoted.yaml
│   │   └── unquoted.yaml.output
│   └── validate
│       ├── 0.yaml
│       ├── 0005_bad.yaml
│       ├── 0005_bad.yaml.error
│       ├── 1.yaml
│       ├── 2.yaml
│       ├── 3.yaml
│       └── 4.yaml
└── tox.ini

/.gitignore:
--------------------------------------------------------------------------------
__pycache__
.tox
.cache
Pipfile.lock
--------------------------------------------------------------------------------
/Pipfile:
--------------------------------------------------------------------------------
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true

[dev-packages]
pytest = "*"
PyYAML = "*"
--------------------------------------------------------------------------------
/README.rst:
--------------------------------------------------------------------------------
SafeYAML
========

SafeYAML is an aggressively small subset of YAML. It's everything you need for
human-readable-and-writable configuration files, and nothing more.

You don't need to integrate a new parser library: keep using your language's
best-maintained YAML parser, and drop the ``safeyaml`` linter into your CI
pipeline, pre-commit hook and/or text editor. It's a standalone script, so you
don't have any new dependencies to worry about.


What's allowed?
---------------

It's best described as JSON plus the following:

- You can use indentation for structure (braces are optional)
- Keys can be unquoted (``foo: 1`` rather than ``"foo": 1``) or single-quoted (``'foo': 1``)
- Single-line comments with ``#``
- Trailing commas are allowed within ``[]`` and ``{}``

More details are in the specification, ``safeyaml.md``.

Here's an example::

    title: "SafeYAML Example"

    database:
        server: "192.168.1.1"

        ports:
            - 8000
            - 8001
            - 8002

        enabled: true

    servers:
        # JSON-style objects
        alpha: {
            "ip": "10.0.0.1",
            "names": [
                "alpha",
                "alpha.server",
            ],
        }
        beta: {
            "ip": "10.0.0.2",
            "names": ["beta"],
        }

As for what's disallowed: a lot. String values must always be quoted. Boolean
values must be written as ``true`` or ``false`` (``yes``, ``Y``, ``y``, ``on``,
etc. are not allowed). *Indented blocks must start on their own line*.

No anchors, no references. No multi-line strings. No multi-document streams. No
custom tagged values. No octal or hexadecimal numbers. *No sexagesimal numbers.*


Why?
----

The prevalence of YAML as a configuration format is a testament to the
unfriendliness of JSON, but the YAML language is terrifyingly huge and full of
pitfalls for people just trying to write configuration (the number of ways to
write a Boolean value is a prime example).

There have been plenty of attempts to define other configuration languages
(TOML, Hugo, HJSON, etc.). Subjectively, most of them are less friendly than YAML
(``.ini``-style formats quickly become cumbersome for structures with two or
more levels of nesting). Objectively, *all* of them face the uphill struggle of
needing a parser to be written and maintained in every popular programming
language.

A language that is a subset of YAML, however, needs no new parser, just a
linter to ensure that files conform. The ``safeyaml`` linter is an independent
executable, so whatever language and tooling you're currently using, you can
continue to use it; it's just one more step in your code quality process.


How do I use it?
----------------

The ``safeyaml`` executable will validate your YAML file, or fail with an error
if it can't. Here's an example of a passing validation::

    $ cat input.yaml
    title: "My YAML file"

    $ safeyaml input.yaml
    title: "My YAML file"

Here's an example of an error::

    $ cat input.yaml
    command: yes

    $ safeyaml input.yaml
    input.yaml:1:11:Can't use 'yes' as a value. Please either surround it in quotes
    if it's a string, or replace it with `true` if it's a boolean.

With the ``--fix`` option, ``safeyaml`` can automatically repair some problems
within YAML files.

Here's an example file that has some problems::

    $ cat input.yaml
    name: sonic the hedgehog # Unquoted string
    settings:{a:1,b:2} # Missing ' ' after ':'
    list:
    - "item" # Unindented list item

    $ safeyaml --fix input.yaml
    name: "sonic the hedgehog"
    settings: {a: 1,b: 2}
    list:
        - "item"

To write the fixed YAML back to the input files, pass the ``--in-place`` flag::

    $ safeyaml --fix --in-place input.yaml

You can also enable individual fix rules on their own:

``--fix-unquoted`` puts quotes around unquoted string values inside an indented map. This does not affect map keys, which must still be in identifier format (e.g. ``a1.b2.c2``).

``--fix-nospace`` ensures that at least one space follows the ``:`` after every key.

``--fix-nodent`` ensures list items inside maps are further indented than their parent key.
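
The effect of ``--fix-nospace`` can be approximated line by line with a regular expression. The following is only an illustration, not how ``safeyaml.py`` works (the linter applies the fix while parsing). The helper name ``fix_nospace`` and its pattern are assumptions. Unlike the real parser, it does not skip quoted strings, so text such as ``a b:c`` inside a string value would also be rewritten.

```python
import re

# Hypothetical pattern: an identifier-like bareword key (no leading digit,
# then letters, digits, '_' or '.'), not preceded by another word character
# or a quote, followed by ':' and then a non-whitespace character.
NOSPACE_KEY = re.compile(r"((?<![\w.\"'])(?!\d)[\w.]+):(?=\S)")


def fix_nospace(line):
    """Insert a single space after 'key:' when the value follows immediately.

    Illustration only: quoted strings are not skipped.
    """
    return NOSPACE_KEY.sub(r"\1: ", line)


print(fix_nospace("setting:{a:1,b:2}"))  # -> setting: {a: 1,b: 2}
print(fix_nospace('name: "ok"'))         # -> name: "ok" (unchanged)
```

The first result matches the first line of ``tests/fix/nospace.yaml.output``. Lines that already have a space after the colon are left untouched.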

There are also some more forceful options that aren't included in ``--fix``:

``--force-string-keys`` quotes every bareword key. Keys that would otherwise be read as a boolean or null (``true``, ``null``, ``yes``, etc.) are replaced with the string version (e.g. ``"true"``).

``--force-commas`` ensures every non-empty list or map has a trailing comma.


Other Arguments
---------------

``--json`` outputs JSON instead of YAML.

``--quiet`` doesn't output YAML on success.


How do I generate it?
---------------------

Don't. Generating YAML is almost always a bad idea. Generate JSON if you need to
serialize data.
--------------------------------------------------------------------------------
/safeyaml.md:
--------------------------------------------------------------------------------
# Specification

This is a rough overview of the SafeYAML subset of YAML, though it's best to think of it as a superset of JSON.

Like JSON:

- Root objects can only be lists or objects, not strings or numbers.
- Non-zero integers cannot have a leading 0.
- Strings cannot be multi-line.

Like YAML:

- Trailing commas are allowed in `[]` and `{}`.
- Byte order marks are ignored at the start of the document.
- Trailing whitespace is ignored.
- Objects and lists have a *flow* syntax, or indented block forms.
- A `": "` must follow a bareword key.
- Strings can use `\xFF` and `\UFFFFFFFF`, along with `\uFFFF`, to specify a codepoint.
- Unquoted keys are supported if they match an identifier-like format: a leading character that is not a digit, followed by any number of letters, digits, `.` or `_`.

Unlike both:

- JSON allows surrogate pairs; SafeYAML requires UTF-8 and codepoints, and rejects surrogate escapes.
- JSON and YAML parsers commonly accept duplicate keys; SafeYAML rejects them.
- Indented maps/lists take a value on the same line *or* an indented map/list on the next line.

In YAML but not in SafeYAML:

- Anchors (`&`) and aliases (`*`) are unsupported.
- No tags (`!`, `!!`).
- No multi-line strings: block scalars (`|` and `>`) are unsupported.
- YAML-only string escapes are unsupported, except `\x` and `\U`.
- Merge keys (`<<`) are unsupported.
- No `?` complex-key syntax.
- Indented blocks cannot be nested on the same line.

## Rough Grammar

```
ws :== (whitespace | newline | comment)*

document :== bom? ws root ws

root :== object | list | indented_object | indented_list

value :== object | list | string | number | builtin

object :== '{' ws key value ws (',' ws key value ws)* (',')? ws '}'

key :== string ws ':' ws | bareword ': ' ws

list :== '[' ws value ws (',' ws value ws)* (',')? ws ']'

string :== '"' string_contents '"' | '\'' string_contents '\''

number :== integer | floating_point

builtin :== 'null' | 'true' | 'false'

indented_key :== string | bareword

indented_value :== indented_object | indented_list | value

indented_object :== indent indented_key ': ' indented_value (nl indented_key ': ' indented_value)* dedent

indented_list :== indent '- ' indented_value (nl '- ' indented_value)* dedent
```

# Indentation Rules

Indented blocks (`- item` or `name: value`) cannot be nested on the same line, and nested items must be indented further than their parent.
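
As a rough illustration of this rule, the line-based checker below flags a line that opens a block (a `key:` with nothing after it, or a lone `-`) when the next non-blank, non-comment line is not indented further. This is a hypothetical sketch, not part of `safeyaml.py`, and `check_indentation` is an invented name. It ignores quoting, so a `#` or trailing `:` inside a string will confuse it. It also ignores flow syntax.

```python
from typing import List, Tuple


def check_indentation(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for indentation problems.

    A line "opens a block" if, after removing any comment, it ends with ':'
    or consists of a lone '-'. The next non-blank, non-comment line must then
    be indented further. Quoting and flow syntax are ignored (sketch only).
    """
    problems = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        content = line.split("#", 1)[0].rstrip()
        if not (content.endswith(":") or content.strip() == "-"):
            continue
        indent = len(line) - len(line.lstrip(" "))
        # Find the next line that carries content and compare its indent.
        for j in range(i + 1, len(lines)):
            following = lines[j]
            if not following.strip() or following.lstrip().startswith("#"):
                continue
            if len(following) - len(following.lstrip(" ")) <= indent:
                problems.append(
                    (j + 1, "should be indented further than line {}".format(i + 1)))
            break
    return problems


good = '- "One"\n- "Two"\n-\n    name_one: 3\n    name_two: 4\n'
bad = "name:\n- 1\n"
print(check_indentation(good))  # -> []
print(check_indentation(bad))   # -> [(2, 'should be indented further than line 1')]
```

The two inputs mirror the valid and invalid examples that follow.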

For example:

```
- "One"
- "Two"
-
    name_one: 3
    name_two: 4
```

And not:

```
name:
- 1 # Error: should be indented further
```

Indented blocks must not share lines:

```
- a: thing # `a: thing` needs to be on its own line

- - thing_a # `- thing_a` needs its own line
  - thing_b
```

Like so:

```
-
  a: thing
-
  - thing
  - thing
```

# Barewords

Barewords are allowed as object keys when they match an identifier-like pattern:

- A leading character that is not a digit
- Then any number of alphanumeric characters, `_` and `.`

When repairing, keys still have to match this pattern, but values can be a bareword string running to the end of the line (assuming it contains no special characters).
--------------------------------------------------------------------------------
/safeyaml.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3

import io
import re
import sys
import json
import argparse

from collections import OrderedDict

whitespace = re.compile(r"(?:\ |\t|\r|\n)+")

comment = re.compile(r"(#[^\r\n]*(?:\r?\n|$))+")

int_b10 = re.compile(r"\d[\d]*")
flt_b10 = re.compile(r"\.[\d]+")
# Exponent: 'e' or 'E', optional sign, then one or more digits
exp_b10 = re.compile(r"[eE](?:\+|-)?\d+")

string_dq = re.compile(
    r'"(?:[^"\\\n\x00-\x1F\uD800-\uDFFF]|\\(?:[\'"\\/bfnrt]|x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8}))*"')
string_sq = re.compile(
    r"'(?:[^'\\\n\x00-\x1F\uD800-\uDFFF]|\\(?:[\"'\\/bfnrt]|x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8}))*'")

identifier = re.compile(r"(?!\d)[\w\.]+")
barewords = re.compile(
    r"(?!\d)(?:(?![\r\n#$@%`,:\"\|'\[\]\{\}\&\*\?\<\>]).|:[^\r\n\s])*")

key_name = re.compile("(?:{}|{}|{})".format(
    string_dq.pattern, string_sq.pattern,
    identifier.pattern))

str_escapes = {
    'b': '\b',
    'n': '\n',
    'f': '\f',
    'r': '\r',
    't': '\t',
    '/': '/',
    '"': '"',
    "'": "'",
    '\\': '\\',
}

builtin_names = {'null': None, 'true': True, 'false': False}

reserved_names = set("y|n|yes|no|on|off".split("|"))

newlines = re.compile(r'\r?\n')  # Todo: unicode


def get_position(buf, pos):
    "Given an offset into buf, return its 1-based (line, column) position"
    line = 1
    line_off = 0
    for match in newlines.finditer(buf, 0, pos):
        line += 1
        line_off = match.end()

    col = (pos - line_off) + 1
    return line, col


class Options:
    def __init__(self, fix_unquoted=False, fix_nospace=False, force_string_keys=False, force_commas=False):
        self.fix_unquoted = fix_unquoted
        self.fix_nospace = fix_nospace
        self.force_string_keys = force_string_keys
        self.force_commas = force_commas


class ParserErr(Exception):
    def name(self):
        return self.__class__.__name__

    def explain(self):
        return self.reason

    def __init__(self, buf, pos, reason=None):
        self.buf = buf
        self.pos = pos
        if reason is None:
            # Slice rather than index, so this also works at the end of buf
            reason = "Unknown Character {} (context: {})".format(
                repr(buf[pos:pos + 1]), repr(buf[max(0, pos - 10):pos + 5]))
        self.reason = reason
        Exception.__init__(self, "{} (at pos={})".format(reason, pos))


class SemanticErr(ParserErr):
    pass


class DuplicateKey(SemanticErr):
    pass


class ReservedKey(SemanticErr):
    pass


class SyntaxErr(ParserErr):
    pass


class BadIndent(SyntaxErr):
    pass


class BadKey(SyntaxErr):
    pass


class Bareword(ParserErr):
    pass


class BadString(SyntaxErr):
    pass


class BadNumber(SyntaxErr):
    pass


class NoRootObject(SyntaxErr):
    pass


class ObjectIndentationErr(SyntaxErr):
    pass


class TrailingContent(SyntaxErr):
    pass


class UnsupportedYAML(ParserErr):
    pass


class UnsupportedEscape(ParserErr):
    pass


def parse(buf, output=None, options=None):
    if not buf:
        raise NoRootObject(buf, 0, "Empty Document")

    output = output or io.StringIO()
    options = options or Options()
    pos = 1 if buf.startswith("\uFEFF") else 0

    out = []
    while pos != len(buf):
        obj, pos = parse_document(buf, pos, output, options)
        out.append(obj)

        if buf[pos:pos + 3] == '---':
            output.write(buf[pos:pos + 3])
            pos += 3
        elif pos < len(buf):
            raise TrailingContent(buf, pos, "Trailing content: {}".format(
                repr(buf[pos:pos + 10])))

    return out


def parse_document(buf, pos, output, options):
    obj, pos = parse_structure(buf, pos, output, options, at_root=True)

    start = pos
    m = whitespace.match(buf, pos)
    while m:
        pos = m.end()
        m = comment.match(buf, pos)
        if m:
            pos = m.end()
            m = whitespace.match(buf, pos)
    output.write(buf[start:pos])
    return obj, pos


def peek_line(buf, pos):
    start = pos
    while pos < len(buf):
        peek = buf[pos]
        if peek in ('\r', '\n'):
            break
        pos += 1
    return buf[start:pos]


def move_to_next(buf, pos):
    line_pos = pos
    next_line = False
    while pos < len(buf):
        peek = buf[pos]

        if peek == ' ':
            pos += 1
        elif peek == '\n' or peek == '\r':
            pos += 1
            line_pos = pos
            next_line = True
        elif peek == '#':
            next_line = True
            # Skip to the end of the comment (which may also be the end of buf)
            while pos < len(buf):
                pos += 1
                if pos < len(buf) and buf[pos] in ('\r', '\n'):
                    line_pos = pos
                    next_line = True
                    break
        else:
            break
    return pos, pos - line_pos, next_line


def skip_whitespace(buf, pos, output):
    m = whitespace.match(buf, pos)
    while m:
        output.write(buf[pos:m.end()])
        pos = m.end()
        m = comment.match(buf, pos)
        if m:
            output.write(buf[pos:m.end()])
            pos = m.end()
            m = whitespace.match(buf, pos)
    return pos


def parse_structure(buf, pos, output, options, indent=0, at_root=False):
    while True:
        start = pos
        pos, my_indent, next_line = move_to_next(buf, pos)

        if my_indent < indent:
            raise BadIndent(
                buf, pos, "The parser has gotten terribly confused, I'm sorry. Try re-indenting")

        output.write(buf[start:pos])
        peek = buf[pos]

        if peek in ('*', '&', '?', '|', '<', '>', '%', '@'):
            raise UnsupportedYAML(
                buf, pos, "I found a {} outside of quotes. It's too special to let pass. Anchors, References, and other directives are not valid SafeYAML, Sorry.".format(peek))

        if peek == '-' and buf[pos:pos + 3] == '---':
            output.write(buf[pos:pos + 3])
            pos += 3
            continue
        break

    if peek == '-':
        return parse_indented_list(buf, pos, output, options, my_indent)

    m = key_name.match(buf, pos)

    if peek == '"' or peek == "'" or m:
        return parse_indented_map(buf, pos, output, options, my_indent, at_root)

    if peek == '{':
        if at_root:
            return parse_map(buf, pos, output, options)
        else:
            raise BadIndent(
                buf, pos, "Expected an indented object or indented list, but found {} on next line")
    if peek == '[':
        if at_root:
            return parse_list(buf, pos, output, options)
        else:
            raise BadIndent(
                buf, pos, "Expected an indented object or indented list, but found [] on next line")

    if peek in "+-0123456789":
        if at_root:
            raise NoRootObject(
                buf, pos, "No root object found: expected object or list, found start of number")
        else:
            raise BadIndent(
                buf, pos, "Expected an indented object or indented list, but found start of number on next line.")

    raise SyntaxErr(
        buf, pos, "The parser has become terribly confused, I'm sorry")


def parse_indented_list(buf, pos, output, options, my_indent):
    out = []
    while pos < len(buf):
        if buf[pos] != '-':
            break
        output.write("-")
        pos += 1
        if buf[pos] not in (' ', '\r', '\n'):
            raise BadKey(
                buf, pos, "For indented lists e.g. '- foo', the '-' must be followed by ' ', or '\\n', not: {}".format(buf[pos - 1:pos + 1]))

        new_pos, new_indent, next_line = move_to_next(buf, pos)
        if next_line and new_indent <= my_indent:
            raise BadIndent(
                buf, new_pos, "Expecting a list item, but the next line isn't indented enough")

        if not next_line:
            output.write(buf[pos:new_pos])
            line = peek_line(buf, pos)
            if ': ' in line:
                new_indent = my_indent + 1 + (new_pos - pos)
                obj, pos = parse_indented_map(buf, new_pos, output, options, new_indent, at_root=False)
            else:
                obj, pos = parse_value(buf, new_pos, output, options)

        else:
            obj, pos = parse_structure(
                buf, pos, output, options, indent=my_indent)

        out.append(obj)

        new_pos, new_indent, next_line = move_to_next(buf, pos)
        if next_line and new_indent == my_indent and buf[new_pos:new_pos + 1] == '-':
            output.write(buf[pos:new_pos])
            pos = new_pos
            continue

        break

    return out, pos


def parse_indented_map(buf, pos, output, options, my_indent, at_root):
    out = OrderedDict()

    while pos < len(buf):
        m = key_name.match(buf, pos)
        if not m:
            break

        name, pos, is_bare = parse_key(buf, pos, output, options)
        if name in out:
            raise DuplicateKey(
                buf, pos, "Can't have duplicate keys: {} is defined twice.".format(repr(name)))

        if buf[pos] != ':':
            if is_bare or not at_root:
                raise BadKey(buf, pos, "Expected 'key:', but didn't find a ':', found {}".format(
                    repr(buf[pos:])))
            else:
                raise NoRootObject(
                    buf, pos, "Expected 'key:', but didn't find a ':', found a string {}. Note that strings must be inside a containing object or list, and cannot be the root element".format(repr(buf[pos:])))

        output.write(":")
        pos += 1
        if buf[pos] not in (' ', '\r', '\n'):
            if options.fix_nospace:
                output.write(' ')
            else:
                raise BadKey(buf, pos, "For key {}, expected space or newline after ':', found {}.".format(
                    repr(name), repr(buf[pos:])))

        new_pos, new_indent, next_line = move_to_next(buf, pos)
        if next_line and new_indent < my_indent:
            raise BadIndent(
                buf, new_pos, "Missing value. Found a key, but the line afterwards isn't indented enough to count.")

        if not next_line:
            output.write(buf[pos:new_pos])
            obj, pos = parse_value(buf, new_pos, output, options)
        else:
            output.write(buf[pos:new_pos - new_indent])
            obj, pos = parse_structure(
                buf, new_pos - new_indent, output, options, indent=my_indent)

        # Duplicates were already rejected above
        out[name] = obj

        new_pos, new_indent, next_line = move_to_next(buf, pos)
        if not next_line or new_indent != my_indent:
            break
        else:
            output.write(buf[pos:new_pos])
            pos = new_pos

    return out, pos


def parse_value(buf, pos, output, options=None):
    pos = skip_whitespace(buf, pos, output)

    peek = buf[pos]

    if peek in ('*', '&', '?', '|', '<', '>', '%', '@'):
        raise UnsupportedYAML(
            buf, pos, "I found a {} outside of quotes. It's too special to let pass. Anchors, References, and other directives are not valid SafeYAML, Sorry.".format(peek))

    if peek == '-' and buf[pos:pos + 3] == '---':
        raise UnsupportedYAML(
            buf, pos, "A SafeYAML document is a single document, '---' separators are unsupported")

    if peek == '{':
        return parse_map(buf, pos, output, options)
    elif peek == '[':
        return parse_list(buf, pos, output, options)
    elif peek == "'" or peek == '"':
        return parse_string(buf, pos, output, options)
    elif peek in "-+0123456789":
        return parse_number(buf, pos, output, options)
    else:
        return parse_bareword(buf, pos, output, options)


def parse_map(buf, pos, output, options):
    output.write('{')
    out = OrderedDict()

    pos += 1
    pos = skip_whitespace(buf, pos, output)

    comma = None

    while buf[pos] != '}':

        key, new_pos, is_bare = parse_key(buf, pos, output, options)

        if key in out:
            raise DuplicateKey(
                buf, pos, "Can't have duplicate keys: {} is defined twice.".format(repr(key)))

        pos = skip_whitespace(buf, new_pos, output)

        peek = buf[pos]

        if peek == ':':
            output.write(':')
            pos += 1
        else:
            raise BadKey(
                buf, pos, "Expected a ':' when parsing a key: value pair, but found {}".format(repr(peek)))

        # Bareword keys must be followed by a space (or newline) after ':'
        if is_bare and buf[pos] not in (' ', '\r', '\n'):
            if options.fix_nospace:
                output.write(' ')
            else:
                raise BadKey(buf, pos, "For key {}, expected space or newline after ':', found {}.".format(
                    repr(key), repr(buf[pos:])))

        pos = skip_whitespace(buf, pos, output)

        item, pos = parse_value(buf, pos, output, options)

        out[key] = item

        pos = skip_whitespace(buf, pos, output)

        peek = buf[pos]
        comma = False
        if peek == ',':
            pos += 1
            output.write(',')
            comma = True
            pos = skip_whitespace(buf, pos, output)
        elif peek != '}':
            raise SyntaxErr(
                buf, pos, "Expecting a ',', or a '{}' but found {}".format('}', repr(peek)))

    if options.force_commas:
        if out and comma == False:
            output.write(',')
    output.write('}')
    return out, pos + 1


def parse_key(buf, pos, output, options):
    m = identifier.match(buf, pos)
    if m:
        key = buf[pos:m.end()]
        item = key
        # Lower-case only for the builtin/reserved checks; the key keeps its case
        name = key.lower()

        if name in builtin_names:
            if options.force_string_keys:
                item = '"{}"'.format(item)
            else:
                raise BadKey(
                    buf, pos, "Found '{}' as a bareword key, Please use quotes around it.".format(item))
        elif name in reserved_names:
            if options.force_string_keys:
                item = '"{}"'.format(item)
            else:
                raise ReservedKey(
                    buf, pos, "Found '{}' as a bareword key, which can be parsed as a boolean. Please use quotes around it.".format(item))
        elif options.force_string_keys:
            item = '"{}"'.format(item)

        output.write(item)
        pos = m.end()
        return key, pos, True
    else:
        name, pos = parse_string(buf, pos, output, options)
        return name, pos, False


def parse_list(buf, pos, output, options):
    output.write("[")
    out = []

    pos += 1

    pos = skip_whitespace(buf, pos, output)
    comma = None

    while buf[pos] != ']':
        item, pos = parse_value(buf, pos, output, options)
        out.append(item)

        pos = skip_whitespace(buf, pos, output)

        peek = buf[pos]
        comma = False
        if peek == ',':
            output.write(',')
            comma = True
            pos += 1
            pos = skip_whitespace(buf, pos, output)
        elif peek != ']':
            raise SyntaxErr(
                buf, pos, "Inside a [], Expecting a ',', or a ']' but found {}".format(repr(peek)))
    if options.force_commas:
        if out and comma == False:
            output.write(',')
    output.write("]")
    pos += 1

    return out, pos


def parse_string(buf, pos, output, options):
    s = io.StringIO()
    peek = buf[pos]

    # validate string
    if peek == "'":
        m = string_sq.match(buf, pos)
        if m:
            end = m.end()
            output.write(buf[pos:end])
        else:
            raise BadString(buf, pos, "Invalid single quoted string")
    else:
        m = string_dq.match(buf, pos)
        if m:
            end = m.end()
            output.write(buf[pos:end])
        else:
            raise BadString(buf, pos, "Invalid double quoted string")

    lo = pos + 1  # skip quotes
    while lo < end - 1:
        hi = buf.find("\\", lo, end)
        if hi == -1:
            s.write(buf[lo:end - 1])  # skip quote
            break

        s.write(buf[lo:hi])

        esc = buf[hi + 1]
        if esc in str_escapes:
            s.write(str_escapes[esc])
            lo = hi + 2
        elif esc == 'x':
            n = int(buf[hi + 2:hi + 4], 16)
            s.write(chr(n))
            lo = hi + 4
        elif esc == 'u':
            n = int(buf[hi + 2:hi + 6], 16)
            if 0xD800 <= n <= 0xDFFF:
                raise BadString(
                    buf, hi, 'string cannot have surrogate pairs')
            s.write(chr(n))
            lo = hi + 6
        elif esc == 'U':
            n = int(buf[hi + 2:hi + 10], 16)
            if 0xD800 <= n <= 0xDFFF:
                raise BadString(
                    buf, hi, 'string cannot have surrogate pairs')
            if n > 0x10FFFF:
                raise BadString(
                    buf, hi, 'codepoint out of range: {}'.format(hex(n)))
            s.write(chr(n))
            lo = hi + 10
        else:
            raise UnsupportedEscape(
                buf, hi, "Unknown escape character {}".format(repr(esc)))

    out = s.getvalue()

    # XXX output.write string.escape

    return out, end


def parse_number(buf, pos, output, options):
    flt_end = None
    exp_end = None

    sign = +1

    start = pos

    if buf[pos] in "+-":
        if buf[pos] == "-":
            sign = -1
        pos += 1
    peek = buf[pos]

    leading_zero = (peek == '0')
    m = int_b10.match(buf, pos)
    if m:
        int_end = m.end()
        end = int_end
    else:
        raise BadNumber(buf, pos, "Invalid number")

    t = flt_b10.match(buf, end)
    if t:
        flt_end = t.end()
        end = flt_end

    e = exp_b10.match(buf, end)
    if e:
        exp_end = e.end()
        end = exp_end

    if flt_end or exp_end:
        out = sign * float(buf[pos:end])
    else:
        out = sign * int(buf[pos:end])
        if leading_zero and out != 0:
            raise BadNumber(
                buf, pos, "Can't have leading zeros on non-zero integers")

    output.write(buf[start:end])

    return out, end


def parse_bareword(buf, pos, output, options):
    m = identifier.match(buf, pos)
    item = None
    if m:
        end = m.end()
        item = buf[pos:end]
        name = item.lower()

        if name in builtin_names:
            out = builtin_names[name]
            output.write(name)
            return out, m.end()
        elif options.fix_unquoted:
            pass
        elif name in reserved_names:
            raise ReservedKey(
                buf, pos, "Can't use '{}' as a value. Please either surround it in quotes if it's a string, or replace it with `true` if it's a boolean.".format(item))
        else:
            raise Bareword(
                buf, pos, "{} doesn't look like 'true', 'false', or 'null', who are you kidding ".format(repr(item)))

    if options.fix_unquoted:
        m = barewords.match(buf, pos)
        if m:
            end = m.end()
            item = buf[pos:end].strip()
            output.write('"{}"'.format(item))
            if buf[end:end + 1] not in ('', '\r', '\n', '#'):
                raise Bareword(
                    buf, pos, "The parser is trying its very best but could only make out '{}', but there is other junk on that line. You fix it.".format(item))
            elif buf[end:end + 1] == '#':
                output.write(' ')

            return item, m.end()
    raise Bareword(buf, pos, "The parser doesn't know how to parse anymore and has given up. Use fewer barewords: {}...".format(
        repr(buf[pos:pos + 5])))


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="SafeYAML Linter, checks (or formats) a YAML file for common ambiguities")

    parser.add_argument("file", nargs="*", default=None,
                        help="file(s) to read; reads from stdin if omitted")
    parser.add_argument("--fix", action='store_true',
                        default=False, help="enable --fix-unquoted and --fix-nospace")
    parser.add_argument("--fix-unquoted", action='store_true', default=False,
                        help="ask the parser to try its best to parse unquoted strings/barewords")
    parser.add_argument("--fix-nospace", action='store_true',
                        default=False, help="insert a missing ' ' after the ':' of a map key")
    parser.add_argument("--force-string-keys", action='store_true',
                        default=False, help="quote every bareword key")
    parser.add_argument("--force-commas", action='store_true',
                        default=False, help="add a trailing comma to every non-empty list or map")
    parser.add_argument("--quiet", action='store_true',
                        default=False, help="don't print the cleaned file")
    parser.add_argument("--in-place", action='store_true',
                        default=False, help="rewrite the input file(s) with the output")

    parser.add_argument("--json", action='store_true',
                        default=False, help="output json instead of yaml")

    args = parser.parse_args()  # exits on --help or invalid arguments

    options = Options(
        fix_unquoted=args.fix_unquoted or args.fix,
        fix_nospace=args.fix_nospace or args.fix,
        force_string_keys=args.force_string_keys,
        force_commas=args.force_commas,
    )

    if args.in_place:
        if args.json:
            print('error: safeyaml --in-place cannot be used with --json')
            print()
            sys.exit(-2)

        if len(args.file) < 1:
            print('error: safeyaml --in-place takes at least one file')
            print()
            sys.exit(-2)

        for filename in args.file:
            with open(filename, 'r+') as fh:
                try:
                    output = io.StringIO()
                    obj = parse(fh.read(), output=output, options=options)
                except ParserErr as p:
                    line, col = get_position(p.buf, p.pos)
                    print("{}:{}:{}:{}".format(filename, line, col, p.explain()),
                          file=sys.stderr)
                    sys.exit(-2)
                else:
                    fh.seek(0)
                    fh.truncate(0)
                    fh.write(output.getvalue())

    else:
        input_fh, output_fh = sys.stdin, sys.stdout
        filename = ""

        if args.file:
            if len(args.file) > 1:
                print(
                    'error: safeyaml only takes one file as argument, unless --in-place given')
                print()
                sys.exit(-1)

            input_fh = open(args.file[0])  # closed on exit
            filename = args.file[0]

        try:
            output = io.StringIO()
            obj = parse(input_fh.read(), output=output, options=options)
        except ParserErr as p:
            line, col = get_position(p.buf, p.pos)
            print("{}:{}:{}:{}".format(filename, line,
                                       col, p.explain()),
file=sys.stderr) 762 | sys.exit(-2) 763 | 764 | if not args.quiet: 765 | 766 | if args.json: 767 | json.dump(obj, output_fh) 768 | else: 769 | output_fh.write(output.getvalue()) 770 | 771 | sys.exit(0) 772 | -------------------------------------------------------------------------------- /tests.py: -------------------------------------------------------------------------------- 1 | import io 2 | import os 3 | import glob 4 | import yaml 5 | import pytest 6 | 7 | import safeyaml 8 | 9 | SMOKE_TESTS = { 10 | """ [0] """: [0], 11 | """ [1.2] """: [1.2], 12 | """ [-3.4] """: [-3.4], 13 | """ [+5.6] """: [+5.6], 14 | """ "test": 1 """: {'test': 1}, 15 | """ x: 'test' """: {'x': 'test'}, 16 | """ [1 ,2,3] """: [1, 2, 3], 17 | """ [1,2,3,] """: [1, 2, 3], 18 | """ {"a":1} """: {'a': 1}, 19 | """ {'b':2,} """: {'b': 2}, 20 | """ [1 #foo\n] """: [1], 21 | } 22 | 23 | 24 | @pytest.mark.parametrize("code,ref_obj", SMOKE_TESTS.items()) 25 | def test_smoke(code, ref_obj): 26 | obj = safeyaml.parse(code)[0] 27 | assert obj == ref_obj 28 | 29 | 30 | @pytest.mark.parametrize("path", glob.glob("tests/validate/*.yaml")) 31 | def test_validate(path): 32 | check_file(path, validate=True) 33 | 34 | 35 | @pytest.mark.parametrize("path", glob.glob("tests/fix/*.yaml")) 36 | def test_fix(path): 37 | check_file(path, fix=True) 38 | 39 | 40 | def check_file(path, validate=False, fix=False): 41 | output_file = '{}.output'.format(path) 42 | error_file = '{}.error'.format(path) 43 | 44 | with open(path) as fh: 45 | contents = fh.read() 46 | 47 | if os.path.exists(error_file): 48 | with open(error_file) as fh: 49 | name, pos = fh.readline().split(':') 50 | pos = int(pos) 51 | 52 | with pytest.raises(safeyaml.ParserErr) as excinfo: 53 | safeyaml.parse(contents) 54 | 55 | error = excinfo.value 56 | assert error.name() == name 57 | assert error.pos == pos 58 | return 59 | 60 | options = safeyaml.Options( 61 | fix_unquoted=fix, 62 | fix_nospace=fix, 63 | ) 64 | 65 | output = io.StringIO() 66 | obj = 
safeyaml.parse(contents, output=output, options=options)[0] 67 | output = output.getvalue() 68 | 69 | if validate: 70 | try: 71 | ref_obj = yaml.safe_load(contents) 72 | except Exception as e: 73 | raise Exception("input isn't valid YAML: {}".format(contents)) from e 74 | 75 | assert obj == ref_obj 76 | 77 | try: 78 | parsed_output = yaml.safe_load(output) 79 | except Exception as e: 80 | raise Exception("output isn't valid YAML: {}".format(output)) from e 81 | 82 | assert parsed_output == ref_obj 83 | 84 | if fix: 85 | with open(output_file) as fh: 86 | expected_output = fh.read() 87 | assert output == expected_output 88 | 89 | 90 | if __name__ == '__main__': 91 | pytest.main(['-q', __file__]) 92 | -------------------------------------------------------------------------------- /tests/fix/nospace.yaml: -------------------------------------------------------------------------------- 1 | setting:{a:1,b:2} 2 | manyspaces: "hi" 3 | -------------------------------------------------------------------------------- /tests/fix/nospace.yaml.output: -------------------------------------------------------------------------------- 1 | setting: {a: 1,b: 2} 2 | manyspaces: "hi" 3 | -------------------------------------------------------------------------------- /tests/fix/unquoted.yaml: -------------------------------------------------------------------------------- 1 | single_word: sonic 2 | multiple_words: sonic the hedgehog 3 | number: 3 4 | -------------------------------------------------------------------------------- /tests/fix/unquoted.yaml.output: -------------------------------------------------------------------------------- 1 | single_word: "sonic" 2 | multiple_words: "sonic the hedgehog" 3 | number: 3 4 | -------------------------------------------------------------------------------- /tests/validate/0.yaml: -------------------------------------------------------------------------------- 1 | { "a" :1, "b": 2 } 2 | -------------------------------------------------------------------------------- 
/tests/validate/0005_bad.yaml: -------------------------------------------------------------------------------- 1 | "name" 2 | -------------------------------------------------------------------------------- /tests/validate/0005_bad.yaml.error: -------------------------------------------------------------------------------- 1 | NoRootObject:6 2 | -------------------------------------------------------------------------------- /tests/validate/1.yaml: -------------------------------------------------------------------------------- 1 | # test 2 | - 1 3 | - 4 | name: -3 5 | -------------------------------------------------------------------------------- /tests/validate/2.yaml: -------------------------------------------------------------------------------- 1 | - 2 | - 1 3 | - 2 4 | - 5 | - 3 6 | - 4 7 | - 8 | - 9 | - 10 | - 11 | - 12 | - 5 13 | - 14 | - 15 | - 6 16 | -------------------------------------------------------------------------------- /tests/validate/3.yaml: -------------------------------------------------------------------------------- 1 | name: "value" 2 | foo: 123.45 3 | nested: 4 | "keys": 1 5 | "nice": 2 6 | "feels": 7 | pretty: "good" 8 | what: {a.b: 123, c.d: 456} 9 | -------------------------------------------------------------------------------- /tests/validate/4.yaml: -------------------------------------------------------------------------------- 1 | title: "SafeYAML Example" 2 | 3 | database: 4 | server: "192.168.1.1" 5 | 6 | # JSON-style arrays 7 | ports: [ 8 | 8000, 9 | 8001, 10 | 8002, 11 | ] 12 | 13 | enabled: true 14 | 15 | servers: 16 | # JSON-style objects 17 | alpha: { 18 | "ip": "10.0.0.1", 19 | "dc": "eqdc10", 20 | } 21 | beta: { 22 | "ip": "10.0.0.2", 23 | "dc": "eqdc10", 24 | } 25 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | [tox] 2 | envlist = py36 3 | skipsdist = true 4 | 5 | [testenv] 6 | passenv = 
HOME 7 | deps = pipenv 8 | commands = 9 | pipenv install --dev 10 | python tests.py 11 | pipenv check 12 | - pipenv check --style . 13 | --------------------------------------------------------------------------------
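The fixture layout that `check_file()` in tests.py relies on is implicit. A fixture with a sibling `.error` file is expected to fail, and the first line of that file has the form `ErrorName:pos` (for example `NoRootObject:6` for `tests/validate/0005_bad.yaml`). A fix fixture is instead compared against its sibling `.output` file. The following minimal, stdlib-only sketch spells out that convention. The helper names `read_expected_error`, `read_expected_output` and `write_fixture` are hypothetical and do not exist in the repository. The sketch does not call `safeyaml` itself.

```python
import os
import tempfile
from typing import Optional, Tuple


def read_expected_error(yaml_path: str) -> Optional[Tuple[str, int]]:
    """Return (error_name, position) from ``<yaml_path>.error``, or None.

    Hypothetical helper mirroring check_file() in tests.py: the first line
    of the .error file is expected to look like ``ErrorName:pos``.
    """
    error_path = '{}.error'.format(yaml_path)
    if not os.path.exists(error_path):
        return None  # no .error file: the fixture is expected to parse
    with open(error_path) as fh:
        name, pos = fh.readline().strip().split(':')
    return name, int(pos)


def read_expected_output(yaml_path: str) -> Optional[str]:
    """Return the contents of ``<yaml_path>.output`` (used by fix tests), or None."""
    output_path = '{}.output'.format(yaml_path)
    if not os.path.exists(output_path):
        return None
    with open(output_path) as fh:
        return fh.read()


def write_fixture(directory: str, name: str, contents: str,
                  error: Optional[str] = None,
                  output: Optional[str] = None) -> str:
    """Hypothetical helper: write a fixture plus optional .error/.output siblings."""
    path = os.path.join(directory, name)
    with open(path, 'w') as fh:
        fh.write(contents)
    if error is not None:
        with open('{}.error'.format(path), 'w') as fh:
            fh.write(error + '\n')
    if output is not None:
        with open('{}.output'.format(path), 'w') as fh:
            fh.write(output)
    return path


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        # Mirrors tests/validate/0005_bad.yaml and its .error file.
        bad = write_fixture(tmp, '0005_bad.yaml', '"name"\n',
                            error='NoRootObject:6')
        print(read_expected_error(bad))   # ('NoRootObject', 6)
        # Mirrors the first line of tests/fix/unquoted.yaml and its .output.
        fix = write_fixture(tmp, 'unquoted.yaml', 'single_word: sonic\n',
                            output='single_word: "sonic"\n')
        print(read_expected_error(fix))   # None
        print(read_expected_output(fix))
```

Pulling the lookup into helpers like these is only a suggestion. tests.py currently inlines the same logic in `check_file()`, where it picks the expected-error branch whenever the `.error` file exists.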