├── .gitignore ├── README.md ├── elm-package.json ├── LICENSE ├── comparison.md └── src ├── Parser ├── Internal.elm ├── LowLevel.elm └── LanguageKit.elm └── Parser.elm /.gitignore: -------------------------------------------------------------------------------- 1 | elm-stuff -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Moved to [elm/parser](https://github.com/elm/parser) 2 | -------------------------------------------------------------------------------- /elm-package.json: -------------------------------------------------------------------------------- 1 | { 2 | "version": "2.0.1", 3 | "summary": "a parsing library, focused on simplicity and great error messages", 4 | "repository": "https://github.com/elm-tools/parser.git", 5 | "license": "BSD-3-Clause", 6 | "source-directories": [ 7 | "src" 8 | ], 9 | "exposed-modules": [ 10 | "Parser", 11 | "Parser.LanguageKit", 12 | "Parser.LowLevel" 13 | ], 14 | "dependencies": { 15 | "elm-lang/core": "5.1.0 <= v < 6.0.0", 16 | "elm-tools/parser-primitives": "1.0.0 <= v < 2.0.0" 17 | }, 18 | "elm-version": "0.18.0 <= v < 0.19.0" 19 | } 20 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2017-present, Evan Czaplicki 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 
13 | 14 | * Neither the name of the {organization} nor the names of its 15 | contributors may be used to endorse or promote products derived from 16 | this software without specific prior written permission. 17 | 18 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 19 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 20 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 21 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 22 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 23 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 24 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 25 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 26 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 27 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 28 | -------------------------------------------------------------------------------- /comparison.md: -------------------------------------------------------------------------------- 1 | ## Comparison with Prior Work 2 | 3 | I have not seen the [parser pipeline][1] or the [context stack][2] ideas in other libraries, but [delayed commits][3] relate to prior work. 4 | 5 | [1]: README.md#parser-pipelines 6 | [2]: README.md#tracking-context 7 | [3]: README.md#delayed-commits 8 | 9 | Most parser combinator libraries I have seen are based on Haskell’s Parsec library, which has primitives named `try` and `lookAhead`. I believe [`delayedCommitMap`][delayedCommitMap] is a better primitive for two reasons. 10 | 11 | [delayedCommitMap]: http://package.elm-lang.org/packages/elm-tools/parser/latest/Parser#delayedCommitMap 12 | 13 | 14 | ### Performance and Composition 15 | 16 | Say we want to create a precise error message for `length [1,,3]`. 
The naive approach with Haskell’s Parsec library produces very bad error messages: 17 | 18 | ```haskell 19 | spaceThenArg :: Parser Expr 20 | spaceThenArg = 21 | try (spaces >> term) 22 | ``` 23 | 24 | This means we get a precise error from `term`, but then throw it away and say something went wrong at the space before the `[`. Very confusing! To improve quality, we must write something like this: 25 | 26 | ```haskell 27 | spaceThenArg :: Parser Expr 28 | spaceThenArg = 29 | choice 30 | [ do lookAhead (spaces >> char '[') 31 | spaces 32 | term 33 | , try (spaces >> term) 34 | ] 35 | ``` 36 | 37 | Notice that we parse `spaces` twice no matter what. 38 | 39 | Notice that we also had to hardcode `[` in the `lookAhead`. What if we update `term` to parse records that start with `{` as well? To get good commits on records, we must remember to update `lookAhead` to look for `oneOf "[{"`. Implementation details are leaking out of `term`! 40 | 41 | With `delayedCommit` in this Elm library, you can just say: 42 | 43 | ```elm 44 | spaceThenArg : Parser Expr 45 | spaceThenArg = 46 | delayedCommit spaces term 47 | ``` 48 | 49 | It does less work, and is more reliable as `term` evolves. I believe `delayedCommit` makes `lookAhead` pointless. 50 | 51 | 52 | ### Expressiveness 53 | 54 | You can define `try` in terms of [`delayedCommitMap`][delayedCommitMap] like this: 55 | 56 | ```elm 57 | try : Parser a -> Parser a 58 | try parser = 59 | delayedCommitMap always parser (succeed ()) 60 | ``` 61 | 62 | No expressiveness is lost! 63 | 64 | While it is possible to define `try`, I left it out of this package. In practice, `try` often leads to “bad commits” where your parser fails in a very specific way, but you then backtrack to a less specific error message. I considered naming it `allOrNothing` to better explain how it changes commit behavior, but ultimately, I thought it was best to encourage users to express their parsers with `delayedCommit` directly. 
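To make the commit behavior concrete, here is a minimal sketch of `delayedCommit` composing with a repeating parser. It is not taken from this package's README. It assumes `repeat` accepts a `Count` and a parser, as the export list of the `Parser` module suggests, and the names `DelayedCommitSketch`, `spaceThenInt`, and `arguments` are made up for illustration.

```elm
module DelayedCommitSketch exposing (arguments)

import Parser exposing (Parser, delayedCommit, ignore, int, repeat, zeroOrMore)


-- Zero or more literal space characters.
spaces : Parser ()
spaces =
    ignore zeroOrMore (\char -> char == ' ')


-- Hypothetical helper: skip spaces, then parse an integer. The commit is
-- delayed until `int` makes progress, so failing right after the spaces
-- should not count as a committed failure.
spaceThenInt : Parser Int
spaceThenInt =
    delayedCommit spaces int


-- Hypothetical helper: collect integers separated by spaces.
-- ASSUMPTION: `repeat : Count -> Parser a -> Parser (List a)`.
arguments : Parser (List Int)
arguments =
    repeat zeroOrMore spaceThenInt
```

The intent is that on an input such as `1 2 3 ` the trailing space is handed back instead of being reported as a bad integer, while a malformed number should still fail at the number itself, because `int` has made progress by then. No `try` or `lookAhead` is involved, and `spaceThenInt` does not need to know what an integer starts with.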
65 | 66 | 67 | ### Summary 68 | 69 | Compared to previous work, `delayedCommit` lets you produce precise error messages **more efficiently**. By thinking about “commit behavior” directly, you also end up with **cleaner composition** of parsers. And these benefits come **without any loss of expressiveness**. 70 | -------------------------------------------------------------------------------- /src/Parser/Internal.elm: -------------------------------------------------------------------------------- 1 | module Parser.Internal exposing 2 | ( Parser(..) 3 | , Step(..) 4 | , State 5 | , chomp 6 | , chompDigits 7 | , chompDotAndExp 8 | , isBadIntEnd 9 | ) 10 | 11 | 12 | import Char 13 | import ParserPrimitives as Prim 14 | 15 | 16 | 17 | -- PARSERS 18 | 19 | 20 | type Parser ctx x a = 21 | Parser (State ctx -> Step ctx x a) 22 | 23 | 24 | type Step ctx x a 25 | = Good a (State ctx) 26 | | Bad x (State ctx) 27 | 28 | 29 | type alias State ctx = 30 | { source : String 31 | , offset : Int 32 | , indent : Int 33 | , context : List ctx 34 | , row : Int 35 | , col : Int 36 | } 37 | 38 | 39 | 40 | -- CHOMPERS 41 | 42 | 43 | chomp : (Char -> Bool) -> Int -> String -> Int 44 | chomp isGood offset source = 45 | let 46 | newOffset = 47 | Prim.isSubChar isGood offset source 48 | in 49 | if newOffset < 0 then 50 | offset 51 | 52 | else 53 | chomp isGood newOffset source 54 | 55 | 56 | 57 | -- CHOMP DIGITS 58 | 59 | 60 | chompDigits : (Char -> Bool) -> Int -> String -> Result Int Int 61 | chompDigits isValidDigit offset source = 62 | let 63 | newOffset = 64 | chomp isValidDigit offset source 65 | in 66 | -- no digits 67 | if newOffset == offset then 68 | Err newOffset 69 | 70 | -- ends with non-digit characters 71 | else if Prim.isSubChar isBadIntEnd newOffset source /= -1 then 72 | Err newOffset 73 | 74 | -- all valid digits! 
75 | else 76 | Ok newOffset 77 | 78 | 79 | isBadIntEnd : Char -> Bool 80 | isBadIntEnd char = 81 | Char.isDigit char 82 | || Char.isUpper char 83 | || Char.isLower char 84 | || char == '.' 85 | 86 | 87 | 88 | -- CHOMP FLOAT STUFF 89 | 90 | 91 | chompDotAndExp : Int -> String -> Result Int Int 92 | chompDotAndExp offset source = 93 | let 94 | dotOffset = 95 | Prim.isSubChar isDot offset source 96 | in 97 | if dotOffset == -1 then 98 | chompExp offset source 99 | 100 | else 101 | chompExp (chomp Char.isDigit dotOffset source) source 102 | 103 | 104 | isDot : Char -> Bool 105 | isDot char = 106 | char == '.' 107 | 108 | 109 | chompExp : Int -> String -> Result Int Int 110 | chompExp offset source = 111 | let 112 | eOffset = 113 | Prim.isSubChar isE offset source 114 | in 115 | if eOffset == -1 then 116 | Ok offset 117 | 118 | else 119 | let 120 | opOffset = 121 | Prim.isSubChar isPlusOrMinus eOffset source 122 | 123 | expOffset = 124 | if opOffset == -1 then eOffset else opOffset 125 | in 126 | if Prim.isSubChar isZero expOffset source /= -1 then 127 | Err expOffset 128 | 129 | else if Prim.isSubChar Char.isDigit expOffset source == -1 then 130 | Err expOffset 131 | 132 | else 133 | chompDigits Char.isDigit expOffset source 134 | 135 | 136 | isE : Char -> Bool 137 | isE char = 138 | char == 'e' || char == 'E' 139 | 140 | 141 | isZero : Char -> Bool 142 | isZero char = 143 | char == '0' 144 | 145 | 146 | isPlusOrMinus : Char -> Bool 147 | isPlusOrMinus char = 148 | char == '+' || char == '-' 149 | 150 | -------------------------------------------------------------------------------- /src/Parser/LowLevel.elm: -------------------------------------------------------------------------------- 1 | module Parser.LowLevel exposing 2 | ( getIndentLevel 3 | , withIndentLevel 4 | 5 | , getPosition 6 | , getRow 7 | , getCol 8 | 9 | , getOffset 10 | , getSource 11 | ) 12 | 13 | {-| You are unlikely to need any of this under normal circumstances. 
14 | 15 | # Indentation 16 | @docs getIndentLevel, withIndentLevel 17 | 18 | # Row, Column, Offset, and Source 19 | @docs getPosition, getRow, getCol, getOffset, getSource 20 | 21 | -} 22 | 23 | import Parser exposing (Parser) 24 | import Parser.Internal as I exposing (State) 25 | 26 | 27 | 28 | -- INDENTATION 29 | 30 | 31 | {-| This parser tracks “indentation level” so you can parse indentation 32 | sensitive languages. Indentation levels correspond to column numbers, so 33 | it starts at 1. 34 | -} 35 | getIndentLevel : Parser Int 36 | getIndentLevel = 37 | I.Parser <| \state -> I.Good state.indent state 38 | 39 | 40 | {-| Run a parser with a given indentation level. So you will likely 41 | use `getCol` to get the current column, `andThen` give that to 42 | `withIndentLevel`. 43 | -} 44 | withIndentLevel : Int -> Parser a -> Parser a 45 | withIndentLevel newIndent (I.Parser parse) = 46 | I.Parser <| \state1 -> 47 | case parse (changeIndent newIndent state1) of 48 | I.Good a state2 -> 49 | I.Good a (changeIndent state1.indent state2) 50 | 51 | I.Bad x state2 -> 52 | I.Bad x (changeIndent state1.indent state2) 53 | 54 | 55 | changeIndent : Int -> State ctx -> State ctx 56 | changeIndent newIndent { source, offset, context, row, col } = 57 | { source = source 58 | , offset = offset 59 | , indent = newIndent 60 | , context = context 61 | , row = row 62 | , col = col 63 | } 64 | 65 | 66 | 67 | -- POSITION 68 | 69 | 70 | {-| Code editors treat code like a grid. There are rows and columns. 71 | In most editors, rows and columns are 1-indexed. You move to a new row 72 | whenever you see a `\n` character. 73 | 74 | The `getPosition` parser succeeds with your current row and column 75 | within the string you are parsing. 76 | -} 77 | getPosition : Parser (Int, Int) 78 | getPosition = 79 | I.Parser <| \state -> I.Good (state.row, state.col) state 80 | 81 | 82 | {-| The `getRow` parser succeeds with your current row within 83 | the string you are parsing. 
84 | -} 85 | getRow : Parser Int 86 | getRow = 87 | I.Parser <| \state -> I.Good state.row state 88 | 89 | 90 | {-| The `getCol` parser succeeds with your current column within 91 | the string you are parsing. 92 | -} 93 | getCol : Parser Int 94 | getCol = 95 | I.Parser <| \state -> I.Good state.col state 96 | 97 | 98 | {-| Editors think of code as a grid, but behind the scenes it is just 99 | a flat array of UTF16 characters. `getOffset` tells you your index in 100 | that flat array. So if you have read `"\n\n\n\n"` you are on row 5, 101 | column 1, and offset 4. 102 | 103 | **Note:** browsers use UTF16 strings, so characters may be one or two 16-bit 104 | words. This means you can read 4 characters, but your offset will move by 8. 105 | -} 106 | getOffset : Parser Int 107 | getOffset = 108 | I.Parser <| \state -> I.Good state.offset state 109 | 110 | 111 | {-| Get the entire string you are parsing right now. Paired with 112 | `getOffset` this can let you use `String.slice` to grab substrings 113 | with very little intermediate allocation. 114 | -} 115 | getSource : Parser String 116 | getSource = 117 | I.Parser <| \state -> I.Good state.source state 118 | 119 | -------------------------------------------------------------------------------- /src/Parser/LanguageKit.elm: -------------------------------------------------------------------------------- 1 | module Parser.LanguageKit exposing 2 | ( variable 3 | , list, record, tuple, sequence, Trailing(..) 4 | , whitespace, LineComment(..), MultiComment(..) 5 | ) 6 | 7 | 8 | {-| 9 | 10 | # Variables 11 | @docs variable 12 | 13 | # Lists, records, and that sort of thing 14 | @docs list, record, tuple, sequence, Trailing 15 | 16 | # Whitespace 17 | @docs whitespace, LineComment, MultiComment 18 | 19 | -} 20 | 21 | 22 | import Set exposing (Set) 23 | import Parser exposing (..) 
24 | import Parser.Internal as I exposing (Step(..), State) 25 | import ParserPrimitives as Prim 26 | 27 | 28 | 29 | -- VARIABLES 30 | 31 | 32 | {-| Create a parser for variables. It takes two `Char` checkers. The 33 | first one is for the first character. The second one is for all the 34 | other characters. 35 | 36 | In Elm, we distinguish between upper and lower case variables, so we 37 | can do something like this: 38 | 39 | import Char 40 | import Parser exposing (..) 41 | import Parser.LanguageKit exposing (variable) 42 | import Set 43 | 44 | lowVar : Parser String 45 | lowVar = 46 | variable Char.isLower isVarChar keywords 47 | 48 | capVar : Parser String 49 | capVar = 50 | variable Char.isUpper isVarChar keywords 51 | 52 | isVarChar : Char -> Bool 53 | isVarChar char = 54 | Char.isLower char 55 | || Char.isUpper char 56 | || Char.isDigit char 57 | || char == '_' 58 | 59 | keywords : Set.Set String 60 | keywords = 61 | Set.fromList [ "let", "in", "case", "of" ] 62 | -} 63 | variable : (Char -> Bool) -> (Char -> Bool) -> Set String -> Parser String 64 | variable isFirst isOther keywords = 65 | I.Parser <| \({ source, offset, indent, context, row, col } as state1) -> 66 | let 67 | firstOffset = 68 | Prim.isSubChar isFirst offset source 69 | in 70 | if firstOffset == -1 then 71 | Bad ExpectingVariable state1 72 | 73 | else 74 | let 75 | state2 = 76 | if firstOffset == -2 then 77 | varHelp isOther (offset + 1) (row + 1) 1 source indent context 78 | else 79 | varHelp isOther firstOffset row (col + 1) source indent context 80 | 81 | name = 82 | String.slice offset state2.offset source 83 | in 84 | if Set.member name keywords then 85 | Bad ExpectingVariable state1 86 | 87 | else 88 | Good name state2 89 | 90 | 91 | varHelp : (Char -> Bool) -> Int -> Int -> Int -> String -> Int -> List ctx -> State ctx 92 | varHelp isGood offset row col source indent context = 93 | let 94 | newOffset = 95 | Prim.isSubChar isGood offset source 96 | in 97 | if newOffset == -1 then 98 | 
{ source = source 99 | , offset = offset 100 | , indent = indent 101 | , context = context 102 | , row = row 103 | , col = col 104 | } 105 | 106 | else if newOffset == -2 then 107 | varHelp isGood (offset + 1) (row + 1) 1 source indent context 108 | 109 | else 110 | varHelp isGood newOffset row (col + 1) source indent context 111 | 112 | 113 | 114 | -- SEQUENCES 115 | 116 | 117 | {-| Parse a comma-separated list like `[ 1, 2, 3 ]`. You provide 118 | a parser for the spaces and for the list items. So if you want 119 | to parse a list of integers, you would say: 120 | 121 | import Parser exposing (Parser, zeroOrMore) 122 | import Parser.LanguageKit as Parser 123 | 124 | intList : Parser (List Int) 125 | intList = 126 | Parser.list spaces Parser.int 127 | 128 | spaces : Parser () 129 | spaces = 130 | Parser.ignore zeroOrMore (\char -> char == ' ') 131 | 132 | -- run intList "[]" == Ok [] 133 | -- run intList "[ ]" == Ok [] 134 | -- run intList "[1,2,3]" == Ok [1,2,3] 135 | -- run intList "[ 1, 2, 3 ]" == Ok [1,2,3] 136 | -- run intList "[ 1 , 2 , 3 ]" == Ok [1,2,3] 137 | -- run intList "[ 1, 2, 3, ]" == Err ... 138 | -- run intList "[, 1, 2, 3 ]" == Err ... 139 | 140 | **Note:** If you want trailing commas, check out the 141 | [`sequence`](#sequence) function. 142 | -} 143 | list : Parser () -> Parser a -> Parser (List a) 144 | list spaces item = 145 | sequence 146 | { start = "[" 147 | , separator = "," 148 | , end = "]" 149 | , spaces = spaces 150 | , item = item 151 | , trailing = Forbidden 152 | } 153 | 154 | 155 | {-| Help parse records like `{ a = 2, b = 2 }`. You provide 156 | a parser for the spaces and for the fields. So you might say: 157 | 158 | import Parser exposing ( Parser, (|.), (|=), zeroOrMore ) 159 | import Parser.LanguageKit as Parser 160 | 161 | record : Parser (List (String, Int)) 162 | record = 163 | Parser.record spaces field 164 | 165 | field : Parser (String, Int) 166 | field = 167 | Parser.succeed (,) 168 | |= lowVar 169 | |. spaces 170 | |. 
Parser.symbol "=" 171 | |. spaces 172 | |= int 173 | 174 | spaces : Parser () 175 | spaces = 176 | Parser.ignore zeroOrMore (\char -> char == ' ') 177 | 178 | -- run record "{}" == Ok [] 179 | -- run record "{ }" == Ok [] 180 | -- run record "{ x = 3 }" == Ok [ ("x",3) ] 181 | -- run record "{ x = 3, }" == Err ... 182 | -- run record "{ x = 3, y = 4 }" == Ok [ ("x",3), ("y",4) ] 183 | -- run record "{ x = 3, y = }" == Err ... 184 | 185 | **Note:** If you want trailing commas, check out the 186 | [`sequence`](#sequence) function. 187 | -} 188 | record : Parser () -> Parser a -> Parser (List a) 189 | record spaces item = 190 | sequence 191 | { start = "{" 192 | , separator = "," 193 | , end = "}" 194 | , spaces = spaces 195 | , item = item 196 | , trailing = Forbidden 197 | } 198 | 199 | 200 | {-| Help parse tuples like `(3, 4)`. Works just like [`list`](#list) 201 | and [`record`](#record). And if you need something custom, check out 202 | the [`sequence`](#sequence) function. 203 | -} 204 | tuple : Parser () -> Parser a -> Parser (List a) 205 | tuple spaces item = 206 | sequence 207 | { start = "(" 208 | , separator = "," 209 | , end = ")" 210 | , spaces = spaces 211 | , item = item 212 | , trailing = Forbidden 213 | } 214 | 215 | 216 | {-| Handle things *like* lists and records, but you can customize the 217 | details however you need. Say you want to parse C-style code blocks: 218 | 219 | import Parser exposing (Parser) 220 | import Parser.LanguageKit as Parser exposing (Trailing(..)) 221 | 222 | block : Parser (List Stmt) 223 | block = 224 | Parser.sequence 225 | { start = "{" 226 | , separator = ";" 227 | , end = "}" 228 | , spaces = spaces 229 | , item = statement 230 | , trailing = Mandatory -- demand a trailing semi-colon 231 | } 232 | 233 | -- spaces : Parser () 234 | -- statement : Parser Stmt 235 | 236 | **Note:** If you need something more custom, do not be afraid to check 237 | out the implementation and customize it for your case. 
It is better to 238 | get nice error messages with a lower-level implementation than to try 239 | to hack high-level parsers to do things they are not made for. 240 | -} 241 | sequence 242 | : { start : String 243 | , separator : String 244 | , end : String 245 | , spaces : Parser () 246 | , item : Parser a 247 | , trailing : Trailing 248 | } 249 | -> Parser (List a) 250 | sequence { start, end, spaces, item, separator, trailing } = 251 | symbol start 252 | |- spaces 253 | |- sequenceEnd end spaces item separator trailing 254 | 255 | 256 | {-| What’s the deal with trailing commas? Are they `Forbidden`? 257 | Are they `Optional`? Are they `Mandatory`? Welcome to [shapes 258 | club](http://poorlydrawnlines.com/comic/shapes-club/)! 259 | -} 260 | type Trailing = Forbidden | Optional | Mandatory 261 | 262 | 263 | ignore : Parser ignore -> Parser keep -> Parser keep 264 | ignore ignoreParser keepParser = 265 | map2 revAlways ignoreParser keepParser 266 | 267 | 268 | (|-) : Parser ignore -> Parser keep -> Parser keep 269 | (|-) = 270 | ignore 271 | 272 | 273 | revAlways : ignore -> keep -> keep 274 | revAlways _ keep = 275 | keep 276 | 277 | 278 | sequenceEnd : String -> Parser () -> Parser a -> String -> Trailing -> Parser (List a) 279 | sequenceEnd end spaces parseItem sep trailing = 280 | let 281 | chompRest item = 282 | case trailing of 283 | Forbidden -> 284 | sequenceEndForbidden end spaces parseItem sep [item] 285 | 286 | Optional -> 287 | sequenceEndOptional end spaces parseItem sep [item] 288 | 289 | Mandatory -> 290 | spaces 291 | |- symbol sep 292 | |- spaces 293 | |- sequenceEndMandatory end spaces parseItem sep [item] 294 | in 295 | oneOf 296 | [ parseItem 297 | |> andThen chompRest 298 | , symbol end 299 | |- succeed [] 300 | ] 301 | 302 | 303 | sequenceEndForbidden : String -> Parser () -> Parser a -> String -> List a -> Parser (List a) 304 | sequenceEndForbidden end spaces parseItem sep revItems = 305 | let 306 | chompRest item = 307 | 
sequenceEndForbidden end spaces parseItem sep (item :: revItems) 308 | in 309 | ignore spaces <| 310 | oneOf 311 | [ symbol sep 312 | |- spaces 313 | |- andThen chompRest parseItem 314 | , symbol end 315 | |- succeed (List.reverse revItems) 316 | ] 317 | 318 | 319 | sequenceEndOptional : String -> Parser () -> Parser a -> String -> List a -> Parser (List a) 320 | sequenceEndOptional end spaces parseItem sep revItems = 321 | let 322 | parseEnd = 323 | andThen (\_ -> succeed (List.reverse revItems)) (symbol end) 324 | 325 | chompRest item = 326 | sequenceEndOptional end spaces parseItem sep (item :: revItems) 327 | in 328 | ignore spaces <| 329 | oneOf 330 | [ symbol sep 331 | |- spaces 332 | |- oneOf [ andThen chompRest parseItem, parseEnd ] 333 | , parseEnd 334 | ] 335 | 336 | 337 | sequenceEndMandatory : String -> Parser () -> Parser a -> String -> List a -> Parser (List a) 338 | sequenceEndMandatory end spaces parseItem sep revItems = 339 | let 340 | chompRest item = 341 | sequenceEndMandatory end spaces parseItem sep (item :: revItems) 342 | in 343 | oneOf 344 | [ andThen chompRest <| 345 | parseItem 346 | |. spaces 347 | |. symbol sep 348 | |. spaces 349 | , symbol end 350 | |- succeed (List.reverse revItems) 351 | ] 352 | 353 | 354 | 355 | -- WHITESPACE 356 | 357 | 358 | {-| Create a custom whitespace parser. It will always chomp the 359 | `' '`, `'\r'`, and `'\n'` characters, but you can customize some 360 | other things. Here are some examples: 361 | 362 | elm : Parser () 363 | elm = 364 | whitespace 365 | { allowTabs = False 366 | , lineComment = LineComment "--" 367 | , multiComment = NestableComment "{-" "-}" 368 | } 369 | 370 | js : Parser () 371 | js = 372 | whitespace 373 | { allowTabs = True 374 | , lineComment = LineComment "//" 375 | , multiComment = UnnestableComment "/*" "*/" 376 | } 377 | 378 | If you need further customization, please open an issue describing your 379 | scenario or check out the source code and write it yourself. 
This is all 380 | built using stuff from the root `Parser` module. 381 | -} 382 | whitespace 383 | : { allowTabs : Bool 384 | , lineComment : LineComment 385 | , multiComment : MultiComment 386 | } 387 | -> Parser () 388 | whitespace { allowTabs, lineComment, multiComment } = 389 | let 390 | tabParser = 391 | if allowTabs then 392 | [ Parser.ignore zeroOrMore isTab ] 393 | else 394 | [] 395 | 396 | lineParser = 397 | case lineComment of 398 | NoLineComment -> 399 | [] 400 | 401 | LineComment start -> 402 | [ symbol start 403 | |. ignoreUntil "\n" 404 | ] 405 | 406 | multiParser = 407 | case multiComment of 408 | NoMultiComment -> 409 | [] 410 | 411 | UnnestableComment start end -> 412 | [ symbol start 413 | |. ignoreUntil end 414 | ] 415 | 416 | NestableComment start end -> 417 | [ nestableComment start end 418 | ] 419 | in 420 | whitespaceHelp <| 421 | oneOf (tabParser ++ lineParser ++ multiParser) 422 | 423 | 424 | chompSpaces : Parser () 425 | chompSpaces = 426 | Parser.ignore zeroOrMore isSpace 427 | 428 | 429 | isSpace : Char -> Bool 430 | isSpace char = 431 | char == ' ' || char == '\n' || char == '\r' 432 | 433 | 434 | isTab : Char -> Bool 435 | isTab char = 436 | char == '\t' 437 | 438 | 439 | whitespaceHelp : Parser a -> Parser () 440 | whitespaceHelp parser = 441 | ignore chompSpaces <| 442 | oneOf [ andThen (\_ -> whitespaceHelp parser) parser, succeed () ] 443 | 444 | 445 | {-| Are line comments allowed? If so, what symbol do they start with? 446 | 447 | LineComment "--" -- Elm 448 | LineComment "//" -- JS 449 | LineComment "#" -- Python 450 | NoLineComment -- OCaml 451 | -} 452 | type LineComment = NoLineComment | LineComment String 453 | 454 | 455 | {-| Are multi-line comments allowed? If so, what symbols do they start 456 | and end with? 457 | 458 | NestableComment "{-" "-}" -- Elm 459 | UnnestableComment "/*" "*/" -- JS 460 | NoMultiComment -- Python 461 | 462 | In Elm, you can nest multi-line comments. 
In C-like languages, like JS, 463 | this is not allowed. As soon as you see a `*/` the comment is over no 464 | matter what. 465 | -} 466 | type MultiComment 467 | = NoMultiComment 468 | | NestableComment String String 469 | | UnnestableComment String String 470 | 471 | 472 | nestableComment : String -> String -> Parser () 473 | nestableComment start end = 474 | case (String.uncons start, String.uncons end) of 475 | (Nothing, _) -> 476 | fail "Trying to parse a multi-line comment, but the start token cannot be the empty string!" 477 | 478 | (_, Nothing) -> 479 | fail "Trying to parse a multi-line comment, but the end token cannot be the empty string!" 480 | 481 | ( Just (startChar, _), Just (endChar, _) ) -> 482 | let 483 | isNotRelevant char = 484 | char /= startChar && char /= endChar 485 | in 486 | symbol start 487 | |. nestableCommentHelp isNotRelevant start end 1 488 | 489 | 490 | nestableCommentHelp : (Char -> Bool) -> String -> String -> Int -> Parser () 491 | nestableCommentHelp isNotRelevant start end nestLevel = 492 | lazy <| \_ -> 493 | ignore (Parser.ignore zeroOrMore isNotRelevant) <| 494 | oneOf 495 | [ ignore (symbol end) <| 496 | if nestLevel == 1 then 497 | succeed () 498 | else 499 | nestableCommentHelp isNotRelevant start end (nestLevel - 1) 500 | , ignore (symbol start) <| 501 | nestableCommentHelp isNotRelevant start end (nestLevel + 1) 502 | , ignore (Parser.ignore (Exactly 1) isChar) <| 503 | nestableCommentHelp isNotRelevant start end nestLevel 504 | ] 505 | 506 | 507 | isChar : Char -> Bool 508 | isChar char = 509 | True 510 | -------------------------------------------------------------------------------- /src/Parser.elm: -------------------------------------------------------------------------------- 1 | module Parser exposing 2 | ( Parser 3 | , run 4 | , int, float, symbol, keyword, end 5 | , Count(..), zeroOrMore, oneOrMore, keep, ignore, repeat 6 | , succeed, fail, map, oneOf, (|=), (|.), map2, lazy, andThen 7 | , delayedCommit, 
delayedCommitMap 8 | , source, sourceMap, ignoreUntil 9 | , Error, Problem(..), Context, inContext 10 | ) 11 | 12 | {-| 13 | 14 | # Parsers 15 | @docs Parser, run 16 | 17 | # Numbers and Keywords 18 | @docs int, float, symbol, keyword, end 19 | 20 | # Repeat Parsers 21 | @docs Count, zeroOrMore, oneOrMore, keep, ignore, repeat 22 | 23 | # Combining Parsers 24 | @docs succeed, fail, map, oneOf, (|=), (|.), map2, lazy, andThen 25 | 26 | # Delayed Commits 27 | @docs delayedCommit, delayedCommitMap 28 | 29 | # Efficiency Tricks 30 | @docs source, sourceMap, ignoreUntil 31 | 32 | # Errors 33 | @docs Error, Problem, Context, inContext 34 | -} 35 | 36 | import Char 37 | import Parser.Internal as Internal exposing (Parser(..), Step(..)) 38 | import ParserPrimitives as Prim 39 | 40 | 41 | 42 | -- PARSER 43 | 44 | 45 | {-| A parser! If you have a `Parser Int`, it is a parser that turns 46 | strings into integers. 47 | -} 48 | type alias Parser a = 49 | Internal.Parser Context Problem a 50 | 51 | 52 | type alias Step a = 53 | Internal.Step Context Problem a 54 | 55 | 56 | type alias State = 57 | Internal.State Context 58 | 59 | 60 | {-| Actually run a parser. 61 | 62 | run (keyword "true") "true" == Ok () 63 | run (keyword "true") "True" == Err ... 64 | run (keyword "true") "false" == Err ... 65 | -} 66 | run : Parser a -> String -> Result Error a 67 | run (Parser parse) source = 68 | let 69 | initialState = 70 | { source = source 71 | , offset = 0 72 | , indent = 1 73 | , context = [] 74 | , row = 1 75 | , col = 1 76 | } 77 | in 78 | case parse initialState of 79 | Good a _ -> 80 | Ok a 81 | 82 | Bad problem { row, col, context } -> 83 | Err 84 | { row = row 85 | , col = col 86 | , source = source 87 | , problem = problem 88 | , context = context 89 | } 90 | 91 | 92 | -- ERRORS 93 | 94 | 95 | {-| Parse errors as data. You can format it however makes the most 96 | sense for your application. Maybe that is all text, or maybe it is fancy 97 | interactive HTML. Up to you! 
98 | 99 | You get: 100 | 101 | - The `row` and `col` of the error. 102 | - The full `source` provided to the [`run`](#run) function. 103 | - The actual `problem` you ran into. 104 | - A stack of `context` that describes where the error is *conceptually*. 105 | 106 | **Note:** `context` is a stack. That means [`inContext`](#inContext) 107 | adds to the *front* of this list, not the back. So if you want the 108 | [`Context`](#Context) closest to the error, you want the first element 109 | of the `context` stack. 110 | -} 111 | type alias Error = 112 | { row : Int 113 | , col : Int 114 | , source : String 115 | , problem : Problem 116 | , context : List Context 117 | } 118 | 119 | 120 | {-| The particular problem you ran into. 121 | 122 | The tricky one here is `BadRepeat`. That means that you are running 123 | `zeroOrMore parser` where `parser` can succeed without consuming any 124 | input. That means it will just loop forever, consuming no input until 125 | the program crashes. 126 | -} 127 | type Problem 128 | = BadOneOf (List Problem) 129 | | BadInt 130 | | BadFloat 131 | | BadRepeat 132 | | ExpectingEnd 133 | | ExpectingSymbol String 134 | | ExpectingKeyword String 135 | | ExpectingVariable 136 | | ExpectingClosing String 137 | | Fail String 138 | 139 | 140 | {-| Most parsers only let you know the row and column where the error 141 | occurred. But what if you could *also* say “the error occurred **while 142 | parsing a list**” and let folks know what the *parser* thinks it is 143 | doing?! 144 | 145 | The error messages would be a lot nicer! That is what the Elm compiler does, 146 | and it is what `Context` helps you do in this library! **See the 147 | [`inContext`](#inContext) docs for a nice example!** 148 | 149 | About the actual fields: 150 | 151 | - `description` is set by [`inContext`](#inContext) 152 | - `row` and `col` are where [`inContext`](#inContext) began 153 | 154 | Say you use `inContext` in your list parser. 
And say you get an error trying 155 | to parse `[ 1, 23zm5, 3 ]`. In addition to error information about `23zm5`, 156 | you would have `Context` with the row and column of the starting `[` symbol. 157 | -} 158 | type alias Context = 159 | { row : Int 160 | , col : Int 161 | , description : String 162 | } 163 | 164 | 165 | 166 | -- PRIMITIVES 167 | 168 | 169 | {-| A parser that succeeds without consuming any text. 170 | 171 | run (succeed 90210 ) "mississippi" == Ok 90210 172 | run (succeed 3.141 ) "mississippi" == Ok 3.141 173 | run (succeed () ) "mississippi" == Ok () 174 | run (succeed Nothing) "mississippi" == Ok Nothing 175 | 176 | Seems weird, but it is often useful in combination with 177 | [`oneOf`](#oneOf) or [`andThen`](#andThen). 178 | -} 179 | succeed : a -> Parser a 180 | succeed a = 181 | Parser <| \state -> Good a state 182 | 183 | 184 | {-| A parser that always fails. 185 | 186 | run (fail "bad list") "[1,2,3]" == Err ... 187 | 188 | Seems weird, but it is often useful in combination with 189 | [`oneOf`](#oneOf) or [`andThen`](#andThen). 190 | -} 191 | fail : String -> Parser a 192 | fail message = 193 | Parser <| \state -> Bad (Fail message) state 194 | 195 | 196 | 197 | -- MAPPING 198 | 199 | 200 | {-| Transform the result of a parser. Maybe you have a value that is 201 | an integer or `null`: 202 | 203 | nullOrInt : Parser (Maybe Int) 204 | nullOrInt = 205 | oneOf 206 | [ map Just int 207 | , map (\_ -> Nothing) (keyword "null") 208 | ] 209 | 210 | -- run nullOrInt "0" == Ok (Just 0) 211 | -- run nullOrInt "13" == Ok (Just 13) 212 | -- run nullOrInt "null" == Ok Nothing 213 | -- run nullOrInt "zero" == Err ... 
214 | 215 | -} 216 | map : (a -> b) -> Parser a -> Parser b 217 | map func (Parser parse) = 218 | Parser <| \state1 -> 219 | case parse state1 of 220 | Good a state2 -> 221 | Good (func a) state2 222 | 223 | Bad x state2 -> 224 | Bad x state2 225 | 226 | 227 | {-| **This function is not used much in practice.** It is nicer to use 228 | the [parser pipeline][pp] operators [`(|.)`](#|.) and [`(|=)`](#|=) 229 | instead. 230 | 231 | [pp]: https://github.com/elm-tools/parser/blob/master/README.md#parser-pipeline 232 | 233 | That said, this function can combine two parsers. Maybe you 234 | want to parse some spaces followed by an integer: 235 | 236 | spacesThenInt : Parser Int 237 | spacesThenInt = 238 | map2 (\_ n -> n) spaces int 239 | 240 | spaces : Parser () 241 | spaces = 242 | ignore zeroOrMore (\char -> char == ' ') 243 | 244 | We can also use `map2` to define `(|.)` and `(|=)` like this: 245 | 246 | (|.) : Parser keep -> Parser ignore -> Parser keep 247 | (|.) keepParser ignoreParser = 248 | map2 (\keep _ -> keep) keepParser ignoreParser 249 | 250 | (|=) : Parser (a -> b) -> Parser a -> Parser b 251 | (|=) funcParser argParser = 252 | map2 (\func arg -> func arg) funcParser argParser 253 | -} 254 | map2 : (a -> b -> value) -> Parser a -> Parser b -> Parser value 255 | map2 func (Parser parseA) (Parser parseB) = 256 | Parser <| \state1 -> 257 | case parseA state1 of 258 | Bad x state2 -> 259 | Bad x state2 260 | 261 | Good a state2 -> 262 | case parseB state2 of 263 | Bad x state3 -> 264 | Bad x state3 265 | 266 | Good b state3 -> 267 | Good (func a b) state3 268 | 269 | 270 | {-| **Keep** a value in a parser pipeline. 271 | 272 | Read about parser pipelines **[here][]**. They are really nice! 
273 | 274 | [here]: https://github.com/elm-tools/parser/blob/master/README.md#parser-pipeline 275 | -} 276 | (|=) : Parser (a -> b) -> Parser a -> Parser b 277 | (|=) parseFunc parseArg = 278 | map2 apply parseFunc parseArg 279 | 280 | 281 | apply : (a -> b) -> a -> b 282 | apply f a = 283 | f a 284 | 285 | 286 | {-| **Ignore** a value in a parser pipeline. 287 | 288 | Read about parser pipelines **[here][]**. They are really nice! 289 | 290 | [here]: https://github.com/elm-tools/parser/blob/master/README.md#parser-pipeline 291 | -} 292 | (|.) : Parser keep -> Parser ignore -> Parser keep 293 | (|.) keepParser ignoreParser = 294 | map2 always keepParser ignoreParser 295 | 296 | 297 | infixl 5 |. 298 | infixl 5 |= 299 | 300 | 301 | 302 | -- AND THEN 303 | 304 | 305 | {-| Run a parser *and then* run another parser! 306 | -} 307 | andThen : (a -> Parser b) -> Parser a -> Parser b 308 | andThen callback (Parser parseA) = 309 | Parser <| \state1 -> 310 | case parseA state1 of 311 | Bad x state2 -> 312 | Bad x state2 313 | 314 | Good a state2 -> 315 | let 316 | (Parser parseB) = 317 | callback a 318 | in 319 | parseB state2 320 | 321 | 322 | 323 | -- LAZY 324 | 325 | 326 | {-| Helper to define recursive parsers. Say we want a parser for simple 327 | boolean expressions: 328 | 329 | true 330 | false 331 | (true || false) 332 | (true || (true || false)) 333 | 334 | Notice that a boolean expression might contain *other* boolean expressions. 335 | That means we will want to define our parser in terms of itself: 336 | 337 | type Boolean 338 | = MyTrue 339 | | MyFalse 340 | | MyOr Boolean Boolean 341 | 342 | boolean : Parser Boolean 343 | boolean = 344 | oneOf 345 | [ succeed MyTrue 346 | |. keyword "true" 347 | , succeed MyFalse 348 | |. keyword "false" 349 | , succeed MyOr 350 | |. symbol "(" 351 | |. spaces 352 | |= lazy (\_ -> boolean) 353 | |. spaces 354 | |. symbol "||" 355 | |. spaces 356 | |= lazy (\_ -> boolean) 357 | |. spaces 358 | |. 
symbol ")" 359 | ] 360 | 361 | spaces : Parser () 362 | spaces = 363 | ignore zeroOrMore (\char -> char == ' ') 364 | 365 | **Notice that `boolean` uses `boolean` in its definition!** In Elm, you can 366 | only define a value in terms of itself if it is behind a function call. So 367 | `lazy` helps us define these self-referential parsers. 368 | 369 | **Note:** In some cases, it may be more natural or efficient to use 370 | `andThen` to hide a self-reference behind a function. 371 | -} 372 | lazy : (() -> Parser a) -> Parser a 373 | lazy thunk = 374 | Parser <| \state -> 375 | let 376 | (Parser parse) = 377 | thunk () 378 | in 379 | parse state 380 | 381 | 382 | 383 | -- ONE OF 384 | 385 | 386 | {-| Try a bunch of different parsers. If a parser does not commit, we 387 | move on and try the next one. If a parser *does* commit, we give up on any 388 | remaining parsers. 389 | 390 | The idea is: if you make progress and commit to a parser, you want to 391 | get error messages from *that path*. If you backtrack and keep trying stuff 392 | you will get a much less precise error. 393 | 394 | So say we are parsing “language terms” that include integers and lists 395 | of integers: 396 | 397 | term : Parser Expr 398 | term = 399 | oneOf 400 | [ listOf int 401 | , int 402 | ] 403 | 404 | listOf : Parser a -> Parser (List a) 405 | listOf parser = 406 | succeed identity 407 | |. symbol "[" 408 | |. spaces 409 | ... 410 | 411 | When we get to `oneOf`, we first try the `listOf int` parser. If we see a 412 | `[` we *commit* to that parser. That means if something goes wrong, we do 413 | not backtrack. Instead the parse fails! If we do not see a `[` we move on 414 | to the second option and just try the `int` parser.
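To see the commit rule in a complete (if tiny) parser, here is a minimal sketch. The name `bracketedOrBare` is hypothetical and not part of this library; the expectations in the comments are read off the `oneOfHelp` helper in this module, which only moves on to the next option when the failing parser left `row` and `col` unchanged.

```elm
import Parser exposing ((|.), (|=), Parser, int, oneOf, succeed, symbol)


-- Hypothetical example: an integer, optionally wrapped in square brackets.
bracketedOrBare : Parser Int
bracketedOrBare =
    oneOf
        [ succeed identity
            |. symbol "["
            |= int
            |. symbol "]"
        , int
        ]


-- run bracketedOrBare "42"
--   The first option fails on `symbol "["` without moving, so `oneOf`
--   falls through to the bare `int` option.
--
-- run bracketedOrBare "[x]"
--   The first option gets past "[" before `int` fails, so it has committed.
--   The error should carry the `BadInt` problem, not a `BadOneOf` list.
--
-- run bracketedOrBare "[42"
--   Committed again, so the error should be `ExpectingSymbol "]"`.
```

The practical upshot is that the order of options matters less than where each option first makes progress: once an option has consumed input, its error is the one the user sees.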
415 | -} 416 | oneOf : List (Parser a) -> Parser a 417 | oneOf parsers = 418 | Parser <| \state -> oneOfHelp state [] parsers 419 | 420 | 421 | oneOfHelp : State -> List Problem -> List (Parser a) -> Step a 422 | oneOfHelp state problems parsers = 423 | case parsers of 424 | [] -> 425 | Bad (BadOneOf (List.reverse problems)) state 426 | 427 | Parser parse :: remainingParsers -> 428 | case parse state of 429 | Good _ _ as step -> 430 | step 431 | 432 | Bad problem { row, col } as step -> 433 | if state.row == row && state.col == col then 434 | oneOfHelp state (problem :: problems) remainingParsers 435 | 436 | else 437 | step 438 | 439 | 440 | 441 | -- REPEAT 442 | 443 | 444 | {-| Try to use the parser as many times as possible. Say we want to parse 445 | `NaN` a bunch of times: 446 | 447 | batman : Parser Int 448 | batman = 449 | map List.length (repeat zeroOrMore (keyword "NaN")) 450 | 451 | -- run batman "whatever" == Ok 0 452 | -- run batman "" == Ok 0 453 | -- run batman "NaN" == Ok 1 454 | -- run batman "NaNNaN" == Ok 2 455 | -- run batman "NaNNaNNaN" == Ok 3 456 | -- run batman "NaNNaN batman!" == Ok 2 457 | 458 | **Note:** If you are trying to parse things like `[1,2,3]` or `{ x = 3 }` 459 | check out the [`list`](Parser-LanguageKit#list) and 460 | [`record`](Parser-LanguageKit#record) functions in the 461 | [`Parser.LanguageKit`](Parser-LanguageKit) module. 
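As a cautionary sketch, this is what a repetition that cannot make progress looks like. The name `stuck` is hypothetical; the expected result is inferred from the `repeatAtLeast` helper in this module rather than from a documented guarantee.

```elm
import Parser exposing (Parser, repeat, succeed, zeroOrMore)


-- Hypothetical example: `succeed ()` never consumes input, so repeating
-- it can never advance through the source string.
stuck : Parser (List ())
stuck =
    repeat zeroOrMore (succeed ())


-- Going by `repeatAtLeast`, `run stuck "abc"` should give an `Err` whose
-- `problem` is `BadRepeat`: the first successful step leaves `row` and
-- `col` unchanged, and that is treated as a failure instead of looping.
```

If you see `BadRepeat` in an error, look for a repeated parser that can succeed on empty input, such as one built only from `succeed` or from `zeroOrMore` pieces.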
462 | -} 463 | repeat : Count -> Parser a -> Parser (List a) 464 | repeat count (Parser parse) = 465 | case count of 466 | Exactly n -> 467 | Parser <| \state -> 468 | repeatExactly n parse [] state 469 | 470 | AtLeast n -> 471 | Parser <| \state -> 472 | repeatAtLeast n parse [] state 473 | 474 | 475 | repeatExactly : Int -> (State -> Step a) -> List a -> State -> Step (List a) 476 | repeatExactly n parse revList state1 = 477 | if n <= 0 then 478 | Good (List.reverse revList) state1 479 | 480 | else 481 | case parse state1 of 482 | Good a state2 -> 483 | if state1.row == state2.row && state1.col == state2.col then 484 | Bad BadRepeat state2 485 | else 486 | repeatExactly (n - 1) parse (a :: revList) state2 487 | 488 | Bad x state2 -> 489 | Bad x state2 490 | 491 | 492 | repeatAtLeast : Int -> (State -> Step a) -> List a -> State -> Step (List a) 493 | repeatAtLeast n parse revList state1 = 494 | case parse state1 of 495 | Good a state2 -> 496 | if state1.row == state2.row && state1.col == state2.col then 497 | Bad BadRepeat state2 498 | else 499 | repeatAtLeast (n - 1) parse (a :: revList) state2 500 | 501 | Bad x state2 -> 502 | if state1.row == state2.row && state1.col == state2.col && n <= 0 then 503 | Good (List.reverse revList) state1 504 | 505 | else 506 | Bad x state2 507 | 508 | 509 | 510 | -- DELAYED COMMIT 511 | 512 | 513 | {-| Only commit if `Parser a` succeeds and `Parser value` makes some progress. 514 | 515 | This is very important for generating high quality error messages! Read more 516 | about this [here][1] and [here][2]. 
517 | 518 | [1]: https://github.com/elm-tools/parser/blob/master/README.md#delayed-commits 519 | [2]: https://github.com/elm-tools/parser/blob/master/comparison.md 520 | -} 521 | delayedCommit : Parser a -> Parser value -> Parser value 522 | delayedCommit filler realStuff = 523 | delayedCommitMap (\_ v -> v) filler realStuff 524 | 525 | 526 | {-| Like [`delayedCommit`](#delayedCommit), but lets you extract values from 527 | both parsers. Read more about it [here][1] and [here][2]. 528 | 529 | [1]: https://github.com/elm-tools/parser/blob/master/README.md#delayed-commits 530 | [2]: https://github.com/elm-tools/parser/blob/master/comparison.md 531 | -} 532 | delayedCommitMap : (a -> b -> value) -> Parser a -> Parser b -> Parser value 533 | delayedCommitMap func (Parser parseA) (Parser parseB) = 534 | Parser <| \state1 -> 535 | case parseA state1 of 536 | Bad x _ -> 537 | Bad x state1 538 | 539 | Good a state2 -> 540 | case parseB state2 of 541 | Good b state3 -> 542 | Good (func a b) state3 543 | 544 | Bad x state3 -> 545 | if state2.row == state3.row && state2.col == state3.col then 546 | Bad x state1 547 | else 548 | Bad x state3 549 | 550 | 551 | 552 | -- SYMBOLS and KEYWORDS 553 | 554 | 555 | {-| Parse symbols like `,`, `(`, and `&&`. 556 | 557 | run (symbol "[") "[" == Ok () 558 | run (symbol "[") "4" == Err ... (ExpectingSymbol "[") ... 559 | -} 560 | symbol : String -> Parser () 561 | symbol str = 562 | token ExpectingSymbol str 563 | 564 | 565 | {-| Parse keywords like `let`, `case`, and `type`. 566 | 567 | run (keyword "let") "let" == Ok () 568 | run (keyword "let") "var" == Err ... (ExpectingKeyword "let") ... 
569 | -} 570 | keyword : String -> Parser () 571 | keyword str = 572 | token ExpectingKeyword str 573 | 574 | 575 | token : (String -> Problem) -> String -> Parser () 576 | token makeProblem str = 577 | Parser <| \({ source, offset, indent, context, row, col } as state) -> 578 | let 579 | (newOffset, newRow, newCol) = 580 | Prim.isSubString str offset row col source 581 | in 582 | if newOffset == -1 then 583 | Bad (makeProblem str) state 584 | 585 | else 586 | Good () 587 | { source = source 588 | , offset = newOffset 589 | , indent = indent 590 | , context = context 591 | , row = newRow 592 | , col = newCol 593 | } 594 | 595 | 596 | -- INT 597 | 598 | 599 | {-| Parse integers. It accepts decimal and hexadecimal formats. 600 | 601 | -- decimal 602 | run int "1234" == Ok 1234 603 | run int "1.34" == Err ... 604 | run int "1e31" == Err ... 605 | run int "123a" == Err ... 606 | run int "0123" == Err ... 607 | 608 | -- hexadecimal 609 | run int "0x001A" == Ok 26 610 | run int "0x001a" == Ok 26 611 | run int "0xBEEF" == Ok 48879 612 | run int "0x12.0" == Err ... 613 | run int "0x12an" == Err ... 614 | 615 | **Note:** If you want a parser for both `Int` and `Float` literals, 616 | check out [`Parser.LanguageKit.number`](Parser-LanguageKit#number). 617 | It does not backtrack, so it should be faster and give better error 618 | messages than using `oneOf` and combining `int` and `float` yourself. 619 | 620 | **Note:** If you want to enable octal or binary `Int` literals, 621 | check out [`Parser.LanguageKit.int`](Parser-LanguageKit#int).
622 | -} 623 | int : Parser Int 624 | int = 625 | Parser <| \{ source, offset, indent, context, row, col } -> 626 | case intHelp offset (Prim.isSubChar isZero offset source) source of 627 | Err badOffset -> 628 | Bad BadInt 629 | { source = source 630 | , offset = badOffset 631 | , indent = indent 632 | , context = context 633 | , row = row 634 | , col = col + (badOffset - offset) 635 | } 636 | 637 | Ok goodOffset -> 638 | case String.toInt (String.slice offset goodOffset source) of 639 | Err _ -> 640 | Debug.crash badIntMsg 641 | 642 | Ok n -> 643 | Good n 644 | { source = source 645 | , offset = goodOffset 646 | , indent = indent 647 | , context = context 648 | , row = row 649 | , col = col + (goodOffset - offset) 650 | } 651 | 652 | 653 | intHelp : Int -> Int -> String -> Result Int Int 654 | intHelp offset zeroOffset source = 655 | if zeroOffset == -1 then 656 | Internal.chompDigits Char.isDigit offset source 657 | 658 | else if Prim.isSubChar isX zeroOffset source /= -1 then 659 | Internal.chompDigits Char.isHexDigit (offset + 2) source 660 | 661 | -- else if Prim.isSubChar isO zeroOffset source /= -1 then 662 | -- Internal.chompDigits Char.isOctDigit (offset + 2) source 663 | 664 | else if Prim.isSubChar Internal.isBadIntEnd zeroOffset source == -1 then 665 | Ok zeroOffset 666 | 667 | else 668 | Err zeroOffset 669 | 670 | 671 | isZero : Char -> Bool 672 | isZero char = 673 | char == '0' 674 | 675 | 676 | isO : Char -> Bool 677 | isO char = 678 | char == 'o' 679 | 680 | 681 | isX : Char -> Bool 682 | isX char = 683 | char == 'x' 684 | 685 | 686 | badIntMsg : String 687 | badIntMsg = 688 | """The `Parser.int` parser seems to have a bug. 689 | Please report an SSCCE to .""" 690 | 691 | 692 | 693 | -- FLOAT 694 | 695 | 696 | {-| Parse floats. 
697 | 698 | run float "123" == Ok 123 699 | run float "3.1415" == Ok 3.1415 700 | run float "0.1234" == Ok 0.1234 701 | run float ".1234" == Ok 0.1234 702 | run float "1e-42" == Ok 1e-42 703 | run float "6.022e23" == Ok 6.022e23 704 | run float "6.022E23" == Ok 6.022e23 705 | run float "6.022e+23" == Ok 6.022e23 706 | run float "6.022e" == Err .. 707 | run float "6.022n" == Err .. 708 | run float "6.022.31" == Err .. 709 | 710 | **Note:** If you want a parser for both `Int` and `Float` literals, 711 | check out [`Parser.LanguageKit.number`](Parser-LanguageKit#number). 712 | It does not backtrack, so it should be faster and give better error 713 | messages than using `oneOf` and combining `int` and `float` yourself. 714 | 715 | **Note:** If you want to disable literals like `.123` like Elm, 716 | check out [`Parser.LanguageKit.float`](Parser-LanguageKit#float). 717 | -} 718 | float : Parser Float 719 | float = 720 | Parser <| \{ source, offset, indent, context, row, col } -> 721 | case floatHelp offset (Prim.isSubChar isZero offset source) source of 722 | Err badOffset -> 723 | Bad BadFloat 724 | { source = source 725 | , offset = badOffset 726 | , indent = indent 727 | , context = context 728 | , row = row 729 | , col = col + (badOffset - offset) 730 | } 731 | 732 | Ok goodOffset -> 733 | case String.toFloat (String.slice offset goodOffset source) of 734 | Err _ -> 735 | Debug.crash badFloatMsg 736 | 737 | Ok n -> 738 | Good n 739 | { source = source 740 | , offset = goodOffset 741 | , indent = indent 742 | , context = context 743 | , row = row 744 | , col = col + (goodOffset - offset) 745 | } 746 | 747 | 748 | floatHelp : Int -> Int -> String -> Result Int Int 749 | floatHelp offset zeroOffset source = 750 | if zeroOffset >= 0 then 751 | Internal.chompDotAndExp zeroOffset source 752 | 753 | else 754 | let 755 | dotOffset = 756 | Internal.chomp Char.isDigit offset source 757 | 758 | result = 759 | Internal.chompDotAndExp dotOffset source 760 | in 761 | case result 
of 762 | Err _ -> 763 | result 764 | 765 | Ok n -> 766 | if n == offset then Err n else result 767 | 768 | 769 | badFloatMsg : String 770 | badFloatMsg = 771 | """The `Parser.float` parser seems to have a bug. 772 | Please report an SSCCE to .""" 773 | 774 | 775 | 776 | -- END 777 | 778 | 779 | {-| Check if you have reached the end of the string you are parsing. 780 | 781 | justAnInt : Parser Int 782 | justAnInt = 783 | succeed identity 784 | |= int 785 | |. end 786 | 787 | -- run justAnInt "90210" == Ok 90210 788 | -- run justAnInt "1 + 2" == Err ... 789 | -- run int "1 + 2" == Ok 1 790 | 791 | Parsers can succeed without parsing the whole string. Ending your parser 792 | with `end` guarantees that you have successfully parsed the whole string. 793 | -} 794 | end : Parser () 795 | end = 796 | Parser <| \state -> 797 | if String.length state.source == state.offset then 798 | Good () state 799 | 800 | else 801 | Bad ExpectingEnd state 802 | 803 | 804 | 805 | -- SOURCE 806 | 807 | 808 | {-| Run a parser, but return the underlying source code that actually 809 | got parsed. 810 | 811 | -- run (source (ignore oneOrMore Char.isLower)) "abc" == Ok "abc" 812 | -- keep count isOk = source (ignore count isOk) 813 | 814 | This becomes a useful optimization when you need to [`keep`](#keep) 815 | something very specific. For example, say we want to parse capitalized 816 | words: 817 | 818 | import Char 819 | 820 | variable : Parser String 821 | variable = 822 | succeed (++) 823 | |= keep (Exactly 1) Char.isUpper 824 | |= keep zeroOrMore Char.isLower 825 | 826 | In this case, each `keep` allocates a string. Then we use `(++)` to create the 827 | final string. That means *three* strings are allocated. 828 | 829 | In contrast, using `source` with `ignore` lets you grab the final string 830 | directly. It tracks where the parser starts and ends, so it can use 831 | `String.slice` to grab that part directly. 
832 | 833 | variable : Parser String 834 | variable = 835 | source <| 836 | ignore (Exactly 1) Char.isUpper 837 | |. ignore zeroOrMore Char.isLower 838 | 839 | This version only allocates *one* string. 840 | -} 841 | source : Parser a -> Parser String 842 | source parser = 843 | sourceMap always parser 844 | 845 | 846 | {-| Like `source`, but it allows you to combine the source string 847 | with the value that is produced by the parser. So maybe you want 848 | a float, but you also want to know exactly how it looked. 849 | 850 | number : Parser (String, Float) 851 | number = 852 | sourceMap (,) float 853 | 854 | -- run number "100" == Ok ("100", 100) 855 | -- run number "1e2" == Ok ("1e2", 100) 856 | -} 857 | sourceMap : (String -> a -> b) -> Parser a -> Parser b 858 | sourceMap func (Parser parse) = 859 | Parser <| \({source, offset} as state1) -> 860 | case parse state1 of 861 | Bad x state2 -> 862 | Bad x state2 863 | 864 | Good a state2 -> 865 | let 866 | subString = 867 | String.slice offset state2.offset source 868 | in 869 | Good (func subString a) state2 870 | 871 | 872 | 873 | -- REPEAT 874 | 875 | 876 | {-| How many characters to [`keep`](#keep) or [`ignore`](#ignore). 877 | -} 878 | type Count = AtLeast Int | Exactly Int 879 | 880 | 881 | {-| A simple alias for `AtLeast 0` so your code reads nicer: 882 | 883 | import Char 884 | 885 | spaces : Parser String 886 | spaces = 887 | keep zeroOrMore (\c -> c == ' ') 888 | 889 | -- same as: keep (AtLeast 0) (\c -> c == ' ') 890 | -} 891 | zeroOrMore : Count 892 | zeroOrMore = 893 | AtLeast 0 894 | 895 | 896 | {-| A simple alias for `AtLeast 1` so your code reads nicer: 897 | 898 | import Char 899 | 900 | lows : Parser String 901 | lows = 902 | keep oneOrMore Char.isLower 903 | 904 | -- same as: keep (AtLeast 1) Char.isLower 905 | -} 906 | oneOrMore : Count 907 | oneOrMore = 908 | AtLeast 1 909 | 910 | 911 | {-| Keep some characters. 
If you want a capital letter followed by 912 | zero or more lower case letters, you could say: 913 | 914 | import Char 915 | 916 | capitalized : Parser String 917 | capitalized = 918 | succeed (++) 919 | |= keep (Exactly 1) Char.isUpper 920 | |= keep zeroOrMore Char.isLower 921 | 922 | -- good: Cat, Tom, Sally 923 | -- bad: cat, tom, TOM, tOm 924 | 925 | **Note:** Check out [`source`](#source) for a more efficient 926 | way to grab the underlying source of a complex parser. 927 | -} 928 | keep : Count -> (Char -> Bool) -> Parser String 929 | keep count predicate = 930 | source (ignore count predicate) 931 | 932 | 933 | {-| Ignore some characters. If you want to ignore one or more 934 | spaces, you might say: 935 | 936 | spaces : Parser () 937 | spaces = 938 | ignore oneOrMore (\c -> c == ' ') 939 | 940 | -} 941 | ignore : Count -> (Char -> Bool) -> Parser () 942 | ignore count predicate = 943 | case count of 944 | Exactly n -> 945 | Parser <| \{ source, offset, indent, context, row, col } -> 946 | ignoreExactly n predicate source offset indent context row col 947 | 948 | AtLeast n -> 949 | Parser <| \{ source, offset, indent, context, row, col } -> 950 | ignoreAtLeast n predicate source offset indent context row col 951 | 952 | 953 | ignoreExactly : Int -> (Char -> Bool) -> String -> Int -> Int -> List Context -> Int -> Int -> Step () 954 | ignoreExactly n predicate source offset indent context row col = 955 | if n <= 0 then 956 | Good () 957 | { source = source 958 | , offset = offset 959 | , indent = indent 960 | , context = context 961 | , row = row 962 | , col = col 963 | } 964 | 965 | else 966 | let 967 | newOffset = 968 | Prim.isSubChar predicate offset source 969 | in 970 | if newOffset == -1 then 971 | Bad BadRepeat 972 | { source = source 973 | , offset = offset 974 | , indent = indent 975 | , context = context 976 | , row = row 977 | , col = col 978 | } 979 | 980 | else if newOffset == -2 then 981 | ignoreExactly (n - 1) predicate source (offset + 1) 
indent context (row + 1) 1 982 | 983 | else 984 | ignoreExactly (n - 1) predicate source newOffset indent context row (col + 1) 985 | 986 | 987 | ignoreAtLeast : Int -> (Char -> Bool) -> String -> Int -> Int -> List Context -> Int -> Int -> Step () 988 | ignoreAtLeast n predicate source offset indent context row col = 989 | let 990 | newOffset = 991 | Prim.isSubChar predicate offset source 992 | in 993 | -- no match 994 | if newOffset == -1 then 995 | let 996 | state = 997 | { source = source 998 | , offset = offset 999 | , indent = indent 1000 | , context = context 1001 | , row = row 1002 | , col = col 1003 | } 1004 | in 1005 | if n <= 0 then Good () state else Bad BadRepeat state 1006 | 1007 | -- matched a newline 1008 | else if newOffset == -2 then 1009 | ignoreAtLeast (n - 1) predicate source (offset + 1) indent context (row + 1) 1 1010 | 1011 | -- normal match 1012 | else 1013 | ignoreAtLeast (n - 1) predicate source newOffset indent context row (col + 1) 1014 | 1015 | 1016 | 1017 | -- IGNORE UNTIL 1018 | 1019 | 1020 | {-| Ignore characters until *after* the given string. 1021 | So maybe we want to parse Elm-style single-line comments: 1022 | 1023 | elmComment : Parser () 1024 | elmComment = 1025 | symbol "--" 1026 | |. ignoreUntil "\n" 1027 | 1028 | Or maybe you want to parse JS-style multi-line comments: 1029 | 1030 | jsComment : Parser () 1031 | jsComment = 1032 | symbol "/*" 1033 | |. ignoreUntil "*/" 1034 | 1035 | **Note:** You must take more care when parsing Elm-style multi-line 1036 | comments. Elm can recognize nested comments, but the `jsComment` parser 1037 | cannot. See [`Parser.LanguageKit.whitespace`](Parser-LanguageKit#whitespace) 1038 | for help with this. 
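If you want the text of the comment rather than just skipping it, one option is to wrap the parser in [`source`](#source). This is a minimal sketch; `elmCommentText` is a hypothetical name, not something this library provides.

```elm
import Parser exposing ((|.), Parser, ignoreUntil, source, symbol)


-- Hypothetical variation on `elmComment` that keeps the comment text.
-- The resulting string includes the leading "--" and the trailing newline,
-- because `ignoreUntil` stops *after* the string it is looking for.
elmCommentText : Parser String
elmCommentText =
    source <|
        symbol "--"
            |. ignoreUntil "\n"
```

Keep in mind that `ignoreUntil` fails with `ExpectingClosing` when the string never appears, so a comment on the last line of a file with no trailing newline would not be accepted by this sketch.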
1039 | -} 1040 | ignoreUntil : String -> Parser () 1041 | ignoreUntil str = 1042 | Parser <| \({ source, offset, indent, context, row, col } as state) -> 1043 | let 1044 | (newOffset, newRow, newCol) = 1045 | Prim.findSubString False str offset row col source 1046 | in 1047 | if newOffset == -1 then 1048 | Bad (ExpectingClosing str) state 1049 | 1050 | else 1051 | Good () 1052 | { source = source 1053 | , offset = newOffset 1054 | , indent = indent 1055 | , context = context 1056 | , row = newRow 1057 | , col = newCol 1058 | } 1059 | 1060 | 1061 | 1062 | -- CONTEXT 1063 | 1064 | 1065 | {-| Specify what you are parsing right now. So if you have a parser 1066 | for lists like `[ 1, 2, 3 ]` you could say: 1067 | 1068 | list : Parser (List Int) 1069 | list = 1070 | inContext "list" <| 1071 | succeed identity 1072 | |. symbol "[" 1073 | |. spaces 1074 | |= commaSep int 1075 | |. spaces 1076 | |. symbol "]" 1077 | 1078 | -- spaces : Parser () 1079 | -- commaSep : Parser a -> Parser (List a) 1080 | 1081 | Now you get that extra context information if there is a parse error anywhere 1082 | in the list. For example, if you have `[ 1, 23zm5, 3 ]` you could generate an 1083 | error message like this: 1084 | 1085 | I ran into a problem while parsing this list: 1086 | 1087 | [ 1, 23zm5, 3 ] 1088 | ^ 1089 | Looking for a valid integer, like 6 or 90210. 1090 | 1091 | Notice that the error message knows you are parsing a list right now! 
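A message like that can be assembled from the [`Error`](#Error) record. The helper below is a rough sketch under the assumption that the innermost context is the one worth showing; `describeError` is a hypothetical name and is not part of this library.

```elm
import Parser exposing (Error)


-- Hypothetical helper: describe an error using the innermost context,
-- which is the head of the `context` stack.
describeError : Error -> String
describeError err =
    let
        position =
            "row " ++ toString err.row ++ ", col " ++ toString err.col
    in
    case err.context of
        [] ->
            -- No `inContext` was active when the parser failed.
            "I ran into a problem at " ++ position ++ "."

        closest :: _ ->
            "I ran into a problem while parsing this "
                ++ closest.description
                ++ " (which began at row "
                ++ toString closest.row
                ++ ", col "
                ++ toString closest.col
                ++ "). The problem is at "
                ++ position
                ++ "."
```

A fuller version would also pattern match on `err.problem` to produce the "Looking for a valid integer" hint, and use `err.source` to print the offending line.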
1092 | -} 1093 | inContext : String -> Parser a -> Parser a 1094 | inContext ctx (Parser parse) = 1095 | Parser <| \({ context, row, col } as initialState) -> 1096 | let 1097 | state1 = 1098 | changeContext (Context row col ctx :: context) initialState 1099 | in 1100 | case parse state1 of 1101 | Good a state2 -> 1102 | Good a (changeContext context state2) 1103 | 1104 | Bad _ _ as step -> 1105 | step 1106 | 1107 | 1108 | changeContext : List Context -> State -> State 1109 | changeContext newContext { source, offset, indent, row, col } = 1110 | { source = source 1111 | , offset = offset 1112 | , indent = indent 1113 | , context = newContext 1114 | , row = row 1115 | , col = col 1116 | } 1117 | --------------------------------------------------------------------------------