├── .gitignore
├── LICENSE
├── README.md
├── comparison.md
├── elm.json
├── examples
    ├── DoubleQuoteString.elm
    ├── Math.elm
    ├── README.md
    └── elm.json
├── semantics.md
└── src
    ├── Elm
        └── Kernel
        │   └── Parser.js
    ├── Parser.elm
    └── Parser
        └── Advanced.elm


/.gitignore:
--------------------------------------------------------------------------------
1 | elm-stuff


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | Copyright (c) 2017-present, Evan Czaplicki
 2 | All rights reserved.
 3 | 
 4 | Redistribution and use in source and binary forms, with or without
 5 | modification, are permitted provided that the following conditions are met:
 6 | 
 7 | * Redistributions of source code must retain the above copyright notice, this
 8 |   list of conditions and the following disclaimer.
 9 | 
10 | * Redistributions in binary form must reproduce the above copyright notice,
11 |   this list of conditions and the following disclaimer in the documentation
12 |   and/or other materials provided with the distribution.
13 | 
14 | * Neither the name of the {organization} nor the names of its
15 |   contributors may be used to endorse or promote products derived from
16 |   this software without specific prior written permission.
17 | 
18 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
19 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
21 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
22 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
23 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
24 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
25 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
26 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
27 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
28 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Parser
 2 | 
 3 | Regular expressions are quite confusing and difficult to use. This library provides a coherent alternative that handles more cases and produces clearer code.
 4 | 
 5 | The particular goals of this library are:
 6 | 
 7 |   - Make writing parsers as simple and fun as possible.
 8 |   - Produce excellent error messages.
 9 |   - Go pretty fast.
10 | 
11 | This is achieved with a couple of concepts that I have not seen in any other parser libraries: [parser pipelines](#parser-pipelines), [backtracking](#backtracking), and [tracking context](#tracking-context).
12 | 
13 | 
14 | ## Parser Pipelines
15 | 
16 | To parse a 2D point like `( 3, 4 )`, you might create a `point` parser like this:
17 | 
18 | ```elm
19 | import Parser exposing (Parser, (|.), (|=), succeed, symbol, float, spaces)
20 | 
21 | type alias Point =
22 |   { x : Float
23 |   , y : Float
24 |   }
25 | 
26 | point : Parser Point
27 | point =
28 |   succeed Point
29 |     |. symbol "("
30 |     |. spaces
31 |     |= float
32 |     |. spaces
33 |     |. symbol ","
34 |     |. spaces
35 |     |= float
36 |     |. spaces
37 |     |. symbol ")"
38 | ```
39 | 
40 | All the interesting stuff is happening in `point`. It uses two operators:
41 | 
42 |   - [`(|.)`][ignore] means “parse this, but **ignore** the result”
43 |   - [`(|=)`][keep] means “parse this, and **keep** the result”
44 | 
45 | So the `Point` function only gets the result of the two `float` parsers.
46 | 
47 | [ignore]: https://package.elm-lang.org/packages/elm/parser/latest/Parser#|.
48 | [keep]: https://package.elm-lang.org/packages/elm/parser/latest/Parser#|=
49 | 
50 | The theory is that `|=` introduces more “visual noise” than `|.`, making it pretty easy to pick out which lines in the pipeline are important.
51 | 
52 | I recommend having one line per operator in your parser pipeline. If you need multiple lines for some reason, use a `let` or make a helper function.
53 | 
54 | 
55 | 
56 | ## Backtracking
57 | 
58 | To make fast parsers with precise error messages, all of the parsers in this package do not backtrack by default. Once you start going down a path, you keep going down it.
59 | 
60 | This is nice in a string like `[ 1, 23zm5, 3 ]` where you want the error at the `z`. If we had backtracking by default, you might get the error on `[` instead. That is way less specific and harder to fix!
61 | 
62 | So the defaults are nice, but sometimes the easiest way to write a parser is to look ahead a bit and see what is going to happen. It is definitely more costly to do this, but it can be handy if there is no other way. This is the role of [`backtrackable`](https://package.elm-lang.org/packages/elm/parser/latest/Parser#backtrackable) parsers. Check out the [semantics](https://github.com/elm/parser/blob/master/semantics.md) page for more details!
63 | 
64 | 
65 | ## Tracking Context
66 | 
67 | Most parsers tell you the row and column of the problem:
68 | 
69 |     Something went wrong at (4:17)
70 | 
71 | That may be true, but it is not how humans think. It is how text editors think! It would be better to say:
72 | 
73 |     I found a problem with this list:
74 | 
75 |         [ 1, 23zm5, 3 ]
76 |              ^
77 |     I wanted an integer, like 6 or 90219.
78 | 
79 | Notice that the error message says `this list`. That is context! That is the language my brain speaks, not rows and columns.
80 | 
81 | Once you get comfortable with the `Parser` module, you can switch over to `Parser.Advanced` and use [`inContext`](https://package.elm-lang.org/packages/elm/parser/latest/Parser-Advanced#inContext) to track exactly what your parser thinks it is doing at the moment. You can let the parser know “I am trying to parse a `"list"` right now” so if an error happens anywhere in that context, you get the handy annotation!
82 | 
83 | This technique is used by the parser in the Elm compiler to give more helpful error messages.
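
As a rough illustration of the idea, here is a minimal sketch of a list-of-integers parser wrapped in `inContext`. The `Context` and `Problem` types and all of their constructor names are hypothetical, made up for this example; only the `Parser.Advanced` functions are from the library.

```elm
module ListContext exposing (Context(..), Problem(..), intList)

import Parser.Advanced as A


-- Hypothetical context type: one constructor per "thing being parsed".
type Context
  = ListOfInts


-- Hypothetical problem type: Parser.Advanced lets you pick your own.
type Problem
  = ExpectingListStart
  | ExpectingListEnd
  | ExpectingComma
  | ExpectingInt
  | InvalidInt


-- Everything inside `inContext ListOfInts` is tagged with that context.
intList : A.Parser Context Problem (List Int)
intList =
  A.inContext ListOfInts <|
    A.sequence
      { start = A.Token "[" ExpectingListStart
      , separator = A.Token "," ExpectingComma
      , end = A.Token "]" ExpectingListEnd
      , spaces = A.spaces
      , item = A.int ExpectingInt InvalidInt
      , trailing = A.Forbidden
      }
```

If `A.run intList` fails somewhere inside the list, each resulting dead end should carry a `contextStack` entry for `ListOfInts` along with the row and column where the list started, which is the information needed to render a message like the one above.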
84 | 
85 | 
86 | ## [Comparison with Prior Work](https://github.com/elm/parser/blob/master/comparison.md)
87 | 


--------------------------------------------------------------------------------
/comparison.md:
--------------------------------------------------------------------------------
 1 | ## Comparison with Prior Work
 2 | 
 3 | I have not seen the [parser pipeline][1] or the [context stack][2] ideas in other libraries, but [backtracking][3] relates to prior work.
 4 | 
 5 | [1]: README.md#parser-pipelines
 6 | [2]: README.md#tracking-context
 7 | [3]: README.md#backtracking
 8 | 
 9 | Most parser combinator libraries I have seen are based on Haskell’s Parsec library, which has primitives named `try` and `lookAhead`. I believe [`backtrackable`][backtrackable] is a better primitive for two reasons.
10 | 
11 | [backtrackable]: https://package.elm-lang.org/packages/elm/parser/latest/Parser#backtrackable
12 | 
13 | 
14 | ### Performance and Composition
15 | 
16 | Say we want to create a precise error message for `length [1,,3]`. The naive approach with Haskell’s Parsec library produces very bad error messages:
17 | 
18 | ```haskell
19 | spaceThenArg :: Parser Expr
20 | spaceThenArg =
21 |   try (spaces >> term)
22 | ```
23 | 
24 | This means we get a precise error from `term`, but then throw it away and say something went wrong at the space before the `[`. Very confusing! To improve quality, we must write something like this:
25 | 
26 | ```haskell
27 | spaceThenArg :: Parser Expr
28 | spaceThenArg =
29 |   choice
30 |     [ do  lookAhead (spaces >> char '[')
31 |           spaces
32 |           term
33 |     , try (spaces >> term)
34 |     ]
35 | ```
36 | 
37 | Notice that we parse `spaces` twice no matter what.
38 | 
39 | Notice that we also had to hardcode `[` in the `lookAhead`. What if we update `term` to parse records that start with `{` as well? To get good commits on records, we must remember to update `lookAhead` to look for `oneOf "[{"`. Implementation details are leaking out of `term`!
40 | 
41 | With `backtrackable` in this Elm library, you can just say:
42 | 
43 | ```elm
44 | spaceThenArg : Parser Expr
45 | spaceThenArg =
46 |   succeed identity
47 |     |. backtrackable spaces
48 |     |= term
49 | ```
50 | 
51 | It does less work, and is more reliable as `term` evolves. I believe the presence of `backtrackable` means that `lookAhead` is no longer needed.
52 | 
53 | 
54 | ### Expressiveness
55 | 
56 | You can define `try` in terms of [`backtrackable`][backtrackable] like this:
57 | 
58 | ```elm
59 | try : Parser a -> Parser a
60 | try parser =
61 |   succeed identity
62 |     |= backtrackable parser
63 |     |. commit ()
64 | ```
65 | 
66 | No expressiveness is lost!
67 | 
68 | So while it is possible to define `try`, I left it out of the public API. In practice, `try` often leads to “bad commits” where your parser fails in a very specific way, but you then backtrack to a less specific error message. I considered naming it `allOrNothing` to better explain how it changes commit behavior, but ultimately, I thought it was best to encourage users to express their parsers with `backtrackable` directly.
69 | 
70 | 
71 | ### Summary
72 | 
73 | Compared to previous work, `backtrackable` lets you produce precise error messages **more efficiently**. By thinking about “backtracking behavior” directly, you also end up with **cleaner composition** of parsers. And these benefits come **without any loss of expressiveness**.
74 | 


--------------------------------------------------------------------------------
/elm.json:
--------------------------------------------------------------------------------
 1 | {
 2 |     "type": "package",
 3 |     "name": "elm/parser",
 4 |     "summary": "a parsing library, focused on simplicity and great error messages",
 5 |     "license": "BSD-3-Clause",
 6 |     "version": "1.1.0",
 7 |     "exposed-modules": [
 8 |         "Parser",
 9 |         "Parser.Advanced"
10 |     ],
11 |     "elm-version": "0.19.0 <= v < 0.20.0",
12 |     "dependencies": {
13 |         "elm/core": "1.0.0 <= v < 2.0.0"
14 |     },
15 |     "test-dependencies": {}
16 | }


--------------------------------------------------------------------------------
/examples/DoubleQuoteString.elm:
--------------------------------------------------------------------------------
 1 | import Browser
 2 | import Char
 3 | import Html
 4 | import Parser exposing (..)
 5 | 
 6 | 
 7 | 
 8 | -- MAIN
 9 | 
10 | 
11 | main =
12 |   Html.text <| Debug.toString <|
13 |     run string "\"hello\""
14 | 
15 | 
16 | 
17 | -- STRINGS
18 | 
19 | 
20 | string : Parser String
21 | string =
22 |   succeed identity
23 |     |. token "\""
24 |     |= loop [] stringHelp
25 | 
26 | 
27 | stringHelp : List String -> Parser (Step (List String) String)
28 | stringHelp revChunks =
29 |   oneOf
30 |     [ succeed (\chunk -> Loop (chunk :: revChunks))
31 |         |. token "\\"
32 |         |= oneOf
33 |             [ map (\_ -> "\n") (token "n")
34 |             , map (\_ -> "\t") (token "t")
35 |             , map (\_ -> "\r") (token "r")
36 |             , succeed String.fromChar
37 |                 |. token "u{"
38 |                 |= unicode
39 |                 |. token "}"
40 |             ]
41 |     , token "\""
42 |         |> map (\_ -> Done (String.join "" (List.reverse revChunks)))
43 |     , chompWhile isUninteresting
44 |         |> getChompedString
45 |         |> map (\chunk -> Loop (chunk :: revChunks))
46 |     ]
47 | 
48 | 
49 | isUninteresting : Char -> Bool
50 | isUninteresting char =
51 |   char /= '\\' && char /= '"'
52 | 
53 | 
54 | 
55 | -- UNICODE
56 | 
57 | 
58 | unicode : Parser Char
59 | unicode =
60 |   getChompedString (chompWhile Char.isHexDigit)
61 |     |> andThen codeToChar
62 | 
63 | 
64 | codeToChar : String -> Parser Char
65 | codeToChar str =
66 |   let
67 |     length = String.length str
68 |     code = String.foldl addHex 0 str
69 |   in
70 |   if length < 4 || 6 < length then
71 |     problem "code point must have between 4 and 6 digits"
72 |   else if 0 <= code && code <= 0x10FFFF then
73 |     succeed (Char.fromCode code)
74 |   else
75 |     problem "code point must be between 0 and 0x10FFFF"
76 | 
77 | 
78 | addHex : Char -> Int -> Int
79 | addHex char total =
80 |   let
81 |     code = Char.toCode char
82 |   in
83 |   if 0x30 <= code && code <= 0x39 then
84 |     16 * total + (code - 0x30)
85 |   else if 0x41 <= code && code <= 0x46 then
86 |     16 * total + (10 + code - 0x41)
87 |   else
88 |     16 * total + (10 + code - 0x61)
89 | 


--------------------------------------------------------------------------------
/examples/Math.elm:
--------------------------------------------------------------------------------
  1 | module Math exposing
  2 |   ( Expr
  3 |   , evaluate
  4 |   , parse
  5 |   )
  6 | 
  7 | 
  8 | import Html exposing (div, p, text)
  9 | import Parser exposing (..)
 10 | 
 11 | 
 12 | 
 13 | -- MAIN
 14 | 
 15 | 
 16 | main =
 17 |   case parse "2 * (3 + 4)" of
 18 |     Err err ->
 19 |       text (Debug.toString err)
 20 | 
 21 |     Ok expr ->
 22 |       div []
 23 |         [ p [] [ text (Debug.toString expr) ]
 24 |         , p [] [ text (String.fromFloat (evaluate expr)) ]
 25 |         ]
 26 | 
 27 | 
 28 | 
 29 | -- EXPRESSIONS
 30 | 
 31 | 
 32 | type Expr
 33 |   = Integer Int
 34 |   | Floating Float
 35 |   | Add Expr Expr
 36 |   | Mul Expr Expr
 37 | 
 38 | 
 39 | evaluate : Expr -> Float
 40 | evaluate expr =
 41 |   case expr of
 42 |     Integer n ->
 43 |       toFloat n
 44 | 
 45 |     Floating n ->
 46 |       n
 47 | 
 48 |     Add a b ->
 49 |       evaluate a + evaluate b
 50 | 
 51 |     Mul a b ->
 52 |       evaluate a * evaluate b
 53 | 
 54 | 
 55 | parse : String -> Result (List DeadEnd) Expr
 56 | parse string =
 57 |   run expression string
 58 | 
 59 | 
 60 | 
 61 | -- PARSER
 62 | 
 63 | 
 64 | {-| We want to handle integers, hexadecimal numbers, and floats. Octal numbers
 65 | like `0o17` and binary numbers like `0b01101100` are not allowed.
 66 | -}
 67 | digits : Parser Expr
 68 | digits =
 69 |   number
 70 |     { int = Just Integer
 71 |     , hex = Just Integer
 72 |     , octal = Nothing
 73 |     , binary = Nothing
 74 |     , float = Just Floating
 75 |     }
 76 | 
 77 | 
 78 | {-| A term is a standalone chunk of math, like `4` or `(3 + 4)`. We use it as
 79 | a building block in larger expressions.
 80 | -}
 81 | term : Parser Expr
 82 | term =
 83 |   oneOf
 84 |     [ digits
 85 |     , succeed identity
 86 |         |. symbol "("
 87 |         |. spaces
 88 |         |= lazy (\_ -> expression)
 89 |         |. spaces
 90 |         |. symbol ")"
 91 |     ]
 92 | 
 93 | 
 94 | {-| Every expression starts with a term. After that, it may be done, or there
 95 | may be a `+` or `*` sign and more math.
 96 | -}
 97 | expression : Parser Expr
 98 | expression =
 99 |   term
100 |     |> andThen (expressionHelp [])
101 | 
102 | 
103 | {-| Once you have parsed a term, you can start looking for `+` and `*` operators.
104 | I am tracking everything as a list, that way I can be sure to follow the order
105 | of operations (PEMDAS) when building the final expression.
106 | 
107 | In one case, I need an operator and another term. If that happens I keep
108 | looking for more. In the other case, I am done parsing, and I finalize the
109 | expression.
110 | -}
111 | expressionHelp : List (Expr, Operator) -> Expr -> Parser Expr
112 | expressionHelp revOps expr =
113 |   oneOf
114 |     [ succeed Tuple.pair
115 |         |. spaces
116 |         |= operator
117 |         |. spaces
118 |         |= term
119 |         |> andThen (\(op, newExpr) -> expressionHelp ((expr,op) :: revOps) newExpr)
120 |     , lazy (\_ -> succeed (finalize revOps expr))
121 |     ]
122 | 
123 | 
124 | type Operator = AddOp | MulOp
125 | 
126 | 
127 | operator : Parser Operator
128 | operator =
129 |   oneOf
130 |     [ map (\_ -> AddOp) (symbol "+")
131 |     , map (\_ -> MulOp) (symbol "*")
132 |     ]
133 | 
134 | 
135 | {-| We only have `+` and `*` in this parser. If we see a `MulOp` we can
136 | immediately group those two expressions. If we see an `AddOp` we wait to group
137 | until all the multiplies have been taken care of.
138 | 
139 | This code is kind of tricky, but it is a baseline for what you would need if
140 | you wanted to add `/`, `-`, `==`, `&&`, etc. which bring in more complex
141 | associativity and precedence rules.
142 | -}
143 | finalize : List (Expr, Operator) -> Expr -> Expr
144 | finalize revOps finalExpr =
145 |   case revOps of
146 |     [] ->
147 |       finalExpr
148 | 
149 |     (expr, MulOp) :: otherRevOps ->
150 |       finalize otherRevOps (Mul expr finalExpr)
151 | 
152 |     (expr, AddOp) :: otherRevOps ->
153 |       Add (finalize otherRevOps expr) finalExpr
154 | 


--------------------------------------------------------------------------------
/examples/README.md:
--------------------------------------------------------------------------------
 1 | # Run the Examples
 2 | 
 3 | To try these examples out locally, you can run the following terminal commands:
 4 | 
 5 | ```bash
 6 | git clone https://github.com/elm/parser.git
 7 | cd parser/examples
 8 | elm reactor
 9 | ```
10 | 
11 | After that, go to [`http://localhost:8000`](http://localhost:8000) and click on
12 | the example you want to see.
13 | 
14 | 
15 | ## Exercises
16 | 
17 | - Have a user input feed into the `Math` parser. Show people the results live.
18 | - Expand the `Math` parser to cover `-` and `/` as well.
19 | - Handle more escape characters in `DoubleQuoteString`. Maybe hexadecimal
20 | escapes like `\x41` and `\x0A` that are possible in JavaScript.


--------------------------------------------------------------------------------
/examples/elm.json:
--------------------------------------------------------------------------------
 1 | {
 2 |     "type": "application",
 3 |     "source-directories": [
 4 |         "."
 5 |     ],
 6 |     "elm-version": "0.19.0",
 7 |     "dependencies": {
 8 |         "direct": {
 9 |             "elm/browser": "1.0.0",
10 |             "elm/core": "1.0.0",
11 |             "elm/html": "1.0.0",
12 |             "elm/parser": "1.1.0"
13 |         },
14 |         "indirect": {
15 |             "elm/json": "1.0.0",
16 |             "elm/time": "1.0.0",
17 |             "elm/url": "1.0.0",
18 |             "elm/virtual-dom": "1.0.0"
19 |         }
20 |     },
21 |     "test-dependencies": {
22 |         "direct": {},
23 |         "indirect": {}
24 |     }
25 | }


--------------------------------------------------------------------------------
/semantics.md:
--------------------------------------------------------------------------------
  1 | # Semantics
  2 | 
  3 | The goal of this document is to explain how different parsers fit together. When will it backtrack? When will it not?
  4 | 
  5 | <br>
  6 | 
  7 | ### `keyword : String -> Parser ()`
  8 | 
  9 | Say we have `keyword "import"`:
 10 | 
 11 | | String        | Result     |
 12 | |---------------|------------|
 13 | | `"import"`    | `OK{false}` |
 14 | | `"imp"`       | `ERR{true}` |
 15 | | `"export"`    | `ERR{true}` |
 16 | 
 17 | In our `OK{false}` notation, we are indicating:
 18 | 
 19 | 1. Did the parser succeed? `OK` if yes. `ERR` if not.
 20 | 2. Is it possible to backtrack? `{true}` if yes. `{false}` if not. So when `keyword` succeeds with `OK{false}`, backtracking is not allowed anymore. You must continue along that path.
 21 | 
 22 | <br>
 23 | 
 24 | 
 25 | ### `map : (a -> b) -> Parser a -> Parser b`
 26 | 
 27 | Say we have `map func parser`:
 28 | 
 29 | | `parser` | Result   |
 30 | |----------|----------|
 31 | | `OK{b}`  | `OK{b}`  |
 32 | | `ERR{b}` | `ERR{b}` |
 33 | 
 34 | So the result of `map func parser` is always the same as the result of the `parser` itself.
 35 | 
 36 | <br>
 37 | 
 38 | 
 39 | ### `map2 : (a -> b -> c) -> Parser a -> Parser b -> Parser c`
 40 | 
 41 | Say we have `map2 func parserA parserB`:
 42 | 
 43 | | `parserA` | `parserB` | Result         |
 44 | |-----------|-----------|----------------|
 45 | | `OK{b}`   | `OK{b'}`  | `OK{b && b'}`  |
 46 | | `OK{b}`   | `ERR{b'}` | `ERR{b && b'}` |
 47 | | `ERR{b}`  |           | `ERR{b}`       |
 48 | 
 49 | If `parserA` succeeds, we try `parserB`. If they are both backtrackable, the combined result is backtrackable.
 50 | 
 51 | If `parserA` fails, that is our result.
 52 | 
 53 | This is used to define our pipeline operators like this:
 54 | 
 55 | ```elm
 56 | (|.) a b = map2 (\keep ignore -> keep) a b
 57 | (|=) a b = map2 (\func arg -> func arg) a b
 58 | ```
 59 | 
 60 | <br>
 61 | 
 62 | 
 63 | ### `either : Parser a -> Parser a -> Parser a`
 64 | 
 65 | Say we have `either parserA parserB`:
 66 | 
 67 | | `parserA`    | `parserB` | Result       |
 68 | |--------------|-----------|--------------|
 69 | | `OK{b}`      |           | `OK{b}`      |
 70 | | `ERR{true}`  | `OK{b}`   | `OK{b}`      |
 71 | | `ERR{true}`  | `ERR{b}`  | `ERR{b}`     |
 72 | | `ERR{false}` |           | `ERR{false}` |
 73 | 
 74 | The 4th case is very important! **If `parserA` is not backtrackable, you do not even try `parserB`.**
 75 | 
 76 | The `either` function does not appear in the public API, but I used it here because it makes the rules a bit easier to read. In the public API, we have `oneOf` instead. You can think of `oneOf` as trying `either` the head of the list, or `oneOf` the parsers in the tail of the list.
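
To make that reading concrete, here is a small sketch. Neither function is part of the library: `either` is written in terms of the public `oneOf`, and `oneOfSketch` is a hypothetical name for the head-or-tail unfolding described above. The empty-list case is an assumption of this sketch, chosen so that it fails in a backtrackable way.

```elm
module EitherSketch exposing (either, oneOfSketch)

import Parser exposing (Parser, oneOf, problem)


-- Not in the public API: a two-way choice expressed with `oneOf`.
either : Parser a -> Parser a -> Parser a
either parserA parserB =
  oneOf [ parserA, parserB ]


-- Hypothetical unfolding: try the head, otherwise the rest of the list.
oneOfSketch : List (Parser a) -> Parser a
oneOfSketch parsers =
  case parsers of
    [] ->
      -- Assumption: with no options left, fail without committing.
      problem "no options left"

    first :: rest ->
      either first (oneOfSketch rest)
```

Following the table, `oneOfSketch` only moves on to `rest` when `first` fails with `ERR{true}`; an `ERR{false}` from `first` ends the whole choice immediately.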
 77 | 
 78 | <br>
 79 | 
 80 | 
 81 | ### `andThen : (a -> Parser b) -> Parser a -> Parser b`
 82 | 
 83 | Say we have `andThen callback parserA` where `callback a` produces `parserB`:
 84 | 
 85 | | `parserA` | `parserB` | Result         |
 86 | |-----------|-----------|----------------|
 87 | | `ERR{b}`  |           | `ERR{b}`       |
 88 | | `OK{b}`   | `OK{b'}`  | `OK{b && b'}`  |
 89 | | `OK{b}`   | `ERR{b'}` | `ERR{b && b'}` |
 90 | 
 91 | If both parts are backtrackable, the overall result is backtrackable.
 92 | 
 93 | <br>
 94 | 
 95 | 
 96 | ### `backtrackable : Parser a -> Parser a`
 97 | 
 98 | Say we have `backtrackable parser`:
 99 | 
100 | | `parser` | Result      |
101 | |----------|-------------|
102 | | `OK{b}`  | `OK{true}`  |
103 | | `ERR{b}` | `ERR{true}` |
104 | 
105 | No matter how `parser` was defined, it is backtrackable now. This becomes very interesting when paired with `oneOf`. You can have one of the options start with a `backtrackable` segment, so even if you do start down that path, you can still try the next parser if something fails. **This has important yet subtle implications on performance, so definitely read on!**
106 | 
107 | <br>
108 | 
109 | 
110 | ## Examples
111 | 
112 | This parser is intended to give you very precise control over backtracking behavior, and I think that is best explained through examples.
113 | 
114 | <br>
115 | 
116 | ### `backtrackable`
117 | 
118 | Say we have `map2 func (backtrackable spaces) (symbol ",")` which can eat a bunch of spaces followed by a comma. Here is how it would work on different strings:
119 | 
120 | | String  | Result      |
121 | |---------|-------------|
122 | | `"  ,"` | `OK{false}` |
123 | | `"  :"` | `ERR{true}` |
124 | | `"abc"` | `ERR{true}` |
125 | 
126 | Remember how `map2` is backtrackable only if both parsers are backtrackable. So in the first case, the overall result is not backtrackable because `symbol ","` succeeded.
127 | 
128 | This becomes useful when paired with `either`!
129 | 
130 | <br>
131 | 
132 | 
133 | ### `backtrackable` + `oneOf` (inefficient)
134 | 
135 | Say we have the following `parser` definition:
136 | 
137 | ```elm
138 | parser : Parser (Maybe Int)
139 | parser =
140 |   oneOf
141 |     [ succeed Just
142 |         |. backtrackable spaces
143 |         |. symbol ","
144 |         |. spaces
145 |         |= int
146 |     , succeed Nothing
147 |         |. spaces
148 |         |. symbol "]"
149 |     ]
150 | ```
151 | 
152 | Here is how it would work on different strings:
153 | 
154 | | String    | Result       |
155 | |-----------|--------------|
156 | | `"  , 4"` | `OK{false}`  |
157 | | `"  ,"`   | `ERR{false}` |
158 | | `"  , a"` | `ERR{false}` |
159 | | `"  ]"`   | `OK{false}`  |
160 | | `"  a"`   | `ERR{false}` |
161 | | `"abc"`   | `ERR{true}`  |
162 | 
163 | Some of these cases are tricky, so let's look at them in more depth:
164 | 
165 | - `"  , a"` &mdash; `backtrackable spaces`, `symbol ","`, and `spaces` all succeed. At that point we have `OK{false}`. The `int` parser then fails on `a`, so we finish with `ERR{false}`. That means `oneOf` will NOT try the second possibility.
166 | - `"  ]"` &mdash; `backtrackable spaces` succeeds, but `symbol ","` fails. At that point we have `ERR{true}`, so `oneOf` tries the second possibility. After backtracking, `spaces` and `symbol "]"` succeed with `OK{false}`.
167 | - `"  a"` &mdash; `backtrackable spaces` succeeds, but `symbol ","` fails. At that point we have `ERR{true}`, so `oneOf` tries the second possibility. After backtracking, `spaces` succeeds with `OK{false}` and `symbol "]"` fails resulting in `ERR{false}`.
168 | 
169 | <br>
170 | 
171 | 
172 | ### `oneOf` (efficient)
173 | 
174 | Notice that in the previous example, we parsed `spaces` twice in some cases. This is inefficient, especially in large files with lots of whitespace. Backtracking is very inefficient in general though, so **if you are interested in performance, it is worthwhile to try to eliminate as many uses of `backtrackable` as possible.**
175 | 
176 | So we can rewrite that last example to never backtrack:
177 | 
178 | ```elm
179 | parser : Parser (Maybe Int)
180 | parser =
181 |   succeed identity
182 |     |. spaces
183 |     |= oneOf
184 |         [ succeed Just
185 |             |. symbol ","
186 |             |. spaces
187 |             |= int
188 |         , succeed Nothing
189 |             |. symbol "]"
190 |         ]
191 | ```
192 | 
193 | Now we are guaranteed to consume the spaces only one time. After that, we decide if we are looking at a `,` or `]`, so we never backtrack and reparse things.
194 | 
195 | If you are strategic in shuffling parsers around, you can write parsers that do not need `backtrackable` at all. The resulting parsers are quite fast. They are essentially the same as [LR(k)](https://en.wikipedia.org/wiki/Canonical_LR_parser) parsers, but more pleasant to write. I did this in the Elm compiler for parsing Elm code, and it was significantly faster.
196 | 


--------------------------------------------------------------------------------
/src/Elm/Kernel/Parser.js:
--------------------------------------------------------------------------------
  1 | /*
  2 | 
  3 | import Elm.Kernel.Utils exposing (chr, Tuple2, Tuple3)
  4 | 
  5 | */
  6 | 
  7 | 
  8 | 
  9 | // STRINGS
 10 | 
 11 | 
 12 | var _Parser_isSubString = F5(function(smallString, offset, row, col, bigString)
 13 | {
 14 | 	var smallLength = smallString.length;
 15 | 	var isGood = offset + smallLength <= bigString.length;
 16 | 
 17 | 	for (var i = 0; isGood && i < smallLength; )
 18 | 	{
 19 | 		var code = bigString.charCodeAt(offset);
 20 | 		isGood =
 21 | 			smallString[i++] === bigString[offset++]
 22 | 			&& (
 23 | 				code === 0x000A /* \n */
 24 | 					? ( row++, col=1 )
 25 | 					: ( col++, (code & 0xF800) === 0xD800 ? smallString[i++] === bigString[offset++] : 1 )
 26 | 			)
 27 | 	}
 28 | 
 29 | 	return __Utils_Tuple3(isGood ? offset : -1, row, col);
 30 | });
 31 | 
 32 | 
 33 | 
 34 | // CHARS
 35 | 
 36 | 
 37 | var _Parser_isSubChar = F3(function(predicate, offset, string)
 38 | {
 39 | 	return (
 40 | 		string.length <= offset
 41 | 			? -1
 42 | 			:
 43 | 		(string.charCodeAt(offset) & 0xF800) === 0xD800
 44 | 			? (predicate(__Utils_chr(string.substr(offset, 2))) ? offset + 2 : -1)
 45 | 			:
 46 | 		(predicate(__Utils_chr(string[offset]))
 47 | 			? ((string[offset] === '\n') ? -2 : (offset + 1))
 48 | 			: -1
 49 | 		)
 50 | 	);
 51 | });
 52 | 
 53 | 
 54 | var _Parser_isAsciiCode = F3(function(code, offset, string)
 55 | {
 56 | 	return string.charCodeAt(offset) === code;
 57 | });
 58 | 
 59 | 
 60 | 
 61 | // NUMBERS
 62 | 
 63 | 
 64 | var _Parser_chompBase10 = F2(function(offset, string)
 65 | {
 66 | 	for (; offset < string.length; offset++)
 67 | 	{
 68 | 		var code = string.charCodeAt(offset);
 69 | 		if (code < 0x30 || 0x39 < code)
 70 | 		{
 71 | 			return offset;
 72 | 		}
 73 | 	}
 74 | 	return offset;
 75 | });
 76 | 
 77 | 
 78 | var _Parser_consumeBase = F3(function(base, offset, string)
 79 | {
 80 | 	for (var total = 0; offset < string.length; offset++)
 81 | 	{
 82 | 		var digit = string.charCodeAt(offset) - 0x30;
 83 | 		if (digit < 0 || base <= digit) break;
 84 | 		total = base * total + digit;
 85 | 	}
 86 | 	return __Utils_Tuple2(offset, total);
 87 | });
 88 | 
 89 | 
 90 | var _Parser_consumeBase16 = F2(function(offset, string)
 91 | {
 92 | 	for (var total = 0; offset < string.length; offset++)
 93 | 	{
 94 | 		var code = string.charCodeAt(offset);
 95 | 		if (0x30 <= code && code <= 0x39)
 96 | 		{
 97 | 			total = 16 * total + code - 0x30;
 98 | 		}
 99 | 		else if (0x41 <= code && code <= 0x46)
100 | 		{
101 | 			total = 16 * total + code - 55;
102 | 		}
103 | 		else if (0x61 <= code && code <= 0x66)
104 | 		{
105 | 			total = 16 * total + code - 87;
106 | 		}
107 | 		else
108 | 		{
109 | 			break;
110 | 		}
111 | 	}
112 | 	return __Utils_Tuple2(offset, total);
113 | });
114 | 
115 | 
116 | 
117 | // FIND STRING
118 | 
119 | 
120 | var _Parser_findSubString = F5(function(smallString, offset, row, col, bigString)
121 | {
122 | 	var newOffset = bigString.indexOf(smallString, offset);
123 | 	var target = newOffset < 0 ? bigString.length : newOffset + smallString.length;
124 | 
125 | 	while (offset < target)
126 | 	{
127 | 		var code = bigString.charCodeAt(offset++);
128 | 		code === 0x000A /* \n */
129 | 			? ( col=1, row++ )
130 | 			: ( col++, (code & 0xF800) === 0xD800 && offset++ )
131 | 	}
132 | 
133 | 	return __Utils_Tuple3(newOffset, row, col);
134 | });
135 | 


--------------------------------------------------------------------------------
/src/Parser.elm:
--------------------------------------------------------------------------------
   1 | module Parser exposing
   2 |   ( Parser, run
   3 |   , int, float, number, symbol, keyword, variable, end
   4 |   , succeed, (|=), (|.), lazy, andThen, problem
   5 |   , oneOf, map, backtrackable, commit, token
   6 |   , sequence, Trailing(..), loop, Step(..)
   7 |   , spaces, lineComment, multiComment, Nestable(..)
   8 |   , getChompedString, chompIf, chompWhile, chompUntil, chompUntilEndOr, mapChompedString
   9 |   , DeadEnd, Problem(..), deadEndsToString
  10 |   , withIndent, getIndent
  11 |   , getPosition, getRow, getCol, getOffset, getSource
  12 |   )
  13 | 
  14 | 
  15 | {-|
  16 | 
  17 | # Parsers
  18 | @docs Parser, run
  19 | 
  20 | # Building Blocks
  21 | @docs int, float, number, symbol, keyword, variable, end
  22 | 
  23 | # Pipelines
  24 | @docs succeed, (|=), (|.), lazy, andThen, problem
  25 | 
  26 | # Branches
  27 | @docs oneOf, map, backtrackable, commit, token
  28 | 
  29 | # Loops
  30 | @docs sequence, Trailing, loop, Step
  31 | 
  32 | # Whitespace
  33 | @docs spaces, lineComment, multiComment, Nestable
  34 | 
  35 | # Chompers
  36 | @docs getChompedString, chompIf, chompWhile, chompUntil, chompUntilEndOr, mapChompedString
  37 | 
  38 | # Errors
  39 | @docs DeadEnd, Problem, deadEndsToString
  40 | 
  41 | # Indentation
  42 | @docs withIndent, getIndent
  43 | 
  44 | # Positions
  45 | @docs getPosition, getRow, getCol, getOffset, getSource
  46 | -}
  47 | 
  48 | 
  49 | import Char
  50 | import Parser.Advanced as A exposing ((|=), (|.))
  51 | import Set
  52 | 
  53 | 
  54 | 
  55 | -- INFIX OPERATORS - see Parser.Advanced for why 5 and 6 were chosen
  56 | 
  57 | 
  58 | infix left 5 (|=) = keeper
  59 | infix left 6 (|.) = ignorer
  60 | 
  61 | 
  62 | 
  63 | -- PARSERS
  64 | 
  65 | 
  66 | {-| A `Parser` helps turn a `String` into nicely structured data. For example,
  67 | we can [`run`](#run) the [`int`](#int) parser to turn `String` to `Int`:
  68 | 
  69 |     run int "123456" == Ok 123456
  70 |     run int "3.1415" == Err ...
  71 | 
  72 | The cool thing is that you can combine `Parser` values to handle much more
  73 | complex scenarios.
  74 | -}
  75 | type alias Parser a =
  76 |   A.Parser Never Problem a
  77 | 
  78 | 
  79 | 
  80 | -- RUN
  81 | 
  82 | 
  83 | {-| Try a parser. Here are some examples using the [`keyword`](#keyword)
  84 | parser:
  85 | 
  86 |     run (keyword "true") "true"  == Ok ()
  87 |     run (keyword "true") "True"  == Err ...
  88 |     run (keyword "true") "false" == Err ...
  89 |     run (keyword "true") "true!" == Ok ()
  90 | 
  91 | Notice the last case! A `Parser` will chomp as much as possible and not worry
  92 | about the rest. Use the [`end`](#end) parser to ensure you made it to the end
  93 | of the string!
  94 | -}
  95 | run : Parser a -> String -> Result (List DeadEnd) a
  96 | run parser source =
  97 |   case A.run parser source of
  98 |     Ok a ->
  99 |       Ok a
 100 | 
 101 |     Err problems ->
 102 |       Err (List.map problemToDeadEnd problems)
 103 | 
 104 | 
 105 | problemToDeadEnd : A.DeadEnd Never Problem -> DeadEnd
 106 | problemToDeadEnd p =
 107 |   DeadEnd p.row p.col p.problem
 108 | 
 109 | 
 110 | 
 111 | -- PROBLEMS
 112 | 
 113 | 
 114 | {-| A parser can run into situations where there is no way to make progress.
 115 | When that happens, I record the `row` and `col` where you got stuck and the
 116 | particular `problem` you ran into. That is a `DeadEnd`!
 117 | 
 118 | **Note:** I count rows and columns like a text editor. The beginning is `row=1`
 119 | and `col=1`. As I chomp characters, the `col` increments. When I reach a `\n`
 120 | character, I increment the `row` and set `col=1`.
 121 | -}
 122 | type alias DeadEnd =
 123 |   { row : Int
 124 |   , col : Int
 125 |   , problem : Problem
 126 |   }
 127 | 
 128 | 
 129 | {-| When you run into a `DeadEnd`, I record some information about why you
 130 | got stuck. This data is useful for producing helpful error messages. This is
 131 | how [`deadEndsToString`](#deadEndsToString) works!
 132 | 
 133 | **Note:** If you feel limited by this type (i.e. having to represent custom
 134 | problems as strings) I highly recommend switching to `Parser.Advanced`. It
 135 | lets you define your own `Problem` type. It can also track "context" which
 136 | can improve error messages a ton! This is how the Elm compiler produces
 137 | relatively nice parse errors, and I am excited to see those techniques applied
 138 | elsewhere!
 139 | -}
 140 | type Problem
 141 |   = Expecting String
 142 |   | ExpectingInt
 143 |   | ExpectingHex
 144 |   | ExpectingOctal
 145 |   | ExpectingBinary
 146 |   | ExpectingFloat
 147 |   | ExpectingNumber
 148 |   | ExpectingVariable
 149 |   | ExpectingSymbol String
 150 |   | ExpectingKeyword String
 151 |   | ExpectingEnd
 152 |   | UnexpectedChar
 153 |   | Problem String
 154 |   | BadRepeat
 155 | 
 156 | 
 157 | {-| Turn all the `DeadEnd` data into a string that is easier for people to
 158 | read.
 159 | 
 160 | **Note:** This is just a baseline of quality. It cannot do anything with colors.
 161 | It is not interactive. It just turns the raw data into strings. I really hope
 162 | folks will check out the source code for some inspiration on how to turn errors
 163 | into `Html` with nice colors and interaction! The `Parser.Advanced` module lets
 164 | you work with context as well, which really unlocks another level of quality!
 165 | The "context" technique is how the Elm compiler can say "I think I am parsing a
 166 | list, so I was expecting a closing `]` here." Telling users what the parser
 167 | _thinks_ is happening can be really helpful!
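     | 
     | As a starting point, here is a minimal sketch of a formatter of your own. The
     | names `deadEndToString` and `problemToString` are made up for illustration,
     | and only a few `Problem` cases get a specific message:
     | 
     |     deadEndToString : DeadEnd -> String
     |     deadEndToString deadEnd =
     |       "row " ++ String.fromInt deadEnd.row
     |         ++ ", col " ++ String.fromInt deadEnd.col
     |         ++ ": " ++ problemToString deadEnd.problem
     | 
     |     problemToString : Problem -> String
     |     problemToString p =
     |       case p of
     |         Expecting str -> "expecting " ++ str
     |         ExpectingSymbol str -> "expecting the symbol " ++ str
     |         ExpectingKeyword str -> "expecting the keyword " ++ str
     |         ExpectingEnd -> "expecting the end of the input"
     |         Problem msg -> msg
     |         _ -> "unexpected input"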
 168 | -}
 169 | deadEndsToString : List DeadEnd -> String
 170 | deadEndsToString deadEnds =
 171 |   "TODO deadEndsToString"
 172 | 
 173 | 
 174 | 
 175 | -- PIPELINES
 176 | 
 177 | 
 178 | {-| A parser that succeeds without chomping any characters.
 179 | 
 180 |     run (succeed 90210  ) "mississippi" == Ok 90210
 181 |     run (succeed 3.141  ) "mississippi" == Ok 3.141
 182 |     run (succeed ()     ) "mississippi" == Ok ()
 183 |     run (succeed Nothing) "mississippi" == Ok Nothing
 184 | 
 185 | Seems weird on its own, but it is very useful in combination with other
 186 | functions. The docs for [`(|=)`](#|=) and [`andThen`](#andThen) have some neat
 187 | examples.
 188 | -}
 189 | succeed : a -> Parser a
 190 | succeed =
 191 |   A.succeed
 192 | 
 193 | 
 194 | {-| **Keep** values in a parser pipeline. For example, we could say:
 195 | 
 196 |     type alias Point = { x : Float, y : Float }
 197 | 
 198 |     point : Parser Point
 199 |     point =
 200 |       succeed Point
 201 |         |. symbol "("
 202 |         |. spaces
 203 |         |= float
 204 |         |. spaces
 205 |         |. symbol ","
 206 |         |. spaces
 207 |         |= float
 208 |         |. spaces
 209 |         |. symbol ")"
 210 | 
 211 | All the parsers in this pipeline will chomp characters and produce values. So
 212 | `symbol "("` will chomp one paren and produce a `()` value. Similarly, `float`
 213 | will chomp some digits and produce a `Float` value. The `(|.)` and `(|=)`
 214 | operators just decide whether we give the values to the `Point` function.
 215 | 
 216 | So in this case, we skip the `()` from `symbol "("`, we skip the `()` from
 217 | `spaces`, we keep the `Float` from `float`, etc.
 218 | -}
 219 | keeper : Parser (a -> b) -> Parser a -> Parser b
 220 | keeper =
 221 |   (|=)
 222 | 
 223 | 
 224 | {-| **Skip** values in a parser pipeline. For example, maybe we want to parse
 225 | some JavaScript variables:
 226 | 
 227 |     var : Parser String
 228 |     var =
 229 |       getChompedString <|
 230 |         succeed ()
 231 |           |. chompIf isStartChar
 232 |           |. chompWhile isInnerChar
 233 | 
 234 |     isStartChar : Char -> Bool
 235 |     isStartChar char =
 236 |       Char.isAlpha char || char == '_' || char == '$'
 237 | 
 238 |     isInnerChar : Char -> Bool
 239 |     isInnerChar char =
 240 |       isStartChar char || Char.isDigit char
 241 | 
 242 | `chompIf isStartChar` can chomp one character and produce a `()` value.
 243 | `chompWhile isInnerChar` can chomp zero or more characters and produce a `()`
 244 | value. The `(|.)` operators are saying to still chomp all the characters, but
 245 | skip the two `()` values that get produced. No one cares about them.
 246 | -}
 247 | ignorer : Parser keep -> Parser ignore -> Parser keep
 248 | ignorer =
 249 |   (|.)
 250 | 
 251 | 
 252 | {-| Helper to define recursive parsers. Say we want a parser for simple
 253 | boolean expressions:
 254 | 
 255 |     true
 256 |     false
 257 |     (true || false)
 258 |     (true || (true || false))
 259 | 
 260 | Notice that a boolean expression might contain *other* boolean expressions.
 261 | That means we will want to define our parser in terms of itself:
 262 | 
 263 |     type Boolean
 264 |       = MyTrue
 265 |       | MyFalse
 266 |       | MyOr Boolean Boolean
 267 | 
 268 |     boolean : Parser Boolean
 269 |     boolean =
 270 |       oneOf
 271 |         [ succeed MyTrue
 272 |             |. keyword "true"
 273 |         , succeed MyFalse
 274 |             |. keyword "false"
 275 |         , succeed MyOr
 276 |             |. symbol "("
 277 |             |. spaces
 278 |             |= lazy (\_ -> boolean)
 279 |             |. spaces
 280 |             |. symbol "||"
 281 |             |. spaces
 282 |             |= lazy (\_ -> boolean)
 283 |             |. spaces
 284 |             |. symbol ")"
 285 |         ]
 286 | 
 287 | **Notice that `boolean` uses `boolean` in its definition!** In Elm, you can
 288 | only define a value in terms of itself if it is behind a function call. So
 289 | `lazy` helps us define these self-referential parsers. (`andThen` can be used
 290 | for this as well!)
 291 | -}
 292 | lazy : (() -> Parser a) -> Parser a
 293 | lazy =
 294 |   A.lazy
 295 | 
 296 | 
 297 | {-| Parse one thing `andThen` parse another thing. This is useful when you want
 298 | to check on what you just parsed. For example, maybe you want U.S. zip codes
 299 | and `int` is not suitable because it does not allow leading zeros. You could
 300 | say:
 301 | 
 302 |     zipCode : Parser String
 303 |     zipCode =
 304 |       getChompedString (chompWhile Char.isDigit)
 305 |         |> andThen checkZipCode
 306 | 
 307 |     checkZipCode : String -> Parser String
 308 |     checkZipCode code =
 309 |       if String.length code == 5 then
 310 |         succeed code
 311 |       else
 312 |         problem "a U.S. zip code has exactly 5 digits"
 313 | 
 314 | First we chomp digits `andThen` we check if it is a valid U.S. zip code. We
 315 | `succeed` if it has exactly five digits and report a `problem` if not.
 316 | 
 317 | Check out [`examples/DoubleQuoteString.elm`](https://github.com/elm/parser/blob/master/examples/DoubleQuoteString.elm)
 318 | for another example, this time using `andThen` to verify unicode code points.
 319 | 
 320 | **Note:** If you are using `andThen` recursively and blowing the stack, check
 321 | out the [`loop`](#loop) function to limit stack usage.
 322 | -}
 323 | andThen : (a -> Parser b) -> Parser a -> Parser b
 324 | andThen =
 325 |   A.andThen
 326 | 
 327 | 
 328 | {-| Indicate that a parser has reached a dead end. "Everything was going fine
 329 | until I ran into this problem." Check out the [`andThen`](#andThen) docs to see
 330 | an example usage.
 331 | -}
 332 | problem : String -> Parser a
 333 | problem msg =
 334 |   A.problem (Problem msg)
 335 | 
 336 | 
 337 | 
 338 | -- BACKTRACKING
 339 | 
 340 | 
 341 | {-| If you are parsing JSON, the values can be strings, floats, booleans,
 342 | arrays, objects, or null. You need a way to pick `oneOf` them! Here is a
 343 | sample of what that code might look like:
 344 | 
 345 |     type Json
 346 |       = Number Float
 347 |       | Boolean Bool
 348 |       | Null
 349 | 
 350 |     json : Parser Json
 351 |     json =
 352 |       oneOf
 353 |         [ map Number float
 354 |         , map (\_ -> Boolean True) (keyword "true")
 355 |         , map (\_ -> Boolean False) (keyword "false")
 356 |         , map (\_ -> Null) (keyword "null")
 357 |         ]
 358 | 
 359 | This parser will keep trying parsers until `oneOf` them starts chomping
 360 | characters. Once a path is chosen, it does not come back and try the others.
 361 | 
 362 | **Note:** I highly recommend reading [this document][semantics] to learn how
 363 | `oneOf` and `backtrackable` interact. It is subtle and important!
 364 | 
 365 | [semantics]: https://github.com/elm/parser/blob/master/semantics.md
 366 | -}
 367 | oneOf : List (Parser a) -> Parser a
 368 | oneOf =
 369 |   A.oneOf
 370 | 
 371 | 
 372 | {-| Transform the result of a parser. Maybe you have a value that is
 373 | an integer or `null`:
 374 | 
 375 |     nullOrInt : Parser (Maybe Int)
 376 |     nullOrInt =
 377 |       oneOf
 378 |         [ map Just int
 379 |         , map (\_ -> Nothing) (keyword "null")
 380 |         ]
 381 | 
 382 |     -- run nullOrInt "0"    == Ok (Just 0)
 383 |     -- run nullOrInt "13"   == Ok (Just 13)
 384 |     -- run nullOrInt "null" == Ok Nothing
 385 |     -- run nullOrInt "zero" == Err ...
 386 | -}
 387 | map : (a -> b) -> Parser a -> Parser b
 388 | map =
 389 |   A.map
 390 | 
 391 | 
 392 | {-| It is quite tricky to use `backtrackable` well! It can be very useful, but
 393 | also can degrade performance and error message quality.
 394 | 
 395 | Read [this document](https://github.com/elm/parser/blob/master/semantics.md)
 396 | to learn how `oneOf`, `backtrackable`, and `commit` work and interact with
 397 | each other. It is subtle and important!
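     | 
     | As a rough sketch of the idea (the name `separator` is hypothetical), wrapping
     | `spaces` in `backtrackable` lets `oneOf` move on to the second branch even
     | after the first branch chomped some whitespace:
     | 
     |     separator : Parser ()
     |     separator =
     |       oneOf
     |         [ succeed ()
     |             |. backtrackable spaces
     |             |. symbol "->"
     |         , succeed ()
     |             |. spaces
     |             |. symbol ";"
     |         ]
     | 
     |     -- run separator "  ->" == Ok ()
     |     -- run separator "  ;"  == Ok ()
     | 
     | Without `backtrackable`, the first branch would commit after chomping the
     | spaces, and the second input would fail.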
 398 | -}
 399 | backtrackable : Parser a -> Parser a
 400 | backtrackable =
 401 |   A.backtrackable
 402 | 
 403 | 
 404 | {-| `commit` is almost always paired with `backtrackable` in some way, and it
 405 | is tricky to use well.
 406 | 
 407 | Read [this document](https://github.com/elm/parser/blob/master/semantics.md)
 408 | to learn how `oneOf`, `backtrackable`, and `commit` work and interact with
 409 | each other. It is subtle and important!
 410 | -}
 411 | commit : a -> Parser a
 412 | commit =
 413 |   A.commit
 414 | 
 415 | 
 416 | 
 417 | -- TOKEN
 418 | 
 419 | 
 420 | {-| Parse exactly the given string, without any regard to what comes next.
 421 | 
 422 | A potential pitfall when parsing keywords is getting tricked by variables that
 423 | start with a keyword, like `let` in `letters` or `import` in `important`. This
 424 | is especially likely if you have a whitespace parser that can consume zero
 425 | characters. So the [`keyword`](#keyword) parser is defined with `token` and a
 426 | trick to peek ahead a bit:
 427 | 
 428 |     keyword : String -> Parser ()
 429 |     keyword kwd =
 430 |       succeed identity
 431 |         |. backtrackable (token kwd)
 432 |         |= oneOf
 433 |             [ map (\_ -> True) (backtrackable (chompIf isVarChar))
 434 |             , succeed False
 435 |             ]
 436 |         |> andThen (checkEnding kwd)
 437 | 
 438 |     checkEnding : String -> Bool -> Parser ()
 439 |     checkEnding kwd isBadEnding =
 440 |       if isBadEnding then
 441 |         problem ("expecting the `" ++ kwd ++ "` keyword")
 442 |       else
 443 |         commit ()
 444 | 
 445 |     isVarChar : Char -> Bool
 446 |     isVarChar char =
 447 |       Char.isAlphaNum char || char == '_'
 448 | 
 449 | This definition is specially designed so that (1) if you really see `let` you
 450 | commit to that path and (2) if you see `letters` instead you can backtrack and
 451 | try other options. If I had just put a `backtrackable` around the whole thing
 452 | you would not get (1) anymore.
 453 | -}
 454 | token : String -> Parser ()
 455 | token str =
 456 |   A.token (toToken str)
 457 | 
 458 | 
 459 | toToken : String -> A.Token Problem
 460 | toToken str =
 461 |   A.Token str (Expecting str)
 462 | 
 463 | 
 464 | 
 465 | -- LOOPS
 466 | 
 467 | 
 468 | {-| A parser that can loop indefinitely. This can be helpful when parsing
 469 | repeated structures, like a bunch of statements:
 470 | 
 471 |     statements : Parser (List Stmt)
 472 |     statements =
 473 |       loop [] statementsHelp
 474 | 
 475 |     statementsHelp : List Stmt -> Parser (Step (List Stmt) (List Stmt))
 476 |     statementsHelp revStmts =
 477 |       oneOf
 478 |         [ succeed (\stmt -> Loop (stmt :: revStmts))
 479 |             |= statement
 480 |             |. spaces
 481 |             |. symbol ";"
 482 |             |. spaces
 483 |         , succeed ()
 484 |             |> map (\_ -> Done (List.reverse revStmts))
 485 |         ]
 486 | 
 487 |     -- statement : Parser Stmt
 488 | 
 489 | Notice that the statements are tracked in reverse as we `Loop`, and we reorder
 490 | them only once we are `Done`. This is a very common pattern with `loop`!
 491 | 
 492 | Check out [`examples/DoubleQuoteString.elm`](https://github.com/elm/parser/blob/master/examples/DoubleQuoteString.elm)
 493 | for another example.
 494 | 
 495 | **IMPORTANT NOTE:** Parsers like `succeed ()` and `chompWhile Char.isAlpha` can
 496 | succeed without consuming any characters. So in some cases you may want to use
 497 | [`getOffset`](#getOffset) to ensure that each step actually consumed characters.
 498 | Otherwise you could end up in an infinite loop!
 499 | 
 500 | **Note:** Anything you can write with `loop`, you can also write as a parser
 501 | that chomps some characters `andThen` calls itself with new arguments. The
 502 | problem with calling `andThen` recursively is that it grows the stack, so you
 503 | cannot do it indefinitely. So `loop` is important because it enables tail-call
 504 | elimination, allowing you to parse however many repeats you want.
 505 | -}
 506 | loop : state -> (state -> Parser (Step state a)) -> Parser a
 507 | loop state callback =
 508 |   A.loop state (\s -> map toAdvancedStep (callback s))
 509 | 
 510 | 
 511 | {-| Decide what steps to take next in your [`loop`](#loop).
 512 | 
 513 | If you are `Done`, you give the result of the whole `loop`. If you decide to
 514 | `Loop` around again, you give a new state to work from. Maybe you need to add
 515 | an item to a list? Or maybe you need to track some information about what you
 516 | just saw?
 517 | 
 518 | **Note:** It may be helpful to learn about [finite-state machines][fsm] to get
 519 | a broader intuition about using `state`. For example, you may want to create a `type`
 520 | that describes four possible states, and then use `Loop` to transition between
 521 | them as you consume characters.
 522 | 
 523 | [fsm]: https://en.wikipedia.org/wiki/Finite-state_machine
 524 | -}
 525 | type Step state a
 526 |   = Loop state
 527 |   | Done a
 528 | 
 529 | 
 530 | toAdvancedStep : Step s a -> A.Step s a
 531 | toAdvancedStep step =
 532 |   case step of
 533 |     Loop s -> A.Loop s
 534 |     Done a -> A.Done a
 535 | 
 536 | 
 537 | 
 538 | -- NUMBERS
 539 | 
 540 | 
 541 | {-| Parse integers.
 542 | 
 543 |     run int "1"    == Ok 1
 544 |     run int "1234" == Ok 1234
 545 | 
 546 |     run int "-789" == Err ...
 547 |     run int "0123" == Err ...
 548 |     run int "1.34" == Err ...
 549 |     run int "1e31" == Err ...
 550 |     run int "123a" == Err ...
 551 |     run int "0x1A" == Err ...
 552 | 
 553 | If you want to handle a leading `+` or `-` you should do it with a custom
 554 | parser like this:
 555 | 
 556 |     myInt : Parser Int
 557 |     myInt =
 558 |       oneOf
 559 |         [ succeed negate
 560 |             |. symbol "-"
 561 |             |= int
 562 |         , int
 563 |         ]
 564 | 
 565 | **Note:** If you want a parser for both `Int` and `Float` literals, check out
 566 | [`number`](#number) below. It will be faster than using `oneOf` to combine
 567 | `int` and `float` yourself.
 568 | -}
 569 | int : Parser Int
 570 | int =
 571 |   A.int ExpectingInt ExpectingInt
 572 | 
 573 | 
 574 | {-| Parse floats.
 575 | 
 576 |     run float "123"       == Ok 123
 577 |     run float "3.1415"    == Ok 3.1415
 578 |     run float "0.1234"    == Ok 0.1234
 579 |     run float ".1234"     == Ok 0.1234
 580 |     run float "1e-42"     == Ok 1e-42
 581 |     run float "6.022e23"  == Ok 6.022e23
 582 |     run float "6.022E23"  == Ok 6.022e23
 583 |     run float "6.022e+23" == Ok 6.022e23
 584 | 
 585 | If you want to disable literals like `.123` (like in Elm) you could write
 586 | something like this:
 587 | 
 588 |     elmFloat : Parser Float
 589 |     elmFloat =
 590 |       oneOf
 591 |         [ symbol "."
 592 |             |> andThen (\_ -> problem "floating point numbers must start with a digit, like 0.25")
 593 |         , float
 594 |         ]
 595 | 
 596 | **Note:** If you want a parser for both `Int` and `Float` literals, check out
 597 | [`number`](#number) below. It will be faster than using `oneOf` to combine
 598 | `int` and `float` yourself.
 599 | -}
 600 | float : Parser Float
 601 | float =
 602 |   A.float ExpectingFloat ExpectingFloat
 603 | 
 604 | 
 605 | 
 606 | -- NUMBER
 607 | 
 608 | 
 609 | {-| Parse a bunch of different kinds of numbers without backtracking. A parser
 610 | for Elm would need to handle integers, floats, and hexadecimal like this:
 611 | 
 612 |     type Expr
 613 |       = Variable String
 614 |       | Int Int
 615 |       | Float Float
 616 |       | Apply Expr Expr
 617 | 
 618 |     elmNumber : Parser Expr
 619 |     elmNumber =
 620 |       number
 621 |         { int = Just Int
 622 |         , hex = Just Int    -- 0x001A is allowed
 623 |         , octal = Nothing   -- 0o0731 is not
 624 |         , binary = Nothing  -- 0b1101 is not
 625 |         , float = Just Float
 626 |         }
 627 | 
 628 | If you wanted to implement the [`float`](#float) parser, it would be like this:
 629 | 
 630 |     float : Parser Float
 631 |     float =
 632 |       number
 633 |         { int = Just toFloat
 634 |         , hex = Nothing
 635 |         , octal = Nothing
 636 |         , binary = Nothing
 637 |         , float = Just identity
 638 |         }
 639 | 
 640 | Notice that it actually is processing `int` results! This is because `123`
 641 | looks like an integer to me, but maybe it looks like a float to you. If you had
 642 | `int = Nothing`, floats would need a decimal like `1.0` in every case. If you
 643 | like explicitness, that may actually be preferable!
 644 | 
 645 | **Note:** This function does not check for weird trailing characters in the
 646 | current implementation, so parsing `123abc` can succeed up to `123` and then
 647 | move on. This is helpful for people who want to parse things like `40px` or
 648 | `3m`, but it requires a bit of extra code to rule out trailing characters in
 649 | other cases.
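     | 
     | One possible way to write that extra code (a sketch; `strictFloat` is a
     | made-up name) is to look at the next character and report a problem if it
     | is a letter:
     | 
     |     strictFloat : Parser Float
     |     strictFloat =
     |       succeed identity
     |         |= float
     |         |. oneOf
     |             [ chompIf Char.isAlpha
     |                 |> andThen (\_ -> problem "a number cannot be followed by a letter")
     |             , succeed ()
     |             ]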
 650 | -}
 651 | number
 652 |   : { int : Maybe (Int -> a)
 653 |     , hex : Maybe (Int -> a)
 654 |     , octal : Maybe (Int -> a)
 655 |     , binary : Maybe (Int -> a)
 656 |     , float : Maybe (Float -> a)
 657 |     }
 658 |   -> Parser a
 659 | number i =
 660 |   A.number
 661 |     { int = Result.fromMaybe ExpectingInt i.int
 662 |     , hex = Result.fromMaybe ExpectingHex i.hex
 663 |     , octal = Result.fromMaybe ExpectingOctal i.octal
 664 |     , binary = Result.fromMaybe ExpectingBinary i.binary
 665 |     , float = Result.fromMaybe ExpectingFloat i.float
 666 |     , invalid = ExpectingNumber
 667 |     , expecting = ExpectingNumber
 668 |     }
 669 | 
 670 | 
 671 | 
 672 | -- SYMBOL
 673 | 
 674 | 
 675 | {-| Parse symbols like `(` and `,`.
 676 | 
 677 |     run (symbol "[") "[" == Ok ()
 678 |     run (symbol "[") "4" == Err ... (ExpectingSymbol "[") ...
 679 | 
 680 | **Note:** This is good for stuff like brackets and semicolons, but it probably
 681 | should not be used for binary operators like `+` and `-` because you can find
 682 | yourself in weird situations. For example, is `3--4` a typo? Or is it `3 - -4`?
 683 | I have had better luck with `chompWhile isSymbol` and sorting out which
 684 | operator it is afterwards.
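     | 
     | A sketch of that approach might look like this. The `isSymbol` helper and
     | the list of operators are assumptions for illustration, not part of this
     | package:
     | 
     |     operator : Parser String
     |     operator =
     |       getChompedString (chompWhile isSymbol)
     |         |> andThen checkOperator
     | 
     |     checkOperator : String -> Parser String
     |     checkOperator op =
     |       if List.member op [ "+", "-", "*", "/" ] then
     |         succeed op
     |       else
     |         problem ("unknown operator `" ++ op ++ "`")
     | 
     |     isSymbol : Char -> Bool
     |     isSymbol char =
     |       List.member char [ '+', '-', '*', '/' ]
     | 
     | With this, `3--4` chomps `--` as one unknown operator instead of silently
     | reading it as two minus signs.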
 685 | -}
 686 | symbol : String -> Parser ()
 687 | symbol str =
 688 |   A.symbol (A.Token str (ExpectingSymbol str))
 689 | 
 690 | 
 691 | 
 692 | -- KEYWORD
 693 | 
 694 | 
 695 | {-| Parse keywords like `let`, `case`, and `type`.
 696 | 
 697 |     run (keyword "let") "let"     == Ok ()
 698 |     run (keyword "let") "var"     == Err ... (ExpectingKeyword "let") ...
 699 |     run (keyword "let") "letters" == Err ... (ExpectingKeyword "let") ...
 700 | 
 701 | **Note:** Notice the third case there! `keyword` actually looks ahead one
 702 | character to make sure it is not a letter, number, or underscore. The goal is
 703 | to help with parsers like this:
 704 | 
 705 |     succeed identity
 706 |       |. keyword "let"
 707 |       |. spaces
 708 |       |= elmVar
 709 |       |. spaces
 710 |       |. symbol "="
 711 | 
 712 | The trouble is that `spaces` may chomp zero characters (to handle expressions
 713 | like `[1,2]` and `[ 1 , 2 ]`) and in this case, it would mean `letters` could
 714 | be parsed as `let ters` and then wonder where the equals sign is! Check out the
 715 | [`token`](#token) docs if you need to customize this!
 716 | -}
 717 | keyword : String -> Parser ()
 718 | keyword kwd =
 719 |   A.keyword (A.Token kwd (ExpectingKeyword kwd))
 720 | 
 721 | 
 722 | 
 723 | -- END
 724 | 
 725 | 
 726 | {-| Check if you have reached the end of the string you are parsing.
 727 | 
 728 |     justAnInt : Parser Int
 729 |     justAnInt =
 730 |       succeed identity
 731 |         |= int
 732 |         |. end
 733 | 
 734 |     -- run justAnInt "90210" == Ok 90210
 735 |     -- run justAnInt "1 + 2" == Err ...
 736 |     -- run int       "1 + 2" == Ok 1
 737 | 
 738 | Parsers can succeed without parsing the whole string. Ending your parser
 739 | with `end` guarantees that you have successfully parsed the whole string.
 740 | -}
 741 | end : Parser ()
 742 | end =
 743 |   A.end ExpectingEnd
 744 | 
 745 | 
 746 | 
 747 | -- CHOMPED STRINGS
 748 | 
 749 | 
 750 | {-| Sometimes parsers like `int` or `variable` cannot do exactly what you
 751 | need. The "chomping" family of functions is meant for that case! Maybe you
 752 | need to parse [valid PHP variables][php] like `$x` and `$txt`:
 753 | 
 754 |     php : Parser String
 755 |     php =
 756 |       getChompedString <|
 757 |         succeed ()
 758 |           |. chompIf (\c -> c == '$')
 759 |           |. chompIf (\c -> Char.isAlpha c || c == '_')
 760 |           |. chompWhile (\c -> Char.isAlphaNum c || c == '_')
 761 | 
 762 | The idea is that you create a bunch of chompers that validate the underlying
 763 | characters. Then `getChompedString` extracts the underlying `String` efficiently.
 764 | 
 765 | **Note:** Maybe it is helpful to see how you can use [`getOffset`](#getOffset)
 766 | and [`getSource`](#getSource) to implement this function:
 767 | 
 768 |     getChompedString : Parser a -> Parser String
 769 |     getChompedString parser =
 770 |       succeed String.slice
 771 |         |= getOffset
 772 |         |. parser
 773 |         |= getOffset
 774 |         |= getSource
 775 | 
 776 | [php]: https://www.w3schools.com/php/php_variables.asp
 777 | -}
 778 | getChompedString : Parser a -> Parser String
 779 | getChompedString =
 780 |   A.getChompedString
 781 | 
 782 | 
 783 | {-| This works just like [`getChompedString`](#getChompedString) but gives
 784 | a bit more flexibility. For example, maybe you want to parse Elm doc comments
 785 | and get (1) the full comment and (2) all of the names listed in the docs.
 786 | 
 787 | You could implement `mapChompedString` like this:
 788 | 
 789 |     mapChompedString : (String -> a -> b) -> Parser a -> Parser b
 790 |     mapChompedString func parser =
 791 |       succeed (\start value end src -> func (String.slice start end src) value)
 792 |         |= getOffset
 793 |         |= parser
 794 |         |= getOffset
 795 |         |= getSource
 796 | 
 797 | -}
 798 | mapChompedString : (String -> a -> b) -> Parser a -> Parser b
 799 | mapChompedString =
 800 |   A.mapChompedString
 801 | 
 802 | 
 803 | 
 804 | {-| Chomp one character if it passes the test.
 805 | 
 806 |     chompUpper : Parser ()
 807 |     chompUpper =
 808 |       chompIf Char.isUpper
 809 | 
 810 | So this can chomp a character like `T` and produces a `()` value.
 811 | -}
 812 | chompIf : (Char -> Bool) -> Parser ()
 813 | chompIf isGood =
 814 |   A.chompIf isGood UnexpectedChar
 815 | 
 816 | 
 817 | 
 818 | {-| Chomp zero or more characters if they pass the test. This is commonly
 819 | useful for chomping whitespace or variable names:
 820 | 
 821 |     whitespace : Parser ()
 822 |     whitespace =
 823 |       chompWhile (\c -> c == ' ' || c == '\t' || c == '\n' || c == '\r')
 824 | 
 825 |     elmVar : Parser String
 826 |     elmVar =
 827 |       getChompedString <|
 828 |         succeed ()
 829 |           |. chompIf Char.isLower
 830 |           |. chompWhile (\c -> Char.isAlphaNum c || c == '_')
 831 | 
 832 | **Note:** a `chompWhile` parser always succeeds! This can lead to tricky
 833 | situations, especially if you define your whitespace with it. In that case,
 834 | you could accidentally interpret `letx` as the keyword `let` followed by
 835 | "spaces" followed by the variable `x`. This is why the `keyword` and `number`
 836 | parsers peek ahead, making sure they are not followed by anything unexpected.
 837 | -}
 838 | chompWhile : (Char -> Bool) -> Parser ()
 839 | chompWhile =
 840 |   A.chompWhile
 841 | 
 842 | 
 843 | {-| Chomp until you see a certain string. You could define C-style multi-line
 844 | comments like this:
 845 | 
 846 |     comment : Parser ()
 847 |     comment =
 848 |       symbol "/*"
 849 |         |. chompUntil "*/"
 850 | 
 851 | I recommend using [`multiComment`](#multiComment) for this particular scenario
 852 | though. It can be trickier than it looks!
 853 | -}
 854 | chompUntil : String -> Parser ()
 855 | chompUntil str =
 856 |   A.chompUntil (toToken str)
 857 | 
 858 | 
 859 | {-| Chomp until you see a certain string or until you run out of characters to
 860 | chomp! You could define single-line comments like this:
 861 | 
 862 |     elm : Parser ()
 863 |     elm =
 864 |       symbol "--"
 865 |         |. chompUntilEndOr "\n"
 866 | 
 867 | A file may end with a single-line comment, so the file can end before you see
 868 | a newline. Tricky!
 869 | 
 870 | I recommend just using [`lineComment`](#lineComment) for this particular
 871 | scenario.
 872 | -}
 873 | chompUntilEndOr : String -> Parser ()
 874 | chompUntilEndOr =
 875 |   A.chompUntilEndOr
 876 | 
 877 | 
 878 | 
 879 | -- INDENTATION
 880 | 
 881 | 
 882 | {-| Some languages are indentation sensitive. Python cares about tabs. Elm
 883 | cares about spaces sometimes. `withIndent` and `getIndent` allow you to manage
 884 | "indentation state" yourself, in whatever way your scenario requires.
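     | 
     | For instance, one common pattern (sketched here with a hypothetical name)
     | is to set the indentation to the current column before running a parser:
     | 
     |     indentedAtCol : Parser a -> Parser a
     |     indentedAtCol parser =
     |       getCol
     |         |> andThen (\col -> withIndent col parser)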
 885 | -}
 886 | withIndent : Int -> Parser a -> Parser a
 887 | withIndent =
 888 |   A.withIndent
 889 | 
 890 | 
 891 | {-| When someone said `withIndent` earlier, what number did they put in there?
 892 | 
 893 | - `getIndent` results in `0`, the default value
 894 | - `withIndent 4 getIndent` results in `4`
 895 | 
 896 | So you are just asking about things you said earlier. These numbers do not leak
 897 | out of `withIndent`, so say we have:
 898 | 
 899 |     succeed Tuple.pair
 900 |       |= withIndent 4 getIndent
 901 |       |= getIndent
 902 | 
 903 | Assuming there are no `withIndent` above this, you would get `(4,0)` from this.
 904 | -}
 905 | getIndent : Parser Int
 906 | getIndent =
 907 |   A.getIndent
 908 | 
 909 | 
 910 | 
 911 | -- POSITION
 912 | 
 913 | 
 914 | {-| Code editors treat code like a grid, with rows and columns. The start is
 915 | `row=1` and `col=1`. As you chomp characters, the `col` increments. When you
 916 | run into a `\n` character, the `row` increments and `col` goes back to `1`.
 917 | 
 918 | In the Elm compiler, I track the start and end position of every expression
 919 | like this:
 920 | 
 921 |     type alias Located a =
 922 |       { start : (Int, Int)
 923 |       , value : a
 924 |       , end : (Int, Int)
 925 |       }
 926 | 
 927 |     located : Parser a -> Parser (Located a)
 928 |     located parser =
 929 |       succeed Located
 930 |         |= getPosition
 931 |         |= parser
 932 |         |= getPosition
 933 | 
 934 | So if there is a problem during type inference, I use this saved position
 935 | information to underline the exact problem!
 936 | 
 937 | **Note:** Tabs count as one character, so if you are parsing something like
 938 | Python, I recommend sorting that out *after* parsing. So if I wanted the `^^^^`
 939 | underline like in Elm, I would find the `row` in the source code and do
 940 | something like this:
 941 | 
 942 |     makeUnderline : String -> Int -> Int -> String
 943 |     makeUnderline row minCol maxCol =
 944 |       String.toList row
 945 |         |> List.indexedMap (toUnderlineChar minCol maxCol)
 946 |         |> String.fromList
 947 | 
 948 |     toUnderlineChar : Int -> Int -> Int -> Char -> Char
 949 |     toUnderlineChar minCol maxCol col char =
 950 |       if minCol <= col && col <= maxCol then
 951 |         '^'
 952 |       else if char == '\t' then
 953 |         '\t'
 954 |       else
 955 |         ' '
 956 | 
 957 | So it would preserve any tabs from the source line. There are tons of other
 958 | ways to do this though. The point is just that you handle the tabs after
 959 | parsing but before anyone looks at the numbers in a context where tabs may
 960 | equal 2, 4, or 8.
 961 | -}
 962 | getPosition : Parser (Int, Int)
 963 | getPosition =
 964 |   A.getPosition
 965 | 
 966 | 
 967 | {-| This is a more efficient version of `map Tuple.first getPosition`. Maybe
 968 | you just want to track the line number for some reason? This lets you do that.
 969 | 
 970 | See [`getPosition`](#getPosition) for an explanation of rows and columns.
 971 | -}
 972 | getRow : Parser Int
 973 | getRow =
 974 |   A.getRow
 975 | 
 976 | 
 977 | {-| This is a more efficient version of `map Tuple.second getPosition`. This
 978 | can be useful in combination with [`withIndent`](#withIndent) and
 979 | [`getIndent`](#getIndent), like this:
 980 | 
 981 |     checkIndent : Parser ()
 982 |     checkIndent =
 983 |       succeed (\indent column -> indent <= column)
 984 |         |= getIndent
 985 |         |= getCol
 986 |         |> andThen checkIndentHelp
 987 | 
 988 |     checkIndentHelp : Bool -> Parser ()
 989 |     checkIndentHelp isIndented =
 990 |       if isIndented then
 991 |         succeed ()
 992 |       else
 993 |         problem "expecting more spaces"
 994 | 
 995 | So the `checkIndent` parser only succeeds when you are at, or "deeper" than, the
 996 | current indent level. You could use this to parse Elm-style `let` expressions.
 997 | -}
 998 | getCol : Parser Int
 999 | getCol =
1000 |   A.getCol
1001 | 
1002 | 
1003 | {-| Editors think of code as a grid, but behind the scenes it is just a flat
1004 | array of UTF-16 characters. `getOffset` tells you your index in that flat
1005 | array. So if you chomp `"\n\n\n\n"` you are on row 5, column 1, and offset 4.
1006 | 
1007 | **Note:** JavaScript uses a somewhat odd version of UTF-16 strings, so a single
1008 | character may take two slots. So in JavaScript, `'abc'.length === 3` but
1009 | `'🙈🙉🙊'.length === 6`. Try it out! And since Elm runs in JavaScript, the offset
1010 | moves by those rules.
1011 | -}
1012 | getOffset : Parser Int
1013 | getOffset =
1014 |   A.getOffset
1015 | 
1016 | 
1017 | {-| Get the full string that is being parsed. You could use this to define
1018 | `getChompedString` or `mapChompedString` if you wanted:
1019 | 
1020 |     getChompedString : Parser a -> Parser String
1021 |     getChompedString parser =
1022 |       succeed String.slice
1023 |         |= getOffset
1024 |         |. parser
1025 |         |= getOffset
1026 |         |= getSource
1027 | -}
1028 | getSource : Parser String
1029 | getSource =
1030 |   A.getSource
1031 | 
1032 | 
1033 | 
1034 | -- VARIABLES
1035 | 
1036 | 
1037 | {-| Create a parser for variables. If we wanted to parse type variables in Elm,
1038 | we could try something like this:
1039 | 
1040 |     import Char
1041 |     import Parser exposing (..)
1042 |     import Set
1043 | 
1044 |     typeVar : Parser String
1045 |     typeVar =
1046 |       variable
1047 |         { start = Char.isLower
1048 |         , inner = \c -> Char.isAlphaNum c || c == '_'
1049 |         , reserved = Set.fromList [ "let", "in", "case", "of" ]
1050 |         }
1051 | 
1052 | This is saying it _must_ start with a lower-case character. After that,
1053 | characters can be letters, numbers, or underscores. It is also saying that if
1054 | you run into any of these reserved names, it is definitely not a variable.
1055 | -}
1056 | variable :
1057 |   { start : Char -> Bool
1058 |   , inner : Char -> Bool
1059 |   , reserved : Set.Set String
1060 |   }
1061 |   -> Parser String
1062 | variable i =
1063 |   A.variable
1064 |     { start = i.start
1065 |     , inner = i.inner
1066 |     , reserved = i.reserved
1067 |     , expecting = ExpectingVariable
1068 |     }
1069 | 
1070 | 
1071 | 
1072 | -- SEQUENCES
1073 | 
1074 | 
1075 | {-| Handle things like lists and records, but you can customize the details
1076 | however you need. Say you want to parse C-style code blocks:
1077 | 
1078 |     import Parser exposing (Parser, Trailing(..))
1079 | 
1080 |     block : Parser (List Stmt)
1081 |     block =
1082 |       Parser.sequence
1083 |         { start = "{"
1084 |         , separator = ";"
1085 |         , end = "}"
1086 |         , spaces = spaces
1087 |         , item = statement
1088 |         , trailing = Mandatory -- demand a trailing semi-colon
1089 |         }
1090 | 
1091 |     -- statement : Parser Stmt
1092 | 
1093 | **Note:** If you need something more custom, do not be afraid to check
1094 | out the implementation and customize it for your case. It is better to
1095 | get nice error messages with a lower-level implementation than to try
1096 | to hack high-level parsers to do things they are not made for.
1097 | -}
1098 | sequence
1099 |   : { start : String
1100 |     , separator : String
1101 |     , end : String
1102 |     , spaces : Parser ()
1103 |     , item : Parser a
1104 |     , trailing : Trailing
1105 |     }
1106 |   -> Parser (List a)
1107 | sequence i =
1108 |   A.sequence
1109 |     { start = toToken i.start
1110 |     , separator = toToken i.separator
1111 |     , end = toToken i.end
1112 |     , spaces = i.spaces
1113 |     , item = i.item
1114 |     , trailing = toAdvancedTrailing i.trailing
1115 |     }
1116 | 
1117 | 
1118 | {-| What’s the deal with trailing commas? Are they `Forbidden`?
1119 | Are they `Optional`? Are they `Mandatory`? Welcome to [shapes
1120 | club](https://poorlydrawnlines.com/comic/shapes-club/)!
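     | 
     | As a rough sketch of how the three options differ, here is a list-of-ints
     | parser that takes the `Trailing` setting as an argument. The name `intList`
     | is made up for this example; everything else comes from this module.
     | 
     | ```elm
     | import Parser exposing (Parser, Trailing(..), int, sequence, spaces)
     | 
     | -- Hypothetical helper: same list syntax, different trailing-comma rule.
     | intList : Trailing -> Parser (List Int)
     | intList trailing =
     |   sequence
     |     { start = "["
     |     , separator = ","
     |     , end = "]"
     |     , spaces = spaces
     |     , item = int
     |     , trailing = trailing
     |     }
     | 
     | -- intList Forbidden  should accept "[1,2,3]"  but not "[1,2,3,]"
     | -- intList Optional   should accept "[1,2,3]"  and     "[1,2,3,]"
     | -- intList Mandatory  should accept "[1,2,3,]" but not "[1,2,3]"
     | ```
     | 
     | An empty `"[]"` should be fine with all three settings, since there is no
     | item for a separator to trail.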
1121 | -}
1122 | type Trailing = Forbidden | Optional | Mandatory
1123 | 
1124 | 
1125 | toAdvancedTrailing : Trailing -> A.Trailing
1126 | toAdvancedTrailing trailing =
1127 |   case trailing of
1128 |     Forbidden -> A.Forbidden
1129 |     Optional -> A.Optional
1130 |     Mandatory -> A.Mandatory
1131 | 
1132 | 
1133 | 
1134 | -- WHITESPACE
1135 | 
1136 | 
1137 | {-| Parse zero or more `' '`, `'\n'`, and `'\r'` characters.
1138 | 
1139 | The implementation is pretty simple:
1140 | 
1141 |     spaces : Parser ()
1142 |     spaces =
1143 |       chompWhile (\c -> c == ' ' || c == '\n' || c == '\r')
1144 | 
1145 | So if you need something different (like tabs) just define an alternative with
1146 | the necessary tweaks! Check out [`lineComment`](#lineComment) and
1147 | [`multiComment`](#multiComment) for more complex situations.
1148 | -}
1149 | spaces : Parser ()
1150 | spaces =
1151 |   A.spaces
1152 | 
1153 | 
1154 | {-| Parse single-line comments:
1155 | 
1156 |     elm : Parser ()
1157 |     elm =
1158 |       lineComment "--"
1159 | 
1160 |     js : Parser ()
1161 |     js =
1162 |       lineComment "//"
1163 | 
1164 |     python : Parser ()
1165 |     python =
1166 |       lineComment "#"
1167 | 
1168 | This parser is defined like this:
1169 | 
1170 |     lineComment : String -> Parser ()
1171 |     lineComment str =
1172 |       symbol str
1173 |         |. chompUntilEndOr "\n"
1174 | 
1175 | So it will consume the remainder of the line. If the file ends before you see
1176 | a newline, that is fine too.
1177 | -}
1178 | lineComment : String -> Parser ()
1179 | lineComment str =
1180 |   A.lineComment (toToken str)
1181 | 
1182 | 
1183 | {-| Parse multi-line comments. So if you wanted to parse Elm whitespace or
1184 | JS whitespace, you could say:
1185 | 
1186 |     elm : Parser ()
1187 |     elm =
1188 |       loop 0 <| ifProgress <|
1189 |         oneOf
1190 |           [ lineComment "--"
1191 |           , multiComment "{-" "-}" Nestable
1192 |           , spaces
1193 |           ]
1194 | 
1195 |     js : Parser ()
1196 |     js =
1197 |       loop 0 <| ifProgress <|
1198 |         oneOf
1199 |           [ lineComment "//"
1200 |           , multiComment "/*" "*/" NotNestable
1201 |           , chompWhile (\c -> c == ' ' || c == '\n' || c == '\r' || c == '\t')
1202 |           ]
1203 | 
1204 |     ifProgress : Parser a -> Int -> Parser (Step Int ())
1205 |     ifProgress parser offset =
1206 |       succeed identity
1207 |         |. parser
1208 |         |= getOffset
1209 |         |> map (\newOffset -> if offset == newOffset then Done () else Loop newOffset)
1210 | 
1211 | **Note:** The fact that `spaces` comes last in the definition of `elm` is very
1212 | important! It can succeed without consuming any characters, so if it were the
1213 | first option, it would always succeed and bypass the others! (Same is true of
1214 | `chompWhile` in `js`.) This possibility of success without consumption is also
1215 | why we need the `ifProgress` helper. It detects if there is no more whitespace
1216 | to consume.
1217 | -}
1218 | multiComment : String -> String -> Nestable -> Parser ()
1219 | multiComment open close nestable =
1220 |   A.multiComment (toToken open) (toToken close) (toAdvancedNestable nestable)
1221 | 
1222 | 
1223 | {-| Not all languages handle multi-line comments the same. Multi-line comments
1224 | in C-style syntax are `NotNestable`, meaning they can be implemented like this:
1225 | 
1226 |     js : Parser ()
1227 |     js =
1228 |       symbol "/*"
1229 |         |. chompUntil "*/"
1230 | 
1231 | In fact, `multiComment "/*" "*/" NotNestable` *is* implemented like that! It is
1232 | very simple, but it does not allow you to nest comments like this:
1233 | 
1234 | ```javascript
1235 | /*
1236 | line1
1237 | /* line2 */
1238 | line3
1239 | */
1240 | ```
1241 | 
1242 | It would stop on the first `*/`, eventually throwing a syntax error on the
1243 | second `*/`. This can be pretty annoying in long files.
1244 | 
1245 | Languages like Elm allow you to nest multi-line comments, but your parser needs
1246 | to be a bit fancier to handle this. After you start a comment, you have to
1247 | detect if there is another one inside it! And then you have to make sure all
1248 | the `{-` and `-}` match up properly! Saying `multiComment "{-" "-}" Nestable`
1249 | does all that for you.
1250 | -}
1251 | type Nestable = NotNestable | Nestable
1252 | 
1253 | 
1254 | toAdvancedNestable : Nestable -> A.Nestable
1255 | toAdvancedNestable nestable =
1256 |   case nestable of
1257 |     NotNestable -> A.NotNestable
1258 |     Nestable -> A.Nestable
1259 | 


--------------------------------------------------------------------------------
/src/Parser/Advanced.elm:
--------------------------------------------------------------------------------
   1 | module Parser.Advanced exposing
   2 |   ( Parser, run, DeadEnd, inContext, Token(..)
   3 |   , int, float, number, symbol, keyword, variable, end
   4 |   , succeed, (|=), (|.), lazy, andThen, problem
   5 |   , oneOf, map, backtrackable, commit, token
   6 |   , sequence, Trailing(..), loop, Step(..)
   7 |   , spaces, lineComment, multiComment, Nestable(..)
   8 |   , getChompedString, chompIf, chompWhile, chompUntil, chompUntilEndOr, mapChompedString
   9 |   , withIndent, getIndent
  10 |   , getPosition, getRow, getCol, getOffset, getSource
  11 |   )
  12 | 
  13 | 
  14 | {-|
  15 | 
  16 | # Parsers
  17 | @docs Parser, run, DeadEnd, inContext, Token
  18 | 
  19 | * * *
  20 | **Everything past here works just like in the
  21 | [`Parser`](/packages/elm/parser/latest/Parser) module, except that `String`
  22 | arguments become `Token` arguments, and you need to provide a `Problem` for
  23 | certain scenarios.**
  24 | * * *
  25 | 
  26 | # Building Blocks
  27 | @docs int, float, number, symbol, keyword, variable, end
  28 | 
  29 | # Pipelines
  30 | @docs succeed, (|=), (|.), lazy, andThen, problem
  31 | 
  32 | # Branches
  33 | @docs oneOf, map, backtrackable, commit, token
  34 | 
  35 | # Loops
  36 | @docs sequence, Trailing, loop, Step
  37 | 
  38 | # Whitespace
  39 | @docs spaces, lineComment, multiComment, Nestable
  40 | 
  41 | # Chompers
  42 | @docs getChompedString, chompIf, chompWhile, chompUntil, chompUntilEndOr, mapChompedString
  43 | 
  44 | # Indentation
  45 | @docs withIndent, getIndent
  46 | 
  47 | # Positions
  48 | @docs getPosition, getRow, getCol, getOffset, getSource
  49 | -}
  50 | 
  51 | 
  52 | import Char
  53 | import Elm.Kernel.Parser
  54 | import Set
  55 | 
  56 | 
  57 | 
  58 | -- INFIX OPERATORS
  59 | 
  60 | 
  61 | infix left 5 (|=) = keeper
  62 | infix left 6 (|.) = ignorer
  63 | 
  64 | 
  65 | {- NOTE: the (|.) operator binds tighter to slightly reduce the amount
  66 | of recursion in pipelines. For example:
  67 | 
  68 |     func
  69 |       |. a
  70 |       |. b
  71 |       |= c
  72 |       |. d
  73 |       |. e
  74 | 
  75 | With the same precedence:
  76 | 
  77 |     (ignorer (ignorer (keeper (ignorer (ignorer func a) b) c) d) e)
  78 | 
  79 | With higher precedence:
  80 | 
  81 |     keeper (ignorer (ignorer func a) b) (ignorer (ignorer c d) e)
  82 | 
  83 | So the maximum call depth goes from 5 to 3.
  84 | -}
  85 | 
  86 | 
  87 | 
  88 | -- PARSERS
  89 | 
  90 | 
  91 | {-| An advanced `Parser` gives two ways to improve your error messages:
  92 | 
  93 | - `problem` &mdash; Instead of all errors being a `String`, you can create a
  94 | custom type like `type Problem = BadIndent | BadKeyword String` and track
  95 | problems much more precisely.
  96 | - `context` &mdash; Error messages can be further improved when precise
  97 | problems are paired with information about where you ran into trouble. By
  98 | tracking the context, instead of saying “I found a bad keyword” you can say
  99 | “I found a bad keyword when parsing a list” and give folks a better idea of
 100 | what the parser thinks it is doing.
 101 | 
 102 | I recommend starting with the simpler [`Parser`][parser] module though, and
 103 | when you feel comfortable and want better error messages, you can create a type
 104 | alias like this:
 105 | 
 106 | ```elm
 107 | import Parser.Advanced
 108 | 
 109 | type alias MyParser a =
 110 |   Parser.Advanced.Parser Context Problem a
 111 | 
 112 | type Context = Definition String | List | Record
 113 | 
 114 | type Problem = BadIndent | BadKeyword String
 115 | ```
 116 | 
 117 | All of the functions from `Parser` should exist in `Parser.Advanced` in some
 118 | form, allowing you to switch over pretty easily.
 119 | 
 120 | [parser]: /packages/elm/parser/latest/Parser
 121 | -}
 122 | type Parser context problem value =
 123 |   Parser (State context -> PStep context problem value)
 124 | 
 125 | 
 126 | type PStep context problem value
 127 |   = Good Bool value (State context)
 128 |   | Bad Bool (Bag context problem)
 129 | 
 130 | 
 131 | type alias State context =
 132 |   { src : String
 133 |   , offset : Int
 134 |   , indent : Int
 135 |   , context : List (Located context)
 136 |   , row : Int
 137 |   , col : Int
 138 |   }
 139 | 
 140 | 
 141 | type alias Located context =
 142 |   { row : Int
 143 |   , col : Int
 144 |   , context : context
 145 |   }
 146 | 
 147 | 
 148 | 
 149 | -- RUN
 150 | 
 151 | 
 152 | {-| This works just like [`Parser.run`](/packages/elm/parser/latest/Parser#run).
 153 | The only difference is that when it fails, it has much more precise information
 154 | for each dead end.
 155 | -}
 156 | run : Parser c x a -> String -> Result (List (DeadEnd c x)) a
 157 | run (Parser parse) src =
 158 |   case parse { src = src, offset = 0, indent = 1, context = [], row = 1, col = 1} of
 159 |     Good _ value _ ->
 160 |       Ok value
 161 | 
 162 |     Bad _ bag ->
 163 |       Err (bagToList bag [])
 164 | 
 165 | 
 166 | 
 167 | -- PROBLEMS
 168 | 
 169 | 
 170 | {-| Say you are parsing a function named `viewHealthData` that contains a list.
 171 | You might get a `DeadEnd` like this:
 172 | 
 173 | ```elm
 174 | { row = 18
 175 | , col = 22
 176 | , problem = UnexpectedComma
 177 | , contextStack =
 178 |     [ { row = 14
 179 |       , col = 1
 180 |       , context = Definition "viewHealthData"
 181 |       }
 182 |     , { row = 15
 183 |       , col = 4
 184 |       , context = List
 185 |       }
 186 |     ]
 187 | }
 188 | ```
 189 | 
 190 | We have a ton of information here! So in the error message, we can say that “I
 191 | ran into an issue when parsing a list in the definition of `viewHealthData`. It
 192 | looks like there is an extra comma.” Or maybe something even better!
 193 | 
 194 | Furthermore, many parsers just put a mark where the problem manifested. By
 195 | tracking the `row` and `col` of the context, we can show a much larger region
 196 | as a way of indicating “I thought I was parsing this thing that starts over
 197 | here.” Otherwise you can get very confusing error messages on a missing `]` or
 198 | `}` or `)` because “I need more indentation” on something unrelated.
 199 | 
 200 | **Note:** Rows and columns are counted like a text editor. The beginning is `row=1`
 201 | and `col=1`. The `col` increments as characters are chomped. When a `\n` is chomped,
 202 | `row` is incremented and `col` starts over again at `1`.
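     | 
     | As a starting point, a `DeadEnd` can be flattened into a one-line message
     | roughly like this. `deadEndToString` is a hypothetical helper, not part of
     | this module, and the two `toString` functions are assumed to be supplied by
     | you for your own `Context` and `Problem` types.
     | 
     | ```elm
     | import Parser.Advanced exposing (DeadEnd)
     | 
     | -- Hypothetical helper: render the problem, its position, and the contexts.
     | deadEndToString : (c -> String) -> (x -> String) -> DeadEnd c x -> String
     | deadEndToString contextToString problemToString deadEnd =
     |   let
     |     position =
     |       " at row " ++ String.fromInt deadEnd.row
     |         ++ ", col " ++ String.fromInt deadEnd.col
     | 
     |     -- Contexts are listed in whatever order the stack stores them.
     |     contexts =
     |       deadEnd.contextStack
     |         |> List.map (\frame -> contextToString frame.context)
     |         |> String.join ", "
     |   in
     |   if String.isEmpty contexts then
     |     problemToString deadEnd.problem ++ position
     |   else
     |     problemToString deadEnd.problem ++ position
     |       ++ " (while parsing: " ++ contexts ++ ")"
     | ```
     | 
     | A real error message would probably also use the `row` and `col` of each
     | context frame to show the region the parser thought it was in.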
 203 | -}
 204 | type alias DeadEnd context problem =
 205 |   { row : Int
 206 |   , col : Int
 207 |   , problem : problem
 208 |   , contextStack : List { row : Int, col : Int, context : context }
 209 |   }
 210 | 
 211 | 
 212 | type Bag c x
 213 |   = Empty
 214 |   | AddRight (Bag c x) (DeadEnd c x)
 215 |   | Append (Bag c x) (Bag c x)
 216 | 
 217 | 
 218 | fromState : State c -> x -> Bag c x
 219 | fromState s x =
 220 |   AddRight Empty (DeadEnd s.row s.col x s.context)
 221 | 
 222 | 
 223 | fromInfo : Int -> Int -> x -> List (Located c) -> Bag c x
 224 | fromInfo row col x context =
 225 |   AddRight Empty (DeadEnd row col x context)
 226 | 
 227 | 
 228 | bagToList : Bag c x -> List (DeadEnd c x) -> List (DeadEnd c x)
 229 | bagToList bag list =
 230 |   case bag of
 231 |     Empty ->
 232 |       list
 233 | 
 234 |     AddRight bag1 x ->
 235 |       bagToList bag1 (x :: list)
 236 | 
 237 |     Append bag1 bag2 ->
 238 |       bagToList bag1 (bagToList bag2 list)
 239 | 
 240 | 
 241 | 
 242 | -- PRIMITIVES
 243 | 
 244 | 
 245 | {-| Just like [`Parser.succeed`](Parser#succeed)
 246 | -}
 247 | succeed : a -> Parser c x a
 248 | succeed a =
 249 |   Parser <| \s ->
 250 |     Good False a s
 251 | 
 252 | 
 253 | {-| Just like [`Parser.problem`](Parser#problem) except you provide a custom
 254 | type for your problem.
 255 | -}
 256 | problem : x -> Parser c x a
 257 | problem x =
 258 |   Parser <| \s ->
 259 |     Bad False (fromState s x)
 260 | 
 261 | 
 262 | 
 263 | -- MAPPING
 264 | 
 265 | 
 266 | {-| Just like [`Parser.map`](Parser#map)
 267 | -}
 268 | map : (a -> b) -> Parser c x a -> Parser c x b
 269 | map func (Parser parse) =
 270 |   Parser <| \s0 ->
 271 |     case parse s0 of
 272 |       Good p a s1 ->
 273 |         Good p (func a) s1
 274 | 
 275 |       Bad p x ->
 276 |         Bad p x
 277 | 
 278 | 
 279 | map2 : (a -> b -> value) -> Parser c x a -> Parser c x b -> Parser c x value
 280 | map2 func (Parser parseA) (Parser parseB) =
 281 |   Parser <| \s0 ->
 282 |     case parseA s0 of
 283 |       Bad p x ->
 284 |         Bad p x
 285 | 
 286 |       Good p1 a s1 ->
 287 |         case parseB s1 of
 288 |           Bad p2 x ->
 289 |             Bad (p1 || p2) x
 290 | 
 291 |           Good p2 b s2 ->
 292 |             Good (p1 || p2) (func a b) s2
 293 | 
 294 | 
 295 | {-| Just like the [`(|=)`](Parser#|=) from the `Parser` module.
 296 | -}
 297 | keeper : Parser c x (a -> b) -> Parser c x a -> Parser c x b
 298 | keeper parseFunc parseArg =
 299 |   map2 (<|) parseFunc parseArg
 300 | 
 301 | 
 302 | {-| Just like the [`(|.)`](Parser#|.) from the `Parser` module.
 303 | -}
 304 | ignorer : Parser c x keep -> Parser c x ignore -> Parser c x keep
 305 | ignorer keepParser ignoreParser =
 306 |   map2 always keepParser ignoreParser
 307 | 
 308 | 
 309 | 
 310 | -- AND THEN
 311 | 
 312 | 
 313 | {-| Just like [`Parser.andThen`](Parser#andThen)
 314 | -}
 315 | andThen : (a -> Parser c x b) -> Parser c x a -> Parser c x b
 316 | andThen callback (Parser parseA) =
 317 |   Parser <| \s0 ->
 318 |     case parseA s0 of
 319 |       Bad p x ->
 320 |         Bad p x
 321 | 
 322 |       Good p1 a s1 ->
 323 |         let
 324 |           (Parser parseB) =
 325 |             callback a
 326 |         in
 327 |         case parseB s1 of
 328 |           Bad p2 x ->
 329 |             Bad (p1 || p2) x
 330 | 
 331 |           Good p2 b s2 ->
 332 |             Good (p1 || p2) b s2
 333 | 
 334 | 
 335 | 
 336 | -- LAZY
 337 | 
 338 | 
 339 | {-| Just like [`Parser.lazy`](Parser#lazy)
 340 | -}
 341 | lazy : (() -> Parser c x a) -> Parser c x a
 342 | lazy thunk =
 343 |   Parser <| \s ->
 344 |     let
 345 |       (Parser parse) =
 346 |         thunk ()
 347 |     in
 348 |     parse s
 349 | 
 350 | 
 351 | 
 352 | -- ONE OF
 353 | 
 354 | 
 355 | {-| Just like [`Parser.oneOf`](Parser#oneOf)
 356 | -}
 357 | oneOf : List (Parser c x a) -> Parser c x a
 358 | oneOf parsers =
 359 |   Parser <| \s -> oneOfHelp s Empty parsers
 360 | 
 361 | 
 362 | oneOfHelp : State c -> Bag c x -> List (Parser c x a) -> PStep c x a
 363 | oneOfHelp s0 bag parsers =
 364 |   case parsers of
 365 |     [] ->
 366 |       Bad False bag
 367 | 
 368 |     Parser parse :: remainingParsers ->
 369 |       case parse s0 of
 370 |         Good _ _ _ as step ->
 371 |           step
 372 | 
 373 |         Bad p x as step ->
 374 |           if p then
 375 |             step
 376 |           else
 377 |             oneOfHelp s0 (Append bag x) remainingParsers
 378 | 
 379 | 
 380 | 
 381 | -- LOOP
 382 | 
 383 | 
 384 | {-| Just like [`Parser.Step`](Parser#Step)
 385 | -}
 386 | type Step state a
 387 |   = Loop state
 388 |   | Done a
 389 | 
 390 | 
 391 | {-| Just like [`Parser.loop`](Parser#loop)
 392 | -}
 393 | loop : state -> (state -> Parser c x (Step state a)) -> Parser c x a
 394 | loop state callback =
 395 |   Parser <| \s ->
 396 |     loopHelp False state callback s
 397 | 
 398 | 
 399 | loopHelp : Bool -> state -> (state -> Parser c x (Step state a)) -> State c -> PStep c x a
 400 | loopHelp p state callback s0 =
 401 |   let
 402 |     (Parser parse) =
 403 |       callback state
 404 |   in
 405 |   case parse s0 of
 406 |     Good p1 step s1 ->
 407 |       case step of
 408 |         Loop newState ->
 409 |           loopHelp (p || p1) newState callback s1
 410 | 
 411 |         Done result ->
 412 |           Good (p || p1) result s1
 413 | 
 414 |     Bad p1 x ->
 415 |       Bad (p || p1) x
 416 | 
 417 | 
 418 | 
 419 | -- BACKTRACKABLE
 420 | 
 421 | 
 422 | {-| Just like [`Parser.backtrackable`](Parser#backtrackable)
 423 | -}
 424 | backtrackable : Parser c x a -> Parser c x a
 425 | backtrackable (Parser parse) =
 426 |   Parser <| \s0 ->
 427 |     case parse s0 of
 428 |       Bad _ x ->
 429 |         Bad False x
 430 | 
 431 |       Good _ a s1 ->
 432 |         Good False a s1
 433 | 
 434 | 
 435 | {-| Just like [`Parser.commit`](Parser#commit)
 436 | -}
 437 | commit : a -> Parser c x a
 438 | commit a =
 439 |   Parser <| \s -> Good True a s
 440 | 
 441 | 
 442 | 
 443 | -- SYMBOL
 444 | 
 445 | 
 446 | {-| Just like [`Parser.symbol`](Parser#symbol) except you provide a `Token` to
 447 | clearly indicate your custom type of problems:
 448 | 
 449 |     comma : Parser Context Problem ()
 450 |     comma =
 451 |       symbol (Token "," ExpectingComma)
 452 | 
 453 | -}
 454 | symbol : Token x -> Parser c x ()
 455 | symbol =
 456 |   token
 457 | 
 458 | 
 459 | 
 460 | -- KEYWORD
 461 | 
 462 | 
 463 | {-| Just like [`Parser.keyword`](Parser#keyword) except you provide a `Token`
 464 | to clearly indicate your custom type of problems:
 465 | 
 466 |     let_ : Parser Context Problem ()
 467 |     let_ =
 468 |       keyword (Token "let" ExpectingLet)
 469 | 
 470 | Note that this would fail to chomp `letter` because of the subsequent
 471 | characters. Use `token` if you do not want that last letter check.
 472 | -}
 473 | keyword : Token x -> Parser c x ()
 474 | keyword (Token kwd expecting) =
 475 |   let
 476 |     progress =
 477 |       not (String.isEmpty kwd)
 478 |   in
 479 |   Parser <| \s ->
 480 |     let
 481 |       (newOffset, newRow, newCol) =
 482 |         isSubString kwd s.offset s.row s.col s.src
 483 |     in
 484 |     if newOffset == -1 || 0 <= isSubChar (\c -> Char.isAlphaNum c || c == '_') newOffset s.src then
 485 |       Bad False (fromState s expecting)
 486 |     else
 487 |       Good progress ()
 488 |         { src = s.src
 489 |         , offset = newOffset
 490 |         , indent = s.indent
 491 |         , context = s.context
 492 |         , row = newRow
 493 |         , col = newCol
 494 |         }
 495 | 
 496 | 
 497 | 
 498 | -- TOKEN
 499 | 
 500 | 
 501 | {-| With the simpler `Parser` module, you could just say `symbol ","` and
 502 | parse all the commas you wanted. But now that we have a custom type for our
 503 | problems, we actually have to specify that as well. So anywhere you just used
 504 | a `String` in the simpler module, you now use a `Token Problem` in the advanced
 505 | module:
 506 | 
 507 |     type Problem
 508 |       = ExpectingComma
 509 |       | ExpectingListEnd
 510 | 
 511 |     comma : Token Problem
 512 |     comma =
 513 |       Token "," ExpectingComma
 514 | 
 515 |     listEnd : Token Problem
 516 |     listEnd =
 517 |       Token "]" ExpectingListEnd
 518 | 
 519 | You can be creative with your custom type. Maybe you want a lot of detail.
 520 | Maybe you want looser categories. It is a custom type. Do what makes sense for
 521 | you!
 522 | -}
 523 | type Token x = Token String x
 524 | 
 525 | 
 526 | {-| Just like [`Parser.token`](Parser#token) except you provide a `Token`
 527 | specifying your custom type of problems.
 528 | -}
 529 | token : Token x -> Parser c x ()
 530 | token (Token str expecting) =
 531 |   let
 532 |     progress =
 533 |       not (String.isEmpty str)
 534 |   in
 535 |   Parser <| \s ->
 536 |     let
 537 |       (newOffset, newRow, newCol) =
 538 |         isSubString str s.offset s.row s.col s.src
 539 |     in
 540 |     if newOffset == -1 then
 541 |       Bad False (fromState s expecting)
 542 |     else
 543 |       Good progress ()
 544 |         { src = s.src
 545 |         , offset = newOffset
 546 |         , indent = s.indent
 547 |         , context = s.context
 548 |         , row = newRow
 549 |         , col = newCol
 550 |         }
 551 | 
 552 | 
 553 | 
 554 | -- INT
 555 | 
 556 | 
 557 | {-| Just like [`Parser.int`](Parser#int) where you have to handle negation
 558 | yourself. The only difference is that you provide two potential problems:
 559 | 
 560 |     int : x -> x -> Parser c x Int
 561 |     int expecting invalid =
 562 |       number
 563 |         { int = Ok identity
 564 |         , hex = Err invalid
 565 |         , octal = Err invalid
 566 |         , binary = Err invalid
 567 |         , float = Err invalid
 568 |         , invalid = invalid
 569 |         , expecting = expecting
 570 |         }
 571 | 
 572 | You can use problems like `ExpectingInt` and `InvalidNumber`.
 573 | -}
 574 | int : x -> x -> Parser c x Int
 575 | int expecting invalid =
 576 |   number
 577 |     { int = Ok identity
 578 |     , hex = Err invalid
 579 |     , octal = Err invalid
 580 |     , binary = Err invalid
 581 |     , float = Err invalid
 582 |     , invalid = invalid
 583 |     , expecting = expecting
 584 |     }
 585 | 
 586 | 
 587 | 
 588 | -- FLOAT
 589 | 
 590 | 
 591 | {-| Just like [`Parser.float`](Parser#float) where you have to handle negation
 592 | yourself. The only difference is that you provide two potential problems:
 593 | 
 594 |     float : x -> x -> Parser c x Float
 595 |     float expecting invalid =
 596 |       number
 597 |         { int = Ok toFloat
 598 |         , hex = Err invalid
 599 |         , octal = Err invalid
 600 |         , binary = Err invalid
 601 |         , float = Ok identity
 602 |         , invalid = invalid
 603 |         , expecting = expecting
 604 |         }
 605 | 
 606 | You can use problems like `ExpectingFloat` and `InvalidNumber`.
 607 | -}
 608 | float : x -> x -> Parser c x Float
 609 | float expecting invalid =
 610 |   number
 611 |     { int = Ok toFloat
 612 |     , hex = Err invalid
 613 |     , octal = Err invalid
 614 |     , binary = Err invalid
 615 |     , float = Ok identity
 616 |     , invalid = invalid
 617 |     , expecting = expecting
 618 |     }
 619 | 
 620 | 
 621 | 
 622 | -- NUMBER
 623 | 
 624 | 
 625 | {-| Just like [`Parser.number`](Parser#number) where you have to handle
 626 | negation yourself. The only difference is that you provide all the potential
 627 | problems.
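     | 
     | For instance, a parser that accepts decimal and hexadecimal integers but
     | rejects everything else might look something like this. The `Problem` type
     | and the name `intOrHex` are assumptions made up for this sketch.
     | 
     | ```elm
     | import Parser.Advanced exposing (Parser, number)
     | 
     | -- Hypothetical problem type for this example.
     | type Problem
     |   = ExpectingNumber
     |   | InvalidNumber
     | 
     | intOrHex : Parser c Problem Int
     | intOrHex =
     |   number
     |     { int = Ok identity              -- allow 42
     |     , hex = Ok identity              -- allow 0x2A
     |     , octal = Err InvalidNumber      -- reject 0o52
     |     , binary = Err InvalidNumber     -- reject 0b101010
     |     , float = Err InvalidNumber      -- reject 4.2
     |     , invalid = InvalidNumber
     |     , expecting = ExpectingNumber
     |     }
     | ```
     | 
     | Each `Ok` field says how to turn that kind of literal into your result type,
     | and each `Err` field says which problem to report if that kind shows up.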
 628 | -}
 629 | number
 630 |   : { int : Result x (Int -> a)
 631 |     , hex : Result x (Int -> a)
 632 |     , octal : Result x (Int -> a)
 633 |     , binary : Result x (Int -> a)
 634 |     , float : Result x (Float -> a)
 635 |     , invalid : x
 636 |     , expecting : x
 637 |     }
 638 |   -> Parser c x a
 639 | number c =
 640 |   Parser <| \s ->
 641 |     if isAsciiCode 0x30 {- 0 -} s.offset s.src then
 642 |       let
 643 |         zeroOffset = s.offset + 1
 644 |         baseOffset = zeroOffset + 1
 645 |       in
 646 |       if isAsciiCode 0x78 {- x -} zeroOffset s.src then
 647 |         finalizeInt c.invalid c.hex baseOffset (consumeBase16 baseOffset s.src) s
 648 |       else if isAsciiCode 0x6F {- o -} zeroOffset s.src then
 649 |         finalizeInt c.invalid c.octal baseOffset (consumeBase 8 baseOffset s.src) s
 650 |       else if isAsciiCode 0x62 {- b -} zeroOffset s.src then
 651 |         finalizeInt c.invalid c.binary baseOffset (consumeBase 2 baseOffset s.src) s
 652 |       else
 653 |         finalizeFloat c.invalid c.expecting c.int c.float (zeroOffset, 0) s
 654 | 
 655 |     else
 656 |       finalizeFloat c.invalid c.expecting c.int c.float (consumeBase 10 s.offset s.src) s
 657 | 
 658 | 
 659 | consumeBase : Int -> Int -> String -> (Int, Int)
 660 | consumeBase =
 661 |   Elm.Kernel.Parser.consumeBase
 662 | 
 663 | 
 664 | consumeBase16 : Int -> String -> (Int, Int)
 665 | consumeBase16 =
 666 |   Elm.Kernel.Parser.consumeBase16
 667 | 
 668 | 
 669 | finalizeInt : x -> Result x (Int -> a) -> Int -> (Int, Int) -> State c -> PStep c x a
 670 | finalizeInt invalid handler startOffset (endOffset, n) s =
 671 |   case handler of
 672 |     Err x ->
 673 |       Bad True (fromState s x)
 674 | 
 675 |     Ok toValue ->
 676 |       if startOffset == endOffset
 677 |         then Bad (s.offset < startOffset) (fromState s invalid)
 678 |         else Good True (toValue n) (bumpOffset endOffset s)
 679 | 
 680 | 
 681 | bumpOffset : Int -> State c -> State c
 682 | bumpOffset newOffset s =
 683 |   { src = s.src
 684 |   , offset = newOffset
 685 |   , indent = s.indent
 686 |   , context = s.context
 687 |   , row = s.row
 688 |   , col = s.col + (newOffset - s.offset)
 689 |   }
 690 | 
 691 | 
 692 | finalizeFloat : x -> x -> Result x (Int -> a) -> Result x (Float -> a) -> (Int, Int) -> State c -> PStep c x a
 693 | finalizeFloat invalid expecting intSettings floatSettings intPair s =
 694 |   let
 695 |     intOffset = Tuple.first intPair
 696 |     floatOffset = consumeDotAndExp intOffset s.src
 697 |   in
 698 |   if floatOffset < 0 then
 699 |     Bad True (fromInfo s.row (s.col - (floatOffset + s.offset)) invalid s.context)
 700 | 
 701 |   else if s.offset == floatOffset then
 702 |     Bad False (fromState s expecting)
 703 | 
 704 |   else if intOffset == floatOffset then
 705 |     finalizeInt invalid intSettings s.offset intPair s
 706 | 
 707 |   else
 708 |     case floatSettings of
 709 |       Err x ->
 710 |         Bad True (fromState s invalid)
 711 | 
 712 |       Ok toValue ->
 713 |         case String.toFloat (String.slice s.offset floatOffset s.src) of
 714 |           Nothing -> Bad True (fromState s invalid)
 715 |           Just n -> Good True (toValue n) (bumpOffset floatOffset s)
 716 | 
 717 | 
 718 | --
 719 | -- On a failure, returns negative index of problem.
 720 | --
 721 | consumeDotAndExp : Int -> String -> Int
 722 | consumeDotAndExp offset src =
 723 |   if isAsciiCode 0x2E {- . -} offset src then
 724 |     consumeExp (chompBase10 (offset + 1) src) src
 725 |   else
 726 |     consumeExp offset src
 727 | 
 728 | 
 729 | --
 730 | -- On a failure, returns negative index of problem.
 731 | --
 732 | consumeExp : Int -> String -> Int
 733 | consumeExp offset src =
 734 |   if isAsciiCode 0x65 {- e -} offset src || isAsciiCode 0x45 {- E -} offset src then
 735 |     let
 736 |       eOffset = offset + 1
 737 | 
 738 |       expOffset =
 739 |         if isAsciiCode 0x2B {- + -} eOffset src || isAsciiCode 0x2D {- - -} eOffset src then
 740 |           eOffset + 1
 741 |         else
 742 |           eOffset
 743 | 
 744 |       newOffset = chompBase10 expOffset src
 745 |     in
 746 |     if expOffset == newOffset then
 747 |       -newOffset
 748 |     else
 749 |       newOffset
 750 | 
 751 |   else
 752 |     offset
 753 | 
 754 | 
 755 | chompBase10 : Int -> String -> Int
 756 | chompBase10 =
 757 |   Elm.Kernel.Parser.chompBase10
 758 | 
 759 | 
 760 | 
 761 | -- END
 762 | 
 763 | 
 764 | {-| Just like [`Parser.end`](Parser#end) except you provide the problem that
 765 | arises when the parser is not at the end of the input.
 766 | -}
 767 | end : x -> Parser c x ()
 768 | end x =
 769 |   Parser <| \s ->
 770 |     if String.length s.src == s.offset then
 771 |       Good False () s
 772 |     else
 773 |       Bad False (fromState s x)
 774 | 
 775 | 
 776 | 
 777 | -- CHOMPED STRINGS
 778 | 
 779 | 
 780 | {-| Just like [`Parser.getChompedString`](Parser#getChompedString)
 781 | -}
 782 | getChompedString : Parser c x a -> Parser c x String
 783 | getChompedString parser =
 784 |   mapChompedString always parser
 785 | 
 786 | 
 787 | {-| Just like [`Parser.mapChompedString`](Parser#mapChompedString)
 788 | -}
 789 | mapChompedString : (String -> a -> b) -> Parser c x a -> Parser c x b
 790 | mapChompedString func (Parser parse) =
 791 |   Parser <| \s0 ->
 792 |     case parse s0 of
 793 |       Bad p x ->
 794 |         Bad p x
 795 | 
 796 |       Good p a s1 ->
 797 |         Good p (func (String.slice s0.offset s1.offset s0.src) a) s1
 798 | 
 799 | 
 800 | 
 801 | -- CHOMP IF
 802 | 
 803 | 
 804 | {-| Just like [`Parser.chompIf`](Parser#chompIf) except you provide a problem
 805 | in case a character cannot be chomped.
 806 | -}
 807 | chompIf : (Char -> Bool) -> x -> Parser c x ()
 808 | chompIf isGood expecting =
 809 |   Parser <| \s ->
 810 |     let
 811 |       newOffset = isSubChar isGood s.offset s.src
 812 |     in
 813 |     -- not found
 814 |     if newOffset == -1 then
 815 |       Bad False (fromState s expecting)
 816 | 
 817 |     -- newline
 818 |     else if newOffset == -2 then
 819 |       Good True ()
 820 |         { src = s.src
 821 |         , offset = s.offset + 1
 822 |         , indent = s.indent
 823 |         , context = s.context
 824 |         , row = s.row + 1
 825 |         , col = 1
 826 |         }
 827 | 
 828 |     -- found
 829 |     else
 830 |       Good True ()
 831 |         { src = s.src
 832 |         , offset = newOffset
 833 |         , indent = s.indent
 834 |         , context = s.context
 835 |         , row = s.row
 836 |         , col = s.col + 1
 837 |         }
 838 | 
 839 | 
 840 | 
 841 | -- CHOMP WHILE
 842 | 
 843 | 
 844 | {-| Just like [`Parser.chompWhile`](Parser#chompWhile)
 845 | -}
 846 | chompWhile : (Char -> Bool) -> Parser c x ()
 847 | chompWhile isGood =
 848 |   Parser <| \s ->
 849 |     chompWhileHelp isGood s.offset s.row s.col s
 850 | 
 851 | 
 852 | chompWhileHelp : (Char -> Bool) -> Int -> Int -> Int -> State c -> PStep c x ()
 853 | chompWhileHelp isGood offset row col s0 =
 854 |   let
 855 |     newOffset = isSubChar isGood offset s0.src
 856 |   in
 857 |   -- no match
 858 |   if newOffset == -1 then
 859 |     Good (s0.offset < offset) ()
 860 |       { src = s0.src
 861 |       , offset = offset
 862 |       , indent = s0.indent
 863 |       , context = s0.context
 864 |       , row = row
 865 |       , col = col
 866 |       }
 867 | 
 868 |   -- matched a newline
 869 |   else if newOffset == -2 then
 870 |     chompWhileHelp isGood (offset + 1) (row + 1) 1 s0
 871 | 
 872 |   -- normal match
 873 |   else
 874 |     chompWhileHelp isGood newOffset row (col + 1) s0
 875 | 
 876 | 
 877 | 
 878 | -- CHOMP UNTIL
 879 | 
 880 | 
 881 | {-| Just like [`Parser.chompUntil`](Parser#chompUntil) except you provide a
 882 | `Token` in case you chomp all the way to the end of the input without finding
 883 | what you need.
 884 | -}
 885 | chompUntil : Token x -> Parser c x ()
 886 | chompUntil (Token str expecting) =
 887 |   Parser <| \s ->
 888 |     let
 889 |       (newOffset, newRow, newCol) =
 890 |         findSubString str s.offset s.row s.col s.src
 891 |     in
 892 |     if newOffset == -1 then
 893 |       Bad False (fromInfo newRow newCol expecting s.context)
 894 | 
 895 |     else
 896 |       Good (s.offset < newOffset) ()
 897 |         { src = s.src
 898 |         , offset = newOffset
 899 |         , indent = s.indent
 900 |         , context = s.context
 901 |         , row = newRow
 902 |         , col = newCol
 903 |         }
 904 | 
 905 | 
 906 | {-| Just like [`Parser.chompUntilEndOr`](Parser#chompUntilEndOr)
 907 | -}
 908 | chompUntilEndOr : String -> Parser c x ()
 909 | chompUntilEndOr str =
 910 |   Parser <| \s ->
 911 |     let
 912 |       (newOffset, newRow, newCol) =
 913 |         Elm.Kernel.Parser.findSubString str s.offset s.row s.col s.src
 914 | 
 915 |       adjustedOffset =
 916 |         if newOffset < 0 then String.length s.src else newOffset
 917 |     in
 918 |     Good (s.offset < adjustedOffset) ()
 919 |       { src = s.src
 920 |       , offset = adjustedOffset
 921 |       , indent = s.indent
 922 |       , context = s.context
 923 |       , row = newRow
 924 |       , col = newCol
 925 |       }
 926 | 
 927 | 
 928 | 
 929 | -- CONTEXT
 930 | 
 931 | 
 932 | {-| This is how you mark that you are in a certain context. For example, here
 933 | is a rough outline of some code that uses `inContext` to mark when you are
 934 | parsing a specific definition:
 935 | 
 936 |     import Char
 937 |     import Parser.Advanced exposing (..)
 938 |     import Set
 939 | 
 940 |     type Context
 941 |       = Definition String
 942 |       | List
 943 | 
 944 |     definition : Parser Context Problem Expr
 945 |     definition =
 946 |       functionName
 947 |         |> andThen definitionBody
 948 | 
 949 |     definitionBody : String -> Parser Context Problem Expr
 950 |     definitionBody name =
 951 |       inContext (Definition name) <|
 952 |         succeed (Function name)
 953 |           |= arguments
 954 |           |. symbol (Token "=" ExpectingEquals)
 955 |           |= expression
 956 | 
 957 |     functionName : Parser c Problem String
 958 |     functionName =
 959 |       variable
 960 |         { start = Char.isLower
 961 |         , inner = Char.isAlphaNum
 962 |         , reserved = Set.fromList ["let","in"]
 963 |         , expecting = ExpectingFunctionName
 964 |         }
 965 | 
 966 | First we parse the function name, and then we parse the rest of the definition.
 967 | Importantly, we call `inContext` so that any dead end that occurs in
 968 | `definitionBody` will get this extra context information. That way you can say
 969 | things like, “I was expecting an equals sign in the `view` definition.” Context!
 970 | -}
 971 | inContext : context -> Parser context x a -> Parser context x a
 972 | inContext context (Parser parse) =
 973 |   Parser <| \s0 ->
 974 |     case parse (changeContext (Located s0.row s0.col context :: s0.context) s0) of
 975 |       Good p a s1 ->
 976 |         Good p a (changeContext s0.context s1)
 977 | 
 978 |       Bad _ _ as step ->
 979 |         step
 980 | 
 981 | 
 982 | changeContext : List (Located c) -> State c -> State c
 983 | changeContext newContext s =
 984 |   { src = s.src
 985 |   , offset = s.offset
 986 |   , indent = s.indent
 987 |   , context = newContext
 988 |   , row = s.row
 989 |   , col = s.col
 990 |   }
 991 | 
 992 | 
 993 | 
 994 | -- INDENTATION
 995 | 
 996 | 
 997 | {-| Just like [`Parser.getIndent`](Parser#getIndent)
 998 | -}
 999 | getIndent : Parser c x Int
1000 | getIndent =
1001 |   Parser <| \s -> Good False s.indent s
1002 | 
1003 | 
1004 | {-| Just like [`Parser.withIndent`](Parser#withIndent)
1005 | -}
1006 | withIndent : Int -> Parser c x a -> Parser c x a
1007 | withIndent newIndent (Parser parse) =
1008 |   Parser <| \s0 ->
1009 |     case parse (changeIndent newIndent s0) of
1010 |       Good p a s1 ->
1011 |         Good p a (changeIndent s0.indent s1)
1012 | 
1013 |       Bad p x ->
1014 |         Bad p x
1015 | 
1016 | 
1017 | changeIndent : Int -> State c -> State c
1018 | changeIndent newIndent s =
1019 |   { src = s.src
1020 |   , offset = s.offset
1021 |   , indent = newIndent
1022 |   , context = s.context
1023 |   , row = s.row
1024 |   , col = s.col
1025 |   }
1026 | 
1027 | 
1028 | 
1029 | -- POSITION
1030 | 
1031 | 
1032 | {-| Just like [`Parser.getPosition`](Parser#getPosition)
1033 | -}
1034 | getPosition : Parser c x (Int, Int)
1035 | getPosition =
1036 |   Parser <| \s -> Good False (s.row, s.col) s
1037 | 
1038 | 
1039 | {-| Just like [`Parser.getRow`](Parser#getRow)
1040 | -}
1041 | getRow : Parser c x Int
1042 | getRow =
1043 |   Parser <| \s -> Good False s.row s
1044 | 
1045 | 
1046 | {-| Just like [`Parser.getCol`](Parser#getCol)
1047 | -}
1048 | getCol : Parser c x Int
1049 | getCol =
1050 |   Parser <| \s -> Good False s.col s
1051 | 
1052 | 
1053 | {-| Just like [`Parser.getOffset`](Parser#getOffset)
1054 | -}
1055 | getOffset : Parser c x Int
1056 | getOffset =
1057 |   Parser <| \s -> Good False s.offset s
1058 | 
1059 | 
1060 | {-| Just like [`Parser.getSource`](Parser#getSource)
1061 | -}
1062 | getSource : Parser c x String
1063 | getSource =
1064 |   Parser <| \s -> Good False s.src s
1065 | 
1066 | 
1067 | 
1068 | -- LOW-LEVEL HELPERS
1069 | 
1070 | 
1071 | {-| When making a fast parser, you want to avoid allocation as much as
1072 | possible. That means you never want to mess with the source string, only
1073 | keep track of an offset into that string.
1074 | 
1075 | You use `isSubString` like this:
1076 | 
1077 |     isSubString "let" offset row col "let x = 4 in x"
1078 |         --==> ( newOffset, newRow, newCol )
1079 | 
1080 | You are looking for `"let"` at a given `offset`. On failure, the
1081 | `newOffset` is `-1`. On success, the `newOffset` is the offset just past
1082 | the match. With our `"let"` example, it would be `offset + 3`.
1083 | 
1084 | You also provide the current `row` and `col` which do not align with
1085 | `offset` in a clean way. For example, when you see a `\n` you are at
1086 | `row = row + 1` and `col = 1`. Furthermore, some UTF16 characters are
1087 | two words wide, so even if there are no newlines, `offset` and `col`
1088 | may not be equal.
1089 | -}
1090 | isSubString : String -> Int -> Int -> Int -> String -> (Int, Int, Int)
1091 | isSubString =
1092 |   Elm.Kernel.Parser.isSubString
1093 | 
1094 | 
1095 | {-| Again, when parsing, you want to allocate as little as possible.
1096 | So this function lets you say:
1097 | 
1098 |     isSubChar isSpace offset "this is the source string"
1099 |         --==> newOffset
1100 | 
1101 | The `(Char -> Bool)` argument is called a predicate.
1102 | The `newOffset` value can be a few different things:
1103 | 
1104 |   - `-1` means that the predicate failed
1105 |   - `-2` means the predicate succeeded with a `\n`
1106 |   - otherwise you will get `offset + 1` or `offset + 2`
1107 |     depending on whether the UTF16 character is one or two
1108 |     words wide.
1109 | -}
1110 | isSubChar : (Char -> Bool) -> Int -> String -> Int
1111 | isSubChar =
1112 |   Elm.Kernel.Parser.isSubChar
1113 | 
1114 | 
1115 | {-| Check an offset in the string. Is the character at that offset equal to
1116 | the given ASCII code? For example, `0x2E` is the code for a period.
1117 | -}
1118 | isAsciiCode : Int -> Int -> String -> Bool
1119 | isAsciiCode =
1120 |   Elm.Kernel.Parser.isAsciiCode
1121 | 
1122 | 
1123 | {-| Find a substring after a given offset.
1124 | 
1125 |     findSubString "42" offset row col "Is 42 the answer?"
1126 |         --==> (newOffset, newRow, newCol)
1127 | 
1128 | Assuming `row = 1` and `col = offset + 1`, with `offset = 0` we would get
1129 | `(3, 1, 4)`, and with `offset = 7` we would get `(-1, 1, 18)`.
1130 | -}
1131 | findSubString : String -> Int -> Int -> Int -> String -> (Int, Int, Int)
1132 | findSubString =
1133 |   Elm.Kernel.Parser.findSubString
1134 | 
1135 | 
1136 | 
1137 | -- VARIABLES
1138 | 
1139 | 
1140 | {-| Just like [`Parser.variable`](Parser#variable) except you specify the
1141 | problem yourself.
1142 | -}
1143 | variable :
1144 |   { start : Char -> Bool
1145 |   , inner : Char -> Bool
1146 |   , reserved : Set.Set String
1147 |   , expecting : x
1148 |   }
1149 |   -> Parser c x String
1150 | variable i =
1151 |   Parser <| \s ->
1152 |     let
1153 |       firstOffset =
1154 |         isSubChar i.start s.offset s.src
1155 |     in
1156 |     if firstOffset == -1 then
1157 |       Bad False (fromState s i.expecting)
1158 |     else
1159 |       let
1160 |         s1 =
1161 |           if firstOffset == -2 then
1162 |             varHelp i.inner (s.offset + 1) (s.row + 1) 1 s.src s.indent s.context
1163 |           else
1164 |             varHelp i.inner firstOffset s.row (s.col + 1) s.src s.indent s.context
1165 | 
1166 |         name =
1167 |           String.slice s.offset s1.offset s.src
1168 |       in
1169 |       if Set.member name i.reserved then
1170 |         Bad False (fromState s i.expecting)
1171 |       else
1172 |         Good True name s1
1173 | 
1174 | 
1175 | varHelp : (Char -> Bool) -> Int -> Int -> Int -> String -> Int -> List (Located c) -> State c
1176 | varHelp isGood offset row col src indent context =
1177 |   let
1178 |     newOffset = isSubChar isGood offset src
1179 |   in
1180 |   if newOffset == -1 then
1181 |     { src = src
1182 |     , offset = offset
1183 |     , indent = indent
1184 |     , context = context
1185 |     , row = row
1186 |     , col = col
1187 |     }
1188 | 
1189 |   else if newOffset == -2 then
1190 |     varHelp isGood (offset + 1) (row + 1) 1 src indent context
1191 | 
1192 |   else
1193 |     varHelp isGood newOffset row (col + 1) src indent context
1194 | 
1195 | 
1196 | 
1197 | -- SEQUENCES
1198 | 
1199 | 
1200 | {-| Just like [`Parser.sequence`](Parser#sequence) except with a `Token` for
1201 | the start, separator, and end. That way you can specify your custom type of
1202 | problem for when something is not found.
1203 | -}
1204 | sequence
1205 |   : { start : Token x
1206 |     , separator : Token x
1207 |     , end : Token x
1208 |     , spaces : Parser c x ()
1209 |     , item : Parser c x a
1210 |     , trailing : Trailing
1211 |     }
1212 |   -> Parser c x (List a)
1213 | sequence i =
1214 |   skip (token i.start) <|
1215 |   skip i.spaces <|
1216 |     sequenceEnd (token i.end) i.spaces i.item (token i.separator) i.trailing
1217 | 
1218 | 
1219 | {-| What’s the deal with trailing commas? Are they `Forbidden`?
1220 | Are they `Optional`? Are they `Mandatory`? Welcome to [shapes
1221 | club](https://poorlydrawnlines.com/comic/shapes-club/)!
1222 | -}
1223 | type Trailing = Forbidden | Optional | Mandatory
1224 | 
1225 | 
1226 | skip : Parser c x ignore -> Parser c x keep -> Parser c x keep
1227 | skip iParser kParser =
1228 |   map2 revAlways iParser kParser
1229 | 
1230 | 
1231 | revAlways : a -> b -> b
1232 | revAlways _ b =
1233 |   b
1234 | 
1235 | 
1236 | sequenceEnd : Parser c x () -> Parser c x () -> Parser c x a -> Parser c x () -> Trailing -> Parser c x (List a)
1237 | sequenceEnd ender ws parseItem sep trailing =
1238 |   let
1239 |     chompRest item =
1240 |       case trailing of
1241 |         Forbidden ->
1242 |           loop [item] (sequenceEndForbidden ender ws parseItem sep)
1243 | 
1244 |         Optional ->
1245 |           loop [item] (sequenceEndOptional ender ws parseItem sep)
1246 | 
1247 |         Mandatory ->
1248 |           ignorer
1249 |             ( skip ws <| skip sep <| skip ws <|
1250 |                 loop [item] (sequenceEndMandatory ws parseItem sep)
1251 |             )
1252 |             ender
1253 |   in
1254 |   oneOf
1255 |     [ parseItem |> andThen chompRest
1256 |     , ender |> map (\_ -> [])
1257 |     ]
1258 | 
1259 | 
1260 | sequenceEndForbidden : Parser c x () -> Parser c x () -> Parser c x a -> Parser c x () -> List a -> Parser c x (Step (List a) (List a))
1261 | sequenceEndForbidden ender ws parseItem sep revItems =
1262 |   let
1263 |     chompRest item =
1264 |       sequenceEndForbidden ender ws parseItem sep (item :: revItems)
1265 |   in
1266 |   skip ws <|
1267 |     oneOf
1268 |       [ skip sep <| skip ws <| map (\item -> Loop (item :: revItems)) parseItem
1269 |       , ender |> map (\_ -> Done (List.reverse revItems))
1270 |       ]
1271 | 
1272 | 
1273 | sequenceEndOptional : Parser c x () -> Parser c x () -> Parser c x a -> Parser c x () -> List a -> Parser c x (Step (List a) (List a))
1274 | sequenceEndOptional ender ws parseItem sep revItems =
1275 |   let
1276 |     parseEnd =
1277 |       map (\_ -> Done (List.reverse revItems)) ender
1278 |   in
1279 |   skip ws <|
1280 |     oneOf
1281 |       [ skip sep <| skip ws <|
1282 |           oneOf
1283 |             [ parseItem |> map (\item -> Loop (item :: revItems))
1284 |             , parseEnd
1285 |             ]
1286 |       , parseEnd
1287 |       ]
1288 | 
1289 | 
1290 | sequenceEndMandatory : Parser c x () -> Parser c x a -> Parser c x () -> List a -> Parser c x (Step (List a) (List a))
1291 | sequenceEndMandatory ws parseItem sep revItems =
1292 |   oneOf
1293 |     [ map (\item -> Loop (item :: revItems)) <|
1294 |         ignorer parseItem (ignorer ws (ignorer sep ws))
1295 |     , map (\_ -> Done (List.reverse revItems)) (succeed ())
1296 |     ]
1297 | 
1298 | 
1299 | 
1300 | -- WHITESPACE
1301 | 
1302 | 
1303 | {-| Just like [`Parser.spaces`](Parser#spaces)
1304 | -}
1305 | spaces : Parser c x ()
1306 | spaces =
1307 |   chompWhile (\c -> c == ' ' || c == '\n' || c == '\r')
1308 | 
1309 | 
1310 | {-| Just like [`Parser.lineComment`](Parser#lineComment) except you provide a
1311 | `Token` describing the starting symbol.
1312 | -}
1313 | lineComment : Token x -> Parser c x ()
1314 | lineComment start =
1315 |   ignorer (token start) (chompUntilEndOr "\n")
1316 | 
1317 | 
1318 | {-| Just like [`Parser.multiComment`](Parser#multiComment) except with a
1319 | `Token` for the open and close symbols.
1320 | -}
1321 | multiComment : Token x -> Token x -> Nestable -> Parser c x ()
1322 | multiComment open close nestable =
1323 |   case nestable of
1324 |     NotNestable ->
1325 |       ignorer (token open) (chompUntil close)
1326 | 
1327 |     Nestable ->
1328 |       nestableComment open close
1329 | 
1330 | 
1331 | {-| Works just like [`Parser.Nestable`](Parser#Nestable) to help distinguish
1332 | between unnestable `/*` `*/` comments like in JS and nestable `{-` `-}`
1333 | comments like in Elm.
1334 | -}
1335 | type Nestable = NotNestable | Nestable
1336 | 
1337 | 
1338 | nestableComment : Token x -> Token x -> Parser c x ()
1339 | nestableComment (Token oStr oX as open) (Token cStr cX as close) =
1340 |   case String.uncons oStr of
1341 |     Nothing ->
1342 |       problem oX
1343 | 
1344 |     Just (openChar, _) ->
1345 |       case String.uncons cStr of
1346 |         Nothing ->
1347 |           problem cX
1348 | 
1349 |         Just (closeChar, _) ->
1350 |           let
1351 |             isNotRelevant char =
1352 |               char /= openChar && char /= closeChar
1353 | 
1354 |             chompOpen =
1355 |               token open
1356 |           in
1357 |           ignorer chompOpen (nestableHelp isNotRelevant chompOpen (token close) cX 1)
1358 | 
1359 | 
1360 | nestableHelp : (Char -> Bool) -> Parser c x () -> Parser c x () -> x -> Int -> Parser c x ()
1361 | nestableHelp isNotRelevant open close expectingClose nestLevel =
1362 |   skip (chompWhile isNotRelevant) <|
1363 |     oneOf
1364 |       [ if nestLevel == 1 then
1365 |           close
1366 |         else
1367 |           close
1368 |             |> andThen (\_ -> nestableHelp isNotRelevant open close expectingClose (nestLevel - 1))
1369 |       , open
1370 |           |> andThen (\_ -> nestableHelp isNotRelevant open close expectingClose (nestLevel + 1))
1371 |       , chompIf isChar expectingClose
1372 |           |> andThen (\_ -> nestableHelp isNotRelevant open close expectingClose nestLevel)
1373 |       ]
1374 | 
1375 | 
1376 | isChar : Char -> Bool
1377 | isChar char =
1378 |   True
1379 | 


--------------------------------------------------------------------------------