├── .gitignore
├── LICENSE
├── README.md
├── comparison.md
├── elm.json
├── examples
│   ├── DoubleQuoteString.elm
│   ├── Math.elm
│   ├── README.md
│   └── elm.json
├── semantics.md
└── src
    ├── Elm
    │   └── Kernel
    │       └── Parser.js
    ├── Parser.elm
    └── Parser
        └── Advanced.elm
/.gitignore:
--------------------------------------------------------------------------------
1 | elm-stuff
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2017-present, Evan Czaplicki
2 | All rights reserved.
3 |
4 | Redistribution and use in source and binary forms, with or without
5 | modification, are permitted provided that the following conditions are met:
6 |
7 | * Redistributions of source code must retain the above copyright notice, this
8 | list of conditions and the following disclaimer.
9 |
10 | * Redistributions in binary form must reproduce the above copyright notice,
11 | this list of conditions and the following disclaimer in the documentation
12 | and/or other materials provided with the distribution.
13 |
14 | * Neither the name of the {organization} nor the names of its
15 | contributors may be used to endorse or promote products derived from
16 | this software without specific prior written permission.
17 |
18 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
19 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
21 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
22 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
23 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
24 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
25 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
26 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
27 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
28 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Parser
2 |
3 | Regular expressions are quite confusing and difficult to use. This library provides a coherent alternative that handles more cases and produces clearer code.
4 |
5 | The particular goals of this library are:
6 |
7 | - Make writing parsers as simple and fun as possible.
8 | - Produce excellent error messages.
9 | - Go pretty fast.
10 |
11 | This is achieved with a couple concepts that I have not seen in any other parser libraries: [parser pipelines](#parser-pipelines), [backtracking](#backtracking), and [tracking context](#tracking-context).
12 |
13 |
14 | ## Parser Pipelines
15 |
16 | To parse a 2D point like `( 3, 4 )`, you might create a `point` parser like this:
17 |
18 | ```elm
19 | import Parser exposing (Parser, (|.), (|=), succeed, symbol, float, spaces)
20 |
21 | type alias Point =
22 |   { x : Float
23 |   , y : Float
24 |   }
25 |
26 | point : Parser Point
27 | point =
28 |   succeed Point
29 |     |. symbol "("
30 |     |. spaces
31 |     |= float
32 |     |. spaces
33 |     |. symbol ","
34 |     |. spaces
35 |     |= float
36 |     |. spaces
37 |     |. symbol ")"
38 | ```
39 |
40 | All the interesting stuff is happening in `point`. It uses two operators:
41 |
42 | - [`(|.)`][ignore] means “parse this, but **ignore** the result”
43 | - [`(|=)`][keep] means “parse this, and **keep** the result”
44 |
45 | So the `Point` function only gets the result of the two `float` parsers.
46 |
47 | [ignore]: https://package.elm-lang.org/packages/elm/parser/latest/Parser#|.
48 | [keep]: https://package.elm-lang.org/packages/elm/parser/latest/Parser#|=
49 |
50 | The theory is that `|=` introduces more “visual noise” than `|.`, making it pretty easy to pick out which lines in the pipeline are important.
51 |
52 | I recommend having one line per operator in your parser pipeline. If you need multiple lines for some reason, use a `let` or make a helper function.
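
One way to follow the one-line-per-operator advice is to move repeated steps into a helper. The sketch below assumes a hypothetical helper named `paddedFloat` (it is not part of this package) that wraps `float` in optional whitespace, so the `point` pipeline only lists the pieces that matter:

```elm
module PointExample exposing (Point, point)

import Parser exposing ((|.), (|=), Parser, float, spaces, succeed, symbol)


type alias Point =
    { x : Float
    , y : Float
    }


point : Parser Point
point =
    succeed Point
        |. symbol "("
        |= paddedFloat
        |. symbol ","
        |= paddedFloat
        |. symbol ")"


{-| Hypothetical helper: a float with optional spaces on both sides.
-}
paddedFloat : Parser Float
paddedFloat =
    succeed identity
        |. spaces
        |= float
        |. spaces
```

With these definitions, `Parser.run point "( 3, 4 )"` should give `Ok { x = 3, y = 4 }`, the same result as the longer pipeline.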
53 |
54 |
55 |
56 | ## Backtracking
57 |
58 | To make fast parsers with precise error messages, all of the parsers in this package do not backtrack by default. Once you start going down a path, you keep going down it.
59 |
60 | This is nice in a string like `[ 1, 23zm5, 3 ]` where you want the error at the `z`. If we had backtracking by default, you might get the error on `[` instead. That is way less specific and harder to fix!
61 |
62 | So the defaults are nice, but sometimes the easiest way to write a parser is to look ahead a bit and see what is going to happen. It is definitely more costly to do this, but it can be handy if there is no other way. This is the role of [`backtrackable`](https://package.elm-lang.org/packages/elm/parser/latest/Parser#backtrackable) parsers. Check out the [semantics](https://github.com/elm/parser/blob/master/semantics.md) page for more details!
63 |
64 |
65 | ## Tracking Context
66 |
67 | Most parsers tell you the row and column of the problem:
68 |
69 |     Something went wrong at (4:17)
70 |
71 | That may be true, but it is not how humans think. It is how text editors think! It would be better to say:
72 |
73 |     I found a problem with this list:
74 |
75 |         [ 1, 23zm5, 3 ]
76 |                ^
77 |     I wanted an integer, like 6 or 90219.
78 |
79 | Notice that the error message says `this list`. That is context! That is the language my brain speaks, not rows and columns.
80 |
81 | Once you get comfortable with the `Parser` module, you can switch over to `Parser.Advanced` and use [`inContext`](https://package.elm-lang.org/packages/elm/parser/latest/Parser-Advanced#inContext) to track exactly what your parser thinks it is doing at the moment. You can let the parser know “I am trying to parse a `"list"` right now” so if an error happens anywhere in that context, you get the handy annotation!
82 |
83 | This technique is used by the parser in the Elm compiler to give more helpful error messages.
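
As a rough illustration of context tracking, here is a minimal sketch built on `Parser.Advanced`. The `Context` and `Problem` types and the name `intList` are assumptions made up for this example; only `inContext`, `sequence`, `int`, `spaces`, `run`, `Token`, and `Trailing` come from the package:

```elm
module ListContext exposing (Context(..), Problem(..), intList, parse)

import Parser.Advanced as P exposing (Parser, Token(..), Trailing(..))


{-| Hypothetical context type: what the parser believes it is parsing.
-}
type Context
    = ListContext


{-| Hypothetical problem type, replacing the built-in `Problem`.
-}
type Problem
    = ExpectingListStart
    | ExpectingListEnd
    | ExpectingComma
    | ExpectingInt
    | InvalidInt


intList : Parser Context Problem (List Int)
intList =
    -- Everything inside this parser runs "in the context of" a list.
    P.inContext ListContext <|
        P.sequence
            { start = Token "[" ExpectingListStart
            , separator = Token "," ExpectingComma
            , end = Token "]" ExpectingListEnd
            , spaces = P.spaces
            , item = P.int ExpectingInt InvalidInt
            , trailing = Forbidden
            }


parse : String -> Result (List (P.DeadEnd Context Problem)) (List Int)
parse source =
    P.run intList source
```

When `parse` fails, each `DeadEnd` carries a `contextStack` alongside `row`, `col`, and `problem`. An error reporter can read `ListContext` from that stack to produce a message like "I found a problem with this list".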
84 |
85 |
86 | ## [Comparison with Prior Work](https://github.com/elm/parser/blob/master/comparison.md)
87 |
--------------------------------------------------------------------------------
/comparison.md:
--------------------------------------------------------------------------------
1 | ## Comparison with Prior Work
2 |
3 | I have not seen the [parser pipeline][1] or the [context stack][2] ideas in other libraries, but [backtracking][3] relates to prior work.
4 |
5 | [1]: README.md#parser-pipelines
6 | [2]: README.md#tracking-context
7 | [3]: README.md#backtracking
8 |
9 | Most parser combinator libraries I have seen are based on Haskell’s Parsec library, which has primitives named `try` and `lookAhead`. I believe [`backtrackable`][backtrackable] is a better primitive for two reasons.
10 |
11 | [backtrackable]: https://package.elm-lang.org/packages/elm/parser/latest/Parser#backtrackable
12 |
13 |
14 | ### Performance and Composition
15 |
16 | Say we want to create a precise error message for `length [1,,3]`. The naive approach with Haskell’s Parsec library produces very bad error messages:
17 |
18 | ```haskell
19 | spaceThenArg :: Parser Expr
20 | spaceThenArg =
21 |   try (spaces >> term)
22 | ```
23 |
24 | This means we get a precise error from `term`, but then throw it away and say something went wrong at the space before the `[`. Very confusing! To improve quality, we must write something like this:
25 |
26 | ```haskell
27 | spaceThenArg :: Parser Expr
28 | spaceThenArg =
29 |   choice
30 |     [ do  lookAhead (spaces >> char '[')
31 |           spaces
32 |           term
33 |     , try (spaces >> term)
34 |     ]
35 | ```
36 |
37 | Notice that we parse `spaces` twice no matter what.
38 |
39 | Notice that we also had to hardcode `[` in the `lookAhead`. What if we update `term` to parse records that start with `{` as well? To get good commits on records, we must remember to update `lookAhead` to look for `oneOf "[{"`. Implementation details are leaking out of `term`!
40 |
41 | With `backtrackable` in this Elm library, you can just say:
42 |
43 | ```elm
44 | spaceThenArg : Parser Expr
45 | spaceThenArg =
46 |   succeed identity
47 |     |. backtrackable spaces
48 |     |= term
49 | ```
50 |
51 | It does less work, and is more reliable as `term` evolves. I believe the presence of `backtrackable` means that `lookAhead` is no longer needed.
52 |
53 |
54 | ### Expressiveness
55 |
56 | You can define `try` in terms of [`backtrackable`][backtrackable] like this:
57 |
58 | ```elm
59 | try : Parser a -> Parser a
60 | try parser =
61 |   succeed identity
62 |     |= backtrackable parser
63 |     |. commit ()
64 | ```
65 |
66 | No expressiveness is lost!
67 |
68 | So while it is possible to define `try`, I left it out of the public API. In practice, `try` often leads to “bad commits” where your parser fails in a very specific way, but you then backtrack to a less specific error message. I considered naming it `allOrNothing` to better explain how it changes commit behavior, but ultimately, I thought it was best to encourage users to express their parsers with `backtrackable` directly.
69 |
70 |
71 | ### Summary
72 |
73 | Compared to previous work, `backtrackable` lets you produce precise error messages **more efficiently**. By thinking about “backtracking behavior” directly, you also end up with **cleaner composition** of parsers. And these benefits come **without any loss of expressiveness**.
74 |
--------------------------------------------------------------------------------
/elm.json:
--------------------------------------------------------------------------------
1 | {
2 | "type": "package",
3 | "name": "elm/parser",
4 | "summary": "a parsing library, focused on simplicity and great error messages",
5 | "license": "BSD-3-Clause",
6 | "version": "1.1.0",
7 | "exposed-modules": [
8 | "Parser",
9 | "Parser.Advanced"
10 | ],
11 | "elm-version": "0.19.0 <= v < 0.20.0",
12 | "dependencies": {
13 | "elm/core": "1.0.0 <= v < 2.0.0"
14 | },
15 | "test-dependencies": {}
16 | }
--------------------------------------------------------------------------------
/examples/DoubleQuoteString.elm:
--------------------------------------------------------------------------------
1 | import Browser
2 | import Char
3 | import Html
4 | import Parser exposing (..)
5 |
6 |
7 |
8 | -- MAIN
9 |
10 |
11 | main =
12 | Html.text <| Debug.toString <|
13 | run string "\"hello\""
14 |
15 |
16 |
17 | -- STRINGS
18 |
19 |
20 | string : Parser String
21 | string =
22 | succeed identity
23 | |. token "\""
24 | |= loop [] stringHelp
25 |
26 |
27 | stringHelp : List String -> Parser (Step (List String) String)
28 | stringHelp revChunks =
29 | oneOf
30 | [ succeed (\chunk -> Loop (chunk :: revChunks))
31 | |. token "\\"
32 | |= oneOf
33 | [ map (\_ -> "\n") (token "n")
34 | , map (\_ -> "\t") (token "t")
35 | , map (\_ -> "\r") (token "r")
36 | , succeed String.fromChar
37 | |. token "u{"
38 | |= unicode
39 | |. token "}"
40 | ]
41 | , token "\""
42 | |> map (\_ -> Done (String.join "" (List.reverse revChunks)))
43 | , chompWhile isUninteresting
44 | |> getChompedString
45 | |> map (\chunk -> Loop (chunk :: revChunks))
46 | ]
47 |
48 |
49 | isUninteresting : Char -> Bool
50 | isUninteresting char =
51 | char /= '\\' && char /= '"'
52 |
53 |
54 |
55 | -- UNICODE
56 |
57 |
58 | unicode : Parser Char
59 | unicode =
60 | getChompedString (chompWhile Char.isHexDigit)
61 | |> andThen codeToChar
62 |
63 |
64 | codeToChar : String -> Parser Char
65 | codeToChar str =
66 |   let
67 |     length = String.length str
68 |     code = String.foldl addHex 0 str
69 |   in
70 |   if length < 4 || 6 < length then
71 |     problem "code point must have between 4 and 6 digits"
72 |   else if 0 <= code && code <= 0x10FFFF then
73 |     succeed (Char.fromCode code)
74 |   else
75 |     problem "code point must be between 0 and 0x10FFFF"
76 |
77 |
78 | addHex : Char -> Int -> Int
79 | addHex char total =
80 | let
81 | code = Char.toCode char
82 | in
83 | if 0x30 <= code && code <= 0x39 then
84 | 16 * total + (code - 0x30)
85 | else if 0x41 <= code && code <= 0x46 then
86 | 16 * total + (10 + code - 0x41)
87 | else
88 | 16 * total + (10 + code - 0x61)
89 |
--------------------------------------------------------------------------------
/examples/Math.elm:
--------------------------------------------------------------------------------
1 | module Math exposing
2 | ( Expr
3 | , evaluate
4 | , parse
5 | )
6 |
7 |
8 | import Html exposing (div, p, text)
9 | import Parser exposing (..)
10 |
11 |
12 |
13 | -- MAIN
14 |
15 |
16 | main =
17 | case parse "2 * (3 + 4)" of
18 | Err err ->
19 | text (Debug.toString err)
20 |
21 | Ok expr ->
22 | div []
23 | [ p [] [ text (Debug.toString expr) ]
24 | , p [] [ text (String.fromFloat (evaluate expr)) ]
25 | ]
26 |
27 |
28 |
29 | -- EXPRESSIONS
30 |
31 |
32 | type Expr
33 | = Integer Int
34 | | Floating Float
35 | | Add Expr Expr
36 | | Mul Expr Expr
37 |
38 |
39 | evaluate : Expr -> Float
40 | evaluate expr =
41 | case expr of
42 | Integer n ->
43 | toFloat n
44 |
45 | Floating n ->
46 | n
47 |
48 | Add a b ->
49 | evaluate a + evaluate b
50 |
51 | Mul a b ->
52 | evaluate a * evaluate b
53 |
54 |
55 | parse : String -> Result (List DeadEnd) Expr
56 | parse string =
57 | run expression string
58 |
59 |
60 |
61 | -- PARSER
62 |
63 |
64 | {-| We want to handle integers, hexadecimal numbers, and floats. Octal numbers
65 | like `0o17` and binary numbers like `0b01101100` are not allowed.
66 | -}
67 | digits : Parser Expr
68 | digits =
69 | number
70 | { int = Just Integer
71 | , hex = Just Integer
72 | , octal = Nothing
73 | , binary = Nothing
74 | , float = Just Floating
75 | }
76 |
77 |
78 | {-| A term is a standalone chunk of math, like `4` or `(3 + 4)`. We use it as
79 | a building block in larger expressions.
80 | -}
81 | term : Parser Expr
82 | term =
83 | oneOf
84 | [ digits
85 | , succeed identity
86 | |. symbol "("
87 | |. spaces
88 | |= lazy (\_ -> expression)
89 | |. spaces
90 | |. symbol ")"
91 | ]
92 |
93 |
94 | {-| Every expression starts with a term. After that, it may be done, or there
95 | may be a `+` or `*` sign and more math.
96 | -}
97 | expression : Parser Expr
98 | expression =
99 | term
100 | |> andThen (expressionHelp [])
101 |
102 |
103 | {-| Once you have parsed a term, you can start looking for `+` and `*` operators.
104 | I am tracking everything as a list, that way I can be sure to follow the order
105 | of operations (PEMDAS) when building the final expression.
106 |
107 | In one case, I need an operator and another term. If that happens I keep
108 | looking for more. In the other case, I am done parsing, and I finalize the
109 | expression.
110 | -}
111 | expressionHelp : List (Expr, Operator) -> Expr -> Parser Expr
112 | expressionHelp revOps expr =
113 | oneOf
114 | [ succeed Tuple.pair
115 | |. spaces
116 | |= operator
117 | |. spaces
118 | |= term
119 | |> andThen (\(op, newExpr) -> expressionHelp ((expr,op) :: revOps) newExpr)
120 | , lazy (\_ -> succeed (finalize revOps expr))
121 | ]
122 |
123 |
124 | type Operator = AddOp | MulOp
125 |
126 |
127 | operator : Parser Operator
128 | operator =
129 | oneOf
130 | [ map (\_ -> AddOp) (symbol "+")
131 | , map (\_ -> MulOp) (symbol "*")
132 | ]
133 |
134 |
135 | {-| We only have `+` and `*` in this parser. If we see a `MulOp` we can
136 | immediately group those two expressions. If we see an `AddOp` we wait to group
137 | until all the multiplies have been taken care of.
138 |
139 | This code is kind of tricky, but it is a baseline for what you would need if
140 | you wanted to add `/`, `-`, `==`, `&&`, etc. which bring in more complex
141 | associativity and precedence rules.
142 | -}
143 | finalize : List (Expr, Operator) -> Expr -> Expr
144 | finalize revOps finalExpr =
145 | case revOps of
146 | [] ->
147 | finalExpr
148 |
149 | (expr, MulOp) :: otherRevOps ->
150 | finalize otherRevOps (Mul expr finalExpr)
151 |
152 | (expr, AddOp) :: otherRevOps ->
153 | Add (finalize otherRevOps expr) finalExpr
154 |
--------------------------------------------------------------------------------
/examples/README.md:
--------------------------------------------------------------------------------
1 | # Run the Examples
2 |
3 | To try these examples out locally, you can run the following terminal commands:
4 |
5 | ```bash
6 | git clone https://github.com/elm/parser.git
7 | cd parser/examples
8 | elm reactor
9 | ```
10 |
11 | After that, go to [`http://localhost:8000`](http://localhost:8000) and click on
12 | the example you want to see.
13 |
14 |
15 | ## Exercises
16 |
17 | - Have a user input feed into the `Math` parser. Show people the results live.
18 | - Expand the `Math` parser to cover `-` and `/` as well.
19 | - Handle more escape characters in `DoubleQuoteString`. Maybe hexadecimal
20 | escapes like `\x41` and `\x0A` that are possible in JavaScript.
--------------------------------------------------------------------------------
/examples/elm.json:
--------------------------------------------------------------------------------
1 | {
2 | "type": "application",
3 | "source-directories": [
4 | "."
5 | ],
6 | "elm-version": "0.19.0",
7 | "dependencies": {
8 | "direct": {
9 | "elm/browser": "1.0.0",
10 | "elm/core": "1.0.0",
11 | "elm/html": "1.0.0",
12 | "elm/parser": "1.1.0"
13 | },
14 | "indirect": {
15 | "elm/json": "1.0.0",
16 | "elm/time": "1.0.0",
17 | "elm/url": "1.0.0",
18 | "elm/virtual-dom": "1.0.0"
19 | }
20 | },
21 | "test-dependencies": {
22 | "direct": {},
23 | "indirect": {}
24 | }
25 | }
--------------------------------------------------------------------------------
/semantics.md:
--------------------------------------------------------------------------------
1 | # Semantics
2 |
3 | The goal of this document is to explain how different parsers fit together. When will it backtrack? When will it not?
4 |
5 |
6 |
7 | ### `keyword : String -> Parser ()`
8 |
9 | Say we have `keyword "import"`:
10 |
11 | | String | Result |
12 | |---------------|------------|
13 | | `"import"` | `OK{false}` |
14 | | `"imp"` | `ERR{true}` |
15 | | `"export"` | `ERR{true}` |
16 |
17 | In our `OK{false}` notation, we are indicating:
18 |
19 | 1. Did the parser succeed? `OK` if yes. `ERR` if not.
20 | 2. Is it possible to backtrack? So when `keyword` succeeds, backtracking is not allowed anymore. You must continue along that path.
21 |
22 |
23 |
24 |
25 | ### `map : (a -> b) -> Parser a -> Parser b`
26 |
27 | Say we have `map func parser`:
28 |
29 | | `parser` | Result |
30 | |----------|----------|
31 | | `OK{b}` | `OK{b}` |
32 | | `ERR{b}` | `ERR{b}` |
33 |
34 | So the result of `map func parser` is always the same as the result of the `parser` itself.
35 |
36 |
37 |
38 |
39 | ### `map2 : (a -> b -> c) -> Parser a -> Parser b -> Parser c`
40 |
41 | Say we have `map2 func parserA parserB`:
42 |
43 | | `parserA` | `parserB` | Result |
44 | |-----------|-----------|----------------|
45 | | `OK{b}` | `OK{b'}` | `OK{b && b'}` |
46 | | `OK{b}` | `ERR{b'}` | `ERR{b && b'}` |
47 | | `ERR{b}` | | `ERR{b}` |
48 |
49 | If `parserA` succeeds, we try `parserB`. If they are both backtrackable, the combined result is backtrackable.
50 |
51 | If `parserA` fails, that is our result.
52 |
53 | This is used to define our pipeline operators like this:
54 |
55 | ```elm
56 | (|.) a b = map2 (\keep ignore -> keep) a b
57 | (|=) a b = map2 (\func arg -> func arg) a b
58 | ```
59 |
60 |
61 |
62 |
63 | ### `either : Parser a -> Parser a -> Parser a`
64 |
65 | Say we have `either parserA parserB`:
66 |
67 | | `parserA` | `parserB` | Result |
68 | |--------------|-----------|--------------|
69 | | `OK{b}` | | `OK{b}` |
70 | | `ERR{true}` | `OK{b}` | `OK{b}` |
71 | | `ERR{true}` | `ERR{b}` | `ERR{b}` |
72 | | `ERR{false}` | | `ERR{false}` |
73 |
74 | The 4th case is very important! **If `parserA` is not backtrackable, you do not even try `parserB`.**
75 |
76 | The `either` function does not appear in the public API, but I used it here because it makes the rules a bit easier to read. In the public API, we have `oneOf` instead. You can think of `oneOf` as trying `either` the head of the list, or `oneOf` the parsers in the tail of the list.
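
To see the 4th case through the public API, here is a small sketch using `oneOf`. The names `committed` and `factored` are made up for this example:

```elm
module Commit exposing (committed, factored)

import Parser exposing ((|.), (|=), Parser, int, oneOf, succeed, symbol)


{-| Both branches start with "(". Once the first branch chomps it, a later
failure is no longer backtrackable, so the second branch is never tried.
-}
committed : Parser Int
committed =
    oneOf
        [ succeed identity
            |. symbol "("
            |= int
            |. symbol ")"
        , succeed 0
            |. symbol "("
            |. symbol ")"
        ]


{-| The shared "(" is parsed once, and the choice happens afterwards.
-}
factored : Parser Int
factored =
    succeed identity
        |. symbol "("
        |= oneOf
            [ int
            , succeed 0
            ]
        |. symbol ")"
```

Under the rules above, `Parser.run committed "()"` should be an `Err`: `symbol "("` succeeds with `OK{false}`, `int` then fails, and the resulting `ERR{false}` stops `oneOf` from trying the second branch. `Parser.run factored "()"` should give `Ok 0`, because `int` fails before chomping anything and `oneOf` moves on to `succeed 0`.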
77 |
78 |
79 |
80 |
81 | ### `andThen : (a -> Parser b) -> Parser a -> Parser b`
82 |
83 | Say we have `andThen callback parserA` where `callback a` produces `parserB`:
84 |
85 | | `parserA` | `parserB` | Result |
86 | |-----------|-----------|----------------|
87 | | `ERR{b}` | | `ERR{b}` |
88 | | `OK{b}` | `OK{b'}` | `OK{b && b'}` |
89 | | `OK{b}` | `ERR{b'}` | `ERR{b && b'}` |
90 |
91 | If both parts are backtrackable, the overall result is backtrackable.
92 |
93 |
94 |
95 |
96 | ### `backtrackable : Parser a -> Parser a`
97 |
98 | Say we have `backtrackable parser`:
99 |
100 | | `parser` | Result |
101 | |----------|-------------|
102 | | `OK{b}` | `OK{true}` |
103 | | `ERR{b}` | `ERR{true}` |
104 |
105 | No matter how `parser` was defined, it is backtrackable now. This becomes very interesting when paired with `oneOf`. You can have one of the options start with a `backtrackable` segment, so even if you do start down that path, you can still try the next parser if something fails. **This has important yet subtle implications on performance, so definitely read on!**
106 |
107 |
108 |
109 |
110 | ## Examples
111 |
112 | This parser is intended to give you very precise control over backtracking behavior, and I think that is best explained through examples.
113 |
114 |
115 |
116 | ### `backtrackable`
117 |
118 | Say we have `map2 func (backtrackable spaces) (symbol ",")` which can eat a bunch of spaces followed by a comma. Here is how it would work on different strings:
119 |
120 | | String | Result |
121 | |---------|-------------|
122 | | `" ,"` | `OK{false}` |
123 | | `" :"` | `ERR{true}` |
124 | | `"abc"` | `ERR{true}` |
125 |
126 | Remember how `map2` is backtrackable only if both parsers are backtrackable. So in the first case, the overall result is not backtrackable because `symbol ","` succeeded.
127 |
128 | This becomes useful when paired with `either`!
129 |
130 |
131 |
132 |
133 | ### `backtrackable` + `oneOf` (inefficient)
134 |
135 | Say we have the following `parser` definition:
136 |
137 | ```elm
138 | parser : Parser (Maybe Int)
139 | parser =
140 |   oneOf
141 |     [ succeed Just
142 |         |. backtrackable spaces
143 |         |. symbol ","
144 |         |. spaces
145 |         |= int
146 |     , succeed Nothing
147 |         |. spaces
148 |         |. symbol "]"
149 |     ]
150 | ```
151 |
152 | Here is how it would work on different strings:
153 |
154 | | String | Result |
155 | |-----------|--------------|
156 | | `" , 4"` | `OK{false}` |
157 | | `" ,"` | `ERR{false}` |
158 | | `" , a"` | `ERR{false}` |
159 | | `" ]"` | `OK{false}` |
160 | | `" a"` | `ERR{false}` |
161 | | `"abc"` | `ERR{true}` |
162 |
163 | Some of these cases are tricky, so let's look at them in more depth:
164 |
165 | - `" , a"` — `backtrackable spaces`, `symbol ","`, and `spaces` all succeed. At that point we have `OK{false}`. The `int` parser then fails on `a`, so we finish with `ERR{false}`. That means `oneOf` will NOT try the second possibility.
166 | - `" ]"` — `backtrackable spaces` succeeds, but `symbol ","` fails. At that point we have `ERR{true}`, so `oneOf` tries the second possibility. After backtracking, `spaces` and `symbol "]"` succeed with `OK{false}`.
167 | - `" a"` — `backtrackable spaces` succeeds, but `symbol ","` fails. At that point we have `ERR{true}`, so `oneOf` tries the second possibility. After backtracking, `spaces` succeeds with `OK{false}` and `symbol "]"` fails resulting in `ERR{false}`.
168 |
169 |
170 |
171 |
172 | ### `oneOf` (efficient)
173 |
174 | Notice that in the previous example, we parsed `spaces` twice in some cases. This is inefficient, especially in large files with lots of whitespace. Backtracking is very inefficient in general though, so **if you are interested in performance, it is worthwhile to try to eliminate as many uses of `backtrackable` as possible.**
175 |
176 | So we can rewrite that last example to never backtrack:
177 |
178 | ```elm
179 | parser : Parser (Maybe Int)
180 | parser =
181 |   succeed identity
182 |     |. spaces
183 |     |= oneOf
184 |         [ succeed Just
185 |             |. symbol ","
186 |             |. spaces
187 |             |= int
188 |         , succeed Nothing
189 |             |. symbol "]"
190 |         ]
191 | ```
192 |
193 | Now we are guaranteed to consume the spaces only one time. After that, we decide if we are looking at a `,` or `]`, so we never backtrack and reparse things.
194 |
195 | If you are strategic in shuffling parsers around, you can write parsers that do not need `backtrackable` at all. The resulting parsers are quite fast. They are essentially the same as [LR(k)](https://en.wikipedia.org/wiki/Canonical_LR_parser) parsers, but more pleasant to write. I did this in the Elm compiler for parsing Elm code, and it was very significantly faster.
196 |
--------------------------------------------------------------------------------
/src/Elm/Kernel/Parser.js:
--------------------------------------------------------------------------------
1 | /*
2 |
3 | import Elm.Kernel.Utils exposing (chr, Tuple2, Tuple3)
4 |
5 | */
6 |
7 |
8 |
9 | // STRINGS
10 |
11 |
12 | var _Parser_isSubString = F5(function(smallString, offset, row, col, bigString)
13 | {
14 | var smallLength = smallString.length;
15 | var isGood = offset + smallLength <= bigString.length;
16 |
17 | for (var i = 0; isGood && i < smallLength; )
18 | {
19 | var code = bigString.charCodeAt(offset);
20 | isGood =
21 | smallString[i++] === bigString[offset++]
22 | && (
23 | code === 0x000A /* \n */
24 | ? ( row++, col=1 )
25 | : ( col++, (code & 0xF800) === 0xD800 ? smallString[i++] === bigString[offset++] : 1 )
26 | )
27 | }
28 |
29 | return __Utils_Tuple3(isGood ? offset : -1, row, col);
30 | });
31 |
32 |
33 |
34 | // CHARS
35 |
36 |
37 | var _Parser_isSubChar = F3(function(predicate, offset, string)
38 | {
39 | return (
40 | string.length <= offset
41 | ? -1
42 | :
43 | (string.charCodeAt(offset) & 0xF800) === 0xD800
44 | ? (predicate(__Utils_chr(string.substr(offset, 2))) ? offset + 2 : -1)
45 | :
46 | (predicate(__Utils_chr(string[offset]))
47 | ? ((string[offset] === '\n') ? -2 : (offset + 1))
48 | : -1
49 | )
50 | );
51 | });
52 |
53 |
54 | var _Parser_isAsciiCode = F3(function(code, offset, string)
55 | {
56 | return string.charCodeAt(offset) === code;
57 | });
58 |
59 |
60 |
61 | // NUMBERS
62 |
63 |
64 | var _Parser_chompBase10 = F2(function(offset, string)
65 | {
66 | for (; offset < string.length; offset++)
67 | {
68 | var code = string.charCodeAt(offset);
69 | if (code < 0x30 || 0x39 < code)
70 | {
71 | return offset;
72 | }
73 | }
74 | return offset;
75 | });
76 |
77 |
78 | var _Parser_consumeBase = F3(function(base, offset, string)
79 | {
80 | for (var total = 0; offset < string.length; offset++)
81 | {
82 | var digit = string.charCodeAt(offset) - 0x30;
83 | if (digit < 0 || base <= digit) break;
84 | total = base * total + digit;
85 | }
86 | return __Utils_Tuple2(offset, total);
87 | });
88 |
89 |
90 | var _Parser_consumeBase16 = F2(function(offset, string)
91 | {
92 | for (var total = 0; offset < string.length; offset++)
93 | {
94 | var code = string.charCodeAt(offset);
95 | if (0x30 <= code && code <= 0x39)
96 | {
97 | total = 16 * total + code - 0x30;
98 | }
99 | else if (0x41 <= code && code <= 0x46)
100 | {
101 | total = 16 * total + code - 55;
102 | }
103 | else if (0x61 <= code && code <= 0x66)
104 | {
105 | total = 16 * total + code - 87;
106 | }
107 | else
108 | {
109 | break;
110 | }
111 | }
112 | return __Utils_Tuple2(offset, total);
113 | });
114 |
115 |
116 |
117 | // FIND STRING
118 |
119 |
120 | var _Parser_findSubString = F5(function(smallString, offset, row, col, bigString)
121 | {
122 | var newOffset = bigString.indexOf(smallString, offset);
123 | var target = newOffset < 0 ? bigString.length : newOffset + smallString.length;
124 |
125 | while (offset < target)
126 | {
127 | var code = bigString.charCodeAt(offset++);
128 | code === 0x000A /* \n */
129 | ? ( col=1, row++ )
130 | : ( col++, (code & 0xF800) === 0xD800 && offset++ )
131 | }
132 |
133 | return __Utils_Tuple3(newOffset, row, col);
134 | });
135 |
--------------------------------------------------------------------------------
/src/Parser.elm:
--------------------------------------------------------------------------------
1 | module Parser exposing
2 | ( Parser, run
3 | , int, float, number, symbol, keyword, variable, end
4 | , succeed, (|=), (|.), lazy, andThen, problem
5 | , oneOf, map, backtrackable, commit, token
6 | , sequence, Trailing(..), loop, Step(..)
7 | , spaces, lineComment, multiComment, Nestable(..)
8 | , getChompedString, chompIf, chompWhile, chompUntil, chompUntilEndOr, mapChompedString
9 | , DeadEnd, Problem(..), deadEndsToString
10 | , withIndent, getIndent
11 | , getPosition, getRow, getCol, getOffset, getSource
12 | )
13 |
14 |
15 | {-|
16 |
17 | # Parsers
18 | @docs Parser, run
19 |
20 | # Building Blocks
21 | @docs int, float, number, symbol, keyword, variable, end
22 |
23 | # Pipelines
24 | @docs succeed, (|=), (|.), lazy, andThen, problem
25 |
26 | # Branches
27 | @docs oneOf, map, backtrackable, commit, token
28 |
29 | # Loops
30 | @docs sequence, Trailing, loop, Step
31 |
32 | # Whitespace
33 | @docs spaces, lineComment, multiComment, Nestable
34 |
35 | # Chompers
36 | @docs getChompedString, chompIf, chompWhile, chompUntil, chompUntilEndOr, mapChompedString
37 |
38 | # Errors
39 | @docs DeadEnd, Problem, deadEndsToString
40 |
41 | # Indentation
42 | @docs withIndent, getIndent
43 |
44 | # Positions
45 | @docs getPosition, getRow, getCol, getOffset, getSource
46 | -}
47 |
48 |
49 | import Char
50 | import Parser.Advanced as A exposing ((|=), (|.))
51 | import Set
52 |
53 |
54 |
55 | -- INFIX OPERATORS - see Parser.Advanced for why 5 and 6 were chosen
56 |
57 |
58 | infix left 5 (|=) = keeper
59 | infix left 6 (|.) = ignorer
60 |
61 |
62 |
63 | -- PARSERS
64 |
65 |
66 | {-| A `Parser` helps turn a `String` into nicely structured data. For example,
67 | we can [`run`](#run) the [`int`](#int) parser to turn `String` to `Int`:
68 |
69 | run int "123456" == Ok 123456
70 | run int "3.1415" == Err ...
71 |
72 | The cool thing is that you can combine `Parser` values to handle much more
73 | complex scenarios.
74 | -}
75 | type alias Parser a =
76 | A.Parser Never Problem a
77 |
78 |
79 |
80 | -- RUN
81 |
82 |
83 | {-| Try a parser. Here are some examples using the [`keyword`](#keyword)
84 | parser:
85 |
86 | run (keyword "true") "true" == Ok ()
87 | run (keyword "true") "True" == Err ...
88 | run (keyword "true") "false" == Err ...
89 | run (keyword "true") "true!" == Ok ()
90 |
91 | Notice the last case! A `Parser` will chomp as much as possible and not worry
92 | about the rest. Use the [`end`](#end) parser to ensure you made it to the end
93 | of the string!
94 | -}
95 | run : Parser a -> String -> Result (List DeadEnd) a
96 | run parser source =
97 | case A.run parser source of
98 | Ok a ->
99 | Ok a
100 |
101 | Err problems ->
102 | Err (List.map problemToDeadEnd problems)
103 |
104 |
105 | problemToDeadEnd : A.DeadEnd Never Problem -> DeadEnd
106 | problemToDeadEnd p =
107 | DeadEnd p.row p.col p.problem
108 |
109 |
110 |
111 | -- PROBLEMS
112 |
113 |
114 | {-| A parser can run into situations where there is no way to make progress.
115 | When that happens, I record the `row` and `col` where you got stuck and the
116 | particular `problem` you ran into. That is a `DeadEnd`!
117 |
118 | **Note:** I count rows and columns like a text editor. The beginning is `row=1`
119 | and `col=1`. As I chomp characters, the `col` increments. When I reach a `\n`
120 | character, I increment the `row` and set `col=1`.
121 | -}
122 | type alias DeadEnd =
123 | { row : Int
124 | , col : Int
125 | , problem : Problem
126 | }
127 |
128 |
129 | {-| When you run into a `DeadEnd`, I record some information about why you
130 | got stuck. This data is useful for producing helpful error messages. This is
131 | how [`deadEndsToString`](#deadEndsToString) works!
132 |
133 | **Note:** If you feel limited by this type (i.e. having to represent custom
134 | problems as strings) I highly recommend switching to `Parser.Advanced`. It
135 | lets you define your own `Problem` type. It can also track "context" which
136 | can improve error messages a ton! This is how the Elm compiler produces
137 | relatively nice parse errors, and I am excited to see those techniques applied
138 | elsewhere!
139 | -}
140 | type Problem
141 | = Expecting String
142 | | ExpectingInt
143 | | ExpectingHex
144 | | ExpectingOctal
145 | | ExpectingBinary
146 | | ExpectingFloat
147 | | ExpectingNumber
148 | | ExpectingVariable
149 | | ExpectingSymbol String
150 | | ExpectingKeyword String
151 | | ExpectingEnd
152 | | UnexpectedChar
153 | | Problem String
154 | | BadRepeat
155 |
156 |
157 | {-| Turn all the `DeadEnd` data into a string that is easier for people to
158 | read.
159 |
160 | **Note:** This is just a baseline of quality. It cannot do anything with colors.
161 | It is not interactive. It just turns the raw data into strings. I really hope
162 | folks will check out the source code for some inspiration on how to turn errors
163 | into `Html` with nice colors and interaction! The `Parser.Advanced` module lets
164 | you work with context as well, which really unlocks another level of quality!
165 | The "context" technique is how the Elm compiler can say "I think I am parsing a
166 | list, so I was expecting a closing `]` here." Telling users what the parser
167 | _thinks_ is happening can be really helpful!
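
If you want something in the meantime, here is a rough sketch of a plain-text
formatter written against the `DeadEnd` and `Problem` types above. The names
`describeDeadEnd` and `describeProblem` and the message wording are
assumptions for illustration, not what this module produces:

```elm
import Parser exposing (DeadEnd, Problem(..))


-- One line of text per dead end: where it happened, then what was expected.
describeDeadEnd : DeadEnd -> String
describeDeadEnd deadEnd =
    "row "
        ++ String.fromInt deadEnd.row
        ++ ", col "
        ++ String.fromInt deadEnd.col
        ++ ": "
        ++ describeProblem deadEnd.problem


-- Every `Problem` constructor is listed, so the compiler checks coverage.
describeProblem : Problem -> String
describeProblem problem =
    case problem of
        Expecting str ->
            "expecting `" ++ str ++ "`"

        ExpectingInt ->
            "expecting an integer"

        ExpectingHex ->
            "expecting a hexadecimal number"

        ExpectingOctal ->
            "expecting an octal number"

        ExpectingBinary ->
            "expecting a binary number"

        ExpectingFloat ->
            "expecting a float"

        ExpectingNumber ->
            "expecting a number"

        ExpectingVariable ->
            "expecting a variable"

        ExpectingSymbol str ->
            "expecting the symbol `" ++ str ++ "`"

        ExpectingKeyword str ->
            "expecting the keyword `" ++ str ++ "`"

        ExpectingEnd ->
            "expecting the end of input"

        UnexpectedChar ->
            "unexpected character"

        Problem msg ->
            msg

        BadRepeat ->
            "bad repeat"


-- describeDeadEnd { row = 1, col = 5, problem = ExpectingEnd }
--     == "row 1, col 5: expecting the end of input"
```

A whole list can then be rendered with something like
`String.join "\n" (List.map describeDeadEnd deadEnds)`.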
168 | -}
169 | deadEndsToString : List DeadEnd -> String
170 | deadEndsToString deadEnds =
171 | "TODO deadEndsToString"
172 |
173 |
174 |
175 | -- PIPELINES
176 |
177 |
178 | {-| A parser that succeeds without chomping any characters.
179 |
180 | run (succeed 90210 ) "mississippi" == Ok 90210
181 | run (succeed 3.141 ) "mississippi" == Ok 3.141
182 | run (succeed () ) "mississippi" == Ok ()
183 | run (succeed Nothing) "mississippi" == Ok Nothing
184 |
185 | Seems weird on its own, but it is very useful in combination with other
186 | functions. The docs for [`(|=)`](#|=) and [`andThen`](#andThen) have some neat
187 | examples.
188 | -}
189 | succeed : a -> Parser a
190 | succeed =
191 | A.succeed
192 |
193 |
194 | {-| **Keep** values in a parser pipeline. For example, we could say:
195 |
196 | type alias Point = { x : Float, y : Float }
197 |
198 | point : Parser Point
199 | point =
200 | succeed Point
201 | |. symbol "("
202 | |. spaces
203 | |= float
204 | |. spaces
205 | |. symbol ","
206 | |. spaces
207 | |= float
208 | |. spaces
209 | |. symbol ")"
210 |
211 | All the parsers in this pipeline will chomp characters and produce values. So
212 | `symbol "("` will chomp one paren and produce a `()` value. Similarly, `float`
213 | will chomp some digits and produce a `Float` value. The `(|.)` and `(|=)`
214 | operators just decide whether we give the values to the `Point` function.
215 |
216 | So in this case, we skip the `()` from `symbol "("`, we skip the `()` from
217 | `spaces`, we keep the `Float` from `float`, etc.
218 | -}
219 | keeper : Parser (a -> b) -> Parser a -> Parser b
220 | keeper =
221 | (|=)
222 |
223 |
224 | {-| **Skip** values in a parser pipeline. For example, maybe we want to parse
225 | some JavaScript variables:
226 |
227 | var : Parser String
228 | var =
229 | getChompedString <|
230 | succeed ()
231 | |. chompIf isStartChar
232 | |. chompWhile isInnerChar
233 |
234 | isStartChar : Char -> Bool
235 | isStartChar char =
236 | Char.isAlpha char || char == '_' || char == '$'
237 |
238 | isInnerChar : Char -> Bool
239 | isInnerChar char =
240 | isStartChar char || Char.isDigit char
241 |
242 | `chompIf isStartChar` can chomp one character and produce a `()` value.
243 | `chompWhile isInnerChar` can chomp zero or more characters and produce a `()`
244 | value. The `(|.)` operators are saying to still chomp all the characters, but
245 | skip the two `()` values that get produced. No one cares about them.
246 | -}
247 | ignorer : Parser keep -> Parser ignore -> Parser keep
248 | ignorer =
249 | (|.)
250 |
251 |
252 | {-| Helper to define recursive parsers. Say we want a parser for simple
253 | boolean expressions:
254 |
255 | true
256 | false
257 | (true || false)
258 | (true || (true || false))
259 |
260 | Notice that a boolean expression might contain *other* boolean expressions.
261 | That means we will want to define our parser in terms of itself:
262 |
263 | type Boolean
264 | = MyTrue
265 | | MyFalse
266 | | MyOr Boolean Boolean
267 |
268 | boolean : Parser Boolean
269 | boolean =
270 | oneOf
271 | [ succeed MyTrue
272 | |. keyword "true"
273 | , succeed MyFalse
274 | |. keyword "false"
275 | , succeed MyOr
276 | |. symbol "("
277 | |. spaces
278 | |= lazy (\_ -> boolean)
279 | |. spaces
280 | |. symbol "||"
281 | |. spaces
282 | |= lazy (\_ -> boolean)
283 | |. spaces
284 | |. symbol ")"
285 | ]
286 |
287 | **Notice that `boolean` uses `boolean` in its definition!** In Elm, you can
288 | only define a value in terms of itself if it is behind a function call. So
289 | `lazy` helps us define these self-referential parsers. (`andThen` can be used
290 | for this as well!)
291 | -}
292 | lazy : (() -> Parser a) -> Parser a
293 | lazy =
294 | A.lazy
295 |
296 |
297 | {-| Parse one thing `andThen` parse another thing. This is useful when you want
298 | to check on what you just parsed. For example, maybe you want U.S. zip codes
299 | and `int` is not suitable because it does not allow leading zeros. You could
300 | say:
301 |
302 | zipCode : Parser String
303 | zipCode =
304 | getChompedString (chompWhile Char.isDigit)
305 | |> andThen checkZipCode
306 |
307 | checkZipCode : String -> Parser String
308 | checkZipCode code =
309 | if String.length code == 5 then
310 | succeed code
311 | else
312 | problem "a U.S. zip code has exactly 5 digits"
313 |
314 | First we chomp digits `andThen` we check if it is a valid U.S. zip code. We
315 | `succeed` if it has exactly five digits and report a `problem` if not.
316 |
317 | Check out [`examples/DoubleQuoteString.elm`](https://github.com/elm/parser/blob/master/examples/DoubleQuoteString.elm)
318 | for another example, this time using `andThen` to verify unicode code points.
319 |
320 | **Note:** If you are using `andThen` recursively and blowing the stack, check
321 | out the [`loop`](#loop) function to limit stack usage.
322 | -}
323 | andThen : (a -> Parser b) -> Parser a -> Parser b
324 | andThen =
325 | A.andThen
326 |
327 |
328 | {-| Indicate that a parser has reached a dead end. "Everything was going fine
329 | until I ran into this problem." Check out the [`andThen`](#andThen) docs to see
330 | an example usage.
331 | -}
332 | problem : String -> Parser a
333 | problem msg =
334 | A.problem (Problem msg)
335 |
336 |
337 |
338 | -- BACKTRACKING
339 |
340 |
341 | {-| If you are parsing JSON, the values can be strings, floats, booleans,
342 | arrays, objects, or null. You need a way to pick `oneOf` them! Here is a
343 | sample of what that code might look like:
344 |
345 | type Json
346 | = Number Float
347 | | Boolean Bool
348 | | Null
349 |
350 | json : Parser Json
351 | json =
352 | oneOf
353 | [ map Number float
354 | , map (\_ -> Boolean True) (keyword "true")
355 | , map (\_ -> Boolean False) (keyword "false")
356 | , map (\_ -> Null) (keyword "null")
357 | ]
358 |
359 | This parser will keep trying parsers until `oneOf` them starts chomping
360 | characters. Once a path is chosen, it does not come back and try the others.
361 |
362 | **Note:** I highly recommend reading [this document][semantics] to learn how
363 | `oneOf` and `backtrackable` interact. It is subtle and important!
364 |
365 | [semantics]: https://github.com/elm/parser/blob/master/semantics.md
366 | -}
367 | oneOf : List (Parser a) -> Parser a
368 | oneOf =
369 | A.oneOf
370 |
371 |
372 | {-| Transform the result of a parser. Maybe you have a value that is
373 | an integer or `null`:
374 |
375 | nullOrInt : Parser (Maybe Int)
376 | nullOrInt =
377 | oneOf
378 | [ map Just int
379 | , map (\_ -> Nothing) (keyword "null")
380 | ]
381 |
382 | -- run nullOrInt "0" == Ok (Just 0)
383 | -- run nullOrInt "13" == Ok (Just 13)
384 | -- run nullOrInt "null" == Ok Nothing
385 | -- run nullOrInt "zero" == Err ...
386 | -}
387 | map : (a -> b) -> Parser a -> Parser b
388 | map =
389 | A.map
390 |
391 |
392 | {-| It is quite tricky to use `backtrackable` well! It can be very useful, but
393 | also can degrade performance and error message quality.
394 |
395 | Read [this document](https://github.com/elm/parser/blob/master/semantics.md)
396 | to learn how `oneOf`, `backtrackable`, and `commit` work and interact with
397 | each other. It is subtle and important!
398 | -}
399 | backtrackable : Parser a -> Parser a
400 | backtrackable =
401 | A.backtrackable
402 |
403 |
404 | {-| `commit` is almost always paired with `backtrackable` in some way, and it
405 | is tricky to use well.
406 |
407 | Read [this document](https://github.com/elm/parser/blob/master/semantics.md)
408 | to learn how `oneOf`, `backtrackable`, and `commit` work and interact with
409 | each other. It is subtle and important!
410 | -}
411 | commit : a -> Parser a
412 | commit =
413 | A.commit
414 |
415 |
416 |
417 | -- TOKEN
418 |
419 |
420 | {-| Parse exactly the given string, without any regard to what comes next.
421 |
422 | A potential pitfall when parsing keywords is getting tricked by variables that
423 | start with a keyword, like `let` in `letters` or `import` in `important`. This
424 | is especially likely if you have a whitespace parser that can consume zero
425 | characters. So the [`keyword`](#keyword) parser is defined with `token` and a
426 | trick to peek ahead a bit:
427 |
428 | keyword : String -> Parser ()
429 | keyword kwd =
430 | succeed identity
431 | |. backtrackable (token kwd)
432 | |= oneOf
433 | [ map (\_ -> True) (backtrackable (chompIf isVarChar))
434 | , succeed False
435 | ]
436 | |> andThen (checkEnding kwd)
437 |
438 | checkEnding : String -> Bool -> Parser ()
439 | checkEnding kwd isBadEnding =
440 | if isBadEnding then
441 | problem ("expecting the `" ++ kwd ++ "` keyword")
442 | else
443 | commit ()
444 |
445 | isVarChar : Char -> Bool
446 | isVarChar char =
447 | Char.isAlphaNum char || char == '_'
448 |
449 | This definition is specially designed so that (1) if you really see `let` you
450 | commit to that path and (2) if you see `letters` instead you can backtrack and
451 | try other options. If I had just put a `backtrackable` around the whole thing
452 | you would not get (1) anymore.
453 | -}
454 | token : String -> Parser ()
455 | token str =
456 | A.token (toToken str)
457 |
458 |
459 | toToken : String -> A.Token Problem
460 | toToken str =
461 | A.Token str (Expecting str)
462 |
463 |
464 |
465 | -- LOOPS
466 |
467 |
468 | {-| A parser that can loop indefinitely. This can be helpful when parsing
469 | repeated structures, like a bunch of statements:
470 |
471 | statements : Parser (List Stmt)
472 | statements =
473 | loop [] statementsHelp
474 |
475 | statementsHelp : List Stmt -> Parser (Step (List Stmt) (List Stmt))
476 | statementsHelp revStmts =
477 | oneOf
478 | [ succeed (\stmt -> Loop (stmt :: revStmts))
479 | |= statement
480 | |. spaces
481 | |. symbol ";"
482 | |. spaces
483 | , succeed ()
484 | |> map (\_ -> Done (List.reverse revStmts))
485 | ]
486 |
487 | -- statement : Parser Stmt
488 |
489 | Notice that the statements are tracked in reverse as we `Loop`, and we reorder
490 | them only once we are `Done`. This is a very common pattern with `loop`!
491 |
492 | Check out [`examples/DoubleQuoteString.elm`](https://github.com/elm/parser/blob/master/examples/DoubleQuoteString.elm)
493 | for another example.
494 |
495 | **IMPORTANT NOTE:** Parsers like `succeed ()` and `chompWhile Char.isAlpha` can
496 | succeed without consuming any characters. So in some cases you may want to use
497 | [`getOffset`](#getOffset) to ensure that each step actually consumed characters.
498 | Otherwise you could end up in an infinite loop!
499 |
500 | **Note:** Anything you can write with `loop`, you can also write as a parser
501 | that chomps some characters `andThen` calls itself with new arguments. The
502 | problem with calling `andThen` recursively is that it grows the stack, so you
503 | cannot do it indefinitely. So `loop` is important because it enables tail-call
504 | elimination, allowing you to parse however many repeats you want.
505 | -}
506 | loop : state -> (state -> Parser (Step state a)) -> Parser a
507 | loop state callback =
508 | A.loop state (\s -> map toAdvancedStep (callback s))
509 |
510 |
511 | {-| Decide what steps to take next in your [`loop`](#loop).
512 |
513 | If you are `Done`, you give the result of the whole `loop`. If you decide to
514 | `Loop` around again, you give a new state to work from. Maybe you need to add
515 | an item to a list? Or maybe you need to track some information about what you
516 | just saw?
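
A small sketch of both constructors in use, where the state is a running
count. The names `countAs` and `countAsHelp` are hypothetical:

```elm
import Parser exposing ((|.), Parser, Step(..), chompIf, loop, oneOf, run, succeed)


-- Counts how many `a` characters appear at the start of the input.
countAs : Parser Int
countAs =
    loop 0 countAsHelp


countAsHelp : Int -> Parser (Step Int Int)
countAsHelp count =
    oneOf
        [ -- Chomped one more `a`, so go around again with a bigger count.
          succeed (Loop (count + 1))
            |. chompIf (\c -> c == 'a')

        -- No `a` here, so the count so far is the result of the whole loop.
        , succeed (Done count)
        ]


-- run countAs "aaab" == Ok 3
-- run countAs "xyz"  == Ok 0
```

Each `Loop` step chomps exactly one character, so this loop cannot spin
forever on the same input position.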
517 |
518 | **Note:** It may be helpful to learn about [finite-state machines][fsm] to get
519 | a broader intuition about using `state`. For example, you may want to create a `type`
520 | that describes four possible states, and then use `Loop` to transition between
521 | them as you consume characters.
522 |
523 | [fsm]: https://en.wikipedia.org/wiki/Finite-state_machine
524 | -}
525 | type Step state a
526 | = Loop state
527 | | Done a
528 |
529 |
530 | toAdvancedStep : Step s a -> A.Step s a
531 | toAdvancedStep step =
532 | case step of
533 | Loop s -> A.Loop s
534 | Done a -> A.Done a
535 |
536 |
537 |
538 | -- NUMBERS
539 |
540 |
541 | {-| Parse integers.
542 |
543 | run int "1" == Ok 1
544 | run int "1234" == Ok 1234
545 |
546 | run int "-789" == Err ...
547 | run int "0123" == Err ...
548 | run int "1.34" == Err ...
549 | run int "1e31" == Err ...
550 | run int "123a" == Err ...
551 | run int "0x1A" == Err ...
552 |
553 | If you want to handle a leading `+` or `-` you should do it with a custom
554 | parser like this:
555 |
556 | myInt : Parser Int
557 | myInt =
558 | oneOf
559 | [ succeed negate
560 | |. symbol "-"
561 | |= int
562 | , int
563 | ]
564 |
565 | **Note:** If you want a parser for both `Int` and `Float` literals, check out
566 | [`number`](#number) below. It will be faster than using `oneOf` to combine
567 | `int` and `float` yourself.
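
The `myInt` example above only handles a leading `-`. A minimal sketch that
also accepts a leading `+` could look like this; `signedInt` is a hypothetical
name, not part of this module:

```elm
import Parser exposing ((|.), (|=), Parser, int, oneOf, run, succeed, symbol)


signedInt : Parser Int
signedInt =
    oneOf
        [ -- A leading minus sign negates the integer that follows.
          succeed negate
            |. symbol "-"
            |= int

        -- A leading plus sign is chomped and otherwise ignored.
        , succeed identity
            |. symbol "+"
            |= int
        , int
        ]


-- run signedInt "-42" == Ok -42
-- run signedInt "+42" == Ok 42
-- run signedInt "42"  == Ok 42
```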
568 | -}
569 | int : Parser Int
570 | int =
571 | A.int ExpectingInt ExpectingInt
572 |
573 |
574 | {-| Parse floats.
575 |
576 | run float "123" == Ok 123
577 | run float "3.1415" == Ok 3.1415
578 | run float "0.1234" == Ok 0.1234
579 | run float ".1234" == Ok 0.1234
580 | run float "1e-42" == Ok 1e-42
581 | run float "6.022e23" == Ok 6.022e23
582 | run float "6.022E23" == Ok 6.022e23
583 | run float "6.022e+23" == Ok 6.022e23
584 |
585 | If you want to disable literals like `.123` (like in Elm) you could write
586 | something like this:
587 |
588 | elmFloat : Parser Float
589 | elmFloat =
590 | oneOf
591 | [ symbol "."
592 | |> andThen (\_ -> problem "floating point numbers must start with a digit, like 0.25")
593 | , float
594 | ]
595 |
596 | **Note:** If you want a parser for both `Int` and `Float` literals, check out
597 | [`number`](#number) below. It will be faster than using `oneOf` to combine
598 | `int` and `float` yourself.
599 | -}
600 | float : Parser Float
601 | float =
602 | A.float ExpectingFloat ExpectingFloat
603 |
604 |
605 |
606 | -- NUMBER
607 |
608 |
609 | {-| Parse a bunch of different kinds of numbers without backtracking. A parser
610 | for Elm would need to handle integers, floats, and hexadecimal like this:
611 |
612 | type Expr
613 | = Variable String
614 | | Int Int
615 | | Float Float
616 | | Apply Expr Expr
617 |
618 | elmNumber : Parser Expr
619 | elmNumber =
620 | number
621 | { int = Just Int
622 | , hex = Just Int -- 0x001A is allowed
623 | , octal = Nothing -- 0o0731 is not
624 | , binary = Nothing -- 0b1101 is not
625 | , float = Just Float
626 | }
627 |
628 | If you wanted to implement the [`float`](#float) parser, it would be like this:
629 |
630 | float : Parser Float
631 | float =
632 | number
633 | { int = Just toFloat
634 | , hex = Nothing
635 | , octal = Nothing
636 | , binary = Nothing
637 | , float = Just identity
638 | }
639 |
640 | Notice that it actually is processing `int` results! This is because `123`
641 | looks like an integer to me, but maybe it looks like a float to you. If you had
642 | `int = Nothing`, floats would need a decimal like `1.0` in every case. If you
643 | like explicitness, that may actually be preferable!
644 |
645 | **Note:** This function does not check for weird trailing characters in the
646 | current implementation, so parsing `123abc` can succeed up to `123` and then
647 | move on. This is helpful for people who want to parse things like `40px` or
648 | `3m`, but it requires a bit of extra code to rule out trailing characters in
649 | other cases.
650 | -}
651 | number
652 | : { int : Maybe (Int -> a)
653 | , hex : Maybe (Int -> a)
654 | , octal : Maybe (Int -> a)
655 | , binary : Maybe (Int -> a)
656 | , float : Maybe (Float -> a)
657 | }
658 | -> Parser a
659 | number i =
660 | A.number
661 | { int = Result.fromMaybe ExpectingInt i.int
662 | , hex = Result.fromMaybe ExpectingHex i.hex
663 | , octal = Result.fromMaybe ExpectingOctal i.octal
664 | , binary = Result.fromMaybe ExpectingBinary i.binary
665 | , float = Result.fromMaybe ExpectingFloat i.float
666 | , invalid = ExpectingNumber
667 | , expecting = ExpectingNumber
668 | }
669 |
670 |
671 |
672 | -- SYMBOL
673 |
674 |
675 | {-| Parse symbols like `(` and `,`.
676 |
677 | run (symbol "[") "[" == Ok ()
678 | run (symbol "[") "4" == Err ... (ExpectingSymbol "[") ...
679 |
680 | **Note:** This is good for stuff like brackets and semicolons, but it probably
681 | should not be used for binary operators like `+` and `-` because you can find
682 | yourself in weird situations. For example, is `3--4` a typo? Or is it `3 - -4`?
683 | I have had better luck with `chompWhile isSymbol` and sorting out which
684 | operator it is afterwards.
685 | -}
686 | symbol : String -> Parser ()
687 | symbol str =
688 | A.symbol (A.Token str (ExpectingSymbol str))
689 |
690 |
691 |
692 | -- KEYWORD
693 |
694 |
695 | {-| Parse keywords like `let`, `case`, and `type`.
696 |
697 | run (keyword "let") "let" == Ok ()
698 | run (keyword "let") "var" == Err ... (ExpectingKeyword "let") ...
699 | run (keyword "let") "letters" == Err ... (ExpectingKeyword "let") ...
700 |
701 | **Note:** Notice the third case there! `keyword` actually looks ahead one
702 | character to make sure it is not a letter, number, or underscore. The goal is
703 | to help with parsers like this:
704 |
705 | succeed identity
706 | |. keyword "let"
707 | |. spaces
708 | |= elmVar
709 | |. spaces
710 | |. symbol "="
711 |
712 | The trouble is that `spaces` may chomp zero characters (to handle expressions
713 | like `[1,2]` and `[ 1 , 2 ]`) and in this case, it would mean `letters` could
714 | be parsed as `let ters` and then wonder where the equals sign is! Check out the
715 | [`token`](#token) docs if you need to customize this!
716 | -}
717 | keyword : String -> Parser ()
718 | keyword kwd =
719 | A.keyword (A.Token kwd (ExpectingKeyword kwd))
720 |
721 |
722 |
723 | -- END
724 |
725 |
726 | {-| Check if you have reached the end of the string you are parsing.
727 |
728 | justAnInt : Parser Int
729 | justAnInt =
730 | succeed identity
731 | |= int
732 | |. end
733 |
734 | -- run justAnInt "90210" == Ok 90210
735 | -- run justAnInt "1 + 2" == Err ...
736 | -- run int "1 + 2" == Ok 1
737 |
738 | Parsers can succeed without parsing the whole string. Ending your parser
739 | with `end` guarantees that you have successfully parsed the whole string.
740 | -}
741 | end : Parser ()
742 | end =
743 | A.end ExpectingEnd
744 |
745 |
746 |
747 | -- CHOMPED STRINGS
748 |
749 |
750 | {-| Sometimes parsers like `int` or `variable` cannot do exactly what you
751 | need. The "chomping" family of functions is meant for that case! Maybe you
752 | need to parse [valid PHP variables][php] like `$x` and `$txt`:
753 |
754 | php : Parser String
755 | php =
756 | getChompedString <|
757 | succeed ()
758 | |. chompIf (\c -> c == '$')
759 | |. chompIf (\c -> Char.isAlpha c || c == '_')
760 | |. chompWhile (\c -> Char.isAlphaNum c || c == '_')
761 |
762 | The idea is that you create a bunch of chompers that validate the underlying
763 | characters. Then `getChompedString` extracts the underlying `String` efficiently.
764 |
765 | **Note:** Maybe it is helpful to see how you can use [`getOffset`](#getOffset)
766 | and [`getSource`](#getSource) to implement this function:
767 |
768 | getChompedString : Parser a -> Parser String
769 | getChompedString parser =
770 | succeed String.slice
771 | |= getOffset
772 | |. parser
773 | |= getOffset
774 | |= getSource
775 |
776 | [php]: https://www.w3schools.com/php/php_variables.asp
777 | -}
778 | getChompedString : Parser a -> Parser String
779 | getChompedString =
780 | A.getChompedString
781 |
782 |
783 | {-| This works just like [`getChompedString`](#getChompedString) but gives
784 | a bit more flexibility. For example, maybe you want to parse Elm doc comments
785 | and get (1) the full comment and (2) all of the names listed in the docs.
786 |
787 | You could implement `mapChompedString` like this:
788 |
789 | mapChompedString : (String -> a -> b) -> Parser a -> Parser b
790 | mapChompedString func parser =
791 | succeed (\start value end src -> func (String.slice start end src) value)
792 | |= getOffset
793 | |= parser
794 | |= getOffset
795 | |= getSource
796 |
797 | -}
798 | mapChompedString : (String -> a -> b) -> Parser a -> Parser b
799 | mapChompedString =
800 | A.mapChompedString
801 |
802 |
803 |
804 | {-| Chomp one character if it passes the test.
805 |
806 | chompUpper : Parser ()
807 | chompUpper =
808 | chompIf Char.isUpper
809 |
810 | So this can chomp a character like `T` and produce a `()` value.
811 | -}
812 | chompIf : (Char -> Bool) -> Parser ()
813 | chompIf isGood =
814 | A.chompIf isGood UnexpectedChar
815 |
816 |
817 |
818 | {-| Chomp zero or more characters if they pass the test. This is commonly
819 | useful for chomping whitespace or variable names:
820 |
821 | whitespace : Parser ()
822 | whitespace =
823 | chompWhile (\c -> c == ' ' || c == '\t' || c == '\n' || c == '\r')
824 |
825 | elmVar : Parser String
826 | elmVar =
827 | getChompedString <|
828 | succeed ()
829 | |. chompIf Char.isLower
830 | |. chompWhile (\c -> Char.isAlphaNum c || c == '_')
831 |
832 | **Note:** a `chompWhile` parser always succeeds! This can lead to tricky
833 | situations, especially if you define your whitespace with it. In that case,
834 | you could accidentally interpret `letx` as the keyword `let` followed by
835 | "spaces" followed by the variable `x`. This is why the `keyword` and `number`
836 | parsers peek ahead, making sure they are not followed by anything unexpected.
837 | -}
838 | chompWhile : (Char -> Bool) -> Parser ()
839 | chompWhile =
840 | A.chompWhile
841 |
842 |
843 | {-| Chomp until you see a certain string. You could define C-style multi-line
844 | comments like this:
845 |
846 | comment : Parser ()
847 | comment =
848 | symbol "/*"
849 | |. chompUntil "*/"
850 |
851 | I recommend using [`multiComment`](#multiComment) for this particular scenario
852 | though. It can be trickier than it looks!
853 | -}
854 | chompUntil : String -> Parser ()
855 | chompUntil str =
856 | A.chompUntil (toToken str)
857 |
858 |
859 | {-| Chomp until you see a certain string or until you run out of characters to
860 | chomp! You could define single-line comments like this:
861 |
862 | elm : Parser ()
863 | elm =
864 | symbol "--"
865 | |. chompUntilEndOr "\n"
866 |
867 | A file may end with a single-line comment, so the file can end before you see
868 | a newline. Tricky!
869 |
870 | I recommend just using [`lineComment`](#lineComment) for this particular
871 | scenario.
872 | -}
873 | chompUntilEndOr : String -> Parser ()
874 | chompUntilEndOr =
875 | A.chompUntilEndOr
876 |
877 |
878 |
879 | -- INDENTATION
880 |
881 |
882 | {-| Some languages are indentation sensitive. Python cares about tabs. Elm
883 | cares about spaces sometimes. `withIndent` and `getIndent` allow you to manage
884 | "indentation state" yourself, in whatever way your scenario requires.
885 | -}
886 | withIndent : Int -> Parser a -> Parser a
887 | withIndent =
888 | A.withIndent
889 |
890 |
891 | {-| When someone said `withIndent` earlier, what number did they put in there?
892 |
893 | - `getIndent` results in `0`, the default value
894 | - `withIndent 4 getIndent` results in `4`
895 |
896 | So you are just asking about things you said earlier. These numbers do not leak
897 | out of `withIndent`, so say we have:
898 |
899 | succeed Tuple.pair
900 | |= withIndent 4 getIndent
901 | |= getIndent
902 |
903 | Assuming there are no `withIndent` above this, you would get `(4,0)` from this.
904 | -}
905 | getIndent : Parser Int
906 | getIndent =
907 | A.getIndent
908 |
909 |
910 |
911 | -- POSITION
912 |
913 |
914 | {-| Code editors treat code like a grid, with rows and columns. The start is
915 | `row=1` and `col=1`. As you chomp characters, the `col` increments. When you
916 | run into a `\n` character, the `row` increments and `col` goes back to `1`.
917 |
918 | In the Elm compiler, I track the start and end position of every expression
919 | like this:
920 |
921 | type alias Located a =
922 | { start : (Int, Int)
923 | , value : a
924 | , end : (Int, Int)
925 | }
926 |
927 | located : Parser a -> Parser (Located a)
928 | located parser =
929 | succeed Located
930 | |= getPosition
931 | |= parser
932 | |= getPosition
933 |
934 | So if there is a problem during type inference, I use this saved position
935 | information to underline the exact problem!
936 |
937 | **Note:** Tabs count as one character, so if you are parsing something like
938 | Python, I recommend sorting that out *after* parsing. So if I wanted the `^^^^`
939 | underline like in Elm, I would find the `row` in the source code and do
940 | something like this:
941 |
942 | makeUnderline : String -> Int -> Int -> String
943 | makeUnderline row minCol maxCol =
944 | String.toList row
945 | |> List.indexedMap (toUnderlineChar minCol maxCol)
946 | |> String.fromList
947 |
948 | toUnderlineChar : Int -> Int -> Int -> Char -> Char
949 | toUnderlineChar minCol maxCol col char =
950 | if minCol <= col && col <= maxCol then
951 | '^'
952 | else if char == '\t' then
953 | '\t'
954 | else
955 | ' '
956 |
957 | So it would preserve any tabs from the source line. There are tons of other
958 | ways to do this though. The point is just that you handle the tabs after
959 | parsing but before anyone looks at the numbers in a context where tabs may
960 | equal 2, 4, or 8.
961 | -}
962 | getPosition : Parser (Int, Int)
963 | getPosition =
964 | A.getPosition
965 |
966 |
967 | {-| This is a more efficient version of `map Tuple.first getPosition`. Maybe
968 | you just want to track the line number for some reason? This lets you do that.
969 |
970 | See [`getPosition`](#getPosition) for an explanation of rows and columns.
971 | -}
972 | getRow : Parser Int
973 | getRow =
974 | A.getRow
975 |
976 |
977 | {-| This is a more efficient version of `map Tuple.second getPosition`. This
978 | can be useful in combination with [`withIndent`](#withIndent) and
979 | [`getIndent`](#getIndent), like this:
980 |
981 | checkIndent : Parser ()
982 | checkIndent =
983 | succeed (\indent column -> indent <= column)
984 | |= getIndent
985 | |= getCol
986 | |> andThen checkIndentHelp
987 |
988 | checkIndentHelp : Bool -> Parser ()
989 | checkIndentHelp isIndented =
990 | if isIndented then
991 | succeed ()
992 | else
993 | problem "expecting more spaces"
994 |
995 | So the `checkIndent` parser only succeeds when you are "deeper" than the
996 | current indent level. You could use this to parse Elm-style `let` expressions.
997 | -}
998 | getCol : Parser Int
999 | getCol =
1000 | A.getCol
1001 |
1002 |
1003 | {-| Editors think of code as a grid, but behind the scenes it is just a flat
1004 | array of UTF-16 characters. `getOffset` tells you your index in that flat
1005 | array. So if you chomp `"\n\n\n\n"` you are on row 5, column 1, and offset 4.
1006 |
1007 | **Note:** JavaScript uses a somewhat odd version of UTF-16 strings, so a single
1008 | character may take two slots. So in JavaScript, `'abc'.length === 3` but
1009 | `'🙈🙉🙊'.length === 6`. Try it out! And since Elm runs in JavaScript, the offset
1010 | moves by those rules.
1011 | -}
1012 | getOffset : Parser Int
1013 | getOffset =
1014 | A.getOffset
1015 |
1016 |
1017 | {-| Get the full string that is being parsed. You could use this to define
1018 | `getChompedString` or `mapChompedString` if you wanted:
1019 |
1020 | getChompedString : Parser a -> Parser String
1021 | getChompedString parser =
1022 | succeed String.slice
1023 | |= getOffset
1024 | |. parser
1025 | |= getOffset
1026 | |= getSource
1027 | -}
1028 | getSource : Parser String
1029 | getSource =
1030 | A.getSource
1031 |
1032 |
1033 |
1034 | -- VARIABLES
1035 |
1036 |
1037 | {-| Create a parser for variables. If we wanted to parse type variables in Elm,
1038 | we could try something like this:
1039 |
1040 | import Char
1041 | import Parser exposing (..)
1042 | import Set
1043 |
1044 | typeVar : Parser String
1045 | typeVar =
1046 | variable
1047 | { start = Char.isLower
1048 | , inner = \c -> Char.isAlphaNum c || c == '_'
1049 | , reserved = Set.fromList [ "let", "in", "case", "of" ]
1050 | }
1051 |
1052 | This is saying it _must_ start with a lower-case character. After that,
1053 | characters can be letters, numbers, or underscores. It is also saying that if
1054 | you run into any of these reserved names, it is definitely not a variable.
1055 | -}
1056 | variable :
1057 | { start : Char -> Bool
1058 | , inner : Char -> Bool
1059 | , reserved : Set.Set String
1060 | }
1061 | -> Parser String
1062 | variable i =
1063 | A.variable
1064 | { start = i.start
1065 | , inner = i.inner
1066 | , reserved = i.reserved
1067 | , expecting = ExpectingVariable
1068 | }
1069 |
1070 |
1071 |
1072 | -- SEQUENCES
1073 |
1074 |
1075 | {-| Handle things like lists and records, but you can customize the details
1076 | however you need. Say you want to parse C-style code blocks:
1077 |
1078 | import Parser exposing (Parser, Trailing(..))
1079 |
1080 | block : Parser (List Stmt)
1081 | block =
1082 | Parser.sequence
1083 | { start = "{"
1084 | , separator = ";"
1085 | , end = "}"
1086 | , spaces = spaces
1087 | , item = statement
1088 | , trailing = Mandatory -- demand a trailing semi-colon
1089 | }
1090 |
1091 | -- statement : Parser Stmt
1092 |
1093 | **Note:** If you need something more custom, do not be afraid to check
1094 | out the implementation and customize it for your case. It is better to
1095 | get nice error messages with a lower-level implementation than to try
1096 | to hack high-level parsers to do things they are not made for.
1097 | -}
1098 | sequence
1099 | : { start : String
1100 | , separator : String
1101 | , end : String
1102 | , spaces : Parser ()
1103 | , item : Parser a
1104 | , trailing : Trailing
1105 | }
1106 | -> Parser (List a)
1107 | sequence i =
1108 | A.sequence
1109 | { start = toToken i.start
1110 | , separator = toToken i.separator
1111 | , end = toToken i.end
1112 | , spaces = i.spaces
1113 | , item = i.item
1114 | , trailing = toAdvancedTrailing i.trailing
1115 | }
1116 |
1117 |
1118 | {-| What’s the deal with trailing commas? Are they `Forbidden`?
1119 | Are they `Optional`? Are they `Mandatory`? Welcome to [shapes
1120 | club](https://poorlydrawnlines.com/comic/shapes-club/)!
1121 | -}
1122 | type Trailing = Forbidden | Optional | Mandatory
1123 |
1124 |
1125 | toAdvancedTrailing : Trailing -> A.Trailing
1126 | toAdvancedTrailing trailing =
1127 | case trailing of
1128 | Forbidden -> A.Forbidden
1129 | Optional -> A.Optional
1130 | Mandatory -> A.Mandatory
1131 |
1132 |
1133 |
1134 | -- WHITESPACE
1135 |
1136 |
1137 | {-| Parse zero or more `' '`, `'\n'`, and `'\r'` characters.
1138 |
1139 | The implementation is pretty simple:
1140 |
1141 | spaces : Parser ()
1142 | spaces =
1143 | chompWhile (\c -> c == ' ' || c == '\n' || c == '\r')
1144 |
1145 | So if you need something different (like tabs) just define an alternative with
1146 | the necessary tweaks! Check out [`lineComment`](#lineComment) and
1147 | [`multiComment`](#multiComment) for more complex situations.
1148 | -}
1149 | spaces : Parser ()
1150 | spaces =
1151 | A.spaces
1152 |
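{- NOTE: The `spaces` documentation suggests defining your own variant when you
need something different, such as tabs. A minimal sketch of that idea follows;
the name `spacesAndTabs` is hypothetical and not part of this module.

```elm
import Parser exposing (Parser, chompWhile)


-- Hypothetical helper: like `spaces`, but tabs also count as whitespace.
spacesAndTabs : Parser ()
spacesAndTabs =
    chompWhile (\c -> c == ' ' || c == '\t' || c == '\n' || c == '\r')
```

Like `spaces`, this can succeed without consuming anything, so the ordering
caveat described for `oneOf` in the `multiComment` docs applies to it as well.
-}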
1153 |
1154 | {-| Parse single-line comments:
1155 |
1156 | elm : Parser ()
1157 | elm =
1158 | lineComment "--"
1159 |
1160 | js : Parser ()
1161 | js =
1162 | lineComment "//"
1163 |
1164 | python : Parser ()
1165 | python =
1166 | lineComment "#"
1167 |
1168 | This parser is defined like this:
1169 |
1170 | lineComment : String -> Parser ()
1171 | lineComment str =
1172 | symbol str
1173 | |. chompUntilEndOr "\n"
1174 |
1175 | So it will consume the remainder of the line. If the file ends before you see
1176 | a newline, that is fine too.
1177 | -}
1178 | lineComment : String -> Parser ()
1179 | lineComment str =
1180 | A.lineComment (toToken str)
1181 |
1182 |
1183 | {-| Parse multi-line comments. So if you wanted to parse Elm whitespace or
1184 | JS whitespace, you could say:
1185 |
1186 | elm : Parser ()
1187 | elm =
1188 | loop 0 <| ifProgress <|
1189 | oneOf
1190 | [ lineComment "--"
1191 | , multiComment "{-" "-}" Nestable
1192 | , spaces
1193 | ]
1194 |
1195 | js : Parser ()
1196 | js =
1197 | loop 0 <| ifProgress <|
1198 | oneOf
1199 | [ lineComment "//"
1200 | , multiComment "/*" "*/" NotNestable
1201 | , chompWhile (\c -> c == ' ' || c == '\n' || c == '\r' || c == '\t')
1202 | ]
1203 |
1204 | ifProgress : Parser a -> Int -> Parser (Step Int ())
1205 | ifProgress parser offset =
1206 | succeed identity
1207 | |. parser
1208 | |= getOffset
1209 | |> map (\newOffset -> if offset == newOffset then Done () else Loop newOffset)
1210 |
1211 | **Note:** The fact that `spaces` comes last in the definition of `elm` is very
1212 | important! It can succeed without consuming any characters, so if it were the
1213 | first option, it would always succeed and bypass the others! (Same is true of
1214 | `chompWhile` in `js`.) This possibility of success without consumption is also
1215 | why we need the `ifProgress` helper. It detects if there is no more whitespace
1216 | to consume.
1217 | -}
1218 | multiComment : String -> String -> Nestable -> Parser ()
1219 | multiComment open close nestable =
1220 | A.multiComment (toToken open) (toToken close) (toAdvancedNestable nestable)
1221 |
1222 |
1223 | {-| Not all languages handle multi-line comments the same. Multi-line comments
1224 | in C-style syntax are `NotNestable`, meaning they can be implemented like this:
1225 |
1226 | js : Parser ()
1227 | js =
1228 | symbol "/*"
1229 | |. chompUntil "*/"
1230 |
1231 | In fact, `multiComment "/*" "*/" NotNestable` *is* implemented like that! It is
1232 | very simple, but it does not allow you to nest comments like this:
1233 |
1234 | ```javascript
1235 | /*
1236 | line1
1237 | /* line2 */
1238 | line3
1239 | */
1240 | ```
1241 |
1242 | It would stop on the first `*/`, eventually throwing a syntax error on the
1243 | second `*/`. This can be pretty annoying in long files.
1244 |
1245 | Languages like Elm allow you to nest multi-line comments, but your parser needs
1246 | to be a bit fancier to handle this. After you start a comment, you have to
1247 | detect if there is another one inside it! And then you have to make sure all
1248 | the `{-` and `-}` match up properly! Saying `multiComment "{-" "-}" Nestable`
1249 | does all that for you.
1250 | -}
1251 | type Nestable = NotNestable | Nestable
1252 |
1253 |
1254 | toAdvancedNestable : Nestable -> A.Nestable
1255 | toAdvancedNestable nestable =
1256 | case nestable of
1257 | NotNestable -> A.NotNestable
1258 | Nestable -> A.Nestable
1259 |
--------------------------------------------------------------------------------
/src/Parser/Advanced.elm:
--------------------------------------------------------------------------------
1 | module Parser.Advanced exposing
2 | ( Parser, run, DeadEnd, inContext, Token(..)
3 | , int, float, number, symbol, keyword, variable, end
4 | , succeed, (|=), (|.), lazy, andThen, problem
5 | , oneOf, map, backtrackable, commit, token
6 | , sequence, Trailing(..), loop, Step(..)
7 | , spaces, lineComment, multiComment, Nestable(..)
8 | , getChompedString, chompIf, chompWhile, chompUntil, chompUntilEndOr, mapChompedString
9 | , withIndent, getIndent
10 | , getPosition, getRow, getCol, getOffset, getSource
11 | )
12 |
13 |
14 | {-|
15 |
16 | # Parsers
17 | @docs Parser, run, DeadEnd, inContext, Token
18 |
19 | * * *
20 | **Everything past here works just like in the
21 | [`Parser`](/packages/elm/parser/latest/Parser) module, except that `String`
22 | arguments become `Token` arguments, and you need to provide a `Problem` for
23 | certain scenarios.**
24 | * * *
25 |
26 | # Building Blocks
27 | @docs int, float, number, symbol, keyword, variable, end
28 |
29 | # Pipelines
30 | @docs succeed, (|=), (|.), lazy, andThen, problem
31 |
32 | # Branches
33 | @docs oneOf, map, backtrackable, commit, token
34 |
35 | # Loops
36 | @docs sequence, Trailing, loop, Step
37 |
38 | # Whitespace
39 | @docs spaces, lineComment, multiComment, Nestable
40 |
41 | # Chompers
42 | @docs getChompedString, chompIf, chompWhile, chompUntil, chompUntilEndOr, mapChompedString
43 |
44 | # Indentation
45 | @docs withIndent, getIndent
46 |
47 | # Positions
48 | @docs getPosition, getRow, getCol, getOffset, getSource
49 | -}
50 |
51 |
52 | import Char
53 | import Elm.Kernel.Parser
54 | import Set
55 |
56 |
57 |
58 | -- INFIX OPERATORS
59 |
60 |
61 | infix left 5 (|=) = keeper
62 | infix left 6 (|.) = ignorer
63 |
64 |
65 | {- NOTE: the (|.) operator binds tighter to slightly reduce the amount
66 | of recursion in pipelines. For example:
67 |
68 | func
69 | |. a
70 | |. b
71 | |= c
72 | |. d
73 | |. e
74 |
75 | With the same precedence:
76 |
77 | (ignorer (ignorer (keeper (ignorer (ignorer func a) b) c) d) e)
78 |
79 | With higher precedence:
80 |
81 | keeper (ignorer (ignorer func a) b) (ignorer (ignorer c d) e)
82 |
83 | So the maximum call depth goes from 5 to 3.
84 | -}
85 |
86 |
87 |
88 | -- PARSERS
89 |
90 |
91 | {-| An advanced `Parser` gives two ways to improve your error messages:
92 |
93 | - `problem` — Instead of all errors being a `String`, you can create a
94 | custom type like `type Problem = BadIndent | BadKeyword String` and track
95 | problems much more precisely.
96 | - `context` — Error messages can be further improved when precise
97 | problems are paired with information about where you ran into trouble. By
98 | tracking the context, instead of saying “I found a bad keyword” you can say
99 | “I found a bad keyword when parsing a list” and give folks a better idea of
100 | what the parser thinks it is doing.
101 |
102 | I recommend starting with the simpler [`Parser`][parser] module though, and
103 | when you feel comfortable and want better error messages, you can create a type
104 | alias like this:
105 |
106 | ```elm
107 | import Parser.Advanced
108 |
109 | type alias MyParser a =
110 | Parser.Advanced.Parser Context Problem a
111 |
112 | type Context = Definition String | List | Record
113 |
114 | type Problem = BadIndent | BadKeyword String
115 | ```
116 |
117 | All of the functions from `Parser` should exist in `Parser.Advanced` in some
118 | form, allowing you to switch over pretty easily.
119 |
120 | [parser]: /packages/elm/parser/latest/Parser
121 | -}
122 | type Parser context problem value =
123 | Parser (State context -> PStep context problem value)
124 |
125 |
126 | type PStep context problem value
127 | = Good Bool value (State context)
128 | | Bad Bool (Bag context problem)
129 |
130 |
131 | type alias State context =
132 | { src : String
133 | , offset : Int
134 | , indent : Int
135 | , context : List (Located context)
136 | , row : Int
137 | , col : Int
138 | }
139 |
140 |
141 | type alias Located context =
142 | { row : Int
143 | , col : Int
144 | , context : context
145 | }
146 |
147 |
148 |
149 | -- RUN
150 |
151 |
152 | {-| This works just like [`Parser.run`](/packages/elm/parser/latest/Parser#run).
153 | The only difference is that when it fails, it has much more precise information
154 | for each dead end.
155 | -}
156 | run : Parser c x a -> String -> Result (List (DeadEnd c x)) a
157 | run (Parser parse) src =
158 | case parse { src = src, offset = 0, indent = 1, context = [], row = 1, col = 1} of
159 | Good _ value _ ->
160 | Ok value
161 |
162 | Bad _ bag ->
163 | Err (bagToList bag [])
164 |
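{- NOTE: A small usage sketch for `run`, written from the point of view of a
package user. The `Problem` type and the name `wholeInt` are assumptions made
up for this example; the expected results follow from reading the definitions
of `int` and `end` in this file.

```elm
import Parser.Advanced as A exposing ((|.), (|=))


-- Hypothetical problem type for this example.
type Problem
    = ExpectingInt
    | InvalidNumber
    | ExpectingEnd


-- Parse an integer that must span the whole input.
wholeInt : A.Parser c Problem Int
wholeInt =
    A.succeed identity
        |= A.int ExpectingInt InvalidNumber
        |. A.end ExpectingEnd


-- A.run wholeInt "42"
--     should give: Ok 42
--
-- A.run wholeInt "42x"
--     should give a single dead end at row 1, col 3 with
--     problem = ExpectingEnd and an empty contextStack
```
-}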
165 |
166 |
167 | -- PROBLEMS
168 |
169 |
170 | {-| Say you are parsing a function named `viewHealthData` that contains a list.
171 | You might get a `DeadEnd` like this:
172 |
173 | ```elm
174 | { row = 18
175 | , col = 22
176 | , problem = UnexpectedComma
177 | , contextStack =
178 | [ { row = 15
179 | , col = 4
180 | , context = List
181 | }
182 | , { row = 14
183 | , col = 1
184 | , context = Definition "viewHealthData"
185 | }
186 | ]
187 | }
188 | ```
189 |
190 | We have a ton of information here! So in the error message, we can say that “I
191 | ran into an issue when parsing a list in the definition of `viewHealthData`. It
192 | looks like there is an extra comma.” Or maybe something even better!
193 |
194 | Furthermore, many parsers just put a mark where the problem manifested. By
195 | tracking the `row` and `col` of the context, we can show a much larger region
196 | as a way of indicating “I thought I was parsing this thing that starts over
197 | here.” Otherwise you can get very confusing error messages on a missing `]` or
198 | `}` or `)` because “I need more indentation” on something unrelated.
199 |
200 | **Note:** Rows and columns are counted like a text editor. The beginning is `row=1`
201 | and `col=1`. The `col` increments as characters are chomped. When a `\n` is chomped,
202 | `row` is incremented and `col` starts over again at `1`.
203 | -}
204 | type alias DeadEnd context problem =
205 | { row : Int
206 | , col : Int
207 | , problem : problem
208 | , contextStack : List { row : Int, col : Int, context : context }
209 | }
210 |
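{- NOTE: A rough sketch of turning a `DeadEnd` into a message. All names here
(`describe`, `contextToString`, `problemToString`, and the `Context` and
`Problem` types) are hypothetical. It assumes the head of `contextStack` is
the innermost context, which is how `inContext` in this file builds the list.

```elm
import Parser.Advanced exposing (DeadEnd)


type Context
    = Definition String
    | List


type Problem
    = UnexpectedComma


-- Hypothetical renderer for a single dead end.
describe : DeadEnd Context Problem -> String
describe deadEnd =
    let
        location =
            case deadEnd.contextStack of
                [] ->
                    "at the top level"

                frames ->
                    frames
                        |> List.map (\frame -> contextToString frame.context)
                        |> String.join ", inside "
                        |> (++) "while parsing "
    in
    problemToString deadEnd.problem
        ++ " "
        ++ location
        ++ " (row "
        ++ String.fromInt deadEnd.row
        ++ ", col "
        ++ String.fromInt deadEnd.col
        ++ ")"


contextToString : Context -> String
contextToString context =
    case context of
        Definition name ->
            "the definition of `" ++ name ++ "`"

        List ->
            "a list"


problemToString : Problem -> String
problemToString problem =
    case problem of
        UnexpectedComma ->
            "I found an extra comma"
```

A real renderer would probably also use the `row` and `col` stored in each
context frame to highlight the enclosing region.
-}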
211 |
212 | type Bag c x
213 | = Empty
214 | | AddRight (Bag c x) (DeadEnd c x)
215 | | Append (Bag c x) (Bag c x)
216 |
217 |
218 | fromState : State c -> x -> Bag c x
219 | fromState s x =
220 | AddRight Empty (DeadEnd s.row s.col x s.context)
221 |
222 |
223 | fromInfo : Int -> Int -> x -> List (Located c) -> Bag c x
224 | fromInfo row col x context =
225 | AddRight Empty (DeadEnd row col x context)
226 |
227 |
228 | bagToList : Bag c x -> List (DeadEnd c x) -> List (DeadEnd c x)
229 | bagToList bag list =
230 | case bag of
231 | Empty ->
232 | list
233 |
234 | AddRight bag1 x ->
235 | bagToList bag1 (x :: list)
236 |
237 | Append bag1 bag2 ->
238 | bagToList bag1 (bagToList bag2 list)
239 |
240 |
241 |
242 | -- PRIMITIVES
243 |
244 |
245 | {-| Just like [`Parser.succeed`](Parser#succeed)
246 | -}
247 | succeed : a -> Parser c x a
248 | succeed a =
249 | Parser <| \s ->
250 | Good False a s
251 |
252 |
253 | {-| Just like [`Parser.problem`](Parser#problem) except you provide a custom
254 | type for your problem.
255 | -}
256 | problem : x -> Parser c x a
257 | problem x =
258 | Parser <| \s ->
259 | Bad False (fromState s x)
260 |
261 |
262 |
263 | -- MAPPING
264 |
265 |
266 | {-| Just like [`Parser.map`](Parser#map)
267 | -}
268 | map : (a -> b) -> Parser c x a -> Parser c x b
269 | map func (Parser parse) =
270 | Parser <| \s0 ->
271 | case parse s0 of
272 | Good p a s1 ->
273 | Good p (func a) s1
274 |
275 | Bad p x ->
276 | Bad p x
277 |
278 |
279 | map2 : (a -> b -> value) -> Parser c x a -> Parser c x b -> Parser c x value
280 | map2 func (Parser parseA) (Parser parseB) =
281 | Parser <| \s0 ->
282 | case parseA s0 of
283 | Bad p x ->
284 | Bad p x
285 |
286 | Good p1 a s1 ->
287 | case parseB s1 of
288 | Bad p2 x ->
289 | Bad (p1 || p2) x
290 |
291 | Good p2 b s2 ->
292 | Good (p1 || p2) (func a b) s2
293 |
294 |
295 | {-| Just like the [`(|=)`](Parser#|=) from the `Parser` module.
296 | -}
297 | keeper : Parser c x (a -> b) -> Parser c x a -> Parser c x b
298 | keeper parseFunc parseArg =
299 | map2 (<|) parseFunc parseArg
300 |
301 |
302 | {-| Just like the [`(|.)`](Parser#|.) from the `Parser` module.
303 | -}
304 | ignorer : Parser c x keep -> Parser c x ignore -> Parser c x keep
305 | ignorer keepParser ignoreParser =
306 | map2 always keepParser ignoreParser
307 |
308 |
309 |
310 | -- AND THEN
311 |
312 |
313 | {-| Just like [`Parser.andThen`](Parser#andThen)
314 | -}
315 | andThen : (a -> Parser c x b) -> Parser c x a -> Parser c x b
316 | andThen callback (Parser parseA) =
317 | Parser <| \s0 ->
318 | case parseA s0 of
319 | Bad p x ->
320 | Bad p x
321 |
322 | Good p1 a s1 ->
323 | let
324 | (Parser parseB) =
325 | callback a
326 | in
327 | case parseB s1 of
328 | Bad p2 x ->
329 | Bad (p1 || p2) x
330 |
331 | Good p2 b s2 ->
332 | Good (p1 || p2) b s2
333 |
334 |
335 |
336 | -- LAZY
337 |
338 |
339 | {-| Just like [`Parser.lazy`](Parser#lazy)
340 | -}
341 | lazy : (() -> Parser c x a) -> Parser c x a
342 | lazy thunk =
343 | Parser <| \s ->
344 | let
345 | (Parser parse) =
346 | thunk ()
347 | in
348 | parse s
349 |
350 |
351 |
352 | -- ONE OF
353 |
354 |
355 | {-| Just like [`Parser.oneOf`](Parser#oneOf)
356 | -}
357 | oneOf : List (Parser c x a) -> Parser c x a
358 | oneOf parsers =
359 | Parser <| \s -> oneOfHelp s Empty parsers
360 |
361 |
362 | oneOfHelp : State c -> Bag c x -> List (Parser c x a) -> PStep c x a
363 | oneOfHelp s0 bag parsers =
364 | case parsers of
365 | [] ->
366 | Bad False bag
367 |
368 | Parser parse :: remainingParsers ->
369 | case parse s0 of
370 | Good _ _ _ as step ->
371 | step
372 |
373 | Bad p x as step ->
374 | if p then
375 | step
376 | else
377 | oneOfHelp s0 (Append bag x) remainingParsers
378 |
379 |
380 |
381 | -- LOOP
382 |
383 |
384 | {-| Just like [`Parser.Step`](Parser#Step)
385 | -}
386 | type Step state a
387 | = Loop state
388 | | Done a
389 |
390 |
391 | {-| Just like [`Parser.loop`](Parser#loop)
392 | -}
393 | loop : state -> (state -> Parser c x (Step state a)) -> Parser c x a
394 | loop state callback =
395 | Parser <| \s ->
396 | loopHelp False state callback s
397 |
398 |
399 | loopHelp : Bool -> state -> (state -> Parser c x (Step state a)) -> State c -> PStep c x a
400 | loopHelp p state callback s0 =
401 | let
402 | (Parser parse) =
403 | callback state
404 | in
405 | case parse s0 of
406 | Good p1 step s1 ->
407 | case step of
408 | Loop newState ->
409 | loopHelp (p || p1) newState callback s1
410 |
411 | Done result ->
412 | Good (p || p1) result s1
413 |
414 | Bad p1 x ->
415 | Bad (p || p1) x
416 |
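{- NOTE: An illustrative sketch of `loop` with custom problems. The names
`ints` and `intsHelp` and the `Problem` type are assumptions for this example.
The state is the list of integers seen so far, kept in reverse order.

```elm
import Parser.Advanced as A exposing ((|.), (|=), Step(..), Token(..))


-- Hypothetical problem type for this example.
type Problem
    = ExpectingInt
    | InvalidNumber
    | ExpectingComma


-- Parse one or more comma-separated integers, such as "1,2,3".
ints : A.Parser c Problem (List Int)
ints =
    A.int ExpectingInt InvalidNumber
        |> A.andThen (\first -> A.loop [ first ] intsHelp)


intsHelp : List Int -> A.Parser c Problem (Step (List Int) (List Int))
intsHelp revInts =
    A.oneOf
        [ -- A comma commits us to parsing another integer.
          A.succeed (\n -> Loop (n :: revInts))
            |. A.symbol (Token "," ExpectingComma)
            |= A.int ExpectingInt InvalidNumber

        -- No comma: stop and restore the original order.
        , A.succeed (Done (List.reverse revInts))
        ]
```

The second branch never fails and consumes nothing, so it has to come last in
the `oneOf`.
-}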
417 |
418 |
419 | -- BACKTRACKABLE
420 |
421 |
422 | {-| Just like [`Parser.backtrackable`](Parser#backtrackable)
423 | -}
424 | backtrackable : Parser c x a -> Parser c x a
425 | backtrackable (Parser parse) =
426 | Parser <| \s0 ->
427 | case parse s0 of
428 | Bad _ x ->
429 | Bad False x
430 |
431 | Good _ a s1 ->
432 | Good False a s1
433 |
434 |
435 | {-| Just like [`Parser.commit`](Parser#commit)
436 | -}
437 | commit : a -> Parser c x a
438 | commit a =
439 | Parser <| \s -> Good True a s
440 |
441 |
442 |
443 | -- SYMBOL
444 |
445 |
446 | {-| Just like [`Parser.symbol`](Parser#symbol) except you provide a `Token` to
447 | clearly indicate your custom type of problems:
448 |
449 | comma : Parser Context Problem ()
450 | comma =
451 | symbol (Token "," ExpectingComma)
452 |
453 | -}
454 | symbol : Token x -> Parser c x ()
455 | symbol =
456 | token
457 |
458 |
459 |
460 | -- KEYWORD
461 |
462 |
463 | {-| Just like [`Parser.keyword`](Parser#keyword) except you provide a `Token`
464 | to clearly indicate your custom type of problems:
465 |
466 | let_ : Parser Context Problem ()
467 | let_ =
468 | keyword (Token "let" ExpectingLet)
469 |
470 | Note that this would fail to chomp `letter` because of the subsequent
471 | characters. Use `token` if you do not want that last letter check.
472 | -}
473 | keyword : Token x -> Parser c x ()
474 | keyword (Token kwd expecting) =
475 | let
476 | progress =
477 | not (String.isEmpty kwd)
478 | in
479 | Parser <| \s ->
480 | let
481 | (newOffset, newRow, newCol) =
482 | isSubString kwd s.offset s.row s.col s.src
483 | in
484 | if newOffset == -1 || 0 <= isSubChar (\c -> Char.isAlphaNum c || c == '_') newOffset s.src then
485 | Bad False (fromState s expecting)
486 | else
487 | Good progress ()
488 | { src = s.src
489 | , offset = newOffset
490 | , indent = s.indent
491 | , context = s.context
492 | , row = newRow
493 | , col = newCol
494 | }
495 |
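{- NOTE: A short sketch contrasting `keyword` with `token`, based on the
definitions in this file. The names `letKeyword` and `letToken` and the
`Problem` type are hypothetical.

```elm
import Parser.Advanced as A exposing (Token(..))


-- Hypothetical problem type for this example.
type Problem
    = ExpectingLet


-- Rejects "let" when a letter, digit, or underscore follows it.
letKeyword : A.Parser c Problem ()
letKeyword =
    A.keyword (Token "let" ExpectingLet)


-- Accepts "let" no matter what follows.
letToken : A.Parser c Problem ()
letToken =
    A.token (Token "let" ExpectingLet)


-- A.run letKeyword "let x"    should succeed
-- A.run letKeyword "letter"   should fail with ExpectingLet at row 1, col 1
-- A.run letToken "letter"     should succeed after chomping "let"
```
-}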
496 |
497 |
498 | -- TOKEN
499 |
500 |
501 | {-| With the simpler `Parser` module, you could just say `symbol ","` and
502 | parse all the commas you wanted. But now that we have a custom type for our
503 | problems, we actually have to specify that as well. So anywhere you just used
504 | a `String` in the simpler module, you now use a `Token Problem` in the advanced
505 | module:
506 |
507 | type Problem
508 | = ExpectingComma
509 | | ExpectingListEnd
510 |
511 | comma : Token Problem
512 | comma =
513 | Token "," ExpectingComma
514 |
515 | listEnd : Token Problem
516 | listEnd =
517 | Token "]" ExpectingListEnd
518 |
519 | You can be creative with your custom type. Maybe you want a lot of detail.
520 | Maybe you want looser categories. It is a custom type. Do what makes sense for
521 | you!
522 | -}
523 | type Token x = Token String x
524 |
525 |
526 | {-| Just like [`Parser.token`](Parser#token) except you provide a `Token`
527 | specifying your custom type of problems.
528 | -}
529 | token : Token x -> Parser c x ()
530 | token (Token str expecting) =
531 | let
532 | progress =
533 | not (String.isEmpty str)
534 | in
535 | Parser <| \s ->
536 | let
537 | (newOffset, newRow, newCol) =
538 | isSubString str s.offset s.row s.col s.src
539 | in
540 | if newOffset == -1 then
541 | Bad False (fromState s expecting)
542 | else
543 | Good progress ()
544 | { src = s.src
545 | , offset = newOffset
546 | , indent = s.indent
547 | , context = s.context
548 | , row = newRow
549 | , col = newCol
550 | }
551 |
552 |
553 |
554 | -- INT
555 |
556 |
557 | {-| Just like [`Parser.int`](Parser#int) where you have to handle negation
558 | yourself. The only difference is that you provide two potential problems:
559 |
560 | int : x -> x -> Parser c x Int
561 | int expecting invalid =
562 | number
563 | { int = Ok identity
564 | , hex = Err invalid
565 | , octal = Err invalid
566 | , binary = Err invalid
567 | , float = Err invalid
568 | , invalid = invalid
569 | , expecting = expecting
570 | }
571 |
572 | You can use problems like `ExpectingInt` and `InvalidNumber`.
573 | -}
574 | int : x -> x -> Parser c x Int
575 | int expecting invalid =
576 | number
577 | { int = Ok identity
578 | , hex = Err invalid
579 | , octal = Err invalid
580 | , binary = Err invalid
581 | , float = Err invalid
582 | , invalid = invalid
583 | , expecting = expecting
584 | }
585 |
586 |
587 |
588 | -- FLOAT
589 |
590 |
591 | {-| Just like [`Parser.float`](Parser#float) where you have to handle negation
592 | yourself. The only difference is that you provide two potential problems:
593 |
594 | float : x -> x -> Parser c x Float
595 | float expecting invalid =
596 | number
597 | { int = Ok toFloat
598 | , hex = Err invalid
599 | , octal = Err invalid
600 | , binary = Err invalid
601 | , float = Ok identity
602 | , invalid = invalid
603 | , expecting = expecting
604 | }
605 |
606 | You can use problems like `ExpectingFloat` and `InvalidNumber`.
607 | -}
608 | float : x -> x -> Parser c x Float
609 | float expecting invalid =
610 | number
611 | { int = Ok toFloat
612 | , hex = Err invalid
613 | , octal = Err invalid
614 | , binary = Err invalid
615 | , float = Ok identity
616 | , invalid = invalid
617 | , expecting = expecting
618 | }
619 |
620 |
621 |
622 | -- NUMBER
623 |
624 |
625 | {-| Just like [`Parser.number`](Parser#number) where you have to handle
626 | negation yourself. The only difference is that you provide all the potential
627 | problems.
628 | -}
629 | number
630 | : { int : Result x (Int -> a)
631 | , hex : Result x (Int -> a)
632 | , octal : Result x (Int -> a)
633 | , binary : Result x (Int -> a)
634 | , float : Result x (Float -> a)
635 | , invalid : x
636 | , expecting : x
637 | }
638 | -> Parser c x a
639 | number c =
640 | Parser <| \s ->
641 | if isAsciiCode 0x30 {- 0 -} s.offset s.src then
642 | let
643 | zeroOffset = s.offset + 1
644 | baseOffset = zeroOffset + 1
645 | in
646 | if isAsciiCode 0x78 {- x -} zeroOffset s.src then
647 | finalizeInt c.invalid c.hex baseOffset (consumeBase16 baseOffset s.src) s
648 | else if isAsciiCode 0x6F {- o -} zeroOffset s.src then
649 | finalizeInt c.invalid c.octal baseOffset (consumeBase 8 baseOffset s.src) s
650 | else if isAsciiCode 0x62 {- b -} zeroOffset s.src then
651 | finalizeInt c.invalid c.binary baseOffset (consumeBase 2 baseOffset s.src) s
652 | else
653 | finalizeFloat c.invalid c.expecting c.int c.float (zeroOffset, 0) s
654 |
655 | else
656 | finalizeFloat c.invalid c.expecting c.int c.float (consumeBase 10 s.offset s.src) s
657 |
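{- NOTE: A sketch of configuring `number` directly instead of going through
`int` or `float`. The `Literal` and `Problem` types and the name `literal`
are assumptions for this example. Each field is `Ok` with a conversion
function when that format is allowed, or `Err` with the problem to report
when it is not.

```elm
import Parser.Advanced as A


-- Hypothetical result type: keep integers and floats apart.
type Literal
    = IntLit Int
    | FloatLit Float


-- Hypothetical problem type for this example.
type Problem
    = ExpectingNumber
    | InvalidNumber
    | OctalNotAllowed
    | BinaryNotAllowed


-- Accept decimal integers, hex integers, and floats; reject octal and binary.
literal : A.Parser c Problem Literal
literal =
    A.number
        { int = Ok IntLit
        , hex = Ok IntLit
        , octal = Err OctalNotAllowed
        , binary = Err BinaryNotAllowed
        , float = Ok FloatLit
        , invalid = InvalidNumber
        , expecting = ExpectingNumber
        }
```

As with `int` and `float`, a leading minus sign is not handled here and would
need its own parser.
-}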
658 |
659 | consumeBase : Int -> Int -> String -> (Int, Int)
660 | consumeBase =
661 | Elm.Kernel.Parser.consumeBase
662 |
663 |
664 | consumeBase16 : Int -> String -> (Int, Int)
665 | consumeBase16 =
666 | Elm.Kernel.Parser.consumeBase16
667 |
668 |
669 | finalizeInt : x -> Result x (Int -> a) -> Int -> (Int, Int) -> State c -> PStep c x a
670 | finalizeInt invalid handler startOffset (endOffset, n) s =
671 | case handler of
672 | Err x ->
673 | Bad True (fromState s x)
674 |
675 | Ok toValue ->
676 | if startOffset == endOffset
677 | then Bad (s.offset < startOffset) (fromState s invalid)
678 | else Good True (toValue n) (bumpOffset endOffset s)
679 |
680 |
681 | bumpOffset : Int -> State c -> State c
682 | bumpOffset newOffset s =
683 | { src = s.src
684 | , offset = newOffset
685 | , indent = s.indent
686 | , context = s.context
687 | , row = s.row
688 | , col = s.col + (newOffset - s.offset)
689 | }
690 |
691 |
692 | finalizeFloat : x -> x -> Result x (Int -> a) -> Result x (Float -> a) -> (Int, Int) -> State c -> PStep c x a
693 | finalizeFloat invalid expecting intSettings floatSettings intPair s =
694 | let
695 | intOffset = Tuple.first intPair
696 | floatOffset = consumeDotAndExp intOffset s.src
697 | in
698 | if floatOffset < 0 then
699 | Bad True (fromInfo s.row (s.col - (floatOffset + s.offset)) invalid s.context)
700 |
701 | else if s.offset == floatOffset then
702 | Bad False (fromState s expecting)
703 |
704 | else if intOffset == floatOffset then
705 | finalizeInt invalid intSettings s.offset intPair s
706 |
707 | else
708 | case floatSettings of
709 | Err x ->
710 | Bad True (fromState s invalid)
711 |
712 | Ok toValue ->
713 | case String.toFloat (String.slice s.offset floatOffset s.src) of
714 | Nothing -> Bad True (fromState s invalid)
715 | Just n -> Good True (toValue n) (bumpOffset floatOffset s)
716 |
717 |
718 | --
719 | -- On a failure, returns negative index of problem.
720 | --
721 | consumeDotAndExp : Int -> String -> Int
722 | consumeDotAndExp offset src =
723 | if isAsciiCode 0x2E {- . -} offset src then
724 | consumeExp (chompBase10 (offset + 1) src) src
725 | else
726 | consumeExp offset src
727 |
728 |
729 | --
730 | -- On a failure, returns negative index of problem.
731 | --
732 | consumeExp : Int -> String -> Int
733 | consumeExp offset src =
734 | if isAsciiCode 0x65 {- e -} offset src || isAsciiCode 0x45 {- E -} offset src then
735 | let
736 | eOffset = offset + 1
737 |
738 | expOffset =
739 | if isAsciiCode 0x2B {- + -} eOffset src || isAsciiCode 0x2D {- - -} eOffset src then
740 | eOffset + 1
741 | else
742 | eOffset
743 |
744 | newOffset = chompBase10 expOffset src
745 | in
746 | if expOffset == newOffset then
747 | -newOffset
748 | else
749 | newOffset
750 |
751 | else
752 | offset
753 |
754 |
755 | chompBase10 : Int -> String -> Int
756 | chompBase10 =
757 | Elm.Kernel.Parser.chompBase10
758 |
759 |
760 |
761 | -- END
762 |
763 |
764 | {-| Just like [`Parser.end`](Parser#end) except you provide the problem that
765 | arises when the parser is not at the end of the input.
766 | -}
767 | end : x -> Parser c x ()
768 | end x =
769 | Parser <| \s ->
770 | if String.length s.src == s.offset then
771 | Good False () s
772 | else
773 | Bad False (fromState s x)
774 |
775 |
776 |
777 | -- CHOMPED STRINGS
778 |
779 |
780 | {-| Just like [`Parser.getChompedString`](Parser#getChompedString)
781 | -}
782 | getChompedString : Parser c x a -> Parser c x String
783 | getChompedString parser =
784 | mapChompedString always parser
785 |
786 |
787 | {-| Just like [`Parser.mapChompedString`](Parser#mapChompedString)
788 | -}
789 | mapChompedString : (String -> a -> b) -> Parser c x a -> Parser c x b
790 | mapChompedString func (Parser parse) =
791 | Parser <| \s0 ->
792 | case parse s0 of
793 | Bad p x ->
794 | Bad p x
795 |
796 | Good p a s1 ->
797 | Good p (func (String.slice s0.offset s1.offset s0.src) a) s1
798 |
799 |
800 |
801 | -- CHOMP IF
802 |
803 |
804 | {-| Just like [`Parser.chompIf`](Parser#chompIf) except you provide a problem
805 | in case a character cannot be chomped.
806 | -}
807 | chompIf : (Char -> Bool) -> x -> Parser c x ()
808 | chompIf isGood expecting =
809 | Parser <| \s ->
810 | let
811 | newOffset = isSubChar isGood s.offset s.src
812 | in
813 | -- not found
814 | if newOffset == -1 then
815 | Bad False (fromState s expecting)
816 |
817 | -- newline
818 | else if newOffset == -2 then
819 | Good True ()
820 | { src = s.src
821 | , offset = s.offset + 1
822 | , indent = s.indent
823 | , context = s.context
824 | , row = s.row + 1
825 | , col = 1
826 | }
827 |
828 | -- found
829 | else
830 | Good True ()
831 | { src = s.src
832 | , offset = newOffset
833 | , indent = s.indent
834 | , context = s.context
835 | , row = s.row
836 | , col = s.col + 1
837 | }
838 |
839 |
840 |
841 | -- CHOMP WHILE
842 |
843 |
844 | {-| Just like [`Parser.chompWhile`](Parser#chompWhile)
845 | -}
846 | chompWhile : (Char -> Bool) -> Parser c x ()
847 | chompWhile isGood =
848 | Parser <| \s ->
849 | chompWhileHelp isGood s.offset s.row s.col s
850 |
851 |
852 | chompWhileHelp : (Char -> Bool) -> Int -> Int -> Int -> State c -> PStep c x ()
853 | chompWhileHelp isGood offset row col s0 =
854 | let
855 | newOffset = isSubChar isGood offset s0.src
856 | in
857 | -- no match
858 | if newOffset == -1 then
859 | Good (s0.offset < offset) ()
860 | { src = s0.src
861 | , offset = offset
862 | , indent = s0.indent
863 | , context = s0.context
864 | , row = row
865 | , col = col
866 | }
867 |
868 | -- matched a newline
869 | else if newOffset == -2 then
870 | chompWhileHelp isGood (offset + 1) (row + 1) 1 s0
871 |
872 | -- normal match
873 | else
874 | chompWhileHelp isGood newOffset row (col + 1) s0
875 |
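{- NOTE: A minimal sketch combining `chompIf`, `chompWhile`, and
`getChompedString`. The name `identifier` and the `Problem` type are
hypothetical. `chompIf` needs a problem because it can fail; `chompWhile`
does not, because matching zero characters still counts as success.

```elm
import Parser.Advanced as A exposing ((|.))


-- Hypothetical problem type for this example.
type Problem
    = ExpectingLower


-- One lowercase letter followed by any number of letters, digits, or '_'.
identifier : A.Parser c Problem String
identifier =
    A.getChompedString <|
        A.succeed ()
            |. A.chompIf Char.isLower ExpectingLower
            |. A.chompWhile (\c -> Char.isAlphaNum c || c == '_')
```

For real identifiers, `variable` is usually the better choice because it also
checks a set of reserved words.
-}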
876 |
877 |
878 | -- CHOMP UNTIL
879 |
880 |
881 | {-| Just like [`Parser.chompUntil`](Parser#chompUntil) except you provide a
882 | `Token` in case you chomp all the way to the end of the input without finding
883 | what you need.
884 | -}
885 | chompUntil : Token x -> Parser c x ()
886 | chompUntil (Token str expecting) =
887 | Parser <| \s ->
888 | let
889 | (newOffset, newRow, newCol) =
890 | findSubString str s.offset s.row s.col s.src
891 | in
892 | if newOffset == -1 then
893 | Bad False (fromInfo newRow newCol expecting s.context)
894 |
895 | else
896 | Good (s.offset < newOffset) ()
897 | { src = s.src
898 | , offset = newOffset
899 | , indent = s.indent
900 | , context = s.context
901 | , row = newRow
902 | , col = newCol
903 | }
904 |
905 |
906 | {-| Just like [`Parser.chompUntilEndOr`](Parser#chompUntilEndOr)
907 | -}
908 | chompUntilEndOr : String -> Parser c x ()
909 | chompUntilEndOr str =
910 | Parser <| \s ->
911 | let
912 | (newOffset, newRow, newCol) =
913 | Elm.Kernel.Parser.findSubString str s.offset s.row s.col s.src
914 |
915 | adjustedOffset =
916 | if newOffset < 0 then String.length s.src else newOffset
917 | in
918 | Good (s.offset < adjustedOffset) ()
919 | { src = s.src
920 | , offset = adjustedOffset
921 | , indent = s.indent
922 | , context = s.context
923 | , row = newRow
924 | , col = newCol
925 | }
926 |
927 |
928 |
929 | -- CONTEXT
930 |
931 |
932 | {-| This is how you mark that you are in a certain context. For example, here
933 | is a rough outline of some code that uses `inContext` to mark when you are
934 | parsing a specific definition:
935 |
936 | import Char
937 | import Parser.Advanced exposing (..)
938 | import Set
939 |
940 | type Context
941 | = Definition String
942 | | List
943 |
944 | definition : Parser Context Problem Expr
945 | definition =
946 | functionName
947 | |> andThen definitionBody
948 |
949 | definitionBody : String -> Parser Context Problem Expr
950 | definitionBody name =
951 | inContext (Definition name) <|
952 | succeed (Function name)
953 | |= arguments
954 | |. symbol (Token "=" ExpectingEquals)
955 | |= expression
956 |
957 | functionName : Parser c Problem String
958 | functionName =
959 | variable
960 | { start = Char.isLower
961 | , inner = Char.isAlphaNum
962 | , reserved = Set.fromList ["let","in"]
963 | , expecting = ExpectingFunctionName
964 | }
965 |
966 | First we parse the function name, and then we parse the rest of the definition.
967 | Importantly, we call `inContext` so that any dead end that occurs in
968 | `definitionBody` will get this extra context information. That way you can say
969 | things like, “I was expecting an equals sign in the `view` definition.” Context!
970 | -}
971 | inContext : context -> Parser context x a -> Parser context x a
972 | inContext context (Parser parse) =
973 | Parser <| \s0 ->
974 | case parse (changeContext (Located s0.row s0.col context :: s0.context) s0) of
975 | Good p a s1 ->
976 | Good p a (changeContext s0.context s1)
977 |
978 | Bad _ _ as step ->
979 | step
980 |
981 |
982 | changeContext : List (Located c) -> State c -> State c
983 | changeContext newContext s =
984 | { src = s.src
985 | , offset = s.offset
986 | , indent = s.indent
987 | , context = newContext
988 | , row = s.row
989 | , col = s.col
990 | }
991 |
992 |
993 |
994 | -- INDENTATION
995 |
996 |
997 | {-| Just like [`Parser.getIndent`](Parser#getIndent)
998 | -}
999 | getIndent : Parser c x Int
1000 | getIndent =
1001 | Parser <| \s -> Good False s.indent s
1002 |
1003 |
1004 | {-| Just like [`Parser.withIndent`](Parser#withIndent)
1005 | -}
1006 | withIndent : Int -> Parser c x a -> Parser c x a
1007 | withIndent newIndent (Parser parse) =
1008 | Parser <| \s0 ->
1009 | case parse (changeIndent newIndent s0) of
1010 | Good p a s1 ->
1011 | Good p a (changeIndent s0.indent s1)
1012 |
1013 | Bad p x ->
1014 | Bad p x
1015 |
1016 |
1017 | changeIndent : Int -> State c -> State c
1018 | changeIndent newIndent s =
1019 | { src = s.src
1020 | , offset = s.offset
1021 | , indent = newIndent
1022 | , context = s.context
1023 | , row = s.row
1024 | , col = s.col
1025 | }
1026 |
1027 |
1028 |
1029 | -- POSITION
1030 |
1031 |
1032 | {-| Just like [`Parser.getPosition`](Parser#getPosition)
1033 | -}
1034 | getPosition : Parser c x (Int, Int)
1035 | getPosition =
1036 | Parser <| \s -> Good False (s.row, s.col) s
1037 |
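{- NOTE: A sketch of using `getPosition` to record where a value starts and
ends. The `Positioned` alias and the `positioned` helper are hypothetical
names introduced for this example.

```elm
import Parser.Advanced as A exposing ((|=))


-- Hypothetical wrapper: a value plus its (row, col) start and end.
type alias Positioned a =
    { start : ( Int, Int )
    , value : a
    , end : ( Int, Int )
    }


-- Wrap any parser so its result carries position information.
positioned : A.Parser c x a -> A.Parser c x (Positioned a)
positioned parser =
    A.succeed Positioned
        |= A.getPosition
        |= parser
        |= A.getPosition
```

Because `getPosition` consumes no input, wrapping a parser this way changes
only the value it produces, not what it accepts.
-}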
1038 |
1039 | {-| Just like [`Parser.getRow`](Parser#getRow)
1040 | -}
1041 | getRow : Parser c x Int
1042 | getRow =
1043 | Parser <| \s -> Good False s.row s
1044 |
1045 |
1046 | {-| Just like [`Parser.getCol`](Parser#getCol)
1047 | -}
1048 | getCol : Parser c x Int
1049 | getCol =
1050 | Parser <| \s -> Good False s.col s
1051 |
1052 |
1053 | {-| Just like [`Parser.getOffset`](Parser#getOffset)
1054 | -}
1055 | getOffset : Parser c x Int
1056 | getOffset =
1057 | Parser <| \s -> Good False s.offset s
1058 |
1059 |
1060 | {-| Just like [`Parser.getSource`](Parser#getSource)
1061 | -}
1062 | getSource : Parser c x String
1063 | getSource =
1064 | Parser <| \s -> Good False s.src s
1065 |
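{- Editorial usage sketch, not part of the library source. One common use of
the position getters is to record where a value started and ended. The module
name, the `Spanned` alias, the `spanned` helper, and the `Problem` type below
are all hypothetical.

```elm
module PositionExample exposing (..)

import Parser.Advanced as A exposing ((|=), Parser, Token(..))


type alias Spanned a =
    { start : ( Int, Int ), value : a, end : ( Int, Int ) }


type Problem
    = ExpectingLet


-- Wrap any parser so its result carries (row, col) before and after.
spanned : Parser c x a -> Parser c x (Spanned a)
spanned parser =
    A.succeed Spanned
        |= A.getPosition
        |= parser
        |= A.getPosition


letKeyword : Parser c Problem (Spanned ())
letKeyword =
    spanned (A.keyword (Token "let" ExpectingLet))
```

With this sketch, `A.run letKeyword "let"` should give
`Ok { start = ( 1, 1 ), value = (), end = ( 1, 4 ) }`.
-}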
1066 |
1067 |
1068 | -- LOW-LEVEL HELPERS
1069 |
1070 |
1071 | {-| When making a fast parser, you want to avoid allocation as much as
1072 | possible. That means you never want to mess with the source string, only
1073 | keep track of an offset into that string.
1074 |
1075 | You use `isSubString` like this:
1076 |
1077 | isSubString "let" offset row col "let x = 4 in x"
1078 | --==> ( newOffset, newRow, newCol )
1079 |
1080 | You are looking for `"let"` at a given `offset`. On failure, the
1081 | `newOffset` is `-1`. On success, `newOffset` is the offset just past the
1082 | match. With our `"let"` example, it would be `offset + 3`.
1083 |
1084 | You also provide the current `row` and `col`, which do not track `offset`
1085 | in a simple way. For example, after a `\n` you are at `row + 1` and
1086 | `col = 1`. Furthermore, some UTF-16 characters are two code units wide,
1087 | so even if there are no newlines, `offset` and `col` may not advance
1088 | at the same rate.
1089 | -}
1090 | isSubString : String -> Int -> Int -> Int -> String -> (Int, Int, Int)
1091 | isSubString =
1092 | Elm.Kernel.Parser.isSubString
1093 |
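{- Editorial worked values, not part of the library source. `isSubString` is
an internal helper, so these expressions would only type-check inside this
module. The names `matchAtStart`, `matchAcrossNewline`, and `noMatch` are
hypothetical, and the expected results assume the kernel function advances
`col` by one per matched character and resets it after a `\n`, as the
comment above describes.

```elm
-- "let" found at offset 0: new offset is 0 + 3, same row, col moves 1 -> 4.
matchAtStart : ( Int, Int, Int )
matchAtStart =
    isSubString "let" 0 1 1 "let x = 4 in x"
    -- expected ( 3, 1, 4 )


-- The match spans a newline, so row becomes 2 and col restarts at 1.
matchAcrossNewline : ( Int, Int, Int )
matchAcrossNewline =
    isSubString "a\nb" 0 1 1 "a\nb!"
    -- expected ( 3, 2, 2 )


-- Offset 4 points at 'x', so there is no match and the first component is -1.
noMatch : ( Int, Int, Int )
noMatch =
    isSubString "let" 4 1 5 "let x = 4 in x"
```
-}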
1094 |
1095 | {-| Again, when parsing, you want to allocate as little as possible.
1096 | So this function lets you say:
1097 |
1098 | isSubChar isSpace offset "this is the source string"
1099 | --==> newOffset
1100 |
1101 | The `(Char -> Bool)` argument is called a predicate.
1102 | The `newOffset` value can be a few different things:
1103 |
1104 | - `-1` means that the predicate failed (or `offset` is past the end)
1105 | - `-2` means the predicate succeeded with a `\n`
1106 | - otherwise you will get `offset + 1` or `offset + 2`
1107 | depending on whether the UTF-16 character is one or two
1108 | code units wide.
1109 | -}
1110 | isSubChar : (Char -> Bool) -> Int -> String -> Int
1111 | isSubChar =
1112 | Elm.Kernel.Parser.isSubChar
1113 |
1114 |
1115 | {-| Check an offset in the string. Is the character there equal to the
1116 | given ASCII character code?
1117 | -}
1118 | isAsciiCode : Int -> Int -> String -> Bool
1119 | isAsciiCode =
1120 | Elm.Kernel.Parser.isAsciiCode
1121 |
1122 |
1123 | {-| Find a substring after a given offset.
1124 |
1125 | findSubString "42" offset row col "Is 42 the answer?"
1126 | --==> (newOffset, newRow, newCol)
1127 |
1128 | If `offset = 0` we would get `(3, 1, 4)`, the position where `"42"` starts.
1129 | If `offset = 7` we would get `(-1, 1, 18)`: no match, with `row` and `col` at the end of the string.
1130 | -}
1131 | findSubString : String -> Int -> Int -> Int -> String -> (Int, Int, Int)
1132 | findSubString =
1133 | Elm.Kernel.Parser.findSubString
1134 |
1135 |
1136 |
1137 | -- VARIABLES
1138 |
1139 |
1140 | {-| Just like [`Parser.variable`](Parser#variable) except you specify the
1141 | problem yourself.
1142 | -}
1143 | variable :
1144 | { start : Char -> Bool
1145 | , inner : Char -> Bool
1146 | , reserved : Set.Set String
1147 | , expecting : x
1148 | }
1149 | -> Parser c x String
1150 | variable i =
1151 | Parser <| \s ->
1152 | let
1153 | firstOffset =
1154 | isSubChar i.start s.offset s.src
1155 | in
1156 | if firstOffset == -1 then
1157 | Bad False (fromState s i.expecting)
1158 | else
1159 | let
1160 | s1 =
1161 | if firstOffset == -2 then
1162 | varHelp i.inner (s.offset + 1) (s.row + 1) 1 s.src s.indent s.context
1163 | else
1164 | varHelp i.inner firstOffset s.row (s.col + 1) s.src s.indent s.context
1165 |
1166 | name =
1167 | String.slice s.offset s1.offset s.src
1168 | in
1169 | if Set.member name i.reserved then
1170 | Bad False (fromState s i.expecting)
1171 | else
1172 | Good True name s1
1173 |
1174 |
1175 | varHelp : (Char -> Bool) -> Int -> Int -> Int -> String -> Int -> List (Located c) -> State c
1176 | varHelp isGood offset row col src indent context =
1177 | let
1178 | newOffset = isSubChar isGood offset src
1179 | in
1180 | if newOffset == -1 then
1181 | { src = src
1182 | , offset = offset
1183 | , indent = indent
1184 | , context = context
1185 | , row = row
1186 | , col = col
1187 | }
1188 |
1189 | else if newOffset == -2 then
1190 | varHelp isGood (offset + 1) (row + 1) 1 src indent context
1191 |
1192 | else
1193 | varHelp isGood newOffset row (col + 1) src indent context
1194 |
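{- Editorial usage sketch, not part of the library source. It shows how
`variable` might be called from user code with a custom problem type. The
module name, the `Problem` type, and `lowerVar` are hypothetical.

```elm
module VariableExample exposing (..)

import Parser.Advanced as A exposing (Parser)
import Set


type Problem
    = ExpectingLowerVar


-- A lowercase identifier that must not be one of the reserved words.
lowerVar : Parser c Problem String
lowerVar =
    A.variable
        { start = Char.isLower
        , inner = \c -> Char.isAlphaNum c || c == '_'
        , reserved = Set.fromList [ "let", "in", "case", "of" ]
        , expecting = ExpectingLowerVar
        }
```

With this sketch, `A.run lowerVar "foo_1 = 3"` should give `Ok "foo_1"`,
while `A.run lowerVar "let"` should fail with `ExpectingLowerVar` reported at
row 1, column 1, because the reserved-word check reports the problem at the
state where the variable started.
-}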
1195 |
1196 |
1197 | -- SEQUENCES
1198 |
1199 |
1200 | {-| Just like [`Parser.sequence`](Parser#sequence) except with a `Token` for
1201 | the start, separator, and end. That way you can specify your custom type of
1202 | problem for when something is not found.
1203 | -}
1204 | sequence
1205 | : { start : Token x
1206 | , separator : Token x
1207 | , end : Token x
1208 | , spaces : Parser c x ()
1209 | , item : Parser c x a
1210 | , trailing : Trailing
1211 | }
1212 | -> Parser c x (List a)
1213 | sequence i =
1214 | skip (token i.start) <|
1215 | skip i.spaces <|
1216 | sequenceEnd (token i.end) i.spaces i.item (token i.separator) i.trailing
1217 |
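{- Editorial usage sketch, not part of the library source. It builds a parser
for a bracketed list of integers, giving each `Token` its own problem. The
module name, the `Problem` constructors, and `intList` are hypothetical.

```elm
module SequenceExample exposing (..)

import Parser.Advanced as A exposing (Parser, Token(..), Trailing(..))


type Problem
    = ExpectingOpen
    | ExpectingComma
    | ExpectingClose
    | ExpectingInt
    | InvalidInt


intList : Parser c Problem (List Int)
intList =
    A.sequence
        { start = Token "[" ExpectingOpen
        , separator = Token "," ExpectingComma
        , end = Token "]" ExpectingClose
        , spaces = A.spaces
        , item = A.int ExpectingInt InvalidInt
        , trailing = Forbidden
        }
```

With this sketch, `A.run intList "[ 1, 2, 3 ]"` should give `Ok [ 1, 2, 3 ]`
and `A.run intList "[]"` should give `Ok []`.
-}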
1218 |
1219 | {-| What’s the deal with trailing commas? Are they `Forbidden`?
1220 | Are they `Optional`? Are they `Mandatory`? Welcome to [shapes
1221 | club](https://poorlydrawnlines.com/comic/shapes-club/)!
1222 | -}
1223 | type Trailing = Forbidden | Optional | Mandatory
1224 |
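{- Editorial usage sketch, not part of the library source. It parameterizes a
list parser over `Trailing` to show how the three options differ, based on a
reading of `sequenceEnd` in this file. The module name, the `Problem` type,
and `ints` are hypothetical.

```elm
module TrailingExample exposing (..)

import Parser.Advanced as A exposing (Parser, Token(..), Trailing(..))


type Problem
    = ExpectingOpen
    | ExpectingComma
    | ExpectingClose
    | ExpectingInt
    | InvalidInt


ints : Trailing -> Parser c Problem (List Int)
ints trailing =
    A.sequence
        { start = Token "[" ExpectingOpen
        , separator = Token "," ExpectingComma
        , end = Token "]" ExpectingClose
        , spaces = A.spaces
        , item = A.int ExpectingInt InvalidInt
        , trailing = trailing
        }


-- input       Forbidden     Optional      Mandatory
-- "[1,2]"     Ok [ 1, 2 ]   Ok [ 1, 2 ]   Err ...
-- "[1,2,]"    Err ...       Ok [ 1, 2 ]   Ok [ 1, 2 ]
```

All three options should accept the empty list `"[]"`.
-}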
1225 |
1226 | skip : Parser c x ignore -> Parser c x keep -> Parser c x keep
1227 | skip iParser kParser =
1228 | map2 revAlways iParser kParser
1229 |
1230 |
1231 | revAlways : a -> b -> b
1232 | revAlways _ b =
1233 | b
1234 |
1235 |
1236 | sequenceEnd : Parser c x () -> Parser c x () -> Parser c x a -> Parser c x () -> Trailing -> Parser c x (List a)
1237 | sequenceEnd ender ws parseItem sep trailing =
1238 | let
1239 | chompRest item =
1240 | case trailing of
1241 | Forbidden ->
1242 | loop [item] (sequenceEndForbidden ender ws parseItem sep)
1243 |
1244 | Optional ->
1245 | loop [item] (sequenceEndOptional ender ws parseItem sep)
1246 |
1247 | Mandatory ->
1248 | ignorer
1249 | ( skip ws <| skip sep <| skip ws <|
1250 | loop [item] (sequenceEndMandatory ws parseItem sep)
1251 | )
1252 | ender
1253 | in
1254 | oneOf
1255 | [ parseItem |> andThen chompRest
1256 | , ender |> map (\_ -> [])
1257 | ]
1258 |
1259 |
1260 | sequenceEndForbidden : Parser c x () -> Parser c x () -> Parser c x a -> Parser c x () -> List a -> Parser c x (Step (List a) (List a))
1261 | sequenceEndForbidden ender ws parseItem sep revItems =
1262 | let
1263 | chompRest item =
1264 | sequenceEndForbidden ender ws parseItem sep (item :: revItems)
1265 | in
1266 | skip ws <|
1267 | oneOf
1268 | [ skip sep <| skip ws <| map (\item -> Loop (item :: revItems)) parseItem
1269 | , ender |> map (\_ -> Done (List.reverse revItems))
1270 | ]
1271 |
1272 |
1273 | sequenceEndOptional : Parser c x () -> Parser c x () -> Parser c x a -> Parser c x () -> List a -> Parser c x (Step (List a) (List a))
1274 | sequenceEndOptional ender ws parseItem sep revItems =
1275 | let
1276 | parseEnd =
1277 | map (\_ -> Done (List.reverse revItems)) ender
1278 | in
1279 | skip ws <|
1280 | oneOf
1281 | [ skip sep <| skip ws <|
1282 | oneOf
1283 | [ parseItem |> map (\item -> Loop (item :: revItems))
1284 | , parseEnd
1285 | ]
1286 | , parseEnd
1287 | ]
1288 |
1289 |
1290 | sequenceEndMandatory : Parser c x () -> Parser c x a -> Parser c x () -> List a -> Parser c x (Step (List a) (List a))
1291 | sequenceEndMandatory ws parseItem sep revItems =
1292 | oneOf
1293 | [ map (\item -> Loop (item :: revItems)) <|
1294 | ignorer parseItem (ignorer ws (ignorer sep ws))
1295 | , map (\_ -> Done (List.reverse revItems)) (succeed ())
1296 | ]
1297 |
1298 |
1299 |
1300 | -- WHITESPACE
1301 |
1302 |
1303 | {-| Just like [`Parser.spaces`](Parser#spaces)
1304 | -}
1305 | spaces : Parser c x ()
1306 | spaces =
1307 | chompWhile (\c -> c == ' ' || c == '\n' || c == '\r')
1308 |
1309 |
1310 | {-| Just like [`Parser.lineComment`](Parser#lineComment) except you provide a
1311 | `Token` describing the starting symbol.
1312 | -}
1313 | lineComment : Token x -> Parser c x ()
1314 | lineComment start =
1315 | ignorer (token start) (chompUntilEndOr "\n")
1316 |
1317 |
1318 | {-| Just like [`Parser.multiComment`](Parser#multiComment) except with a
1319 | `Token` for the open and close symbols.
1320 | -}
1321 | multiComment : Token x -> Token x -> Nestable -> Parser c x ()
1322 | multiComment open close nestable =
1323 | case nestable of
1324 | NotNestable ->
1325 | ignorer (token open) (chompUntil close)
1326 |
1327 | Nestable ->
1328 | nestableComment open close
1329 |
1330 |
1331 | {-| Works just like [`Parser.Nestable`](Parser#Nestable) to help distinguish
1332 | between unnestable `/*` `*/` comments like in JS and nestable `{-` `-}`
1333 | comments like in Elm.
1334 | -}
1335 | type Nestable = NotNestable | Nestable
1336 |
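{- Editorial usage sketch, not part of the library source. It shows how the
comment parsers might be built from `Token` values. The module name, the
`Problem` constructors, and the parser names are hypothetical. The block
comment uses C-style delimiters with `Nestable` purely for illustration.

```elm
module CommentExample exposing (..)

import Parser.Advanced as A exposing ((|.), Nestable(..), Parser, Token(..))


type Problem
    = ExpectingLineComment
    | ExpectingOpen
    | ExpectingClose
    | ExpectingEnd


lineC : Parser c Problem ()
lineC =
    A.lineComment (Token "//" ExpectingLineComment)


nestedC : Parser c Problem ()
nestedC =
    A.multiComment (Token "/*" ExpectingOpen) (Token "*/" ExpectingClose) Nestable


-- Require that the whole input is one nested comment.
onlyComment : Parser c Problem ()
onlyComment =
    nestedC |. A.end ExpectingEnd
```

With this sketch, `A.run onlyComment "/* a /* b */ c */"` should give `Ok ()`,
since the inner open and close tokens raise and lower the nesting level
before the final close is accepted.
-}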
1337 |
1338 | nestableComment : Token x -> Token x -> Parser c x ()
1339 | nestableComment (Token oStr oX as open) (Token cStr cX as close) =
1340 | case String.uncons oStr of
1341 | Nothing ->
1342 | problem oX
1343 |
1344 | Just (openChar, _) ->
1345 | case String.uncons cStr of
1346 | Nothing ->
1347 | problem cX
1348 |
1349 | Just (closeChar, _) ->
1350 | let
1351 | isNotRelevant char =
1352 | char /= openChar && char /= closeChar
1353 |
1354 | chompOpen =
1355 | token open
1356 | in
1357 | ignorer chompOpen (nestableHelp isNotRelevant chompOpen (token close) cX 1)
1358 |
1359 |
1360 | nestableHelp : (Char -> Bool) -> Parser c x () -> Parser c x () -> x -> Int -> Parser c x ()
1361 | nestableHelp isNotRelevant open close expectingClose nestLevel =
1362 | skip (chompWhile isNotRelevant) <|
1363 | oneOf
1364 | [ if nestLevel == 1 then
1365 | close
1366 | else
1367 | close
1368 | |> andThen (\_ -> nestableHelp isNotRelevant open close expectingClose (nestLevel - 1))
1369 | , open
1370 | |> andThen (\_ -> nestableHelp isNotRelevant open close expectingClose (nestLevel + 1))
1371 | , chompIf isChar expectingClose
1372 | |> andThen (\_ -> nestableHelp isNotRelevant open close expectingClose nestLevel)
1373 | ]
1374 |
1375 |
1376 | isChar : Char -> Bool
1377 | isChar _ =
1378 | True
1379 |
--------------------------------------------------------------------------------