├── .gitignore ├── LICENSE ├── README.md ├── elm.json └── src ├── Elm └── Kernel │ └── Regex.js └── Regex.elm /.gitignore: -------------------------------------------------------------------------------- 1 | elm-stuff -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright 2017-present Evan Czaplicki 2 | 3 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 4 | 5 | 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 6 | 7 | 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 8 | 9 | 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 10 | 11 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
12 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Regex in Elm 2 | 3 | **Generally speaking, it will be easier and nicer to use a parsing library like [`elm/parser`][elm] instead of this.** 4 | 5 | [elm]: https://package.elm-lang.org/packages/elm/parser/latest 6 | 7 | That said, sometimes you may want the kind of regular expressions that appear in JavaScript. Maybe you found some regex on Stack Overflow and just want to place it in your code directly. This library supports that scenario. 8 | 9 | 10 | 11 | ## Future Plans 12 | 13 | I hope that _other_ packages will spring up for common parsing tasks, making `regex` less and less useful. 14 | 15 | So instead of searching Stack Overflow for "email regex" we could have a well-tested package for validating emails. Instead of searching Stack Overflow for "phone numbers" we could have a well-tested package for validating phone numbers that gathered a bunch of helpful information on handling international numbers. Etc. 16 | 17 | And as the community handles more and more cases in an _excellent_ way, I hope a day will come when no one wants the `regex` package anymore. 18 | 19 | 20 | 21 | ## Historical Notes 22 | 23 | I want to draw a distinction between **regular expressions** and **regex**. These are related, but not the same. I think understanding the distinction helps motivate why I recommend against using this package. 24 | 25 | 26 | ### Regular Expressions 27 | 28 | In theoretical computer science, the idea of a “regular expression” is a simple expression that matches a set of strings. For example: 29 | 30 | - `a` matches `"a"` 31 | - `ab` matches `"ab"` 32 | - `ab*` matches `"a"`, `"ab"`, `"abb"`, `"abbb"`, `"abbbb"`, etc. 33 | - `(ab)*` matches `""`, `"ab"`, `"abab"`, `"ababab"`, `"abababab"`, etc. 
34 | - `a|b` matches `"a"` and `"b"` 35 | - `a|(bb)*` matches `"a"`, `""`, `"bb"`, `"bbbb"`, `"bbbbbb"`, etc. 36 | 37 | So you basically have `*` to repeat, parentheses for grouping, and `|` for providing alternatives. That is it! A simple syntax that can describe a bunch of different things. 38 | 39 | It also has quite beautiful relationships to finite automata, context-free grammars, Turing machines, etc. If you are into this sort of thing, I highly recommend [Introduction to the Theory of Computation](https://math.mit.edu/~sipser/book.html) by Michael Sipser! 40 | 41 | 42 | ### Regex 43 | 44 | So people came up with that simple thing in computer science, and on its surface, it looks like a good way to match email addresses, phone numbers, etc. But regular expressions only match or not. How can we _extract_ information from the string as well? Well, this is how regex was born. 45 | 46 | A bunch of extensions were added to the root idea, significantly complicating the syntax and behavior. For example, instead of using parentheses just for grouping, parentheses also extract information. But wait, how does `(a|b)*` work if we are extracting everything inside the parens? What should be extracted from matching strings like `"aabb"` or `"aba"` now? 47 | 48 | So lots of things like that were added, and the result is called “regex” and it appears in a bunch of common programming languages like Perl, Python, and JavaScript. 49 | 50 | 51 | ### Reflections 52 | 53 | The regex idea has become quite influential. It is “good enough” for a lot of cases, but it is also quite confusing and difficult to use reliably. If you look around Stack Overflow, you will find tons of questions like "how do I parse an email address?" and many folks just copy/paste the answers without really reading or understanding the regex thoroughly. Does the regex really work? What exactly do you want to allow and disallow? 
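
To make the `(a|b)*` question from the Regex section concrete, here is a small sketch in plain JavaScript, the regex dialect this package wraps. It is only an illustration and not part of the package: `lastCapture` is a hypothetical helper name introduced here. In JavaScript, a repeated capture group keeps only its final iteration, which is the kind of detail that is easy to miss when copying a regex from elsewhere.

```javascript
// Hypothetical helper (not part of elm/regex): returns whatever the first
// parenthesized group captured, or null when the pattern does not match.
function lastCapture(pattern, input) {
  var result = pattern.exec(input);
  return result === null ? null : result[1];
}

var repeated = /^(a|b)*$/;

console.log(lastCapture(repeated, "aabb")); // "b": only the last iteration is kept
console.log(lastCapture(repeated, "aba"));  // "a"
console.log(lastCapture(repeated, ""));     // undefined: it matches, but the group never ran
console.log(lastCapture(repeated, "abc"));  // null: no match at all
```

In this package's kernel code, an `undefined` group like the third case is turned into `Nothing` in the `submatches` list.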
54 | 55 | The root issue is that regular expressions were not _meant_ to parse everything. For example, regular expressions are unable to describe sets of strings with balanced parentheses, so no regular expression can describe the set of `"()"`, `"(())"`, `"((()))"`, etc. (That means [they cannot parse matching HTML tags](https://stackoverflow.com/a/1732454) either!) But you _can_ do that with context-free grammars! With one really elegant addition! So the limitations of regular expressions are actually their whole point. They are _supposed_ to be simple to show why other formulations can express more. 56 | 57 | So this is why I recommend the [`elm/parser`][elm] package over this one. It _is_ meant to parse everything, and in a way that works really nicely with Elm. 58 | -------------------------------------------------------------------------------- /elm.json: -------------------------------------------------------------------------------- 1 | { 2 | "type": "package", 3 | "name": "elm/regex", 4 | "summary": "Support for JS-style regular expressions in Elm", 5 | "license": "BSD-3-Clause", 6 | "version": "1.0.0", 7 | "exposed-modules": [ 8 | "Regex" 9 | ], 10 | "elm-version": "0.19.0 <= v < 0.20.0", 11 | "dependencies": { 12 | "elm/core": "1.0.0 <= v < 2.0.0" 13 | }, 14 | "test-dependencies": {} 15 | } -------------------------------------------------------------------------------- /src/Elm/Kernel/Regex.js: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | import Elm.Kernel.List exposing (fromArray) 4 | import Maybe exposing (Just, Nothing) 5 | import Regex exposing (Match) 6 | 7 | */ 8 | 9 | // CREATE 10 | 11 | var _Regex_never = /.^/; 12 | 13 | var _Regex_fromStringWith = F2(function(options, string) 14 | { 15 | var flags = 'g'; 16 | if (options.__$multiline) { flags += 'm'; } 17 | if (options.__$caseInsensitive) { flags += 'i'; } 18 | 19 | try 20 | { 21 | return __Maybe_Just(new RegExp(string, flags)); 22 | } 23 | 
catch(error) 24 | { 25 | return __Maybe_Nothing; 26 | } 27 | }); 28 | 29 | 30 | // USE 31 | 32 | var _Regex_contains = F2(function(re, string) 33 | { 34 | return string.match(re) !== null; 35 | }); 36 | 37 | 38 | var _Regex_findAtMost = F3(function(n, re, str) 39 | { 40 | var out = []; 41 | var number = 0; 42 | var string = str; 43 | var lastIndex = re.lastIndex; 44 | var prevLastIndex = -1; 45 | var result; 46 | while (number++ < n && (result = re.exec(string))) 47 | { 48 | if (prevLastIndex == re.lastIndex) break; 49 | var i = result.length - 1; 50 | var subs = new Array(i); 51 | while (i > 0) 52 | { 53 | var submatch = result[i]; 54 | subs[--i] = submatch 55 | ? __Maybe_Just(submatch) 56 | : __Maybe_Nothing; 57 | } 58 | out.push(A4(__Regex_Match, result[0], result.index, number, __List_fromArray(subs))); 59 | prevLastIndex = re.lastIndex; 60 | } 61 | re.lastIndex = lastIndex; 62 | return __List_fromArray(out); 63 | }); 64 | 65 | 66 | var _Regex_replaceAtMost = F4(function(n, re, replacer, string) 67 | { 68 | var count = 0; 69 | function jsReplacer(match) 70 | { 71 | if (count++ >= n) 72 | { 73 | return match; 74 | } 75 | var i = arguments.length - 3; 76 | var submatches = new Array(i); 77 | while (i > 0) 78 | { 79 | var submatch = arguments[i]; 80 | submatches[--i] = submatch 81 | ? 
__Maybe_Just(submatch) 82 | : __Maybe_Nothing; 83 | } 84 | return replacer(A4(__Regex_Match, match, arguments[arguments.length - 2], count, __List_fromArray(submatches))); 85 | } 86 | return string.replace(re, jsReplacer); 87 | }); 88 | 89 | var _Regex_splitAtMost = F3(function(n, re, str) 90 | { 91 | var string = str; 92 | var out = []; 93 | var start = re.lastIndex; 94 | var restoreLastIndex = re.lastIndex; 95 | while (n--) 96 | { 97 | var result = re.exec(string); 98 | if (!result) break; 99 | out.push(string.slice(start, result.index)); 100 | start = re.lastIndex; 101 | } 102 | out.push(string.slice(start)); 103 | re.lastIndex = restoreLastIndex; 104 | return __List_fromArray(out); 105 | }); 106 | 107 | var _Regex_infinity = Infinity; 108 | -------------------------------------------------------------------------------- /src/Regex.elm: -------------------------------------------------------------------------------- 1 | module Regex exposing 2 | ( Regex 3 | , fromString 4 | , fromStringWith 5 | , Options 6 | , never 7 | , contains 8 | , split 9 | , find 10 | , replace 11 | , Match 12 | , splitAtMost 13 | , findAtMost 14 | , replaceAtMost 15 | ) 16 | 17 | 18 | {-| A library for working with regex. The syntax matches the [`RegExp`][js] 19 | library from JavaScript. 20 | 21 | [js]: https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions 22 | 23 | # Create 24 | @docs Regex, fromString, fromStringWith, Options, never 25 | 26 | # Use 27 | @docs contains, split, find, replace, Match 28 | 29 | # Fancier Uses 30 | @docs splitAtMost, findAtMost, replaceAtMost 31 | 32 | -} 33 | 34 | 35 | import Elm.Kernel.Regex 36 | 37 | 38 | 39 | -- CREATE 40 | 41 | 42 | {-| A regular expression [as specified in JavaScript][js]. 43 | 44 | [js]: https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions 45 | 46 | -} 47 | type Regex = Regex 48 | 49 | 50 | {-| Try to create a `Regex`. Not all strings are valid though, so you get a 51 | `Maybe` back. 
This means you can safely accept input from users. 52 | 53 | import Regex 54 | 55 | lowerCase : Regex.Regex 56 | lowerCase = 57 | Maybe.withDefault Regex.never <| 58 | Regex.fromString "[a-z]+" 59 | 60 | **Note:** There are some [shorthand character classes][short] like `\w` for 61 | word characters, `\s` for whitespace characters, and `\d` for digits. **Make 62 | sure they are properly escaped!** If you specify them directly in your code, 63 | they would look like `"\\w\\s\\d"`. 64 | 65 | [short]: https://www.regular-expressions.info/shorthand.html 66 | -} 67 | fromString : String -> Maybe Regex 68 | fromString string = 69 | fromStringWith { caseInsensitive = False, multiline = False } string 70 | 71 | 72 | {-| Create a `Regex` with some additional options. For example, you can define 73 | `fromString` like this: 74 | 75 | import Regex 76 | 77 | fromString : String -> Maybe Regex.Regex 78 | fromString string = 79 | fromStringWith { caseInsensitive = False, multiline = False } string 80 | 81 | -} 82 | fromStringWith : Options -> String -> Maybe Regex 83 | fromStringWith = 84 | Elm.Kernel.Regex.fromStringWith 85 | 86 | 87 | {-|-} 88 | type alias Options = 89 | { caseInsensitive : Bool 90 | , multiline : Bool 91 | } 92 | 93 | 94 | {-| A regular expression that never matches any string. 95 | -} 96 | never : Regex 97 | never = 98 | Elm.Kernel.Regex.never 99 | 100 | 101 | 102 | -- USE 103 | 104 | 105 | {-| Check to see if a Regex is contained in a string. 106 | 107 | import Regex 108 | 109 | digit : Regex.Regex 110 | digit = 111 | Maybe.withDefault Regex.never <| 112 | Regex.fromString "[0-9]" 113 | 114 | -- Regex.contains digit "abc123" == True 115 | -- Regex.contains digit "abcxyz" == False 116 | -} 117 | contains : Regex -> String -> Bool 118 | contains = 119 | Elm.Kernel.Regex.contains 120 | 121 | 122 | {-| Split a string. 
The following example will split on commas and tolerate 123 | whitespace on either side of the comma: 124 | 125 | import Regex 126 | 127 | comma : Regex.Regex 128 | comma = 129 | Maybe.withDefault Regex.never <| 130 | Regex.fromString " *, *" 131 | 132 | -- Regex.split comma "tom,99,90,85" == ["tom","99","90","85"] 133 | -- Regex.split comma "tom, 99, 90, 85" == ["tom","99","90","85"] 134 | -- Regex.split comma "tom , 99, 90, 85" == ["tom","99","90","85"] 135 | 136 | If you want some really fancy splits, a library like 137 | [`elm/parser`][parser] will probably be easier to use. 138 | 139 | [parser]: /packages/elm/parser/latest 140 | -} 141 | split : Regex -> String -> List String 142 | split = 143 | Elm.Kernel.Regex.splitAtMost Elm.Kernel.Regex.infinity 144 | 145 | 146 | {-| Find matches in a string: 147 | 148 | import Regex 149 | 150 | location : Regex.Regex 151 | location = 152 | Maybe.withDefault Regex.never <| 153 | Regex.fromString "[oi]n a (\\w+)" 154 | 155 | places : List Regex.Match 156 | places = 157 | Regex.find location "I am on a boat in a lake." 158 | 159 | -- map .match places == [ "on a boat", "in a lake" ] 160 | -- map .submatches places == [ [Just "boat"], [Just "lake"] ] 161 | 162 | If you need `submatches` for some reason, a library like 163 | [`elm/parser`][parser] will probably lead to better code in the long run. 164 | 165 | [parser]: /packages/elm/parser/latest 166 | -} 167 | find : Regex -> String -> List Match 168 | find = 169 | Elm.Kernel.Regex.findAtMost Elm.Kernel.Regex.infinity 170 | 171 | 172 | {-| The details about a particular match: 173 | 174 | * `match` — the full string of the match. 175 | * `index` — the index of the match in the original string. 176 | * `number` — if you find many matches, you can think of each one 177 | as being labeled with a `number` starting at one. So the first time you 178 | find a match, that is match `number` one. Second time is match `number` two. 
179 | This is useful when paired with `replace` if replacement is dependent on how 180 | many times a pattern has appeared before. 181 | * `submatches` — a `Regex` can have [subpatterns][sub], sub-parts that 182 | are in parentheses. This is a list of all these submatches. This is kind of 183 | garbage to use, and using a package like [`elm/parser`][parser] is 184 | probably easier. 185 | 186 | [sub]: https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions#Using_Parenthesized_Substring_Matches 187 | [parser]: /packages/elm/parser/latest 188 | 189 | -} 190 | type alias Match = 191 | { match : String 192 | , index : Int 193 | , number : Int 194 | , submatches : List (Maybe String) 195 | } 196 | 197 | 198 | {-| Replace matches. The function from `Match` to `String` lets 199 | you use the details of a specific match when making replacements. 200 | 201 | import Regex 202 | 203 | userReplace : String -> (Regex.Match -> String) -> String -> String 204 | userReplace userRegex replacer string = 205 | case Regex.fromString userRegex of 206 | Nothing -> 207 | string 208 | 209 | Just regex -> 210 | Regex.replace regex replacer string 211 | 212 | devowel : String -> String 213 | devowel string = 214 | userReplace "[aeiou]" (\_ -> "") string 215 | 216 | -- devowel "The quick brown fox" == "Th qck brwn fx" 217 | 218 | reverseWords : String -> String 219 | reverseWords string = 220 | userReplace "\\w+" (.match >> String.reverse) string 221 | 222 | -- reverseWords "deliver mined parts" == "reviled denim strap" 223 | -} 224 | replace : Regex -> (Match -> String) -> String -> String 225 | replace = 226 | Elm.Kernel.Regex.replaceAtMost Elm.Kernel.Regex.infinity 227 | 228 | 229 | 230 | -- AT MOST 231 | 232 | 233 | {-| Just like `split` but it stops after some number of matches. 234 | 235 | A library like [`elm/parser`][parser] will probably lead to better code in 236 | the long run. 
237 | 238 | [parser]: /packages/elm/parser/latest 239 | -} 240 | splitAtMost : Int -> Regex -> String -> List String 241 | splitAtMost = 242 | Elm.Kernel.Regex.splitAtMost 243 | 244 | 245 | {-| Just like `find` but it stops after some number of matches. 246 | 247 | A library like [`elm/parser`][parser] will probably lead to better code in 248 | the long run. 249 | 250 | [parser]: /packages/elm/parser/latest 251 | -} 252 | findAtMost : Int -> Regex -> String -> List Match 253 | findAtMost = 254 | Elm.Kernel.Regex.findAtMost 255 | 256 | 257 | {-| Just like `replace` but it stops after some number of matches. 258 | 259 | A library like [`elm/parser`][parser] will probably lead to better code in 260 | the long run. 261 | 262 | [parser]: /packages/elm/parser/latest 263 | -} 264 | replaceAtMost : Int -> Regex -> (Match -> String) -> String -> String 265 | replaceAtMost = 266 | Elm.Kernel.Regex.replaceAtMost --------------------------------------------------------------------------------
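
As a rough illustration of how the `AtMost` limit behaves, the following JavaScript sketch mirrors the `_Regex_splitAtMost` kernel function from `src/Elm/Kernel/Regex.js`, with the Elm list conversion dropped so it can run on its own. The name `splitAtMost` and the plain-array return value are assumptions made for this sketch. It also assumes a regex with the `g` flag (the kernel always adds it) whose matches are never empty, since an empty match would not advance `lastIndex`.

```javascript
// Standalone sketch modeled on _Regex_splitAtMost; returns a plain JS array
// instead of an Elm List. Assumes `re` is global and never matches "".
function splitAtMost(n, re, string) {
  var out = [];
  var start = re.lastIndex;
  var restoreLastIndex = re.lastIndex;
  // Each successful match consumes one unit of the budget `n`.
  while (n--) {
    var result = re.exec(string);
    if (!result) break;
    out.push(string.slice(start, result.index));
    start = re.lastIndex;
  }
  // Whatever is left after the last allowed match is kept as one piece.
  out.push(string.slice(start));
  // Put the regex back the way it was so it can be reused.
  re.lastIndex = restoreLastIndex;
  return out;
}

var comma = / *, */g;

console.log(splitAtMost(1, comma, "tom, 99, 90"));        // ["tom", "99, 90"]
console.log(splitAtMost(Infinity, comma, "tom, 99, 90")); // ["tom", "99", "90"]
```

Passing `Infinity` as the limit corresponds to how `split` is defined in terms of `splitAtMost` in `Regex.elm`.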