├── .gitignore
├── LICENSE
├── README.md
├── elm.json
└── src
    ├── Elm
    │   └── Kernel
    │       └── Regex.js
    └── Regex.elm


/.gitignore:
--------------------------------------------------------------------------------
1 | elm-stuff


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | Copyright 2017-present Evan Czaplicki
 2 | 
 3 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
 4 | 
 5 | 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
 6 | 
 7 | 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
 8 | 
 9 | 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
10 | 
11 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
12 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Regex in Elm
 2 | 
 3 | **Generally speaking, it will be easier and nicer to use a parsing library like [`elm/parser`][elm] instead of this.**
 4 | 
 5 | [elm]: https://package.elm-lang.org/packages/elm/parser/latest
 6 | 
 7 | That said, sometimes you may want the kind of regular expressions that appear in JavaScript. Maybe you found some regex on Stack Overflow and just want to place it in your code directly. This library supports that scenario.
 8 | 
 9 | 
10 | 
11 | ## Future Plans
12 | 
13 | I hope that _other_ packages will spring up for common parsing tasks, making `regex` less and less useful.
14 | 
15 | So instead of searching Stack Overflow for "email regex" we could have a well-tested package for validating emails. Instead of searching Stack Overflow for "phone numbers" we could have a well-tested package for validating phone numbers that gathered a bunch of helpful information on handling international numbers. Etc.
16 | 
17 | And as the community handles more and more cases in an _excellent_ way, I hope a day will come when no one wants the `regex` package anymore.
18 | 
19 | 
20 | 
21 | ## Historical Notes
22 | 
23 | I want to draw a distinction between **regular expressions** and **regex**. These are related, but not the same. I think understanding the distinction helps motivate why I recommend against using this package.
24 | 
25 | 
26 | ### Regular Expressions
27 | 
28 | In theoretical computer science, the idea of a “regular expression” is a simple expression that matches a set of strings. For example:
29 | 
30 | - `a` matches  `"a"`
31 | - `ab` matches  `"ab"`
32 | - `ab*` matches  `"a"`, `"ab"`, `"abb"`, `"abbb"`, `"abbbb"`, etc.
33 | - `(ab)*` matches  `""`, `"ab"`, `"abab"`, `"ababab"`, `"abababab"`, etc.
34 | - `a|b` matches `"a"` and `"b"`
35 | - `a|(bb)*` matches `"a"`, `""`, `"bb"`, `"bbbb"`, `"bbbbbb"`, etc.
36 | 
37 | So you basically have `*` to repeat, parentheses for grouping, and `|` for providing alternatives. That is it! A simple syntax that can describe a bunch of different things.
38 | 
39 | It also has quite beautiful relationships to finite automata, context-free grammars, Turing machines, etc. If you are into this sort of thing, I highly recommend [Introduction to the Theory of Computation](https://math.mit.edu/~sipser/book.html) by Michael Sipser!
40 | 
41 | 
42 | ### Regex
43 | 
44 | So people came up with that simple thing in computer science, and on its surface, it looks like a good way to match email addresses, phone numbers, etc. But regular expressions only match or not. How can we _extract_ information from the string as well? Well, this is how regex was born.
45 | 
46 | A bunch of extensions were added to the root idea, significantly complicating the syntax and behavior. For example, instead of using parentheses just for grouping, parentheses also extract information. But wait, how does `(a|b)*` work if we are extracting everything inside the parens? What should be extracted from matching strings like `"aabb"` or `"aba"` now?
47 | 
48 | So lots of things like that were added, and the result is called “regex” and it appears in a bunch of common programming languages like Perl, Python, and JavaScript.
49 | 
50 | 
51 | ### Reflections
52 | 
53 | The regex idea has become quite influential. It is “good enough” for a lot of cases, but it is also quite confusing and difficult to use reliably. If you look around Stack Overflow, you will find tons of questions like "how do I parse an email address?" and many folks just copy/paste the answers without really reading or understanding the regex thoroughly. Does the regex really work? What exactly do you want to allow and disallow?
54 | 
55 | The root issue is that regular expressions were not _meant_ to parse everything. For example, regular expressions are unable to describe sets of strings with balanced parentheses, so no regular expression can describe the set of `"()"`, `"(())"`, `"((()))"`, etc. (That means [they cannot parse matching HTML tags](https://stackoverflow.com/a/1732454) either!) But you _can_ do that with context-free grammars! With one really elegant addition! So the limitations of regular expressions are actually their whole point. They are _supposed_ to be simple to show why other formulations can express more.
56 | 
57 | So this is why I recommend the [`elm/parser`][elm] package over this one. It _is_ meant to parse everything, and in a way that works really nicely with Elm.
58 | 
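
The question the README raises about `(a|b)*` can be checked against JavaScript's `RegExp`, which this package wraps. The following is a minimal sketch, assuming a plain JavaScript runtime such as Node.js; `lastCapture` is a hypothetical helper introduced only for illustration and is not part of this package.

```javascript
// Hypothetical helper: reports what capture group 1 holds after
// the whole input has been matched against (a|b)*.
function lastCapture(input) {
  const result = /^(a|b)*$/.exec(input);
  // exec returns null when the input does not match at all.
  return result === null ? null : result[1];
}

console.log(lastCapture("aabb")); // "b"
console.log(lastCapture("aba"));  // "a"
console.log(lastCapture(""));     // undefined (the group never ran)
```

In JavaScript the group keeps only its final repetition, so the earlier repetitions are not available through the capture. That is one concrete answer to the "what should be extracted" question, and other answers would have been just as defensible.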


--------------------------------------------------------------------------------
/elm.json:
--------------------------------------------------------------------------------
 1 | {
 2 |     "type": "package",
 3 |     "name": "elm/regex",
 4 |     "summary": "Support for JS-style regular expressions in Elm",
 5 |     "license": "BSD-3-Clause",
 6 |     "version": "1.0.0",
 7 |     "exposed-modules": [
 8 |         "Regex"
 9 |     ],
10 |     "elm-version": "0.19.0 <= v < 0.20.0",
11 |     "dependencies": {
12 |         "elm/core": "1.0.0 <= v < 2.0.0"
13 |     },
14 |     "test-dependencies": {}
15 | }


--------------------------------------------------------------------------------
/src/Elm/Kernel/Regex.js:
--------------------------------------------------------------------------------
  1 | /*
  2 | 
  3 | import Elm.Kernel.List exposing (fromArray)
  4 | import Maybe exposing (Just, Nothing)
  5 | import Regex exposing (Match)
  6 | 
  7 | */
  8 | 
  9 | // CREATE
 10 | 
 11 | var _Regex_never = /.^/;
 12 | 
 13 | var _Regex_fromStringWith = F2(function(options, string)
 14 | {
 15 | 	var flags = 'g';
 16 | 	if (options.__$multiline) { flags += 'm'; }
 17 | 	if (options.__$caseInsensitive) { flags += 'i'; }
 18 | 
 19 | 	try
 20 | 	{
 21 | 		return __Maybe_Just(new RegExp(string, flags));
 22 | 	}
 23 | 	catch(error)
 24 | 	{
 25 | 		return __Maybe_Nothing;
 26 | 	}
 27 | });
 28 | 
 29 | 
 30 | // USE
 31 | 
 32 | var _Regex_contains = F2(function(re, string)
 33 | {
 34 | 	return string.match(re) !== null;
 35 | });
 36 | 
 37 | 
 38 | var _Regex_findAtMost = F3(function(n, re, str)
 39 | {
 40 | 	var out = [];
 41 | 	var number = 0;
 42 | 	var string = str;
 43 | 	var lastIndex = re.lastIndex;
 44 | 	var prevLastIndex = -1;
 45 | 	var result;
 46 | 	while (number++ < n && (result = re.exec(string)))
 47 | 	{
 48 | 		if (prevLastIndex == re.lastIndex) break;
 49 | 		var i = result.length - 1;
 50 | 		var subs = new Array(i);
 51 | 		while (i > 0)
 52 | 		{
 53 | 			var submatch = result[i];
 54 | 			subs[--i] = submatch
 55 | 				? __Maybe_Just(submatch)
 56 | 				: __Maybe_Nothing;
 57 | 		}
 58 | 		out.push(A4(__Regex_Match, result[0], result.index, number, __List_fromArray(subs)));
 59 | 		prevLastIndex = re.lastIndex;
 60 | 	}
 61 | 	re.lastIndex = lastIndex;
 62 | 	return __List_fromArray(out);
 63 | });
 64 | 
 65 | 
 66 | var _Regex_replaceAtMost = F4(function(n, re, replacer, string)
 67 | {
 68 | 	var count = 0;
 69 | 	function jsReplacer(match)
 70 | 	{
 71 | 		if (count++ >= n)
 72 | 		{
 73 | 			return match;
 74 | 		}
 75 | 		var i = arguments.length - 3;
 76 | 		var submatches = new Array(i);
 77 | 		while (i > 0)
 78 | 		{
 79 | 			var submatch = arguments[i];
 80 | 			submatches[--i] = submatch
 81 | 				? __Maybe_Just(submatch)
 82 | 				: __Maybe_Nothing;
 83 | 		}
 84 | 		return replacer(A4(__Regex_Match, match, arguments[arguments.length - 2], count, __List_fromArray(submatches)));
 85 | 	}
 86 | 	return string.replace(re, jsReplacer);
 87 | });
 88 | 
 89 | var _Regex_splitAtMost = F3(function(n, re, str)
 90 | {
 91 | 	var string = str;
 92 | 	var out = [];
 93 | 	var start = re.lastIndex;
 94 | 	var restoreLastIndex = re.lastIndex;
 95 | 	while (n--)
 96 | 	{
 97 | 		var result = re.exec(string);
 98 | 		if (!result) break;
 99 | 		out.push(string.slice(start, result.index));
100 | 		start = re.lastIndex;
101 | 	}
102 | 	out.push(string.slice(start));
103 | 	re.lastIndex = restoreLastIndex;
104 | 	return __List_fromArray(out);
105 | });
106 | 
107 | var _Regex_infinity = Infinity;
108 | 
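
How `_Regex_findAtMost` above uses `exec` and `lastIndex` may be easier to follow outside the Elm runtime. The following is a rough standalone adaptation, assuming plain JavaScript objects and arrays in place of Elm's `Match`, `Maybe`, and `List` values (`null` stands in for `Nothing`). It is an illustration, not the kernel code itself.

```javascript
// Standalone approximation of the kernel's findAtMost loop.
// Assumes `re` was created with the global flag, as the kernel always does.
function findAtMost(n, re, str) {
  const out = [];
  const savedLastIndex = re.lastIndex; // restored before returning
  let prevLastIndex = -1;
  let number = 0;
  let result;
  while (number++ < n && (result = re.exec(str))) {
    // A zero-length match can leave lastIndex unchanged; stop rather
    // than report the same position forever.
    if (prevLastIndex === re.lastIndex) break;
    out.push({
      match: result[0],
      index: result.index,
      number: number, // counts from one
      // Mirrors the kernel's truthiness check: unmatched or empty
      // groups become null.
      submatches: result.slice(1).map((s) => (s ? s : null)),
    });
    prevLastIndex = re.lastIndex;
  }
  re.lastIndex = savedLastIndex;
  return out;
}

const location = /[oi]n a (\w+)/g;
const places = findAtMost(Infinity, location, "I am on a boat in a lake.");
console.log(places.map((m) => m.match));      // [ 'on a boat', 'in a lake' ]
console.log(places.map((m) => m.submatches)); // [ [ 'boat' ], [ 'lake' ] ]
```

Saving and restoring `lastIndex` matters because a global `RegExp` object is stateful: without it, one search could change where the next search on the same object starts.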


--------------------------------------------------------------------------------
/src/Regex.elm:
--------------------------------------------------------------------------------
  1 | module Regex exposing
  2 |   ( Regex
  3 |   , fromString
  4 |   , fromStringWith
  5 |   , Options
  6 |   , never
  7 |   , contains
  8 |   , split
  9 |   , find
 10 |   , replace
 11 |   , Match
 12 |   , splitAtMost
 13 |   , findAtMost
 14 |   , replaceAtMost
 15 |   )
 16 | 
 17 | 
 18 | {-| A library for working with regex. The syntax matches the [`RegExp`][js]
 19 | library from JavaScript.
 20 | 
 21 | [js]: https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions
 22 | 
 23 | # Create
 24 | @docs Regex, fromString, fromStringWith, Options, never
 25 | 
 26 | # Use
 27 | @docs contains, split, find, replace, Match
 28 | 
 29 | # Fancier Uses
 30 | @docs splitAtMost, findAtMost, replaceAtMost
 31 | 
 32 | -}
 33 | 
 34 | 
 35 | import Elm.Kernel.Regex
 36 | 
 37 | 
 38 | 
 39 | -- CREATE
 40 | 
 41 | 
 42 | {-| A regular expression [as specified in JavaScript][js].
 43 | 
 44 | [js]: https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions
 45 | 
 46 | -}
 47 | type Regex = Regex
 48 | 
 49 | 
 50 | {-| Try to create a `Regex`. Not all strings are valid though, so you get a
 51 | `Maybe` back. This means you can safely accept input from users.
 52 | 
 53 |     import Regex
 54 | 
 55 |     lowerCase : Regex.Regex
 56 |     lowerCase =
 57 |       Maybe.withDefault Regex.never <|
 58 |         Regex.fromString "[a-z]+"
 59 | 
 60 | **Note:** There are some [shorthand character classes][short] like `\w` for
 61 | word characters, `\s` for whitespace characters, and `\d` for digits. **Make
 62 | sure they are properly escaped!** If you specify them directly in your code,
 63 | they would look like `"\\w\\s\\d"`.
 64 | 
 65 | [short]: https://www.regular-expressions.info/shorthand.html
 66 | -}
 67 | fromString : String -> Maybe Regex
 68 | fromString string =
 69 |   fromStringWith { caseInsensitive = False, multiline = False } string
 70 | 
 71 | 
 72 | {-| Create a `Regex` with some additional options. For example, you can define
 73 | `fromString` like this:
 74 | 
 75 |     import Regex
 76 | 
 77 |     fromString : String -> Maybe Regex.Regex
 78 |     fromString string =
 79 |       fromStringWith { caseInsensitive = False, multiline = False } string
 80 | 
 81 | -}
 82 | fromStringWith : Options -> String -> Maybe Regex
 83 | fromStringWith =
 84 |   Elm.Kernel.Regex.fromStringWith
 85 | 
 86 | 
 87 | {-|-}
 88 | type alias Options =
 89 |   { caseInsensitive : Bool
 90 |   , multiline : Bool
 91 |   }
 92 | 
 93 | 
 94 | {-| A regular expression that never matches any string.
 95 | -}
 96 | never : Regex
 97 | never =
 98 |   Elm.Kernel.Regex.never
 99 | 
100 | 
101 | 
102 | -- USE
103 | 
104 | 
105 | {-| Check to see if a Regex is contained in a string.
106 | 
107 |     import Regex
108 | 
109 |     digit : Regex.Regex
110 |     digit =
111 |       Maybe.withDefault Regex.never <|
112 |         Regex.fromString "[0-9]"
113 | 
114 |     -- Regex.contains digit "abc123" == True
115 |     -- Regex.contains digit "abcxyz" == False
116 | -}
117 | contains : Regex -> String -> Bool
118 | contains =
119 |   Elm.Kernel.Regex.contains
120 | 
121 | 
122 | {-| Split a string. The following example will split on commas and tolerate
123 | whitespace on either side of the comma:
124 | 
125 |     import Regex
126 | 
127 |     comma : Regex.Regex
128 |     comma =
129 |       Maybe.withDefault Regex.never <|
130 |         Regex.fromString " *, *"
131 | 
132 |     -- Regex.split comma "tom,99,90,85"     == ["tom","99","90","85"]
133 |     -- Regex.split comma "tom, 99, 90, 85"  == ["tom","99","90","85"]
134 |     -- Regex.split comma "tom , 99, 90, 85" == ["tom","99","90","85"]
135 | 
136 | If you want some really fancy splits, a library like
137 | [`elm/parser`][parser] will probably be easier to use.
138 | 
139 | [parser]: /packages/elm/parser/latest
140 | -}
141 | split : Regex -> String -> List String
142 | split =
143 |   Elm.Kernel.Regex.splitAtMost Elm.Kernel.Regex.infinity
144 | 
145 | 
146 | {-| Find matches in a string:
147 | 
148 |     import Regex
149 | 
150 |     location : Regex.Regex
151 |     location =
152 |       Maybe.withDefault Regex.never <|
153 |         Regex.fromString "[oi]n a (\\w+)"
154 | 
155 |     places : List Regex.Match
156 |     places =
157 |       Regex.find location "I am on a boat in a lake."
158 | 
159 |     -- map .match      places == [ "on a boat", "in a lake" ]
160 |     -- map .submatches places == [ [Just "boat"], [Just "lake"] ]
161 | 
162 | If you need `submatches` for some reason, a library like
163 | [`elm/parser`][parser] will probably lead to better code in the long run.
164 | 
165 | [parser]: /packages/elm/parser/latest
166 | -}
167 | find : Regex -> String -> List Match
168 | find =
169 |   Elm.Kernel.Regex.findAtMost Elm.Kernel.Regex.infinity
170 | 
171 | 
172 | {-| The details about a particular match:
173 | 
174 |   * `match` &mdash; the full string of the match.
175 |   * `index` &mdash; the index of the match in the original string.
176 |   * `number` &mdash; if you find many matches, you can think of each one
177 |     as being labeled with a `number` starting at one. So the first time you
178 |     find a match, that is match `number` one. Second time is match `number` two.
179 |     This is useful when paired with `replace` if replacement is dependent on how
180 |     many times a pattern has appeared before.
181 |   * `submatches` &mdash; a `Regex` can have [subpatterns][sub], sub-parts that
182 |     are in parentheses. This is a list of all these submatches. This is kind of
183 |     garbage to use, and using a package like [`elm/parser`][parser] is
184 |     probably easier.
185 | 
186 | [sub]: https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions#Using_Parenthesized_Substring_Matches
187 | [parser]: /packages/elm/parser/latest
188 | 
189 | -}
190 | type alias Match =
191 |   { match : String
192 |   , index : Int
193 |   , number : Int
194 |   , submatches : List (Maybe String)
195 |   }
196 | 
197 | 
198 | {-| Replace matches. The function from `Match` to `String` lets
199 | you use the details of a specific match when making replacements.
200 | 
201 |     import Regex
202 | 
203 |     userReplace : String -> (Regex.Match -> String) -> String -> String
204 |     userReplace userRegex replacer string =
205 |       case Regex.fromString userRegex of
206 |         Nothing ->
207 |           string
208 | 
209 |         Just regex ->
210 |           Regex.replace regex replacer string
211 | 
212 |     devowel : String -> String
213 |     devowel string =
214 |       userReplace "[aeiou]" (\_ -> "") string
215 | 
216 |     -- devowel "The quick brown fox" == "Th qck brwn fx"
217 | 
218 |     reverseWords : String -> String
219 |     reverseWords string =
220 |       userReplace "\\w+" (.match >> String.reverse) string
221 | 
222 |     -- reverseWords "deliver mined parts" == "reviled denim strap"
223 | -}
224 | replace : Regex -> (Match -> String) -> String -> String
225 | replace =
226 |   Elm.Kernel.Regex.replaceAtMost Elm.Kernel.Regex.infinity
227 | 
228 | 
229 | 
230 | -- AT MOST
231 | 
232 | 
233 | {-| Just like `split` but it stops after some number of matches.
234 | 
235 | A library like [`elm/parser`][parser] will probably lead to better code in
236 | the long run.
237 | 
238 | [parser]: /packages/elm/parser/latest
239 | -}
240 | splitAtMost : Int -> Regex -> String -> List String
241 | splitAtMost =
242 |   Elm.Kernel.Regex.splitAtMost
243 | 
244 | 
245 | {-| Just like `find` but it stops after some number of matches.
246 | 
247 | A library like [`elm/parser`][parser] will probably lead to better code in
248 | the long run.
249 | 
250 | [parser]: /packages/elm/parser/latest
251 | -}
252 | findAtMost : Int -> Regex -> String -> List Match
253 | findAtMost =
254 |   Elm.Kernel.Regex.findAtMost
255 | 
256 | 
257 | {-| Just like `replace` but it stops after some number of matches.
258 | 
259 | A library like [`elm/parser`][parser] will probably lead to better code in
260 | the long run.
261 | 
262 | [parser]: /packages/elm/parser/latest
263 | -}
264 | replaceAtMost : Int -> Regex -> (Match -> String) -> String -> String
265 | replaceAtMost =
266 |   Elm.Kernel.Regex.replaceAtMost
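
The `Maybe` returned by `fromString` and the escaping note in its documentation both follow from the pattern being handed to JavaScript's `RegExp` constructor as a string. The following is a minimal sketch in plain JavaScript, assuming `null` in place of `Nothing`. The standalone `fromStringWith` below is a hypothetical stand-in for the kernel function, not the package's API.

```javascript
// Hypothetical standalone version of the kernel's option handling.
function fromStringWith(options, pattern) {
  let flags = "g"; // the kernel always requests a global regex
  if (options.multiline) flags += "m";
  if (options.caseInsensitive) flags += "i";
  try {
    return new RegExp(pattern, flags);
  } catch (error) {
    return null; // invalid patterns become "Nothing" instead of throwing
  }
}

const defaults = { caseInsensitive: false, multiline: false };

// The doubled backslash in the string literal reaches RegExp as \w.
const word = fromStringWith(defaults, "\\w+");
console.log(word.test("boat")); // true

// An unterminated character class is not a valid pattern.
console.log(fromStringWith(defaults, "[a-z")); // null
```

Returning a value for the failure case, rather than letting the `SyntaxError` escape, is what lets the Elm side accept arbitrary user input safely.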


--------------------------------------------------------------------------------