├── .gitattributes ├── .gitignore ├── LICENSE.md ├── README.md ├── docs ├── 01_Regex_in_Typescript.md ├── 01_What_Is_A_Regular_Expression.md ├── 02_Finite_Automata.md ├── A_Play_03.md ├── A_Play_04.md ├── DFA1.png ├── NFA1.png ├── notes.md ├── paper.css └── summary.md ├── haskell ├── 01_SimpleRegex │ ├── LICENSE │ ├── README.md │ ├── SimpleRegex.cabal │ ├── package.yaml │ ├── src │ │ └── SimpleRegex.hs │ ├── stack.yaml │ └── test │ │ └── Tests.hs ├── 02_RiggedRegex │ ├── LICENSE │ ├── README.md │ ├── RiggedRegex.cabal │ ├── package.yaml │ ├── src │ │ └── RiggedRegex.hs │ ├── stack.yaml │ └── test │ │ └── Tests.hs ├── 03_Brzozowski │ ├── BrzExp.cabal │ ├── LICENSE │ ├── README.md │ ├── Setup.hs │ ├── package.yaml │ ├── src │ │ └── BrzExp.hs │ ├── stack.yaml │ └── test │ │ └── Tests.hs ├── 04_Gluskov │ ├── Glushkov.cabal │ ├── LICENSE │ ├── README.md │ ├── package.yaml │ ├── src │ │ └── Glushkov.hs │ ├── stack.yaml │ └── test │ │ └── Tests.hs ├── 05_RiggedBrz │ ├── LICENSE │ ├── README.md │ ├── RiggedBrz.cabal │ ├── Setup.hs │ ├── package.yaml │ ├── src │ │ └── RiggedBrz.hs │ ├── stack.yaml │ └── test │ │ └── Tests.hs ├── 06_RiggedRegex_Combinator │ ├── LICENSE │ ├── README.md │ ├── RiggedRegex.cabal │ ├── package.yaml │ ├── src │ │ └── RiggedRegex.hs │ ├── stack.yaml │ └── test │ │ └── Tests.hs ├── 07_Rigged_Glushkov │ ├── LICENSE │ ├── README.md │ ├── RiggedGlushkov.cabal │ ├── package.yaml │ ├── src │ │ └── RiggedGlushkov.hs │ ├── stack.yaml │ └── test │ │ └── Tests.hs ├── 08_Heavyweights │ ├── Heavyweights.cabal │ ├── LICENSE │ ├── LICENSE.md │ ├── README.md │ ├── package.yaml │ ├── src │ │ └── Heavyweights.hs │ ├── stack.yaml │ └── test │ │ └── Tests.hs └── 09_Classed_Brzozowski │ ├── BrzExp.cabal │ ├── LICENSE │ ├── README.md │ ├── Setup.hs │ ├── package.yaml │ ├── src │ └── BrzExp.hs │ ├── stack.yaml │ └── test │ └── Tests.hs ├── node └── 01_Kleene.ts ├── python ├── 01_rigged_brzozowski.py ├── 02_rigged_brzozowski.py └── README.md └── rust ├── 01_simpleregex 
├── Cargo.toml ├── README.md └── src │ └── lib.rs ├── 02_riggedregex ├── Cargo.toml ├── README.md └── src │ └── lib.rs ├── 03_brzozowski_1 ├── .gitignore ├── Cargo.toml ├── README.md └── src │ └── lib.rs ├── 04_brzozowski_2 ├── .gitignore ├── Cargo.toml ├── README.md └── src │ └── lib.rs ├── 05_glushkov ├── Cargo.toml ├── README.md └── src │ └── lib.rs ├── 06_riggedglushkov ├── Cargo.toml ├── README.md └── src │ └── lib.rs ├── 07_heavyweights ├── Cargo.toml ├── README.md └── src │ └── lib.rs ├── 08_riggedbrz ├── Cargo.toml ├── README.md └── src │ └── lib.rs └── 09_riggedbrz ├── Cargo.toml ├── README.md └── src └── lib.rs /.gitattributes: -------------------------------------------------------------------------------- 1 | haskell/* linguist-vendored 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .#* 2 | *~ 3 | *# 4 | *.aux 5 | cabal-dev 6 | cabal.project.local 7 | cabal.project.local~ 8 | .cabal-sandbox/ 9 | cabal.sandbox.config 10 | Cargo.lock 11 | *.chi 12 | *.chs.h 13 | dist 14 | dist-* 15 | *.dyn_hi 16 | *.dyn_o 17 | *.eventlog 18 | .ghc.environment.* 19 | *.hi 20 | *.hp 21 | .hpc 22 | .hsenv 23 | .HTF/ 24 | *.o 25 | *.prof 26 | **/*.rs.bk 27 | .stack-work/ 28 | rust/**/target 29 | *.pyc 30 | *.pyo 31 | -------------------------------------------------------------------------------- /docs/01_What_Is_A_Regular_Expression.md: -------------------------------------------------------------------------------- 1 | # What is a regular expression? 2 | 3 | So what *is* a regular expression? 
Let's build up from the bottom: we 4 | start with: 5 | 6 | Alphabet 7 | : An alphabet is a set of symbols (or we can call them letters) 8 | 9 | Word 10 | : A word is a sequence of symbols from an alphabet 11 | 12 | Language 13 | : A language is a set of words 14 | 15 | If our alphabet is ASCII and words are English, then a very simple 16 | language would be something like 17 | 18 | Common_Pets: {dog, cat, fish, hamster, parakeet}. 19 | 20 | Stephen Cole Kleene proposed a formal definition for "regular languages" 21 | in 1956, and what we have developed since then is a series of 22 | refinements that allow us to parse regular languages in something like 23 | linear time. Kleene's operations were meant to *generate* languages, 24 | and the research program since that time has been to turn generators 25 | into recognizers. But let's start with Kleene's generators. 26 | 27 | ## Regular Languages 28 | 29 | There are six basic operators in a regular language, and each of them is 30 | itself a regular language. The first three are the base languages, 31 | encoding the "zero," "one," and "element" of the regular language, and 32 | the second three are composite languages; they contain other regular 33 | languages (including other composites) to describe a complete 34 | generator. 35 | 36 | Given an alphabet, `A`, we can say: 37 | 38 | `L[[∅]] = ∅` 39 | : A language that contains nothing is made up of nothing. 40 | 41 | `L[[ε]] = {ε}` 42 | : A language containing only the empty string can only 43 | generate empty strings. 44 | 45 | `L[[a]] = {a}` 46 | : A language containing only the letter 'a' can only generate a single 47 | instance of the letter 'a'. (This is true for all letters in the 48 | alphabet.) 49 | 50 | `L[[r · s]] = {u · v | u ∈ L[[r]] and v ∈ L[[s]]}` 51 | : A composite language made up of the *sequence* of two other regular 52 | expressions `r` and `s` can generate the concatenation `uv` for every `u` 53 | generated by `r` and every `v` generated by `s`. 
54 | 55 | `L[[r | s]] = L[[r]] ∪ L[[s]]` 56 | : A composite language made up of the *alternatives* of two other 57 | regular expressions `r` and `s` can generate either the strings of `r` 58 | or the strings of `s`, or both! 59 | 60 | `L[[r∗]] = {ε} ∪ L[[r · r*]]` 61 | : A composite language that repeats `r` zero or more times can generate 62 | zero or more instances of the strings generated by `r`. 63 | 64 | ## Regular Expressions 65 | 66 | What we usually think of as "regular expressions" is in fact a small 67 | programming language designed to be parsed and to internally generate a 68 | function that recognizes whether or not a string "belongs to" the sets 69 | of strings described by a Kleene Algebra. Programmatically, a regular 70 | expression is a function that takes a regular language and a string, and 71 | returns a boolean value indicating whether or not the string 72 | belongs to the set of strings described by the regular language. 73 | 74 | Regular expressions take Kleene's algebra and turn it backwards, asking 75 | "Can this given string be generated by an expression in Kleene's 76 | algebra?" In both the Rust and Haskell branches you'll find the 77 | SimpleRegex implementations, which take this quite literally. The 78 | Haskell version is the most concise; it literally encodes Kleene's five 79 | generative operations (the language of null doesn't generate anything) 80 | and *all possible combinations of `r` and `s` for any given composite 81 | language* and then tests all those combinations to see if the expression 82 | generated any of them. 83 | 84 | This is, of course, inexcusably slow. For any string of length `n`, the 85 | number of comparisons done, thanks mostly to the Sequence composite, is 86 | `2^(n-1)` operations. For a string of 8 letters, that's 128 87 | different combinations of strings that have to be matched, and on my 88 | fairly modern laptop that takes a little longer than 20 seconds. 
89 | Increase that to 15 letters and you'll be waiting almost an hour. 90 | 91 | The entirety of the modern parsing research program has been to make 92 | this faster and easier to use. There have been many attempts, and this 93 | project isn't meant to break new ground; instead, its goal is to take 94 | promising results from a variety of different academic research projects 95 | and explore whether there's anything new and interesting that we can 96 | exploit in a modern systems language like Rust or C++. 97 | 98 | -------------------------------------------------------------------------------- /docs/A_Play_03.md: -------------------------------------------------------------------------------- 1 | In the [last 2 | post](https://elfsternberg.com/2019/01/23/a-play-on-regular-expressions-part-2/) 3 | on "[A Play on Regular 4 | Expressions](https://www-ps.informatik.uni-kiel.de/~sebf/pub/regexp-play.html)," 5 | I showed how we go from a boolean regular expression to a "rigged" one; 6 | one that uses an arbitrary data structure to extract data from the 7 | process of recognizing regular expressions. The data structure must 8 | conform to a set of mathematical laws (the 9 | [semiring](https://en.wikipedia.org/wiki/Semiring) laws), but that 10 | simple requirement led us to some surprisingly robust results. 11 | 12 | Now, the question is: Can we port this to Rust? 13 | 14 | Easily. 15 | 16 | The first thing to do, however, is to *not* implement a Semiring. A 17 | Semiring is a conceptual item, and in Rust it turns out that you can get 18 | away without defining a Semiring as a trait; instead, it's a collection 19 | of traits derived from the `num_traits` crate: `Zero, zero, One, one`; 20 | the capitalized versions are the traits, and the lower case ones are the 21 | implementations we have to provide. 
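To make the "collection of traits" idea concrete, here is a minimal sketch; it is not the repository's code. It assumes local stand-ins for `Zero` and `One` with the supertrait bounds I believe `num_traits` uses (`Add` and `Mul` respectively), so that it compiles without external crates. `Count`, `sum_of_products`, and `product` are hypothetical names invented for illustration.

```rust
use std::ops::{Add, Mul};

// Local stand-ins for num_traits::Zero and num_traits::One (assumption:
// the real traits carry equivalent Add / Mul supertrait bounds).
trait Zero: Sized + Add<Self, Output = Self> {
    fn zero() -> Self;
}
trait One: Sized + Mul<Self, Output = Self> {
    fn one() -> Self;
}

// Hypothetical semiring that counts how many ways something matched.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Count(u64);

impl Add for Count {
    type Output = Count;
    fn add(self, rhs: Count) -> Count { Count(self.0 + rhs.0) }
}
impl Mul for Count {
    type Output = Count;
    fn mul(self, rhs: Count) -> Count { Count(self.0 * rhs.0) }
}
impl Zero for Count {
    fn zero() -> Count { Count(0) }
}
impl One for Count {
    fn one() -> Count { Count(1) }
}

// Sum of products over pairs: the shape of the sequence case in a
// rigged matcher. The `+` and `*` come from the supertrait bounds.
fn sum_of_products<S: Zero + One + Copy>(pairs: &[(S, S)]) -> S {
    pairs.iter().fold(S::zero(), |acc, &(a, b)| acc + a * b)
}

// Product over a list: the shape of one partition in the repetition case.
fn product<S: One + Copy>(xs: &[S]) -> S {
    xs.iter().fold(S::one(), |acc, &x| acc * x)
}
```

Because the arithmetic operators ride along as supertraits, a bound like `S: Zero + One` is enough for generic code to add and multiply semiring values; no separate `Semiring` trait is needed.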
22 | 23 | I won't post the entire code here, but you can check it out in [Rigged 24 | Kleene Regular Expressions in 25 | Rust](https://github.com/elfsternberg/riggedregex/tree/master/rust/02_riggedregex). 26 | Here are a few highlights: 27 | 28 | The `accept()` function for the Haskell version looked like this: 29 | 30 | acceptw :: Semiring s => Regw c s -> [c] -> s 31 | acceptw Epsw u = if null u then one else zero 32 | acceptw (Symw f) u = case u of [c] -> f c; _ -> zero 33 | acceptw (Altw p q) u = acceptw p u `add` acceptw q u 34 | acceptw (Seqw p q) u = sumr [ acceptw p u1 `mul` acceptw q u2 | (u1, u2) <- split u ] 35 | acceptw (Repw r) u = sumr [ prodr [ acceptw r ui | ui <- ps ] | ps <- parts u ] 36 | 37 | The `accept()` function in Rust looks almost the same: 38 | 39 | pub fn acceptw<S>(r: &Regw<S>, s: &[char]) -> S 40 | where S: Zero + One 41 | { 42 | match r { 43 | Regw::Eps => if s.is_empty() { one() } else { zero() }, 44 | Regw::Sym(c) => if s.len() == 1 { c(s[0]) } else { zero() }, 45 | Regw::Alt(r1, r2) => S::add(acceptw(&r1, s), acceptw(&r2, s)), 46 | Regw::Seq(r1, r2) => split(s) 47 | .into_iter() 48 | .map(|(u1, u2)| acceptw(r1, &u1) * acceptw(r2, &u2)) 49 | .fold(S::zero(), sumr), 50 | Regw::Rep(r) => parts(s) 51 | .into_iter() 52 | .map(|ps| ps.into_iter().map(|u| acceptw(r, &u)).fold(S::one(), prod)) 53 | .fold(S::zero(), sumr) 54 | } 55 | } 56 | 57 | There's a bit more machinery here to support the `sum`-over and 58 | `product`-over maps. There's also the `where S: Zero + One` clause, 59 | which tells us that our Semiring must be something that understands 60 | those two notions and has implementations for them. 61 | 62 | To restore our boolean version of our engine, we have to build a nominal 63 | container that supports the various traits of our semiring. To do that, 64 | we need to implement the methods associated with `Zero`, `One`, `Mul`, 65 | and `Add`, and explain what they mean to the datatype of our semiring. 
66 | The actual work is straightforward. 67 | 68 | pub struct Recognizer(bool); 69 | 70 | impl Zero for Recognizer { 71 | fn zero() -> Recognizer { Recognizer(false) } 72 | fn is_zero(&self) -> bool { !self.0 } 73 | } 74 | 75 | impl One for Recognizer { 76 | fn one() -> Recognizer { Recognizer(true) } 77 | } 78 | 79 | impl Mul for Recognizer { 80 | type Output = Recognizer; 81 | fn mul(self, rhs: Recognizer) -> Recognizer { Recognizer(self.0 && rhs.0) } 82 | } 83 | 84 | impl Add for Recognizer { 85 | type Output = Recognizer; 86 | fn add(self, rhs: Recognizer) -> Recognizer { Recognizer(self.0 || rhs.0) } 87 | } 88 | 89 | Also, unlike Haskell, Rust must be explicitly told what kind of Semiring 90 | will be used before processing, whereas Haskell will see what kind of 91 | Semiring you need to produce the processed result and hook up the 92 | machinery for you, but that's not surprising. In Rust, you "lift" a 93 | straight expression to a rigged one thusly: 94 | 95 | let rigged: Regw<Recognizer> = rig(&evencs); 96 | 97 | All in all, porting the Haskell to Rust was extremely straightforward. 98 | The code looks remarkably similar, but for one detail. In the Kleene 99 | version of regular expressions we're emulating as closely as possible 100 | the "all possible permutations of our input string" implicit in the 101 | set-theoretic language of Kleene's 1956 paper. That slows us down a 102 | lot, but in Haskell the code for doing it was extremely straightforward, 103 | with two simple functions to create all possible permutations for both 104 | the sequence and repetition options: 105 | 106 | split [] = [([], [])] 107 | split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs] 108 | parts [] = [[]] 109 | parts [c] = [[[c]]] 110 | parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs] 111 | 112 | In Rust, these two functions were 21 and 29 lines long, respectively. 
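For a sense of what such a port involves, here is a compact sketch of the two helpers in Rust. It is an assumed reconstruction, not the 21- and 29-line versions from the repository: it leans on slice indexing, returns owned `Vec<char>` pieces, and `parts` emits its partitions in a different order than the Haskell version does.

```rust
// Every (prefix, suffix) pair of the input: n + 1 pairs for n symbols.
fn split(s: &[char]) -> Vec<(Vec<char>, Vec<char>)> {
    (0..=s.len())
        .map(|i| (s[..i].to_vec(), s[i..].to_vec()))
        .collect()
}

// Every way to cut the input into non-empty contiguous pieces.
// The empty input has exactly one partition: the empty list of pieces.
fn parts(s: &[char]) -> Vec<Vec<Vec<char>>> {
    if s.is_empty() {
        return vec![vec![]];
    }
    let mut out = Vec::new();
    // Pick the first piece s[..i], then partition whatever remains.
    for i in 1..=s.len() {
        for mut rest in parts(&s[i..]) {
            let mut partition = vec![s[..i].to_vec()];
            partition.append(&mut rest);
            out.push(partition);
        }
    }
    out
}
```

Every `to_vec()` here is an explicit allocation, which is the kind of decision the Haskell list comprehensions never ask you to make. For an input of `n` symbols (with `n` at least 1), `parts` produces `2^(n-1)` partitions, which is where the exponential cost of the Kleene-style matcher comes from.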
113 | Rust demands that you pay attention to memory usage, and its rules 114 | require you to be very explicit about when you want memory, so that 115 | Rust knows exactly when you no longer need it and can release it back 116 | to the allocator. 117 | 118 | Rust's syntax and support are amazing, and the way Haskell can be ported 119 | to Rust with little to no loss of fidelity makes me happy to work in 120 | both. 121 | -------------------------------------------------------------------------------- /docs/DFA1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/elfsternberg/riggedregex/065a5741807071a55b0e22c0c1ed8a1dfcb1abec/docs/DFA1.png -------------------------------------------------------------------------------- /docs/NFA1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/elfsternberg/riggedregex/065a5741807071a55b0e22c0c1ed8a1dfcb1abec/docs/NFA1.png -------------------------------------------------------------------------------- /docs/notes.md: -------------------------------------------------------------------------------- 1 | Owens and Reppy did a much better job than I originally thought. They 2 | use the tilde to mean "recognizes," as in "r ~ u" means "`r` 3 | *recognizes* the string `u`". 4 | 5 | Following on the nullability issue, r ~ ε ⇔ ν(r) = ε, r ~ aw ⇔ δ(a)r ~ w 6 | (`r` recognizes `aw` if the derivative of r with respect to `a` 7 | recognizes `w`). 8 | 9 | r ≡ s (r is equivalent to s) if 𝓛⟦r⟧ = 𝓛⟦s⟧. Note that this is 10 | "equivalence" in the set-theoretic sense of a binary equivalence 11 | relation. That is, if the elements of some set S have an equivalence 12 | notion, then the set S can be split into *equivalence classes*. 13 | 14 | 1. At each step we have a residual regular expression `r` for the 15 | residual string `s` 16 | 17 | 2. 
Instead of computing the derivative on the fly, we precompute the 18 | derivative of `r` for each symbol in our alphabet `Σ`, thereby 19 | constructing a DFA for the language in `r`. 20 | 21 | 3. Computing equivalence can be expensive 22 | 23 | 4. It is not practical to iterate over every Unicode codepoint for 24 | each state. 25 | 26 | 5. A scanner-generator takes a collection of REs, not just one. 27 | 28 | Owens & Reppy introduce a notion of *weak equivalence*, which is a set 29 | of rules for harmonizing some regular expression equivalents. These 30 | look a lot like some of the performance optimizations found in Might & 31 | Adams. 32 | 33 | They define a *class*, **S**, where **S** ⊆ Σ. **S** covers both the 34 | empty set and the single character set, as well as a multi-character 35 | *class*. 36 | 37 | They then add equivalence expressions: **R** + **S** ≈ **T** where 38 | T = R ∪ S. (Note that this works for *recognition*. But what about more 39 | complex operations?) 40 | 41 | We say that a and b are equivalent in r only if δ(a)r ≡ δ(b)r. 42 | 43 | r = a + b · a + c 44 | 45 | (Do we read this "a OR ba OR c" or "(a or b)(a or c)"? If we read it 46 | the first way, then this makes sense: the equivalence classes for r 47 | produce three possible derivatives: {a, c}, transition to `ε`; {b}, 48 | transition to `a`; or Σ\{a,b,c}, which is the alphabet that excludes a, 49 | b, and c, and transitions to `⊘`.) 50 | 51 | All right, having gotten that out of the way, we say that i ≅ᵣ j (the 52 | derivative class of r(i) is equivalent to the derivative class of r(j)) 53 | if `δᵢr ≡ δⱼr`. 
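The two recognition rules from earlier in these notes (r ~ ε ⇔ ν(r) = ε, and r ~ aw ⇔ δ(a)r ~ w) can be sketched directly as an on-the-fly matcher. This is a minimal sketch under stated assumptions, not Owens & Reppy's algorithm: the names `Re`, `nullable`, `derive`, and `matches` are hypothetical, ν is reported as a `bool` rather than as ε or ∅, and there are no character classes and no weak-equivalence simplification, so the residual expression simply grows with each symbol.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Re {
    Empty,                 // ∅: matches nothing
    Eps,                   // ε: matches only the empty string
    Sym(char),             // a single symbol
    Alt(Box<Re>, Box<Re>), // r + s
    Seq(Box<Re>, Box<Re>), // r · s
    Star(Box<Re>),         // r*
}

// Small constructors to keep the boxing out of the way.
fn alt(a: Re, b: Re) -> Re { Re::Alt(Box::new(a), Box::new(b)) }
fn seq(a: Re, b: Re) -> Re { Re::Seq(Box::new(a), Box::new(b)) }
fn star(a: Re) -> Re { Re::Star(Box::new(a)) }

// ν(r) as a bool: true when r accepts the empty string.
fn nullable(r: &Re) -> bool {
    match r {
        Re::Empty | Re::Sym(_) => false,
        Re::Eps | Re::Star(_) => true,
        Re::Alt(a, b) => nullable(a) || nullable(b),
        Re::Seq(a, b) => nullable(a) && nullable(b),
    }
}

// δ(c)r: an expression matching whatever may follow `c` in a string of r.
fn derive(r: &Re, c: char) -> Re {
    match r {
        Re::Empty | Re::Eps => Re::Empty,
        Re::Sym(x) => if *x == c { Re::Eps } else { Re::Empty },
        Re::Alt(a, b) => alt(derive(a, c), derive(b, c)),
        Re::Seq(a, b) => {
            let left = seq(derive(a, c), (**b).clone());
            // If a can match ε, then c may also be the first symbol of b.
            if nullable(a) { alt(left, derive(b, c)) } else { left }
        }
        Re::Star(a) => seq(derive(a, c), star((**a).clone())),
    }
}

// r ~ w: take one derivative per symbol, then test nullability.
fn matches(r: &Re, w: &str) -> bool {
    nullable(&w.chars().fold(r.clone(), |acc, c| derive(&acc, c)))
}
```

Reading `r = a + b · a + c` as "a OR ba OR c", the expression is `alt(Re::Sym('a'), alt(seq(Re::Sym('b'), Re::Sym('a')), Re::Sym('c')))`. Its derivatives with respect to `a` and `c` are nullable, and its derivative with respect to `b` recognizes `a`, which agrees with the three derivative classes described above.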
54 | 55 | fun goto q (S, (Q, δ)) = 56 | let c ∈ S 57 | let q_c = ∂_c q 58 | in 59 | if ∃q′ ∈ Q such that q′ ≈ q_c 60 | then (Q, δ ∪ {(q, S) ↦ q′}) 61 | else 62 | let Q′ = Q ∪ {q_c} 63 | let δ′ = δ ∪ {(q, S) ↦ q_c} 64 | in explore (Q′, δ′, q_c) 65 | 66 | let explore (Q, δ, q) = fold (goto q) (Q, δ) (C(q)) 67 | 68 | fun mkDFA r = 69 | let q_0 = ∂_ε r 70 | let (Q, δ) = explore ({q_0}, {}, q_0) 71 | let F = {q | q ∈ Q and ν(q) = ε} 72 | in ⟨Q, q_0, F, δ⟩ 73 | 74 | 75 | 76 | 77 | 78 | -------------------------------------------------------------------------------- /docs/paper.css: -------------------------------------------------------------------------------- 1 | /* 2 | * I add this to html files generated with pandoc. 3 | */ 4 | 5 | html { 6 | font-size: 100%; 7 | overflow-y: scroll; 8 | -webkit-text-size-adjust: 100%; 9 | -ms-text-size-adjust: 100%; 10 | } 11 | 12 | body { 13 | color: #444; 14 | font-family: "Helvetica Neue", "Helvetica", "Arial", sans-serif; 15 | font-size: 14px; 16 | line-height: 1.7; 17 | padding: 1em; 18 | margin: auto; 19 | max-width: 42em; 20 | background: #fefefe; 21 | } 22 | 23 | a { 24 | color: #0645ad; 25 | text-decoration: none; 26 | } 27 | 28 | a:visited { 29 | color: #0b0080; 30 | } 31 | 32 | a:hover { 33 | color: #06e; 34 | } 35 | 36 | a:active { 37 | color: #faa700; 38 | } 39 | 40 | a:focus { 41 | outline: thin dotted; 42 | } 43 | 44 | *::-moz-selection { 45 | background: rgba(255, 255, 0, 0.3); 46 | color: #000; 47 | } 48 | 49 | *::selection { 50 | background: rgba(255, 255, 0, 0.3); 51 | color: #000; 52 | } 53 | 54 | a::-moz-selection { 55 | background: rgba(255, 255, 0, 0.3); 56 | color: #0645ad; 57 | } 58 | 59 | a::selection { 60 | background: rgba(255, 255, 0, 0.3); 61 | color: #0645ad; 62 | } 63 | 64 | p { 65 | margin: 1em 0; 66 | } 67 | 68 | img { 69 | max-width: 100%; 70 | } 71 | 72 | h1, h2, h3, h4, h5, h6 { 73 | color: #111; 74 | line-height: 125%; 75 | margin-top: 2em; 76 | font-weight: normal; 77 | } 78 | 
79 | h4, h5, h6 { 80 | font-weight: bold; 81 | } 82 | 83 | h1 { 84 | font-size: 2.5em; 85 | } 86 | 87 | h2 { 88 | font-size: 2em; 89 | } 90 | 91 | h3 { 92 | font-size: 1.5em; 93 | } 94 | 95 | h4 { 96 | font-size: 1.2em; 97 | } 98 | 99 | h5 { 100 | font-size: 1em; 101 | } 102 | 103 | h6 { 104 | font-size: 0.9em; 105 | } 106 | 107 | blockquote { 108 | color: #666666; 109 | margin: 0; 110 | padding-left: 3em; 111 | border-left: 0.5em #EEE solid; 112 | } 113 | 114 | hr { 115 | display: block; 116 | height: 2px; 117 | border: 0; 118 | border-top: 1px solid #aaa; 119 | border-bottom: 1px solid #eee; 120 | margin: 1em 0; 121 | padding: 0; 122 | } 123 | 124 | pre, code, kbd, samp { 125 | color: #000; 126 | font-family: monospace, monospace; 127 | _font-family: 'courier new', monospace; 128 | font-size: 0.98em; 129 | } 130 | 131 | pre { 132 | white-space: pre; 133 | white-space: pre-wrap; 134 | word-wrap: break-word; 135 | } 136 | 137 | b, strong { 138 | font-weight: bold; 139 | } 140 | 141 | dfn { 142 | font-style: italic; 143 | } 144 | 145 | ins { 146 | background: #ff9; 147 | color: #000; 148 | text-decoration: none; 149 | } 150 | 151 | mark { 152 | background: #ff0; 153 | color: #000; 154 | font-style: italic; 155 | font-weight: bold; 156 | } 157 | 158 | sub, sup { 159 | font-size: 75%; 160 | line-height: 0; 161 | position: relative; 162 | vertical-align: baseline; 163 | } 164 | 165 | sup { 166 | top: -0.5em; 167 | } 168 | 169 | sub { 170 | bottom: -0.25em; 171 | } 172 | 173 | ul, ol { 174 | margin: 1em 0; 175 | padding: 0 0 0 2em; 176 | } 177 | 178 | li p:last-child { 179 | margin-bottom: 0; 180 | } 181 | 182 | ul ul, ol ol { 183 | margin: .3em 0; 184 | } 185 | 186 | dl { 187 | margin-bottom: 1em; 188 | } 189 | 190 | dt { 191 | font-weight: bold; 192 | margin-bottom: .8em; 193 | } 194 | 195 | dd { 196 | margin: 0 0 .8em 2em; 197 | } 198 | 199 | dd:last-child { 200 | margin-bottom: 0; 201 | } 202 | 203 | img { 204 | border: 0; 205 | -ms-interpolation-mode: bicubic; 206 
| vertical-align: middle; 207 | } 208 | 209 | figure { 210 | display: block; 211 | text-align: center; 212 | margin: 1em 0; 213 | } 214 | 215 | figure img { 216 | border: none; 217 | margin: 0 auto; 218 | } 219 | 220 | figcaption { 221 | font-size: 0.8em; 222 | font-style: italic; 223 | margin: 0 0 .8em; 224 | } 225 | 226 | table { 227 | margin-bottom: 2em; 228 | border-bottom: 1px solid #ddd; 229 | border-right: 1px solid #ddd; 230 | border-spacing: 0; 231 | border-collapse: collapse; 232 | } 233 | 234 | table th { 235 | padding: .2em 1em; 236 | background-color: #eee; 237 | border-top: 1px solid #ddd; 238 | border-left: 1px solid #ddd; 239 | } 240 | 241 | table td { 242 | padding: .2em 1em; 243 | border-top: 1px solid #ddd; 244 | border-left: 1px solid #ddd; 245 | vertical-align: top; 246 | } 247 | 248 | .author { 249 | font-size: 1.2em; 250 | text-align: center; 251 | } 252 | 253 | @media only screen and (min-width: 480px) { 254 | body { 255 | font-size: 14px; 256 | } 257 | } 258 | @media only screen and (min-width: 768px) { 259 | body { 260 | font-size: 16px; 261 | } 262 | } 263 | @media print { 264 | * { 265 | background: transparent !important; 266 | color: black !important; 267 | filter: none !important; 268 | -ms-filter: none !important; 269 | } 270 | 271 | body { 272 | font-size: 12pt; 273 | max-width: 100%; 274 | } 275 | 276 | a, a:visited { 277 | text-decoration: underline; 278 | } 279 | 280 | hr { 281 | height: 1px; 282 | border: 0; 283 | border-bottom: 1px solid black; 284 | } 285 | 286 | a[href]:after { 287 | content: " (" attr(href) ")"; 288 | } 289 | 290 | abbr[title]:after { 291 | content: " (" attr(title) ")"; 292 | } 293 | 294 | .ir a:after, a[href^="javascript:"]:after, a[href^="#"]:after { 295 | content: ""; 296 | } 297 | 298 | pre, blockquote { 299 | border: 1px solid #999; 300 | padding-right: 1em; 301 | page-break-inside: avoid; 302 | } 303 | 304 | tr, img { 305 | page-break-inside: avoid; 306 | } 307 | 308 | img { 309 | max-width: 100% 
!important; 310 | } 311 | 312 | @page :left { 313 | margin: 15mm 20mm 15mm 10mm; 314 | } 315 | 316 | @page :right { 317 | margin: 15mm 10mm 15mm 20mm; 318 | } 319 | 320 | p, h2, h3 { 321 | orphans: 3; 322 | widows: 3; 323 | } 324 | 325 | h2, h3 { 326 | page-break-after: avoid; 327 | } 328 | } 329 | -------------------------------------------------------------------------------- /docs/summary.md: -------------------------------------------------------------------------------- 1 | L[[∅]] = ∅ 2 | L[[ε]] = {ε} 3 | L[[a]] = {a} 4 | L[[r · s]] = {u · v | u ∈ L[[r]] and v ∈ L[[s]]} 5 | L[[r | s]] = L[[r]] ∪ L[[s]] 6 | L[[r∗]] = {ε} ∪ L[[r · r*]] 7 | -------------------------------------------------------------------------------- /haskell/01_SimpleRegex/LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer 2 | 3 | All rights reserved. 4 | 5 | Redistribution and use in source and binary forms, with or without 6 | modification, are permitted provided that the following conditions are 7 | met: 8 | 9 | 1. Redistributions of source code must retain the above copyright 10 | notice, this list of conditions and the following disclaimer. 11 | 12 | 2. Redistributions in binary form must reproduce the above copyright 13 | notice, this list of conditions and the following disclaimer in 14 | the documentation and/or other materials provided with the 15 | distribution. 16 | 17 | 3. Neither the name of the author nor the names of his contributors 18 | may be used to endorse or promote products derived from this 19 | software without specific prior written permission. 20 | 21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 24 | A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHORS OR 25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, 27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR 28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 32 | 33 | -------------------------------------------------------------------------------- /haskell/01_SimpleRegex/README.md: -------------------------------------------------------------------------------- 1 | # Kleene Regular Expressions, in Haskell 2 | 3 | This is literally the definition of a simple string recognizing regular 4 | expression in Haskell. It consists of the `Reg` datatype encompassing 5 | the five standard operations of regular expressions and an `accept` 6 | function that takes the expression and a string and returns a Boolean 7 | yes/no on recognition or failure. It is a direct implementation of 8 | Kleene's algebra: 9 | 10 | L[[ε]] = {ε} 11 | L[[a]] = {a} 12 | L[[r · s]] = {u · v | u ∈ L[[r]] and v ∈ L[[s]]} 13 | L[[r | s]] = L[[r]] ∪ L[[s]] 14 | L[[r∗]] = {ε} ∪ L[[r · r*]] 15 | 16 | Those equations are for: recognizing an empty string, recognizing a 17 | letter, recognizing two expressions in sequence, recognizing two 18 | expression alternatives, and the repetition operation. 19 | 20 | The `accept` function has two helper functions that split the string, 21 | and all substrings, into all possible substrings such that *every 22 | possible combination* of string and expression are tested, and if the 23 | resulting tree of `and`s (from Sequencing and Repetition) and `or`s 24 | (from Alternation) has at least one complete collection of `True` from 25 | top to bottom then the function returns true. 
26 | 27 | This generation and comparison of substrings is grossly inefficient; a 28 | string of eight 'a's with `a*` will take 30 seconds on a modern laptop; 29 | increase that to twelve and you'll be waiting about an hour. The cost 30 | is `2^(n - 1)`, where `n` is the length of the string; this is a 31 | consequence of the sequencing operation. Sequences aren't just about 32 | letters: they could be about anything, including repetition (which 33 | itself creates new sequences) and other sequences, and 34 | examining every possible combination of sequencing creates this 35 | exponential cost. 36 | 37 | It is quite amazing, though, to actually *see* a straightforward 38 | implementation of Kleene's Regular Expressions in code. 39 | 40 | 41 | 42 | -------------------------------------------------------------------------------- /haskell/01_SimpleRegex/SimpleRegex.cabal: -------------------------------------------------------------------------------- 1 | cabal-version: 1.12 2 | 3 | -- This file has been generated from package.yaml by hpack version 0.31.1. 4 | -- 5 | -- see: https://github.com/sol/hpack 6 | -- 7 | -- hash: 2b06081e19cfbe96fa9a2d9a12695410d10cc0b73d3fe0c09d77986d2f101773 8 | 9 | name: SimpleRegex 10 | version: 0.1.0.0 11 | category: Regex 12 | homepage: https://github.com/elfsternberg/RegexWeightedPearl#readme 13 | author: Elf M. Sternberg 14 | maintainer: elf.sternberg@gmail.com 15 | copyright: Copyright ⓒ 2019 Elf M. 
Sternberg 16 | license: BSD3 17 | license-file: LICENSE 18 | build-type: Simple 19 | extra-source-files: 20 | README.md 21 | 22 | library 23 | exposed-modules: 24 | SimpleRegex 25 | other-modules: 26 | Paths_SimpleRegex 27 | hs-source-dirs: 28 | src 29 | ghc-options: -Wall 30 | build-depends: 31 | base 32 | default-language: Haskell2010 33 | 34 | test-suite test 35 | type: exitcode-stdio-1.0 36 | main-is: Tests.hs 37 | other-modules: 38 | Paths_SimpleRegex 39 | hs-source-dirs: 40 | test 41 | build-depends: 42 | SimpleRegex 43 | , base 44 | , hspec 45 | default-language: Haskell2010 46 | -------------------------------------------------------------------------------- /haskell/01_SimpleRegex/package.yaml: -------------------------------------------------------------------------------- 1 | name: SimpleRegex 2 | version: 0.1.0.0 3 | 4 | homepage: https://github.com/elfsternberg/riggedregex#readme 5 | license: BSD3 6 | license-file: LICENSE 7 | author: Elf M. Sternberg 8 | maintainer: elf.sternberg@gmail.com 9 | copyright: Copyright ⓒ 2019 Elf M. Sternberg 10 | category: Regex 11 | build-type: Simple 12 | extra-source-files: README.md 13 | 14 | dependencies: 15 | - base 16 | 17 | library: 18 | exposed-modules: SimpleRegex 19 | ghc-options: -Wall 20 | source-dirs: src 21 | 22 | tests: 23 | test: 24 | main: Tests.hs 25 | source-dirs: test 26 | dependencies: 27 | - SimpleRegex 28 | - hspec 29 | 30 | 31 | -------------------------------------------------------------------------------- /haskell/01_SimpleRegex/src/SimpleRegex.hs: -------------------------------------------------------------------------------- 1 | {-# LANGUAGE LambdaCase #-} 2 | 3 | module SimpleRegex ( accept, Reg (..) 
) where 4 | 5 | data Reg = 6 | Eps -- Epsilon 7 | | Sym Char -- Character 8 | | Alt Reg Reg -- Alternation 9 | | Seq Reg Reg -- Sequence 10 | | Rep Reg -- R* 11 | 12 | accept :: Reg -> String -> Bool 13 | -- Epsilon 14 | accept Eps u = null u 15 | -- Accept if the character offered matches the character constructed 16 | accept (Sym c) u = u == [c] 17 | -- Constructed of two other expressions, accept if either one does. 18 | accept (Alt p q) u = accept p u || accept q u 19 | -- Constructed of two other expressions, accept if p accepts some part 20 | -- of u and q accepts the rest, where u is split arbitrarily 21 | accept (Seq p q) u = or [accept p u1 && accept q u2 | (u1, u2) <- split u] 22 | -- Consider every convolution of u containing no empty strings; 23 | -- accept if, for at least one such convolution, 24 | -- r accepts every entry of that convolution. 25 | accept (Rep r) u = or [and [accept r ui | ui <- ps] | ps <- parts u] 26 | 27 | -- Generate a list of all possible combinations of a prefix and suffix 28 | -- for the string offered. 29 | split :: [a] -> [([a], [a])] 30 | split [] = [([], [])] 31 | split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs] 32 | 33 | -- Generate lists of lists that contain all possible convolutions of 34 | -- the input string, not including the empty string. 35 | parts :: [a] -> [[[a]]] 36 | parts [] = [[]] 37 | parts [c] = [[[c]]] 38 | parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs] 39 | 40 | -------------------------------------------------------------------------------- /haskell/01_SimpleRegex/stack.yaml: -------------------------------------------------------------------------------- 1 | # This file was automatically generated by 'stack init' 2 | # 3 | # Some commonly used options have been documented as comments in this file. 
4 | # For advanced use and comprehensive documentation of the format, please see: 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/ 6 | 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version. 8 | # A snapshot resolver dictates the compiler version and the set of packages 9 | # to be used for project dependencies. For example: 10 | # 11 | # resolver: lts-3.5 12 | # resolver: nightly-2015-09-21 13 | # resolver: ghc-7.10.2 14 | # 15 | # The location of a snapshot can be provided as a file or url. Stack assumes 16 | # a snapshot provided as a file might change, whereas a url resource does not. 17 | # 18 | # resolver: ./custom-snapshot.yaml 19 | # resolver: https://example.com/snapshots/2018-01-01.yaml 20 | resolver: lts-13.4 21 | 22 | # User packages to be built. 23 | # Various formats can be used as shown in the example below. 24 | # 25 | # packages: 26 | # - some-directory 27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz 28 | # - location: 29 | # git: https://github.com/commercialhaskell/stack.git 30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a 31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a 32 | # subdirs: 33 | # - auto-update 34 | # - wai 35 | packages: 36 | - . 37 | # Dependency packages to be pulled from upstream that are not in the resolver 38 | # using the same syntax as the packages field. 
39 | # (e.g., acme-missiles-0.3) 40 | # extra-deps: [] 41 | 42 | # Override default flag values for local packages and extra-deps 43 | # flags: {} 44 | 45 | # Extra package databases containing global packages 46 | # extra-package-dbs: [] 47 | 48 | # Control whether we use the GHC we find on the path 49 | # system-ghc: true 50 | # 51 | # Require a specific version of stack, using version ranges 52 | # require-stack-version: -any # Default 53 | # require-stack-version: ">=1.9" 54 | # 55 | # Override the architecture used by stack, especially useful on Windows 56 | # arch: i386 57 | # arch: x86_64 58 | # 59 | # Extra directories used by stack for building 60 | # extra-include-dirs: [/path/to/dir] 61 | # extra-lib-dirs: [/path/to/dir] 62 | # 63 | # Allow a newer minor version of GHC than the snapshot specifies 64 | # compiler-check: newer-minor 65 | -------------------------------------------------------------------------------- /haskell/01_SimpleRegex/test/Tests.hs: -------------------------------------------------------------------------------- 1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-} 2 | {-# LANGUAGE RecordWildCards #-} 3 | import Data.Foldable (for_) 4 | import Test.Hspec (Spec, it, shouldBe) 5 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith) 6 | import SimpleRegex (Reg (..), accept) 7 | 8 | main :: IO () 9 | main = hspecWith defaultConfig {configFastFail = True} specs 10 | 11 | specs :: Spec 12 | specs = do 13 | 14 | let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) ) 15 | let onec = Seq nocs (Sym 'c') 16 | let evencs = Seq ( Rep ( Seq onec onec ) ) nocs 17 | 18 | let as = Alt (Sym 'a') (Rep (Sym 'a')) 19 | let bs = Alt (Sym 'b') (Rep (Sym 'b')) 20 | 21 | it "simple expression" $ 22 | accept evencs "acc" `shouldBe` True 23 | 24 | for_ cases test 25 | where 26 | test Case {..} = it description assertion 27 | where 28 | assertion = accept regex sample `shouldBe` result 29 | 30 | 31 | data Case = Case 32 | { description :: String 33 | , regex 
:: Reg 34 | , sample :: String 35 | , result :: Bool 36 | } 37 | 38 | cases :: [Case] 39 | cases = 40 | [ Case {description = "empty", regex = Eps, sample = "", result = True} 41 | , Case {description = "char", regex = Sym 'a', sample = "a", result = True} 42 | , Case 43 | {description = "not char", regex = Sym 'a', sample = "b", result = False} 44 | , Case 45 | { description = "char vs empty" 46 | , regex = Sym 'a' 47 | , sample = "" 48 | , result = False 49 | } 50 | , Case 51 | { description = "left alt" 52 | , regex = Alt (Sym 'a') (Sym 'b') 53 | , sample = "a" 54 | , result = True 55 | } 56 | , Case 57 | { description = "right alt" 58 | , regex = Alt (Sym 'a') (Sym 'b') 59 | , sample = "b" 60 | , result = True 61 | } 62 | , Case 63 | { description = "neither alt" 64 | , regex = Alt (Sym 'a') (Sym 'b') 65 | , sample = "c" 66 | , result = False 67 | } 68 | , Case 69 | { description = "empty alt" 70 | , regex = Alt (Sym 'a') (Sym 'b') 71 | , sample = "" 72 | , result = False 73 | } 74 | , Case 75 | { description = "empty rep" 76 | , regex = Rep (Sym 'a') 77 | , sample = "" 78 | , result = True 79 | } 80 | , Case 81 | { description = "one rep" 82 | , regex = Rep (Sym 'a') 83 | , sample = "a" 84 | , result = True 85 | } 86 | , Case 87 | { description = "multiple rep" 88 | , regex = Rep (Sym 'a') 89 | , sample = "aaaaaaaaa" 90 | , result = True 91 | } 92 | , Case 93 | { description = "multiple rep with failure" 94 | , regex = Rep (Sym 'a') 95 | , sample = "aaaaaaaaab" 96 | , result = False 97 | } 98 | , Case 99 | { description = "sequence" 100 | , regex = Seq (Sym 'a') (Sym 'b') 101 | , sample = "ab" 102 | , result = True 103 | } 104 | , Case 105 | { description = "sequence with empty" 106 | , regex = Seq (Sym 'a') (Sym 'b') 107 | , sample = "" 108 | , result = False 109 | } 110 | , Case 111 | { description = "bad short sequence" 112 | , regex = Seq (Sym 'a') (Sym 'b') 113 | , sample = "a" 114 | , result = False 115 | } 116 | , Case 117 | { description = "bad long 
sequence" 118 | , regex = Seq (Sym 'a') (Sym 'b') 119 | , sample = "abc" 120 | , result = False 121 | } 122 | ] 123 | 124 | 125 | -------------------------------------------------------------------------------- /haskell/02_RiggedRegex/LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer 2 | 3 | All rights reserved. 4 | 5 | Redistribution and use in source and binary forms, with or without 6 | modification, are permitted provided that the following conditions are 7 | met: 8 | 9 | 1. Redistributions of source code must retain the above copyright 10 | notice, this list of conditions and the following disclaimer. 11 | 12 | 2. Redistributions in binary form must reproduce the above copyright 13 | notice, this list of conditions and the following disclaimer in 14 | the documentation and/or other materials provided with the 15 | distribution. 16 | 17 | 3. Neither the name of the author nor the names of his contributors 18 | may be used to endorse or promote products derived from this 19 | software without specific prior written permission. 20 | 21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 24 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR 25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, 27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR 28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
32 | 33 | -------------------------------------------------------------------------------- /haskell/02_RiggedRegex/README.md: -------------------------------------------------------------------------------- 1 | # Kleene Regular Expressions with Rigging, in Haskell 2 | 3 | This program builds on the simple regular expressions in Version 01, 4 | providing a new definition of a regular expression `Regw` that takes two 5 | types, a source type and an output type. The output type must be a 6 | [*Semiring*](https://en.wikipedia.org/wiki/Semiring). 7 | 8 | A semiring is a set R equipped with two binary operations + and ⋅, and 9 | two constants identified as 0 and 1. By providing a semiring to the 10 | regular expression, we change the return type of the regular expression 11 | to any set that can obey the semiring laws. There's a surprising amount 12 | of stuff you can do with the semiring laws. 13 | 14 | In this example, I've provided a function, `rigged`, that takes a 15 | simple regular expression from Version 01, and wraps or extracts 16 | the contents of that regular expression into the `Regw` datatype. 17 | Instead of the boolean mathematics of Version 01, we use the semiring 18 | symbols `add` and `mul` to represent the sum and product operations on 19 | the return type. We then define the "symbol accepted" boolean to return 20 | either the `zero` or `one` value of the semiring. 21 | 22 | I've provided two semirings: One of (0, 1, +, *, Integers), and one of 23 | (False, True, ||, &&, Booleans). Both work well. 24 | 25 | The `accept expression string` function of the original still works, but 26 | if you say `acceptw (rigged expression) string :: Int`, Haskell will *go 27 | find* a Semiring that allows this function to work and return the number 28 | of distinct ways the expression matches the string. If you ask for Bool as a 29 | return type, it will behave as the original. 30 | 31 | Sometimes, Haskell is bleeding magical.
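
The type-directed choice of semiring described above can be sketched in a few lines. This is a minimal, self-contained illustration rather than the module's own code: the names `Re`, `weigh`, `splits`, `chunks`, and `ambiguous` are assumptions made up for this sketch, and only the `Semiring` class shape follows the description in this README.

```haskell
-- Sketch only: Re, weigh, splits, chunks and ambiguous are hypothetical names.
class Semiring s where
  zero, one :: s
  add, mul  :: s -> s -> s

instance Semiring Bool where
  zero = False
  one  = True
  add  = (||)
  mul  = (&&)

instance Semiring Int where
  zero = 0
  one  = 1
  add  = (+)
  mul  = (*)

data Re = Eps | Sym Char | Alt Re Re | Seq Re Re | Rep Re

-- Weight of matching the whole input against the expression.
-- The caller's type annotation decides which semiring is used.
weigh :: Semiring s => Re -> String -> s
weigh Eps u       = if null u then one else zero
weigh (Sym c) u   = if u == [c] then one else zero
weigh (Alt p q) u = weigh p u `add` weigh q u
weigh (Seq p q) u =
  foldr add zero [weigh p a `mul` weigh q b | (a, b) <- splits u]
weigh (Rep r) u   =
  foldr add zero [foldr mul one [weigh r x | x <- ps] | ps <- chunks u]

-- Every way to cut a list into a prefix and a suffix.
splits :: [a] -> [([a], [a])]
splits xs = [splitAt n xs | n <- [0 .. length xs]]

-- Every way to cut a list into consecutive non-empty pieces.
chunks :: [a] -> [[[a]]]
chunks [] = [[]]
chunks xs =
  [a : rest | n <- [1 .. length xs], let (a, b) = splitAt n xs, rest <- chunks b]

-- "a" matches both branches, so there are two distinct parses.
ambiguous :: Re
ambiguous = Alt (Sym 'a') (Rep (Sym 'a'))
```

Under these assumptions, `weigh ambiguous "a" :: Int` evaluates to `2` (one parse per branch), while `weigh ambiguous "a" :: Bool` evaluates to `True`. The expression tree is identical in both calls; only the requested result type changes which `add` and `mul` are applied.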
32 | 33 | -------------------------------------------------------------------------------- /haskell/02_RiggedRegex/RiggedRegex.cabal: -------------------------------------------------------------------------------- 1 | cabal-version: 1.12 2 | 3 | -- This file has been generated from package.yaml by hpack version 0.31.1. 4 | -- 5 | -- see: https://github.com/sol/hpack 6 | -- 7 | -- hash: a27275fb9824bb59f3ba73db8613283f0ce03f9ab6d1053ec40e17977c04aa1d 8 | 9 | name: RiggedRegex 10 | version: 0.1.0.0 11 | category: Regex 12 | homepage: https://github.com/elfsternberg/riggedregex#readme 13 | author: Elf M. Sternberg 14 | maintainer: elf.sternberg@gmail.com 15 | copyright: Copyright ⓒ 2019 Elf M. Sternberg 16 | license: BSD3 17 | license-file: LICENSE 18 | build-type: Simple 19 | extra-source-files: 20 | README.md 21 | 22 | library 23 | exposed-modules: 24 | RiggedRegex 25 | other-modules: 26 | Paths_RiggedRegex 27 | hs-source-dirs: 28 | src 29 | ghc-options: -Wall 30 | build-depends: 31 | base 32 | default-language: Haskell2010 33 | 34 | test-suite test 35 | type: exitcode-stdio-1.0 36 | main-is: Tests.hs 37 | other-modules: 38 | Paths_RiggedRegex 39 | hs-source-dirs: 40 | test 41 | build-depends: 42 | RiggedRegex 43 | , base 44 | , hspec 45 | default-language: Haskell2010 46 | -------------------------------------------------------------------------------- /haskell/02_RiggedRegex/package.yaml: -------------------------------------------------------------------------------- 1 | name: RiggedRegex 2 | version: 0.1.0.0 3 | 4 | homepage: https://github.com/elfsternberg/riggedregex#readme 5 | license: BSD3 6 | license-file: LICENSE 7 | author: Elf M. Sternberg 8 | maintainer: elf.sternberg@gmail.com 9 | copyright: Copyright ⓒ 2019 Elf M. 
Sternberg 10 | category: Regex 11 | build-type: Simple 12 | extra-source-files: README.md 13 | 14 | dependencies: 15 | - base 16 | 17 | library: 18 | exposed-modules: RiggedRegex 19 | ghc-options: -Wall 20 | source-dirs: src 21 | 22 | tests: 23 | test: 24 | main: Tests.hs 25 | source-dirs: test 26 | dependencies: 27 | - RiggedRegex 28 | - hspec 29 | 30 | 31 | -------------------------------------------------------------------------------- /haskell/02_RiggedRegex/src/RiggedRegex.hs: -------------------------------------------------------------------------------- 1 | {-# LANGUAGE LambdaCase #-} 2 | 3 | module RiggedRegex ( accept, acceptw, Reg (..), Regw (..), rigged ) where 4 | 5 | data Reg = 6 | Eps -- Epsilon 7 | | Sym Char -- Character 8 | | Alt Reg Reg -- Alternation 9 | | Seq Reg Reg -- Sequence 10 | | Rep Reg -- R* 11 | 12 | accept :: Reg -> String -> Bool 13 | -- Epsilon 14 | accept Eps u = null u 15 | -- Accept if the character offered matches the character constructed 16 | accept (Sym c) u = u == [c] 17 | -- Constructed of two other expressions, accept if either one does. 18 | accept (Alt p q) u = accept p u || accept q u 19 | -- Constructed of two other expressions, accept if p accepts some part 20 | -- of u and q accepts the rest, where u is split arbitrarily 21 | accept (Seq p q) u = or [accept p u1 && accept q u2 | (u1, u2) <- split u] 22 | -- Consider every partition of u into non-empty pieces; 23 | -- accept if, for at least one partition, every piece 24 | -- is accepted by r.
25 | accept (Rep r) u = or [and [accept r ui | ui <- ps] | ps <- parts u] 26 | 27 | -- Generate a list of all possible combinations of a prefix and suffix 28 | -- for the string offered. 29 | split :: [a] -> [([a], [a])] 30 | split [] = [([], [])] 31 | split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs] 32 | 33 | -- Generate lists of lists that contain all possible partitions of 34 | -- the input string into non-empty pieces. 35 | parts :: [a] -> [[[a]]] 36 | parts [] = [[]] 37 | parts [c] = [[[c]]] 38 | parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs] 39 | 40 | -- A semiring is an algebraic structure with a zero, a one, a 41 | -- "multiplication" operation, and an "addition" operation. Zero is 42 | -- the identity element for addition, One is the identity element for 43 | -- multiplication, both operations are associative (it does 44 | -- not matter how sequential operations are grouped), addition is 45 | -- commutative (the order of the operands does not matter), and 46 | -- multiplication distributes over addition. Also, zero `mul` anything is always zero. 47 | -- 48 | -- For regular expressions, this means that the null regex 49 | -- is zero, the empty string regex is one, alternation is addition, 50 | -- and sequence is multiplication, much like "sum" and "product" types. 51 | 52 | class Semiring s where 53 | zero, one :: s 54 | mul, add :: s -> s -> s 55 | 56 | -- Symw (c -> s) represents a mapping from a symbol to its given weight.
57 | 58 | sym :: Semiring s => Char -> Regw Char s 59 | sym c = Symw (\b -> if b == c then one else zero) 60 | 61 | data Regw c s = 62 | Epsw -- Epsilon 63 | | Symw (c -> s) -- Character 64 | | Altw (Regw c s) (Regw c s) -- Alternation 65 | | Seqw (Regw c s) (Regw c s) -- Sequence 66 | | Repw (Regw c s) -- R* 67 | 68 | rigged :: Semiring s => Reg -> Regw Char s 69 | rigged = \case 70 | Eps -> Epsw 71 | (Sym c) -> sym c 72 | (Alt p q) -> Altw (rigged p) (rigged q) 73 | (Seq p q) -> Seqw (rigged p) (rigged q) 74 | (Rep r) -> Repw (rigged r) 75 | 76 | acceptw :: Semiring s => Regw c s -> [c] -> s 77 | acceptw Epsw u = if null u then one else zero 78 | acceptw (Symw f) u = 79 | case u of 80 | [c] -> f c 81 | _ -> zero 82 | acceptw (Altw p q) u = acceptw p u `add` acceptw q u 83 | acceptw (Seqw p q) u = sumr [ acceptw p u1 `mul` acceptw q u2 | (u1, u2) <- split u ] 84 | acceptw (Repw r) u = sumr [ prodr [ acceptw r ui | ui <- ps ] | ps <- parts u ] 85 | 86 | sumr, prodr :: Semiring r => [r] -> r 87 | sumr = foldr add zero 88 | prodr = foldr mul one 89 | 90 | instance Semiring Bool where 91 | zero = False 92 | one = True 93 | add = (||) 94 | mul = (&&) 95 | 96 | instance Semiring Int where 97 | zero = 0 98 | one = 1 99 | add = (+) 100 | mul = (*) 101 | 102 | -------------------------------------------------------------------------------- /haskell/02_RiggedRegex/stack.yaml: -------------------------------------------------------------------------------- 1 | # This file was automatically generated by 'stack init' 2 | # 3 | # Some commonly used options have been documented as comments in this file. 4 | # For advanced use and comprehensive documentation of the format, please see: 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/ 6 | 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version. 8 | # A snapshot resolver dictates the compiler version and the set of packages 9 | # to be used for project dependencies. 
For example: 10 | # 11 | # resolver: lts-3.5 12 | # resolver: nightly-2015-09-21 13 | # resolver: ghc-7.10.2 14 | # 15 | # The location of a snapshot can be provided as a file or url. Stack assumes 16 | # a snapshot provided as a file might change, whereas a url resource does not. 17 | # 18 | # resolver: ./custom-snapshot.yaml 19 | # resolver: https://example.com/snapshots/2018-01-01.yaml 20 | resolver: lts-13.4 21 | 22 | # User packages to be built. 23 | # Various formats can be used as shown in the example below. 24 | # 25 | # packages: 26 | # - some-directory 27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz 28 | # - location: 29 | # git: https://github.com/commercialhaskell/stack.git 30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a 31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a 32 | # subdirs: 33 | # - auto-update 34 | # - wai 35 | packages: 36 | - . 37 | # Dependency packages to be pulled from upstream that are not in the resolver 38 | # using the same syntax as the packages field. 
39 | # (e.g., acme-missiles-0.3) 40 | # extra-deps: [] 41 | 42 | # Override default flag values for local packages and extra-deps 43 | # flags: {} 44 | 45 | # Extra package databases containing global packages 46 | # extra-package-dbs: [] 47 | 48 | # Control whether we use the GHC we find on the path 49 | # system-ghc: true 50 | # 51 | # Require a specific version of stack, using version ranges 52 | # require-stack-version: -any # Default 53 | # require-stack-version: ">=1.9" 54 | # 55 | # Override the architecture used by stack, especially useful on Windows 56 | # arch: i386 57 | # arch: x86_64 58 | # 59 | # Extra directories used by stack for building 60 | # extra-include-dirs: [/path/to/dir] 61 | # extra-lib-dirs: [/path/to/dir] 62 | # 63 | # Allow a newer minor version of GHC than the snapshot specifies 64 | # compiler-check: newer-minor 65 | -------------------------------------------------------------------------------- /haskell/02_RiggedRegex/test/Tests.hs: -------------------------------------------------------------------------------- 1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-} 2 | {-# LANGUAGE RecordWildCards #-} 3 | import Data.Foldable (for_) 4 | import Test.Hspec (Spec, it, shouldBe) 5 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith) 6 | import RiggedRegex (Reg (..), accept, acceptw, rigged) 7 | 8 | main :: IO () 9 | main = hspecWith defaultConfig {configFastFail = True} specs 10 | 11 | specs :: Spec 12 | specs = do 13 | 14 | let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) ) 15 | let onec = Seq nocs (Sym 'c') 16 | let evencs = Seq ( Rep ( Seq onec onec ) ) nocs 17 | 18 | let as = Alt (Sym 'a') (Rep (Sym 'a')) 19 | let bs = Alt (Sym 'b') (Rep (Sym 'b')) 20 | 21 | it "simple expression" $ 22 | accept evencs "acc" `shouldBe` True 23 | 24 | it "lifted expression" $ 25 | (acceptw (rigged evencs) "acc" :: Bool) `shouldBe` True 26 | 27 | it "lifted expression short" $ 28 | (acceptw (rigged evencs) "acc" :: Int) `shouldBe` 1 29 | 30 | it 
"lifted expression counter two" $ 31 | (acceptw (rigged as) "a" :: Int) `shouldBe` 2 32 | 33 | it "lifted expression counter one" $ 34 | (acceptw (rigged as) "aa" :: Int) `shouldBe` 1 35 | 36 | it "lifted expression dynamic counter four" $ 37 | (acceptw (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4 38 | 39 | for_ cases test 40 | where 41 | test Case {..} = it description assertion 42 | where 43 | assertion = (acceptw (rigged regex) sample :: Bool) `shouldBe` result 44 | 45 | data Case = Case 46 | { description :: String 47 | , regex :: Reg 48 | , sample :: String 49 | , result :: Bool 50 | } 51 | 52 | cases :: [Case] 53 | cases = 54 | [ Case {description = "empty", regex = Eps, sample = "", result = True} 55 | , Case {description = "char", regex = Sym 'a', sample = "a", result = True} 56 | , Case 57 | {description = "not char", regex = Sym 'a', sample = "b", result = False} 58 | , Case 59 | { description = "char vs empty" 60 | , regex = Sym 'a' 61 | , sample = "" 62 | , result = False 63 | } 64 | , Case 65 | { description = "left alt" 66 | , regex = Alt (Sym 'a') (Sym 'b') 67 | , sample = "a" 68 | , result = True 69 | } 70 | , Case 71 | { description = "right alt" 72 | , regex = Alt (Sym 'a') (Sym 'b') 73 | , sample = "b" 74 | , result = True 75 | } 76 | , Case 77 | { description = "neither alt" 78 | , regex = Alt (Sym 'a') (Sym 'b') 79 | , sample = "c" 80 | , result = False 81 | } 82 | , Case 83 | { description = "empty alt" 84 | , regex = Alt (Sym 'a') (Sym 'b') 85 | , sample = "" 86 | , result = False 87 | } 88 | , Case 89 | { description = "empty rep" 90 | , regex = Rep (Sym 'a') 91 | , sample = "" 92 | , result = True 93 | } 94 | , Case 95 | { description = "one rep" 96 | , regex = Rep (Sym 'a') 97 | , sample = "a" 98 | , result = True 99 | } 100 | , Case 101 | { description = "multiple rep" 102 | , regex = Rep (Sym 'a') 103 | , sample = "aaaaaaaaa" 104 | , result = True 105 | } 106 | , Case 107 | { description = "multiple rep with failure" 108 | , regex = 
Rep (Sym 'a') 109 | , sample = "aaaaaaaaab" 110 | , result = False 111 | } 112 | , Case 113 | { description = "sequence" 114 | , regex = Seq (Sym 'a') (Sym 'b') 115 | , sample = "ab" 116 | , result = True 117 | } 118 | , Case 119 | { description = "sequence with empty" 120 | , regex = Seq (Sym 'a') (Sym 'b') 121 | , sample = "" 122 | , result = False 123 | } 124 | , Case 125 | { description = "bad short sequence" 126 | , regex = Seq (Sym 'a') (Sym 'b') 127 | , sample = "a" 128 | , result = False 129 | } 130 | , Case 131 | { description = "bad long sequence" 132 | , regex = Seq (Sym 'a') (Sym 'b') 133 | , sample = "abc" 134 | , result = False 135 | } 136 | ] 137 | 138 | 139 | -------------------------------------------------------------------------------- /haskell/03_Brzozowski/BrzExp.cabal: -------------------------------------------------------------------------------- 1 | cabal-version: 1.12 2 | 3 | -- This file has been generated from package.yaml by hpack version 0.31.1. 4 | -- 5 | -- see: https://github.com/sol/hpack 6 | -- 7 | -- hash: bf664c2df8f31bc5a6ffd907ebaf10824d56d8d271f07d3f4cc7562ce9d44395 8 | 9 | name: BrzExp 10 | version: 0.1.0.0 11 | category: Regex 12 | homepage: https://github.com/elfsternberg/riggedregex#readme 13 | author: Elf M. Sternberg 14 | maintainer: elf.sternberg@gmail.com 15 | copyright: Copyright ⓒ 2019 Elf M. 
Sternberg 16 | license: BSD3 17 | license-file: LICENSE 18 | build-type: Simple 19 | extra-source-files: 20 | README.md 21 | 22 | library 23 | exposed-modules: 24 | BrzExp 25 | other-modules: 26 | Paths_BrzExp 27 | hs-source-dirs: 28 | src 29 | ghc-options: -Wall 30 | build-depends: 31 | base 32 | default-language: Haskell2010 33 | 34 | test-suite test 35 | type: exitcode-stdio-1.0 36 | main-is: Tests.hs 37 | other-modules: 38 | Paths_BrzExp 39 | hs-source-dirs: 40 | test 41 | build-depends: 42 | BrzExp 43 | , base 44 | , hspec 45 | default-language: Haskell2010 46 | -------------------------------------------------------------------------------- /haskell/03_Brzozowski/LICENSE: -------------------------------------------------------------------------------- 1 | The Brzozowski experiments are original work, and are copyright and 2 | licensed by Elf M. Sternberg according to the Mozilla 2.0 Public 3 | License. See the LICENSE.md file in the main directory 4 | -------------------------------------------------------------------------------- /haskell/03_Brzozowski/README.md: -------------------------------------------------------------------------------- 1 | # Brzozowski Regular Expressions, in Haskell 2 | 3 | This is a regex recognizer implementing Brzozowski's Algorithm, in 4 | Haskell. Brzozowski's Algorithm has been a bit of a fascination for me, 5 | because it generally made much more sense to me than the traditional 6 | algorithm, especially since the Pumping Lemma is much more intelligible 7 | under Brzozowski than it is with more common forms of automata analysis.
8 | 9 | Brzozowski's algorithm basically says that a regular expression is a 10 | function that, given a string and a regular expression, returns three 11 | things: the remainder of the input after the leading character has been 12 | consumed, a new function that represents the rest of the regular 13 | expression after that leading character has been analyzed, and the 14 | status of the analysis thus far. 15 | 16 | Brzozowski called this "the derivative of the regular expression." 17 | 18 | The only trick to dealing with Brzozowski's Algorithm is with respect to 19 | nullability: it is important to know if a regular expression _may be 20 | nullable_ (that is, it may accept the empty string). A separate 21 | function describes the nullability of the different kinds of expressions 22 | in our system. 23 | 24 | 25 | 26 | 27 | 28 | -------------------------------------------------------------------------------- /haskell/03_Brzozowski/Setup.hs: -------------------------------------------------------------------------------- 1 | import Distribution.Simple 2 | main = defaultMain 3 | -------------------------------------------------------------------------------- /haskell/03_Brzozowski/package.yaml: -------------------------------------------------------------------------------- 1 | name: BrzExp 2 | version: 0.1.0.0 3 | 4 | homepage: https://github.com/elfsternberg/riggedregex#readme 5 | license: BSD3 6 | license-file: LICENSE 7 | author: Elf M. Sternberg 8 | maintainer: elf.sternberg@gmail.com 9 | copyright: Copyright ⓒ 2019 Elf M.
Sternberg 10 | category: Regex 11 | build-type: Simple 12 | extra-source-files: README.md 13 | 14 | dependencies: 15 | - base 16 | 17 | library: 18 | exposed-modules: BrzExp 19 | ghc-options: -Wall 20 | source-dirs: src 21 | 22 | tests: 23 | test: 24 | main: Tests.hs 25 | source-dirs: test 26 | dependencies: 27 | - BrzExp 28 | - hspec 29 | 30 | -------------------------------------------------------------------------------- /haskell/03_Brzozowski/src/BrzExp.hs: -------------------------------------------------------------------------------- 1 | module BrzExp ( accept, nullable, Brz (..) ) where 2 | data Brz = Emp | Eps | Sym Char | Alt Brz Brz | Seq Brz Brz | Rep Brz 3 | 4 | derive :: Brz -> Char -> Brz 5 | derive Emp _ = Emp 6 | derive Eps _ = Emp 7 | derive (Sym c) u = if c == u then Eps else Emp 8 | derive (Seq l r) u 9 | | nullable l = Alt (Seq (derive l u) r) (derive r u) 10 | | otherwise = Seq (derive l u) r 11 | 12 | derive (Alt Emp r) u = derive r u 13 | derive (Alt l Emp) u = derive l u 14 | derive (Alt l r) u = Alt (derive r u) (derive l u) 15 | 16 | derive (Rep r) u = Seq (derive r u) (Rep r) 17 | 18 | nullable :: Brz -> Bool 19 | nullable Emp = False 20 | nullable Eps = True 21 | nullable (Sym _) = False 22 | nullable (Alt l r) = nullable l || nullable r 23 | nullable (Seq l r) = nullable l && nullable r 24 | nullable (Rep _) = True 25 | 26 | accept :: Brz -> String -> Bool 27 | accept r [] = nullable r 28 | accept r (s:ss) = accept (derive r s) ss 29 | 30 | 31 | -------------------------------------------------------------------------------- /haskell/03_Brzozowski/stack.yaml: -------------------------------------------------------------------------------- 1 | # This file was automatically generated by 'stack init' 2 | # 3 | # Some commonly used options have been documented as comments in this file. 
4 | # For advanced use and comprehensive documentation of the format, please see: 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/ 6 | 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version. 8 | # A snapshot resolver dictates the compiler version and the set of packages 9 | # to be used for project dependencies. For example: 10 | # 11 | # resolver: lts-3.5 12 | # resolver: nightly-2015-09-21 13 | # resolver: ghc-7.10.2 14 | # 15 | # The location of a snapshot can be provided as a file or url. Stack assumes 16 | # a snapshot provided as a file might change, whereas a url resource does not. 17 | # 18 | # resolver: ./custom-snapshot.yaml 19 | # resolver: https://example.com/snapshots/2018-01-01.yaml 20 | resolver: lts-13.4 21 | 22 | # User packages to be built. 23 | # Various formats can be used as shown in the example below. 24 | # 25 | # packages: 26 | # - some-directory 27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz 28 | # - location: 29 | # git: https://github.com/commercialhaskell/stack.git 30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a 31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a 32 | # subdirs: 33 | # - auto-update 34 | # - wai 35 | packages: 36 | - . 37 | # Dependency packages to be pulled from upstream that are not in the resolver 38 | # using the same syntax as the packages field. 
39 | # (e.g., acme-missiles-0.3) 40 | # extra-deps: [] 41 | 42 | # Override default flag values for local packages and extra-deps 43 | # flags: {} 44 | 45 | # Extra package databases containing global packages 46 | # extra-package-dbs: [] 47 | 48 | # Control whether we use the GHC we find on the path 49 | # system-ghc: true 50 | # 51 | # Require a specific version of stack, using version ranges 52 | # require-stack-version: -any # Default 53 | # require-stack-version: ">=1.9" 54 | # 55 | # Override the architecture used by stack, especially useful on Windows 56 | # arch: i386 57 | # arch: x86_64 58 | # 59 | # Extra directories used by stack for building 60 | # extra-include-dirs: [/path/to/dir] 61 | # extra-lib-dirs: [/path/to/dir] 62 | # 63 | # Allow a newer minor version of GHC than the snapshot specifies 64 | # compiler-check: newer-minor 65 | -------------------------------------------------------------------------------- /haskell/03_Brzozowski/test/Tests.hs: -------------------------------------------------------------------------------- 1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-} 2 | {-# LANGUAGE RecordWildCards #-} 3 | 4 | import Data.Foldable (for_) 5 | import Test.Hspec (Spec, describe, it, shouldBe) 6 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith) 7 | 8 | import BrzExp (Brz (..), accept) 9 | 10 | main :: IO () 11 | main = hspecWith defaultConfig {configFastFail = True} specs 12 | 13 | specs :: Spec 14 | specs = describe "accept" $ for_ cases test 15 | where 16 | test Case {..} = it description assertion 17 | where 18 | assertion = accept regex sample `shouldBe` result 19 | 20 | data Case = Case 21 | { description :: String 22 | , regex :: Brz 23 | , sample :: String 24 | , result :: Bool 25 | } 26 | 27 | -- let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) ) 28 | -- onec = Seq nocs (Sym 'c') 29 | -- evencs = Seq ( Rep ( Seq onec onec ) ) nocs 30 | -- as = Alt (Sym 'a') (Rep (Sym 'a')) 31 | -- bs = Alt (Sym 'b') (Rep (Sym 'b')) 32 | 
cases :: [Case] 33 | cases = 34 | [ Case {description = "empty", regex = Eps, sample = "", result = True} 35 | , Case {description = "null", regex = Emp, sample = "", result = False} 36 | , Case {description = "char", regex = Sym 'a', sample = "a", result = True} 37 | , Case 38 | {description = "not char", regex = Sym 'a', sample = "b", result = False} 39 | , Case 40 | { description = "char vs empty" 41 | , regex = Sym 'a' 42 | , sample = "" 43 | , result = False 44 | } 45 | , Case 46 | { description = "left alt" 47 | , regex = Alt (Sym 'a') (Sym 'b') 48 | , sample = "a" 49 | , result = True 50 | } 51 | , Case 52 | { description = "right alt" 53 | , regex = Alt (Sym 'a') (Sym 'b') 54 | , sample = "b" 55 | , result = True 56 | } 57 | , Case 58 | { description = "neither alt" 59 | , regex = Alt (Sym 'a') (Sym 'b') 60 | , sample = "c" 61 | , result = False 62 | } 63 | , Case 64 | { description = "empty alt" 65 | , regex = Alt (Sym 'a') (Sym 'b') 66 | , sample = "" 67 | , result = False 68 | } 69 | , Case 70 | { description = "empty rep" 71 | , regex = Rep (Sym 'a') 72 | , sample = "" 73 | , result = True 74 | } 75 | , Case 76 | { description = "one rep" 77 | , regex = Rep (Sym 'a') 78 | , sample = "a" 79 | , result = True 80 | } 81 | , Case 82 | { description = "multiple rep" 83 | , regex = Rep (Sym 'a') 84 | , sample = "aaaaaaaaa" 85 | , result = True 86 | } 87 | , Case 88 | { description = "multiple rep with failure" 89 | , regex = Rep (Sym 'a') 90 | , sample = "aaaaaaaaab" 91 | , result = False 92 | } 93 | , Case 94 | { description = "sequence" 95 | , regex = Seq (Sym 'a') (Sym 'b') 96 | , sample = "ab" 97 | , result = True 98 | } 99 | , Case 100 | { description = "sequence with empty" 101 | , regex = Seq (Sym 'a') (Sym 'b') 102 | , sample = "" 103 | , result = False 104 | } 105 | , Case 106 | { description = "bad short sequence" 107 | , regex = Seq (Sym 'a') (Sym 'b') 108 | , sample = "a" 109 | , result = False 110 | } 111 | , Case 112 | { description = "bad long 
sequence" 113 | , regex = Seq (Sym 'a') (Sym 'b') 114 | , sample = "abc" 115 | , result = False 116 | } 117 | ] 118 | -------------------------------------------------------------------------------- /haskell/04_Gluskov/Glushkov.cabal: -------------------------------------------------------------------------------- 1 | cabal-version: 1.12 2 | 3 | -- This file has been generated from package.yaml by hpack version 0.31.1. 4 | -- 5 | -- see: https://github.com/sol/hpack 6 | -- 7 | -- hash: 1a234ba3e4b3372f4e6f179bb337b813ee69faffc8b001f781636f1ba3d185e4 8 | 9 | name: Glushkov 10 | version: 0.1.0.0 11 | category: Regex 12 | homepage: https://github.com/elfsternberg/riggedregex#readme 13 | author: Elf M. Sternberg 14 | maintainer: elf.sternberg@gmail.com 15 | copyright: Copyright ⓒ 2019 Elf M. Sternberg 16 | license: BSD3 17 | license-file: LICENSE 18 | build-type: Simple 19 | extra-source-files: 20 | README.md 21 | 22 | library 23 | exposed-modules: 24 | Glushkov 25 | other-modules: 26 | Paths_Glushkov 27 | hs-source-dirs: 28 | src 29 | ghc-options: -Wall 30 | build-depends: 31 | base 32 | default-language: Haskell2010 33 | 34 | test-suite test 35 | type: exitcode-stdio-1.0 36 | main-is: Tests.hs 37 | other-modules: 38 | Paths_Glushkov 39 | hs-source-dirs: 40 | test 41 | build-depends: 42 | Glushkov 43 | , base 44 | , hspec 45 | default-language: Haskell2010 46 | -------------------------------------------------------------------------------- /haskell/04_Gluskov/LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer 2 | 3 | All rights reserved. 4 | 5 | Redistribution and use in source and binary forms, with or without 6 | modification, are permitted provided that the following conditions are 7 | met: 8 | 9 | 1. Redistributions of source code must retain the above copyright 10 | notice, this list of conditions and the following disclaimer. 11 | 12 | 2. 
Redistributions in binary form must reproduce the above copyright 13 | notice, this list of conditions and the following disclaimer in 14 | the documentation and/or other materials provided with the 15 | distribution. 16 | 17 | 3. Neither the name of the author nor the names of his contributors 18 | may be used to endorse or promote products derived from this 19 | software without specific prior written permission. 20 | 21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 24 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR 25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, 27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR 28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 32 | 33 | -------------------------------------------------------------------------------- /haskell/04_Gluskov/README.md: -------------------------------------------------------------------------------- 1 | # Glushkov Regular Expressions, in Haskell 2 | 3 | This is Glushkov's construction of regular expressions. The basic 4 | idea is that for every symbol encountered during parsing, a 5 | corresponding symbol in the tree is marked (or, if no symbols are 6 | marked, the parse is a failure). Composites are followed to their 7 | ends for each character, and if the symbol matches it is "marked".
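
To make the marking idea concrete, here is a minimal sketch that reuses the `Glu` type and the `shift`, `empty` and `final` functions as they appear in `src/Glushkov.hs` in this directory. The helpers `marks` and `markTrace` are hypothetical additions for illustration only; they are not part of the module.

```haskell
-- Sketch only: Glu, shift, empty and final mirror src/Glushkov.hs.
-- `marks` and `markTrace` are hypothetical helpers, not part of the module.

data Glu = Eps
         | Sym Bool Char
         | Alt Glu Glu
         | Seq Glu Glu
         | Rep Glu

-- Move the marks one character forward; `m` says whether a mark may
-- enter this subexpression from the left.
shift :: Bool -> Glu -> Char -> Glu
shift _ Eps _       = Eps
shift m (Sym _ x) c = Sym (m && x == c) x
shift m (Alt p q) c = Alt (shift m p c) (shift m q c)
shift m (Seq p q) c = Seq (shift m p c) (shift (m && empty p || final p) q c)
shift m (Rep r) c   = Rep (shift (m || final r) r c)

-- Does the expression accept the empty string?
empty :: Glu -> Bool
empty Eps       = True
empty (Sym _ _) = False
empty (Alt p q) = empty p || empty q
empty (Seq p q) = empty p && empty q
empty (Rep _)   = True

-- Is there a marked symbol after which the match may end?
final :: Glu -> Bool
final Eps       = False
final (Sym b _) = b
final (Alt p q) = final p || final q
final (Seq p q) = final p && empty q || final q
final (Rep r)   = final r

-- Hypothetical helper: the currently marked symbols, left to right.
marks :: Glu -> String
marks Eps       = ""
marks (Sym b x) = if b then [x] else []
marks (Alt p q) = marks p ++ marks q
marks (Seq p q) = marks p ++ marks q
marks (Rep r)   = marks r

-- Hypothetical helper: the marked symbols after each input character.
markTrace :: Glu -> String -> [String]
markTrace _ []     = []
markTrace r (c:cs) = map marks (scanl (shift False) (shift True r c) cs)
```

With `ab = Seq (Sym False 'a') (Sym False 'b')`, `markTrace ab "ab"` evaluates to `["a","b"]`: the mark sits on `'a'` after the first character and moves to `'b'` after the second.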
8 | 9 | In this instance, we are passing a Glushkov regular expression tree to `shift`, 10 | and for each character it returns a new, complete copy of the tree, 11 | only with the marks "shifted" to where they should be given the 12 | character. In this way, each iteration of the tree keeps the NFA 13 | list of states that are active; they are the paths that lead to 14 | marked symbols. 15 | 16 | 'final' here means that no more symbols have to be read to match 17 | the expression. 'empty' here means that the expression can match 18 | the empty string. 19 | 20 | 'final' is used here to determine whether the Glushkov expression 21 | passed in contains a marked symbol in a final position. This is 22 | used both to determine the end state of the expression, and in 23 | sequences to determine if the rightmost expression must be evaluated, 24 | that is, if we're currently going down a 'marked' path and the left 25 | expression can handle the empty string OR the left expression is 26 | final. 27 | 28 | The accept method is just a fold over the input string. The initial 29 | value is the shift of the first character, with the assumed mark of 30 | 'True' being included because we can always parse infinitely many 31 | empty strings before the sample begins. The returned value of that 32 | shift is our new regular expression, on which we then progressively 33 | call `shift False accg c`; here False means that we're only going to 34 | shift marks we've already found. 35 | 36 | The "trick" to understanding this is to consider the string "ab" for 37 | the sequence "ab". The first time through, we start with True, and 38 | what gets marked is the symbol 'a'. 39 | 40 | When we pass the letter 'b', what happens?
Well, the 'a' symbol 41 | will be unmarked (it didn't match the character), but the second 42 | part of the shift expression says that the left expression is final 43 | (it's a symbol and it's marked!), so we call `shift True (Sym 'b') 44 | 'b'`, and the mark moves to the correct destination. 45 | 46 | It continues to blow my mind that so much of mathematics can be directly 47 | translated into Haskell with no loss of fidelity. 48 | -------------------------------------------------------------------------------- /haskell/04_Gluskov/package.yaml: -------------------------------------------------------------------------------- 1 | name: Glushkov 2 | version: 0.1.0.0 3 | 4 | homepage: https://github.com/elfsternberg/riggedregex#readme 5 | license: BSD3 6 | license-file: LICENSE 7 | author: Elf M. Sternberg 8 | maintainer: elf.sternberg@gmail.com 9 | copyright: Copyright ⓒ 2019 Elf M. Sternberg 10 | category: Regex 11 | build-type: Simple 12 | extra-source-files: README.md 13 | 14 | dependencies: 15 | - base 16 | 17 | library: 18 | exposed-modules: Glushkov 19 | ghc-options: -Wall 20 | source-dirs: src 21 | 22 | tests: 23 | test: 24 | main: Tests.hs 25 | source-dirs: test 26 | dependencies: 27 | - Glushkov 28 | - hspec 29 | 30 | -------------------------------------------------------------------------------- /haskell/04_Gluskov/src/Glushkov.hs: -------------------------------------------------------------------------------- 1 | module Glushkov (Glu (..), accept) where 2 | 3 | data Glu = Eps 4 | | Sym Bool Char 5 | | Alt Glu Glu 6 | | Seq Glu Glu 7 | | Rep Glu 8 | 9 | shift :: Bool -> Glu -> Char -> Glu 10 | shift _ Eps _ = Eps 11 | shift m (Sym _ x) c = Sym (m && x == c) x 12 | shift m (Alt p q) c = Alt (shift m p c) (shift m q c) 13 | shift m (Seq p q) c = Seq (shift m p c) (shift (m && empty p || final p) q c) 14 | shift m (Rep r) c = Rep (shift (m || final r) r c) 15 | 16 | empty :: Glu -> Bool 17 | empty Eps = True 18 | empty (Sym _ _) = False 19 | empty (Alt p q) 
= empty p || empty q 20 | empty (Seq p q) = empty p && empty q 21 | empty (Rep _) = True 22 | 23 | final :: Glu -> Bool 24 | final Eps = False 25 | final (Sym b _) = b 26 | final (Alt p q) = final p || final q 27 | final (Seq p q) = final p && empty q || final q 28 | final (Rep r) = final r 29 | 30 | accept :: Glu -> String -> Bool 31 | accept r [] = empty r 32 | accept r (c:cs) = final (foldl (shift False) (shift True r c) cs) 33 | -------------------------------------------------------------------------------- /haskell/04_Gluskov/stack.yaml: -------------------------------------------------------------------------------- 1 | # This file was automatically generated by 'stack init' 2 | # 3 | # Some commonly used options have been documented as comments in this file. 4 | # For advanced use and comprehensive documentation of the format, please see: 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/ 6 | 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version. 8 | # A snapshot resolver dictates the compiler version and the set of packages 9 | # to be used for project dependencies. For example: 10 | # 11 | # resolver: lts-3.5 12 | # resolver: nightly-2015-09-21 13 | # resolver: ghc-7.10.2 14 | # 15 | # The location of a snapshot can be provided as a file or url. Stack assumes 16 | # a snapshot provided as a file might change, whereas a url resource does not. 17 | # 18 | # resolver: ./custom-snapshot.yaml 19 | # resolver: https://example.com/snapshots/2018-01-01.yaml 20 | resolver: lts-13.5 21 | 22 | # User packages to be built. 23 | # Various formats can be used as shown in the example below. 
24 | # 25 | # packages: 26 | # - some-directory 27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz 28 | # - location: 29 | # git: https://github.com/commercialhaskell/stack.git 30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a 31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a 32 | # subdirs: 33 | # - auto-update 34 | # - wai 35 | packages: 36 | - . 37 | # Dependency packages to be pulled from upstream that are not in the resolver 38 | # using the same syntax as the packages field. 39 | # (e.g., acme-missiles-0.3) 40 | # extra-deps: [] 41 | 42 | # Override default flag values for local packages and extra-deps 43 | # flags: {} 44 | 45 | # Extra package databases containing global packages 46 | # extra-package-dbs: [] 47 | 48 | # Control whether we use the GHC we find on the path 49 | # system-ghc: true 50 | # 51 | # Require a specific version of stack, using version ranges 52 | # require-stack-version: -any # Default 53 | # require-stack-version: ">=1.9" 54 | # 55 | # Override the architecture used by stack, especially useful on Windows 56 | # arch: i386 57 | # arch: x86_64 58 | # 59 | # Extra directories used by stack for building 60 | # extra-include-dirs: [/path/to/dir] 61 | # extra-lib-dirs: [/path/to/dir] 62 | # 63 | # Allow a newer minor version of GHC than the snapshot specifies 64 | # compiler-check: newer-minor 65 | -------------------------------------------------------------------------------- /haskell/04_Gluskov/test/Tests.hs: -------------------------------------------------------------------------------- 1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-} 2 | {-# LANGUAGE RecordWildCards #-} 3 | 4 | import Data.Foldable (for_) 5 | import Test.Hspec (Spec, describe, it, shouldBe) 6 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith) 7 | 8 | import Glushkov (Glu (..), accept) 9 | 10 | main :: IO () 11 | main = hspecWith defaultConfig {configFastFail = True} specs 12 | 13 | 
specs :: Spec 14 | specs = describe "accept" $ for_ cases test 15 | where 16 | test Case {..} = it description assertion 17 | where 18 | assertion = accept regex sample `shouldBe` result 19 | 20 | data Case = Case 21 | { description :: String 22 | , regex :: Glu 23 | , sample :: String 24 | , result :: Bool 25 | } 26 | 27 | -- let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) ) 28 | -- onec = Seq nocs (Sym 'c') 29 | -- evencs = Seq ( Rep ( Seq onec onec ) ) nocs 30 | -- as = Alt (Sym 'a') (Rep (Sym 'a')) 31 | -- bs = Alt (Sym 'b') (Rep (Sym 'b')) 32 | 33 | cases :: [Case] 34 | cases = 35 | [ Case {description = "empty", regex = Eps, sample = "", result = True} 36 | , Case {description = "char", regex = Sym False 'a', sample = "a", result = True} 37 | , Case 38 | {description = "not char", regex = Sym False 'a', sample = "b", result = False} 39 | , Case 40 | { description = "char vs empty" 41 | , regex = Sym False 'a' 42 | , sample = "" 43 | , result = False 44 | } 45 | , Case 46 | { description = "left alt" 47 | , regex = Alt (Sym False 'a') (Sym False 'b') 48 | , sample = "a" 49 | , result = True 50 | } 51 | , Case 52 | { description = "right alt" 53 | , regex = Alt (Sym False 'a') (Sym False 'b') 54 | , sample = "b" 55 | , result = True 56 | } 57 | , Case 58 | { description = "neither alt" 59 | , regex = Alt (Sym False 'a') (Sym False 'b') 60 | , sample = "c" 61 | , result = False 62 | } 63 | , Case 64 | { description = "empty alt" 65 | , regex = Alt (Sym False 'a') (Sym False 'b') 66 | , sample = "" 67 | , result = False 68 | } 69 | , Case 70 | { description = "empty rep" 71 | , regex = Rep (Sym False 'a') 72 | , sample = "" 73 | , result = True 74 | } 75 | , Case 76 | { description = "one rep" 77 | , regex = Rep (Sym False 'a') 78 | , sample = "a" 79 | , result = True 80 | } 81 | , Case 82 | { description = "multiple rep" 83 | , regex = Rep (Sym False 'a') 84 | , sample = "aaaaaaaaa" 85 | , result = True 86 | } 87 | , Case 88 | { description = "multiple rep with 
failure" 89 | , regex = Rep (Sym False 'a') 90 | , sample = "aaaaaaaaab" 91 | , result = False 92 | } 93 | , Case 94 | { description = "sequence" 95 | , regex = Seq (Sym False 'a') (Sym False 'b') 96 | , sample = "ab" 97 | , result = True 98 | } 99 | , Case 100 | { description = "sequence with empty" 101 | , regex = Seq (Sym False 'a') (Sym False 'b') 102 | , sample = "" 103 | , result = False 104 | } 105 | , Case 106 | { description = "bad short sequence" 107 | , regex = Seq (Sym False 'a') (Sym False 'b') 108 | , sample = "a" 109 | , result = False 110 | } 111 | , Case 112 | { description = "bad long sequence" 113 | , regex = Seq (Sym False 'a') (Sym False 'b') 114 | , sample = "abc" 115 | , result = False 116 | } 117 | ] 118 | -------------------------------------------------------------------------------- /haskell/05_RiggedBrz/LICENSE: -------------------------------------------------------------------------------- 1 | The Brzozowski experiments are original work, and are copyright and 2 | licensed by Elf M. Sternberg according to the Mozilla 2.0 Public 3 | License. See the LICENSE.md file in the main directory 4 | -------------------------------------------------------------------------------- /haskell/05_RiggedBrz/README.md: -------------------------------------------------------------------------------- 1 | # Rigged Brzozowski Regular Expressions, in Haskell 2 | 3 | This is the naive implementation of Brzozowski's Algorithm, but with a 4 | Semiring implementation for gathering complex information from the parse 5 | process. This implementation is "naive" in that it saves everything, 6 | including the very large number of dead branches that hang off Sequence 7 | processing, and then discards them at the very end of the process. 8 | 9 | This implementation finally proves to me something that I've been trying 10 | to express for a while: Might, Adams, et. 
al.'s implementations of tree 11 | parsing *are* Semiring implementations, they just don't call them that, 12 | but the fundamental underlying operations are the same. 13 | 14 | I'm fascinated by the lack of the nullability operator. Instead, it's 15 | just resolved by Emp being parsed as `zero` and Eps as `one * s` where 16 | `s` is the product of the previous operation, and then the new `Delta` 17 | operator preserves this semantic, using multiplicative annihilation to 18 | discard false parses while also being immune to the `Sequence` semantic 19 | that destroys successful parse history. 20 | 21 | This can't last. And Might admits it doesn't last. Darais's 22 | implementation goes back to having a separate function for nullability 23 | that both preserves the status of known-nullable expressions and handles 24 | recursion. Darais's version also implements an incredible number of 25 | optimizations to prune, compact, and process the parse tree early, 26 | enabling a number of speedups and caching strategies that get you within 27 | spitting distance of RE2. 28 | 29 | 30 | -------------------------------------------------------------------------------- /haskell/05_RiggedBrz/RiggedBrz.cabal: -------------------------------------------------------------------------------- 1 | cabal-version: 1.12 2 | 3 | -- This file has been generated from package.yaml by hpack version 0.31.1. 4 | -- 5 | -- see: https://github.com/sol/hpack 6 | -- 7 | -- hash: 1c0fd9015f1269af8c7445bf2af4c10b835d053b9ddce4a3df3815fb4724e489 8 | 9 | name: RiggedBrz 10 | version: 0.1.0.0 11 | category: Regex 12 | homepage: https://github.com/elfsternberg/riggedregex#readme 13 | author: Elf M. Sternberg 14 | maintainer: elf.sternberg@gmail.com 15 | copyright: Copyright ⓒ 2019 Elf M.
Sternberg 16 | license: BSD3 17 | license-file: LICENSE 18 | build-type: Simple 19 | extra-source-files: 20 | README.md 21 | 22 | library 23 | exposed-modules: 24 | RiggedBrz 25 | other-modules: 26 | Paths_RiggedBrz 27 | hs-source-dirs: 28 | src 29 | ghc-options: -Wall 30 | build-depends: 31 | base 32 | , containers 33 | default-language: Haskell2010 34 | 35 | test-suite test 36 | type: exitcode-stdio-1.0 37 | main-is: Tests.hs 38 | other-modules: 39 | Paths_RiggedBrz 40 | hs-source-dirs: 41 | test 42 | build-depends: 43 | RiggedBrz 44 | , base 45 | , containers 46 | , hspec 47 | default-language: Haskell2010 48 | -------------------------------------------------------------------------------- /haskell/05_RiggedBrz/Setup.hs: -------------------------------------------------------------------------------- 1 | import Distribution.Simple 2 | main = defaultMain 3 | -------------------------------------------------------------------------------- /haskell/05_RiggedBrz/package.yaml: -------------------------------------------------------------------------------- 1 | name: RiggedBrz 2 | version: 0.1.0.0 3 | 4 | homepage: https://github.com/elfsternberg/riggedregex#readme 5 | license: BSD3 6 | license-file: LICENSE 7 | author: Elf M. Sternberg 8 | maintainer: elf.sternberg@gmail.com 9 | copyright: Copyright ⓒ 2019 Elf M. 
Sternberg 10 | category: Regex 11 | build-type: Simple 12 | extra-source-files: README.md 13 | 14 | dependencies: 15 | - containers 16 | - base 17 | 18 | library: 19 | exposed-modules: RiggedBrz 20 | ghc-options: -Wall 21 | source-dirs: src 22 | 23 | tests: 24 | test: 25 | main: Tests.hs 26 | source-dirs: test 27 | dependencies: 28 | - RiggedBrz 29 | - hspec 30 | 31 | -------------------------------------------------------------------------------- /haskell/05_RiggedBrz/src/RiggedBrz.hs: -------------------------------------------------------------------------------- 1 | {-# LANGUAGE LambdaCase #-} 2 | {-# LANGUAGE FlexibleInstances #-} 3 | 4 | module RiggedBrz ( Brz (..), parse, rigged, riggeds ) where 5 | 6 | import Data.Set 7 | 8 | data Brz = Emp | Eps | Sym Char | Alt Brz Brz | Seq Brz Brz | Rep Brz deriving (Eq) 9 | 10 | -- Transform a Brz into a Brzr. That's all it does. It's not magical. 11 | 12 | rigging :: Semiring s => (Char -> Brzr Char s) -> Brz -> Brzr Char s 13 | rigging s = \case 14 | Emp -> Empr 15 | Eps -> Epsr one 16 | (Sym c) -> s c 17 | (Alt p q) -> Altr (rigging s p) (rigging s q) 18 | (Seq p q) -> Seqr (rigging s p) (rigging s q) 19 | (Rep r) -> Repr (rigging s r) 20 | 21 | class Semiring s where 22 | zero, one :: s 23 | mul, add :: s -> s -> s 24 | 25 | data Brzr c s = Empr 26 | | Epsr s 27 | | Delr (Brzr c s) 28 | | Symr (c -> s) 29 | | Altr (Brzr c s) (Brzr c s) 30 | | Seqr (Brzr c s) (Brzr c s) 31 | | Repr (Brzr c s) 32 | 33 | deriver :: Semiring s => Brzr c s -> c -> Brzr c s 34 | deriver Empr _ = Empr 35 | deriver (Epsr _) _ = Empr 36 | deriver (Delr _) _ = Empr 37 | deriver (Symr f) u = Epsr $ (f u) 38 | 39 | deriver (Seqr l r) u = 40 | Altr dl dr 41 | where 42 | dl = Seqr (deriver l u) r 43 | dr = Seqr (Delr l) (deriver r u) 44 | 45 | deriver (Altr l r) u = go (deriver l u) (deriver r u) 46 | where go Empr r1 = r1 47 | go r1 Empr = r1 48 | go l1 r1 = Altr l1 r1 49 | 50 | deriver (Repr r) u = Seqr (deriver r u) (Repr r) 51 | 52 | 
parsenull :: Semiring s => (Brzr c s) -> s 53 | parsenull Empr = zero 54 | parsenull (Symr _) = zero 55 | parsenull (Repr _) = one 56 | parsenull (Epsr s) = s 57 | parsenull (Delr s) = parsenull s 58 | parsenull (Altr p q) = parsenull p `add` parsenull q 59 | parsenull (Seqr p q) = parsenull p `mul` parsenull q 60 | 61 | instance Semiring Int where 62 | zero = 0 63 | one = 1 64 | add = (Prelude.+) 65 | mul = (Prelude.*) 66 | 67 | instance Semiring Bool where 68 | zero = False 69 | one = True 70 | add = (||) 71 | mul = (&&) 72 | 73 | instance Semiring (Set String) where 74 | zero = empty 75 | one = singleton "" 76 | add = union 77 | mul a b = Data.Set.map (uncurry (++)) $ cartesianProduct a b 78 | 79 | -- Rigging for boolean and integer values. 80 | 81 | sym :: Semiring s => Char -> Brzr Char s 82 | sym c = Symr (\b -> if b == c then one else zero) 83 | 84 | rigged :: Semiring s => Brz -> Brzr Char s 85 | rigged = rigging sym 86 | 87 | -- Rigging for parse forests 88 | 89 | syms :: Char -> Brzr Char (Set String) 90 | syms c = Symr (\b -> if b == c then singleton [c] else zero) 91 | 92 | riggeds :: Brz -> Brzr Char (Set String) 93 | riggeds = rigging syms 94 | 95 | parse :: (Semiring s) => (Brzr Char s) -> String -> s 96 | parse w [] = parsenull w 97 | parse w (c:cs) = parse (deriver w c) cs 98 | 99 | -------------------------------------------------------------------------------- /haskell/05_RiggedBrz/stack.yaml: -------------------------------------------------------------------------------- 1 | # This file was automatically generated by 'stack init' 2 | # 3 | # Some commonly used options have been documented as comments in this file. 4 | # For advanced use and comprehensive documentation of the format, please see: 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/ 6 | 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version. 
8 | # A snapshot resolver dictates the compiler version and the set of packages 9 | # to be used for project dependencies. For example: 10 | # 11 | # resolver: lts-3.5 12 | # resolver: nightly-2015-09-21 13 | # resolver: ghc-7.10.2 14 | # 15 | # The location of a snapshot can be provided as a file or url. Stack assumes 16 | # a snapshot provided as a file might change, whereas a url resource does not. 17 | # 18 | # resolver: ./custom-snapshot.yaml 19 | # resolver: https://example.com/snapshots/2018-01-01.yaml 20 | resolver: lts-13.5 21 | 22 | # User packages to be built. 23 | # Various formats can be used as shown in the example below. 24 | # 25 | # packages: 26 | # - some-directory 27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz 28 | # - location: 29 | # git: https://github.com/commercialhaskell/stack.git 30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a 31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a 32 | # subdirs: 33 | # - auto-update 34 | # - wai 35 | packages: 36 | - . 37 | # Dependency packages to be pulled from upstream that are not in the resolver 38 | # using the same syntax as the packages field. 
39 | # (e.g., acme-missiles-0.3) 40 | extra-deps: 41 | - containers-0.6.0.1 42 | 43 | # Override default flag values for local packages and extra-deps 44 | # flags: {} 45 | 46 | # Extra package databases containing global packages 47 | # extra-package-dbs: [] 48 | 49 | # Control whether we use the GHC we find on the path 50 | # system-ghc: true 51 | # 52 | # Require a specific version of stack, using version ranges 53 | # require-stack-version: -any # Default 54 | # require-stack-version: ">=1.9" 55 | # 56 | # Override the architecture used by stack, especially useful on Windows 57 | # arch: i386 58 | # arch: x86_64 59 | # 60 | # Extra directories used by stack for building 61 | # extra-include-dirs: [/path/to/dir] 62 | # extra-lib-dirs: [/path/to/dir] 63 | # 64 | # Allow a newer minor version of GHC than the snapshot specifies 65 | # compiler-check: newer-minor 66 | -------------------------------------------------------------------------------- /haskell/05_RiggedBrz/test/Tests.hs: -------------------------------------------------------------------------------- 1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-} 2 | {-# LANGUAGE RecordWildCards #-} 3 | import Data.Foldable (for_) 4 | import Test.Hspec (Spec, it, shouldBe) 5 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith) 6 | import RiggedBrz ( Brz (..), parse, rigged, riggeds ) 7 | import Data.Set 8 | import Data.List (sort) 9 | 10 | main :: IO () 11 | main = hspecWith defaultConfig {configFastFail = True} specs 12 | 13 | specs :: Spec 14 | specs = do 15 | 16 | let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) ) 17 | let onec = Seq nocs (Sym 'c') 18 | let evencs = Seq ( Rep ( Seq onec onec ) ) nocs 19 | 20 | let as = Alt (Sym 'a') (Rep (Sym 'a')) 21 | let bs = Alt (Sym 'b') (Rep (Sym 'b')) 22 | 23 | it "lifted expression" $ 24 | (parse (rigged evencs) "acc" :: Bool) `shouldBe` True 25 | 26 | it "lifted expression short" $ 27 | (parse (rigged evencs) "acc" :: Int) `shouldBe` 1 28 | 29 | it "lifted 
expression counter two" $ 30 | (parse (rigged as) "a" :: Int) `shouldBe` 2 31 | 32 | it "lifted expression counter one" $ 33 | (parse (rigged as) "aa" :: Int) `shouldBe` 1 34 | 35 | it "lifted expression dynamic counter four" $ 36 | (parse (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4 37 | 38 | it "parse forests" $ 39 | (sort $ toList $ (parse (riggeds (Seq as bs)) "ab" :: Set String)) `shouldBe` ["ab"] 40 | 41 | for_ cases test 42 | where 43 | test Case {..} = it description assertion 44 | where 45 | assertion = (parse (rigged regex) sample :: Bool) `shouldBe` result 46 | 47 | data Case = Case 48 | { description :: String 49 | , regex :: Brz 50 | , sample :: String 51 | , result :: Bool 52 | } 53 | 54 | cases :: [Case] 55 | cases = 56 | [ Case {description = "empty", regex = Eps, sample = "", result = True} 57 | , Case {description = "char", regex = Sym 'a', sample = "a", result = True} 58 | , Case 59 | {description = "not char", regex = Sym 'a', sample = "b", result = False} 60 | , Case 61 | { description = "char vs empty" 62 | , regex = Sym 'a' 63 | , sample = "" 64 | , result = False 65 | } 66 | , Case 67 | { description = "left alt" 68 | , regex = Alt (Sym 'a') (Sym 'b') 69 | , sample = "a" 70 | , result = True 71 | } 72 | , Case 73 | { description = "right alt" 74 | , regex = Alt (Sym 'a') (Sym 'b') 75 | , sample = "b" 76 | , result = True 77 | } 78 | , Case 79 | { description = "neither alt" 80 | , regex = Alt (Sym 'a') (Sym 'b') 81 | , sample = "c" 82 | , result = False 83 | } 84 | , Case 85 | { description = "empty alt" 86 | , regex = Alt (Sym 'a') (Sym 'b') 87 | , sample = "" 88 | , result = False 89 | } 90 | , Case 91 | { description = "empty rep" 92 | , regex = Rep (Sym 'a') 93 | , sample = "" 94 | , result = True 95 | } 96 | , Case 97 | { description = "one rep" 98 | , regex = Rep (Sym 'a') 99 | , sample = "a" 100 | , result = True 101 | } 102 | , Case 103 | { description = "multiple rep" 104 | , regex = Rep (Sym 'a') 105 | , sample = "aaaaaaaaa" 106 
| , result = True 107 | } 108 | , Case 109 | { description = "multiple rep with failure" 110 | , regex = Rep (Sym 'a') 111 | , sample = "aaaaaaaaab" 112 | , result = False 113 | } 114 | , Case 115 | { description = "sequence" 116 | , regex = Seq (Sym 'a') (Sym 'b') 117 | , sample = "ab" 118 | , result = True 119 | } 120 | , Case 121 | { description = "sequence with empty" 122 | , regex = Seq (Sym 'a') (Sym 'b') 123 | , sample = "" 124 | , result = False 125 | } 126 | , Case 127 | { description = "bad short sequence" 128 | , regex = Seq (Sym 'a') (Sym 'b') 129 | , sample = "a" 130 | , result = False 131 | } 132 | , Case 133 | { description = "bad long sequence" 134 | , regex = Seq (Sym 'a') (Sym 'b') 135 | , sample = "abc" 136 | , result = False 137 | } 138 | ] 139 | 140 | 141 | -------------------------------------------------------------------------------- /haskell/06_RiggedRegex_Combinator/LICENSE: -------------------------------------------------------------------------------- 1 | The Brzozowski experiments are original work, and are copyright and 2 | licensed by Elf M. Sternberg according to the Mozilla 2.0 Public 3 | License. See the LICENSE.md file in the main directory 4 | -------------------------------------------------------------------------------- /haskell/06_RiggedRegex_Combinator/README.md: -------------------------------------------------------------------------------- 1 | # Kleene Regular Expressions with Rigging, in Haskell 2 | 3 | This variant takes the RiggedRegex (Version 02) and provides a third 4 | Semiring, `Semiring Set String`. `Zero` is the empty set, `One` is a 5 | set with an empty string, `Add` is union and `Mul` is the cartesian 6 | concatenation of the tuples generated by the cartesian product. The 7 | `sym` function is now modified to return `Zero` on failure, or on 8 | success a Set containing the recognized character as a string. 
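
As a rough illustration of the `Set String` semiring described above, the sketch below restates the `Semiring` class and the set instance in a standalone form. It assumes the same four operations as `src/RiggedRegex.hs`, but writes `mul` with a list comprehension instead of `cartesianProduct`, and `symSet` is a hypothetical stand-in for the function wrapped by `syms`.

```haskell
{-# LANGUAGE FlexibleInstances #-}
-- Sketch only: a standalone restatement of the Set String semiring.
-- `symSet` is a hypothetical helper; `mul` is written with a list
-- comprehension here rather than Data.Set's cartesianProduct.

import qualified Data.Set as Set
import Data.Set (Set)

class Semiring s where
  zero, one :: s
  add, mul  :: s -> s -> s

instance Semiring (Set String) where
  zero    = Set.empty          -- no parse at all
  one     = Set.singleton ""   -- exactly one parse: the empty string
  add     = Set.union          -- alternation collects parses from both sides
  mul a b = Set.fromList       -- sequence concatenates every left/right pair
              [x ++ y | x <- Set.toList a, y <- Set.toList b]

-- Weight of reading character `b` against the expected symbol `c`:
-- a set holding the recognized character as a string, or zero on failure.
symSet :: Char -> Char -> Set String
symSet c b = if b == c then Set.singleton [c] else zero
```

For example, `symSet 'a' 'a' `mul` symSet 'b' 'b'` is the set containing only `"ab"`, while multiplying any set by `zero` gives the empty set.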
9 | 10 | The union of any set with the empty set is that set; the cartesian 11 | product of any set with the empty set is the empty set; the 12 | concatenation of the empty string with any set of strings is that set of 13 | strings, so the Semiring properties hold. 14 | 15 | The result is a regular expression engine that returns the set of all 16 | unique strings that resulted from matching the regular 17 | expression, or the empty set if no match happened. 18 | 19 | I'm not yet comfortable with the theoretical underpinnings of this 20 | variant, but I'm reading intensely to see where I can land this. 21 | 22 | It turns out that what I did is just fine, and is well-supported by the 23 | theoretical underpinnings. See "[Semiring 24 | Parsing](https://www.aclweb.org/anthology/J99-4004.pdf)" by Joshua 25 | Goodman. 26 | 27 | 28 | -------------------------------------------------------------------------------- /haskell/06_RiggedRegex_Combinator/RiggedRegex.cabal: -------------------------------------------------------------------------------- 1 | cabal-version: 1.12 2 | 3 | -- This file has been generated from package.yaml by hpack version 0.31.1. 4 | -- 5 | -- see: https://github.com/sol/hpack 6 | -- 7 | -- hash: 24886bf51ff45652f17f1174185b977a916ba0794b24fee1315723e119dc204a 8 | 9 | name: RiggedRegex 10 | version: 0.1.0.0 11 | category: Regex 12 | homepage: https://github.com/elfsternberg/riggedregex#readme 13 | author: Elf M. Sternberg 14 | maintainer: elf.sternberg@gmail.com 15 | copyright: Copyright ⓒ 2019 Elf M.
Sternberg 16 | license: BSD3 17 | license-file: LICENSE 18 | build-type: Simple 19 | extra-source-files: 20 | README.md 21 | 22 | library 23 | exposed-modules: 24 | RiggedRegex 25 | other-modules: 26 | Paths_RiggedRegex 27 | hs-source-dirs: 28 | src 29 | ghc-options: -Wall 30 | build-depends: 31 | base 32 | , containers 33 | default-language: Haskell2010 34 | 35 | test-suite test 36 | type: exitcode-stdio-1.0 37 | main-is: Tests.hs 38 | other-modules: 39 | Paths_RiggedRegex 40 | hs-source-dirs: 41 | test 42 | build-depends: 43 | RiggedRegex 44 | , base 45 | , containers 46 | , hspec 47 | default-language: Haskell2010 48 | -------------------------------------------------------------------------------- /haskell/06_RiggedRegex_Combinator/package.yaml: -------------------------------------------------------------------------------- 1 | name: RiggedRegex 2 | version: 0.1.0.0 3 | 4 | homepage: https://github.com/elfsternberg/riggedregex#readme 5 | license: BSD3 6 | license-file: LICENSE 7 | author: Elf M. Sternberg 8 | maintainer: elf.sternberg@gmail.com 9 | copyright: Copyright ⓒ 2019 Elf M. 
Sternberg 10 | category: Regex 11 | build-type: Simple 12 | extra-source-files: README.md 13 | 14 | dependencies: 15 | - containers 16 | - base 17 | 18 | library: 19 | exposed-modules: RiggedRegex 20 | ghc-options: -Wall 21 | source-dirs: src 22 | 23 | tests: 24 | test: 25 | main: Tests.hs 26 | source-dirs: test 27 | dependencies: 28 | - RiggedRegex 29 | - hspec 30 | 31 | 32 | -------------------------------------------------------------------------------- /haskell/06_RiggedRegex_Combinator/src/RiggedRegex.hs: -------------------------------------------------------------------------------- 1 | {-# LANGUAGE LambdaCase #-} 2 | {-# LANGUAGE FlexibleInstances #-} 3 | 4 | module RiggedRegex ( accept, acceptw, Reg (..), Regw (..), rigged, riggeds ) where 5 | 6 | import Data.Set hiding (split) 7 | 8 | data Reg = 9 | Eps -- Epsilon 10 | | Sym Char -- Character 11 | | Alt Reg Reg -- Alternation 12 | | Seq Reg Reg -- Sequence 13 | | Rep Reg -- R* 14 | 15 | accept :: Reg -> String -> Bool 16 | -- Epsilon 17 | accept Eps u = Prelude.null u 18 | -- Accept if the character offered matches the character constructed 19 | accept (Sym c) u = u == [c] 20 | -- Constructed of two other expressions, accept if either one does. 21 | accept (Alt p q) u = accept p u || accept q u 22 | -- Constructed of two other expressions, accept if p accepts some part 23 | -- of u and q accepts the rest, where u is split arbitrarily 24 | accept (Seq p q) u = or [accept p u1 && accept q u2 | (u1, u2) <- split u] 25 | -- For all convolutions of u containing no empty strings, 26 | -- if all the entries of that convolution are accepted, 27 | -- then at least one convolution is acceptable. 
28 | accept (Rep r) u = or [and [accept r ui | ui <- ps] | ps <- parts u] 29 | 30 | -- Generate a list of all possible combinations of a prefix and suffix 31 | -- for the string offered. 32 | split :: [a] -> [([a], [a])] 33 | split [] = [([], [])] 34 | split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs] 35 | 36 | -- Generate lists of lists that contain all possible convolutions of 37 | -- the input string, not including the empty string. 38 | parts :: [a] -> [[[a]]] 39 | parts [] = [[]] 40 | parts [c] = [[[c]]] 41 | parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs] 42 | 43 | -- A semiring is an algebraic structure with a zero, a one, a 44 | -- "multiplication" operation, and an "addition" operation. Zero is 45 | -- the identity element for addition, One is the identity element for 46 | -- multiplication, both composition operators are associative (it does 47 | -- not matter how sequential operations are grouped), and addition is 48 | -- commutative (the order of the operands does not matter). Also, 49 | -- zero `mul` anything is always zero. 50 | -- 51 | -- For regular expressions in general, this means that the null regex 52 | -- is zero, and the empty string regex is one, alternation is addition 53 | -- and ... sequence is multiplication? Like "sum" and "product" types? 54 | 55 | -- Symw (c -> s) represents a mapping from a symbol to its given weight.
56 | 57 | class Semiring s where 58 | zero, one :: s 59 | mul, add :: s -> s -> s 60 | 61 | sym :: Semiring s => Char -> Regw Char s 62 | sym c = Symw (\b -> if b == c then one else zero) 63 | 64 | data Regw c s = 65 | Epsw -- Epsilon 66 | | Symw (c -> s) -- Character 67 | | Altw (Regw c s) (Regw c s) -- Alternation 68 | | Seqw (Regw c s) (Regw c s) -- Sequence 69 | | Repw (Regw c s) -- R* 70 | 71 | rigging :: Semiring s => (Char -> Regw Char s) -> Reg -> Regw Char s 72 | rigging s = \case 73 | Eps -> Epsw 74 | (Sym c) -> s c 75 | (Alt p q) -> Altw (rigging s p) (rigging s q) 76 | (Seq p q) -> Seqw (rigging s p) (rigging s q) 77 | (Rep r) -> Repw (rigging s r) 78 | 79 | rigged :: Semiring s => Reg -> Regw Char s 80 | rigged = rigging sym 81 | 82 | acceptw :: Semiring s => Regw c s -> [c] -> s 83 | acceptw Epsw u = if Prelude.null u then one else zero 84 | acceptw (Symw f) u = 85 | case u of 86 | [c] -> f c 87 | _ -> zero 88 | acceptw (Altw p q) u = acceptw p u `add` acceptw q u 89 | acceptw (Seqw p q) u = sumr [ acceptw p u1 `mul` acceptw q u2 | (u1, u2) <- split u ] 90 | acceptw (Repw r) u = sumr [ prodr [ acceptw r ui | ui <- ps ] | ps <- parts u ] 91 | 92 | -- Something feels hacky about this. I mean, I know, on the one 93 | -- hand that any epsilon is still "one" as far as the system is 94 | -- concerned; on the other hand, I would much rather have a better 95 | -- theoretical ground for what I just did here...
96 | 97 | syms :: Char -> Regw Char (Set String) 98 | syms c = Symw (\b -> if b == c then singleton [c] else zero) 99 | 100 | riggeds :: Reg -> Regw Char (Set String) 101 | riggeds = rigging syms 102 | 103 | sumr, prodr :: Semiring r => [r] -> r 104 | sumr = Prelude.foldr add zero 105 | prodr = Prelude.foldr mul one 106 | 107 | instance Semiring Int where 108 | zero = 0 109 | one = 1 110 | add = (Prelude.+) 111 | mul = (Prelude.*) 112 | 113 | instance Semiring Bool where 114 | zero = False 115 | one = True 116 | add = (||) 117 | mul = (&&) 118 | 119 | -- εs = {(ε, s)} Empty Word 120 | -- c = {(c, c)} Token 121 | -- L1 ◦ L2 = {(uv,(s, t)) | (u, s) ∈ L1 and (v, t) ∈ L2} Concatenation 122 | -- L1 ∪ L2 = {(u, s) | (u, s) ∈ L1 or (u, s) ∈ L2} Alternation 123 | 124 | -- Boolean Semiring (TRUE, FALSE,∨,∧, FALSE, TRUE) recognition 125 | -- Inside Semiring (R(1/0), +, ×, 0, 1) string probability 126 | -- Counting Semiring (N(∞/0), +, ×, 0, 1) number of derivations 127 | -- Derivation Forests Semiring (2E,∪, ·, ∅, {<>}) set of derivation 128 | 129 | instance Semiring (Set String) where 130 | zero = empty 131 | one = singleton "" 132 | add = union 133 | mul a b = Data.Set.map (uncurry (++)) $ cartesianProduct a b 134 | -------------------------------------------------------------------------------- /haskell/06_RiggedRegex_Combinator/stack.yaml: -------------------------------------------------------------------------------- 1 | # This file was automatically generated by 'stack init' 2 | # 3 | # Some commonly used options have been documented as comments in this file. 4 | # For advanced use and comprehensive documentation of the format, please see: 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/ 6 | 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version. 8 | # A snapshot resolver dictates the compiler version and the set of packages 9 | # to be used for project dependencies.
For example: 10 | # 11 | # resolver: lts-3.5 12 | # resolver: nightly-2015-09-21 13 | # resolver: ghc-7.10.2 14 | # 15 | # The location of a snapshot can be provided as a file or url. Stack assumes 16 | # a snapshot provided as a file might change, whereas a url resource does not. 17 | # 18 | # resolver: ./custom-snapshot.yaml 19 | # resolver: https://example.com/snapshots/2018-01-01.yaml 20 | resolver: lts-13.4 21 | 22 | # User packages to be built. 23 | # Various formats can be used as shown in the example below. 24 | # 25 | # packages: 26 | # - some-directory 27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz 28 | # - location: 29 | # git: https://github.com/commercialhaskell/stack.git 30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a 31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a 32 | # subdirs: 33 | # - auto-update 34 | # - wai 35 | packages: 36 | - . 37 | # Dependency packages to be pulled from upstream that are not in the resolver 38 | # using the same syntax as the packages field. 
39 | # (e.g., acme-missiles-0.3) 40 | # extra-deps: [] 41 | 42 | # Override default flag values for local packages and extra-deps 43 | # flags: {} 44 | 45 | # Extra package databases containing global packages 46 | # extra-package-dbs: [] 47 | 48 | # Control whether we use the GHC we find on the path 49 | # system-ghc: true 50 | # 51 | # Require a specific version of stack, using version ranges 52 | # require-stack-version: -any # Default 53 | # require-stack-version: ">=1.9" 54 | # 55 | # Override the architecture used by stack, especially useful on Windows 56 | # arch: i386 57 | # arch: x86_64 58 | # 59 | # Extra directories used by stack for building 60 | # extra-include-dirs: [/path/to/dir] 61 | # extra-lib-dirs: [/path/to/dir] 62 | # 63 | # Allow a newer minor version of GHC than the snapshot specifies 64 | # compiler-check: newer-minor 65 | -------------------------------------------------------------------------------- /haskell/06_RiggedRegex_Combinator/test/Tests.hs: -------------------------------------------------------------------------------- 1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-} 2 | {-# LANGUAGE RecordWildCards #-} 3 | import Data.Foldable (for_) 4 | import Test.Hspec (Spec, it, shouldBe) 5 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith) 6 | import RiggedRegex (Reg (..), accept, acceptw, rigged, riggeds) 7 | import Data.Set 8 | import Data.List (sort) 9 | 10 | main :: IO () 11 | main = hspecWith defaultConfig {configFastFail = True} specs 12 | 13 | specs :: Spec 14 | specs = do 15 | 16 | let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) ) 17 | let onec = Seq nocs (Sym 'c') 18 | let evencs = Seq ( Rep ( Seq onec onec ) ) nocs 19 | 20 | let as = Alt (Sym 'a') (Rep (Sym 'a')) 21 | let bs = Alt (Sym 'b') (Rep (Sym 'b')) 22 | 23 | it "simple expression" $ 24 | accept evencs "acc" `shouldBe` True 25 | 26 | it "lifted expression" $ 27 | (acceptw (rigged evencs) "acc" :: Bool) `shouldBe` True 28 | 29 | it "lifted expression short" $ 
30 | (acceptw (rigged evencs) "acc" :: Int) `shouldBe` 1 31 | 32 | it "lifted expression counter two" $ 33 | (acceptw (rigged as) "a" :: Int) `shouldBe` 2 34 | 35 | it "lifted expression counter one" $ 36 | (acceptw (rigged as) "aa" :: Int) `shouldBe` 1 37 | 38 | it "lifted expression dynamic counter four" $ 39 | (acceptw (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4 40 | 41 | it "parse forests" $ 42 | (sort $ toList $ (acceptw (riggeds (Seq as bs)) "ab" :: Set String)) `shouldBe` ["ab"] 43 | 44 | for_ cases test 45 | where 46 | test Case {..} = it description assertion 47 | where 48 | assertion = (acceptw (rigged regex) sample :: Bool) `shouldBe` result 49 | 50 | data Case = Case 51 | { description :: String 52 | , regex :: Reg 53 | , sample :: String 54 | , result :: Bool 55 | } 56 | 57 | cases :: [Case] 58 | cases = 59 | [ Case {description = "empty", regex = Eps, sample = "", result = True} 60 | , Case {description = "char", regex = Sym 'a', sample = "a", result = True} 61 | , Case 62 | {description = "not char", regex = Sym 'a', sample = "b", result = False} 63 | , Case 64 | { description = "char vs empty" 65 | , regex = Sym 'a' 66 | , sample = "" 67 | , result = False 68 | } 69 | , Case 70 | { description = "left alt" 71 | , regex = Alt (Sym 'a') (Sym 'b') 72 | , sample = "a" 73 | , result = True 74 | } 75 | , Case 76 | { description = "right alt" 77 | , regex = Alt (Sym 'a') (Sym 'b') 78 | , sample = "b" 79 | , result = True 80 | } 81 | , Case 82 | { description = "neither alt" 83 | , regex = Alt (Sym 'a') (Sym 'b') 84 | , sample = "c" 85 | , result = False 86 | } 87 | , Case 88 | { description = "empty alt" 89 | , regex = Alt (Sym 'a') (Sym 'b') 90 | , sample = "" 91 | , result = False 92 | } 93 | , Case 94 | { description = "empty rep" 95 | , regex = Rep (Sym 'a') 96 | , sample = "" 97 | , result = True 98 | } 99 | , Case 100 | { description = "one rep" 101 | , regex = Rep (Sym 'a') 102 | , sample = "a" 103 | , result = True 104 | } 105 | , Case 106 | { 
description = "multiple rep" 107 | , regex = Rep (Sym 'a') 108 | , sample = "aaaaaaaaa" 109 | , result = True 110 | } 111 | , Case 112 | { description = "multiple rep with failure" 113 | , regex = Rep (Sym 'a') 114 | , sample = "aaaaaaaaab" 115 | , result = False 116 | } 117 | , Case 118 | { description = "sequence" 119 | , regex = Seq (Sym 'a') (Sym 'b') 120 | , sample = "ab" 121 | , result = True 122 | } 123 | , Case 124 | { description = "sequence with empty" 125 | , regex = Seq (Sym 'a') (Sym 'b') 126 | , sample = "" 127 | , result = False 128 | } 129 | , Case 130 | { description = "bad short sequence" 131 | , regex = Seq (Sym 'a') (Sym 'b') 132 | , sample = "a" 133 | , result = False 134 | } 135 | , Case 136 | { description = "bad long sequence" 137 | , regex = Seq (Sym 'a') (Sym 'b') 138 | , sample = "abc" 139 | , result = False 140 | } 141 | ] 142 | 143 | 144 | -------------------------------------------------------------------------------- /haskell/07_Rigged_Glushkov/LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer 2 | 3 | All rights reserved. 4 | 5 | Redistribution and use in source and binary forms, with or without 6 | modification, are permitted provided that the following conditions are 7 | met: 8 | 9 | 1. Redistributions of source code must retain the above copyright 10 | notice, this list of conditions and the following disclaimer. 11 | 12 | 2. Redistributions in binary form must reproduce the above copyright 13 | notice, this list of conditions and the following disclaimer in 14 | the documentation and/or other materials provided with the 15 | distribution. 16 | 17 | 3. Neither the name of the author nor the names of his contributors 18 | may be used to endorse or promote products derived from this 19 | software without specific prior written permission. 
20 | 21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 24 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR 25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, 27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR 28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 32 | 33 | -------------------------------------------------------------------------------- /haskell/07_Rigged_Glushkov/README.md: -------------------------------------------------------------------------------- 1 | # Rigged Glushkov Regular Expressions, in Haskell 2 | 3 | This is by far the most successful Haskell experiment yet. It builds on 4 | Experiment 04, "Glushkov Regular Expressions," and adds the Semiring 5 | implementation. 6 | 7 | We use the familiar pattern of building our regular expressions using 8 | the Kleene primitive pattern developed for Experiment 01, then lift the 9 | constructed expression into our Glushkov representation and run it 10 | through a modified version of the 'shift' function to produce a result. 11 | In this version, as in previous rigged versions, we apply the logic of 12 | regular expressions to our semiring data during parsing. 13 | 14 | One thing that was necessary here was that, to support more complex 15 | semirings, those that are not just primitive data with simple zero or 16 | one representations, I needed to provide a constructor to the shift 17 | function that knew how to build new symbol operations.
When you "rig" 18 | the Kleene representation, you must provide a function that takes a char 19 | and returns a symbol operator that includes the semiring. 20 | 21 | Rigging examples were *not* included in the paper. This was the first 22 | experiment where I had to come up with some parts of the solution on my 23 | own, and solving it was a fun problem. This particular version took 24 | about four hours to puzzle out, but it was worth it. I'm sure there are 25 | alternatives to my rigging-with-constructor solution, but this works and 26 | I'm not unhappy with it. It does look a bit cluttered, but that's 27 | actually how it's presented in the paper; my solution actually reduces 28 | some of the clutter. 29 | 30 | Otherwise, this version works pretty much the same way you'd expect a 31 | merger of the Kleene Semiring version and the Glushkov boolean version 32 | to work. 33 | 34 | One thing that came out of the paper was the use of a Haskell 35 | record-type to record whether or not a node had already been analyzed 36 | for its finality and emptiness; this caches those results and "shorts 37 | out" traversing down the tree to rediscover these properties, resulting 38 | in a bit of a speed-up. 39 | -------------------------------------------------------------------------------- /haskell/07_Rigged_Glushkov/RiggedGlushkov.cabal: -------------------------------------------------------------------------------- 1 | cabal-version: 1.12 2 | 3 | -- This file has been generated from package.yaml by hpack version 0.31.1. 4 | -- 5 | -- see: https://github.com/sol/hpack 6 | -- 7 | -- hash: b650d4292e70e7a507191f08e2c62b80a0eb311278b7aef3a2a084f2dac0c3ca 8 | 9 | name: RiggedGlushkov 10 | version: 0.1.0.0 11 | category: Regex 12 | homepage: https://github.com/elfsternberg/riggedregex#readme 13 | author: Elf M. Sternberg 14 | maintainer: elf.sternberg@gmail.com 15 | copyright: Copyright ⓒ 2019 Elf M.
Sternberg 16 | license: BSD3 17 | license-file: LICENSE 18 | build-type: Simple 19 | extra-source-files: 20 | README.md 21 | 22 | library 23 | exposed-modules: 24 | RiggedGlushkov 25 | other-modules: 26 | Paths_RiggedGlushkov 27 | hs-source-dirs: 28 | src 29 | ghc-options: -Wall 30 | build-depends: 31 | base 32 | , containers 33 | default-language: Haskell2010 34 | 35 | test-suite test 36 | type: exitcode-stdio-1.0 37 | main-is: Tests.hs 38 | other-modules: 39 | Paths_RiggedGlushkov 40 | hs-source-dirs: 41 | test 42 | build-depends: 43 | RiggedGlushkov 44 | , base 45 | , containers 46 | , hspec 47 | default-language: Haskell2010 48 | -------------------------------------------------------------------------------- /haskell/07_Rigged_Glushkov/package.yaml: -------------------------------------------------------------------------------- 1 | name: RiggedGlushkov 2 | version: 0.1.0.0 3 | 4 | homepage: https://github.com/elfsternberg/riggedregex#readme 5 | license: BSD3 6 | license-file: LICENSE 7 | author: Elf M. Sternberg 8 | maintainer: elf.sternberg@gmail.com 9 | copyright: Copyright ⓒ 2019 Elf M. 
Sternberg 10 | category: Regex 11 | build-type: Simple 12 | extra-source-files: README.md 13 | 14 | dependencies: 15 | - containers 16 | - base 17 | 18 | library: 19 | exposed-modules: RiggedGlushkov 20 | ghc-options: -Wall 21 | source-dirs: src 22 | 23 | tests: 24 | test: 25 | main: Tests.hs 26 | source-dirs: test 27 | dependencies: 28 | - RiggedGlushkov 29 | - hspec 30 | 31 | -------------------------------------------------------------------------------- /haskell/07_Rigged_Glushkov/src/RiggedGlushkov.hs: -------------------------------------------------------------------------------- 1 | {-# LANGUAGE FlexibleInstances #-} 2 | {-# LANGUAGE LambdaCase #-} 3 | 4 | module RiggedGlushkov ( Glu(..), acceptg, rigged, riggeds ) where 5 | 6 | import Data.Set hiding (foldl, split) 7 | 8 | data Glu 9 | = Eps 10 | | Sym Bool Char 11 | | Alt Glu Glu 12 | | Seq Glu Glu 13 | | Rep Glu 14 | 15 | -- Just as with the Kleene versions, we're going to exploit the fact 16 | -- that we have a working version. For Rust, we're going to do 17 | -- something a little different. But for now... 18 | -- 19 | -- This is interesting. The paper decides that, to keep the cost of 20 | -- processing down, we're going to cache the results of empty and 21 | -- final. One of the prices paid, though, is in the complexity of the 22 | -- data type for our expressions, and that complexity is now managed 23 | -- through factories. 
24 | 25 | class Semiring s where 26 | zero, one :: s 27 | mul, add :: s -> s -> s 28 | 29 | data Glue c s = Glue 30 | { emptye :: s 31 | , finale :: s 32 | , gluw :: Gluw c s 33 | } 34 | 35 | data Gluw c s 36 | = Epsw 37 | | Symw (c -> s) 38 | | Altw (Glue c s) (Glue c s) 39 | | Seqw (Glue c s) (Glue c s) 40 | | Repw (Glue c s) 41 | 42 | epsw :: Semiring s => Glue c s 43 | epsw = Glue {emptye = one, finale = zero, gluw = Epsw} 44 | 45 | symw :: Semiring s => (c -> s) -> Glue c s 46 | symw f = Glue {emptye = zero, finale = zero, gluw = Symw f} 47 | 48 | altw :: Semiring s => Glue c s -> Glue c s -> Glue c s 49 | altw l r = 50 | Glue 51 | { emptye = add (emptye l) (emptye r), 52 | finale = add (finale l) (finale r), 53 | gluw = Altw l r 54 | } 55 | 56 | seqw :: Semiring s => Glue c s -> Glue c s -> Glue c s 57 | seqw l r = 58 | Glue 59 | { emptye = mul (emptye l) (emptye r), 60 | finale = add (mul (finale l) (emptye r)) (finale r), 61 | gluw = Seqw l r 62 | } 63 | 64 | repw :: Semiring s => Glue c s -> Glue c s 65 | repw r = Glue {emptye = one, finale = finale r, gluw = Repw r} 66 | 67 | -- for my edification, the syntax under Symw is syntax for "replace 68 | -- this value in the created record." 69 | -- > data Foo = Foo { a :: Int, b :: Int } deriving (Show) 70 | -- > (Foo 1 2) { b = 4 } 71 | -- Foo { a = 1, b = 4 } 72 | -- It doesn't seem to be functional, i.e. Foo 1 2 $ { b = 4 } doesn't work. 
73 | shifte :: Semiring s => s -> Gluw c s -> c -> Glue c s 74 | shifte _ Epsw _ = epsw 75 | shifte m (Symw f) c = (symw f) {finale = m `mul` f c} 76 | shifte m (Seqw l r) c = 77 | seqw 78 | (shifte m (gluw l) c) 79 | (shifte (add (m `mul` (emptye l)) (finale l)) (gluw r) c) 80 | shifte m (Altw l r) c = altw (shifte m (gluw l) c) (shifte m (gluw r) c) 81 | shifte m (Repw r) c = repw (shifte (m `add` finale r) (gluw r) c) 82 | 83 | sym :: (Semiring s, Eq c) => c -> Glue c s 84 | sym c = symw (\b -> if b == c then one else zero) 85 | 86 | rigging :: Semiring s => (Char -> Glue Char s) -> Glu -> Glue Char s 87 | rigging s = 88 | \case 89 | Eps -> epsw 90 | (Sym _ c) -> s c 91 | (Alt p q) -> altw (rigging s p) (rigging s q) 92 | (Seq p q) -> seqw (rigging s p) (rigging s q) 93 | (Rep r) -> repw (rigging s r) 94 | 95 | rigged :: Semiring s => Glu -> Glue Char s 96 | rigged = rigging sym 97 | 98 | syms :: Char -> Glue Char (Set String) 99 | syms c = symw (\b -> if b == c then singleton [c] else zero) 100 | 101 | riggeds :: Glu -> Glue Char (Set String) 102 | riggeds = rigging syms 103 | 104 | instance Semiring (Set String) where 105 | zero = empty 106 | one = singleton "" 107 | add = union 108 | mul a b = Data.Set.map (uncurry (++)) $ cartesianProduct a b 109 | 110 | instance Semiring Int where 111 | zero = 0 112 | one = 1 113 | add = (Prelude.+) 114 | mul = (Prelude.*) 115 | 116 | instance Semiring Bool where 117 | zero = False 118 | one = True 119 | add = (||) 120 | mul = (&&) 121 | 122 | acceptg :: Semiring s => Glue c s -> [c] -> s 123 | acceptg r [] = emptye r 124 | acceptg r (c:cs) = 125 | finale (foldl (shifte zero . 
gluw) (shifte one (gluw r) c) cs) 126 | -------------------------------------------------------------------------------- /haskell/07_Rigged_Glushkov/stack.yaml: -------------------------------------------------------------------------------- 1 | # This file was automatically generated by 'stack init' 2 | # 3 | # Some commonly used options have been documented as comments in this file. 4 | # For advanced use and comprehensive documentation of the format, please see: 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/ 6 | 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version. 8 | # A snapshot resolver dictates the compiler version and the set of packages 9 | # to be used for project dependencies. For example: 10 | # 11 | # resolver: lts-3.5 12 | # resolver: nightly-2015-09-21 13 | # resolver: ghc-7.10.2 14 | # 15 | # The location of a snapshot can be provided as a file or url. Stack assumes 16 | # a snapshot provided as a file might change, whereas a url resource does not. 17 | # 18 | # resolver: ./custom-snapshot.yaml 19 | # resolver: https://example.com/snapshots/2018-01-01.yaml 20 | resolver: lts-13.5 21 | 22 | # User packages to be built. 23 | # Various formats can be used as shown in the example below. 24 | # 25 | # packages: 26 | # - some-directory 27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz 28 | # - location: 29 | # git: https://github.com/commercialhaskell/stack.git 30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a 31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a 32 | # subdirs: 33 | # - auto-update 34 | # - wai 35 | packages: 36 | - . 37 | # Dependency packages to be pulled from upstream that are not in the resolver 38 | # using the same syntax as the packages field. 
39 | # (e.g., acme-missiles-0.3) 40 | # extra-deps: [] 41 | 42 | # Override default flag values for local packages and extra-deps 43 | # flags: {} 44 | 45 | # Extra package databases containing global packages 46 | # extra-package-dbs: [] 47 | 48 | # Control whether we use the GHC we find on the path 49 | # system-ghc: true 50 | # 51 | # Require a specific version of stack, using version ranges 52 | # require-stack-version: -any # Default 53 | # require-stack-version: ">=1.9" 54 | # 55 | # Override the architecture used by stack, especially useful on Windows 56 | # arch: i386 57 | # arch: x86_64 58 | # 59 | # Extra directories used by stack for building 60 | # extra-include-dirs: [/path/to/dir] 61 | # extra-lib-dirs: [/path/to/dir] 62 | # 63 | # Allow a newer minor version of GHC than the snapshot specifies 64 | # compiler-check: newer-minor 65 | -------------------------------------------------------------------------------- /haskell/07_Rigged_Glushkov/test/Tests.hs: -------------------------------------------------------------------------------- 1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-} 2 | {-# LANGUAGE RecordWildCards #-} 3 | 4 | import Data.Foldable (for_) 5 | import Test.Hspec (Spec, describe, it, shouldBe) 6 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith) 7 | import RiggedGlushkov (Glu (..), acceptg, rigged, riggeds) 8 | import Data.Set 9 | import Data.List (sort) 10 | 11 | main :: IO () 12 | main = hspecWith defaultConfig {configFastFail = True} specs 13 | 14 | msym :: Char -> Glu 15 | msym c = Sym False c 16 | 17 | specs :: Spec 18 | specs = do 19 | 20 | let nocs = Rep ( Alt ( msym 'a' ) ( msym 'b' ) ) 21 | let onec = Seq nocs (msym 'c') 22 | let evencs = Seq ( Rep ( Seq onec onec ) ) nocs 23 | 24 | let as = Alt (msym 'a') (Rep (msym 'a')) 25 | let bs = Alt (msym 'b') (Rep (msym 'b')) 26 | 27 | -- it "lifted expression" $ 28 | -- (acceptg (rigged evencs) "acc" :: Bool) `shouldBe` True 29 | 30 | it "lifted expression short" $ 31 | 
(acceptg (rigged evencs) "acc" :: Int) `shouldBe` 1 32 | 33 | it "lifted expression counter two" $ 34 | (acceptg (rigged as) "a" :: Int) `shouldBe` 2 35 | 36 | it "lifted expression counter one" $ 37 | (acceptg (rigged as) "aa" :: Int) `shouldBe` 1 38 | 39 | it "lifted expression dynamic counter four" $ 40 | (acceptg (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4 41 | 42 | it "parse forests" $ 43 | (sort $ toList $ (acceptg (riggeds (Seq as bs)) "ab" :: Set String)) `shouldBe` ["ab"] 44 | 45 | for_ cases test 46 | where 47 | test Case {..} = it description assertion 48 | where 49 | assertion = (acceptg (rigged regex) sample :: Bool) `shouldBe` result 50 | 51 | data Case = Case 52 | { description :: String 53 | , regex :: Glu 54 | , sample :: String 55 | , result :: Bool 56 | } 57 | 58 | cases :: [Case] 59 | cases = 60 | [ Case {description = "empty", regex = Eps, sample = "", result = True} 61 | , Case {description = "char", regex = msym 'a', sample = "a", result = True} 62 | , Case 63 | {description = "not char", regex = msym 'a', sample = "b", result = False} 64 | , Case 65 | { description = "char vs empty" 66 | , regex = msym 'a' 67 | , sample = "" 68 | , result = False 69 | } 70 | , Case 71 | { description = "left alt" 72 | , regex = Alt (msym 'a') (msym 'b') 73 | , sample = "a" 74 | , result = True 75 | } 76 | , Case 77 | { description = "right alt" 78 | , regex = Alt (msym 'a') (msym 'b') 79 | , sample = "b" 80 | , result = True 81 | } 82 | , Case 83 | { description = "neither alt" 84 | , regex = Alt (msym 'a') (msym 'b') 85 | , sample = "c" 86 | , result = False 87 | } 88 | , Case 89 | { description = "empty alt" 90 | , regex = Alt (msym 'a') (msym 'b') 91 | , sample = "" 92 | , result = False 93 | } 94 | , Case 95 | { description = "empty rep" 96 | , regex = Rep (msym 'a') 97 | , sample = "" 98 | , result = True 99 | } 100 | , Case 101 | { description = "one rep" 102 | , regex = Rep (msym 'a') 103 | , sample = "a" 104 | , result = True 105 | } 106 | , Case 
107 | { description = "multiple rep" 108 | , regex = Rep (msym 'a') 109 | , sample = "aaaaaaaaa" 110 | , result = True 111 | } 112 | , Case 113 | { description = "multiple rep with failure" 114 | , regex = Rep (msym 'a') 115 | , sample = "aaaaaaaaab" 116 | , result = False 117 | } 118 | , Case 119 | { description = "sequence" 120 | , regex = Seq (msym 'a') (msym 'b') 121 | , sample = "ab" 122 | , result = True 123 | } 124 | , Case 125 | { description = "sequence with empty" 126 | , regex = Seq (msym 'a') (msym 'b') 127 | , sample = "" 128 | , result = False 129 | } 130 | , Case 131 | { description = "bad short sequence" 132 | , regex = Seq (msym 'a') (msym 'b') 133 | , sample = "a" 134 | , result = False 135 | } 136 | , Case 137 | { description = "bad long sequence" 138 | , regex = Seq (msym 'a') (msym 'b') 139 | , sample = "abc" 140 | , result = False 141 | } 142 | ] 143 | 144 | 145 | -------------------------------------------------------------------------------- /haskell/08_Heavyweights/Heavyweights.cabal: -------------------------------------------------------------------------------- 1 | cabal-version: 1.12 2 | 3 | -- This file has been generated from package.yaml by hpack version 0.31.1. 4 | -- 5 | -- see: https://github.com/sol/hpack 6 | -- 7 | -- hash: c903f1e543aacf648806799a4c925c51ece2e6c833560c723350207fa137497f 8 | 9 | name: Heavyweights 10 | version: 0.1.0.0 11 | category: Regex 12 | homepage: https://github.com/elfsternberg/riggedregex#readme 13 | author: Elf M. Sternberg 14 | maintainer: elf.sternberg@gmail.com 15 | copyright: Copyright ⓒ 2019 Elf M. 
Sternberg 16 | license: MPL-2.0 17 | license-file: LICENSE.md 18 | build-type: Simple 19 | extra-source-files: 20 | README.md 21 | 22 | library 23 | exposed-modules: 24 | Heavyweights 25 | other-modules: 26 | Paths_Heavyweights 27 | hs-source-dirs: 28 | src 29 | ghc-options: -Wall 30 | build-depends: 31 | base 32 | , containers 33 | default-language: Haskell2010 34 | 35 | test-suite test 36 | type: exitcode-stdio-1.0 37 | main-is: Tests.hs 38 | other-modules: 39 | Paths_Heavyweights 40 | hs-source-dirs: 41 | test 42 | build-depends: 43 | Heavyweights 44 | , base 45 | , containers 46 | , hspec 47 | default-language: Haskell2010 48 | -------------------------------------------------------------------------------- /haskell/08_Heavyweights/LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer 2 | 3 | All rights reserved. 4 | 5 | Redistribution and use in source and binary forms, with or without 6 | modification, are permitted provided that the following conditions are 7 | met: 8 | 9 | 1. Redistributions of source code must retain the above copyright 10 | notice, this list of conditions and the following disclaimer. 11 | 12 | 2. Redistributions in binary form must reproduce the above copyright 13 | notice, this list of conditions and the following disclaimer in 14 | the documentation and/or other materials provided with the 15 | distribution. 16 | 17 | 3. Neither the name of the author nor the names of his contributors 18 | may be used to endorse or promote products derived from this 19 | software without specific prior written permission. 20 | 21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 24 | A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHORS OR 25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, 27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR 28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 32 | 33 | -------------------------------------------------------------------------------- /haskell/08_Heavyweights/LICENSE.md: -------------------------------------------------------------------------------- 1 | See license in main directory 2 | -------------------------------------------------------------------------------- /haskell/08_Heavyweights/README.md: -------------------------------------------------------------------------------- 1 | # Rigged Glushkov Regular Expressions in Haskell: Compliance Experiments 2 | 3 | This implementation doesn't differ much from [Experiment 07: Rigged 4 | Glushkov Regular Expressions in Haskell](../07_Rigged_Glushkov), except 5 | that it adds two new Semiring implementations to the library. 6 | 7 | Recall the basics of Semiring theory: There is a zero, a one, an 8 | "addition" operation and a "multiplication" operation. These two 9 | operators have identities (in numbers, addition has zero, multiplication 10 | has one) and the operators behave similarly (multiplication times zero 11 | always equals zero, or "nothing."), and a data type on which these 12 | operations work. 13 | 14 | We've used these principles to do boolean recognition; multiplication is 15 | the boolean `and` operator, used to encode sequences using annihilation: 16 | any sequence that doesn't match is `False`, and `False && x` is always 17 | `False`. 
If the entire truth of an expression depends on an annihilated 18 | sequence, then it's not true. 19 | 20 | We've used it to count ambiguities, via integers: by using addition *as* 21 | addition, we count the number of different regular languages encoded in 22 | our initial expression that could have produced the string submitted, 23 | thus revealing the number of ambiguities in our expression. Each `or` 24 | that returns 1 reveals a different path, so an alternate pattern will 25 | return the sum of the paths that pass through it. 26 | 27 | And we've even used it to identify the strings that match. By saying 28 | that our Semiring is a "set of strings", our addition is union (that is, 29 | we keep the set of all paths through the alternate pattern), and 30 | multiplication is the concatenation of the cartesian products of the two 31 | elements of a sequence (so for a basic pattern with no alternatives it's 32 | just a concatenation of the two strings, but for alternatives with 33 | multiples, it's the concatenation of all possible combinations), we've 34 | created a way to extract the exact string(s) that we submitted to the 35 | machine that matched. 36 | 37 | In *Heavyweights*, Fischer, Huch and Wilke go further and show how 38 | clever choices among zeros and ones can lead to some rather powerful 39 | outcomes. 40 | 41 | The first thing to appreciate is that our symbol operator, `sym`, has 42 | never actually been about symbols. It's about predicates. Our base 43 | implementation has been to pass a closured comparison with our desired 44 | symbol, returning zero or one. 45 | 46 | For the string implementation, which is *not* covered in the paper and 47 | which I managed to extract, successfully, from Might's work, I passed to 48 | `sym` instead a closured comparison to the desired symbol, and the 49 | return value was either the zero or `singleton [c]`, meaning a set with 50 | a string of one character in it. 
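The idea that `sym` is really a weight-producing predicate can be sketched in isolation. What follows is a minimal, self-contained illustration rather than the module's actual code: the names `symWeight` and `symsWeight` are hypothetical, and the `Semiring` class and instances are restated here (with `mul` on sets written as a list comprehension) only so the block compiles on its own.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import           Data.Set (Set)
import qualified Data.Set as Set

-- Restated locally so the sketch stands alone.
class Semiring s where
  zero, one :: s
  add, mul  :: s -> s -> s

instance Semiring Bool where
  zero = False
  one  = True
  add  = (||)
  mul  = (&&)

instance Semiring Int where
  zero = 0
  one  = 1
  add  = (+)
  mul  = (*)

instance Semiring (Set String) where
  zero    = Set.empty
  one     = Set.singleton ""
  add     = Set.union
  -- Concatenate every pairing of a left string with a right string.
  mul a b = Set.fromList [x ++ y | x <- Set.toList a, y <- Set.toList b]

-- Hypothetical name: the predicate handed to the symbol constructor.
-- It answers "one" on a match and "zero" otherwise, in any semiring.
symWeight :: Semiring s => Char -> Char -> s
symWeight c b = if b == c then one else zero

-- Hypothetical name: the string-collecting variant. A match yields
-- the one-character string instead of the semiring's "one".
symsWeight :: Char -> Char -> Set String
symsWeight c b = if b == c then Set.singleton [c] else zero

main :: IO ()
main = do
  print (symWeight 'a' 'a' :: Bool)                      -- True
  print (symWeight 'a' 'b' :: Int)                       -- 0
  print (symsWeight 'a' 'a')                             -- fromList ["a"]
  print (mul (symsWeight 'a' 'a') (symsWeight 'b' 'b'))  -- fromList ["ab"]
  print (add (symsWeight 'a' 'a') (symsWeight 'b' 'b'))  -- fromList ["a","b"]
```

Only the predicate changes between the recognizing, counting, and string-collecting versions; the expression tree and the matching logic stay the same.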
(I'm quite proud of that work; it both 51 | affirmed my notion that Might & Adams had a semiring implementation, 52 | they just didn't call it that, and showed that I was able to merge two 53 | different equational systems, applying some notions of category theory 54 | to do so.) 55 | 56 | The definition of `sym` was: `sym :: (Semiring s, Eq c) => c -> Reg c 57 | s`. I added `syms`: `syms :: Char -> Reg Char (Set String)`. Now the 58 | three provide `symi`: `symi :: Semiringi s => Char -> Reg (Int, Char) s`. 59 | This is an expression that *takes* both an Int and a Char, and their 60 | `submatch` function `zip`s the input value with a position value, so that 61 | both are available for processing. Remember that everything else 62 | depends on the Semiring, and *not* the input type; only `sym` cares. 63 | 64 | Now they add an "indexed semiring," and to it provide a version of `sym` 65 | that returns the `index` value when the predicate is true, and zero otherwise. 66 | 67 | class Semiring s => Semiringi s where 68 | index :: Int -> s 69 | 70 | symi :: Semiringi s => Char -> Glue (Int, Char) s 71 | symi c = symw weight 72 | where weight (pos, x) | x == c = index pos 73 | | otherwise = zero 74 | 75 | But what *is* the `index` semiring? Here's where things get 76 | interesting. Fischer et al. want to encode the length of the longest 77 | submatch. The first thing they do is define submatch as a variant of 78 | accept, with a lead-in that just matches everything. This is okay: 79 | this is a Glushkov machine, so the 'arb' NFA will 80 | almost always be active, but that won't matter to us, because it's not 81 | working with `symi` values. 82 | 83 | submatch :: Semiring s => Glue (Int, c) s -> [c] -> s 84 | submatch r s = 85 | accept (seqw arb (seqw r arb)) (zip [0..] s) 86 | where arb = repw (symw (\_ -> one)) 87 | 88 | So... what are the zero and one of a "longest submatch" operation? The 89 | zero is that no match ever occurred.
The one is that a match is 90 | possible, but hasn't yet occurred. Any other value is a submatch. The 91 | final value is the longest interval of the submatch. 92 | 93 | Fischer et al. break up their semiring into two parts: 94 | 95 | data LeftLong = NoLeftLong | LeftLong Range deriving (Show) 96 | data Range = NoRange | Range Int Int deriving (Show) 97 | 98 | `NoLeftLong` is zero; it could never happen, there was no match. 99 | `NoRange` is the one, meaning it could still happen, it just hasn't 100 | yet. And `Range` is a submatch that has been found. 101 | 102 | For addition (which symbolizes alternation, recall), adding a failure to 103 | anything is the anything, so `add NoLeftLong x = x`, and that's true the 104 | other way. Adding a range with an empty range is just the range, and 105 | adding two ranges is to pick the leftmost of the two, or the longer when both start at the same index. 106 | 107 | For multiplication, again, multiplying by failure is just failure. 108 | Multiplying anything with `NoRange` means that the anything is preserved 109 | unchanged, and multiplying two ranges is a new range with the start of 110 | the first range and the end of the latter range. (Recall that for 111 | Semirings, multiplication is associative but is *not* required to be 112 | commutative. It may *be* commutative for some sets, but it's not a 113 | requirement of semirings and you shouldn't count on commutativity.) 114 | 115 | -------------------------------------------------------------------------------- /haskell/08_Heavyweights/package.yaml: -------------------------------------------------------------------------------- 1 | name: Heavyweights 2 | version: 0.1.0.0 3 | 4 | homepage: https://github.com/elfsternberg/riggedregex#readme 5 | license: MPL-2.0 6 | license-file: LICENSE.md 7 | author: Elf M. Sternberg 8 | maintainer: elf.sternberg@gmail.com 9 | copyright: Copyright ⓒ 2019 Elf M.
Sternberg 10 | category: Regex 11 | build-type: Simple 12 | extra-source-files: README.md 13 | 14 | dependencies: 15 | - base 16 | - containers 17 | 18 | library: 19 | exposed-modules: Heavyweights 20 | ghc-options: -Wall 21 | source-dirs: src 22 | 23 | tests: 24 | test: 25 | main: Tests.hs 26 | source-dirs: test 27 | dependencies: 28 | - Heavyweights 29 | - hspec 30 | -------------------------------------------------------------------------------- /haskell/08_Heavyweights/src/Heavyweights.hs: -------------------------------------------------------------------------------- 1 | {-# LANGUAGE FlexibleInstances #-} 2 | {-# LANGUAGE LambdaCase #-} 3 | 4 | module Heavyweights ( Reg(..), Glue, accept, rigged, riggeds, riggew, submatch, symi, altw, seqw, repw, LeftLong(..) ) where 5 | 6 | import Data.Set hiding (foldl, split) 7 | 8 | data Reg 9 | = Eps 10 | | Sym Bool Char 11 | | Alt Reg Reg 12 | | Seq Reg Reg 13 | | Rep Reg 14 | 15 | -- Just as with the Kleene versions, we're going to exploit the fact 16 | -- that we have a working version. For Rust, we're going to do 17 | -- something a little different. But for now... 18 | -- 19 | -- This is interesting. The paper decides that, to keep the cost of 20 | -- processing down, we're going to cache the results of emptyg and 21 | -- final. One of the prices paid, though, is in the complexity of the 22 | -- data type for our expressions, and that complexity is now managed 23 | -- through factories. 
24 | 25 | class Semiring s where 26 | zero, one :: s 27 | mul, add :: s -> s -> s 28 | 29 | data Glue c s = Glue 30 | { emptyg :: s 31 | , final :: s 32 | , glu :: Glu c s 33 | } 34 | 35 | -- 'Glu' is just the representative of the regex element 36 | -- 'Glue' is the extended representation with cached values 37 | 38 | data Glu c s 39 | = Epsw 40 | | Symw (c -> s) 41 | | Altw (Glue c s) (Glue c s) 42 | | Seqw (Glue c s) (Glue c s) 43 | | Repw (Glue c s) 44 | 45 | epsw :: Semiring s => Glue c s 46 | epsw = Glue {emptyg = one, final = zero, glu = Epsw} 47 | 48 | symw :: Semiring s => (c -> s) -> Glue c s 49 | symw f = Glue {emptyg = zero, final = zero, glu = Symw f} 50 | 51 | altw :: Semiring s => Glue c s -> Glue c s -> Glue c s 52 | altw l r = 53 | Glue 54 | { emptyg = add (emptyg l) (emptyg r), 55 | final = add (final l) (final r), 56 | glu = Altw l r 57 | } 58 | 59 | seqw :: Semiring s => Glue c s -> Glue c s -> Glue c s 60 | seqw l r = 61 | Glue 62 | { emptyg = mul (emptyg l) (emptyg r), 63 | final = add (mul (final l) (emptyg r)) (final r), 64 | glu = Seqw l r 65 | } 66 | 67 | repw :: Semiring s => Glue c s -> Glue c s 68 | repw r = Glue {emptyg = one, final = final r, glu = Repw r} 69 | 70 | -- for my edification, the syntax under Symw is syntax for "replace 71 | -- this value in the created record." 72 | -- > data Foo = Foo { a :: Int, b :: Int } deriving (Show) 73 | -- > (Foo 1 2) { b = 4 } 74 | -- Foo { a = 1, b = 4 } 75 | -- It doesn't seem to be functional, i.e. Foo 1 2 $ { b = 4 } doesn't work. 
76 | 77 | shift :: Semiring s => s -> Glu c s -> c -> Glue c s 78 | shift _ Epsw _ = epsw 79 | shift m (Symw f) c = (symw f) {final = m `mul` f c} 80 | shift m (Seqw l r) c = 81 | seqw 82 | (shift m (glu l) c) 83 | (shift (add (m `mul` (emptyg l)) (final l)) (glu r) c) 84 | shift m (Altw l r) c = altw (shift m (glu l) c) (shift m (glu r) c) 85 | shift m (Repw r) c = repw (shift (m `add` final r) (glu r) c) 86 | 87 | sym :: (Semiring s, Eq c) => c -> Glue c s 88 | sym c = symw (\b -> if b == c then one else zero) 89 | 90 | rigging :: Semiring s => (Char -> Glue t s) -> Reg -> Glue t s 91 | rigging s = 92 | \case 93 | Eps -> epsw 94 | (Sym _ c) -> s c 95 | (Alt p q) -> altw (rigging s p) (rigging s q) 96 | (Seq p q) -> seqw (rigging s p) (rigging s q) 97 | (Rep r) -> repw (rigging s r) 98 | 99 | rigged :: Semiring s => Reg -> Glue Char s 100 | rigged = rigging sym 101 | 102 | syms :: Char -> Glue Char (Set String) 103 | syms c = symw (\b -> if b == c then singleton [c] else zero) 104 | 105 | riggeds :: Reg -> Glue Char (Set String) 106 | riggeds = rigging syms 107 | 108 | instance Semiring (Set String) where 109 | zero = empty 110 | one = singleton "" 111 | add = union 112 | mul a b = Data.Set.map (uncurry (++)) $ cartesianProduct a b 113 | 114 | instance Semiring Int where 115 | zero = 0 116 | one = 1 117 | add = (Prelude.+) 118 | mul = (Prelude.*) 119 | 120 | instance Semiring Bool where 121 | zero = False 122 | one = True 123 | add = (||) 124 | mul = (&&) 125 | 126 | accept :: Semiring s => Glue c s -> [c] -> s 127 | accept r [] = emptyg r 128 | accept r (c:cs) = 129 | final (foldl (shift zero . glu) (shift one (glu r) c) cs) 130 | 131 | submatch :: Semiring s => Glue (Int, c) s -> [c] -> s 132 | submatch r s = 133 | accept (seqw arb (seqw r arb)) (zip [0..] 
s) 134 | where arb = repw (symw (\_ -> one)) 135 | 136 | class Semiring s => Semiringi s where 137 | index :: Int -> s 138 | 139 | symi :: Semiringi s => Char -> Glue (Int, Char) s 140 | symi c = symw weight 141 | where weight (pos, x) | x == c = index pos 142 | | otherwise = zero 143 | 144 | riggew :: Semiringi s => Reg -> Glue (Int, Char) s 145 | riggew = rigging symi 146 | 147 | data Leftmost = NoLeft | Leftmost Start deriving (Show) 148 | data Start = NoStart | Start Int deriving (Show) 149 | 150 | instance Semiring Leftmost where 151 | zero = NoLeft 152 | one = Leftmost NoStart 153 | add NoLeft x = x 154 | add x NoLeft = x 155 | add (Leftmost x) (Leftmost y) = Leftmost (leftmost x y) 156 | where leftmost NoStart NoStart = NoStart 157 | leftmost NoStart (Start i) = Start i 158 | leftmost (Start i) NoStart = Start i 159 | leftmost (Start i) (Start j) = Start (min i j) 160 | mul NoLeft _ = NoLeft 161 | mul _ NoLeft = NoLeft 162 | mul (Leftmost x) (Leftmost y) = Leftmost (start x y) 163 | where start NoStart s = s 164 | start s _ = s 165 | 166 | instance Semiringi Leftmost where 167 | index = Leftmost . Start 168 | 169 | -- Leftlong Implementation! 170 | 171 | data LeftLong = NoLeftLong | NoRange | Range Int Int deriving (Show, Eq) 172 | 173 | instance Semiring LeftLong where 174 | zero = NoLeftLong 175 | one = NoRange 176 | 177 | -- The addition of two leftlongs is the selection 178 | -- of the leftmost of the two, or the longer when 179 | -- both start at the same index. 180 | 181 | add NoLeftLong x = x 182 | add x NoLeftLong = x 183 | add NoRange x = x 184 | add x NoRange = x 185 | add (Range i j) (Range k l) 186 | | i < k || i == k && j > l = Range i j 187 | | otherwise = Range k l 188 | 189 | -- The multiplication of two leftlongs is the longest possible 190 | -- range spanning the leftlongs provided; the zero is still annihilation, 191 | -- the one is still identity, and `mul` here is the start of the left 192 | -- component and the end of the right component.
193 | 194 | mul NoLeftLong _ = NoLeftLong 195 | mul _ NoLeftLong = NoLeftLong 196 | mul NoRange x = x 197 | mul x NoRange = x 198 | mul (Range i _) (Range _ l) = Range i l 199 | 200 | instance Semiringi LeftLong where 201 | index i = Range i i 202 | 203 | -------------------------------------------------------------------------------- /haskell/08_Heavyweights/stack.yaml: -------------------------------------------------------------------------------- 1 | # This file was automatically generated by 'stack init' 2 | # 3 | # Some commonly used options have been documented as comments in this file. 4 | # For advanced use and comprehensive documentation of the format, please see: 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/ 6 | 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version. 8 | # A snapshot resolver dictates the compiler version and the set of packages 9 | # to be used for project dependencies. For example: 10 | # 11 | # resolver: lts-3.5 12 | # resolver: nightly-2015-09-21 13 | # resolver: ghc-7.10.2 14 | # 15 | # The location of a snapshot can be provided as a file or url. Stack assumes 16 | # a snapshot provided as a file might change, whereas a url resource does not. 17 | # 18 | # resolver: ./custom-snapshot.yaml 19 | # resolver: https://example.com/snapshots/2018-01-01.yaml 20 | resolver: lts-13.7 21 | 22 | # User packages to be built. 23 | # Various formats can be used as shown in the example below. 24 | # 25 | # packages: 26 | # - some-directory 27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz 28 | # - location: 29 | # git: https://github.com/commercialhaskell/stack.git 30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a 31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a 32 | # subdirs: 33 | # - auto-update 34 | # - wai 35 | packages: 36 | - . 
37 | # Dependency packages to be pulled from upstream that are not in the resolver 38 | # using the same syntax as the packages field. 39 | # (e.g., acme-missiles-0.3) 40 | # extra-deps: [] 41 | 42 | # Override default flag values for local packages and extra-deps 43 | # flags: {} 44 | 45 | # Extra package databases containing global packages 46 | # extra-package-dbs: [] 47 | 48 | # Control whether we use the GHC we find on the path 49 | # system-ghc: true 50 | # 51 | # Require a specific version of stack, using version ranges 52 | # require-stack-version: -any # Default 53 | # require-stack-version: ">=1.9" 54 | # 55 | # Override the architecture used by stack, especially useful on Windows 56 | # arch: i386 57 | # arch: x86_64 58 | # 59 | # Extra directories used by stack for building 60 | # extra-include-dirs: [/path/to/dir] 61 | # extra-lib-dirs: [/path/to/dir] 62 | # 63 | # Allow a newer minor version of GHC than the snapshot specifies 64 | # compiler-check: newer-minor 65 | -------------------------------------------------------------------------------- /haskell/08_Heavyweights/test/Tests.hs: -------------------------------------------------------------------------------- 1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-} 2 | {-# LANGUAGE RecordWildCards #-} 3 | 4 | import Data.Foldable (for_) 5 | import Test.Hspec (Spec, describe, it, shouldBe) 6 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith) 7 | import Heavyweights (Reg (..), Glue, accept, rigged, riggeds, riggew, submatch, symi, altw, seqw, repw, LeftLong(..)) 8 | import Data.Set 9 | import Data.List (sort) 10 | 11 | main :: IO () 12 | main = hspecWith defaultConfig {configFastFail = True} specs 13 | 14 | msym :: Char -> Reg 15 | msym c = Sym False c 16 | 17 | specs :: Spec 18 | specs = do 19 | 20 | let nocs = Rep ( Alt ( msym 'a' ) ( msym 'b' ) ) 21 | let onec = Seq nocs (msym 'c') 22 | let evencs = Seq ( Rep ( Seq onec onec ) ) nocs 23 | 24 | let as = Alt (msym 'a') (Rep (msym 'a')) 25 
| let bs = Alt (msym 'b') (Rep (msym 'b')) 26 | 27 | -- it "lifted expression" $ 28 | -- (accept (rigged evencs) "acc" :: Bool) `shouldBe` True 29 | 30 | it "lifted expression short" $ 31 | (accept (rigged evencs) "acc" :: Int) `shouldBe` 1 32 | 33 | it "lifted expression counter two" $ 34 | (accept (rigged as) "a" :: Int) `shouldBe` 2 35 | 36 | it "lifted expression counter one" $ 37 | (accept (rigged as) "aa" :: Int) `shouldBe` 1 38 | 39 | it "lifted expression dynamic counter four" $ 40 | (accept (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4 41 | 42 | it "parse forests" $ 43 | (sort $ toList $ (accept (riggeds (Seq as bs)) "ab" :: Set String)) `shouldBe` ["ab"] 44 | 45 | let aa = symi 'a' 46 | let ab = repw (aa `altw` symi 'b') 47 | let aaba = aa `seqw` ab `seqw` aa 48 | 49 | it "submatch noleft" $ 50 | (submatch aaba "ab" :: LeftLong ) `shouldBe` NoLeftLong 51 | 52 | it "submatch shortrange" $ 53 | (submatch aaba "aa" :: LeftLong ) `shouldBe` (Range 0 1) 54 | 55 | it "submatch fullrange" $ 56 | (submatch aaba "bababa" :: LeftLong ) `shouldBe` (Range 1 5) 57 | 58 | for_ cases test 59 | where 60 | test Case {..} = it description assertion 61 | where 62 | assertion = (accept (rigged regex) sample :: Bool) `shouldBe` result 63 | 64 | data Case = Case 65 | { description :: String 66 | , regex :: Reg 67 | , sample :: String 68 | , result :: Bool 69 | } 70 | 71 | cases :: [Case] 72 | cases = 73 | [ Case {description = "empty", regex = Eps, sample = "", result = True} 74 | , Case {description = "char", regex = msym 'a', sample = "a", result = True} 75 | , Case 76 | {description = "not char", regex = msym 'a', sample = "b", result = False} 77 | , Case 78 | { description = "char vs empty" 79 | , regex = msym 'a' 80 | , sample = "" 81 | , result = False 82 | } 83 | , Case 84 | { description = "left alt" 85 | , regex = Alt (msym 'a') (msym 'b') 86 | , sample = "a" 87 | , result = True 88 | } 89 | , Case 90 | { description = "right alt" 91 | , regex = Alt (msym 'a') (msym 
'b') 92 | , sample = "b" 93 | , result = True 94 | } 95 | , Case 96 | { description = "neither alt" 97 | , regex = Alt (msym 'a') (msym 'b') 98 | , sample = "c" 99 | , result = False 100 | } 101 | , Case 102 | { description = "empty alt" 103 | , regex = Alt (msym 'a') (msym 'b') 104 | , sample = "" 105 | , result = False 106 | } 107 | , Case 108 | { description = "empty rep" 109 | , regex = Rep (msym 'a') 110 | , sample = "" 111 | , result = True 112 | } 113 | , Case 114 | { description = "one rep" 115 | , regex = Rep (msym 'a') 116 | , sample = "a" 117 | , result = True 118 | } 119 | , Case 120 | { description = "multiple rep" 121 | , regex = Rep (msym 'a') 122 | , sample = "aaaaaaaaa" 123 | , result = True 124 | } 125 | , Case 126 | { description = "multiple rep with failure" 127 | , regex = Rep (msym 'a') 128 | , sample = "aaaaaaaaab" 129 | , result = False 130 | } 131 | , Case 132 | { description = "sequence" 133 | , regex = Seq (msym 'a') (msym 'b') 134 | , sample = "ab" 135 | , result = True 136 | } 137 | , Case 138 | { description = "sequence with empty" 139 | , regex = Seq (msym 'a') (msym 'b') 140 | , sample = "" 141 | , result = False 142 | } 143 | , Case 144 | { description = "bad short sequence" 145 | , regex = Seq (msym 'a') (msym 'b') 146 | , sample = "a" 147 | , result = False 148 | } 149 | , Case 150 | { description = "bad long sequence" 151 | , regex = Seq (msym 'a') (msym 'b') 152 | , sample = "abc" 153 | , result = False 154 | } 155 | ] 156 | 157 | 158 | -------------------------------------------------------------------------------- /haskell/09_Classed_Brzozowski/BrzExp.cabal: -------------------------------------------------------------------------------- 1 | cabal-version: 1.12 2 | 3 | -- This file has been generated from package.yaml by hpack version 0.31.1. 
4 | -- 5 | -- see: https://github.com/sol/hpack 6 | -- 7 | -- hash: bf664c2df8f31bc5a6ffd907ebaf10824d56d8d271f07d3f4cc7562ce9d44395 8 | 9 | name: BrzExp 10 | version: 0.1.0.0 11 | category: Regex 12 | homepage: https://github.com/elfsternberg/riggedregex#readme 13 | author: Elf M. Sternberg 14 | maintainer: elf.sternberg@gmail.com 15 | copyright: Copyright ⓒ 2019 Elf M. Sternberg 16 | license: BSD3 17 | license-file: LICENSE 18 | build-type: Simple 19 | extra-source-files: 20 | README.md 21 | 22 | library 23 | exposed-modules: 24 | BrzExp 25 | other-modules: 26 | Paths_BrzExp 27 | hs-source-dirs: 28 | src 29 | ghc-options: -Wall 30 | build-depends: 31 | base 32 | default-language: Haskell2010 33 | 34 | test-suite test 35 | type: exitcode-stdio-1.0 36 | main-is: Tests.hs 37 | other-modules: 38 | Paths_BrzExp 39 | hs-source-dirs: 40 | test 41 | build-depends: 42 | BrzExp 43 | , base 44 | , hspec 45 | default-language: Haskell2010 46 | -------------------------------------------------------------------------------- /haskell/09_Classed_Brzozowski/LICENSE: -------------------------------------------------------------------------------- 1 | The Brzozowski experiments are original work, and are copyright and 2 | licensed by Elf M. Sternberg according to the Mozilla 2.0 Public 3 | License. See the LICENSE.md file in the main directory 4 | -------------------------------------------------------------------------------- /haskell/09_Classed_Brzozowski/README.md: -------------------------------------------------------------------------------- 1 | # Brzozowski Regular Expressions, in Haskell 2 | 3 | This is a regex recognizer implementing Brzozowski's Algorithm, in 4 | Haskell. 
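
What sets this variant apart from the earlier Brzozowski experiments is that `Sym` carries a predicate (`Char -> Bool`) rather than a single character, so one node can stand for a whole character class. The sketch below shows one way that might be used. To keep it runnable on its own, it restates `Brz`, `nullable`, `derive` and `accept` in condensed form (the `Alt`/`Emp` shortcuts from `src/BrzExp.hs` are left out) instead of importing the module. The helpers `char`, `digit` and `integer` are hypothetical names made up for this illustration and are not part of the library.

```haskell
module Main where

import Data.Char (isDigit)

-- Condensed restatement of the data type in src/BrzExp.hs.
data Brz = Emp | Eps | Sym (Char -> Bool) | Alt Brz Brz | Seq Brz Brz | Rep Brz

-- Does the expression accept the empty string?
nullable :: Brz -> Bool
nullable Emp       = False
nullable Eps       = True
nullable (Sym _)   = False
nullable (Alt l r) = nullable l || nullable r
nullable (Seq l r) = nullable l && nullable r
nullable (Rep _)   = True

-- The Brzozowski derivative of an expression with respect to one character.
derive :: Brz -> Char -> Brz
derive Emp _       = Emp
derive Eps _       = Emp
derive (Sym p) u   = if p u then Eps else Emp
derive (Alt l r) u = Alt (derive l u) (derive r u)
derive (Seq l r) u
  | nullable l = Alt (Seq (derive l u) r) (derive r u)
  | otherwise  = Seq (derive l u) r
derive (Rep r) u   = Seq (derive r u) (Rep r)

-- Derive once per input character, then ask whether what is left is nullable.
accept :: Brz -> String -> Bool
accept r = nullable . foldl derive r

-- Hypothetical helper: match exactly one given character.
char :: Char -> Brz
char c = Sym (== c)

-- Hypothetical helper: a character class, expressed as a predicate.
digit :: Brz
digit = Sym isDigit

-- Hypothetical example: an optional minus sign followed by one or more digits.
integer :: Brz
integer = Seq (Alt (char '-') Eps) (Seq digit (Rep digit))

main :: IO ()
main = mapM_ (print . accept integer) ["42", "-7", "", "4a2"]
```

Run as a standalone program, this prints `True`, `True`, `False`, `False`, one per line. With the real module, the same expressions should work after `import BrzExp (Brz (..), accept)`, provided the helper definitions are kept.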
5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /haskell/09_Classed_Brzozowski/Setup.hs: -------------------------------------------------------------------------------- 1 | import Distribution.Simple 2 | main = defaultMain 3 | -------------------------------------------------------------------------------- /haskell/09_Classed_Brzozowski/package.yaml: -------------------------------------------------------------------------------- 1 | name: BrzExp 2 | version: 0.1.0.0 3 | 4 | homepage: https://github.com/elfsternberg/riggedregex#readme 5 | license: BSD3 6 | license-file: LICENSE 7 | author: Elf M. Sternberg 8 | maintainer: elf.sternberg@gmail.com 9 | copyright: Copyright ⓒ 2019 Elf M. Sternberg 10 | category: Regex 11 | build-type: Simple 12 | extra-source-files: README.md 13 | 14 | dependencies: 15 | - base 16 | 17 | library: 18 | exposed-modules: BrzExp 19 | ghc-options: -Wall 20 | source-dirs: src 21 | 22 | tests: 23 | test: 24 | main: Tests.hs 25 | source-dirs: test 26 | dependencies: 27 | - BrzExp 28 | - hspec 29 | 30 | -------------------------------------------------------------------------------- /haskell/09_Classed_Brzozowski/src/BrzExp.hs: -------------------------------------------------------------------------------- 1 | module BrzExp ( accept, nullable, Brz (..) 
) where 2 | data Brz = Emp | Eps | Sym (Char -> Bool) | Alt Brz Brz | Seq Brz Brz | Rep Brz 3 | 4 | derive :: Brz -> Char -> Brz 5 | derive Emp _ = Emp 6 | derive Eps _ = Emp 7 | derive (Sym c) u = if (c u) then Eps else Emp 8 | derive (Seq l r) u 9 | | nullable l = Alt (Seq (derive l u) r) (derive r u) 10 | | otherwise = Seq (derive l u) r 11 | 12 | derive (Alt Emp r) u = derive r u 13 | derive (Alt l Emp) u = derive l u 14 | derive (Alt l r) u = Alt (derive r u) (derive l u) 15 | 16 | derive (Rep r) u = Seq (derive r u) (Rep r) 17 | 18 | nullable :: Brz -> Bool 19 | nullable Emp = False 20 | nullable Eps = True 21 | nullable (Sym _) = False 22 | nullable (Alt l r) = nullable l || nullable r 23 | nullable (Seq l r) = nullable l && nullable r 24 | nullable (Rep _) = True 25 | 26 | accept :: Brz -> String -> Bool 27 | accept r [] = nullable r 28 | accept r (s:ss) = accept (derive r s) ss 29 | 30 | 31 | -------------------------------------------------------------------------------- /haskell/09_Classed_Brzozowski/stack.yaml: -------------------------------------------------------------------------------- 1 | # This file was automatically generated by 'stack init' 2 | # 3 | # Some commonly used options have been documented as comments in this file. 4 | # For advanced use and comprehensive documentation of the format, please see: 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/ 6 | 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version. 8 | # A snapshot resolver dictates the compiler version and the set of packages 9 | # to be used for project dependencies. For example: 10 | # 11 | # resolver: lts-3.5 12 | # resolver: nightly-2015-09-21 13 | # resolver: ghc-7.10.2 14 | # 15 | # The location of a snapshot can be provided as a file or url. Stack assumes 16 | # a snapshot provided as a file might change, whereas a url resource does not. 
17 | # 18 | # resolver: ./custom-snapshot.yaml 19 | # resolver: https://example.com/snapshots/2018-01-01.yaml 20 | resolver: lts-13.4 21 | 22 | # User packages to be built. 23 | # Various formats can be used as shown in the example below. 24 | # 25 | # packages: 26 | # - some-directory 27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz 28 | # - location: 29 | # git: https://github.com/commercialhaskell/stack.git 30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a 31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a 32 | # subdirs: 33 | # - auto-update 34 | # - wai 35 | packages: 36 | - . 37 | # Dependency packages to be pulled from upstream that are not in the resolver 38 | # using the same syntax as the packages field. 39 | # (e.g., acme-missiles-0.3) 40 | # extra-deps: [] 41 | 42 | # Override default flag values for local packages and extra-deps 43 | # flags: {} 44 | 45 | # Extra package databases containing global packages 46 | # extra-package-dbs: [] 47 | 48 | # Control whether we use the GHC we find on the path 49 | # system-ghc: true 50 | # 51 | # Require a specific version of stack, using version ranges 52 | # require-stack-version: -any # Default 53 | # require-stack-version: ">=1.9" 54 | # 55 | # Override the architecture used by stack, especially useful on Windows 56 | # arch: i386 57 | # arch: x86_64 58 | # 59 | # Extra directories used by stack for building 60 | # extra-include-dirs: [/path/to/dir] 61 | # extra-lib-dirs: [/path/to/dir] 62 | # 63 | # Allow a newer minor version of GHC than the snapshot specifies 64 | # compiler-check: newer-minor 65 | -------------------------------------------------------------------------------- /haskell/09_Classed_Brzozowski/test/Tests.hs: -------------------------------------------------------------------------------- 1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-} 2 | {-# LANGUAGE RecordWildCards #-} 3 | 4 | import Data.Foldable (for_) 5 | import 
Test.Hspec (Spec, describe, it, shouldBe) 6 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith) 7 | 8 | import BrzExp (Brz (..), accept) 9 | 10 | main :: IO () 11 | main = hspecWith defaultConfig {configFastFail = True} specs 12 | 13 | specs :: Spec 14 | specs = describe "accept" $ for_ cases test 15 | where 16 | test Case {..} = it description assertion 17 | where 18 | assertion = accept regex sample `shouldBe` result 19 | 20 | data Case = Case 21 | { description :: String 22 | , regex :: Brz 23 | , sample :: String 24 | , result :: Bool 25 | } 26 | 27 | symf :: Char -> Brz 28 | symf c = Sym (\u -> c == u) 29 | 30 | -- let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) ) 31 | -- onec = Seq nocs (Sym 'c') 32 | -- evencs = Seq ( Rep ( Seq onec onec ) ) nocs 33 | -- as = Alt (Sym 'a') (Rep (Sym 'a')) 34 | -- bs = Alt (Sym 'b') (Rep (Sym 'b')) 35 | cases :: [Case] 36 | cases = 37 | [ Case {description = "empty", regex = Eps, sample = "", result = True} 38 | , Case {description = "null", regex = Emp, sample = "", result = False} 39 | , Case {description = "char", regex = symf 'a', sample = "a", result = True} 40 | , Case 41 | {description = "not char", regex = symf 'a', sample = "b", result = False} 42 | , Case 43 | { description = "char vs empty" 44 | , regex = symf 'a' 45 | , sample = "" 46 | , result = False 47 | } 48 | , Case 49 | { description = "left alt" 50 | , regex = Alt (symf 'a') (symf 'b') 51 | , sample = "a" 52 | , result = True 53 | } 54 | , Case 55 | { description = "right alt" 56 | , regex = Alt (symf 'a') (symf 'b') 57 | , sample = "b" 58 | , result = True 59 | } 60 | , Case 61 | { description = "neither alt" 62 | , regex = Alt (symf 'a') (symf 'b') 63 | , sample = "c" 64 | , result = False 65 | } 66 | , Case 67 | { description = "empty alt" 68 | , regex = Alt (symf 'a') (symf 'b') 69 | , sample = "" 70 | , result = False 71 | } 72 | , Case 73 | { description = "empty rep" 74 | , regex = Rep (symf 'a') 75 | , sample = "" 76 | , result = 
True 77 | } 78 | , Case 79 | { description = "one rep" 80 | , regex = Rep (symf 'a') 81 | , sample = "a" 82 | , result = True 83 | } 84 | , Case 85 | { description = "multiple rep" 86 | , regex = Rep (symf 'a') 87 | , sample = "aaaaaaaaa" 88 | , result = True 89 | } 90 | , Case 91 | { description = "multiple rep with failure" 92 | , regex = Rep (symf 'a') 93 | , sample = "aaaaaaaaab" 94 | , result = False 95 | } 96 | , Case 97 | { description = "sequence" 98 | , regex = Seq (symf 'a') (symf 'b') 99 | , sample = "ab" 100 | , result = True 101 | } 102 | , Case 103 | { description = "sequence with empty" 104 | , regex = Seq (symf 'a') (symf 'b') 105 | , sample = "" 106 | , result = False 107 | } 108 | , Case 109 | { description = "bad short sequence" 110 | , regex = Seq (symf 'a') (symf 'b') 111 | , sample = "a" 112 | , result = False 113 | } 114 | , Case 115 | { description = "bad long sequence" 116 | , regex = Seq (symf 'a') (symf 'b') 117 | , sample = "abc" 118 | , result = False 119 | } 120 | ] 121 | -------------------------------------------------------------------------------- /node/01_Kleene.ts: -------------------------------------------------------------------------------- 1 | interface Regcom { kind: string }; 2 | class Eps implements Regcom { kind: "eps"; }; 3 | class Sym implements Regcom { kind: "sym"; s: string; } 4 | class Alt implements Regcom { kind: "alt"; l: Regex; r: Regex }; 5 | class Seq implements Regcom { kind: "seq"; l: Regex; r: Regex }; 6 | class Rep implements Regcom { kind: "rep"; r: Regex }; 7 | 8 | function eps(): Eps { return { kind: "eps" }; }; 9 | function sym(c: string): Sym { return { kind: "sym", s: c }; }; 10 | function alt(l: Regex, r: Regex): Alt { return { kind: "alt", l: l, r: r }; }; 11 | function seq(l: Regex, r: Regex): Seq { return { kind: "seq", l: l, r: r }; }; 12 | function rep(r: Regex): Rep { return { kind: "rep", r: r }; }; 13 | 14 | type Regex = Eps | Sym | Alt | Seq | Rep; 15 | 16 | // split :: [a] -> [([a], [a])] 
17 | // split [] = [([], [])] 18 | // split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs] 19 | 20 | function split(s: string) { 21 | if (s.length == 0) { 22 | return [["", ""]]; 23 | } 24 | return [["", s.slice()]].concat(split(s.slice(1)).map((v) => [s[0].slice().concat(v[0].slice()), v[1].slice()])); 25 | } 26 | 27 | // parts :: [a] -> [[[a]]] 28 | // parts [] = [[]] 29 | // parts [c] = [[[c]]] 30 | // parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs] 31 | 32 | function parts(s: string): Array<Array<string>> { 33 | if (s.length == 0) { 34 | return [[]]; 35 | } 36 | 37 | if (s.length == 1) { 38 | return [[s]]; 39 | } 40 | 41 | let c = s[0]; 42 | let cs = s.slice(1); 43 | return parts(cs).reduce((acc, pps) => { 44 | let p: string = pps[0]; 45 | let ps: Array<string> = pps.slice(1); 46 | let l: Array<string> = [c + p].concat(ps); 47 | let r: Array<string> = [c].concat(p).concat(ps); 48 | return acc.concat([l, r]); 49 | }, [[]]).filter((c) => c.length != 0); 50 | } 51 | 52 | function one(a: Array<any>, test: (s: any) => boolean): boolean { 53 | return a.reduce((acc: boolean, sc: any) => acc || test(sc), false); 54 | } 55 | 56 | function all(a: Array<any>, test: (s: any) => boolean): boolean { 57 | return a.reduce((acc: boolean, sc: any) => acc && test(sc), true); 58 | } 59 | 60 | 61 | function accept(r: Regex, s: string): boolean { 62 | switch(r.kind) { 63 | case "eps": 64 | return s.length == 0; 65 | case "sym": 66 | return s.length == 1 && r.s == s[0]; 67 | case "alt": 68 | return accept(r.l, s) || accept(r.r, s); 69 | case "seq": 70 | return split(s).some((v: Array<string>) => accept(r.l, v[0]) && accept(r.r, v[1])); 71 | case "rep": 72 | return parts(s).some((v: Array<string>) => v.every((u: string) => accept(r.r, u))); 73 | } 74 | } 75 | 76 | function run_tests() { 77 | 78 | function assert(l: any) { 79 | console.log(" ", l); 80 | } 81 | 82 | let units = { 83 | test_simple: () => { 84 | let onea = sym("a"); 85 | assert(accept(onea, "a")); 86 | 87 | let nocs = rep(alt(sym("a"), 
sym("b"))); 88 | assert(accept(nocs, "abab")); 89 | }, 90 | 91 | test_seq: () => { 92 | let abc = seq(sym("a"), seq(sym("b"), sym("c"))); 93 | assert(accept(abc, "abc")); 94 | }, 95 | 96 | test_rc: () => { 97 | let ab = seq(sym("a"), sym("b")); 98 | let abab = seq(ab, ab); 99 | assert(accept(abab, "abab")); 100 | }, 101 | 102 | test_fail: () => { 103 | let ab = seq(sym("a"), sym("b")); 104 | let abab = seq(ab, ab); 105 | assert(! accept(abab, "abacb")); 106 | }, 107 | 108 | test_empty_rep: () => { 109 | let a = rep(sym("a")); 110 | assert(accept(a, "")); 111 | }, 112 | 113 | test_some_rep: () => { 114 | let a = rep(sym("a")); 115 | assert(accept(a, "a")); 116 | }, 117 | 118 | test_many_rep: () => { 119 | let a = rep(sym("a")); 120 | assert(accept(a, "aaaaaaa")); 121 | }, 122 | 123 | test_many_rep_dead_l: () => { 124 | let a = rep(sym("a")); 125 | assert(! accept(a, "!aaaaaa")); 126 | }, 127 | 128 | test_many_rep_dead_r: () => { 129 | let a = rep(sym("a")); 130 | assert(! accept(a, "aaaaaa!")); 131 | }, 132 | 133 | test_many_rep_dead_m: () => { 134 | let a = rep(sym("a")); 135 | assert(! 
accept(a, "aaa!aaa")); 136 | }, 137 | 138 | test_two: () => { 139 | let nocs = rep(alt(sym("a"), sym("b"))); 140 | let onec = seq(nocs, sym("c")); 141 | let evencs = seq(rep(seq(onec, onec)), nocs); 142 | assert(accept(evencs, "abcc")); 143 | assert(accept(evencs, "abccababbbbcc")); 144 | } 145 | } 146 | 147 | console.log("Running tests..."); 148 | for (let k of Object.keys(units)) { 149 | console.log(k); units[k](); 150 | } 151 | } 152 | 153 | run_tests(); 154 | -------------------------------------------------------------------------------- /python/01_rigged_brzozowski.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | from collections import namedtuple 4 | import re 5 | 6 | Emp = namedtuple("Emp", []) 7 | Eps = namedtuple("Eps", ["tok"]) 8 | Sym = namedtuple("Sym", ["c"]) 9 | Alt = namedtuple("Alt", ["l", "r"]) 10 | Seq = namedtuple("Seq", ["l", "r"]) 11 | Rep = namedtuple("Rep", ["r"]) 12 | Del = namedtuple("Del", ["r"]) 13 | 14 | cname = re.compile(r'^(\w+)\(') 15 | 16 | 17 | def cn(s): 18 | """ Find the canonical name of the regex op""" 19 | return cname.match(s.__doc__).group(1) 20 | 21 | 22 | def derive(r, c): 23 | """ Take a regex op and a character, and return the derivative regex op.""" 24 | def sym(r, c): 25 | if c == r.c: 26 | return Eps(set([c])) 27 | return Emp() 28 | 29 | def alt(r, c): 30 | l1 = derive(r.l, c) 31 | r1 = derive(r.r, c) 32 | if cn(l1) == 'Emp': 33 | return r1 34 | if cn(r1) == 'Emp': 35 | return l1 36 | return Alt(l1, r1) 37 | 38 | def seq(r, c): 39 | return Alt(Seq(derive(r.l, c), r.r), 40 | Seq(Del(r.l), derive(r.r, c))) 41 | 42 | def rep(r, c): 43 | return Seq(derive(r.r, c), r) 44 | 45 | def emp(r, c): 46 | return Emp() 47 | 48 | nextfn = { 49 | "Emp": emp, 50 | "Eps": emp, 51 | "Del": emp, 52 | "Sym": sym, 53 | "Alt": alt, 54 | "Seq": seq, 55 | "Rep": rep, 56 | }.get(cn(r)) 57 | 58 | return nextfn(r, c) 59 | 60 | 61 | def parsenull(r): 62 | """ Extract the 
generated parse forest from the residual regular expression.""" 63 | 64 | def emp(r): return set() 65 | 66 | def eps(r): return r.tok 67 | 68 | def sym(r): return set([""]) 69 | 70 | def alt(r): return parsenull(r.l).union(parsenull(r.r)) 71 | 72 | def seq(r): return set([i + j 73 | for j in parsenull(r.r) 74 | for i in parsenull(r.l)]) 75 | 76 | def one(r): return parsenull(r.r) 77 | 78 | nextfn = { 79 | "Emp": emp, 80 | "Sym": emp, 81 | "Rep": sym, 82 | "Del": one, 83 | "Eps": eps, 84 | "Alt": alt, 85 | "Seq": seq 86 | }.get(cn(r)) 87 | 88 | return nextfn(r) 89 | 90 | 91 | def parse(r, s): 92 | """Iterate through the string, generating a new regular expression for each character, until done.""" 93 | head = r 94 | for i in s: 95 | print(head, "\n") 96 | head = derive(head, i) 97 | print(head) 98 | return parsenull(head) 99 | 100 | 101 | if __name__ == '__main__': 102 | nocs = Rep(Alt(Sym('a'), (Sym('b')))) 103 | onec = Seq(nocs, Sym('c')) 104 | evencs = Seq(Rep(Seq(onec, onec)), nocs) 105 | 106 | aas = Alt(Sym('a'), Rep(Sym('a'))) 107 | bbs = Alt(Sym('b'), Rep(Sym('b'))) 108 | 109 | 110 | # print(parse(evencs, "acc")) 111 | 112 | sym = Seq(Sym('a'), Seq(Rep(Sym('b')), Sym('c'))) 113 | parse(sym, "ac") 114 | -------------------------------------------------------------------------------- /python/02_rigged_brzozowski.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | from collections import namedtuple 4 | import re 5 | 6 | Emp = namedtuple("Emp", []) 7 | Eps = namedtuple("Eps", ["tok"]) 8 | Sym = namedtuple("Sym", ["c"]) 9 | Alt = namedtuple("Alt", ["l", "r"]) 10 | Seq = namedtuple("Seq", ["l", "r"]) 11 | Rep = namedtuple("Rep", ["r"]) 12 | 13 | cname = re.compile(r'^(\w+)\(') 14 | 15 | 16 | def cn(s): 17 | """ Find the canonical name of the regex op""" 18 | return cname.match(s.__doc__).group(1) 19 | 20 | 21 | def derive(r, c): 22 | """ Take a regex op and a character, and return the derivative 
regex op.""" 23 | def sym(r, c): 24 | if c == r.c: 25 | return Eps(set([c])) 26 | return Emp() 27 | 28 | def alt(r, c): 29 | l1 = derive(r.l, c) 30 | r1 = derive(r.r, c) 31 | if cn(l1) == 'Emp': 32 | return r1 33 | if cn(r1) == 'Emp': 34 | return l1 35 | return Alt(l1, r1) 36 | 37 | def seq(r, c): 38 | if nullable(r.l): 39 | return Alt(Seq(derive(r.l, c), r.r), derive(r.r, c)) 40 | return Seq(derive(r.l, c), r.r) 41 | 42 | def rep(r, c): 43 | return Seq(derive(r.r, c), r) 44 | 45 | def emp(r, c): 46 | return Emp() 47 | 48 | nextfn = { 49 | "Emp": emp, 50 | "Eps": emp, 51 | "Sym": sym, 52 | "Alt": alt, 53 | "Seq": seq, 54 | "Rep": rep, 55 | }.get(cn(r)) 56 | 57 | return nextfn(r, c) 58 | 59 | def nullable(r): 60 | def zer(r): return False 61 | def one(r): return True 62 | def alt(r): return nullable(r.l) or nullable(r.r) 63 | def seq(r): return nullable(r.l) and nullable(r.r) 64 | 65 | nextfn = { 66 | "Emp": zer, 67 | "Sym": zer, 68 | "Rep": one, 69 | "Eps": one, 70 | "Alt": alt, 71 | "Seq": seq 72 | }.get(cn(r)) 73 | 74 | return nextfn(r) 75 | 76 | def parsenull(r): 77 | """ Extract the generated parse forest from the residual regular expression.""" 78 | 79 | def emp(r): return set() 80 | 81 | def eps(r): return r.tok 82 | 83 | def sym(r): return set([""]) 84 | 85 | def alt(r): return parsenull(r.l).union(parsenull(r.r)) 86 | 87 | def seq(r): return set([i + j 88 | for j in parsenull(r.r) 89 | for i in parsenull(r.l)]) 90 | 91 | nextfn = { 92 | "Emp": emp, 93 | "Sym": emp, 94 | "Rep": sym, 95 | "Eps": eps, 96 | "Alt": alt, 97 | "Seq": seq 98 | }.get(cn(r)) 99 | 100 | return nextfn(r) 101 | 102 | 103 | def parse(r, s): 104 | """Iterate through the string, generating a new regular expression for each character, until done.""" 105 | head = r 106 | for i in s: 107 | print(head, "\n") 108 | head = derive(head, i) 109 | print(head) 110 | return parsenull(head) 111 | 112 | 113 | if __name__ == '__main__': 114 | # nocs = Rep(Alt(Sym('a'), (Sym('b')))) 115 | # onec = 
Seq(nocs, Sym('c')) 116 | # evencs = Seq(Rep(Seq(onec, onec)), nocs) 117 | # 118 | # aas = Alt(Sym('a'), Rep(Sym('a'))) 119 | # bbs = Alt(Sym('b'), Rep(Sym('b'))) 120 | # 121 | 122 | sym = Seq(Sym('a'), Seq(Rep(Sym('b')), Sym('c'))) 123 | parse(sym, "ac") 124 | -------------------------------------------------------------------------------- /python/README.md: -------------------------------------------------------------------------------- 1 | # Python Experiments! 2 | 3 | This directory contains some simple experiments, in Python. Python is, 4 | frankly, easier to instrument than Haskell, so figuring out the 5 | underlying operation and stepping through it with pdb can sometimes be 6 | easier to do in Python 3. 7 | 8 | `01_rigged_brzozowski.py`: A naive implementation of Brzozowski's regular 9 | expression library, using the `Delta` operator to distinguish between 10 | nullable and not-nullable branches of the `Sequence` operator. What's 11 | remarkable about it, if anything, is just *how much* it resembles 12 | Haskell Experiment 05: Rigged Brzozowski Regular Expressions. Part of 13 | that is using the `namedtuple` as an easy hack for Haskell's data 14 | constructors, and then implementing the `derive()` and `parsenull()` 15 | functions using map functions as a substitute for Haskell's pattern 16 | matching. 17 | 18 | This is mostly proof that "One can write Haskell poorly in any 19 | language." 20 | -------------------------------------------------------------------------------- /rust/01_simpleregex/Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "simpleregex" 3 | version = "0.1.0" 4 | authors = ["Elf M. Sternberg "] 5 | edition = "2018" 6 | 7 | -------------------------------------------------------------------------------- /rust/01_simpleregex/README.md: -------------------------------------------------------------------------------- 1 | # Kleene Regular Expressions, in Rust. 
2 | 3 | This is literally the definition of a simple string recognizing regular 4 | expression in Rust. It consists of the `Reg` datatype encompassing 5 | the five standard operations of regular expressions and an `accept` 6 | function that takes the expression and a string and returns a Boolean 7 | yes/no on recognition or failure. It is a direct implementation of 8 | Kleene's algebra: 9 | 10 | L[[ε]] = {ε} 11 | L[[a]] = {a} 12 | L[[r · s]] = {u · v | u ∈ L[[r]] and v ∈ L[[s]]} 13 | L[[r | s]] = L[[r]] ∪ L[[s]] 14 | L[[r*]] = {ε} ∪ L[[r · r*]] 15 | 16 | Those equations are for: recognizing an empty string, recognizing a 17 | letter, recognizing two expressions in sequence, recognizing two 18 | alternative expressions, and the repetition operation. 19 | 20 | Composition is by simple reference-counted pointers to child 21 | expressions. I've provided convenient constructor functions to make the 22 | creation of new regexes easier. 23 | 24 | The `accept` function has two helper functions that split the string, 25 | and all substrings, into all possible substrings such that *every 26 | possible combination* of string and expression are tested, and if the 27 | resulting tree of `and`s (from Sequencing and Repetition) and `or`s 28 | (from Alternation) has at least one complete collection of `True` from 29 | top to bottom then the function returns true. 30 | 31 | This generation and comparison of substrings is grossly inefficient; a 32 | string of eight 'a's with `a*` will take 30 seconds on a modern laptop; 33 | increase that to twelve and you'll be waiting about an hour. The cost 34 | is `2^(n - 1)`, where `n` is the length of the string; this is a 35 | consequence of the sequencing operation. Sequences aren't just about 36 | letters: they could be about anything, including repetition (which 37 | itself creates new sequences) and other sequences, and the cost of 38 | examining every possible combination of sequencing creates this 39 | exponential cost. 
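To make the `2^(n - 1)` figure concrete, here is a minimal sketch that enumerates every way of cutting a string into contiguous, non-empty pieces and counts the results. It assumes a `parts`-style helper shaped like the one described above; `partition_count` is a hypothetical name used only for this illustration and is not part of the crate.

```rust
// Sketch only: enumerate every partition of a slice into contiguous,
// non-empty pieces, in the spirit of the `parts` helper described above.
pub fn parts(s: &[char]) -> Vec<Vec<Vec<char>>> {
    if s.is_empty() {
        return vec![vec![]];
    }
    if s.len() == 1 {
        return vec![vec![s.to_vec()]];
    }
    let mut ret = Vec::new();
    for pps in parts(&s[1..]) {
        // Either glue the head character onto the first piece...
        let mut glued = vec![s[0]];
        glued.extend_from_slice(&pps[0]);
        let mut left = vec![glued];
        left.extend_from_slice(&pps[1..]);
        ret.push(left);
        // ...or keep the head character as a piece of its own.
        let mut right = vec![vec![s[0]]];
        right.extend_from_slice(&pps);
        ret.push(right);
    }
    ret
}

// Hypothetical helper: how many partitions a string of `n` characters has.
pub fn partition_count(n: usize) -> usize {
    let s: Vec<char> = std::iter::repeat('a').take(n).collect();
    parts(&s).len()
}
```

Each extra character doubles the number of partitions, so four characters yield 8 partitions and eight characters yield 128, and repetition has to test the inner expression against the pieces of every one of them.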
40 | 41 | While not as clean (no pun intended) as the Haskell version, especially 42 | in the helper functions, it's still surprisingly easy to read, and the 43 | `accept` function is almost line-for-line as clear as the Haskell 44 | version. The use of `.any` and `.all` for the `or` and `and` functions 45 | makes a lot of sense here. 46 | 47 | ## License 48 | 49 | As this is entirely my work, it is copyright (c) 2019, and licensed 50 | under the Mozilla Public License v. 2.0. See the 51 | [LICENSE.md](../../LICENSE.md) in the root directory. 52 | -------------------------------------------------------------------------------- /rust/01_simpleregex/src/lib.rs: -------------------------------------------------------------------------------- 1 | use std::rc::Rc; 2 | 3 | // data Reg = Eps | Sym Char | Alt Reg Reg | Seq Reg Reg | Rep Reg 4 | 5 | #[derive(Debug)] 6 | pub enum Reg { 7 | Eps, 8 | Sym(char), 9 | Alt(Rc<Reg>, Rc<Reg>), 10 | Seq(Rc<Reg>, Rc<Reg>), 11 | Rep(Rc<Reg>), 12 | } 13 | 14 | // Some rust-specific helpers to make constructing regular expressions 15 | // easier. 
16 | 17 | pub fn eps() -> Rc<Reg> { 18 | Rc::new(Reg::Eps) 19 | } 20 | pub fn sym(c: char) -> Rc<Reg> { 21 | Rc::new(Reg::Sym(c)) 22 | } 23 | pub fn alt(r1: &Rc<Reg>, r2: &Rc<Reg>) -> Rc<Reg> { 24 | Rc::new(Reg::Alt(r1.clone(), r2.clone())) 25 | } 26 | pub fn seq(r1: &Rc<Reg>, r2: &Rc<Reg>) -> Rc<Reg> { 27 | Rc::new(Reg::Seq(r1.clone(), r2.clone())) 28 | } 29 | pub fn rep(r1: &Rc<Reg>) -> Rc<Reg> { 30 | Rc::new(Reg::Rep(r1.clone())) 31 | } 32 | 33 | // split :: [a] -> [([a], [a])] 34 | // split [] = [([], [])] 35 | // split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs] 36 | 37 | pub fn split(s: &[char]) -> Vec<(Vec<char>, Vec<char>)> { 38 | if s.is_empty() { 39 | return vec![(vec![], vec![])]; 40 | } 41 | 42 | let mut ret = vec![(vec![], s.to_vec())]; 43 | let c = s[0]; 44 | 45 | fn permute(c: char, s1: &mut Vec<char>, s2: &[char]) -> (Vec<char>, Vec<char>) { 46 | let mut r1 = vec![c]; 47 | r1.append(s1); 48 | (r1, s2.to_vec()) 49 | } 50 | 51 | ret.append( 52 | &mut split(&s[1..]) 53 | .iter_mut() 54 | .map(|(s1, s2)| permute(c, s1, &s2)) 55 | .collect(), 56 | ); 57 | ret 58 | } 59 | 60 | // parts :: [a] -> [[[a]]] 61 | // parts [] = [[]] 62 | // parts [c] = [[[c]]] 63 | // parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs] 64 | 65 | // This was challenging to port to Rust. Haskell's automatic 66 | // conversion of [Char] to String obscured what was going on under the 67 | // covers. 68 | // 69 | // parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs] 70 | // The two elements are: 71 | // 72 | // - ([c]:[[p]]):[[[ps]]] 73 | // The char 'c' is converted to a string, and that string is consed to 74 | // list 'p', and then list 'p' is consed onto the list 'ps' 75 | // 76 | // - [[c]]:[[p]]:[[[ps]]] 77 | // The char 'c' is made into a string and then wrapped in a list, and 78 | // then [[p]] and [[c]] are both consed onto list 'ps' 79 | // 80 | // It really took writing it all out on paper to understand the order of 81 | // operations. 
82 | 83 | pub fn parts(s: &[char]) -> Vec<Vec<Vec<char>>> { 84 | if s.is_empty() { 85 | return vec![vec![]]; 86 | } 87 | if s.len() == 1 { 88 | return vec![vec![s.to_vec()]]; 89 | } 90 | 91 | let head = s[0]; 92 | let tail = &s[1..]; 93 | 94 | let mut ret = vec![]; 95 | for pps in parts(tail) { 96 | let phead = &pps[0]; 97 | let ptail = &pps[1..]; 98 | 99 | let mut left = vec![head]; 100 | left.append(&mut phead.to_vec()); 101 | 102 | let mut left_1 = vec![left]; 103 | left_1.append(&mut ptail.to_vec()); 104 | ret.push(left_1); 105 | 106 | let mut right = vec![vec![head]]; 107 | right.push(phead.to_vec()); 108 | right.append(&mut ptail.to_vec()); 109 | ret.push(right); 110 | } 111 | ret 112 | } 113 | 114 | // accept :: Reg -> String -> Bool 115 | // accept Eps u = null u 116 | // accept (Sym c) u = u == [c] 117 | // accept (Alt p q) u = accept p u || accept q u 118 | // accept (Seq p q) u = or [accept p u1 && accept q u2 | (u1, u2) <- split u] 119 | // accept (Rep r) u = or [and [accept r ui | ui <- ps] | ps <- parts u] 120 | 121 | pub fn accept(r: &Reg, s: &[char]) -> bool { 122 | match r { 123 | Reg::Eps => s.is_empty(), 124 | Reg::Sym(c) => (s.len() == 1 && s[0] == *c), 125 | Reg::Alt(r1, r2) => accept(&r1, s) || accept(&r2, s), 126 | Reg::Seq(r1, r2) => split(s) 127 | .into_iter() 128 | .any(|(u1, u2)| accept(r1, &u1) && accept(r2, &u2)), 129 | Reg::Rep(r) => parts(s) 130 | .into_iter() 131 | .any(|ps| ps.into_iter().all(|u| accept(r, &u))), 132 | } 133 | } 134 | 135 | #[cfg(test)] 136 | mod tests { 137 | use super::*; 138 | 139 | fn vectostr(r: &(Vec<char>, Vec<char>)) -> (String, String) { 140 | let (a, b) = r; 141 | let c: String = a.into_iter().collect(); 142 | let d: String = b.into_iter().collect(); 143 | (c, d) 144 | } 145 | 146 | #[test] 147 | fn test_split() { 148 | let c1: Vec<char> = String::from("").chars().into_iter().collect(); 149 | let s: Vec<(String, String)> = split(&c1).into_iter().map(|r| vectostr(&r)).collect(); 150 | assert_eq!(s, [("".to_string(), "".to_string())]); 151 
| } 152 | 153 | #[test] 154 | fn test_simple() { 155 | let c1: Vec<char> = String::from("acc").chars().into_iter().collect(); 156 | assert_eq!(c1, ['a', 'c', 'c']); 157 | 158 | let c2: Vec<char> = String::from("a").chars().into_iter().collect(); 159 | let onea = sym('a'); 160 | assert!(accept(&onea, &c2)); 161 | 162 | let c3: Vec<char> = String::from("abab").chars().into_iter().collect(); 163 | let nocs = rep(&alt(&sym('a'), &sym('b'))); 164 | assert!(accept(&nocs, &c3)); 165 | } 166 | 167 | #[test] 168 | fn test_seq() { 169 | let c3: Vec<char> = String::from("abc").chars().into_iter().collect(); 170 | let abc = seq(&sym('a'), &seq(&sym('b'), &sym('c'))); 171 | assert!(accept(&abc, &c3)); 172 | } 173 | 174 | #[test] 175 | fn test_rc() { 176 | let c3: Vec<char> = String::from("abab").chars().into_iter().collect(); 177 | let ab = seq(&sym('a'), &sym('b')); 178 | let abab = seq(&ab, &ab); 179 | assert!(accept(&abab, &c3)); 180 | } 181 | 182 | #[test] 183 | fn test_empty_rep() { 184 | let c3: Vec<char> = String::from("").chars().into_iter().collect(); 185 | let a = rep(&sym('a')); 186 | assert!(accept(&a, &c3)); 187 | } 188 | 189 | #[test] 190 | fn test_two() { 191 | let c4: Vec<char> = String::from("abcc").chars().into_iter().collect(); 192 | let nocs = rep(&alt(&sym('a'), &sym('b'))); 193 | let onec = seq(&nocs, &sym('c')); 194 | let evencs = seq(&rep(&seq(&onec, &onec)), &nocs); 195 | assert!(accept(&evencs, &c4)) 196 | } 197 | } 198 | -------------------------------------------------------------------------------- /rust/02_riggedregex/Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "riggedregex" 3 | version = "0.1.0" 4 | authors = ["Elf M. 
Sternberg "] 5 | edition = "2018" 6 | 7 | [dependencies] 8 | num-traits = "0.2.6" 9 | -------------------------------------------------------------------------------- /rust/02_riggedregex/README.md: -------------------------------------------------------------------------------- 1 | # Kleene Regular Expressions with Rigging, in Rust 2 | 3 | This program builds on the simple regular expressions in Version 01, 4 | providing a new definition of a regular expression `Regw` that takes two 5 | types, a source type and an output type. The output type must be a 6 | [*Semiring*](https://en.wikipedia.org/wiki/Semiring). 7 | 8 | A semiring is a set R equipped with two binary operations + and ⋅, and 9 | two constants identified as 0 and 1. By providing a semiring to the 10 | regular expression, we change the return type of the regular expression 11 | to any set that can obey the semiring laws. There's a surprising amount 12 | of stuff you can do with the semiring laws. 13 | 14 | In this example, I've provided a function, `rigged`, that takes a 15 | simple regular expression from Version 01, and wraps or extracts 16 | the contents of that regular expression into the `Regw` datatype. 17 | Instead of the boolean mathematics of Version 01, we use the semiring 18 | symbols `add` and `mul` to represent the sum and product operations on 19 | the return type. We then define the "symbol accepted" boolean to return 20 | either the `zero` or `one` type of the semiring. 21 | 22 | I've provided two semirings: One of (0, 1, +, *, Integers), and one of 23 | (False, True, ||, &&, Booleans). Both work well. 24 | 25 | Rust isn't nearly as magical as Haskell. (See the Readme in the 26 | equivalent Haskell version for my comments on that.) On the other hand, 27 | it's not necessary to define a Semiring explicitly; instead, we define a 28 | nominative type, a struct containing our real return type, and then 29 | provide implementations of One, Zero, Mul, and Add for that type. 
Here, 30 | my two semirings are named `Recognizer` and `Ambigcounter`, and to make 31 | them work we have to say that our recognizer is a `Regw`; 32 | Rust won't magically glue everything together the way Haskell will. 33 | 34 | Still, this was a straightforward implementation of the rigged regular 35 | expression, and is a good stepping stone for future projects. 36 | 37 | ## License 38 | 39 | As this is entirely my work, it is copyright (c) 2019, and licensed 40 | under the Mozilla Public License v. 2.0. See the 41 | [LICENSE.md](../../LICENSE.md) in the root directory. 42 | -------------------------------------------------------------------------------- /rust/03_brzozowski_1/.gitignore: -------------------------------------------------------------------------------- 1 | /target 2 | **/*.rs.bk 3 | Cargo.lock 4 | -------------------------------------------------------------------------------- /rust/03_brzozowski_1/Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "sbrz" 3 | version = "0.1.0" 4 | authors = ["Elf M. Sternberg "] 5 | edition = "2018" 6 | 7 | [dependencies] 8 | -------------------------------------------------------------------------------- /rust/03_brzozowski_1/README.md: -------------------------------------------------------------------------------- 1 | # Brzozowski Regular Expressions, in Rust 2 | 3 | This is a regex recognizer implementing Brzozowski's Algorithm, in Rust. 4 | It has two standard optimizations (null branches are automatically 5 | pruned), and with those it works fine. 6 | 7 | This version implements regular expressions as they appear in the Racket 8 | version, without nullability optimizations (the so-called "rerp" 9 | implementation). 10 | 11 | ## License 12 | 13 | As this is entirely my work, it is copyright (c) 2019, and licensed 14 | under the Mozilla Public License v. 2.0. See the 15 | [LICENSE.md](../../LICENSE.md) in the root directory. 
16 | -------------------------------------------------------------------------------- /rust/03_brzozowski_1/src/lib.rs: -------------------------------------------------------------------------------- 1 | use std::ops::Deref; 2 | use std::rc::Rc; 3 | 4 | #[derive(Debug)] 5 | pub enum Brz { 6 | Emp, 7 | Eps, 8 | Sym(char), 9 | Alt(Rc<Brz>, Rc<Brz>), 10 | Seq(Rc<Brz>, Rc<Brz>), 11 | Rep(Rc<Brz>), 12 | } 13 | 14 | macro_rules! matches { 15 | ($expression:expr, $($pattern:tt)+) => { 16 | match $expression { 17 | $($pattern)+ => true, 18 | _ => false 19 | } 20 | } 21 | } 22 | 23 | macro_rules! cond { 24 | ($($pred:expr => $body:block),+ ,_ => $default:block) => { 25 | { 26 | $(if $pred $body else)+ 27 | $default 28 | } 29 | } 30 | } 31 | 32 | pub fn emp() -> Rc<Brz> { 33 | Rc::new(Brz::Emp) 34 | } 35 | 36 | pub fn eps() -> Rc<Brz> { 37 | Rc::new(Brz::Eps) 38 | } 39 | 40 | pub fn sym(c: char) -> Rc<Brz> { 41 | Rc::new(Brz::Sym(c)) 42 | } 43 | 44 | pub fn alt(r1: &Rc<Brz>, r2: &Rc<Brz>) -> Rc<Brz> { 45 | cond!( 46 | matches!(r1.deref(), Brz::Emp) => { r2.clone() }, 47 | matches!(r2.deref(), Brz::Emp) => { r1.clone() }, 48 | _ => { Rc::new(Brz::Alt(r1.clone(), r2.clone())) } 49 | ) 50 | } 51 | 52 | pub fn seq(r1: &Rc<Brz>, r2: &Rc<Brz>) -> Rc<Brz> { 53 | cond!( 54 | matches!(r1.deref(), Brz::Emp) => { emp() }, 55 | matches!(r2.deref(), Brz::Emp) => { emp() }, 56 | _ => { Rc::new(Brz::Seq(r1.clone(), r2.clone())) } 57 | ) 58 | } 59 | 60 | pub fn rep(r1: &Rc<Brz>) -> Rc<Brz> { 61 | Rc::new(Brz::Rep(r1.clone())) 62 | } 63 | 64 | pub fn derive(n: &Rc<Brz>, c: char) -> Rc<Brz> { 65 | use self::Brz::*; 66 | 67 | match n.deref() { 68 | Emp => emp(), 69 | Eps => emp(), 70 | Sym(u) => { 71 | if c == *u { 72 | eps() 73 | } else { 74 | emp() 75 | } 76 | } 77 | Seq(l, r) => { 78 | let s = seq(&derive(l, c), r); 79 | if nullable(l) { 80 | alt(&s, &derive(r, c)) 81 | } else { 82 | s 83 | } 84 | } 85 | Alt(l, r) => alt(&derive(l, c), &derive(r, c)), 86 | Rep(r) => seq(&derive(r, c), &n.clone()), 87 | } 88 | } 89 | 90 | pub fn nullable(n: &Rc<Brz>) -> bool { 91 | use self::Brz::*; 
92 | 93 | match n.deref() { 94 | Emp => false, 95 | Eps => true, 96 | Sym(_) => false, 97 | Seq(l, r) => nullable(l) && nullable(r), 98 | Alt(l, r) => nullable(l) || nullable(r), 99 | Rep(_) => true, 100 | } 101 | } 102 | 103 | pub fn accept(n: &Rc<Brz>, s: String) -> bool { 104 | use self::Brz::*; 105 | 106 | let mut source = s.chars().peekable(); 107 | let mut r = n.clone(); 108 | loop { 109 | match source.next() { 110 | None => break nullable(&r), 111 | Some(ref c) => { 112 | let np = derive(&r, *c); 113 | println!("{:?}", np); 114 | match np.deref() { 115 | Emp => return false, 116 | Eps => { 117 | break match source.peek() { 118 | None => true, 119 | Some(_) => false, 120 | }; 121 | } 122 | _ => r = np.clone(), 123 | } 124 | } 125 | } 126 | } 127 | } 128 | 129 | #[cfg(test)] 130 | mod tests { 131 | use super::*; 132 | 133 | #[test] 134 | fn basics() { 135 | let cases = [ 136 | ("empty", eps(), "", true), 137 | ("null", emp(), "", false), 138 | ("char", sym('a'), "a", true), 139 | ("not char", sym('a'), "b", false), 140 | ("char vs empty", sym('a'), "", false), 141 | ("left alt", alt(&sym('a'), &sym('b')), "a", true), 142 | ("right alt", alt(&sym('a'), &sym('b')), "b", true), 143 | ("neither alt", alt(&sym('a'), &sym('b')), "c", false), 144 | ("empty alt", alt(&sym('a'), &sym('b')), "", false), 145 | ("empty rep", rep(&sym('a')), "", true), 146 | ("one rep", rep(&sym('a')), "a", true), 147 | ("short multiple failed rep", rep(&sym('a')), "ab", false), 148 | ("multiple rep", rep(&sym('a')), "aaaaaaaaa", true), 149 | ( 150 | "multiple rep with failure", 151 | rep(&sym('a')), 152 | "aaaaaaaaab", 153 | false, 154 | ), 155 | ("sequence", seq(&sym('a'), &sym('b')), "ab", true), 156 | ("sequence with empty", seq(&sym('a'), &sym('b')), "", false), 157 | ("bad short sequence", seq(&sym('a'), &sym('b')), "a", false), 158 | ("bad long sequence", seq(&sym('a'), &sym('b')), "abc", false), 159 | ]; 160 | 161 | for (name, case, sample, result) in &cases { 162 | println!("{:?}", 
name); 163 | assert_eq!(accept(case, sample.to_string()), *result); 164 | } 165 | } 166 | } 167 | -------------------------------------------------------------------------------- /rust/04_brzozowski_2/.gitignore: -------------------------------------------------------------------------------- 1 | /target 2 | **/*.rs.bk 3 | Cargo.lock 4 | -------------------------------------------------------------------------------- /rust/04_brzozowski_2/Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "sbrz" 3 | version = "0.1.0" 4 | authors = ["Elf M. Sternberg "] 5 | edition = "2018" 6 | 7 | [dependencies] 8 | -------------------------------------------------------------------------------- /rust/04_brzozowski_2/README.md: -------------------------------------------------------------------------------- 1 | # Brzozowski Regular Expressions, in Rust 2 | 3 | This is a regex recognizer implementing Brzozowski's Algorithm, in Rust. 4 | It has two standard optimizations (null branches are automatically 5 | pruned), and with those it works fine. 6 | 7 | This version implements regular expressions as they appear in my own 8 | Haskell version, without nullability optimizations (the 9 | so-called "rerp" implementation). 10 | 11 | The difference between the two baseline implementations is that this one 12 | attempts to "object orient" the code, creating an implementation that 13 | can be modified without having to touch many portions of the code, by 14 | isolating the 'derive' and 'nullability' tests into their own 15 | implementation of the BrzNode. I consider the experiment something of a 16 | failure, in that to "work around" Rust's lack of inheritance I had to do 17 | some fairly wacky things to teach Rust how to look stuff up. 18 | 19 | ## License 20 | 21 | As this is entirely my work, it is copyright (c) 2019, and licensed 22 | under the Mozilla Public License v. 2.0. 
See the 23 | [LICENSE.md](../../LICENSE.md) in the root directory. 24 | -------------------------------------------------------------------------------- /rust/04_brzozowski_2/src/lib.rs: -------------------------------------------------------------------------------- 1 | use std::ops::Deref; 2 | use std::rc::Rc; 3 | 4 | #[derive(Debug)] 5 | pub struct Emp; 6 | #[derive(Debug)] 7 | pub struct Eps; 8 | #[derive(Debug)] 9 | pub struct Sym(char); 10 | #[derive(Debug)] 11 | pub struct Alt(Rc<Brz>, Rc<Brz>); 12 | #[derive(Debug)] 13 | pub struct Seq(Rc<Brz>, Rc<Brz>); 14 | #[derive(Debug)] 15 | pub struct Rep(Rc<Brz>); 16 | 17 | #[derive(Debug)] 18 | pub enum Brz { 19 | Emp(Emp), 20 | Eps(Eps), 21 | Sym(Sym), 22 | Alt(Alt), 23 | Seq(Seq), 24 | Rep(Rep), 25 | } 26 | 27 | impl Brz { 28 | fn derive(&self, c: char) -> Rc<Brz> { 29 | match self { 30 | Brz::Emp(emp) => emp.derive(c), 31 | Brz::Eps(eps) => eps.derive(c), 32 | Brz::Sym(sym) => sym.derive(c), 33 | Brz::Alt(alt) => alt.derive(c), 34 | Brz::Seq(seq) => seq.derive(c), 35 | Brz::Rep(rep) => rep.derive(c), 36 | } 37 | } 38 | 39 | fn nullable(&self) -> bool { 40 | match self { 41 | Brz::Emp(emp) => emp.nullable(), 42 | Brz::Eps(eps) => eps.nullable(), 43 | Brz::Sym(sym) => sym.nullable(), 44 | Brz::Alt(alt) => alt.nullable(), 45 | Brz::Seq(seq) => seq.nullable(), 46 | Brz::Rep(rep) => rep.nullable(), 47 | } 48 | } 49 | } 50 | 51 | trait Brznode { 52 | fn derive(&self, c: char) -> Rc<Brz>; 53 | fn nullable(&self) -> bool; 54 | } 55 | 56 | impl Brznode for Emp { 57 | fn derive(&self, _: char) -> Rc<Brz> { 58 | Rc::new(Brz::Emp(Emp {})) 59 | } 60 | fn nullable(&self) -> bool { 61 | false 62 | } 63 | } 64 | 65 | impl Brznode for Eps { 66 | fn derive(&self, _: char) -> Rc<Brz> { 67 | Rc::new(Brz::Emp(Emp {})) 68 | } 69 | fn nullable(&self) -> bool { 70 | true 71 | } 72 | } 73 | 74 | impl Brznode for Sym { 75 | fn derive(&self, c: char) -> Rc<Brz> { 76 | Rc::new(if c == self.0 { 77 | Brz::Eps(Eps {}) 78 | } else { 79 | Brz::Emp(Emp {}) 80 | }) 81 | } 82 | fn nullable(&self) 
-> bool { 83 | false 84 | } 85 | } 86 | 87 | pub fn alt(r1: &Rc<Brz>, r2: &Rc<Brz>) -> Rc<Brz> { 88 | match (r1.deref(), r2.deref()) { 89 | (_, Brz::Emp(_)) => r1.clone(), 90 | (Brz::Emp(_), _) => r2.clone(), 91 | _ => Rc::new(Brz::Alt(Alt(r1.clone(), r2.clone()))), 92 | } 93 | } 94 | 95 | impl Brznode for Alt { 96 | fn derive(&self, c: char) -> Rc<Brz> { 97 | let l = &self.0.derive(c); 98 | let r = &self.1.derive(c); 99 | alt(l, r) 100 | } 101 | fn nullable(&self) -> bool { 102 | self.0.nullable() || self.1.nullable() 103 | } 104 | } 105 | 106 | pub fn seq(r1: &Rc<Brz>, r2: &Rc<Brz>) -> Rc<Brz> { 107 | match (r1.deref(), r2.deref()) { 108 | (_, Brz::Emp(_)) => emp(), 109 | (Brz::Emp(_), _) => emp(), 110 | _ => Rc::new(Brz::Seq(Seq(r1.clone(), r2.clone()))), 111 | } 112 | } 113 | 114 | impl Brznode for Seq { 115 | fn derive(&self, c: char) -> Rc<Brz> { 116 | let s = seq(&self.0.derive(c), &self.1); 117 | if self.0.nullable() { 118 | alt(&s, &self.1.derive(c)) 119 | } else { 120 | s 121 | } 122 | } 123 | fn nullable(&self) -> bool { 124 | self.0.nullable() && self.1.nullable() 125 | } 126 | } 127 | 128 | impl Brznode for Rep { 129 | fn derive(&self, c: char) -> Rc<Brz> { 130 | seq(&self.0.derive(c), &rep(&self.0)) 131 | } 132 | fn nullable(&self) -> bool { 133 | true 134 | } 135 | } 136 | 137 | pub fn emp() -> Rc<Brz> { 138 | Rc::new(Brz::Emp(Emp)) 139 | } 140 | 141 | pub fn eps() -> Rc<Brz> { 142 | Rc::new(Brz::Eps(Eps)) 143 | } 144 | 145 | pub fn sym(c: char) -> Rc<Brz> { 146 | Rc::new(Brz::Sym(Sym(c))) 147 | } 148 | 149 | pub fn rep(r1: &Rc<Brz>) -> Rc<Brz> { 150 | Rc::new(Brz::Rep(Rep(r1.clone()))) 151 | } 152 | 153 | pub fn accept(n: &Rc<Brz>, s: String) -> bool { 154 | let mut source = s.chars().peekable(); 155 | let mut r = n.clone(); 156 | loop { 157 | match source.next() { 158 | None => break r.nullable(), 159 | Some(c) => { 160 | let np = r.derive(c); 161 | match np.deref() { 162 | Brz::Emp(_) => return false, 163 | Brz::Eps(_) => break source.peek().is_none(), 164 | _ => r = np.clone(), 165 | } 166 | } 167 | } 168 | } 169 | } 
170 | 171 | #[cfg(test)] 172 | mod tests { 173 | use super::*; 174 | 175 | #[test] 176 | fn basics() { 177 | let cases = [ 178 | ("empty", eps(), "", true), 179 | ("null", emp(), "", false), 180 | ("char", sym('a'), "a", true), 181 | ("not char", sym('a'), "b", false), 182 | ("char vs empty", sym('a'), "", false), 183 | ("left alt", alt(&sym('a'), &sym('b')), "a", true), 184 | ("right alt", alt(&sym('a'), &sym('b')), "b", true), 185 | ("neither alt", alt(&sym('a'), &sym('b')), "c", false), 186 | ("empty alt", alt(&sym('a'), &sym('b')), "", false), 187 | ("empty rep", rep(&sym('a')), "", true), 188 | ("sequence", seq(&sym('a'), &sym('b')), "ab", true), 189 | ("sequence with empty", seq(&sym('a'), &sym('b')), "", false), 190 | ("bad long sequence", seq(&sym('a'), &sym('b')), "abc", false), 191 | ("bad short sequence", seq(&sym('a'), &sym('b')), "a", false), 192 | ("one rep", rep(&sym('a')), "a", true), 193 | ("short multiple failed rep", rep(&sym('a')), "ab", false), 194 | ("multiple rep", rep(&sym('a')), "aaaaaaaaa", true), 195 | ( 196 | "multiple rep with failure", 197 | rep(&sym('a')), 198 | "aaaaaaaaab", 199 | false, 200 | ), 201 | ]; 202 | 203 | for (name, case, sample, result) in &cases { 204 | println!("{:?}", name); 205 | assert_eq!(accept(case, sample.to_string()), *result); 206 | } 207 | } 208 | } 209 | -------------------------------------------------------------------------------- /rust/05_glushkov/Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "glushkov" 3 | version = "0.1.0" 4 | authors = ["Elf M. Sternberg "] 5 | edition = "2018" 6 | 7 | [dependencies] 8 | -------------------------------------------------------------------------------- /rust/05_glushkov/README.md: -------------------------------------------------------------------------------- 1 | # Glushkov Regular Expressions, in Rust 2 | 3 | This is Glushkov's construction of regular expressions. 
The basic idea
is that for every symbol encountered during parsing, a corresponding
symbol in the tree is marked (or, if no symbols are marked, the parse
is a failure). Composites are followed to their ends for each
character, and if the symbol matches it is "marked".

In this instance, we create a Glushkov regular expression tree, and for
each character it returns a new, complete copy of the tree, only with
the marks "shifted" to where they should be given the character. In
this way, each iteration of the tree keeps the NFA list of states that
are active; they are the paths that lead to marked symbols.

`ended` here means that no more symbols have to be read to match the
expression. This function was named `final` in the Haskell version,
but the word `ended` is used here because `final` is a reserved word in
Rust. `empty` here means that the expression can match the empty
string.

`ended` determines whether the Glushkov expression passed in contains a
marked symbol in a position where the match may stop. It is used both
to determine the end state of the expression and, in sequences, to
decide whether the right-hand expression must be evaluated: that is the
case if we're currently going down a "marked" path and the left
expression can handle the empty string, OR if the left expression is
ended.

The accept method is just a fold over the expression. The initial
value is the shift of the first character, with the assumed mark of
`True` being included because we can always parse infinitely many
empty strings before the sample begins. The returned value of that
shift is our new regular expression, on which we then progressively
call `shift False accg c` (in the Rust code, `shift(&g, false, c)`);
here `False` means that we're only going to shift marks we've already
found.
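
The fold described above can be sketched in a few dozen lines. This is a condensed restatement for illustration only, not the crate's source: the function names `shift`, `ended`, `empty`, and `accept` mirror the crate, but the `Re` type, the `Box`-based tree, and the by-value constructors are assumptions made to keep the sketch short.

```rust
// Illustrative sketch: a condensed Glushkov matcher (Box instead of Rc).
#[derive(Debug)]
enum Re {
    Eps,
    Sym(bool, char), // (is this symbol marked?, the symbol itself)
    Alt(Box<Re>, Box<Re>),
    Seq(Box<Re>, Box<Re>),
    Rep(Box<Re>),
}

// Hypothetical by-value constructors, used only by this sketch.
fn sym(c: char) -> Re { Re::Sym(false, c) }
fn alt(a: Re, b: Re) -> Re { Re::Alt(Box::new(a), Box::new(b)) }
fn seq(a: Re, b: Re) -> Re { Re::Seq(Box::new(a), Box::new(b)) }
fn rep(a: Re) -> Re { Re::Rep(Box::new(a)) }

// Can the expression match the empty string?
fn empty(r: &Re) -> bool {
    match r {
        Re::Eps | Re::Rep(_) => true,
        Re::Sym(..) => false,
        Re::Alt(a, b) => empty(a) || empty(b),
        Re::Seq(a, b) => empty(a) && empty(b),
    }
}

// Is there a mark in a position where the match may stop?
fn ended(r: &Re) -> bool {
    match r {
        Re::Eps => false,
        Re::Sym(m, _) => *m,
        Re::Alt(a, b) => ended(a) || ended(b),
        Re::Seq(a, b) => (ended(a) && empty(b)) || ended(b),
        Re::Rep(a) => ended(a),
    }
}

// Build a new tree with the marks moved for character `c`.
fn shift(r: &Re, m: bool, c: char) -> Re {
    match r {
        Re::Eps => Re::Eps,
        Re::Sym(_, s) => Re::Sym(m && *s == c, *s),
        Re::Alt(a, b) => alt(shift(a, m, c), shift(b, m, c)),
        Re::Seq(a, b) => {
            // The right side is entered if the incoming mark can skip an
            // empty left side, or if the left side has just ended.
            let enter_right = (m && empty(a)) || ended(a);
            seq(shift(a, m, c), shift(b, enter_right, c))
        }
        Re::Rep(a) => rep(shift(a, m || ended(a), c)),
    }
}

// The first character is shifted with a `true` mark; every later
// character only moves marks that already exist.
fn accept(r: &Re, s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        None => empty(r),
        Some(first) => {
            let start = shift(r, true, first);
            ended(&chars.fold(start, |g, c| shift(&g, false, c)))
        }
    }
}
```

With these definitions, `accept(&seq(sym('a'), seq(sym('b'), sym('c'))), "abc")` is `true`, while the same expression rejects `"ab"` because the mark is left on `'b'`, which is not a position where the sequence may stop.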

The "trick" to understanding how this works is to consider the string
"abc" for the expression "abc". The first time through, we start with
`True`, and what gets marked is the symbol 'a':

`(seq 'a' (seq 'b' 'c')) -> (seq 'a'* (seq 'b' 'c'))`

When we pass the letter 'b', what happens? Well, the returned
expression will have the 'a' symbol unmarked (it didn't match the
character), but the second part of the shift expression says that the
left expression is ended (it's a symbol and it was marked!), so we call
`shift True (Sym 'b') 'b'`, and the new symbol generated will be marked,
moving the mark to the correct destination. The same thing happens on
the next iteration. The *inner seq* will get back that `(sym 'b')` is
marked, so 'c' will match the `(sym 'c')` and shift will be in a `True`
state, so now the expression comes back `(seq 'a' (seq 'b' 'c'*))`.

When we run out of letters or regex, we can ask, "Is the expression
final?" Again, the tricky part is inside sequences: we're only final if
the left side is final and the right side can handle an empty string, or
if the right side is final.

Porting this from Haskell was *much* more straightforward than porting
the straight regex versions, and the result is slightly more efficient,
although it still has the "transition the entire parse tree every
character" problem. That's to be solved later.

## License

As this is entirely my work, it is copyright (c) 2019, and licensed
under the Mozilla Public License v. 2.0. See the
[LICENSE.md](../../LICENSE.md) in the root directory.

--------------------------------------------------------------------------------
/rust/05_glushkov/src/lib.rs:
--------------------------------------------------------------------------------
//! This crate provides a series of simple functions for building a
//! regular expression, and an `accept` function which takes a
//! completed regular expression and a string and returns a boolean
//! value describing if the expression matched the string (or not).
//!
//! # Quick Preview
//!
//! ```
//! use glushkov::*;
//! // `(fred|dino)`
//! let expr = alt(&seq(&sym('f'), &seq(&sym('r'), &seq(&sym('e'), &sym('d')))),
//!                &seq(&sym('d'), &seq(&sym('i'), &seq(&sym('n'), &sym('o')))));
//! assert!(accept(&expr, "fred"));
//! assert!(accept(&expr, "dino"));
//! assert!(!accept(&expr, "wilma"));
//! ```

use std::ops::Deref;
use std::rc::Rc;

#[derive(Debug)]
pub enum Glu {
    Eps,
    Sym(bool, char),
    Alt(Rc<Glu>, Rc<Glu>),
    Seq(Rc<Glu>, Rc<Glu>),
    Rep(Rc<Glu>),
}

/// Recognize only the empty string
pub fn eps() -> Rc<Glu> {
    Rc::new(Glu::Eps)
}

/// Recognize a single character
pub fn sym(c: char) -> Rc<Glu> {
    Rc::new(Glu::Sym(false, c))
}

/// Recognize alternatives between two other regexes
pub fn alt(r1: &Rc<Glu>, r2: &Rc<Glu>) -> Rc<Glu> {
    Rc::new(Glu::Alt(r1.clone(), r2.clone()))
}

/// Recognize a sequence of regexes in order
pub fn seq(r1: &Rc<Glu>, r2: &Rc<Glu>) -> Rc<Glu> {
    Rc::new(Glu::Seq(r1.clone(), r2.clone()))
}

/// Recognize a regex repeated zero or more times.
pub fn rep(r1: &Rc<Glu>) -> Rc<Glu> {
    Rc::new(Glu::Rep(r1.clone()))
}

// The main function: repeatedly traverses the tree, modifying as it
// goes, generating a new tree, marking the nodes where the expression
// currently "is," for any given character.
//
pub fn shift(g: &Rc<Glu>, m: bool, c: char) -> Rc<Glu> {
    match g.deref() {
        Glu::Eps => eps(),
        Glu::Sym(_, s) => Rc::new(Glu::Sym(m && *s == c, *s)),
        Glu::Alt(r1, r2) => alt(&shift(r1, m, c), &shift(r2, m, c)),
        Glu::Seq(r1, r2) => {
            let l_end = empty(r1);
            let l_fin = ended(r1);
            seq(&shift(r1, m, c), &shift(r2, m && l_end || l_fin, c))
        }
        Glu::Rep(r) => rep(&shift(r, m || ended(r), c)),
    }
}

// Helper function that describes whether or not the expression passed
// in contains the mark; used to determine, when either the string or
// the expression runs out, whether the expression is in an "accept"
// state.
//
pub fn ended(g: &Rc<Glu>) -> bool {
    match g.deref() {
        Glu::Eps => false,
        Glu::Sym(m, _) => *m,
        Glu::Alt(r1, r2) => ended(r1) || ended(r2),
        Glu::Seq(r1, r2) => ended(r1) && empty(r2) || ended(r2),
        Glu::Rep(r) => ended(r),
    }
}

// Helper function that describes whether or not the expression
// supplied handles the empty string.
//
pub fn empty(g: &Rc<Glu>) -> bool {
    match g.deref() {
        Glu::Eps => true,
        Glu::Sym(_, _) => false,
        Glu::Alt(r1, r2) => empty(r1) || empty(r2),
        Glu::Seq(r1, r2) => empty(r1) && empty(r2),
        Glu::Rep(_) => true,
    }
}

/// Takes a regular expression and a string and returns whether or not
/// the expression and the string match (the string belongs to the
/// language recognized by the expression).
pub fn accept(g: &Rc<Glu>, s: &str) -> bool {
    if s.is_empty() {
        return empty(g);
    }

    // Every character after the first only moves existing marks.
    fn ashift(g: Rc<Glu>, c: char) -> Rc<Glu> {
        shift(&g, false, c)
    }

    // This is kinda cool. I wonder if I can make the Brz versions look
    // like this.
    let mut seq = s.chars();
    let start = shift(g, true, seq.next().unwrap());
    ended(&seq.fold(start, ashift))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn basics() {
        let cases = [
            ("empty", eps(), "", true),
            ("char", sym('a'), "a", true),
            ("not char", sym('a'), "b", false),
            ("char vs empty", sym('a'), "", false),
            ("left alt", alt(&sym('a'), &sym('b')), "a", true),
            ("right alt", alt(&sym('a'), &sym('b')), "b", true),
            ("neither alt", alt(&sym('a'), &sym('b')), "c", false),
            ("empty alt", alt(&sym('a'), &sym('b')), "", false),
            ("empty rep", rep(&sym('a')), "", true),
            ("sequence", seq(&sym('a'), &sym('b')), "ab", true),
            ("sequence with empty", seq(&sym('a'), &sym('b')), "", false),
            ("bad long sequence", seq(&sym('a'), &sym('b')), "abc", false),
            ("bad short sequence", seq(&sym('a'), &sym('b')), "a", false),
            ("one rep", rep(&sym('a')), "a", true),
            ("short multiple failed rep", rep(&sym('a')), "ab", false),
            ("multiple rep", rep(&sym('a')), "aaaaaaaaa", true),
            (
                "multiple rep with failure",
                rep(&sym('a')),
                "aaaaaaaaab",
                false,
            ),
        ];

        for (name, case, sample, result) in &cases {
            println!("{:?}", name);
            assert_eq!(accept(case, &sample.to_string()), *result);
        }
    }
}

--------------------------------------------------------------------------------
/rust/06_riggedglushkov/Cargo.toml:
--------------------------------------------------------------------------------
[package]
name = "glushkov"
version = "0.1.0"
authors = ["Elf M. Sternberg "]
edition = "2018"

[dependencies]
num-traits = "0.2.6"

--------------------------------------------------------------------------------
/rust/06_riggedglushkov/README.md:
--------------------------------------------------------------------------------
# Rigged Glushkov Regular Expressions in Rust

This code is significantly different from the Haskell version (Haskell
Experiment 07), in that I decided to "go for it" and merge the process
of instantiation and rigging into a single structure.

Prior to Rust Experiment 06, the Rust experiments had followed the
Haskell versions' pattern of building the regular expression first using
Kleene expressions, and then lifting them on-the-fly into more complex,
"rigged" versions, before processing them with the given Kleene or
Glushkov construction.

Rust famously doesn't have a garbage collector, but instead uses
lifetimes and scopes to place much of what it does safely on the stack.
It takes some fiddling to make types, lifetimes, and scopes line up, so
as far back as the first Rust experiment I had individual factory
functions for the different Regex sub-types. These take the place of
the simple `data` types seen in the equivalent Haskell experiments.

With that in mind, there was no reason to have two different data
structures; for this experiment, there is only the one `enum` type and
its sub-types.

In this experiment, as in the Haskell version, I build on the idea of
recording already found "empty" and "final" values of nodes, so the
data structure is now a record of (empty, final, expression). The
`eps`, `alt`, `seq`, and `rep` expressions are pretty much as you'd
expect; one thing you won't find in the base code is the implementation
for `sym`. `sym` must be implemented independently for different
semiring implementations.

The `Sym` expression is a *trait* now; it says that users must provide
an implementation with a single method, `is`, that takes a symbol and
returns a semiring.

I've had to abandon the use of `num_traits` and `std::ops`, as
`std::ops::Mul` and `std::ops::Add` don't provide a framework for
passing in references. I've gone back to my initial instincts and
provided a comprehensive `Semiring` trait which can take references for
the `mul` and `add` operations. This works very well, as now I can
analyze and operate on immutable `HashSet<String>` collections without
having to clone them to pass them to the cartesian product operation.
That's a massive win in terms of memory and CPU savings.

Processing the expression with Glushkov's progressive algorithm works
the same way as in the unrigged version, except that we cache the
"empty" and "final" values when they're found and do not recalculate
them.

Down in the tests, you'll find the Boolean version (`Recognizer`) as
well as a string version (`Parser`). Both versions show how to
implement a semiring for doing data extraction, how to define a
concrete implementation of the `Sym` trait, and how to write a function
that instantiates that implementation for your use case.

The `Parser` version has a specific moment of complexity that can't be
elided: in the `mul` implementation, the multiplication of two sets is
the cartesian product of those sets: a new set containing all possible
combinations of ordered tuples made up of a member of the first set with
a member of the second set. For our purposes, the result is still a
string, so our implementation involves building those tuples and then
generating a new string by concatenating the ordered pair. This takes a
bit of memory thrashing, but much less now that I've solved the
`mul(&x, &y)` issue.
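
As a rough illustration of that cartesian product, here is a minimal standalone sketch. The free functions `mul`, `add`, and `set_of` are stand-ins invented for this example (the experiment itself implements `mul` and `add` as methods on `Parser`); only the behavior, set union for addition and pairwise concatenation for multiplication, is taken from the description above.

```rust
use std::collections::HashSet;

/// Semiring product (sketch): every string on the left followed by
/// every string on the right. Both inputs are only borrowed.
fn mul(lhs: &HashSet<String>, rhs: &HashSet<String>) -> HashSet<String> {
    let mut out = HashSet::new();
    for l in lhs {
        for r in rhs {
            // Only the concatenated result is allocated here.
            out.insert(format!("{}{}", l, r));
        }
    }
    out
}

/// Semiring sum (sketch): the union of both sets.
fn add(lhs: &HashSet<String>, rhs: &HashSet<String>) -> HashSet<String> {
    lhs.union(rhs).cloned().collect()
}

/// Hypothetical helper for building a set of owned strings.
fn set_of(items: &[&str]) -> HashSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}
```

Multiplying `{"a", "b"}` by `{"x", "y"}` gives `{"ax", "ay", "bx", "by"}`. Multiplying by the set containing only the empty string leaves a set unchanged, and multiplying by the empty set gives the empty set, which is why those two sets serve as `one()` and `zero()` for `Parser`.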

## License

As this is entirely my work, it is copyright (c) 2019, and licensed
under the Mozilla Public License v. 2.0. See the
[LICENSE.md](../../LICENSE.md) in the root directory.

--------------------------------------------------------------------------------
/rust/06_riggedglushkov/src/lib.rs:
--------------------------------------------------------------------------------
//! This crate provides a series of simple functions for building a
//! regular expression, and an `accept` function which takes a
//! completed regular expression and a string and returns a semiring
//! value describing if the expression matched the string (or not).
//!

use std::rc::Rc;

pub trait Semiring {
    fn zero() -> Self;
    fn one() -> Self;
    fn is_zero(&self) -> bool;
    fn mul(&self, rhs: &Self) -> Self;
    fn add(&self, rhs: &Self) -> Self;
}

/// The Sym trait represents what to do for a single character. It has
/// a single method, "is", that returns the semiring. Implementers of
/// "is" must provide a corresponding construction factory.
pub trait Sym<S>
where
    S: Semiring,
{
    fn is(&self, c: char) -> S;
}

pub enum Glui<S>
where
    S: Semiring,
{
    Eps,
    Sym(Rc<dyn Sym<S>>),
    Alt(Rc<Glu<S>>, Rc<Glu<S>>),
    Seq(Rc<Glu<S>>, Rc<Glu<S>>),
    Rep(Rc<Glu<S>>),
}

// Empty, Final, Data
pub struct Glu<S: Semiring>(S, S, Glui<S>);

/// Recognize only the empty string
pub fn eps<S>() -> Rc<Glu<S>>
where
    S: Semiring,
{
    Rc::new(Glu(S::one(), S::one(), Glui::Eps))
}

/// Recognize alternatives between two other regexes
pub fn alt<S>(r1: &Rc<Glu<S>>, r2: &Rc<Glu<S>>) -> Rc<Glu<S>>
where
    S: Semiring,
{
    Rc::new(Glu(
        r1.0.add(&r2.0),
        r1.1.add(&r2.1),
        Glui::Alt(r1.clone(), r2.clone()),
    ))
}

/// Recognize a sequence of regexes in order
pub fn seq<S>(r1: &Rc<Glu<S>>, r2: &Rc<Glu<S>>) -> Rc<Glu<S>>
where
    S: Semiring,
{
    Rc::new(Glu(
        // A sequence matches the empty string only if both halves do,
        // so the cached "empty" value is the product, not the sum.
        r1.0.mul(&r2.0),
        r1.1.mul(&r2.0).add(&r2.1),
        Glui::Seq(r1.clone(), r2.clone()),
    ))
}

/// Recognize a regex repeated zero or more times.
pub fn rep<S>(r1: &Rc<Glu<S>>) -> Rc<Glu<S>>
where
    S: Semiring + Clone,
{
    Rc::new(Glu(S::one(), r1.1.clone(), Glui::Rep(r1.clone())))
}

// The main function: repeatedly traverses the tree, modifying as it
// goes, generating a new tree, marking the nodes where the expression
// currently "is," for any given character. The values of the nodes
// are cached for performance, but this probably isn't a win in Rust
// as Rust won't keep the intermediate functions generated, nor
// provide them ad-hoc to future operations the way Haskell does.
//
fn shift<S>(g: &Rc<Glu<S>>, m: &S, c: char) -> Rc<Glu<S>>
where
    S: Semiring + Clone,
{
    use self::Glui::*;
    match &g.2 {
        Eps => eps(),
        Sym(f) => Rc::new(Glu(S::zero(), m.mul(&f.is(c)), Glui::Sym(f.clone()))),
        Alt(r1, r2) => alt(&shift(&r1, m, c), &shift(&r2, m, c)),
        Seq(r1, r2) => seq(
            &shift(&r1, m, c),
            &shift(&r2, &(m.mul(&r1.0).add(&r1.1)), c),
        ),
        Rep(r) => rep(&shift(&r, &(m.add(&r.1)), c)),
    }
}

pub fn accept<S>(g: &Rc<Glu<S>>, s: &str) -> S
where
    S: Semiring + Clone,
{
    if s.is_empty() {
        return g.0.clone();
    }

    let ashift = |g, c| shift(&g, &S::zero(), c);

    // This is kinda cool. I wonder if I can make the Brz versions look
    // like this.
    let mut seq = s.chars();
    let start = shift(g, &S::one(), seq.next().unwrap());
    seq.fold(start, ashift).1.clone()
}

#[cfg(test)]
mod tests {

    use super::*;
    use std::collections::HashSet;

    macro_rules! set {
        ( $( $x:expr ),* ) => {{
            let mut temp_set = HashSet::new();
            $( temp_set.insert($x); )*
            temp_set //
        }};
    }

    #[derive(Debug, Copy, Clone)]
    pub struct Recognizer(bool);

    impl Semiring for Recognizer {
        fn one() -> Recognizer {
            Recognizer(true)
        }
        fn zero() -> Recognizer {
            Recognizer(false)
        }
        fn is_zero(&self) -> bool {
            !self.0
        }
        fn mul(&self, rhs: &Recognizer) -> Recognizer {
            Recognizer(self.0 && rhs.0)
        }
        fn add(&self, rhs: &Recognizer) -> Recognizer {
            Recognizer(self.0 || rhs.0)
        }
    }

    pub struct SimpleSym {
        c: char,
    }

    impl Sym<Recognizer> for SimpleSym {
        fn is(&self, c: char) -> Recognizer {
            if c == self.c {
                Recognizer::one()
            } else {
                Recognizer::zero()
            }
        }
    }

    #[test]
    fn basics() {
        pub fn sym(sample: char) -> Rc<Glu<Recognizer>> {
            Rc::new(Glu(
                Recognizer::zero(),
                Recognizer::zero(),
                Glui::Sym(Rc::new(SimpleSym { c: sample })),
            ))
        }

        let cases = [
            ("empty", eps(), "", true),
            ("char", sym('a'), "a", true),
            ("not char", sym('a'), "b", false),
            ("char vs empty", sym('a'), "", false),
            ("left alt", alt(&sym('a'), &sym('b')), "a", true),
            ("right alt", alt(&sym('a'), &sym('b')), "b", true),
            ("neither alt", alt(&sym('a'), &sym('b')), "c", false),
            ("empty alt", alt(&sym('a'), &sym('b')), "", false),
            ("empty rep", rep(&sym('a')), "", true),
            ("sequence", seq(&sym('a'), &sym('b')), "ab", true),
            ("sequence with empty", seq(&sym('a'), &sym('b')), "", false),
            ("bad long sequence", seq(&sym('a'), &sym('b')), "abc", false),
            ("bad short sequence", seq(&sym('a'), &sym('b')), "a", false),
            ("one rep", rep(&sym('a')), "a", true),
            ("short multiple failed rep", rep(&sym('a')), "ab", false),
            ("multiple rep", rep(&sym('a')), "aaaaaaaaa", true),
            (
                "multiple rep with failure",
                rep(&sym('a')),
                "aaaaaaaaab",
                false,
            ),
        ];

        for (name, case, sample, result) in &cases {
            println!("{:?}", name);
            assert_eq!(accept(case, &sample.to_string()).0, *result);
        }
    }

    #[derive(Debug, Clone)]
    pub struct Parser(HashSet<String>);

    impl Semiring for Parser {
        fn one() -> Parser {
            Parser(set!["".to_string()])
        }
        fn zero() -> Parser {
            Parser(set![])
        }
        fn is_zero(&self) -> bool {
            self.0.len() == 0
        }
        // Cartesian product: concatenate every left string with every
        // right string.
        fn mul(self: &Parser, rhs: &Parser) -> Parser {
            let mut temp = set![];
            for i in self.0.iter().cloned() {
                for j in &rhs.0 {
                    temp.insert(i.clone() + &j);
                }
            }
            Parser(temp)
        }
        fn add(self: &Parser, rhs: &Parser) -> Parser {
            Parser(self.0.union(&rhs.0).cloned().collect())
        }
    }

    pub struct ParserSym {
        c: char,
    }

    impl Sym<Parser> for ParserSym {
        fn is(&self, c: char) -> Parser {
            if c == self.c {
                Parser(set![c.to_string()])
            } else {
                Parser::zero()
            }
        }
    }

    #[test]
    fn string_basics() {
        pub fn sym(sample: char) -> Rc<Glu<Parser>> {
            Rc::new(Glu(
                Parser::zero(),
                Parser::zero(),
                Glui::Sym(Rc::new(ParserSym { c: sample })),
            ))
        }

        let cases = [
            ("empty", eps(), "", Some("")),
            ("char", sym('a'), "a", Some("a")),
            ("not char", sym('a'), "b", None),
            ("char vs empty", sym('a'), "", None),
            ("left alt", alt(&sym('a'), &sym('b')), "a", Some("a")),
            ("right alt", alt(&sym('a'), &sym('b')), "b", Some("b")),
            ("neither alt", alt(&sym('a'), &sym('b')), "c", None),
            ("empty alt", alt(&sym('a'), &sym('b')), "", None),
            ("empty rep", rep(&sym('a')), "", Some("")),
            ("sequence", seq(&sym('a'), &sym('b')), "ab", Some("ab")),
            ("sequence with empty", seq(&sym('a'), &sym('b')), "", None),
            ("bad long sequence", seq(&sym('a'), &sym('b')), "abc", None),
            ("bad short sequence", seq(&sym('a'), &sym('b')), "a", None),
            ("one rep", rep(&sym('a')), "a", Some("a")),
            ("short multiple failed rep", rep(&sym('a')), "ab", None),
            (
                "multiple rep",
                rep(&sym('a')),
                "aaaaaaaaa",
                Some("aaaaaaaaa"),
            ),
            (
                "multiple rep with failure",
                rep(&sym('a')),
                "aaaaaaaaab",
                None,
            ),
        ];

        for (name, case, sample, result) in &cases {
            println!("{:?}", name);
            let ret = accept(case, &sample.to_string()).0;
            match result {
                Some(r) => {
                    let v = ret.iter().next();
                    if let Some(s) = v {
                        assert_eq!(s, sample);
                    } else {
                        panic!("Strings did not match: {:?}, {:?}", r, v);
                    }
                    assert_eq!(1, ret.len());
                }
                None => assert_eq!(0, ret.len()),
            }
        }
    }
}

--------------------------------------------------------------------------------
/rust/07_heavyweights/Cargo.toml:
--------------------------------------------------------------------------------
[package]
name = "heavyweights"
version = "0.1.0"
authors = ["Elf M. Sternberg "]
edition = "2018"

[dependencies]
itertools="0.8.0"
bytes="0.4.11"

--------------------------------------------------------------------------------
/rust/07_heavyweights/README.md:
--------------------------------------------------------------------------------
# Rigged Generic Glushkov Regular Expressions in Rust: The Heavyweight Experiments

This implementation is significantly different from previous ones,
although it builds on that work and proceeds directly from the
[Haskell implementation](../../haskell/08_Heavyweights/) and the
[previous Rust experiment](../../rust/06_riggedglushkov).

I strongly recommend reading the [Haskell implementation
README](../../haskell/08_Heavyweights/README.md) to get a sense of the
changes to the algorithm. They are fairly heavyweight and interesting,
in that the definition of the Semiring has been further abstracted to
handle position information, and as a consequence the input type of the
operation has likewise been abstracted to handle arbitrary input types.
The last was done to enable us to pass both the character being analyzed
and the position in the stream, such that we could record information
about the position under certain circumstances.

What makes the *Rust* version of this implementation noteworthy is the
ease with which the inbound data type was changed to handle just about
any kind of data. It adds a bit of genericizing noise to the source
file, some ceremony that makes me wonder how I could abstract or derive
it automatically.

On the other hand, since the `Recognizer` and `Parser` implementations
fix their input type as `char` by usage, they work *completely
unchanged* from the previous Rust experiment. That's remarkable.

The implementation of the `Leftlong` Semiring, which reports the
location and length of the first, longest substring match of a capture
group (yes!) is fairly extensive and went through a number of thrashes
before I recalled that I could match on a tuple, at which point the
implementations of `add` and `mul` became straightforward.

Putting the entirety of the semiring in a single trait makes more sense,
to me at least, than abstracting it further over `num_traits`, as I had
it in earlier versions. While using `num_traits` is *clever*, it also
forces us to work with `std::ops::Mul` and `std::ops::Add`, which do
not take references, and for larger and more complex semirings, working
with references made a lot of sense.
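
To make the by-reference point concrete, here is a small sketch. The `Semiring` trait is written the way the previous experiment declares it, and `Recognizer` mirrors that experiment's Boolean semiring; the generic `dot` helper is hypothetical and exists only to show an algorithm written once over any semiring without cloning or moving its operands. For comparison, the standard `std::ops::Add::add` method takes `self` by value.

```rust
/// All operations in one trait, taking operands by reference.
pub trait Semiring {
    fn zero() -> Self;
    fn one() -> Self;
    fn is_zero(&self) -> bool;
    fn mul(&self, rhs: &Self) -> Self;
    fn add(&self, rhs: &Self) -> Self;
}

/// Boolean semiring: `add` is OR, `mul` is AND.
#[derive(Debug, Clone, PartialEq)]
pub struct Recognizer(pub bool);

impl Semiring for Recognizer {
    fn zero() -> Self {
        Recognizer(false)
    }
    fn one() -> Self {
        Recognizer(true)
    }
    fn is_zero(&self) -> bool {
        !self.0
    }
    fn mul(&self, rhs: &Self) -> Self {
        Recognizer(self.0 && rhs.0)
    }
    fn add(&self, rhs: &Self) -> Self {
        Recognizer(self.0 || rhs.0)
    }
}

/// Hypothetical helper: the sum of pairwise products of two slices.
/// Both slices are only borrowed, so nothing is cloned or consumed,
/// which matters when `S` wraps a large collection.
pub fn dot<S: Semiring>(xs: &[S], ys: &[S]) -> S {
    xs.iter()
        .zip(ys.iter())
        .fold(S::zero(), |acc, (x, y)| acc.add(&x.mul(y)))
}
```

Because `dot` only needs `&S`, the same function would work unchanged for a set-backed semiring such as `Parser`, with the only allocations being the results of `mul` and `add` themselves.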

The implementation of `submatch`, a function that allows us to search
for arbitrary substrings without having to anchor the expression to the
start or end of the string, is interesting; by using the `One()` value,
I'm able to preserve the fact that the search hasn't failed while also
enforcing the notion that we're skipping over things that match `any`
but don't match the concrete sample, which is nifty.

All in all, this is highly satisfying work, and it's a pleasure to see
it working so well.

## License

As this is entirely my work, it is copyright (c) 2019, and licensed
under the Mozilla Public License v. 2.0. See the
[LICENSE.md](../../LICENSE.md) in the root directory.

--------------------------------------------------------------------------------
/rust/08_riggedbrz/Cargo.toml:
--------------------------------------------------------------------------------
[package]
name = "riggedbrz"
version = "0.1.0"
authors = ["Elf M. Sternberg "]
edition = "2018"

[dependencies]

--------------------------------------------------------------------------------
/rust/08_riggedbrz/README.md:
--------------------------------------------------------------------------------
This is the classic implementation of Brzozowski's Algorithm with
Weighted Semirings. There are very few optimizations in this code. One
aspect that frustrates me is the use of the `Del()` operator; it's a
holdover from a time when I didn't quite understand the interaction
between regular expressions and semirings; its purpose is to preserve
the results of a lazy `parsenull()` of the sequence. Later, we replace
that with a smarter algorithm.

--------------------------------------------------------------------------------
/rust/09_riggedbrz/Cargo.toml:
--------------------------------------------------------------------------------
[package]
name = "riggedbrz"
version = "0.1.0"
authors = ["Elf M. Sternberg "]
edition = "2018"

[dependencies]
hashbrown = "0.1.7"

--------------------------------------------------------------------------------
/rust/09_riggedbrz/README.md:
--------------------------------------------------------------------------------
# Rigged Generic Brzozowski μ-Regular Expressions

This experiment realizes an important and significant step in the
series of experiments. In this variant, the `*` *operation* has been
completely removed; the `*` *operator* is now implemented as a recursive
definition:

    // ONE is the identity operator under concatenation
    r* = alt(eps(ONE), seq(r, r*))

This may seem nonsensical to a programmer, but it's actually quite
implementable in a language that allows mutation under limited effects.

This means that I now have working *μ-regular expressions*: regular
expressions that use a fixpoint operator to identify the least fixed
point of a recursive regular expression, and that allow regular
expressions to be encapsulated as variables and composed just as one
would compose ordinary functions.

This effort took a *large* number of evolutions, as I went down various
paths trying to write code faster than I could think or understand. At
least twice I had to delete the work in progress and back up to an
earlier commit, throwing away hours of work.

But this is *it*, for some definition of "it." This is what everything
has been working up to.

## Understanding this code

The first thing to appreciate is that `nullable()` is now a
self-terminating recursive implementation. At its core is the same
`nullable()` instruction we've been using for a while now, but now when
we determine the nullability of a node in the expression, we cache that
value and we notify all of its parent nodes that they may also be able
to determine nullability. This is useful for cases such as the `Alt()`,
which has two children: as soon as one child is determined to be
nullable, the `Alt()` itself is nullable, so it's now possible to mark
(cache) that the entire expression is always nullable. And that's
useful if the expression is going to be re-used.

And in this code, expressions are composable, re-usable elements of
code.

Also of note: the "mutate" codes are implementations of the short-outs
described in Might's last paper on the topic; take a node, take its
inputs, and determine if one of those inputs is already null or empty;
if that's the case, then a *different* node must be expressed in that
position, one that's simpler and faster.

## License

As this is entirely my work, it is copyright (c) 2019, and licensed
under the Mozilla Public License v. 2.0. See the
[LICENSE.md](../../LICENSE.md) in the root directory.

--------------------------------------------------------------------------------