├── .gitattributes
├── .gitignore
├── LICENSE.md
├── README.md
├── docs
    ├── 01_Regex_in_Typescript.md
    ├── 01_What_Is_A_Regular_Expression.md
    ├── 02_Finite_Automata.md
    ├── A_Play_03.md
    ├── A_Play_04.md
    ├── DFA1.png
    ├── NFA1.png
    ├── notes.md
    ├── paper.css
    └── summary.md
├── haskell
    ├── 01_SimpleRegex
    │   ├── LICENSE
    │   ├── README.md
    │   ├── SimpleRegex.cabal
    │   ├── package.yaml
    │   ├── src
    │   │   └── SimpleRegex.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    ├── 02_RiggedRegex
    │   ├── LICENSE
    │   ├── README.md
    │   ├── RiggedRegex.cabal
    │   ├── package.yaml
    │   ├── src
    │   │   └── RiggedRegex.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    ├── 03_Brzozowski
    │   ├── BrzExp.cabal
    │   ├── LICENSE
    │   ├── README.md
    │   ├── Setup.hs
    │   ├── package.yaml
    │   ├── src
    │   │   └── BrzExp.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    ├── 04_Gluskov
    │   ├── Glushkov.cabal
    │   ├── LICENSE
    │   ├── README.md
    │   ├── package.yaml
    │   ├── src
    │   │   └── Glushkov.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    ├── 05_RiggedBrz
    │   ├── LICENSE
    │   ├── README.md
    │   ├── RiggedBrz.cabal
    │   ├── Setup.hs
    │   ├── package.yaml
    │   ├── src
    │   │   └── RiggedBrz.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    ├── 06_RiggedRegex_Combinator
    │   ├── LICENSE
    │   ├── README.md
    │   ├── RiggedRegex.cabal
    │   ├── package.yaml
    │   ├── src
    │   │   └── RiggedRegex.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    ├── 07_Rigged_Glushkov
    │   ├── LICENSE
    │   ├── README.md
    │   ├── RiggedGlushkov.cabal
    │   ├── package.yaml
    │   ├── src
    │   │   └── RiggedGlushkov.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    ├── 08_Heavyweights
    │   ├── Heavyweights.cabal
    │   ├── LICENSE
    │   ├── LICENSE.md
    │   ├── README.md
    │   ├── package.yaml
    │   ├── src
    │   │   └── Heavyweights.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    └── 09_Classed_Brzozowski
    │   ├── BrzExp.cabal
    │   ├── LICENSE
    │   ├── README.md
    │   ├── Setup.hs
    │   ├── package.yaml
    │   ├── src
    │       └── BrzExp.hs
    │   ├── stack.yaml
    │   └── test
    │       └── Tests.hs
├── node
    └── 01_Kleene.ts
├── python
    ├── 01_rigged_brzozowski.py
    ├── 02_rigged_brzozowski.py
    └── README.md
└── rust
    ├── 01_simpleregex
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    ├── 02_riggedregex
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    ├── 03_brzozowski_1
        ├── .gitignore
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    ├── 04_brzozowski_2
        ├── .gitignore
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    ├── 05_glushkov
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    ├── 06_riggedglushkov
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    ├── 07_heavyweights
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    ├── 08_riggedbrz
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    └── 09_riggedbrz
        ├── Cargo.toml
        ├── README.md
        └── src
            └── lib.rs


/.gitattributes:
--------------------------------------------------------------------------------
1 | haskell/* linguist-vendored
2 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | .#*
 2 | *~
 3 | *#
 4 | *.aux
 5 | cabal-dev
 6 | cabal.project.local
 7 | cabal.project.local~
 8 | .cabal-sandbox/
 9 | cabal.sandbox.config
10 | Cargo.lock
11 | *.chi
12 | *.chs.h
13 | dist
14 | dist-*
15 | *.dyn_hi
16 | *.dyn_o
17 | *.eventlog
18 | .ghc.environment.*
19 | *.hi
20 | *.hp
21 | .hpc
22 | .hsenv
23 | .HTF/
24 | *.o
25 | *.prof
26 | **/*.rs.bk
27 | .stack-work/
28 | rust/**/target
29 | *.pyc
30 | *.pyo
31 | 


--------------------------------------------------------------------------------
/docs/01_What_Is_A_Regular_Expression.md:
--------------------------------------------------------------------------------
 1 | # What is a regular expression?
 2 | 
 3 | So what *is* a regular expression?  Let's build up from the bottom: we
 4 | start with:
 5 | 
 6 | Alphabet
 7 | : An alphabet is a set of symbols (or we can call them letters)
 8 | 
 9 | Word
10 | : A word is a sequence of symbols from an alphabet
11 | 
12 | Language
13 | : A language is a set of words
14 | 
15 | If our alphabet is ASCII and words are English, then a very simple
16 | language would be something like 
17 | 
18 |     Common_Pets: {dog, cat, fish, hamster, parakeet}.
19 | 
20 | Stephen Cole Kleene proposed a formal definition for "regular languages"
21 | in 1956, and what we have developed since then is a series of
22 | refinements that allow us to parse regular languages in something like
23 | linear time.  Kleene's operations were meant to *generate* languages,
24 | and the research program since that time has been to turn generators
25 | into recognizers.  But let's start with Kleene's generators.
26 | 
27 | ## Regular Languages
28 | 
29 | There are six basic operators in a regular language, and each of them is
30 | itself a regular language.  The first three are the base languages,
31 | encoding the "zero," "one," and "element" of the regular language, and
32 | the second three are composite languages; they contain other regular
33 | languages (including other composites) to describe a complete
34 | generator.
35 | 
36 | Given an alphabet, `A`, we can say:
37 | 
38 | `L[[∅]] = ∅`
39 | : A language that contains nothing is made up of nothing.
40 | 
41 | `L[[ε]] = {ε}` 
42 | : A language containing only the empty string can only
43 | generate empty strings.
44 | 
45 | `L[[a]] = {a}`
46 | : A language containing only the letter 'a' can only generate a single
47 | instance of the letter 'a'.  (This is true for all letters in the
48 | alphabet.)
49 | 
50 | `L[[r · s]] = {u · v | u ∈ L[[r]] and v ∈ L[[s]]}`
51 | : A composite language made up of the *sequence* of two other regular
52 | expressions `r` and `s` can generate any concatenation `uv` for every `u`
53 | generated by `r` and every `v` generated by `s`.
54 | 
55 | `L[[r | s]] = L[[r]] ∪ L[[s]]`
56 | : A composite language made up of the *alternatives* of two other
57 | regular expressions `r` and `s` can generate either the strings of `r`
58 | or the strings of `s`, or both!
59 | 
60 | `L[[r*]] = {ε} ∪ L[[r · r*]]`
61 | : A composite language that repeats `r` zero or more times can generate
62 | zero or more instances of the strings generated by `r`.
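
To make the generator reading concrete, here is a minimal sketch in Rust that enumerates every word up to a bounded length that an expression can generate.  The names `Lang`, `generate`, and `concat`, and the `max_len` cut-off, are assumptions made for this illustration; they are not taken from the implementations in this repository.  The cut-off is there because `r*` can generate infinitely many words.

```rust
use std::collections::BTreeSet;

/// The six forms of a regular language, one variant per definition.
#[derive(Debug, Clone)]
pub enum Lang {
    Empty,                     // L[[∅]]
    Eps,                       // L[[ε]]
    Sym(char),                 // L[[a]]
    Seq(Box<Lang>, Box<Lang>), // L[[r · s]]
    Alt(Box<Lang>, Box<Lang>), // L[[r | s]]
    Rep(Box<Lang>),            // L[[r*]]
}

/// Every word of at most `max_len` symbols that `lang` can generate.
pub fn generate(lang: &Lang, max_len: usize) -> BTreeSet<String> {
    match lang {
        Lang::Empty => BTreeSet::new(),
        Lang::Eps => BTreeSet::from([String::new()]),
        Lang::Sym(c) => {
            if max_len >= 1 {
                BTreeSet::from([c.to_string()])
            } else {
                BTreeSet::new()
            }
        }
        Lang::Seq(r, s) => concat(&generate(r, max_len), &generate(s, max_len), max_len),
        Lang::Alt(r, s) => {
            let mut words = generate(r, max_len);
            words.extend(generate(s, max_len));
            words
        }
        Lang::Rep(r) => {
            // Iterate L = {ε} ∪ (r · L) until no new word fits under max_len.
            let base = generate(r, max_len);
            let mut words = BTreeSet::from([String::new()]);
            loop {
                let mut next = concat(&base, &words, max_len);
                next.insert(String::new());
                if next == words {
                    return words;
                }
                words = next;
            }
        }
    }
}

/// All concatenations `uv` with `u` from `us` and `v` from `vs` that fit in `max_len`.
fn concat(us: &BTreeSet<String>, vs: &BTreeSet<String>, max_len: usize) -> BTreeSet<String> {
    let mut out = BTreeSet::new();
    for u in us {
        for v in vs {
            if u.chars().count() + v.chars().count() <= max_len {
                out.insert(format!("{}{}", u, v));
            }
        }
    }
    out
}
```

For `(a | b)*` with a `max_len` of 2, the sketch yields the empty word plus every one- and two-letter word over `a` and `b`.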
63 | 
64 | ## Regular Expressions
65 | 
66 | What we usually think of as "regular expressions" are in fact a small
67 | programming language designed to be parsed and to internally generate a
68 | function that recognizes whether or not a string "belongs to" the sets
69 | of strings described by a Kleene Algebra.  Programmatically, a regular
70 | expression is a function that takes a regular language and a string, and
71 | returns back a boolean value indicating whether or not the string
72 | belongs to the set of strings described by the regular language.
73 | 
74 | Regular expressions take Kleene's algebra and turn it backwards, asking
75 | "Can this given string be generated by an expression in Kleene's
76 | algebra?"  In both the Rust and Haskell branches you'll find the
77 | SimpleRegex implementations, which take this quite literally.  The
78 | Haskell version is the most concise; it literally encodes Kleene's five
79 | generative operations (the language of null doesn't generate anything)
80 | and *all possible combinations of `r` and `s` for any given composite
81 | language* and then tests all those combinations to see if the expression
82 | generated any of them.
83 | 
84 | This is, of course, inexcusably slow.  For any string of length `n`, the
85 | number of comparisons done, thanks mostly to the Sequence composite, is
86 | 2<sup>n-1</sup> operations.  For a string of 8 letters, that's 128
87 | different combinations of strings that have to be matched, and on my
88 | fairly modern laptop that takes a little longer than 20 seconds.
89 | Increase that to 15 letters and you'll be waiting almost an hour.
90 | 
91 | The entirety of the modern parsing research program has been to make
92 | this faster and easier to use.  There have been many attempts, and this
93 | project isn't meant to break new ground; instead, its goal is to take
94 | promising results from a variety of different academic research projects
95 | and explore whether there's anything new and interesting that we can
96 | exploit in a modern systems language like Rust or C++.
97 | 
98 | 


--------------------------------------------------------------------------------
/docs/A_Play_03.md:
--------------------------------------------------------------------------------
  1 | In the [last
  2 | post](https://elfsternberg.com/2019/01/23/a-play-on-regular-expressions-part-2/)
  3 | on "[A Play on Regular
  4 | Expressions](https://www-ps.informatik.uni-kiel.de/~sebf/pub/regexp-play.html),"
  5 | I showed how we go from a boolean regular expression to a "rigged" one;
  6 | one that uses an arbitrary data structure to extract data from the
  7 | process of recognizing regular expressions.  The data structure must
  8 | conform to a set of mathematical laws (the
  9 | [semiring](https://en.wikipedia.org/wiki/Semiring) laws), but that
 10 | simple requirement led us to some surprisingly robust results.
 11 | 
 12 | Now, the question is: Can we port this to Rust?
 13 | 
 14 | Easily.
 15 | 
 16 | The first thing to do, however, is to *not* implement a Semiring.  A
 17 | Semiring is a conceptual item, and in Rust it turns out that you can get
 18 | away without defining a Semiring as a trait; instead, it's a collection
 19 | of traits derived from the `num_traits` crate: `Zero, zero, One, one`;
 20 | the capitalized versions are the traits, and the lower case ones are the
 21 | implementations we have to provide.
 22 | 
 23 | I won't post the entire code here, but you can check it out in [Rigged
 24 | Kleene Regular Expressions in
 25 | Rust](https://github.com/elfsternberg/riggedregex/tree/master/rust/02_riggedregex).
 26 | Here are a few highlights:
 27 | 
 28 | The `accept()` function for the Haskell version looked like this: 
 29 | 
 30 |     acceptw :: Semiring s => Regw c s -> [c] -> s
 31 |     acceptw Epsw u     = if null u then one else zero
 32 |     acceptw (Symw f) u = case u of [c] -> f c;  _ -> zero
 33 |     acceptw (Altw p q) u = acceptw p u `add` acceptw q u
 34 |     acceptw (Seqw p q) u = sumr [ acceptw p u1 `mul` acceptw q u2 | (u1, u2) <- split u ]
 35 |     acceptw (Repw r)   u = sumr [ prodr [ acceptw r ui | ui <- ps ] | ps <- parts u ]
 36 | 
 37 | The `accept()` function in Rust looks almost the same:
 38 | 
 39 |     pub fn acceptw<S>(r: &Regw<S>, s: &[char]) -> S
 40 |         where S: Zero + One
 41 |     {
 42 |         match r {
 43 |             Regw::Eps => if s.is_empty() { one() } else { zero() },
 44 |             Regw::Sym(c) => if s.len() == 1 { c(s[0]) } else { zero() },
 45 |             Regw::Alt(r1, r2) => S::add(acceptw(&r1, s), acceptw(&r2, s)),
 46 |             Regw::Seq(r1, r2) => split(s)
 47 |                 .into_iter()
 48 |                 .map(|(u1, u2)| acceptw(r1, &u1) * acceptw(r2, &u2))
 49 |                 .fold(S::zero(), sumr),
 50 |             Regw::Rep(r) => parts(s)
 51 |                 .into_iter()
 52 |                 .map(|ps| ps.into_iter().map(|u| acceptw(r, &u)).fold(S::one(), prod))
 53 |                 .fold(S::zero(), sumr)
 54 |         }
 55 |     }
 56 | 
 57 | There's a bit more machinery here to support the `sum`-over and
 58 | `product`-over maps.  There's also the `where S: Zero + One` clause,
 59 | which tells us that our Semiring must be something that understands
 60 | those two notions and has implementations for them.
 61 | 
 62 | To restore our boolean version of our engine, we have to build a nominal
 63 | container that supports the various traits of our semiring.  To do that,
 64 | we need to implement the methods associated with `Zero`, `One`, `Mul`,
 65 | and `Add`, and explain what they mean to the datatype of our semiring.
 66 | The actual work is straightforward.
 67 | 
 68 |     pub struct Recognizer(bool);
 69 | 
 70 |     impl Zero for Recognizer {
 71 |         fn zero() -> Recognizer { Recognizer(false) }
 72 |         fn is_zero(&self) -> bool { !self.0 }
 73 |     }
 74 | 
 75 |     impl One for Recognizer {
 76 |         fn one() -> Recognizer { Recognizer(true) }
 77 |     }
 78 | 
 79 |     impl Mul for Recognizer {
 80 |         type Output = Recognizer;
 81 |         fn mul(self, rhs: Recognizer) -> Recognizer { Recognizer(self.0 && rhs.0) }
 82 |     }
 83 | 
 84 |     impl Add for Recognizer {
 85 |         type Output = Recognizer;
 86 |         fn add(self, rhs: Recognizer) -> Recognizer { Recognizer(self.0 || rhs.0) }
 87 |     }
 88 | 
 89 | Rust must also be told explicitly which Semiring to use before
 90 | processing.  Haskell instead infers the Semiring from the type of the
 91 | result you ask for and hooks up the machinery for you, which is not
 92 | surprising.  In Rust, you "lift" a straight expression to a rigged one
 93 | thusly:
 94 | 
 95 |     let rigged: Regw<Recognizer>  = rig(&evencs);
 96 | 
 97 | All in all, porting the Haskell to Rust was extremely straightforward.
 98 | The code looks remarkably similar, but for one detail.  In the Kleene
 99 | version of regular expressions we're emulating as closely as possible
100 | the "all possible partitions of our input string" implicit in the
101 | set-theoretic language of Kleene's 1956 paper.  That slows us down a
102 | lot, but in Haskell the code for doing it was extremely straightforward,
103 | with two simple functions to create all possible partitions for both
104 | the sequence and repetition options:
105 | 
106 |     split []     = [([], [])]
107 |     split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs]
108 |     parts []     = [[]]
109 |     parts [c]    = [[[c]]]
110 |     parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
111 | 
112 | In Rust, these two functions were 21 and 29 lines long, respectively.
113 | Rust demands that you pay attention to memory usage, and its rules
114 | require you to be very explicit about when you want memory, so that
115 | Rust knows exactly when you no longer want it and can release it back
116 | to the allocator.
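
For readers who want to see the shape of those two functions in Rust, here is a compact sketch.  It is not the version in the repository; the signatures and the choice to copy every piece into a fresh `Vec` are assumptions made to keep the example short.  It should produce the same set of results as the Haskell, though `parts` does not promise the same order.

```rust
/// Every way to cut `s` into a prefix and a suffix, including the
/// two cuts that leave one side empty.  Mirrors the Haskell `split`.
pub fn split<T: Clone>(s: &[T]) -> Vec<(Vec<T>, Vec<T>)> {
    (0..=s.len())
        .map(|i| (s[..i].to_vec(), s[i..].to_vec()))
        .collect()
}

/// Every way to cut `s` into a sequence of non-empty pieces.
/// Mirrors the Haskell `parts`.
pub fn parts<T: Clone>(s: &[T]) -> Vec<Vec<Vec<T>>> {
    // The empty input has exactly one partition: the one with no pieces.
    if s.is_empty() {
        return vec![vec![]];
    }
    let mut out = Vec::new();
    // Choose the first piece, then partition whatever is left over.
    for i in 1..=s.len() {
        for mut rest in parts(&s[i..]) {
            rest.insert(0, s[..i].to_vec());
            out.push(rest);
        }
    }
    out
}
```

For a three-letter input, `split` yields four prefix/suffix pairs and `parts` yields four partitions; the partition count doubles with every extra letter, which is where the exponential cost comes from.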
117 | 
118 | Rust's syntax and support are amazing, and the way Haskell can be ported
119 | to Rust with little to no loss of fidelity makes me happy to work in
120 | both.
121 | 


--------------------------------------------------------------------------------
/docs/DFA1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/elfsternberg/riggedregex/065a5741807071a55b0e22c0c1ed8a1dfcb1abec/docs/DFA1.png


--------------------------------------------------------------------------------
/docs/NFA1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/elfsternberg/riggedregex/065a5741807071a55b0e22c0c1ed8a1dfcb1abec/docs/NFA1.png


--------------------------------------------------------------------------------
/docs/notes.md:
--------------------------------------------------------------------------------
 1 | Owens and Reppy did a much better job than I originally thought. They
 2 | use the tilde to mean "is recognized by," as in "r ~ u" means "`r`
 3 | *recognizes* the string `u`".
 4 | 
 5 | Following on the nullability issue, r ~ ε ⇔ ν(r) = ε, r ~ aw ⇔ δ(a)r ~ w
 6 | (`r` recognizes `aw` if the derivative of r with respect to `a`
 7 | recognizes `w`).
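
A sketch of those two rules in Rust, to check my reading.  The `Re` type and the names `nullable`, `derive`, and `matches` are my own for this illustration, not Owens and Reppy's, and ν is rendered as a `bool` rather than as ε or ∅.  No simplification or equivalence checking is done here, so the residual expression grows with each step.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Re {
    Empty, // ∅, matches nothing
    Eps,   // ε, matches only the empty string
    Sym(char),
    Alt(Box<Re>, Box<Re>),
    Seq(Box<Re>, Box<Re>),
    Rep(Box<Re>),
}

/// ν(r): true when `r` recognizes the empty string.
pub fn nullable(r: &Re) -> bool {
    match r {
        Re::Empty | Re::Sym(_) => false,
        Re::Eps | Re::Rep(_) => true,
        Re::Alt(p, q) => nullable(p) || nullable(q),
        Re::Seq(p, q) => nullable(p) && nullable(q),
    }
}

/// δ(a)r: an expression for whatever may follow `a` in a word of `r`.
pub fn derive(a: char, r: &Re) -> Re {
    match r {
        Re::Empty | Re::Eps => Re::Empty,
        Re::Sym(c) => {
            if *c == a {
                Re::Eps
            } else {
                Re::Empty
            }
        }
        Re::Alt(p, q) => Re::Alt(Box::new(derive(a, p)), Box::new(derive(a, q))),
        Re::Seq(p, q) => {
            let left = Re::Seq(Box::new(derive(a, p)), q.clone());
            if nullable(p) {
                // `p` may match nothing, so `a` may also start `q`.
                Re::Alt(Box::new(left), Box::new(derive(a, q)))
            } else {
                left
            }
        }
        Re::Rep(p) => Re::Seq(Box::new(derive(a, p)), Box::new(Re::Rep(p.clone()))),
    }
}

/// r ~ w: take the derivative for each symbol, then test nullability.
pub fn matches(r: &Re, word: &str) -> bool {
    let mut residual = r.clone();
    for a in word.chars() {
        residual = derive(a, &residual);
    }
    nullable(&residual)
}
```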
 8 | 
 9 | r ≡ s (r is equivalent to s) if 𝓛⟦r⟧ = 𝓛⟦s⟧.  Note that this is
10 | "equivalence" in the set-theoretic sense, that of a binary equivalence
11 | relation.  That is, if the elements of some set S have an equivalence
12 | notion, then the set S can be split into *equivalence classes*. 
13 | 
14 | 1. At each step we have a residual regular expression `r` for the
15 |    residual string `s`
16 |    
17 | 2. Instead of computing the derivative on the fly, we precompute the
18 |    derivative of `r` for each symbol in our alphabet `Σ`, thereby
19 |    constructing a DFA for the language in `r`.
20 |    
21 | 3. Computing equivalence can be expensive
22 | 
23 | 4. It is not practical to iterate over every Unicode codepoint for
24 |    each state.
25 |    
26 | 5. A scanner-generator takes a collection of REs, not just one.
27 | 
28 | Owens & Reppy introduce a notion of *weak equivalence*, which is a set
29 | of rules for harmonizing some regular expression equivalents.  These
30 | look a lot like some of the performance optimizations found in Might &
31 | Adams.
32 | 
33 | They define a *class*, **S**, where **S** ⊆ Σ.  **S** covers both the
34 | empty set and the single character set, as well as a multi-character
35 | *class*.
36 | 
37 | They then add equivalence expressions: **R** + **S** ≈ **T** where 
38 | T = R ∪ S. (Note that this works for *recognition*.  But what about more
39 | complex operations?)
40 | 
41 | We say that a and b are equivalent in r only if δ(a)r ≡ δ(b)r.
42 | 
43 | r = a + b · a + c
44 | 
45 | (Do we read this "a OR ba OR c" or "(a or b)(a or c)".  If we read it
46 | the first way, then this makes sense: the equivalence classes for r
47 | produce three possible derivatives: {a, c}, transition to `ε`; {b},
48 | transition to `a`; or Σ\{a,b,c}, which is the alphabet that excludes a,
49 | b, or c, and transitions to `⊘`.)
50 | 
51 | All right, having gotten that out of the way, we say that i ≅ᵣ j (the
52 | derivative class of r(i) is equivalent to the derivative class of r(j))
53 | if `δᵢr ≡ δⱼr`.
54 | 
55 | fun goto q (S, (Q, δ)) =
56 | 	let c ∈ S
57 | 	let q_c = ∂_c q
58 | 	in
59 | 		if ∃q′ ∈ Q such that q′ ≈ q_c
60 | 			then (Q, δ ∪ {(q, S) ↦ q′})
61 | 			else
62 | 				let Q′ = Q ∪ {q_c}
63 | 				let δ′ = δ ∪ {(q, S) ↦ q_c}
64 | 				in explore (Q′, δ′, q_c)
65 | 
66 | let explore (Q, δ, q) = fold (goto q) (Q, δ) (C(q))
67 | 
68 | fun mkDFA r =
69 | 	let q_0 = ∂_ε r
70 | 	let (Q, δ) = explore ({q_0}, {}, q_0)
71 | 	let F = {q | q ∈ Q and ν(q) = ε}
72 | 	in ⟨Q, q_0, F, δ⟩
73 | 
74 | 
75 | 
76 | 
77 | 
78 | 


--------------------------------------------------------------------------------
/docs/paper.css:
--------------------------------------------------------------------------------
  1 | /*
  2 |  * I add this to html files generated with pandoc.
  3 |  */
  4 | 
  5 | html {
  6 |   font-size: 100%;
  7 |   overflow-y: scroll;
  8 |   -webkit-text-size-adjust: 100%;
  9 |   -ms-text-size-adjust: 100%;
 10 | }
 11 | 
 12 | body {
 13 |   color: #444;
 14 |   font-family: "Helvetica Neue", "Helvetica", "Arial", sans-serif;
 15 |   font-size: 14px;
 16 |   line-height: 1.7;
 17 |   padding: 1em;
 18 |   margin: auto;
 19 |   max-width: 42em;
 20 |   background: #fefefe;
 21 | }
 22 | 
 23 | a {
 24 |   color: #0645ad;
 25 |   text-decoration: none;
 26 | }
 27 | 
 28 | a:visited {
 29 |   color: #0b0080;
 30 | }
 31 | 
 32 | a:hover {
 33 |   color: #06e;
 34 | }
 35 | 
 36 | a:active {
 37 |   color: #faa700;
 38 | }
 39 | 
 40 | a:focus {
 41 |   outline: thin dotted;
 42 | }
 43 | 
 44 | *::-moz-selection {
 45 |   background: rgba(255, 255, 0, 0.3);
 46 |   color: #000;
 47 | }
 48 | 
 49 | *::selection {
 50 |   background: rgba(255, 255, 0, 0.3);
 51 |   color: #000;
 52 | }
 53 | 
 54 | a::-moz-selection {
 55 |   background: rgba(255, 255, 0, 0.3);
 56 |   color: #0645ad;
 57 | }
 58 | 
 59 | a::selection {
 60 |   background: rgba(255, 255, 0, 0.3);
 61 |   color: #0645ad;
 62 | }
 63 | 
 64 | p {
 65 |   margin: 1em 0;
 66 | }
 67 | 
 68 | img {
 69 |   max-width: 100%;
 70 | }
 71 | 
 72 | h1, h2, h3, h4, h5, h6 {
 73 |   color: #111;
 74 |   line-height: 125%;
 75 |   margin-top: 2em;
 76 |   font-weight: normal;
 77 | }
 78 | 
 79 | h4, h5, h6 {
 80 |   font-weight: bold;
 81 | }
 82 | 
 83 | h1 {
 84 |   font-size: 2.5em;
 85 | }
 86 | 
 87 | h2 {
 88 |   font-size: 2em;
 89 | }
 90 | 
 91 | h3 {
 92 |   font-size: 1.5em;
 93 | }
 94 | 
 95 | h4 {
 96 |   font-size: 1.2em;
 97 | }
 98 | 
 99 | h5 {
100 |   font-size: 1em;
101 | }
102 | 
103 | h6 {
104 |   font-size: 0.9em;
105 | }
106 | 
107 | blockquote {
108 |   color: #666666;
109 |   margin: 0;
110 |   padding-left: 3em;
111 |   border-left: 0.5em #EEE solid;
112 | }
113 | 
114 | hr {
115 |   display: block;
116 |   height: 2px;
117 |   border: 0;
118 |   border-top: 1px solid #aaa;
119 |   border-bottom: 1px solid #eee;
120 |   margin: 1em 0;
121 |   padding: 0;
122 | }
123 | 
124 | pre, code, kbd, samp {
125 |   color: #000;
126 |   font-family: monospace, monospace;
127 |   _font-family: 'courier new', monospace;
128 |   font-size: 0.98em;
129 | }
130 | 
131 | pre {
132 |   white-space: pre;
133 |   white-space: pre-wrap;
134 |   word-wrap: break-word;
135 | }
136 | 
137 | b, strong {
138 |   font-weight: bold;
139 | }
140 | 
141 | dfn {
142 |   font-style: italic;
143 | }
144 | 
145 | ins {
146 |   background: #ff9;
147 |   color: #000;
148 |   text-decoration: none;
149 | }
150 | 
151 | mark {
152 |   background: #ff0;
153 |   color: #000;
154 |   font-style: italic;
155 |   font-weight: bold;
156 | }
157 | 
158 | sub, sup {
159 |   font-size: 75%;
160 |   line-height: 0;
161 |   position: relative;
162 |   vertical-align: baseline;
163 | }
164 | 
165 | sup {
166 |   top: -0.5em;
167 | }
168 | 
169 | sub {
170 |   bottom: -0.25em;
171 | }
172 | 
173 | ul, ol {
174 |   margin: 1em 0;
175 |   padding: 0 0 0 2em;
176 | }
177 | 
178 | li p:last-child {
179 |   margin-bottom: 0;
180 | }
181 | 
182 | ul ul, ol ol {
183 |   margin: .3em 0;
184 | }
185 | 
186 | dl {
187 |   margin-bottom: 1em;
188 | }
189 | 
190 | dt {
191 |   font-weight: bold;
192 |   margin-bottom: .8em;
193 | }
194 | 
195 | dd {
196 |   margin: 0 0 .8em 2em;
197 | }
198 | 
199 | dd:last-child {
200 |   margin-bottom: 0;
201 | }
202 | 
203 | img {
204 |   border: 0;
205 |   -ms-interpolation-mode: bicubic;
206 |   vertical-align: middle;
207 | }
208 | 
209 | figure {
210 |   display: block;
211 |   text-align: center;
212 |   margin: 1em 0;
213 | }
214 | 
215 | figure img {
216 |   border: none;
217 |   margin: 0 auto;
218 | }
219 | 
220 | figcaption {
221 |   font-size: 0.8em;
222 |   font-style: italic;
223 |   margin: 0 0 .8em;
224 | }
225 | 
226 | table {
227 |   margin-bottom: 2em;
228 |   border-bottom: 1px solid #ddd;
229 |   border-right: 1px solid #ddd;
230 |   border-spacing: 0;
231 |   border-collapse: collapse;
232 | }
233 | 
234 | table th {
235 |   padding: .2em 1em;
236 |   background-color: #eee;
237 |   border-top: 1px solid #ddd;
238 |   border-left: 1px solid #ddd;
239 | }
240 | 
241 | table td {
242 |   padding: .2em 1em;
243 |   border-top: 1px solid #ddd;
244 |   border-left: 1px solid #ddd;
245 |   vertical-align: top;
246 | }
247 | 
248 | .author {
249 |   font-size: 1.2em;
250 |   text-align: center;
251 | }
252 | 
253 | @media only screen and (min-width: 480px) {
254 |   body {
255 |     font-size: 14px;
256 |   }
257 | }
258 | @media only screen and (min-width: 768px) {
259 |   body {
260 |     font-size: 16px;
261 |   }
262 | }
263 | @media print {
264 |   * {
265 |     background: transparent !important;
266 |     color: black !important;
267 |     filter: none !important;
268 |     -ms-filter: none !important;
269 |   }
270 | 
271 |   body {
272 |     font-size: 12pt;
273 |     max-width: 100%;
274 |   }
275 | 
276 |   a, a:visited {
277 |     text-decoration: underline;
278 |   }
279 | 
280 |   hr {
281 |     height: 1px;
282 |     border: 0;
283 |     border-bottom: 1px solid black;
284 |   }
285 | 
286 |   a[href]:after {
287 |     content: " (" attr(href) ")";
288 |   }
289 | 
290 |   abbr[title]:after {
291 |     content: " (" attr(title) ")";
292 |   }
293 | 
294 |   .ir a:after, a[href^="javascript:"]:after, a[href^="#"]:after {
295 |     content: "";
296 |   }
297 | 
298 |   pre, blockquote {
299 |     border: 1px solid #999;
300 |     padding-right: 1em;
301 |     page-break-inside: avoid;
302 |   }
303 | 
304 |   tr, img {
305 |     page-break-inside: avoid;
306 |   }
307 | 
308 |   img {
309 |     max-width: 100% !important;
310 |   }
311 | 
312 |   @page :left {
313 |     margin: 15mm 20mm 15mm 10mm;
314 | }
315 | 
316 |   @page :right {
317 |     margin: 15mm 10mm 15mm 20mm;
318 | }
319 | 
320 |   p, h2, h3 {
321 |     orphans: 3;
322 |     widows: 3;
323 |   }
324 | 
325 |   h2, h3 {
326 |     page-break-after: avoid;
327 |   }
328 | }
329 | 


--------------------------------------------------------------------------------
/docs/summary.md:
--------------------------------------------------------------------------------
1 | L[[∅]] = ∅
2 | L[[ε]] = {ε}
3 | L[[a]] = {a}
4 | L[[r · s]] = {u · v | u ∈ L[[r]] and v ∈ L[[s]]}
5 | L[[r | s]] = L[[r]] ∪ L[[s]]
6 | L[[r*]] = {ε} ∪ L[[r · r*]]
7 | 


--------------------------------------------------------------------------------
/haskell/01_SimpleRegex/LICENSE:
--------------------------------------------------------------------------------
 1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer
 2 | 
 3 | All rights reserved.
 4 | 
 5 | Redistribution and use in source and binary forms, with or without
 6 | modification, are permitted provided that the following conditions are
 7 | met:
 8 | 
 9 |  1. Redistributions of source code must retain the above copyright
10 |     notice, this list of conditions and the following disclaimer.
11 | 
12 |  2. Redistributions in binary form must reproduce the above copyright
13 |     notice, this list of conditions and the following disclaimer in
14 |     the documentation and/or other materials provided with the
15 |     distribution.
16 | 
17 |  3. Neither the name of the author nor the names of his contributors
18 |     may be used to endorse or promote products derived from this
19 |     software without specific prior written permission.
20 | 
21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
24 | A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHORS OR
25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
32 | 
33 | 


--------------------------------------------------------------------------------
/haskell/01_SimpleRegex/README.md:
--------------------------------------------------------------------------------
 1 | # Kleene Regular Expressions, in Haskell
 2 | 
 3 | This is literally the definition of a simple string recognizing regular
 4 | expression in Haskell.  It consists of the `Reg` datatype encompassing
 5 | the five standard operations of regular expressions and an `accept`
 6 | function that takes the expression and a string and returns a Boolean
 7 | yes/no on recognition or failure. It is a direct implementation of
 8 | Kleene's algebra:
 9 | 
10 |     L[[ε]] = {ε}
11 |     L[[a]] = {a}
12 |     L[[r · s]] = {u · v | u ∈ L[[r]] and v ∈ L[[s]]}
13 |     L[[r | s]] = L[[r]] ∪ L[[s]]
14 |     L[[r*]] = {ε} ∪ L[[r · r*]]
15 |     
16 | Those equations are for: recognizing an empty string, recognizing a
17 | letter, recognizing two expressions in sequence, recognizing two
18 | expression alternatives, and the repetition operation.
19 | 
20 | The `accept` function has two helper functions that split the string,
21 | and all substrings, into all possible substrings such that *every
22 | possible combination* of string and expression are tested, and if the
23 | resulting tree of `and`s (from Sequencing and Repetition) and `or`s
24 | (from Alternation) has at least one complete collection of `True` from
25 | top to bottom then the function returns true.
26 | 
27 | This generation and comparison of substrings is grossly inefficient; a
28 | string of eight 'a's with `a*` will take 30 seconds on a modern laptop;
29 | increase that to twelve and you'll be waiting about an hour.  The cost
30 | is `2^(n - 1)`, where `n` is the length of the string; this is a
31 | consequence of the sequencing operation.  Sequences aren't just about
32 | letters: they could be about anything, including repetition (which
33 | itself creates new sequences) and other sequences, and the cost of
34 | examining every possible combination of sequencing creates this
35 | exponential cost.
36 | 
37 | It is quite amazing, though, to actually *see* a straightforward
38 | implementation of Kleene's Regular Expressions in code.
39 | 
40 | 
41 | 
42 | 


--------------------------------------------------------------------------------
/haskell/01_SimpleRegex/SimpleRegex.cabal:
--------------------------------------------------------------------------------
 1 | cabal-version: 1.12
 2 | 
 3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
 4 | --
 5 | -- see: https://github.com/sol/hpack
 6 | --
 7 | -- hash: 2b06081e19cfbe96fa9a2d9a12695410d10cc0b73d3fe0c09d77986d2f101773
 8 | 
 9 | name:           SimpleRegex
10 | version:        0.1.0.0
11 | category:       Regex
12 | homepage:       https://github.com/elfsternberg/RegexWeightedPearl#readme
13 | author:         Elf M. Sternberg
14 | maintainer:     elf.sternberg@gmail.com
15 | copyright:      Copyright ⓒ 2019 Elf M. Sternberg
16 | license:        BSD3
17 | license-file:   LICENSE
18 | build-type:     Simple
19 | extra-source-files:
20 |     README.md
21 | 
22 | library
23 |   exposed-modules:
24 |       SimpleRegex
25 |   other-modules:
26 |       Paths_SimpleRegex
27 |   hs-source-dirs:
28 |       src
29 |   ghc-options: -Wall
30 |   build-depends:
31 |       base
32 |   default-language: Haskell2010
33 | 
34 | test-suite test
35 |   type: exitcode-stdio-1.0
36 |   main-is: Tests.hs
37 |   other-modules:
38 |       Paths_SimpleRegex
39 |   hs-source-dirs:
40 |       test
41 |   build-depends:
42 |       SimpleRegex
43 |     , base
44 |     , hspec
45 |   default-language: Haskell2010
46 | 


--------------------------------------------------------------------------------
/haskell/01_SimpleRegex/package.yaml:
--------------------------------------------------------------------------------
 1 | name: SimpleRegex
 2 | version:             0.1.0.0
 3 | 
 4 | homepage:            https://github.com/elfsternberg/riggedregex#readme
 5 | license:             BSD3
 6 | license-file:        LICENSE
 7 | author:              Elf M. Sternberg
 8 | maintainer:          elf.sternberg@gmail.com
 9 | copyright:           Copyright ⓒ 2019 Elf M. Sternberg
10 | category:            Regex
11 | build-type:          Simple
12 | extra-source-files:  README.md
13 | 
14 | dependencies:
15 |   - base
16 | 
17 | library:
18 |   exposed-modules: SimpleRegex
19 |   ghc-options: -Wall
20 |   source-dirs: src
21 | 
22 | tests:
23 |   test:
24 |     main: Tests.hs
25 |     source-dirs: test
26 |     dependencies:
27 |       - SimpleRegex
28 |       - hspec
29 |   
30 | 
31 | 


--------------------------------------------------------------------------------
/haskell/01_SimpleRegex/src/SimpleRegex.hs:
--------------------------------------------------------------------------------
 1 | {-# LANGUAGE LambdaCase #-}
 2 | 
 3 | module SimpleRegex ( accept, Reg (..) ) where
 4 | 
 5 | data Reg = 
 6 |     Eps         -- Epsilon
 7 |   | Sym Char    -- Character
 8 |   | Alt Reg Reg -- Alternation
 9 |   | Seq Reg Reg -- Sequence
10 |   | Rep Reg     -- R*
11 | 
12 | accept :: Reg -> String -> Bool
13 | -- Epsilon
14 | accept Eps u       = null u
15 | -- Accept if the character offered matches the character constructed
16 | accept (Sym c) u   = u == [c]
17 | -- Constructed of two other expressions, accept if either one does.
18 | accept (Alt p q) u = accept p u || accept q u
19 | -- Constructed of two other expressions, accept if p accepts some part
20 | -- of u and q accepts the rest, where u is split arbitrarily
21 | accept (Seq p q) u = or [accept p u1 && accept q u2 | (u1, u2) <- split u]
22 | -- Accept if there is at least one way to cut u into non-empty
23 | -- pieces such that r accepts every piece.  The empty string is
24 | -- always accepted, because its only partition has no pieces.
25 | accept (Rep r) u   = or [and [accept r ui | ui <- ps] | ps <- parts u]
26 | 
27 | -- Generate every way of splitting the string offered into a prefix
28 | -- and a suffix, including the splits where either side is empty.
29 | split :: [a] -> [([a], [a])]
30 | split []     = [([], [])]
31 | split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs]
32 | 
33 | -- Generate every partition of the input into consecutive, non-empty
34 | -- pieces.  The empty input has exactly one partition: no pieces.
35 | parts :: [a] -> [[[a]]]
36 | parts []     = [[]]
37 | parts [c]    = [[[c]]]
38 | parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
39 | 
40 | 


--------------------------------------------------------------------------------
/haskell/01_SimpleRegex/stack.yaml:
--------------------------------------------------------------------------------
 1 | # This file was automatically generated by 'stack init'
 2 | #
 3 | # Some commonly used options have been documented as comments in this file.
 4 | # For advanced use and comprehensive documentation of the format, please see:
 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
 6 | 
 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
 8 | # A snapshot resolver dictates the compiler version and the set of packages
 9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.4
21 | 
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | #    git: https://github.com/commercialhaskell/stack.git
30 | #    commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | #  subdirs:
33 | #  - auto-update
34 | #  - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 | 
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 | 
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 | 
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 | 


--------------------------------------------------------------------------------
/haskell/01_SimpleRegex/test/Tests.hs:
--------------------------------------------------------------------------------
  1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
  2 | {-# LANGUAGE RecordWildCards #-}
  3 | import           Data.Foldable     (for_)
  4 | import Test.Hspec        (Spec, it, shouldBe)
  5 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
  6 | import SimpleRegex (Reg (..), accept)
  7 | 
  8 | main :: IO ()
  9 | main = hspecWith defaultConfig {configFastFail = True} specs
 10 | 
 11 | specs :: Spec
 12 | specs = do
 13 | 
 14 |      let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) )
 15 |      let onec = Seq nocs (Sym 'c')
 16 |      let evencs = Seq ( Rep ( Seq onec onec ) ) nocs
 17 | 
 18 |      let as = Alt (Sym 'a') (Rep (Sym 'a'))                  
 19 |      let bs = Alt (Sym 'b') (Rep (Sym 'b'))
 20 | 
 21 |      it "simple expression" $
 22 |         accept evencs "acc" `shouldBe` True
 23 | 
 24 |      for_ cases test
 25 |         where
 26 |           test Case {..} = it description assertion
 27 |               where
 28 |                 assertion = accept regex sample `shouldBe` result
 29 | 
 30 | 
 31 | data Case = Case
 32 |   { description :: String
 33 |   , regex       :: Reg
 34 |   , sample      :: String
 35 |   , result      :: Bool
 36 |   }
 37 | 
 38 | cases :: [Case]
 39 | cases =
 40 |   [ Case {description = "empty", regex = Eps, sample = "", result = True}
 41 |   , Case {description = "char", regex = Sym 'a', sample = "a", result = True}
 42 |   , Case
 43 |       {description = "not char", regex = Sym 'a', sample = "b", result = False}
 44 |   , Case
 45 |       { description = "char vs empty"
 46 |       , regex = Sym 'a'
 47 |       , sample = ""
 48 |       , result = False
 49 |       }
 50 |   , Case
 51 |       { description = "left alt"
 52 |       , regex = Alt (Sym 'a') (Sym 'b')
 53 |       , sample = "a"
 54 |       , result = True
 55 |       }
 56 |   , Case
 57 |       { description = "right alt"
 58 |       , regex = Alt (Sym 'a') (Sym 'b')
 59 |       , sample = "b"
 60 |       , result = True
 61 |       }
 62 |   , Case
 63 |       { description = "neither alt"
 64 |       , regex = Alt (Sym 'a') (Sym 'b')
 65 |       , sample = "c"
 66 |       , result = False
 67 |       }
 68 |   , Case
 69 |       { description = "empty alt"
 70 |       , regex = Alt (Sym 'a') (Sym 'b')
 71 |       , sample = ""
 72 |       , result = False
 73 |       }
 74 |   , Case
 75 |       { description = "empty rep"
 76 |       , regex = Rep (Sym 'a')
 77 |       , sample = ""
 78 |       , result = True
 79 |       }
 80 |   , Case
 81 |       { description = "one rep"
 82 |       , regex = Rep (Sym 'a')
 83 |       , sample = "a"
 84 |       , result = True
 85 |       }
 86 |   , Case
 87 |       { description = "multiple rep"
 88 |       , regex = Rep (Sym 'a')
 89 |       , sample = "aaaaaaaaa"
 90 |       , result = True
 91 |       }
 92 |   , Case
 93 |       { description = "multiple rep with failure"
 94 |       , regex = Rep (Sym 'a')
 95 |       , sample = "aaaaaaaaab"
 96 |       , result = False
 97 |       }
 98 |   , Case
 99 |       { description = "sequence"
100 |       , regex = Seq (Sym 'a') (Sym 'b')
101 |       , sample = "ab"
102 |       , result = True
103 |       }
104 |   , Case
105 |       { description = "sequence with empty"
106 |       , regex = Seq (Sym 'a') (Sym 'b')
107 |       , sample = ""
108 |       , result = False
109 |       }
110 |   , Case
111 |       { description = "bad short sequence"
112 |       , regex = Seq (Sym 'a') (Sym 'b')
113 |       , sample = "a"
114 |       , result = False
115 |       }
116 |   , Case
117 |       { description = "bad long sequence"
118 |       , regex = Seq (Sym 'a') (Sym 'b')
119 |       , sample = "abc"
120 |       , result = False
121 |       }
122 |   ]
123 |           
124 |   
125 | 


--------------------------------------------------------------------------------
/haskell/02_RiggedRegex/LICENSE:
--------------------------------------------------------------------------------
 1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer
 2 | 
 3 | All rights reserved.
 4 | 
 5 | Redistribution and use in source and binary forms, with or without
 6 | modification, are permitted provided that the following conditions are
 7 | met:
 8 | 
 9 |  1. Redistributions of source code must retain the above copyright
10 |     notice, this list of conditions and the following disclaimer.
11 | 
12 |  2. Redistributions in binary form must reproduce the above copyright
13 |     notice, this list of conditions and the following disclaimer in
14 |     the documentation and/or other materials provided with the
15 |     distribution.
16 | 
17 |  3. Neither the name of the author nor the names of his contributors
18 |     may be used to endorse or promote products derived from this
19 |     software without specific prior written permission.
20 | 
21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
24 | A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHORS OR
25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
32 | 
33 | 


--------------------------------------------------------------------------------
/haskell/02_RiggedRegex/README.md:
--------------------------------------------------------------------------------
 1 | # Kleene Regular Expressions with Rigging, in Haskell
 2 | 
 3 | This program builds on the simple regular expressions in Version 01,
 4 | providing a new definition of a regular expression `Regw` that takes two
 5 | types, a source type and an output type.  The output type must be a
 6 | [*Semiring*](https://en.wikipedia.org/wiki/Semiring).
 7 | 
 8 | A semiring is a set R equipped with two binary operations + and ⋅, and
 9 | two constants identified as 0 and 1.  By providing a semiring to the
10 | regular expression, we change the return type of the regular expression
11 | to any set that can obey the semiring laws.  There's a surprising amount
12 | of stuff you can do with the semiring laws.
13 | 
14 | In this example, I've provided a function, `rigged`, that takes a
15 | simple regular expression from Version 01 and lifts it, constructor
16 | by constructor, into the `Regw` datatype.
17 | Instead of the boolean mathematics of Version 01, we use the semiring
18 | operations `add` and `mul` to represent the sum and product operations on
19 | the return type.  We then define the "symbol accepted" test to return
20 | either the `zero` or `one` value of the semiring.
21 | 
22 | I've provided two semirings: One of (0, 1, +, *, Integers), and one of
23 | (False, True, ||, &&, Booleans).  Both work well.
24 | 
25 | The `accept expression string` function of the original still works, but
26 | if you say `acceptw (rigged expression) string :: Int`, Haskell will *go
27 | find* a Semiring that allows this function to work and return the number
28 | of distinct ways the expression matches the string.  If you ask for Bool
29 | as a return type, it will behave as the original.
30 | 
31 | Sometimes, Haskell is bleeding magical.
32 | 
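As a rough illustration of asking for `Int` or `Bool` as described above, here is a condensed, standalone sketch. It copies the `Semiring` class, `Regw`, `sym`, `split`, `parts`, and `acceptw` from `src/RiggedRegex.hs` in abbreviated form so that it compiles on its own. The names `aOrAs`, `matchesA`, and `waysToMatchA` are made up for this example and are not part of the module.

```haskell
-- Condensed copy of the ideas in src/RiggedRegex.hs, kept standalone.
class Semiring s where
  zero, one :: s
  add, mul  :: s -> s -> s

instance Semiring Bool where
  zero = False
  one  = True
  add  = (||)
  mul  = (&&)

instance Semiring Int where
  zero = 0
  one  = 1
  add  = (+)
  mul  = (*)

data Regw c s
  = Epsw                       -- Epsilon
  | Symw (c -> s)              -- Character, mapped to a weight
  | Altw (Regw c s) (Regw c s) -- Alternation
  | Seqw (Regw c s) (Regw c s) -- Sequence
  | Repw (Regw c s)            -- R*

sym :: Semiring s => Char -> Regw Char s
sym c = Symw (\b -> if b == c then one else zero)

-- Every prefix/suffix split of the input.
split :: [a] -> [([a], [a])]
split []     = [([], [])]
split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs]

-- Every partition of the input into non-empty pieces.
parts :: [a] -> [[[a]]]
parts []     = [[]]
parts [c]    = [[[c]]]
parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]

acceptw :: Semiring s => Regw c s -> [c] -> s
acceptw Epsw       u = if null u then one else zero
acceptw (Symw f)   u = case u of
                         [c] -> f c
                         _   -> zero
acceptw (Altw p q) u = acceptw p u `add` acceptw q u
acceptw (Seqw p q) u =
  foldr add zero [acceptw p u1 `mul` acceptw q u2 | (u1, u2) <- split u]
acceptw (Repw r)   u =
  foldr add zero [foldr mul one [acceptw r ui | ui <- ps] | ps <- parts u]

-- Hypothetical example expression: a | a*
aOrAs :: Semiring s => Regw Char s
aOrAs = Altw (sym 'a') (Repw (sym 'a'))

-- The requested result type alone selects the semiring instance.
matchesA :: Bool
matchesA = acceptw aOrAs "a"

waysToMatchA :: Int
waysToMatchA = acceptw aOrAs "a"
```

Loaded into GHCi, `matchesA` evaluates to `True` and `waysToMatchA` to `2`, because `a | a*` matches `"a"` once through each branch. This is the same case the test suite checks as "lifted expression counter two".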
33 | 


--------------------------------------------------------------------------------
/haskell/02_RiggedRegex/RiggedRegex.cabal:
--------------------------------------------------------------------------------
 1 | cabal-version: 1.12
 2 | 
 3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
 4 | --
 5 | -- see: https://github.com/sol/hpack
 6 | --
 7 | -- hash: a27275fb9824bb59f3ba73db8613283f0ce03f9ab6d1053ec40e17977c04aa1d
 8 | 
 9 | name:           RiggedRegex
10 | version:        0.1.0.0
11 | category:       Regex
12 | homepage:       https://github.com/elfsternberg/riggedregex#readme
13 | author:         Elf M. Sternberg
14 | maintainer:     elf.sternberg@gmail.com
15 | copyright:      Copyright ⓒ 2019 Elf M. Sternberg
16 | license:        BSD3
17 | license-file:   LICENSE
18 | build-type:     Simple
19 | extra-source-files:
20 |     README.md
21 | 
22 | library
23 |   exposed-modules:
24 |       RiggedRegex
25 |   other-modules:
26 |       Paths_RiggedRegex
27 |   hs-source-dirs:
28 |       src
29 |   ghc-options: -Wall
30 |   build-depends:
31 |       base
32 |   default-language: Haskell2010
33 | 
34 | test-suite test
35 |   type: exitcode-stdio-1.0
36 |   main-is: Tests.hs
37 |   other-modules:
38 |       Paths_RiggedRegex
39 |   hs-source-dirs:
40 |       test
41 |   build-depends:
42 |       RiggedRegex
43 |     , base
44 |     , hspec
45 |   default-language: Haskell2010
46 | 


--------------------------------------------------------------------------------
/haskell/02_RiggedRegex/package.yaml:
--------------------------------------------------------------------------------
 1 | name: RiggedRegex
 2 | version:             0.1.0.0
 3 | 
 4 | homepage:            https://github.com/elfsternberg/riggedregex#readme
 5 | license:             BSD3
 6 | license-file:        LICENSE
 7 | author:              Elf M. Sternberg
 8 | maintainer:          elf.sternberg@gmail.com
 9 | copyright:           Copyright ⓒ 2019 Elf M. Sternberg
10 | category:            Regex
11 | build-type:          Simple
12 | extra-source-files:  README.md
13 | 
14 | dependencies:
15 |   - base
16 | 
17 | library:
18 |   exposed-modules: RiggedRegex
19 |   ghc-options: -Wall
20 |   source-dirs: src
21 | 
22 | tests:
23 |   test:
24 |     main: Tests.hs
25 |     source-dirs: test
26 |     dependencies:
27 |       - RiggedRegex
28 |       - hspec
29 |   
30 | 
31 | 


--------------------------------------------------------------------------------
/haskell/02_RiggedRegex/src/RiggedRegex.hs:
--------------------------------------------------------------------------------
  1 | {-# LANGUAGE LambdaCase #-}
  2 | 
  3 | module RiggedRegex ( accept, acceptw, Reg (..), Regw (..), rigged ) where
  4 | 
  5 | data Reg = 
  6 |     Eps         -- Epsilon
  7 |   | Sym Char    -- Character
  8 |   | Alt Reg Reg -- Alternation
  9 |   | Seq Reg Reg -- Sequence
 10 |   | Rep Reg     -- R*
 11 | 
 12 | accept :: Reg -> String -> Bool
 13 | -- Epsilon
 14 | accept Eps u       = null u
 15 | -- Accept if the character offered matches the character constructed
 16 | accept (Sym c) u   = u == [c]
 17 | -- Constructed of two other expressions, accept if either one does.
 18 | accept (Alt p q) u = accept p u || accept q u
 19 | -- Constructed of two other expressions, accept if p accepts some part
 20 | -- of u and q accepts the rest, where u is split arbitrarily
 21 | accept (Seq p q) u = or [accept p u1 && accept q u2 | (u1, u2) <- split u]
 22 | -- Accept if there is at least one way to cut u into non-empty
 23 | -- pieces such that r accepts every piece.  The empty string is
 24 | -- always accepted, because its only partition has no pieces.
 25 | accept (Rep r) u   = or [and [accept r ui | ui <- ps] | ps <- parts u]
 26 | 
 27 | -- Generate every way of splitting the string offered into a prefix
 28 | -- and a suffix, including the splits where either side is empty.
 29 | split :: [a] -> [([a], [a])]
 30 | split []     = [([], [])]
 31 | split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs]
 32 | 
 33 | -- Generate every partition of the input into consecutive, non-empty
 34 | -- pieces.  The empty input has exactly one partition: no pieces.
 35 | parts :: [a] -> [[[a]]]
 36 | parts []     = [[]]
 37 | parts [c]    = [[[c]]]
 38 | parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
 39 | 
 40 | -- A semiring is an algebraic structure with a zero, a one, a
 41 | -- "multiplication" operation, and an "addition" operation.  Zero is
 42 | -- the identity element for addition, one is the identity element for
 43 | -- multiplication, both operations are associative (it does not
 44 | -- matter how a chain of them is grouped), addition is commutative
 45 | -- (the order of the operands does not matter), and multiplication
 46 | -- distributes over addition.  Also, zero `mul` anything is zero.
 47 | --
 48 | -- Regular expressions follow the same pattern: the null regex is
 49 | -- zero, the empty-string regex is one, alternation is addition, and
 50 | -- sequence is multiplication, much like "sum" and "product" types.
 51 |                
 52 | class Semiring s where
 53 |     zero, one :: s
 54 |     mul, add  :: s -> s -> s
 55 | 
 56 | -- Symw (c -> s) represents a mapping from a symbol to its given weight.
 57 | 
 58 | sym :: Semiring s => Char -> Regw Char s
 59 | sym c = Symw (\b -> if b == c then one else zero)      
 60 |                 
 61 | data Regw c s =                
 62 |     Epsw                       -- Epsilon
 63 |   | Symw (c -> s)              -- Character
 64 |   | Altw (Regw c s) (Regw c s) -- Alternation
 65 |   | Seqw (Regw c s) (Regw c s) -- Sequence
 66 |   | Repw (Regw c s)            -- R*
 67 | 
 68 | rigged :: Semiring s => Reg -> Regw Char s
 69 | rigged = \case
 70 |          Eps       -> Epsw
 71 |          (Sym c)   -> sym c
 72 |          (Alt p q) -> Altw (rigged p) (rigged q)
 73 |          (Seq p q) -> Seqw (rigged p) (rigged q)
 74 |          (Rep r)   -> Repw (rigged r)
 75 | 
 76 | acceptw :: Semiring s => Regw c s -> [c] -> s
 77 | acceptw Epsw u     = if null u then one else zero
 78 | acceptw (Symw f) u =
 79 |     case u of
 80 |       [c] -> f c
 81 |       _ -> zero
 82 | acceptw (Altw p q) u = acceptw p u `add` acceptw q u
 83 | acceptw (Seqw p q) u = sumr [ acceptw p u1 `mul` acceptw q u2 | (u1, u2) <- split u ]
 84 | acceptw (Repw r)   u = sumr [ prodr [ acceptw r ui | ui <- ps ] | ps <- parts u ]
 85 | 
 86 | sumr, prodr :: Semiring r => [r] -> r
 87 | sumr = foldr add zero
 88 | prodr = foldr mul one
 89 | 
 90 | instance Semiring Bool where
 91 |     zero = False
 92 |     one = True
 93 |     add = (||)
 94 |     mul = (&&)
 95 | 
 96 | instance Semiring Int where
 97 |     zero = 0
 98 |     one = 1
 99 |     add = (+)
100 |     mul = (*)
101 | 
102 | 


--------------------------------------------------------------------------------
/haskell/02_RiggedRegex/stack.yaml:
--------------------------------------------------------------------------------
 1 | # This file was automatically generated by 'stack init'
 2 | #
 3 | # Some commonly used options have been documented as comments in this file.
 4 | # For advanced use and comprehensive documentation of the format, please see:
 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
 6 | 
 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
 8 | # A snapshot resolver dictates the compiler version and the set of packages
 9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.4
21 | 
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | #    git: https://github.com/commercialhaskell/stack.git
30 | #    commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | #  subdirs:
33 | #  - auto-update
34 | #  - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 | 
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 | 
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 | 
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 | 


--------------------------------------------------------------------------------
/haskell/02_RiggedRegex/test/Tests.hs:
--------------------------------------------------------------------------------
  1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
  2 | {-# LANGUAGE RecordWildCards #-}
  3 | import           Data.Foldable     (for_)
  4 | import Test.Hspec        (Spec, it, shouldBe)
  5 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
  6 | import RiggedRegex (Reg (..), accept, acceptw, rigged)
  7 | 
  8 | main :: IO ()
  9 | main = hspecWith defaultConfig {configFastFail = True} specs
 10 | 
 11 | specs :: Spec
 12 | specs = do
 13 | 
 14 |      let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) )
 15 |      let onec = Seq nocs (Sym 'c')
 16 |      let evencs = Seq ( Rep ( Seq onec onec ) ) nocs
 17 | 
 18 |      let as = Alt (Sym 'a') (Rep (Sym 'a'))                  
 19 |      let bs = Alt (Sym 'b') (Rep (Sym 'b'))
 20 | 
 21 |      it "simple expression" $
 22 |         accept evencs "acc" `shouldBe` True
 23 | 
 24 |      it "lifted expression" $
 25 |         (acceptw (rigged evencs) "acc" :: Bool) `shouldBe` True
 26 | 
 27 |      it "lifted expression short" $
 28 |         (acceptw (rigged evencs) "acc" :: Int) `shouldBe` 1
 29 | 
 30 |      it "lifted expression counter two" $
 31 |         (acceptw (rigged as) "a" :: Int) `shouldBe` 2
 32 | 
 33 |      it "lifted expression counter one" $
 34 |         (acceptw (rigged as) "aa" :: Int) `shouldBe` 1
 35 | 
 36 |      it "lifted expression dynamic counter four" $
 37 |         (acceptw (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4
 38 | 
 39 |      for_ cases test
 40 |         where
 41 |           test Case {..} = it description assertion
 42 |               where
 43 |                 assertion = (acceptw (rigged regex) sample :: Bool) `shouldBe` result
 44 | 
 45 | data Case = Case
 46 |   { description :: String
 47 |   , regex       :: Reg
 48 |   , sample      :: String
 49 |   , result      :: Bool
 50 |   }
 51 | 
 52 | cases :: [Case]
 53 | cases =
 54 |   [ Case {description = "empty", regex = Eps, sample = "", result = True}
 55 |   , Case {description = "char", regex = Sym 'a', sample = "a", result = True}
 56 |   , Case
 57 |       {description = "not char", regex = Sym 'a', sample = "b", result = False}
 58 |   , Case
 59 |       { description = "char vs empty"
 60 |       , regex = Sym 'a'
 61 |       , sample = ""
 62 |       , result = False
 63 |       }
 64 |   , Case
 65 |       { description = "left alt"
 66 |       , regex = Alt (Sym 'a') (Sym 'b')
 67 |       , sample = "a"
 68 |       , result = True
 69 |       }
 70 |   , Case
 71 |       { description = "right alt"
 72 |       , regex = Alt (Sym 'a') (Sym 'b')
 73 |       , sample = "b"
 74 |       , result = True
 75 |       }
 76 |   , Case
 77 |       { description = "neither alt"
 78 |       , regex = Alt (Sym 'a') (Sym 'b')
 79 |       , sample = "c"
 80 |       , result = False
 81 |       }
 82 |   , Case
 83 |       { description = "empty alt"
 84 |       , regex = Alt (Sym 'a') (Sym 'b')
 85 |       , sample = ""
 86 |       , result = False
 87 |       }
 88 |   , Case
 89 |       { description = "empty rep"
 90 |       , regex = Rep (Sym 'a')
 91 |       , sample = ""
 92 |       , result = True
 93 |       }
 94 |   , Case
 95 |       { description = "one rep"
 96 |       , regex = Rep (Sym 'a')
 97 |       , sample = "a"
 98 |       , result = True
 99 |       }
100 |   , Case
101 |       { description = "multiple rep"
102 |       , regex = Rep (Sym 'a')
103 |       , sample = "aaaaaaaaa"
104 |       , result = True
105 |       }
106 |   , Case
107 |       { description = "multiple rep with failure"
108 |       , regex = Rep (Sym 'a')
109 |       , sample = "aaaaaaaaab"
110 |       , result = False
111 |       }
112 |   , Case
113 |       { description = "sequence"
114 |       , regex = Seq (Sym 'a') (Sym 'b')
115 |       , sample = "ab"
116 |       , result = True
117 |       }
118 |   , Case
119 |       { description = "sequence with empty"
120 |       , regex = Seq (Sym 'a') (Sym 'b')
121 |       , sample = ""
122 |       , result = False
123 |       }
124 |   , Case
125 |       { description = "bad short sequence"
126 |       , regex = Seq (Sym 'a') (Sym 'b')
127 |       , sample = "a"
128 |       , result = False
129 |       }
130 |   , Case
131 |       { description = "bad long sequence"
132 |       , regex = Seq (Sym 'a') (Sym 'b')
133 |       , sample = "abc"
134 |       , result = False
135 |       }
136 |   ]
137 |           
138 |   
139 | 


--------------------------------------------------------------------------------
/haskell/03_Brzozowski/BrzExp.cabal:
--------------------------------------------------------------------------------
 1 | cabal-version: 1.12
 2 | 
 3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
 4 | --
 5 | -- see: https://github.com/sol/hpack
 6 | --
 7 | -- hash: bf664c2df8f31bc5a6ffd907ebaf10824d56d8d271f07d3f4cc7562ce9d44395
 8 | 
 9 | name:           BrzExp
10 | version:        0.1.0.0
11 | category:       Regex
12 | homepage:       https://github.com/elfsternberg/riggedregex#readme
13 | author:         Elf M. Sternberg
14 | maintainer:     elf.sternberg@gmail.com
15 | copyright:      Copyright ⓒ 2019 Elf M. Sternberg
16 | license:        BSD3
17 | license-file:   LICENSE
18 | build-type:     Simple
19 | extra-source-files:
20 |     README.md
21 | 
22 | library
23 |   exposed-modules:
24 |       BrzExp
25 |   other-modules:
26 |       Paths_BrzExp
27 |   hs-source-dirs:
28 |       src
29 |   ghc-options: -Wall
30 |   build-depends:
31 |       base
32 |   default-language: Haskell2010
33 | 
34 | test-suite test
35 |   type: exitcode-stdio-1.0
36 |   main-is: Tests.hs
37 |   other-modules:
38 |       Paths_BrzExp
39 |   hs-source-dirs:
40 |       test
41 |   build-depends:
42 |       BrzExp
43 |     , base
44 |     , hspec
45 |   default-language: Haskell2010
46 | 


--------------------------------------------------------------------------------
/haskell/03_Brzozowski/LICENSE:
--------------------------------------------------------------------------------
1 | The Brzozowski experiments are original work, and are copyright and
2 | licensed by Elf M. Sternberg according to the Mozilla 2.0 Public
3 | License.  See the LICENSE.md file in the main directory
4 | 


--------------------------------------------------------------------------------
/haskell/03_Brzozowski/README.md:
--------------------------------------------------------------------------------
 1 | # Brzozowski Regular Expressions, in Haskell
 2 | 
 3 | This is a regex recognizer implementing Brzozowski's Algorithm, in
 4 | Haskell.  Brzozowski's Algorithm has been a bit of a fascination for me,
 5 | because it generally made much more sense to me than the traditional
 6 | algorithm, especially since the Pumping Lemma is much more intelligible
 7 | under Brzozowski than it is with more common forms of automata analysis.
 8 | 
 9 | Brzozowski's algorithm basically says that a regular expression can be
10 | treated as a function that, given a string, produces three things: the
11 | remainder of the input after the leading character has been consumed,
12 | a new function that represents the rest of the regular expression
13 | after that leading character has been analyzed, and the status of the
14 | analysis thus far.
15 | 
16 | Brzozowski called this "the derivative of the regular expression."
17 | 
18 | The only trick to Brzozowski's Algorithm concerns nullability: it is
19 | important to know whether a regular expression _is nullable_ (that
20 | is, whether it accepts the empty string).  A separate function
21 | describes the nullability of the different kinds of expressions in
22 | our system.
23 | 
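To make the derivative idea above a little more concrete, here is a small standalone sketch. It follows the shape of `src/BrzExp.hs` but simplifies the `Alt` case, and it adds `deriving (Eq, Show)` so the intermediate expressions can be inspected. The helper `derivatives` and the example expression `ab` are hypothetical names introduced only for this illustration.

```haskell
-- Standalone sketch of matching by derivatives; not the module itself.
data Brz = Emp | Eps | Sym Char | Alt Brz Brz | Seq Brz Brz | Rep Brz
  deriving (Eq, Show)

-- True when the expression accepts the empty string.
nullable :: Brz -> Bool
nullable Emp       = False
nullable Eps       = True
nullable (Sym _)   = False
nullable (Alt l r) = nullable l || nullable r
nullable (Seq l r) = nullable l && nullable r
nullable (Rep _)   = True

-- The expression that remains after consuming one character.
derive :: Brz -> Char -> Brz
derive Emp _       = Emp
derive Eps _       = Emp
derive (Sym c) u   = if c == u then Eps else Emp
derive (Alt l r) u = Alt (derive l u) (derive r u)
derive (Seq l r) u
  | nullable l = Alt (Seq (derive l u) r) (derive r u)
  | otherwise  = Seq (derive l u) r
derive (Rep r) u   = Seq (derive r u) (Rep r)

-- Derive once per character, then ask whether what is left is nullable.
accept :: Brz -> String -> Bool
accept r = nullable . foldl derive r

-- Hypothetical helper: the starting expression followed by every
-- intermediate derivative, one per consumed character.
derivatives :: Brz -> String -> [Brz]
derivatives = scanl derive

-- Hypothetical example expression: "a" followed by "b".
ab :: Brz
ab = Seq (Sym 'a') (Sym 'b')
```

In GHCi, `derivatives ab "ab"` yields `[Seq (Sym 'a') (Sym 'b'), Seq Eps (Sym 'b'), Alt (Seq Emp (Sym 'b')) Eps]`. The last expression is nullable, so `accept ab "ab"` is `True`. The `Seq Eps (Sym 'b')` step shows why nullability matters: the left side can match the empty string, so the next character has to be tried against the right side as well.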
24 | 
25 | 
26 | 
27 | 
28 | 


--------------------------------------------------------------------------------
/haskell/03_Brzozowski/Setup.hs:
--------------------------------------------------------------------------------
1 | import Distribution.Simple
2 | main = defaultMain
3 | 


--------------------------------------------------------------------------------
/haskell/03_Brzozowski/package.yaml:
--------------------------------------------------------------------------------
 1 | name:                BrzExp
 2 | version:             0.1.0.0
 3 | 
 4 | homepage:            https://github.com/elfsternberg/riggedregex#readme
 5 | license:             BSD3
 6 | license-file:        LICENSE
 7 | author:              Elf M. Sternberg
 8 | maintainer:          elf.sternberg@gmail.com
 9 | copyright:           Copyright ⓒ 2019 Elf M. Sternberg
10 | category:            Regex
11 | build-type:          Simple
12 | extra-source-files:  README.md
13 | 
14 | dependencies:
15 |     - base
16 |                      
17 | library:
18 |   exposed-modules: BrzExp
19 |   ghc-options: -Wall
20 |   source-dirs: src
21 | 
22 | tests:
23 |   test:
24 |     main: Tests.hs
25 |     source-dirs: test
26 |     dependencies:
27 |       - BrzExp
28 |       - hspec
29 |   
30 | 


--------------------------------------------------------------------------------
/haskell/03_Brzozowski/src/BrzExp.hs:
--------------------------------------------------------------------------------
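 1 | module BrzExp ( accept, nullable, Brz (..) ) where
 2 | 
 3 | -- Emp matches nothing at all; Eps matches only the empty string.
 4 | data Brz = Emp | Eps | Sym Char | Alt Brz Brz | Seq Brz Brz | Rep Brz
 5 | 
 6 | -- The derivative of an expression with respect to a character u: an
 7 | -- expression matching every string s such that the original matches
 8 | -- u followed by s.
 9 | derive :: Brz -> Char -> Brz
10 | derive Emp _       = Emp
11 | derive Eps _       = Emp
12 | derive (Sym c) u   = if c == u then Eps else Emp
13 | -- If the left side can match the empty string, u may also be the
14 | -- first character of the right side.
15 | derive (Seq l r) u
16 |     | nullable l = Alt (Seq (derive l u) r) (derive r u)
17 |     | otherwise  = Seq (derive l u) r
18 | 
19 | -- Drop a branch that is already known to match nothing.
20 | derive (Alt Emp r) u = derive r u
21 | derive (Alt l Emp) u = derive l u
22 | derive (Alt l r) u   = Alt (derive r u) (derive l u)
23 | 
24 | -- Consume u with one copy of r, then allow further repetitions.
25 | derive (Rep r) u = Seq (derive r u) (Rep r)
26 | 
27 | -- True when the expression accepts the empty string.
28 | nullable :: Brz -> Bool
29 | nullable Emp       = False
30 | nullable Eps       = True
31 | nullable (Sym _)   = False
32 | nullable (Alt l r) = nullable l || nullable r
33 | nullable (Seq l r) = nullable l && nullable r
34 | nullable (Rep _)   = True
35 | 
36 | -- Take the derivative for each character in turn; the input matches
37 | -- when the final expression is nullable.
38 | accept :: Brz -> String -> Bool
39 | accept r [] = nullable r
40 | accept r (s:ss) = accept (derive r s) ss
41 | 
42 | 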


--------------------------------------------------------------------------------
/haskell/03_Brzozowski/stack.yaml:
--------------------------------------------------------------------------------
 1 | # This file was automatically generated by 'stack init'
 2 | #
 3 | # Some commonly used options have been documented as comments in this file.
 4 | # For advanced use and comprehensive documentation of the format, please see:
 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
 6 | 
 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
 8 | # A snapshot resolver dictates the compiler version and the set of packages
 9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.4
21 | 
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | #    git: https://github.com/commercialhaskell/stack.git
30 | #    commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | #  subdirs:
33 | #  - auto-update
34 | #  - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 | 
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 | 
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 | 
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 | 


--------------------------------------------------------------------------------
/haskell/03_Brzozowski/test/Tests.hs:
--------------------------------------------------------------------------------
  1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
  2 | {-# LANGUAGE RecordWildCards #-}
  3 | 
  4 | import           Data.Foldable     (for_)
  5 | import           Test.Hspec        (Spec, describe, it, shouldBe)
  6 | import           Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
  7 | 
  8 | import           BrzExp            (Brz (..), accept)
  9 | 
 10 | main :: IO ()
 11 | main = hspecWith defaultConfig {configFastFail = True} specs
 12 | 
 13 | specs :: Spec
 14 | specs = describe "accept" $ for_ cases test
 15 |   where
 16 |     test Case {..} = it description assertion
 17 |       where
 18 |         assertion = accept regex sample `shouldBe` result
 19 | 
 20 | data Case = Case
 21 |   { description :: String
 22 |   , regex       :: Brz
 23 |   , sample      :: String
 24 |   , result      :: Bool
 25 |   }
 26 | 
 27 | -- let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) )
 28 | --     onec = Seq nocs (Sym 'c')
 29 | --     evencs = Seq ( Rep ( Seq onec onec ) ) nocs
 30 | --     as = Alt (Sym 'a') (Rep (Sym 'a'))
 31 | --     bs = Alt (Sym 'b') (Rep (Sym 'b'))
 32 | cases :: [Case]
 33 | cases =
 34 |   [ Case {description = "empty", regex = Eps, sample = "", result = True}
 35 |   , Case {description = "null", regex = Emp, sample = "", result = False}
 36 |   , Case {description = "char", regex = Sym 'a', sample = "a", result = True}
 37 |   , Case
 38 |       {description = "not char", regex = Sym 'a', sample = "b", result = False}
 39 |   , Case
 40 |       { description = "char vs empty"
 41 |       , regex = Sym 'a'
 42 |       , sample = ""
 43 |       , result = False
 44 |       }
 45 |   , Case
 46 |       { description = "left alt"
 47 |       , regex = Alt (Sym 'a') (Sym 'b')
 48 |       , sample = "a"
 49 |       , result = True
 50 |       }
 51 |   , Case
 52 |       { description = "right alt"
 53 |       , regex = Alt (Sym 'a') (Sym 'b')
 54 |       , sample = "b"
 55 |       , result = True
 56 |       }
 57 |   , Case
 58 |       { description = "neither alt"
 59 |       , regex = Alt (Sym 'a') (Sym 'b')
 60 |       , sample = "c"
 61 |       , result = False
 62 |       }
 63 |   , Case
 64 |       { description = "empty alt"
 65 |       , regex = Alt (Sym 'a') (Sym 'b')
 66 |       , sample = ""
 67 |       , result = False
 68 |       }
 69 |   , Case
 70 |       { description = "empty rep"
 71 |       , regex = Rep (Sym 'a')
 72 |       , sample = ""
 73 |       , result = True
 74 |       }
 75 |   , Case
 76 |       { description = "one rep"
 77 |       , regex = Rep (Sym 'a')
 78 |       , sample = "a"
 79 |       , result = True
 80 |       }
 81 |   , Case
 82 |       { description = "multiple rep"
 83 |       , regex = Rep (Sym 'a')
 84 |       , sample = "aaaaaaaaa"
 85 |       , result = True
 86 |       }
 87 |   , Case
 88 |       { description = "multiple rep with failure"
 89 |       , regex = Rep (Sym 'a')
 90 |       , sample = "aaaaaaaaab"
 91 |       , result = False
 92 |       }
 93 |   , Case
 94 |       { description = "sequence"
 95 |       , regex = Seq (Sym 'a') (Sym 'b')
 96 |       , sample = "ab"
 97 |       , result = True
 98 |       }
 99 |   , Case
100 |       { description = "sequence with empty"
101 |       , regex = Seq (Sym 'a') (Sym 'b')
102 |       , sample = ""
103 |       , result = False
104 |       }
105 |   , Case
106 |       { description = "bad short sequence"
107 |       , regex = Seq (Sym 'a') (Sym 'b')
108 |       , sample = "a"
109 |       , result = False
110 |       }
111 |   , Case
112 |       { description = "bad long sequence"
113 |       , regex = Seq (Sym 'a') (Sym 'b')
114 |       , sample = "abc"
115 |       , result = False
116 |       }
117 |   ]
118 | 


--------------------------------------------------------------------------------
/haskell/04_Gluskov/Glushkov.cabal:
--------------------------------------------------------------------------------
 1 | cabal-version: 1.12
 2 | 
 3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
 4 | --
 5 | -- see: https://github.com/sol/hpack
 6 | --
 7 | -- hash: 1a234ba3e4b3372f4e6f179bb337b813ee69faffc8b001f781636f1ba3d185e4
 8 | 
 9 | name:           Glushkov
10 | version:        0.1.0.0
11 | category:       Regex
12 | homepage:       https://github.com/elfsternberg/riggedregex#readme
13 | author:         Elf M. Sternberg
14 | maintainer:     elf.sternberg@gmail.com
15 | copyright:      Copyright ⓒ 2019 Elf M. Sternberg
16 | license:        BSD3
17 | license-file:   LICENSE
18 | build-type:     Simple
19 | extra-source-files:
20 |     README.md
21 | 
22 | library
23 |   exposed-modules:
24 |       Glushkov
25 |   other-modules:
26 |       Paths_Glushkov
27 |   hs-source-dirs:
28 |       src
29 |   ghc-options: -Wall
30 |   build-depends:
31 |       base
32 |   default-language: Haskell2010
33 | 
34 | test-suite test
35 |   type: exitcode-stdio-1.0
36 |   main-is: Tests.hs
37 |   other-modules:
38 |       Paths_Glushkov
39 |   hs-source-dirs:
40 |       test
41 |   build-depends:
42 |       Glushkov
43 |     , base
44 |     , hspec
45 |   default-language: Haskell2010
46 | 


--------------------------------------------------------------------------------
/haskell/04_Gluskov/LICENSE:
--------------------------------------------------------------------------------
 1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer
 2 | 
 3 | All rights reserved.
 4 | 
 5 | Redistribution and use in source and binary forms, with or without
 6 | modification, are permitted provided that the following conditions are
 7 | met:
 8 | 
 9 |  1. Redistributions of source code must retain the above copyright
10 |     notice, this list of conditions and the following disclaimer.
11 | 
12 |  2. Redistributions in binary form must reproduce the above copyright
13 |     notice, this list of conditions and the following disclaimer in
14 |     the documentation and/or other materials provided with the
15 |     distribution.
16 | 
17 |  3. Neither the name of the author nor the names of his contributors
18 |     may be used to endorse or promote products derived from this
19 |     software without specific prior written permission.
20 | 
21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
24 | A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHORS OR
25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
32 | 
33 | 


--------------------------------------------------------------------------------
/haskell/04_Gluskov/README.md:
--------------------------------------------------------------------------------
 1 | # Glushkov Regular Expressions, in Haskell
 2 | 
 3 | This is an implementation of Glushkov's construction for regular
 4 | expressions. The basic idea is that for every symbol encountered
 5 | during parsing, a corresponding symbol in the tree is marked (or, if
 6 | no symbols are marked, the parse is a failure).  Composites are followed
 7 | to their ends for each character, and a symbol that matches is "marked".
 8 | 
 9 | In this instance, we pass in a Glushkov regular expression tree,
10 | and for each character `shift` returns a new, complete copy of the
11 | tree, only with the marks "shifted" to where they should be given
12 | the character.  In this way, each iteration of the tree keeps the
13 | set of NFA states that are active; they are the paths that lead to
14 | marked symbols.
15 | 
16 | 'final' here means that no more symbols have to be read to match
17 | the expression.  'empty' here means that the expression can match
18 | the empty string (it may match other strings as well).
19 | 
20 | 'final' determines whether the Glushkov expression passed in
21 | contains a marked symbol in a final position.  It is used both to
22 | determine the end state of the expression, and in sequences to
23 | decide whether the right-hand expression must be evaluated with a
24 | mark, that is, if we're currently going down a 'marked' path and the
25 | left expression can handle the empty string OR the left expression
26 | is final.
27 | 
28 | The accept function is just a fold over the input string.  The
29 | initial value is the shift of the first character, with the assumed
30 | mark of 'True' being included because we can always parse infinitely
31 | many empty strings before the sample begins.  The returned value of
32 | that shift is our new regular expression, on which we then
33 | progressively call `shift False accg c`; here False means that we're
34 | only going to shift marks we've already found.
35 | 
36 | The "trick" to understand this is to consider the string "ab" for
37 | the sequence "ab".  The first time through, we start with True, and
38 | what gets marked is the symbol 'a'.
39 | 
40 | When we pass the letter 'b', what happens?  Well, the 'a' symbol
41 | will be unmarked (it didn't match the character), but the second
42 | part of the shift expression says that the left expression is final
43 | (it's a symbol and it's marked!), so we call `shift True (Sym False
44 | 'b') 'b'`, and the mark moves to the correct destination.
45 | 
46 | It continues to blow my mind that so much of mathematics can be directly
47 | translated into Haskell with no loss of fidelity.
48 | 
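The "ab" walk-through above can be checked mechanically. The following is a minimal sketch, not part of the package: it repeats `shift`, `empty`, and `final` from `src/Glushkov.hs`, adds `deriving (Eq, Show)` so the trees can be printed and compared, and introduces two hypothetical names, `steps` and `ab`, purely for illustration.

```haskell
-- Sketch only: definitions copied from src/Glushkov.hs, plus a derived
-- Eq/Show instance and a hypothetical tracing helper.
data Glu = Eps
         | Sym Bool Char
         | Alt Glu Glu
         | Seq Glu Glu
         | Rep Glu
         deriving (Eq, Show)

shift :: Bool -> Glu -> Char -> Glu
shift _ Eps _       = Eps
shift m (Sym _ x) c = Sym (m && x == c) x
shift m (Alt p q) c = Alt (shift m p c) (shift m q c)
shift m (Seq p q) c = Seq (shift m p c) (shift (m && empty p || final p) q c)
shift m (Rep r)   c = Rep (shift (m || final r) r c)

empty :: Glu -> Bool
empty Eps       = True
empty (Sym _ _) = False
empty (Alt p q) = empty p || empty q
empty (Seq p q) = empty p && empty q
empty (Rep _)   = True

final :: Glu -> Bool
final Eps       = False
final (Sym b _) = b
final (Alt p q) = final p || final q
final (Seq p q) = final p && empty q || final q
final (Rep r)   = final r

-- Hypothetical helper: the tree after each input character.  The first
-- character is shifted with an initial mark of True, the rest with False,
-- mirroring the fold in `accept`.
steps :: Glu -> String -> [Glu]
steps _ []     = []
steps r (c:cs) = scanl (shift False) (shift True r c) cs

-- The sequence "ab" with no marks set.
ab :: Glu
ab = Seq (Sym False 'a') (Sym False 'b')
```

Loaded into GHCi, `steps ab "ab"` should yield `Seq (Sym True 'a') (Sym False 'b')` followed by `Seq (Sym False 'a') (Sym True 'b')`: the mark sits on 'a' after the first character and moves to 'b' after the second, at which point `final` holds for the whole sequence.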


--------------------------------------------------------------------------------
/haskell/04_Gluskov/package.yaml:
--------------------------------------------------------------------------------
 1 | name:                Glushkov
 2 | version:             0.1.0.0
 3 | 
 4 | homepage:            https://github.com/elfsternberg/riggedregex#readme
 5 | license:             BSD3
 6 | license-file:        LICENSE
 7 | author:              Elf M. Sternberg
 8 | maintainer:          elf.sternberg@gmail.com
 9 | copyright:           Copyright ⓒ 2019 Elf M. Sternberg
10 | category:            Regex
11 | build-type:          Simple
12 | extra-source-files:  README.md
13 | 
14 | dependencies:
15 |     - base
16 | 
17 | library:
18 |   exposed-modules: Glushkov
19 |   ghc-options: -Wall
20 |   source-dirs: src
21 | 
22 | tests:
23 |   test:
24 |     main: Tests.hs
25 |     source-dirs: test
26 |     dependencies:
27 |       - Glushkov
28 |       - hspec
29 |   
30 | 


--------------------------------------------------------------------------------
/haskell/04_Gluskov/src/Glushkov.hs:
--------------------------------------------------------------------------------
 1 | module Glushkov (Glu (..), accept) where
 2 | 
 3 | data Glu = Eps
 4 |          | Sym Bool Char
 5 |          | Alt Glu Glu
 6 |          | Seq Glu Glu
 7 |          | Rep Glu
 8 | 
 9 | shift :: Bool -> Glu -> Char -> Glu
10 | shift _ Eps _       = Eps
11 | shift m (Sym _ x) c = Sym (m && x == c) x
12 | shift m (Alt p q) c = Alt (shift m p c) (shift m q c)
13 | shift m (Seq p q) c = Seq (shift m p c) (shift (m && empty p || final p) q c)
14 | shift m (Rep r)   c = Rep (shift (m || final r) r c)
15 | 
16 | empty :: Glu -> Bool
17 | empty Eps       = True
18 | empty (Sym _ _) = False
19 | empty (Alt p q) = empty p || empty q
20 | empty (Seq p q) = empty p && empty q
21 | empty (Rep _)   = True                      
22 | 
23 | final :: Glu -> Bool
24 | final Eps       = False
25 | final (Sym b _) = b
26 | final (Alt p q) = final p || final q
27 | final (Seq p q) = final p && empty q || final q
28 | final (Rep r)   = final r                  
29 | 
30 | accept :: Glu -> String -> Bool
31 | accept r []      = empty r
32 | accept r (c:cs)  = final (foldl (shift False) (shift True r c) cs)
33 | 


--------------------------------------------------------------------------------
/haskell/04_Gluskov/stack.yaml:
--------------------------------------------------------------------------------
 1 | # This file was automatically generated by 'stack init'
 2 | #
 3 | # Some commonly used options have been documented as comments in this file.
 4 | # For advanced use and comprehensive documentation of the format, please see:
 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
 6 | 
 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
 8 | # A snapshot resolver dictates the compiler version and the set of packages
 9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.5
21 | 
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | #    git: https://github.com/commercialhaskell/stack.git
30 | #    commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | #  subdirs:
33 | #  - auto-update
34 | #  - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 | 
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 | 
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 | 
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 | 


--------------------------------------------------------------------------------
/haskell/04_Gluskov/test/Tests.hs:
--------------------------------------------------------------------------------
  1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
  2 | {-# LANGUAGE RecordWildCards #-}
  3 | 
  4 | import           Data.Foldable     (for_)
  5 | import           Test.Hspec        (Spec, describe, it, shouldBe)
  6 | import           Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
  7 | 
  8 | import           Glushkov            (Glu (..), accept)
  9 | 
 10 | main :: IO ()
 11 | main = hspecWith defaultConfig {configFastFail = True} specs
 12 | 
 13 | specs :: Spec
 14 | specs = describe "accept" $ for_ cases test
 15 |   where
 16 |     test Case {..} = it description assertion
 17 |       where
 18 |         assertion = accept regex sample `shouldBe` result
 19 | 
 20 | data Case = Case
 21 |   { description :: String
 22 |   , regex       :: Glu
 23 |   , sample      :: String
 24 |   , result      :: Bool
 25 |   }
 26 | 
 27 | -- let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) )
 28 | --     onec = Seq nocs (Sym 'c')
 29 | --     evencs = Seq ( Rep ( Seq onec onec ) ) nocs
 30 | --     as = Alt (Sym 'a') (Rep (Sym 'a'))
 31 | --     bs = Alt (Sym 'b') (Rep (Sym 'b'))
 32 | 
 33 | cases :: [Case]
 34 | cases =
 35 |   [ Case {description = "empty", regex = Eps, sample = "", result = True}
 36 |   , Case {description = "char", regex = Sym False 'a', sample = "a", result = True}
 37 |   , Case
 38 |       {description = "not char", regex = Sym False 'a', sample = "b", result = False}
 39 |   , Case
 40 |       { description = "char vs empty"
 41 |       , regex = Sym False 'a'
 42 |       , sample = ""
 43 |       , result = False
 44 |       }
 45 |   , Case
 46 |       { description = "left alt"
 47 |       , regex = Alt (Sym False 'a') (Sym False 'b')
 48 |       , sample = "a"
 49 |       , result = True
 50 |       }
 51 |   , Case
 52 |       { description = "right alt"
 53 |       , regex = Alt (Sym False 'a') (Sym False 'b')
 54 |       , sample = "b"
 55 |       , result = True
 56 |       }
 57 |   , Case
 58 |       { description = "neither alt"
 59 |       , regex = Alt (Sym False 'a') (Sym False 'b')
 60 |       , sample = "c"
 61 |       , result = False
 62 |       }
 63 |   , Case
 64 |       { description = "empty alt"
 65 |       , regex = Alt (Sym False 'a') (Sym False 'b')
 66 |       , sample = ""
 67 |       , result = False
 68 |       }
 69 |   , Case
 70 |       { description = "empty rep"
 71 |       , regex = Rep (Sym False 'a')
 72 |       , sample = ""
 73 |       , result = True
 74 |       }
 75 |   , Case
 76 |       { description = "one rep"
 77 |       , regex = Rep (Sym False 'a')
 78 |       , sample = "a"
 79 |       , result = True
 80 |       }
 81 |   , Case
 82 |       { description = "multiple rep"
 83 |       , regex = Rep (Sym False 'a')
 84 |       , sample = "aaaaaaaaa"
 85 |       , result = True
 86 |       }
 87 |   , Case
 88 |       { description = "multiple rep with failure"
 89 |       , regex = Rep (Sym False 'a')
 90 |       , sample = "aaaaaaaaab"
 91 |       , result = False
 92 |       }
 93 |   , Case
 94 |       { description = "sequence"
 95 |       , regex = Seq (Sym False 'a') (Sym False 'b')
 96 |       , sample = "ab"
 97 |       , result = True
 98 |       }
 99 |   , Case
100 |       { description = "sequence with empty"
101 |       , regex = Seq (Sym False 'a') (Sym False 'b')
102 |       , sample = ""
103 |       , result = False
104 |       }
105 |   , Case
106 |       { description = "bad short sequence"
107 |       , regex = Seq (Sym False 'a') (Sym False 'b')
108 |       , sample = "a"
109 |       , result = False
110 |       }
111 |   , Case
112 |       { description = "bad long sequence"
113 |       , regex = Seq (Sym False 'a') (Sym False 'b')
114 |       , sample = "abc"
115 |       , result = False
116 |       }
117 |   ]
118 | 


--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/LICENSE:
--------------------------------------------------------------------------------
1 | The Brzozowski experiments are original work, and are copyright and
2 | licensed by Elf M. Sternberg according to the Mozilla 2.0 Public
3 | License.  See the LICENSE.md file in the main directory
4 | 


--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/README.md:
--------------------------------------------------------------------------------
 1 | # Rigged Brzozowski Regular Expressions, in Haskell
 2 | 
 3 | This is the naive implementation of Brzozowski's Algorithm, but with a
 4 | Semiring implementation for gathering complex information from the parse
 5 | process.  This implementation is "naive" in that it saves everything,
 6 | including the very large number of dead branches that hang off Sequence
 7 | processing, and then discards them at the very end of the process.
 8 | 
 9 | This implementation finally proves to me something that I've been trying
10 | to express for a while: Might, Adams, et al.'s implementations of tree
11 | parsing *are* Semiring implementations; they just don't call them that,
12 | but the fundamental underlying operations are the same.
13 | 
14 | I'm fascinated by the lack of the nullability operator.  Instead, it's
15 | just resolved by Emp being parsed as `zero` and Eps as `one * s` where
16 | `s` is the product of the previous operation, and then the new `Delta`
17 | operator preserves this semantic, using multiplicative annihilation to
18 | discard false parses while also being immune to the `Sequence` semantic
19 | that destroys successful parse history.
20 | 
21 | This can't last.  And Might admits it doesn't last.  Darais's
22 | implementation goes back to having a separate function for nullability
23 | that both preserves the status of known-nullable expressions and handles
24 | recursion.  Darais's version also implements an incredible number of
25 | optimizations to prune, compact, and process the parse tree early,
26 | enabling a number of speedups and caching strategies that get you within
27 | spitting distance of RE2.
28 | 
29 | 
30 | 
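To make the role of the semiring concrete, here is a small sketch that repeats the `Semiring` class, the `Brzr` type, `deriver`, `parsenull`, and `parse` from `src/RiggedBrz.hs` (only the `Bool` and `Int` instances, so no `Data.Set` is needed). The value `aOrAs` is a hypothetical example expression, `a | a*`, built directly from `Brzr` constructors rather than through `rigged`.

```haskell
-- Sketch only: definitions copied from src/RiggedBrz.hs, minus the
-- Set instance; `aOrAs` is a hypothetical example.
class Semiring s where
  zero, one :: s
  mul, add  :: s -> s -> s

instance Semiring Bool where
  zero = False
  one  = True
  add  = (||)
  mul  = (&&)

instance Semiring Int where
  zero = 0
  one  = 1
  add  = (+)
  mul  = (*)

data Brzr c s = Empr
              | Epsr s
              | Delr (Brzr c s)
              | Symr (c -> s)
              | Altr (Brzr c s) (Brzr c s)
              | Seqr (Brzr c s) (Brzr c s)
              | Repr (Brzr c s)

deriver :: Semiring s => Brzr c s -> c -> Brzr c s
deriver Empr _       = Empr
deriver (Epsr _) _   = Empr
deriver (Delr _) _   = Empr
deriver (Symr f) u   = Epsr (f u)
deriver (Seqr l r) u = Altr (Seqr (deriver l u) r) (Seqr (Delr l) (deriver r u))
deriver (Altr l r) u = go (deriver l u) (deriver r u)
  where go Empr r1 = r1
        go l1 Empr = l1
        go l1 r1   = Altr l1 r1
deriver (Repr r) u   = Seqr (deriver r u) (Repr r)

-- Collapse whatever is left of the expression into a semiring value.
parsenull :: Semiring s => Brzr c s -> s
parsenull Empr       = zero
parsenull (Symr _)   = zero
parsenull (Repr _)   = one
parsenull (Epsr s)   = s
parsenull (Delr s)   = parsenull s
parsenull (Altr p q) = parsenull p `add` parsenull q
parsenull (Seqr p q) = parsenull p `mul` parsenull q

parse :: Semiring s => Brzr Char s -> String -> s
parse w []     = parsenull w
parse w (c:cs) = parse (deriver w c) cs

sym :: Semiring s => Char -> Brzr Char s
sym c = Symr (\b -> if b == c then one else zero)

-- Hypothetical example: a | a*
aOrAs :: Semiring s => Brzr Char s
aOrAs = Altr (sym 'a') (Repr (sym 'a'))
```

In GHCi, `parse aOrAs "a" :: Bool` answers whether the string matches, while `parse aOrAs "a" :: Int` counts the ways it matches (two here, one through each branch of the alternation). For the input `"b"` the `Int` result is `0`: the failed symbol contributes `zero`, and multiplication by `zero` wipes out the branch, which is the annihilation described above.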


--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/RiggedBrz.cabal:
--------------------------------------------------------------------------------
 1 | cabal-version: 1.12
 2 | 
 3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
 4 | --
 5 | -- see: https://github.com/sol/hpack
 6 | --
 7 | -- hash: 1c0fd9015f1269af8c7445bf2af4c10b835d053b9ddce4a3df3815fb4724e489
 8 | 
 9 | name:           RiggedBrz
10 | version:        0.1.0.0
11 | category:       Regex
12 | homepage:       https://github.com/elfsternberg/riggedregex#readme
13 | author:         Elf M. Sternberg
14 | maintainer:     elf.sternberg@gmail.com
15 | copyright:      Copyright ⓒ 2019 Elf M. Sternberg
16 | license:        BSD3
17 | license-file:   LICENSE
18 | build-type:     Simple
19 | extra-source-files:
20 |     README.md
21 | 
22 | library
23 |   exposed-modules:
24 |       RiggedBrz
25 |   other-modules:
26 |       Paths_RiggedBrz
27 |   hs-source-dirs:
28 |       src
29 |   ghc-options: -Wall
30 |   build-depends:
31 |       base
32 |     , containers
33 |   default-language: Haskell2010
34 | 
35 | test-suite test
36 |   type: exitcode-stdio-1.0
37 |   main-is: Tests.hs
38 |   other-modules:
39 |       Paths_RiggedBrz
40 |   hs-source-dirs:
41 |       test
42 |   build-depends:
43 |       RiggedBrz
44 |     , base
45 |     , containers
46 |     , hspec
47 |   default-language: Haskell2010
48 | 


--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/Setup.hs:
--------------------------------------------------------------------------------
1 | import Distribution.Simple
2 | main = defaultMain
3 | 


--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/package.yaml:
--------------------------------------------------------------------------------
 1 | name:                RiggedBrz
 2 | version:             0.1.0.0
 3 | 
 4 | homepage:            https://github.com/elfsternberg/riggedregex#readme
 5 | license:             BSD3
 6 | license-file:        LICENSE
 7 | author:              Elf M. Sternberg
 8 | maintainer:          elf.sternberg@gmail.com
 9 | copyright:           Copyright ⓒ 2019 Elf M. Sternberg
10 | category:            Regex
11 | build-type:          Simple
12 | extra-source-files:  README.md
13 | 
14 | dependencies:
15 |     - containers
16 |     - base
17 | 
18 | library:
19 |   exposed-modules: RiggedBrz
20 |   ghc-options: -Wall
21 |   source-dirs: src
22 | 
23 | tests:
24 |   test:
25 |     main: Tests.hs
26 |     source-dirs: test
27 |     dependencies:
28 |       - RiggedBrz
29 |       - hspec
30 |   
31 | 


--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/src/RiggedBrz.hs:
--------------------------------------------------------------------------------
 1 | {-# LANGUAGE LambdaCase #-}
 2 | {-# LANGUAGE FlexibleInstances #-}
 3 | 
 4 | module RiggedBrz ( Brz (..), parse, rigged, riggeds ) where
 5 | 
 6 | import Data.Set
 7 |     
 8 | data Brz = Emp | Eps | Sym Char | Alt Brz Brz | Seq Brz Brz | Rep Brz deriving (Eq)
 9 | 
10 | -- Transform a Brz into a Brzr.  That's all it does.  It's not magical.
11 |               
12 | rigging :: Semiring s => (Char -> Brzr Char s) -> Brz -> Brzr Char s
13 | rigging s = \case
14 |              Emp -> Empr
15 |              Eps -> Epsr one
16 |              (Sym c) -> s c
17 |              (Alt p q) -> Altr (rigging s p) (rigging s q)
18 |              (Seq p q) -> Seqr (rigging s p) (rigging s q)
19 |              (Rep r) -> Repr (rigging s r)
20 | 
21 | class Semiring s where
22 |   zero, one :: s
23 |   mul, add :: s -> s -> s
24 | 
25 | data Brzr c s = Empr
26 |               | Epsr s
27 |               | Delr (Brzr c s)
28 |               | Symr (c -> s)
29 |               | Altr (Brzr c s) (Brzr c s)
30 |               | Seqr (Brzr c s) (Brzr c s)
31 |               | Repr (Brzr c s)
32 | 
33 | deriver :: Semiring s => Brzr c s -> c -> Brzr c s
34 | deriver Empr _        = Empr
35 | deriver (Epsr _) _    = Empr
36 | deriver (Delr _) _    = Empr
37 | deriver (Symr f) u    = Epsr (f u)
38 | 
39 | deriver (Seqr l r) u  =
40 |     Altr dl dr
41 |         where
42 |           dl = Seqr (deriver l u) r
43 |           dr = Seqr (Delr l) (deriver r u)
44 | 
45 | deriver (Altr l r) u = go (deriver l u) (deriver r u)
46 |     where go Empr r1 = r1
47 |           go r1 Empr = r1
48 |           go l1 r1 = Altr l1 r1
49 |         
50 | deriver (Repr r) u = Seqr (deriver r u) (Repr r)
51 | 
52 | parsenull :: Semiring s => (Brzr c s) -> s
53 | parsenull Empr = zero
54 | parsenull (Symr _)   = zero
55 | parsenull (Repr _)   = one
56 | parsenull (Epsr s)   = s
57 | parsenull (Delr s)   = parsenull s
58 | parsenull (Altr p q) = parsenull p `add` parsenull q
59 | parsenull (Seqr p q) = parsenull p `mul` parsenull q
60 | 
61 | instance Semiring Int where
62 |     zero = 0
63 |     one = 1
64 |     add = (Prelude.+)
65 |     mul = (Prelude.*)
66 | 
67 | instance Semiring Bool where
68 |     zero = False
69 |     one = True
70 |     add = (||)
71 |     mul = (&&)
72 | 
73 | instance Semiring (Set String) where
74 |     zero    = empty 
75 |     one     = singleton ""
76 |     add     = union
77 |     mul a b = Data.Set.map (uncurry (++)) $ cartesianProduct a b
78 | 
79 | -- Rigging for boolean and integer values.              
80 | 
81 | sym :: Semiring s => Char -> Brzr Char s
82 | sym c = Symr (\b -> if b == c then one else zero)      
83 | 
84 | rigged :: Semiring s => Brz -> Brzr Char s
85 | rigged = rigging sym
86 | 
87 | -- Rigging for parse forests
88 |          
89 | syms :: Char -> Brzr Char (Set String)
90 | syms c = Symr (\b -> if b == c then singleton [c] else zero)
91 |          
92 | riggeds :: Brz -> Brzr Char (Set String)
93 | riggeds = rigging syms
94 |               
95 | parse :: (Semiring s) => (Brzr Char s) -> String -> s
96 | parse w [] = parsenull w
97 | parse w (c:cs) = parse (deriver w c) cs
98 |        
99 | 


--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/stack.yaml:
--------------------------------------------------------------------------------
 1 | # This file was automatically generated by 'stack init'
 2 | #
 3 | # Some commonly used options have been documented as comments in this file.
 4 | # For advanced use and comprehensive documentation of the format, please see:
 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
 6 | 
 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
 8 | # A snapshot resolver dictates the compiler version and the set of packages
 9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.5
21 | 
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | #    git: https://github.com/commercialhaskell/stack.git
30 | #    commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | #  subdirs:
33 | #  - auto-update
34 | #  - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | extra-deps:
41 |   - containers-0.6.0.1
42 | 
43 | # Override default flag values for local packages and extra-deps
44 | # flags: {}
45 | 
46 | # Extra package databases containing global packages
47 | # extra-package-dbs: []
48 | 
49 | # Control whether we use the GHC we find on the path
50 | # system-ghc: true
51 | #
52 | # Require a specific version of stack, using version ranges
53 | # require-stack-version: -any # Default
54 | # require-stack-version: ">=1.9"
55 | #
56 | # Override the architecture used by stack, especially useful on Windows
57 | # arch: i386
58 | # arch: x86_64
59 | #
60 | # Extra directories used by stack for building
61 | # extra-include-dirs: [/path/to/dir]
62 | # extra-lib-dirs: [/path/to/dir]
63 | #
64 | # Allow a newer minor version of GHC than the snapshot specifies
65 | # compiler-check: newer-minor
66 | 


--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/test/Tests.hs:
--------------------------------------------------------------------------------
  1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
  2 | {-# LANGUAGE RecordWildCards #-}
  3 | import           Data.Foldable     (for_)
  4 | import Test.Hspec        (Spec, it, shouldBe)
  5 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
  6 | import RiggedBrz ( Brz (..), parse, rigged, riggeds )
  7 | import Data.Set
  8 | import Data.List (sort)
  9 |     
 10 | main :: IO ()
 11 | main = hspecWith defaultConfig {configFastFail = True} specs
 12 | 
 13 | specs :: Spec
 14 | specs = do
 15 | 
 16 |      let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) )
 17 |      let onec = Seq nocs (Sym 'c')
 18 |      let evencs = Seq ( Rep ( Seq onec onec ) ) nocs
 19 | 
 20 |      let as = Alt (Sym 'a') (Rep (Sym 'a'))                  
 21 |      let bs = Alt (Sym 'b') (Rep (Sym 'b'))
 22 | 
 23 |      it "lifted expression" $
 24 |         (parse (rigged evencs) "acc" :: Bool) `shouldBe` True
 25 | 
 26 |      it "lifted expression short" $
 27 |         (parse (rigged evencs) "acc" :: Int) `shouldBe` 1
 28 | 
 29 |      it "lifted expression counter two" $
 30 |         (parse (rigged as) "a" :: Int) `shouldBe` 2
 31 | 
 32 |      it "lifted expression counter one" $
 33 |         (parse (rigged as) "aa" :: Int) `shouldBe` 1
 34 | 
 35 |      it "lifted expression dynamic counter four" $
 36 |         (parse (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4
 37 | 
 38 |      it "parse forests" $
 39 |             (sort $ toList $ (parse (riggeds (Seq as bs)) "ab" :: Set String)) `shouldBe` ["ab"]
 40 |                                                   
 41 |      for_ cases test
 42 |         where
 43 |           test Case {..} = it description assertion
 44 |               where
 45 |                 assertion = (parse (rigged regex) sample :: Bool) `shouldBe` result
 46 | 
 47 | data Case = Case
 48 |   { description :: String
 49 |   , regex       :: Brz
 50 |   , sample      :: String
 51 |   , result      :: Bool
 52 |   }
 53 | 
 54 | cases :: [Case]
 55 | cases =
 56 |   [ Case {description = "empty", regex = Eps, sample = "", result = True}
 57 |   , Case {description = "char", regex = Sym 'a', sample = "a", result = True}
 58 |   , Case
 59 |       {description = "not char", regex = Sym 'a', sample = "b", result = False}
 60 |   , Case
 61 |       { description = "char vs empty"
 62 |       , regex = Sym 'a'
 63 |       , sample = ""
 64 |       , result = False
 65 |       }
 66 |   , Case
 67 |       { description = "left alt"
 68 |       , regex = Alt (Sym 'a') (Sym 'b')
 69 |       , sample = "a"
 70 |       , result = True
 71 |       }
 72 |   , Case
 73 |       { description = "right alt"
 74 |       , regex = Alt (Sym 'a') (Sym 'b')
 75 |       , sample = "b"
 76 |       , result = True
 77 |       }
 78 |   , Case
 79 |       { description = "neither alt"
 80 |       , regex = Alt (Sym 'a') (Sym 'b')
 81 |       , sample = "c"
 82 |       , result = False
 83 |       }
 84 |   , Case
 85 |       { description = "empty alt"
 86 |       , regex = Alt (Sym 'a') (Sym 'b')
 87 |       , sample = ""
 88 |       , result = False
 89 |       }
 90 |   , Case
 91 |       { description = "empty rep"
 92 |       , regex = Rep (Sym 'a')
 93 |       , sample = ""
 94 |       , result = True
 95 |       }
 96 |   , Case
 97 |       { description = "one rep"
 98 |       , regex = Rep (Sym 'a')
 99 |       , sample = "a"
100 |       , result = True
101 |       }
102 |   , Case
103 |       { description = "multiple rep"
104 |       , regex = Rep (Sym 'a')
105 |       , sample = "aaaaaaaaa"
106 |       , result = True
107 |       }
108 |   , Case
109 |       { description = "multiple rep with failure"
110 |       , regex = Rep (Sym 'a')
111 |       , sample = "aaaaaaaaab"
112 |       , result = False
113 |       }
114 |   , Case
115 |       { description = "sequence"
116 |       , regex = Seq (Sym 'a') (Sym 'b')
117 |       , sample = "ab"
118 |       , result = True
119 |       }
120 |   , Case
121 |       { description = "sequence with empty"
122 |       , regex = Seq (Sym 'a') (Sym 'b')
123 |       , sample = ""
124 |       , result = False
125 |       }
126 |   , Case
127 |       { description = "bad short sequence"
128 |       , regex = Seq (Sym 'a') (Sym 'b')
129 |       , sample = "a"
130 |       , result = False
131 |       }
132 |   , Case
133 |       { description = "bad long sequence"
134 |       , regex = Seq (Sym 'a') (Sym 'b')
135 |       , sample = "abc"
136 |       , result = False
137 |       }
138 |   ]
139 |           
140 |   
141 | 


--------------------------------------------------------------------------------
/haskell/06_RiggedRegex_Combinator/LICENSE:
--------------------------------------------------------------------------------
1 | The Brzozowski experiments are original work, and are copyright and
2 | licensed by Elf M. Sternberg according to the Mozilla 2.0 Public
3 | License.  See the LICENSE.md file in the main directory
4 | 


--------------------------------------------------------------------------------
/haskell/06_RiggedRegex_Combinator/README.md:
--------------------------------------------------------------------------------
 1 | # Kleene Regular Expressions with Rigging, in Haskell
 2 | 
 3 | This variant takes the RiggedRegex (Version 02) and provides a third
 4 | Semiring, `Semiring (Set String)`.  `Zero` is the empty set, `One` is a
 5 | set containing only the empty string, `Add` is union, and `Mul`
 6 | concatenates every pair drawn from the cartesian product.  The
 7 | `sym` function is now modified to return `Zero` on failure, or on
 8 | success a Set containing the recognized character as a string.
 9 | 
10 | The union of any set with the empty set is that set; the cartesian
11 | product of any set with the empty set is the empty set; the
12 | concatenation of the empty string with any set of strings is that set of
13 | strings, so the Semiring properties hold.
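
The identities above can be spot-checked directly. The following is a minimal sketch, assuming a `containers` version that provides `cartesianProduct`; the names `sample` and `lawsHold` are illustrative and do not appear in the package. It checks each identity on a single sample value, which is a sanity check rather than a proof.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import qualified Data.Set as Set
import Data.Set (Set)

class Semiring s where
  zero, one :: s
  add, mul  :: s -> s -> s

-- Same shape as the instance in src/RiggedRegex.hs.
instance Semiring (Set String) where
  zero    = Set.empty
  one     = Set.singleton ""
  add     = Set.union
  mul a b = Set.map (uncurry (++)) (Set.cartesianProduct a b)

-- Hypothetical sample value, used only for this check.
sample :: Set String
sample = Set.fromList ["a", "bc"]

-- True when every identity holds for the sample value.
lawsHold :: Bool
lawsHold = and
  [ add zero sample == sample   -- zero is the identity for add
  , mul one sample  == sample   -- one is a left identity for mul
  , mul sample one  == sample   -- one is a right identity for mul
  , mul zero sample == zero     -- zero annihilates on the left
  , mul sample zero == zero     -- zero annihilates on the right
  ]
```

Loading this in GHCi and evaluating `lawsHold` should give `True`.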
14 | 
15 | The result is a regular expression engine that returns the set of
16 | unique strings produced by matching the regular expression, or the
17 | empty set if no match happened.
18 | 
19 | <s>I'm not yet comfortable with the theoretical underpinnings of this 
20 | variant, but I'm reading intensely to see where I can land this.</s>
21 | 
22 | It turns out that what I did is just fine, and is well-supported by the
23 | theoretical underpinnings.  See "[Semiring
24 | Parsing](https://www.aclweb.org/anthology/J99-4004.pdf)" by Joshua
25 | Goodman.
26 | 
27 | 
28 | 


--------------------------------------------------------------------------------
/haskell/06_RiggedRegex_Combinator/RiggedRegex.cabal:
--------------------------------------------------------------------------------
 1 | cabal-version: 1.12
 2 | 
 3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
 4 | --
 5 | -- see: https://github.com/sol/hpack
 6 | --
 7 | -- hash: 24886bf51ff45652f17f1174185b977a916ba0794b24fee1315723e119dc204a
 8 | 
 9 | name:           RiggedRegex
10 | version:        0.1.0.0
11 | category:       Regex
12 | homepage:       https://github.com/elfsternberg/riggedregex#readme
13 | author:         Elf M. Sternberg
14 | maintainer:     elf.sternberg@gmail.com
15 | copyright:      Copyright ⓒ 2019 Elf M. Sternberg
16 | license:        BSD3
17 | license-file:   LICENSE
18 | build-type:     Simple
19 | extra-source-files:
20 |     README.md
21 | 
22 | library
23 |   exposed-modules:
24 |       RiggedRegex
25 |   other-modules:
26 |       Paths_RiggedRegex
27 |   hs-source-dirs:
28 |       src
29 |   ghc-options: -Wall
30 |   build-depends:
31 |       base
32 |     , containers
33 |   default-language: Haskell2010
34 | 
35 | test-suite test
36 |   type: exitcode-stdio-1.0
37 |   main-is: Tests.hs
38 |   other-modules:
39 |       Paths_RiggedRegex
40 |   hs-source-dirs:
41 |       test
42 |   build-depends:
43 |       RiggedRegex
44 |     , base
45 |     , containers
46 |     , hspec
47 |   default-language: Haskell2010
48 | 


--------------------------------------------------------------------------------
/haskell/06_RiggedRegex_Combinator/package.yaml:
--------------------------------------------------------------------------------
 1 | name: RiggedRegex
 2 | version:             0.1.0.0
 3 | 
 4 | homepage:            https://github.com/elfsternberg/riggedregex#readme
 5 | license:             BSD3
 6 | license-file:        LICENSE
 7 | author:              Elf M. Sternberg
 8 | maintainer:          elf.sternberg@gmail.com
 9 | copyright:           Copyright ⓒ 2019 Elf M. Sternberg
10 | category:            Regex
11 | build-type:          Simple
12 | extra-source-files:  README.md
13 | 
14 | dependencies:
15 |   - containers
16 |   - base
17 | 
18 | library:
19 |   exposed-modules: RiggedRegex
20 |   ghc-options: -Wall
21 |   source-dirs: src
22 | 
23 | tests:
24 |   test:
25 |     main: Tests.hs
26 |     source-dirs: test
27 |     dependencies:
28 |       - RiggedRegex
29 |       - hspec
30 |   
31 | 
32 | 


--------------------------------------------------------------------------------
/haskell/06_RiggedRegex_Combinator/src/RiggedRegex.hs:
--------------------------------------------------------------------------------
  1 | {-# LANGUAGE LambdaCase #-}
  2 | {-# LANGUAGE FlexibleInstances #-}
  3 | 
  4 | module RiggedRegex ( accept, acceptw, Reg (..), Regw (..), rigged, riggeds ) where
  5 | 
  6 | import Data.Set hiding (split)
  7 |     
  8 | data Reg = 
  9 |     Eps         -- Epsilon
 10 |   | Sym Char    -- Character
 11 |   | Alt Reg Reg -- Alternation
 12 |   | Seq Reg Reg -- Sequence
 13 |   | Rep Reg     -- R*
 14 | 
 15 | accept :: Reg -> String -> Bool
 16 | -- Epsilon
 17 | accept Eps u       = Prelude.null u
 18 | -- Accept if the character offered matches the character constructed
 19 | accept (Sym c) u   = u == [c]
 20 | -- Constructed of two other expressions, accept if either one does.
 21 | accept (Alt p q) u = accept p u || accept q u
 22 | -- Constructed of two other expressions, accept if p accepts some part
 23 | -- of u and q accepts the rest, where u is split arbitrarily
 24 | accept (Seq p q) u = or [accept p u1 && accept q u2 | (u1, u2) <- split u]
 25 | -- Accept if at least one partition of u into non-empty pieces
 26 | -- has every piece accepted by r.  (The empty string has exactly one
 27 | -- such partition, the empty one, so Rep always accepts it.)
 28 | accept (Rep r) u   = or [and [accept r ui | ui <- ps] | ps <- parts u]
 29 | 
 30 | -- Generate a list of all possible combinations of a prefix and suffix
 31 | -- for the string offered.
 32 | split :: [a] -> [([a], [a])]
 33 | split []     = [([], [])]
 34 | split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs]
 35 | 
 36 | -- Generate lists of lists that contain all possible convolutions of
 37 | -- the input string, not including the empty string.
 38 | parts :: [a] -> [[[a]]]
 39 | parts []     = [[]]
 40 | parts [c]    = [[[c]]]
 41 | parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
 42 | 
 43 | -- A semiring is an algebraic structure with a zero, a one, a
 44 | -- "multiplication" operation, and an "addition" operation.  Zero is
 45 | -- the identity element for addition, One is the identity element for
 46 | -- multiplication, both operations are associative (it does not
 47 | -- matter how sequential operations are grouped), addition is
 48 | -- commutative (the order of the operands does not matter), and
 49 | -- multiplication distributes over addition.  Also, zero `mul` anything is always zero.
 50 | --
 51 | -- Which, in regular expressions in general, holds that the null regex
 52 | -- is zero, and the empty string regex is one, alternation is addition
 53 | -- and ... sequence is multiplication?  Like "sum" and "product" types?
 54 |                
 55 | -- Symw (c -> s) represents a mapping from a symbol to its given weight.
 56 | 
 57 | class Semiring s where
 58 |     zero, one :: s
 59 |     mul, add  :: s -> s -> s
 60 | 
 61 | sym :: Semiring s => Char -> Regw Char s
 62 | sym c = Symw (\b -> if b == c then one else zero)      
 63 |                 
 64 | data Regw c s =                
 65 |     Epsw                       -- Epsilon
 66 |   | Symw (c -> s)              -- Character
 67 |   | Altw (Regw c s) (Regw c s) -- Alternation
 68 |   | Seqw (Regw c s) (Regw c s) -- Sequence
 69 |   | Repw (Regw c s)            -- R*
 70 | 
 71 | rigging :: Semiring s => (Char -> Regw Char s) -> Reg -> Regw Char s
 72 | rigging s = \case
 73 |          Eps       -> Epsw
 74 |          (Sym c)   -> s c
 75 |          (Alt p q) -> Altw (rigging s p) (rigging s q)
 76 |          (Seq p q) -> Seqw (rigging s p) (rigging s q)
 77 |          (Rep r)   -> Repw (rigging s r)
 78 | 
 79 | rigged :: Semiring s => Reg -> Regw Char s
 80 | rigged = rigging sym
 81 | 
 82 | acceptw :: Semiring s => Regw c s -> [c] -> s
 83 | acceptw Epsw u     = if Prelude.null u then one else zero
 84 | acceptw (Symw f) u =
 85 |     case u of
 86 |       [c] -> f c
 87 |       _ -> zero
 88 | acceptw (Altw p q) u = acceptw p u `add` acceptw q u
 89 | acceptw (Seqw p q) u = sumr [ acceptw p u1 `mul` acceptw q u2 | (u1, u2) <- split u ]
 90 | acceptw (Repw r)   u = sumr [ prodr [ acceptw r ui | ui <- ps ] | ps <- parts u ]
 91 | 
 92 | -- Something feels hacky about this.  I mean, I know, on the one
 93 | -- hand than any epsilon is still "one" as far as the system is
 94 | -- concerned; on the other hand, I would much rather have a better
 95 | -- theoretical ground for what I just did here...
 96 |                        
 97 | syms :: Char -> Regw Char (Set String)
 98 | syms c = Symw (\b -> if b == c then singleton [c] else zero)
 99 |          
100 | riggeds :: Reg -> Regw Char (Set String)
101 | riggeds = rigging syms
102 | 
103 | sumr, prodr :: Semiring r => [r] -> r
104 | sumr = Prelude.foldr add zero
105 | prodr = Prelude.foldr mul one
106 | 
107 | instance Semiring Int where
108 |     zero = 0
109 |     one = 1
110 |     add = (Prelude.+)
111 |     mul = (Prelude.*)
112 | 
113 | instance Semiring Bool where
114 |     zero = False
115 |     one = True
116 |     add = (||)
117 |     mul = (&&)
118 | 
119 | -- εs = {(ε, s)} Empty Word
120 | -- c =  {(c, c)} Token
121 | -- L1 ◦ L2 = {(uv,(s, t)) | (u, s) ∈ L1 and (v, t) ∈ L2} Concatenation
122 | -- L1 ∪ L2 = {(u, s) | (u, s) ∈ L1 or (u, s) ∈ L2} Alternation
123 | 
124 | -- Boolean Semiring (TRUE, FALSE,∨,∧, FALSE, TRUE) recognition
125 | -- Inside Semiring (R(1/0), +, ×, 0, 1) string probability
126 | -- Counting Semiring (N(∞/0), +, ×, 0, 1) number of derivations
127 | -- Derivation Forests Semiring (2^E, ∪, ·, ∅, {<>}) set of derivations
128 | 
129 | instance Semiring (Set String) where
130 |     zero    = empty 
131 |     one     = singleton ""
132 |     add     = union
133 |     mul a b = Data.Set.map (uncurry (++)) $ cartesianProduct a b
134 | 


--------------------------------------------------------------------------------
/haskell/06_RiggedRegex_Combinator/stack.yaml:
--------------------------------------------------------------------------------
 1 | # This file was automatically generated by 'stack init'
 2 | #
 3 | # Some commonly used options have been documented as comments in this file.
 4 | # For advanced use and comprehensive documentation of the format, please see:
 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
 6 | 
 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
 8 | # A snapshot resolver dictates the compiler version and the set of packages
 9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.4
21 | 
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | #    git: https://github.com/commercialhaskell/stack.git
30 | #    commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | #  subdirs:
33 | #  - auto-update
34 | #  - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 | 
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 | 
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 | 
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 | 


--------------------------------------------------------------------------------
/haskell/06_RiggedRegex_Combinator/test/Tests.hs:
--------------------------------------------------------------------------------
  1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
  2 | {-# LANGUAGE RecordWildCards #-}
  3 | import           Data.Foldable     (for_)
  4 | import Test.Hspec        (Spec, it, shouldBe)
  5 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
  6 | import RiggedRegex (Reg (..), accept, acceptw, rigged, riggeds)
  7 | import Data.Set
  8 | import Data.List (sort)
  9 |     
 10 | main :: IO ()
 11 | main = hspecWith defaultConfig {configFastFail = True} specs
 12 | 
 13 | specs :: Spec
 14 | specs = do
 15 | 
 16 |      let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) )
 17 |      let onec = Seq nocs (Sym 'c')
 18 |      let evencs = Seq ( Rep ( Seq onec onec ) ) nocs
 19 | 
 20 |      let as = Alt (Sym 'a') (Rep (Sym 'a'))                  
 21 |      let bs = Alt (Sym 'b') (Rep (Sym 'b'))
 22 | 
 23 |      it "simple expression" $
 24 |         accept evencs "acc" `shouldBe` True
 25 | 
 26 |      it "lifted expression" $
 27 |         (acceptw (rigged evencs) "acc" :: Bool) `shouldBe` True
 28 | 
 29 |      it "lifted expression short" $
 30 |         (acceptw (rigged evencs) "acc" :: Int) `shouldBe` 1
 31 | 
 32 |      it "lifted expression counter two" $
 33 |         (acceptw (rigged as) "a" :: Int) `shouldBe` 2
 34 | 
 35 |      it "lifted expression counter one" $
 36 |         (acceptw (rigged as) "aa" :: Int) `shouldBe` 1
 37 | 
 38 |      it "lifted expression dynamic counter four" $
 39 |         (acceptw (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4
 40 | 
 41 |      it "parse forests" $
 42 |             (sort $ toList $ (acceptw (riggeds (Seq as bs)) "ab" :: Set String)) `shouldBe` ["ab"]
 43 |                                                   
 44 |      for_ cases test
 45 |         where
 46 |           test Case {..} = it description assertion
 47 |               where
 48 |                 assertion = (acceptw (rigged regex) sample :: Bool) `shouldBe` result
 49 | 
 50 | data Case = Case
 51 |   { description :: String
 52 |   , regex       :: Reg
 53 |   , sample      :: String
 54 |   , result      :: Bool
 55 |   }
 56 | 
 57 | cases :: [Case]
 58 | cases =
 59 |   [ Case {description = "empty", regex = Eps, sample = "", result = True}
 60 |   , Case {description = "char", regex = Sym 'a', sample = "a", result = True}
 61 |   , Case
 62 |       {description = "not char", regex = Sym 'a', sample = "b", result = False}
 63 |   , Case
 64 |       { description = "char vs empty"
 65 |       , regex = Sym 'a'
 66 |       , sample = ""
 67 |       , result = False
 68 |       }
 69 |   , Case
 70 |       { description = "left alt"
 71 |       , regex = Alt (Sym 'a') (Sym 'b')
 72 |       , sample = "a"
 73 |       , result = True
 74 |       }
 75 |   , Case
 76 |       { description = "right alt"
 77 |       , regex = Alt (Sym 'a') (Sym 'b')
 78 |       , sample = "b"
 79 |       , result = True
 80 |       }
 81 |   , Case
 82 |       { description = "neither alt"
 83 |       , regex = Alt (Sym 'a') (Sym 'b')
 84 |       , sample = "c"
 85 |       , result = False
 86 |       }
 87 |   , Case
 88 |       { description = "empty alt"
 89 |       , regex = Alt (Sym 'a') (Sym 'b')
 90 |       , sample = ""
 91 |       , result = False
 92 |       }
 93 |   , Case
 94 |       { description = "empty rep"
 95 |       , regex = Rep (Sym 'a')
 96 |       , sample = ""
 97 |       , result = True
 98 |       }
 99 |   , Case
100 |       { description = "one rep"
101 |       , regex = Rep (Sym 'a')
102 |       , sample = "a"
103 |       , result = True
104 |       }
105 |   , Case
106 |       { description = "multiple rep"
107 |       , regex = Rep (Sym 'a')
108 |       , sample = "aaaaaaaaa"
109 |       , result = True
110 |       }
111 |   , Case
112 |       { description = "multiple rep with failure"
113 |       , regex = Rep (Sym 'a')
114 |       , sample = "aaaaaaaaab"
115 |       , result = False
116 |       }
117 |   , Case
118 |       { description = "sequence"
119 |       , regex = Seq (Sym 'a') (Sym 'b')
120 |       , sample = "ab"
121 |       , result = True
122 |       }
123 |   , Case
124 |       { description = "sequence with empty"
125 |       , regex = Seq (Sym 'a') (Sym 'b')
126 |       , sample = ""
127 |       , result = False
128 |       }
129 |   , Case
130 |       { description = "bad short sequence"
131 |       , regex = Seq (Sym 'a') (Sym 'b')
132 |       , sample = "a"
133 |       , result = False
134 |       }
135 |   , Case
136 |       { description = "bad long sequence"
137 |       , regex = Seq (Sym 'a') (Sym 'b')
138 |       , sample = "abc"
139 |       , result = False
140 |       }
141 |   ]
142 |           
143 |   
144 | 


--------------------------------------------------------------------------------
/haskell/07_Rigged_Glushkov/LICENSE:
--------------------------------------------------------------------------------
 1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer
 2 | 
 3 | All rights reserved.
 4 | 
 5 | Redistribution and use in source and binary forms, with or without
 6 | modification, are permitted provided that the following conditions are
 7 | met:
 8 | 
 9 |  1. Redistributions of source code must retain the above copyright
10 |     notice, this list of conditions and the following disclaimer.
11 | 
12 |  2. Redistributions in binary form must reproduce the above copyright
13 |     notice, this list of conditions and the following disclaimer in
14 |     the documentation and/or other materials provided with the
15 |     distribution.
16 | 
17 |  3. Neither the name of the author nor the names of his contributors
18 |     may be used to endorse or promote products derived from this
19 |     software without specific prior written permission.
20 | 
21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
24 | A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHORS OR
25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
32 | 
33 | 


--------------------------------------------------------------------------------
/haskell/07_Rigged_Glushkov/README.md:
--------------------------------------------------------------------------------
 1 | # Rigged Glushkov Regular Expressions, in Haskell
 2 | 
 3 | This is by far the most successful Haskell experiment yet.  It builds on
 4 | Experiment 04, "Glushkov Regular Expressions," and adds the Semiring
 5 | implementation.
 6 | 
 7 | We use the familiar pattern of building our regular expressions using
 8 | the Kleene primitive pattern developed for Experiment 01, then lift the
 9 | constructed expression into our Glushkov representation and run it
10 | through a modified version of the 'shift' function to produce a result.
11 | In this version, as in previous rigged versions, we apply the logic of
12 | regular expressions to our semiring data during parsing.
13 | 
14 | One thing that was necessary here was that, to support more complex
15 | semirings, those that are not just primitive data with simple zero or
16 | one representations, I needed to provide a constructor to the shift
17 | function that knew how to build new symbol operations.  When you "rig"
18 | the Kleene representation, you must provide a function that takes a char
19 | and returns a symbol operator that includes the semiring.
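
As an illustration of rigging with a caller-supplied constructor, here is a minimal sketch. It deliberately uses a simplified Kleene-style evaluator (no `Rep`, no Glushkov `shift`) so that it stays short; `Regw`, `countSym`, `textSym`, `splits`, `example`, `countResult` and `textResult` are names assumed for this sketch, not the package's API. The point is only that the same expression yields different results depending on which symbol constructor is handed to `rigging`.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import qualified Data.Set as Set
import Data.Set (Set)

class Semiring s where
  zero, one :: s
  add, mul  :: s -> s -> s

instance Semiring Int where
  zero = 0
  one  = 1
  add  = (+)
  mul  = (*)

instance Semiring (Set String) where
  zero    = Set.empty
  one     = Set.singleton ""
  add     = Set.union
  mul a b = Set.map (uncurry (++)) (Set.cartesianProduct a b)

-- Unweighted expressions (Rep omitted to keep the sketch short).
data Reg = Eps | Sym Char | Alt Reg Reg | Seq Reg Reg

-- Weighted expressions: a symbol carries a function into the semiring.
data Regw s
  = Epsw
  | Symw (Char -> s)
  | Altw (Regw s) (Regw s)
  | Seqw (Regw s) (Regw s)

-- The caller supplies the symbol constructor.
rigging :: (Char -> Regw s) -> Reg -> Regw s
rigging mk r = case r of
  Eps     -> Epsw
  Sym c   -> mk c
  Alt p q -> Altw (rigging mk p) (rigging mk q)
  Seq p q -> Seqw (rigging mk p) (rigging mk q)

-- Works for any semiring that only needs zero and one.
countSym :: Semiring s => Char -> Regw s
countSym c = Symw (\b -> if b == c then one else zero)

-- Needs to build a value from the matched character itself.
textSym :: Char -> Regw (Set String)
textSym c = Symw (\b -> if b == c then Set.singleton [c] else zero)

-- Every way of cutting a list into a prefix and a suffix.
splits :: [a] -> [([a], [a])]
splits xs = [splitAt n xs | n <- [0 .. length xs]]

acceptw :: Semiring s => Regw s -> String -> s
acceptw Epsw u       = if null u then one else zero
acceptw (Symw f) [c] = f c
acceptw (Symw _) _   = zero
acceptw (Altw p q) u = acceptw p u `add` acceptw q u
acceptw (Seqw p q) u =
  foldr add zero [acceptw p a `mul` acceptw q b | (a, b) <- splits u]

-- (a|a)b: two ways to match "ab", but only one distinct string.
example :: Reg
example = Seq (Alt (Sym 'a') (Sym 'a')) (Sym 'b')

countResult :: Int
countResult = acceptw (rigging countSym example) "ab"

textResult :: Set String
textResult = acceptw (rigging textSym example) "ab"
```

Here `countResult` evaluates to `2` and `textResult` to the singleton set containing `"ab"`. A constructor like `textSym` cannot be written in terms of `zero` and `one` alone, which is why the constructor has to be passed in.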
20 | 
21 | Rigging examples were *not* included in the paper.  This was the first
22 | experiment where I had to come up with some parts of the solution on my
23 | own, and solving it was a fun problem.  This particular version took
24 | about four hours to puzzle out, but it was worth it.  I'm sure there are
25 | alternatives to my rigging-with-constructor solution, but this works and
26 | I'm not unhappy with it.  It does look a bit cluttered, but that's
27 | actually how it's presented in the paper; my solution actually reduces
28 | some of the clutter.
29 | 
30 | Otherwise, this version works pretty much the same way you'd expect a
31 | merger of the Kleene Semiring version and the Glushkov boolean version
32 | to work.
33 | 
34 | One thing that came out of the paper was the use of a Haskell
35 | record-type to record whether or not a node had already been analyzed
36 | for its finality and emptiness; this caches those results and "shorts
37 | out" traversing down the tree to rediscover these properties, resulting
38 | in a bit of a speed-up.
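
The caching idea can be sketched in a boolean-only form. This is an illustrative reduction, not the code in `src/RiggedGlushkov.hs`: the names `Node`, `Shape`, `isEmpty`, `isFinal`, `seq_` and `match` are assumptions made for this example. Each smart constructor computes emptiness and finality once, from the cached fields of its children, so `shift` never has to walk back down the tree to recompute them.

```haskell
-- A node caches whether it accepts the empty string and whether it is
-- currently in a final (accepting) state.
data Node = Node
  { isEmpty :: Bool
  , isFinal :: Bool
  , shape   :: Shape
  }

data Shape
  = EpsS
  | SymS Char
  | AltS Node Node
  | SeqS Node Node
  | RepS Node

-- Smart constructors fill in the cached fields from the children's caches.
eps :: Node
eps = Node True False EpsS

sym :: Char -> Node
sym c = Node False False (SymS c)

alt :: Node -> Node -> Node
alt l r = Node (isEmpty l || isEmpty r) (isFinal l || isFinal r) (AltS l r)

seq_ :: Node -> Node -> Node
seq_ l r =
  Node (isEmpty l && isEmpty r)
       (isFinal l && isEmpty r || isFinal r)
       (SeqS l r)

rep :: Node -> Node
rep r = Node True (isFinal r) (RepS r)

-- Shift a mark through the expression for one input character,
-- rebuilding nodes (and their caches) on the way back up.
shift :: Bool -> Shape -> Char -> Node
shift _ EpsS _       = eps
shift m (SymS x) c   = (sym x) { isFinal = m && x == c }
shift m (AltS l r) c = alt (shift m (shape l) c) (shift m (shape r) c)
shift m (SeqS l r) c =
  seq_ (shift m (shape l) c)
       (shift (m && isEmpty l || isFinal l) (shape r) c)
shift m (RepS r) c   = rep (shift (m || isFinal r) (shape r) c)

-- The first character enters with a mark; later characters do not.
match :: Node -> String -> Bool
match r []     = isEmpty r
match r (c:cs) =
  isFinal (foldl (\n ch -> shift False (shape n) ch) (shift True (shape r) c) cs)
```

For example, `match (seq_ (sym 'a') (sym 'b')) "ab"` is `True`, while the same expression rejects `"a"` and `"abc"`.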
39 | 


--------------------------------------------------------------------------------
/haskell/07_Rigged_Glushkov/RiggedGlushkov.cabal:
--------------------------------------------------------------------------------
 1 | cabal-version: 1.12
 2 | 
 3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
 4 | --
 5 | -- see: https://github.com/sol/hpack
 6 | --
 7 | -- hash: b650d4292e70e7a507191f08e2c62b80a0eb311278b7aef3a2a084f2dac0c3ca
 8 | 
 9 | name:           RiggedGlushkov
10 | version:        0.1.0.0
11 | category:       Regex
12 | homepage:       https://github.com/elfsternberg/riggedregex#readme
13 | author:         Elf M. Sternberg
14 | maintainer:     elf.sternberg@gmail.com
15 | copyright:      Copyright ⓒ 2019 Elf M. Sternberg
16 | license:        BSD3
17 | license-file:   LICENSE
18 | build-type:     Simple
19 | extra-source-files:
20 |     README.md
21 | 
22 | library
23 |   exposed-modules:
24 |       RiggedGlushkov
25 |   other-modules:
26 |       Paths_RiggedGlushkov
27 |   hs-source-dirs:
28 |       src
29 |   ghc-options: -Wall
30 |   build-depends:
31 |       base
32 |     , containers
33 |   default-language: Haskell2010
34 | 
35 | test-suite test
36 |   type: exitcode-stdio-1.0
37 |   main-is: Tests.hs
38 |   other-modules:
39 |       Paths_RiggedGlushkov
40 |   hs-source-dirs:
41 |       test
42 |   build-depends:
43 |       RiggedGlushkov
44 |     , base
45 |     , containers
46 |     , hspec
47 |   default-language: Haskell2010
48 | 


--------------------------------------------------------------------------------
/haskell/07_Rigged_Glushkov/package.yaml:
--------------------------------------------------------------------------------
 1 | name:                RiggedGlushkov
 2 | version:             0.1.0.0
 3 | 
 4 | homepage:            https://github.com/elfsternberg/riggedregex#readme
 5 | license:             BSD3
 6 | license-file:        LICENSE
 7 | author:              Elf M. Sternberg
 8 | maintainer:          elf.sternberg@gmail.com
 9 | copyright:           Copyright ⓒ 2019 Elf M. Sternberg
10 | category:            Regex
11 | build-type:          Simple
12 | extra-source-files:  README.md
13 | 
14 | dependencies:
15 |     - containers
16 |     - base
17 | 
18 | library:
19 |   exposed-modules: RiggedGlushkov
20 |   ghc-options: -Wall
21 |   source-dirs: src
22 | 
23 | tests:
24 |   test:
25 |     main: Tests.hs
26 |     source-dirs: test
27 |     dependencies:
28 |       - RiggedGlushkov
29 |       - hspec
30 |   
31 | 


--------------------------------------------------------------------------------
/haskell/07_Rigged_Glushkov/src/RiggedGlushkov.hs:
--------------------------------------------------------------------------------
  1 | {-# LANGUAGE FlexibleInstances #-}
  2 | {-# LANGUAGE LambdaCase        #-}
  3 | 
  4 | module RiggedGlushkov ( Glu(..), acceptg, rigged, riggeds ) where
  5 | 
  6 | import           Data.Set hiding (foldl, split)
  7 | 
  8 | data Glu
  9 |   = Eps
 10 |   | Sym Bool Char
 11 |   | Alt Glu Glu
 12 |   | Seq Glu Glu
 13 |   | Rep Glu
 14 | 
 15 | -- Just as with the Kleene versions, we're going to exploit the fact
 16 | -- that we have a working version.  For Rust, we're going to do
 17 | -- something a little different.  But for now...
 18 | --
 19 | -- This is interesting.  The paper decides that, to keep the cost of
 20 | -- processing down, we're going to cache the results of empty and
 21 | -- final.  One of the prices paid, though, is in the complexity of the
 22 | -- data type for our expressions, and that complexity is now managed
 23 | -- through factories.
 24 | 
 25 | class Semiring s where
 26 |   zero, one :: s
 27 |   mul, add :: s -> s -> s
 28 | 
 29 | data Glue c s = Glue
 30 |   { emptye :: s
 31 |   , finale :: s
 32 |   , gluw   :: Gluw c s
 33 |   }
 34 | 
 35 | data Gluw c s
 36 |   = Epsw
 37 |   | Symw (c -> s)
 38 |   | Altw (Glue c s) (Glue c s)
 39 |   | Seqw (Glue c s) (Glue c s)
 40 |   | Repw (Glue c s)
 41 | 
 42 | epsw :: Semiring s => Glue c s
 43 | epsw = Glue {emptye = one, finale = zero, gluw = Epsw}
 44 | 
 45 | symw :: Semiring s => (c -> s) -> Glue c s
 46 | symw f = Glue {emptye = zero, finale = zero, gluw = Symw f}
 47 | 
 48 | altw :: Semiring s => Glue c s -> Glue c s -> Glue c s
 49 | altw l r =
 50 |   Glue
 51 |     { emptye = add (emptye l) (emptye r),
 52 |       finale = add (finale l) (finale r),
 53 |       gluw = Altw l r
 54 |     }
 55 | 
 56 | seqw :: Semiring s => Glue c s -> Glue c s -> Glue c s
 57 | seqw l r =
 58 |   Glue
 59 |     { emptye = mul (emptye l) (emptye r),
 60 |       finale = add (mul (finale l) (emptye r)) (finale r),
 61 |       gluw = Seqw l r
 62 |     }
 63 | 
 64 | repw :: Semiring s => Glue c s -> Glue c s
 65 | repw r = Glue {emptye = one, finale = finale r, gluw = Repw r}
 66 | 
 67 | -- for my edification, the syntax under Symw is syntax for "replace
 68 | -- this value in the created record."
 69 | --     > data Foo = Foo { a :: Int, b :: Int } deriving (Show)
 70 | --     > (Foo 1 2) { b = 4 }
 71 | --     Foo { a = 1, b = 4 }
 72 | -- It doesn't seem to be functional, i.e. Foo 1 2 $ { b = 4 } doesn't work.
 73 | shifte :: Semiring s => s -> Gluw c s -> c -> Glue c s
 74 | shifte _ Epsw _ = epsw
 75 | shifte m (Symw f) c = (symw f) {finale = m `mul` f c}
 76 | shifte m (Seqw l r) c =
 77 |   seqw
 78 |     (shifte m (gluw l) c)
 79 |     (shifte (add (m `mul` (emptye l)) (finale l)) (gluw r) c)
 80 | shifte m (Altw l r) c = altw (shifte m (gluw l) c) (shifte m (gluw r) c)
 81 | shifte m (Repw r) c = repw (shifte (m `add` finale r) (gluw r) c)
 82 | 
 83 | sym :: (Semiring s, Eq c) => c -> Glue c s
 84 | sym c = symw (\b -> if b == c then one else zero)
 85 | 
 86 | rigging :: Semiring s => (Char -> Glue Char s) -> Glu -> Glue Char s
 87 | rigging s =
 88 |   \case
 89 |     Eps -> epsw
 90 |     (Sym _ c) -> s c
 91 |     (Alt p q) -> altw (rigging s p) (rigging s q)
 92 |     (Seq p q) -> seqw (rigging s p) (rigging s q)
 93 |     (Rep r) -> repw (rigging s r)
 94 | 
 95 | rigged :: Semiring s => Glu -> Glue Char s
 96 | rigged = rigging sym
 97 | 
 98 | syms :: Char -> Glue Char (Set String)
 99 | syms c = symw (\b -> if b == c then singleton [c] else zero)
100 | 
101 | riggeds :: Glu -> Glue Char (Set String)
102 | riggeds = rigging syms
103 | 
104 | instance Semiring (Set String) where
105 |   zero = empty
106 |   one = singleton ""
107 |   add = union
108 |   mul a b = Data.Set.map (uncurry (++)) $ cartesianProduct a b
109 | 
110 | instance Semiring Int where
111 |   zero = 0
112 |   one = 1
113 |   add = (Prelude.+)
114 |   mul = (Prelude.*)
115 | 
116 | instance Semiring Bool where
117 |   zero = False
118 |   one = True
119 |   add = (||)
120 |   mul = (&&)
121 | 
122 | acceptg :: Semiring s => Glue c s -> [c] -> s
123 | acceptg r [] = emptye r
124 | acceptg r (c:cs) =
125 |   finale (foldl (shifte zero . gluw) (shifte one (gluw r) c) cs)
126 | 


--------------------------------------------------------------------------------
/haskell/07_Rigged_Glushkov/stack.yaml:
--------------------------------------------------------------------------------
 1 | # This file was automatically generated by 'stack init'
 2 | #
 3 | # Some commonly used options have been documented as comments in this file.
 4 | # For advanced use and comprehensive documentation of the format, please see:
 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
 6 | 
 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
 8 | # A snapshot resolver dictates the compiler version and the set of packages
 9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.5
21 | 
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | #    git: https://github.com/commercialhaskell/stack.git
30 | #    commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | #  subdirs:
33 | #  - auto-update
34 | #  - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 | 
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 | 
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 | 
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 | 


--------------------------------------------------------------------------------
/haskell/07_Rigged_Glushkov/test/Tests.hs:
--------------------------------------------------------------------------------
  1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
  2 | {-# LANGUAGE RecordWildCards #-}
  3 | 
  4 | import           Data.Foldable     (for_)
  5 | import           Test.Hspec        (Spec, describe, it, shouldBe)
  6 | import           Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
  7 | import           RiggedGlushkov    (Glu (..), acceptg, rigged, riggeds)
  8 | import Data.Set
  9 | import Data.List (sort)
 10 | 
 11 | main :: IO ()
 12 | main = hspecWith defaultConfig {configFastFail = True} specs
 13 | 
 14 | msym :: Char -> Glu       
 15 | msym c = Sym False c       
 16 |        
 17 | specs :: Spec
 18 | specs = do
 19 | 
 20 |      let nocs = Rep ( Alt ( msym 'a' ) ( msym 'b' ) )
 21 |      let onec = Seq nocs (msym 'c')
 22 |      let evencs = Seq ( Rep ( Seq onec onec ) ) nocs
 23 | 
 24 |      let as = Alt (msym 'a') (Rep (msym 'a'))                  
 25 |      let bs = Alt (msym 'b') (Rep (msym 'b'))
 26 | 
 27 | --     it "lifted expression" $
 28 | --        (acceptg (rigged evencs) "acc" :: Bool) `shouldBe` True
 29 | 
 30 |      it "lifted expression short" $
 31 |         (acceptg (rigged evencs) "acc" :: Int) `shouldBe` 1
 32 | 
 33 |      it "lifted expression counter two" $
 34 |         (acceptg (rigged as) "a" :: Int) `shouldBe` 2
 35 | 
 36 |      it "lifted expression counter one" $
 37 |         (acceptg (rigged as) "aa" :: Int) `shouldBe` 1
 38 | 
 39 |      it "lifted expression dynamic counter four" $
 40 |         (acceptg (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4
 41 | 
 42 |      it "parse forests" $
 43 |             (sort $ toList $ (acceptg (riggeds (Seq as bs)) "ab" :: Set String)) `shouldBe` ["ab"]
 44 |                                                   
 45 |      for_ cases test
 46 |         where
 47 |           test Case {..} = it description assertion
 48 |               where
 49 |                 assertion = (acceptg (rigged regex) sample :: Bool) `shouldBe` result
 50 | 
 51 | data Case = Case
 52 |   { description :: String
 53 |   , regex       :: Glu
 54 |   , sample      :: String
 55 |   , result      :: Bool
 56 |   }
 57 | 
 58 | cases :: [Case]
 59 | cases =
 60 |   [ Case {description = "empty", regex = Eps, sample = "", result = True}
 61 |   , Case {description = "char", regex = msym 'a', sample = "a", result = True}
 62 |   , Case
 63 |       {description = "not char", regex = msym 'a', sample = "b", result = False}
 64 |   , Case
 65 |       { description = "char vs empty"
 66 |       , regex = msym 'a'
 67 |       , sample = ""
 68 |       , result = False
 69 |       }
 70 |   , Case
 71 |       { description = "left alt"
 72 |       , regex = Alt (msym 'a') (msym 'b')
 73 |       , sample = "a"
 74 |       , result = True
 75 |       }
 76 |   , Case
 77 |       { description = "right alt"
 78 |       , regex = Alt (msym 'a') (msym 'b')
 79 |       , sample = "b"
 80 |       , result = True
 81 |       }
 82 |   , Case
 83 |       { description = "neither alt"
 84 |       , regex = Alt (msym 'a') (msym 'b')
 85 |       , sample = "c"
 86 |       , result = False
 87 |       }
 88 |   , Case
 89 |       { description = "empty alt"
 90 |       , regex = Alt (msym 'a') (msym 'b')
 91 |       , sample = ""
 92 |       , result = False
 93 |       }
 94 |   , Case
 95 |       { description = "empty rep"
 96 |       , regex = Rep (msym 'a')
 97 |       , sample = ""
 98 |       , result = True
 99 |       }
100 |   , Case
101 |       { description = "one rep"
102 |       , regex = Rep (msym 'a')
103 |       , sample = "a"
104 |       , result = True
105 |       }
106 |   , Case
107 |       { description = "multiple rep"
108 |       , regex = Rep (msym 'a')
109 |       , sample = "aaaaaaaaa"
110 |       , result = True
111 |       }
112 |   , Case
113 |       { description = "multiple rep with failure"
114 |       , regex = Rep (msym 'a')
115 |       , sample = "aaaaaaaaab"
116 |       , result = False
117 |       }
118 |   , Case
119 |       { description = "sequence"
120 |       , regex = Seq (msym 'a') (msym 'b')
121 |       , sample = "ab"
122 |       , result = True
123 |       }
124 |   , Case
125 |       { description = "sequence with empty"
126 |       , regex = Seq (msym 'a') (msym 'b')
127 |       , sample = ""
128 |       , result = False
129 |       }
130 |   , Case
131 |       { description = "bad short sequence"
132 |       , regex = Seq (msym 'a') (msym 'b')
133 |       , sample = "a"
134 |       , result = False
135 |       }
136 |   , Case
137 |       { description = "bad long sequence"
138 |       , regex = Seq (msym 'a') (msym 'b')
139 |       , sample = "abc"
140 |       , result = False
141 |       }
142 |   ]
143 |           
144 |   
145 | 


--------------------------------------------------------------------------------
/haskell/08_Heavyweights/Heavyweights.cabal:
--------------------------------------------------------------------------------
 1 | cabal-version: 1.12
 2 | 
 3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
 4 | --
 5 | -- see: https://github.com/sol/hpack
 6 | --
 7 | -- hash: c903f1e543aacf648806799a4c925c51ece2e6c833560c723350207fa137497f
 8 | 
 9 | name:           Heavyweights
10 | version:        0.1.0.0
11 | category:       Regex
12 | homepage:       https://github.com/elfsternberg/riggedregex#readme
13 | author:         Elf M. Sternberg
14 | maintainer:     elf.sternberg@gmail.com
15 | copyright:      Copyright ⓒ 2019 Elf M. Sternberg
16 | license:        MPL-2.0
17 | license-file:   LICENSE.md
18 | build-type:     Simple
19 | extra-source-files:
20 |     README.md
21 | 
22 | library
23 |   exposed-modules:
24 |       Heavyweights
25 |   other-modules:
26 |       Paths_Heavyweights
27 |   hs-source-dirs:
28 |       src
29 |   ghc-options: -Wall
30 |   build-depends:
31 |       base
32 |     , containers
33 |   default-language: Haskell2010
34 | 
35 | test-suite test
36 |   type: exitcode-stdio-1.0
37 |   main-is: Tests.hs
38 |   other-modules:
39 |       Paths_Heavyweights
40 |   hs-source-dirs:
41 |       test
42 |   build-depends:
43 |       Heavyweights
44 |     , base
45 |     , containers
46 |     , hspec
47 |   default-language: Haskell2010
48 | 


--------------------------------------------------------------------------------
/haskell/08_Heavyweights/LICENSE:
--------------------------------------------------------------------------------
 1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer
 2 | 
 3 | All rights reserved.
 4 | 
 5 | Redistribution and use in source and binary forms, with or without
 6 | modification, are permitted provided that the following conditions are
 7 | met:
 8 | 
 9 |  1. Redistributions of source code must retain the above copyright
10 |     notice, this list of conditions and the following disclaimer.
11 | 
12 |  2. Redistributions in binary form must reproduce the above copyright
13 |     notice, this list of conditions and the following disclaimer in
14 |     the documentation and/or other materials provided with the
15 |     distribution.
16 | 
17 |  3. Neither the name of the author nor the names of his contributors
18 |     may be used to endorse or promote products derived from this
19 |     software without specific prior written permission.
20 | 
21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
24 | A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHORS OR
25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
32 | 
33 | 


--------------------------------------------------------------------------------
/haskell/08_Heavyweights/LICENSE.md:
--------------------------------------------------------------------------------
1 | See license in main directory
2 | 


--------------------------------------------------------------------------------
/haskell/08_Heavyweights/README.md:
--------------------------------------------------------------------------------
  1 | # Rigged Glushkov Regular Expressions in Haskell: Compliance Experiments
  2 | 
  3 | This implementation doesn't differ much from [Experiment 07: Rigged
  4 | Glushkov Regular Expressions in Haskell](../07_Rigged_Glushkov), except
  5 | that it adds two new Semiring implementations to the library.
  6 | 
  7 | Recall the basics of Semiring theory: There is a data type with a zero,
  8 | a one, an "addition" operation and a "multiplication" operation.  Each
  9 | operator has an identity (in numbers, addition has zero, multiplication
 10 | has one), multiplication distributes over addition, and zero annihilates
 11 | under multiplication (anything times zero is always zero, or
 12 | "nothing").
 13 | 
 14 | We've used these principles to do boolean recognition; multiplication is
 15 | the boolean `and` operator, used to encode sequences using annihilation:
 16 | any sequence that doesn't match is `False`, and `False && x` is always
 17 | `False`.  If the entire truth of an expression depends on an annihilated
 18 | sequence, then it's not true.
 19 | 
 20 | We've used it to count ambiguities, via integers: by using addition *as*
 21 | addition, we count the number of distinct paths through our initial
 22 | expression that could have produced the string submitted, thus
 23 | revealing the number of ambiguities in our expression.  Each `or` that
 24 | returns 1 reveals a different path, so an alternate pattern will return
 25 | the sum of the paths that pass through it.
 26 | 
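As a minimal sketch of the two cases above (boolean recognition and
ambiguity counting), here are the `Bool` and `Int` semirings applied to two
hand-built weights.  The names `bothBranches` and `deadSequence` are
hypothetical and exist only for this illustration: the first stands for an
alternation whose two branches each matched once, the second for a sequence
whose right half failed.

```haskell
-- A semiring: two constants and two binary operations.
class Semiring s where
  zero, one :: s
  add, mul :: s -> s -> s

-- Recognition: "or" for alternatives, "and" for sequences.
instance Semiring Bool where
  zero = False
  one = True
  add = (||)
  mul = (&&)

-- Counting: ordinary arithmetic on integers.
instance Semiring Int where
  zero = 0
  one = 1
  add = (+)
  mul = (*)

-- Hypothetical helper: an alternation where both branches matched once.
-- As Bool this is True; as Int it is 2, i.e. two distinct paths.
bothBranches :: Semiring s => s
bothBranches = add one one

-- Hypothetical helper: a sequence whose right half failed to match.
-- The zero annihilates, so this is False as Bool and 0 as Int.
deadSequence :: Semiring s => s
deadSequence = mul one zero
```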
 27 | And we've even used it to identify the strings that match.  By saying
 28 | that our Semiring is a "set of strings", our addition is union (that is,
 29 | we keep the set of all paths through the alternate pattern), and
 30 | multiplication is the concatenation of the cartesian products of the two
 31 | elements of a sequence (so for a basic pattern with no alternatives it's
 32 | just a concatenation of the two strings, but for alternatives with
 33 | multiples, it's the concatenation of all possible combinations), we've
 34 | created a way to extract the exact string(s) that we submitted to the
 35 | machine that matched.
 36 | 
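The "set of strings" operations described above can be sketched on their
own, without the type class.  The names `Matches`, `zeroS`, `oneS`, `addS`
and `mulS` are assumptions made for this example; only the `Data.Set`
functions are real library calls.

```haskell
import qualified Data.Set as Set

-- Hypothetical alias: the set of strings a pattern has matched.
type Matches = Set.Set String

-- Zero is "no match at all"; one is "matched the empty string".
zeroS, oneS :: Matches
zeroS = Set.empty
oneS = Set.singleton ""

-- Addition keeps every path through an alternation.
addS :: Matches -> Matches -> Matches
addS = Set.union

-- Multiplication concatenates every left string with every right string.
mulS :: Matches -> Matches -> Matches
mulS a b = Set.map (uncurry (++)) (Set.cartesianProduct a b)

-- Two alternatives on each side of a sequence give four combinations:
-- Set.toList example == ["ac", "ad", "bc", "bd"]
example :: Matches
example = mulS (Set.fromList ["a", "b"]) (Set.fromList ["c", "d"])
```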
 37 | In *Heavyweights*, Fischer, Huch and Wilke go further and show how
 38 | clever choices among zeros and ones can lead to some rather powerful
 39 | outcomes.
 40 | 
 41 | The first thing to appreciate is that our symbol operator, `sym`, has
 42 | never actually been about symbols.  It's about predicates.  Our base
 43 | implementation has been to pass a closured comparison with our desired
 44 | symbol, returning zero or one.
 45 | 
 46 | For the string implementation, which is *not* covered in the paper and
 47 | which I managed to extract, successfully, from Might's work, I passed to
 48 | `sym` instead a closured comparison to the desired symbol, and the
 49 | return value was either the zero or `singleton [c]`, meaning a set with
 50 | a string of one character in it.  (I'm quite proud of that work; it both
 51 | affirmed my notion that Might & Adams had a semiring implementation,
 52 | they just didn't call it that, and that I was able to merge two
 53 | different equational systems, applying some notions of category theory
 54 | to do so.)
 55 | 
 56 | The definition of `sym` was: `sym :: (Semiring s, Eq c) => c -> Glue c
 57 | s`.  I added `syms`: `syms :: Char -> Glue Char (Set String)`.  Now the
 58 | three provide `symi`: `symi :: Semiringi s => Char -> Glue (Int, Char) s`.
 59 | This is a symbol that *takes* both an Int and a Char, and their
 60 | `submatch` function `zip`s the input value with a position value, so that
 61 | both are available for processing.  Remember that everything else
 62 | depends on the Semiring, and *not* the input type; only `sym` cares.
 63 | 
 64 | Now they add an "indexed semiring," and to it provide a version of `sym`
 65 | that returns the `index` semiring when true, and zero otherwise.
 66 | 
 67 |     class Semiring s => Semiringi s where
 68 |         index :: Int -> s
 69 | 
 70 |     symi :: Semiringi s => Char -> Glue (Int, Char) s
 71 |     symi c = symw weight
 72 |         where weight (pos, x) | x == c    = index pos
 73 |                               | otherwise = zero
 74 | 
 75 | But what *is* the `index` semiring?  Here's where things get
 76 | interesting.  Fischer et al. want to encode the range of the longest
 77 | submatch.  The first thing they do is define submatch as a variant of
 78 | accept, with a lead-in and a lead-out that just match everything.  This
 79 | is okay, as this is a Glushkov machine and that just means that the
 80 | 'arb' NFA will almost always be active, but it won't be important to us:
 81 | it's not working with `symi` values.
 82 | 
 83 |     submatch :: Semiring s => Glue (Int, c) s -> [c] -> s
 84 |     submatch r s =
 85 |         accept (seqw arb (seqw r arb)) (zip [0..] s)
 86 |             where arb = repw (symw (\_ -> one))
 87 | 
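To make the position tagging concrete, here is a small sketch of what a
`symi`-style predicate sees once the input has been zipped with its
positions.  `tagPositions` and `seenAt` are hypothetical names for this
example, and `Maybe Int` is only a stand-in for a real indexed semiring:
`Just pos` plays the role of `index pos`, `Nothing` the role of `zero`.

```haskell
-- Pair every input character with its zero-based position,
-- the same way `submatch` does with `zip [0..]`.
tagPositions :: [c] -> [(Int, c)]
tagPositions = zip [0..]

-- Stand-in for the weight function inside `symi`: report the position
-- when the wanted character is seen, and nothing otherwise.
seenAt :: Char -> (Int, Char) -> Maybe Int
seenAt c (pos, x)
  | x == c    = Just pos
  | otherwise = Nothing

-- map (seenAt 'a') (tagPositions "aba") == [Just 0, Nothing, Just 2]
```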
 88 | So... what are the zero and one of a "longest submatch" operation?  The
 89 | zero is that no match ever occurred.  The one is that a match is
 90 | possible, but hasn't yet occurred.  Any other value is a submatch.  The
 91 | final value is the longest interval of the submatch.
 92 | 
 93 | Fischer et al. break up their semiring into two parts:
 94 | 
 95 |     data LeftLong = NoLeftLong | LeftLong Range deriving (Show)
 96 |     data Range = NoRange | Range Int Int deriving (Show)
 97 |     
 98 | `NoLeftLong` is zero; it could never happen, there was no match.
 99 | `NoRange` is the one, meaning it could still happen, it just hasn't
100 | yet.  And `Range` is a submatch that has been found.
101 | 
102 | For addition (which symbolizes alternation, recall), adding a failure to
103 | anything is the anything, so `add NoLeftLong x = x`, and that's true the
104 | other way.  Adding a range to an empty range is just the range, and
105 | adding two ranges picks the leftmost, or the longer if they start together.
106 | 
107 | For multiplication, again, multiplying by failure is just failure.
108 | Multiplying anything with `NoRange` means that the anything is preserved
109 | unchanged, and multiplying two ranges is a new range with the start of
110 | the first range and the end of the latter range. (Recall that for
111 | Semirings, multiplication is associative but it need *not* be
112 | commutative.  It may *be* commutative for some sets, but it's not a
113 | requirement of semirings and you shouldn't count on commutativity.)
114 | 
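The addition and multiplication rules above can be sketched as two plain
functions.  This is a sketch under assumptions, not the paper's code: it
flattens the two data types into one for brevity, and the names `addLL`
and `mulLL` are hypothetical.

```haskell
-- Flattened for brevity: failure, "possible but not yet", or a found range.
data LeftLong = NoLeftLong | NoRange | Range Int Int
  deriving (Show, Eq)

-- Addition: failure and the empty range both give way to the other side;
-- of two ranges, the leftmost wins, and on a tie the longer one wins.
addLL :: LeftLong -> LeftLong -> LeftLong
addLL NoLeftLong x = x
addLL x NoLeftLong = x
addLL NoRange x = x
addLL x NoRange = x
addLL (Range i j) (Range k l)
  | i < k || (i == k && j >= l) = Range i j
  | otherwise                   = Range k l

-- Multiplication: failure annihilates, the empty range is the identity,
-- and two ranges join the start of the left to the end of the right.
mulLL :: LeftLong -> LeftLong -> LeftLong
mulLL NoLeftLong _ = NoLeftLong
mulLL _ NoLeftLong = NoLeftLong
mulLL NoRange x = x
mulLL x NoRange = x
mulLL (Range i _) (Range _ l) = Range i l

-- Order matters for multiplication:
--   mulLL (Range 0 1) (Range 2 3) == Range 0 3
--   mulLL (Range 2 3) (Range 0 1) == Range 2 1
```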
115 | 


--------------------------------------------------------------------------------
/haskell/08_Heavyweights/package.yaml:
--------------------------------------------------------------------------------
 1 | name:                Heavyweights
 2 | version:             0.1.0.0
 3 | 
 4 | homepage:            https://github.com/elfsternberg/riggedregex#readme
 5 | license:             MPL-2.0
 6 | license-file:        LICENSE.md
 7 | author:              Elf M. Sternberg
 8 | maintainer:          elf.sternberg@gmail.com
 9 | copyright:           Copyright ⓒ 2019 Elf M. Sternberg
10 | category:            Regex
11 | build-type:          Simple
12 | extra-source-files:  README.md
13 | 
14 | dependencies:
15 |   - base
16 |   - containers
17 | 
18 | library:
19 |   exposed-modules: Heavyweights
20 |   ghc-options: -Wall
21 |   source-dirs: src
22 | 
23 | tests:
24 |   test:
25 |     main: Tests.hs
26 |     source-dirs: test
27 |     dependencies:
28 |       - Heavyweights
29 |       - hspec
30 | 


--------------------------------------------------------------------------------
/haskell/08_Heavyweights/src/Heavyweights.hs:
--------------------------------------------------------------------------------
  1 | {-# LANGUAGE FlexibleInstances #-}
  2 | {-# LANGUAGE LambdaCase        #-}
  3 | 
  4 | module Heavyweights ( Reg(..), Glue, accept, rigged, riggeds, riggew, submatch, symi, altw, seqw, repw, LeftLong(..) ) where
  5 | 
  6 | import           Data.Set hiding (foldl, split)
  7 | 
  8 | data Reg
  9 |   = Eps
 10 |   | Sym Bool Char
 11 |   | Alt Reg Reg
 12 |   | Seq Reg Reg
 13 |   | Rep Reg
 14 | 
 15 | -- Just as with the Kleene versions, we're going to exploit the fact
 16 | -- that we have a working version.  For Rust, we're going to do
 17 | -- something a little different.  But for now...
 18 | --
 19 | -- This is interesting.  The paper decides that, to keep the cost of
 20 | -- processing down, we're going to cache the results of emptyg and
 21 | -- final.  One of the prices paid, though, is in the complexity of the
 22 | -- data type for our expressions, and that complexity is now managed
 23 | -- through factories.
 24 | 
 25 | class Semiring s where
 26 |   zero, one :: s
 27 |   mul, add :: s -> s -> s
 28 | 
 29 | data Glue c s = Glue
 30 |   { emptyg :: s
 31 |   , final :: s
 32 |   , glu   :: Glu c s
 33 |   }
 34 | 
 35 | -- 'Glu' is just the representative of the regex element
 36 | -- 'Glue' is the extended representation with cached values
 37 |               
 38 | data Glu c s
 39 |   = Epsw
 40 |   | Symw (c -> s)
 41 |   | Altw (Glue c s) (Glue c s)
 42 |   | Seqw (Glue c s) (Glue c s)
 43 |   | Repw (Glue c s)
 44 | 
 45 | epsw :: Semiring s => Glue c s
 46 | epsw = Glue {emptyg = one, final = zero, glu = Epsw}
 47 | 
 48 | symw :: Semiring s => (c -> s) -> Glue c s
 49 | symw f = Glue {emptyg = zero, final = zero, glu = Symw f}
 50 | 
 51 | altw :: Semiring s => Glue c s -> Glue c s -> Glue c s
 52 | altw l r =
 53 |   Glue
 54 |     { emptyg = add (emptyg l) (emptyg r),
 55 |       final = add (final l) (final r),
 56 |       glu = Altw l r
 57 |     }
 58 | 
 59 | seqw :: Semiring s => Glue c s -> Glue c s -> Glue c s
 60 | seqw l r =
 61 |   Glue
 62 |     { emptyg = mul (emptyg l) (emptyg r),
 63 |       final = add (mul (final l) (emptyg r)) (final r),
 64 |       glu = Seqw l r
 65 |     }
 66 | 
 67 | repw :: Semiring s => Glue c s -> Glue c s
 68 | repw r = Glue {emptyg = one, final = final r, glu = Repw r}
 69 | 
 70 | -- for my edification, the syntax under Symw is syntax for "replace
 71 | -- this value in the created record."
 72 | --     > data Foo = Foo { a :: Int, b :: Int } deriving (Show)
 73 | --     > (Foo 1 2) { b = 4 }
 74 | --     Foo { a = 1, b = 4 }
 75 | -- It doesn't seem to be functional, i.e. Foo 1 2 $ { b = 4 } doesn't work.
 76 | 
 77 | shift :: Semiring s => s -> Glu c s -> c -> Glue c s
 78 | shift _ Epsw _ = epsw
 79 | shift m (Symw f) c = (symw f) {final = m `mul` f c}
 80 | shift m (Seqw l r) c =
 81 |   seqw
 82 |     (shift m (glu l) c)
 83 |     (shift (add (m `mul` (emptyg l)) (final l)) (glu r) c)
 84 | shift m (Altw l r) c = altw (shift m (glu l) c) (shift m (glu r) c)
 85 | shift m (Repw r) c = repw (shift (m `add` final r) (glu r) c)
 86 | 
 87 | sym :: (Semiring s, Eq c) => c -> Glue c s
 88 | sym c = symw (\b -> if b == c then one else zero)
 89 | 
 90 | rigging :: Semiring s => (Char -> Glue t s) -> Reg -> Glue t s
 91 | rigging s =
 92 |   \case
 93 |     Eps -> epsw
 94 |     (Sym _ c) -> s c
 95 |     (Alt p q) -> altw (rigging s p) (rigging s q)
 96 |     (Seq p q) -> seqw (rigging s p) (rigging s q)
 97 |     (Rep r) -> repw (rigging s r)
 98 | 
 99 | rigged :: Semiring s => Reg -> Glue Char s
100 | rigged = rigging sym
101 | 
102 | syms :: Char -> Glue Char (Set String)
103 | syms c = symw (\b -> if b == c then singleton [c] else zero)
104 | 
105 | riggeds :: Reg -> Glue Char (Set String)
106 | riggeds = rigging syms
107 | 
108 | instance Semiring (Set String) where
109 |   zero = empty
110 |   one = singleton ""
111 |   add = union
112 |   mul a b = Data.Set.map (uncurry (++)) $ cartesianProduct a b
113 | 
114 | instance Semiring Int where
115 |   zero = 0
116 |   one = 1
117 |   add = (Prelude.+)
118 |   mul = (Prelude.*)
119 | 
120 | instance Semiring Bool where
121 |   zero = False
122 |   one = True
123 |   add = (||)
124 |   mul = (&&)
125 | 
126 | accept :: Semiring s => Glue c s -> [c] -> s
127 | accept r [] = emptyg r
128 | accept r (c:cs) =
129 |   final (foldl (shift zero . glu) (shift one (glu r) c) cs)
130 | 
131 | submatch :: Semiring s => Glue (Int, c) s -> [c] -> s
132 | submatch r s =
133 |     accept (seqw arb (seqw r arb)) (zip [0..] s)
134 |         where arb = repw (symw (\_ -> one))
135 |           
136 | class Semiring s => Semiringi s where
137 |     index :: Int -> s
138 | 
139 | symi :: Semiringi s => Char -> Glue (Int, Char) s
140 | symi c = symw weight
141 |     where weight (pos, x) | x == c    = index pos
142 |                           | otherwise = zero
143 | 
144 | riggew :: Semiringi s => Reg -> Glue (Int, Char) s
145 | riggew = rigging symi         
146 | 
147 | data Leftmost = NoLeft | Leftmost Start deriving (Show)
148 | data Start = NoStart | Start Int deriving (Show)
149 | 
150 | instance Semiring Leftmost where
151 |     zero = NoLeft
152 |     one  = Leftmost NoStart
153 |     add NoLeft x = x
154 |     add x NoLeft = x
155 |     add (Leftmost x) (Leftmost y) = Leftmost (leftmost x y)
156 |         where leftmost NoStart NoStart     = NoStart
157 |               leftmost NoStart (Start i)   = Start i
158 |               leftmost (Start i) NoStart   = Start i
159 |               leftmost (Start i) (Start j) = Start (min i j)
160 |     mul NoLeft _ = NoLeft
161 |     mul _ NoLeft = NoLeft
162 |     mul (Leftmost x) (Leftmost y) = Leftmost (start x y)
163 |         where start NoStart s = s
164 |               start s _       = s
165 | 
166 | instance Semiringi Leftmost where
167 |     index = Leftmost . Start
168 | 
169 | -- Leftlong Implementation!
170 |             
171 | data LeftLong = NoLeftLong | NoRange | Range Int Int deriving (Show, Eq)
172 | 
173 | instance Semiring LeftLong where
174 |     zero = NoLeftLong
175 |     one  = NoRange
176 | 
177 | -- The addition of two leftlongs selects the one that
178 | -- starts leftmost, or the longer of the two when they
179 | -- start at the same position.
180 | 
181 |     add NoLeftLong x    = x
182 |     add x NoLeftLong    = x
183 |     add NoRange x       = x
184 |     add x NoRange       = x
185 |     add (Range i j) (Range k l)
186 |         | i < k || i == k && j > l = Range i j
187 |         | otherwise             = Range k l
188 | 
189 | -- The multiplication of two leftlongs is the longest possible
190 | -- range among the leftlongs provided; the zero is still annihilation,
191 | -- the one is still identity, and `mul` here is the start of the left
192 | -- component and the end of the right component.
193 | 
194 |     mul NoLeftLong _ = NoLeftLong
195 |     mul _ NoLeftLong = NoLeftLong
196 |     mul NoRange x    = x
197 |     mul x NoRange    = x
198 |     mul (Range i _) (Range _ l) = Range i l
199 | 
200 | instance Semiringi LeftLong where
201 |     index i = Range i i                                        
202 |                                               
203 | 


--------------------------------------------------------------------------------
/haskell/08_Heavyweights/stack.yaml:
--------------------------------------------------------------------------------
 1 | # This file was automatically generated by 'stack init'
 2 | #
 3 | # Some commonly used options have been documented as comments in this file.
 4 | # For advanced use and comprehensive documentation of the format, please see:
 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
 6 | 
 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
 8 | # A snapshot resolver dictates the compiler version and the set of packages
 9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.7
21 | 
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | #    git: https://github.com/commercialhaskell/stack.git
30 | #    commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | #  subdirs:
33 | #  - auto-update
34 | #  - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 | 
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 | 
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 | 
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 | 


--------------------------------------------------------------------------------
/haskell/08_Heavyweights/test/Tests.hs:
--------------------------------------------------------------------------------
  1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
  2 | {-# LANGUAGE RecordWildCards #-}
  3 | 
  4 | import           Data.Foldable     (for_)
  5 | import           Test.Hspec        (Spec, describe, it, shouldBe)
  6 | import           Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
  7 | import           Heavyweights    (Reg (..), Glue, accept, rigged, riggeds, riggew, submatch, symi, altw, seqw, repw, LeftLong(..))
  8 | import Data.Set
  9 | import Data.List (sort)
 10 | 
 11 | main :: IO ()
 12 | main = hspecWith defaultConfig {configFastFail = True} specs
 13 | 
 14 | msym :: Char -> Reg
 15 | msym c = Sym False c       
 16 |        
 17 | specs :: Spec
 18 | specs = do
 19 | 
 20 |      let nocs = Rep ( Alt ( msym 'a' ) ( msym 'b' ) )
 21 |      let onec = Seq nocs (msym 'c')
 22 |      let evencs = Seq ( Rep ( Seq onec onec ) ) nocs
 23 | 
 24 |      let as = Alt (msym 'a') (Rep (msym 'a'))                  
 25 |      let bs = Alt (msym 'b') (Rep (msym 'b'))
 26 | 
 27 | --     it "lifted expression" $
 28 | --        (accept (rigged evencs) "acc" :: Bool) `shouldBe` True
 29 | 
 30 |      it "lifted expression short" $
 31 |         (accept (rigged evencs) "acc" :: Int) `shouldBe` 1
 32 | 
 33 |      it "lifted expression counter two" $
 34 |         (accept (rigged as) "a" :: Int) `shouldBe` 2
 35 | 
 36 |      it "lifted expression counter one" $
 37 |         (accept (rigged as) "aa" :: Int) `shouldBe` 1
 38 | 
 39 |      it "lifted expression dynamic counter four" $
 40 |         (accept (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4
 41 | 
 42 |      it "parse forests" $
 43 |             (sort $ toList $ (accept (riggeds (Seq as bs)) "ab" :: Set String)) `shouldBe` ["ab"]
 44 | 
 45 |      let aa = symi 'a'
 46 |      let ab = repw (aa `altw` symi 'b')
 47 |      let aaba = aa `seqw` ab `seqw` aa
 48 | 
 49 |      it "submatch noleft" $
 50 |         (submatch aaba "ab" :: LeftLong ) `shouldBe` NoLeftLong
 51 | 
 52 |      it "submatch shortrange" $
 53 |         (submatch aaba "aa" :: LeftLong ) `shouldBe` (Range 0 1)
 54 | 
 55 |      it "submatch fullrange" $
 56 |         (submatch aaba "bababa" :: LeftLong ) `shouldBe` (Range 1 5)
 57 |                                                 
 58 |      for_ cases test
 59 |         where
 60 |           test Case {..} = it description assertion
 61 |               where
 62 |                 assertion = (accept (rigged regex) sample :: Bool) `shouldBe` result
 63 | 
 64 | data Case = Case
 65 |   { description :: String
 66 |   , regex       :: Reg
 67 |   , sample      :: String
 68 |   , result      :: Bool
 69 |   }
 70 | 
 71 | cases :: [Case]
 72 | cases =
 73 |   [ Case {description = "empty", regex = Eps, sample = "", result = True}
 74 |   , Case {description = "char", regex = msym 'a', sample = "a", result = True}
 75 |   , Case
 76 |       {description = "not char", regex = msym 'a', sample = "b", result = False}
 77 |   , Case
 78 |       { description = "char vs empty"
 79 |       , regex = msym 'a'
 80 |       , sample = ""
 81 |       , result = False
 82 |       }
 83 |   , Case
 84 |       { description = "left alt"
 85 |       , regex = Alt (msym 'a') (msym 'b')
 86 |       , sample = "a"
 87 |       , result = True
 88 |       }
 89 |   , Case
 90 |       { description = "right alt"
 91 |       , regex = Alt (msym 'a') (msym 'b')
 92 |       , sample = "b"
 93 |       , result = True
 94 |       }
 95 |   , Case
 96 |       { description = "neither alt"
 97 |       , regex = Alt (msym 'a') (msym 'b')
 98 |       , sample = "c"
 99 |       , result = False
100 |       }
101 |   , Case
102 |       { description = "empty alt"
103 |       , regex = Alt (msym 'a') (msym 'b')
104 |       , sample = ""
105 |       , result = False
106 |       }
107 |   , Case
108 |       { description = "empty rep"
109 |       , regex = Rep (msym 'a')
110 |       , sample = ""
111 |       , result = True
112 |       }
113 |   , Case
114 |       { description = "one rep"
115 |       , regex = Rep (msym 'a')
116 |       , sample = "a"
117 |       , result = True
118 |       }
119 |   , Case
120 |       { description = "multiple rep"
121 |       , regex = Rep (msym 'a')
122 |       , sample = "aaaaaaaaa"
123 |       , result = True
124 |       }
125 |   , Case
126 |       { description = "multiple rep with failure"
127 |       , regex = Rep (msym 'a')
128 |       , sample = "aaaaaaaaab"
129 |       , result = False
130 |       }
131 |   , Case
132 |       { description = "sequence"
133 |       , regex = Seq (msym 'a') (msym 'b')
134 |       , sample = "ab"
135 |       , result = True
136 |       }
137 |   , Case
138 |       { description = "sequence with empty"
139 |       , regex = Seq (msym 'a') (msym 'b')
140 |       , sample = ""
141 |       , result = False
142 |       }
143 |   , Case
144 |       { description = "bad short sequence"
145 |       , regex = Seq (msym 'a') (msym 'b')
146 |       , sample = "a"
147 |       , result = False
148 |       }
149 |   , Case
150 |       { description = "bad long sequence"
151 |       , regex = Seq (msym 'a') (msym 'b')
152 |       , sample = "abc"
153 |       , result = False
154 |       }
155 |   ]
156 |           
157 |   
158 | 


--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/BrzExp.cabal:
--------------------------------------------------------------------------------
 1 | cabal-version: 1.12
 2 | 
 3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
 4 | --
 5 | -- see: https://github.com/sol/hpack
 6 | --
 7 | -- hash: bf664c2df8f31bc5a6ffd907ebaf10824d56d8d271f07d3f4cc7562ce9d44395
 8 | 
 9 | name:           BrzExp
10 | version:        0.1.0.0
11 | category:       Regex
12 | homepage:       https://github.com/elfsternberg/riggedregex#readme
13 | author:         Elf M. Sternberg
14 | maintainer:     elf.sternberg@gmail.com
15 | copyright:      Copyright ⓒ 2019 Elf M. Sternberg
16 | license:        BSD3
17 | license-file:   LICENSE
18 | build-type:     Simple
19 | extra-source-files:
20 |     README.md
21 | 
22 | library
23 |   exposed-modules:
24 |       BrzExp
25 |   other-modules:
26 |       Paths_BrzExp
27 |   hs-source-dirs:
28 |       src
29 |   ghc-options: -Wall
30 |   build-depends:
31 |       base
32 |   default-language: Haskell2010
33 | 
34 | test-suite test
35 |   type: exitcode-stdio-1.0
36 |   main-is: Tests.hs
37 |   other-modules:
38 |       Paths_BrzExp
39 |   hs-source-dirs:
40 |       test
41 |   build-depends:
42 |       BrzExp
43 |     , base
44 |     , hspec
45 |   default-language: Haskell2010
46 | 


--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/LICENSE:
--------------------------------------------------------------------------------
1 | The Brzozowski experiments are original work, and are copyright and
2 | licensed by Elf M. Sternberg under the Mozilla Public License,
3 | version 2.0.  See the LICENSE.md file in the main directory.
4 | 


--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/README.md:
--------------------------------------------------------------------------------
1 | # Brzozowski Regular Expressions, in Haskell
2 | 
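3 | This is a regex recognizer implementing Brzozowski's algorithm
4 | (matching by repeated derivatives) in Haskell.  In this variant, `Sym`
5 | carries a predicate (`Char -> Bool`) rather than a single character, so
6 | one symbol can match a whole class of characters.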
7 | 
8 | 


--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/Setup.hs:
--------------------------------------------------------------------------------
1 | import Distribution.Simple
2 | main = defaultMain
3 | 


--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/package.yaml:
--------------------------------------------------------------------------------
 1 | name:                BrzExp
 2 | version:             0.1.0.0
 3 | 
 4 | homepage:            https://github.com/elfsternberg/riggedregex#readme
 5 | license:             BSD3
 6 | license-file:        LICENSE
 7 | author:              Elf M. Sternberg
 8 | maintainer:          elf.sternberg@gmail.com
 9 | copyright:           Copyright ⓒ 2019 Elf M. Sternberg
10 | category:            Regex
11 | build-type:          Simple
12 | extra-source-files:  README.md
13 | 
14 | dependencies:
15 |     - base
16 |                      
17 | library:
18 |   exposed-modules: BrzExp
19 |   ghc-options: -Wall
20 |   source-dirs: src
21 | 
22 | tests:
23 |   test:
24 |     main: Tests.hs
25 |     source-dirs: test
26 |     dependencies:
27 |       - BrzExp
28 |       - hspec
29 |   
30 | 


--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/src/BrzExp.hs:
--------------------------------------------------------------------------------
 1 | module BrzExp ( accept, nullable, Brz (..) ) where
 2 | data Brz = Emp | Eps | Sym (Char -> Bool) | Alt Brz Brz | Seq Brz Brz | Rep Brz
 3 | 
 4 | derive :: Brz -> Char -> Brz
 5 | derive Emp _       = Emp
 6 | derive Eps _       = Emp
 7 | derive (Sym c) u   = if (c u) then Eps else Emp
 8 | derive (Seq l r) u
 9 |     | nullable l = Alt (Seq (derive l u) r) (derive r u)
10 |     | otherwise  = Seq (derive l u) r
11 | 
12 | derive (Alt Emp r) u = derive r u                    
13 | derive (Alt l Emp) u = derive l u                    
14 | derive (Alt l r) u   = Alt (derive r u) (derive l u)
15 | 
16 | derive (Rep r) u = Seq (derive r u) (Rep r)
17 | 
18 | nullable :: Brz -> Bool
19 | nullable Emp       = False
20 | nullable Eps       = True
21 | nullable (Sym _)   = False
22 | nullable (Alt l r) = nullable l || nullable r
23 | nullable (Seq l r) = nullable l && nullable r
24 | nullable (Rep _)   = True                     
25 | 
26 | accept :: Brz -> String -> Bool
27 | accept r [] = nullable r
28 | accept r (s:ss) = accept (derive r s) ss
29 | 
30 |        
31 | 


--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/stack.yaml:
--------------------------------------------------------------------------------
 1 | # This file was automatically generated by 'stack init'
 2 | #
 3 | # Some commonly used options have been documented as comments in this file.
 4 | # For advanced use and comprehensive documentation of the format, please see:
 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
 6 | 
 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
 8 | # A snapshot resolver dictates the compiler version and the set of packages
 9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.4
21 | 
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | #    git: https://github.com/commercialhaskell/stack.git
30 | #    commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | #  subdirs:
33 | #  - auto-update
34 | #  - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 | 
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 | 
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 | 
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 | 


--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/test/Tests.hs:
--------------------------------------------------------------------------------
  1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
  2 | {-# LANGUAGE RecordWildCards #-}
  3 | 
  4 | import           Data.Foldable     (for_)
  5 | import           Test.Hspec        (Spec, describe, it, shouldBe)
  6 | import           Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
  7 | 
  8 | import           BrzExp            (Brz (..), accept)
  9 | 
 10 | main :: IO ()
 11 | main = hspecWith defaultConfig {configFastFail = True} specs
 12 | 
 13 | specs :: Spec
 14 | specs = describe "accept" $ for_ cases test
 15 |   where
 16 |     test Case {..} = it description assertion
 17 |       where
 18 |         assertion = accept regex sample `shouldBe` result
 19 | 
 20 | data Case = Case
 21 |   { description :: String
 22 |   , regex       :: Brz
 23 |   , sample      :: String
 24 |   , result      :: Bool
 25 |   }
 26 | 
 27 | symf :: Char -> Brz
 28 | symf c = Sym (\u -> c == u)
 29 |   
 30 | -- let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) )
 31 | --     onec = Seq nocs (Sym 'c')
 32 | --     evencs = Seq ( Rep ( Seq onec onec ) ) nocs
 33 | --     as = Alt (Sym 'a') (Rep (Sym 'a'))
 34 | --     bs = Alt (Sym 'b') (Rep (Sym 'b'))
 35 | cases :: [Case]
 36 | cases =
 37 |   [ Case {description = "empty", regex = Eps, sample = "", result = True}
 38 |   , Case {description = "null", regex = Emp, sample = "", result = False}
 39 |   , Case {description = "char", regex = symf 'a', sample = "a", result = True}
 40 |   , Case
 41 |       {description = "not char", regex = symf 'a', sample = "b", result = False}
 42 |   , Case
 43 |       { description = "char vs empty"
 44 |       , regex = symf 'a'
 45 |       , sample = ""
 46 |       , result = False
 47 |       }
 48 |   , Case
 49 |       { description = "left alt"
 50 |       , regex = Alt (symf 'a') (symf 'b')
 51 |       , sample = "a"
 52 |       , result = True
 53 |       }
 54 |   , Case
 55 |       { description = "right alt"
 56 |       , regex = Alt (symf 'a') (symf 'b')
 57 |       , sample = "b"
 58 |       , result = True
 59 |       }
 60 |   , Case
 61 |       { description = "neither alt"
 62 |       , regex = Alt (symf 'a') (symf 'b')
 63 |       , sample = "c"
 64 |       , result = False
 65 |       }
 66 |   , Case
 67 |       { description = "empty alt"
 68 |       , regex = Alt (symf 'a') (symf 'b')
 69 |       , sample = ""
 70 |       , result = False
 71 |       }
 72 |   , Case
 73 |       { description = "empty rep"
 74 |       , regex = Rep (symf 'a')
 75 |       , sample = ""
 76 |       , result = True
 77 |       }
 78 |   , Case
 79 |       { description = "one rep"
 80 |       , regex = Rep (symf 'a')
 81 |       , sample = "a"
 82 |       , result = True
 83 |       }
 84 |   , Case
 85 |       { description = "multiple rep"
 86 |       , regex = Rep (symf 'a')
 87 |       , sample = "aaaaaaaaa"
 88 |       , result = True
 89 |       }
 90 |   , Case
 91 |       { description = "multiple rep with failure"
 92 |       , regex = Rep (symf 'a')
 93 |       , sample = "aaaaaaaaab"
 94 |       , result = False
 95 |       }
 96 |   , Case
 97 |       { description = "sequence"
 98 |       , regex = Seq (symf 'a') (symf 'b')
 99 |       , sample = "ab"
100 |       , result = True
101 |       }
102 |   , Case
103 |       { description = "sequence with empty"
104 |       , regex = Seq (symf 'a') (symf 'b')
105 |       , sample = ""
106 |       , result = False
107 |       }
108 |   , Case
109 |       { description = "bad short sequence"
110 |       , regex = Seq (symf 'a') (symf 'b')
111 |       , sample = "a"
112 |       , result = False
113 |       }
114 |   , Case
115 |       { description = "bad long sequence"
116 |       , regex = Seq (symf 'a') (symf 'b')
117 |       , sample = "abc"
118 |       , result = False
119 |       }
120 |   ]
121 | 


--------------------------------------------------------------------------------
/node/01_Kleene.ts:
--------------------------------------------------------------------------------
  1 | // Regex nodes are plain tagged objects; `kind` is the discriminant of the union.
  2 | interface Eps { kind: "eps" }
  3 | interface Sym { kind: "sym"; s: string }
  4 | interface Alt { kind: "alt"; l: Regex; r: Regex }
  5 | interface Seq { kind: "seq"; l: Regex; r: Regex }
  6 | interface Rep { kind: "rep"; r: Regex }
  7 | 
  8 | function eps():                   Eps { return { kind: "eps" }; };
  9 | function sym(c: string):          Sym { return { kind: "sym", s: c }; };
 10 | function alt(l: Regex, r: Regex): Alt { return { kind: "alt", l: l, r: r }; };
 11 | function seq(l: Regex, r: Regex): Seq { return { kind: "seq", l: l, r: r }; };
 12 | function rep(r: Regex):           Rep { return { kind: "rep", r: r }; };
 13 | 
 14 | type Regex = Eps | Sym | Alt | Seq | Rep;
 15 | 
 16 | // split :: [a] -> [([a], [a])]
 17 | // split []     = [([], [])]
 18 | // split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs]
 19 | 
 20 | function split(s: string) {
 21 |     if (s.length == 0) {
 22 |         return [["", ""]];  
 23 |     }
 24 |     return [["", s.slice()]].concat(split(s.slice(1)).map((v) => [s[0].slice().concat(v[0].slice()), v[1].slice()]));
 25 | }
 26 | 
 27 | // parts :: [a] -> [[[a]]]
 28 | // parts []     = [[]]
 29 | // parts [c]    = [[[c]]]
 30 | // parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
 31 | 
 32 | function parts(s: string): Array<Array<string>> {
 33 |     if (s.length == 0) {
 34 |         return [[]];
 35 |     }
 36 | 
 37 |     if (s.length == 1) {
 38 |         return [[s]];
 39 |     }
 40 | 
 41 |     let c = s[0];
 42 |     let cs = s.slice(1);
 43 |     return parts(cs).reduce((acc, pps) => {
 44 |         let p: string  = pps[0];
 45 |         let ps: Array<string> = pps.slice(1);
 46 |         let l:  Array<string> = [c + p].concat(ps);
 47 |         let r:  Array<string> = [c].concat(p).concat(ps);
 48 |         return acc.concat([l, r]);
 49 |     }, [[]]).filter((c) => c.length != 0);
 50 | }
 51 | 
 52 | function one(a: Array<any>, test: (s: any) => boolean): boolean {
 53 |     return a.reduce((acc: boolean, sc: any) => acc || test(sc), false);
 54 | }
 55 | 
 56 | function all(a: Array<any>, test: (s: any) => boolean): boolean {
 57 |     return a.reduce((acc: boolean, sc: any) => acc && test(sc), true);
 58 | }
 59 | 
 60 | 
 61 | function accept(r: Regex, s: string): boolean {
 62 |     switch(r.kind) {
 63 |     case "eps":
 64 |         return s.length == 0;
 65 |     case "sym":
 66 |         return s.length == 1 && r.s == s[0];
 67 |     case "alt":
 68 |         return accept(r.l, s) || accept(r.r, s);
 69 |     case "seq":
 70 |         return split(s).some((v: Array<string>) => accept(r.l, v[0]) && accept(r.r, v[1]));
 71 |     case "rep":
 72 |         return parts(s).some((v: Array<string>) => v.every((u: string) => accept(r.r, u)));
 73 |     }
 74 | }
 75 | 
 76 | function run_tests() {
 77 | 
 78 |     function assert(l: any) {
 79 |         console.log("   ", l);
 80 |     }
 81 | 
 82 |     let units: { [name: string]: () => void } = {
 83 |         test_simple: () => {
 84 |             let onea = sym("a");
 85 |             assert(accept(onea, "a"));
 86 | 
 87 |             let nocs = rep(alt(sym("a"), sym("b")));
 88 |             assert(accept(nocs, "abab"));
 89 |         },
 90 | 
 91 |         test_seq: () => {
 92 |             let abc = seq(sym("a"), seq(sym("b"), sym("c")));
 93 |             assert(accept(abc, "abc"));
 94 |         },
 95 | 
 96 |         test_rc: () => {
 97 |             let ab = seq(sym("a"), sym("b"));
 98 |             let abab = seq(ab, ab);
 99 |             assert(accept(abab, "abab"));
100 |         },
101 | 
102 |         test_fail: () => {
103 |             let ab = seq(sym("a"), sym("b"));
104 |             let abab = seq(ab, ab);
105 |             assert(! accept(abab, "abacb"));
106 |         },
107 | 
108 |         test_empty_rep: () => {
109 |             let a = rep(sym("a"));
110 |             assert(accept(a, ""));
111 |         },
112 | 
113 |         test_some_rep: () => {
114 |             let a = rep(sym("a"));
115 |             assert(accept(a, "a"));
116 |         },
117 | 
118 |         test_many_rep: () => {
119 |             let a = rep(sym("a"));
120 |             assert(accept(a, "aaaaaaa"));
121 |         },
122 | 
123 |         test_many_rep_dead_l: () => {
124 |             let a = rep(sym("a"));
125 |             assert(! accept(a, "!aaaaaa"));
126 |         },
127 | 
128 |         test_many_rep_dead_r: () => {
129 |             let a = rep(sym("a"));
130 |             assert(! accept(a, "aaaaaa!"));
131 |         },
132 | 
133 |         test_many_rep_dead_m: () => {
134 |             let a = rep(sym("a"));
135 |             assert(! accept(a, "aaa!aaa"));
136 |         },
137 | 
138 |         test_two: () => {
139 |             let nocs = rep(alt(sym("a"), sym("b")));
140 |             let onec = seq(nocs, sym("c"));
141 |             let evencs = seq(rep(seq(onec, onec)), nocs);
142 |             assert(accept(evencs, "abcc"));
143 |             assert(accept(evencs, "abccababbbbcc"));
144 |         }
145 |     }
146 | 
147 |     console.log("Running tests...");
148 |     for (let k of Object.keys(units)) {
149 |         console.log(k); units[k]();
150 |     }
151 | }
152 | 
153 | run_tests();
154 | 


--------------------------------------------------------------------------------
/python/01_rigged_brzozowski.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | 
  3 | from collections import namedtuple
  4 | import re
  5 | 
  6 | Emp = namedtuple("Emp", [])
  7 | Eps = namedtuple("Eps", ["tok"])
  8 | Sym = namedtuple("Sym", ["c"])
  9 | Alt = namedtuple("Alt", ["l", "r"])
 10 | Seq = namedtuple("Seq", ["l", "r"])
 11 | Rep = namedtuple("Rep", ["r"])
 12 | Del = namedtuple("Del", ["r"])
 13 | 
 14 | cname = re.compile(r'^(\w+)\(')
 15 | 
 16 | 
 17 | def cn(s):
 18 |     """ Find the canonical name of the regex op"""
 19 |     return cname.match(s.__doc__).group(1)
 20 | 
 21 | 
 22 | def derive(r, c):
 23 |     """ Take a regex op and a character, and return the derivative regex op."""
 24 |     def sym(r, c):
 25 |         if c == r.c:
 26 |             return Eps(set([c]))
 27 |         return Emp()
 28 | 
 29 |     def alt(r, c):
 30 |         l1 = derive(r.l, c)
 31 |         r1 = derive(r.r, c)
 32 |         if cn(l1) == 'Emp':
 33 |             return r1
 34 |         if cn(r1) == 'Emp':
 35 |             return l1
 36 |         return Alt(l1, r1)
 37 | 
 38 |     def seq(r, c):
 39 |         return Alt(Seq(derive(r.l, c), r.r),
 40 |                    Seq(Del(r.l), derive(r.r, c)))
 41 | 
 42 |     def rep(r, c):
 43 |         return Seq(derive(r.r, c), r)
 44 | 
 45 |     def emp(r, c):
 46 |         return Emp()
 47 | 
 48 |     nextfn = {
 49 |         "Emp": emp,
 50 |         "Eps": emp,
 51 |         "Del": emp,
 52 |         "Sym": sym,
 53 |         "Alt": alt,
 54 |         "Seq": seq,
 55 |         "Rep": rep,
 56 |     }.get(cn(r))
 57 | 
 58 |     return nextfn(r, c)
 59 | 
 60 | 
 61 | def parsenull(r):
 62 |     """ Extract the generated parse forest from the residual regular expression."""
 63 | 
 64 |     def emp(r): return set()
 65 | 
 66 |     def eps(r): return r.tok
 67 | 
 68 |     def sym(r): return set([""])
 69 | 
 70 |     def alt(r): return parsenull(r.l).union(parsenull(r.r))
 71 | 
 72 |     def seq(r): return set([i + j
 73 |                             for j in parsenull(r.r)
 74 |                             for i in parsenull(r.l)])
 75 | 
 76 |     def one(r): return parsenull(r.r)
 77 | 
 78 |     nextfn = {
 79 |         "Emp": emp,
 80 |         "Sym": emp,
 81 |         "Rep": sym,
 82 |         "Del": one,
 83 |         "Eps": eps,
 84 |         "Alt": alt,
 85 |         "Seq": seq
 86 |     }.get(cn(r))
 87 | 
 88 |     return nextfn(r)
 89 | 
 90 | 
 91 | def parse(r, s):
 92 |     """Iterate through the string, generating a new regular expression for each character, until done."""
 93 |     head = r
 94 |     for i in s:
 95 |         print(head, "\n")
 96 |         head = derive(head, i)
 97 |     print(head)
 98 |     return parsenull(head)
 99 | 
100 | 
101 | if __name__ == '__main__':
102 |     nocs = Rep(Alt(Sym('a'), (Sym('b'))))
103 |     onec = Seq(nocs, Sym('c'))
104 |     evencs = Seq(Rep(Seq(onec, onec)), nocs)
105 | 
106 |     aas = Alt(Sym('a'), Rep(Sym('a')))
107 |     bbs = Alt(Sym('b'), Rep(Sym('b')))
108 |     
109 | 
110 |     # print(parse(evencs, "acc"))
111 | 
112 |     sym = Seq(Sym('a'), Seq(Rep(Sym('b')), Sym('c')))
113 |     parse(sym, "ac")
114 | 


--------------------------------------------------------------------------------
/python/02_rigged_brzozowski.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | 
  3 | from collections import namedtuple
  4 | import re
  5 | 
  6 | Emp = namedtuple("Emp", [])
  7 | Eps = namedtuple("Eps", ["tok"])
  8 | Sym = namedtuple("Sym", ["c"])
  9 | Alt = namedtuple("Alt", ["l", "r"])
 10 | Seq = namedtuple("Seq", ["l", "r"])
 11 | Rep = namedtuple("Rep", ["r"])
 12 | 
 13 | cname = re.compile(r'^(\w+)\(')
 14 | 
 15 | 
 16 | def cn(s):
 17 |     """ Find the canonical name of the regex op"""
 18 |     return cname.match(s.__doc__).group(1)
 19 | 
 20 | 
 21 | def derive(r, c):
 22 |     """ Take a regex op and a character, and return the derivative regex op."""
 23 |     def sym(r, c):
 24 |         if c == r.c:
 25 |             return Eps(set([c]))
 26 |         return Emp()
 27 | 
 28 |     def alt(r, c):
 29 |         l1 = derive(r.l, c)
 30 |         r1 = derive(r.r, c)
 31 |         if cn(l1) == 'Emp':
 32 |             return r1
 33 |         if cn(r1) == 'Emp':
 34 |             return l1
 35 |         return Alt(l1, r1)
 36 | 
 37 |     def seq(r, c):
 38 |         if nullable(r.l):
 39 |             return Alt(Seq(derive(r.l, c), r.r), derive(r.r, c))
 40 |         return Seq(derive(r.l, c), r.r)
 41 | 
 42 |     def rep(r, c):
 43 |         return Seq(derive(r.r, c), r)
 44 | 
 45 |     def emp(r, c):
 46 |         return Emp()
 47 | 
 48 |     nextfn = {
 49 |         "Emp": emp,
 50 |         "Eps": emp,
 51 |         "Sym": sym,
 52 |         "Alt": alt,
 53 |         "Seq": seq,
 54 |         "Rep": rep,
 55 |     }.get(cn(r))
 56 | 
 57 |     return nextfn(r, c)
 58 | 
 59 | def nullable(r):
 60 |     def zer(r): return False
 61 |     def one(r): return True
 62 |     def alt(r): return nullable(r.l) or nullable(r.r)
 63 |     def seq(r): return nullable(r.l) and nullable(r.r)
 64 | 
 65 |     nextfn = {
 66 |         "Emp": zer,
 67 |         "Sym": zer,
 68 |         "Rep": one,
 69 |         "Eps": one,
 70 |         "Alt": alt,
 71 |         "Seq": seq
 72 |     }.get(cn(r))
 73 | 
 74 |     return nextfn(r)
 75 | 
 76 | def parsenull(r):
 77 |     """ Extract the generated parse forest from the residual regular expression."""
 78 | 
 79 |     def emp(r): return set()
 80 | 
 81 |     def eps(r): return r.tok
 82 | 
 83 |     def sym(r): return set([""])
 84 | 
 85 |     def alt(r): return parsenull(r.l).union(parsenull(r.r))
 86 | 
 87 |     def seq(r): return set([i + j
 88 |                             for j in parsenull(r.r)
 89 |                             for i in parsenull(r.l)])
 90 | 
 91 |     nextfn = {
 92 |         "Emp": emp,
 93 |         "Sym": emp,
 94 |         "Rep": sym,
 95 |         "Eps": eps,
 96 |         "Alt": alt,
 97 |         "Seq": seq
 98 |     }.get(cn(r))
 99 | 
100 |     return nextfn(r)
101 | 
102 | 
103 | def parse(r, s):
104 |     """Iterate through the string, generating a new regular expression for each character, until done."""
105 |     head = r
106 |     for i in s:
107 |         print(head, "\n")
108 |         head = derive(head, i)
109 |     print(head)
110 |     return parsenull(head)
111 | 
112 | 
113 | if __name__ == '__main__':
114 | #    nocs = Rep(Alt(Sym('a'), (Sym('b'))))
115 | #    onec = Seq(nocs, Sym('c'))
116 | #    evencs = Seq(Rep(Seq(onec, onec)), nocs)
117 | #
118 | #    aas = Alt(Sym('a'), Rep(Sym('a')))
119 | #    bbs = Alt(Sym('b'), Rep(Sym('b')))
120 | #
121 | 
122 |     sym = Seq(Sym('a'), Seq(Rep(Sym('b')), Sym('c')))
123 |     parse(sym, "ac")
124 | 


--------------------------------------------------------------------------------
/python/README.md:
--------------------------------------------------------------------------------
 1 | # Python Experiments!
 2 | 
 3 | This directory contains some simple experiments in Python.  Python is,
 4 | frankly, easier to instrument than Haskell, so working out the
 5 | underlying operation by stepping through it with pdb can sometimes be
 6 | easier to do in Python 3.
 7 | 
 8 | `01_rigged_brzozowski.py`: A naive implementation of Brzozowski's regular
 9 | expression library, using the `Delta` operator to distinguish between
10 | nullable and not-nullable branches of the `Sequence` operator.  What's
11 | remarkable about it, if anything, is just *how much* it resembles
12 | Haskell Experiment 05: Rigged Brzozowski Regular Expressions.  Part of
13 | that is using the `namedtuple` as an easy hack for Haskell's data
14 | constructors, and then implementing the `derive()` and `parsenull()`
15 | functions using map functions as a substitute for Haskell's pattern
16 | matching.
17 | 
18 | This is mostly proof that "One can write Haskell poorly in any
19 | language."
20 | 


--------------------------------------------------------------------------------
/rust/01_simpleregex/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "simpleregex"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg <elf.sternberg@gmail.com>"]
5 | edition = "2018"
6 | 
7 | 


--------------------------------------------------------------------------------
/rust/01_simpleregex/README.md:
--------------------------------------------------------------------------------
 1 | # Kleene Regular Expressions, in Rust.
 2 | 
 3 | This is literally the definition of a simple string recognizing regular
 4 | expression in Rust.  It consists of the `Reg` datatype encompassing
 5 | the five standard operations of regular expressions and an `accept`
 6 | function that takes the expression and a string and returns a Boolean
 7 | yes/no on recognition or failure. It is a direct implementation of
 8 | Kleene's algebra:
 9 | 
10 |     L[[ε]] = {ε}
11 |     L[[a]] = {a}
12 |     L[[r · s]] = {u · v | u ∈ L[[r]] and v ∈ L[[s]]}
13 |     L[[r | s]] = L[[r]] ∪ L[[s]]
14 |     L[[r∗]] = {ε} ∪ L[[r · r∗]]
15 |     
16 | Those equations are for: recognizing an empty string, recognizing a
17 | letter, recognizing two expressions in sequence, recognizing either of
18 | two alternative expressions, and the repetition operation.
19 | 
20 | Composition is by simple reference-counted pointers to child
21 | expressions.  I've provided convenient constructor functions to make the
22 | creation of new regexes easier.
23 | 
24 | The `accept` function has two helper functions that split the string,
25 | and all substrings, into all possible substrings such that *every
26 | possible combination* of string and expression are tested, and if the
27 | resulting tree of `and`s (from Sequencing and Repetition) and `or`s
28 | (from Alternation) has at least one complete collection of `True` from
29 | top to bottom then the function returns true.
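
The splitting step for sequences can be sketched with borrowed slices instead of freshly allocated vectors. This is a minimal illustration, not the crate's own `split`: the names `splits` and `any_split` are hypothetical, and the sketch assumes the input has already been collected into a `&[char]`.

```rust
/// Every way to cut `s` into a prefix and a suffix, borrowed from `s`.
/// A string of length n has n + 1 cut points, including both ends.
pub fn splits(s: &[char]) -> Vec<(&[char], &[char])> {
    (0..=s.len()).map(|i| s.split_at(i)).collect()
}

/// True if some cut lets `left` accept the prefix and `right` accept the
/// suffix; this mirrors the `or [p u1 && q u2 | ...]` shape of sequencing.
pub fn any_split(
    s: &[char],
    left: impl Fn(&[char]) -> bool,
    right: impl Fn(&[char]) -> bool,
) -> bool {
    splits(s).into_iter().any(|(u1, u2)| left(u1) && right(u2))
}
```

Borrowing avoids copying each substring, but the number of combinations examined is unchanged, so the overall cost stays exponential.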
30 | 
 31 | This generation and comparison of substrings is grossly inefficient; a
32 | string of eight 'a's with `a*` will take 30 seconds on a modern laptop;
33 | increase that to twelve and you'll be waiting about an hour.  The cost
34 | is `2^(n - 1)`, where `n` is the length of the string; this is a
35 | consequence of the sequencing operation.  Sequences aren't just about
36 | letters: they could be about anything, including repetition (which
37 | itself creates new sequences) and other sequences, and the cost of
38 | examining every possible combination of sequencing creates this
39 | exponential cost.
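
The `2^(n - 1)` figure can be checked with a small counting sketch. The function name `count_parts` is hypothetical; it follows the same recursion as `parts` (each partition of the tail yields two partitions of the whole) but only counts the results rather than building them.

```rust
/// Number of ways to cut a string of length `n` into consecutive
/// non-empty pieces, counted with the same recursion `parts` uses:
/// one partition for lengths 0 and 1, then a doubling per extra char.
pub fn count_parts(n: u32) -> u64 {
    match n {
        0 | 1 => 1,
        _ => 2 * count_parts(n - 1),
    }
}
```

For example, `count_parts(8)` is 128 and `count_parts(12)` is 2048, and every one of those partitions may be tested against the repeated expression.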
40 | 
41 | While not as clean (no pun intended) as the Haskell version, especially
42 | in the helper functions, it's still surprisingly easy to read, and the
43 | `accept` function is almost line-for-line as clear as the Haskell
 44 | version.  The use of `.any` and `.all` for the `or` and `and` functions
45 | makes a lot of sense here.
46 | 
47 | ## License
48 | 
49 | As this is entirely my work, it is copyright (c) 2019, and licensed
50 | under the Mozilla Public License v. 2.0.  See the
51 | [LICENSE.md](../../LICENSE.md) in the root directory.
52 | 


--------------------------------------------------------------------------------
/rust/01_simpleregex/src/lib.rs:
--------------------------------------------------------------------------------
  1 | use std::rc::Rc;
  2 | 
  3 | // data Reg = Eps | Sym Char | Alt Reg Reg | Seq Reg Reg | Rep Reg
  4 | 
  5 | #[derive(Debug)]
  6 | pub enum Reg {
  7 |     Eps,
  8 |     Sym(char),
  9 |     Alt(Rc<Reg>, Rc<Reg>),
 10 |     Seq(Rc<Reg>, Rc<Reg>),
 11 |     Rep(Rc<Reg>),
 12 | }
 13 | 
 14 | // Some rust-specific helpers to make constructing regular expressions
 15 | // easier.
 16 | 
 17 | pub fn eps() -> Rc<Reg> {
 18 |     Rc::new(Reg::Eps)
 19 | }
 20 | pub fn sym(c: char) -> Rc<Reg> {
 21 |     Rc::new(Reg::Sym(c))
 22 | }
 23 | pub fn alt(r1: &Rc<Reg>, r2: &Rc<Reg>) -> Rc<Reg> {
 24 |     Rc::new(Reg::Alt(r1.clone(), r2.clone()))
 25 | }
 26 | pub fn seq(r1: &Rc<Reg>, r2: &Rc<Reg>) -> Rc<Reg> {
 27 |     Rc::new(Reg::Seq(r1.clone(), r2.clone()))
 28 | }
 29 | pub fn rep(r1: &Rc<Reg>) -> Rc<Reg> {
 30 |     Rc::new(Reg::Rep(r1.clone()))
 31 | }
 32 | 
 33 | // split :: [a] -> [([a], [a])]
 34 | // split []     = [([], [])]
 35 | // split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs]
 36 | 
 37 | pub fn split(s: &[char]) -> Vec<(Vec<char>, Vec<char>)> {
 38 |     if s.is_empty() {
 39 |         return vec![(vec![], vec![])];
 40 |     }
 41 | 
 42 |     let mut ret = vec![(vec![], s.to_vec())];
 43 |     let c = s[0];
 44 | 
 45 |     fn permute(c: char, s1: &mut Vec<char>, s2: &[char]) -> (Vec<char>, Vec<char>) {
 46 |         let mut r1 = vec![c];
 47 |         r1.append(s1);
 48 |         (r1, s2.to_vec())
 49 |     }
 50 | 
 51 |     ret.append(
 52 |         &mut split(&s[1..])
 53 |             .iter_mut()
 54 |             .map(|(s1, s2)| permute(c, s1, &s2))
 55 |             .collect(),
 56 |     );
 57 |     ret
 58 | }
 59 | 
 60 | // parts :: [a] -> [[[a]]]
 61 | // parts []     = [[]]
 62 | // parts [c]    = [[[c]]]
 63 | // parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
 64 | 
 65 | // This was challenging to port to Rust.  Haskell's automatic
 66 | // conversion of [Char] to String obscured what was going on under the
 67 | // covers.
 68 | //
 69 | // parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
 70 | // The two elements are:
 71 | //
 72 | // - ([c]:[[p]]):[[[ps]]]
 73 | // The char 'c' is converted to a string, and that string is consed to
 74 | // list 'p', and then list 'p' is consed onto the list 'ps'
 75 | //
 76 | // - [[c]]:[[p]]:[[[ps]]]
 77 | // The char 'c' is made into a string and then wrapped in a list, and
 78 | // then [[p]] and [[c]] are both consed onto list 'ps'
 79 | //
 80 | // It really took writing it all out on paper to understand the order
 81 | // of operations.
 82 | 
 83 | pub fn parts(s: &[char]) -> Vec<Vec<Vec<char>>> {
 84 |     if s.is_empty() {
 85 |         return vec![vec![]];
 86 |     }
 87 |     if s.len() == 1 {
 88 |         return vec![vec![s.to_vec()]];
 89 |     }
 90 | 
 91 |     let head = s[0];
 92 |     let tail = &s[1..];
 93 | 
 94 |     let mut ret = vec![];
 95 |     for pps in parts(tail) {
 96 |         let phead = &pps[0];
 97 |         let ptail = &pps[1..];
 98 | 
 99 |         let mut left = vec![head];
100 |         left.append(&mut phead.to_vec());
101 | 
102 |         let mut left_1 = vec![left];
103 |         left_1.append(&mut ptail.to_vec());
104 |         ret.push(left_1);
105 | 
106 |         let mut right = vec![vec![head]];
107 |         right.push(phead.to_vec());
108 |         right.append(&mut ptail.to_vec());
109 |         ret.push(right);
110 |     }
111 |     ret
112 | }
113 | 
114 | // accept :: Reg -> String -> Bool
115 | // accept Eps u       = null u
116 | // accept (Sym c) u   = u == [c]
117 | // accept (Alt p q) u = accept p u || accept q u
118 | // accept (Seq p q) u = or [accept p u1 && accept q u2 | (u1, u2) <- split u]
119 | // accept (Rep r) u   = or [and [accept r ui | ui <- ps] | ps <- parts u]
120 | 
121 | pub fn accept(r: &Reg, s: &[char]) -> bool {
122 |     match r {
123 |         Reg::Eps => s.is_empty(),
124 |         Reg::Sym(c) => (s.len() == 1 && s[0] == *c),
125 |         Reg::Alt(r1, r2) => accept(&r1, s) || accept(&r2, s),
126 |         Reg::Seq(r1, r2) => split(s)
127 |             .into_iter()
128 |             .any(|(u1, u2)| accept(r1, &u1) && accept(r2, &u2)),
129 |         Reg::Rep(r) => parts(s)
130 |             .into_iter()
131 |             .any(|ps| ps.into_iter().all(|u| accept(r, &u))),
132 |     }
133 | }
134 | 
135 | #[cfg(test)]
136 | mod tests {
137 |     use super::*;
138 | 
139 |     fn vectostr(r: &(Vec<char>, Vec<char>)) -> (String, String) {
140 |         let (a, b) = r;
141 |         let c: String = a.into_iter().collect();
142 |         let d: String = b.into_iter().collect();
143 |         (c, d)
144 |     }
145 | 
146 |     #[test]
147 |     fn test_split() {
148 |         let c1: Vec<char> = String::from("").chars().into_iter().collect();
149 |         let s: Vec<(String, String)> = split(&c1).into_iter().map(|r| vectostr(&r)).collect();
150 |         assert_eq!(s, [("".to_string(), "".to_string())]);
151 |     }
152 | 
153 |     #[test]
154 |     fn test_simple() {
155 |         let c1: Vec<char> = String::from("acc").chars().into_iter().collect();
156 |         assert_eq!(c1, ['a', 'c', 'c']);
157 | 
158 |         let c2: Vec<char> = String::from("a").chars().into_iter().collect();
159 |         let onea = sym('a');
160 |         assert!(accept(&onea, &c2));
161 | 
162 |         let c3: Vec<char> = String::from("abab").chars().into_iter().collect();
163 |         let nocs = rep(&alt(&sym('a'), &sym('b')));
164 |         assert!(accept(&nocs, &c3));
165 |     }
166 | 
167 |     #[test]
168 |     fn test_seq() {
169 |         let c3: Vec<char> = String::from("abc").chars().into_iter().collect();
170 |         let abc = seq(&sym('a'), &seq(&sym('b'), &sym('c')));
171 |         assert!(accept(&abc, &c3));
172 |     }
173 | 
174 |     #[test]
175 |     fn test_rc() {
176 |         let c3: Vec<char> = String::from("abab").chars().into_iter().collect();
177 |         let ab = seq(&sym('a'), &sym('b'));
178 |         let abab = seq(&ab, &ab);
179 |         assert!(accept(&abab, &c3));
180 |     }
181 | 
182 |     #[test]
183 |     fn test_empty_rep() {
184 |         let c3: Vec<char> = String::from("").chars().into_iter().collect();
185 |         let a = rep(&sym('a'));
186 |         assert!(accept(&a, &c3));
187 |     }
188 | 
189 |     #[test]
190 |     fn test_two() {
191 |         let c4: Vec<char> = String::from("abcc").chars().into_iter().collect();
192 |         let nocs = rep(&alt(&sym('a'), &sym('b')));
193 |         let onec = seq(&nocs, &sym('c'));
194 |         let evencs = seq(&rep(&seq(&onec, &onec)), &nocs);
195 |         assert!(accept(&evencs, &c4))
196 |     }
197 | }
198 | 


--------------------------------------------------------------------------------
/rust/02_riggedregex/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "riggedregex"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg <elf.sternberg@gmail.com>"]
5 | edition = "2018"
6 | 
7 | [dependencies]
8 | num-traits = "0.2.6"
9 | 


--------------------------------------------------------------------------------
/rust/02_riggedregex/README.md:
--------------------------------------------------------------------------------
 1 | # Kleene Regular Expressions with Rigging, in Rust
 2 | 
 3 | This program builds on the simple regular expressions in Version 01,
 4 | providing a new definition of a regular expression `Regw` that takes two
 5 | types, a source type and an output type.  The output type must be a
 6 | [*Semiring*](https://en.wikipedia.org/wiki/Semiring).
 7 | 
 8 | A semiring is a set R equipped with two binary operations + and ⋅, and
 9 | two constants identified as 0 and 1.  By providing a semiring to the
10 | regular expression, we change the return type of the regular expression
11 | to any set that can obey the semiring laws.  There's a surprising amount
12 | of stuff you can do with the semiring laws.
13 | 
 14 | In this example, I've provided a function, `rigged`, that takes a
15 | simple regular expression from Version 01, and wraps or extracts
16 | the contents of that regular expression into the `Regw` datatype.
17 | Instead of the boolean mathematics of Version 01, we use the semiring
18 | symbols `add` and `mul` to represent the sum and product operations on
19 | the return type.  We then define the "symbol accepted" boolean to return
20 | either the `zero` or `one` type of the semiring.
21 | 
22 | I've provided two semirings: One of (0, 1, +, *, Integers), and one of
23 | (False, True, ||, &&, Booleans).  Both work well.  
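
As a rough sketch of the idea, the same recognizer can be run under both semirings. Everything here is an assumption made for illustration: the hand-rolled `Semiring` trait, the cut-down `Re` type (no repetition), and the `weigh` function are hypothetical names, and the crate itself builds on `num-traits` rather than a trait like this one.

```rust
/// Hypothetical, dependency-free semiring trait for this sketch.
pub trait Semiring: Copy {
    fn zero() -> Self;
    fn one() -> Self;
    fn add(self, other: Self) -> Self;
    fn mul(self, other: Self) -> Self;
}

/// (False, True, ||, &&): plain recognition.
impl Semiring for bool {
    fn zero() -> Self { false }
    fn one() -> Self { true }
    fn add(self, other: Self) -> Self { self || other }
    fn mul(self, other: Self) -> Self { self && other }
}

/// (0, 1, +, *): counts the number of distinct ways a string matches.
impl Semiring for u32 {
    fn zero() -> Self { 0 }
    fn one() -> Self { 1 }
    fn add(self, other: Self) -> Self { self + other }
    fn mul(self, other: Self) -> Self { self * other }
}

/// A cut-down expression type, without repetition, to keep this short.
pub enum Re {
    Eps,
    Sym(char),
    Alt(Box<Re>, Box<Re>),
    Seq(Box<Re>, Box<Re>),
}

/// Alternation becomes `add`, sequencing becomes `mul` summed over every
/// split point, and a symbol test yields `one` or `zero`.
pub fn weigh<S: Semiring>(r: &Re, s: &[char]) -> S {
    match r {
        Re::Eps => if s.is_empty() { S::one() } else { S::zero() },
        Re::Sym(c) => if s.len() == 1 && s[0] == *c { S::one() } else { S::zero() },
        Re::Alt(p, q) => weigh::<S>(p, s).add(weigh::<S>(q, s)),
        Re::Seq(p, q) => (0..=s.len()).fold(S::zero(), |acc, i| {
            acc.add(weigh::<S>(p, &s[..i]).mul(weigh::<S>(q, &s[i..])))
        }),
    }
}
```

With the expression `a|a` and the input `a`, `weigh::<bool>` answers whether the string matches, while `weigh::<u32>` counts both alternatives and reports two ways to match.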
24 | 
25 | Rust isn't nearly as magical as Haskell.  (See the Readme in the
26 | equivalent Haskell version for my comments on that.)  On the other hand,
27 | it's not necessary to define a Semiring explicitly; instead, we define a
28 | nominative type, a struct containing our real return type, and then
29 | provide implementations of One, Zero, Mul, and Add for that type.  Here,
 30 | my two semirings are named `Recognizer` and `Ambigcounter`, and to make
31 | them work we have to say that our recognizer is a `Regw<Recognizer>`;
32 | Rust won't magically glue everything together the way Haskell will.
33 | 
34 | Still, this was a straightforward implementation of the rigged regular
35 | expression, and is a good stepping stone for future projects.
36 | 
37 | ## License
38 | 
39 | As this is entirely my work, it is copyright (c) 2019, and licensed
40 | under the Mozilla Public License v. 2.0.  See the
41 | [LICENSE.md](../../LICENSE.md) in the root directory.
42 | 


--------------------------------------------------------------------------------
/rust/03_brzozowski_1/.gitignore:
--------------------------------------------------------------------------------
1 | /target
2 | **/*.rs.bk
3 | Cargo.lock
4 | 


--------------------------------------------------------------------------------
/rust/03_brzozowski_1/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "sbrz"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg <elf.sternberg@gmail.com>"]
5 | edition = "2018"
6 | 
7 | [dependencies]
8 | 


--------------------------------------------------------------------------------
/rust/03_brzozowski_1/README.md:
--------------------------------------------------------------------------------
 1 | # Brzozowski Regular Expressions, in Rust
 2 | 
 3 | This is a regex recognizer implementing Brzozowski's Algorithm, in Rust.
 4 | It has two standard optimizations (the `alt` and `seq` constructors
 5 | automatically prune null branches), and with those it works fine.
 6 | 
 7 | This version implements regular expressions as they appear in the Racket
 8 | version, without nullability optimizations (the so-called "rerp"
 9 | implementation).
10 | 
11 | ## License
12 | 
13 | As this is entirely my work, it is copyright (c) 2019, and licensed
14 | under the Mozilla Public License v. 2.0.  See the
15 | [LICENSE.md](../../LICENSE.md) in the root directory.
16 | 


--------------------------------------------------------------------------------
/rust/03_brzozowski_1/src/lib.rs:
--------------------------------------------------------------------------------
  1 | use std::ops::Deref;
  2 | use std::rc::Rc;
  3 | 
  4 | #[derive(Debug)]
  5 | pub enum Brz {
  6 |     Emp,
  7 |     Eps,
  8 |     Sym(char),
  9 |     Alt(Rc<Brz>, Rc<Brz>),
 10 |     Seq(Rc<Brz>, Rc<Brz>),
 11 |     Rep(Rc<Brz>),
 12 | }
 13 | 
 14 | macro_rules! matches {
 15 |     ($expression:expr, $($pattern:tt)+) => {
 16 |         match $expression {
 17 |             $($pattern)+ => true,
 18 |             _ => false
 19 |         }
 20 |     }
 21 | }
 22 | 
 23 | macro_rules! cond {
 24 |     ($($pred:expr => $body:block),+ ,_ => $default:block) => {
 25 |         {
 26 |             $(if $pred $body else)+
 27 |             $default
 28 |         }
 29 |     }
 30 | }
 31 | 
 32 | pub fn emp() -> Rc<Brz> {
 33 |     Rc::new(Brz::Emp)
 34 | }
 35 | 
 36 | pub fn eps() -> Rc<Brz> {
 37 |     Rc::new(Brz::Eps)
 38 | }
 39 | 
 40 | pub fn sym(c: char) -> Rc<Brz> {
 41 |     Rc::new(Brz::Sym(c))
 42 | }
 43 | 
 44 | pub fn alt(r1: &Rc<Brz>, r2: &Rc<Brz>) -> Rc<Brz> {
 45 |     cond!(
 46 |         matches!(r1.deref(), Brz::Emp) => { r2.clone() },
 47 |         matches!(r2.deref(), Brz::Emp) => { r1.clone() },
 48 |         _ => { Rc::new(Brz::Alt(r1.clone(), r2.clone())) }
 49 |     )
 50 | }
 51 | 
 52 | pub fn seq(r1: &Rc<Brz>, r2: &Rc<Brz>) -> Rc<Brz> {
 53 |     cond!(
 54 |         matches!(r1.deref(), Brz::Emp) => { emp() },
 55 |         matches!(r2.deref(), Brz::Emp) => { emp() },
 56 |         _ => { Rc::new(Brz::Seq(r1.clone(), r2.clone())) }
 57 |     )
 58 | }
 59 | 
 60 | pub fn rep(r1: &Rc<Brz>) -> Rc<Brz> {
 61 |     Rc::new(Brz::Rep(r1.clone()))
 62 | }
 63 | 
 64 | pub fn derive(n: &Rc<Brz>, c: char) -> Rc<Brz> {
 65 |     use self::Brz::*;
 66 | 
 67 |     match n.deref() {
 68 |         Emp => emp(),
 69 |         Eps => emp(),
 70 |         Sym(u) => {
 71 |             if c == *u {
 72 |                 eps()
 73 |             } else {
 74 |                 emp()
 75 |             }
 76 |         }
 77 |         Seq(l, r) => {
 78 |             let s = seq(&derive(l, c), r);
 79 |             if nullable(l) {
 80 |                 alt(&s, &derive(r, c))
 81 |             } else {
 82 |                 s
 83 |             }
 84 |         }
 85 |         Alt(l, r) => alt(&derive(l, c), &derive(r, c)),
 86 |         Rep(r) => seq(&derive(r, c), &n.clone()),
 87 |     }
 88 | }
 89 | 
 90 | pub fn nullable(n: &Rc<Brz>) -> bool {
 91 |     use self::Brz::*;
 92 | 
 93 |     match n.deref() {
 94 |         Emp => false,
 95 |         Eps => true,
 96 |         Sym(_) => false,
 97 |         Seq(l, r) => nullable(l) && nullable(r),
 98 |         Alt(l, r) => nullable(l) || nullable(r),
 99 |         Rep(_) => true,
100 |     }
101 | }
102 | 
103 | pub fn accept(n: &Rc<Brz>, s: String) -> bool {
104 |     use self::Brz::*;
105 | 
106 |     let mut source = s.chars().peekable();
107 |     let mut r = n.clone();
108 |     loop {
109 |         match source.next() {
110 |             None => break nullable(&r),
111 |             Some(ref c) => {
112 |                 let np = derive(&r, *c);
113 |                 println!("{:?}", np);
114 |                 match np.deref() {
115 |                     Emp => return false,
116 |                     Eps => {
117 |                         break match source.peek() {
118 |                             None => true,
119 |                             Some(_) => false,
120 |                         };
121 |                     }
122 |                     _ => r = np.clone(),
123 |                 }
124 |             }
125 |         }
126 |     }
127 | }
128 | 
129 | #[cfg(test)]
130 | mod tests {
131 |     use super::*;
132 | 
133 |     #[test]
134 |     fn basics() {
135 |         let cases = [
136 |             ("empty", eps(), "", true),
137 |             ("null", emp(), "", false),
138 |             ("char", sym('a'), "a", true),
139 |             ("not char", sym('a'), "b", false),
140 |             ("char vs empty", sym('a'), "", false),
141 |             ("left alt", alt(&sym('a'), &sym('b')), "a", true),
142 |             ("right alt", alt(&sym('a'), &sym('b')), "b", true),
143 |             ("neither alt", alt(&sym('a'), &sym('b')), "c", false),
144 |             ("empty alt", alt(&sym('a'), &sym('b')), "", false),
145 |             ("empty rep", rep(&sym('a')), "", true),
146 |             ("one rep", rep(&sym('a')), "a", true),
147 |             ("short multiple failed rep", rep(&sym('a')), "ab", false),
148 |             ("multiple rep", rep(&sym('a')), "aaaaaaaaa", true),
149 |             (
150 |                 "multiple rep with failure",
151 |                 rep(&sym('a')),
152 |                 "aaaaaaaaab",
153 |                 false,
154 |             ),
155 |             ("sequence", seq(&sym('a'), &sym('b')), "ab", true),
156 |             ("sequence with empty", seq(&sym('a'), &sym('b')), "", false),
157 |             ("bad short sequence", seq(&sym('a'), &sym('b')), "a", false),
158 |             ("bad long sequence", seq(&sym('a'), &sym('b')), "abc", false),
159 |         ];
160 | 
161 |         for (name, case, sample, result) in &cases {
162 |             println!("{:?}", name);
163 |             assert_eq!(accept(case, sample.to_string()), *result);
164 |         }
165 |     }
166 | }
167 | 


--------------------------------------------------------------------------------
/rust/04_brzozowski_2/.gitignore:
--------------------------------------------------------------------------------
1 | /target
2 | **/*.rs.bk
3 | Cargo.lock
4 | 


--------------------------------------------------------------------------------
/rust/04_brzozowski_2/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "sbrz"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg <elf.sternberg@gmail.com>"]
5 | edition = "2018"
6 | 
7 | [dependencies]
8 | 


--------------------------------------------------------------------------------
/rust/04_brzozowski_2/README.md:
--------------------------------------------------------------------------------
 1 | # Brzozowski Regular Expressions, in Rust
 2 | 
 3 | This is a regex recognizer implementing Brzozowski's Algorithm, in Rust.
 4 | It has two standard optimizations (the `alt` and `seq` constructors
 5 | automatically prune null branches), and with those it works fine.
 6 | 
 7 | This version implements regular expressions as they appear in my own
 8 | Haskell version, without nullability optimizations (the
 9 | so-called "rerp" implementation).
10 | 
11 | The difference between the two baseline implementations is that this one
12 | attempts to "object orient" the code, creating an implementation that
13 | can be modified without having to touch many portions of the code, by
14 | isolating the 'derive' and 'nullability' tests into their own
15 | implementation of the BrzNode.  I consider the experiment something of a
16 | failure, in that to "work around" Rust's lack of inheritance I had to do
17 | some fairly wacky things to teach Rust how to look stuff up.
18 | 
19 | ## License
20 | 
21 | As this is entirely my work, it is copyright (c) 2019, and licensed
22 | under the Mozilla Public License v. 2.0.  See the
23 | [LICENSE.md](../../LICENSE.md) in the root directory.
24 | 


--------------------------------------------------------------------------------
/rust/04_brzozowski_2/src/lib.rs:
--------------------------------------------------------------------------------
  1 | use std::ops::Deref;
  2 | use std::rc::Rc;
  3 | 
  4 | #[derive(Debug)]
  5 | pub struct Emp;
  6 | #[derive(Debug)]
  7 | pub struct Eps;
  8 | #[derive(Debug)]
  9 | pub struct Sym(char);
 10 | #[derive(Debug)]
 11 | pub struct Alt(Rc<Brz>, Rc<Brz>);
 12 | #[derive(Debug)]
 13 | pub struct Seq(Rc<Brz>, Rc<Brz>);
 14 | #[derive(Debug)]
 15 | pub struct Rep(Rc<Brz>);
 16 | 
 17 | #[derive(Debug)]
 18 | pub enum Brz {
 19 | 	Emp(Emp),
 20 | 	Eps(Eps),
 21 | 	Sym(Sym),
 22 | 	Alt(Alt),
 23 | 	Seq(Seq),
 24 | 	Rep(Rep),
 25 | }
 26 | 
 27 | impl Brz {
 28 | 	fn derive(&self, c: char) -> Rc<Brz> {
 29 | 		match self {
 30 | 			Brz::Emp(emp) => emp.derive(c),
 31 | 			Brz::Eps(eps) => eps.derive(c),
 32 | 			Brz::Sym(sym) => sym.derive(c),
 33 | 			Brz::Alt(alt) => alt.derive(c),
 34 | 			Brz::Seq(seq) => seq.derive(c),
 35 | 			Brz::Rep(rep) => rep.derive(c),
 36 | 		}
 37 | 	}
 38 | 
 39 | 	fn nullable(&self) -> bool {
 40 | 		match self {
 41 | 			Brz::Emp(emp) => emp.nullable(),
 42 | 			Brz::Eps(eps) => eps.nullable(),
 43 | 			Brz::Sym(sym) => sym.nullable(),
 44 | 			Brz::Alt(alt) => alt.nullable(),
 45 | 			Brz::Seq(seq) => seq.nullable(),
 46 | 			Brz::Rep(rep) => rep.nullable(),
 47 | 		}
 48 | 	}
 49 | }
 50 | 
 51 | trait Brznode {
 52 | 	fn derive(&self, c: char) -> Rc<Brz>;
 53 | 	fn nullable(&self) -> bool;
 54 | }
 55 | 
 56 | impl Brznode for Emp {
 57 | 	fn derive(&self, _: char) -> Rc<Brz> {
 58 | 		Rc::new(Brz::Emp(Emp {}))
 59 | 	}
 60 | 	fn nullable(&self) -> bool {
 61 | 		false
 62 | 	}
 63 | }
 64 | 
 65 | impl Brznode for Eps {
 66 | 	fn derive(&self, _: char) -> Rc<Brz> {
 67 | 		Rc::new(Brz::Emp(Emp {}))
 68 | 	}
 69 | 	fn nullable(&self) -> bool {
 70 | 		true
 71 | 	}
 72 | }
 73 | 
 74 | impl Brznode for Sym {
 75 | 	fn derive(&self, c: char) -> Rc<Brz> {
 76 | 		Rc::new(if c == self.0 {
 77 | 			Brz::Eps(Eps {})
 78 | 		} else {
 79 | 			Brz::Emp(Emp {})
 80 | 		})
 81 | 	}
 82 | 	fn nullable(&self) -> bool {
 83 | 		false
 84 | 	}
 85 | }
 86 | 
 87 | pub fn alt(r1: &Rc<Brz>, r2: &Rc<Brz>) -> Rc<Brz> {
 88 | 	match (r1.deref(), r2.deref()) {
 89 | 		(_, Brz::Emp(_)) => r1.clone(),
 90 | 		(Brz::Emp(_), _) => r2.clone(),
 91 | 		_ => Rc::new(Brz::Alt(Alt(r1.clone(), r2.clone()))),
 92 | 	}
 93 | }
 94 | 
 95 | impl Brznode for Alt {
 96 | 	fn derive(&self, c: char) -> Rc<Brz> {
 97 | 		let l = &self.0.derive(c);
 98 | 		let r = &self.1.derive(c);
 99 | 		alt(l, r)
100 | 	}
101 | 	fn nullable(&self) -> bool {
102 | 		self.0.nullable() || self.1.nullable()
103 | 	}
104 | }
105 | 
106 | pub fn seq(r1: &Rc<Brz>, r2: &Rc<Brz>) -> Rc<Brz> {
107 | 	match (r1.deref(), r2.deref()) {
108 | 		(_, Brz::Emp(_)) => emp(),
109 | 		(Brz::Emp(_), _) => emp(),
110 | 		_ => Rc::new(Brz::Seq(Seq(r1.clone(), r2.clone()))),
111 | 	}
112 | }
113 | 
114 | impl Brznode for Seq {
115 | 	fn derive(&self, c: char) -> Rc<Brz> {
116 | 		let s = seq(&self.0.derive(c), &self.1);
117 | 		if self.0.nullable() {
118 | 			alt(&s, &self.1.derive(c))
119 | 		} else {
120 | 			s
121 | 		}
122 | 	}
123 | 	fn nullable(&self) -> bool {
124 | 		self.0.nullable() && self.1.nullable()
125 | 	}
126 | }
127 | 
128 | impl Brznode for Rep {
129 | 	fn derive(&self, c: char) -> Rc<Brz> {
130 | 		seq(&self.0.derive(c), &rep(&self.0))
131 | 	}
132 | 	fn nullable(&self) -> bool {
133 | 		true
134 | 	}
135 | }
136 | 
137 | pub fn emp() -> Rc<Brz> {
138 | 	Rc::new(Brz::Emp(Emp))
139 | }
140 | 
141 | pub fn eps() -> Rc<Brz> {
142 | 	Rc::new(Brz::Eps(Eps))
143 | }
144 | 
145 | pub fn sym(c: char) -> Rc<Brz> {
146 | 	Rc::new(Brz::Sym(Sym(c)))
147 | }
148 | 
149 | pub fn rep(r1: &Rc<Brz>) -> Rc<Brz> {
150 | 	Rc::new(Brz::Rep(Rep(r1.clone())))
151 | }
152 | 
153 | pub fn accept(n: &Rc<Brz>, s: String) -> bool {
154 | 	let mut source = s.chars().peekable();
155 | 	let mut r = n.clone();
156 | 	loop {
157 | 		match source.next() {
158 | 			None => break r.nullable(),
159 | 			Some(c) => {
160 | 				let np = r.derive(c);
161 | 				match np.deref() {
162 | 					Brz::Emp(_) => return false,
163 | 					Brz::Eps(_) => break source.peek().is_none(),
164 | 					_ => r = np.clone(),
165 | 				}
166 | 			}
167 | 		}
168 | 	}
169 | }
170 | 
171 | #[cfg(test)]
172 | mod tests {
173 | 	use super::*;
174 | 
175 | 	#[test]
176 | 	fn basics() {
177 | 		let cases = [
178 | 			("empty", eps(), "", true),
179 | 			("null", emp(), "", false),
180 | 			("char", sym('a'), "a", true),
181 | 			("not char", sym('a'), "b", false),
182 | 			("char vs empty", sym('a'), "", false),
183 | 			("left alt", alt(&sym('a'), &sym('b')), "a", true),
184 | 			("right alt", alt(&sym('a'), &sym('b')), "b", true),
185 | 			("neither alt", alt(&sym('a'), &sym('b')), "c", false),
186 | 			("empty alt", alt(&sym('a'), &sym('b')), "", false),
187 | 			("empty rep", rep(&sym('a')), "", true),
188 | 			("sequence", seq(&sym('a'), &sym('b')), "ab", true),
189 | 			("sequence with empty", seq(&sym('a'), &sym('b')), "", false),
190 | 			("bad long sequence", seq(&sym('a'), &sym('b')), "abc", false),
191 | 			("bad short sequence", seq(&sym('a'), &sym('b')), "a", false),
192 | 			("one rep", rep(&sym('a')), "a", true),
193 | 			("short multiple failed rep", rep(&sym('a')), "ab", false),
194 | 			("multiple rep", rep(&sym('a')), "aaaaaaaaa", true),
195 | 			(
196 | 				"multiple rep with failure",
197 | 				rep(&sym('a')),
198 | 				"aaaaaaaaab",
199 | 				false,
200 | 			),
201 | 		];
202 | 
203 | 		for (name, case, sample, result) in &cases {
204 | 			println!("{:?}", name);
205 | 			assert_eq!(accept(case, sample.to_string()), *result);
206 | 		}
207 | 	}
208 | }
209 | 


--------------------------------------------------------------------------------
/rust/05_glushkov/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "glushkov"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg <elf.sternberg@gmail.com>"]
5 | edition = "2018"
6 | 
7 | [dependencies]
8 | 


--------------------------------------------------------------------------------
/rust/05_glushkov/README.md:
--------------------------------------------------------------------------------
 1 | # Glushkov Regular Expressions, in Rust
 2 | 
 3 | This is Glushkov's construction of regular expressions. The basic idea
 4 | is that for every symbol encountered during parsing, a corresponding
 5 | symbol in the tree is marked (or, if no symbols are marked, the parse
 6 | is a failure).  Composites are followed to their ends for each
 7 | character, and if the symbol matches it is "marked".
 8 | 
 9 | In this instance, we create a Glushkov regular expression tree, and for
10 | each character it returns a new, complete copy of the tree, only with
11 | the marks "shifted" to where they should be given the character.  In
12 | this way, each iteration of the tree keeps the NFA list of states that
13 | are active; they are the paths that lead to marked symbols.
14 | 
 15 | `ended` here means that no more symbols have to be read to match the
 16 | expression.  `empty` here means that the expression can match the
 17 | empty string.  The `ended` function was named `final` in the Haskell
 18 | version, but the word `ended` is used here because `final` is a
 19 | reserved word in Rust.
20 | 
 21 | `ended` is used here to determine whether the Glushkov expression
 22 | passed in contains a marked symbol.  This is used both to determine
 23 | the end state of the expression, and in sequences to determine if
 24 | the rightmost expression must be evaluated,
25 | that is, if we're currently going down a 'marked' path and the left
26 | expression can handle the empty string OR the left expression is
27 | ended.
28 | 
 29 | The accept method is just a fold over the input string.  The initial
30 | value is the shift of the first character, with the assumed mark of
31 | `True` being included because we can always parse infinitely many
32 | empty strings before the sample begins.  The returned value of that
33 | shift is our new regular expression, on which we then progressively
34 | call `shift False accg c`; here False means that we're only going to
35 | shift marks we've already found.
36 | 
37 | The "trick" to understanding how this works is to consider the string
38 | "abc" for the expression "abc".  The first time through, we start with
39 | True, and what gets marked is the symbol 'a':
40 | 
41 | `(seq 'a' (seq 'b' 'c')) -> (seq 'a'* (seq 'b' 'c'))`
42 | 
43 | When we pass the letter 'b', what happens?  Well, the returned
44 | expression will have the 'a' symbol unmarked (it didn't match the
45 | character), but the second part of the shift expression says that the
46 | left expression is ended (it's a symbol and it was marked!), so we call
47 | `shift True (Sym 'b') 'b'`, and the new symbol generated will be marked,
48 | moving the mark to the correct destination.  The same thing happens on
49 | the next iteration.  The *inner seq* will get back that `(sym 'b')` is
50 | marked, so 'c' will match the `(sym 'c')` and shift will be in a `True`
51 | state, so now the expression comes back `(seq 'a' (seq 'b' 'c'*))`.  
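
The walk-through above can be reproduced with a stripped-down sketch. It is an illustration under stated assumptions rather than the crate's code: it keeps only `Sym` and `Seq`, uses `Box` instead of `Rc`, and the `marks`, `trace`, and `abc` helpers are hypothetical names added to expose the marks after each character. Empty input is not handled here.

```rust
#[derive(Clone, Debug)]
pub enum Glu {
    Sym(bool, char),
    Seq(Box<Glu>, Box<Glu>),
}

/// Can the expression match the empty string?
pub fn empty(g: &Glu) -> bool {
    match g {
        Glu::Sym(_, _) => false,
        Glu::Seq(l, r) => empty(l) && empty(r),
    }
}

/// Is a mark sitting in a position where the match could stop?
pub fn ended(g: &Glu) -> bool {
    match g {
        Glu::Sym(m, _) => *m,
        Glu::Seq(l, r) => (ended(l) && empty(r)) || ended(r),
    }
}

/// Build a new tree with the marks moved for character `c`.
pub fn shift(g: &Glu, m: bool, c: char) -> Glu {
    match g {
        Glu::Sym(_, s) => Glu::Sym(m && *s == c, *s),
        Glu::Seq(l, r) => Glu::Seq(
            Box::new(shift(l, m, c)),
            Box::new(shift(r, (m && empty(l)) || ended(l), c)),
        ),
    }
}

/// Collect the marks of all symbols, left to right.
pub fn marks(g: &Glu, out: &mut Vec<bool>) {
    match g {
        Glu::Sym(m, _) => out.push(*m),
        Glu::Seq(l, r) => {
            marks(l, out);
            marks(r, out);
        }
    }
}

/// Shift once per character (with `true` only for the first one) and
/// record the marks after every step, plus the final `ended` verdict.
pub fn trace(g: &Glu, input: &str) -> (Vec<Vec<bool>>, bool) {
    let mut current = g.clone();
    let mut steps = Vec::new();
    for (i, c) in input.chars().enumerate() {
        current = shift(&current, i == 0, c);
        let mut row = Vec::new();
        marks(&current, &mut row);
        steps.push(row);
    }
    (steps, ended(&current))
}

/// The expression `abc` as `(seq 'a' (seq 'b' 'c'))`.
pub fn abc() -> Glu {
    let sym = |c| Box::new(Glu::Sym(false, c));
    Glu::Seq(sym('a'), Box::new(Glu::Seq(sym('b'), sym('c'))))
}
```

Tracing `abc()` over the input "abc" moves a single mark from 'a' to 'b' to 'c', and the expression is `ended` once the mark sits on 'c'.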
52 | 
53 | When we run out of letters or regex, we can ask, "Is the expression
54 | final?"  Again, the tricky part is inside sequences: we're only final if
55 | the left side is final and the right side can handle an empty string, or
56 | if the right side is final.
57 | 
58 | Porting this from Haskell was *much* more straightforward than porting 
59 | the straight regex versions, and is slightly more efficient, although 
60 | it still has the "transition the entire parse tree every character" 
61 | problem.  That's to be solved later.
62 | 
63 | ## License
64 | 
65 | As this is entirely my work, it is copyright (c) 2019, and licensed
66 | under the Mozilla Public License v. 2.0.  See the
67 | [LICENSE.md](../../LICENSE.md) in the root directory.
68 | 


--------------------------------------------------------------------------------
/rust/05_glushkov/src/lib.rs:
--------------------------------------------------------------------------------
  1 | //! This crate provides a series of simple functions for building a
  2 | //! regular expression, and an `accept` function which takes a
  3 | //! completed regular expression and a string and returns a boolean
  4 | //! value describing if the expression matched the string (or not).
  5 | //!
  6 | //! # Quick Preview
  7 | //!
  8 | //! ```
  9 | //! use glushkov::*;
 10 | //! // `(fred|dino)`
 11 | //! let expr = alt(&seq(&sym('f'), &seq(&sym('r'), &seq(&sym('e'), &sym('d')))),
 12 | //!                &seq(&sym('d'), &seq(&sym('i'), &seq(&sym('n'), &sym('o')))));
 13 | //! assert!(accept(&expr, "fred"));
 14 | //! assert!(accept(&expr, "dino"));
 15 | //! assert!(!accept(&expr, "wilma"));
 16 | //! ```
 17 | 
 18 | use std::ops::Deref;
 19 | use std::rc::Rc;
 20 | 
 21 | #[derive(Debug)]
 22 | pub enum Glu {
 23 | 	Eps,
 24 | 	Sym(bool, char),
 25 | 	Alt(Rc<Glu>, Rc<Glu>),
 26 | 	Seq(Rc<Glu>, Rc<Glu>),
 27 | 	Rep(Rc<Glu>),
 28 | }
 29 | 
 30 | /// Recognize only the empty string
 31 | pub fn eps() -> Rc<Glu> {
 32 | 	Rc::new(Glu::Eps)
 33 | }
 34 | 
 35 | /// Recognize a single character
 36 | pub fn sym(c: char) -> Rc<Glu> {
 37 | 	Rc::new(Glu::Sym(false, c))
 38 | }
 39 | 
 40 | /// Recognize alternatives between two other regexes
 41 | pub fn alt(r1: &Rc<Glu>, r2: &Rc<Glu>) -> Rc<Glu> {
 42 | 	Rc::new(Glu::Alt(r1.clone(), r2.clone()))
 43 | }
 44 | 
 45 | /// Recognize a sequence of regexes in order
 46 | pub fn seq(r1: &Rc<Glu>, r2: &Rc<Glu>) -> Rc<Glu> {
 47 | 	Rc::new(Glu::Seq(r1.clone(), r2.clone()))
 48 | }
 49 | 
 50 | /// Recognize a regex repeated zero or more times.
 51 | pub fn rep(r1: &Rc<Glu>) -> Rc<Glu> {
 52 | 	Rc::new(Glu::Rep(r1.clone()))
 53 | }
 54 | 
 55 | // The main function: repeatedly traverses the tree, modifying as it
 56 | // goes, generating a new tree, marking the nodes where the expression
 57 | // currently "is," for any given character.
 58 | //
 59 | pub fn shift(g: &Rc<Glu>, m: bool, c: char) -> Rc<Glu> {
 60 | 	match g.deref() {
 61 | 		Glu::Eps => eps(),
 62 | 		Glu::Sym(_, s) => Rc::new(Glu::Sym(m && *s == c, *s)),
 63 | 		Glu::Alt(r1, r2) => alt(&shift(r1, m, c), &shift(r2, m, c)),
 64 | 		Glu::Seq(r1, r2) => {
 65 | 			let l_end = empty(r1);
 66 | 			let l_fin = ended(r1);
 67 | 			seq(&shift(r1, m, c), &shift(r2, m && l_end || l_fin, c))
 68 | 		}
 69 | 		Glu::Rep(r) => rep(&shift(r, m || ended(r), c)),
 70 | 	}
 71 | }
 72 | 
 73 | // Helper function that reports whether the expression passed in
 74 | // carries a mark in a final position; once the string runs out,
 75 | // this determines whether the expression is in an "accept"
 76 | // state.
 77 | //
 78 | pub fn ended(g: &Rc<Glu>) -> bool {
 79 | 	match g.deref() {
 80 | 		Glu::Eps => false,
 81 | 		Glu::Sym(m, _) => *m,
 82 | 		Glu::Alt(r1, r2) => ended(r1) || ended(r2),
 83 | 		Glu::Seq(r1, r2) => ended(r1) && empty(r2) || ended(r2),
 84 | 		Glu::Rep(r) => ended(r),
 85 | 	}
 86 | }
 87 | 
 88 | // Helper function that describes whether or not the expression
 89 | // supplied handles the empty string.
 90 | //
 91 | pub fn empty(g: &Rc<Glu>) -> bool {
 92 | 	match g.deref() {
 93 | 		Glu::Eps => true,
 94 | 		Glu::Sym(_, _) => false,
 95 | 		Glu::Alt(r1, r2) => empty(r1) || empty(r2),
 96 | 		Glu::Seq(r1, r2) => empty(r1) && empty(r2),
 97 | 		Glu::Rep(_) => true,
 98 | 	}
 99 | }
100 | 
101 | /// Takes a regular expression and a string and returns whether or not
102 | /// the expression and the string match (the string belongs to the
103 | /// language recognized by the expression).
104 | pub fn accept(g: &Rc<Glu>, s: &str) -> bool {
105 | 	if s.is_empty() {
106 | 		return empty(g);
107 | 	}
108 | 
109 | 	pub fn ashift(g: Rc<Glu>, c: char) -> Rc<Glu> {
110 | 		shift(&g, false, c)
111 | 	}
112 | 
113 | 	// This is kinda cool. I wonder if I can make the Brz versions look
114 | 	// like this.
115 | 	let mut seq = s.chars();
116 | 	let start = shift(g, true, seq.next().unwrap());
117 | 	ended(&seq.fold(start, ashift))
118 | }
119 | 
120 | #[cfg(test)]
121 | mod tests {
122 | 	use super::*;
123 | 
124 | 	#[test]
125 | 	fn basics() {
126 | 		let cases = [
127 | 			("empty", eps(), "", true),
128 | 			("char", sym('a'), "a", true),
129 | 			("not char", sym('a'), "b", false),
130 | 			("char vs empty", sym('a'), "", false),
131 | 			("left alt", alt(&sym('a'), &sym('b')), "a", true),
132 | 			("right alt", alt(&sym('a'), &sym('b')), "b", true),
133 | 			("neither alt", alt(&sym('a'), &sym('b')), "c", false),
134 | 			("empty alt", alt(&sym('a'), &sym('b')), "", false),
135 | 			("empty rep", rep(&sym('a')), "", true),
136 | 			("sequence", seq(&sym('a'), &sym('b')), "ab", true),
137 | 			("sequence with empty", seq(&sym('a'), &sym('b')), "", false),
138 | 			("bad long sequence", seq(&sym('a'), &sym('b')), "abc", false),
139 | 			("bad short sequence", seq(&sym('a'), &sym('b')), "a", false),
140 | 			("one rep", rep(&sym('a')), "a", true),
141 | 			("short multiple failed rep", rep(&sym('a')), "ab", false),
142 | 			("multiple rep", rep(&sym('a')), "aaaaaaaaa", true),
143 | 			(
144 | 				"multiple rep with failure",
145 | 				rep(&sym('a')),
146 | 				"aaaaaaaaab",
147 | 				false,
148 | 			),
149 | 		];
150 | 
151 | 		for (name, case, sample, result) in &cases {
152 | 			println!("{:?}", name);
153 | 			assert_eq!(accept(case, &sample.to_string()), *result);
154 | 		}
155 | 	}
156 | }
157 | 


--------------------------------------------------------------------------------
/rust/06_riggedglushkov/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "glushkov"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg <elf.sternberg@gmail.com>"]
5 | edition = "2018"
6 | 
7 | [dependencies]
8 | num-traits = "0.2.6"
9 | 


--------------------------------------------------------------------------------
/rust/06_riggedglushkov/README.md:
--------------------------------------------------------------------------------
 1 | # Rigged Glushkov Regular Expressions in Rust
 2 | 
 3 | This code is significantly different from the Haskell version (Haskell
 4 | Experiment 07), in that I decided to "go for it" and merge the process
 5 | of instantiation and rigging into a single structure. 
 6 | 
 7 | Prior to Rust Experiment 06, the Rust experiments had followed the
 8 | Haskell versions' pattern of building the regular expression first using
 9 | Kleene expressions, and then lifting them on-the-fly into more complex,
10 | "rigged" versions, before processing them with the given Kleene or
11 | Glushkov construction.
12 | 
13 | Rust famously doesn't have a garbage collector, but instead uses lifetimes
14 | and scopes to place much of what it does safely on the stack.  It takes
15 | some fiddling to make types, lifetimes, and scopes line up, so as far
16 | back as the first Rust experiment I had individual factory functions for
17 | the different Regex sub-types.  These take the place of the simple
18 | `data` types seen in the equivalent Haskell experiments.
19 | 
20 | With that in mind, there was no reason to have two different data
21 | structures; for this experiment, there is only the one `enum` type and
22 | its variants.
23 | 
24 | In this experiment, as in the Haskell version, I build on the idea of
25 | recording already found "empty" and "final" values of nodes, so the
26 | data structure is now a record of (empty, final, expression).  The
27 | `eps`, `alt`, `seq`, and `rep` expressions are pretty much as you'd
28 | expect; one thing you won't find in the base code is the implementation
29 | for `sym`.  `sym` must be implemented independently for different semiring
30 | implementations.
31 | 
32 | The `Sym` expression is a *trait* now; it says that users must provide
33 | an implementation with a single method, `is`, that takes a symbol and
34 | returns a semiring.
35 | 
36 | I've had to abandon the use of `num_traits` and `std::ops`, as
37 | `std::ops::Mul` and `std::ops::Add` don't provide a framework for
38 | passing in references.  I've gone back to my initial instincts and
39 | provided a comprehensive `Semiring` trait which can take references for
40 | the `mul` and `add` operations.  This works very well, as now I can
41 | analyze and operate on immutable `HashSet<String>` collections without
42 | having to clone them to pass them to the cartesian product operation.
43 | That's a massive win in terms of memory and CPU savings.
44 | 
45 | Processing the expression using Glushkov's progressive algorithm is
46 | the same as in the unrigged version, only we cache the "empty" and
47 | "final" values when they're found and do not recalculate them.
48 | 
49 | Down in the tests, you'll find the Boolean version (`Recognizer`) as
50 | well as a string version (`Parser`).  Both versions show how to
51 | implement a semiring for doing data extraction, including how to define
52 | a specific Sym implementation for the `Sym` trait, and include a
53 | function to instantiate that implementation for your use case.
54 | 
55 | The `Parser` version has a specific moment of complexity that can't be
56 | elided: in the `mul` implementation, the multiplication of two sets is
57 | the cartesian product of those sets: a new set containing all possible
58 | combinations of ordered tuples made up of a member of the first set with
59 | a member of the second set.  For our purposes, this is still a string,
60 | so our implementation involves building those tuples and then generating
61 | a new string by concatenating the ordered pair.  This takes a bit of
62 | memory thrashing, but much less now that I've solved the `mul(&x, &y)`
63 | issue.
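
As a rough sketch of the cartesian-product `mul` described above, the block below uses a trimmed-down stand-in for the `Semiring` trait (only `mul` and `add`) and a set-of-strings type.  The names `StringSet` and `of` are hypothetical and exist only for this illustration; the real `Parser` lives in the crate's tests.

```rust
use std::collections::HashSet;

/// Trimmed-down stand-in for the crate's `Semiring` trait: only the two
/// operations discussed here, both taking their operands by reference.
pub trait Semiring {
    fn mul(&self, rhs: &Self) -> Self;
    fn add(&self, rhs: &Self) -> Self;
}

/// Hypothetical name; plays the role of the `Parser` semiring.
#[derive(Debug, Clone, PartialEq)]
pub struct StringSet(pub HashSet<String>);

impl StringSet {
    /// Convenience constructor for the example.
    pub fn of(items: &[&str]) -> StringSet {
        StringSet(items.iter().map(|s| s.to_string()).collect())
    }
}

impl Semiring for StringSet {
    // Cartesian product: every left string followed by every right string.
    fn mul(&self, rhs: &StringSet) -> StringSet {
        let mut out = HashSet::new();
        for l in &self.0 {
            for r in &rhs.0 {
                let mut s = String::with_capacity(l.len() + r.len());
                s.push_str(l);
                s.push_str(r);
                out.insert(s);
            }
        }
        StringSet(out)
    }

    // Union of the two sets.
    fn add(&self, rhs: &StringSet) -> StringSet {
        StringSet(self.0.union(&rhs.0).cloned().collect())
    }
}
```

Because both operands are borrowed, neither input set is cloned or consumed; the only allocations are the concatenated strings that end up in the result.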
64 | 
65 | ## License
66 | 
67 | As this is entirely my work, it is copyright (c) 2019, and licensed
68 | under the Mozilla Public License v. 2.0.  See the
69 | [LICENSE.md](../../LICENSE.md) in the root directory.
70 | 


--------------------------------------------------------------------------------
/rust/06_riggedglushkov/src/lib.rs:
--------------------------------------------------------------------------------
  1 | //! This crate provides a series of simple functions for building a
  2 | //! regular expression, and an `accept` function which takes a
  3 | //! completed regular expression and a string and returns a semiring
  4 | //! value describing if (and how) the expression matched the string.
  5 | //!
  6 | 
  7 | use std::rc::Rc;
  8 | 
  9 | pub trait Semiring {
 10 | 	fn zero() -> Self;
 11 | 	fn one() -> Self;
 12 | 	fn is_zero(&self) -> bool;
 13 | 	fn mul(&self, rhs: &Self) -> Self;
 14 | 	fn add(&self, rhs: &Self) -> Self;
 15 | }
 16 | 
 17 | /// The Sym trait represents what to do for a single character.  It has
 18 | /// a single method, "is", that returns the semiring.  Implementers of
 19 | /// "is" must provide a corresponding construction factory.
 20 | 
 21 | pub trait Sym<S>
 22 | where
 23 | 	S: Semiring,
 24 | {
 25 | 	fn is(&self, c: char) -> S;
 26 | }
 27 | 
 28 | pub enum Glui<S>
 29 | where
 30 | 	S: Semiring,
 31 | {
 32 | 	Eps,
 33 | 	Sym(Rc<dyn Sym<S>>),
 34 | 	Alt(Rc<Glu<S>>, Rc<Glu<S>>),
 35 | 	Seq(Rc<Glu<S>>, Rc<Glu<S>>),
 36 | 	Rep(Rc<Glu<S>>),
 37 | }
 38 | 
 39 | // Empty, Final, Data
 40 | pub struct Glu<S: Semiring>(S, S, Glui<S>);
 41 | 
 42 | /// Recognize only the empty string
 43 | pub fn eps<S>() -> Rc<Glu<S>>
 44 | where
 45 | 	S: Semiring,
 46 | {
 47 | 	Rc::new(Glu(S::one(), S::zero(), Glui::Eps))
 48 | }
 49 | 
 50 | /// Recognize alternatives between two other regexes
 51 | pub fn alt<S>(r1: &Rc<Glu<S>>, r2: &Rc<Glu<S>>) -> Rc<Glu<S>>
 52 | where
 53 | 	S: Semiring,
 54 | {
 55 | 	Rc::new(Glu(
 56 | 		r1.0.add(&r2.0),
 57 | 		r1.1.add(&r2.1),
 58 | 		Glui::Alt(r1.clone(), r2.clone()),
 59 | 	))
 60 | }
 61 | 
 62 | /// Recognize a sequence of regexes in order
 63 | pub fn seq<S>(r1: &Rc<Glu<S>>, r2: &Rc<Glu<S>>) -> Rc<Glu<S>>
 64 | where
 65 | 	S: Semiring,
 66 | {
 67 | 	Rc::new(Glu(
 68 | 		r1.0.mul(&r2.0),
 69 | 		r1.1.mul(&r2.0).add(&r2.1),
 70 | 		Glui::Seq(r1.clone(), r2.clone()),
 71 | 	))
 72 | }
 73 | 
 74 | /// Recognize a regex repeated zero or more times.
 75 | pub fn rep<S>(r1: &Rc<Glu<S>>) -> Rc<Glu<S>>
 76 | where
 77 | 	S: Semiring + Clone,
 78 | {
 79 | 	Rc::new(Glu(S::one(), r1.1.clone(), Glui::Rep(r1.clone())))
 80 | }
 81 | 
 82 | // The main function: repeatedly traverses the tree, modifying as it
 83 | // goes, generating a new tree, marking the nodes where the expression
 84 | // currently "is," for any given character.  The values of the nodes
 85 | // are cached for performance, but this probably isn't a win in Rust
 86 | // as Rust won't keep the intermediate functions generated, nor
 87 | // provide them ad-hoc to future operations the way Haskell does.
 88 | //
 89 | fn shift<S>(g: &Rc<Glu<S>>, m: &S, c: char) -> Rc<Glu<S>>
 90 | where
 91 | 	S: Semiring + Clone,
 92 | {
 93 | 	use self::Glui::*;
 94 | 	match &g.2 {
 95 | 		Eps => eps(),
 96 | 		Sym(f) => Rc::new(Glu(S::zero(), m.mul(&f.is(c)), Glui::Sym(f.clone()))),
 97 | 		Alt(r1, r2) => alt(&shift(&r1, m, c), &shift(&r2, m, c)),
 98 | 		Seq(r1, r2) => seq(
 99 | 			&shift(&r1, m, c),
100 | 			&shift(&r2, &(m.mul(&r1.0).add(&r1.1)), c),
101 | 		),
102 | 		Rep(r) => rep(&shift(&r, &(m.add(&r.1)), c)),
103 | 	}
104 | }
105 | 
106 | pub fn accept<S>(g: &Rc<Glu<S>>, s: &str) -> S
107 | where
108 | 	S: Semiring + Clone,
109 | {
110 | 	if s.is_empty() {
111 | 		return g.0.clone();
112 | 	}
113 | 
114 | 	let ashift = |g, c| shift(&g, &S::zero(), c);
115 | 
116 | 	// This is kinda cool. I wonder if I can make the Brz versions look
117 | 	// like this.
118 | 	let mut seq = s.chars();
119 | 	let start = shift(g, &S::one(), seq.next().unwrap());
120 | 	seq.fold(start, ashift).1.clone()
121 | }
122 | 
123 | #[cfg(test)]
124 | mod tests {
125 | 
126 | 	use super::*;
127 | 	use std::collections::HashSet;
128 | 
129 | 	macro_rules! set {
130 |         ( $( $x:expr ),* ) => {{
131 |                 let mut temp_set = HashSet::new();
132 |                 $( temp_set.insert($x); )*
133 |                 temp_set //
134 |             }};
135 |     }
136 | 
137 | 	#[derive(Debug, Copy, Clone)]
138 | 	pub struct Recognizer(bool);
139 | 
140 | 	impl Semiring for Recognizer {
141 | 		fn one() -> Recognizer {
142 | 			Recognizer(true)
143 | 		}
144 | 		fn zero() -> Recognizer {
145 | 			Recognizer(false)
146 | 		}
147 | 		fn is_zero(&self) -> bool {
148 | 			!self.0
149 | 		}
150 | 		fn mul(&self, rhs: &Recognizer) -> Recognizer {
151 | 			Recognizer(self.0 && rhs.0)
152 | 		}
153 | 		fn add(&self, rhs: &Recognizer) -> Recognizer {
154 | 			Recognizer(self.0 || rhs.0)
155 | 		}
156 | 	}
157 | 
158 | 	pub struct SimpleSym {
159 | 		c: char,
160 | 	}
161 | 
162 | 	impl Sym<Recognizer> for SimpleSym {
163 | 		fn is(&self, c: char) -> Recognizer {
164 | 			if c == self.c {
165 | 				Recognizer::one()
166 | 			} else {
167 | 				Recognizer::zero()
168 | 			}
169 | 		}
170 | 	}
171 | 
172 | 	#[test]
173 | 	fn basics() {
174 | 		pub fn sym(sample: char) -> Rc<Glu<Recognizer>> {
175 | 			Rc::new(Glu(
176 | 				Recognizer::zero(),
177 | 				Recognizer::zero(),
178 | 				Glui::Sym(Rc::new(SimpleSym { c: sample })),
179 | 			))
180 | 		}
181 | 
182 | 		let cases = [
183 | 			("empty", eps(), "", true),
184 | 			("char", sym('a'), "a", true),
185 | 			("not char", sym('a'), "b", false),
186 | 			("char vs empty", sym('a'), "", false),
187 | 			("left alt", alt(&sym('a'), &sym('b')), "a", true),
188 | 			("right alt", alt(&sym('a'), &sym('b')), "b", true),
189 | 			("neither alt", alt(&sym('a'), &sym('b')), "c", false),
190 | 			("empty alt", alt(&sym('a'), &sym('b')), "", false),
191 | 			("empty rep", rep(&sym('a')), "", true),
192 | 			("sequence", seq(&sym('a'), &sym('b')), "ab", true),
193 | 			("sequence with empty", seq(&sym('a'), &sym('b')), "", false),
194 | 			("bad long sequence", seq(&sym('a'), &sym('b')), "abc", false),
195 | 			("bad short sequence", seq(&sym('a'), &sym('b')), "a", false),
196 | 			("one rep", rep(&sym('a')), "a", true),
197 | 			("short multiple failed rep", rep(&sym('a')), "ab", false),
198 | 			("multiple rep", rep(&sym('a')), "aaaaaaaaa", true),
199 | 			(
200 | 				"multiple rep with failure",
201 | 				rep(&sym('a')),
202 | 				"aaaaaaaaab",
203 | 				false,
204 | 			),
205 | 		];
206 | 
207 | 		for (name, case, sample, result) in &cases {
208 | 			println!("{:?}", name);
209 | 			assert_eq!(accept(case, &sample.to_string()).0, *result);
210 | 		}
211 | 	}
212 | 
213 | 	#[derive(Debug, Clone)]
214 | 	pub struct Parser(HashSet<String>);
215 | 
216 | 	impl Semiring for Parser {
217 | 		fn one() -> Parser {
218 | 			Parser(set!["".to_string()])
219 | 		}
220 | 		fn zero() -> Parser {
221 | 			Parser(set![])
222 | 		}
223 | 		fn is_zero(&self) -> bool {
224 | 			self.0.len() == 0
225 | 		}
226 | 		fn mul(self: &Parser, rhs: &Parser) -> Parser {
227 | 			let mut temp = set![];
228 | 			for i in self.0.iter().cloned() {
229 | 				for j in &rhs.0 {
230 | 					temp.insert(i.clone() + &j);
231 | 				}
232 | 			}
233 | 			Parser(temp)
234 | 		}
235 | 		fn add(self: &Parser, rhs: &Parser) -> Parser {
236 | 			Parser(self.0.union(&rhs.0).cloned().collect())
237 | 		}
238 | 	}
239 | 
240 | 	pub struct ParserSym {
241 | 		c: char,
242 | 	}
243 | 
244 | 	impl Sym<Parser> for ParserSym {
245 | 		fn is(&self, c: char) -> Parser {
246 | 			if c == self.c {
247 | 				Parser(set![c.to_string()])
248 | 			} else {
249 | 				Parser::zero()
250 | 			}
251 | 		}
252 | 	}
253 | 
254 | 	#[test]
255 | 	fn string_basics() {
256 | 		pub fn sym(sample: char) -> Rc<Glu<Parser>> {
257 | 			Rc::new(Glu(
258 | 				Parser::zero(),
259 | 				Parser::zero(),
260 | 				Glui::Sym(Rc::new(ParserSym { c: sample })),
261 | 			))
262 | 		}
263 | 
264 | 		let cases = [
265 | 			("empty", eps(), "", Some("")),
266 | 			("char", sym('a'), "a", Some("a")),
267 | 			("not char", sym('a'), "b", None),
268 | 			("char vs empty", sym('a'), "", None),
269 | 			("left alt", alt(&sym('a'), &sym('b')), "a", Some("a")),
270 | 			("right alt", alt(&sym('a'), &sym('b')), "b", Some("b")),
271 | 			("neither alt", alt(&sym('a'), &sym('b')), "c", None),
272 | 			("empty alt", alt(&sym('a'), &sym('b')), "", None),
273 | 			("empty rep", rep(&sym('a')), "", Some("")),
274 | 			("sequence", seq(&sym('a'), &sym('b')), "ab", Some("ab")),
275 | 			("sequence with empty", seq(&sym('a'), &sym('b')), "", None),
276 | 			("bad long sequence", seq(&sym('a'), &sym('b')), "abc", None),
277 | 			("bad short sequence", seq(&sym('a'), &sym('b')), "a", None),
278 | 			("one rep", rep(&sym('a')), "a", Some("a")),
279 | 			("short multiple failed rep", rep(&sym('a')), "ab", None),
280 | 			(
281 | 				"multiple rep",
282 | 				rep(&sym('a')),
283 | 				"aaaaaaaaa",
284 | 				Some("aaaaaaaaa"),
285 | 			),
286 | 			(
287 | 				"multiple rep with failure",
288 | 				rep(&sym('a')),
289 | 				"aaaaaaaaab",
290 | 				None,
291 | 			),
292 | 		];
293 | 
294 | 		for (name, case, sample, result) in &cases {
295 | 			println!("{:?}", name);
296 | 			let ret = accept(case, &sample.to_string()).0;
297 | 			match result {
298 | 				Some(r) => {
299 | 					let v = ret.iter().next();
300 | 					if let Some(s) = v {
301 | 						assert_eq!(s, sample);
302 | 					} else {
303 | 						panic!("Strings did not match: {:?}, {:?}", r, v);
304 | 					}
305 | 					assert_eq!(1, ret.len());
306 | 				}
307 | 				None => assert_eq!(0, ret.len()),
308 | 			}
309 | 		}
310 | 	}
311 | }
312 | 


--------------------------------------------------------------------------------
/rust/07_heavyweights/Cargo.toml:
--------------------------------------------------------------------------------
 1 | [package]
 2 | name = "heavyweights"
 3 | version = "0.1.0"
 4 | authors = ["Elf M. Sternberg <elf.sternberg@gmail.com>"]
 5 | edition = "2018"
 6 | 
 7 | [dependencies]
 8 | itertools="0.8.0"
 9 | bytes="0.4.11"
10 | 


--------------------------------------------------------------------------------
/rust/07_heavyweights/README.md:
--------------------------------------------------------------------------------
 1 | # Rigged Generic Glushkov Regular Expressions in Rust: The Heavyweight Experiments
 2 | 
 3 | This implementation is significantly different than previous ones,
 4 | although it does build off that work, and it does proceed directly from
 5 | the [Haskell implementation](../../haskell/08_Heavyweights/) and the
 6 | [previous Rust experiment](../../rust/06_riggedglushkov).
 7 | 
 8 | I strongly recommend reading the [Haskell implementation
 9 | README](../../haskell/08_Heavyweights/README.md) to get a sense of the
10 | changes to the algorithm.  They are fairly heavyweight and interesting,
11 | in that the definition of the Semiring has been further abstracted to
12 | handle position information, and as a consequence the input type of the
13 | operation has likewise been abstracted to handle arbitrary input types.
14 | The last was done to enable us to pass both the character being analyzed
15 | and the position in the stream, such that we could record information
16 | about the position under certain circumstances.
17 | 
18 | What makes the *Rust* version of this implementation noteworthy is the
19 | ease with which the inbound data type was changed to handle just about
20 | any kind of data.  It adds a bit of genericizing noise to the source
21 | file, some ceremony that makes me wonder how I could abstract or derive
22 | it automatically.
23 | 
24 | On the other hand, since the `Recognizer` and `Parser` implementations
25 | concretize that their input type is `char` by usage, they work *completely
26 | unchanged* from the previous Rust experiment.  That's remarkable.
27 | 
28 | The implementation of the `Leftlong` Semiring, which reports the
29 | location and length of the first, longest substring match of a capture
30 | group (yes!) is fairly extensive and went through a number of thrashes
31 | before I recalled that I could match on a tuple, at which point the
32 | implementations of `add` and `mul` became straightforward.
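
To illustrate the "match on a tuple" idea, here is a minimal sketch of what a leftmost-longest semiring could look like.  The variant layout (`Fail`, `Empty`, `Range(start, end)`) is an assumption made for this example and may not match the crate's actual `Leftlong` type, which reports location and length; in the crate these operations would be methods of the `Semiring` trait rather than inherent methods.

```rust
/// Hypothetical layout for a leftmost-longest result.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Leftlong {
    Fail,                // zero: no match
    Empty,               // one: a match that consumed nothing
    Range(usize, usize), // a match spanning start..=end
}

impl Leftlong {
    /// Choice: prefer the leftmost match, then the longest.
    pub fn add(&self, rhs: &Leftlong) -> Leftlong {
        match (self, rhs) {
            (Leftlong::Fail, x) | (x, Leftlong::Fail) => *x,
            (Leftlong::Empty, x) | (x, Leftlong::Empty) => *x,
            (Leftlong::Range(i, j), Leftlong::Range(k, l)) => {
                if i < k || (i == k && j >= l) {
                    *self
                } else {
                    *rhs
                }
            }
        }
    }

    /// Sequence: failure is absorbing, `Empty` is the identity, and two
    /// ranges join into one running from the first start to the last end.
    pub fn mul(&self, rhs: &Leftlong) -> Leftlong {
        match (self, rhs) {
            (Leftlong::Fail, _) | (_, Leftlong::Fail) => Leftlong::Fail,
            (Leftlong::Empty, x) | (x, Leftlong::Empty) => *x,
            (Leftlong::Range(i, _), Leftlong::Range(_, l)) => Leftlong::Range(*i, *l),
        }
    }
}
```

Matching on the pair `(self, rhs)` keeps every case of each operation in one flat table, which is what makes the two functions short.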
33 | 
34 | Putting the entirety of the semiring in a single trait makes more sense,
35 | to me at least, than abstracting it further over `num_traits`, as I had
36 | it in earlier versions.  While using `num_traits` is *clever*, it also
37 | forces us to work with `std::ops::Mul` and `std::ops::Add`, which take
38 | their operands by value; for larger and more complex semirings, working
39 | with references made a lot of sense.
40 | 
41 | The implementation of `submatch`, a function that allows us to search
42 | for arbitrary substrings without having to anchor the expression to the
43 | start or end of the string, is interesting; by using the `One()` value,
44 | I'm able to preserve the fact that the search hasn't failed while also
45 | enforcing the notion that we're skipping over things that match `any`
46 | but don't match the concrete sample, which is nifty.
47 | 
48 | All in all, this is highly satisfying work, and it's a pleasure to see
49 | it working so well.
50 | 
51 | ## License
52 | 
53 | As this is entirely my work, it is copyright (c) 2019, and licensed
54 | under the Mozilla Public License v. 2.0.  See the
55 | [LICENSE.md](../../LICENSE.md) in the root directory.
56 | 


--------------------------------------------------------------------------------
/rust/08_riggedbrz/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "riggedbrz"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg <elf.sternberg@gmail.com>"]
5 | edition = "2018"
6 | 
7 | [dependencies]
8 | 


--------------------------------------------------------------------------------
/rust/08_riggedbrz/README.md:
--------------------------------------------------------------------------------
1 | This is the classic implementation of Brzozowski's Algorithm with
2 | Weighted Semirings.  There are very few optimizations in this code.  One
3 | aspect that frustrates me is the use of the `Del()` operator; it's a
4 | holdover from a time when I didn't quite understand the interaction
5 | between regular expressions and semirings; its purpose is to preserve
6 | the results of a lazy parsenull() of the sequence.  Later, we replace
7 | that with a smarter algorithm.
8 | 


--------------------------------------------------------------------------------
/rust/09_riggedbrz/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "riggedbrz"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg <elf.sternberg@gmail.com>"]
5 | edition = "2018"
6 | 
7 | [dependencies]
8 | hashbrown = "0.1.7"
9 | 


--------------------------------------------------------------------------------
/rust/09_riggedbrz/README.md:
--------------------------------------------------------------------------------
 1 | # Rigged Generic Brzozowski μ-Regular Expressions
 2 | 
 3 | This experiment realizes an important and significant step in the
 4 | series of experiments.  In this variant, the `*` *operation* has been
 5 | completely removed; the `*` *operator* is now implemented as a recursive
 6 | definition:
 7 | 
 8 |     // ONE is the identity element under concatenation
 9 |     r* = alt(eps(ONE), seq(r, r*))
10 | 
11 | This may seem nonsensical to a programmer, but it's actually quite
12 | implementable in a language that allows mutation under limited effects.
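
One way to picture the recursive definition is as "tying the knot": allocate a placeholder node, build the body so that it points back at the placeholder, then overwrite the placeholder in place.  The sketch below assumes a hypothetical `Re` node type behind `Rc<RefCell<..>>`; it is not the crate's real definition, and it ignores the semiring weights entirely.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical node type for illustration only.
pub enum Re {
    Empty, // matches nothing; doubles as the placeholder below
    Eps,   // matches the empty string
    Sym(char),
    Alt(Node, Node),
    Seq(Node, Node),
}

pub type Node = Rc<RefCell<Re>>;

pub fn node(r: Re) -> Node {
    Rc::new(RefCell::new(r))
}

/// Build `r*` as the cyclic graph `star = alt(eps, seq(r, star))`.
pub fn star(r: &Node) -> Node {
    // 1. Allocate a placeholder so there is something to point at.
    let star = node(Re::Empty);
    // 2. Build the body, which refers back to the placeholder.
    let body = Re::Alt(node(Re::Eps), node(Re::Seq(r.clone(), star.clone())));
    // 3. Overwrite the placeholder in place: the one controlled mutation.
    *star.borrow_mut() = body;
    star
}

/// True if the `Seq` inside `star` points back at `star` itself.
pub fn loops_back(star: &Node) -> bool {
    match &*star.borrow() {
        Re::Alt(_, rhs) => match &*rhs.borrow() {
            Re::Seq(_, again) => Rc::ptr_eq(star, again),
            _ => false,
        },
        _ => false,
    }
}
```

Note that a strong `Rc` cycle like this one is never freed; a longer-lived implementation would want `Weak` back-references or an arena.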
13 | 
14 | This means that I now have working *μ-regular expressions*: regular
15 | expressions that use a fixpoint operator to identify the least
16 | fixpoint of a recursive regular expression, which allows regular
17 | expressions to be encapsulated as variables and composed just as one
18 | would compose ordinary functions.
19 | 
20 | This effort took a *large* number of evolutions, as I went down various
21 | paths trying to write code faster than I could think or understand.  At
22 | least twice I had to delete the work in progress and back up to an
23 | earlier commit, throwing away hours of work.
24 | 
25 | But this is *it*, for some definition of "it."  This is what everything
26 | has been working up to.
27 | 
28 | ## Understanding this code
29 | 
30 | The first thing to appreciate is that `nullable()` is now a
31 | self-terminating recursive implementation.  At its core is the same
32 | `nullable()` instruction we've been using for a while now, but now when we
33 | determine the nullability of a node in the expression, we cache that
34 | value and we notify all of its parent nodes that they may also be able
35 | to determine nullability.  This is useful for cases such as the Alt(),
36 | which has two children: if either one is determined to be nullable, then
37 | the Alt() itself is nullable, and it's now possible to mark (cache)
38 | that the entire expression is always nullable.  And that's useful if the
39 | expression is going to be re-used.
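
For comparison, here is a deliberately simpler way to get the same answer on a cyclic expression: plain fixed-point iteration rather than the cache-and-notify-parents scheme described above.  It assumes a hypothetical arena representation in which nodes refer to each other by index, so cycles need no special handling.

```rust
/// Hypothetical arena node; children are indices into the same slice.
#[derive(Clone, Copy)]
pub enum Re {
    Empty,
    Eps,
    Sym(char),
    Alt(usize, usize),
    Seq(usize, usize),
}

/// Least-fixed-point nullability for every node in `nodes`.
pub fn nullable(nodes: &[Re]) -> Vec<bool> {
    // Start from "nothing is nullable" and only ever flip false -> true,
    // so the loop must terminate even when the graph contains cycles.
    let mut null = vec![false; nodes.len()];
    let mut changed = true;
    while changed {
        changed = false;
        for (i, n) in nodes.iter().enumerate() {
            let now = match *n {
                Re::Empty | Re::Sym(_) => false,
                Re::Eps => true,
                Re::Alt(l, r) => null[l] || null[r],
                Re::Seq(l, r) => null[l] && null[r],
            };
            if now && !null[i] {
                null[i] = true;
                changed = true;
            }
        }
    }
    null
}
```

With nodes `[Sym('a'), Eps, Seq(0, 3), Alt(1, 2)]`, node 3 is `a*` written recursively and comes out nullable, while a purely left-recursive node such as `Seq(4, 0)` at index 4 stays non-nullable.  The parent-notification approach avoids rescanning every node on each pass.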
40 | 
41 | And in this code, expressions are composable, re-usable elements of
42 | code.
43 | 
44 | Also of note: the "mutate" codes are implementations of the short-outs
45 | described in Might's last paper on the topic; take a node, take its
46 | inputs, and determine if one of those inputs is already null or empty;
47 | if that's the case, then a *different* node must be expressed in that
48 | position, one that's simpler and faster.
49 | 
50 | ## License
51 | 
52 | As this is entirely my work, it is copyright (c) 2019, and licensed
53 | under the Mozilla Public License v. 2.0.  See the
54 | [LICENSE.md](../../LICENSE.md) in the root directory.
55 | 


--------------------------------------------------------------------------------