├── .gitattributes
├── .gitignore
├── LICENSE.md
├── README.md
├── docs
    ├── 01_Regex_in_Typescript.md
    ├── 01_What_Is_A_Regular_Expression.md
    ├── 02_Finite_Automata.md
    ├── A_Play_03.md
    ├── A_Play_04.md
    ├── DFA1.png
    ├── NFA1.png
    ├── notes.md
    ├── paper.css
    └── summary.md
├── haskell
    ├── 01_SimpleRegex
    │   ├── LICENSE
    │   ├── README.md
    │   ├── SimpleRegex.cabal
    │   ├── package.yaml
    │   ├── src
    │   │   └── SimpleRegex.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    ├── 02_RiggedRegex
    │   ├── LICENSE
    │   ├── README.md
    │   ├── RiggedRegex.cabal
    │   ├── package.yaml
    │   ├── src
    │   │   └── RiggedRegex.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    ├── 03_Brzozowski
    │   ├── BrzExp.cabal
    │   ├── LICENSE
    │   ├── README.md
    │   ├── Setup.hs
    │   ├── package.yaml
    │   ├── src
    │   │   └── BrzExp.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    ├── 04_Gluskov
    │   ├── Glushkov.cabal
    │   ├── LICENSE
    │   ├── README.md
    │   ├── package.yaml
    │   ├── src
    │   │   └── Glushkov.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    ├── 05_RiggedBrz
    │   ├── LICENSE
    │   ├── README.md
    │   ├── RiggedBrz.cabal
    │   ├── Setup.hs
    │   ├── package.yaml
    │   ├── src
    │   │   └── RiggedBrz.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    ├── 06_RiggedRegex_Combinator
    │   ├── LICENSE
    │   ├── README.md
    │   ├── RiggedRegex.cabal
    │   ├── package.yaml
    │   ├── src
    │   │   └── RiggedRegex.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    ├── 07_Rigged_Glushkov
    │   ├── LICENSE
    │   ├── README.md
    │   ├── RiggedGlushkov.cabal
    │   ├── package.yaml
    │   ├── src
    │   │   └── RiggedGlushkov.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    ├── 08_Heavyweights
    │   ├── Heavyweights.cabal
    │   ├── LICENSE
    │   ├── LICENSE.md
    │   ├── README.md
    │   ├── package.yaml
    │   ├── src
    │   │   └── Heavyweights.hs
    │   ├── stack.yaml
    │   └── test
    │   │   └── Tests.hs
    └── 09_Classed_Brzozowski
    │   ├── BrzExp.cabal
    │   ├── LICENSE
    │   ├── README.md
    │   ├── Setup.hs
    │   ├── package.yaml
    │   ├── src
    │       └── BrzExp.hs
    │   ├── stack.yaml
    │   └── test
    │       └── Tests.hs
├── node
    └── 01_Kleene.ts
├── python
    ├── 01_rigged_brzozowski.py
    ├── 02_rigged_brzozowski.py
    └── README.md
└── rust
    ├── 01_simpleregex
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    ├── 02_riggedregex
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    ├── 03_brzozowski_1
        ├── .gitignore
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    ├── 04_brzozowski_2
        ├── .gitignore
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    ├── 05_glushkov
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    ├── 06_riggedglushkov
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    ├── 07_heavyweights
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    ├── 08_riggedbrz
        ├── Cargo.toml
        ├── README.md
        └── src
        │   └── lib.rs
    └── 09_riggedbrz
        ├── Cargo.toml
        ├── README.md
        └── src
            └── lib.rs
/.gitattributes:
--------------------------------------------------------------------------------
1 | haskell/* linguist-vendored
2 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .#*
2 | *~
3 | *#
4 | *.aux
5 | cabal-dev
6 | cabal.project.local
7 | cabal.project.local~
8 | .cabal-sandbox/
9 | cabal.sandbox.config
10 | Cargo.lock
11 | *.chi
12 | *.chs.h
13 | dist
14 | dist-*
15 | *.dyn_hi
16 | *.dyn_o
17 | *.eventlog
18 | .ghc.environment.*
19 | *.hi
20 | *.hp
21 | .hpc
22 | .hsenv
23 | .HTF/
24 | *.o
25 | *.prof
26 | **/*.rs.bk
27 | .stack-work/
28 | rust/**/target
29 | *.pyc
30 | *.pyo
31 |
--------------------------------------------------------------------------------
/docs/01_What_Is_A_Regular_Expression.md:
--------------------------------------------------------------------------------
1 | # What is a regular expression?
2 |
3 | So what *is* a regular expression? Let's build up from the bottom: we
4 | start with:
5 |
6 | Alphabet
7 | : An alphabet is a set of symbols (or we can call them letters)
8 |
9 | Word
10 | : A word is a sequence of symbols from an alphabet
11 |
12 | Language
13 | : A language is a set of words
14 |
15 | If our alphabet is ASCII and words are English, then a very simple
16 | language would be something like
17 |
18 | Common_Pets: {dog, cat, fish, hamster, parakeet}.
19 |
20 | Stephen Cole Kleene proposed a formal definition for "regular languages"
21 | in 1956, and what we have developed since then is a series of
22 | refinements that allow us to parse regular languages in something like
23 | linear time. Kleene's operations were meant to *generate* languages,
24 | and the research program since that time has been to turn generators
25 | into recognizers. But let's start with Kleene's generators.
26 |
27 | ## Regular Languages
28 |
29 | There are six basic operators in a regular language, and each of them is
30 | itself a regular language. The first three are the base languages,
31 | encoding the "zero," "one," and "element" of the regular language, and
32 | the second three are composite languages; they contain other regular
33 | languages (including other composites) to describe a complete
34 | generator.
35 |
36 | Given an alphabet, `A`, we can say:
37 |
38 | `L[[∅]] = ∅`
39 | : A language that contains nothing is made up of nothing.
40 |
41 | `L[[ε]] = {ε}`
42 | : A language containing only the empty string can only
43 | generate empty strings.
44 |
45 | `L[[a]] = {a}`
46 | : A language containing only the letter 'a' can only generate a single
47 | instance of the letter 'a'. (This is true for all letters in the
48 | alphabet.)
49 |
50 | `L[[r · s]] = {u · v | u ∈ L[[r]] and v ∈ L[[s]]}`
51 | : A composite language made up of the *sequence* of two other regular
52 | expressions `r` and `s` can generate the concatenation `uv` for every `u`
53 | generated by `r` and every `v` generated by `s`.
54 |
55 | `L[[r | s]] = L[[r]] ∪ L[[s]]`
56 | : A composite language made up of the *alternatives* of two other
57 | regular expressions `r` and `s` can generate either the strings of `r`
58 | or the strings of `s`, or both!
59 |
60 | `L[[r∗]] = {ε} ∪ L[[r · r*]]`
61 | : A composite language that repeats `r` zero or more times can generate
62 | zero or more instances of the strings generated by `r`.
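
The six definitions above can be read directly as a generator. What follows is a minimal sketch, not code from this repository: the names `Reg`, `generate`, and `concat` are made up for illustration, and because `r*` describes an infinite set, the sketch assumes a cut-off `max_len` and only produces strings up to that length.

```rust
use std::collections::BTreeSet;
use std::iter::once;

/// The six forms of a regular language (hypothetical names).
#[derive(Debug, Clone)]
enum Reg {
    Empty,                   // ∅: generates nothing
    Eps,                     // ε: generates only the empty string
    Sym(char),               // a: generates the one-letter string "a"
    Seq(Box<Reg>, Box<Reg>), // r · s
    Alt(Box<Reg>, Box<Reg>), // r | s
    Rep(Box<Reg>),           // r*
}

/// Every string of at most `max_len` characters that `r` can generate.
fn generate(r: &Reg, max_len: usize) -> BTreeSet<String> {
    match r {
        Reg::Empty => BTreeSet::new(),
        Reg::Eps => once(String::new()).collect(),
        Reg::Sym(c) => {
            if max_len >= 1 {
                once(c.to_string()).collect()
            } else {
                BTreeSet::new()
            }
        }
        Reg::Seq(p, q) => concat(&generate(p, max_len), &generate(q, max_len), max_len),
        Reg::Alt(p, q) => generate(p, max_len)
            .union(&generate(q, max_len))
            .cloned()
            .collect(),
        Reg::Rep(p) => {
            // {ε} ∪ L[[r · r*]], unrolled until no new string fits the bound.
            let base = generate(p, max_len);
            let mut all: BTreeSet<String> = once(String::new()).collect();
            loop {
                let next: BTreeSet<String> =
                    all.union(&concat(&base, &all, max_len)).cloned().collect();
                if next.len() == all.len() {
                    return all;
                }
                all = next;
            }
        }
    }
}

/// All concatenations `uv` with `u` from `us` and `v` from `vs`, within the bound.
fn concat(us: &BTreeSet<String>, vs: &BTreeSet<String>, max_len: usize) -> BTreeSet<String> {
    let mut out = BTreeSet::new();
    for u in us {
        for v in vs {
            if u.chars().count() + v.chars().count() <= max_len {
                out.insert(format!("{}{}", u, v));
            }
        }
    }
    out
}
```

Under these assumptions, `generate` applied to `(a|b)*` with `max_len` of 2 yields the empty string, the two one-letter strings, and the four two-letter strings over `a` and `b`.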
63 |
64 | ## Regular Expressions
65 |
66 | What we usually think of as "regular expressions" are in fact a small
67 | programming language designed to be parsed and to internally generate a
68 | function that recognizes whether or not a string "belongs to" the sets
69 | of strings described by a Kleene Algebra. Programmatically, a regular
70 | expression is a function that takes a regular language and a string, and
71 | returns back a boolean value indicating whether or not the string
72 | belongs to the set of strings described by the regular language.
73 |
74 | Regular expressions take Kleene's algebra and turn it backwards, asking
75 | "Can this given string be generated by an expression in Kleene's
76 | algebra?" In both the Rust and Haskell branches you'll find the
77 | SimpleRegex implementations, which take this quite literally. The
78 | Haskell version is the most concise; it literally encodes Kleene's five
79 | generative operations (the language of null doesn't generate anything)
80 | and *all possible combinations of `r` and `s` for any given composite
81 | language* and then tests all those combinations to see if the expression
82 | generated any of them.
83 |
84 | This is, of course, inexcusably slow. For any string of length `n`, the
85 | number of comparisons done, thanks mostly to the Sequence composite, is
86 | `2^(n-1)` operations. For a string of 8 letters, that's 128
87 | different combinations of strings that have to be matched, and on my
88 | fairly modern laptop that takes a little longer than 20 seconds.
89 | Increase that to 15 letters and you'll be waiting almost an hour.
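
The exponential growth can be checked with a little arithmetic: a string of length `n` has `n - 1` gaps between letters, and each gap is either a cut or not, giving `2^(n-1)` ways to break the string into consecutive non-empty pieces. The sketch below (the function name `count_parts` is hypothetical, not part of this repository) counts them by recurrence instead, picking the length of the first piece and recursing on the rest.

```rust
/// Number of ways to cut a string of length `n` into consecutive,
/// non-empty pieces.
fn count_parts(n: usize) -> u64 {
    // counts[k] holds the answer for a string of length k.
    // The empty string has exactly one partition: no pieces at all.
    let mut counts: Vec<u64> = vec![1];
    for k in 1..=n {
        // Choose the length of the first piece, then cut up the remainder.
        let total: u64 = (1..=k).map(|first| counts[k - first]).sum();
        counts.push(total);
    }
    counts[n]
}
```

For 8 letters the recurrence gives 128 partitions, and for 15 letters it gives 16,384, which is `2^7` and `2^14` respectively.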
90 |
91 | The entirety of the modern parsing research program has been to make
92 | this faster and easier to use. There have been many attempts, and this
93 | project isn't meant to break new ground; instead, its goal is to take
94 | promising results from a variety of different academic research projects
95 | and explore whether there's anything new and interesting that we can
96 | exploit in a modern systems language like Rust or C++.
97 |
98 |
--------------------------------------------------------------------------------
/docs/A_Play_03.md:
--------------------------------------------------------------------------------
1 | In the [last
2 | post](https://elfsternberg.com/2019/01/23/a-play-on-regular-expressions-part-2/)
3 | on "[A Play on Regular
4 | Expressions](https://www-ps.informatik.uni-kiel.de/~sebf/pub/regexp-play.html),"
5 | I showed how we go from a boolean regular expression to a "rigged" one;
6 | one that uses an arbitrary data structure to extract data from the
7 | process of recognizing regular expressions. The data structure must
8 | conform to a set of mathematical laws (the
9 | [semiring](https://en.wikipedia.org/wiki/Semiring) laws), but that
10 | simple requirement led us to some surprisingly robust results.
11 |
12 | Now, the question is: Can we port this to Rust?
13 |
14 | Easily.
15 |
16 | The first thing to do, however, is to *not* implement a Semiring. A
17 | Semiring is a conceptual item, and in Rust it turns out that you can get
18 | away without defining a Semiring as a trait; instead, it's a collection
19 | of traits derived from the `num_traits` crate: `Zero, zero, One, one`;
20 | the capitalized versions are the traits, and the lower case ones are the
21 | implementations we have to provide.
22 |
23 | I won't post the entire code here, but you can check it out in [Rigged
24 | Kleene Regular Expressions in
25 | Rust](https://github.com/elfsternberg/riggedregex/tree/master/rust/02_riggedregex).
26 | Here are a few highlights:
27 |
28 | The `accept()` function for the Haskell version looked like this:
29 |
30 |     acceptw :: Semiring s => Regw c s -> [c] -> s
31 |     acceptw Epsw u       = if null u then one else zero
32 |     acceptw (Symw f) u   = case u of [c] -> f c; _ -> zero
33 |     acceptw (Altw p q) u = acceptw p u `add` acceptw q u
34 |     acceptw (Seqw p q) u = sumr [ acceptw p u1 `mul` acceptw q u2 | (u1, u2) <- split u ]
35 |     acceptw (Repw r) u   = sumr [ prodr [ acceptw r ui | ui <- ps ] | ps <- parts u ]
36 |
37 | The `accept()` function in Rust looks almost the same:
38 |
39 |     pub fn acceptw<S>(r: &Regw<S>, s: &[char]) -> S
40 |         where S: Zero + One
41 |     {
42 |         match r {
43 |             Regw::Eps => if s.is_empty() { one() } else { zero() },
44 |             Regw::Sym(c) => if s.len() == 1 { c(s[0]) } else { zero() },
45 |             Regw::Alt(r1, r2) => S::add(acceptw(&r1, s), acceptw(&r2, s)),
46 |             Regw::Seq(r1, r2) => split(s)
47 |                 .into_iter()
48 |                 .map(|(u1, u2)| acceptw(r1, &u1) * acceptw(r2, &u2))
49 |                 .fold(S::zero(), sumr),
50 |             Regw::Rep(r) => parts(s)
51 |                 .into_iter()
52 |                 .map(|ps| ps.into_iter().map(|u| acceptw(r, &u)).fold(S::one(), prod))
53 |                 .fold(S::zero(), sumr)
54 |         }
55 |     }
56 |
57 | There's a bit more machinery here to support the `sum`-over and
58 | `product`-over maps. There's also the `where S: Zero + One` clause,
59 | which tells us that our Semiring must be something that understands
60 | those two notions and has implementations for them.
61 |
62 | To restore our boolean version of our engine, we have to build a nominal
63 | container that supports the various traits of our semiring. To do that,
64 | we need to implement the methods associated with `Zero`, `One`, `Mul`,
65 | and `Add`, and explain what they mean to the datatype of our semiring.
66 | The actual work is straightforward.
67 |
68 |     pub struct Recognizer(bool);
69 |
70 |     impl Zero for Recognizer {
71 |         fn zero() -> Recognizer { Recognizer(false) }
72 |         fn is_zero(&self) -> bool { !self.0 }
73 |     }
74 |
75 |     impl One for Recognizer {
76 |         fn one() -> Recognizer { Recognizer(true) }
77 |     }
78 |
79 |     impl Mul for Recognizer {
80 |         type Output = Recognizer;
81 |         fn mul(self, rhs: Recognizer) -> Recognizer { Recognizer(self.0 && rhs.0) }
82 |     }
83 |
84 |     impl Add for Recognizer {
85 |         type Output = Recognizer;
86 |         fn add(self, rhs: Recognizer) -> Recognizer { Recognizer(self.0 || rhs.0) }
87 |     }
88 |
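For comparison, here is a std-only sketch of what the four operations have to provide. It is not how the repository does it: the `Semiring` trait, `sumr`, `prodr`, and `distributes` below are hypothetical names that bundle the `Zero`, `One`, `Add`, and `Mul` roles into one place so the sketch needs no external crate. The boolean instance recognizes; the `u64` instance would count the number of ways a match can succeed.

```rust
/// A hypothetical stand-in for the `Zero + One + Add + Mul` bundle.
trait Semiring: Copy + PartialEq {
    fn zero() -> Self;
    fn one() -> Self;
    fn add(self, rhs: Self) -> Self;
    fn mul(self, rhs: Self) -> Self;
}

/// Recognition: "or" is addition, "and" is multiplication.
impl Semiring for bool {
    fn zero() -> Self { false }
    fn one() -> Self { true }
    fn add(self, rhs: Self) -> Self { self || rhs }
    fn mul(self, rhs: Self) -> Self { self && rhs }
}

/// Counting: ordinary arithmetic on unsigned integers.
impl Semiring for u64 {
    fn zero() -> Self { 0 }
    fn one() -> Self { 1 }
    fn add(self, rhs: Self) -> Self { self + rhs }
    fn mul(self, rhs: Self) -> Self { self * rhs }
}

/// Sum-over: fold with `add`, starting from `zero`.
fn sumr<S: Semiring>(xs: &[S]) -> S {
    xs.iter().fold(S::zero(), |acc, &x| acc.add(x))
}

/// Product-over: fold with `mul`, starting from `one`.
fn prodr<S: Semiring>(xs: &[S]) -> S {
    xs.iter().fold(S::one(), |acc, &x| acc.mul(x))
}

/// One of the semiring laws: multiplication distributes over addition.
fn distributes<S: Semiring>(a: S, b: S, c: S) -> bool {
    a.mul(b.add(c)) == a.mul(b).add(a.mul(c))
}
```

The empty sum is `zero` and the empty product is `one`, which is exactly what makes `Rep` accept the empty string in the weighted `acceptw`.
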
89 | Also, unlike Haskell, Rust must be told explicitly what kind of Semiring
90 | will be used before processing. Haskell sees what kind of Semiring you
91 | need to produce the processed result and hooks up the machinery for
92 | you, so the difference is not surprising. In Rust, you "lift" a
93 | straight expression to a rigged one thusly:
94 |
95 |     let rigged: Regw<Recognizer> = rig(&evencs);
96 |
97 | All in all, porting the Haskell to Rust was extremely straightforward.
98 | The code looks remarkably similar, but for one detail. In the Kleene
99 | version of regular expressions we're emulating as closely as possible
100 | the "all possible permutations of our input string" implicit in the
101 | set-theoretic language of Kleene's 1956 paper. That slows us down a
102 | lot, but in Haskell the code for doing it was extremely straightforward,
103 | with two simple functions to create all possible permutations for both
104 | the sequence and repetition options:
105 |
106 |     split []     = [([], [])]
107 |     split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs]
108 |     parts []     = [[]]
109 |     parts [c]    = [[[c]]]
110 |     parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
111 |
112 | In Rust, these two functions were 21 and 29 lines long, respectively.
113 | Rust demands that you pay attention to memory usage, and its rules
114 | require you to be very explicit about when you want memory,
115 | so Rust knows exactly when you no longer want it and can release it back
116 | to the allocator.
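
To give a feel for what the port involves, here is a compact sketch of the two helpers in Rust. It is not the 21- and 29-line code from the repository: it is a fresh transliteration of the Haskell above, and the generic signatures over `T: Clone` are an assumption made for illustration. Every `to_vec`, `clone`, and `extend_from_slice` is a place where Rust makes the allocation visible.

```rust
/// Every way to cut `s` into a prefix and a suffix.
fn split<T: Clone>(s: &[T]) -> Vec<(Vec<T>, Vec<T>)> {
    (0..=s.len())
        .map(|i| (s[..i].to_vec(), s[i..].to_vec()))
        .collect()
}

/// Every way to cut `s` into consecutive, non-empty pieces.
fn parts<T: Clone>(s: &[T]) -> Vec<Vec<Vec<T>>> {
    match s.split_first() {
        // parts [] = [[]]
        None => vec![vec![]],
        // parts [c] = [[[c]]]
        Some((c, rest)) if rest.is_empty() => vec![vec![vec![c.clone()]]],
        // parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
        Some((c, rest)) => {
            let mut out = Vec::new();
            for tail in parts(rest) {
                // `rest` is non-empty here, so every partition has a first piece.
                let (p, ps) = tail.split_first().expect("non-empty partition");
                // Either glue `c` onto the first piece ...
                let mut glued = vec![c.clone()];
                glued.extend_from_slice(p);
                let mut joined = vec![glued];
                joined.extend_from_slice(ps);
                out.push(joined);
                // ... or let `c` stand alone in front of it.
                let mut alone = vec![vec![c.clone()]];
                alone.extend_from_slice(&tail);
                out.push(alone);
            }
            out
        }
    }
}
```

The results come out in the same order as the Haskell list comprehensions produce them, so the two versions can be compared side by side on small inputs.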
117 |
118 | Rust's syntax and support are amazing, and the way Haskell can be ported
119 | to Rust with little to no loss of fidelity makes me happy to work in
120 | both.
121 |
--------------------------------------------------------------------------------
/docs/DFA1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/elfsternberg/riggedregex/065a5741807071a55b0e22c0c1ed8a1dfcb1abec/docs/DFA1.png
--------------------------------------------------------------------------------
/docs/NFA1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/elfsternberg/riggedregex/065a5741807071a55b0e22c0c1ed8a1dfcb1abec/docs/NFA1.png
--------------------------------------------------------------------------------
/docs/notes.md:
--------------------------------------------------------------------------------
1 | Owens and Reppy did a much better job than I originally thought. They
2 | use the tilde to mean "is recognized by," as in "r ~ u" means "`r`
3 | *recognizes* the string `u`".
4 |
5 | Following on the nullability issue, r ~ ε ⇔ ν(r) = ε, r ~ aw ⇔ δ(a)r ~ w
6 | (`r` recognizes `aw` if the derivative of r with respect to `a`
7 | recognizes only `w`).
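
The two rules above are enough for a complete, if unoptimized, matcher. The following is a rough sketch under a few assumptions: the names `Re`, `nullable`, `derive`, and `recognizes` are mine, ν is collapsed to a boolean rather than returning ε or ∅, and none of the simplification or weak-equivalence rules from the paper are applied, so the residual expression simply grows with each character.

```rust
#[derive(Debug, Clone)]
enum Re {
    Empty,                 // ∅
    Eps,                   // ε
    Sym(char),             // a single letter
    Seq(Box<Re>, Box<Re>), // r · s
    Alt(Box<Re>, Box<Re>), // r + s
    Rep(Box<Re>),          // r*
}

/// ν(r) as a boolean: true when `r` recognizes the empty string.
fn nullable(r: &Re) -> bool {
    match r {
        Re::Empty | Re::Sym(_) => false,
        Re::Eps | Re::Rep(_) => true,
        Re::Seq(p, q) => nullable(p) && nullable(q),
        Re::Alt(p, q) => nullable(p) || nullable(q),
    }
}

/// δ(a)r: an expression for whatever may follow `a` in the strings of `r`.
fn derive(a: char, r: &Re) -> Re {
    match r {
        Re::Empty | Re::Eps => Re::Empty,
        Re::Sym(c) => {
            if *c == a { Re::Eps } else { Re::Empty }
        }
        Re::Alt(p, q) => Re::Alt(Box::new(derive(a, p)), Box::new(derive(a, q))),
        Re::Seq(p, q) => {
            let left = Re::Seq(Box::new(derive(a, p)), q.clone());
            if nullable(p) {
                // `p` may match nothing, so `a` may also be consumed by `q`.
                Re::Alt(Box::new(left), Box::new(derive(a, q)))
            } else {
                left
            }
        }
        Re::Rep(p) => Re::Seq(Box::new(derive(a, p)), Box::new(r.clone())),
    }
}

/// r ~ ε ⇔ ν(r);  r ~ aw ⇔ δ(a)r ~ w
fn recognizes(r: &Re, s: &str) -> bool {
    let residual = s.chars().fold(r.clone(), |acc, a| derive(a, &acc));
    nullable(&residual)
}
```

Each character of the input replaces the current expression with its derivative, and the string is accepted when the final residual expression is nullable.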
8 |
9 | r ≡ s (r is equivalent to s) if 𝓛⟦r⟧ = 𝓛⟦s⟧. Note that this is
10 | "equivalence" in the set-theoretic sense of a binary equivalence
11 | relation. That is, if the elements of some set S have an equivalence
12 | notion, then the set S can be split into *equivalence classes*.
13 |
14 | 1. At each step we have a residual regular expression `r` for the
15 | residual string `s`
16 |
17 | 2. Instead of computing the derivative on the fly, we precompute the
18 | derivative of `r` for each symbol in our alphabet `Σ`, thereby
19 | constructing a DFA for the language in `r`.
20 |
21 | 3. Computing equivalence can be expensive
22 |
23 | 4. It is not practical to iterate over every Unicode codepoint for
24 | each state.
25 |
26 | 5. A scanner-generator takes a collection of REs, not just one.
27 |
28 | Owens & Reppy introduce a notion of *weak equivalence*, which is a set
29 | of rules for harmonizing some regular expression equivalents. These
30 | look a lot like some of the performance optimizations found in Might &
31 | Adams.
32 |
33 | They define a *class*, **S**, where **S** ⊆ Σ. **S** covers both the
34 | empty set and the single character set, as well as a multi-character
35 | *class*.
36 |
37 | They then add equivalence expressions: **R** + **S** ≈ **T** where
38 | T = R ∪ S. (Note that this works for *recognition*. But what about more
39 | complex operations?)
40 |
41 | We say that a and b are equivalent in r only if δ(a)r ≡ δ(b)r.
42 |
43 | r = a + b · a + c
44 |
45 | (Do we read this "a OR ba OR c" or "(a or b)(a or c)". If we read it
46 | the first way, then this makes sense: the equivalence classes for r
47 | produce three possible derivatives: {a, c}, transition to `ε`; {b},
48 | transition to `a`; or Σ\{a,b,c}, which is the alphabet that excludes a,
49 | b, or c, and transitions to `⊘`.)
50 |
51 | All right, having gotten that out of the way, we say that i ≅ᵣ j (the
52 | derivative class of r(i) is equivalent to the derivative class of r(j))
53 | if `δᵢr ≡ δⱼr`.
54 |
55 |     fun goto q (S, (Q, δ)) =
56 |       let c ∈ S
57 |       let q_c = ∂_c q
58 |       in
59 |         if ∃q′ ∈ Q such that q′ ≈ q_c
60 |         then (Q, δ ∪ {(q, S) ↦ q′})
61 |         else
62 |           let Q′ = Q ∪ {q_c}
63 |           let δ′ = δ ∪ {(q, S) ↦ q_c}
64 |           in explore (Q′, δ′, q_c)
65 |
66 |     let explore (Q, δ, q) = fold (goto q) (Q, δ) (C(q))
67 |
68 |     fun mkDFA r =
69 |       let q₀ = ∂_ε r
70 |       let (Q, δ) = explore ({q₀}, {}, q₀)
71 |       let F = {q | q ∈ Q and ν(q) = ε}
72 |       in ⟨Q, q₀, F, δ⟩
73 |
74 |
75 |
76 |
77 |
78 |
--------------------------------------------------------------------------------
/docs/paper.css:
--------------------------------------------------------------------------------
1 | /*
2 | * I add this to html files generated with pandoc.
3 | */
4 |
5 | html {
6 | font-size: 100%;
7 | overflow-y: scroll;
8 | -webkit-text-size-adjust: 100%;
9 | -ms-text-size-adjust: 100%;
10 | }
11 |
12 | body {
13 | color: #444;
14 | font-family: "Helvetica Neue", "Helvetica", "Arial", sans-serif;
15 | font-size: 14px;
16 | line-height: 1.7;
17 | padding: 1em;
18 | margin: auto;
19 | max-width: 42em;
20 | background: #fefefe;
21 | }
22 |
23 | a {
24 | color: #0645ad;
25 | text-decoration: none;
26 | }
27 |
28 | a:visited {
29 | color: #0b0080;
30 | }
31 |
32 | a:hover {
33 | color: #06e;
34 | }
35 |
36 | a:active {
37 | color: #faa700;
38 | }
39 |
40 | a:focus {
41 | outline: thin dotted;
42 | }
43 |
44 | *::-moz-selection {
45 | background: rgba(255, 255, 0, 0.3);
46 | color: #000;
47 | }
48 |
49 | *::selection {
50 | background: rgba(255, 255, 0, 0.3);
51 | color: #000;
52 | }
53 |
54 | a::-moz-selection {
55 | background: rgba(255, 255, 0, 0.3);
56 | color: #0645ad;
57 | }
58 |
59 | a::selection {
60 | background: rgba(255, 255, 0, 0.3);
61 | color: #0645ad;
62 | }
63 |
64 | p {
65 | margin: 1em 0;
66 | }
67 |
68 | img {
69 | max-width: 100%;
70 | }
71 |
72 | h1, h2, h3, h4, h5, h6 {
73 | color: #111;
74 | line-height: 125%;
75 | margin-top: 2em;
76 | font-weight: normal;
77 | }
78 |
79 | h4, h5, h6 {
80 | font-weight: bold;
81 | }
82 |
83 | h1 {
84 | font-size: 2.5em;
85 | }
86 |
87 | h2 {
88 | font-size: 2em;
89 | }
90 |
91 | h3 {
92 | font-size: 1.5em;
93 | }
94 |
95 | h4 {
96 | font-size: 1.2em;
97 | }
98 |
99 | h5 {
100 | font-size: 1em;
101 | }
102 |
103 | h6 {
104 | font-size: 0.9em;
105 | }
106 |
107 | blockquote {
108 | color: #666666;
109 | margin: 0;
110 | padding-left: 3em;
111 | border-left: 0.5em #EEE solid;
112 | }
113 |
114 | hr {
115 | display: block;
116 | height: 2px;
117 | border: 0;
118 | border-top: 1px solid #aaa;
119 | border-bottom: 1px solid #eee;
120 | margin: 1em 0;
121 | padding: 0;
122 | }
123 |
124 | pre, code, kbd, samp {
125 | color: #000;
126 | font-family: monospace, monospace;
127 | _font-family: 'courier new', monospace;
128 | font-size: 0.98em;
129 | }
130 |
131 | pre {
132 | white-space: pre;
133 | white-space: pre-wrap;
134 | word-wrap: break-word;
135 | }
136 |
137 | b, strong {
138 | font-weight: bold;
139 | }
140 |
141 | dfn {
142 | font-style: italic;
143 | }
144 |
145 | ins {
146 | background: #ff9;
147 | color: #000;
148 | text-decoration: none;
149 | }
150 |
151 | mark {
152 | background: #ff0;
153 | color: #000;
154 | font-style: italic;
155 | font-weight: bold;
156 | }
157 |
158 | sub, sup {
159 | font-size: 75%;
160 | line-height: 0;
161 | position: relative;
162 | vertical-align: baseline;
163 | }
164 |
165 | sup {
166 | top: -0.5em;
167 | }
168 |
169 | sub {
170 | bottom: -0.25em;
171 | }
172 |
173 | ul, ol {
174 | margin: 1em 0;
175 | padding: 0 0 0 2em;
176 | }
177 |
178 | li p:last-child {
179 | margin-bottom: 0;
180 | }
181 |
182 | ul ul, ol ol {
183 | margin: .3em 0;
184 | }
185 |
186 | dl {
187 | margin-bottom: 1em;
188 | }
189 |
190 | dt {
191 | font-weight: bold;
192 | margin-bottom: .8em;
193 | }
194 |
195 | dd {
196 | margin: 0 0 .8em 2em;
197 | }
198 |
199 | dd:last-child {
200 | margin-bottom: 0;
201 | }
202 |
203 | img {
204 | border: 0;
205 | -ms-interpolation-mode: bicubic;
206 | vertical-align: middle;
207 | }
208 |
209 | figure {
210 | display: block;
211 | text-align: center;
212 | margin: 1em 0;
213 | }
214 |
215 | figure img {
216 | border: none;
217 | margin: 0 auto;
218 | }
219 |
220 | figcaption {
221 | font-size: 0.8em;
222 | font-style: italic;
223 | margin: 0 0 .8em;
224 | }
225 |
226 | table {
227 | margin-bottom: 2em;
228 | border-bottom: 1px solid #ddd;
229 | border-right: 1px solid #ddd;
230 | border-spacing: 0;
231 | border-collapse: collapse;
232 | }
233 |
234 | table th {
235 | padding: .2em 1em;
236 | background-color: #eee;
237 | border-top: 1px solid #ddd;
238 | border-left: 1px solid #ddd;
239 | }
240 |
241 | table td {
242 | padding: .2em 1em;
243 | border-top: 1px solid #ddd;
244 | border-left: 1px solid #ddd;
245 | vertical-align: top;
246 | }
247 |
248 | .author {
249 | font-size: 1.2em;
250 | text-align: center;
251 | }
252 |
253 | @media only screen and (min-width: 480px) {
254 | body {
255 | font-size: 14px;
256 | }
257 | }
258 | @media only screen and (min-width: 768px) {
259 | body {
260 | font-size: 16px;
261 | }
262 | }
263 | @media print {
264 | * {
265 | background: transparent !important;
266 | color: black !important;
267 | filter: none !important;
268 | -ms-filter: none !important;
269 | }
270 |
271 | body {
272 | font-size: 12pt;
273 | max-width: 100%;
274 | }
275 |
276 | a, a:visited {
277 | text-decoration: underline;
278 | }
279 |
280 | hr {
281 | height: 1px;
282 | border: 0;
283 | border-bottom: 1px solid black;
284 | }
285 |
286 | a[href]:after {
287 | content: " (" attr(href) ")";
288 | }
289 |
290 | abbr[title]:after {
291 | content: " (" attr(title) ")";
292 | }
293 |
294 | .ir a:after, a[href^="javascript:"]:after, a[href^="#"]:after {
295 | content: "";
296 | }
297 |
298 | pre, blockquote {
299 | border: 1px solid #999;
300 | padding-right: 1em;
301 | page-break-inside: avoid;
302 | }
303 |
304 | tr, img {
305 | page-break-inside: avoid;
306 | }
307 |
308 | img {
309 | max-width: 100% !important;
310 | }
311 |
312 | @page :left {
313 | margin: 15mm 20mm 15mm 10mm;
314 | }
315 |
316 | @page :right {
317 | margin: 15mm 10mm 15mm 20mm;
318 | }
319 |
320 | p, h2, h3 {
321 | orphans: 3;
322 | widows: 3;
323 | }
324 |
325 | h2, h3 {
326 | page-break-after: avoid;
327 | }
328 | }
329 |
--------------------------------------------------------------------------------
/docs/summary.md:
--------------------------------------------------------------------------------
1 | L[[∅]] = ∅
2 | L[[ε]] = {ε}
3 | L[[a]] = {a}
4 | L[[r · s]] = {u · v | u ∈ L[[r]] and v ∈ L[[s]]}
5 | L[[r | s]] = L[[r]] ∪ L[[s]]
6 | L[[r∗]] = {ε} ∪ L[[r · r*]]
7 |
--------------------------------------------------------------------------------
/haskell/01_SimpleRegex/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer
2 |
3 | All rights reserved.
4 |
5 | Redistribution and use in source and binary forms, with or without
6 | modification, are permitted provided that the following conditions are
7 | met:
8 |
9 | 1. Redistributions of source code must retain the above copyright
10 | notice, this list of conditions and the following disclaimer.
11 |
12 | 2. Redistributions in binary form must reproduce the above copyright
13 | notice, this list of conditions and the following disclaimer in
14 | the documentation and/or other materials provided with the
15 | distribution.
16 |
17 | 3. Neither the name of the author nor the names of his contributors
18 | may be used to endorse or promote products derived from this
19 | software without specific prior written permission.
20 |
21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
24 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR
25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
32 |
33 |
--------------------------------------------------------------------------------
/haskell/01_SimpleRegex/README.md:
--------------------------------------------------------------------------------
1 | # Kleene Regular Expressions, in Haskell
2 |
3 | This is literally the definition of a simple string recognizing regular
4 | expression in Haskell. It consists of the `Reg` datatype encompassing
5 | the five standard operations of regular expressions and an `accept`
6 | function that takes the expression and a string and returns a Boolean
7 | yes/no on recognition or failure. It is a direct implementation of
8 | Kleene's algebra:
9 |
10 |     L[[ε]] = {ε}
11 |     L[[a]] = {a}
12 |     L[[r · s]] = {u · v | u ∈ L[[r]] and v ∈ L[[s]]}
13 |     L[[r | s]] = L[[r]] ∪ L[[s]]
14 |     L[[r∗]] = {ε} ∪ L[[r · r*]]
15 |
16 | Those equations are for: recognizing an empty string, recognizing a
17 | letter, recognizing two expressions in sequence, recognizing two
18 | expression alternatives, and the repetition operation.
19 |
20 | The `accept` function has two helper functions that split the string,
21 | and all substrings, into all possible substrings such that *every
22 | possible combination* of string and expression are tested, and if the
23 | resulting tree of `and`s (from Sequencing and Repetition) and `or`s
24 | (from Alternation) has at least one complete collection of `True` from
25 | top to bottom then the function returns true.
26 |
27 | This generation and comparison of substrings is grossly inefficient; a
28 | string of eight 'a's with `a*` will take 30 seconds on a modern laptop;
29 | increase that to twelve and you'll be waiting about an hour. The cost
30 | is `2^(n - 1)`, where `n` is the length of the string; this is a
31 | consequence of the sequencing operation. Sequences aren't just about
32 | letters: they could be about anything, including repetition (which
33 | itself creates new sequences) and other sequences, and the cost of
34 | examining every possible combination of sequencing creates this
35 | exponential cost.
36 |
37 | It is quite amazing, though, to actually *see* a straightforward
38 | implementation of Kleene's Regular Expressions in code.
39 |
40 |
41 |
42 |
--------------------------------------------------------------------------------
/haskell/01_SimpleRegex/SimpleRegex.cabal:
--------------------------------------------------------------------------------
1 | cabal-version: 1.12
2 |
3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
4 | --
5 | -- see: https://github.com/sol/hpack
6 | --
7 | -- hash: 2b06081e19cfbe96fa9a2d9a12695410d10cc0b73d3fe0c09d77986d2f101773
8 |
9 | name: SimpleRegex
10 | version: 0.1.0.0
11 | category: Regex
12 | homepage: https://github.com/elfsternberg/RegexWeightedPearl#readme
13 | author: Elf M. Sternberg
14 | maintainer: elf.sternberg@gmail.com
15 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
16 | license: BSD3
17 | license-file: LICENSE
18 | build-type: Simple
19 | extra-source-files:
20 |     README.md
21 |
22 | library
23 |   exposed-modules:
24 |       SimpleRegex
25 |   other-modules:
26 |       Paths_SimpleRegex
27 |   hs-source-dirs:
28 |       src
29 |   ghc-options: -Wall
30 |   build-depends:
31 |       base
32 |   default-language: Haskell2010
33 |
34 | test-suite test
35 |   type: exitcode-stdio-1.0
36 |   main-is: Tests.hs
37 |   other-modules:
38 |       Paths_SimpleRegex
39 |   hs-source-dirs:
40 |       test
41 |   build-depends:
42 |       SimpleRegex
43 |     , base
44 |     , hspec
45 |   default-language: Haskell2010
46 |
--------------------------------------------------------------------------------
/haskell/01_SimpleRegex/package.yaml:
--------------------------------------------------------------------------------
1 | name: SimpleRegex
2 | version: 0.1.0.0
3 |
4 | homepage: https://github.com/elfsternberg/riggedregex#readme
5 | license: BSD3
6 | license-file: LICENSE
7 | author: Elf M. Sternberg
8 | maintainer: elf.sternberg@gmail.com
9 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
10 | category: Regex
11 | build-type: Simple
12 | extra-source-files: README.md
13 |
14 | dependencies:
15 |   - base
16 |
17 | library:
18 |   exposed-modules: SimpleRegex
19 |   ghc-options: -Wall
20 |   source-dirs: src
21 |
22 | tests:
23 |   test:
24 |     main: Tests.hs
25 |     source-dirs: test
26 |     dependencies:
27 |       - SimpleRegex
28 |       - hspec
29 |
30 |
31 |
--------------------------------------------------------------------------------
/haskell/01_SimpleRegex/src/SimpleRegex.hs:
--------------------------------------------------------------------------------
1 | {-# LANGUAGE LambdaCase #-}
2 |
3 | module SimpleRegex ( accept, Reg (..) ) where
4 |
5 | data Reg =
6 |     Eps         -- Epsilon
7 |   | Sym Char    -- Character
8 |   | Alt Reg Reg -- Alternation
9 |   | Seq Reg Reg -- Sequence
10 |   | Rep Reg     -- R*
11 |
12 | accept :: Reg -> String -> Bool
13 | -- Epsilon
14 | accept Eps u = null u
15 | -- Accept if the character offered matches the character constructed
16 | accept (Sym c) u = u == [c]
17 | -- Constructed of two other expressions, accept if either one does.
18 | accept (Alt p q) u = accept p u || accept q u
19 | -- Constructed of two other expressions, accept if p accepts some part
20 | -- of u and q accepts the rest, where u is split arbitrarily
21 | accept (Seq p q) u = or [accept p u1 && accept q u2 | (u1, u2) <- split u]
22 | -- For each way of cutting u into non-empty pieces, check whether r
23 | -- accepts every piece; accept u if at least one such cut succeeds
24 | -- (the empty string is accepted via the empty cut).
25 | accept (Rep r) u = or [and [accept r ui | ui <- ps] | ps <- parts u]
26 |
27 | -- Generate a list of all possible combinations of a prefix and suffix
28 | -- for the string offered.
29 | split :: [a] -> [([a], [a])]
30 | split [] = [([], [])]
31 | split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs]
32 |
33 | -- Generate every way of cutting the input string into consecutive
34 | -- pieces, none of which is the empty string.
35 | parts :: [a] -> [[[a]]]
36 | parts [] = [[]]
37 | parts [c] = [[[c]]]
38 | parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
39 |
40 |
--------------------------------------------------------------------------------
/haskell/01_SimpleRegex/stack.yaml:
--------------------------------------------------------------------------------
1 | # This file was automatically generated by 'stack init'
2 | #
3 | # Some commonly used options have been documented as comments in this file.
4 | # For advanced use and comprehensive documentation of the format, please see:
5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
6 |
7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
8 | # A snapshot resolver dictates the compiler version and the set of packages
9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.4
21 |
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | # git: https://github.com/commercialhaskell/stack.git
30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | # subdirs:
33 | # - auto-update
34 | # - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 |
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 |
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 |
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 |
--------------------------------------------------------------------------------
/haskell/01_SimpleRegex/test/Tests.hs:
--------------------------------------------------------------------------------
1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
2 | {-# LANGUAGE RecordWildCards #-}
3 | import Data.Foldable (for_)
4 | import Test.Hspec (Spec, it, shouldBe)
5 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
6 | import SimpleRegex (Reg (..), accept)
7 |
8 | main :: IO ()
9 | main = hspecWith defaultConfig {configFastFail = True} specs
10 |
11 | specs :: Spec
12 | specs = do
13 |
14 | let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) )
15 | let onec = Seq nocs (Sym 'c')
16 | let evencs = Seq ( Rep ( Seq onec onec ) ) nocs
17 |
18 | let as = Alt (Sym 'a') (Rep (Sym 'a'))
19 | let bs = Alt (Sym 'b') (Rep (Sym 'b'))
20 |
21 | it "simple expression" $
22 | accept evencs "acc" `shouldBe` True
23 |
24 | for_ cases test
25 | where
26 | test Case {..} = it description assertion
27 | where
28 | assertion = accept regex sample `shouldBe` result
29 |
30 |
31 | data Case = Case
32 | { description :: String
33 | , regex :: Reg
34 | , sample :: String
35 | , result :: Bool
36 | }
37 |
38 | cases :: [Case]
39 | cases =
40 | [ Case {description = "empty", regex = Eps, sample = "", result = True}
41 | , Case {description = "char", regex = Sym 'a', sample = "a", result = True}
42 | , Case
43 | {description = "not char", regex = Sym 'a', sample = "b", result = False}
44 | , Case
45 | { description = "char vs empty"
46 | , regex = Sym 'a'
47 | , sample = ""
48 | , result = False
49 | }
50 | , Case
51 | { description = "left alt"
52 | , regex = Alt (Sym 'a') (Sym 'b')
53 | , sample = "a"
54 | , result = True
55 | }
56 | , Case
57 | { description = "right alt"
58 | , regex = Alt (Sym 'a') (Sym 'b')
59 | , sample = "b"
60 | , result = True
61 | }
62 | , Case
63 | { description = "neither alt"
64 | , regex = Alt (Sym 'a') (Sym 'b')
65 | , sample = "c"
66 | , result = False
67 | }
68 | , Case
69 | { description = "empty alt"
70 | , regex = Alt (Sym 'a') (Sym 'b')
71 | , sample = ""
72 | , result = False
73 | }
74 | , Case
75 | { description = "empty rep"
76 | , regex = Rep (Sym 'a')
77 | , sample = ""
78 | , result = True
79 | }
80 | , Case
81 | { description = "one rep"
82 | , regex = Rep (Sym 'a')
83 | , sample = "a"
84 | , result = True
85 | }
86 | , Case
87 | { description = "multiple rep"
88 | , regex = Rep (Sym 'a')
89 | , sample = "aaaaaaaaa"
90 | , result = True
91 | }
92 | , Case
93 | { description = "multiple rep with failure"
94 | , regex = Rep (Sym 'a')
95 | , sample = "aaaaaaaaab"
96 | , result = False
97 | }
98 | , Case
99 | { description = "sequence"
100 | , regex = Seq (Sym 'a') (Sym 'b')
101 | , sample = "ab"
102 | , result = True
103 | }
104 | , Case
105 | { description = "sequence with empty"
106 | , regex = Seq (Sym 'a') (Sym 'b')
107 | , sample = ""
108 | , result = False
109 | }
110 | , Case
111 | { description = "bad short sequence"
112 | , regex = Seq (Sym 'a') (Sym 'b')
113 | , sample = "a"
114 | , result = False
115 | }
116 | , Case
117 | { description = "bad long sequence"
118 | , regex = Seq (Sym 'a') (Sym 'b')
119 | , sample = "abc"
120 | , result = False
121 | }
122 | ]
123 |
124 |
125 |
--------------------------------------------------------------------------------
/haskell/02_RiggedRegex/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer
2 |
3 | All rights reserved.
4 |
5 | Redistribution and use in source and binary forms, with or without
6 | modification, are permitted provided that the following conditions are
7 | met:
8 |
9 | 1. Redistributions of source code must retain the above copyright
10 | notice, this list of conditions and the following disclaimer.
11 |
12 | 2. Redistributions in binary form must reproduce the above copyright
13 | notice, this list of conditions and the following disclaimer in
14 | the documentation and/or other materials provided with the
15 | distribution.
16 |
17 | 3. Neither the name of the author nor the names of his contributors
18 | may be used to endorse or promote products derived from this
19 | software without specific prior written permission.
20 |
21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
24 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR
25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
32 |
33 |
--------------------------------------------------------------------------------
/haskell/02_RiggedRegex/README.md:
--------------------------------------------------------------------------------
1 | # Kleene Regular Expressions with Rigging, in Haskell
2 |
3 | This program builds on the simple regular expressions in Version 01,
4 | providing a new definition of a regular expression `Regw` that takes two
5 | types, a source type and an output type. The output type must be a
6 | [*Semiring*](https://en.wikipedia.org/wiki/Semiring).
7 |
8 | A semiring is a set R equipped with two binary operations + and ⋅, and
9 | two constants identified as 0 and 1. By providing a semiring to the
10 | regular expression, we change the return type of the regular expression
11 | to any set that can obey the semiring laws. There's a surprising amount
12 | of stuff you can do with the semiring laws.
13 |
14 | In this example, I've provided a function, `rigged`, that takes a
15 | simple regular expression from Version 01, and wraps or extracts
16 | the contents of that regular expression into the `Regw` datatype.
17 | Instead of the boolean mathematics of Version 01, we use the semiring
18 | symbols `add` and `mul` to represent the sum and product operations on
19 | the return type. We then define the "symbol accepted" test to return
20 | either the `zero` or `one` value of the semiring.
21 |
22 | I've provided two semirings: One of (0, 1, +, *, Integers), and one of
23 | (False, True, ||, &&, Booleans). Both work well.
24 |
25 | The `accept expression string` function of the original still works, but
26 | if you say `acceptw (rigged expression) string :: Int`, Haskell will *go
27 | find* a Semiring that allows this function to work and return the number
28 | of distinct ways the expression matches the string. If you ask for Bool
29 | as a return type, it will behave as the original.
30 |
31 | Sometimes, Haskell is bleeding magical.
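
The return-type dispatch described above can be tried in isolation. What follows is a minimal, self-contained sketch that mirrors the definitions in `src/RiggedRegex.hs` rather than importing them; the names `aOrAs`, `matchesBool`, and `matchCount` are hypothetical and exist only for this illustration.

```haskell
-- A self-contained sketch that mirrors RiggedRegex.hs; not the module itself.
class Semiring s where
  zero, one :: s
  add, mul  :: s -> s -> s

instance Semiring Bool where
  zero = False
  one  = True
  add  = (||)
  mul  = (&&)

instance Semiring Int where
  zero = 0
  one  = 1
  add  = (+)
  mul  = (*)

data Regw c s
  = Epsw                          -- Epsilon
  | Symw (c -> s)                 -- Symbol, as a weight function
  | Altw (Regw c s) (Regw c s)    -- Alternation
  | Seqw (Regw c s) (Regw c s)    -- Sequence
  | Repw (Regw c s)               -- R*

sym :: (Eq c, Semiring s) => c -> Regw c s
sym c = Symw (\b -> if b == c then one else zero)

-- Every (prefix, suffix) pair of the input.
split :: [a] -> [([a], [a])]
split []     = [([], [])]
split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs]

-- Every way of cutting the input into non-empty pieces.
parts :: [a] -> [[[a]]]
parts []     = [[]]
parts [c]    = [[[c]]]
parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]

acceptw :: Semiring s => Regw c s -> [c] -> s
acceptw Epsw u       = if null u then one else zero
acceptw (Symw f) [c] = f c
acceptw (Symw _) _   = zero
acceptw (Altw p q) u = acceptw p u `add` acceptw q u
acceptw (Seqw p q) u =
  foldr add zero [acceptw p u1 `mul` acceptw q u2 | (u1, u2) <- split u]
acceptw (Repw r) u =
  foldr add zero [foldr mul one [acceptw r ui | ui <- ps] | ps <- parts u]

-- Hypothetical example expression: a | a*
aOrAs :: Semiring s => Regw Char s
aOrAs = Altw (sym 'a') (Repw (sym 'a'))

-- Same expression, same input; only the requested result type differs.
matchesBool :: Bool
matchesBool = acceptw aOrAs "a"

matchCount :: Int
matchCount = acceptw aOrAs "a"
```

Loaded into GHCi, `matchesBool` is `True` and `matchCount` is `2`, because `"a"` matches both the left branch and the starred right branch. This agrees with the "lifted expression counter two" case in `test/Tests.hs`.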
32 |
33 |
--------------------------------------------------------------------------------
/haskell/02_RiggedRegex/RiggedRegex.cabal:
--------------------------------------------------------------------------------
1 | cabal-version: 1.12
2 |
3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
4 | --
5 | -- see: https://github.com/sol/hpack
6 | --
7 | -- hash: a27275fb9824bb59f3ba73db8613283f0ce03f9ab6d1053ec40e17977c04aa1d
8 |
9 | name: RiggedRegex
10 | version: 0.1.0.0
11 | category: Regex
12 | homepage: https://github.com/elfsternberg/riggedregex#readme
13 | author: Elf M. Sternberg
14 | maintainer: elf.sternberg@gmail.com
15 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
16 | license: BSD3
17 | license-file: LICENSE
18 | build-type: Simple
19 | extra-source-files:
20 | README.md
21 |
22 | library
23 | exposed-modules:
24 | RiggedRegex
25 | other-modules:
26 | Paths_RiggedRegex
27 | hs-source-dirs:
28 | src
29 | ghc-options: -Wall
30 | build-depends:
31 | base
32 | default-language: Haskell2010
33 |
34 | test-suite test
35 | type: exitcode-stdio-1.0
36 | main-is: Tests.hs
37 | other-modules:
38 | Paths_RiggedRegex
39 | hs-source-dirs:
40 | test
41 | build-depends:
42 | RiggedRegex
43 | , base
44 | , hspec
45 | default-language: Haskell2010
46 |
--------------------------------------------------------------------------------
/haskell/02_RiggedRegex/package.yaml:
--------------------------------------------------------------------------------
1 | name: RiggedRegex
2 | version: 0.1.0.0
3 |
4 | homepage: https://github.com/elfsternberg/riggedregex#readme
5 | license: BSD3
6 | license-file: LICENSE
7 | author: Elf M. Sternberg
8 | maintainer: elf.sternberg@gmail.com
9 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
10 | category: Regex
11 | build-type: Simple
12 | extra-source-files: README.md
13 |
14 | dependencies:
15 | - base
16 |
17 | library:
18 | exposed-modules: RiggedRegex
19 | ghc-options: -Wall
20 | source-dirs: src
21 |
22 | tests:
23 | test:
24 | main: Tests.hs
25 | source-dirs: test
26 | dependencies:
27 | - RiggedRegex
28 | - hspec
29 |
30 |
31 |
--------------------------------------------------------------------------------
/haskell/02_RiggedRegex/src/RiggedRegex.hs:
--------------------------------------------------------------------------------
1 | {-# LANGUAGE LambdaCase #-}
2 |
3 | module RiggedRegex ( accept, acceptw, Reg (..), Regw (..), rigged ) where
4 |
5 | data Reg =
6 | Eps -- Epsilon
7 | | Sym Char -- Character
8 | | Alt Reg Reg -- Alternation
9 | | Seq Reg Reg -- Sequence
10 | | Rep Reg -- R*
11 |
12 | accept :: Reg -> String -> Bool
13 | -- Epsilon
14 | accept Eps u = null u
15 | -- Accept if the character offered matches the character constructed
16 | accept (Sym c) u = u == [c]
17 | -- Constructed of two other expressions, accept if either one does.
18 | accept (Alt p q) u = accept p u || accept q u
19 | -- Constructed of two other expressions, accept if p accepts some part
20 | -- of u and q accepts the rest, where u is split arbitrarily
21 | accept (Seq p q) u = or [accept p u1 && accept q u2 | (u1, u2) <- split u]
22 | -- Accept if there is at least one way to partition u into non-empty
23 | -- pieces such that r accepts every piece. (The empty string has
24 | -- exactly one partition, with no pieces, so Rep always accepts it.)
25 | accept (Rep r) u = or [and [accept r ui | ui <- ps] | ps <- parts u]
26 |
27 | -- Generate a list of all possible combinations of a prefix and suffix
28 | -- for the string offered.
29 | split :: [a] -> [([a], [a])]
30 | split [] = [([], [])]
31 | split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs]
32 |
33 | -- Generate every way of cutting the input into a list of non-empty,
34 | -- contiguous pieces. The empty input has one such cut: no pieces.
35 | parts :: [a] -> [[[a]]]
36 | parts [] = [[]]
37 | parts [c] = [[[c]]]
38 | parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
39 |
40 | -- A semiring is an algebraic structure with a zero, a one, a
41 | -- "multiplication" operation, and an "addition" operation. Zero is
42 | -- the identity element for addition, one is the identity element for
43 | -- multiplication, both operations are associative (grouping does not
44 | -- matter), addition is commutative (operand order does not matter),
45 | -- multiplication distributes over addition, and zero `mul` anything
46 | -- is always zero.
47 | --
48 | -- For regular expressions, the null regex is zero, the empty-string
49 | -- regex is one, alternation is addition, and sequence is
50 | -- multiplication, much like "sum" and "product" types.
51 |
52 | class Semiring s where
53 | zero, one :: s
54 | mul, add :: s -> s -> s
55 |
56 | -- Symw (c -> s) represents a mapping from a symbol to its given weight.
57 |
58 | sym :: Semiring s => Char -> Regw Char s
59 | sym c = Symw (\b -> if b == c then one else zero)
60 |
61 | data Regw c s =
62 | Epsw -- Epsilon
63 | | Symw (c -> s) -- Character
64 | | Altw (Regw c s) (Regw c s) -- Alternation
65 | | Seqw (Regw c s) (Regw c s) -- Sequence
66 | | Repw (Regw c s) -- R*
67 |
68 | rigged :: Semiring s => Reg -> Regw Char s
69 | rigged = \case
70 | Eps -> Epsw
71 | (Sym c) -> sym c
72 | (Alt p q) -> Altw (rigged p) (rigged q)
73 | (Seq p q) -> Seqw (rigged p) (rigged q)
74 | (Rep r) -> Repw (rigged r)
75 |
76 | acceptw :: Semiring s => Regw c s -> [c] -> s
77 | acceptw Epsw u = if null u then one else zero
78 | acceptw (Symw f) u =
79 | case u of
80 | [c] -> f c
81 | _ -> zero
82 | acceptw (Altw p q) u = acceptw p u `add` acceptw q u
83 | acceptw (Seqw p q) u = sumr [ acceptw p u1 `mul` acceptw q u2 | (u1, u2) <- split u ]
84 | acceptw (Repw r) u = sumr [ prodr [ acceptw r ui | ui <- ps ] | ps <- parts u ]
85 |
86 | sumr, prodr :: Semiring r => [r] -> r
87 | sumr = foldr add zero
88 | prodr = foldr mul one
89 |
90 | instance Semiring Bool where
91 | zero = False
92 | one = True
93 | add = (||)
94 | mul = (&&)
95 |
96 | instance Semiring Int where
97 | zero = 0
98 | one = 1
99 | add = (+)
100 | mul = (*)
101 |
102 |
--------------------------------------------------------------------------------
/haskell/02_RiggedRegex/stack.yaml:
--------------------------------------------------------------------------------
1 | # This file was automatically generated by 'stack init'
2 | #
3 | # Some commonly used options have been documented as comments in this file.
4 | # For advanced use and comprehensive documentation of the format, please see:
5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
6 |
7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
8 | # A snapshot resolver dictates the compiler version and the set of packages
9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.4
21 |
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | # git: https://github.com/commercialhaskell/stack.git
30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | # subdirs:
33 | # - auto-update
34 | # - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 |
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 |
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 |
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 |
--------------------------------------------------------------------------------
/haskell/02_RiggedRegex/test/Tests.hs:
--------------------------------------------------------------------------------
1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
2 | {-# LANGUAGE RecordWildCards #-}
3 | import Data.Foldable (for_)
4 | import Test.Hspec (Spec, it, shouldBe)
5 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
6 | import RiggedRegex (Reg (..), accept, acceptw, rigged)
7 |
8 | main :: IO ()
9 | main = hspecWith defaultConfig {configFastFail = True} specs
10 |
11 | specs :: Spec
12 | specs = do
13 |
14 | let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) )
15 | let onec = Seq nocs (Sym 'c')
16 | let evencs = Seq ( Rep ( Seq onec onec ) ) nocs
17 |
18 | let as = Alt (Sym 'a') (Rep (Sym 'a'))
19 | let bs = Alt (Sym 'b') (Rep (Sym 'b'))
20 |
21 | it "simple expression" $
22 | accept evencs "acc" `shouldBe` True
23 |
24 | it "lifted expression" $
25 | (acceptw (rigged evencs) "acc" :: Bool) `shouldBe` True
26 |
27 | it "lifted expression short" $
28 | (acceptw (rigged evencs) "acc" :: Int) `shouldBe` 1
29 |
30 | it "lifted expression counter two" $
31 | (acceptw (rigged as) "a" :: Int) `shouldBe` 2
32 |
33 | it "lifted expression counter one" $
34 | (acceptw (rigged as) "aa" :: Int) `shouldBe` 1
35 |
36 | it "lifted expression dynamic counter four" $
37 | (acceptw (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4
38 |
39 | for_ cases test
40 | where
41 | test Case {..} = it description assertion
42 | where
43 | assertion = (acceptw (rigged regex) sample :: Bool) `shouldBe` result
44 |
45 | data Case = Case
46 | { description :: String
47 | , regex :: Reg
48 | , sample :: String
49 | , result :: Bool
50 | }
51 |
52 | cases :: [Case]
53 | cases =
54 | [ Case {description = "empty", regex = Eps, sample = "", result = True}
55 | , Case {description = "char", regex = Sym 'a', sample = "a", result = True}
56 | , Case
57 | {description = "not char", regex = Sym 'a', sample = "b", result = False}
58 | , Case
59 | { description = "char vs empty"
60 | , regex = Sym 'a'
61 | , sample = ""
62 | , result = False
63 | }
64 | , Case
65 | { description = "left alt"
66 | , regex = Alt (Sym 'a') (Sym 'b')
67 | , sample = "a"
68 | , result = True
69 | }
70 | , Case
71 | { description = "right alt"
72 | , regex = Alt (Sym 'a') (Sym 'b')
73 | , sample = "b"
74 | , result = True
75 | }
76 | , Case
77 | { description = "neither alt"
78 | , regex = Alt (Sym 'a') (Sym 'b')
79 | , sample = "c"
80 | , result = False
81 | }
82 | , Case
83 | { description = "empty alt"
84 | , regex = Alt (Sym 'a') (Sym 'b')
85 | , sample = ""
86 | , result = False
87 | }
88 | , Case
89 | { description = "empty rep"
90 | , regex = Rep (Sym 'a')
91 | , sample = ""
92 | , result = True
93 | }
94 | , Case
95 | { description = "one rep"
96 | , regex = Rep (Sym 'a')
97 | , sample = "a"
98 | , result = True
99 | }
100 | , Case
101 | { description = "multiple rep"
102 | , regex = Rep (Sym 'a')
103 | , sample = "aaaaaaaaa"
104 | , result = True
105 | }
106 | , Case
107 | { description = "multiple rep with failure"
108 | , regex = Rep (Sym 'a')
109 | , sample = "aaaaaaaaab"
110 | , result = False
111 | }
112 | , Case
113 | { description = "sequence"
114 | , regex = Seq (Sym 'a') (Sym 'b')
115 | , sample = "ab"
116 | , result = True
117 | }
118 | , Case
119 | { description = "sequence with empty"
120 | , regex = Seq (Sym 'a') (Sym 'b')
121 | , sample = ""
122 | , result = False
123 | }
124 | , Case
125 | { description = "bad short sequence"
126 | , regex = Seq (Sym 'a') (Sym 'b')
127 | , sample = "a"
128 | , result = False
129 | }
130 | , Case
131 | { description = "bad long sequence"
132 | , regex = Seq (Sym 'a') (Sym 'b')
133 | , sample = "abc"
134 | , result = False
135 | }
136 | ]
137 |
138 |
139 |
--------------------------------------------------------------------------------
/haskell/03_Brzozowski/BrzExp.cabal:
--------------------------------------------------------------------------------
1 | cabal-version: 1.12
2 |
3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
4 | --
5 | -- see: https://github.com/sol/hpack
6 | --
7 | -- hash: bf664c2df8f31bc5a6ffd907ebaf10824d56d8d271f07d3f4cc7562ce9d44395
8 |
9 | name: BrzExp
10 | version: 0.1.0.0
11 | category: Regex
12 | homepage: https://github.com/elfsternberg/riggedregex#readme
13 | author: Elf M. Sternberg
14 | maintainer: elf.sternberg@gmail.com
15 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
16 | license: BSD3
17 | license-file: LICENSE
18 | build-type: Simple
19 | extra-source-files:
20 | README.md
21 |
22 | library
23 | exposed-modules:
24 | BrzExp
25 | other-modules:
26 | Paths_BrzExp
27 | hs-source-dirs:
28 | src
29 | ghc-options: -Wall
30 | build-depends:
31 | base
32 | default-language: Haskell2010
33 |
34 | test-suite test
35 | type: exitcode-stdio-1.0
36 | main-is: Tests.hs
37 | other-modules:
38 | Paths_BrzExp
39 | hs-source-dirs:
40 | test
41 | build-depends:
42 | BrzExp
43 | , base
44 | , hspec
45 | default-language: Haskell2010
46 |
--------------------------------------------------------------------------------
/haskell/03_Brzozowski/LICENSE:
--------------------------------------------------------------------------------
1 | The Brzozowski experiments are original work, and are copyright and
2 | licensed by Elf M. Sternberg under the Mozilla Public License 2.0.
3 | See the LICENSE.md file in the main directory.
4 |
--------------------------------------------------------------------------------
/haskell/03_Brzozowski/README.md:
--------------------------------------------------------------------------------
1 | # Brzozowski Regular Expressions, in Haskell
2 |
3 | This is a regex recognizer implementing Brzozowski's Algorithm, in
4 | Haskell. Brzozowski's Algorithm has been a bit of a fascination for me,
5 | because it generally made much more sense than the traditional
6 | algorithm, especially since the Pumping Lemma is much more intelligible
7 | under Brzozowski than it is with more common forms of automata analysis.
8 |
9 | Brzozowski's algorithm basically says that matching is a function
10 | that, given a string and a regular expression, returns three things:
11 | the remainder of the input after the leading character has been
12 | consumed, a new regular expression that represents the rest of the
13 | original after that leading character has been analyzed, and the
14 | status of the analysis thus far.
15 |
16 | Brzozowski called this "the derivative of the regular expression."
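
For reference, the derivative can be written out rule by rule. The notation here is an assumption of this note and is not used elsewhere in the README: $D_a(r)$ is the derivative of $r$ with respect to the character $a$, $\emptyset$ is the expression that matches nothing, $\varepsilon$ matches only the empty string, and $\nu(r)$ is $\varepsilon$ when $r$ accepts the empty string and $\emptyset$ otherwise.

$$
\begin{aligned}
D_a(\emptyset) &= \emptyset \\
D_a(\varepsilon) &= \emptyset \\
D_a(a) &= \varepsilon \\
D_a(b) &= \emptyset \quad (b \neq a) \\
D_a(r \mid s) &= D_a(r) \mid D_a(s) \\
D_a(r\,s) &= D_a(r)\,s \;\mid\; \nu(r)\,D_a(s) \\
D_a(r^{*}) &= D_a(r)\,r^{*}
\end{aligned}
$$

As a worked case, $D_a(ab) = D_a(a)\,b \mid \nu(a)\,D_a(b) = \varepsilon\,b \mid \emptyset = b$: after consuming `a`, what remains to be matched is `b`.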
17 |
18 | The only trick to dealing with Brzozowski's Algorithm is with respect to
19 | nullability: it is important to know if a regular expression _may be
20 | nullable_ (that is, it may accept the empty string). A separate
21 | function describes the nullability of the different kinds of expressions
22 | in our system.
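
To see where nullability comes in, here is a small self-contained sketch that mirrors `src/BrzExp.hs`. It adds `deriving (Eq, Show)` so intermediate expressions can be inspected, and a helper named `steps`; both additions are assumptions made for this illustration and are not part of the module.

```haskell
-- A self-contained sketch that mirrors BrzExp.hs; not the module itself.
data Brz = Emp | Eps | Sym Char | Alt Brz Brz | Seq Brz Brz | Rep Brz
  deriving (Eq, Show)

-- Does the expression accept the empty string?
nullable :: Brz -> Bool
nullable Emp       = False
nullable Eps       = True
nullable (Sym _)   = False
nullable (Alt l r) = nullable l || nullable r
nullable (Seq l r) = nullable l && nullable r
nullable (Rep _)   = True

-- The derivative of an expression with respect to one character.
derive :: Brz -> Char -> Brz
derive Emp _       = Emp
derive Eps _       = Emp
derive (Sym c) u   = if c == u then Eps else Emp
derive (Alt l r) u = Alt (derive l u) (derive r u)
derive (Seq l r) u
  -- If l can match "", the character may also be the first one of r.
  | nullable l = Alt (Seq (derive l u) r) (derive r u)
  | otherwise  = Seq (derive l u) r
derive (Rep r) u   = Seq (derive r u) (Rep r)

-- Hypothetical helper: the original expression followed by each derivative.
steps :: Brz -> String -> [Brz]
steps = scanl derive

-- Accept when the expression left after all input is consumed is nullable.
accept :: Brz -> String -> Bool
accept r = nullable . foldl derive r
```

For `Seq (Sym 'a') (Sym 'b')` and the input `"ab"`, `steps` yields the original expression, then `Seq Eps (Sym 'b')`, then `Alt (Seq Emp (Sym 'b')) Eps`. The last expression is nullable, so the input is accepted. The second step only works because `Seq Eps (Sym 'b')` has a nullable left side, which lets `'b'` be tried against the right side.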
23 |
24 |
25 |
26 |
27 |
28 |
--------------------------------------------------------------------------------
/haskell/03_Brzozowski/Setup.hs:
--------------------------------------------------------------------------------
1 | import Distribution.Simple
2 | main = defaultMain
3 |
--------------------------------------------------------------------------------
/haskell/03_Brzozowski/package.yaml:
--------------------------------------------------------------------------------
1 | name: BrzExp
2 | version: 0.1.0.0
3 |
4 | homepage: https://github.com/elfsternberg/riggedregex#readme
5 | license: BSD3
6 | license-file: LICENSE
7 | author: Elf M. Sternberg
8 | maintainer: elf.sternberg@gmail.com
9 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
10 | category: Regex
11 | build-type: Simple
12 | extra-source-files: README.md
13 |
14 | dependencies:
15 | - base
16 |
17 | library:
18 | exposed-modules: BrzExp
19 | ghc-options: -Wall
20 | source-dirs: src
21 |
22 | tests:
23 | test:
24 | main: Tests.hs
25 | source-dirs: test
26 | dependencies:
27 | - BrzExp
28 | - hspec
29 |
30 |
--------------------------------------------------------------------------------
/haskell/03_Brzozowski/src/BrzExp.hs:
--------------------------------------------------------------------------------
1 | module BrzExp ( accept, nullable, Brz (..) ) where
2 | data Brz = Emp | Eps | Sym Char | Alt Brz Brz | Seq Brz Brz | Rep Brz
3 |
4 | derive :: Brz -> Char -> Brz
5 | derive Emp _ = Emp
6 | derive Eps _ = Emp
7 | derive (Sym c) u = if c == u then Eps else Emp
8 | derive (Seq l r) u
9 | | nullable l = Alt (Seq (derive l u) r) (derive r u)
10 | | otherwise = Seq (derive l u) r
11 |
12 | derive (Alt Emp r) u = derive r u
13 | derive (Alt l Emp) u = derive l u
14 | derive (Alt l r) u = Alt (derive r u) (derive l u)
15 |
16 | derive (Rep r) u = Seq (derive r u) (Rep r)
17 |
18 | nullable :: Brz -> Bool
19 | nullable Emp = False
20 | nullable Eps = True
21 | nullable (Sym _) = False
22 | nullable (Alt l r) = nullable l || nullable r
23 | nullable (Seq l r) = nullable l && nullable r
24 | nullable (Rep _) = True
25 |
26 | accept :: Brz -> String -> Bool
27 | accept r [] = nullable r
28 | accept r (s:ss) = accept (derive r s) ss
29 |
30 |
31 |
--------------------------------------------------------------------------------
/haskell/03_Brzozowski/stack.yaml:
--------------------------------------------------------------------------------
1 | # This file was automatically generated by 'stack init'
2 | #
3 | # Some commonly used options have been documented as comments in this file.
4 | # For advanced use and comprehensive documentation of the format, please see:
5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
6 |
7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
8 | # A snapshot resolver dictates the compiler version and the set of packages
9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.4
21 |
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | # git: https://github.com/commercialhaskell/stack.git
30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | # subdirs:
33 | # - auto-update
34 | # - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 |
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 |
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 |
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 |
--------------------------------------------------------------------------------
/haskell/03_Brzozowski/test/Tests.hs:
--------------------------------------------------------------------------------
1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
2 | {-# LANGUAGE RecordWildCards #-}
3 |
4 | import Data.Foldable (for_)
5 | import Test.Hspec (Spec, describe, it, shouldBe)
6 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
7 |
8 | import BrzExp (Brz (..), accept)
9 |
10 | main :: IO ()
11 | main = hspecWith defaultConfig {configFastFail = True} specs
12 |
13 | specs :: Spec
14 | specs = describe "accept" $ for_ cases test
15 | where
16 | test Case {..} = it description assertion
17 | where
18 | assertion = accept regex sample `shouldBe` result
19 |
20 | data Case = Case
21 | { description :: String
22 | , regex :: Brz
23 | , sample :: String
24 | , result :: Bool
25 | }
26 |
27 | -- let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) )
28 | -- onec = Seq nocs (Sym 'c')
29 | -- evencs = Seq ( Rep ( Seq onec onec ) ) nocs
30 | -- as = Alt (Sym 'a') (Rep (Sym 'a'))
31 | -- bs = Alt (Sym 'b') (Rep (Sym 'b'))
32 | cases :: [Case]
33 | cases =
34 | [ Case {description = "empty", regex = Eps, sample = "", result = True}
35 | , Case {description = "null", regex = Emp, sample = "", result = False}
36 | , Case {description = "char", regex = Sym 'a', sample = "a", result = True}
37 | , Case
38 | {description = "not char", regex = Sym 'a', sample = "b", result = False}
39 | , Case
40 | { description = "char vs empty"
41 | , regex = Sym 'a'
42 | , sample = ""
43 | , result = False
44 | }
45 | , Case
46 | { description = "left alt"
47 | , regex = Alt (Sym 'a') (Sym 'b')
48 | , sample = "a"
49 | , result = True
50 | }
51 | , Case
52 | { description = "right alt"
53 | , regex = Alt (Sym 'a') (Sym 'b')
54 | , sample = "b"
55 | , result = True
56 | }
57 | , Case
58 | { description = "neither alt"
59 | , regex = Alt (Sym 'a') (Sym 'b')
60 | , sample = "c"
61 | , result = False
62 | }
63 | , Case
64 | { description = "empty alt"
65 | , regex = Alt (Sym 'a') (Sym 'b')
66 | , sample = ""
67 | , result = False
68 | }
69 | , Case
70 | { description = "empty rep"
71 | , regex = Rep (Sym 'a')
72 | , sample = ""
73 | , result = True
74 | }
75 | , Case
76 | { description = "one rep"
77 | , regex = Rep (Sym 'a')
78 | , sample = "a"
79 | , result = True
80 | }
81 | , Case
82 | { description = "multiple rep"
83 | , regex = Rep (Sym 'a')
84 | , sample = "aaaaaaaaa"
85 | , result = True
86 | }
87 | , Case
88 | { description = "multiple rep with failure"
89 | , regex = Rep (Sym 'a')
90 | , sample = "aaaaaaaaab"
91 | , result = False
92 | }
93 | , Case
94 | { description = "sequence"
95 | , regex = Seq (Sym 'a') (Sym 'b')
96 | , sample = "ab"
97 | , result = True
98 | }
99 | , Case
100 | { description = "sequence with empty"
101 | , regex = Seq (Sym 'a') (Sym 'b')
102 | , sample = ""
103 | , result = False
104 | }
105 | , Case
106 | { description = "bad short sequence"
107 | , regex = Seq (Sym 'a') (Sym 'b')
108 | , sample = "a"
109 | , result = False
110 | }
111 | , Case
112 | { description = "bad long sequence"
113 | , regex = Seq (Sym 'a') (Sym 'b')
114 | , sample = "abc"
115 | , result = False
116 | }
117 | ]
118 |
--------------------------------------------------------------------------------
/haskell/04_Gluskov/Glushkov.cabal:
--------------------------------------------------------------------------------
1 | cabal-version: 1.12
2 |
3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
4 | --
5 | -- see: https://github.com/sol/hpack
6 | --
7 | -- hash: 1a234ba3e4b3372f4e6f179bb337b813ee69faffc8b001f781636f1ba3d185e4
8 |
9 | name: Glushkov
10 | version: 0.1.0.0
11 | category: Regex
12 | homepage: https://github.com/elfsternberg/riggedregex#readme
13 | author: Elf M. Sternberg
14 | maintainer: elf.sternberg@gmail.com
15 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
16 | license: BSD3
17 | license-file: LICENSE
18 | build-type: Simple
19 | extra-source-files:
20 | README.md
21 |
22 | library
23 | exposed-modules:
24 | Glushkov
25 | other-modules:
26 | Paths_Glushkov
27 | hs-source-dirs:
28 | src
29 | ghc-options: -Wall
30 | build-depends:
31 | base
32 | default-language: Haskell2010
33 |
34 | test-suite test
35 | type: exitcode-stdio-1.0
36 | main-is: Tests.hs
37 | other-modules:
38 | Paths_Glushkov
39 | hs-source-dirs:
40 | test
41 | build-depends:
42 | Glushkov
43 | , base
44 | , hspec
45 | default-language: Haskell2010
46 |
--------------------------------------------------------------------------------
/haskell/04_Gluskov/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer
2 |
3 | All rights reserved.
4 |
5 | Redistribution and use in source and binary forms, with or without
6 | modification, are permitted provided that the following conditions are
7 | met:
8 |
9 | 1. Redistributions of source code must retain the above copyright
10 | notice, this list of conditions and the following disclaimer.
11 |
12 | 2. Redistributions in binary form must reproduce the above copyright
13 | notice, this list of conditions and the following disclaimer in
14 | the documentation and/or other materials provided with the
15 | distribution.
16 |
17 | 3. Neither the name of the author nor the names of his contributors
18 | may be used to endorse or promote products derived from this
19 | software without specific prior written permission.
20 |
21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
24 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR
25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
32 |
33 |
--------------------------------------------------------------------------------
/haskell/04_Gluskov/README.md:
--------------------------------------------------------------------------------
1 | # Glushkov Regular Expressions, in Haskell
2 |
3 | This is Glushkov's construction of regular expressions. The basic
4 | idea is that for every symbol encountered during parsing, a
5 | corresponding symbol in the tree is marked (or, if no symbols are
6 | marked, the parse is a failure). Composites are followed to their
7 | ends for each character, and if the symbol matches it is "marked".
8 |
9 | In this instance, we are passing a Glushkov regular expression tree,
10 | and for each character it returns a new, complete copy of the tree,
11 | only with the marks "shifted" to where they should be given the
12 | character. In this way, each iteration of the tree keeps the NFA
13 | list of states that are active; they are the paths that lead to
14 | marked symbols.
15 |
16 | 'final' here means that no more symbols have to be read to match
17 | the expression. 'empty' here means that the expression can match
18 | the empty string.
19 |
20 | 'final' is used here to determine whether the Glushkov expression
21 | passed in contains a marked symbol in a final position. This is
22 | used both to determine the end state of the expression, and in
23 | sequences to determine if the rightmost expression must be evaluated,
24 | that is, if we're currently going down a 'marked' path and the left
25 | expression can handle the empty string OR the left expression is
26 | final.
27 |
28 | The accept method is just a fold over the input string. The initial
29 | value is the shift of the first character, with the assumed mark of
30 | 'True' being included because we can always parse infinitely many
31 | empty strings before the sample begins. The returned value of that
32 | shift is our new regular expression, on which we then progressively
33 | call `shift False accg c`; here False means that we're only going to
34 | shift marks we've already found.
35 |
36 | The "trick" to understand this is to consider the string "ab" for
37 | the sequence "ab". The first time through, we start with True, and
38 | what gets marked is the symbol 'a'.
39 |
40 | When we pass the letter 'b', what happens? Well, the 'a' symbol
41 | will be unmarked (it didn't match the character), but the second
42 | part of the shift expression says that the left expression is final
43 | (it's a symbol and it's marked!), so we call `shift True (Sym False 'b')
44 | 'b'`, and the mark moves to the correct destination.
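
The "ab" walk-through above can be replayed as a small program. This is only a sketch: it repeats the `Glu` type and the `shift`, `empty` and `final` functions from `src/Glushkov.hs`, but the derived `Eq`/`Show` instances, the `steps` helper and the `ab` value are assumptions added for illustration and are not part of the module.

```haskell
module Main where

-- Same shape as the Glu type in src/Glushkov.hs; Eq and Show are derived
-- here (an addition) so intermediate trees can be printed and compared.
data Glu = Eps
         | Sym Bool Char
         | Alt Glu Glu
         | Seq Glu Glu
         | Rep Glu
         deriving (Eq, Show)

shift :: Bool -> Glu -> Char -> Glu
shift _ Eps _       = Eps
shift m (Sym _ x) c = Sym (m && x == c) x
shift m (Alt p q) c = Alt (shift m p c) (shift m q c)
shift m (Seq p q) c = Seq (shift m p c) (shift (m && empty p || final p) q c)
shift m (Rep r) c   = Rep (shift (m || final r) r c)

empty :: Glu -> Bool
empty Eps       = True
empty (Sym _ _) = False
empty (Alt p q) = empty p || empty q
empty (Seq p q) = empty p && empty q
empty (Rep _)   = True

final :: Glu -> Bool
final Eps       = False
final (Sym b _) = b
final (Alt p q) = final p || final q
final (Seq p q) = final p && empty q || final q
final (Rep r)   = final r

-- Hypothetical helper: the tree after each character has been read.
-- The first character is shifted with True, the rest with False.
steps :: Glu -> String -> [Glu]
steps _ []     = []
steps r (c:cs) = scanl (shift False) (shift True r c) cs

-- The sequence "ab", with every symbol initially unmarked.
ab :: Glu
ab = Seq (Sym False 'a') (Sym False 'b')

main :: IO ()
main = do
  mapM_ print (steps ab "ab")
  print (final (last (steps ab "ab")))
```

Run as written, this should print the tree with the mark on `'a'`, then the tree with the mark moved to `'b'`, then `True`, because the last tree has a marked symbol in its final position.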
45 |
46 | It continues to blow my mind that so much of mathematics can be directly
47 | translated into Haskell with no loss of fidelity.
48 |
--------------------------------------------------------------------------------
/haskell/04_Gluskov/package.yaml:
--------------------------------------------------------------------------------
1 | name: Glushkov
2 | version: 0.1.0.0
3 |
4 | homepage: https://github.com/elfsternberg/riggedregex#readme
5 | license: BSD3
6 | license-file: LICENSE
7 | author: Elf M. Sternberg
8 | maintainer: elf.sternberg@gmail.com
9 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
10 | category: Regex
11 | build-type: Simple
12 | extra-source-files: README.md
13 |
14 | dependencies:
15 | - base
16 |
17 | library:
18 | exposed-modules: Glushkov
19 | ghc-options: -Wall
20 | source-dirs: src
21 |
22 | tests:
23 | test:
24 | main: Tests.hs
25 | source-dirs: test
26 | dependencies:
27 | - Glushkov
28 | - hspec
29 |
30 |
--------------------------------------------------------------------------------
/haskell/04_Gluskov/src/Glushkov.hs:
--------------------------------------------------------------------------------
1 | module Glushkov (Glu (..), accept) where
2 |
3 | data Glu = Eps
4 | | Sym Bool Char
5 | | Alt Glu Glu
6 | | Seq Glu Glu
7 | | Rep Glu
8 |
9 | shift :: Bool -> Glu -> Char -> Glu
10 | shift _ Eps _ = Eps
11 | shift m (Sym _ x) c = Sym (m && x == c) x
12 | shift m (Alt p q) c = Alt (shift m p c) (shift m q c)
13 | shift m (Seq p q) c = Seq (shift m p c) (shift (m && empty p || final p) q c)
14 | shift m (Rep r) c = Rep (shift (m || final r) r c)
15 |
16 | empty :: Glu -> Bool
17 | empty Eps = True
18 | empty (Sym _ _) = False
19 | empty (Alt p q) = empty p || empty q
20 | empty (Seq p q) = empty p && empty q
21 | empty (Rep _) = True
22 |
23 | final :: Glu -> Bool
24 | final Eps = False
25 | final (Sym b _) = b
26 | final (Alt p q) = final p || final q
27 | final (Seq p q) = final p && empty q || final q
28 | final (Rep r) = final r
29 |
30 | accept :: Glu -> String -> Bool
31 | accept r [] = empty r
32 | accept r (c:cs) = final (foldl (shift False) (shift True r c) cs)
33 |
--------------------------------------------------------------------------------
/haskell/04_Gluskov/stack.yaml:
--------------------------------------------------------------------------------
1 | # This file was automatically generated by 'stack init'
2 | #
3 | # Some commonly used options have been documented as comments in this file.
4 | # For advanced use and comprehensive documentation of the format, please see:
5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
6 |
7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
8 | # A snapshot resolver dictates the compiler version and the set of packages
9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.5
21 |
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | # git: https://github.com/commercialhaskell/stack.git
30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | # subdirs:
33 | # - auto-update
34 | # - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 |
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 |
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 |
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 |
--------------------------------------------------------------------------------
/haskell/04_Gluskov/test/Tests.hs:
--------------------------------------------------------------------------------
1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
2 | {-# LANGUAGE RecordWildCards #-}
3 |
4 | import Data.Foldable (for_)
5 | import Test.Hspec (Spec, describe, it, shouldBe)
6 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
7 |
8 | import Glushkov (Glu (..), accept)
9 |
10 | main :: IO ()
11 | main = hspecWith defaultConfig {configFastFail = True} specs
12 |
13 | specs :: Spec
14 | specs = describe "accept" $ for_ cases test
15 | where
16 | test Case {..} = it description assertion
17 | where
18 | assertion = accept regex sample `shouldBe` result
19 |
20 | data Case = Case
21 | { description :: String
22 | , regex :: Glu
23 | , sample :: String
24 | , result :: Bool
25 | }
26 |
27 | -- let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) )
28 | -- onec = Seq nocs (Sym 'c')
29 | -- evencs = Seq ( Rep ( Seq onec onec ) ) nocs
30 | -- as = Alt (Sym 'a') (Rep (Sym 'a'))
31 | -- bs = Alt (Sym 'b') (Rep (Sym 'b'))
32 |
33 | cases :: [Case]
34 | cases =
35 | [ Case {description = "empty", regex = Eps, sample = "", result = True}
36 | , Case {description = "char", regex = Sym False 'a', sample = "a", result = True}
37 | , Case
38 | {description = "not char", regex = Sym False 'a', sample = "b", result = False}
39 | , Case
40 | { description = "char vs empty"
41 | , regex = Sym False 'a'
42 | , sample = ""
43 | , result = False
44 | }
45 | , Case
46 | { description = "left alt"
47 | , regex = Alt (Sym False 'a') (Sym False 'b')
48 | , sample = "a"
49 | , result = True
50 | }
51 | , Case
52 | { description = "right alt"
53 | , regex = Alt (Sym False 'a') (Sym False 'b')
54 | , sample = "b"
55 | , result = True
56 | }
57 | , Case
58 | { description = "neither alt"
59 | , regex = Alt (Sym False 'a') (Sym False 'b')
60 | , sample = "c"
61 | , result = False
62 | }
63 | , Case
64 | { description = "empty alt"
65 | , regex = Alt (Sym False 'a') (Sym False 'b')
66 | , sample = ""
67 | , result = False
68 | }
69 | , Case
70 | { description = "empty rep"
71 | , regex = Rep (Sym False 'a')
72 | , sample = ""
73 | , result = True
74 | }
75 | , Case
76 | { description = "one rep"
77 | , regex = Rep (Sym False 'a')
78 | , sample = "a"
79 | , result = True
80 | }
81 | , Case
82 | { description = "multiple rep"
83 | , regex = Rep (Sym False 'a')
84 | , sample = "aaaaaaaaa"
85 | , result = True
86 | }
87 | , Case
88 | { description = "multiple rep with failure"
89 | , regex = Rep (Sym False 'a')
90 | , sample = "aaaaaaaaab"
91 | , result = False
92 | }
93 | , Case
94 | { description = "sequence"
95 | , regex = Seq (Sym False 'a') (Sym False 'b')
96 | , sample = "ab"
97 | , result = True
98 | }
99 | , Case
100 | { description = "sequence with empty"
101 | , regex = Seq (Sym False 'a') (Sym False 'b')
102 | , sample = ""
103 | , result = False
104 | }
105 | , Case
106 | { description = "bad short sequence"
107 | , regex = Seq (Sym False 'a') (Sym False 'b')
108 | , sample = "a"
109 | , result = False
110 | }
111 | , Case
112 | { description = "bad long sequence"
113 | , regex = Seq (Sym False 'a') (Sym False 'b')
114 | , sample = "abc"
115 | , result = False
116 | }
117 | ]
118 |
--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/LICENSE:
--------------------------------------------------------------------------------
1 | The Brzozowski experiments are original work, and are copyright and
2 | licensed by Elf M. Sternberg according to the Mozilla 2.0 Public
3 | License. See the LICENSE.md file in the main directory
4 |
--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/README.md:
--------------------------------------------------------------------------------
1 | # Rigged Brzozowski Regular Expressions, in Haskell
2 |
3 | This is the naive implementation of Brzozowski's Algorithm, but with a
4 | Semiring implementation for gathering complex information from the parse
5 | process. This implementation is "naive" in that it saves everything,
6 | including the very large number of dead branches that hang off Sequence
7 | processing, and then discards them at the very end of the process.
8 |
9 | This implementation finally proves to me something that I've been trying
10 | to express for a while: Might, Adams, et al.'s implementations of tree
11 | parsing *are* Semiring implementations, they just don't call it that,
12 | but the fundamental underlying operations are the same.
13 |
14 | I'm fascinated by the lack of the nullability operator. Instead, it's
15 | just resolved by Emp being parsed as `zero` and Eps as `one * s` where
16 | `s` is the product of the previous operation, and then the new `Delta`
17 | operator preserves this semantic, using multiplicative annihilation to
18 | discard false parses while also being immune to the `Sequence` semantic
19 | that destroys successful parse history.
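
A condensed sketch of that behaviour, using the `Int` semiring to count parses. It follows the shapes of `Brzr`, `deriver` and `parsenull` in `src/RiggedBrz.hs`, with some simplifications that are assumptions of this example: the `Altr` derivative does no pruning, only the `Int` instance is given, and `sym`, `aOrAs` and `main` are illustrative names.

```haskell
module Main where

class Semiring s where
  zero, one :: s
  add, mul  :: s -> s -> s

-- Counting semiring: add sums alternative parses, mul multiplies them.
instance Semiring Int where
  zero = 0
  one  = 1
  add  = (+)
  mul  = (*)

data Brzr c s = Empr
              | Epsr s
              | Delr (Brzr c s)
              | Symr (c -> s)
              | Altr (Brzr c s) (Brzr c s)
              | Seqr (Brzr c s) (Brzr c s)
              | Repr (Brzr c s)

-- Derivative with respect to one input token. In the Seqr case, Delr keeps
-- the null-parse value of the left side alive after the left side is consumed.
deriver :: Brzr c s -> c -> Brzr c s
deriver Empr _       = Empr
deriver (Epsr _) _   = Empr
deriver (Delr _) _   = Empr
deriver (Symr f) u   = Epsr (f u)
deriver (Seqr l r) u = Altr (Seqr (deriver l u) r) (Seqr (Delr l) (deriver r u))
deriver (Altr l r) u = Altr (deriver l u) (deriver r u)  -- no pruning in this sketch
deriver (Repr r) u   = Seqr (deriver r u) (Repr r)

-- Collapse what is left of the expression into a semiring value.
-- Any zero under a mul annihilates that branch.
parsenull :: Semiring s => Brzr c s -> s
parsenull Empr       = zero
parsenull (Symr _)   = zero
parsenull (Repr _)   = one
parsenull (Epsr s)   = s
parsenull (Delr s)   = parsenull s
parsenull (Altr p q) = parsenull p `add` parsenull q
parsenull (Seqr p q) = parsenull p `mul` parsenull q

parse :: Semiring s => Brzr Char s -> String -> s
parse w []     = parsenull w
parse w (c:cs) = parse (deriver w c) cs

sym :: Semiring s => Char -> Brzr Char s
sym c = Symr (\b -> if b == c then one else zero)

-- a | a*  (the same shape as `as` in test/Tests.hs)
aOrAs :: Brzr Char Int
aOrAs = Altr (sym 'a') (Repr (sym 'a'))

main :: IO ()
main = print [parse aOrAs "a", parse aOrAs "aa", parse aOrAs "b"]
```

Under these assumptions "a" should count two parses (one per branch of the alternation), "aa" one (only the repetition survives), and "b" none, since every branch then contains a `zero` under a `mul`.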
20 |
21 | This can't last. And Might admits it doesn't last. Darais's
22 | implementation goes back to having a separate function for nullability
23 | that both preserves the status of known-nullable expressions and handles
24 | recursion. Darais's version also implements an incredible number of
25 | optimizations to prune, compact, and process the parse tree early,
26 | enabling a number of speedups and caching strategies that get you within
27 | spitting distance of RE2.
28 |
29 |
30 |
--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/RiggedBrz.cabal:
--------------------------------------------------------------------------------
1 | cabal-version: 1.12
2 |
3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
4 | --
5 | -- see: https://github.com/sol/hpack
6 | --
7 | -- hash: 1c0fd9015f1269af8c7445bf2af4c10b835d053b9ddce4a3df3815fb4724e489
8 |
9 | name: RiggedBrz
10 | version: 0.1.0.0
11 | category: Regex
12 | homepage: https://github.com/elfsternberg/riggedregex#readme
13 | author: Elf M. Sternberg
14 | maintainer: elf.sternberg@gmail.com
15 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
16 | license: BSD3
17 | license-file: LICENSE
18 | build-type: Simple
19 | extra-source-files:
20 | README.md
21 |
22 | library
23 | exposed-modules:
24 | RiggedBrz
25 | other-modules:
26 | Paths_RiggedBrz
27 | hs-source-dirs:
28 | src
29 | ghc-options: -Wall
30 | build-depends:
31 | base
32 | , containers
33 | default-language: Haskell2010
34 |
35 | test-suite test
36 | type: exitcode-stdio-1.0
37 | main-is: Tests.hs
38 | other-modules:
39 | Paths_RiggedBrz
40 | hs-source-dirs:
41 | test
42 | build-depends:
43 | RiggedBrz
44 | , base
45 | , containers
46 | , hspec
47 | default-language: Haskell2010
48 |
--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/Setup.hs:
--------------------------------------------------------------------------------
1 | import Distribution.Simple
2 | main = defaultMain
3 |
--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/package.yaml:
--------------------------------------------------------------------------------
1 | name: RiggedBrz
2 | version: 0.1.0.0
3 |
4 | homepage: https://github.com/elfsternberg/riggedregex#readme
5 | license: BSD3
6 | license-file: LICENSE
7 | author: Elf M. Sternberg
8 | maintainer: elf.sternberg@gmail.com
9 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
10 | category: Regex
11 | build-type: Simple
12 | extra-source-files: README.md
13 |
14 | dependencies:
15 | - containers
16 | - base
17 |
18 | library:
19 | exposed-modules: RiggedBrz
20 | ghc-options: -Wall
21 | source-dirs: src
22 |
23 | tests:
24 | test:
25 | main: Tests.hs
26 | source-dirs: test
27 | dependencies:
28 | - RiggedBrz
29 | - hspec
30 |
31 |
--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/src/RiggedBrz.hs:
--------------------------------------------------------------------------------
1 | {-# LANGUAGE LambdaCase #-}
2 | {-# LANGUAGE FlexibleInstances #-}
3 |
4 | module RiggedBrz ( Brz (..), parse, rigged, riggeds ) where
5 |
6 | import Data.Set
7 |
8 | data Brz = Emp | Eps | Sym Char | Alt Brz Brz | Seq Brz Brz | Rep Brz deriving (Eq)
9 |
10 | -- Transform a Brz into a Brzr. That's all it does. It's not magical.
11 |
12 | rigging :: Semiring s => (Char -> Brzr Char s) -> Brz -> Brzr Char s
13 | rigging s = \case
14 | Emp -> Empr
15 | Eps -> Epsr one
16 | (Sym c) -> s c
17 | (Alt p q) -> Altr (rigging s p) (rigging s q)
18 | (Seq p q) -> Seqr (rigging s p) (rigging s q)
19 | (Rep r) -> Repr (rigging s r)
20 |
21 | class Semiring s where
22 | zero, one :: s
23 | mul, add :: s -> s -> s
24 |
25 | data Brzr c s = Empr
26 | | Epsr s
27 | | Delr (Brzr c s)
28 | | Symr (c -> s)
29 | | Altr (Brzr c s) (Brzr c s)
30 | | Seqr (Brzr c s) (Brzr c s)
31 | | Repr (Brzr c s)
32 |
33 | deriver :: Semiring s => Brzr c s -> c -> Brzr c s
34 | deriver Empr _ = Empr
35 | deriver (Epsr _) _ = Empr
36 | deriver (Delr _) _ = Empr
37 | deriver (Symr f) u = Epsr $ (f u)
38 |
39 | deriver (Seqr l r) u =
40 | Altr dl dr
41 | where
42 | dl = Seqr (deriver l u) r
43 | dr = Seqr (Delr l) (deriver r u)
44 |
45 | deriver (Altr l r) u = go (deriver l u) (deriver r u)
46 | where go Empr r1 = r1
47 | go r1 Empr = r1
48 | go l1 r1 = Altr l1 r1
49 |
50 | deriver (Repr r) u = Seqr (deriver r u) (Repr r)
51 |
52 | parsenull :: Semiring s => (Brzr c s) -> s
53 | parsenull Empr = zero
54 | parsenull (Symr _) = zero
55 | parsenull (Repr _) = one
56 | parsenull (Epsr s) = s
57 | parsenull (Delr s) = parsenull s
58 | parsenull (Altr p q) = parsenull p `add` parsenull q
59 | parsenull (Seqr p q) = parsenull p `mul` parsenull q
60 |
61 | instance Semiring Int where
62 | zero = 0
63 | one = 1
64 | add = (Prelude.+)
65 | mul = (Prelude.*)
66 |
67 | instance Semiring Bool where
68 | zero = False
69 | one = True
70 | add = (||)
71 | mul = (&&)
72 |
73 | instance Semiring (Set String) where
74 | zero = empty
75 | one = singleton ""
76 | add = union
77 | mul a b = Data.Set.map (uncurry (++)) $ cartesianProduct a b
78 |
79 | -- Rigging for boolean and integer values.
80 |
81 | sym :: Semiring s => Char -> Brzr Char s
82 | sym c = Symr (\b -> if b == c then one else zero)
83 |
84 | rigged :: Semiring s => Brz -> Brzr Char s
85 | rigged = rigging sym
86 |
87 | -- Rigging for parse forests
88 |
89 | syms :: Char -> Brzr Char (Set String)
90 | syms c = Symr (\b -> if b == c then singleton [c] else zero)
91 |
92 | riggeds :: Brz -> Brzr Char (Set String)
93 | riggeds = rigging syms
94 |
95 | parse :: (Semiring s) => (Brzr Char s) -> String -> s
96 | parse w [] = parsenull w
97 | parse w (c:cs) = parse (deriver w c) cs
98 |
99 |
--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/stack.yaml:
--------------------------------------------------------------------------------
1 | # This file was automatically generated by 'stack init'
2 | #
3 | # Some commonly used options have been documented as comments in this file.
4 | # For advanced use and comprehensive documentation of the format, please see:
5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
6 |
7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
8 | # A snapshot resolver dictates the compiler version and the set of packages
9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.5
21 |
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | # git: https://github.com/commercialhaskell/stack.git
30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | # subdirs:
33 | # - auto-update
34 | # - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | extra-deps:
41 | - containers-0.6.0.1
42 |
43 | # Override default flag values for local packages and extra-deps
44 | # flags: {}
45 |
46 | # Extra package databases containing global packages
47 | # extra-package-dbs: []
48 |
49 | # Control whether we use the GHC we find on the path
50 | # system-ghc: true
51 | #
52 | # Require a specific version of stack, using version ranges
53 | # require-stack-version: -any # Default
54 | # require-stack-version: ">=1.9"
55 | #
56 | # Override the architecture used by stack, especially useful on Windows
57 | # arch: i386
58 | # arch: x86_64
59 | #
60 | # Extra directories used by stack for building
61 | # extra-include-dirs: [/path/to/dir]
62 | # extra-lib-dirs: [/path/to/dir]
63 | #
64 | # Allow a newer minor version of GHC than the snapshot specifies
65 | # compiler-check: newer-minor
66 |
--------------------------------------------------------------------------------
/haskell/05_RiggedBrz/test/Tests.hs:
--------------------------------------------------------------------------------
1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
2 | {-# LANGUAGE RecordWildCards #-}
3 | import Data.Foldable (for_)
4 | import Test.Hspec (Spec, it, shouldBe)
5 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
6 | import RiggedBrz ( Brz (..), parse, rigged, riggeds )
7 | import Data.Set
8 | import Data.List (sort)
9 |
10 | main :: IO ()
11 | main = hspecWith defaultConfig {configFastFail = True} specs
12 |
13 | specs :: Spec
14 | specs = do
15 |
16 | let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) )
17 | let onec = Seq nocs (Sym 'c')
18 | let evencs = Seq ( Rep ( Seq onec onec ) ) nocs
19 |
20 | let as = Alt (Sym 'a') (Rep (Sym 'a'))
21 | let bs = Alt (Sym 'b') (Rep (Sym 'b'))
22 |
23 | it "lifted expression" $
24 | (parse (rigged evencs) "acc" :: Bool) `shouldBe` True
25 |
26 | it "lifted expression short" $
27 | (parse (rigged evencs) "acc" :: Int) `shouldBe` 1
28 |
29 | it "lifted expression counter two" $
30 | (parse (rigged as) "a" :: Int) `shouldBe` 2
31 |
32 | it "lifted expression counter one" $
33 | (parse (rigged as) "aa" :: Int) `shouldBe` 1
34 |
35 | it "lifted expression dynamic counter four" $
36 | (parse (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4
37 |
38 | it "parse forests" $
39 | (sort $ toList $ (parse (riggeds (Seq as bs)) "ab" :: Set String)) `shouldBe` ["ab"]
40 |
41 | for_ cases test
42 | where
43 | test Case {..} = it description assertion
44 | where
45 | assertion = (parse (rigged regex) sample :: Bool) `shouldBe` result
46 |
47 | data Case = Case
48 | { description :: String
49 | , regex :: Brz
50 | , sample :: String
51 | , result :: Bool
52 | }
53 |
54 | cases :: [Case]
55 | cases =
56 | [ Case {description = "empty", regex = Eps, sample = "", result = True}
57 | , Case {description = "char", regex = Sym 'a', sample = "a", result = True}
58 | , Case
59 | {description = "not char", regex = Sym 'a', sample = "b", result = False}
60 | , Case
61 | { description = "char vs empty"
62 | , regex = Sym 'a'
63 | , sample = ""
64 | , result = False
65 | }
66 | , Case
67 | { description = "left alt"
68 | , regex = Alt (Sym 'a') (Sym 'b')
69 | , sample = "a"
70 | , result = True
71 | }
72 | , Case
73 | { description = "right alt"
74 | , regex = Alt (Sym 'a') (Sym 'b')
75 | , sample = "b"
76 | , result = True
77 | }
78 | , Case
79 | { description = "neither alt"
80 | , regex = Alt (Sym 'a') (Sym 'b')
81 | , sample = "c"
82 | , result = False
83 | }
84 | , Case
85 | { description = "empty alt"
86 | , regex = Alt (Sym 'a') (Sym 'b')
87 | , sample = ""
88 | , result = False
89 | }
90 | , Case
91 | { description = "empty rep"
92 | , regex = Rep (Sym 'a')
93 | , sample = ""
94 | , result = True
95 | }
96 | , Case
97 | { description = "one rep"
98 | , regex = Rep (Sym 'a')
99 | , sample = "a"
100 | , result = True
101 | }
102 | , Case
103 | { description = "multiple rep"
104 | , regex = Rep (Sym 'a')
105 | , sample = "aaaaaaaaa"
106 | , result = True
107 | }
108 | , Case
109 | { description = "multiple rep with failure"
110 | , regex = Rep (Sym 'a')
111 | , sample = "aaaaaaaaab"
112 | , result = False
113 | }
114 | , Case
115 | { description = "sequence"
116 | , regex = Seq (Sym 'a') (Sym 'b')
117 | , sample = "ab"
118 | , result = True
119 | }
120 | , Case
121 | { description = "sequence with empty"
122 | , regex = Seq (Sym 'a') (Sym 'b')
123 | , sample = ""
124 | , result = False
125 | }
126 | , Case
127 | { description = "bad short sequence"
128 | , regex = Seq (Sym 'a') (Sym 'b')
129 | , sample = "a"
130 | , result = False
131 | }
132 | , Case
133 | { description = "bad long sequence"
134 | , regex = Seq (Sym 'a') (Sym 'b')
135 | , sample = "abc"
136 | , result = False
137 | }
138 | ]
139 |
140 |
141 |
--------------------------------------------------------------------------------
/haskell/06_RiggedRegex_Combinator/LICENSE:
--------------------------------------------------------------------------------
1 | The Brzozowski experiments are original work, and are copyright and
2 | licensed by Elf M. Sternberg according to the Mozilla 2.0 Public
3 | License. See the LICENSE.md file in the main directory
4 |
--------------------------------------------------------------------------------
/haskell/06_RiggedRegex_Combinator/README.md:
--------------------------------------------------------------------------------
1 | # Kleene Regular Expressions with Rigging, in Haskell
2 |
3 | This variant takes the RiggedRegex (Version 02) and provides a third
4 | Semiring, `Semiring (Set String)`. `Zero` is the empty set, `One` is
5 | the set holding only the empty string, `Add` is union and `Mul` is the
6 | concatenation of each pair generated by the cartesian product. The
7 | `sym` function is now modified to return `Zero` on failure, or on
8 | success a Set containing the recognized character as a string.
9 |
10 | The union of any set with the empty set is the set; the cartesian
11 | product of any set with the empty set is the empty set; the
12 | concatenation of the empty string with any set of strings is that set of
13 | strings, so the Semiring properties hold.
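
Those identities can be spot-checked on a sample set. The sketch below is not the module's code: the `Semiring` class is restated locally with lowercase method names, `mul` is written as a list comprehension rather than with `cartesianProduct`, and `sample`, `lawsHold` and `main` are hypothetical names. Checking one sample illustrates the laws; it does not prove them.

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main where

import           Data.Set (Set)
import qualified Data.Set as Set

class Semiring s where
  zero, one :: s
  add, mul  :: s -> s -> s

instance Semiring (Set String) where
  zero    = Set.empty
  one     = Set.singleton ""
  add     = Set.union
  -- Concatenate every pair drawn from the cartesian product of a and b.
  mul a b = Set.fromList [x ++ y | x <- Set.toList a, y <- Set.toList b]

sample :: Set String
sample = Set.fromList ["a", "ab"]

-- Identity and annihilation laws, checked against the sample set only.
lawsHold :: Bool
lawsHold = and
  [ add zero sample == sample   -- union with the empty set
  , mul one sample  == sample   -- "" is a left identity for concatenation
  , mul sample one  == sample   -- and a right identity
  , mul zero sample == zero     -- the empty set annihilates on the left
  , mul sample zero == zero     -- and on the right
  ]

main :: IO ()
main = do
  print lawsHold
  print (Set.toList (mul sample (Set.fromList ["c"])))
```

The second line printed lists the strings produced by multiplying `sample` by the one-element set `{"c"}`, that is, each sample string with `c` appended.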
14 |
15 | The result is a regular expression engine that returns the set of
16 | all unique strings that resulted from matching the regular
17 | expression, or the empty set if no match happened.
18 |
19 | I'm not yet comfortable with the theoretical underpinnings of this
20 | variant, but I'm reading intensely to see where I can land this.
21 |
22 | It turns out that what I did is just fine, and is well-supported by the
23 | theoretical underpinnings. See "[Semiring
24 | Parsing](https://www.aclweb.org/anthology/J99-4004.pdf)" by Joshua
25 | Goodman.
26 |
27 |
28 |
--------------------------------------------------------------------------------
/haskell/06_RiggedRegex_Combinator/RiggedRegex.cabal:
--------------------------------------------------------------------------------
1 | cabal-version: 1.12
2 |
3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
4 | --
5 | -- see: https://github.com/sol/hpack
6 | --
7 | -- hash: 24886bf51ff45652f17f1174185b977a916ba0794b24fee1315723e119dc204a
8 |
9 | name: RiggedRegex
10 | version: 0.1.0.0
11 | category: Regex
12 | homepage: https://github.com/elfsternberg/riggedregex#readme
13 | author: Elf M. Sternberg
14 | maintainer: elf.sternberg@gmail.com
15 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
16 | license: BSD3
17 | license-file: LICENSE
18 | build-type: Simple
19 | extra-source-files:
20 | README.md
21 |
22 | library
23 | exposed-modules:
24 | RiggedRegex
25 | other-modules:
26 | Paths_RiggedRegex
27 | hs-source-dirs:
28 | src
29 | ghc-options: -Wall
30 | build-depends:
31 | base
32 | , containers
33 | default-language: Haskell2010
34 |
35 | test-suite test
36 | type: exitcode-stdio-1.0
37 | main-is: Tests.hs
38 | other-modules:
39 | Paths_RiggedRegex
40 | hs-source-dirs:
41 | test
42 | build-depends:
43 | RiggedRegex
44 | , base
45 | , containers
46 | , hspec
47 | default-language: Haskell2010
48 |
--------------------------------------------------------------------------------
/haskell/06_RiggedRegex_Combinator/package.yaml:
--------------------------------------------------------------------------------
1 | name: RiggedRegex
2 | version: 0.1.0.0
3 |
4 | homepage: https://github.com/elfsternberg/riggedregex#readme
5 | license: BSD3
6 | license-file: LICENSE
7 | author: Elf M. Sternberg
8 | maintainer: elf.sternberg@gmail.com
9 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
10 | category: Regex
11 | build-type: Simple
12 | extra-source-files: README.md
13 |
14 | dependencies:
15 | - containers
16 | - base
17 |
18 | library:
19 | exposed-modules: RiggedRegex
20 | ghc-options: -Wall
21 | source-dirs: src
22 |
23 | tests:
24 | test:
25 | main: Tests.hs
26 | source-dirs: test
27 | dependencies:
28 | - RiggedRegex
29 | - hspec
30 |
31 |
32 |
--------------------------------------------------------------------------------
/haskell/06_RiggedRegex_Combinator/src/RiggedRegex.hs:
--------------------------------------------------------------------------------
1 | {-# LANGUAGE LambdaCase #-}
2 | {-# LANGUAGE FlexibleInstances #-}
3 |
4 | module RiggedRegex ( accept, acceptw, Reg (..), Regw (..), rigged, riggeds ) where
5 |
6 | import Data.Set hiding (split)
7 |
8 | data Reg =
9 | Eps -- Epsilon
10 | | Sym Char -- Character
11 | | Alt Reg Reg -- Alternation
12 | | Seq Reg Reg -- Sequence
13 | | Rep Reg -- R*
14 |
15 | accept :: Reg -> String -> Bool
16 | -- Epsilon
17 | accept Eps u = Prelude.null u
18 | -- Accept if the character offered matches the character constructed
19 | accept (Sym c) u = u == [c]
20 | -- Constructed of two other expressions, accept if either one does.
21 | accept (Alt p q) u = accept p u || accept q u
22 | -- Constructed of two other expressions, accept if p accepts some part
23 | -- of u and q accepts the rest, where u is split arbitrarily
24 | accept (Seq p q) u = or [accept p u1 && accept q u2 | (u1, u2) <- split u]
25 | -- Enumerate every way of splitting u into non-empty pieces; accept
26 | -- if, for at least one such split, every piece of that split is
27 | -- accepted by r.
28 | accept (Rep r) u = or [and [accept r ui | ui <- ps] | ps <- parts u]
29 |
30 | -- Generate a list of all possible combinations of a prefix and suffix
31 | -- for the string offered.
32 | split :: [a] -> [([a], [a])]
33 | split [] = [([], [])]
34 | split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs]
35 |
36 | -- Generate every way of splitting the input into a list of
37 | -- consecutive pieces, where no piece is the empty string.
38 | parts :: [a] -> [[[a]]]
39 | parts [] = [[]]
40 | parts [c] = [[[c]]]
41 | parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
42 |
43 | -- A semiring is an algebraic structure with a zero, a one, a
44 | -- "multiplication" operation, and an "addition" operation. Zero is
45 | -- the identity element for addition, one is the identity element for
46 | -- multiplication, both operations are associative (it does not
47 | -- matter how sequential operations are grouped), and addition is
48 | -- commutative (the order of the operands does not matter). Also,
49 | -- mul distributes over add, and zero `mul` anything is always zero.
50 | --
51 | -- For regular expressions, this means the null regex is zero, the
52 | -- empty-string regex is one, alternation is addition, and sequence
53 | -- is multiplication, much like "sum" and "product" types.
54 |
55 | -- Symw (c -> s) represents a mapping from a symbol to its given weight.
56 |
57 | class Semiring s where
58 | zero, one :: s
59 | mul, add :: s -> s -> s
60 |
61 | sym :: Semiring s => Char -> Regw Char s
62 | sym c = Symw (\b -> if b == c then one else zero)
63 |
64 | data Regw c s =
65 | Epsw -- Epsilon
66 | | Symw (c -> s) -- Character
67 | | Altw (Regw c s) (Regw c s) -- Alternation
68 | | Seqw (Regw c s) (Regw c s) -- Sequence
69 | | Repw (Regw c s) -- R*
70 |
71 | rigging :: Semiring s => (Char -> Regw Char s) -> Reg -> Regw Char s
72 | rigging s = \case
73 | Eps -> Epsw
74 | (Sym c) -> s c
75 | (Alt p q) -> Altw (rigging s p) (rigging s q)
76 | (Seq p q) -> Seqw (rigging s p) (rigging s q)
77 | (Rep r) -> Repw (rigging s r)
78 |
79 | rigged :: Semiring s => Reg -> Regw Char s
80 | rigged = rigging sym
81 |
82 | acceptw :: Semiring s => Regw c s -> [c] -> s
83 | acceptw Epsw u = if Prelude.null u then one else zero
84 | acceptw (Symw f) u =
85 | case u of
86 | [c] -> f c
87 | _ -> zero
88 | acceptw (Altw p q) u = acceptw p u `add` acceptw q u
89 | acceptw (Seqw p q) u = sumr [ acceptw p u1 `mul` acceptw q u2 | (u1, u2) <- split u ]
90 | acceptw (Repw r) u = sumr [ prodr [ acceptw r ui | ui <- ps ] | ps <- parts u ]
91 |
92 | -- Something feels hacky about this. I mean, I know, on the one
93 | -- hand that any epsilon is still "one" as far as the system is
94 | -- concerned; on the other hand, I would much rather have a better
95 | -- theoretical ground for what I just did here...
96 |
97 | syms :: Char -> Regw Char (Set String)
98 | syms c = Symw (\b -> if b == c then singleton [c] else zero)
99 |
100 | riggeds :: Reg -> Regw Char (Set String)
101 | riggeds = rigging syms
102 |
103 | sumr, prodr :: Semiring r => [r] -> r
104 | sumr = Prelude.foldr add zero
105 | prodr = Prelude.foldr mul one
106 |
107 | instance Semiring Int where
108 | zero = 0
109 | one = 1
110 | add = (Prelude.+)
111 | mul = (Prelude.*)
112 |
113 | instance Semiring Bool where
114 | zero = False
115 | one = True
116 | add = (||)
117 | mul = (&&)
118 |
119 | -- εs = {(ε, s)} Empty Word
120 | -- c = {(c, c)} Token
121 | -- L1 ◦ L2 = {(uv,(s, t)) | (u, s) ∈ L1 and (v, t) ∈ L2} Concatenation
122 | -- L1 ∪ L2 = {(u, s) | (u, s) ∈ L1 or (u, s) ∈ L2} Alternation
123 |
124 | -- Boolean Semiring (TRUE, FALSE,∨,∧, FALSE, TRUE) recognition
125 | -- Inside Semiring (R(1/0), +, ×, 0, 1) string probability
126 | -- Counting Semiring (N(∞/0), +, ×, 0, 1) number of derivations
127 | -- Derivation Forests Semiring (2^E, ∪, ·, ∅, {<>}) set of derivations
128 |
129 | instance Semiring (Set String) where
130 | zero = empty
131 | one = singleton ""
132 | add = union
133 | mul a b = Data.Set.map (uncurry (++)) $ cartesianProduct a b
134 |
--------------------------------------------------------------------------------
/haskell/06_RiggedRegex_Combinator/stack.yaml:
--------------------------------------------------------------------------------
1 | # This file was automatically generated by 'stack init'
2 | #
3 | # Some commonly used options have been documented as comments in this file.
4 | # For advanced use and comprehensive documentation of the format, please see:
5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
6 |
7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
8 | # A snapshot resolver dictates the compiler version and the set of packages
9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.4
21 |
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | # git: https://github.com/commercialhaskell/stack.git
30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | # subdirs:
33 | # - auto-update
34 | # - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 |
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 |
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 |
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 |
--------------------------------------------------------------------------------
/haskell/06_RiggedRegex_Combinator/test/Tests.hs:
--------------------------------------------------------------------------------
1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
2 | {-# LANGUAGE RecordWildCards #-}
3 | import Data.Foldable (for_)
4 | import Test.Hspec (Spec, it, shouldBe)
5 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
6 | import RiggedRegex (Reg (..), accept, acceptw, rigged, riggeds)
7 | import Data.Set
8 | import Data.List (sort)
9 |
10 | main :: IO ()
11 | main = hspecWith defaultConfig {configFastFail = True} specs
12 |
13 | specs :: Spec
14 | specs = do
15 |
16 | let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) )
17 | let onec = Seq nocs (Sym 'c')
18 | let evencs = Seq ( Rep ( Seq onec onec ) ) nocs
19 |
20 | let as = Alt (Sym 'a') (Rep (Sym 'a'))
21 | let bs = Alt (Sym 'b') (Rep (Sym 'b'))
22 |
23 | it "simple expression" $
24 | accept evencs "acc" `shouldBe` True
25 |
26 | it "lifted expression" $
27 | (acceptw (rigged evencs) "acc" :: Bool) `shouldBe` True
28 |
29 | it "lifted expression short" $
30 | (acceptw (rigged evencs) "acc" :: Int) `shouldBe` 1
31 |
32 | it "lifted expression counter two" $
33 | (acceptw (rigged as) "a" :: Int) `shouldBe` 2
34 |
35 | it "lifted expression counter one" $
36 | (acceptw (rigged as) "aa" :: Int) `shouldBe` 1
37 |
38 | it "lifted expression dynamic counter four" $
39 | (acceptw (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4
40 |
41 | it "parse forests" $
42 | (sort $ toList $ (acceptw (riggeds (Seq as bs)) "ab" :: Set String)) `shouldBe` ["ab"]
43 |
44 | for_ cases test
45 | where
46 | test Case {..} = it description assertion
47 | where
48 | assertion = (acceptw (rigged regex) sample :: Bool) `shouldBe` result
49 |
50 | data Case = Case
51 | { description :: String
52 | , regex :: Reg
53 | , sample :: String
54 | , result :: Bool
55 | }
56 |
57 | cases :: [Case]
58 | cases =
59 | [ Case {description = "empty", regex = Eps, sample = "", result = True}
60 | , Case {description = "char", regex = Sym 'a', sample = "a", result = True}
61 | , Case
62 | {description = "not char", regex = Sym 'a', sample = "b", result = False}
63 | , Case
64 | { description = "char vs empty"
65 | , regex = Sym 'a'
66 | , sample = ""
67 | , result = False
68 | }
69 | , Case
70 | { description = "left alt"
71 | , regex = Alt (Sym 'a') (Sym 'b')
72 | , sample = "a"
73 | , result = True
74 | }
75 | , Case
76 | { description = "right alt"
77 | , regex = Alt (Sym 'a') (Sym 'b')
78 | , sample = "b"
79 | , result = True
80 | }
81 | , Case
82 | { description = "neither alt"
83 | , regex = Alt (Sym 'a') (Sym 'b')
84 | , sample = "c"
85 | , result = False
86 | }
87 | , Case
88 | { description = "empty alt"
89 | , regex = Alt (Sym 'a') (Sym 'b')
90 | , sample = ""
91 | , result = False
92 | }
93 | , Case
94 | { description = "empty rep"
95 | , regex = Rep (Sym 'a')
96 | , sample = ""
97 | , result = True
98 | }
99 | , Case
100 | { description = "one rep"
101 | , regex = Rep (Sym 'a')
102 | , sample = "a"
103 | , result = True
104 | }
105 | , Case
106 | { description = "multiple rep"
107 | , regex = Rep (Sym 'a')
108 | , sample = "aaaaaaaaa"
109 | , result = True
110 | }
111 | , Case
112 | { description = "multiple rep with failure"
113 | , regex = Rep (Sym 'a')
114 | , sample = "aaaaaaaaab"
115 | , result = False
116 | }
117 | , Case
118 | { description = "sequence"
119 | , regex = Seq (Sym 'a') (Sym 'b')
120 | , sample = "ab"
121 | , result = True
122 | }
123 | , Case
124 | { description = "sequence with empty"
125 | , regex = Seq (Sym 'a') (Sym 'b')
126 | , sample = ""
127 | , result = False
128 | }
129 | , Case
130 | { description = "bad short sequence"
131 | , regex = Seq (Sym 'a') (Sym 'b')
132 | , sample = "a"
133 | , result = False
134 | }
135 | , Case
136 | { description = "bad long sequence"
137 | , regex = Seq (Sym 'a') (Sym 'b')
138 | , sample = "abc"
139 | , result = False
140 | }
141 | ]
142 |
143 |
144 |
--------------------------------------------------------------------------------
/haskell/07_Rigged_Glushkov/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer
2 |
3 | All rights reserved.
4 |
5 | Redistribution and use in source and binary forms, with or without
6 | modification, are permitted provided that the following conditions are
7 | met:
8 |
9 | 1. Redistributions of source code must retain the above copyright
10 | notice, this list of conditions and the following disclaimer.
11 |
12 | 2. Redistributions in binary form must reproduce the above copyright
13 | notice, this list of conditions and the following disclaimer in
14 | the documentation and/or other materials provided with the
15 | distribution.
16 |
17 | 3. Neither the name of the author nor the names of his contributors
18 | may be used to endorse or promote products derived from this
19 | software without specific prior written permission.
20 |
21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
24 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR
25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
32 |
33 |
--------------------------------------------------------------------------------
/haskell/07_Rigged_Glushkov/README.md:
--------------------------------------------------------------------------------
1 | # Rigged Glushkov Regular Expressions, in Haskell
2 |
3 | This is by far the most successful Haskell experiment yet. It builds on
4 | Experiment 04, "Glushkov Regular Expressions," and adds the Semiring
5 | implementation.
6 |
7 | We use the familiar pattern of building our regular expressions using
8 | the Kleene primitive pattern developed for Experiment 01, then lift the
9 | constructed expression into our Glushkov representation and run it
10 | through a modified version of the 'shift' function to produce a result.
11 | In this version, as in previous rigged versions, we apply the logic of
12 | regular expressions to our semiring data during parsing.
13 |
14 | To support more complex semirings, those that are not just primitive
15 | data with simple zero or one representations, I needed to provide a
16 | constructor to the shift function that knew how to build new symbol
17 | operations. When you "rig" the Kleene representation, you must
18 | provide a function that takes a char and returns a symbol operator
19 | that includes the semiring.
20 |
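To illustrate why a symbol constructor has to be supplied, here is a minimal sketch. It assumes a local `Semiring` class mirroring the one in `src/RiggedGlushkov.hs`; the `SymWeight` wrapper and the `weigh` helper are hypothetical names used only for this example, standing in for the library's `Symw` node.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import qualified Data.Set as Set
import Data.Set (Set)

-- Mirrors the class declared in src/RiggedGlushkov.hs.
class Semiring s where
  zero, one :: s
  mul, add :: s -> s -> s

instance Semiring Bool where
  zero = False
  one = True
  add = (||)
  mul = (&&)

instance Semiring (Set String) where
  zero = Set.empty
  one = Set.singleton ""
  add = Set.union
  mul a b = Set.map (uncurry (++)) (Set.cartesianProduct a b)

-- Hypothetical stand-in for a Symw node: a symbol is just its
-- weight function from input characters to semiring values.
newtype SymWeight s = SymWeight (Char -> s)

-- Generic constructor: a match is worth `one`, a miss is worth `zero`.
sym :: Semiring s => Char -> SymWeight s
sym c = SymWeight (\b -> if b == c then one else zero)

-- Specialised constructor: a match records the matched character.
syms :: Char -> SymWeight (Set String)
syms c = SymWeight (\b -> if b == c then Set.singleton [c] else zero)

-- Hypothetical helper: weigh a single input character.
weigh :: SymWeight s -> Char -> s
weigh (SymWeight f) = f
```

With the generic constructor, `weigh (sym 'a') 'a' :: Set String` is `{""}`, because `one` carries no information about which character matched. `weigh (syms 'a') 'a'` is `{"a"}`, which is what the set-of-strings semiring needs in order to reassemble the matched text.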
21 | Rigging examples were *not* included in the paper. This was the first
22 | experiment where I had to come up with some parts of the solution on my
23 | own, and solving it was a fun problem. This particular version took
24 | about four hours to puzzle out, but it was worth it. I'm sure there are
25 | alternatives to my rigging-with-constructor solution, but this works and
26 | I'm not unhappy with it. It does look a bit cluttered, but that's
27 | actually how it's presented in the paper; my solution actually reduces
28 | some of the clutter.
29 |
30 | Otherwise, this version works pretty much the same way you'd expect a
31 | merger of the Kleene Semiring version and the Glushkov boolean version
32 | to work.
33 |
34 | One thing that came out of the paper was the use of a Haskell
35 | record-type to record whether or not a node had already been analyzed
36 | for its finality and emptiness; this caches those results and "shorts
37 | out" traversing down the tree to rediscover these properties, resulting
38 | in a bit of a speed-up.
39 |
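The caching idea can be sketched in isolation. The example below is a simplified, Boolean-only illustration: the names `Node`, `Shape`, `isEmpty`, `isFinal`, `cat`, and the other smart constructors are hypothetical and do not appear in the library, which uses the semiring-valued `Glue` record instead.

```haskell
-- Each node carries its precomputed emptiness and finality, so
-- neither property requires walking the subtree again.
data Node = Node
  { isEmpty :: Bool  -- does this expression accept the empty string?
  , isFinal :: Bool  -- is a final position currently marked?
  , shape :: Shape
  }

data Shape
  = Eps
  | Sym Char
  | Alt Node Node
  | Seq Node Node
  | Rep Node

-- Smart constructors compute the cached fields once, from the
-- already-cached fields of their children.
eps :: Node
eps = Node {isEmpty = True, isFinal = False, shape = Eps}

sym :: Char -> Node
sym c = Node {isEmpty = False, isFinal = False, shape = Sym c}

alt :: Node -> Node -> Node
alt l r =
  Node
    { isEmpty = isEmpty l || isEmpty r
    , isFinal = isFinal l || isFinal r
    , shape = Alt l r
    }

-- Named `cat` to avoid clashing with Prelude's `seq`.
cat :: Node -> Node -> Node
cat l r =
  Node
    { isEmpty = isEmpty l && isEmpty r
    , isFinal = (isFinal l && isEmpty r) || isFinal r
    , shape = Seq l r
    }

rep :: Node -> Node
rep r = Node {isEmpty = True, isFinal = isFinal r, shape = Rep r}
```

Reading `isEmpty (cat (rep (sym 'a')) eps)` is then a single field access that returns `True`, whereas an uncached representation would recurse through the whole expression each time the property is needed.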
--------------------------------------------------------------------------------
/haskell/07_Rigged_Glushkov/RiggedGlushkov.cabal:
--------------------------------------------------------------------------------
1 | cabal-version: 1.12
2 |
3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
4 | --
5 | -- see: https://github.com/sol/hpack
6 | --
7 | -- hash: b650d4292e70e7a507191f08e2c62b80a0eb311278b7aef3a2a084f2dac0c3ca
8 |
9 | name: RiggedGlushkov
10 | version: 0.1.0.0
11 | category: Regex
12 | homepage: https://github.com/elfsternberg/riggedregex#readme
13 | author: Elf M. Sternberg
14 | maintainer: elf.sternberg@gmail.com
15 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
16 | license: BSD3
17 | license-file: LICENSE
18 | build-type: Simple
19 | extra-source-files:
20 | README.md
21 |
22 | library
23 | exposed-modules:
24 | RiggedGlushkov
25 | other-modules:
26 | Paths_RiggedGlushkov
27 | hs-source-dirs:
28 | src
29 | ghc-options: -Wall
30 | build-depends:
31 | base
32 | , containers
33 | default-language: Haskell2010
34 |
35 | test-suite test
36 | type: exitcode-stdio-1.0
37 | main-is: Tests.hs
38 | other-modules:
39 | Paths_RiggedGlushkov
40 | hs-source-dirs:
41 | test
42 | build-depends:
43 | RiggedGlushkov
44 | , base
45 | , containers
46 | , hspec
47 | default-language: Haskell2010
48 |
--------------------------------------------------------------------------------
/haskell/07_Rigged_Glushkov/package.yaml:
--------------------------------------------------------------------------------
1 | name: RiggedGlushkov
2 | version: 0.1.0.0
3 |
4 | homepage: https://github.com/elfsternberg/riggedregex#readme
5 | license: BSD3
6 | license-file: LICENSE
7 | author: Elf M. Sternberg
8 | maintainer: elf.sternberg@gmail.com
9 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
10 | category: Regex
11 | build-type: Simple
12 | extra-source-files: README.md
13 |
14 | dependencies:
15 | - containers
16 | - base
17 |
18 | library:
19 | exposed-modules: RiggedGlushkov
20 | ghc-options: -Wall
21 | source-dirs: src
22 |
23 | tests:
24 | test:
25 | main: Tests.hs
26 | source-dirs: test
27 | dependencies:
28 | - RiggedGlushkov
29 | - hspec
30 |
31 |
--------------------------------------------------------------------------------
/haskell/07_Rigged_Glushkov/src/RiggedGlushkov.hs:
--------------------------------------------------------------------------------
1 | {-# LANGUAGE FlexibleInstances #-}
2 | {-# LANGUAGE LambdaCase #-}
3 |
4 | module RiggedGlushkov ( Glu(..), acceptg, rigged, riggeds ) where
5 |
6 | import Data.Set hiding (foldl, split)
7 |
8 | data Glu
9 | = Eps
10 | | Sym Bool Char
11 | | Alt Glu Glu
12 | | Seq Glu Glu
13 | | Rep Glu
14 |
15 | -- Just as with the Kleene versions, we're going to exploit the fact
16 | -- that we have a working version. For Rust, we're going to do
17 | -- something a little different. But for now...
18 | --
19 | -- This is interesting. The paper decides that, to keep the cost of
20 | -- processing down, we're going to cache the results of empty and
21 | -- final. One of the prices paid, though, is in the complexity of the
22 | -- data type for our expressions, and that complexity is now managed
23 | -- through factories.
24 |
25 | class Semiring s where
26 | zero, one :: s
27 | mul, add :: s -> s -> s
28 |
29 | data Glue c s = Glue
30 | { emptye :: s
31 | , finale :: s
32 | , gluw :: Gluw c s
33 | }
34 |
35 | data Gluw c s
36 | = Epsw
37 | | Symw (c -> s)
38 | | Altw (Glue c s) (Glue c s)
39 | | Seqw (Glue c s) (Glue c s)
40 | | Repw (Glue c s)
41 |
42 | epsw :: Semiring s => Glue c s
43 | epsw = Glue {emptye = one, finale = zero, gluw = Epsw}
44 |
45 | symw :: Semiring s => (c -> s) -> Glue c s
46 | symw f = Glue {emptye = zero, finale = zero, gluw = Symw f}
47 |
48 | altw :: Semiring s => Glue c s -> Glue c s -> Glue c s
49 | altw l r =
50 | Glue
51 | { emptye = add (emptye l) (emptye r),
52 | finale = add (finale l) (finale r),
53 | gluw = Altw l r
54 | }
55 |
56 | seqw :: Semiring s => Glue c s -> Glue c s -> Glue c s
57 | seqw l r =
58 | Glue
59 | { emptye = mul (emptye l) (emptye r),
60 | finale = add (mul (finale l) (emptye r)) (finale r),
61 | gluw = Seqw l r
62 | }
63 |
64 | repw :: Semiring s => Glue c s -> Glue c s
65 | repw r = Glue {emptye = one, finale = finale r, gluw = Repw r}
66 |
67 | -- for my edification, the syntax under Symw is syntax for "replace
68 | -- this value in the created record."
69 | -- > data Foo = Foo { a :: Int, b :: Int } deriving (Show)
70 | -- > (Foo 1 2) { b = 4 }
71 | -- Foo { a = 1, b = 4 }
72 | -- It doesn't seem to be functional, i.e. Foo 1 2 $ { b = 4 } doesn't work.
73 | shifte :: Semiring s => s -> Gluw c s -> c -> Glue c s
74 | shifte _ Epsw _ = epsw
75 | shifte m (Symw f) c = (symw f) {finale = m `mul` f c}
76 | shifte m (Seqw l r) c =
77 | seqw
78 | (shifte m (gluw l) c)
79 | (shifte (add (m `mul` (emptye l)) (finale l)) (gluw r) c)
80 | shifte m (Altw l r) c = altw (shifte m (gluw l) c) (shifte m (gluw r) c)
81 | shifte m (Repw r) c = repw (shifte (m `add` finale r) (gluw r) c)
82 |
83 | sym :: (Semiring s, Eq c) => c -> Glue c s
84 | sym c = symw (\b -> if b == c then one else zero)
85 |
86 | rigging :: Semiring s => (Char -> Glue Char s) -> Glu -> Glue Char s
87 | rigging s =
88 | \case
89 | Eps -> epsw
90 | (Sym _ c) -> s c
91 | (Alt p q) -> altw (rigging s p) (rigging s q)
92 | (Seq p q) -> seqw (rigging s p) (rigging s q)
93 | (Rep r) -> repw (rigging s r)
94 |
95 | rigged :: Semiring s => Glu -> Glue Char s
96 | rigged = rigging sym
97 |
98 | syms :: Char -> Glue Char (Set String)
99 | syms c = symw (\b -> if b == c then singleton [c] else zero)
100 |
101 | riggeds :: Glu -> Glue Char (Set String)
102 | riggeds = rigging syms
103 |
104 | instance Semiring (Set String) where
105 | zero = empty
106 | one = singleton ""
107 | add = union
108 | mul a b = Data.Set.map (uncurry (++)) $ cartesianProduct a b
109 |
110 | instance Semiring Int where
111 | zero = 0
112 | one = 1
113 | add = (Prelude.+)
114 | mul = (Prelude.*)
115 |
116 | instance Semiring Bool where
117 | zero = False
118 | one = True
119 | add = (||)
120 | mul = (&&)
121 |
122 | acceptg :: Semiring s => Glue c s -> [c] -> s
123 | acceptg r [] = emptye r
124 | acceptg r (c:cs) =
125 | finale (foldl (shifte zero . gluw) (shifte one (gluw r) c) cs)
126 |
--------------------------------------------------------------------------------
/haskell/07_Rigged_Glushkov/stack.yaml:
--------------------------------------------------------------------------------
1 | # This file was automatically generated by 'stack init'
2 | #
3 | # Some commonly used options have been documented as comments in this file.
4 | # For advanced use and comprehensive documentation of the format, please see:
5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
6 |
7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
8 | # A snapshot resolver dictates the compiler version and the set of packages
9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.5
21 |
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | # git: https://github.com/commercialhaskell/stack.git
30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | # subdirs:
33 | # - auto-update
34 | # - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 |
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 |
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 |
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 |
--------------------------------------------------------------------------------
/haskell/07_Rigged_Glushkov/test/Tests.hs:
--------------------------------------------------------------------------------
1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
2 | {-# LANGUAGE RecordWildCards #-}
3 |
4 | import Data.Foldable (for_)
5 | import Test.Hspec (Spec, it, shouldBe)
6 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
7 | import RiggedGlushkov (Glu (..), acceptg, rigged, riggeds)
8 | import Data.Set
9 | import Data.List (sort)
10 |
11 | main :: IO ()
12 | main = hspecWith defaultConfig {configFastFail = True} specs
13 |
14 | msym :: Char -> Glu
15 | msym c = Sym False c
16 |
17 | specs :: Spec
18 | specs = do
19 |
20 | let nocs = Rep ( Alt ( msym 'a' ) ( msym 'b' ) )
21 | let onec = Seq nocs (msym 'c')
22 | let evencs = Seq ( Rep ( Seq onec onec ) ) nocs
23 |
24 | let as = Alt (msym 'a') (Rep (msym 'a'))
25 | let bs = Alt (msym 'b') (Rep (msym 'b'))
26 |
27 | -- it "lifted expression" $
28 | -- (acceptg (rigged evencs) "acc" :: Bool) `shouldBe` True
29 |
30 | it "lifted expression short" $
31 | (acceptg (rigged evencs) "acc" :: Int) `shouldBe` 1
32 |
33 | it "lifted expression counter two" $
34 | (acceptg (rigged as) "a" :: Int) `shouldBe` 2
35 |
36 | it "lifted expression counter one" $
37 | (acceptg (rigged as) "aa" :: Int) `shouldBe` 1
38 |
39 | it "lifted expression dynamic counter four" $
40 | (acceptg (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4
41 |
42 | it "parse forests" $
43 | (sort $ toList $ (acceptg (riggeds (Seq as bs)) "ab" :: Set String)) `shouldBe` ["ab"]
44 |
45 | for_ cases test
46 | where
47 | test Case {..} = it description assertion
48 | where
49 | assertion = (acceptg (rigged regex) sample :: Bool) `shouldBe` result
50 |
51 | data Case = Case
52 | { description :: String
53 | , regex :: Glu
54 | , sample :: String
55 | , result :: Bool
56 | }
57 |
58 | cases :: [Case]
59 | cases =
60 | [ Case {description = "empty", regex = Eps, sample = "", result = True}
61 | , Case {description = "char", regex = msym 'a', sample = "a", result = True}
62 | , Case
63 | {description = "not char", regex = msym 'a', sample = "b", result = False}
64 | , Case
65 | { description = "char vs empty"
66 | , regex = msym 'a'
67 | , sample = ""
68 | , result = False
69 | }
70 | , Case
71 | { description = "left alt"
72 | , regex = Alt (msym 'a') (msym 'b')
73 | , sample = "a"
74 | , result = True
75 | }
76 | , Case
77 | { description = "right alt"
78 | , regex = Alt (msym 'a') (msym 'b')
79 | , sample = "b"
80 | , result = True
81 | }
82 | , Case
83 | { description = "neither alt"
84 | , regex = Alt (msym 'a') (msym 'b')
85 | , sample = "c"
86 | , result = False
87 | }
88 | , Case
89 | { description = "empty alt"
90 | , regex = Alt (msym 'a') (msym 'b')
91 | , sample = ""
92 | , result = False
93 | }
94 | , Case
95 | { description = "empty rep"
96 | , regex = Rep (msym 'a')
97 | , sample = ""
98 | , result = True
99 | }
100 | , Case
101 | { description = "one rep"
102 | , regex = Rep (msym 'a')
103 | , sample = "a"
104 | , result = True
105 | }
106 | , Case
107 | { description = "multiple rep"
108 | , regex = Rep (msym 'a')
109 | , sample = "aaaaaaaaa"
110 | , result = True
111 | }
112 | , Case
113 | { description = "multiple rep with failure"
114 | , regex = Rep (msym 'a')
115 | , sample = "aaaaaaaaab"
116 | , result = False
117 | }
118 | , Case
119 | { description = "sequence"
120 | , regex = Seq (msym 'a') (msym 'b')
121 | , sample = "ab"
122 | , result = True
123 | }
124 | , Case
125 | { description = "sequence with empty"
126 | , regex = Seq (msym 'a') (msym 'b')
127 | , sample = ""
128 | , result = False
129 | }
130 | , Case
131 | { description = "bad short sequence"
132 | , regex = Seq (msym 'a') (msym 'b')
133 | , sample = "a"
134 | , result = False
135 | }
136 | , Case
137 | { description = "bad long sequence"
138 | , regex = Seq (msym 'a') (msym 'b')
139 | , sample = "abc"
140 | , result = False
141 | }
142 | ]
143 |
144 |
145 |
--------------------------------------------------------------------------------
/haskell/08_Heavyweights/Heavyweights.cabal:
--------------------------------------------------------------------------------
1 | cabal-version: 1.12
2 |
3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
4 | --
5 | -- see: https://github.com/sol/hpack
6 | --
7 | -- hash: c903f1e543aacf648806799a4c925c51ece2e6c833560c723350207fa137497f
8 |
9 | name: Heavyweights
10 | version: 0.1.0.0
11 | category: Regex
12 | homepage: https://github.com/elfsternberg/riggedregex#readme
13 | author: Elf M. Sternberg
14 | maintainer: elf.sternberg@gmail.com
15 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
16 | license: MPL-2.0
17 | license-file: LICENSE.md
18 | build-type: Simple
19 | extra-source-files:
20 | README.md
21 |
22 | library
23 | exposed-modules:
24 | Heavyweights
25 | other-modules:
26 | Paths_Heavyweights
27 | hs-source-dirs:
28 | src
29 | ghc-options: -Wall
30 | build-depends:
31 | base
32 | , containers
33 | default-language: Haskell2010
34 |
35 | test-suite test
36 | type: exitcode-stdio-1.0
37 | main-is: Tests.hs
38 | other-modules:
39 | Paths_Heavyweights
40 | hs-source-dirs:
41 | test
42 | build-depends:
43 | Heavyweights
44 | , base
45 | , containers
46 | , hspec
47 | default-language: Haskell2010
48 |
--------------------------------------------------------------------------------
/haskell/08_Heavyweights/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2010, Thomas Wilke, Frank Huch, Sebastian Fischer
2 |
3 | All rights reserved.
4 |
5 | Redistribution and use in source and binary forms, with or without
6 | modification, are permitted provided that the following conditions are
7 | met:
8 |
9 | 1. Redistributions of source code must retain the above copyright
10 | notice, this list of conditions and the following disclaimer.
11 |
12 | 2. Redistributions in binary form must reproduce the above copyright
13 | notice, this list of conditions and the following disclaimer in
14 | the documentation and/or other materials provided with the
15 | distribution.
16 |
17 | 3. Neither the name of the author nor the names of his contributors
18 | may be used to endorse or promote products derived from this
19 | software without specific prior written permission.
20 |
21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
22 | ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
23 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
24 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR
25 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
26 | EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
27 | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
28 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
29 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
30 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
31 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
32 |
33 |
--------------------------------------------------------------------------------
/haskell/08_Heavyweights/LICENSE.md:
--------------------------------------------------------------------------------
1 | See license in main directory
2 |
--------------------------------------------------------------------------------
/haskell/08_Heavyweights/README.md:
--------------------------------------------------------------------------------
1 | # Rigged Glushkov Regular Expressions in Haskell: Compliance Experiments
2 |
3 | This implementation doesn't differ much from [Experiment 07: Rigged
4 | Glushkov Regular Expressions in Haskell](../07_Rigged_Glushkov), except
5 | that it adds two new Semiring implementations to the library.
6 |
7 | Recall the basics of Semiring theory: there is a data type, and on it
8 | a zero, a one, an "addition" operation and a "multiplication"
9 | operation. Each operator has an identity (for numbers, addition has
10 | zero and multiplication has one), and zero also annihilates under
11 | multiplication: anything multiplied by zero is always zero, or
12 | "nothing."
13 |
14 | We've used these principles to do boolean recognition; multiplication is
15 | the boolean `and` operator, used to encode sequences using annihilation:
16 | any sequence that doesn't match is `False`, and `False && x` is always
17 | `False`. If the entire truth of an expression depends on an annihilated
18 | sequence, then it's not true.
19 |
20 | We've used it to count ambiguities, via integers: by using addition *as*
21 | addition, we count the number of different regular languages encoded in
22 | our initial expression that could have produced the string submitted,
23 | thus revealing the number of ambiguities in our expression. Each `or`
24 | that returns 1 reveals a different path, so an alternate pattern will
25 | return the sum of the paths that pass through it.
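The Boolean and counting interpretations above can be sketched in a few lines. This is a minimal illustration that mirrors the `Semiring` class and the `Bool` and `Int` instances in `src/Heavyweights.hs`; the helpers `twoPaths` and `deadSequence` are hypothetical names introduced only for this example.

```haskell
-- The semiring interface: two constants and two binary operations.
class Semiring s where
  zero, one :: s
  add, mul :: s -> s -> s

-- Recognition: did anything match at all?
instance Semiring Bool where
  zero = False
  one = True
  add = (||)
  mul = (&&)

-- Counting: how many distinct ways did the expression match?
instance Semiring Int where
  zero = 0
  one = 1
  add = (+)
  mul = (*)

-- Hypothetical helper: an alternation whose two branches both matched.
-- Each branch contributes `one`, and `add` combines them.
twoPaths :: Semiring s => s
twoPaths = add one one

-- Hypothetical helper: a sequence whose first half failed.
-- `zero` annihilates whatever follows it.
deadSequence :: Semiring s => s
deadSequence = mul zero one
```

Read at type `Bool`, `twoPaths` only says that a match exists; read at type `Int`, the same expression reports two paths, which is the ambiguity count described above.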
26 |
27 | And we've even used it to identify the strings that match. By saying
28 | that our Semiring is a "set of strings", our addition is union (that is,
29 | we keep the set of all paths through the alternate pattern), and
30 | multiplication is the concatenation of the cartesian products of the two
31 | elements of a sequence (so for a basic pattern with no alternatives it's
32 | just a concatenation of the two strings, but for alternatives with
33 | multiples, it's the concatenation of all possible combinations), we've
34 | created a way to extract the exact string(s) that we submitted to the
35 | machine that matched.
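The "set of strings" semiring can be sketched as follows. The library writes this instance directly on `Set String` using `FlexibleInstances`; the `Matches` newtype and the `fromStrings`, `toStrings` and `example` names below are assumptions made so that the sketch needs no language extensions.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

class Semiring s where
  zero, one :: s
  add, mul :: s -> s -> s

-- Hypothetical wrapper around the set of matched strings.
newtype Matches = Matches (Set String) deriving (Eq, Show)

instance Semiring Matches where
  zero = Matches Set.empty             -- no match at all
  one = Matches (Set.singleton "")     -- the empty match
  add (Matches a) (Matches b) = Matches (Set.union a b)
  -- Concatenate every string on the left with every string on the right.
  mul (Matches a) (Matches b) =
    Matches (Set.map (uncurry (++)) (Set.cartesianProduct a b))

fromStrings :: [String] -> Matches
fromStrings = Matches . Set.fromList

toStrings :: Matches -> [String]
toStrings (Matches s) = Set.toList s

-- Two alternatives on each side of a sequence yield four combinations.
example :: [String]
example = toStrings (mul (fromStrings ["a", "b"]) (fromStrings ["c", "d"]))
```

With a single string on each side, `mul` reduces to plain concatenation, which is the "basic pattern with no alternatives" case.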
36 |
37 | In *Heavyweights*, Fischer, Huch and Wilke go further and show how
38 | clever choices among zeros and ones can lead to some rather powerful
39 | outcomes.
40 |
41 | The first thing to appreciate is that our symbol operator, `sym`, has
42 | never actually been about symbols. It's about predicates. Our base
43 | implementation has been to pass a closured comparison with our desired
44 | symbol, returning zero or one.
45 |
46 | For the string implementation, which is *not* covered in the paper and
47 | which I managed to extract, successfully, from Might's work, I passed to
48 | `sym` instead a closured comparison to the desired symbol, and the
49 | return value was either the zero or `singleton [c]`, meaning a set with
50 | a string of one character in it. (I'm quite proud of that work; it both
51 | affirmed my notion that Might & Adams had a semiring implementation,
52 | they just didn't call it that, and showed that I was able to merge two
53 | different equational systems, applying some notions of category theory
54 | to do so.)
55 |
56 | The definition of `sym` was: `sym :: (Semiring s, Eq c) => c -> Reg c
57 | s`. I added `syms`: `syms :: Char -> Reg Char (Set String)`. Now the
58 | three provide `symi`: `symi :: Semiringi s => Char -> Reg (Int, Char) s`.
59 | This is a regex whose input *takes* both an Int and a Char, and their
60 | `submatch` function `zip`s the input value with a position value, so that
61 | both are available for processing. Remember that everything else
62 | depends on the Semiring, and *not* the input type; only `sym` cares.
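To make the position pairing concrete, here is a small sketch of what a symbol predicate sees once the input has been zipped with its indices. It is not the library's code: `indexed`, `weightFor` and `example` are hypothetical names, and `Maybe Int` stands in for a real semiring, with `Just pos` playing the role of `index pos` and `Nothing` the role of `zero`.

```haskell
-- Pair every input character with its position, as `zip [0..]` does.
indexed :: String -> [(Int, Char)]
indexed = zip [0 ..]

-- Hypothetical stand-in for the weight function handed to the symbol
-- constructor: report the position on a match, nothing otherwise.
weightFor :: Char -> (Int, Char) -> Maybe Int
weightFor c (pos, x)
  | x == c = Just pos
  | otherwise = Nothing

-- What a predicate for 'a' reports for each position of "aba".
example :: [Maybe Int]
example = map (weightFor 'a') (indexed "aba")
```

Only the symbol predicate inspects the pair; the alternation, sequence and repetition combinators never look at the input type.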
63 |
64 | Now they add an "indexed semiring," and to it provide a version of `sym`
65 | that returns the `index` semiring when true, and zero otherwise.
66 |
67 |     class Semiring s => Semiringi s where
68 |       index :: Int -> s
69 |
70 |     symi :: Semiringi s => Char -> Glue (Int, Char) s
71 |     symi c = symw weight
72 |       where weight (pos, x) | x == c    = index pos
73 |                             | otherwise = zero
74 |
75 | But what *is* the `index` semiring? Here's where things get
76 | interesting. Fischer et al. want to encode the length of the longest
77 | submatch. The first thing they do is define submatch as a variant of
78 | accept, with a lead-in and a lead-out that just match everything. This
79 | is okay: this is a Glushkov machine, so the `arb` NFA will almost always
80 | be active, but that isn't important to us because `arb` doesn't work
81 | with `symi` values.
82 |
83 |     submatch :: Semiring s => Glue (Int, c) s -> [c] -> s
84 |     submatch r s =
85 |       accept (seqw arb (seqw r arb)) (zip [0..] s)
86 |       where arb = repw (symw (\_ -> one))
87 |
88 | So... what are the zero and one of a "longest submatch" operation? The
89 | zero is that no match ever occurred. The one is that a match is
90 | possible, but hasn't yet occurred. Any other value is a submatch. The
91 | final value is the longest interval of the submatch.
92 |
93 | Fischer et al. break up their semiring into two parts:
94 |
95 |     data LeftLong = NoLeftLong | LeftLong Range deriving (Show)
96 |     data Range = NoRange | Range Int Int deriving (Show)
97 |
98 | `NoLeftLong` is zero; it could never happen, there was no match.
99 | `NoRange` is the one, meaning it could still happen, it just hasn't
100 | yet. And `Range` is a submatch that has been found.
101 |
102 | For addition (which symbolizes alternation, recall), adding a failure to
103 | anything is the anything, so `add NoLeftLong x = x`, and that's true the
104 | other way. Adding a range with an empty range is just the range, and
105 | adding two ranges picks the leftmost, or the longer if both start together.
106 |
107 | For multiplication, again, multiplying by failure is just failure.
108 | Multiplying anything with `NoRange` means that the anything is preserved
109 | unchanged, and multiplying two ranges is a new range with the start of
110 | the first range and the end of the latter range. (Recall that for
111 | Semirings, both operations are associative and addition is commutative,
112 | but multiplication need *not* be. It may *be* commutative for some sets,
113 | but it's not a requirement of semirings and you shouldn't count on it.)
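The addition and multiplication rules can be checked with a small worked example. This sketch follows the flattened three-constructor `LeftLong` type used in `src/Heavyweights.hs` rather than the two-part type shown above; the `examples` list is a hypothetical name added for illustration.

```haskell
class Semiring s where
  zero, one :: s
  add, mul :: s -> s -> s

-- Failure, "possible but not yet started", or a found range of positions.
data LeftLong = NoLeftLong | NoRange | Range Int Int
  deriving (Show, Eq)

instance Semiring LeftLong where
  zero = NoLeftLong
  one = NoRange
  -- Alternation: prefer the leftmost start, then the longer range.
  add NoLeftLong x = x
  add x NoLeftLong = x
  add NoRange x = x
  add x NoRange = x
  add (Range i j) (Range k l)
    | i < k || (i == k && j > l) = Range i j
    | otherwise = Range k l
  -- Sequence: start of the left operand, end of the right operand.
  mul NoLeftLong _ = NoLeftLong
  mul _ NoLeftLong = NoLeftLong
  mul NoRange x = x
  mul x NoRange = x
  mul (Range i _) (Range _ l) = Range i l

examples :: [LeftLong]
examples =
  [ add (Range 1 3) (Range 1 5) -- same start: the longer range wins
  , add (Range 2 9) (Range 1 3) -- different starts: the leftmost wins
  , mul (Range 1 1) (Range 2 2) -- adjacent symbols join into one range
  , mul NoLeftLong (Range 2 2)  -- failure annihilates
  ]
```

Swapping the operands of the third example gives `Range 2 1` instead of `Range 1 2`, a small reminder that `mul` here is not commutative.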
114 |
115 |
--------------------------------------------------------------------------------
/haskell/08_Heavyweights/package.yaml:
--------------------------------------------------------------------------------
1 | name: Heavyweights
2 | version: 0.1.0.0
3 |
4 | homepage: https://github.com/elfsternberg/riggedregex#readme
5 | license: MPL-2.0
6 | license-file: LICENSE.md
7 | author: Elf M. Sternberg
8 | maintainer: elf.sternberg@gmail.com
9 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
10 | category: Regex
11 | build-type: Simple
12 | extra-source-files: README.md
13 |
14 | dependencies:
15 | - base
16 | - containers
17 |
18 | library:
19 | exposed-modules: Heavyweights
20 | ghc-options: -Wall
21 | source-dirs: src
22 |
23 | tests:
24 | test:
25 | main: Tests.hs
26 | source-dirs: test
27 | dependencies:
28 | - Heavyweights
29 | - hspec
30 |
--------------------------------------------------------------------------------
/haskell/08_Heavyweights/src/Heavyweights.hs:
--------------------------------------------------------------------------------
1 | {-# LANGUAGE FlexibleInstances #-}
2 | {-# LANGUAGE LambdaCase #-}
3 |
4 | module Heavyweights ( Reg(..), Glue, accept, rigged, riggeds, riggew, submatch, symi, altw, seqw, repw, LeftLong(..) ) where
5 |
6 | import Data.Set hiding (foldl, split)
7 |
8 | data Reg
9 | = Eps
10 | | Sym Bool Char
11 | | Alt Reg Reg
12 | | Seq Reg Reg
13 | | Rep Reg
14 |
15 | -- Just as with the Kleene versions, we're going to exploit the fact
16 | -- that we have a working version. For Rust, we're going to do
17 | -- something a little different. But for now...
18 | --
19 | -- This is interesting. The paper decides that, to keep the cost of
20 | -- processing down, we're going to cache the results of emptyg and
21 | -- final. One of the prices paid, though, is in the complexity of the
22 | -- data type for our expressions, and that complexity is now managed
23 | -- through factories.
24 |
25 | class Semiring s where
26 | zero, one :: s
27 | mul, add :: s -> s -> s
28 |
29 | data Glue c s = Glue
30 | { emptyg :: s
31 | , final :: s
32 | , glu :: Glu c s
33 | }
34 |
35 | -- 'Glu' is just the representative of the regex element
36 | -- 'Glue' is the extended representation with cached values
37 |
38 | data Glu c s
39 | = Epsw
40 | | Symw (c -> s)
41 | | Altw (Glue c s) (Glue c s)
42 | | Seqw (Glue c s) (Glue c s)
43 | | Repw (Glue c s)
44 |
45 | epsw :: Semiring s => Glue c s
46 | epsw = Glue {emptyg = one, final = zero, glu = Epsw}
47 |
48 | symw :: Semiring s => (c -> s) -> Glue c s
49 | symw f = Glue {emptyg = zero, final = zero, glu = Symw f}
50 |
51 | altw :: Semiring s => Glue c s -> Glue c s -> Glue c s
52 | altw l r =
53 | Glue
54 | { emptyg = add (emptyg l) (emptyg r),
55 | final = add (final l) (final r),
56 | glu = Altw l r
57 | }
58 |
59 | seqw :: Semiring s => Glue c s -> Glue c s -> Glue c s
60 | seqw l r =
61 | Glue
62 | { emptyg = mul (emptyg l) (emptyg r),
63 | final = add (mul (final l) (emptyg r)) (final r),
64 | glu = Seqw l r
65 | }
66 |
67 | repw :: Semiring s => Glue c s -> Glue c s
68 | repw r = Glue {emptyg = one, final = final r, glu = Repw r}
69 |
70 | -- for my edification, the syntax under Symw is syntax for "replace
71 | -- this value in the created record."
72 | -- > data Foo = Foo { a :: Int, b :: Int } deriving (Show)
73 | -- > (Foo 1 2) { b = 4 }
74 | -- Foo { a = 1, b = 4 }
75 | -- It doesn't seem to be functional, i.e. Foo 1 2 $ { b = 4 } doesn't work.
76 |
77 | shift :: Semiring s => s -> Glu c s -> c -> Glue c s
78 | shift _ Epsw _ = epsw
79 | shift m (Symw f) c = (symw f) {final = m `mul` f c}
80 | shift m (Seqw l r) c =
81 | seqw
82 | (shift m (glu l) c)
83 | (shift (add (m `mul` (emptyg l)) (final l)) (glu r) c)
84 | shift m (Altw l r) c = altw (shift m (glu l) c) (shift m (glu r) c)
85 | shift m (Repw r) c = repw (shift (m `add` final r) (glu r) c)
86 |
87 | sym :: (Semiring s, Eq c) => c -> Glue c s
88 | sym c = symw (\b -> if b == c then one else zero)
89 |
90 | rigging :: Semiring s => (Char -> Glue t s) -> Reg -> Glue t s
91 | rigging s =
92 | \case
93 | Eps -> epsw
94 | (Sym _ c) -> s c
95 | (Alt p q) -> altw (rigging s p) (rigging s q)
96 | (Seq p q) -> seqw (rigging s p) (rigging s q)
97 | (Rep r) -> repw (rigging s r)
98 |
99 | rigged :: Semiring s => Reg -> Glue Char s
100 | rigged = rigging sym
101 |
102 | syms :: Char -> Glue Char (Set String)
103 | syms c = symw (\b -> if b == c then singleton [c] else zero)
104 |
105 | riggeds :: Reg -> Glue Char (Set String)
106 | riggeds = rigging syms
107 |
108 | instance Semiring (Set String) where
109 | zero = empty
110 | one = singleton ""
111 | add = union
112 | mul a b = Data.Set.map (uncurry (++)) $ cartesianProduct a b
113 |
114 | instance Semiring Int where
115 | zero = 0
116 | one = 1
117 | add = (Prelude.+)
118 | mul = (Prelude.*)
119 |
120 | instance Semiring Bool where
121 | zero = False
122 | one = True
123 | add = (||)
124 | mul = (&&)
125 |
126 | accept :: Semiring s => Glue c s -> [c] -> s
127 | accept r [] = emptyg r
128 | accept r (c:cs) =
129 | final (foldl (shift zero . glu) (shift one (glu r) c) cs)
130 |
131 | submatch :: Semiring s => Glue (Int, c) s -> [c] -> s
132 | submatch r s =
133 | accept (seqw arb (seqw r arb)) (zip [0..] s)
134 | where arb = repw (symw (\_ -> one))
135 |
136 | class Semiring s => Semiringi s where
137 | index :: Int -> s
138 |
139 | symi :: Semiringi s => Char -> Glue (Int, Char) s
140 | symi c = symw weight
141 | where weight (pos, x) | x == c = index pos
142 | | otherwise = zero
143 |
144 | riggew :: Semiringi s => Reg -> Glue (Int, Char) s
145 | riggew = rigging symi
146 |
147 | data Leftmost = NoLeft | Leftmost Start deriving (Show)
148 | data Start = NoStart | Start Int deriving (Show)
149 |
150 | instance Semiring Leftmost where
151 | zero = NoLeft
152 | one = Leftmost NoStart
153 | add NoLeft x = x
154 | add x NoLeft = x
155 | add (Leftmost x) (Leftmost y) = Leftmost (leftmost x y)
156 | where leftmost NoStart NoStart = NoStart
157 | leftmost NoStart (Start i) = Start i
158 | leftmost (Start i) NoStart = Start i
159 | leftmost (Start i) (Start j) = Start (min i j)
160 | mul NoLeft _ = NoLeft
161 | mul _ NoLeft = NoLeft
162 | mul (Leftmost x) (Leftmost y) = Leftmost (start x y)
163 | where start NoStart s = s
164 | start s _ = s
165 |
166 | instance Semiringi Leftmost where
167 | index = Leftmost . Start
168 |
169 | -- Leftlong Implementation!
170 |
171 | data LeftLong = NoLeftLong | NoRange | Range Int Int deriving (Show, Eq)
172 |
173 | instance Semiring LeftLong where
174 | zero = NoLeftLong
175 | one = NoRange
176 |
177 | -- The addition of two leftlongs is the selection
178 | -- of the leftmost of the two, or the longer when
179 | -- both start at the same position.
180 |
181 | add NoLeftLong x = x
182 | add x NoLeftLong = x
183 | add NoRange x = x
184 | add x NoRange = x
185 | add (Range i j) (Range k l)
186 | | i < k || i == k && j > l = Range i j
187 | | otherwise = Range k l
188 |
189 | -- The multiplication of two leftlongs is the longest possible
190 | -- range among the leftlongs provided; the zero is still annihilation,
191 | -- the one is still identity, and `mul` here is the start of the left
192 | -- component and the end of the right component.
193 |
194 | mul NoLeftLong _ = NoLeftLong
195 | mul _ NoLeftLong = NoLeftLong
196 | mul NoRange x = x
197 | mul x NoRange = x
198 | mul (Range i _) (Range _ l) = Range i l
199 |
200 | instance Semiringi LeftLong where
201 | index i = Range i i
202 |
203 |
--------------------------------------------------------------------------------
/haskell/08_Heavyweights/stack.yaml:
--------------------------------------------------------------------------------
1 | # This file was automatically generated by 'stack init'
2 | #
3 | # Some commonly used options have been documented as comments in this file.
4 | # For advanced use and comprehensive documentation of the format, please see:
5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
6 |
7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
8 | # A snapshot resolver dictates the compiler version and the set of packages
9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.7
21 |
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | # git: https://github.com/commercialhaskell/stack.git
30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | # subdirs:
33 | # - auto-update
34 | # - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 |
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 |
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 |
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 |
--------------------------------------------------------------------------------
/haskell/08_Heavyweights/test/Tests.hs:
--------------------------------------------------------------------------------
1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
2 | {-# LANGUAGE RecordWildCards #-}
3 |
4 | import Data.Foldable (for_)
5 | import Test.Hspec (Spec, describe, it, shouldBe)
6 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
7 | import Heavyweights (Reg (..), Glue, accept, rigged, riggeds, riggew, submatch, symi, altw, seqw, repw, LeftLong(..))
8 | import Data.Set
9 | import Data.List (sort)
10 |
11 | main :: IO ()
12 | main = hspecWith defaultConfig {configFastFail = True} specs
13 |
14 | msym :: Char -> Reg
15 | msym c = Sym False c
16 |
17 | specs :: Spec
18 | specs = do
19 |
20 | let nocs = Rep ( Alt ( msym 'a' ) ( msym 'b' ) )
21 | let onec = Seq nocs (msym 'c')
22 | let evencs = Seq ( Rep ( Seq onec onec ) ) nocs
23 |
24 | let as = Alt (msym 'a') (Rep (msym 'a'))
25 | let bs = Alt (msym 'b') (Rep (msym 'b'))
26 |
27 | -- it "lifted expression" $
28 | -- (accept (rigged evencs) "acc" :: Bool) `shouldBe` True
29 |
30 | it "lifted expression short" $
31 | (accept (rigged evencs) "acc" :: Int) `shouldBe` 1
32 |
33 | it "lifted expression counter two" $
34 | (accept (rigged as) "a" :: Int) `shouldBe` 2
35 |
36 | it "lifted expression counter one" $
37 | (accept (rigged as) "aa" :: Int) `shouldBe` 1
38 |
39 | it "lifted expression dynamic counter four" $
40 | (accept (rigged (Seq as bs)) "ab" :: Int) `shouldBe` 4
41 |
42 | it "parse forests" $
43 | (sort $ toList $ (accept (riggeds (Seq as bs)) "ab" :: Set String)) `shouldBe` ["ab"]
44 |
45 | let aa = symi 'a'
46 | let ab = repw (aa `altw` symi 'b')
47 | let aaba = aa `seqw` ab `seqw` aa
48 |
49 | it "submatch noleft" $
50 | (submatch aaba "ab" :: LeftLong ) `shouldBe` NoLeftLong
51 |
52 | it "submatch shortrange" $
53 | (submatch aaba "aa" :: LeftLong ) `shouldBe` (Range 0 1)
54 |
55 | it "submatch fullrange" $
56 | (submatch aaba "bababa" :: LeftLong ) `shouldBe` (Range 1 5)
57 |
58 | for_ cases test
59 | where
60 | test Case {..} = it description assertion
61 | where
62 | assertion = (accept (rigged regex) sample :: Bool) `shouldBe` result
63 |
64 | data Case = Case
65 | { description :: String
66 | , regex :: Reg
67 | , sample :: String
68 | , result :: Bool
69 | }
70 |
71 | cases :: [Case]
72 | cases =
73 | [ Case {description = "empty", regex = Eps, sample = "", result = True}
74 | , Case {description = "char", regex = msym 'a', sample = "a", result = True}
75 | , Case
76 | {description = "not char", regex = msym 'a', sample = "b", result = False}
77 | , Case
78 | { description = "char vs empty"
79 | , regex = msym 'a'
80 | , sample = ""
81 | , result = False
82 | }
83 | , Case
84 | { description = "left alt"
85 | , regex = Alt (msym 'a') (msym 'b')
86 | , sample = "a"
87 | , result = True
88 | }
89 | , Case
90 | { description = "right alt"
91 | , regex = Alt (msym 'a') (msym 'b')
92 | , sample = "b"
93 | , result = True
94 | }
95 | , Case
96 | { description = "neither alt"
97 | , regex = Alt (msym 'a') (msym 'b')
98 | , sample = "c"
99 | , result = False
100 | }
101 | , Case
102 | { description = "empty alt"
103 | , regex = Alt (msym 'a') (msym 'b')
104 | , sample = ""
105 | , result = False
106 | }
107 | , Case
108 | { description = "empty rep"
109 | , regex = Rep (msym 'a')
110 | , sample = ""
111 | , result = True
112 | }
113 | , Case
114 | { description = "one rep"
115 | , regex = Rep (msym 'a')
116 | , sample = "a"
117 | , result = True
118 | }
119 | , Case
120 | { description = "multiple rep"
121 | , regex = Rep (msym 'a')
122 | , sample = "aaaaaaaaa"
123 | , result = True
124 | }
125 | , Case
126 | { description = "multiple rep with failure"
127 | , regex = Rep (msym 'a')
128 | , sample = "aaaaaaaaab"
129 | , result = False
130 | }
131 | , Case
132 | { description = "sequence"
133 | , regex = Seq (msym 'a') (msym 'b')
134 | , sample = "ab"
135 | , result = True
136 | }
137 | , Case
138 | { description = "sequence with empty"
139 | , regex = Seq (msym 'a') (msym 'b')
140 | , sample = ""
141 | , result = False
142 | }
143 | , Case
144 | { description = "bad short sequence"
145 | , regex = Seq (msym 'a') (msym 'b')
146 | , sample = "a"
147 | , result = False
148 | }
149 | , Case
150 | { description = "bad long sequence"
151 | , regex = Seq (msym 'a') (msym 'b')
152 | , sample = "abc"
153 | , result = False
154 | }
155 | ]
156 |
157 |
158 |
--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/BrzExp.cabal:
--------------------------------------------------------------------------------
1 | cabal-version: 1.12
2 |
3 | -- This file has been generated from package.yaml by hpack version 0.31.1.
4 | --
5 | -- see: https://github.com/sol/hpack
6 | --
7 | -- hash: bf664c2df8f31bc5a6ffd907ebaf10824d56d8d271f07d3f4cc7562ce9d44395
8 |
9 | name: BrzExp
10 | version: 0.1.0.0
11 | category: Regex
12 | homepage: https://github.com/elfsternberg/riggedregex#readme
13 | author: Elf M. Sternberg
14 | maintainer: elf.sternberg@gmail.com
15 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
16 | license: BSD3
17 | license-file: LICENSE
18 | build-type: Simple
19 | extra-source-files:
20 | README.md
21 |
22 | library
23 | exposed-modules:
24 | BrzExp
25 | other-modules:
26 | Paths_BrzExp
27 | hs-source-dirs:
28 | src
29 | ghc-options: -Wall
30 | build-depends:
31 | base
32 | default-language: Haskell2010
33 |
34 | test-suite test
35 | type: exitcode-stdio-1.0
36 | main-is: Tests.hs
37 | other-modules:
38 | Paths_BrzExp
39 | hs-source-dirs:
40 | test
41 | build-depends:
42 | BrzExp
43 | , base
44 | , hspec
45 | default-language: Haskell2010
46 |
--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/LICENSE:
--------------------------------------------------------------------------------
1 | The Brzozowski experiments are original work, and are copyright and
2 | licensed by Elf M. Sternberg according to the Mozilla 2.0 Public
3 | License. See the LICENSE.md file in the main directory
4 |
--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/README.md:
--------------------------------------------------------------------------------
1 | # Brzozowski Regular Expressions, in Haskell
2 |
3 | This is a regex recognizer implementing Brzozowski's Algorithm, in
4 | Haskell.
5 |
6 |
7 |
8 |
--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/Setup.hs:
--------------------------------------------------------------------------------
1 | import Distribution.Simple
2 | main = defaultMain
3 |
--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/package.yaml:
--------------------------------------------------------------------------------
1 | name: BrzExp
2 | version: 0.1.0.0
3 |
4 | homepage: https://github.com/elfsternberg/riggedregex#readme
5 | license: BSD3
6 | license-file: LICENSE
7 | author: Elf M. Sternberg
8 | maintainer: elf.sternberg@gmail.com
9 | copyright: Copyright ⓒ 2019 Elf M. Sternberg
10 | category: Regex
11 | build-type: Simple
12 | extra-source-files: README.md
13 |
14 | dependencies:
15 | - base
16 |
17 | library:
18 | exposed-modules: BrzExp
19 | ghc-options: -Wall
20 | source-dirs: src
21 |
22 | tests:
23 | test:
24 | main: Tests.hs
25 | source-dirs: test
26 | dependencies:
27 | - BrzExp
28 | - hspec
29 |
30 |
--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/src/BrzExp.hs:
--------------------------------------------------------------------------------
1 | module BrzExp ( accept, nullable, Brz (..) ) where
2 | data Brz = Emp | Eps | Sym (Char -> Bool) | Alt Brz Brz | Seq Brz Brz | Rep Brz
3 |
4 | derive :: Brz -> Char -> Brz
5 | derive Emp _ = Emp
6 | derive Eps _ = Emp
7 | derive (Sym c) u = if (c u) then Eps else Emp
8 | derive (Seq l r) u
9 | | nullable l = Alt (Seq (derive l u) r) (derive r u)
10 | | otherwise = Seq (derive l u) r
11 |
12 | derive (Alt Emp r) u = derive r u
13 | derive (Alt l Emp) u = derive l u
14 | derive (Alt l r) u = Alt (derive l u) (derive r u)
15 |
16 | derive (Rep r) u = Seq (derive r u) (Rep r)
17 |
18 | nullable :: Brz -> Bool
19 | nullable Emp = False
20 | nullable Eps = True
21 | nullable (Sym _) = False
22 | nullable (Alt l r) = nullable l || nullable r
23 | nullable (Seq l r) = nullable l && nullable r
24 | nullable (Rep _) = True
25 |
26 | accept :: Brz -> String -> Bool
27 | accept r [] = nullable r
28 | accept r (s:ss) = accept (derive r s) ss
29 |
30 |
31 |
--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/stack.yaml:
--------------------------------------------------------------------------------
1 | # This file was automatically generated by 'stack init'
2 | #
3 | # Some commonly used options have been documented as comments in this file.
4 | # For advanced use and comprehensive documentation of the format, please see:
5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/
6 |
7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version.
8 | # A snapshot resolver dictates the compiler version and the set of packages
9 | # to be used for project dependencies. For example:
10 | #
11 | # resolver: lts-3.5
12 | # resolver: nightly-2015-09-21
13 | # resolver: ghc-7.10.2
14 | #
15 | # The location of a snapshot can be provided as a file or url. Stack assumes
16 | # a snapshot provided as a file might change, whereas a url resource does not.
17 | #
18 | # resolver: ./custom-snapshot.yaml
19 | # resolver: https://example.com/snapshots/2018-01-01.yaml
20 | resolver: lts-13.4
21 |
22 | # User packages to be built.
23 | # Various formats can be used as shown in the example below.
24 | #
25 | # packages:
26 | # - some-directory
27 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz
28 | # - location:
29 | # git: https://github.com/commercialhaskell/stack.git
30 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a
31 | # - location: https://github.com/commercialhaskell/stack/commit/e7b331f14bcffb8367cd58fbfc8b40ec7642100a
32 | # subdirs:
33 | # - auto-update
34 | # - wai
35 | packages:
36 | - .
37 | # Dependency packages to be pulled from upstream that are not in the resolver
38 | # using the same syntax as the packages field.
39 | # (e.g., acme-missiles-0.3)
40 | # extra-deps: []
41 |
42 | # Override default flag values for local packages and extra-deps
43 | # flags: {}
44 |
45 | # Extra package databases containing global packages
46 | # extra-package-dbs: []
47 |
48 | # Control whether we use the GHC we find on the path
49 | # system-ghc: true
50 | #
51 | # Require a specific version of stack, using version ranges
52 | # require-stack-version: -any # Default
53 | # require-stack-version: ">=1.9"
54 | #
55 | # Override the architecture used by stack, especially useful on Windows
56 | # arch: i386
57 | # arch: x86_64
58 | #
59 | # Extra directories used by stack for building
60 | # extra-include-dirs: [/path/to/dir]
61 | # extra-lib-dirs: [/path/to/dir]
62 | #
63 | # Allow a newer minor version of GHC than the snapshot specifies
64 | # compiler-check: newer-minor
65 |
--------------------------------------------------------------------------------
/haskell/09_Classed_Brzozowski/test/Tests.hs:
--------------------------------------------------------------------------------
1 | {-# OPTIONS_GHC -fno-warn-type-defaults #-}
2 | {-# LANGUAGE RecordWildCards #-}
3 |
4 | import Data.Foldable (for_)
5 | import Test.Hspec (Spec, describe, it, shouldBe)
6 | import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
7 |
8 | import BrzExp (Brz (..), accept)
9 |
10 | main :: IO ()
11 | main = hspecWith defaultConfig {configFastFail = True} specs
12 |
13 | specs :: Spec
14 | specs = describe "accept" $ for_ cases test
15 | where
16 | test Case {..} = it description assertion
17 | where
18 | assertion = accept regex sample `shouldBe` result
19 |
20 | data Case = Case
21 | { description :: String
22 | , regex :: Brz
23 | , sample :: String
24 | , result :: Bool
25 | }
26 |
27 | symf :: Char -> Brz
28 | symf c = Sym (\u -> c == u)
29 |
30 | -- let nocs = Rep ( Alt ( Sym 'a' ) ( Sym 'b' ) )
31 | -- onec = Seq nocs (Sym 'c')
32 | -- evencs = Seq ( Rep ( Seq onec onec ) ) nocs
33 | -- as = Alt (Sym 'a') (Rep (Sym 'a'))
34 | -- bs = Alt (Sym 'b') (Rep (Sym 'b'))
35 | cases :: [Case]
36 | cases =
37 | [ Case {description = "empty", regex = Eps, sample = "", result = True}
38 | , Case {description = "null", regex = Emp, sample = "", result = False}
39 | , Case {description = "char", regex = symf 'a', sample = "a", result = True}
40 | , Case
41 | {description = "not char", regex = symf 'a', sample = "b", result = False}
42 | , Case
43 | { description = "char vs empty"
44 | , regex = symf 'a'
45 | , sample = ""
46 | , result = False
47 | }
48 | , Case
49 | { description = "left alt"
50 | , regex = Alt (symf 'a') (symf 'b')
51 | , sample = "a"
52 | , result = True
53 | }
54 | , Case
55 | { description = "right alt"
56 | , regex = Alt (symf 'a') (symf 'b')
57 | , sample = "b"
58 | , result = True
59 | }
60 | , Case
61 | { description = "neither alt"
62 | , regex = Alt (symf 'a') (symf 'b')
63 | , sample = "c"
64 | , result = False
65 | }
66 | , Case
67 | { description = "empty alt"
68 | , regex = Alt (symf 'a') (symf 'b')
69 | , sample = ""
70 | , result = False
71 | }
72 | , Case
73 | { description = "empty rep"
74 | , regex = Rep (symf 'a')
75 | , sample = ""
76 | , result = True
77 | }
78 | , Case
79 | { description = "one rep"
80 | , regex = Rep (symf 'a')
81 | , sample = "a"
82 | , result = True
83 | }
84 | , Case
85 | { description = "multiple rep"
86 | , regex = Rep (symf 'a')
87 | , sample = "aaaaaaaaa"
88 | , result = True
89 | }
90 | , Case
91 | { description = "multiple rep with failure"
92 | , regex = Rep (symf 'a')
93 | , sample = "aaaaaaaaab"
94 | , result = False
95 | }
96 | , Case
97 | { description = "sequence"
98 | , regex = Seq (symf 'a') (symf 'b')
99 | , sample = "ab"
100 | , result = True
101 | }
102 | , Case
103 | { description = "sequence with empty"
104 | , regex = Seq (symf 'a') (symf 'b')
105 | , sample = ""
106 | , result = False
107 | }
108 | , Case
109 | { description = "bad short sequence"
110 | , regex = Seq (symf 'a') (symf 'b')
111 | , sample = "a"
112 | , result = False
113 | }
114 | , Case
115 | { description = "bad long sequence"
116 | , regex = Seq (symf 'a') (symf 'b')
117 | , sample = "abc"
118 | , result = False
119 | }
120 | ]
121 |
--------------------------------------------------------------------------------
/node/01_Kleene.ts:
--------------------------------------------------------------------------------
1 | interface Regcom { kind: string };
2 | class Eps implements Regcom { kind: "eps"; };
3 | class Sym implements Regcom { kind: "sym"; s: string; }
4 | class Alt implements Regcom { kind: "alt"; l: Regex; r: Regex };
5 | class Seq implements Regcom { kind: "seq"; l: Regex; r: Regex };
6 | class Rep implements Regcom { kind: "rep"; r: Regex };
7 |
8 | function eps(): Eps { return { kind: "eps" }; };
9 | function sym(c: string): Sym { return { kind: "sym", s: c }; };
10 | function alt(l: Regex, r: Regex): Alt { return { kind: "alt", l: l, r: r }; };
11 | function seq(l: Regex, r: Regex): Seq { return { kind: "seq", l: l, r: r }; };
12 | function rep(r: Regex): Rep { return { kind: "rep", r: r }; };
13 |
14 | type Regex = Eps | Sym | Alt | Seq | Rep;
15 |
16 | // split :: [a] -> [([a], [a])]
17 | // split [] = [([], [])]
18 | // split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs]
19 |
20 | function split(s: string) {
21 | if (s.length == 0) {
22 | return [["", ""]];
23 | }
24 | return [["", s.slice()]].concat(split(s.slice(1)).map((v) => [s[0].slice().concat(v[0].slice()), v[1].slice()]));
25 | }
26 |
27 | // parts :: [a] -> [[[a]]]
28 | // parts [] = [[]]
29 | // parts [c] = [[[c]]]
30 | // parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
31 |
32 | function parts(s: string): Array<Array<string>> {
33 | if (s.length == 0) {
34 | return [[]];
35 | }
36 |
37 | if (s.length == 1) {
38 | return [[s]];
39 | }
40 |
41 | let c = s[0];
42 | let cs = s.slice(1);
43 | return parts(cs).reduce((acc, pps) => {
44 | let p: string = pps[0];
45 | let ps: Array<string> = pps.slice(1);
46 | let l: Array<string> = [c + p].concat(ps);
47 | let r: Array<string> = [c].concat(p).concat(ps);
48 | return acc.concat([l, r]);
49 | }, [[]]).filter((c) => c.length != 0);
50 | }
51 |
52 | function one(a: Array<any>, test: (s: any) => boolean): boolean {
53 | return a.reduce((acc: boolean, sc: any) => acc || test(sc), false);
54 | }
55 |
56 | function all(a: Array<any>, test: (s: any) => boolean): boolean {
57 | return a.reduce((acc: boolean, sc: any) => acc && test(sc), true);
58 | }
59 |
60 |
61 | function accept(r: Regex, s: string): boolean {
62 | switch(r.kind) {
63 | case "eps":
64 | return s.length == 0;
65 | case "sym":
66 | return s.length == 1 && r.s == s[0];
67 | case "alt":
68 | return accept(r.l, s) || accept(r.r, s);
69 | case "seq":
70 | return split(s).some((v: Array<string>) => accept(r.l, v[0]) && accept(r.r, v[1]));
71 | case "rep":
72 | return parts(s).some((v: Array<string>) => v.every((u: string) => accept(r.r, u)));
73 | }
74 | }
75 |
76 | function run_tests() {
77 |
78 | function assert(l: any) {
79 | console.log(" ", l);
80 | }
81 |
82 | let units = {
83 | test_simple: () => {
84 | let onea = sym("a");
85 | assert(accept(onea, "a"));
86 |
87 | let nocs = rep(alt(sym("a"), sym("b")));
88 | assert(accept(nocs, "abab"));
89 | },
90 |
91 | test_seq: () => {
92 | let abc = seq(sym("a"), seq(sym("b"), sym("c")));
93 | assert(accept(abc, "abc"));
94 | },
95 |
96 | test_rc: () => {
97 | let ab = seq(sym("a"), sym("b"));
98 | let abab = seq(ab, ab);
99 | assert(accept(abab, "abab"));
100 | },
101 |
102 | test_fail: () => {
103 | let ab = seq(sym("a"), sym("b"));
104 | let abab = seq(ab, ab);
105 | assert(! accept(abab, "abacb"));
106 | },
107 |
108 | test_empty_rep: () => {
109 | let a = rep(sym("a"));
110 | assert(accept(a, ""));
111 | },
112 |
113 | test_some_rep: () => {
114 | let a = rep(sym("a"));
115 | assert(accept(a, "a"));
116 | },
117 |
118 | test_many_rep: () => {
119 | let a = rep(sym("a"));
120 | assert(accept(a, "aaaaaaa"));
121 | },
122 |
123 | test_many_rep_dead_l: () => {
124 | let a = rep(sym("a"));
125 | assert(! accept(a, "!aaaaaa"));
126 | },
127 |
128 | test_many_rep_dead_r: () => {
129 | let a = rep(sym("a"));
130 | assert(! accept(a, "aaaaaa!"));
131 | },
132 |
133 | test_many_rep_dead_m: () => {
134 | let a = rep(sym("a"));
135 | assert(! accept(a, "aaa!aaa"));
136 | },
137 |
138 | test_two: () => {
139 | let nocs = rep(alt(sym("a"), sym("b")));
140 | let onec = seq(nocs, sym("c"));
141 | let evencs = seq(rep(seq(onec, onec)), nocs);
142 | assert(accept(evencs, "abcc"));
143 | assert(accept(evencs, "abccababbbbcc"));
144 | }
145 | }
146 |
147 | console.log("Running tests...");
148 | for (let k of Object.keys(units)) {
149 | console.log(k); units[k]();
150 | }
151 | }
152 |
153 | run_tests();
154 |
--------------------------------------------------------------------------------
/python/01_rigged_brzozowski.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | from collections import namedtuple
4 | import re
5 |
6 | Emp = namedtuple("Emp", [])
7 | Eps = namedtuple("Eps", ["tok"])
8 | Sym = namedtuple("Sym", ["c"])
9 | Alt = namedtuple("Alt", ["l", "r"])
10 | Seq = namedtuple("Seq", ["l", "r"])
11 | Rep = namedtuple("Rep", ["r"])
12 | Del = namedtuple("Del", ["r"])
13 |
14 | cname = re.compile(r'^(\w+)\(')
15 |
16 |
17 | def cn(s):
18 | """ Find the canonical name of the regex op"""
19 | return cname.match(s.__doc__).group(1)
20 |
21 |
22 | def derive(r, c):
23 | """ Take a regex op and a character, and return the derivative regex op."""
24 | def sym(r, c):
25 | if c == r.c:
26 | return Eps(set([c]))
27 | return Emp()
28 |
29 | def alt(r, c):
30 | l1 = derive(r.l, c)
31 | r1 = derive(r.r, c)
32 | if cn(l1) == 'Emp':
33 | return r1
34 | if cn(r1) == 'Emp':
35 | return l1
36 | return Alt(l1, r1)
37 |
38 | def seq(r, c):
39 | return Alt(Seq(derive(r.l, c), r.r),
40 | Seq(Del(r.l), derive(r.r, c)))
41 |
42 | def rep(r, c):
43 | return Seq(derive(r.r, c), r)
44 |
45 | def emp(r, c):
46 | return Emp()
47 |
48 | nextfn = {
49 | "Emp": emp,
50 | "Eps": emp,
51 | "Del": emp,
52 | "Sym": sym,
53 | "Alt": alt,
54 | "Seq": seq,
55 | "Rep": rep,
56 | }.get(cn(r))
57 |
58 | return nextfn(r, c)
59 |
60 |
61 | def parsenull(r):
62 | """ Extract the generated parse forest from the residual regular expression."""
63 |
64 | def emp(r): return set()
65 |
66 | def eps(r): return r.tok
67 |
68 | def sym(r): return set([""])
69 |
70 | def alt(r): return parsenull(r.l).union(parsenull(r.r))
71 |
72 | def seq(r): return set([i + j
73 | for j in parsenull(r.r)
74 | for i in parsenull(r.l)])
75 |
76 | def one(r): return parsenull(r.r)
77 |
78 | nextfn = {
79 | "Emp": emp,
80 | "Sym": emp,
81 | "Rep": sym,
82 | "Del": one,
83 | "Eps": eps,
84 | "Alt": alt,
85 | "Seq": seq
86 | }.get(cn(r))
87 |
88 | return nextfn(r)
89 |
90 |
91 | def parse(r, s):
92 | """Iterate through the string, generating a new regular expression for each character, until done."""
93 | head = r
94 | for i in s:
95 | print(head, "\n")
96 | head = derive(head, i)
97 | print(head)
98 | return parsenull(head)
99 |
100 |
101 | if __name__ == '__main__':
102 | nocs = Rep(Alt(Sym('a'), (Sym('b'))))
103 | onec = Seq(nocs, Sym('c'))
104 | evencs = Seq(Rep(Seq(onec, onec)), nocs)
105 |
106 | aas = Alt(Sym('a'), Rep(Sym('a')))
107 | bbs = Alt(Sym('b'), Rep(Sym('b')))
108 |
109 |
110 | # print(parse(evencs, "acc"))
111 |
112 | sym = Seq(Sym('a'), Seq(Rep(Sym('b')), Sym('c')))
113 | parse(sym, "ac")
114 |
--------------------------------------------------------------------------------
/python/02_rigged_brzozowski.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | from collections import namedtuple
4 | import re
5 |
6 | Emp = namedtuple("Emp", [])
7 | Eps = namedtuple("Eps", ["tok"])
8 | Sym = namedtuple("Sym", ["c"])
9 | Alt = namedtuple("Alt", ["l", "r"])
10 | Seq = namedtuple("Seq", ["l", "r"])
11 | Rep = namedtuple("Rep", ["r"])
12 |
13 | cname = re.compile(r'^(\w+)\(')
14 |
15 |
16 | def cn(s):
17 | """ Find the canonical name of the regex op"""
18 | return cname.match(s.__doc__).group(1)
19 |
20 |
21 | def derive(r, c):
22 | """ Take a regex op and a character, and return the derivative regex op."""
23 | def sym(r, c):
24 | if c == r.c:
25 | return Eps(set([c]))
26 | return Emp()
27 |
28 | def alt(r, c):
29 | l1 = derive(r.l, c)
30 | r1 = derive(r.r, c)
31 | if cn(l1) == 'Emp':
32 | return r1
33 | if cn(r1) == 'Emp':
34 | return l1
35 | return Alt(l1, r1)
36 |
37 | def seq(r, c):
38 | if nullable(r.l):
39 | return Alt(Seq(derive(r.l, c), r.r), derive(r.r, c))
40 | return Seq(derive(r.l, c), r.r)
41 |
42 | def rep(r, c):
43 | return Seq(derive(r.r, c), r)
44 |
45 | def emp(r, c):
46 | return Emp()
47 |
48 | nextfn = {
49 | "Emp": emp,
50 | "Eps": emp,
51 | "Sym": sym,
52 | "Alt": alt,
53 | "Seq": seq,
54 | "Rep": rep,
55 | }.get(cn(r))
56 |
57 | return nextfn(r, c)
58 |
59 | def nullable(r):
60 | def zer(r): return False
61 | def one(r): return True
62 | def alt(r): return nullable(r.l) or nullable(r.r)
63 | def seq(r): return nullable(r.l) and nullable(r.r)
64 |
65 | nextfn = {
66 | "Emp": zer,
67 | "Sym": zer,
68 | "Rep": one,
69 | "Eps": one,
70 | "Alt": alt,
71 | "Seq": seq
72 | }.get(cn(r))
73 |
74 | return nextfn(r)
75 |
76 | def parsenull(r):
77 | """ Extract the generated parse forest from the residual regular expression."""
78 |
79 | def emp(r): return set()
80 |
81 | def eps(r): return r.tok
82 |
83 | def sym(r): return set([""])
84 |
85 | def alt(r): return parsenull(r.l).union(parsenull(r.r))
86 |
87 | def seq(r): return set([i + j
88 | for j in parsenull(r.r)
89 | for i in parsenull(r.l)])
90 |
91 | nextfn = {
92 | "Emp": emp,
93 | "Sym": emp,
94 | "Rep": sym,
95 | "Eps": eps,
96 | "Alt": alt,
97 | "Seq": seq
98 | }.get(cn(r))
99 |
100 | return nextfn(r)
101 |
102 |
103 | def parse(r, s):
104 | """Iterate through the string, generating a new regular expression for each character, until done."""
105 | head = r
106 | for i in s:
107 | print(head, "\n")
108 | head = derive(head, i)
109 | print(head)
110 | return parsenull(head)
111 |
112 |
113 | if __name__ == '__main__':
114 | # nocs = Rep(Alt(Sym('a'), (Sym('b'))))
115 | # onec = Seq(nocs, Sym('c'))
116 | # evencs = Seq(Rep(Seq(onec, onec)), nocs)
117 | #
118 | # aas = Alt(Sym('a'), Rep(Sym('a')))
119 | # bbs = Alt(Sym('b'), Rep(Sym('b')))
120 | #
121 |
122 | sym = Seq(Sym('a'), Seq(Rep(Sym('b')), Sym('c')))
123 | parse(sym, "ac")
124 |
--------------------------------------------------------------------------------
/python/README.md:
--------------------------------------------------------------------------------
1 | # Python Experiments!
2 |
3 | This directory contains some simple experiments in Python. Python is,
4 | frankly, easier to instrument than Haskell, so figuring out the
5 | underlying operation and stepping through it with pdb can sometimes be
6 | easier to do in Python 3.
7 |
8 | `01_rigged_brzozowski.py`: A naive implementation of Brzozowski's regular
9 | expression library, using the `Delta` operator to distinguish between
10 | nullable and not-nullable branches of the `Sequence` operator. What's
11 | remarkable about it, if anything, is just *how much* it resembles
12 | Haskell Experiment 05: Rigged Brzozowski Regular Expressions. Part of
13 | that is using the `namedtuple` as an easy hack for Haskell's data
14 | constructors, and then implementing the `derive()` and `parsenull()`
15 | functions using map functions as a substitute for Haskell's pattern
16 | matching.
17 |
18 | This is mostly proof that "One can write Haskell poorly in any
19 | language."
20 |
--------------------------------------------------------------------------------
/rust/01_simpleregex/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "simpleregex"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg "]
5 | edition = "2018"
6 |
7 |
--------------------------------------------------------------------------------
/rust/01_simpleregex/README.md:
--------------------------------------------------------------------------------
1 | # Kleene Regular Expressions, in Rust.
2 |
3 | This is literally the definition of a simple string recognizing regular
4 | expression in Rust. It consists of the `Reg` datatype encompassing
5 | the five standard operations of regular expressions and an `accept`
6 | function that takes the expression and a string and returns a Boolean
7 | yes/no on recognition or failure. It is a direct implementation of
8 | Kleene's algebra:
9 |
10 | L[[ε]] = {ε}
11 | L[[a]] = {a}
12 | L[[r · s]] = {u · v | u ∈ L[[r]] and v ∈ L[[s]]}
13 | L[[r | s]] = L[[r]] ∪ L[[s]]
14 | L[[r∗]] = {ε} ∪ L[[r · r∗]]
15 |
16 | Those equations are for: recognizing an empty string, recognizing a
17 | letter, recognizing two expressions in sequence, recognizing two
18 | expression alternatives, and the repetition operation.
19 |
20 | Composition is by simple reference-counted pointers to child
21 | expressions. I've provided convenient constructor functions to make the
22 | creation of new regexes easier.
23 |
24 | The `accept` function has two helper functions that split the string,
25 | and all substrings, into all possible substrings such that *every
26 | possible combination* of string and expression are tested, and if the
27 | resulting tree of `and`s (from Sequencing and Repetition) and `or`s
28 | (from Alternation) has at least one complete collection of `True` from
29 | top to bottom then the function returns true.
30 |
31 | This generation and comparison of substrings is grossly inefficient; a
32 | string of eight 'a's with `a*` will take 30 seconds on a modern laptop;
33 | increase that to twelve and you'll be waiting about an hour. The cost
34 | is `2^(n - 1)`, where `n` is the length of the string; this is a
35 | consequence of the sequencing operation. Sequences aren't just about
36 | letters: they could be about anything, including repetition (which
37 | itself creates new sequences) and other sequences, and the cost of
38 | examining every possible combination of sequencing creates this
39 | exponential cost.
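
The `2^(n - 1)` figure can be checked with a rough sketch like the one below. It is not the crate's code: `partitions` and `partition_count` are hypothetical names introduced here, and `partitions` only assumes the same idea as `parts` (every way of cutting the input into contiguous, non-empty pieces).

```rust
/// Enumerate every way to cut `s` into contiguous, non-empty pieces.
/// Hypothetical helper for illustration; it produces the same set of
/// partitions as the crate's `parts`, but is written independently.
fn partitions(s: &[char]) -> Vec<Vec<Vec<char>>> {
    if s.is_empty() {
        // The empty input has exactly one partition: no pieces at all.
        return vec![vec![]];
    }
    let mut out = Vec::new();
    // Choose the length of the first piece, then partition the remainder.
    for cut in 1..=s.len() {
        let first = s[..cut].to_vec();
        for mut rest in partitions(&s[cut..]) {
            let mut p = vec![first.clone()];
            p.append(&mut rest);
            out.push(p);
        }
    }
    out
}

/// Closed form: each of the n - 1 gaps between characters is either
/// cut or not cut, giving 2^(n - 1) partitions for n >= 1.
fn partition_count(n: usize) -> u64 {
    if n == 0 {
        1
    } else {
        1u64 << (n - 1)
    }
}
```

Calling `partitions` on the four characters of `"aaaa"` yields eight partitions, matching `partition_count(4)`. Repetition has to test every one of them, and each piece may itself trigger further splitting, which is where the cost compounds.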
40 |
41 | While not as clean (no pun intended) as the Haskell version, especially
42 | in the helper functions, it's still surprisingly easy to read, and the
43 | `accept` function is almost line-for-line as clear as the Haskell
44 | version. The use of `.any` and `.all` for the `or` and `and` functions
45 | makes a lot of sense here.
46 |
47 | ## License
48 |
49 | As this is entirely my work, it is copyright (c) 2019, and licensed
50 | under the Mozilla Public License v. 2.0. See the
51 | [LICENSE.md](../../LICENSE.md) in the root directory.
52 |
--------------------------------------------------------------------------------
/rust/01_simpleregex/src/lib.rs:
--------------------------------------------------------------------------------
1 | use std::rc::Rc;
2 |
3 | // data Reg = Eps | Sym Char | Alt Reg Reg | Seq Reg Reg | Rep Reg
4 |
5 | #[derive(Debug)]
6 | pub enum Reg {
7 | Eps,
8 | Sym(char),
9 | Alt(Rc<Reg>, Rc<Reg>),
10 | Seq(Rc<Reg>, Rc<Reg>),
11 | Rep(Rc<Reg>),
12 | }
13 |
14 | // Some rust-specific helpers to make constructing regular expressions
15 | // easier.
16 |
17 | pub fn eps() -> Rc<Reg> {
18 | Rc::new(Reg::Eps)
19 | }
20 | pub fn sym(c: char) -> Rc<Reg> {
21 | Rc::new(Reg::Sym(c))
22 | }
23 | pub fn alt(r1: &Rc<Reg>, r2: &Rc<Reg>) -> Rc<Reg> {
24 | Rc::new(Reg::Alt(r1.clone(), r2.clone()))
25 | }
26 | pub fn seq(r1: &Rc<Reg>, r2: &Rc<Reg>) -> Rc<Reg> {
27 | Rc::new(Reg::Seq(r1.clone(), r2.clone()))
28 | }
29 | pub fn rep(r1: &Rc<Reg>) -> Rc<Reg> {
30 | Rc::new(Reg::Rep(r1.clone()))
31 | }
32 |
33 | // split :: [a] -> [([a], [a])]
34 | // split [] = [([], [])]
35 | // split (c:cs) = ([], c : cs) : [(c : s1, s2) | (s1, s2) <- split cs]
36 |
37 | pub fn split(s: &[char]) -> Vec<(Vec<char>, Vec<char>)> {
38 | if s.is_empty() {
39 | return vec![(vec![], vec![])];
40 | }
41 |
42 | let mut ret = vec![(vec![], s.to_vec())];
43 | let c = s[0];
44 |
45 | fn permute(c: char, s1: &mut Vec<char>, s2: &[char]) -> (Vec<char>, Vec<char>) {
46 | let mut r1 = vec![c];
47 | r1.append(s1);
48 | (r1, s2.to_vec())
49 | }
50 |
51 | ret.append(
52 | &mut split(&s[1..])
53 | .iter_mut()
54 | .map(|(s1, s2)| permute(c, s1, &s2))
55 | .collect(),
56 | );
57 | ret
58 | }
59 |
60 | // parts :: [a] -> [[[a]]]
61 | // parts [] = [[]]
62 | // parts [c] = [[[c]]]
63 | // parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
64 |
65 | // This was challenging to port to Rust. Haskell's automatic
66 | // conversion of [Char] to String obscured what was going on under the
67 | // covers.
68 | //
69 | // parts (c:cs) = concat [[(c : p) : ps, [c] : p : ps] | p:ps <- parts cs]
70 | // The two elements are:
71 | //
72 | // - ([c]:[[p]]):[[[ps]]]
73 | // The char 'c' is converted to a string, and that string is consed to
74 | // list 'p', and then list 'p' is consed onto the list 'ps'
75 | //
76 | // - [[c]]:[[p]]:[[[ps]]]
77 | // The char 'c' is made into a string and then wrapped in a list, and
78 | // then [[p]] and [[c]] are both consed onto list 'ps'
79 | //
80 | // It really took writing it all out on paper to understand the order
81 | // of operations.
82 |
83 | pub fn parts(s: &[char]) -> Vec<Vec<Vec<char>>> {
84 | if s.is_empty() {
85 | return vec![vec![]];
86 | }
87 | if s.len() == 1 {
88 | return vec![vec![s.to_vec()]];
89 | }
90 |
91 | let head = s[0];
92 | let tail = &s[1..];
93 |
94 | let mut ret = vec![];
95 | for pps in parts(tail) {
96 | let phead = &pps[0];
97 | let ptail = &pps[1..];
98 |
99 | let mut left = vec![head];
100 | left.append(&mut phead.to_vec());
101 |
102 | let mut left_1 = vec![left];
103 | left_1.append(&mut ptail.to_vec());
104 | ret.push(left_1);
105 |
106 | let mut right = vec![vec![head]];
107 | right.push(phead.to_vec());
108 | right.append(&mut ptail.to_vec());
109 | ret.push(right);
110 | }
111 | ret
112 | }
113 |
114 | // accept :: Reg -> String -> Bool
115 | // accept Eps u = null u
116 | // accept (Sym c) u = u == [c]
117 | // accept (Alt p q) u = accept p u || accept q u
118 | // accept (Seq p q) u = or [accept p u1 && accept q u2 | (u1, u2) <- split u]
119 | // accept (Rep r) u = or [and [accept r ui | ui <- ps] | ps <- parts u]
120 |
121 | pub fn accept(r: &Reg, s: &[char]) -> bool {
122 | match r {
123 | Reg::Eps => s.is_empty(),
124 | Reg::Sym(c) => (s.len() == 1 && s[0] == *c),
125 | Reg::Alt(r1, r2) => accept(&r1, s) || accept(&r2, s),
126 | Reg::Seq(r1, r2) => split(s)
127 | .into_iter()
128 | .any(|(u1, u2)| accept(r1, &u1) && accept(r2, &u2)),
129 | Reg::Rep(r) => parts(s)
130 | .into_iter()
131 | .any(|ps| ps.into_iter().all(|u| accept(r, &u))),
132 | }
133 | }
134 |
135 | #[cfg(test)]
136 | mod tests {
137 | use super::*;
138 |
139 | fn vectostr(r: &(Vec<char>, Vec<char>)) -> (String, String) {
140 | let (a, b) = r;
141 | let c: String = a.into_iter().collect();
142 | let d: String = b.into_iter().collect();
143 | (c, d)
144 | }
145 |
146 | #[test]
147 | fn test_split() {
148 | let c1: Vec<char> = String::from("").chars().into_iter().collect();
149 | let s: Vec<(String, String)> = split(&c1).into_iter().map(|r| vectostr(&r)).collect();
150 | assert_eq!(s, [("".to_string(), "".to_string())]);
151 | }
152 |
153 | #[test]
154 | fn test_simple() {
155 | let c1: Vec<char> = String::from("acc").chars().into_iter().collect();
156 | assert_eq!(c1, ['a', 'c', 'c']);
157 |
158 | let c2: Vec<char> = String::from("a").chars().into_iter().collect();
159 | let onea = sym('a');
160 | assert!(accept(&onea, &c2));
161 |
162 | let c3: Vec<char> = String::from("abab").chars().into_iter().collect();
163 | let nocs = rep(&alt(&sym('a'), &sym('b')));
164 | assert!(accept(&nocs, &c3));
165 | }
166 |
167 | #[test]
168 | fn test_seq() {
169 | let c3: Vec<char> = String::from("abc").chars().into_iter().collect();
170 | let abc = seq(&sym('a'), &seq(&sym('b'), &sym('c')));
171 | assert!(accept(&abc, &c3));
172 | }
173 |
174 | #[test]
175 | fn test_rc() {
176 | let c3: Vec<char> = String::from("abab").chars().into_iter().collect();
177 | let ab = seq(&sym('a'), &sym('b'));
178 | let abab = seq(&ab, &ab);
179 | assert!(accept(&abab, &c3));
180 | }
181 |
182 | #[test]
183 | fn test_empty_rep() {
184 | let c3: Vec<char> = String::from("").chars().into_iter().collect();
185 | let a = rep(&sym('a'));
186 | assert!(accept(&a, &c3));
187 | }
188 |
189 | #[test]
190 | fn test_two() {
191 | let c4: Vec<char> = String::from("abcc").chars().into_iter().collect();
192 | let nocs = rep(&alt(&sym('a'), &sym('b')));
193 | let onec = seq(&nocs, &sym('c'));
194 | let evencs = seq(&rep(&seq(&onec, &onec)), &nocs);
195 | assert!(accept(&evencs, &c4))
196 | }
197 | }
198 |
--------------------------------------------------------------------------------
/rust/02_riggedregex/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "riggedregex"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg "]
5 | edition = "2018"
6 |
7 | [dependencies]
8 | num-traits = "0.2.6"
9 |
--------------------------------------------------------------------------------
/rust/02_riggedregex/README.md:
--------------------------------------------------------------------------------
1 | # Kleene Regular Expressions with Rigging, in Rust
2 |
3 | This program builds on the simple regular expressions in Version 01,
4 | providing a new definition of a regular expression `Regw` that takes two
5 | types, a source type and an output type. The output type must be a
6 | [*Semiring*](https://en.wikipedia.org/wiki/Semiring).
7 |
8 | A semiring is a set R equipped with two binary operations + and ⋅, and
9 | two constants identified as 0 and 1. By providing a semiring to the
10 | regular expression, we change the return type of the regular expression
11 | to any set that can obey the semiring laws. There's a surprising amount
12 | of stuff you can do with the semiring laws.
13 |
14 | In this example, I've provided a function, `rigged`, that takes a
15 | simple regular expression from Version 01, and wraps or extracts
16 | the contents of that regular expression into the `Regw` datatype.
17 | Instead of the boolean mathematics of Version 01, we use the semiring
18 | symbols `add` and `mul` to represent the sum and product operations on
19 | the return type. We then define the "symbol accepted" boolean to return
20 | either the `zero` or `one` type of the semiring.
21 |
22 | I've provided two semirings: One of (0, 1, +, *, Integers), and one of
23 | (False, True, ||, &&, Booleans). Both work well.
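
To make the idea concrete, here is a minimal, self-contained sketch of a semiring-weighted matcher. It is an illustration under stated assumptions, not this crate's code: the `Semiring` trait, `weigh`, `splits`, and `parts` are hypothetical names, the trait is spelled out explicitly only to keep the example free of dependencies (the crate itself leans on `num-traits`), and `Box` stands in for whatever pointer type the real `Regw` uses.

```rust
/// Hypothetical explicit semiring trait, used only in this sketch.
trait Semiring {
    fn zero() -> Self;
    fn one() -> Self;
    fn add(self, rhs: Self) -> Self;
    fn mul(self, rhs: Self) -> Self;
}

/// (False, True, ||, &&): plain recognition.
impl Semiring for bool {
    fn zero() -> Self { false }
    fn one() -> Self { true }
    fn add(self, rhs: Self) -> Self { self || rhs }
    fn mul(self, rhs: Self) -> Self { self && rhs }
}

/// (0, 1, +, *): counts the number of distinct ways the input matches.
impl Semiring for u64 {
    fn zero() -> Self { 0 }
    fn one() -> Self { 1 }
    fn add(self, rhs: Self) -> Self { self + rhs }
    fn mul(self, rhs: Self) -> Self { self * rhs }
}

enum Reg {
    Eps,
    Sym(char),
    Alt(Box<Reg>, Box<Reg>),
    Seq(Box<Reg>, Box<Reg>),
    Rep(Box<Reg>),
}

/// Every way to cut `s` into a prefix and a suffix.
fn splits(s: &[char]) -> Vec<(&[char], &[char])> {
    (0..=s.len()).map(|i| s.split_at(i)).collect()
}

/// Every way to cut `s` into contiguous, non-empty pieces.
fn parts(s: &[char]) -> Vec<Vec<&[char]>> {
    if s.is_empty() {
        return vec![vec![]];
    }
    let mut out = Vec::new();
    for cut in 1..=s.len() {
        for mut rest in parts(&s[cut..]) {
            let mut p = vec![&s[..cut]];
            p.append(&mut rest);
            out.push(p);
        }
    }
    out
}

/// Same shape as the boolean `accept`, with `||` replaced by `add`,
/// `&&` replaced by `mul`, and true/false replaced by `one`/`zero`.
fn weigh<S: Semiring>(r: &Reg, s: &[char]) -> S {
    match r {
        Reg::Eps => if s.is_empty() { S::one() } else { S::zero() },
        Reg::Sym(c) => if s.len() == 1 && s[0] == *c { S::one() } else { S::zero() },
        Reg::Alt(p, q) => weigh::<S>(p, s).add(weigh::<S>(q, s)),
        Reg::Seq(p, q) => splits(s).into_iter().fold(S::zero(), |acc, (u, v)| {
            acc.add(weigh::<S>(p, u).mul(weigh::<S>(q, v)))
        }),
        Reg::Rep(p) => parts(s).into_iter().fold(S::zero(), |acc, ps| {
            let prod = ps.into_iter().fold(S::one(), |m, u| m.mul(weigh::<S>(p, u)));
            acc.add(prod)
        }),
    }
}
```

With the expression `a | a` and the input `a`, `weigh::<bool>` reports a match, while `weigh::<u64>` reports two, since either branch of the alternation can consume the letter. Only the choice of return type changes between the two calls.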
24 |
25 | Rust isn't nearly as magical as Haskell. (See the Readme in the
26 | equivalent Haskell version for my comments on that.) On the other hand,
27 | it's not necessary to define a Semiring explicitly; instead, we define a
28 | nominative type, a struct containing our real return type, and then
29 | provide implementations of One, Zero, Mul, and Add for that type. Here,
30 | my two semirings are named `Recognizer` and `Ambigcounter`, and to make
31 | them work we have to say that our recognizer is a `Regw`;
32 | Rust won't magically glue everything together the way Haskell will.
33 |
34 | Still, this was a straightforward implementation of the rigged regular
35 | expression, and is a good stepping stone for future projects.
36 |
37 | ## License
38 |
39 | As this is entirely my work, it is copyright (c) 2019, and licensed
40 | under the Mozilla Public License v. 2.0. See the
41 | [LICENSE.md](../../LICENSE.md) in the root directory.
42 |
--------------------------------------------------------------------------------
/rust/03_brzozowski_1/.gitignore:
--------------------------------------------------------------------------------
1 | /target
2 | **/*.rs.bk
3 | Cargo.lock
4 |
--------------------------------------------------------------------------------
/rust/03_brzozowski_1/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "sbrz"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg "]
5 | edition = "2018"
6 |
7 | [dependencies]
8 |
--------------------------------------------------------------------------------
/rust/03_brzozowski_1/README.md:
--------------------------------------------------------------------------------
1 | # Brzozowski Regular Expressions, in Rust
2 |
3 | This is a regex recognizer implementing Brzozowski's Algorithm, in Rust.
4 | It has two standard optimizations (null branches are automatically
5 | pruned), and with those it works fine.
6 |
7 | This version implements regular expressions as they appear in the Racket
8 | version, without nullability optimizations (the so-called "rerp"
9 | implementation).
10 |
11 | ## License
12 |
13 | As this is entirely my work, it is copyright (c) 2019, and licensed
14 | under the Mozilla Public License v. 2.0. See the
15 | [LICENSE.md](../../LICENSE.md) in the root directory.
16 |
--------------------------------------------------------------------------------
/rust/03_brzozowski_1/src/lib.rs:
--------------------------------------------------------------------------------
1 | use std::ops::Deref;
2 | use std::rc::Rc;
3 |
4 | #[derive(Debug)]
5 | pub enum Brz {
6 | Emp,
7 | Eps,
8 | Sym(char),
9 | Alt(Rc<Brz>, Rc<Brz>),
10 | Seq(Rc<Brz>, Rc<Brz>),
11 | Rep(Rc<Brz>),
12 | }
13 |
14 | macro_rules! matches {
15 | ($expression:expr, $($pattern:tt)+) => {
16 | match $expression {
17 | $($pattern)+ => true,
18 | _ => false
19 | }
20 | }
21 | }
22 |
23 | macro_rules! cond {
24 | ($($pred:expr => $body:block),+ ,_ => $default:block) => {
25 | {
26 | $(if $pred $body else)+
27 | $default
28 | }
29 | }
30 | }
31 |
32 | pub fn emp() -> Rc<Brz> {
33 | Rc::new(Brz::Emp)
34 | }
35 |
36 | pub fn eps() -> Rc<Brz> {
37 | Rc::new(Brz::Eps)
38 | }
39 |
40 | pub fn sym(c: char) -> Rc<Brz> {
41 | Rc::new(Brz::Sym(c))
42 | }
43 |
44 | pub fn alt(r1: &Rc<Brz>, r2: &Rc<Brz>) -> Rc<Brz> {
45 | cond!(
46 | matches!(r1.deref(), Brz::Emp) => { r2.clone() },
47 | matches!(r2.deref(), Brz::Emp) => { r1.clone() },
48 | _ => { Rc::new(Brz::Alt(r1.clone(), r2.clone())) }
49 | )
50 | }
51 |
52 | pub fn seq(r1: &Rc<Brz>, r2: &Rc<Brz>) -> Rc<Brz> {
53 | cond!(
54 | matches!(r1.deref(), Brz::Emp) => { emp() },
55 | matches!(r2.deref(), Brz::Emp) => { emp() },
56 | _ => { Rc::new(Brz::Seq(r1.clone(), r2.clone())) }
57 | )
58 | }
59 |
60 | pub fn rep(r1: &Rc<Brz>) -> Rc<Brz> {
61 | Rc::new(Brz::Rep(r1.clone()))
62 | }
63 |
64 | pub fn derive(n: &Rc<Brz>, c: char) -> Rc<Brz> {
65 | use self::Brz::*;
66 |
67 | match n.deref() {
68 | Emp => emp(),
69 | Eps => emp(),
70 | Sym(u) => {
71 | if c == *u {
72 | eps()
73 | } else {
74 | emp()
75 | }
76 | }
77 | Seq(l, r) => {
78 | let s = seq(&derive(l, c), r);
79 | if nullable(l) {
80 | alt(&s, &derive(r, c))
81 | } else {
82 | s
83 | }
84 | }
85 | Alt(l, r) => alt(&derive(l, c), &derive(r, c)),
86 | Rep(r) => seq(&derive(r, c), &n.clone()),
87 | }
88 | }
89 |
90 | pub fn nullable(n: &Rc<Brz>) -> bool {
91 | use self::Brz::*;
92 |
93 | match n.deref() {
94 | Emp => false,
95 | Eps => true,
96 | Sym(_) => false,
97 | Seq(l, r) => nullable(l) && nullable(r),
98 | Alt(l, r) => nullable(l) || nullable(r),
99 | Rep(_) => true,
100 | }
101 | }
102 |
103 | pub fn accept(n: &Rc<Brz>, s: String) -> bool {
104 | use self::Brz::*;
105 |
106 | let mut source = s.chars().peekable();
107 | let mut r = n.clone();
108 | loop {
109 | match source.next() {
110 | None => break nullable(&r),
111 | Some(ref c) => {
112 | let np = derive(&r, *c);
113 | println!("{:?}", np);
114 | match np.deref() {
115 | Emp => return false,
116 | Eps => {
117 | break match source.peek() {
118 | None => true,
119 | Some(_) => false,
120 | };
121 | }
122 | _ => r = np.clone(),
123 | }
124 | }
125 | }
126 | }
127 | }
128 |
129 | #[cfg(test)]
130 | mod tests {
131 | use super::*;
132 |
133 | #[test]
134 | fn basics() {
135 | let cases = [
136 | ("empty", eps(), "", true),
137 | ("null", emp(), "", false),
138 | ("char", sym('a'), "a", true),
139 | ("not char", sym('a'), "b", false),
140 | ("char vs empty", sym('a'), "", false),
141 | ("left alt", alt(&sym('a'), &sym('b')), "a", true),
142 | ("right alt", alt(&sym('a'), &sym('b')), "b", true),
143 | ("neither alt", alt(&sym('a'), &sym('b')), "c", false),
144 | ("empty alt", alt(&sym('a'), &sym('b')), "", false),
145 | ("empty rep", rep(&sym('a')), "", true),
146 | ("one rep", rep(&sym('a')), "a", true),
147 | ("short multiple failed rep", rep(&sym('a')), "ab", false),
148 | ("multiple rep", rep(&sym('a')), "aaaaaaaaa", true),
149 | (
150 | "multiple rep with failure",
151 | rep(&sym('a')),
152 | "aaaaaaaaab",
153 | false,
154 | ),
155 | ("sequence", seq(&sym('a'), &sym('b')), "ab", true),
156 | ("sequence with empty", seq(&sym('a'), &sym('b')), "", false),
157 | ("bad short sequence", seq(&sym('a'), &sym('b')), "a", false),
158 | ("bad long sequence", seq(&sym('a'), &sym('b')), "abc", false),
159 | ];
160 |
161 | for (name, case, sample, result) in &cases {
162 | println!("{:?}", name);
163 | assert_eq!(accept(case, sample.to_string()), *result);
164 | }
165 | }
166 | }
167 |
--------------------------------------------------------------------------------
/rust/04_brzozowski_2/.gitignore:
--------------------------------------------------------------------------------
1 | /target
2 | **/*.rs.bk
3 | Cargo.lock
4 |
--------------------------------------------------------------------------------
/rust/04_brzozowski_2/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "sbrz"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg "]
5 | edition = "2018"
6 |
7 | [dependencies]
8 |
--------------------------------------------------------------------------------
/rust/04_brzozowski_2/README.md:
--------------------------------------------------------------------------------
1 | # Brzozowski Regular Expressions, in Rust
2 |
3 | This is a regex recognizer implementing Brzozowski's Algorithm, in Rust.
4 | It has two standard optimizations (null branches are automatically
5 | pruned), and with those it works fine.
6 |
7 | This version implements regular expressions as they appear in my own
8 | Haskell version, without nullability optimizations (the
9 | so-called "rerp" implementation).
10 |
11 | The difference between the two baseline implementations is that this one
12 | attempts to "object orient" the code, creating an implementation that
13 | can be modified without having to touch many portions of the code, by
14 | isolating the 'derive' and 'nullability' tests into their own
15 | implementation of the BrzNode. I consider the experiment something of a
16 | failure, in that to "work around" Rust's lack of inheritance I had to do
17 | some fairly wacky things to teach Rust how to look stuff up.
18 |
19 | ## License
20 |
21 | As this is entirely my work, it is copyright (c) 2019, and licensed
22 | under the Mozilla Public License v. 2.0. See the
23 | [LICENSE.md](../../LICENSE.md) in the root directory.
24 |
--------------------------------------------------------------------------------
/rust/04_brzozowski_2/src/lib.rs:
--------------------------------------------------------------------------------
1 | use std::ops::Deref;
2 | use std::rc::Rc;
3 |
4 | #[derive(Debug)]
5 | pub struct Emp;
6 | #[derive(Debug)]
7 | pub struct Eps;
8 | #[derive(Debug)]
9 | pub struct Sym(char);
10 | #[derive(Debug)]
11 | pub struct Alt(Rc<Brz>, Rc<Brz>);
12 | #[derive(Debug)]
13 | pub struct Seq(Rc<Brz>, Rc<Brz>);
14 | #[derive(Debug)]
15 | pub struct Rep(Rc<Brz>);
16 |
17 | #[derive(Debug)]
18 | pub enum Brz {
19 | Emp(Emp),
20 | Eps(Eps),
21 | Sym(Sym),
22 | Alt(Alt),
23 | Seq(Seq),
24 | Rep(Rep),
25 | }
26 |
27 | impl Brz {
28 | fn derive(&self, c: char) -> Rc<Brz> {
29 | match self {
30 | Brz::Emp(emp) => emp.derive(c),
31 | Brz::Eps(eps) => eps.derive(c),
32 | Brz::Sym(sym) => sym.derive(c),
33 | Brz::Alt(alt) => alt.derive(c),
34 | Brz::Seq(seq) => seq.derive(c),
35 | Brz::Rep(rep) => rep.derive(c),
36 | }
37 | }
38 |
39 | fn nullable(&self) -> bool {
40 | match self {
41 | Brz::Emp(emp) => emp.nullable(),
42 | Brz::Eps(eps) => eps.nullable(),
43 | Brz::Sym(sym) => sym.nullable(),
44 | Brz::Alt(alt) => alt.nullable(),
45 | Brz::Seq(seq) => seq.nullable(),
46 | Brz::Rep(rep) => rep.nullable(),
47 | }
48 | }
49 | }
50 |
51 | trait Brznode {
52 | fn derive(&self, c: char) -> Rc<Brz>;
53 | fn nullable(&self) -> bool;
54 | }
55 |
56 | impl Brznode for Emp {
57 | fn derive(&self, _: char) -> Rc<Brz> {
58 | Rc::new(Brz::Emp(Emp {}))
59 | }
60 | fn nullable(&self) -> bool {
61 | false
62 | }
63 | }
64 |
65 | impl Brznode for Eps {
66 | fn derive(&self, _: char) -> Rc<Brz> {
67 | Rc::new(Brz::Emp(Emp {}))
68 | }
69 | fn nullable(&self) -> bool {
70 | true
71 | }
72 | }
73 |
74 | impl Brznode for Sym {
75 | fn derive(&self, c: char) -> Rc<Brz> {
76 | Rc::new(if c == self.0 {
77 | Brz::Eps(Eps {})
78 | } else {
79 | Brz::Emp(Emp {})
80 | })
81 | }
82 | fn nullable(&self) -> bool {
83 | false
84 | }
85 | }
86 |
87 | pub fn alt(r1: &Rc<Brz>, r2: &Rc<Brz>) -> Rc<Brz> {
88 | match (r1.deref(), r2.deref()) {
89 | (_, Brz::Emp(_)) => r1.clone(),
90 | (Brz::Emp(_), _) => r2.clone(),
91 | _ => Rc::new(Brz::Alt(Alt(r1.clone(), r2.clone()))),
92 | }
93 | }
94 |
95 | impl Brznode for Alt {
96 | fn derive(&self, c: char) -> Rc<Brz> {
97 | let l = &self.0.derive(c);
98 | let r = &self.1.derive(c);
99 | alt(l, r)
100 | }
101 | fn nullable(&self) -> bool {
102 | self.0.nullable() || self.1.nullable()
103 | }
104 | }
105 |
106 | pub fn seq(r1: &Rc<Brz>, r2: &Rc<Brz>) -> Rc<Brz> {
107 | match (r1.deref(), r2.deref()) {
108 | (_, Brz::Emp(_)) => emp(),
109 | (Brz::Emp(_), _) => emp(),
110 | _ => Rc::new(Brz::Seq(Seq(r1.clone(), r2.clone()))),
111 | }
112 | }
113 |
114 | impl Brznode for Seq {
115 | fn derive(&self, c: char) -> Rc<Brz> {
116 | let s = seq(&self.0.derive(c), &self.1);
117 | if self.0.nullable() {
118 | alt(&s, &self.1.derive(c))
119 | } else {
120 | s
121 | }
122 | }
123 | fn nullable(&self) -> bool {
124 | self.0.nullable() && self.1.nullable()
125 | }
126 | }
127 |
128 | impl Brznode for Rep {
129 | fn derive(&self, c: char) -> Rc<Brz> {
130 | seq(&self.0.derive(c), &rep(&self.0))
131 | }
132 | fn nullable(&self) -> bool {
133 | true
134 | }
135 | }
136 |
137 | pub fn emp() -> Rc<Brz> {
138 | Rc::new(Brz::Emp(Emp))
139 | }
140 |
141 | pub fn eps() -> Rc<Brz> {
142 | Rc::new(Brz::Eps(Eps))
143 | }
144 |
145 | pub fn sym(c: char) -> Rc<Brz> {
146 | Rc::new(Brz::Sym(Sym(c)))
147 | }
148 |
149 | pub fn rep(r1: &Rc<Brz>) -> Rc<Brz> {
150 | Rc::new(Brz::Rep(Rep(r1.clone())))
151 | }
152 |
153 | pub fn accept(n: &Rc<Brz>, s: String) -> bool {
154 | let mut source = s.chars().peekable();
155 | let mut r = n.clone();
156 | loop {
157 | match source.next() {
158 | None => break r.nullable(),
159 | Some(c) => {
160 | let np = r.derive(c);
161 | match np.deref() {
162 | Brz::Emp(_) => return false,
163 | Brz::Eps(_) => break source.peek().is_none(),
164 | _ => r = np.clone(),
165 | }
166 | }
167 | }
168 | }
169 | }
170 |
171 | #[cfg(test)]
172 | mod tests {
173 | use super::*;
174 |
175 | #[test]
176 | fn basics() {
177 | let cases = [
178 | ("empty", eps(), "", true),
179 | ("null", emp(), "", false),
180 | ("char", sym('a'), "a", true),
181 | ("not char", sym('a'), "b", false),
182 | ("char vs empty", sym('a'), "", false),
183 | ("left alt", alt(&sym('a'), &sym('b')), "a", true),
184 | ("right alt", alt(&sym('a'), &sym('b')), "b", true),
185 | ("neither alt", alt(&sym('a'), &sym('b')), "c", false),
186 | ("empty alt", alt(&sym('a'), &sym('b')), "", false),
187 | ("empty rep", rep(&sym('a')), "", true),
188 | ("sequence", seq(&sym('a'), &sym('b')), "ab", true),
189 | ("sequence with empty", seq(&sym('a'), &sym('b')), "", false),
190 | ("bad long sequence", seq(&sym('a'), &sym('b')), "abc", false),
191 | ("bad short sequence", seq(&sym('a'), &sym('b')), "a", false),
192 | ("one rep", rep(&sym('a')), "a", true),
193 | ("short multiple failed rep", rep(&sym('a')), "ab", false),
194 | ("multiple rep", rep(&sym('a')), "aaaaaaaaa", true),
195 | (
196 | "multiple rep with failure",
197 | rep(&sym('a')),
198 | "aaaaaaaaab",
199 | false,
200 | ),
201 | ];
202 |
203 | for (name, case, sample, result) in &cases {
204 | println!("{:?}", name);
205 | assert_eq!(accept(case, sample.to_string()), *result);
206 | }
207 | }
208 | }
209 |
--------------------------------------------------------------------------------
/rust/05_glushkov/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "glushkov"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg "]
5 | edition = "2018"
6 |
7 | [dependencies]
8 |
--------------------------------------------------------------------------------
/rust/05_glushkov/README.md:
--------------------------------------------------------------------------------
1 | # Glushkov Regular Expressions, in Rust
2 |
3 | This is Glushkov's construction of regular expressions. The basic idea
4 | is that for every symbol encountered during parsing, a corresponding
5 | symbol in the tree is marked (or, if no symbols are marked, the parse
6 | is a failure). Composites are followed to their ends for each
7 | character, and if the symbol matches it is "marked".
8 |
9 | In this instance, we create a Glushkov regular expression tree, and for
10 | each character it returns a new, complete copy of the tree, only with
11 | the marks "shifted" to where they should be given the character. In
12 | this way, each iteration of the tree keeps the NFA list of states that
13 | are active; they are the paths that lead to marked symbols.
14 |
15 | `ended` here means that no more symbols have to be read to match the
16 | expression. `empty` here means that the expression can match the
17 | empty string. The `ended` function was named `final` in the Haskell
18 | version, but the word `ended` is used here because `final` is a
19 | reserved word in Rust.
20 |
21 | `ended` determines whether the Glushkov expression passed in contains
22 | a marked symbol in a final position. It is used both to determine the
23 | end state of the expression and, in sequences, to determine whether
24 | the right-hand expression must be evaluated, that is, whether we're
25 | currently going down a 'marked' path and the left expression can
26 | handle the empty string, OR the left
27 | expression is ended.
28 |
29 | The accept method is just a fold over the input string. The initial
30 | value is the shift of the first character, with the assumed mark of
31 | `true` being included because we can always parse infinitely many
32 | empty strings before the sample begins. The returned value of that
33 | shift is our new regular expression, on which we then progressively
34 | call `shift(&g, false, c)`; here `false` means that we're only going to
35 | shift marks we've already found.
36 |
37 | The "trick" to understanding how this works is to consider the string
38 | "abc" for the expression "abc". The first time through, we start with
39 | True, and what gets marked is the symbol 'a':
40 |
41 | `(seq 'a' (seq 'b' 'c')) -> (seq 'a'* (seq 'b' 'c'))`
42 |
43 | When we pass the letter 'b', what happens? Well, the returned
44 | expression will have the 'a' symbol unmarked (it didn't match the
45 | character), but the second part of the shift expression says that the
46 | left expression is ended (it's a symbol and it was marked!), so we call
47 | `shift(&sym('b'), true, 'b')`, and the new symbol generated will be
48 | marked, moving the mark to the correct destination. The same thing
49 | happens on the next iteration. The *inner seq* will get back that
50 | `(sym 'b')` is marked, so 'c' will match the `(sym 'c')` and shift will
51 | be in a `true` state, so now the expression comes back `(seq 'a' (seq 'b' 'c'*))`.
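
The walk through "abc" can be reproduced with a cut-down copy of the tree. This is a sketch rather than the crate's code: only `Sym` and `Seq` are kept, the functions take `&Glu` instead of `&Rc<Glu>`, and `show` and `trace` are hypothetical helpers added here to print where the marks sit after each character.

```rust
use std::rc::Rc;

// Cut-down tree: symbols and sequences are enough to trace "abc".
#[derive(Debug)]
pub enum Glu {
    Sym(bool, char),
    Seq(Rc<Glu>, Rc<Glu>),
}

pub fn sym(c: char) -> Rc<Glu> {
    Rc::new(Glu::Sym(false, c))
}

pub fn seq(l: &Rc<Glu>, r: &Rc<Glu>) -> Rc<Glu> {
    Rc::new(Glu::Seq(l.clone(), r.clone()))
}

// Can the expression match the empty string?
pub fn empty(g: &Glu) -> bool {
    match g {
        Glu::Sym(_, _) => false,
        Glu::Seq(l, r) => empty(l) && empty(r),
    }
}

// Does the expression hold a mark in a final position?
pub fn ended(g: &Glu) -> bool {
    match g {
        Glu::Sym(m, _) => *m,
        Glu::Seq(l, r) => ended(l) && empty(r) || ended(r),
    }
}

// Build a new tree with the marks moved for character `c`.
pub fn shift(g: &Glu, m: bool, c: char) -> Rc<Glu> {
    match g {
        Glu::Sym(_, s) => Rc::new(Glu::Sym(m && *s == c, *s)),
        Glu::Seq(l, r) => seq(&shift(l, m, c), &shift(r, m && empty(l) || ended(l), c)),
    }
}

// Hypothetical helper: render the tree, with `*` after each marked symbol.
pub fn show(g: &Glu) -> String {
    match g {
        Glu::Sym(true, c) => format!("'{}'*", c),
        Glu::Sym(false, c) => format!("'{}'", c),
        Glu::Seq(l, r) => format!("(seq {} {})", show(l), show(r)),
    }
}

// Hypothetical helper: the rendered tree after each input character.
// Only the first shift starts with the mark set to `true`.
pub fn trace(g: &Rc<Glu>, input: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = g.clone();
    let mut mark = true;
    for c in input.chars() {
        cur = shift(&cur, mark, c);
        mark = false;
        out.push(show(&cur));
    }
    out
}
```

Calling `trace(&seq(&sym('a'), &seq(&sym('b'), &sym('c'))), "abc")` yields one rendered tree per character, with the mark moving from `'a'` to `'b'` to `'c'`.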
52 |
53 | When we run out of letters or regex, we can ask, "Is the expression
54 | final?" Again, the tricky part is inside sequences: we're only final if
55 | the left side is final and the right side can handle an empty string, or
56 | if the right side is final.
57 |
58 | Porting this from Haskell was *much* more straightforward than porting
59 | the straight regex versions, and is slightly more efficient, although
60 | it still has the "transition the entire parse tree every character"
61 | problem. That's to be solved later.
62 |
63 | ## License
64 |
65 | As this is entirely my work, it is copyright (c) 2019, and licensed
66 | under the Mozilla Public License v. 2.0. See the
67 | [LICENSE.md](../../LICENSE.md) in the root directory.
68 |
--------------------------------------------------------------------------------
/rust/05_glushkov/src/lib.rs:
--------------------------------------------------------------------------------
1 | //! This crate provides a series of simple functions for building a
2 | //! regular expression, and an `accept` function which takes a
3 | //! completed regular expression and a string and returns a boolean
4 | //! value describing if the expression matched the string (or not).
5 | //!
6 | //! # Quick Preview
7 | //!
8 | //! ```
9 | //! use glushkov::*;
10 | //! // `(fred|dino)`
11 | //! let expr = alt(&seq(&sym('f'), &seq(&sym('r'), &seq(&sym('e'), &sym('d')))),
12 | //! &seq(&sym('d'), &seq(&sym('i'), &seq(&sym('n'), &sym('o')))));
13 | //! assert!(accept(&expr, "fred"));
14 | //! assert!(accept(&expr, "dino"));
15 | //! assert!(!accept(&expr, "wilma"));
16 | //! ```
17 |
18 | use std::ops::Deref;
19 | use std::rc::Rc;
20 |
21 | #[derive(Debug)]
22 | pub enum Glu {
23 | Eps,
24 | Sym(bool, char),
25 | Alt(Rc<Glu>, Rc<Glu>),
26 | Seq(Rc<Glu>, Rc<Glu>),
27 | Rep(Rc<Glu>),
28 | }
29 |
30 | /// Recognize only the empty string
31 | pub fn eps() -> Rc<Glu> {
32 | Rc::new(Glu::Eps)
33 | }
34 |
35 | /// Recognize a single character
36 | pub fn sym(c: char) -> Rc<Glu> {
37 | Rc::new(Glu::Sym(false, c))
38 | }
39 |
40 | /// Recognize alternatives between two other regexes
41 | pub fn alt(r1: &Rc<Glu>, r2: &Rc<Glu>) -> Rc<Glu> {
42 | Rc::new(Glu::Alt(r1.clone(), r2.clone()))
43 | }
44 |
45 | /// Recognize a sequence of regexes in order
46 | pub fn seq(r1: &Rc<Glu>, r2: &Rc<Glu>) -> Rc<Glu> {
47 | Rc::new(Glu::Seq(r1.clone(), r2.clone()))
48 | }
49 |
50 | /// Recognize a regex repeated zero or more times.
51 | pub fn rep(r1: &Rc<Glu>) -> Rc<Glu> {
52 | Rc::new(Glu::Rep(r1.clone()))
53 | }
54 |
55 | // The main function: repeatedly traverses the tree, modifying as it
56 | // goes, generating a new tree, marking the nodes where the expression
57 | // currently "is," for any given character.
58 | //
59 | pub fn shift(g: &Rc<Glu>, m: bool, c: char) -> Rc<Glu> {
60 | match g.deref() {
61 | Glu::Eps => eps(),
62 | Glu::Sym(_, s) => Rc::new(Glu::Sym(m && *s == c, *s)),
63 | Glu::Alt(r1, r2) => alt(&shift(r1, m, c), &shift(r2, m, c)),
64 | Glu::Seq(r1, r2) => {
65 | let l_end = empty(r1);
66 | let l_fin = ended(r1);
67 | seq(&shift(r1, m, c), &shift(r2, m && l_end || l_fin, c))
68 | }
69 | Glu::Rep(r) => rep(&shift(r, m || ended(r), c)),
70 | }
71 | }
72 |
73 | // Helper function that describes whether or not the expression passed
74 | // in contains a mark in a final position; used to determine whether,
75 | // when the string runs out, the expression is in an "accept"
76 | // state.
77 | //
78 | pub fn ended(g: &Rc<Glu>) -> bool {
79 | match g.deref() {
80 | Glu::Eps => false,
81 | Glu::Sym(m, _) => *m,
82 | Glu::Alt(r1, r2) => ended(r1) || ended(r2),
83 | Glu::Seq(r1, r2) => ended(r1) && empty(r2) || ended(r2),
84 | Glu::Rep(r) => ended(r),
85 | }
86 | }
87 |
88 | // Helper function that describes whether or not the expression
89 | // supplied handles the empty string.
90 | //
91 | pub fn empty(g: &Rc<Glu>) -> bool {
92 | match g.deref() {
93 | Glu::Eps => true,
94 | Glu::Sym(_, _) => false,
95 | Glu::Alt(r1, r2) => empty(r1) || empty(r2),
96 | Glu::Seq(r1, r2) => empty(r1) && empty(r2),
97 | Glu::Rep(_) => true,
98 | }
99 | }
100 |
101 | /// Takes a regular expression and a string and returns whether or not
102 | /// the expression and the string match (the string belongs to the
103 | /// language recognized by the expression).
104 | pub fn accept(g: &Rc<Glu>, s: &str) -> bool {
105 | if s.is_empty() {
106 | return empty(g);
107 | }
108 |
109 | pub fn ashift(g: Rc<Glu>, c: char) -> Rc<Glu> {
110 | shift(&g, false, c)
111 | }
112 |
113 | // This is kinda cool. I wonder if I can make the Brz versions look
114 | // like this.
115 | let mut seq = s.chars();
116 | let start = shift(g, true, seq.next().unwrap());
117 | ended(&seq.fold(start, ashift))
118 | }
119 |
120 | #[cfg(test)]
121 | mod tests {
122 | use super::*;
123 |
124 | #[test]
125 | fn basics() {
126 | let cases = [
127 | ("empty", eps(), "", true),
128 | ("char", sym('a'), "a", true),
129 | ("not char", sym('a'), "b", false),
130 | ("char vs empty", sym('a'), "", false),
131 | ("left alt", alt(&sym('a'), &sym('b')), "a", true),
132 | ("right alt", alt(&sym('a'), &sym('b')), "b", true),
133 | ("neither alt", alt(&sym('a'), &sym('b')), "c", false),
134 | ("empty alt", alt(&sym('a'), &sym('b')), "", false),
135 | ("empty rep", rep(&sym('a')), "", true),
136 | ("sequence", seq(&sym('a'), &sym('b')), "ab", true),
137 | ("sequence with empty", seq(&sym('a'), &sym('b')), "", false),
138 | ("bad long sequence", seq(&sym('a'), &sym('b')), "abc", false),
139 | ("bad short sequence", seq(&sym('a'), &sym('b')), "a", false),
140 | ("one rep", rep(&sym('a')), "a", true),
141 | ("short multiple failed rep", rep(&sym('a')), "ab", false),
142 | ("multiple rep", rep(&sym('a')), "aaaaaaaaa", true),
143 | (
144 | "multiple rep with failure",
145 | rep(&sym('a')),
146 | "aaaaaaaaab",
147 | false,
148 | ),
149 | ];
150 |
151 | for (name, case, sample, result) in &cases {
152 | println!("{:?}", name);
153 | assert_eq!(accept(case, &sample.to_string()), *result);
154 | }
155 | }
156 | }
157 |
--------------------------------------------------------------------------------
/rust/06_riggedglushkov/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "glushkov"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg "]
5 | edition = "2018"
6 |
7 | [dependencies]
8 | num-traits = "0.2.6"
9 |
--------------------------------------------------------------------------------
/rust/06_riggedglushkov/README.md:
--------------------------------------------------------------------------------
1 | # Rigged Glushkov Regular Expressions in Rust
2 |
3 | This code is significantly different from the Haskell version (Haskell
4 | Experiment 07), in that I decided to "go for it" and merge the process
5 | of instantiation and rigging into a single structure.
6 |
7 | Prior to Rust Experiment 06, the Rust experiments had followed the
8 | Haskell versions' pattern of building the regular expression first using
9 | Kleene expressions, and then lifting them on-the-fly into more complex,
10 | "rigged" versions, before processing them with the given Kleene or
11 | Glushkov construction.
12 |
13 | Rust famously has no garbage collector, but instead uses lifetimes
14 | and scopes to place much of what it does safely on the stack. It takes
15 | some fiddling to make types, lifetimes, and scopes line up, so as far
16 | back as the first Rust experiment I had individual factory functions for
17 | the different Regex sub-types. These take the place of the simple
18 | `data` types seen in the equivalent Haskell experiments.
19 |
20 | With that in mind, there was no reason to have two different data
21 | structures; for this experiment, there is only the one `enum` type and
22 | its sub-types.
23 |
24 | In this experiment, as in the Haskell version, I build on the idea of
25 | recording already found "empty" and "final" versions of nodes, so the
26 | data structure is now a record of (empty, final, expression). The
27 | `eps`, `alt`, `seq`, and `rep` expressions are pretty much as you'd
28 | expect; one thing you won't find in the base code is the implementation
29 | for `sym`. `sym` must be implemented independently for different
30 | semiring implementations.
31 |
32 | The `Sym` expression is a *trait* now; it says that users must provide
33 | an implementation with a single method, `is`, that takes a symbol and
34 | returns a semiring.
35 |
36 | I've had to abandon the use of `num_traits` and `std::ops`, as
37 | `std::ops::Mul` and `std::ops::Add` don't provide a framework for
38 | passing in references. I've gone back to my initial instincts and
39 | provided a comprehensive `Semiring` trait which can take references for
40 | the `mul` and `add` operations. This works very well, as now I can
41 | analyze and operate on immutable `HashSet<String>` collections without
42 | having to clone them to pass them to the cartesian product operation.
43 | That's a massive win in terms of memory and CPU savings.
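
A small sketch of why the by-reference signatures help. The `Semiring` trait is copied from this crate; `Count` and `dot` are hypothetical names invented for this illustration. Because `mul` and `add` borrow both operands, a generic helper needs no `Clone` or `Copy` bound, and its inputs stay usable afterwards.

```rust
pub trait Semiring {
    fn zero() -> Self;
    fn one() -> Self;
    fn is_zero(&self) -> bool;
    fn mul(&self, rhs: &Self) -> Self;
    fn add(&self, rhs: &Self) -> Self;
}

// Hypothetical semiring used only for this sketch: ordinary arithmetic
// on counts. Note that it deliberately does not derive Clone or Copy.
#[derive(Debug, PartialEq)]
pub struct Count(pub u64);

impl Semiring for Count {
    fn zero() -> Count {
        Count(0)
    }
    fn one() -> Count {
        Count(1)
    }
    fn is_zero(&self) -> bool {
        self.0 == 0
    }
    fn mul(&self, rhs: &Count) -> Count {
        Count(self.0 * rhs.0)
    }
    fn add(&self, rhs: &Count) -> Count {
        Count(self.0 + rhs.0)
    }
}

// Hypothetical helper: sum of pairwise products, computed entirely
// through borrowed operands.
pub fn dot<S: Semiring>(xs: &[S], ys: &[S]) -> S {
    xs.iter()
        .zip(ys.iter())
        .fold(S::zero(), |acc, (x, y)| acc.add(&x.mul(y)))
}
```

The same shape carries over to set-valued semirings, where avoiding a clone per operation matters far more than it does for a single integer.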
44 |
45 | Processing the expression with Glushkov's progressive algorithm is
46 | the same as in the unrigged version, only we cache the "empty"
47 | and "final" values when they're found and do not recalculate them.
48 |
49 | Down in the tests, you'll find the Boolean version (`Recognizer`) as
50 | well as a string version (`Parser`). Both versions show how to
51 | implement a semiring for doing data extraction, including how to define
52 | a specific Sym implementation for the `Sym` trait, and include a
53 | function to instantiate that implementation for your use case.
54 |
55 | The `Parser` version has a specific moment of complexity that can't be
56 | elided: in the `mul` implementation, the multiplication of two sets is
57 | the cartesian product of those sets: a new set containing all possible
58 | combinations of ordered tuples made up of a member of the first set with
59 | a member of the second set. For our purposes, this is still a string,
60 | so our implementation involves building those tuples and then generating
61 | a new string by concatenating the ordered pair. This takes a bit of
62 | memory thrashing, but much less now that I've solved the `mul(&x, &y)`
63 | issue.
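
The cartesian-product multiplication described above can be sketched as a free function. `concat_product` is a hypothetical name chosen for this illustration; the crate itself does this inside `Parser::mul`.

```rust
use std::collections::HashSet;

// Hypothetical stand-alone version of the set "multiplication": every
// string from `lhs` concatenated with every string from `rhs`.
pub fn concat_product(lhs: &HashSet<String>, rhs: &HashSet<String>) -> HashSet<String> {
    let mut out = HashSet::with_capacity(lhs.len() * rhs.len());
    for l in lhs {
        for r in rhs {
            // Build each concatenation once, sized up front.
            let mut s = String::with_capacity(l.len() + r.len());
            s.push_str(l);
            s.push_str(r);
            out.insert(s);
        }
    }
    out
}
```

Two properties fall out of this definition and match the semiring laws: multiplying by the empty set (zero) gives the empty set, and multiplying by the set holding only the empty string (one) gives the other operand back.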
64 |
65 | ## License
66 |
67 | As this is entirely my work, it is copyright (c) 2019, and licensed
68 | under the Mozilla Public License v. 2.0. See the
69 | [LICENSE.md](../../LICENSE.md) in the root directory.
70 |
--------------------------------------------------------------------------------
/rust/06_riggedglushkov/src/lib.rs:
--------------------------------------------------------------------------------
1 | //! This crate provides a series of simple functions for building a
2 | //! regular expression, and an `accept` function which takes a
3 | //! completed regular expression and a string and returns a semiring
4 | //! value describing how the expression matched the string (or not).
5 | //!
6 |
7 | use std::rc::Rc;
8 |
9 | pub trait Semiring {
10 | fn zero() -> Self;
11 | fn one() -> Self;
12 | fn is_zero(&self) -> bool;
13 | fn mul(&self, rhs: &Self) -> Self;
14 | fn add(&self, rhs: &Self) -> Self;
15 | }
16 |
17 | /// The Sym trait represents what to do for a single character. It has
18 | /// a single method, "is", that returns the semiring. Implementers of
19 | /// "is" must provide a corresponding construction factory.
20 |
21 | pub trait Sym<S>
22 | where
23 | S: Semiring,
24 | {
25 | fn is(&self, c: char) -> S;
26 | }
27 |
28 | pub enum Glui<S>
29 | where
30 | S: Semiring,
31 | {
32 | Eps,
33 | Sym(Rc<dyn Sym<S>>),
34 | Alt(Rc<Glu<S>>, Rc<Glu<S>>),
35 | Seq(Rc<Glu<S>>, Rc<Glu<S>>),
36 | Rep(Rc<Glu<S>>),
37 | }
38 |
39 | // Empty, Final, Data
40 | pub struct Glu<S: Semiring>(S, S, Glui<S>);
41 |
42 | /// Recognize only the empty string
43 | pub fn eps<S>() -> Rc<Glu<S>>
44 | where
45 | S: Semiring,
46 | {
47 | Rc::new(Glu(S::one(), S::one(), Glui::Eps))
48 | }
49 |
50 | /// Recognize alternatives between two other regexes
51 | pub fn alt<S>(r1: &Rc<Glu<S>>, r2: &Rc<Glu<S>>) -> Rc<Glu<S>>
52 | where
53 | S: Semiring,
54 | {
55 | Rc::new(Glu(
56 | r1.0.add(&r2.0),
57 | r1.1.add(&r2.1),
58 | Glui::Alt(r1.clone(), r2.clone()),
59 | ))
60 | }
61 |
62 | /// Recognize a sequence of regexes in order
63 | pub fn seq<S>(r1: &Rc<Glu<S>>, r2: &Rc<Glu<S>>) -> Rc<Glu<S>>
64 | where
65 | S: Semiring,
66 | {
67 | Rc::new(Glu(
68 | r1.0.mul(&r2.0), // a sequence is empty only if both halves are
69 | r1.1.mul(&r2.0).add(&r2.1),
70 | Glui::Seq(r1.clone(), r2.clone()),
71 | ))
72 | }
73 |
74 | /// Recognize a regex repeated zero or more times.
75 | pub fn rep<S>(r1: &Rc<Glu<S>>) -> Rc<Glu<S>>
76 | where
77 | S: Semiring + Clone,
78 | {
79 | Rc::new(Glu(S::one(), r1.1.clone(), Glui::Rep(r1.clone())))
80 | }
81 |
82 | // The main function: repeatedly traverses the tree, modifying as it
83 | // goes, generating a new tree, marking the nodes where the expression
84 | // currently "is," for any given character. The values of the nodes
85 | // are cached for performance, but this probably isn't a win in Rust
86 | // as Rust won't keep the intermediate functions generated, nor
87 | // provide them ad-hoc to future operations the way Haskell does.
88 | //
89 | fn shift<S>(g: &Rc<Glu<S>>, m: &S, c: char) -> Rc<Glu<S>>
90 | where
91 | S: Semiring + Clone,
92 | {
93 | use self::Glui::*;
94 | match &g.2 {
95 | Eps => eps(),
96 | Sym(f) => Rc::new(Glu(S::zero(), m.mul(&f.is(c)), Glui::Sym(f.clone()))),
97 | Alt(r1, r2) => alt(&shift(&r1, m, c), &shift(&r2, m, c)),
98 | Seq(r1, r2) => seq(
99 | &shift(&r1, m, c),
100 | &shift(&r2, &(m.mul(&r1.0).add(&r1.1)), c),
101 | ),
102 | Rep(r) => rep(&shift(&r, &(m.add(&r.1)), c)),
103 | }
104 | }
105 |
106 | pub fn accept<S>(g: &Rc<Glu<S>>, s: &str) -> S
107 | where
108 | S: Semiring + Clone,
109 | {
110 | if s.is_empty() {
111 | return g.0.clone();
112 | }
113 |
114 | let ashift = |g, c| shift(&g, &S::zero(), c);
115 |
116 | // This is kinda cool. I wonder if I can make the Brz versions look
117 | // like this.
118 | let mut seq = s.chars();
119 | let start = shift(g, &S::one(), seq.next().unwrap());
120 | seq.fold(start, ashift).1.clone()
121 | }
122 |
123 | #[cfg(test)]
124 | mod tests {
125 |
126 | use super::*;
127 | use std::collections::HashSet;
128 |
129 | macro_rules! set {
130 | ( $( $x:expr ),* ) => {{
131 | let mut temp_set = HashSet::new();
132 | $( temp_set.insert($x); )*
133 | temp_set //
134 | }};
135 | }
136 |
137 | #[derive(Debug, Copy, Clone)]
138 | pub struct Recognizer(bool);
139 |
140 | impl Semiring for Recognizer {
141 | fn one() -> Recognizer {
142 | Recognizer(true)
143 | }
144 | fn zero() -> Recognizer {
145 | Recognizer(false)
146 | }
147 | fn is_zero(&self) -> bool {
148 | !self.0
149 | }
150 | fn mul(&self, rhs: &Recognizer) -> Recognizer {
151 | Recognizer(self.0 && rhs.0)
152 | }
153 | fn add(&self, rhs: &Recognizer) -> Recognizer {
154 | Recognizer(self.0 || rhs.0)
155 | }
156 | }
157 |
158 | pub struct SimpleSym {
159 | c: char,
160 | }
161 |
162 | impl Sym<Recognizer> for SimpleSym {
163 | fn is(&self, c: char) -> Recognizer {
164 | if c == self.c {
165 | Recognizer::one()
166 | } else {
167 | Recognizer::zero()
168 | }
169 | }
170 | }
171 |
172 | #[test]
173 | fn basics() {
174 | pub fn sym(sample: char) -> Rc<Glu<Recognizer>> {
175 | Rc::new(Glu(
176 | Recognizer::zero(),
177 | Recognizer::zero(),
178 | Glui::Sym(Rc::new(SimpleSym { c: sample })),
179 | ))
180 | }
181 |
182 | let cases = [
183 | ("empty", eps(), "", true),
184 | ("char", sym('a'), "a", true),
185 | ("not char", sym('a'), "b", false),
186 | ("char vs empty", sym('a'), "", false),
187 | ("left alt", alt(&sym('a'), &sym('b')), "a", true),
188 | ("right alt", alt(&sym('a'), &sym('b')), "b", true),
189 | ("neither alt", alt(&sym('a'), &sym('b')), "c", false),
190 | ("empty alt", alt(&sym('a'), &sym('b')), "", false),
191 | ("empty rep", rep(&sym('a')), "", true),
192 | ("sequence", seq(&sym('a'), &sym('b')), "ab", true),
193 | ("sequence with empty", seq(&sym('a'), &sym('b')), "", false),
194 | ("bad long sequence", seq(&sym('a'), &sym('b')), "abc", false),
195 | ("bad short sequence", seq(&sym('a'), &sym('b')), "a", false),
196 | ("one rep", rep(&sym('a')), "a", true),
197 | ("short multiple failed rep", rep(&sym('a')), "ab", false),
198 | ("multiple rep", rep(&sym('a')), "aaaaaaaaa", true),
199 | (
200 | "multiple rep with failure",
201 | rep(&sym('a')),
202 | "aaaaaaaaab",
203 | false,
204 | ),
205 | ];
206 |
207 | for (name, case, sample, result) in &cases {
208 | println!("{:?}", name);
209 | assert_eq!(accept(case, &sample.to_string()).0, *result);
210 | }
211 | }
212 |
213 | #[derive(Debug, Clone)]
214 | pub struct Parser(HashSet<String>);
215 |
216 | impl Semiring for Parser {
217 | fn one() -> Parser {
218 | Parser(set!["".to_string()])
219 | }
220 | fn zero() -> Parser {
221 | Parser(set![])
222 | }
223 | fn is_zero(&self) -> bool {
224 | self.0.len() == 0
225 | }
226 | fn mul(self: &Parser, rhs: &Parser) -> Parser {
227 | let mut temp = set![];
228 | for i in self.0.iter().cloned() {
229 | for j in &rhs.0 {
230 | temp.insert(i.clone() + &j);
231 | }
232 | }
233 | Parser(temp)
234 | }
235 | fn add(self: &Parser, rhs: &Parser) -> Parser {
236 | Parser(self.0.union(&rhs.0).cloned().collect())
237 | }
238 | }
239 |
240 | pub struct ParserSym {
241 | c: char,
242 | }
243 |
244 | impl Sym<Parser> for ParserSym {
245 | fn is(&self, c: char) -> Parser {
246 | if c == self.c {
247 | Parser(set![c.to_string()])
248 | } else {
249 | Parser::zero()
250 | }
251 | }
252 | }
253 |
254 | #[test]
255 | fn string_basics() {
256 | pub fn sym(sample: char) -> Rc<Glu<Parser>> {
257 | Rc::new(Glu(
258 | Parser::zero(),
259 | Parser::zero(),
260 | Glui::Sym(Rc::new(ParserSym { c: sample })),
261 | ))
262 | }
263 |
264 | let cases = [
265 | ("empty", eps(), "", Some("")),
266 | ("char", sym('a'), "a", Some("a")),
267 | ("not char", sym('a'), "b", None),
268 | ("char vs empty", sym('a'), "", None),
269 | ("left alt", alt(&sym('a'), &sym('b')), "a", Some("a")),
270 | ("right alt", alt(&sym('a'), &sym('b')), "b", Some("b")),
271 | ("neither alt", alt(&sym('a'), &sym('b')), "c", None),
272 | ("empty alt", alt(&sym('a'), &sym('b')), "", None),
273 | ("empty rep", rep(&sym('a')), "", Some("")),
274 | ("sequence", seq(&sym('a'), &sym('b')), "ab", Some("ab")),
275 | ("sequence with empty", seq(&sym('a'), &sym('b')), "", None),
276 | ("bad long sequence", seq(&sym('a'), &sym('b')), "abc", None),
277 | ("bad short sequence", seq(&sym('a'), &sym('b')), "a", None),
278 | ("one rep", rep(&sym('a')), "a", Some("a")),
279 | ("short multiple failed rep", rep(&sym('a')), "ab", None),
280 | (
281 | "multiple rep",
282 | rep(&sym('a')),
283 | "aaaaaaaaa",
284 | Some("aaaaaaaaa"),
285 | ),
286 | (
287 | "multiple rep with failure",
288 | rep(&sym('a')),
289 | "aaaaaaaaab",
290 | None,
291 | ),
292 | ];
293 |
294 | for (name, case, sample, result) in &cases {
295 | println!("{:?}", name);
296 | let ret = accept(case, &sample.to_string()).0;
297 | match result {
298 | Some(r) => {
299 | let v = ret.iter().next();
300 | if let Some(s) = v {
301 | assert_eq!(s, sample);
302 | } else {
303 | panic!("Strings did not match: {:?}, {:?}", r, v);
304 | }
305 | assert_eq!(1, ret.len());
306 | }
307 | None => assert_eq!(0, ret.len()),
308 | }
309 | }
310 | }
311 | }
312 |
--------------------------------------------------------------------------------
/rust/07_heavyweights/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "heavyweights"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg "]
5 | edition = "2018"
6 |
7 | [dependencies]
8 | itertools="0.8.0"
9 | bytes="0.4.11"
10 |
--------------------------------------------------------------------------------
/rust/07_heavyweights/README.md:
--------------------------------------------------------------------------------
1 | # Rigged Generic Glushkov Regular Expressions in Rust: The Heavyweight Experiments
2 |
3 | This implementation is significantly different than previous ones,
4 | although it does build off that work, and it does proceed directly from
5 | the [Haskell implementation](../../haskell/08_Heavyweights/) and the
6 | [previous Rust experiment](../../rust/06_riggedglushkov).
7 |
8 | I strongly recommend reading the [Haskell implementation
9 | README](../../haskell/08_Heavyweights/README.md) to get a sense of the
10 | changes to the algorithm. They are fairly heavyweight and interesting,
11 | in that the definition of the Semiring has been further abstracted to
12 | handle position information, and as a consequence the input type of the
13 | operation has likewise been abstracted to handle arbitrary input types.
14 | The last was done to enable us to pass both the character being analyzed
15 | and the position in the stream, such that we could record information
16 | about the position under certain circumstances.
17 |
18 | What makes the *Rust* version of this implementation noteworthy is the
19 | ease with which the inbound data type was changed to handle just about
20 | any kind of data. It adds a bit of genericizing noise to the source
21 | file, some ceremony that makes me wonder how I could abstract or derive
22 | it automatically.
23 |
24 | On the other hand, since the `Recognizer` and `Parser` implementations
25 | concretize that their input type is `char` by usage, they work *completely
26 | unchanged* from the previous Rust experiment. That's remarkable.
27 |
28 | The implementation of the `Leftlong` Semiring, which reports the
29 | location and length of the first, longest substring match of a capture
30 | group (yes!) is fairly extensive and went through a number of thrashes
31 | before I recalled that I could match on a tuple, at which point the
32 | implementations of `add` and `mul` became straightforward.
33 |
34 | Putting the entirety of the semiring in a single trait makes more sense,
35 | to me at least, than abstracting it further over `num_traits`, as I had
36 | it in earlier versions. While using `num_traits` is *clever*, it also
37 | forces us to work with `std::ops::Mul` and `std::ops::Add`, which do not
38 | take references, and for larger and more complex semirings, working with
39 | references made a lot of sense.
40 |
41 | The implementation of `submatch`, a function that allows us to search
42 | for arbitrary substrings without anchoring the match to the start or
43 | end of the string, is interesting; by using the `One()` value, I'm able
44 | to preserve the fact that the search hasn't failed while also enforcing
45 | the notion that we're skipping over things that match `any` but don't
46 | match the concrete sample, which is nifty.
47 |
48 | All in all, this is highly satisfying work, and it's a pleasure to see
49 | it working so well.
50 |
51 | ## License
52 |
53 | As this is entirely my work, it is copyright (c) 2019, and licensed
54 | under the Mozilla Public License v. 2.0. See the
55 | [LICENSE.md](../../LICENSE.md) in the root directory.
56 |
--------------------------------------------------------------------------------
/rust/08_riggedbrz/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "riggedbrz"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg "]
5 | edition = "2018"
6 |
7 | [dependencies]
8 |
--------------------------------------------------------------------------------
/rust/08_riggedbrz/README.md:
--------------------------------------------------------------------------------
1 | This is the classic implementation of Brzozowski's Algorithm with
2 | Weighted Semirings. There are very few optimizations in this code. One
3 | aspect that frustrates me is the use of the `Del()` operator; it's a
4 | holdover from a time when I didn't quite understand the interaction
5 | between regular expressions and semirings; its purpose is to preserve
6 | the results of a lazy parsenull() of the sequence. Later, we replace
7 | that with a smarter algorithm.
8 |
--------------------------------------------------------------------------------
/rust/09_riggedbrz/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "riggedbrz"
3 | version = "0.1.0"
4 | authors = ["Elf M. Sternberg"]
5 | edition = "2018"
6 |
7 | [dependencies]
8 | hashbrown = "0.1.7"
9 |
--------------------------------------------------------------------------------
/rust/09_riggedbrz/README.md:
--------------------------------------------------------------------------------
1 | # Rigged Generic Brzozowski μ-Regular Expressions
2 |
3 | This experiment realizes a significant step in the series of
4 | experiments. In this variant, the `*` *operation* has been
5 | completely removed; the `*` *operator* is now implemented as a recursive
6 | definition:
7 |
8 | // ONE is the identity element under concatenation
9 | r* = alt(eps(ONE), seq(r, r*))
10 |
11 | This may seem nonsensical to a programmer, since `r*` appears in its own
12 | definition, but it's quite implementable in a language that permits controlled mutation.
13 |
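As a rough illustration of how a self-referential definition like this can be built, here is a minimal sketch. It is not the code in this crate: the `Arena`, `Node`, and `Unset` names are hypothetical, and it assumes nodes live in a `Vec` and refer to each other by index, so a cycle is just an index that points back at an earlier slot. The star node is allocated as a placeholder first and patched afterwards, which is the only mutation required.

```rust
/// Hypothetical node type for this sketch; children are indices into the arena.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Eps,               // matches the empty string (the ONE of the definition above)
    Sym(char),         // matches a single character
    Alt(usize, usize), // alternation of two child nodes
    Seq(usize, usize), // concatenation of two child nodes
    Unset,             // placeholder, patched once its children exist
}

/// Hypothetical arena that owns every node of an expression graph.
struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn new() -> Self {
        Arena { nodes: Vec::new() }
    }

    /// Store a node and return its index.
    fn add(&mut self, node: Node) -> usize {
        self.nodes.push(node);
        self.nodes.len() - 1
    }

    /// Build r* = alt(eps, seq(r, r*)) as a cyclic graph.
    fn star(&mut self, r: usize) -> usize {
        // Reserve the slot for r* before its body exists.
        let star = self.add(Node::Unset);
        let eps = self.add(Node::Eps);
        // The sequence refers back to the reserved slot, closing the cycle.
        let seq = self.add(Node::Seq(r, star));
        // Patch the placeholder with the real definition.
        self.nodes[star] = Node::Alt(eps, seq);
        star
    }
}
```

Calling `star` on the index of a `Sym('a')` node yields an `Alt` whose right branch is a `Seq` that points back at the `Alt` itself. Indices are used here only because they keep ownership simple; a reference-counted cell would be another way to close the same loop.
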
14 | This means that I now have working *μ-regular expressions*: regular
15 | expressions that use a fixpoint operator to identify the least
16 | fixpoint of a recursive regular expression, and that allow regular
17 | expressions to be encapsulated as variables and composed just as one
18 | would compose ordinary functions.
19 |
20 | This effort took a *large* number of evolutions, as I went down various
21 | paths trying to write code faster than I could think or understand. At
22 | least twice I had to delete the work in progress and back up to an
23 | earlier commit, throwing away hours of work.
24 |
25 | But this is *it*, for some definition of "it." This is what everything
26 | has been working up to.
27 |
28 | ## Understanding this code
29 |
30 | The first thing to appreciate is that `nullable()` is now a
31 | self-terminating recursive implementation. At its core is the same
32 | `nullable()` operation we've been using for a while now, but now when we
33 | determine the nullability of a node in the expression, we cache that
34 | value and notify all of its parent nodes that they may also be able
35 | to determine their nullability. This is useful for cases such as `Alt()`,
36 | which has two children: if one is determined to be nullable, then the
37 | other may be as well, in which case it's now possible to mark (cache)
38 | that the entire expression is always nullable. And that's useful if the
39 | expression is going to be re-used.
40 |
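The cache-and-notify idea can be sketched as a small worklist algorithm. This is an illustration under simplifying assumptions, not this crate's implementation: nullability is a plain `bool` rather than a semiring value, nodes are stored in a slice and refer to children by index, and the `Node` and `nullable_all` names are hypothetical. Every node starts as "not nullable"; when a node flips to nullable, its parents are re-examined, so the loop terminates even on cyclic expressions.

```rust
/// Hypothetical node type for this sketch; children are indices into the slice.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum Node {
    Empty,             // matches nothing
    Eps,               // matches the empty string
    Sym(char),         // matches a single character
    Alt(usize, usize), // alternation
    Seq(usize, usize), // concatenation
}

/// Compute the nullability of every node as a least fixed point.
fn nullable_all(nodes: &[Node]) -> Vec<bool> {
    // parents[i] lists the nodes that have node i as a child.
    let mut parents: Vec<Vec<usize>> = vec![Vec::new(); nodes.len()];
    for (i, node) in nodes.iter().enumerate() {
        match *node {
            Node::Alt(l, r) | Node::Seq(l, r) => {
                parents[l].push(i);
                parents[r].push(i);
            }
            _ => {}
        }
    }

    // The cache starts at the bottom of the lattice: nothing is nullable.
    let mut cache = vec![false; nodes.len()];
    let mut work: Vec<usize> = Vec::new();

    // Eps nodes are nullable by definition; they seed the worklist.
    for (i, node) in nodes.iter().enumerate() {
        if let Node::Eps = *node {
            cache[i] = true;
            work.push(i);
        }
    }

    // Each time a node becomes nullable, notify its parents.
    while let Some(i) = work.pop() {
        for &p in &parents[i] {
            if cache[p] {
                continue; // already settled, nothing to do
            }
            let now = match nodes[p] {
                Node::Alt(l, r) => cache[l] || cache[r],
                Node::Seq(l, r) => cache[l] && cache[r],
                _ => false,
            };
            if now {
                cache[p] = true;
                work.push(p);
            }
        }
    }
    cache
}
```

Because a cached value only ever moves from `false` to `true`, each node enters the worklist at most once. A node that is defined purely in terms of itself, such as a `Seq` whose two children are both that same `Seq`, simply stays non-nullable.
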
41 | And in this code, expressions are composable, re-usable elements: the
42 | same expression can appear inside many larger expressions.
43 |
44 | Also of note: the "mutate" code implements the short-outs
45 | described in Might's last paper on the topic: take a node, take its
46 | inputs, and determine whether one of those inputs is already null or
47 | empty. If so, a *different* node must be expressed in that
48 | position, one that's simpler and faster.
49 |
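The flavor of these simplifications can be shown with a few rewriting rules. This sketch makes several assumptions that differ from the crate: it uses a boxed tree rather than a shared graph, it applies the rules in hypothetical `seq` and `alt` constructor functions instead of mutating nodes in place, and it is unweighted. In a weighted setting an `Eps` that carries a value could presumably not just be dropped.

```rust
/// Hypothetical expression type for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Re {
    Empty,                 // matches nothing
    Eps,                   // matches the empty string
    Sym(char),             // matches a single character
    Alt(Box<Re>, Box<Re>), // alternation
    Seq(Box<Re>, Box<Re>), // concatenation
}

/// Build a sequence, replacing it with a simpler node where possible.
fn seq(l: Re, r: Re) -> Re {
    match (l, r) {
        // A sequence with an unmatchable side can never match.
        (Re::Empty, _) | (_, Re::Empty) => Re::Empty,
        // Eps is the identity for concatenation in the unweighted case.
        (Re::Eps, x) | (x, Re::Eps) => x,
        (l, r) => Re::Seq(Box::new(l), Box::new(r)),
    }
}

/// Build an alternation, replacing it with a simpler node where possible.
fn alt(l: Re, r: Re) -> Re {
    match (l, r) {
        // Empty is the identity for alternation.
        (Re::Empty, x) | (x, Re::Empty) => x,
        (l, r) => Re::Alt(Box::new(l), Box::new(r)),
    }
}
```

For example, `seq(Re::Eps, Re::Sym('a'))` collapses to `Re::Sym('a')`, and `alt(Re::Empty, Re::Sym('b'))` collapses to `Re::Sym('b')`, so later derivative steps have fewer nodes to walk.
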
50 | ## License
51 |
52 | As this is entirely my work, it is copyright (c) 2019, and licensed
53 | under the Mozilla Public License v. 2.0. See the
54 | [LICENSE.md](../../LICENSE.md) in the root directory.
55 |
--------------------------------------------------------------------------------