├── Dockerfile ├── README.md ├── haskell ├── .gitignore ├── README.md ├── app │ └── Main.hs ├── bench │ ├── Bench.hs │ ├── BenchUtils.hs │ └── HaskellBench.hs ├── regex-lookaround.cabal ├── src │ ├── Extracted.hs │ ├── Parser │ │ ├── Parser.hs │ │ └── Types.hs │ └── Utils.hs └── test │ ├── LookaroundTests │ └── Tests.hs │ └── Spec.hs ├── preprint.pdf └── theories ├── .gitignore ├── Abstraction.v ├── CMatcher.v ├── Equations.v ├── Extract.v ├── LRegex.v ├── Layerwise.v ├── ListLemmas.v ├── Makefile ├── OMatcher.v ├── ORegex.v ├── OReverse.v ├── Reverse.v └── _CoqProject /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM alpine:latest 2 | 3 | RUN apk update && apk add --no-cache \ 4 | bash curl git make m4 gcc g++ ocaml \ 5 | "coq=~8.19" "cabal=~3.10" && \ 6 | cabal update 7 | 8 | WORKDIR /workspace 9 | 10 | COPY . . 11 | 12 | RUN cd theories && \ 13 | make && \ 14 | cd ../haskell && \ 15 | cabal build && \ 16 | cabal test 17 | 18 | ENTRYPOINT ["/bin/bash"] -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Verified and Efficient Matching of Regular Expressions with Lookaround 2 | === 3 | 4 | This repository contains the Coq formalization accompanying the paper "Verified and Efficient Matching of Regular Expressions with Lookaround" by Agnishom Chattopadhyay, Angela W. Li and Konstantinos Mamouras, to be presented at CPP 2025. 5 | 6 | > **Abstract** 7 | > 8 | > Regular expressions can be extended with lookarounds for contextual matching. This paper discusses a Coq formalization of the theory of regular expressions with lookarounds. We provide an efficient and purely functional algorithm for matching expressions with lookarounds and verify its correctness. The algorithm runs in time linear in both the size of the regular expression as well as the input string. Our experimental results provide empirical support to our complexity analysis. To the best of our knowledge, this is the first formalization of a linear-time matching algorithm for regular expressions with lookarounds. 9 | 10 | This builds upon [prior work](https://dl.acm.org/doi/10.1145/3632934) published in POPL 2024. 11 | 12 | # Checking Proofs 13 | 14 | To check the proofs, you need Coq 8.19. Once you have Coq installed, you can run `make` from the `theories/` directory. This will run the proof scripts, and extract the code to `haskell/src/Extracted.hs`. 15 | 16 | Alternatively, you can check the proofs inside a container. From the main directory, run `docker build --progress=plain -t lregex .` 17 | 18 | # Proof Structure 19 | 20 | The syntax and semantics of regular expressions with lookarounds are defined in `LRegex.v` (see `LRegex` and `match_regex`). The function `llmatch` in `Layerwise.v` finds the leftmost longest match, and this is proven correct in `llmatch_correct`. 21 | 22 | The files contain detailed comments. A brief overview of the files is as follows: 23 | 24 | - `ListLemmas.v`: lemmas about lists that can be used in conjunction with those in `Coq.Lists.List`. 25 | - `LRegex.v`: syntax and semantics of regular expressions with lookarounds, the definition of `is_tape` 26 | - `Equations.v`: Equations (using `≡`) and inequalities (using `⊑`) on regular expressions. Includes Kleene Axioms, and identities involving lookarounds. 27 | - `ORegex.v`: defintion of strings with oracle valuations (`ostring`), syntax and semantics of regular expressions with oracle queries. 28 | - `Reverse.v`, `OReverse.v`: reversal of regular expressions and oracle regular expressions. 29 | - `Abstraction.v`: connection between regular expressions and oracle regular expressions. 30 | - `OMatcher.v`: matching algorithm for oracle regular expressions on ostrings. 31 | - `CMatcher.v`: an optimization of the same matching algorithm that avoids recomputation using caching 32 | - `Layerwise.v`: the main algorithm for matching regular expressions with lookarounds on strings, and finding the leftmost longest match. 33 | 34 | # Correspondence with the [Preprint](preprint.pdf) 35 | 36 | | Location in Draft | Concept | Coq File | Coq Definition | 37 | | --- | --- | --- | --- | 38 | | Page 2, Definition 1 | Lookaround Syntax | `LRegex.v` | `LRegex` | 39 | | Page 2, Figure 1 | Lookaround Semantics | `LRegex.v` | `match_regex` | 40 | | Page 3, Definition 3 | Regex Equivalence | `Equations.v` | `regex_eq` | 41 | | | Regex Containment | `Equations.v` | `regex_leq` | 42 | | Page 4, Section 3.1 | Oracle Strings | `ORegex.v` | `ostring` | 43 | | Page 4, Definition 4 | ORegex Syntax | `ORegex.v` | `ORegex` | 44 | | | ORegex Semantics | `ORegex.v` | `match_o_regex` | 45 | | Page 5, Definition 6 | Maximal Lookarounds | `Abstraction.v` | `maximal_lookarounds` | 46 | | | Arity | `Abstraction.v` | `arity` | 47 | | Page 5, Definition 7 | Abstraction | `Abstraction.v` | `abstract` | 48 | | Page 5, Definition 9 | Tape | `Abstraction.v` | `is_lookaround_tape` | 49 | | | Tape | `LRegex.v` | `is_tape` | 50 | | | Oracle Valuations | `Abstraction.v` | `is_oval` | 51 | | Page 6, Lemma 11 | Connection between LRegex and ORegex | `Abstraction.v` | `oracle_compose` | 52 | | Page 7, Definition 12 | Synax of Marked Expressions | `OMatcher.v` | `MRegex` | 53 | | | Semantics of Marked Expressions | `OMatcher.v` | `match_mregex` | 54 | | | Stripping Marks | `OMatcher.v` | `strip` | 55 | | Page 8, Figure 2 | Operations on Marked Expressions | `OMatcher.v` | `nullableWith`, `finalWith`, `followWith`, `read`, `shiftWith`, `initMarkWith`| 56 | | Page 7, Lemma 14, Part 1 | Behavior of `nullable` | `OMatcher.v` | `nullable` | `nullableWith_iff` | 57 | | Part 2 | Behavior of `final` | `OMatcher.v` | `finalWith_fw` | 58 | | Part 3 | Behavior of `final` | `OMatcher.v` | `finalWith_bw` | 59 | | Page 7, Lemma 16, Part 1 | Behavior of `read` | `OMatcher.v` | `read_subset` | 60 | | Part 2 | Behavior of `read` | `OMatcher.v` | `read_fw` | 61 | | Part 3 | Behavior of `read` | `OMatcher.v` | `read_bw` | 62 | | Part 4 | Behavior of `read` | `OMatcher.v` | `read_no_spurious` | 63 | | Page 7 | Relation between `follow` and `init` and `shift` | `OMatcher.v` | `followWith_false`, `followWith_true` | 64 | | Page 8, Lemma 18, Part 1 | Behavior of `init` | `OMatcher.v` | `initMarkWith_superset` | 65 | | Part 2 | Behavior of `init` | `OMatcher.v` | `stripLang_in_initMarkWith` | 66 | | Part 3 | Behavior of `init` | `OMatcher.v` | `initMarkWith_bw` | 67 | | Part 4 | Behavior of `shift` | `OMatcher.v` | `shiftWith_fw` | 68 | | Part 5 | Behavior of `shift` | `OMatcher.v` | `shiftWith_bw` | 69 | | Page 9 | Syntax of Marked Regexes with Caching | `CMatcher.v` | `CMRegex` | 70 | | | Uncache | `CMatcher.v` | `uncache` | 71 | | | Smart Constructors for CMRegex | `CMatcher.v` | `mkEpsilon`, `mkCharClass`, `mkQueryPos`, `mkQueryNeg`, `mkConcat`, `mkUnion`, `mkStar` | 72 | | | Synchronization between Valuation and CMRegex | `CMatcher.v` | `synced` | 73 | | Page 9, Figure 3 | Operations on CMRegex | `CMatcher.v` | `syncVal`, `cFollow`, `cRead` | 74 | | Page 9, Lemma 20, Part 1 | Behavior of `sync` | `CMatcher.v` | `synced_syncVal`, `syncVal_unCache` | 75 | | Part 2 | Behavior of `cRead` | `CMatcher.v` | `synced_cRead` | 76 | | Part 3 | Behavior of `cRead` | `CMatcher.v` | `cRead_unCache` | 77 | | Part 4 | Behavior of `cFollow` | `CMatcher.v` | `synced_unCache_followWith` | 78 | | Page 10, Definition 21 | oMatch | `CMatcher.v` | `cScanMatch` | 79 | | Page 10, Theorem 22 | Correctness of oMatch | `CMatcher.v` | `cScanMatch_tape` | 80 | | Page 10, Section 5.1 | Reversal of LRegex | `Reverse.v` | `reverse` | 81 | | | Reversal of ORegex | `OReverse.v` | `oreverse` | 82 | | Page 10, Lemma 24, Part 1 | Property of Reversal (LRegex) | `Reverse.v` | `reverse_match` | 83 | | Part 2 | Property of Reversal (ORegex) | `OReverse.v` | `oreverse_match_iff` | 84 | | Part 3 | Computing Tapes for Lookahead | `Abstraction.v` | `lookahead_tape_is_tape` | 85 | | Part 4 | Reverse of Abstracted Regex | `Layerwise.v` | `is_otape_oval_rev` | 86 | | Page 11, Figure 3 | The Matching Algorithm | `Layerwise.v` | `llmatch`, `scanMatch`, `absEvalAux`, `absEval` | 87 | | Page 11, Lemma 25 | Behavior of `evalAux` | `Layerwise.v` | `absEvalAux_spec` | 88 | | Page 11, Theorem 26 | Correctness of `match` | `Layerwise.v` | `scanMatch_correct` | 89 | | | Correctness of Leftmost Longest Match | `Layerwise.v` | `llmatch_correct` | -------------------------------------------------------------------------------- /haskell/.gitignore: -------------------------------------------------------------------------------- 1 | dist-newstyle/ -------------------------------------------------------------------------------- /haskell/README.md: -------------------------------------------------------------------------------- 1 | - Building: Run `cabal build`. 2 | - Testing: Run `cabal test`. 3 | - Execution: Run `cabal exec regex-lookaround ` and then type in the input, or run `cabal exec regex-lookaround [filepath]`. 4 | - Benchmarking: Run `cabal bench`. 5 | 6 | You can also run use the `+RTS -s` flag to get a more detailed summary of the runtime statistics. For example, run 7 | 8 | ``` 9 | cabal exec regex-lookaround +RTS -s -- "(a?){5000}a{5000}" a.txt 10 | ``` 11 | 12 | 13 | 14 | To do the above, you will need `cabal`. I recommend using installing them via [GHCup](https://www.haskell.org/ghcup/). -------------------------------------------------------------------------------- /haskell/app/Main.hs: -------------------------------------------------------------------------------- 1 | module Main where 2 | 3 | import System.Environment 4 | 5 | import Parser.Parser 6 | import Utils 7 | import Extracted 8 | 9 | fromFile :: LRegex Char -> FilePath -> IO () 10 | fromFile re file = do 11 | contents <- readFile file 12 | print $ match re contents 13 | 14 | interactRe :: LRegex Char -> IO () 15 | interactRe re = getLine >>= print . match re 16 | 17 | main :: IO () 18 | main = do 19 | args <- getArgs 20 | case args of 21 | [regex] -> case stringToLRe regex of 22 | Left err -> putStrLn err 23 | Right re -> interactRe re 24 | [regex, file] -> case stringToLRe regex of 25 | Left err -> putStrLn err 26 | Right re -> fromFile re file 27 | _ -> putStrLn "Usage: lookaroundre [file]" 28 | -------------------------------------------------------------------------------- /haskell/bench/Bench.hs: -------------------------------------------------------------------------------- 1 | 2 | {-# LANGUAGE NumericUnderscores #-} 3 | 4 | module Main (main) where 5 | 6 | import qualified Streaming.Prelude as S 7 | import Streaming (Stream, Of) 8 | 9 | import BenchUtils 10 | import HaskellBench (haskellBench) 11 | 12 | bench :: Tool -> ReFamily -> Int -> InpFamily -> Int -> IO BenchmarkResult 13 | bench Haskell = haskellBench 14 | -- bench PCRE = pcre2Bench 15 | -- bench Java = javaBench 16 | -- bench Lean = leanBench 17 | 18 | -- we let the string stay the same, but increase the parameter n 19 | haskellParamsScaleRegex :: [(Tool, ReFamily, Int, InpFamily, Int)] 20 | haskellParamsScaleRegex = 21 | [ (Haskell, DisjNegLookAhead, n, ManyZthenAtoY, len) | 22 | n <- [1..10], len <- [ 50_000, 100_000 .. 200_000 ] 23 | ] ++ 24 | [ (Haskell, NestedX, n, ManyA, len) | 25 | n <- [1..10], len <- [ 50_000, 100_000 .. 200_000 ] 26 | ] ++ 27 | [ (Haskell, NestedDisj, n, ManyA, len) | 28 | n <- [1..10], len <- [ 50_000, 100_000 .. 200_000 ] 29 | ] 30 | 31 | -- here, we increase the input length for the same regex 32 | haskellParamsScaleInput :: [(Tool, ReFamily, Int, InpFamily, Int)] 33 | haskellParamsScaleInput = 34 | [ (Haskell, DisjNegLookAhead, n, ManyZthenAtoY, len) | 35 | n <- [1..4], len <- [ 30_000, 60_000 .. 300_000 ] 36 | ] ++ 37 | [ (Haskell, NestedX, n, ManyA, len) | 38 | n <- [1..4], len <- [ 30_000, 60_000 .. 300_000 ] 39 | ] ++ 40 | [ (Haskell, NestedDisj, n, ManyA, len) | 41 | n <- [1..4], len <- [ 30_000, 60_000 .. 300_000] 42 | ] 43 | 44 | benchParams :: [(Tool, ReFamily, Int, InpFamily, Int)] 45 | -- benchParams = haskellParams ++ javaParams ++ pcreParams ++ leanParams 46 | benchParams = haskellParamsScaleInput 47 | 48 | nTrials :: Int 49 | nTrials = 10 50 | 51 | resultStream :: Stream (Of BenchmarkResult) IO () 52 | resultStream = S.mapM f . S.each . concat $ replicate nTrials benchParams 53 | where 54 | f = \(tool, reFamily, reParam, inpFamily, inpParam) -> bench tool reFamily reParam inpFamily inpParam 55 | 56 | main :: IO () 57 | main = do 58 | putStrLn benchmarkHeader 59 | S.mapM_ (putStrLn . strBenchmarkResult) resultStream -------------------------------------------------------------------------------- /haskell/bench/BenchUtils.hs: -------------------------------------------------------------------------------- 1 | module BenchUtils where 2 | 3 | import Data.List (intercalate) 4 | 5 | import System.Process 6 | import System.Directory 7 | import System.Exit 8 | 9 | data Tool = Haskell -- | Java | PCRE | Lean 10 | deriving (Show, Eq) 11 | 12 | data ReFamily = DisjNegLookAhead | NestedX | NestedDisj | ManyPlus 13 | deriving (Show, Eq) 14 | 15 | data InpFamily = ManyA | ManyEthenABCD | ManyZthenAtoY 16 | deriving (Show, Eq) 17 | 18 | data BenchmarkResult = BenchmarkResult{ 19 | tool :: Tool, 20 | regex_family :: ReFamily, 21 | regex_param :: Int, 22 | input_family :: InpFamily, 23 | input_length :: Int, 24 | time_in_ms :: Maybe Double 25 | } deriving (Show) 26 | 27 | benchmarkHeader :: String 28 | benchmarkHeader = "tool,regex_family,regex_param,input_family,input_length,time_in_ms" 29 | 30 | strBenchmarkResult :: BenchmarkResult -> String 31 | strBenchmarkResult (BenchmarkResult tool regex_family regex_param input_family input_length time_in_ms) = 32 | intercalate "," [show tool, show regex_family, show regex_param, show input_family, show input_length, str_time_in_ms] 33 | where 34 | str_time_in_ms = case time_in_ms of 35 | Just t -> show t 36 | Nothing -> "NA" 37 | 38 | mkInput :: InpFamily -> Int -> String 39 | mkInput ManyA n = replicate n 'a' 40 | mkInput ManyEthenABCD n = replicate n 'e' ++ "abcd" 41 | mkInput ManyZthenAtoY n = replicate n 'z' ++ "abcdefghijklmnopqrstuvwy" 42 | 43 | fileNameInput :: String -> InpFamily -> Int -> String 44 | fileNameInput tmpPrefix ManyA n = tmpPrefix ++ "manyA" ++ show n 45 | fileNameInput tmpPrefix ManyEthenABCD n = tmpPrefix ++ "manyEthenABCD" ++ show n 46 | fileNameInput tmpPrefix ManyZthenAtoY n = tmpPrefix ++ "manyZthenAtoY" ++ show n 47 | 48 | saveInput :: String -> InpFamily -> Int -> IO () 49 | saveInput tmpPrefix inpFamily n = do 50 | let filePath = fileNameInput tmpPrefix inpFamily n 51 | alreadyExists <- doesFileExist filePath 52 | if alreadyExists 53 | then return () 54 | else do 55 | createDirectoryIfMissing True tmpPrefix 56 | let cmd = case inpFamily of 57 | ManyA -> "for i in `seq 1 " ++ show n ++ "`; do echo -n a; done > " ++ filePath 58 | ManyEthenABCD -> "(for i in `seq 1 " ++ show n ++ "`; do echo -n e; done; echo -n abcd) > " ++ filePath 59 | ManyZthenAtoY -> "(for i in `seq 1 " ++ show n ++ "`; do echo -n z; done; echo -n abcdefghijklmnopqrstuvwy) > " ++ filePath 60 | (exitCode, _, _) <- readProcessWithExitCode "bash" ["-c", cmd] "" 61 | case exitCode of 62 | ExitSuccess -> return () 63 | ExitFailure _ -> error "saveInput: failed to save input" 64 | 65 | nestedXRe :: Int -> String 66 | nestedXRe n = "a" ++ f n where 67 | f 0 = "" 68 | f 1 = "(?=.*c)" 69 | f n = "(?=.*" ++ f (n - 1) ++ ")" 70 | 71 | nestedDisjRe :: Int -> String 72 | nestedDisjRe n = "a" ++ f n where 73 | letters :: [String] 74 | letters = map (:[]) ['b' .. 'z'] 75 | f 0 = "" 76 | f 1 = "(?=.*b)" 77 | f n = "((?=.*" ++ (letters !! (n - 1)) ++ ")|.*a" ++ f (n - 1) ++ ")" 78 | 79 | disjNegLookAheadRe :: Int -> String 80 | disjNegLookAheadRe n = r ++ ".*" 81 | where 82 | rs = drop 1 [ "(?!(.*" ++ [ch] ++ ".*))" | ch <- take n ['a'..] ] 83 | r = foldl (\r s -> "(" ++ r ++ "|" ++ s ++ ")") "(?!(.*a.*))" rs 84 | 85 | manyPlusRe :: Int -> String 86 | manyPlusRe n = error "manyPlusRe: not implemented" 87 | 88 | mkRe :: ReFamily -> Int -> String 89 | mkRe DisjNegLookAhead = disjNegLookAheadRe 90 | mkRe NestedX = nestedXRe 91 | mkRe NestedDisj = nestedDisjRe 92 | mkRe ManyPlus = manyPlusRe 93 | 94 | fileNameRe :: String -> ReFamily -> Int -> String 95 | fileNameRe tmpPrefix _ _ = tmpPrefix ++ "pattern.re" 96 | 97 | saveRe :: String -> ReFamily -> Int -> IO () 98 | saveRe tmpPrefix reFamily n = do 99 | writeFile (fileNameRe tmpPrefix reFamily n) (mkRe reFamily n) -------------------------------------------------------------------------------- /haskell/bench/HaskellBench.hs: -------------------------------------------------------------------------------- 1 | {-# LANGUAGE NumericUnderscores #-} 2 | 3 | module HaskellBench where 4 | 5 | import Data.Maybe (isJust) 6 | import Control.Monad 7 | import System.Random 8 | import System.Clock 9 | import System.IO 10 | import System.IO.Silently 11 | 12 | import Utils 13 | import Extracted (LRegex(..), llmatch) 14 | 15 | import BenchUtils hiding (nestedXRe, nestedDisjRe, disjNegLookAheadRe, manyPlusRe, mkRe) 16 | 17 | nestedXRe :: Int -> LRegex Char 18 | nestedXRe n = wildCard `Concat` f n `Concat` wildCard 19 | where 20 | f 0 = Epsilon 21 | f 1 = LookAhead $ wildCard `Concat` (fromWord "c") `Concat` wildCard 22 | f n = LookAhead $ wildCard `Concat` (f (n - 1)) `Concat` wildCard 23 | 24 | nestedDisjRe :: Int -> LRegex Char 25 | nestedDisjRe n = wildCard `Concat` fromWord "a" `Concat` f n `Concat` wildCard 26 | where 27 | letters = map (:[]) ['b' .. 'z'] 28 | f 0 = Epsilon 29 | f 1 = LookAhead $ wildCard `Concat` (fromWord "c") `Concat` wildCard 30 | f n = Union part1 part2 where 31 | part1 = wildCard `Concat` fromWord "a" `Concat` f (n - 1) 32 | part2 = LookAhead $ wildCard `Concat` (fromWord $ letters !! n) `Concat` wildCard 33 | 34 | disjNegLookAheadRe :: Int -> LRegex Char 35 | disjNegLookAheadRe n = r `Concat` wildCard 36 | where 37 | rs = drop 1 [ NegLookAhead (wildCard `Concat` fromWord [ch] `Concat` wildCard) | ch <- take n ['a'..]] 38 | r = foldl Union (NegLookAhead $ wildCard `Concat` fromWord "a" `Concat` wildCard) rs 39 | 40 | manyPlusRe :: Int -> LRegex Char 41 | manyPlusRe n = error "manyPlusRe: not implemented" 42 | 43 | mkRe :: ReFamily -> Int -> LRegex Char 44 | mkRe DisjNegLookAhead = disjNegLookAheadRe 45 | mkRe NestedX = nestedXRe 46 | mkRe NestedDisj = nestedDisjRe 47 | mkRe ManyPlus = manyPlusRe 48 | 49 | haskellBench :: ReFamily -> Int -> InpFamily -> Int -> IO BenchmarkResult 50 | haskellBench reFamily reParam inpFamily inpParam = do 51 | let re = mkRe reFamily reParam 52 | let input = mkInput inpFamily inpParam 53 | let re_size = lreSize re 54 | silence . print $ re_size 55 | let input_length = length input 56 | silence . print $ input_length 57 | start <- getTime Monotonic 58 | let b = isJust (llmatch re input) 59 | silence $ print b 60 | end <- getTime Monotonic 61 | let diff_millis = toNanoSecs (end - start) `div` 1_000_000 62 | let time_in_ms = fromIntegral diff_millis 63 | return $ BenchmarkResult Haskell reFamily reParam inpFamily inpParam (Just time_in_ms) -------------------------------------------------------------------------------- /haskell/regex-lookaround.cabal: -------------------------------------------------------------------------------- 1 | cabal-version: 3.0 2 | 3 | -- Initial package description 'regex-lookaround.cabal' generated by 4 | -- 'cabal init'. For further documentation, see: 5 | -- http://haskell.org/cabal/users-guide/ 6 | -- 7 | -- The name of the package. 8 | name: regex-lookaround 9 | 10 | -- The package version. 11 | -- See the Haskell package versioning policy (PVP) for standards 12 | -- guiding when and how versions should be incremented. 13 | -- https://pvp.haskell.org 14 | -- PVP summary: +-+------- breaking API changes 15 | -- | | +----- non-breaking API additions 16 | -- | | | +--- code changes with no API change 17 | version: 0.1.0.0 18 | 19 | -- A short (one-line) description of the package. 20 | synopsis: Regular Expression library with support for lookarounds 21 | 22 | -- A longer description of the package. 23 | -- description: 24 | 25 | -- URL for the project homepage or repository. 26 | homepage: 27 | 28 | -- A URL where users can report bugs. 29 | -- bug-reports: 30 | 31 | -- The license under which the package is released. 32 | license: NONE 33 | 34 | -- The package author(s). 35 | author: Anonymous 36 | 37 | -- An email address to which users can send suggestions, bug reports, and patches. 38 | maintainer: anonymous@example.com 39 | 40 | -- A copyright notice. 41 | -- copyright: 42 | -- category: 43 | 44 | -- Extra files to be distributed with the package, such as examples or a README. 45 | -- extra-source-files: CHANGELOG.md 46 | 47 | library 48 | -- Modules exported by the library. 49 | exposed-modules: 50 | Extracted, 51 | Utils, 52 | Parser.Types, 53 | Parser.Parser, 54 | 55 | -- Modules included in this library but not exported. 56 | -- other-modules: 57 | 58 | -- LANGUAGE extensions used by modules in this package. 59 | -- other-extensions: 60 | 61 | -- Other library packages from which modules are imported. 62 | build-depends: 63 | base >=4.18.0.0, 64 | mtl ^>=2.3.1, 65 | attoparsec ^>=0.14.4, 66 | utf8-string ^>=1.0.2, 67 | safe ^>=0.3.19, 68 | bytestring ^>= 0.12 69 | 70 | 71 | -- Directories containing source files. 72 | hs-source-dirs: src 73 | 74 | -- Base language which the package is written in. 75 | default-language: Haskell2010 76 | 77 | executable regex-lookaround 78 | -- .hs or .lhs file containing the Main module. 79 | main-is: Main.hs 80 | 81 | -- Modules included in this executable, other than Main. 82 | -- other-modules: 83 | 84 | -- LANGUAGE extensions used by modules in this package. 85 | -- other-extensions: 86 | 87 | -- Other library packages from which modules are imported. 88 | build-depends: 89 | base >=4.18.0.0, 90 | regex-lookaround 91 | 92 | -- Directories containing source files. 93 | hs-source-dirs: app 94 | 95 | -- Base language which the package is written in. 96 | default-language: Haskell2010 97 | 98 | ghc-options: 99 | -threaded 100 | -rtsopts 101 | -with-rtsopts=-N 102 | -O2 103 | 104 | test-suite regex-lookaround-test 105 | -- Base language which the package is written in. 106 | default-language: Haskell2010 107 | 108 | -- The interface type and version of the test suite. 109 | type: exitcode-stdio-1.0 110 | 111 | -- Directories containing source files. 112 | hs-source-dirs: test 113 | 114 | -- The entrypoint to the test suite. 115 | main-is: Spec.hs 116 | 117 | other-modules: 118 | LookaroundTests.Tests, 119 | 120 | -- Test dependencies. 121 | build-depends: 122 | base >=4.18.0.0, 123 | hspec, 124 | regex-lookaround 125 | 126 | benchmark regex-lookaround-bench 127 | 128 | default-language: Haskell2010 129 | 130 | type: exitcode-stdio-1.0 131 | 132 | hs-source-dirs: bench 133 | 134 | main-is: Bench.hs 135 | 136 | other-modules: 137 | BenchUtils, 138 | HaskellBench, 139 | 140 | build-depends: 141 | base >=4.18.0.0, 142 | random, 143 | containers, 144 | clock, 145 | silently, 146 | process, 147 | directory, 148 | streaming, 149 | regex-lookaround, 150 | 151 | ghc-options: 152 | -threaded 153 | -rtsopts 154 | -with-rtsopts=-N 155 | -O2 156 | 157 | -------------------------------------------------------------------------------- /haskell/src/Extracted.hs: -------------------------------------------------------------------------------- 1 | module Extracted where 2 | 3 | import qualified Prelude 4 | 5 | fst :: ((,) a1 a2) -> a1 6 | fst p = 7 | case p of { 8 | (,) x _ -> x} 9 | 10 | snd :: ((,) a1 a2) -> a2 11 | snd p = 12 | case p of { 13 | (,) _ y -> y} 14 | 15 | length :: (([]) a1) -> Prelude.Int 16 | length l = 17 | case l of { 18 | ([]) -> 0; 19 | (:) _ l' -> Prelude.succ (length l')} 20 | 21 | sub :: Prelude.Int -> Prelude.Int -> Prelude.Int 22 | sub = (\n m -> Prelude.max 0 (n Prelude.- m)) 23 | 24 | nth_error :: (([]) a1) -> Prelude.Int -> Prelude.Maybe a1 25 | nth_error l n = 26 | (\fO fS n -> if n Prelude.== 0 then fO () else fS (n Prelude.- 1)) 27 | (\_ -> 28 | case l of { 29 | ([]) -> Prelude.Nothing; 30 | (:) x _ -> Prelude.Just x}) 31 | (\n0 -> 32 | case l of { 33 | ([]) -> Prelude.Nothing; 34 | (:) _ l0 -> nth_error l0 n0}) 35 | n 36 | 37 | map :: (a1 -> a2) -> (([]) a1) -> ([]) a2 38 | map f l = 39 | case l of { 40 | ([]) -> ([]); 41 | (:) a t -> (:) (f a) (map f t)} 42 | 43 | fold_right :: (a2 -> a1 -> a1) -> a1 -> (([]) a2) -> a1 44 | fold_right f a0 l = 45 | case l of { 46 | ([]) -> a0; 47 | (:) b t -> f b (fold_right f a0 t)} 48 | 49 | combine :: (([]) a1) -> (([]) a2) -> ([]) ((,) a1 a2) 50 | combine l l' = 51 | case l of { 52 | ([]) -> ([]); 53 | (:) x tl -> 54 | case l' of { 55 | ([]) -> ([]); 56 | (:) y tl' -> (:) ((,) x y) (combine tl tl')}} 57 | 58 | skipn :: Prelude.Int -> (([]) a1) -> ([]) a1 59 | skipn n l = 60 | (\fO fS n -> if n Prelude.== 0 then fO () else fS (n Prelude.- 1)) 61 | (\_ -> l) 62 | (\n0 -> case l of { 63 | ([]) -> ([]); 64 | (:) _ l0 -> skipn n0 l0}) 65 | n 66 | 67 | seq :: Prelude.Int -> Prelude.Int -> ([]) Prelude.Int 68 | seq start len = 69 | (\fO fS n -> if n Prelude.== 0 then fO () else fS (n Prelude.- 1)) 70 | (\_ -> ([])) 71 | (\len0 -> (:) start (seq (Prelude.succ start) len0)) 72 | len 73 | 74 | repeat :: a1 -> Prelude.Int -> ([]) a1 75 | repeat x n = 76 | (\fO fS n -> if n Prelude.== 0 then fO () else fS (n Prelude.- 1)) 77 | (\_ -> ([])) 78 | (\k -> (:) x (repeat x k)) 79 | n 80 | 81 | data LRegex a = 82 | Epsilon 83 | | CharClass (a -> Prelude.Bool) 84 | | Concat (LRegex a) (LRegex a) 85 | | Union (LRegex a) (LRegex a) 86 | | Star (LRegex a) 87 | | LookAhead (LRegex a) 88 | | LookBehind (LRegex a) 89 | | NegLookAhead (LRegex a) 90 | | NegLookBehind (LRegex a) 91 | 92 | zipWith :: (a1 -> a2 -> a3) -> (([]) a1) -> (([]) a2) -> ([]) a3 93 | zipWith f xs ys = 94 | map (\pat -> case pat of { 95 | (,) x y -> f x y}) (combine xs ys) 96 | 97 | transpose :: Prelude.Int -> (([]) (([]) a1)) -> ([]) (([]) a1) 98 | transpose len tapes = 99 | case tapes of { 100 | ([]) -> repeat ([]) len; 101 | (:) t ts -> zipWith (\x x0 -> (:) x x0) t (transpose len ts)} 102 | 103 | altr :: (Prelude.Maybe a1) -> (Prelude.Maybe a1) -> Prelude.Maybe a1 104 | altr a b = 105 | case b of { 106 | Prelude.Just _ -> b; 107 | Prelude.Nothing -> a} 108 | 109 | last_Some :: (([]) (Prelude.Maybe a1)) -> Prelude.Maybe a1 110 | last_Some l = 111 | fold_right altr Prelude.Nothing l 112 | 113 | opt_enum :: (([]) Prelude.Bool) -> ([]) (Prelude.Maybe Prelude.Int) 114 | opt_enum lb = 115 | zipWith (\b i -> 116 | case b of { 117 | Prelude.True -> Prelude.Just i; 118 | Prelude.False -> Prelude.Nothing}) lb (seq 0 (length lb)) 119 | 120 | find_largest_true :: (([]) Prelude.Bool) -> Prelude.Maybe Prelude.Int 121 | find_largest_true lb = 122 | last_Some (opt_enum lb) 123 | 124 | type Valuation = ([]) Prelude.Bool 125 | 126 | type Ostring a = (,) (([]) a) (([]) Valuation) 127 | 128 | oskipn :: Prelude.Int -> (Ostring a1) -> Ostring a1 129 | oskipn n s = 130 | (,) (skipn n (fst s)) (skipn (Prelude.min n (length (fst s))) (snd s)) 131 | 132 | orev :: (Ostring a1) -> Ostring a1 133 | orev s = 134 | (,) (Prelude.reverse (fst s)) (Prelude.reverse (snd s)) 135 | 136 | data ORegex a = 137 | OEpsilon 138 | | OCharClass (a -> Prelude.Bool) 139 | | OConcat (ORegex a) (ORegex a) 140 | | OUnion (ORegex a) (ORegex a) 141 | | OStar (ORegex a) 142 | | OQueryPos Prelude.Int 143 | | OQueryNeg Prelude.Int 144 | 145 | oWildCard :: ORegex a1 146 | oWildCard = 147 | OStar (OCharClass (\_ -> Prelude.True)) 148 | 149 | rPass :: (ORegex a1) -> ORegex a1 150 | rPass or = 151 | OConcat oWildCard or 152 | 153 | oreverse :: (ORegex a1) -> ORegex a1 154 | oreverse r = 155 | case r of { 156 | OConcat r1 r2 -> OConcat (oreverse r2) (oreverse r1); 157 | OUnion r1 r2 -> OUnion (oreverse r1) (oreverse r2); 158 | OStar r0 -> OStar (oreverse r0); 159 | x -> x} 160 | 161 | type Tape = ([]) Prelude.Bool 162 | 163 | data CMRegex a = 164 | MkCMRegex Prelude.Bool Prelude.Bool (CMRe a) 165 | data CMRe a = 166 | CMEpsilon 167 | | CMCharClass (a -> Prelude.Bool) 168 | | CMQueryPos Prelude.Int 169 | | CMQueryNeg Prelude.Int 170 | | CMConcat (CMRegex a) (CMRegex a) 171 | | CMUnion (CMRegex a) (CMRegex a) 172 | | CMStar (CMRegex a) 173 | 174 | cNullable :: (CMRegex a1) -> Prelude.Bool 175 | cNullable r = 176 | case r of { 177 | MkCMRegex b _ _ -> b} 178 | 179 | cFinal :: (CMRegex a1) -> Prelude.Bool 180 | cFinal r = 181 | case r of { 182 | MkCMRegex _ b _ -> b} 183 | 184 | cRe :: (CMRegex a1) -> CMRe a1 185 | cRe r = 186 | case r of { 187 | MkCMRegex _ _ re -> re} 188 | 189 | mkEpsilon :: CMRegex a1 190 | mkEpsilon = 191 | MkCMRegex Prelude.True Prelude.False CMEpsilon 192 | 193 | mkCharClass :: Prelude.Bool -> (a1 -> Prelude.Bool) -> CMRegex a1 194 | mkCharClass mark f = 195 | MkCMRegex Prelude.False mark (CMCharClass f) 196 | 197 | mkQueryPos :: Prelude.Bool -> Prelude.Int -> CMRegex a1 198 | mkQueryPos b n = 199 | MkCMRegex b Prelude.False (CMQueryPos n) 200 | 201 | mkQueryNeg :: Prelude.Bool -> Prelude.Int -> CMRegex a1 202 | mkQueryNeg b n = 203 | MkCMRegex b Prelude.False (CMQueryNeg n) 204 | 205 | mkConcat :: (CMRegex a1) -> (CMRegex a1) -> CMRegex a1 206 | mkConcat r1 r2 = 207 | MkCMRegex ((Prelude.&&) (cNullable r1) (cNullable r2)) 208 | ((Prelude.||) ((Prelude.&&) (cFinal r1) (cNullable r2)) (cFinal r2)) 209 | (CMConcat r1 r2) 210 | 211 | mkUnion :: (CMRegex a1) -> (CMRegex a1) -> CMRegex a1 212 | mkUnion r1 r2 = 213 | MkCMRegex ((Prelude.||) (cNullable r1) (cNullable r2)) 214 | ((Prelude.||) (cFinal r1) (cFinal r2)) (CMUnion r1 r2) 215 | 216 | mkStar :: (CMRegex a1) -> CMRegex a1 217 | mkStar r = 218 | MkCMRegex Prelude.True (cFinal r) (CMStar r) 219 | 220 | syncVal :: Valuation -> (CMRegex a1) -> CMRegex a1 221 | syncVal v r = 222 | case cRe r of { 223 | CMEpsilon -> mkEpsilon; 224 | CMCharClass f -> mkCharClass (cFinal r) f; 225 | CMQueryPos n -> 226 | mkQueryPos 227 | (case nth_error v n of { 228 | Prelude.Just b -> b; 229 | Prelude.Nothing -> Prelude.False}) n; 230 | CMQueryNeg n -> 231 | mkQueryNeg 232 | (case nth_error v n of { 233 | Prelude.Just b -> 234 | case b of { 235 | Prelude.True -> Prelude.False; 236 | Prelude.False -> Prelude.True}; 237 | Prelude.Nothing -> Prelude.False}) n; 238 | CMConcat r1 r2 -> mkConcat (syncVal v r1) (syncVal v r2); 239 | CMUnion r1 r2 -> mkUnion (syncVal v r1) (syncVal v r2); 240 | CMStar r1 -> mkStar (syncVal v r1)} 241 | 242 | cRead :: a1 -> (CMRegex a1) -> CMRegex a1 243 | cRead a r = 244 | case cRe r of { 245 | CMCharClass f -> mkCharClass ((Prelude.&&) (cFinal r) (f a)) f; 246 | CMConcat r1 r2 -> mkConcat (cRead a r1) (cRead a r2); 247 | CMUnion r1 r2 -> mkUnion (cRead a r1) (cRead a r2); 248 | CMStar r1 -> mkStar (cRead a r1); 249 | _ -> r} 250 | 251 | cFollow :: Prelude.Bool -> (CMRegex a1) -> CMRegex a1 252 | cFollow b r = 253 | case cRe r of { 254 | CMEpsilon -> mkEpsilon; 255 | CMCharClass f -> mkCharClass b f; 256 | CMConcat r1 r2 -> 257 | let {b1 = (Prelude.||) (cFinal r1) ((Prelude.&&) b (cNullable r1))} in 258 | mkConcat (cFollow b r1) (cFollow b1 r2); 259 | CMUnion r1 r2 -> mkUnion (cFollow b r1) (cFollow b r2); 260 | CMStar r1 -> mkStar (cFollow ((Prelude.||) b (cFinal r1)) r1); 261 | _ -> r} 262 | 263 | toCMarked :: (ORegex a1) -> CMRegex a1 264 | toCMarked or = 265 | case or of { 266 | OEpsilon -> mkEpsilon; 267 | OCharClass f -> mkCharClass Prelude.False f; 268 | OConcat or1 or2 -> mkConcat (toCMarked or1) (toCMarked or2); 269 | OUnion or1 or2 -> mkUnion (toCMarked or1) (toCMarked or2); 270 | OStar or1 -> mkStar (toCMarked or1); 271 | OQueryPos n -> mkQueryPos Prelude.False n; 272 | OQueryNeg n -> mkQueryNeg Prelude.False n} 273 | 274 | cScanMatchAux :: (CMRegex a1) -> (([]) a1) -> (([]) Valuation) -> ([]) 275 | Prelude.Bool 276 | cScanMatchAux cmr w o = 277 | let {b = cFinal cmr} in 278 | case w of { 279 | ([]) -> (:) b ([]); 280 | (:) a0 w' -> 281 | case o of { 282 | ([]) -> (:) b ([]); 283 | (:) v1 o' -> 284 | let {cmrNew = cFollow Prelude.False cmr} in 285 | let {cmrNew' = cRead a0 cmrNew} in 286 | let {cmrNew'' = syncVal v1 cmrNew'} in 287 | (:) b (cScanMatchAux cmrNew'' w' o')}} 288 | 289 | cScanMatch :: (ORegex a1) -> (Ostring a1) -> ([]) Prelude.Bool 290 | cScanMatch or os = 291 | let {cmr = toCMarked or} in 292 | case os of { 293 | (,) l l0 -> 294 | case l of { 295 | ([]) -> 296 | case l0 of { 297 | ([]) -> ([]); 298 | (:) o0 l1 -> 299 | case l1 of { 300 | ([]) -> (:) (cNullable (syncVal o0 cmr)) ([]); 301 | (:) _ _ -> ([])}}; 302 | (:) a0 w' -> 303 | case l0 of { 304 | ([]) -> ([]); 305 | (:) o0 l1 -> 306 | case l1 of { 307 | ([]) -> ([]); 308 | (:) o1 o' -> 309 | let {b0 = cNullable (syncVal o0 cmr)} in 310 | let {cmr0 = cFollow Prelude.True (syncVal o0 cmr)} in 311 | let {cmr0' = cRead a0 cmr0} in 312 | let {cmr0'' = syncVal o1 cmr0'} in 313 | (:) b0 (cScanMatchAux cmr0'' w' o')}}}} 314 | 315 | absEvalAux :: ((ORegex a1) -> (Ostring a1) -> Tape) -> (([]) a1) -> (([]) 316 | a1) -> Prelude.Int -> (LRegex a1) -> Prelude.Int -> (([]) 317 | Tape) -> (,) ((,) (ORegex a1) Prelude.Int) (([]) Tape) 318 | absEvalAux scanMatch0 w wrev len r i ts = 319 | case r of { 320 | Epsilon -> (,) ((,) OEpsilon 0) ts; 321 | CharClass p -> (,) ((,) (OCharClass p) 0) ts; 322 | Concat r1 r2 -> 323 | case absEvalAux scanMatch0 w wrev len r1 i ts of { 324 | (,) p ts' -> 325 | case p of { 326 | (,) o1 n1 -> 327 | case absEvalAux scanMatch0 w wrev len r2 ((Prelude.+) i n1) ts' of { 328 | (,) p0 ts'' -> 329 | case p0 of { 330 | (,) o2 n2 -> (,) ((,) (OConcat o1 o2) ((Prelude.+) n1 n2)) ts''}}}}; 331 | Union r1 r2 -> 332 | case absEvalAux scanMatch0 w wrev len r1 i ts of { 333 | (,) p ts' -> 334 | case p of { 335 | (,) o1 n1 -> 336 | case absEvalAux scanMatch0 w wrev len r2 ((Prelude.+) i n1) ts' of { 337 | (,) p0 ts'' -> 338 | case p0 of { 339 | (,) o2 n2 -> (,) ((,) (OUnion o1 o2) ((Prelude.+) n1 n2)) ts''}}}}; 340 | Star r0 -> 341 | case absEvalAux scanMatch0 w wrev len r0 i ts of { 342 | (,) p ts' -> case p of { 343 | (,) o n -> (,) ((,) (OStar o) n) ts'}}; 344 | LookAhead r0 -> 345 | case absEvalAux scanMatch0 w wrev len r0 0 ([]) of { 346 | (,) p ts_inner -> 347 | case p of { 348 | (,) o _ -> 349 | let {vs = transpose len (Prelude.reverse ts_inner)} in 350 | let { 351 | newtape = Prelude.reverse 352 | (scanMatch0 (oreverse o) ((,) wrev 353 | (Prelude.reverse vs)))} 354 | in 355 | (,) ((,) (OQueryPos i) (Prelude.succ 0)) ((:) newtape ts)}}; 356 | LookBehind r0 -> 357 | case absEvalAux scanMatch0 w wrev len r0 0 ([]) of { 358 | (,) p ts_inner -> 359 | case p of { 360 | (,) o _ -> 361 | let {vs = transpose len (Prelude.reverse ts_inner)} in 362 | let {newtape = scanMatch0 o ((,) w vs)} in 363 | (,) ((,) (OQueryPos i) (Prelude.succ 0)) ((:) newtape ts)}}; 364 | NegLookAhead r0 -> 365 | case absEvalAux scanMatch0 w wrev len r0 0 ([]) of { 366 | (,) p ts_inner -> 367 | case p of { 368 | (,) o _ -> 369 | let {vs = transpose len (Prelude.reverse ts_inner)} in 370 | let { 371 | newtape = Prelude.reverse 372 | (scanMatch0 (oreverse o) ((,) wrev 373 | (Prelude.reverse vs)))} 374 | in 375 | (,) ((,) (OQueryNeg i) (Prelude.succ 0)) ((:) newtape ts)}}; 376 | NegLookBehind r0 -> 377 | case absEvalAux scanMatch0 w wrev len r0 0 ([]) of { 378 | (,) p ts_inner -> 379 | case p of { 380 | (,) o _ -> 381 | let {vs = transpose len (Prelude.reverse ts_inner)} in 382 | let {newtape = scanMatch0 o ((,) w vs)} in 383 | (,) ((,) (OQueryNeg i) (Prelude.succ 0)) ((:) newtape ts)}}} 384 | 385 | absEval :: ((ORegex a1) -> (Ostring a1) -> Tape) -> (([]) a1) -> (LRegex 386 | a1) -> (,) (ORegex a1) (([]) Valuation) 387 | absEval scanMatch0 w r = 388 | let {wrev = Prelude.reverse w} in 389 | let {len = length w} in 390 | case absEvalAux scanMatch0 w wrev ((Prelude.+) len (Prelude.succ 0)) r 0 391 | ([]) of { 392 | (,) p vs -> 393 | case p of { 394 | (,) o _ -> (,) o 395 | (transpose ((Prelude.+) len (Prelude.succ 0)) (Prelude.reverse vs))}} 396 | 397 | scanMatchWith :: ((ORegex a1) -> (Ostring a1) -> Tape) -> (([]) a1) -> 398 | (LRegex a1) -> Tape 399 | scanMatchWith scanMatchONFA w r = 400 | case absEval scanMatchONFA w r of { 401 | (,) o vs -> scanMatchONFA o ((,) w vs)} 402 | 403 | scanMatch :: (LRegex a1) -> (([]) a1) -> Tape 404 | scanMatch r w = 405 | scanMatchWith cScanMatch w r 406 | 407 | llmatch :: (LRegex a1) -> (([]) a1) -> Prelude.Maybe 408 | ((,) Prelude.Int Prelude.Int) 409 | llmatch r w = 410 | case absEval cScanMatch w r of { 411 | (,) or vs -> 412 | let {bw_tape = cScanMatch (rPass (oreverse or)) (orev ((,) w vs))} in 413 | let {lendO = find_largest_true bw_tape} in 414 | case lendO of { 415 | Prelude.Just lend' -> 416 | let {lend = sub (length w) lend'} in 417 | let {fw_tape = cScanMatch or (oskipn lend ((,) w vs))} in 418 | let {d = find_largest_true fw_tape} in 419 | case d of { 420 | Prelude.Just d0 -> Prelude.Just ((,) lend d0); 421 | Prelude.Nothing -> Prelude.Nothing}; 422 | Prelude.Nothing -> Prelude.Nothing}} 423 | 424 | -------------------------------------------------------------------------------- /haskell/src/Parser/Parser.hs: -------------------------------------------------------------------------------- 1 | {- 2 | 3 | Taken from the Skylighting Project 4 | https://github.com/jgm/skylighting/blob/master/skylighting-core/src/Regex/KDE/Compile.hs 5 | 6 | -} 7 | 8 | {-# LANGUAGE CPP #-} 9 | {-# LANGUAGE LambdaCase #-} 10 | {-# LANGUAGE BangPatterns #-} 11 | {-# LANGUAGE OverloadedStrings #-} 12 | module Parser.Parser 13 | (compileRegex, toLRegex, stringToLRe) 14 | where 15 | 16 | import Data.Word (Word8) 17 | import qualified Data.ByteString as B 18 | import Data.ByteString (ByteString) 19 | import qualified Data.ByteString.UTF8 as U 20 | import Safe 21 | import Data.Attoparsec.ByteString as A hiding (match) 22 | import Data.Char 23 | import Control.Applicative 24 | import Data.Functor 25 | import Parser.Types 26 | import Control.Monad 27 | import Control.Monad.State.Strict 28 | #if !MIN_VERSION_base(4,11,0) 29 | import Data.Semigroup ((<>)) 30 | #endif 31 | 32 | import qualified Extracted as E (LRegex (..)) 33 | import Utils 34 | 35 | 36 | stringToLRe :: String -> Either String (E.LRegex Char) 37 | stringToLRe re = (compileRegex True . U.fromString $ re) >>= toLRegex 38 | 39 | toLRegex :: Regex -> Either String (E.LRegex Char) 40 | toLRegex MatchNull = Right E.Epsilon 41 | toLRegex MatchAnyChar = Right $ E.CharClass (const True) 42 | toLRegex (MatchChar f) = Right $ E.CharClass f 43 | toLRegex (MatchConcat e f) = E.Concat <$> toLRegex e <*> toLRegex f 44 | toLRegex (MatchAlt e f) = E.Union <$> toLRegex e <*> toLRegex f 45 | toLRegex (MatchSome e) = atLeastN 1 <$> toLRegex e 46 | toLRegex (MatchCount lo hi e) = repeatMN lo hi <$> toLRegex e 47 | toLRegex (MatchCountUnbounded lo e) = atLeastN lo <$> toLRegex e 48 | toLRegex AssertWordBoundary = Right $ E.LookAhead $ E.CharClass (isWordChar) `E.Concat` E.CharClass ( not . isWordChar) 49 | toLRegex AssertBeginning = Right $ E.LookBehind E.Epsilon 50 | toLRegex AssertEnd = Right $ E.LookAhead E.Epsilon 51 | toLRegex (AssertPositive Forward e) = E.LookAhead <$> toLRegex e 52 | toLRegex (AssertPositive Backward e) = E.LookBehind <$> toLRegex e 53 | toLRegex (AssertNegative Forward e) = E.NegLookAhead <$> toLRegex e 54 | toLRegex (AssertNegative Backward e) = E.NegLookBehind <$> toLRegex e 55 | toLRegex (MatchDynamic _) = Left "dynamic regexes not supported" 56 | toLRegex (MatchCapture _ e) = toLRegex e 57 | toLRegex (MatchCaptured _) = Left "backreferences not supported" 58 | toLRegex (Subroutine _) = Left "subroutines not supported" 59 | toLRegex (Lazy e) = toLRegex e 60 | toLRegex (Possessive e) = toLRegex e 61 | 62 | 63 | -- | Compile a UTF-8 encoded ByteString as a Regex. If the first 64 | -- parameter is True, then the Regex will be case sensitive. 65 | compileRegex :: Bool -> ByteString -> Either String Regex 66 | compileRegex caseSensitive bs = 67 | let !res = parseOnly (evalStateT parser 0) bs 68 | in res 69 | where 70 | parser = do 71 | !re <- pRegex caseSensitive 72 | (re <$ lift A.endOfInput) <|> 73 | do rest <- lift A.takeByteString 74 | fail $ "parse error at byte position " ++ 75 | show (B.length bs - B.length rest) 76 | 77 | type RParser = StateT Int Parser 78 | 79 | pRegex :: Bool -> RParser Regex 80 | pRegex caseSensitive = 81 | option MatchNull $ 82 | foldr MatchAlt 83 | <$> (pAltPart caseSensitive) 84 | <*> (many $ lift (char '|') *> (pAltPart caseSensitive <|> pure mempty)) 85 | 86 | pAltPart :: Bool -> RParser Regex 87 | pAltPart caseSensitive = mconcat <$> many1 (pRegexPart caseSensitive) 88 | 89 | char :: Char -> Parser Char 90 | char c = 91 | c <$ satisfy (== fromIntegral (ord c)) 92 | 93 | pRegexPart :: Bool -> RParser Regex 94 | pRegexPart caseSensitive = 95 | (lift (pRegexChar caseSensitive) <|> pParenthesized caseSensitive) >>= 96 | lift . pSuffix 97 | 98 | pParenthesized :: Bool -> RParser Regex 99 | pParenthesized caseSensitive = do 100 | _ <- lift (satisfy (== 40)) 101 | -- pcrepattern says: A group that starts with (?| resets the capturing 102 | -- parentheses numbers in each alternative. 103 | resetCaptureNumbers <- option False (True <$ lift (string "?|")) 104 | modifier <- if resetCaptureNumbers 105 | then return id 106 | else lift (satisfy (== 63) *> pGroupModifiers) 107 | <|> (MatchCapture <$> (modify (+ 1) *> get)) 108 | currentCaptureNumber <- get 109 | contents <- option MatchNull $ 110 | foldr MatchAlt 111 | <$> (pAltPart caseSensitive) 112 | <*> (many $ lift (char '|') *> 113 | (((if resetCaptureNumbers 114 | then put currentCaptureNumber 115 | else return ()) >> pAltPart caseSensitive) <|> pure mempty)) 116 | _ <- lift (satisfy (== 41)) 117 | return $ modifier contents 118 | 119 | pGroupModifiers :: Parser (Regex -> Regex) 120 | pGroupModifiers = 121 | (id <$ char ':') 122 | <|> 123 | do dir <- option Forward $ Backward <$ char '<' 124 | (AssertPositive dir <$ char '=') <|> (AssertNegative dir <$ char '!') 125 | <|> 126 | do n <- satisfy (\d -> d >= 48 && d <= 57) 127 | return (\_ -> Subroutine (fromIntegral n - 48)) 128 | <|> 129 | do _ <- satisfy (== 82) -- R 130 | return (\_ -> Subroutine 0) 131 | 132 | pSuffix :: Regex -> Parser Regex 133 | pSuffix re = option re $ do 134 | w <- satisfy (\x -> x == 42 || x == 43 || x == 63 || x == 123) 135 | (case w of 136 | 42 -> return $ MatchAlt (MatchSome re) MatchNull 137 | 43 -> return $ MatchSome re 138 | 63 -> return $ MatchAlt re MatchNull 139 | 123 -> do 140 | let isDig x = x >= 48 && x < 58 141 | minn <- option Nothing $ readMay . U.toString <$> A.takeWhile isDig 142 | comma <- option False (char ',' $> True) 143 | maxn <- case comma of 144 | False -> pure Nothing 145 | True -> option Nothing (readMay . U.toString <$> A.takeWhile isDig) 146 | _ <- char '}' 147 | case (minn, comma, maxn) of 148 | (Nothing, _, Nothing) -> mzero 149 | (Just n, True, Nothing) -> return $! atleast n re 150 | (Nothing, True, Just n) -> return $! atmost n re 151 | (Just n, False, Nothing) -> return $! between n n re 152 | (Just m, True, Just n) -> return $! between m n re 153 | _ -> fail "pSuffix encountered impossible byte") >>= pQuantifierModifier 154 | where 155 | atmost 0 _ = MatchNull 156 | atmost n r = MatchCount 0 n r 157 | 158 | between m n r = MatchCount m n r 159 | 160 | atleast n r = MatchCountUnbounded n r 161 | 162 | pQuantifierModifier :: Regex -> Parser Regex 163 | pQuantifierModifier re = option re $ 164 | (Possessive re <$ satisfy (== 43)) <|> 165 | (Lazy re <$ satisfy (==63)) 166 | 167 | pRegexChar :: Bool -> Parser Regex 168 | pRegexChar caseSensitive = do 169 | w <- satisfy $ const True 170 | case w of 171 | 46 -> return MatchAnyChar 172 | 37 -> (do -- dynamic %1 %2 173 | ds <- A.takeWhile1 (\x -> x >= 48 && x <= 57) 174 | case readMay (U.toString ds) of 175 | Just !n -> return $ MatchDynamic n 176 | Nothing -> fail "not a number") 177 | <|> return (MatchChar (== '%')) 178 | 92 -> pRegexEscapedChar 179 | 36 -> return AssertEnd 180 | 94 -> return AssertBeginning 181 | 91 -> pRegexCharClass 182 | _ | w < 128 183 | , not (isSpecial w) 184 | -> do let c = chr $ fromIntegral w 185 | return $! MatchChar $ 186 | if caseSensitive 187 | then (== c) 188 | else (\d -> toLower d == toLower c) 189 | | w >= 0xc0 -> do 190 | rest <- case w of 191 | _ | w >= 0xf0 -> A.take 3 192 | | w >= 0xe0 -> A.take 2 193 | | otherwise -> A.take 1 194 | case U.uncons (B.cons w rest) of 195 | Just (d, _) -> return $! MatchChar $ 196 | if caseSensitive 197 | then (== d) 198 | else (\e -> toLower e == toLower d) 199 | Nothing -> fail "could not decode as UTF8" 200 | | otherwise -> mzero 201 | 202 | pRegexEscapedChar :: Parser Regex 203 | pRegexEscapedChar = do 204 | c <- anyChar 205 | (case c of 206 | 'b' -> return AssertWordBoundary 207 | '{' -> do -- captured pattern: \1 \2 \{12} 208 | ds <- A.takeWhile1 (\x -> x >= 48 && x <= 57) 209 | _ <- char '}' 210 | case readMay (U.toString ds) of 211 | Just !n -> return $ MatchCaptured $ n 212 | Nothing -> fail "not a number" 213 | 'd' -> return $ MatchChar isDigit 214 | 'D' -> return $ MatchChar (not . isDigit) 215 | 's' -> return $ MatchChar isSpace 216 | 'S' -> return $ MatchChar (not . isSpace) 217 | 'w' -> return $ MatchChar isWordChar 218 | 'W' -> return $ MatchChar (not . isWordChar) 219 | _ | isDigit c -> 220 | return $! MatchCaptured (ord c - ord '0') 221 | | otherwise -> mzero) <|> (MatchChar . (==) <$> pEscaped c) 222 | 223 | pEscaped :: Char -> Parser Char 224 | pEscaped c = 225 | case c of 226 | '\\' -> return c 227 | 'a' -> return '\a' 228 | 'f' -> return '\f' 229 | 'n' -> return '\n' 230 | 'r' -> return '\r' 231 | 't' -> return '\t' 232 | 'v' -> return '\v' 233 | 'x' -> do -- \xhh matches hex hh, \x{h+} matches hex h+ 234 | ds <- (satisfy (== 123) *> A.takeWhile (/= 125) <* satisfy (== 125)) 235 | <|> ( do 236 | x1 <- satisfy (inClass "a-fA-F0-9") 237 | x2 <- Just <$> satisfy (inClass "a-fA-F0-9") <|> pure Nothing 238 | case x2 of 239 | Nothing -> return $ B.pack [x1] 240 | Just x2' -> if inClass "a-fA-F0-9" x2' 241 | then return $ B.pack [x1, x2'] 242 | else return $ B.pack [x1] 243 | ) 244 | case readMay ("'\\x" ++ U.toString ds ++ "'") of 245 | Just x -> return x 246 | Nothing -> fail "invalid hex character escape" 247 | '0' -> do -- \0ooo matches octal ooo 248 | ds <- A.take 3 249 | case readMay ("'\\o" ++ U.toString ds ++ "'") of 250 | Just x -> return x 251 | Nothing -> fail "invalid octal character escape" 252 | _ | c >= '1' && c <= '7' -> do 253 | -- \123 matches octal 123, \1 matches octal 1 254 | let octalDigitScanner s w 255 | | s < 3, w >= 48 && w <= 55 256 | = Just (s + 1) -- digits 0-7 257 | | otherwise = Nothing 258 | ds <- A.scan (1 :: Int) octalDigitScanner 259 | case readMay ("'\\o" ++ [c] ++ U.toString ds ++ "'") of 260 | Just x -> return x 261 | Nothing -> fail "invalid octal character escape" 262 | 'z' -> do -- \zhhhh matches unicode hex char hhhh 263 | ds <- A.take 4 264 | case readMay ("'\\x" ++ U.toString ds ++ "'") of 265 | Just x -> return x 266 | Nothing -> fail "invalid hex character escape" 267 | _ -> return c 268 | 269 | pRegexCharClass :: Parser Regex 270 | pRegexCharClass = do 271 | negated <- option False $ True <$ satisfy (== 94) -- '^' 272 | let getEscapedClass = do 273 | _ <- satisfy (== 92) -- backslash 274 | (isDigit <$ char 'd') 275 | <|> (not . isDigit <$ char 'D') 276 | <|> (isSpace <$ char 's') 277 | <|> (not . isSpace <$ char 'S') 278 | <|> (isWordChar <$ char 'w') 279 | <|> (not . isWordChar <$ char 'W') 280 | let getPosixClass = do 281 | _ <- string "[:" 282 | localNegated <- option False $ True <$ satisfy (== 94) -- '^' 283 | res <- (isAlphaNum <$ string "alnum") 284 | <|> (isAlpha <$ string "alpha") 285 | <|> (isAscii <$ string "ascii") 286 | <|> ((\c -> isSpace c && c `notElem` ['\n','\r','\f','\v']) <$ 287 | string "blank") 288 | <|> (isControl <$ string "cntrl") 289 | <|> ((\c -> isPrint c || isSpace c) <$ string "graph:") 290 | <|> (isLower <$ string "lower") 291 | <|> (isUpper <$ string "upper") 292 | <|> (isPrint <$ string "print") 293 | <|> (isPunctuation <$ string "punct") 294 | <|> (isSpace <$ string "space") 295 | <|> ((\c -> isAlphaNum c || 296 | generalCategory c == ConnectorPunctuation) 297 | <$ string "word:") 298 | <|> (isHexDigit <$ string "xdigit") 299 | _ <- string ":]" 300 | return $! if localNegated then not . res else res 301 | let getC = (satisfy (== 92) *> anyChar >>= pEscaped) <|> 302 | (chr . fromIntegral <$> satisfy (\x -> x /= 92 && x /= 93)) -- \ ] 303 | let getCRange = do 304 | c <- getC 305 | (\d -> (\x -> x >= c && x <= d)) <$> (char '-' *> getC) <|> 306 | return (== c) 307 | brack <- option [] $ [(==']')] <$ char ']' 308 | fs <- many (getEscapedClass <|> getPosixClass <|> getCRange) 309 | _ <- satisfy (== 93) -- ] 310 | let f c = any ($ c) $ brack ++ fs 311 | return $! MatchChar (if negated then (not . f) else f) 312 | 313 | anyChar :: Parser Char 314 | anyChar = do 315 | w <- satisfy (const True) 316 | return $! chr $ fromIntegral w 317 | 318 | isSpecial :: Word8 -> Bool 319 | isSpecial 92 = True -- '\\' 320 | isSpecial 63 = True -- '?' 321 | isSpecial 42 = True -- '*' 322 | isSpecial 43 = True -- '+' 323 | -- isSpecial 123 = True -- '{' -- this is okay except in suffixes 324 | isSpecial 91 = True -- '[' 325 | isSpecial 93 = True -- ']' 326 | isSpecial 37 = True -- '%' 327 | isSpecial 40 = True -- '(' 328 | isSpecial 41 = True -- ')' 329 | isSpecial 124 = True -- '|' 330 | isSpecial 46 = True -- '.' 331 | isSpecial 36 = True -- '$' 332 | isSpecial 94 = True -- '^' 333 | isSpecial _ = False 334 | -------------------------------------------------------------------------------- /haskell/src/Parser/Types.hs: -------------------------------------------------------------------------------- 1 | {- 2 | 3 | Taken from the Skylighting Project 4 | https://github.com/jgm/skylighting/blob/master/skylighting-core/src/Regex/KDE/Regex.hs 5 | 6 | -} 7 | 8 | {-# LANGUAGE CPP #-} 9 | {-# LANGUAGE OverloadedStrings #-} 10 | {-# LANGUAGE DeriveDataTypeable #-} 11 | 12 | module Parser.Types where 13 | 14 | import Data.Char 15 | import Data.Word 16 | 17 | #if !MIN_VERSION_base(4,11,0) 18 | import Data.Semigroup ((<>), Semigroup) 19 | #endif 20 | 21 | data Direction = Forward | Backward 22 | deriving (Show, Eq) 23 | 24 | data Regex = 25 | MatchAnyChar | 26 | MatchDynamic !Int | 27 | MatchChar (Char -> Bool) | 28 | MatchSome !Regex | 29 | MatchAlt !Regex !Regex | 30 | MatchConcat !Regex !Regex | 31 | MatchCapture !Int !Regex | 32 | MatchCount !Int !Int !Regex | 33 | MatchCountUnbounded !Int !Regex | 34 | MatchCaptured !Int | 35 | AssertWordBoundary | 36 | AssertBeginning | 37 | AssertEnd | 38 | AssertPositive !Direction !Regex | 39 | AssertNegative !Direction !Regex | 40 | Possessive !Regex | 41 | Lazy !Regex | 42 | Subroutine !Int | 43 | MatchNull 44 | 45 | instance Show Regex where 46 | show MatchAnyChar = "MatchAnyChar" 47 | show (MatchDynamic i) = "MatchDynamic " <> show i 48 | show (MatchChar _) = "(MatchChar )" 49 | show (MatchSome re) = "(MatchSome " <> show re <> ")" 50 | show (MatchAlt r1 r2) = "(MatchAlt " <> show r1 <> " " <> show r2 <> ")" 51 | show (MatchConcat r1 r2) = "(MatchConcat " <> show r1 <> " " <> show r2 <> 52 | ")" 53 | show (MatchCapture i re) = "(MatchCapture " <> show i <> " " <> 54 | show re <> ")" 55 | show (MatchCaptured n) = "(MatchCaptured " <> show n <> ")" 56 | show AssertWordBoundary = "AssertWordBoundary" 57 | show AssertBeginning = "AssertBeginning" 58 | show AssertEnd = "AssertEnd" 59 | show (AssertPositive dir re) = "(AssertPositive " <> show dir <> " " <> 60 | show re <> ")" 61 | show (AssertNegative dir re) = "(AssertNegativeLookahead " <> 62 | show dir <> " " <> show re <> ")" 63 | show (Possessive re) = "(Possessive " <> show re <> ")" 64 | show (Lazy re) = "(Lazy " <> show re <> ")" 65 | show (Subroutine i) = "(Subroutine " <> show i <> ")" 66 | show MatchNull = "MatchNull" 67 | 68 | instance Semigroup Regex where 69 | (<>) = MatchConcat 70 | 71 | instance Monoid Regex where 72 | mempty = MatchNull 73 | mappend = (<>) 74 | 75 | isWordChar :: Char -> Bool 76 | isWordChar c = isAlphaNum c || generalCategory c == ConnectorPunctuation -------------------------------------------------------------------------------- /haskell/src/Utils.hs: -------------------------------------------------------------------------------- 1 | module Utils where 2 | 3 | import Data.Char (isAlphaNum, isLower, isUpper, isDigit) 4 | 5 | import Extracted (LRegex (..), llmatch, scanMatch) 6 | 7 | fromWord :: String -> LRegex Char 8 | fromWord = foldl1 Concat . map (CharClass . (==)) 9 | 10 | isWord :: (Char -> Bool) -- \w 11 | isWord = \c -> c == '_' || isAlphaNum c 12 | 13 | wildCard :: LRegex a 14 | wildCard = Star . CharClass $ const True 15 | 16 | multiConcat :: [LRegex a] -> LRegex a 17 | multiConcat rs 18 | | null rs = Epsilon 19 | | otherwise = foldr1 Concat rs 20 | 21 | multiUnion :: [LRegex a] -> LRegex a 22 | multiUnion = foldr1 Union 23 | 24 | repeatN :: Int -> LRegex a -> LRegex a 25 | repeatN n r = multiConcat $ replicate n r 26 | 27 | repeatMN :: Int -> Int -> LRegex a -> LRegex a 28 | repeatMN m n r = multiUnion [repeatN i r | i <- [m..n]] 29 | 30 | atLeastN :: Int -> LRegex a -> LRegex a 31 | atLeastN n r = repeatN n r `Concat` Star r 32 | 33 | -- this construct only works as a conjunction if used at the top level 34 | 35 | topAnd :: LRegex a -> LRegex a -> LRegex a 36 | topAnd r1 r2 = LookAhead r2 `Concat` r1 37 | 38 | topAndMulti :: [LRegex a] -> LRegex a 39 | topAndMulti = foldr1 topAnd 40 | 41 | topMinus :: LRegex a -> LRegex a -> LRegex a 42 | topMinus r1 r2 = NegLookAhead r2 `Concat` r1 43 | 44 | leftMostMaximalMatch :: LRegex a -> [a] -> Maybe [a] 45 | leftMostMaximalMatch r w = 46 | case llmatch r w of 47 | Just (n, d) -> Just . take d . drop n $ w 48 | Nothing -> Nothing 49 | 50 | match :: LRegex a -> [a] -> Bool 51 | match r w = last (scanMatch r w) 52 | 53 | lreSize :: LRegex a -> Int 54 | lreSize Epsilon = 1 55 | lreSize (CharClass _) = 1 56 | lreSize (Concat r1 r2) = 1 + lreSize r1 + lreSize r2 57 | lreSize (Union r1 r2) = 1 + lreSize r1 + lreSize r2 58 | lreSize (Star r) = 1 + lreSize r 59 | lreSize (LookAhead r) = 1 + lreSize r 60 | lreSize (LookBehind r) = 1 + lreSize r 61 | lreSize (NegLookAhead r) = 1 + lreSize r 62 | lreSize (NegLookBehind r) = 1 + lreSize r 63 | -------------------------------------------------------------------------------- /haskell/test/LookaroundTests/Tests.hs: -------------------------------------------------------------------------------- 1 | module LookaroundTests.Tests where 2 | 3 | import Test.Hspec 4 | 5 | import Data.Char (isAlphaNum, isLower, isUpper, isDigit) 6 | 7 | import Utils 8 | import Extracted (LRegex (..), scanMatch) 9 | 10 | foo = fromWord "foo" 11 | bar = fromWord "bar" 12 | 13 | foolookbar = foo `Concat` LookAhead bar 14 | lookfoobar = wildCard `Concat` LookBehind foo `Concat` bar 15 | lookfoobar2 = LookBehind foo `Concat` bar 16 | 17 | 18 | {- 19 | From: http://www.rexegg.com/regex-lookarounds.html 20 | 21 | 1. The password must have between six and ten word characters \w 22 | 2. It must include at least one lowercase character [a-z] 23 | 3. It must include at least three uppercase characters [A-Z] 24 | 4. It must include at least one digit 25 | -} 26 | 27 | condition1 = repeatMN 6 10 (CharClass isWord) 28 | condition2 = wildCard `Concat` (CharClass isLower) `Concat` wildCard 29 | condition3 = repeatN 3 (wildCard `Concat` (CharClass isUpper)) `Concat` wildCard 30 | condition4 = wildCard `Concat` (CharClass isDigit) `Concat` wildCard 31 | password = topAndMulti [condition1, condition2, condition3, condition4] 32 | 33 | -- A lowercase word that does not use the letter 'q' 34 | lowercaseWord = Star . CharClass $ isLower 35 | containsq = wildCard `Concat` (CharClass (== 'q')) `Concat` wildCard 36 | wordNoQ = NegLookAhead containsq `Concat` lowercaseWord 37 | 38 | -- a word that does not end with END 39 | endswithEND = wildCard `Concat` fromWord "END" 40 | doesnotEndwithEND = wildCard `Concat` NegLookBehind endswithEND 41 | 42 | -- there are at least 3 uppercase letters which are immediately followed by a vowel 43 | isVowel = CharClass $ (`elem` "AEIOUaeiou") 44 | upperthenvowel = (CharClass isUpper) `Concat` LookAhead (isVowel `Concat` wildCard) 45 | atleast3upperthenvowel = repeatN 3 (wildCard `Concat` upperthenvowel) `Concat` wildCard 46 | 47 | -- a word that starts with START 48 | startswithSTART = wildCard `Concat` LookBehind (fromWord "START") `Concat` wildCard 49 | 50 | -- text between two underscores 51 | twoUnderscores = fromWord "__" 52 | textBetweenUnderscores = wildCard `Concat` LookBehind twoUnderscores `Concat` wildCard `Concat` LookAhead twoUnderscores `Concat` wildCard 53 | textBetweenUnderscores2 = LookBehind twoUnderscores `Concat` wildCard `Concat` LookAhead twoUnderscores 54 | 55 | -- a word whose second last character is X, but does not end withXY 56 | xnoxy = wildCard `Concat` LookAhead (fromWord "X" `Concat` NegLookAhead (fromWord "Y") `Concat` (CharClass $ const True) ) `Concat` wildCard 57 | 58 | foobarTests :: SpecWith () 59 | foobarTests = do 60 | describe "Some tests for foo(?=bar)" $ do 61 | it "works correctly on foobar" $ do 62 | scanMatch foolookbar "foobar" `shouldBe` [False, False, False, True, False, False, False] 63 | it "works correctly on foobaz" $ do 64 | scanMatch foolookbar "foobaz" `shouldBe` [False, False, False, False, False, False, False] 65 | it "correctly extracts foo" $ do 66 | leftMostMaximalMatch foolookbar "foobar" `shouldBe` Just "foo" 67 | leftMostMaximalMatch foolookbar "foobaz" `shouldBe` Nothing 68 | describe "Some tests for .*(?<=foo)bar" $ do 69 | it "works correctly on foobar" $ do 70 | scanMatch lookfoobar "foobar" `shouldBe` [False, False, False, False, False, False, True] 71 | it "works correctly on foobaz" $ do 72 | scanMatch lookfoobar "foobaz" `shouldBe` [False, False, False, False, False, False, False] 73 | describe "Some tests for (?<=foo)bar" $ do 74 | it "can extract bar" $ do 75 | leftMostMaximalMatch lookfoobar2 "foobar" `shouldBe` Just "bar" 76 | leftMostMaximalMatch lookfoobar2 "foobaz" `shouldBe` Nothing 77 | 78 | 79 | passwordTests :: SpecWith () 80 | passwordTests = do 81 | describe "Testing password conditions" $ do 82 | it "correctly checks \\w{6,10}" $ do 83 | "abcdef" `shouldSatisfy` match condition1 84 | "abcd-f" `shouldNotSatisfy` match condition1 85 | "abc" `shouldNotSatisfy` match condition1 86 | it "correctly checks presence of a lowercase character" $ do 87 | "abcdef" `shouldSatisfy` match condition2 88 | "ABCDEF" `shouldNotSatisfy` match condition2 89 | "ABCdEF" `shouldSatisfy` match condition2 90 | it "correctly checks presence of 3 uppercase characters" $ do 91 | "ABCDEF" `shouldSatisfy` match condition3 92 | "ABCdEF" `shouldSatisfy` match condition3 93 | "ABcde" `shouldNotSatisfy` match condition3 94 | it "correctly checks presence of a digit" $ do 95 | "abc123" `shouldSatisfy` match condition4 96 | "abc" `shouldNotSatisfy` match condition4 97 | "123" `shouldSatisfy` match condition4 98 | "a1b" `shouldSatisfy` match condition4 99 | it "correctly checks the conjunction of the conditions above" $ do 100 | "abc" `shouldNotSatisfy` match password -- too short 101 | "abc123" `shouldNotSatisfy` match password -- no uppercase 102 | "ABC123" `shouldNotSatisfy` match password -- no lowercase 103 | "abcDEF" `shouldNotSatisfy` match password -- no digit 104 | "abc123DE" `shouldNotSatisfy` match password -- not enough uppercase characters 105 | "abc123DEF" `shouldSatisfy` match password 106 | "abc123DEFghi" `shouldNotSatisfy` match password -- too long 107 | 108 | booleanOpTests :: SpecWith () 109 | booleanOpTests = do 110 | describe "Checking several boolean operations" $ do 111 | it "correctly checks for lower case words without q" $ do 112 | "abc" `shouldSatisfy` match wordNoQ 113 | "abcq" `shouldNotSatisfy` match wordNoQ 114 | "qabc" `shouldNotSatisfy` match wordNoQ 115 | "ABC" `shouldNotSatisfy` match wordNoQ 116 | it "correctly checks for words that do not end with END" $ do 117 | "abc" `shouldSatisfy` match doesnotEndwithEND 118 | "abcEND" `shouldNotSatisfy` match doesnotEndwithEND 119 | "abcENDabc" `shouldSatisfy` match doesnotEndwithEND 120 | "abcENDabcEND" `shouldNotSatisfy` match doesnotEndwithEND 121 | "abcENDabcENDabc" `shouldSatisfy` match doesnotEndwithEND 122 | it "correctly checks for words containing at least 3 instances where an upper case letter is immediately followed by a vowel" $ do 123 | "AEIOU" `shouldSatisfy` match atleast3upperthenvowel 124 | "ABeAA" `shouldNotSatisfy` match atleast3upperthenvowel 125 | "ABeAAe" `shouldSatisfy` match atleast3upperthenvowel 126 | "AeAAe" `shouldSatisfy` match atleast3upperthenvowel 127 | "AeAAX" `shouldNotSatisfy` match atleast3upperthenvowel 128 | "AeAAXe" `shouldSatisfy` match atleast3upperthenvowel 129 | it "correctly checks for words that start with START" $ do 130 | "STARTabc" `shouldSatisfy` match startswithSTART 131 | "abcSTART" `shouldNotSatisfy` match startswithSTART 132 | "abcSTARTabc" `shouldNotSatisfy` match startswithSTART 133 | it "correctly checks for __words__ between two underscores" $ do 134 | "__abc__" `shouldSatisfy` match textBetweenUnderscores 135 | "__abc" `shouldNotSatisfy` match textBetweenUnderscores 136 | "abc__" `shouldNotSatisfy` match textBetweenUnderscores 137 | "__abc__def__" `shouldSatisfy` match textBetweenUnderscores 138 | it "correctly extracts the leftmost longest __word__ between two underscores" $ do 139 | leftMostMaximalMatch textBetweenUnderscores2 "__abc__def__" `shouldBe` Just "abc__def" 140 | leftMostMaximalMatch textBetweenUnderscores2 "__abc__" `shouldBe` Just "abc" 141 | leftMostMaximalMatch textBetweenUnderscores2 "x__abc__x" `shouldBe` Nothing 142 | leftMostMaximalMatch textBetweenUnderscores2 "__abc" `shouldBe` Nothing 143 | leftMostMaximalMatch textBetweenUnderscores2 "abc__" `shouldBe` Nothing 144 | it "correctly checks for words that end with X[^Y]" $ do 145 | "abcXZ" `shouldSatisfy` match xnoxy 146 | "abcXY" `shouldNotSatisfy` match xnoxy 147 | "abcXYabc" `shouldNotSatisfy` match xnoxy 148 | "abcXabc" `shouldNotSatisfy` match xnoxy 149 | 150 | hardFamily = wildCard : map f hardFamily where 151 | f r = wildCard `topAnd` (wildCard `Concat` r `Concat` fromWord "b") 152 | 153 | hardFamilyTests :: SpecWith () 154 | hardFamilyTests = do 155 | describe "Testing with r[i+1]= (?=.*r[i]b).*" $ do 156 | it "works correctly at level 2" $ do 157 | replicate 5000 'a' ++ "b" `shouldSatisfy` match (hardFamily !! 2) 158 | replicate 5000 'a' `shouldNotSatisfy` match (hardFamily !! 2) 159 | it "works correctly at level 10" $ do 160 | replicate 5000 'a' ++ "b" `shouldSatisfy` match (hardFamily !! 10) 161 | replicate 5000 'a' `shouldNotSatisfy` match (hardFamily !! 10) 162 | 163 | {- 164 | This is taken from the following paper: 165 | Derivatives of Regular Expressions with Lookahead, by T. Miyazaki \& Y. Minamide (2019) [JIP] 166 | 167 | Below we use the primes 2, 3, 5, 7, 11, 13. The size of the regular expression is no bigger than 2 + 3 + 5 + 7 + 11 + 13 <= 13 * 13 = 169. 168 | 169 | However, it represents that the string is a multiple of N, where 170 | N = 2 * 3 * 5 * 7 * 11 * 13 = 30030. 171 | 172 | The regex .*b.{N} can thus be described using a much smaller regex, as done below. 173 | We know that this regex, however, cannot be expressed with a DFA of less than 2^30030 states. 174 | 175 | -} 176 | 177 | primeProductRegex = topAndMulti $ map ofLength [2, 3, 5, 7, 11, 13] where 178 | ofLength n = Star $ repeatN n (CharClass $ const True) 179 | buffer30030 = wildCard `Concat` fromWord "b" `Concat` primeProductRegex 180 | 181 | primeProductTests :: SpecWith () 182 | primeProductTests = do 183 | describe "Testing ability to do multiplications" $ do 184 | it "correctly checks for multiples of 30030" $ do 185 | replicate 30030 'a' `shouldSatisfy` match primeProductRegex 186 | replicate 67 'a' `shouldNotSatisfy` match primeProductRegex 187 | replicate 60060 'a' `shouldSatisfy` match primeProductRegex 188 | replicate 2000 'a' `shouldNotSatisfy` match primeProductRegex 189 | it "correctly checks that the 30030th last letter is b" $ do 190 | "b" ++ replicate 30030 'a' `shouldSatisfy` match buffer30030 191 | "a" ++ replicate 30030 'a' `shouldNotSatisfy` match buffer30030 192 | "bb" ++ replicate 30030 'a' `shouldSatisfy` match buffer30030 193 | 194 | 195 | lookaroundTest :: SpecWith () 196 | lookaroundTest = do 197 | foobarTests 198 | passwordTests 199 | booleanOpTests 200 | hardFamilyTests 201 | primeProductTests -------------------------------------------------------------------------------- /haskell/test/Spec.hs: -------------------------------------------------------------------------------- 1 | module Main (main) where 2 | 3 | import LookaroundTests.Tests (lookaroundTest) 4 | 5 | import Test.Hspec 6 | 7 | main :: IO () 8 | main = hspec $ do 9 | lookaroundTest 10 | -------------------------------------------------------------------------------- /preprint.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Agnishom/lregex/ffff8240e1b46d635974a3c62c7856a19064920a/preprint.pdf -------------------------------------------------------------------------------- /theories/.gitignore: -------------------------------------------------------------------------------- 1 | .lia.cache 2 | *.vo* 3 | *.aux 4 | *.glob 5 | .Makefile.coq.d 6 | Makefile.coq.conf 7 | Makefile.coq -------------------------------------------------------------------------------- /theories/CMatcher.v: -------------------------------------------------------------------------------- 1 | (** 2 | 3 | In the file [OMatcher.v], an algorithm for matching Oracle Regular Expressions [ORegex] in Oracle Strings [ostring] is formalized. In this file, we formalize some optimizations on that algorithm. The C in [CMatcher.v] stands for Caching. The main optimization is that we cache the results of the functions [finalWith] and [nullableWith], so that computing them is done in tandem with building the value itself and they can be looked up later in costant time. 4 | 5 | 1. The main type used in this file is [CMRegex]. It is defined in a mutually inductive manner with the type [CMRe]. 6 | - The type [CMRegex] is a wrapper around [CMRe] and contains two boolean fields. These fields can be accessed using the functions [cRe], [cNullable] and [cFinal] which are defined for convenience. 7 | - The function [unCache] converts a [CMRegex] to a [MRegex] by removing the caching information. 8 | - The relation [synced : valuation -> CMRegex -> Prop] asserts that the [CMRegex] is synchronized with respect to the given valuation. This happens when the fields [cNullable] and [cFinal] contain the correct values. This is established in the lemmas [synced_unCache_nullableWith] and [synced_unCache_finalWith]. 9 | 2. The functions [mkEpsilon], [mkCharClass], [mkQueryPos], [mkQueryNeg], [mkConcat], [mkUnion] and [mkStar] behave like 'smart constructors'. This is to say that they set the fields [cNullable] and [cFinal] correctly based on the input [CMRegex]es. 10 | - The correctness of these 'smart constructors' are established in [synced_mkEpsilon], [synced_mkCharClass], [synced_mkQueryPos], [synced_mkQueryNeg], [synced_mkConcat], [synced_mkUnion] and [synced_mkStar]. 11 | - The function [syncVal] is used to synchronize a [CMRegex] with respect to a given valuation. The correctness of this function is established in [synced_syncVal]. 12 | 3. The functions [cRead], [cFollow] and [toCMarked] are counterparts of the functions [read], [followWith] and [toMarked] respectively. The correctness of these functions are established in [cRead_unCache], [synced_unCache_followWith] and [toCMarked_unCache]. 13 | 4. The function [cConsume] is the counterpart of the function [consume]. The correctness of this function is established in [cConsume_empty], [cConsume_singleton] and [cConsume_step]. 14 | - The function [cScanMatch : ORegex -> ostring -> list bool] is the counterpart of the function [oscanMatcher]. The correctness of this function is established in [cScanMatch_tape]. 15 | 16 | 17 | *) 18 | 19 | Require Import Lia. 20 | Require Import Coq.Arith.Wf_nat. 21 | Require Import Coq.Lists.List. 22 | 23 | Import ListNotations. 24 | Import Bool. 25 | 26 | 27 | Require Import ListLemmas. 28 | Require Import ORegex. 29 | Require Import OMatcher. 30 | 31 | Section CMRegex. 32 | 33 | Context {A : Type}. 34 | 35 | Inductive CMRegex : Type := 36 | | MkCMRegex : bool -> bool -> CMRe -> CMRegex 37 | with CMRe : Type := 38 | | CMEpsilon : CMRe 39 | | CMCharClass : (A -> bool) -> CMRe 40 | | CMQueryPos : nat -> CMRe 41 | | CMQueryNeg : nat -> CMRe 42 | | CMConcat : CMRegex -> CMRegex -> CMRe 43 | | CMUnion : CMRegex -> CMRegex -> CMRe 44 | | CMStar : CMRegex -> CMRe 45 | . 46 | 47 | Definition cNullable (r : CMRegex) : bool := 48 | match r with 49 | | MkCMRegex b _ _ => b 50 | end. 51 | 52 | Definition cFinal (r : CMRegex) : bool := 53 | match r with 54 | | MkCMRegex _ b _ => b 55 | end. 56 | 57 | Definition cRe (r : CMRegex) : CMRe := 58 | match r with 59 | | MkCMRegex _ _ re => re 60 | end. 61 | 62 | Definition mkEpsilon : CMRegex := 63 | MkCMRegex true false CMEpsilon. 64 | 65 | Definition mkCharClass (mark : bool) (f : A -> bool) : CMRegex := 66 | MkCMRegex false mark (CMCharClass f). 67 | 68 | Definition mkQueryPos (b : bool) (n : nat) : CMRegex := 69 | MkCMRegex b false (CMQueryPos n). 70 | 71 | Definition mkQueryNeg (b : bool) (n : nat) : CMRegex := 72 | MkCMRegex b false (CMQueryNeg n). 73 | 74 | Definition mkConcat (r1 r2 : CMRegex) : CMRegex := 75 | MkCMRegex 76 | (cNullable r1 && cNullable r2) 77 | ((cFinal r1 && cNullable r2) || cFinal r2) 78 | (CMConcat r1 r2). 79 | 80 | Definition mkUnion (r1 r2 : CMRegex) : CMRegex := 81 | MkCMRegex 82 | (cNullable r1 || cNullable r2) 83 | (cFinal r1 || cFinal r2) 84 | (CMUnion r1 r2). 85 | 86 | Definition mkStar (r : CMRegex) : CMRegex := 87 | MkCMRegex true (cFinal r) (CMStar r). 88 | 89 | Fixpoint synced (v : valuation) (r : CMRegex) : Prop := 90 | match cRe r with 91 | | CMEpsilon => cFinal r = false /\ cNullable r = true 92 | | CMCharClass _ => cNullable r = false 93 | | CMQueryPos n => 94 | cFinal r = false 95 | /\ cNullable r = match nth_error v n with 96 | | Some true => true 97 | | _ => false 98 | end 99 | | CMQueryNeg n => 100 | cFinal r = false 101 | /\ cNullable r = match nth_error v n with 102 | | Some false => true 103 | | _ => false 104 | end 105 | | CMConcat r1 r2 => 106 | cFinal r = ((cFinal r1 && cNullable r2) || cFinal r2) 107 | /\ cNullable r = (cNullable r1 && cNullable r2) 108 | /\ synced v r1 109 | /\ synced v r2 110 | | CMUnion r1 r2 => 111 | cFinal r = (cFinal r1 || cFinal r2) 112 | /\ cNullable r = (cNullable r1 || cNullable r2) 113 | /\ synced v r1 114 | /\ synced v r2 115 | | CMStar r1 => 116 | cFinal r = cFinal r1 117 | /\ cNullable r = true 118 | /\ synced v r1 119 | end. 120 | 121 | Fixpoint unCache (r : CMRegex) : MRegex := 122 | match cRe r with 123 | | CMEpsilon => MEpsilon 124 | | CMCharClass f => MCharClass (cFinal r) f 125 | | CMQueryPos n => MQueryPos n 126 | | CMQueryNeg n => MQueryNeg n 127 | | CMConcat r1 r2 => MConcat (unCache r1) (unCache r2) 128 | | CMUnion r1 r2 => MUnion (unCache r1) (unCache r2) 129 | | CMStar r1 => MStar (unCache r1) 130 | end. 131 | 132 | Fixpoint syncVal (v : valuation) (r : CMRegex) : CMRegex := 133 | match cRe r with 134 | | CMEpsilon => mkEpsilon 135 | | CMCharClass f => mkCharClass (cFinal r) f 136 | | CMQueryPos n => mkQueryPos (match nth_error v n with 137 | | Some true => true 138 | | _ => false 139 | end) n 140 | | CMQueryNeg n => mkQueryNeg (match nth_error v n with 141 | | Some false => true 142 | | _ => false 143 | end) n 144 | | CMConcat r1 r2 => mkConcat (syncVal v r1) (syncVal v r2) 145 | | CMUnion r1 r2 => mkUnion (syncVal v r1) (syncVal v r2) 146 | | CMStar r1 => mkStar (syncVal v r1) 147 | end. 148 | 149 | Fixpoint cRead (a : A) (r : CMRegex) : CMRegex := 150 | match cRe r with 151 | | CMEpsilon => r 152 | | CMCharClass f => mkCharClass (cFinal r && f a) f 153 | | CMQueryPos n => r 154 | | CMQueryNeg n => r 155 | | CMConcat r1 r2 => mkConcat (cRead a r1) (cRead a r2) 156 | | CMUnion r1 r2 => mkUnion (cRead a r1) (cRead a r2) 157 | | CMStar r1 => mkStar (cRead a r1) 158 | end. 159 | 160 | Fixpoint cFollow (b : bool) (r : CMRegex) : CMRegex := 161 | match cRe r with 162 | | CMEpsilon => mkEpsilon 163 | | CMCharClass f => mkCharClass b f 164 | | CMQueryPos n => r 165 | | CMQueryNeg n => r 166 | | CMConcat r1 r2 => 167 | let b1 := cFinal r1 || (b && cNullable r1) in 168 | mkConcat (cFollow b r1) (cFollow b1 r2) 169 | | CMUnion r1 r2 => mkUnion (cFollow b r1) (cFollow b r2) 170 | | CMStar r1 => mkStar (cFollow (b || cFinal r1) r1) 171 | end. 172 | 173 | Fixpoint toCMarked (or : @ORegex A) : CMRegex := 174 | match or with 175 | | OEpsilon => mkEpsilon 176 | | OCharClass f => mkCharClass false f 177 | | OQueryPos n => mkQueryPos false n 178 | | OQueryNeg n => mkQueryNeg false n 179 | | OConcat or1 or2 => mkConcat (toCMarked or1) (toCMarked or2) 180 | | OUnion or1 or2 => mkUnion (toCMarked or1) (toCMarked or2) 181 | | OStar or1 => mkStar (toCMarked or1) 182 | end. 183 | 184 | (* when using, we want |w| = |o| *) 185 | Fixpoint cConsumeAux (cmr : CMRegex) (w : list A) (o : list valuation) : CMRegex := 186 | match w, o with 187 | | [], [] => cmr 188 | | [], _ :: _ => mkEpsilon (* shouldn't arise! *) 189 | | _ :: _, [] => mkEpsilon (* shouldn't arise! *) 190 | | a0 :: w', v1 :: o' => 191 | let cmrNew := cFollow false cmr in 192 | let cmrNew' := cRead a0 cmrNew in 193 | let cmrNew'' := syncVal v1 cmrNew' in 194 | cConsumeAux cmrNew'' w' o' 195 | end. 196 | 197 | 198 | Definition cConsume (or : ORegex) (os : @ostring A) : CMRegex := 199 | let cmr := toCMarked or in 200 | match os with 201 | | (_, []) => cmr (* shouldn't arise! *) 202 | | ([], [o0]) => syncVal o0 cmr 203 | | (_ , _ :: []) => cmr (* shouldn't arise! *) 204 | | (a0 :: w', o0 :: o1 :: o') => 205 | let cmr0 := cFollow true (syncVal o0 cmr) in 206 | let cmr0' := cRead a0 cmr0 in 207 | let cmr0'' := syncVal o1 cmr0' in 208 | cConsumeAux cmr0'' w' o' 209 | | (_, _ :: _ :: _) => cmr (* shouldn't arise! *) 210 | end. 211 | 212 | (* when using, we want |w| = |o| *) 213 | Fixpoint cScanMatchAux (cmr : CMRegex) (w : list A) (o : list valuation) : list bool := 214 | let b := cFinal cmr in 215 | match w, o with 216 | | [], [] => [b] 217 | | [], _ :: _ => [b] (* shouldn't arise! *) 218 | | _ :: _, [] => [b] (* shouldn't arise! *) 219 | | a0 :: w', v1 :: o' => 220 | let cmrNew := cFollow false cmr in 221 | let cmrNew' := cRead a0 cmrNew in 222 | let cmrNew'' := syncVal v1 cmrNew' in 223 | b :: cScanMatchAux cmrNew'' w' o' 224 | end. 225 | 226 | Definition cScanMatch (or : ORegex) (os : @ostring A) : list bool := 227 | let cmr := toCMarked or in 228 | match os with 229 | | (_, []) => [] (* shouldn't arise! *) 230 | | ([], [o0]) => [cNullable (syncVal o0 cmr)] 231 | | (_ , _ :: []) => [] (* shouldn't arise! *) 232 | | (a0 :: w', o0 :: o1 :: o') => 233 | let b0 := cNullable (syncVal o0 cmr) in 234 | let cmr0 := cFollow true (syncVal o0 cmr) in 235 | let cmr0' := cRead a0 cmr0 in 236 | let cmr0'' := syncVal o1 cmr0' in 237 | b0 :: cScanMatchAux cmr0'' w' o' 238 | | (_, _ :: _ :: _) => [] (* shouldn't arise! *) 239 | end. 240 | 241 | Hint Unfold synced : regex. 242 | 243 | Lemma synced_mkEpsilon : 244 | forall v : valuation, 245 | synced v mkEpsilon. 246 | Proof. 247 | intros. simpl. auto. 248 | Qed. 249 | 250 | Lemma synced_mkCharClass : 251 | forall (v : valuation) (mark : bool) (f : A -> bool), 252 | synced v (mkCharClass mark f). 253 | Proof. 254 | intros. simpl. auto. 255 | Qed. 256 | 257 | Lemma synced_mkQueryPos : 258 | forall (v : valuation) (n : nat) (b : bool), 259 | b = match nth_error v n with 260 | | Some true => true 261 | | _ => false 262 | end 263 | <-> synced v (mkQueryPos b n). 264 | Proof. 265 | intros. simpl. tauto. 266 | Qed. 267 | 268 | Lemma synced_mkQueryNeg : 269 | forall (v : valuation) (n : nat) (b : bool), 270 | b = match nth_error v n with 271 | | Some false => true 272 | | _ => false 273 | end 274 | <-> synced v (mkQueryNeg b n). 275 | Proof. 276 | intros. simpl. tauto. 277 | Qed. 278 | 279 | Lemma synced_mkConcat : 280 | forall (v : valuation) (r1 r2 : CMRegex), 281 | synced v r1 /\ synced v r2 <-> 282 | synced v (mkConcat r1 r2). 283 | Proof. 284 | intros. simpl. tauto. 285 | Qed. 286 | 287 | Lemma synced_mkUnion : 288 | forall (v : valuation) (r1 r2 : CMRegex), 289 | synced v r1 /\ synced v r2 <-> 290 | synced v (mkUnion r1 r2). 291 | Proof. 292 | intros. simpl. tauto. 293 | Qed. 294 | 295 | Lemma synced_mkStar : 296 | forall (v : valuation) (r : CMRegex), 297 | synced v r <-> 298 | synced v (mkStar r). 299 | Proof. 300 | intros. simpl. tauto. 301 | Qed. 302 | 303 | Hint Resolve synced_mkEpsilon : regex. 304 | Hint Resolve synced_mkCharClass : regex. 305 | Hint Resolve synced_mkQueryPos : regex. 306 | Hint Resolve synced_mkQueryNeg : regex. 307 | Hint Resolve synced_mkConcat : regex. 308 | Hint Resolve synced_mkUnion : regex. 309 | Hint Resolve synced_mkStar : regex. 310 | 311 | 312 | 313 | Fixpoint CMRegex_ind_2 (P : CMRe -> Prop) 314 | (HEps: P CMEpsilon) 315 | (HCharClass : (forall f : A -> bool, P (CMCharClass f))) 316 | (HQueryPos : (forall n : nat, P (CMQueryPos n))) 317 | (HQueryNeg : (forall n : nat, P (CMQueryNeg n))) 318 | (HConcat : forall r1 r2 : CMRe, 319 | P r1 320 | -> P r2 321 | -> forall n1 f1 n2 f2, 322 | P (CMConcat (MkCMRegex n1 f1 r1) (MkCMRegex n2 f2 r2))) 323 | (HUnion : forall r1 r2 : CMRe, 324 | P r1 325 | -> P r2 326 | -> forall n1 f1 n2 f2, 327 | P (CMUnion (MkCMRegex n1 f1 r1) (MkCMRegex n2 f2 r2))) 328 | (HStar : (forall r : CMRe, 329 | P r 330 | -> forall n f, P (CMStar (MkCMRegex n f r)))) 331 | (r : CMRe) : P r := 332 | let inductor := CMRegex_ind_2 P HEps HCharClass HQueryPos HQueryNeg HConcat HUnion HStar in 333 | match r with 334 | | CMEpsilon => HEps 335 | | CMCharClass f => HCharClass f 336 | | CMQueryPos n => HQueryPos n 337 | | CMQueryNeg n => HQueryNeg n 338 | | CMConcat (MkCMRegex n1 f1 r1) (MkCMRegex n2 f2 r2) => HConcat r1 r2 (inductor r1) (inductor r2) n1 f1 n2 f2 339 | | CMUnion (MkCMRegex n1 f1 r1) (MkCMRegex n2 f2 r2) => HUnion r1 r2 (inductor r1) (inductor r2) n1 f1 n2 f2 340 | | CMStar (MkCMRegex n f r) => HStar r (inductor r) n f 341 | end. 342 | 343 | 344 | Lemma synced_unCache_nullableWith (v : valuation) (r : CMRegex) : 345 | synced v r 346 | -> cNullable r = nullableWith v (unCache r). 347 | Proof. 348 | destruct r as [cNull cFin re]. revert cNull cFin. 349 | induction re using CMRegex_ind_2; 350 | try (simpl in *; tauto). 351 | (* Concat *) 352 | { intros cNull cFin. 353 | remember (MkCMRegex n1 f1 re1) as r1. 354 | remember (MkCMRegex n2 f2 re2) as r2. 355 | simpl synced. intro Hsynced. 356 | destruct Hsynced as [_ [Hnull [Hs1 Hs2]]]. 357 | simpl unCache. simpl nullableWith. 358 | simpl cNullable. subst cNull. 359 | subst r1 r2. 360 | specialize (IHre1 n1 f1 Hs1). 361 | specialize (IHre2 n2 f2 Hs2). 362 | congruence. 363 | } 364 | (* Union *) 365 | { intros cNull cFin. 366 | remember (MkCMRegex n1 f1 re1) as r1. 367 | remember (MkCMRegex n2 f2 re2) as r2. 368 | simpl synced. intro Hsynced. 369 | destruct Hsynced as [_ [Hnull [Hs1 Hs2]]]. 370 | simpl unCache. simpl nullableWith. 371 | simpl cNullable. subst cNull. 372 | subst r1 r2. 373 | specialize (IHre1 n1 f1 Hs1). 374 | specialize (IHre2 n2 f2 Hs2). 375 | congruence. 376 | } 377 | Qed. 378 | 379 | Lemma synced_unCache_finalWith (v : valuation) (r : CMRegex) : 380 | synced v r 381 | -> cFinal r = finalWith v (unCache r). 382 | Proof. 383 | destruct r as [cNull cFin re]. revert cNull cFin. 384 | induction re using CMRegex_ind_2; 385 | try (simpl in *; tauto). 386 | (* Concat *) 387 | { intros cNull cFin. 388 | remember (MkCMRegex n1 f1 re1) as r1. 389 | remember (MkCMRegex n2 f2 re2) as r2. 390 | simpl synced. intro Hsynced. 391 | destruct Hsynced as [Hfin [_ [Hs1 Hs2]]]. 392 | simpl unCache. simpl finalWith. 393 | simpl cFinal. 394 | pose proof (synced_unCache_nullableWith _ _ Hs2) as Hnull. 395 | subst cFin. 396 | subst r1 r2. 397 | specialize (IHre1 n1 f1 Hs1). 398 | specialize (IHre2 n2 f2 Hs2). 399 | congruence. 400 | } 401 | (* Union *) 402 | { intros cNull cFin. 403 | remember (MkCMRegex n1 f1 re1) as r1. 404 | remember (MkCMRegex n2 f2 re2) as r2. 405 | simpl synced. intro Hsynced. 406 | destruct Hsynced as [Hfin [_ [Hs1 Hs2]]]. 407 | simpl unCache. simpl finalWith. 408 | simpl cFinal. subst cFin. 409 | subst r1 r2. 410 | specialize (IHre1 n1 f1 Hs1). 411 | specialize (IHre2 n2 f2 Hs2). 412 | congruence. 413 | } 414 | (* Star *) 415 | { intros cNull cFin. 416 | remember (MkCMRegex n f re) as r. 417 | simpl synced. intro Hsynced. 418 | destruct Hsynced as [Hfin [Hnull Hs]]. 419 | simpl unCache. simpl finalWith. 420 | simpl cFinal. subst cFin. 421 | subst r. 422 | specialize (IHre n f Hs). 423 | congruence. 424 | } 425 | Qed. 426 | 427 | 428 | 429 | Lemma synced_syncVal (v : valuation) (r : CMRegex) : 430 | synced v (syncVal v r). 431 | Proof. 432 | destruct r as [cNull cFin re]. revert cNull cFin. 433 | induction re using CMRegex_ind_2; 434 | try (simpl in *; tauto). 435 | (* Concat *) 436 | { intros cNull cFin. 437 | remember (MkCMRegex n1 f1 re1) as r1. 438 | remember (MkCMRegex n2 f2 re2) as r2. 439 | simpl syncVal. 440 | apply synced_mkConcat. 441 | specialize (IHre1 n1 f1). 442 | specialize (IHre2 n2 f2). 443 | rewrite <- Heqr1 in IHre1. 444 | rewrite <- Heqr2 in IHre2. 445 | auto. 446 | } 447 | (* Union *) 448 | { intros cNull cFin. 449 | remember (MkCMRegex n1 f1 re1) as r1. 450 | remember (MkCMRegex n2 f2 re2) as r2. 451 | simpl syncVal. 452 | apply synced_mkUnion. 453 | specialize (IHre1 n1 f1). 454 | specialize (IHre2 n2 f2). 455 | rewrite <- Heqr1 in IHre1. 456 | rewrite <- Heqr2 in IHre2. 457 | auto. 458 | } 459 | (* Star *) 460 | { intros cNull cFin. 461 | remember (MkCMRegex n f re) as r. 462 | simpl syncVal. 463 | rewrite <- synced_mkStar. 464 | specialize (IHre n f). 465 | rewrite <- Heqr in IHre. 466 | auto. 467 | } 468 | Qed. 469 | 470 | Lemma syncVal_unCache (r : CMRegex) (v : valuation) : 471 | unCache (syncVal v r) = unCache r. 472 | Proof. 473 | destruct r as [cNull cFin re]. revert cNull cFin. 474 | induction re using CMRegex_ind_2; 475 | try (simpl in *; tauto). 476 | (* Concat *) { 477 | intros cNull cFin. 478 | remember (MkCMRegex n1 f1 re1) as r1. 479 | remember (MkCMRegex n2 f2 re2) as r2. 480 | simpl syncVal. 481 | simpl unCache. 482 | subst r1 r2. 483 | specialize (IHre1 n1 f1). 484 | specialize (IHre2 n2 f2). 485 | rewrite IHre1. 486 | rewrite IHre2. 487 | auto. 488 | } 489 | (* Union *) { 490 | intros cNull cFin. 491 | remember (MkCMRegex n1 f1 re1) as r1. 492 | remember (MkCMRegex n2 f2 re2) as r2. 493 | simpl syncVal. 494 | simpl unCache. 495 | subst r1 r2. 496 | specialize (IHre1 n1 f1). 497 | specialize (IHre2 n2 f2). 498 | rewrite IHre1. 499 | rewrite IHre2. 500 | auto. 501 | } 502 | (* Star *) { 503 | intros cNull cFin. 504 | remember (MkCMRegex n f re) as r. 505 | simpl syncVal. 506 | simpl unCache. 507 | subst r. 508 | specialize (IHre n f). 509 | rewrite IHre. 510 | auto. 511 | } 512 | Qed. 513 | 514 | 515 | Lemma synced_cRead (v : valuation) (r : CMRegex) (a : A) : 516 | synced v r -> synced v (cRead a r). 517 | Proof. 518 | destruct r as [cNull cFin re]. revert cNull cFin. 519 | induction re using CMRegex_ind_2; 520 | try (simpl in *; tauto). 521 | (* Concat *) 522 | { intros cNull cFin. 523 | remember (MkCMRegex n1 f1 re1) as r1. 524 | remember (MkCMRegex n2 f2 re2) as r2. 525 | intros Hsynced. 526 | destruct Hsynced as [_ [_ [Hs1 Hs2]]]. 527 | simpl cRead. 528 | apply synced_mkConcat. 529 | subst r1 r2. 530 | specialize (IHre1 n1 f1 Hs1). 531 | specialize (IHre2 n2 f2 Hs2). 532 | auto. 533 | } 534 | (* Union *) 535 | { intros cNull cFin. 536 | remember (MkCMRegex n1 f1 re1) as r1. 537 | remember (MkCMRegex n2 f2 re2) as r2. 538 | intros Hsynced. 539 | destruct Hsynced as [_ [_ [Hs1 Hs2]]]. 540 | simpl cRead. 541 | apply synced_mkUnion. 542 | subst r1 r2. 543 | specialize (IHre1 n1 f1 Hs1). 544 | specialize (IHre2 n2 f2 Hs2). 545 | auto. 546 | } 547 | (* Star *) 548 | { intros cNull cFin. 549 | remember (MkCMRegex n f re) as r. 550 | intros Hsynced. 551 | destruct Hsynced as [_ [_ Hs]]. 552 | simpl cRead. 553 | rewrite <- synced_mkStar. 554 | subst r. 555 | now specialize (IHre n f Hs). 556 | } 557 | Qed. 558 | 559 | Lemma cRead_unCache (a : A) (r : CMRegex) : 560 | unCache (cRead a r) = read a (unCache r). 561 | Proof. 562 | destruct r as [cNull cFin re]. revert cNull cFin. 563 | induction re using CMRegex_ind_2; 564 | try (simpl in *; tauto). 565 | (* Concat *){ 566 | intros cNull cFin. 567 | remember (MkCMRegex n1 f1 re1) as r1. 568 | remember (MkCMRegex n2 f2 re2) as r2. 569 | simpl cRead. 570 | simpl unCache. 571 | subst r1 r2. 572 | specialize (IHre1 n1 f1). 573 | specialize (IHre2 n2 f2). 574 | rewrite IHre1. 575 | rewrite IHre2. 576 | auto. 577 | } 578 | (* Union *){ 579 | intros cNull cFin. 580 | remember (MkCMRegex n1 f1 re1) as r1. 581 | remember (MkCMRegex n2 f2 re2) as r2. 582 | simpl cRead. 583 | simpl unCache. 584 | subst r1 r2. 585 | specialize (IHre1 n1 f1). 586 | specialize (IHre2 n2 f2). 587 | rewrite IHre1. 588 | rewrite IHre2. 589 | auto. 590 | } 591 | (* Star *) { 592 | intros cNull cFin. 593 | remember (MkCMRegex n f re) as r. 594 | simpl cRead. 595 | simpl unCache. 596 | subst r. 597 | specialize (IHre n f). 598 | rewrite IHre. 599 | auto. 600 | } 601 | Qed. 602 | 603 | Lemma synced_unCache_followWith (v : valuation) (r : CMRegex) (b : bool) : 604 | synced v r 605 | -> unCache (cFollow b r) = followWith b v (unCache r). 606 | Proof. 607 | destruct r as [cNull cFin re]. revert cNull cFin b v. 608 | induction re using CMRegex_ind_2; 609 | try (simpl in *; tauto). 610 | (* Concat *) 611 | { intros cNull cFin. 612 | remember (MkCMRegex n1 f1 re1) as r1. 613 | remember (MkCMRegex n2 f2 re2) as r2. 614 | intros b v Hsynced. 615 | destruct Hsynced as [Hfinal [Hnull [Hs1 Hs2]]]. 616 | simpl in Hfinal. 617 | simpl cFollow. simpl unCache. simpl followWith. 618 | pose proof (synced_unCache_nullableWith _ _ Hs1) as Hnull1. 619 | pose proof (synced_unCache_finalWith _ _ Hs1) as Hfinal1. 620 | subst r2. 621 | specialize (IHre2 n2 f2 (cFinal r1 || b && cNullable r1) v Hs2). 622 | rewrite IHre2. 623 | subst r1. 624 | specialize (IHre1 n1 f1 b v Hs1). rewrite IHre1. 625 | rewrite Hfinal1. rewrite Hnull1. 626 | reflexivity. 627 | } 628 | (* Union *) 629 | { intros cNull cFin. 630 | remember (MkCMRegex n1 f1 re1) as r1. 631 | remember (MkCMRegex n2 f2 re2) as r2. 632 | intros b v Hsynced. 633 | destruct Hsynced as [_ [_ [Hs1 Hs2]]]. 634 | simpl cFollow. simpl unCache. simpl followWith. 635 | subst r1 r2. 636 | specialize (IHre1 n1 f1 b v Hs1). rewrite IHre1. 637 | specialize (IHre2 n2 f2 b v Hs2). rewrite IHre2. 638 | reflexivity. 639 | } 640 | (* Star *) 641 | { intros cNull cFin. 642 | remember (MkCMRegex n f re) as r. 643 | intros b v Hsynced. 644 | destruct Hsynced as [Hfinal [_ Hs]]. 645 | pose proof (synced_unCache_finalWith _ _ Hs) as Hfinal1. 646 | simpl cFollow. simpl unCache. simpl followWith. 647 | rewrite Hfinal1. 648 | specialize (IHre n f (b || finalWith v (unCache r)) v ltac:(now subst)). 649 | subst. 650 | rewrite IHre. 651 | reflexivity. 652 | } 653 | Qed. 654 | 655 | 656 | 657 | Lemma toCMarked_unCache (or : @ORegex A) : 658 | unCache (toCMarked or) = toMarked or. 659 | Proof. 660 | induction or; simpl; auto. 661 | all: congruence. 662 | Qed. 663 | 664 | 665 | 666 | Lemma cConsumeAux_snoc (cmr : CMRegex) (w : list A) (o : list valuation) 667 | (a' : A) (v' : valuation) : 668 | length w = length o 669 | -> cConsumeAux cmr (w ++ [a']) (o ++ [v']) = 670 | syncVal v' (cRead a' (cFollow false (cConsumeAux cmr w o))). 671 | Proof. 672 | remember (length w) as n eqn:Hn. 673 | revert n cmr w o v' a' Hn. 674 | induction n. 675 | - intros cmr w o v' a' Hlen Heq. 676 | destruct w as [ | w0 w']. 2 : { simpl in Hlen. lia. } 677 | destruct o as [ | o0 o']. 2 : { simpl in Heq. lia. } 678 | auto. 679 | - intros mr w o v' a' Hlen Heq. 680 | destruct w as [ | w0 w']. { simpl in Hlen. lia. } 681 | destruct o as [ | o0 o']. { simpl in Heq. lia. } 682 | simpl in Hlen. simpl in Heq. 683 | simpl. rewrite IHn. auto. 684 | lia. lia. 685 | Qed. 686 | 687 | Lemma cConsume_snoc (or : ORegex) (w : list A) (o : list valuation) 688 | (a' : A) (v' : valuation) : 689 | outer_length_wf (w, o) 690 | -> olength (w, o) > 0 691 | -> cConsume or (w ++ [a'], o ++ [v']) = 692 | syncVal v' (cRead a' (cFollow false (cConsume or (w, o)))). 693 | Proof. 694 | intros Hwf Hlen. 695 | unfold outer_length_wf in Hwf. 696 | unfold olength in Hlen. simpl in * |-. 697 | destruct o as [ | o0 o']. { simpl in Hwf. lia. } 698 | destruct w as [ | a0 w']. { simpl in Hlen. lia. } 699 | destruct o' as [ | o1 o'']. { simpl in Hwf. lia. } 700 | simpl app. 701 | simpl. apply cConsumeAux_snoc. 702 | simpl in Hwf. lia. 703 | Qed. 704 | 705 | 706 | 707 | Lemma cConsume_empty (or : ORegex) (v0 : valuation) : 708 | unCache (cConsume or ([], [v0])) = toMarked or 709 | /\ synced v0 (cConsume or ([], [v0])). 710 | Proof. 711 | simpl. 712 | rewrite syncVal_unCache. 713 | rewrite toCMarked_unCache. 714 | split; [ auto | ]. 715 | apply synced_syncVal. 716 | Qed. 717 | 718 | Lemma cConsume_singleton (or : ORegex) (a0 : A) (v0 v1 : valuation) : 719 | unCache (cConsume or ([a0], [v0 ; v1])) = read a0 (consume or ([], [v0])) 720 | /\ synced v1 (cConsume or ([a0], [v0 ; v1])). 721 | Proof. 722 | simpl. 723 | remember (syncVal v0 (toCMarked or)) as cr0. 724 | assert (synced v0 cr0). { 725 | subst. apply synced_syncVal. 726 | } 727 | remember (cFollow true cr0) as cr1. 728 | assert (unCache cr1 = followWith true v0 (unCache cr0)). { 729 | subst cr1. apply synced_unCache_followWith. assumption. 730 | } 731 | remember (cRead a0 cr1) as cr2. 732 | assert (unCache cr2 = read a0 (unCache cr1)). { 733 | subst cr2. apply cRead_unCache. 734 | } 735 | remember (syncVal v1 cr2) as cr3. 736 | assert (synced v1 cr3). { 737 | subst cr3. apply synced_syncVal. 738 | } 739 | split; [ | auto]. 740 | subst cr3. 741 | rewrite syncVal_unCache. 742 | subst cr2. 743 | rewrite cRead_unCache. 744 | rewrite H0. 745 | f_equal. f_equal. 746 | subst cr0. 747 | rewrite syncVal_unCache. 748 | rewrite toCMarked_unCache. 749 | reflexivity. 750 | Qed. 751 | 752 | Lemma cConsume_step (or : ORegex) (w : list A) (o : list valuation) 753 | (a a' : A) (v v' : valuation) : 754 | outer_length_wf (w, o) 755 | -> unCache (cConsume or (w ++ [a], o ++ [v])) = read a (consume or (w, o)) 756 | -> synced v (cConsume or (w ++ [a], o ++ [v])) 757 | -> unCache (cConsume or ((w ++ [a]) ++ [a'], (o ++ [v]) ++ [v'])) = read a' (consume or (w ++ [a], o ++ [v])) 758 | /\ synced v' (cConsume or ((w ++ [a]) ++ [a'], (o ++ [v]) ++ [v'])). 759 | Proof. 760 | intros Hwf Hconsume Hsynced. 761 | rewrite cConsume_snoc. 762 | rewrite consume_snoc; [ | auto]. 763 | split. 764 | - rewrite syncVal_unCache. rewrite cRead_unCache. 765 | erewrite synced_unCache_followWith. 766 | f_equal. f_equal. auto. auto. 767 | - apply synced_syncVal. 768 | - unfold outer_length_wf in *. simpl in Hwf |- *. 769 | rewrite app_length. rewrite app_length. 770 | simpl. auto. 771 | - unfold olength. simpl. rewrite app_length. simpl. lia. 772 | Qed. 773 | 774 | Lemma cConsume_nonempty (or : ORegex) (w : list A) (o : list valuation) 775 | (a : A) (v : valuation) : 776 | outer_length_wf (w, o) 777 | -> unCache (cConsume or (w ++ [a], o ++ [v])) = read a (consume or (w, o)) 778 | /\ synced v (cConsume or (w ++ [a], o ++ [v])). 779 | Proof. 780 | revert o v a. 781 | induction w using rev_ind. { 782 | intros ? ? ? Hwf. 783 | unfold outer_length_wf in Hwf. 784 | simpl in Hwf. 785 | destruct o as [ | o0 o' ]. { simpl in Hwf. lia. } 786 | destruct o' as [ | o1 o'']. 2 : { simpl in Hwf. lia. } 787 | simpl app. 788 | apply cConsume_singleton. 789 | } 790 | intros ? ? ? Hwf. 791 | destruct (unsnoc o) as [[o' oX]|] eqn:E. 792 | 2 : { rewrite unsnoc_None in E. subst o. 793 | unfold outer_length_wf in Hwf. simpl in Hwf. 794 | rewrite app_length in Hwf. simpl in Hwf. lia. 795 | } 796 | rewrite unsnoc_Some in E. 797 | subst o. 798 | assert (outer_length_wf (w, o')). { 799 | unfold outer_length_wf in *. simpl in *. 800 | repeat rewrite app_length in Hwf. 801 | simpl in Hwf. lia. 802 | } 803 | specialize (IHw o' oX x H) as [IH1 IH2]. 804 | apply cConsume_step; auto. 805 | Qed. 806 | 807 | Lemma cmembership_empty (or : ORegex) (v0 : valuation): 808 | cNullable (cConsume or ([], [v0])) = true 809 | <-> match_oregex or ([], [v0]). 810 | Proof. 811 | simpl. 812 | rewrite synced_unCache_nullableWith with (v := v0). 813 | - rewrite syncVal_unCache. 814 | rewrite toCMarked_unCache. 815 | rewrite nullableWith_iff. 816 | now rewrite toMarked_strip. 817 | - apply synced_syncVal. 818 | Qed. 819 | 820 | Lemma cmembership_nonempty (or : ORegex) (w : list A) (o : list valuation) 821 | (a : A) (v : valuation) : 822 | outer_length_wf (w, o) 823 | -> cFinal (cConsume or (w ++ [a], o ++ [v])) = true 824 | <-> match_oregex or (w ++ [a], o ++ [v]). 825 | Proof. 826 | intros Hwf. 827 | pose proof (cConsume_nonempty or w o a v Hwf) as [Hconsume Hsynced]. 828 | rewrite synced_unCache_finalWith with (v := v); [ | auto]. 829 | rewrite Hconsume. 830 | replace w with (fst (w, o)) by auto. 831 | replace o with (snd (w, o)) at 2 4 by auto. 832 | now apply membership_nonempty. 833 | Qed. 834 | 835 | 836 | 837 | Lemma cScanMatchAux_hd_error (cmr : CMRegex) (w : list A) (o : list valuation) : 838 | hd_error (cScanMatchAux cmr w o) = Some (cFinal cmr). 839 | Proof. 840 | destruct w; destruct o; auto. 841 | Qed. 842 | 843 | Lemma cScanMatchAux_cons (cmr : CMRegex) (w : list A) (o : list valuation) (a0 : A) (v0 : valuation) : 844 | cScanMatchAux cmr (a0 :: w) (v0 :: o) = 845 | cFinal cmr :: cScanMatchAux (syncVal v0 (cRead a0 (cFollow false cmr))) w o. 846 | Proof. 847 | reflexivity. 848 | Qed. 849 | 850 | Lemma cScanMatchAux_length (cmr : CMRegex) (w : list A) (o : list valuation) : 851 | length w = length o 852 | -> length (cScanMatchAux cmr w o) = S (length w). 853 | Proof. 854 | revert cmr o. 855 | induction w. 856 | - intros cmr o Hlen. destruct o. 2 : { simpl in Hlen. lia. } 857 | simpl. reflexivity. 858 | - intros cmr o Hlen. destruct o. 1 : { simpl in Hlen. lia. } 859 | simpl in Hlen. 860 | simpl. rewrite IHw; lia. 861 | Qed. 862 | 863 | Lemma cScanMatch_length (or : ORegex) (os : @ostring A) : 864 | outer_length_wf os 865 | -> length (cScanMatch or os) = S (olength os). 866 | Proof. 867 | intros Hwf. 868 | unfold outer_length_wf in Hwf. 869 | destruct os as [w o]. 870 | unfold olength. simpl fst. 871 | destruct o as [ | o0 o']. 1 : { simpl in Hwf. lia. } 872 | destruct o' as [ | o1 o'']. { 873 | destruct w as [ | a0 w']. 2 : { simpl in Hwf. lia. } 874 | auto. 875 | } 876 | destruct w as [ | a0 w']. 1 : { simpl in Hwf. lia. } 877 | simpl. rewrite cScanMatchAux_length; auto. 878 | simpl in Hwf. lia. 879 | Qed. 880 | 881 | Lemma cScanMatchAux_tl (or : ORegex) (w w' : list A) (o o' : list valuation) 882 | (a' : A) (v' : valuation) : 883 | outer_length_wf (w, o) 884 | -> olength (w, o) > 0 885 | -> tl (cScanMatchAux (cConsume or (w, o)) (a' :: w' ) (v' :: o')) = 886 | cScanMatchAux (cConsume or (w ++ [a'], o ++ [v'])) w' o'. 887 | Proof. 888 | intros Hwf Hlen. rewrite cConsume_snoc; auto. 889 | Qed. 890 | 891 | Lemma cScanMatchAux_skipn (or : ORegex) (w w': list A) (o o': list valuation) 892 | (n : nat) : 893 | outer_length_wf (w, o) 894 | -> length w' = length o' 895 | -> olength (w, o) > 0 896 | -> n <= length w' 897 | -> skipn n (cScanMatchAux (cConsume or (w, o)) w' o') = 898 | cScanMatchAux (cConsume or (w ++ firstn n w', o ++ firstn n o')) (skipn n w') (skipn n o'). 899 | Proof. 900 | intros Hwf Hlen Hlen'. 901 | revert w o w' o' Hwf Hlen Hlen'. 902 | induction n. { 903 | intros w o w' o' Hwf Hlen' Hlen Hn. 904 | repeat rewrite skipn_O. repeat rewrite firstn_O. 905 | repeat rewrite app_nil_r. reflexivity. 906 | } { 907 | intros w o w' o' Hwf Hlen' Hlen Hn. 908 | destruct w' as [ | a' w'']. { 909 | simpl in Hn. lia. 910 | } 911 | destruct o' as [ | v' o'']. { 912 | simpl in Hlen'. lia. 913 | } 914 | simpl skipn at 2 3. 915 | simpl firstn. 916 | rewrite cScanMatchAux_cons. 917 | rewrite skipn_cons. 918 | rewrite <- cConsume_snoc; auto. 919 | rewrite IHn; auto. 920 | f_equal. f_equal. f_equal. 921 | - rewrite <- app_assoc. reflexivity. 922 | - rewrite <- app_assoc. reflexivity. 923 | - unfold outer_length_wf in Hwf |- *. simpl in Hwf |- *. 924 | repeat rewrite app_length. simpl. lia. 925 | - unfold olength in Hlen |- *. simpl in Hlen |- *. 926 | rewrite app_length. simpl. lia. 927 | - simpl in Hn. lia. 928 | } 929 | Qed. 930 | 931 | Lemma cScanMatch_nth_error_S (or : ORegex) (w : list A) (o : list valuation) (n : nat) : 932 | outer_length_wf (w, o) 933 | -> n > 0 934 | -> n <= length w 935 | -> nth_error (cScanMatch or (w, o)) n = 936 | Some (cFinal (cConsume or (ofirstn n (w, o)))). 937 | Proof. 938 | intros Hwf Hn1 Hn. 939 | (* n != 0 *) 940 | destruct n. lia. 941 | (* when n = 1 *) 942 | destruct n. { 943 | destruct o as [ | o0 o']. 1 : { unfold outer_length_wf in Hwf. simpl in Hwf. lia. } 944 | destruct o' as [ | o1 o'']. 1 : { unfold outer_length_wf in Hwf. simpl in Hwf. lia. } 945 | destruct w as [ | a0 w']. 1 : { unfold outer_length_wf in Hwf. simpl in Hwf. lia. } 946 | unfold ofirstn. simpl firstn. 947 | simpl cScanMatch. 948 | remember (cScanMatchAux _ _). 949 | remember 0 as zero. 950 | replace 1 with (S zero) by auto. 951 | simpl nth_error. subst zero. 952 | rewrite <- hd_error_nth_error. 953 | subst l. rewrite cScanMatchAux_hd_error. 954 | reflexivity. 955 | } 956 | (* when n >= 2 *){ 957 | destruct o as [ | o0 o']. 1 : { unfold outer_length_wf in Hwf. simpl in Hwf. lia. } 958 | destruct o' as [ | o1 o'']. 1 : { unfold outer_length_wf in Hwf. simpl in Hwf. lia. } 959 | destruct w as [ | a0 w']. 1 : { unfold outer_length_wf in Hwf. simpl in Hwf. lia. } 960 | destruct w' as [ | a1 w'']. 1 : { simpl in Hn. lia. } 961 | destruct o'' as [ | o2 o''']. 1 : { unfold outer_length_wf in Hwf. simpl in Hwf. lia. } 962 | unfold ofirstn. simpl firstn. 963 | simpl cScanMatch. 964 | simpl nth_error. 965 | remember (syncVal _ _) as cr2. 966 | replace cr2 with (cConsume or ([a0; a1], [o0; o1; o2])) by auto. 967 | clear Heqcr2. 968 | replace ((a0 :: a1 :: firstn n w'', o0 :: o1 :: o2 :: firstn n o''')) 969 | with (([a0; a1] ++ firstn n w'', [o0; o1; o2] ++ firstn n o''')) by auto. 970 | replace n with (0 + n) at 1 by auto. 971 | rewrite nth_error_skipn. rewrite <- hd_error_nth_error. 972 | rewrite cScanMatchAux_skipn. 973 | - rewrite cScanMatchAux_hd_error. reflexivity. 974 | - unfold outer_length_wf. reflexivity. 975 | - unfold outer_length_wf in Hwf. simpl in Hwf. lia. 976 | - unfold olength. simpl. lia. 977 | - simpl in Hn. lia. 978 | } 979 | Qed. 980 | 981 | Lemma cScanMatch_nth_error_O (or : ORegex) (w : list A) (o : list valuation) : 982 | outer_length_wf (w, o) 983 | -> nth_error (cScanMatch or (w, o)) 0 = 984 | Some (cNullable (cConsume or (ofirstn 0 (w, o)))). 985 | Proof. 986 | intros Hwf. 987 | destruct o as [ | o0 o']. 1 : { unfold outer_length_wf in Hwf. simpl in Hwf. lia. } 988 | unfold ofirstn. simpl firstn. 989 | rewrite <- hd_error_nth_error. 990 | destruct w; destruct o'; auto. 991 | all : unfold outer_length_wf in Hwf; simpl in Hwf; lia. 992 | Qed. 993 | 994 | Lemma cScanMatch_tape (r : ORegex) (os : @ostring A) : 995 | outer_length_wf os 996 | -> (length (cScanMatch r os) = olength os + 1) /\ 997 | forall i, 998 | i <= olength os -> 999 | (nth_error (cScanMatch r os) i = Some true <-> match_oregex r (ofirstn i os)) /\ 1000 | (nth_error (cScanMatch r os) i = Some false <-> ~ match_oregex r (ofirstn i os)). 1001 | Proof. 1002 | intros Hwf. 1003 | split. 1004 | { rewrite cScanMatch_length. lia. assumption. } 1005 | intros i Hlen. 1006 | destruct i. 1007 | (* when i is 0 *) { 1008 | destruct os as [w o]. 1009 | rewrite cScanMatch_nth_error_O; auto. 1010 | destruct o as [ | o0 o']. 1 : { unfold outer_length_wf in Hwf. simpl in Hwf. lia. } 1011 | unfold ofirstn. simpl. 1012 | pose proof (cConsume_empty r o0) as [_ Hsynced]. 1013 | pose proof (cmembership_empty r o0) as Hmembership. 1014 | clear Hsynced. 1015 | repeat split; intros. 1016 | - inversion H; tauto. 1017 | - rewrite <- Hmembership in H. simpl in H. now rewrite H. 1018 | - inversion H. intros M. apply Hmembership in M. 1019 | simpl in M. rewrite H1 in M. discriminate. 1020 | - simpl in Hmembership. remember (cNullable _) as b. 1021 | destruct b; [ tauto | auto]. 1022 | } 1023 | (* when i > 0 *) { 1024 | destruct os as [w o]. 1025 | rewrite cScanMatch_nth_error_S; auto. 2: lia. 1026 | unfold outer_length_wf in Hwf. unfold olength in Hlen. 1027 | simpl in Hlen, Hwf. 1028 | replace (ofirstn (S i) (w, o)) with ((firstn (S i) w, firstn (S (S i)) o)) by auto. 1029 | destruct (unsnoc (firstn (S i) w)) as [[w' a] | ] eqn:Ew. 1030 | 2 : { 1031 | rewrite unsnoc_None in Ew. 1032 | apply f_equal with (f := @length A) in Ew. 1033 | rewrite firstn_length in Ew. 1034 | simpl length in Ew. lia. 1035 | } 1036 | rewrite unsnoc_Some in Ew. rewrite Ew. 1037 | destruct (unsnoc (firstn (S (S i)) o)) as [[o' v] | ] eqn:Eo. 1038 | 2 : { 1039 | rewrite unsnoc_None in Eo. 1040 | apply f_equal with (f := @length valuation) in Eo. 1041 | rewrite firstn_length in Eo. 1042 | simpl length in Eo. lia. 1043 | } 1044 | rewrite unsnoc_Some in Eo. rewrite Eo. 1045 | assert (outer_length_wf (w', o')). { 1046 | unfold outer_length_wf. simpl. 1047 | apply f_equal with (f := @length A) in Ew. 1048 | rewrite firstn_length in Ew. 1049 | apply f_equal with (f := @length valuation) in Eo. 1050 | rewrite firstn_length in Eo. 1051 | rewrite app_length in Ew, Eo. simpl length in Ew, Eo. 1052 | lia. 1053 | } 1054 | pose proof (cConsume_nonempty r w' o' a v H) as [_ Hsynced]. 1055 | pose proof (cmembership_nonempty r w' o' a v H) as Hmembership. 1056 | clear Hsynced H. 1057 | repeat split; intros. 1058 | - inversion H; tauto. 1059 | - rewrite <- Hmembership in H. now rewrite H. 1060 | - remember ((cConsume r (w' ++ [a], o' ++ [v]))). 1061 | inversion H. intros M. apply Hmembership in M. 1062 | simpl in M. rewrite H1 in M. discriminate. 1063 | - remember ((cConsume r (w' ++ [a], o' ++ [v]))). 1064 | remember (cFinal _) as b. 1065 | destruct b; [ tauto | auto]. 1066 | } 1067 | Qed. 1068 | 1069 | End CMRegex. -------------------------------------------------------------------------------- /theories/Equations.v: -------------------------------------------------------------------------------- 1 | (** 2 | 3 | This file contains a number of useful lemmas and relations which allow equational reasoning on regular expressions. 4 | 5 | 1. The [regex_eq : LRegex -> LRegex -> Prop] relation is an equivalence relation defined on regular expressions. 6 | - We use the notation [r1 ≡ r2] to denote that [r1] and [r2] are equivalent. 7 | - We prove that [regex_eq] is a congruence with respect to the regex constructors. 8 | 2. The [regex_leq : LRegex -> LRegex -> Prop] relation is a partial order defined on regular expressions. 9 | - We use the notation [r1 ⊑ r2] to denote that whenever 10 | [r1] matches [w] between [i, j], then so does [r2]. 11 | - We prove that the regex constructors are monotonic with respect to [regex_leq]. 12 | 3. We prove a number of lemmas involving these relations. 13 | - This includes the Kleene Axioms 14 | - A number of identitites involving lookaheads and lookbehinds 15 | 16 | *) 17 | 18 | Require Import Lia. 19 | Require Import Coq.Arith.Wf_nat. 20 | 21 | 22 | Require Import Coq.Lists.List. 23 | 24 | Require Export Setoid. 25 | Require Import Morphisms Setoid. 26 | Require Export Relation_Definitions. 27 | Require Import Relation_Operators. 28 | Require Import Operators_Properties. 29 | 30 | Require Import LRegex. 31 | 32 | Open Scope bool_scope. 33 | 34 | Section Equations. 35 | 36 | Context {A : Type}. 37 | 38 | Definition regex_eq (r1 r2 : @LRegex A) : Prop := 39 | forall w start delta, 40 | match_regex r1 w start delta <-> match_regex r2 w start delta. 41 | 42 | Infix "≡" := regex_eq (at level 70, no associativity). 43 | 44 | Definition regex_leq (r s : LRegex) : Prop := 45 | r ∪ s ≡ s. 46 | 47 | Notation "r ⊑ s" := (regex_leq r s) (at level 70). 48 | 49 | Definition suffix_match (r : LRegex) (w : list A) (i : nat) : Prop := 50 | match_regex r w i (length w - i). 51 | 52 | Definition wildcard : (@LRegex A) := (CharClass (fun _ => true)) *. 53 | 54 | Definition end_of_string : (@LRegex A) := 55 | (?!> (CharClass (fun _ => true)) · wildcard). 56 | 57 | Fixpoint regex_exp (r : LRegex) (n : nat) : (@LRegex A) := 58 | match n with 59 | | 0 => Epsilon 60 | | S n' => r · (regex_exp r n') 61 | end. 62 | 63 | Notation "r ^^ n" := (regex_exp r n) (at level 30). 64 | 65 | Definition empty_reg : @LRegex A := 66 | CharClass (fun _ => false). 67 | 68 | Notation "∅" := empty_reg. 69 | 70 | Lemma match_length : forall r (w : list A) start delta, 71 | match_regex r w start delta -> delta <= length w - start. 72 | Proof. 73 | induction r; intros. 74 | - rewrite match_eps_iff in H. lia. 75 | - rewrite match_class_iff in H. 76 | destruct H as [a [Ha1 [Ha2 H1]]]. 77 | assert (nth_error w start <> None) by congruence. 78 | apply nth_error_Some in H. lia. 79 | - rewrite match_concat_iff in H. 80 | destruct H as [d1 [d2 [H1 [H2 H3]]]]. 81 | apply IHr1 in H1. apply IHr2 in H2. lia. 82 | - rewrite match_union_iff in H. 83 | destruct H as [H1 | H2]. 84 | + apply IHr1 in H1. lia. 85 | + apply IHr2 in H2. lia. 86 | - remember (Star r) as r'. 87 | induction H; subst; try discriminate. 88 | + lia. 89 | + inversion Heqr'; subst. 90 | apply IHmatch_regex2 in Heqr'. 91 | apply IHr in H. lia. 92 | - rewrite match_lookahead_iff in H. 93 | lia. 94 | - rewrite match_lookbehind_iff in H. 95 | lia. 96 | - rewrite match_neglookahead_iff in H. 97 | lia. 98 | - rewrite match_neglookbehind_iff in H. 99 | lia. 100 | Qed. 101 | 102 | Instance regex_eq_refl : Reflexive regex_eq. 103 | Proof. 104 | intros r. unfold regex_eq. tauto. 105 | Qed. 106 | 107 | Instance regex_eq_sym : Symmetric regex_eq. 108 | Proof. 109 | intros r1 r2. unfold regex_eq. firstorder. 110 | Qed. 111 | 112 | Instance regex_eq_trans : Transitive regex_eq. 113 | Proof. 114 | intros r1 r2 r3. unfold regex_eq. firstorder. 115 | + rewrite <- H0. rewrite <- H. auto. 116 | + rewrite H. rewrite H0. auto. 117 | Qed. 118 | 119 | Hint Resolve regex_eq_sym regex_eq_refl regex_eq_trans : regex. 120 | 121 | Instance regex_eq_equiv : Equivalence regex_eq. 122 | Proof. 123 | split; auto with regex. 124 | Qed. 125 | 126 | Hint Resolve regex_eq_equiv : regex. 127 | 128 | Lemma concat_eps_l : forall (r : LRegex), 129 | Epsilon · r ≡ r. 130 | Proof. 131 | unfold regex_eq. intros. 132 | rewrite match_concat_iff. 133 | split. 134 | + intros [d1 [d2 [H1 [H2 H3]]]]. 135 | rewrite match_eps_iff in H1. 136 | subst. now replace (start + 0) with start in H2 by lia. 137 | + intros. exists 0, delta. 138 | repeat split. 139 | - now apply match_eps_iff. 140 | - replace (start + 0) with start by lia. auto. 141 | Qed. 142 | 143 | Lemma concat_eps_r : forall (r : LRegex), 144 | r · Epsilon ≡ r. 145 | Proof. 146 | unfold regex_eq. intros. 147 | rewrite match_concat_iff. 148 | split. 149 | + intros [d1 [d2 [H1 [H2 H3]]]]. 150 | rewrite match_eps_iff in H2. 151 | subst. now replace (d1 + 0) with d1 by lia. 152 | + intros. exists delta, 0. 153 | repeat split; [auto | | lia]. 154 | - now apply match_eps_iff. 155 | Qed. 156 | 157 | Lemma concat_assoc : forall r1 r2 r3, 158 | (r1 · r2) · r3 ≡ r1 · (r2 · r3). 159 | Proof. 160 | intros. unfold regex_eq. intros. 161 | rewrite match_concat_iff. rewrite match_concat_iff. 162 | split. 163 | - intros [d12 [d3 [H1 [H2 H3]]]]. 164 | rewrite match_concat_iff in H1. 165 | destruct H1 as [d1 [d2 [H1 [H4 H5]]]]. 166 | exists d1, (d2 + d3). 167 | repeat split; [auto| |lia]. 168 | rewrite match_concat_iff. 169 | exists d2, d3. 170 | repeat split; auto. 171 | now replace (start + d1 + d2) with (start + d12) by lia. 172 | - intros [d1 [d23 [H1 [H2 H3]]]]. 173 | rewrite match_concat_iff in H2. 174 | destruct H2 as [d2 [d3 [H2 [H4 H5]]]]. 175 | exists (d1 + d2), d3. 176 | repeat split; [auto| |lia]. 177 | rewrite match_concat_iff. 178 | exists d1, d2. 179 | repeat split; auto. 180 | now replace (start + (d1 + d2)) with (start + d1 + d2) by lia. 181 | Qed. 182 | 183 | Hint Rewrite concat_eps_l concat_eps_r concat_assoc : regex. 184 | 185 | Lemma union_assoc : forall (r1 r2 r3 : LRegex), 186 | (r1 ∪ r2) ∪ r3 ≡ r1 ∪ (r2 ∪ r3). 187 | Proof. 188 | unfold regex_eq. intros. 189 | repeat rewrite match_union_iff. 190 | tauto. 191 | Qed. 192 | 193 | Lemma union_comm : forall (r1 r2 : LRegex), 194 | r1 ∪ r2 ≡ r2 ∪ r1. 195 | Proof. 196 | unfold regex_eq. intros. 197 | repeat rewrite match_union_iff. 198 | tauto. 199 | Qed. 200 | 201 | Lemma union_idemp : forall (r : LRegex), 202 | r ∪ r ≡ r. 203 | Proof. 204 | unfold regex_eq. intros. 205 | repeat rewrite match_union_iff. 206 | tauto. 207 | Qed. 208 | 209 | Hint Rewrite union_assoc union_comm union_idemp : regex. 210 | 211 | Lemma concat_distrib_union_l : forall (r s t : LRegex), 212 | r · (s ∪ t) ≡ (r · s) ∪ (r · t). 213 | Proof. 214 | unfold regex_eq. intros. 215 | rewrite match_concat_iff. 216 | rewrite match_union_iff. 217 | repeat rewrite match_concat_iff. 218 | split. 219 | - intros [d1 [d2 [H1 [H2 H3]]]]. 220 | rewrite match_union_iff in H2. 221 | destruct H2 as [H2 | H2]. 222 | + left. exists d1, d2. repeat split; auto. 223 | + right. exists d1, d2. repeat split; auto. 224 | - intros [H1 | H1]. 225 | + destruct H1 as [d1 [d2 [H1 [H2 H3]]]]. 226 | exists d1, d2. repeat split; auto. 227 | rewrite match_union_iff. auto. 228 | + destruct H1 as [d1 [d2 [H1 [H2 H3]]]]. 229 | exists d1, d2. repeat split; auto. 230 | rewrite match_union_iff. auto. 231 | Qed. 232 | 233 | Lemma concat_distrib_union_r : forall (r s t : LRegex), 234 | (r ∪ s) · t ≡ (r · t) ∪ (s · t). 235 | Proof. 236 | unfold regex_eq. intros. 237 | rewrite match_concat_iff. 238 | rewrite match_union_iff. 239 | repeat rewrite match_concat_iff. 240 | split. 241 | - intros [d1 [d2 [H1 [H2 H3]]]]. 242 | rewrite match_union_iff in H1. 243 | destruct H1 as [H1 | H1]. 244 | + left. exists d1, d2. repeat split; auto. 245 | + right. exists d1, d2. repeat split; auto. 246 | - intros [H1 | H1]. 247 | + destruct H1 as [d1 [d2 [H1 [H2 H3]]]]. 248 | exists d1, d2. repeat split; auto. 249 | rewrite match_union_iff. auto. 250 | + destruct H1 as [d1 [d2 [H1 [H2 H3]]]]. 251 | exists d1, d2. repeat split; auto. 252 | rewrite match_union_iff. auto. 253 | Qed. 254 | 255 | Hint Rewrite concat_distrib_union_l concat_distrib_union_r : regex. 256 | 257 | Lemma lookahead_comm : forall (r1 r2 : LRegex), 258 | (?> r1) · (?> r2) ≡ (?> r2) · (?> r1). 259 | Proof. 260 | unfold regex_eq. intros. 261 | repeat rewrite match_concat_iff. 262 | split. 263 | - intros [d1 [d2 [H1 [H2 H3]]]]. 264 | rewrite match_lookahead_iff in H1. 265 | rewrite match_lookahead_iff in H2. 266 | destruct H1 as [H11 H12]. 267 | destruct H2 as [H21 H22]. 268 | subst. exists 0, 0. 269 | replace (start + 0) with start in * by lia. 270 | repeat rewrite match_lookahead_iff. 271 | firstorder. 272 | - intros [d1 [d2 [H1 [H2 H3]]]]. 273 | rewrite match_lookahead_iff in H1. 274 | rewrite match_lookahead_iff in H2. 275 | destruct H1 as [H11 H12]. 276 | destruct H2 as [H21 H22]. 277 | subst. exists 0, 0. 278 | replace (start + 0) with start in * by lia. 279 | repeat rewrite match_lookahead_iff. 280 | firstorder. 281 | Qed. 282 | 283 | Lemma lookahead_star_eps : forall (r : LRegex), 284 | (?> r) * ≡ Epsilon. 285 | Proof. 286 | unfold regex_eq. intros. 287 | rewrite match_eps_iff. 288 | split; intros. 289 | - remember ( (?> r) *) as e. 290 | induction H; inversion Heqe. 291 | split. 292 | subst. 293 | assert (d2 = 0) by now apply IHmatch_regex2. 294 | rewrite match_lookahead_iff in H. 295 | lia. 296 | - subst. constructor. 297 | Qed. 298 | 299 | Lemma lookahead_distrib_union : forall (r s : LRegex), 300 | (?> r) ∪ (?> s) ≡ (?> r ∪ s). 301 | Proof. 302 | unfold regex_eq. intros. 303 | rewrite match_union_iff. 304 | repeat rewrite match_lookahead_iff. 305 | rewrite match_union_iff. 306 | tauto. 307 | Qed. 308 | 309 | Lemma lookahead_neglookahead_inverse : forall (r : LRegex), 310 | (?> r) ∪ (?!> r) ≡ Epsilon. 311 | Proof. 312 | unfold regex_eq. intros. 313 | rewrite match_union_iff. 314 | pose proof match_lem r w start (length w - start) as [Hr | Hr]; 315 | rewrite match_eps_iff; 316 | rewrite match_lookahead_iff; 317 | rewrite match_neglookahead_iff; 318 | rewrite <- match_not_match; 319 | firstorder. 320 | Qed. 321 | 322 | Hint Rewrite lookahead_comm lookahead_star_eps lookahead_distrib_union lookahead_neglookahead_inverse : regex. 323 | 324 | Lemma lookahead_conj (r s : LRegex) : 325 | forall start w, 326 | suffix_match ((?> r) · s) w start 327 | <-> (suffix_match r w start /\ suffix_match s w start). 328 | Proof. 329 | intros. unfold suffix_match. 330 | rewrite match_concat_iff. 331 | split. 332 | - intros [d1 [d2 [H1 [H2 H3]]]]. 333 | rewrite match_lookahead_iff in H1. 334 | destruct H1 as [H11 H12]. 335 | subst. simpl. 336 | replace (start + 0) with start in H2 by lia. 337 | simpl in H3. subst. auto. 338 | - intros [H1 H2]. 339 | exists 0, (length w - start). 340 | replace (start + 0) with start by lia. 341 | simpl. 342 | repeat split; auto. 343 | + now constructor. 344 | Qed. 345 | 346 | Lemma wildcard_match : forall w start delta, 347 | delta <= length w - start 348 | -> match_regex wildcard w start delta. 349 | Proof. 350 | intros w start delta. revert w start. 351 | induction delta. 352 | - constructor. 353 | - intros. replace (S delta) with (1 + delta) by lia. 354 | unfold wildcard. 355 | constructor. 356 | + eapply match_class; [auto | ]. 357 | apply nth_error_nth'. 358 | lia. 359 | + apply IHdelta. 360 | lia. 361 | Unshelve. 362 | destruct w eqn:E. simpl in H. lia. 363 | exact a. 364 | Qed. 365 | 366 | Lemma wildcard_match_iff : forall w start delta, 367 | delta <= length w - start 368 | <-> match_regex wildcard w start delta. 369 | Proof. 370 | split. 371 | - apply wildcard_match. 372 | - apply match_length. 373 | Qed. 374 | 375 | Lemma lookahead_flatten_wildcard : forall (r s: LRegex), 376 | (?> r · (?> s) · wildcard) ≡ (?> r · s). 377 | Proof. 378 | unfold regex_eq. intros. 379 | repeat rewrite match_lookahead_iff. 380 | repeat rewrite match_concat_iff. 381 | split. 382 | - intros [[d1 [d2 [H1 [H2 H3]]]] Hdelta]. 383 | rewrite match_concat_iff in H2. 384 | destruct H2 as [d3 [d4 [H2 [H4 H5]]]]. 385 | rewrite match_lookahead_iff in H2. 386 | destruct H2 as [H2 H6]. 387 | subst. split; [| auto]. 388 | exists d1, (length w - (start + d1)). 389 | repeat split; auto. 390 | lia. 391 | - intros [[d1 [d2 [H1 [H2 H3]]]] Hdelta]. 392 | split; [| auto]. 393 | exists d1, (length w - (start + d1)). 394 | split; [auto | split; [ | lia]]. 395 | apply lookahead_conj. split. 396 | + unfold suffix_match. 397 | replace (length w - (start + d1)) with d2 by lia. 398 | auto. 399 | + apply wildcard_match. lia. 400 | Qed. 401 | 402 | Lemma lookahead_flatten : forall (r s: LRegex), 403 | (?> (?> r) · s) ≡ (?> r) · (?> s). 404 | Proof. 405 | unfold regex_eq. intros. 406 | repeat rewrite match_lookahead_iff. 407 | repeat rewrite match_concat_iff. 408 | split. 409 | - intros [[d1 [d2 [H1 [H2 H3]]]] Hdelta]. 410 | subst delta. 411 | exists 0, 0. 412 | simpl. 413 | repeat rewrite match_lookahead_iff. 414 | rewrite match_lookahead_iff in H1. 415 | destruct H1 as [H1 H4]. 416 | subst d1. simpl in H3. subst d2. 417 | replace (start + 0) with start in * by lia. 418 | auto. 419 | - intros [d1 [d2 [Hr [Hs Hd]]]]. 420 | rewrite match_lookahead_iff in Hr. 421 | rewrite match_lookahead_iff in Hs. 422 | destruct Hr as [Hr Hr']. 423 | destruct Hs as [Hs Hs']. 424 | subst. split; auto. 425 | exists 0, (length w - start). 426 | rewrite match_lookahead_iff. 427 | replace (start + 0) with start in * by lia. 428 | split; auto. 429 | Qed. 430 | 431 | Lemma lookahead_neglookahead_eps (r : LRegex) : 432 | (?> r) ∪ (?!> r) ≡ Epsilon. 433 | Proof. 434 | unfold regex_eq. intros. 435 | rewrite match_union_iff. 436 | pose proof match_lem r w start (length w - start) as [Hr | Hr]; 437 | rewrite match_eps_iff; 438 | rewrite match_lookahead_iff; 439 | rewrite match_neglookahead_iff; 440 | rewrite <- match_not_match; 441 | firstorder. 442 | Qed. 443 | 444 | Lemma lookahed_neglookahead_concat (r s : LRegex) : 445 | (?> (?!> r) · s) ≡ (?!> r) · (?> s). 446 | Proof. 447 | unfold regex_eq. intros. 448 | rewrite match_concat_iff. 449 | rewrite match_lookahead_iff. 450 | rewrite match_concat_iff. 451 | split. 452 | - intros [[d1 [d2 [H1 [H2 H3]]]] Hdelta]. 453 | subst delta. 454 | exists 0, 0. 455 | rewrite match_neglookahead_iff in H1. 456 | destruct H1 as [H1 H4]. 457 | subst d1. simpl in *. subst d2. 458 | repeat split. 459 | + rewrite match_neglookahead_iff. 460 | split; auto. 461 | + replace (start + 0) with start in * by lia. 462 | rewrite match_lookahead_iff. 463 | split; auto. 464 | - intros [d1 [d2 [H1 [H2 H3]]]]. 465 | rewrite match_neglookahead_iff in H1. 466 | rewrite match_lookahead_iff in H2. 467 | destruct H1 as [H1 H4]. 468 | destruct H2 as [H2 H5]. 469 | subst. split; [ | auto]. 470 | exists 0, (length w - start). 471 | repeat split. 472 | + rewrite match_neglookahead_iff. 473 | split; auto. 474 | + replace (start + 0) with start in * by lia. 475 | auto. 476 | Qed. 477 | 478 | 479 | Lemma neglookahead_union (r s : LRegex) : 480 | (?!> (?> r) · s) ≡ (?!> r) ∪ (?!> s). 481 | Proof. 482 | unfold regex_eq. intros. 483 | rewrite match_union_iff. 484 | rewrite match_neglookahead_iff. 485 | rewrite not_match_concat_iff. 486 | repeat rewrite match_neglookahead_iff. 487 | split. 488 | - intros [H Hdelta]. subst delta. 489 | specialize (H 0 ltac:(lia)). 490 | destruct H. 491 | + rewrite not_match_lookahead_iff in H. 492 | destruct H; [ | congruence]. 493 | tauto. 494 | + replace (start + 0) with start in H by lia. 495 | replace (length w - start - 0) with (length w - start) in H by lia. 496 | tauto. 497 | - intros [[H Hd] | [H Hd]]; subst delta. 498 | + split; [ | auto]. 499 | intros d Hd. 500 | rewrite not_match_lookahead_iff. 501 | tauto. 502 | + split; [ | auto]. 503 | intros d Hd. 504 | assert (d = 0 \/ d <> 0) as [Hd' | Hd'] by lia. 505 | * subst. replace (start + 0) with start by lia. 506 | replace (length w - start - 0) with (length w - start) by lia. 507 | auto. 508 | * left. rewrite not_match_lookahead_iff. 509 | right. auto. 510 | Qed. 511 | 512 | Lemma double_neglookahead_lookahead (r : LRegex) : 513 | (?!> (?!> r · wildcard) · wildcard) ≡ (?> r · wildcard). 514 | Proof. 515 | unfold regex_eq. intros. 516 | split. 517 | - intros. 518 | rewrite match_lookahead_iff. 519 | rewrite match_neglookahead_iff in H. 520 | rewrite not_match_concat_iff in H. 521 | destruct H. 522 | subst delta. split; [ | auto]. 523 | rewrite match_concat_iff. 524 | specialize (H 0 ltac:(lia)). 525 | destruct H as [H | H]. 526 | + rewrite not_match_neglookahead_iff in H. 527 | destruct H ; [ | congruence]. 528 | rewrite match_concat_iff in H. 529 | auto. 530 | + rewrite <- match_not_match in H. 531 | replace (start + 0) with start in H by lia. 532 | replace (length w - start - 0) with (length w - start) in H by lia. 533 | pose proof wildcard_match w start (length w - start) ltac:(lia) as Hwild. 534 | contradiction. 535 | - intros. 536 | rewrite match_neglookahead_iff. 537 | rewrite not_match_concat_iff. 538 | rewrite match_lookahead_iff in H. 539 | rewrite match_concat_iff in H. 540 | destruct H as [[d1 [d2 [H1 [H2 H3]]]] Hd]. 541 | subst delta. 542 | split; [ | auto]. 543 | intros d Hd. 544 | rewrite not_match_neglookahead_iff. 545 | left. 546 | rewrite match_concat_iff. left. 547 | exists d1, d2. 548 | tauto. 549 | Qed. 550 | 551 | Lemma neglookahead_unsat (r : LRegex) : 552 | (forall w start delta, ~ match_regex r w start delta) 553 | -> (?> r) ≡ r. 554 | Proof. 555 | intros Hunsat. 556 | unfold regex_eq. intros. 557 | rewrite match_lookahead_iff. 558 | split; intros. 559 | - destruct H. specialize (Hunsat w start (length w - start)). 560 | contradiction. 561 | - specialize (Hunsat w start delta). 562 | contradiction. 563 | Qed. 564 | 565 | 566 | Lemma lookahead_neglookahed_union (r s : LRegex) : 567 | (?!> (?!> r) · s) ≡ (?> r) ∪ (?!> s). 568 | Proof. 569 | unfold regex_eq. intros. 570 | rewrite match_union_iff. 571 | rewrite match_lookahead_iff. 572 | repeat rewrite match_neglookahead_iff. 573 | rewrite not_match_concat_iff. 574 | split. 575 | - intros [H Hdelta]. subst delta. 576 | specialize (H 0 ltac:(lia)). 577 | rewrite not_match_neglookahead_iff in H. 578 | replace (start + 0) with start in H by lia. 579 | replace (length w - start - 0) with (length w - start) in H by lia. 580 | tauto. 581 | - intros [[H Hd] | [H Hd]]; subst delta. 582 | + split; [ | auto]. 583 | intros d Hd. 584 | rewrite not_match_neglookahead_iff. 585 | tauto. 586 | + split; [ | auto]. 587 | intros d Hd. 588 | assert (d = 0 \/ d <> 0) as [Hd' | Hd'] by lia. 589 | * subst. replace (start + 0) with start by lia. 590 | replace (length w - start - 0) with (length w - start) by lia. 591 | auto. 592 | * left. rewrite not_match_neglookahead_iff. 593 | right. auto. 594 | Qed. 595 | 596 | Lemma lookahead_neglookahead_not_match (r : @LRegex A) : 597 | forall w start delta, 598 | ~ match_regex ((?> r) · (?!> r)) w start delta. 599 | Proof. 600 | intros w start delta. 601 | unfold not. 602 | rewrite match_concat_iff. 603 | intros [d1 [d2 [H1 [H2 H3]]]]. 604 | rewrite match_lookahead_iff in H1. 605 | rewrite match_neglookahead_iff in H2. 606 | rewrite <- match_not_match in H2. 607 | destruct H1 as [H1 H4]. 608 | destruct H2 as [H2 H5]. 609 | subst. replace (start + 0) with start in H2 by lia. 610 | auto. 611 | Qed. 612 | 613 | Lemma peel_lookahead (p1 p2 : A -> bool) (r1 r2 : LRegex) : 614 | (?> (CharClass p1) · r1) · (CharClass p2) · r2 615 | ≡ (CharClass (fun c => p1 c && p2 c)) · (?> r1 ) · r2. 616 | Proof. 617 | unfold regex_eq. intros. 618 | repeat rewrite match_concat_iff. 619 | split. 620 | - intros [d1 [d2 [H1 [H2 H3]]]]. 621 | rewrite match_concat_iff in H2. 622 | destruct H2 as [d3 [d4 [H2 [H4 H5]]]]. 623 | rewrite match_lookahead_iff in H1. 624 | destruct H1 as [H1 H6]. 625 | subst d1. simpl in H3. 626 | rewrite match_concat_iff in H1. 627 | destruct H1 as [d5 [d6 [H1 [H7 H8]]]]. 628 | rewrite match_class_iff in H1. 629 | destruct H1 as [a [Ha [Haa Hd5]]]. 630 | rewrite match_class_iff in H2. 631 | destruct H2 as [b [Hb [Hbb Hd6]]]. 632 | subst d3 d5. 633 | replace (start + 0) with start in * by lia. 634 | assert (a = b). 635 | { rewrite Haa in Hbb. inversion Hbb. auto. } 636 | subst b. 637 | exists 1, (0 + d4). 638 | repeat split; [ | | lia]. 639 | + apply match_class with (a := a). 640 | rewrite Bool.andb_true_iff; auto. 641 | auto. 642 | + apply match_concat. 643 | * apply match_lookahead. 644 | replace (length w - (start + 1)) with d6 by lia. 645 | auto. 646 | * replace (start + 1 + 0) with (start + 1) by lia. 647 | auto. 648 | - intros [d1 [d2 [H1 [H2 H3]]]]. 649 | exists 0, delta. 650 | rewrite match_concat_iff in H2. 651 | destruct H2 as [d3 [d4 [H2 [H4 H5]]]]. 652 | rewrite match_lookahead_iff in H2. 653 | destruct H2 as [H2 H6]. 654 | subst d3. simpl in H5. subst d4. 655 | replace (start + d1 + 0) with (start + d1) in H4 by lia. 656 | rewrite match_class_iff in H1. 657 | destruct H1 as [a [Ha [Haa Hd1]]]. 658 | subst d1. rewrite Bool.andb_true_iff in Ha. 659 | destruct Ha as [Ha1 Ha2]. 660 | repeat split. 661 | + apply match_lookahead. 662 | assert (start < length w). { 663 | apply nth_error_Some. 664 | destruct (nth_error w start) eqn:Hnth; 665 | inversion Haa; discriminate. 666 | } 667 | replace (length w - start) with (1 + (length w - (start + 1))) by lia. 668 | apply match_concat; [ | auto]. 669 | apply match_class with (a := a); auto. 670 | + replace (start + 0) with start by lia. 671 | rewrite H3. 672 | apply match_concat; [ | auto]. 673 | now apply match_class with (a := a). 674 | Qed. 675 | 676 | Hint Rewrite lookahead_flatten_wildcard lookahead_flatten lookahead_neglookahead_eps 677 | peel_lookahead : regex. 678 | Hint Resolve lookahead_neglookahead_not_match : regex. 679 | 680 | Lemma match_star_r (r : LRegex) (w : list A) (start : nat) (d1 d2 : nat) : 681 | match_regex (r *) w start d1 682 | -> match_regex r w (start + d1) d2 683 | -> match_regex (r *) w start (d1 + d2). 684 | Proof. 685 | intros Hmatch. remember (r *) as e. 686 | revert d2. induction Hmatch; intros; inversion Heqe. 687 | - replace (start + 0) with start in H by lia. 688 | replace (0 + d2) with (d2 + 0) by lia. 689 | constructor. auto. constructor. 690 | - subst. replace (d1 + d2 + d0) with (d1 + (d2 + d0)) by lia. 691 | constructor. auto. 692 | apply IHHmatch2. auto. 693 | replace (start + (d1 + d2)) with (start + d1 + d2) in H by lia. 694 | auto. 695 | Qed. 696 | 697 | Lemma lookahead_eps_concat (r : @LRegex A): 698 | forall w start delta, 699 | match_regex ((?> Epsilon) · r) w start delta 700 | -> match_regex ((?> Epsilon)) w start delta. 701 | Proof. 702 | intros w start delta. 703 | rewrite match_concat_iff. 704 | rewrite match_lookahead_iff. 705 | rewrite match_eps_iff. 706 | intros [d1 [d2 [H1 [H2 H3]]]]. 707 | rewrite match_lookahead_iff in H1. 708 | destruct H1 as [H1 H4]. 709 | subst d1. simpl in H3. 710 | rewrite match_eps_iff in H1. 711 | apply match_length in H2. lia. 712 | Qed. 713 | 714 | Lemma lookahead_eps_eps : 715 | ((?> Epsilon) · Epsilon) ≡ (?> Epsilon). 716 | Proof. 717 | unfold regex_eq. split. 718 | - apply lookahead_eps_concat. 719 | - rewrite match_concat_iff. 720 | rewrite match_lookahead_iff. 721 | rewrite match_eps_iff. 722 | intros [H1 H2]. subst. 723 | exists 0, 0. repeat split. 724 | + apply match_lookahead. 725 | rewrite H1. constructor. 726 | + constructor. 727 | Qed. 728 | 729 | Lemma lookahead_eps_charclass (p : A -> bool) (r : @LRegex A) : 730 | forall w start delta, 731 | not_match_regex ((?> Epsilon) · (CharClass p) · r) w start delta. 732 | Proof. 733 | intros. 734 | rewrite <- match_not_match. 735 | unfold not. intros. 736 | rewrite match_concat_iff in H. 737 | destruct H as [d1 [d2 [H [H1 H2]]]]. 738 | rewrite match_lookahead_iff in H. 739 | rewrite match_concat_iff in H1. 740 | destruct H1 as [d3 [d4 [H3 [H4 H5]]]]. 741 | rewrite match_class_iff in H3. 742 | destruct H as [H H6]. 743 | destruct H3 as [a [Ha [Haa H3]]]. 744 | subst d1 d3. 745 | rewrite match_eps_iff in H. 746 | replace (start + 0) with start in Haa by lia. 747 | assert (start < length w). { 748 | apply nth_error_Some. 749 | destruct (nth_error w start) eqn:Hnth; 750 | inversion Haa; discriminate. 751 | } 752 | lia. 753 | Qed. 754 | 755 | Lemma end_of_string_match : forall w start delta, 756 | match_regex end_of_string w start delta 757 | <-> delta = 0 /\ start >= length w. 758 | Proof. 759 | unfold end_of_string. intros. 760 | rewrite match_neglookahead_iff. 761 | rewrite not_match_concat_iff. 762 | split. 763 | - intros [H Hd]. subst. split; [ auto | ]. 764 | assert (length w > start \/ length w <= start) as [Hd | Hd] by lia. 765 | + specialize (H 1 ltac:(lia)). 766 | destruct H. 767 | * rewrite not_match_class_iff in H. 768 | destruct H. congruence. 769 | destruct H. rewrite nth_error_None in H. lia. 770 | destruct H. destruct H. congruence. 771 | * rewrite <- match_not_match in H. 772 | pose proof wildcard_match w (start + 1) (length w - (start + 1)) ltac:(auto) as Hwild. 773 | replace (length w - (start + 1)) with (length w - start - 1) in Hwild by lia. 774 | contradiction. 775 | + lia. 776 | - intros [Hdelta Hstart]. subst. 777 | split; [ | auto]. 778 | intros d Hd. 779 | assert (d = 0) by lia. subst. 780 | left. 781 | rewrite not_match_class_iff. 782 | left. discriminate. 783 | Qed. 784 | 785 | Hint Rewrite lookahead_eps_eps : regex. 786 | 787 | Lemma unmatch_star_r (r : LRegex) (w : list A) (start delta : nat) : 788 | match_regex (r *) w start delta 789 | -> (delta = 0 \/ 790 | exists d1 d2, d1 + d2 = delta 791 | /\ match_regex (r *) w start d1 792 | /\ match_regex r w (start + d1) d2). 793 | Proof. 794 | intros. remember (r *) as e. 795 | induction H; inversion Heqe. 796 | - auto. 797 | - subst. right. 798 | assert (r * = r *) by auto. 799 | apply IHmatch_regex2 in H1. 800 | destruct H1 as [H1 | [d3 [d4 [H1 [H2 H3]]]]]. 801 | + subst. exists 0, d1. repeat split; auto. 802 | * constructor. 803 | * replace (start + 0) with start by lia. 804 | auto. 805 | + exists (d1 + d3), d4. 806 | repeat split; auto. 807 | * lia. 808 | * constructor; auto. 809 | * replace (start + (d1 + d3)) with (start + d1 + d3) by lia. 810 | auto. 811 | Qed. 812 | 813 | Lemma match_star_r_iff (r : LRegex) (w : list A) (start : nat) (delta : nat) : 814 | match_regex (r *) w start delta 815 | <-> (delta = 0 \/ 816 | exists d1 d2, d1 + d2 = delta 817 | /\ match_regex (r *) w start d1 818 | /\ match_regex r w (start + d1) d2). 819 | Proof. 820 | split; [apply unmatch_star_r | ]. 821 | intros [H1 | [d1 [d2 [H1 [H2 H3]]]]]. 822 | - subst. constructor. 823 | - rewrite <- H1. now apply match_star_r. 824 | Qed. 825 | 826 | Lemma match_star_app (r : LRegex) (w : list A) (start d1 d2 : nat) : 827 | match_regex (r *) w start d1 828 | -> match_regex (r *) w (start + d1) d2 829 | -> match_regex (r *) w start (d1 + d2). 830 | Proof. 831 | intros Hmatch. remember (r *) as e. 832 | revert d2. induction Hmatch; intros; inversion Heqe. 833 | - replace (start + 0) with start in H by lia. 834 | replace (0 + d2) with d2 by lia. 835 | subst. auto. 836 | - subst. replace (d1 + d2 + d0) with (d1 + (d2 + d0)) by lia. 837 | constructor. auto. 838 | apply IHHmatch2. auto. 839 | replace (start + (d1 + d2)) with (start + d1 + d2) in H by lia. 840 | apply H. 841 | Qed. 842 | 843 | Lemma match_star_once (r : LRegex) (w : list A) (start d : nat) : 844 | match_regex r w start d 845 | -> match_regex (r *) w start d. 846 | Proof. 847 | intros Hmatch. 848 | replace d with (d + 0) by lia. 849 | constructor. auto. constructor. 850 | Qed. 851 | 852 | Hint Resolve match_star_once match_star_app : regex. 853 | 854 | Lemma star_idemp : forall (r : LRegex), 855 | r * * ≡ r *. 856 | Proof. 857 | unfold regex_eq. intros. 858 | split; intros. 859 | - remember (r * *) as e. 860 | induction H; inversion Heqe. 861 | + subst. constructor. 862 | + subst. apply match_star_app. 863 | * apply H. 864 | * apply IHmatch_regex2. 865 | auto. 866 | - replace delta with (delta + 0) by lia. 867 | constructor; [auto | constructor]. 868 | Qed. 869 | 870 | 871 | 872 | Instance regex_leq_refl : Reflexive regex_leq. 873 | Proof. 874 | unfold regex_leq. 875 | auto using union_idemp. 876 | Defined. 877 | 878 | Instance union_proper : Proper (regex_eq ==> regex_eq ==> regex_eq) Union. 879 | Proof. 880 | unfold regex_eq. 881 | intros r1 r2 Hr s1 s2 Hs. 882 | intros w start delta. 883 | repeat rewrite match_union_iff. 884 | firstorder. 885 | Defined. 886 | 887 | Instance concat_proper : Proper (regex_eq ==> regex_eq ==> regex_eq) Concat. 888 | Proof. 889 | unfold regex_eq. 890 | intros r1 r2 Hr s1 s2 Hs. 891 | intros w start delta. 892 | repeat rewrite match_concat_iff. 893 | split. 894 | - intros [d1 [d2 [H1 [H2 H3]]]]. 895 | exists d1, d2. 896 | repeat split; [ | | auto]. 897 | + apply Hr. auto. 898 | + apply Hs. auto. 899 | - intros [d1 [d2 [H1 [H2 H3]]]]. 900 | exists d1, d2. 901 | repeat split; [ | | auto]. 902 | + apply Hr. auto. 903 | + apply Hs. auto. 904 | Defined. 905 | 906 | Instance star_proper : Proper (regex_eq ==> regex_eq) Star. 907 | Proof. 908 | assert ( 909 | (forall (r1 r2 : LRegex), 910 | (forall (w : list A) (start delta : nat), 911 | match_regex r1 w start delta -> match_regex r2 w start delta) 912 | -> forall w start delta, 913 | match_regex (r1 *) w start delta -> match_regex (r2 *) w start delta)). 914 | { intros r1 r2 Hmatch w start delta. 915 | remember (r1 *) as e. 916 | induction 1; inversion Heqe. 917 | - subst. constructor. 918 | - subst. constructor. 919 | + auto. 920 | + apply IHmatch_regex2. auto. 921 | } 922 | unfold regex_eq. 923 | intros r1 r2 Hr. 924 | intros w start delta. 925 | split; 926 | apply H; apply Hr. 927 | Defined. 928 | 929 | Instance lookahead_proper : Proper (regex_eq ==> regex_eq) LookAhead. 930 | Proof. 931 | unfold regex_eq. 932 | intros r s H w start delta. 933 | rewrite !match_lookahead_iff. 934 | now rewrite H. 935 | Defined. 936 | 937 | Instance neglookahead_proper : Proper (regex_eq ==> regex_eq) NegLookAhead. 938 | Proof. 939 | unfold regex_eq. 940 | intros r s H w start delta. 941 | rewrite !match_neglookahead_iff, <- !match_not_match. 942 | now rewrite H. 943 | Defined. 944 | 945 | Instance lookbehind_proper : Proper (regex_eq ==> regex_eq) LookBehind. 946 | Proof. 947 | unfold regex_eq. 948 | intros r s H w start delta. 949 | rewrite !match_lookbehind_iff. 950 | now rewrite H. 951 | Defined. 952 | 953 | Instance neglookbehind_proper : Proper (regex_eq ==> regex_eq) NegLookBehind. 954 | Proof. 955 | unfold regex_eq. 956 | intros r s H w start delta. 957 | rewrite !match_neglookbehind_iff, <- !match_not_match. 958 | now rewrite H. 959 | Defined. 960 | 961 | 962 | Hint Resolve union_proper concat_proper star_proper 963 | lookahead_proper neglookahead_proper lookbehind_proper neglookbehind_proper 964 | : regex. 965 | 966 | Instance regex_leq_trans : Transitive regex_leq. 967 | Proof. 968 | unfold regex_leq. 969 | intros r s t. 970 | intros Hrs Hst. 971 | rewrite <- Hst. 972 | rewrite <- union_assoc. 973 | rewrite Hrs. reflexivity. 974 | Defined. 975 | 976 | Instance regex_leq_antisym : Antisymmetric _ regex_eq regex_leq. 977 | Proof. 978 | unfold regex_leq. 979 | intros r s Hrs Hsr. 980 | rewrite <- Hrs. 981 | rewrite <- Hsr at 1. 982 | auto using union_comm. 983 | Defined. 984 | 985 | Instance regex_leq_preorder : PreOrder regex_leq. 986 | Proof. 987 | split; auto with typeclass_instances. 988 | Defined. 989 | 990 | Instance regex_leq_partialorder : PartialOrder regex_eq regex_leq. 991 | Proof. 992 | intros r s. simpl. 993 | split; intros. 994 | - split. 995 | + unfold regex_leq. 996 | rewrite H. auto using union_idemp. 997 | + unfold regex_leq. 998 | unfold Basics.flip. 999 | rewrite H. auto using union_idemp. 1000 | - destruct H. unfold Basics.flip in H0. 1001 | apply regex_leq_antisym; auto. 1002 | Defined. 1003 | 1004 | Lemma subset_leq : forall (r s : LRegex), 1005 | (forall (w : list A) (start delta : nat), 1006 | match_regex r w start delta -> match_regex s w start delta) 1007 | <-> r ⊑ s. 1008 | Proof. 1009 | intros r s. 1010 | unfold regex_leq. 1011 | unfold regex_eq. 1012 | split. 1013 | - intros H w start delta. 1014 | rewrite match_union_iff. 1015 | firstorder. 1016 | - intros H w start delta Hr. 1017 | apply H. rewrite match_union_iff. 1018 | firstorder. 1019 | Qed. 1020 | 1021 | Hint Resolve subset_leq : regex. 1022 | 1023 | 1024 | Lemma kleene_1 : forall (r : LRegex), 1025 | Epsilon ∪ (r · r *) ⊑ r *. 1026 | Proof. 1027 | intro r. 1028 | apply subset_leq with (s := r *). 1029 | intros w start delta H. 1030 | inversion H. 1031 | - subst. apply match_eps_iff in H5. 1032 | subst. constructor. 1033 | - inversion H5. subst. 1034 | constructor; auto. 1035 | Qed. 1036 | 1037 | Lemma kleene_2 : forall (r : LRegex), 1038 | Epsilon ∪ (r * · r) ⊑ r *. 1039 | Proof. 1040 | intro r. 1041 | apply subset_leq with (s := r *). 1042 | intros w start delta H. 1043 | inversion H. 1044 | - subst. apply match_eps_iff in H5. 1045 | subst. constructor. 1046 | - inversion H5. subst. 1047 | apply match_star_r; auto. 1048 | Qed. 1049 | 1050 | Lemma kleene_3_alt : forall (a b c : LRegex), 1051 | (a · c) ∪ b ⊑ c -> (a *) · b ⊑ c. 1052 | Proof. 1053 | intros a b c. 1054 | repeat rewrite <- subset_leq. 1055 | intros H w start delta Hconcat. 1056 | inversion Hconcat. subst. 1057 | clear Hconcat. revert H6. revert d2. 1058 | remember (a *) as e. 1059 | induction H2; try discriminate. 1060 | - intros. simpl. apply H. 1061 | apply match_union_r. 1062 | replace (start + 0) with start in H6 by lia. 1063 | auto. 1064 | - intros. apply H. 1065 | apply match_union_l. 1066 | inversion Heqe. subst. 1067 | replace (d1 + d2 + d0) with (d1 + (d2 + d0)) by lia. 1068 | apply match_concat; auto. 1069 | apply IHmatch_regex2; auto. 1070 | now replace (start + d1 + d2) with (start + (d1 + d2)) by lia. 1071 | Qed. 1072 | 1073 | Lemma kleene_3 : forall (r s : LRegex), 1074 | r · s ⊑ s -> (r *) · s ⊑ s. 1075 | Proof. 1076 | intros r s. 1077 | repeat rewrite <- subset_leq. 1078 | intros H w start delta Hconcat. 1079 | inversion Hconcat. subst. 1080 | clear Hconcat. revert H6. revert d2. 1081 | remember (r *) as e. 1082 | induction H2; try discriminate. 1083 | - intros. simpl. 1084 | now replace (start + 0) with start in H6 by lia. 1085 | - intros. inversion Heqe. subst. 1086 | replace (d1 + d2 + d0) with (d1 + (d2 + d0)) by lia. 1087 | apply H. 1088 | apply match_concat; auto. 1089 | apply IHmatch_regex2; auto. 1090 | now replace (start + d1 + d2) with (start + (d1 + d2)) by lia. 1091 | Qed. 1092 | 1093 | Lemma kleene_4 : forall (r s : LRegex), 1094 | r · s ⊑ r -> r · (s *) ⊑ r. 1095 | Proof. 1096 | intros r s H. 1097 | apply subset_leq. 1098 | intros w start delta HH. 1099 | inversion HH. subst. clear HH. 1100 | generalize dependent d1. 1101 | induction d2 using lt_wf_ind. 1102 | intros. 1103 | apply match_star_nonempty in H6. 1104 | destruct H6 as [Hd2 | [ds [X1 [X2 [X3 X4]]]]]. 1105 | - subst. 1106 | now replace (d1 + 0) with d1 by lia. 1107 | - replace (d1 + d2) with ((d1 + ds) + (d2 - ds)) by lia. 1108 | apply H0. 1109 | + lia. 1110 | + apply H. apply match_union_l. 1111 | apply match_concat; auto. 1112 | + now replace (start + d1 + ds) with (start + (d1 + ds)) in X4 by lia. 1113 | Qed. 1114 | 1115 | Lemma star_expand_l : forall (r : LRegex), 1116 | r * ≡ Epsilon ∪ (r · r *). 1117 | Proof. 1118 | intros r. 1119 | apply regex_leq_antisym. 1120 | - apply subset_leq. 1121 | intros w start delta H. 1122 | inversion H. 1123 | + apply match_union_l. constructor. 1124 | + apply match_union_r. apply match_concat; auto. 1125 | - apply kleene_1. 1126 | Qed. 1127 | 1128 | Lemma star_expand_r : forall (r : LRegex), 1129 | r * ≡ Epsilon ∪ (r * · r). 1130 | Proof. 1131 | intros r. 1132 | apply regex_leq_antisym. 1133 | - apply subset_leq. 1134 | intros w start delta H. 1135 | apply unmatch_star_r in H. 1136 | destruct H as [Hd | [d1 [d2 [H1 [H2 H3]]]]]. 1137 | + apply match_union_l. subst; constructor. 1138 | + apply match_union_r. 1139 | rewrite <- H1. apply match_concat; auto. 1140 | - apply kleene_2. 1141 | Qed. 1142 | 1143 | Lemma star_exp (r : LRegex) : 1144 | forall n, r ^^ n ⊑ r *. 1145 | Proof. 1146 | intro. 1147 | apply subset_leq. 1148 | induction n. 1149 | - simpl. 1150 | intros w start delta H. 1151 | apply match_eps_iff in H. 1152 | subst. constructor. 1153 | - simpl. 1154 | intros w start delta H. 1155 | inversion H. subst. 1156 | apply IHn in H6. 1157 | constructor; auto. 1158 | Qed. 1159 | 1160 | Instance union_monotone : Proper (regex_leq ==> regex_leq ==> regex_leq) Union. 1161 | Proof. 1162 | intros r1 r2 Hr s1 s2 Hs. 1163 | unfold regex_leq in *. 1164 | rewrite union_assoc. 1165 | rewrite <- union_assoc with (r2 := r2). 1166 | rewrite union_comm with (r1 := s1) at 1. 1167 | rewrite <- union_assoc. 1168 | rewrite <- union_assoc. rewrite Hr. 1169 | rewrite union_assoc. rewrite Hs. 1170 | reflexivity. 1171 | Qed. 1172 | 1173 | Instance concat_monotone : Proper (regex_leq ==> regex_leq ==> regex_leq) Concat. 1174 | Proof. 1175 | intros r1 r2 Hr s1 s2 Hs. 1176 | rewrite <- subset_leq in Hr, Hs. 1177 | apply subset_leq. 1178 | intros w start delta H. 1179 | inversion H. 1180 | apply match_concat; auto. 1181 | Qed. 1182 | 1183 | Lemma empty_reg_never : forall w i d, 1184 | ~ match_regex ∅ w i d. 1185 | Proof. 1186 | intros. 1187 | unfold empty_reg. 1188 | rewrite match_class_iff. 1189 | intros [a [H _]]. 1190 | discriminate. 1191 | Qed. 1192 | 1193 | Lemma union_empty_l : forall (r : @LRegex A), 1194 | ∅ ∪ r ≡ r. 1195 | Proof. 1196 | unfold regex_eq. intros. 1197 | rewrite match_union_iff. 1198 | split; auto. 1199 | intros [H | H]; [ | auto]. 1200 | apply empty_reg_never in H. contradiction. 1201 | Qed. 1202 | 1203 | Lemma union_empty_r : forall (r : @LRegex A), 1204 | r ∪ ∅ ≡ r. 1205 | Proof. 1206 | intros. 1207 | rewrite union_comm. 1208 | apply union_empty_l. 1209 | Qed. 1210 | 1211 | End Equations. -------------------------------------------------------------------------------- /theories/Extract.v: -------------------------------------------------------------------------------- 1 | Require Import Layerwise. 2 | 3 | Require Import Extraction. 4 | Require Import Coq.extraction.ExtrHaskellNatInt. 5 | Require Import Coq.extraction.ExtrHaskellBasic. 6 | Extraction Language Haskell. 7 | 8 | Require Import Coq.Lists.List. 9 | 10 | Extract Inlined Constant rev => "Prelude.reverse". 11 | 12 | Set Extraction Output Directory "../haskell/src". 13 | Extraction "Extracted.hs" Layerwise.scanMatch Layerwise.llmatch. -------------------------------------------------------------------------------- /theories/LRegex.v: -------------------------------------------------------------------------------- 1 | (** 2 | 3 | This file contains the following: 4 | 1. Syntax for regular expressions [LRegex : Type] 5 | 2. Semantics for regular expressions [match_regex : LRegex -> list A -> nat -> nat -> Prop]. 6 | - Note that the notation is supposed to be read as [match_regex r w start delta] which means 7 | that [r] matches [w] between [start] and [start + delta]. 8 | 3. The predicate [not_match_regex : LRegex -> list A -> nat -> nat -> Prop] which is the negation of [match_regex]. 9 | - A proposition [match_not_match] shows that [not_match_regex] is the negation of [match_regex]. 10 | - A proposition [match_lem] showing that given any regular expression and positions, either [match_regex] or [not_match_regex] holds. 11 | 4. The predicate [is_tape : LRegex -> list A -> list bool -> Prop]. We say that [t : list bool] is a tape for [r : LRegex] and [w : list A] 12 | if the i-th element of [t] is true if and only if [match_regex r w 0 i]. 13 | 5. The predicate [is_scanMatcher : (LRegex -> list A -> list bool) -> Prop ]. We say that [f : LRegex -> list A -> list bool] is a scanMatcher 14 | if for any regular expression [r : LRegex] and any word [w : list A], [f r w] is a tape for [r] and [w]. 15 | 6. A number of lemmas such as [match_eps_iff], [match_concat_iff], [match_star_iff], etc that are useful in manipulating the 16 | [match_regex] predicate. 17 | 18 | *) 19 | 20 | Require Import Coq.Lists.List. 21 | Require Import Coq.Arith.Wf_nat. 22 | Require Import Lia. 23 | 24 | 25 | Declare Scope regex_scope. 26 | Open Scope regex_scope. 27 | 28 | Section Syntax. 29 | 30 | Context {A : Type}. 31 | 32 | Inductive LRegex : Type := 33 | | Epsilon : LRegex 34 | | CharClass : (A -> bool) -> LRegex 35 | | Concat : LRegex -> LRegex -> LRegex 36 | | Union : LRegex -> LRegex -> LRegex 37 | | Star : LRegex -> LRegex 38 | | LookAhead : LRegex -> LRegex 39 | | LookBehind : LRegex -> LRegex 40 | | NegLookAhead : LRegex -> LRegex 41 | | NegLookBehind : LRegex -> LRegex. 42 | 43 | End Syntax. 44 | 45 | Definition ε {A : Type} := @Epsilon A. 46 | Infix "·" := Concat (at level 60, right associativity) : regex_scope. 47 | Infix "∪" := Union (at level 50, left associativity) : regex_scope. 48 | Notation "r *" := (Star r) (at level 40, no associativity) : regex_scope. 49 | 50 | Notation "(?> r )" := (LookAhead r) (at level 40, no associativity) : regex_scope. 51 | Notation "(?!> r )" := (NegLookAhead r) (at level 40, no associativity) : regex_scope. 52 | Notation "(?< r )" := (LookBehind r) (at level 40, no associativity) : regex_scope. 53 | Notation "(? list A -> nat -> nat -> Prop := 60 | | match_epsilon : forall (w : list A) (start : nat), 61 | match_regex ε w start 0 62 | | match_class : forall (a : A) (pred : A -> bool) (w : list A) (start : nat), 63 | (pred a = true) -> (nth_error w start = Some a) -> match_regex (CharClass pred) w start 1 64 | | match_concat : forall (r1 r2 : LRegex) (w : list A) (start d1 d2 : nat), 65 | match_regex r1 w start d1 -> match_regex r2 w (start + d1) d2 66 | -> match_regex (r1 · r2) w start (d1 + d2) 67 | | match_union_l : forall (r1 r2 : LRegex) (w : list A) (start d : nat), 68 | match_regex r1 w start d -> match_regex (r1 ∪ r2) w start d 69 | | match_union_r : forall (r1 r2 : LRegex) (w : list A) (start d : nat), 70 | match_regex r2 w start d -> match_regex (r1 ∪ r2) w start d 71 | | match_star_eps : forall (r : LRegex) (w : list A) (start : nat), 72 | match_regex (r *) w start 0 73 | | match_star : forall (r : LRegex) (w : list A) (start d1 d2 : nat), 74 | match_regex r w start d1 -> match_regex (r *) w (start + d1) d2 75 | -> match_regex (r *) w start (d1 + d2) 76 | | match_lookahead : forall (r : LRegex) (w : list A) (start : nat), 77 | match_regex r w start (length w - start) -> match_regex ( (?> r) ) w start 0 78 | | match_lookbehind : forall (r : LRegex) (w : list A) (start : nat), 79 | match_regex r w 0 start -> match_regex ((?< r)) w start 0 80 | | match_neglookahead : forall (r : LRegex) (w : list A) (start : nat), 81 | not_match_regex r w start (length w - start) -> match_regex ( (?!> r) ) w start 0 82 | | match_neglookbehind : forall (r : LRegex) (w : list A) (start : nat), 83 | not_match_regex r w 0 start -> match_regex ((? list A -> nat -> nat -> Prop := 85 | | not_match_epsilon : forall (w : list A) (start delta : nat), 86 | (delta <> 0) -> not_match_regex ε w start delta 87 | | not_match_class_false : forall (a : A) (pred : A -> bool) (w : list A) (start delta: nat), 88 | (pred a = false) -> (nth_error w start = Some a) -> not_match_regex (CharClass pred) w start delta 89 | | not_match_class_index : forall (pred : A -> bool) (w : list A) (start delta : nat), 90 | (nth_error w start = None) -> not_match_regex (CharClass pred) w start delta 91 | | not_match_class_length : forall (pred : A -> bool) (w : list A) (start delta : nat), 92 | (delta <> 1) -> not_match_regex (CharClass pred) w start delta 93 | | not_match_concat : forall (r1 r2 : LRegex) (w : list A) (start delta : nat), 94 | (forall d, d <= delta -> not_match_regex r1 w start d \/ not_match_regex r2 w (start + d) (delta - d)) 95 | -> not_match_regex (Concat r1 r2) w start delta 96 | | not_match_union : forall (r1 r2 : LRegex) (w : list A) (start d : nat), 97 | not_match_regex r1 w start d -> not_match_regex r2 w start d 98 | -> not_match_regex (r1 ∪ r2) w start d 99 | | not_match_star : forall (r : LRegex) (w : list A) (start delta : nat), 100 | (delta <> 0) 101 | -> (forall d, 0 < d <= delta -> not_match_regex r w start d \/ not_match_regex (r *) w (start + d) (delta - d)) 102 | -> not_match_regex (r *) w start delta 103 | | not_match_lookahead_length : forall (r : LRegex) (w : list A) (start delta : nat), 104 | (delta <> 0) -> not_match_regex ( (?> r) ) w start delta 105 | | not_match_lookahead_false : forall (r : LRegex) (w : list A) (start delta : nat), 106 | not_match_regex r w start (length w - start) -> not_match_regex ( (?> r) ) w start delta 107 | | not_match_lookbehind_length : forall (r : LRegex) (w : list A) (start delta : nat), 108 | (delta <> 0) -> not_match_regex ((?< r)) w start delta 109 | | not_match_lookbehind_false : forall (r : LRegex) (w : list A) (start delta : nat), 110 | not_match_regex r w 0 start -> not_match_regex ((?< r)) w start delta 111 | | not_match_neglookahead_length : forall (r : LRegex) (w : list A) (start delta : nat), 112 | (delta <> 0) -> not_match_regex ( (?!> r) ) w start delta 113 | | not_match_neglookahead_false : forall (r : LRegex) (w : list A) (start delta : nat), 114 | match_regex r w start (length w - start) -> not_match_regex ( (?!> r) ) w start delta 115 | | not_match_neglookbehind_length : forall (r : LRegex) (w : list A) (start delta : nat), 116 | (delta <> 0) -> not_match_regex ((? not_match_regex ((? 125 | (nth_error t delta = Some true <-> match_regex r w 0 delta) /\ 126 | (nth_error t delta = Some false <-> ~ match_regex r w 0 delta). 127 | 128 | Definition is_tape_slice (r : LRegex) (w : list A) (t : list bool) (start delta : nat) : Prop := 129 | start + delta <= length w /\ 130 | (length t = delta + 1) /\ 131 | forall i, 132 | i <= delta -> 133 | (nth_error t i = Some true <-> match_regex r w start i) /\ 134 | (nth_error t i = Some false <-> ~ match_regex r w start i). 135 | 136 | Definition is_scanMatcher (scanMatch : LRegex -> list A -> list bool) : Prop := 137 | forall r w, 138 | is_tape r w (scanMatch r w). 139 | 140 | Lemma match_eps_iff : forall w start delta, 141 | match_regex ε w start delta <-> delta = 0. 142 | Proof. 143 | split; intros. 144 | - now inversion H. 145 | - subst. apply match_epsilon. 146 | Qed. 147 | 148 | Lemma not_match_eps_iff : forall w start delta, 149 | not_match_regex ε w start delta <-> delta <> 0. 150 | Proof. 151 | split; intros. 152 | - now inversion H. 153 | - destruct delta. 154 | + tauto. 155 | + now apply not_match_epsilon. 156 | Qed. 157 | 158 | Lemma match_class_iff : forall pred w start delta, 159 | match_regex (CharClass pred) w start delta <-> 160 | exists a, (pred a = true) /\ (nth_error w start = Some a) /\ (delta = 1). 161 | Proof. 162 | split; intros. 163 | - inversion H. exists a. auto. 164 | - destruct H as [a [H1 [H2 H3]]]. subst delta. apply (match_class a); assumption. 165 | Qed. 166 | 167 | Lemma not_match_class_iff : forall pred w start delta, 168 | not_match_regex (CharClass pred) w start delta <-> 169 | (delta <> 1) 170 | \/ (nth_error w start = None) 171 | \/ (exists a, nth_error w start = Some a /\ pred a = false). 172 | Proof. 173 | split; intros. 174 | - inversion H; firstorder. 175 | - firstorder. 176 | * now apply not_match_class_length. 177 | * now apply not_match_class_index. 178 | * now apply not_match_class_false with (a := x). 179 | Qed. 180 | 181 | Lemma match_union_iff : forall r1 r2 w start delta, 182 | match_regex (Union r1 r2) w start delta <-> 183 | (match_regex r1 w start delta) \/ (match_regex r2 w start delta). 184 | Proof. 185 | split; intros. 186 | - inversion H; auto. 187 | - destruct H. 188 | + now apply match_union_l. 189 | + now apply match_union_r. 190 | Qed. 191 | 192 | Lemma not_match_union_iff : forall r1 r2 w start delta, 193 | not_match_regex (Union r1 r2) w start delta <-> 194 | (not_match_regex r1 w start delta) /\ (not_match_regex r2 w start delta). 195 | Proof. 196 | split; intros. 197 | - inversion H; auto. 198 | - destruct H. apply not_match_union; assumption. 199 | Qed. 200 | 201 | Lemma match_concat_iff : forall r1 r2 w start delta, 202 | match_regex (Concat r1 r2) w start delta <-> 203 | exists d1 d2, 204 | (match_regex r1 w start d1) 205 | /\ (match_regex r2 w (start + d1) d2) 206 | /\ (delta = d1 + d2). 207 | Proof. 208 | split; intros. 209 | - inversion H. exists d1. exists d2. auto. 210 | - destruct H as [d1 [d2 [H1 [H2 H3]]]]. subst delta. 211 | now apply match_concat. 212 | Qed. 213 | 214 | Lemma match_concat_iff_2 : forall r1 r2 w start delta, 215 | match_regex (Concat r1 r2) w start delta <-> 216 | exists d1, 217 | (match_regex r1 w start d1) 218 | /\ (match_regex r2 w (start + d1) (delta - d1)) 219 | /\ d1 <= delta. 220 | Proof. 221 | intros. rewrite match_concat_iff; 222 | split; intros; 223 | destruct H as [d1 [d2 [H1 H]]]; exists d1. 224 | + split. assumption. 225 | destruct H as [H2 H3]. 226 | assert (d2 = delta - d1) as H4. lia. 227 | split. 228 | * now subst d2. 229 | * lia. 230 | + exists (delta - d1). repeat split; auto. 231 | lia. 232 | Qed. 233 | 234 | Lemma not_match_concat_iff : forall r1 r2 w start delta, 235 | not_match_regex (Concat r1 r2) w start delta <-> 236 | forall d, d <= delta 237 | -> not_match_regex r1 w start d 238 | \/ not_match_regex r2 w (start + d) (delta - d). 239 | Proof. 240 | split; intro. 241 | - inversion H. auto. 242 | - constructor. auto. 243 | Qed. 244 | 245 | Lemma match_lookahead_iff : forall r w start delta, 246 | match_regex (LookAhead r) w start delta <-> 247 | (match_regex r w start (length w - start)) /\ (delta = 0). 248 | Proof. 249 | split; intros. 250 | - inversion H. split. assumption. reflexivity. 251 | - destruct H as [H1 H2]. subst delta. apply match_lookahead. assumption. 252 | Qed. 253 | 254 | Lemma not_match_lookahead_iff : forall r w start delta, 255 | not_match_regex (LookAhead r) w start delta <-> 256 | (not_match_regex r w start (length w - start)) \/ (delta <> 0). 257 | Proof. 258 | split; intros. 259 | - inversion H. right. lia. 260 | auto. 261 | - assert (delta = 0 \/ delta <> 0) as Hd by lia. destruct Hd; destruct H. 262 | + now apply not_match_lookahead_false. 263 | + tauto. 264 | + now apply not_match_lookahead_length. 265 | + now apply not_match_lookahead_length. 266 | Qed. 267 | 268 | Lemma match_neglookahead_iff : forall r w start delta, 269 | match_regex (NegLookAhead r) w start delta <-> 270 | (not_match_regex r w start (length w - start)) /\ (delta = 0). 271 | Proof. 272 | split; intros. 273 | - inversion H. split. assumption. reflexivity. 274 | - destruct H as [H1 H2]. subst delta. apply match_neglookahead. assumption. 275 | Qed. 276 | 277 | Lemma not_match_neglookahead_iff : forall r w start delta, 278 | not_match_regex (NegLookAhead r) w start delta <-> 279 | (match_regex r w start (length w - start)) \/ (delta <> 0). 280 | Proof. 281 | split; intros. 282 | - inversion H. right. lia. 283 | auto. 284 | - assert (delta = 0 \/ delta <> 0) as Hd by lia. destruct Hd; destruct H. 285 | + now apply not_match_neglookahead_false. 286 | + tauto. 287 | + now apply not_match_neglookahead_length. 288 | + now apply not_match_neglookahead_length. 289 | Qed. 290 | 291 | Lemma match_lookbehind_iff : forall r w start delta, 292 | match_regex (LookBehind r) w start delta <-> 293 | (match_regex r w 0 start) /\ (delta = 0). 294 | Proof. 295 | split; intros. 296 | - inversion H. split. assumption. reflexivity. 297 | - destruct H as [H1 H2]. subst delta. apply match_lookbehind. assumption. 298 | Qed. 299 | 300 | Lemma not_match_lookbehind_iff : forall r w start delta, 301 | not_match_regex (LookBehind r) w start delta <-> 302 | (not_match_regex r w 0 start) \/ (delta <> 0). 303 | Proof. 304 | split; intros. 305 | - inversion H. right. lia. 306 | auto. 307 | - assert (delta = 0 \/ delta <> 0) as Hd by lia. destruct Hd; destruct H. 308 | + now apply not_match_lookbehind_false. 309 | + tauto. 310 | + now apply not_match_lookbehind_length. 311 | + now apply not_match_lookbehind_length. 312 | Qed. 313 | 314 | Lemma match_neglookbehind_iff : forall r w start delta, 315 | match_regex (NegLookBehind r) w start delta <-> 316 | (not_match_regex r w 0 start) /\ (delta = 0). 317 | Proof. 318 | split; intros. 319 | - inversion H. split. assumption. reflexivity. 320 | - destruct H as [H1 H2]. subst delta. apply match_neglookbehind. assumption. 321 | Qed. 322 | 323 | Lemma not_match_neglookbehind_iff : forall r w start delta, 324 | not_match_regex (NegLookBehind r) w start delta <-> 325 | (match_regex r w 0 start) \/ (delta <> 0). 326 | Proof. 327 | split; intros. 328 | - inversion H. right. lia. 329 | auto. 330 | - assert (delta = 0 \/ delta <> 0) as Hd by lia. destruct Hd; destruct H. 331 | + now apply not_match_neglookbehind_false. 332 | + tauto. 333 | + now apply not_match_neglookbehind_length. 334 | + now apply not_match_neglookbehind_length. 335 | Qed. 336 | 337 | Lemma match_star_nonempty : forall r w start delta, 338 | match_regex (Star r) w start delta <-> 339 | delta = 0 \/ 340 | exists d1, 341 | 0 < d1 /\ d1 <= delta 342 | /\ (match_regex r w start d1) 343 | /\ (match_regex (Star r) w (start + d1) (delta - d1)). 344 | Proof. 345 | intros r w start delta. 346 | split. 347 | * intros. remember (Star r) as e. 348 | induction H; try discriminate. 349 | - tauto. 350 | - assert (r0 = r) by now inversion Heqe. subst r0. 351 | assert (d1 + d2 = 0 \/ d1 + d2 > 0) by lia. 352 | destruct H1; auto. 353 | apply IHmatch_regex2 in Heqe. 354 | destruct Heqe as [Heqe | Heqe]. 355 | + subst d2. right. exists d1. 356 | repeat split; try lia; auto. 357 | replace (d1 + 0 - d1) with 0 by lia. 358 | auto. 359 | + destruct Heqe as [d3 [X1 [X2 [X3 X4]]]]. 360 | right. assert (d1 = 0 \/ d1 > 0) as Hd by lia. destruct Hd. 361 | ** subst d1. exists d3. 362 | repeat split; auto. 363 | replace (start + 0) with start in X3 by lia. 364 | assumption. 365 | replace (start + 0 + d3) with (start + d3) in X4 by lia. 366 | assumption. 367 | ** exists d1. 368 | repeat split; auto. 369 | lia. replace (d1 + d2 - d1) with d2 by lia. 370 | assumption. 371 | * intros. destruct H as [H | H]. 372 | - subst delta. constructor. 373 | - destruct H as [d1 [H1 [H2 [H3 H4]]]]. 374 | replace delta with (d1 + (delta - d1)) by lia. 375 | apply match_star; assumption. 376 | Qed. 377 | 378 | Lemma match_star_iff : forall r w start delta, 379 | match_regex (Star r) w start delta <-> 380 | delta = 0 \/ 381 | exists d, 382 | d <= delta 383 | /\ (match_regex r w start d) 384 | /\ (match_regex (Star r) w (start + d) (delta - d)). 385 | Proof. 386 | split; intros. 387 | - inversion H. 388 | + auto. 389 | + right. exists d1. 390 | repeat split; auto. lia. 391 | now replace (d1 + d2 - d1) with d2 by lia. 392 | - destruct H as [H | H]. 393 | + subst delta. constructor. 394 | + destruct H as [d [Hd [H1 H2]]]. 395 | replace delta with (d + (delta - d)). 396 | apply match_star. auto. auto. 397 | lia. 398 | Qed. 399 | 400 | Lemma not_match_star_iff : forall r w start delta, 401 | not_match_regex (Star r) w start delta <-> 402 | delta <> 0 /\ 403 | forall d, 404 | 0 < d <= delta 405 | -> not_match_regex r w start d 406 | \/ not_match_regex (Star r) w (start + d) (delta - d). 407 | Proof. 408 | split; intros. 409 | - inversion H. auto. 410 | - apply not_match_star; tauto. 411 | Qed. 412 | 413 | Lemma bounded_lem (P : nat -> Prop) : 414 | (forall n, P n \/ ~ P n) 415 | -> forall N, (exists n, n <= N /\ P n) \/ ~ (exists n, n <= N /\ P n). 416 | Proof. 417 | intro HP. 418 | induction N. 419 | - destruct (HP 0). 420 | + left. exists 0. auto. 421 | + right. intro. destruct H0 as [n [Hn HPn]]. 422 | assert (n = 0) by lia. congruence. 423 | - destruct (HP (S N)). 424 | + left. exists (S N). auto. 425 | + destruct (IHN) as [[ n [IH1 IH2]] | IH ]. 426 | * left. exists n. auto. 427 | * right. intro. destruct H0 as [m [Hm HPm]]. 428 | assert (m = S N \/ m < S N) as Hm' by lia. 429 | destruct Hm'. 430 | ** subst m. contradiction. 431 | ** assert (m <= N) by lia. apply IH. 432 | exists m. auto. 433 | Qed. 434 | 435 | Lemma star_lem : forall r, 436 | (forall w start delta, 437 | match_regex r w start delta \/ ~ match_regex r w start delta) 438 | -> forall w start delta, 439 | match_regex (Star r) w start delta \/ ~ match_regex (Star r) w start delta. 440 | Proof. 441 | intros r Hrlem w start delta. revert start. 442 | induction delta using lt_wf_ind; intros. 443 | rewrite match_star_nonempty. 444 | (* get rid of the possibility that delta = 0 *) 445 | assert (delta = 0 \/ delta <> 0) as [Hd | Hd] by lia; [firstorder|]. 446 | pose (P (d : nat) := 447 | 0 < d 448 | /\ match_regex r w start d 449 | /\ match_regex (Star r) w (start + d) (delta - d)). 450 | enough ((exists d, d <= delta /\ P d) \/ ~ (exists d, d <= delta /\ P d)) 451 | by firstorder. 452 | apply bounded_lem. 453 | intros d. subst P. simpl. 454 | (* get rid of the case when d = 0 *) 455 | assert (d = 0 \/ d <> 0) as [Hd' | Hd'] by lia ; [lia|]. 456 | assert (delta - d < delta) as Hdelta by lia. 457 | destruct (Hrlem w start d); 458 | destruct (H (delta - d) Hdelta (start + d)); firstorder. 459 | left. split. lia. split; auto. 460 | Qed. 461 | 462 | Lemma concat_lem : forall r1 r2, 463 | (forall w start delta, 464 | match_regex r1 w start delta \/ ~ match_regex r1 w start delta) 465 | -> (forall w start delta, 466 | match_regex r2 w start delta \/ ~ match_regex r2 w start delta) 467 | -> forall w start delta, 468 | match_regex (Concat r1 r2) w start delta 469 | \/ ~ match_regex (Concat r1 r2) w start delta. 470 | Proof. 471 | intros r1 r2 IH1 IH2. 472 | intros w start delta. 473 | rewrite match_concat_iff_2. 474 | pose (P (d : nat) := match_regex r1 w start d /\ match_regex r2 w (start + d) (delta - d)). 475 | enough ((exists d, d <= delta /\ P d) \/ ~ (exists d, d <= delta /\ P d)) by firstorder. 476 | apply bounded_lem. subst P. simpl. 477 | intro d. 478 | destruct (IH1 w start d); 479 | destruct (IH2 w (start + d) (delta - d)); 480 | firstorder. 481 | Qed. 482 | 483 | Lemma star_match_not_match : forall r, 484 | (forall w start delta, 485 | ~ (match_regex r w start delta) <-> not_match_regex r w start delta) 486 | -> (forall w start delta, 487 | match_regex r w start delta \/ ~ match_regex r w start delta) 488 | -> forall w start delta, 489 | ~ (match_regex (Star r) w start delta) 490 | <-> not_match_regex (Star r) w start delta. 491 | Proof. 492 | intros r Hr Hlem w start delta. revert start. 493 | induction delta using lt_wf_ind; intros. 494 | rewrite match_star_nonempty, not_match_star_iff. 495 | assert (delta = 0 \/ delta <> 0) as [Hd | Hd] by lia; [firstorder|]. 496 | split; intro. 497 | - split; auto. 498 | intros dd Hdd. 499 | rewrite <- Hr. rewrite <- H by lia. 500 | destruct (Hlem w start dd); 501 | destruct (star_lem r Hlem w (start + dd) (delta - dd)); 502 | firstorder. 503 | - intro. destruct H1; auto. 504 | firstorder. specialize (H2 x). 505 | rewrite <- Hr in H2. rewrite <- H in H2 by lia. 506 | firstorder. 507 | Qed. 508 | 509 | Lemma match_iff_not_match_aux : forall r w start delta, 510 | (~ (match_regex r w start delta) 511 | <-> (not_match_regex r w start delta)) 512 | /\ (match_regex r w start delta 513 | \/ ~ (match_regex r w start delta)). 514 | Proof. 515 | induction r; intros. 516 | { (* eps *) 517 | rewrite match_eps_iff. 518 | rewrite not_match_eps_iff. 519 | lia. 520 | } 521 | { (* charclass *) 522 | rewrite match_class_iff. 523 | rewrite not_match_class_iff. 524 | split. 525 | { split; intro. 526 | - assert (delta = 1 \/ delta <> 1) as Hd by lia. 527 | destruct Hd; auto. 528 | subst delta. 529 | destruct (nth_error). 530 | + right. right. exists a. split; auto. 531 | destruct (b a) eqn:F; auto. 532 | contradiction H. exists a; auto. 533 | + auto. 534 | - destruct H as [H | [H | H]]; 535 | intro; firstorder; congruence. 536 | } 537 | { assert (delta = 1 \/ delta <> 1) as Hd by lia. 538 | destruct (nth_error w start) as [a | ]eqn:E; 539 | destruct Hd; 540 | try destruct (b a) eqn:F; 541 | firstorder. 542 | - right. intro. destruct H0. firstorder. congruence. 543 | - right. intro. destruct H0. firstorder. discriminate. 544 | } 545 | } 546 | { (* concat *) 547 | rewrite match_concat_iff. 548 | rewrite not_match_concat_iff. 549 | assert 550 | (forall w start delta, (~ match_regex r1 w start delta <-> not_match_regex r1 w start delta)) 551 | as IH1a by firstorder. 552 | assert 553 | (forall w start delta, (~ match_regex r2 w start delta <-> not_match_regex r2 w start delta)) 554 | as IH2a by firstorder. 555 | assert 556 | (forall w start delta, (match_regex r1 w start delta \/ ~ match_regex r1 w start delta)) 557 | as IH1b by firstorder. 558 | assert 559 | (forall w start delta, (match_regex r2 w start delta \/ ~ match_regex r2 w start delta)) 560 | as IH2b by firstorder. 561 | clear IHr1 IHr2. 562 | split. 563 | { split; intro. 564 | - intros. 565 | rewrite <- IH1a. 566 | rewrite <- IH2a. 567 | destruct (IH1b w start d) as [H1 | H1]; 568 | destruct (IH2b w (start + d) (delta - d)) as [H2 | H2]; auto. 569 | exfalso. apply H. exists d, (delta - d). repeat split; auto. lia. 570 | - intros [d [dd [Hd [Hdd Hdelta]]]]. 571 | exfalso. assert (d <= delta) by lia. 572 | apply H in H0. 573 | rewrite <- IH1a in H0. 574 | rewrite <- IH2a in H0. firstorder. 575 | replace dd with (delta - d) in * by lia. auto. 576 | } 577 | { rewrite <- match_concat_iff. rewrite match_concat_iff_2. 578 | pose (P (d : nat) := match_regex r1 w start d /\ match_regex r2 w (start + d) (delta - d)). 579 | enough ((exists d, d <= delta /\ P d) \/ ~ (exists d, d <= delta /\ P d)) by firstorder. 580 | apply bounded_lem. subst P. simpl. 581 | intro d. destruct (IH1b w start d); destruct (IH2b w (start + d) (delta - d)); firstorder. 582 | } 583 | } 584 | { (* union *) 585 | rewrite match_union_iff. 586 | rewrite not_match_union_iff. 587 | firstorder. 588 | destruct (IHr1 w start delta); destruct (IHr2 w start delta); firstorder. 589 | } 590 | { (* star *) 591 | split. 592 | { apply star_match_not_match; firstorder. } 593 | { apply star_lem; firstorder. } 594 | } 595 | { (* lookahead *) 596 | rewrite match_lookahead_iff. 597 | rewrite not_match_lookahead_iff. 598 | assert (delta = 0 \/ delta <> 0) as [Hd | Hd] by lia; firstorder. 599 | } 600 | { (* lookbehind *) 601 | rewrite match_lookbehind_iff. 602 | rewrite not_match_lookbehind_iff. 603 | assert (delta = 0 \/ delta <> 0) as [Hd | Hd] by lia; firstorder. 604 | } 605 | { (* neg-lookahead *) 606 | rewrite match_neglookahead_iff. 607 | rewrite not_match_neglookahead_iff. 608 | assert (delta = 0 \/ delta <> 0) as [Hd | Hd] by lia; firstorder. 609 | } 610 | { (* neg-lookbehind *) 611 | rewrite match_neglookbehind_iff. 612 | rewrite not_match_neglookbehind_iff. 613 | assert (delta = 0 \/ delta <> 0) as [Hd | Hd] by lia; firstorder. 614 | } 615 | Qed. 616 | 617 | 618 | Lemma match_not_match : forall r w start delta, 619 | ~ (match_regex r w start delta) <-> (not_match_regex r w start delta). 620 | Proof. 621 | intros. 622 | pose proof match_iff_not_match_aux r w start delta. 623 | firstorder. 624 | Qed. 625 | 626 | Lemma match_lem : forall r w start delta, 627 | match_regex r w start delta \/ ~ (match_regex r w start delta). 628 | Proof. 629 | intros. 630 | pose proof match_iff_not_match_aux r w start delta. 631 | firstorder. 632 | Qed. 633 | 634 | Lemma tape_length : forall r w t, 635 | is_tape r w t 636 | -> length t = length w + 1. 637 | Proof. 638 | firstorder. 639 | Qed. 640 | 641 | Lemma scanMatcher_length : forall scanMatch, 642 | is_scanMatcher scanMatch 643 | -> forall r w, 644 | length (scanMatch r w) = length w + 1. 645 | Proof. 646 | firstorder. 647 | Qed. 648 | 649 | Lemma tape_none : forall r w t, 650 | is_tape r w t 651 | -> forall delta, 652 | delta > length w 653 | <-> nth_error t delta = None. 654 | Proof. 655 | intros. unfold is_tape in H. 656 | destruct H as [Hlen H]. 657 | specialize (H delta). 658 | split; intro. 659 | - rewrite nth_error_None. lia. 660 | - rewrite nth_error_None in H0. lia. 661 | Qed. 662 | 663 | Lemma scanMatcher_none : forall scanMatch, 664 | is_scanMatcher scanMatch 665 | -> forall r w delta, 666 | delta > length w 667 | <-> nth_error (scanMatch r w) delta = None. 668 | Proof. 669 | intros. unfold is_scanMatcher in H. 670 | specialize (H r w). 671 | apply tape_none with (delta := delta) in H. 672 | firstorder. 673 | Qed. 674 | 675 | End Semantics. 676 | 677 | Hint Constructors match_regex : regex. 678 | Hint Resolve match_eps_iff : regex. 679 | Hint Resolve match_class_iff : regex. 680 | Hint Resolve match_union_iff : regex. 681 | Hint Resolve match_concat_iff match_concat_iff_2 : regex. 682 | Hint Resolve match_lookahead_iff : regex. 683 | Hint Resolve match_neglookahead_iff : regex. 684 | Hint Resolve match_lookbehind_iff : regex. 685 | Hint Resolve match_neglookbehind_iff : regex. 686 | Hint Resolve match_star_iff match_star_nonempty : regex. 687 | Hint Resolve match_lem : regex. 688 | Hint Resolve scanMatcher_none : regex. -------------------------------------------------------------------------------- /theories/Makefile: -------------------------------------------------------------------------------- 1 | %: Makefile.coq phony 2 | +make -f Makefile.coq $@ 3 | 4 | all: Makefile.coq 5 | +make -f Makefile.coq all 6 | 7 | clean: Makefile.coq 8 | +make -f Makefile.coq clean 9 | rm -f Makefile.coq 10 | 11 | Makefile.coq: _CoqProject Makefile 12 | coq_makefile -f _CoqProject -o Makefile.coq 13 | 14 | _CoqProject: ; 15 | 16 | Makefile: ; 17 | 18 | phony: ; 19 | 20 | .PHONY: all clean phony -------------------------------------------------------------------------------- /theories/ORegex.v: -------------------------------------------------------------------------------- 1 | (** 2 | 3 | This file contains a number of important notions involving Regular Expressions with Oracle Queries (ORegex). 4 | 5 | 1. The notion of [ostring : Type] formalizes the notion of a string annotated with oracle valuations. An [ostring] is a list over [A] along with a list of [valuation] (i.e, list of [boolean]) 6 | - The predicate [ostring_wf] is a predicate that asserts that an [ostring] is well-formed. 7 | - The [olength : ostring -> nat] function is the analogue of [length] for [ostring]. 8 | - [ostring] should be cleaved via the functions [ofirstn : nat -> ostring -> ostring] and [oskipn : nat -> ostring -> ostring]. 9 | - The [orev : ostring -> ostring] function is the analogue of [rev] for [ostring]. 10 | - This file contains many lemmas connecting [olength], [orev], [ofirstn], and [oskipn] to each other. 11 | 12 | 2. The type [ORegex] denotes Regular Expressions with Oracle Queries. The predicate [match_oregex : ORegex -> ostring -> Prop] is used to certify that an [ORegex] matches a given [ostring]. 13 | 14 | *) 15 | 16 | Require Import Lia. 17 | Require Import Coq.Lists.List. 18 | 19 | 20 | Import ListNotations. 21 | 22 | Require Import ListLemmas. 23 | 24 | Section OString. 25 | 26 | Context {A : Type}. 27 | 28 | Definition valuation : Type := list bool. 29 | Definition ostring : Type := (list A) * (list valuation). 30 | 31 | Definition outer_length_wf (s : ostring) : Prop := 32 | length (fst s) + 1 = length (snd s). 33 | Definition inner_length_wf (s : ostring) : Prop := 34 | forall u v : valuation, 35 | In u (snd s) -> In v (snd s) -> length u = length v. 36 | Definition ostring_wf (s : ostring) : Prop := 37 | outer_length_wf s /\ inner_length_wf s. 38 | 39 | Definition olength (s : ostring) : nat := length (fst s). 40 | Definition ofirstn (n : nat) (s : ostring) : ostring := 41 | (firstn n (fst s), firstn (S n) (snd s)). 42 | Definition oskipn (n : nat) (s : ostring) : ostring := 43 | (skipn n (fst s), 44 | skipn (Nat.min n (length (fst s))) (snd s)). 45 | Definition orev (s : ostring) : ostring := 46 | (rev (fst s), rev (snd s)). 47 | Definition ofirstval (os : ostring) : option valuation := 48 | hd_error (snd os). 49 | Definition olastval (os : ostring) : option valuation := 50 | last_error (snd os). 51 | 52 | Inductive ORegex : Type := 53 | | OEpsilon : ORegex 54 | | OCharClass : (A -> bool) -> ORegex 55 | | OConcat : ORegex -> ORegex -> ORegex 56 | | OUnion : ORegex -> ORegex -> ORegex 57 | | OStar : ORegex -> ORegex 58 | | OQueryPos : nat -> ORegex 59 | | OQueryNeg : nat -> ORegex 60 | . 61 | 62 | Inductive match_oregex : ORegex -> ostring -> Prop := 63 | | omatch_epsilon : 64 | forall (os : ostring), olength os = 0 -> match_oregex OEpsilon os 65 | | omatch_charclass : 66 | forall (os : ostring) (a : A) (pred : A -> bool), 67 | olength os = 1 -> (hd_error (fst os)) = Some a -> pred a = true -> match_oregex (OCharClass pred) os 68 | | omatch_concat : 69 | forall (r1 r2 : ORegex) (os : ostring) (n : nat), 70 | match_oregex r1 (ofirstn n os) -> match_oregex r2 (oskipn n os) -> match_oregex (OConcat r1 r2) os 71 | | omatch_union_l : 72 | forall (r1 r2 : ORegex) (os : ostring), 73 | match_oregex r1 os -> match_oregex (OUnion r1 r2) os 74 | | omatch_union_r : 75 | forall (r1 r2 : ORegex) (os : ostring), 76 | match_oregex r2 os -> match_oregex (OUnion r1 r2) os 77 | | omatch_star_eps : 78 | forall (r : ORegex) (os : ostring), 79 | olength os = 0 -> match_oregex (OStar r) os 80 | | omatch_star : 81 | forall (r : ORegex) (os : ostring) (n : nat), 82 | match_oregex r (ofirstn n os) -> match_oregex (OStar r) (oskipn n os) -> match_oregex (OStar r) os 83 | | omatch_query_pos : 84 | forall (os : ostring) (v : valuation) (q : nat), 85 | olength os = 0 -> hd_error (snd os) = Some v -> nth_error v q = Some true -> match_oregex (OQueryPos q) os 86 | | omatch_query_neg : 87 | forall (os : ostring) (v : valuation) (q : nat), 88 | olength os = 0 -> hd_error (snd os) = Some v -> nth_error v q = Some false -> match_oregex (OQueryNeg q) os 89 | . 90 | 91 | Definition oWildCard : ORegex := 92 | OStar (OCharClass (fun _ => true)). 93 | 94 | Definition empty_oreg : ORegex := 95 | OCharClass (fun _ => false). 96 | 97 | Definition rPass (or : ORegex) : ORegex := 98 | OConcat oWildCard or. 99 | 100 | Lemma ofirstn_outer_length_wf : forall n s, 101 | outer_length_wf s -> outer_length_wf (ofirstn n s). 102 | Proof. 103 | unfold outer_length_wf, ofirstn. 104 | destruct s as (o, w). 105 | simpl fst. simpl snd. 106 | rewrite firstn_length. intros. 107 | destruct w eqn:Hw. 108 | - simpl in *. lia. 109 | - simpl in *. 110 | enough (Nat.min n (length o) = length (firstn n l)) by lia. 111 | rewrite firstn_length. lia. 112 | Qed. 113 | 114 | Lemma ofirstn_inner_length_wf : forall n s, 115 | inner_length_wf s -> inner_length_wf (ofirstn n s). 116 | Proof. 117 | unfold inner_length_wf, ofirstn. 118 | destruct s as (o, w). 119 | simpl fst. simpl snd at 1 2 4 6. 120 | remember (S n) as n'. simpl snd. 121 | subst n'. intros. 122 | apply H; 123 | rewrite <- firstn_skipn with (n := (S n)); 124 | rewrite in_app_iff; auto. 125 | Qed. 126 | 127 | Lemma ofirstn_ostring_wf : forall n s, 128 | ostring_wf s -> ostring_wf (ofirstn n s). 129 | Proof. 130 | firstorder using ofirstn_outer_length_wf, ofirstn_inner_length_wf. 131 | Qed. 132 | 133 | Lemma oskipn_outer_length_wf : forall n s, 134 | outer_length_wf s -> outer_length_wf (oskipn n s). 135 | Proof. 136 | unfold outer_length_wf, oskipn. 137 | destruct s as (o, w). 138 | simpl fst. simpl snd. 139 | repeat rewrite skipn_length. 140 | intros. lia. 141 | Qed. 142 | 143 | Lemma oskipn_inner_length_wf : forall n s, 144 | inner_length_wf s -> inner_length_wf (oskipn n s). 145 | Proof. 146 | unfold inner_length_wf, oskipn. 147 | destruct s as (o, w). 148 | simpl fst. simpl snd. 149 | intros. 150 | apply H; 151 | rewrite <- firstn_skipn with (n := Nat.min n (length o)); 152 | rewrite in_app_iff; auto. 153 | Qed. 154 | 155 | Lemma oskipn_ostring_wf : forall n s, 156 | ostring_wf s -> ostring_wf (oskipn n s). 157 | Proof. 158 | firstorder using oskipn_outer_length_wf, oskipn_inner_length_wf. 159 | Qed. 160 | 161 | Lemma orev_outer_length_wf : forall s, 162 | outer_length_wf s -> outer_length_wf (orev s). 163 | Proof. 164 | unfold outer_length_wf, orev. 165 | destruct s as (o, w). 166 | simpl. 167 | repeat rewrite rev_length. 168 | lia. 169 | Qed. 170 | 171 | Lemma orev_inner_length_wf : forall s, 172 | inner_length_wf s -> inner_length_wf (orev s). 173 | Proof. 174 | unfold inner_length_wf, orev. 175 | destruct s as (o, w). 176 | simpl. intros. 177 | rewrite <- in_rev in *. 178 | auto. 179 | Qed. 180 | 181 | Lemma orev_ostring_wf : forall s, 182 | ostring_wf s -> ostring_wf (orev s). 183 | Proof. 184 | firstorder using orev_outer_length_wf, orev_inner_length_wf. 185 | Qed. 186 | 187 | Lemma orev_involutive : forall s, 188 | orev (orev s) = s. 189 | Proof. 190 | unfold orev. 191 | destruct s as (o, w). simpl. 192 | intros. f_equal; apply rev_involutive. 193 | Qed. 194 | 195 | Lemma orev_olength : forall s, 196 | olength (orev s) = olength s. 197 | Proof. 198 | unfold olength, orev. 199 | destruct s as (o, w). simpl. 200 | apply rev_length. 201 | Qed. 202 | 203 | Lemma ofirstn_olength : forall n s, 204 | olength (ofirstn n s) = min n (olength s). 205 | Proof. 206 | unfold olength, ofirstn. 207 | destruct s as (o, w). simpl. 208 | rewrite firstn_length. lia. 209 | Qed. 210 | 211 | Lemma oskipn_olength : forall n s, 212 | olength (oskipn n s) = (olength s - n). 213 | Proof. 214 | unfold olength, oskipn. 215 | destruct s as (o, w). simpl. 216 | rewrite skipn_length. lia. 217 | Qed. 218 | 219 | 220 | Lemma ofirstn_oskipn_orev : forall n ostr, 221 | outer_length_wf ostr 222 | -> ofirstn n ostr = orev (oskipn (olength ostr - n) (orev ostr)). 223 | Proof. 224 | unfold outer_length_wf. 225 | unfold ofirstn, oskipn, orev. 226 | destruct ostr as (o, w). 227 | simpl fst. simpl snd. intro. 228 | f_equal. 229 | - unfold olength. simpl. 230 | apply firstn_skipn_rev. 231 | - unfold olength. simpl fst. 232 | rewrite rev_length. 233 | assert (length o >= n \/ length o < n) by lia. 234 | destruct H0. 235 | + replace (min (length o - n) (length o)) with (length o - n) by lia. 236 | replace (length o - n) with (length w - (S n)) by lia. 237 | apply firstn_skipn_rev. 238 | + replace (min (length o - n) (length o)) with 0 by lia. 239 | simpl skipn. rewrite rev_involutive. 240 | now rewrite firstn_all2 by lia. 241 | Qed. 242 | 243 | Lemma ofirstn_orev : forall n ostr, 244 | outer_length_wf ostr 245 | -> ofirstn n (orev ostr) = orev (oskipn (olength ostr - n) ostr). 246 | Proof. 247 | unfold outer_length_wf. 248 | unfold ofirstn, oskipn, orev. 249 | destruct ostr as (o, w). 250 | simpl fst. simpl snd. intro. 251 | f_equal. 252 | - unfold olength. simpl. 253 | apply firstn_rev. 254 | - unfold olength. simpl fst. 255 | assert (length o >= n \/ length o < n) by lia. 256 | destruct H0. 257 | + replace (min (length o - n) (length o)) with (length o - n) by lia. 258 | replace (length o - n) with (length w - (S n)) by lia. 259 | apply firstn_rev. 260 | + replace (min (length o - n) (length o)) with 0 by lia. 261 | simpl skipn. 262 | rewrite firstn_all2. auto. 263 | rewrite rev_length. lia. 264 | Qed. 265 | 266 | Lemma oskipn_orev : forall n ostr, 267 | outer_length_wf ostr 268 | -> oskipn n (orev ostr) = orev (ofirstn (olength ostr - n) ostr). 269 | Proof. 270 | unfold outer_length_wf. 271 | unfold ofirstn, oskipn, orev. 272 | destruct ostr as (o, w). 273 | simpl fst. simpl snd at 1 2 5. 274 | unfold olength. simpl fst. 275 | remember (S (length o - n)) as m. 276 | simpl snd. subst. intro. 277 | f_equal. 278 | - apply skipn_rev. 279 | - rewrite rev_length. 280 | assert (length o >= n \/ length o < n) by lia. 281 | destruct H0. 282 | + replace (min n (length o)) with n by lia. 283 | replace (S (length o - n)) with (length w - n) by lia. 284 | apply skipn_rev. 285 | + replace (min n (length o)) with (length o) by lia. 286 | replace (S (length o - n)) with (length w - length o) by lia. 287 | apply skipn_rev. 288 | Qed. 289 | 290 | Lemma ofirstn_all2 : forall n ostr, 291 | outer_length_wf ostr -> 292 | olength ostr <= n -> ofirstn n ostr = ostr. 293 | Proof. 294 | unfold ofirstn, olength, outer_length_wf. 295 | destruct ostr as (w, o). 296 | simpl fst. simpl snd. 297 | intros. 298 | now repeat rewrite firstn_all2 by lia. 299 | Qed. 300 | 301 | Lemma ofirstn_all3 : forall n ostr, 302 | outer_length_wf ostr -> 303 | olength ostr <= n -> ofirstn n ostr = ofirstn (olength ostr) ostr. 304 | Proof. 305 | intros. repeat rewrite ofirstn_all2; auto. 306 | Qed. 307 | 308 | Lemma oskipn_all2 : forall n ostr, 309 | outer_length_wf ostr -> 310 | olength ostr <= n -> oskipn n ostr = (nil, skipn (olength ostr) (snd ostr)). 311 | Proof. 312 | unfold outer_length_wf, oskipn, olength. 313 | destruct ostr as (w, o). 314 | simpl fst. simpl snd. 315 | intros. 316 | f_equal. 317 | - now rewrite skipn_all2. 318 | - f_equal. lia. 319 | Qed. 320 | 321 | Lemma oskipn_all3 : forall n ostr, 322 | outer_length_wf ostr -> 323 | olength ostr <= n -> oskipn n ostr = oskipn (olength ostr) ostr. 324 | Proof. 325 | intros. repeat rewrite oskipn_all2; auto. 326 | Qed. 327 | 328 | Lemma oskipn0 : forall ostr, 329 | oskipn 0 ostr = ostr. 330 | Proof. 331 | unfold oskipn. 332 | destruct ostr as (w, o). 333 | simpl fst. simpl snd. 334 | replace (min 0 (length w)) with 0 by lia. 335 | auto. 336 | Qed. 337 | 338 | Lemma ofirstn_ofirstn : forall n m ostr, 339 | ofirstn n (ofirstn m ostr) = ofirstn (min n m) ostr. 340 | Proof. 341 | unfold ofirstn. 342 | destruct ostr as (w, o). 343 | simpl fst. simpl snd at 2 3. 344 | f_equal. 345 | - apply firstn_firstn. 346 | - remember (S m) as m'. 347 | simpl snd. rewrite firstn_firstn. f_equal. 348 | lia. 349 | Qed. 350 | 351 | 352 | Lemma oskipn_oskipn : forall n m ostr, 353 | oskipn n (oskipn m ostr) = oskipn (n + m) ostr. 354 | Proof. 355 | unfold oskipn. 356 | destruct ostr as (w, o). 357 | simpl fst. simpl snd at 2 3. 358 | f_equal. 359 | - apply skipn_skipn. 360 | - remember (S m) as m'. 361 | simpl snd. rewrite skipn_skipn. 362 | rewrite skipn_length. f_equal. 363 | lia. 364 | Qed. 365 | 366 | 367 | Lemma oskipn_ofirstn_comm : forall n m ostr, 368 | outer_length_wf ostr -> 369 | m <= n -> 370 | oskipn m (ofirstn n ostr) = ofirstn (n - m) (oskipn m ostr). 371 | Proof. 372 | unfold oskipn, ofirstn. 373 | destruct ostr as (w, o). 374 | unfold outer_length_wf. 375 | simpl fst. remember (S n) as n'. simpl snd. 376 | intros Hwf Hmn. 377 | f_equal. 378 | - apply skipn_firstn_comm. 379 | - rewrite skipn_firstn_comm. 380 | rewrite firstn_length. 381 | assert (length w < m \/ length w >= m) as Hm by lia. 382 | assert (length w < n \/ length w >= n) as Hn by lia. 383 | destruct Hm, Hn. 384 | { replace (min n (length w)) with (length w) by lia. 385 | replace (min m (length w)) with (length w) by lia. 386 | repeat rewrite firstn_all2. auto. 387 | - rewrite skipn_length. lia. 388 | - rewrite skipn_length. lia. 389 | } 390 | { replace (min m (length w)) with (length w) by lia. 391 | replace (min n (length w)) with n by lia. 392 | replace (min m n) with m by lia. 393 | subst n'. replace (S n - m) with (S (n - m)) by lia. 394 | f_equal. 395 | repeat rewrite skipn_all2 by lia. 396 | auto. 397 | } 398 | { replace (min n (length w)) with (length w) by lia. 399 | replace (min m (length w)) with m by lia. 400 | f_equal. lia. 401 | } 402 | { replace (min n (length w)) with n by lia. 403 | replace (min m (length w)) with m by lia. 404 | replace (min m n) with m by lia. 405 | f_equal. lia. 406 | } 407 | Qed. 408 | 409 | Lemma ofirstval_Some (os : ostring) : 410 | outer_length_wf os -> exists v0, ofirstval os = Some v0. 411 | Proof. 412 | unfold outer_length_wf, ofirstval. 413 | destruct os as (w, o). 414 | simpl. intros Hwf. 415 | destruct o as [| v0 o]. 416 | - simpl in Hwf. lia. 417 | - exists v0. auto. 418 | Qed. 419 | 420 | Lemma olastval_Some (os : ostring) : 421 | outer_length_wf os -> exists v0, olastval os = Some v0. 422 | Proof. 423 | unfold outer_length_wf, olastval. 424 | destruct os as (w, o). 425 | simpl. 426 | destruct (unsnoc o) as [[l x] | ] eqn:E. 427 | - rewrite unsnoc_Some in E. 428 | subst. exists x. 429 | rewrite last_error_Some. 430 | eauto. 431 | - rewrite unsnoc_None in E. 432 | subst. simpl. lia. 433 | Qed. 434 | 435 | Lemma ostring_ind (P : ostring -> Prop) : 436 | (forall v0, P ([], [v0])) -> 437 | (forall v0 a0 w o, outer_length_wf (w, o) -> P (w, o) -> P (a0 :: w, v0 :: o)) -> 438 | forall os, outer_length_wf os -> P os. 439 | Proof. 440 | intros Hbase Hstep. 441 | destruct os as (w, o). 442 | revert o. induction w as [| a0 w IHw]; intros o Hwf. 443 | - unfold outer_length_wf in Hwf. simpl in Hwf. 444 | destruct o as [| v0 o]. { simpl in Hwf. lia. } 445 | destruct o as [| v1 o]. 2: { simpl in Hwf. lia. } 446 | apply Hbase. 447 | - unfold outer_length_wf in Hwf. simpl in Hwf. 448 | destruct o as [| v0 o]. { simpl in Hwf. lia. } 449 | apply Hstep. 450 | + unfold outer_length_wf. simpl. 451 | simpl in Hwf. lia. 452 | + apply IHw. unfold outer_length_wf in *. simpl in *. lia. 453 | Qed. 454 | 455 | Lemma ostring_rev_ind (P : ostring -> Prop) : 456 | (forall v0, P ([], [v0])) -> 457 | (forall v a w o, outer_length_wf (w, o) -> P (w, o) -> P (w ++ [a], o ++ [v])) -> 458 | forall os, outer_length_wf os -> P os. 459 | Proof. 460 | intros Hbase Hstep. 461 | destruct os as (w, o). 462 | revert o. induction w as [| a w IHw] using rev_ind. 463 | - intros o Hwf. unfold outer_length_wf in Hwf. simpl in Hwf. 464 | destruct o as [| v0 o]. { simpl in Hwf. lia. } 465 | destruct o as [| v1 o]. 2: { simpl in Hwf. lia. } 466 | apply Hbase. 467 | - intros o Hwf. unfold outer_length_wf in Hwf. simpl in Hwf. 468 | destruct (unsnoc o) as [[o' ox] | ] eqn:E. 469 | 2 : { rewrite unsnoc_None in E. subst. simpl in Hwf. lia. } 470 | rewrite unsnoc_Some in E. subst. 471 | apply Hstep. 472 | + unfold outer_length_wf. simpl. 473 | repeat rewrite app_length in Hwf. simpl in Hwf. lia. 474 | + apply IHw. 475 | unfold outer_length_wf. simpl. 476 | repeat rewrite app_length in Hwf. simpl in Hwf. lia. 477 | Qed. 478 | 479 | Lemma omatch_charclass_iff : forall (os : ostring) (pred : A -> bool), 480 | match_oregex (OCharClass pred) os 481 | <-> exists a, fst os = [a] /\ pred a = true. 482 | Proof. 483 | intros. 484 | split; intros. 485 | - inversion H. 486 | subst. destruct os as [w o]. 487 | simpl in *. destruct w; [ inversion H1 | ]. 488 | destruct w; [ | inversion H1]. 489 | exists a. simpl in H2. 490 | inversion H2. auto. 491 | - destruct H as [a [H1 H2]]. 492 | destruct os as [w o]. simpl in H1; subst. 493 | now apply omatch_charclass with (pred := pred) (a := a). 494 | Qed. 495 | 496 | Lemma omatch_union_iff : forall r1 r2 os, 497 | match_oregex (OUnion r1 r2) os 498 | <-> match_oregex r1 os \/ match_oregex r2 os. 499 | Proof. 500 | intros. 501 | split; intros. 502 | - inversion H; subst; auto. 503 | - destruct H; [apply omatch_union_l | apply omatch_union_r]; auto. 504 | Qed. 505 | 506 | Lemma omatch_concat_iff : forall r1 r2 os, 507 | outer_length_wf os -> 508 | match_oregex (OConcat r1 r2) os 509 | <-> exists n, 0 <= n <= olength os /\ 510 | match_oregex r1 (ofirstn n os) /\ match_oregex r2 (oskipn n os). 511 | Proof. 512 | intros r1 r2 os. 513 | split. 514 | - intros Hr12. inversion Hr12; subst. 515 | assert (n <= olength os \/ n > olength os) as [Hn | Hn] by lia. 516 | + exists n. repeat split; auto. lia. 517 | + exists (olength os). repeat split. 518 | lia. lia. 519 | destruct os as (w, o). 520 | unfold olength in *. simpl in *. 521 | * rewrite ofirstn_all2 by auto. 522 | rewrite ofirstn_all2 in H2. auto. 523 | auto. unfold olength. simpl. lia. 524 | * rewrite oskipn_all2 by auto. 525 | rewrite oskipn_all2 in H4. auto. 526 | auto. unfold olength in *. 527 | destruct os as (w, o); simpl in *; lia. 528 | - intros [n [Hn [Hr1 Hr2]]]. 529 | apply omatch_concat with n; auto. 530 | Qed. 531 | 532 | 533 | Lemma omatch_star_nonempty : forall r os, 534 | outer_length_wf os -> 535 | match_oregex (OStar r) os 536 | <-> olength os = 0 537 | \/ exists n, 538 | 0 < n /\ n <= olength os /\ 539 | match_oregex r (ofirstn n os) /\ match_oregex (OStar r) (oskipn n os). 540 | Proof. 541 | intros r os Hwf. 542 | split. 543 | - intros H. remember (OStar r) as e. 544 | induction H; try discriminate. 545 | + tauto. 546 | + assert (r0 = r) by now inversion Heqe. subst r0. 547 | apply IHmatch_oregex2 in Heqe. 548 | clear IHmatch_oregex2. clear IHmatch_oregex1. 549 | assert (olength os = 0 \/ olength os > 0) as Hos by lia. 550 | destruct Hos; [tauto |]. 551 | assert (n > olength os \/ n <= olength os) as Hn1 by lia. 552 | destruct Hn1. 553 | { 554 | rewrite oskipn_all3 in H0. 555 | rewrite ofirstn_all3 in H. 556 | right. exists (olength os). repeat split. 557 | - lia. 558 | - lia. 559 | - assumption. 560 | - assumption. 561 | - assumption. 562 | - lia. 563 | - assumption. 564 | - lia. 565 | } 566 | assert (n > 0 \/ n = 0) as Hn by lia. destruct Hn. 567 | { 568 | right. exists n. repeat split. 569 | - lia. 570 | - lia. 571 | - assumption. 572 | - assumption. 573 | } 574 | subst n. rewrite oskipn0 in *. 575 | destruct Heqe; tauto. 576 | now apply oskipn_outer_length_wf. 577 | - intros H. 578 | destruct H as [H | [n [Hn [Hn' [Hr Hstar]]]]]. 579 | + apply omatch_star_eps; assumption. 580 | + apply omatch_star with (n := n); assumption. 581 | Qed. 582 | 583 | Lemma ounmatch_star_r (r : ORegex) (os : ostring) : 584 | outer_length_wf os -> 585 | match_oregex (OStar r) os 586 | -> olength os = 0 587 | \/ exists n, match_oregex (OStar r) (ofirstn n os) 588 | /\ match_oregex r (oskipn n os). 589 | Proof. 590 | intros Hwf H. 591 | remember (OStar r) as e. 592 | induction H; inversion Heqe. 593 | - auto. 594 | - subst. right. 595 | pose proof (oskipn_outer_length_wf n os Hwf). 596 | pose proof (IHmatch_oregex2 H1 Heqe). 597 | clear IHmatch_oregex2. clear IHmatch_oregex1. 598 | clear Heqe. rename H2 into IH. 599 | destruct IH as [IH | [n' [IH1 IH2]]]. 600 | + exists 0. split. 601 | * apply omatch_star_eps. 602 | rewrite ofirstn_olength. lia. 603 | * enough (oskipn 0 os = ofirstn n os). 604 | -- rewrite -> H2. assumption. 605 | -- rewrite oskipn0. rewrite ofirstn_all2. auto. auto. 606 | rewrite oskipn_olength in IH. lia. 607 | + exists (n' + n). split. 608 | * apply omatch_star with (n := n). 609 | -- rewrite ofirstn_ofirstn. 610 | replace (min n (n' + n)) with n by lia. 611 | assumption. 612 | -- rewrite oskipn_ofirstn_comm. 613 | replace (n' + n - n) with n' by lia. 614 | assumption. 615 | assumption. lia. 616 | * rewrite <- oskipn_oskipn. assumption. 617 | Qed. 618 | 619 | 620 | Lemma omatch_star_r : forall r os n, 621 | outer_length_wf os -> 622 | match_oregex (OStar r) (ofirstn n os) 623 | -> match_oregex r (oskipn n os) 624 | -> match_oregex (OStar r) os. 625 | Proof. 626 | assert ( 627 | forall e os1, 628 | match_oregex e os1 629 | -> forall r os2, 630 | match_oregex r os2 631 | -> e = OStar r 632 | -> forall n os, 633 | outer_length_wf os 634 | -> n <= olength os 635 | -> os1 = ofirstn n os 636 | -> os2 = oskipn n os 637 | -> match_oregex e os 638 | ). 639 | { intros e os1 Hos1. 640 | induction Hos1; try (intros; discriminate). 641 | - intros. subst. 642 | rewrite ofirstn_olength in H. 643 | replace (min n (olength os0)) with n in H by lia. 644 | subst. rewrite oskipn0 in H0. 645 | apply omatch_star with (n := olength os0). 646 | + rewrite ofirstn_all2. 647 | inversion H1. subst. assumption. auto. 648 | auto. 649 | + rewrite oskipn_all2. 650 | apply omatch_star_eps. auto. 651 | auto. auto. 652 | - rename os into osA. rename n into nA. 653 | rename Hos1_1 into HosA1. rename Hos1_2 into HosA2. 654 | clear IHHos1_1. rename IHHos1_2 into IH. 655 | intros r' osB HosB Heq n os Hwf Hn HeqA HeqB. 656 | subst. 657 | specialize (IH r' (oskipn n os) HosB Heq (n - nA) (oskipn nA os)). 658 | specialize (IH (oskipn_outer_length_wf nA os Hwf)). 659 | rewrite oskipn_olength in IH. 660 | specialize (IH ltac:(lia)). 661 | pose proof (oskipn_ofirstn_comm n nA os Hwf). 662 | assert (nA > n \/ nA <= n) as HnA by lia. 663 | destruct HnA. 664 | { rewrite ofirstn_ofirstn in HosA1. 665 | replace (min nA n) with n in HosA1. 666 | inversion Heq. subst. 667 | apply omatch_star with (n := n). 668 | assumption. apply omatch_star with (n := olength (oskipn n os)). 669 | rewrite ofirstn_all2. assumption. 670 | apply oskipn_outer_length_wf. assumption. 671 | auto. rewrite oskipn_all2. 672 | apply omatch_star_eps. auto. 673 | apply oskipn_outer_length_wf. assumption. 674 | lia. lia. 675 | } 676 | specialize (H H0). 677 | specialize (IH H). 678 | rewrite oskipn_oskipn in IH. 679 | replace (n - nA + nA) with n in IH by lia. 680 | specialize (IH ltac:(reflexivity)). 681 | rewrite ofirstn_ofirstn in HosA1. 682 | replace (min nA n) with nA in HosA1 by lia. 683 | apply omatch_star with (n := nA); assumption. 684 | } 685 | intros. 686 | assert (n > olength os \/ n <= olength os) as Hn by lia. 687 | destruct Hn. 688 | { rewrite ofirstn_all2 in H1. 689 | auto. auto. lia. 690 | } 691 | apply H with (os1 := ofirstn n os) (os2 := oskipn n os) 692 | (r := r) (n := n); auto. 693 | Qed. 694 | 695 | Lemma oWildCard_match (os : ostring) (Hwf : outer_length_wf os) : 696 | match_oregex oWildCard os. 697 | Proof. 698 | revert Hwf. 699 | destruct os as (w, o). 700 | revert o. 701 | induction w. { 702 | (* when w is [] *) 703 | intros. 704 | now apply omatch_star_eps. 705 | } 706 | intros. 707 | destruct o as [ | o0 o']. { 708 | unfold outer_length_wf in Hwf. 709 | simpl in Hwf. lia. 710 | } 711 | destruct o' as [ | o1 o'']. { 712 | unfold outer_length_wf in Hwf. 713 | simpl in Hwf. lia. 714 | } 715 | apply omatch_star with (n := 1). 716 | - unfold ofirstn. simpl. 717 | eapply omatch_charclass; auto. 718 | simpl. reflexivity. 719 | - unfold oskipn. simpl. 720 | apply IHw. 721 | unfold outer_length_wf in Hwf |- *. 722 | simpl in Hwf |- *. 723 | lia. 724 | Qed. 725 | 726 | Lemma rPass_match (or : ORegex) (os : ostring) (Hwf : outer_length_wf os): 727 | match_oregex (rPass or) os 728 | <-> exists start, start <= olength os /\ match_oregex or (oskipn start os). 729 | Proof. 730 | unfold rPass. 731 | rewrite omatch_concat_iff; [ | assumption]. 732 | split; intros. 733 | - destruct H as [n [Hn [_ Hr]]]. 734 | exists n. tauto. 735 | - destruct H as [n [Hn Hr]]. 736 | exists n. repeat split. 737 | + lia. 738 | + assumption. 739 | + apply oWildCard_match. apply ofirstn_outer_length_wf. assumption. 740 | + assumption. 741 | Qed. 742 | 743 | Lemma omatch_empty_oreg_never : 744 | forall (os : ostring), 745 | ~ match_oregex empty_oreg os. 746 | Proof. 747 | intros ? H. 748 | inversion H; subst. 749 | discriminate. 750 | Qed. 751 | 752 | End OString. -------------------------------------------------------------------------------- /theories/OReverse.v: -------------------------------------------------------------------------------- 1 | (** 2 | 3 | This file is the analogue of [Reverse.v] for the type [ORegex]. It contains the following: 4 | 1. The definition of [oreverse] which reverses an [ORegex]. 5 | 2. The lemma [oreverse_match_iff] which describes the relationship between [r : ORegex] and [oreverse r : ORegex] with respect to [match_oregex]. 6 | 7 | *) 8 | 9 | Require Import Lia. 10 | Require Import Coq.Lists.List. 11 | 12 | Require Import ORegex. 13 | 14 | Section OReverse. 15 | 16 | Context {A : Type}. 17 | 18 | Fixpoint oreverse (r : @ORegex A) : ORegex := 19 | match r with 20 | | OEpsilon => OEpsilon 21 | | OCharClass c => OCharClass c 22 | | OUnion r1 r2 => OUnion (oreverse r1) (oreverse r2) 23 | | OConcat r1 r2 => OConcat (oreverse r2) (oreverse r1) 24 | | OStar r => OStar (oreverse r) 25 | | OQueryPos i => OQueryPos i 26 | | OQueryNeg i => OQueryNeg i 27 | end. 28 | 29 | Lemma oreverse_involutive : 30 | forall r, 31 | oreverse (oreverse r) = r. 32 | Proof. 33 | induction r; simpl; congruence. 34 | Qed. 35 | 36 | Lemma oreverse_match : 37 | forall r ostr, 38 | outer_length_wf ostr 39 | -> match_oregex r ostr 40 | -> match_oregex (oreverse r) (orev ostr). 41 | Proof. 42 | intros r ostr. 43 | destruct ostr as (w, o). 44 | unfold orev. simpl. 45 | revert w o. 46 | induction r; simpl; intros w o Hwf; intros. 47 | - inversion H. apply omatch_epsilon. 48 | unfold olength in *. simpl in *. 49 | now rewrite rev_length. 50 | - inversion H. simpl in H2. 51 | apply omatch_charclass with (a := a). 52 | + unfold olength in *. simpl in *. 53 | rewrite rev_length. auto. 54 | + simpl. unfold olength in H1. 55 | destruct w; 56 | [simpl in H1; lia 57 | | destruct w; 58 | [ | simpl in H1; lia]]. 59 | simpl in H2. inversion H2. subst. 60 | auto. 61 | + auto. 62 | - inversion H. subst. 63 | remember (ofirstn n (w, o)) as ostr1. 64 | remember (oskipn n (w, o)) as ostr2. 65 | (destruct ostr1 as (w1, o1); destruct ostr2 as (w2, o2)). 66 | apply IHr1 in H2. apply IHr2 in H4. 67 | replace (rev w1, rev o1) with (orev (w1, o1)) in H2 by reflexivity. 68 | replace (rev w2, rev o2) with (orev (w2, o2)) in H4 by reflexivity. 69 | rewrite Heqostr1 in H2. rewrite Heqostr2 in H4. 70 | assert (n <= olength (w, o) \/ n > olength (w, o)) as Hlen by lia. 71 | destruct Hlen. 72 | + assert (n = (olength (w, o) - (olength (w, o) - n))) by lia. 73 | rewrite H1 in H2. rewrite H1 in H4. 74 | rewrite <- ofirstn_orev in H4 by assumption. 75 | rewrite <- oskipn_orev in H2 by assumption. 76 | unfold olength in H2. simpl in H2. 77 | unfold olength in H4. simpl in H4. 78 | unfold orev in H2. unfold orev in H4. 79 | simpl in H2. simpl in H4. 80 | apply omatch_concat with (n := length w - n); assumption. 81 | + unfold olength in H0. simpl in H0. 82 | assert ((w1, o1) = (w, o)). 83 | { rewrite ofirstn_all2 in Heqostr1. 84 | auto. assumption. 85 | unfold olength. simpl. lia. 86 | } 87 | rewrite H1 in Heqostr1. 88 | rewrite <- Heqostr1 in H2. 89 | apply omatch_concat with (n := 0); auto. 90 | rewrite oskipn_all3 in H4. 91 | replace (olength (w, o)) with (olength (w, o) - 0) in H4 by lia. 92 | rewrite <- ofirstn_orev in H4 by auto. 93 | auto. auto. unfold olength. simpl. lia. 94 | + rewrite Heqostr2. 95 | apply oskipn_outer_length_wf. assumption. 96 | + rewrite Heqostr1. 97 | apply ofirstn_outer_length_wf. assumption. 98 | - inversion H; subst. 99 | + apply omatch_union_l. 100 | apply IHr1; assumption. 101 | + apply omatch_union_r. 102 | apply IHr2; assumption. 103 | - replace (OStar (oreverse r)) with (oreverse (OStar r)) by auto. 104 | remember (OStar r) as r'. remember (w, o) as ostr. 105 | replace (rev w, rev o) with (orev (w, o)) by auto. 106 | rewrite <- Heqostr. clear Heqostr. clear w o. 107 | revert Hwf. induction H; try discriminate. 108 | + inversion Heqr'. subst. intros. 109 | apply omatch_star_eps. 110 | unfold olength in *. simpl in *. 111 | rewrite rev_length. auto. 112 | + inversion Heqr'. subst. intros. 113 | simpl. 114 | assert (olength os = 0 \/ olength os > 0) as Hoslen by lia. 115 | destruct Hoslen; [ 116 | apply omatch_star_eps; 117 | destruct os as (w, o); 118 | unfold olength in *; 119 | simpl in *; 120 | rewrite rev_length; auto | ]. 121 | assert (n <= olength os \/ n > olength os) as Hn by lia. 122 | destruct Hn. 123 | * specialize (IHmatch_oregex2 ltac:(reflexivity) 124 | (oskipn_outer_length_wf n os Hwf)). 125 | simpl in IHmatch_oregex2. 126 | clear IHmatch_oregex1. 127 | apply omatch_star_r with (n := (olength os - n)). 128 | now apply orev_outer_length_wf. 129 | -- rewrite ofirstn_orev by assumption. 130 | replace (olength os - (olength os - n)) with n by lia. 131 | assumption. 132 | -- rewrite oskipn_orev by assumption. 133 | replace (olength os - (olength os - n)) with n by lia. 134 | destruct (ofirstn n os) as (w, o) eqn: He. 135 | unfold orev. simpl. apply IHr. 136 | rewrite <- He. 137 | apply ofirstn_outer_length_wf. assumption. 138 | assumption. 139 | * rewrite ofirstn_all2 in H. 140 | apply omatch_star with (n := olength os). 141 | ++ rewrite ofirstn_all2. 142 | destruct os as (w, o). 143 | unfold orev. simpl. apply IHr. 144 | auto. auto. 145 | now apply orev_outer_length_wf. 146 | pose proof (orev_olength os). lia. 147 | ++ apply omatch_star_eps. 148 | rewrite oskipn_all2. unfold olength. auto. 149 | now apply orev_outer_length_wf. 150 | pose proof (orev_olength os). lia. 151 | ++ assumption. 152 | ++ lia. 153 | - inversion H. subst. 154 | unfold olength in H1. simpl in H1. 155 | apply omatch_query_pos with (v := v). 156 | + unfold olength. simpl. 157 | now rewrite rev_length. 158 | + unfold outer_length_wf in Hwf. 159 | simpl in *. 160 | destruct o; 161 | [ simpl in Hwf; lia | 162 | destruct o; 163 | [| simpl in Hwf; lia]]. 164 | auto. 165 | + auto. 166 | - inversion H. subst. 167 | unfold olength in H1. simpl in H1. 168 | apply omatch_query_neg with (v := v). 169 | + unfold olength. simpl. 170 | now rewrite rev_length. 171 | + unfold outer_length_wf in Hwf. 172 | simpl in *. 173 | destruct o; 174 | [ simpl in Hwf; lia | 175 | destruct o; 176 | [| simpl in Hwf; lia]]. 177 | auto. 178 | + auto. 179 | Qed. 180 | 181 | Lemma oreverse_match_iff : 182 | forall r ostr, 183 | outer_length_wf ostr 184 | -> match_oregex r ostr 185 | <-> match_oregex (oreverse r) (orev ostr). 186 | Proof. 187 | intros. split. 188 | - intros. apply oreverse_match; assumption. 189 | - intros. rewrite <- oreverse_involutive with (r := r). 190 | rewrite <- orev_involutive with (s := ostr). 191 | apply oreverse_match. apply orev_outer_length_wf. assumption. 192 | assumption. 193 | Qed. 194 | 195 | 196 | End OReverse. -------------------------------------------------------------------------------- /theories/Reverse.v: -------------------------------------------------------------------------------- 1 | (** 2 | 3 | This file contains the following: 4 | 1. The [reverse : LRegex -> LRegex] function, which reverses a regular expression. 5 | 2. The [reverse_match] lemma, which describes the relationship between [r : LRegex] and [reverse r] in terms of [match_regex]. 6 | 7 | *) 8 | 9 | Require Import Lia. 10 | Require Import Coq.Lists.List. 11 | 12 | 13 | Require Import LRegex. 14 | Require Import Equations. 15 | 16 | Section Reverse. 17 | 18 | Context {A : Type}. 19 | 20 | Fixpoint reverse (r : @LRegex A) : LRegex := 21 | match r with 22 | | Epsilon => Epsilon 23 | | CharClass c => CharClass c 24 | | Concat r1 r2 => Concat (reverse r2) (reverse r1) 25 | | Union r1 r2 => Union (reverse r1) (reverse r2) 26 | | Star r => Star (reverse r) 27 | | LookAhead r => LookBehind (reverse r) 28 | | LookBehind r => LookAhead (reverse r) 29 | | NegLookAhead r => NegLookBehind (reverse r) 30 | | NegLookBehind r => NegLookAhead (reverse r) 31 | end. 32 | 33 | Lemma reverse_involutive (r : LRegex) : 34 | reverse (reverse r) = r. 35 | Proof. 36 | induction r; simpl; congruence. 37 | Qed. 38 | 39 | Lemma reverse_match_aux (r : LRegex) : 40 | (forall w start delta, 41 | start <= length w 42 | -> match_regex r w start delta 43 | -> match_regex (reverse r) (rev w) (length w - (start + delta)) delta) 44 | /\ (forall w start delta, 45 | start <= length w 46 | -> match_regex (reverse r) w start delta 47 | -> match_regex r (rev w) (length w - (start + delta)) delta). 48 | Proof. 49 | induction r; simpl reverse; split; intros w start delta. 50 | - repeat rewrite match_eps_iff. 51 | auto. 52 | - repeat rewrite match_eps_iff. 53 | auto. 54 | - repeat rewrite match_class_iff. 55 | intro Hlen. 56 | intros [a [ba [Ha Hd]]]. 57 | exists a. repeat split; auto. 58 | assert (start < length w). 59 | { apply nth_error_Some. unfold not. congruence. } 60 | apply nth_error_nth with (d := a) in Ha. 61 | rewrite <- Ha. 62 | rewrite nth_error_nth' with (d := a). 63 | + rewrite rev_nth by lia. 64 | now replace (length w - S (length w - (start + delta))) with start by lia. 65 | + rewrite rev_length. lia. 66 | - repeat rewrite match_class_iff. 67 | intro Hlen. 68 | intros [a [ba [Ha Hd]]]. 69 | exists a. repeat split; auto. 70 | assert (start < length w). 71 | { apply nth_error_Some. unfold not. congruence. } 72 | apply nth_error_nth with (d := a) in Ha. 73 | rewrite <- Ha. 74 | rewrite nth_error_nth' with (d := a). 75 | + rewrite rev_nth by lia. 76 | now replace (length w - S (length w - (start + delta))) with start by lia. 77 | + rewrite rev_length. lia. 78 | - intro Hlen. intros. 79 | rewrite match_concat_iff in H. 80 | destruct H as [d1 [d2 [H1 [H2 H3]]]]. 81 | rewrite match_concat_iff. 82 | pose proof (match_length _ _ _ _ H1) as Hlen1. 83 | pose proof (match_length _ _ _ _ H2) as Hlen2. 84 | apply IHr1 in H1. apply IHr2 in H2. 85 | exists d2, d1. 86 | subst. 87 | repeat split; [| | lia]. 88 | + replace (start + (d1 + d2)) with (start + d1 + d2) by lia. apply H2. 89 | + replace (length w - (start + (d1 + d2)) + d2) with (length w - (start + d1)) by lia. 90 | apply H1. 91 | + lia. 92 | + assumption. 93 | - intro Hlen. intros. 94 | rewrite match_concat_iff in H. 95 | destruct H as [d1 [d2 [H1 [H2 H3]]]]. 96 | rewrite match_concat_iff. 97 | pose proof (match_length _ _ _ _ H1) as Hlen1. 98 | pose proof (match_length _ _ _ _ H2) as Hlen2. 99 | apply IHr2 in H1. apply IHr1 in H2. 100 | exists d2, d1. 101 | subst. 102 | repeat split; [| | lia]. 103 | + replace (start + (d1 + d2)) with (start + d1 + d2) by lia. apply H2. 104 | + replace (length w - (start + (d1 + d2)) + d2) with (length w - (start + d1)) by lia. 105 | apply H1. 106 | + lia. 107 | + assumption. 108 | - intro Hlen. intros. 109 | rewrite match_union_iff in H. 110 | destruct H as [H1 | H2]; rewrite match_union_iff. 111 | + apply IHr1 in H1. auto. lia. 112 | + apply IHr2 in H2. auto. lia. 113 | - intro Hlen. intros. 114 | rewrite match_union_iff in H. 115 | destruct H as [H1 | H2]; rewrite match_union_iff. 116 | + apply IHr1 in H1. auto. lia. 117 | + apply IHr2 in H2. auto. lia. 118 | - intro Hlen. remember (r *) as e. 119 | intro. 120 | induction H; try discriminate. 121 | + apply match_star_eps. 122 | + 123 | pose proof (match_length _ _ _ _ H) as Hlen1. 124 | pose proof (match_length _ _ _ _ H0) as Hlen2. 125 | inversion Heqe. subst. 126 | specialize (IHmatch_regex2 ltac:(lia) ltac:(reflexivity)). 127 | clear IHmatch_regex1. 128 | apply IHr in H. 129 | replace (d1 + d2) with (d2 + d1) at 2 by lia. 130 | apply match_star_r. 131 | replace (start + (d1 + d2)) with (start + d1 + d2) by lia. 132 | apply IHmatch_regex2. 133 | replace (length w - (start + (d1 + d2)) + d2) 134 | with (length w - (start + d1)) by lia. 135 | apply H. lia. 136 | - intro Hlen. 137 | replace ((reverse r) *) with ((reverse (r *))) by auto. 138 | remember (reverse (r *)) as e. intro H. 139 | induction H; try discriminate. 140 | + apply match_star_eps. 141 | + subst. 142 | pose proof (match_length _ _ _ _ H) as Hlen1. 143 | pose proof (match_length _ _ _ _ H0) as Hlen2. 144 | inversion Heqe. subst. 145 | specialize (IHmatch_regex2 ltac:(lia) ltac:(reflexivity)). 146 | clear IHmatch_regex1. 147 | apply IHr in H. 148 | replace (d1 + d2) with (d2 + d1) at 2 by lia. 149 | apply match_star_r. 150 | replace (start + (d1 + d2)) with (start + d1 + d2) by lia. 151 | apply IHmatch_regex2. 152 | replace (length w - (start + (d1 + d2)) + d2) with (length w - (start + d1)) by lia. 153 | apply H. lia. 154 | - rewrite match_lookahead_iff. rewrite match_lookbehind_iff. 155 | intro Hlen. intros [H1 H2]. 156 | apply IHr in H1. 157 | subst. replace ((length w - (start + (length w - start))) ) with 0 in H1 by lia. 158 | replace (start + 0) with start by lia. 159 | auto. lia. 160 | - rewrite match_lookahead_iff. rewrite match_lookbehind_iff. 161 | intro Hlen. intros [H1 H2]. 162 | apply IHr in H1. 163 | subst. simpl in H1. 164 | replace (start + 0) with start by lia. 165 | rewrite rev_length. 166 | pose proof (match_length _ _ _ _ H1) as Hlen1. 167 | rewrite rev_length in Hlen1. 168 | replace (length w - (length w - start)) with start by lia. 169 | auto. lia. 170 | - rewrite match_lookbehind_iff. rewrite match_lookahead_iff. 171 | intro Hlen. intros [H1 H2]. 172 | pose proof (match_length _ _ _ _ H1) as Hlen1. 173 | apply IHr in H1. 174 | subst. replace ((length w - (start + (length w - start))) ) with 0 in H1 by lia. 175 | replace (start + 0) with start by lia. simpl in H1. 176 | rewrite rev_length. 177 | replace (length w - (length w - start)) with start by lia. 178 | auto. lia. 179 | - rewrite match_lookbehind_iff. rewrite match_lookahead_iff. 180 | intro Hlen. intros [H1 H2]. 181 | apply IHr in H1. 182 | subst. 183 | replace ((length w - (start + (length w - start))) ) with 0 in H1 by lia. 184 | replace (start + 0) with start by lia. auto. lia. 185 | - rewrite match_neglookahead_iff. rewrite match_neglookbehind_iff. 186 | repeat rewrite <- match_not_match. 187 | intro Hlen. intros [H1 H2]. 188 | split; [ | auto]. 189 | unfold not. intros H3. 190 | apply IHr in H3. 191 | rewrite rev_involutive in H3. rewrite rev_length in H3. 192 | simpl in H3. subst delta. 193 | replace (start + 0) with start in H3 by lia. 194 | replace (length w - (length w - start)) with start in H3 by lia. 195 | auto. rewrite rev_length. lia. 196 | - rewrite match_neglookahead_iff. rewrite match_neglookbehind_iff. 197 | repeat rewrite <- match_not_match. 198 | intro Hlen. intros [H1 H2]. 199 | split; [ | auto]. 200 | unfold not. intros H3. 201 | apply IHr in H3. 202 | rewrite rev_involutive in H3. rewrite rev_length in H3. 203 | simpl in H3. subst delta. 204 | replace (start + 0) with start in H3 by lia. 205 | replace (length w - (length w - start)) with start in H3 by lia. 206 | replace (length w - (length w - start + start)) with 0 in H3 by lia. 207 | auto. rewrite rev_length. lia. 208 | - rewrite match_neglookbehind_iff. rewrite match_neglookahead_iff. 209 | repeat rewrite <- match_not_match. 210 | intro Hlen. intros [H1 H2]. 211 | split; [ | auto]. 212 | unfold not. intros H3. 213 | apply IHr in H3. 214 | rewrite rev_involutive in H3. rewrite rev_length in H3. 215 | simpl in H3. subst delta. 216 | replace (start + 0) with start in H3 by lia. 217 | replace (length w - (length w - start)) with start in H3 by lia. 218 | replace (length w - (length w - start + start)) with 0 in H3 by lia. 219 | auto. rewrite rev_length. lia. 220 | - rewrite match_neglookbehind_iff. rewrite match_neglookahead_iff. 221 | repeat rewrite <- match_not_match. 222 | intro Hlen. intros [H1 H2]. 223 | split; [ | auto]. 224 | unfold not. intros H3. 225 | apply IHr in H3. 226 | rewrite rev_involutive in H3. rewrite rev_length in H3. 227 | simpl in H3. subst delta. 228 | replace (start + 0) with start in H3 by lia. 229 | replace (length w - (length w - start)) with start in H3 by lia. 230 | auto. rewrite rev_length. lia. 231 | Qed. 232 | 233 | Lemma reverse_match1 : 234 | forall r w start delta, 235 | start <= length w 236 | -> match_regex r w start delta 237 | -> match_regex (reverse r) (rev w) (length w - (start + delta)) delta. 238 | Proof. 239 | pose proof reverse_match_aux. 240 | firstorder. 241 | Qed. 242 | 243 | Lemma reverse_match : 244 | forall r w start delta, 245 | start <= length w 246 | -> start + delta <= length w 247 | -> match_regex r w start delta 248 | <-> match_regex (reverse r) (rev w) (length w - (start + delta)) delta. 249 | Proof. 250 | intros r w start delta HX HY. split; intros H0. 251 | - now apply reverse_match1 in H0. 252 | - apply reverse_match1 in H0. 253 | rewrite rev_involutive in H0. 254 | rewrite rev_length in H0. 255 | rewrite reverse_involutive in H0. 256 | replace (length w - (length w - (start + delta) + delta)) 257 | with start in H0 by lia. 258 | auto. rewrite rev_length. lia. 259 | Qed. 260 | 261 | End Reverse. -------------------------------------------------------------------------------- /theories/_CoqProject: -------------------------------------------------------------------------------- 1 | -R . LRegex 2 | ./ListLemmas.v 3 | ./LRegex.v 4 | ./Equations.v 5 | ./ORegex.v 6 | ./Reverse.v 7 | ./OReverse.v 8 | ./Abstraction.v 9 | ./OMatcher.v 10 | ./CMatcher.v 11 | ./Layerwise.v 12 | ./Extract.v --------------------------------------------------------------------------------