├── .gitignore ├── .travis.yml ├── CHANGES.md ├── LICENSE.md ├── Makefile ├── Readme.md ├── bin ├── dune └── gen_re.ml ├── dune-project ├── jbuild-workspace.dev ├── lib ├── dune ├── langgen.ml ├── parsing.ml ├── regenerate.ml ├── regenerate.mli ├── regex.ml └── word.ml ├── regenerate.descr ├── regenerate.opam ├── segments ├── Heap.ml ├── LazyList.ml ├── Segments.ml ├── Sigs.ml ├── StrictSet.ml ├── ThunkList.ml ├── ThunkListMemo.ml ├── Trie.ml ├── Trie.mli └── dune ├── test ├── dune ├── re.gnuplot ├── re │ ├── dune │ └── test_re.ml ├── running_profile.ml └── test.ml └── web ├── dune ├── index.html ├── regenerate.css ├── regenerate_web.ml └── static ├── fork.min.css └── gpce18.pdf /.gitignore: -------------------------------------------------------------------------------- 1 | _build/ 2 | .merlin 3 | *.install 4 | *.bc.js 5 | .gh-pages 6 | data/*.csv 7 | data/*.svg 8 | data/*.png 9 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: c 2 | install: wget https://raw.githubusercontent.com/ocaml/ocaml-ci-scripts/master/.travis-docker.sh 3 | script: bash -ex .travis-docker.sh 4 | services: 5 | - docker 6 | env: 7 | global: 8 | - PINS="regenerate:." 9 | - DISTRO="debian-stable" 10 | matrix: 11 | - PACKAGE="regenerate" OCAML_VERSION="4.03.0" 12 | - PACKAGE="regenerate" OCAML_VERSION="4.04.2" 13 | - PACKAGE="regenerate" OCAML_VERSION="4.07.1" 14 | - PACKAGE="regenerate" OCAML_VERSION="4.08.1" 15 | - PACKAGE="regenerate" OCAML_VERSION="4.09.0" 16 | -------------------------------------------------------------------------------- /CHANGES.md: -------------------------------------------------------------------------------- 1 | # 0.2 2 | 3 | * Move to dune 4 | * Use Iter instead of Sequence 5 | 6 | # 0.1 7 | 8 | And so it begins. 9 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | Copyright (c) 2016 year, Gabriel Radanne 2 | 3 | Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies. 4 | 5 | THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 6 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | default: all 2 | 3 | prebuild: 4 | @rm -f web/regenerate_web.bc.js 5 | 6 | web: prebuild 7 | dune build web/regenerate_web.bc.js 8 | @cp _build/default/web/regenerate_web.bc.js web/ 9 | 10 | all: prebuild 11 | dune build @install 12 | 13 | test: prebuild 14 | dune runtest --profile release 15 | 16 | clean: prebuild 17 | dune clean 18 | 19 | doc: prebuild 20 | dune build @doc 21 | 22 | NAME=regenerate 23 | DOCDIR=.gh-pages 24 | 25 | $(DOCDIR)/.git: 26 | mkdir -p $(DOCDIR) 27 | cd $(DOCDIR) && (\ 28 | git clone -b gh-pages git@github.com:regex-Generate/$(NAME).git . \ 29 | ) 30 | 31 | gh-pages-index: $(DOCDIR)/.git web 32 | cp -r web/*.html web/*.js web/*.css web/static "$(DOCDIR)/" 33 | 34 | gh-pages: $(DOCDIR)/.git gh-pages-index doc 35 | cp -r _build/default/_doc/_html/* $(DOCDIR)/doc/dev/ 36 | git -C $(DOCDIR) add --all 37 | git -C $(DOCDIR) commit -a -m "gh-page updates" 38 | git -C $(DOCDIR) push origin gh-pages 39 | 40 | 41 | .PHONY: all test clean web prebuild gh-pages doc 42 | -------------------------------------------------------------------------------- /Readme.md: -------------------------------------------------------------------------------- 1 | [Regenerate][web] [![Build Status](https://travis-ci.org/regex-generate/regenerate.svg?branch=master)](https://travis-ci.org/regex-generate/regenerate) [![Opam](https://img.shields.io/badge/opam-0.2-green.svg)](https://opam.ocaml.org/packages/regenerate/) [![doc](https://img.shields.io/badge/doc-online-green.svg)][docdev] 2 | ---------- 3 | 4 | Regenerate is a tool to generate test-cases for regular expression engines. 5 | 6 | Regenerate takes a [*regular* expression][regex] and generates strings that match it. 7 | It handles most posix extended regular expressions along with 8 | complement (`~a`) and intersection (`a&b`). 9 | Since it handles complement, it can also generate strings that 10 | *don't* match a given regular expression. 11 | 12 | Regenerate is both a tool to generate samples and a library to define test harnesses. There is also an [online demo][web] of the tool. 13 | 14 | ## Command line tool 15 | 16 | The command line tool can generate example strings matching a given regex. it takes 17 | a regex using the POSIX extended regular expression syntax (or [ERE][] for short). For example, here 18 | is a set of samples that are matched by the regex `(1(01*0)*1|0)*` on the alphabet composed of `0` and `1`: 19 | 20 | ``` 21 | % regenerate gen --sample 5 --alphabet "01" "(1(01*0)*1|0)*" 22 | 01001 23 | 11000 24 | 000011 25 | 001001 26 | 001100 27 | 011000 28 | 111111 29 | 0000110 30 | 0011000 31 | 0101101 32 | 1011101 33 | 1101001 34 | 00010101 35 | 00100100 36 | ``` 37 | 38 | Please consult the help for more details, or play with the [online demo][web]. 39 | 40 | ## Library 41 | 42 | The `regenerate` library allow to easily define test harnesses for regular expression engines. It contains utilities to randomly generate regular expressions and 43 | associated positive and negative samples. The main entry point of the library 44 | is the `Regenerate.arbitrary` function which exposes the sample generation as a 45 | [qcheck][] generator. 46 | 47 | See [`test_re.ml`](test/re/test_re.ml) for an example test harness for 48 | [`re`](https://github.com/ocaml/ocaml-re). 49 | You can also find (wip) [documentation for the dev version of the API][docdev]. 50 | 51 | ## Website 52 | 53 | Code for the online demo is hosted in the [web/](web) directory. It uses fairly 54 | simple [`js_of_ocaml`][jsoo] code. 55 | 56 | [regex]: https://en.wikipedia.org/wiki/Regular_expression 57 | [web]: https://regex-generate.github.io/regenerate/ 58 | [ERE]: https://en.wikipedia.org/wiki/Regular_expression#Standards 59 | [jsoo]: http://ocsigen.org/js_of_ocaml 60 | [docdev]: https://regex-generate.github.io/regenerate/doc/dev/regenerate/Regenerate/ 61 | [qcheck]: https://github.com/c-cube/qcheck/ 62 | -------------------------------------------------------------------------------- /bin/dune: -------------------------------------------------------------------------------- 1 | (executable 2 | (name gen_re) 3 | (public_name regenerate) 4 | (libraries mtime.clock.os regenerate fmt.tty cmdliner) 5 | (ocamlopt_flags :standard "-O3" "-inlining-report") 6 | ) 7 | -------------------------------------------------------------------------------- /bin/gen_re.ml: -------------------------------------------------------------------------------- 1 | open Regenerate 2 | open Cmdliner 3 | 4 | let () = Random.self_init () 5 | 6 | module W = Word.String 7 | 8 | type segment_impl = 9 | | ThunkList 10 | | ThunkListMemo 11 | | LazyList 12 | | StrictSet 13 | | Trie 14 | 15 | module type ARG = sig 16 | include Segments.OrderedMonoid 17 | include Segments.Trie.WORD with type t := t 18 | end 19 | module type S = 20 | functor (W : ARG) -> (Segments.S with type elt = W.t) 21 | 22 | let get_impl_mod : segment_impl -> (module S) = let open Segments in function 23 | | ThunkList -> (module ThunkList) 24 | | ThunkListMemo -> (module ThunkListMemo) 25 | | LazyList -> (module LazyList) 26 | | StrictSet -> (module StrictSet) 27 | | Trie -> (module Trie.Make) 28 | 29 | type conf = 30 | | All 31 | | Sample of { skip : int ; length : int option } 32 | | Take of int 33 | 34 | let[@inline] make_impl ~impl = 35 | let module M = (val get_impl_mod impl) in 36 | let module S = M(W) in 37 | let module A = Regenerate.Make (W) (S) in 38 | fun[@inline] ~sigma -> 39 | let sigma = S.of_list @@ List.map W.singleton @@ CCString.to_list sigma in 40 | let module Sigma = struct type t = S.t let sigma = sigma end in 41 | let module A = A (Sigma) in 42 | fun conf re -> 43 | let lang = A.gen re in 44 | match conf with 45 | | All -> A.flatten lang 46 | | Sample { skip ; length } -> 47 | CCFun.(%) ignore @@ A.sample ~skip ?n:length lang 48 | | Take n -> Iter.take n @@ A.flatten lang 49 | 50 | let tl = make_impl ~impl:ThunkList ~sigma:"ab" 51 | let tlm = make_impl ~impl:ThunkListMemo ~sigma:"ab" 52 | let ll = make_impl ~impl:LazyList ~sigma:"ab" 53 | let set = make_impl ~impl:StrictSet ~sigma:"ab" 54 | let trie = make_impl ~impl:Trie ~sigma:"ab" 55 | 56 | let get_impl ~impl ~sigma = if sigma <> "ab" then 57 | make_impl ~impl ~sigma 58 | else match impl with 59 | | ThunkList -> tl 60 | | ThunkListMemo -> tlm 61 | | LazyList -> ll 62 | | StrictSet -> set 63 | | Trie -> trie 64 | 65 | let backend = 66 | let doc = Arg.info ~docv:"IMPLEM" ~doc:"Implementation to use." 67 | ["i";"implementation"] 68 | in 69 | let c = Arg.enum [ 70 | "ThunkList", ThunkList ; 71 | "ThunkListMemo", ThunkListMemo ; 72 | "LazyList", LazyList ; 73 | "StrictSet", StrictSet ; 74 | "Trie", Trie ; 75 | ] 76 | in 77 | Arg.(value & opt c ThunkList & doc) 78 | 79 | 80 | let re_arg = 81 | let err_msg = function 82 | | `Parse_error -> `Msg "Incorrect regular expression" 83 | | `Not_supported -> `Msg "Unsupported syntax" 84 | in 85 | let printer fmt _ = Fmt.pf fmt "" in 86 | let parser s = match parse s with Ok x -> Ok x | Error e -> Error (err_msg e) in 87 | let reconv = Arg.conv ~docv:"REGEX" (parser, printer) in 88 | let doc = 89 | Arg.info ~docv:"REGEX" ~doc:"Regular expression following Posix's Extended Regular Expression syntax." 90 | [] 91 | in 92 | Arg.(required & pos 0 (some reconv) None & doc) 93 | 94 | let bound = 95 | let doc = Arg.info ~docv:"BOUND" ~doc:"Limit the number of samples." 96 | ["b";"bound"] 97 | in 98 | Arg.(value & opt (some int) None & doc) 99 | 100 | let shutter = 101 | let doc = Arg.info ~docv:"SKIP" 102 | ~doc:"Interval for stuttering." 103 | ["s";"shutter"] 104 | in 105 | Arg.(value & opt int 20 & doc) 106 | 107 | let skip = 108 | let doc = Arg.info ~docv:"SAMPLE" 109 | ~doc:"Average sampling interval." 110 | ["s";"sample"] 111 | in 112 | Arg.(value & opt (some int) None & doc) 113 | 114 | 115 | let time_limit = 116 | let doc = Arg.info ~docv:"LIMIT" ~doc:"Time limit for stuttering in seconds." 117 | ["l";"limit"] 118 | in 119 | Arg.(value & opt int64 5L & doc) 120 | 121 | let sigma = 122 | let doc = Arg.info ~docv:"ALPHABET" ~doc:"Alphabet used by the regular expression" 123 | ["a";"alphabet"] 124 | in 125 | let default = CCString.of_list @@ CCOpt.get_exn @@ Regex.enumerate ' ' '~' in 126 | Arg.(value & opt string default & doc) 127 | 128 | let setup ~impl ~sigma re = 129 | Fmt_tty.setup_std_outputs (); 130 | get_impl ~impl ~sigma re 131 | 132 | let print_all impl sigma re length skip = 133 | let conf = match length, skip with 134 | | Some n, None -> Take n 135 | | None, None -> All 136 | | _, Some skip -> Sample { skip ; length } 137 | in 138 | setup ~impl ~sigma conf re 139 | |> Fmt.pr "%a@." (CCFormat.seq ~sep:(Fmt.unit "@.") W.pp) 140 | 141 | let count impl sigma re n = 142 | let n = CCOpt.get_or ~default:1000 n in 143 | let c = Mtime_clock.counter () in 144 | let i = 145 | setup ~impl ~sigma (Take n) re 146 | |> Iter.length 147 | in 148 | Fmt.pr "Max count: %i@.Actual Count: %i@.Time: %a@." n i 149 | Mtime.Span.pp (Mtime_clock.count c) 150 | 151 | let measure_until ~limit ~interval oc lang = 152 | let c = Mtime_clock.counter () in 153 | let r = ref 0 in 154 | let fmt = Format.formatter_of_out_channel oc in 155 | let output i s = Fmt.pf fmt "%i\t%f@." i (Mtime.Span.to_s s) in 156 | let f _ = 157 | incr r ; 158 | let i = !r in 159 | if i mod interval = 0 then begin 160 | let t = Mtime_clock.count c in 161 | output i t ; 162 | if Mtime.Span.compare limit t < 0 163 | then raise Exit 164 | else () 165 | end 166 | in 167 | (try Iter.iter f lang with Exit -> ()); 168 | close_out oc ; 169 | () 170 | 171 | let running_profile impl re sigma stutter limit = 172 | let oc = stdout in 173 | setup ~impl ~sigma All re 174 | |> measure_until 175 | ~limit:(Mtime.Span.of_uint64_ns (Int64.mul limit 1_000_000_000L)) 176 | ~interval:stutter 177 | oc 178 | 179 | let gen_cmd = 180 | let info = 181 | Term.info "generate" 182 | ~doc:"Generate all strings matching a given regular expression." 183 | in 184 | let t = Term.(const print_all $ backend $ sigma $ re_arg $ bound $ skip) in 185 | (t, info) 186 | 187 | let count_cmd = 188 | let info = 189 | Term.info "count" 190 | ~doc:"Time language generation up to a certain number of strings." 191 | in 192 | let t = Term.(const count $ backend $ sigma $ re_arg $ bound) in 193 | (t, info) 194 | 195 | let profile_cmd = 196 | let info = 197 | Term.info "profile" 198 | ~doc:"Profile language generation for the given regular expression." 199 | in 200 | let t = Term.(const running_profile $ backend $ re_arg $ sigma $ shutter $ time_limit) in 201 | (t, info) 202 | 203 | let cmds = [ profile_cmd ; count_cmd ; gen_cmd ] 204 | let default_cmd = 205 | let doc = "Language generation for regular expressions." in 206 | let info = Term.info "regenerate" ~doc in 207 | let t = Term.(ret (const @@ `Help (`Pager, None))) in 208 | (t, info) 209 | 210 | let () = Term.exit @@ Term.eval_choice default_cmd cmds 211 | -------------------------------------------------------------------------------- /dune-project: -------------------------------------------------------------------------------- 1 | (lang dune 1.2) 2 | (name regenerate) 3 | -------------------------------------------------------------------------------- /jbuild-workspace.dev: -------------------------------------------------------------------------------- 1 | ;; Install the following opam switches, copy this file as 2 | ;; jbuild-workspace and run: 3 | ;; 4 | ;; $ jbuilder build @install 5 | 6 | (context ((switch 4.06.1))) 7 | (context ((switch 4.06.1+flambda))) 8 | -------------------------------------------------------------------------------- /lib/dune: -------------------------------------------------------------------------------- 1 | (library 2 | (name regenerate) 3 | (public_name regenerate) 4 | (libraries segments fmt iter qcheck containers) 5 | (ocamlopt_flags :standard "-O3" "-inlining-report") 6 | ) 7 | -------------------------------------------------------------------------------- /lib/langgen.ml: -------------------------------------------------------------------------------- 1 | module type SIGMA = sig 2 | type t 3 | val sigma : t 4 | end 5 | 6 | module[@inline always] Make 7 | (Word : Word.S) 8 | (Segment : Segments.S with type elt = Word.t) 9 | (Sigma : SIGMA with type t = Segment.t) 10 | = struct 11 | 12 | module Word = Word 13 | module Segment = Segment 14 | 15 | (** Spline of a language, a cascade-like thunk list with multiple nils. *) 16 | type node = 17 | | Nothing 18 | | Everything 19 | | Cons of Segment.t * lang 20 | and lang = unit -> node 21 | 22 | (** Utilities *) 23 | 24 | let segmentEpsilon = Segment.return Word.empty 25 | let nothing () = Nothing 26 | let everything () = Everything 27 | let (@:) h t () = Cons (h, t) 28 | 29 | let langEpsilon = segmentEpsilon @: nothing 30 | 31 | module IMap = struct 32 | include CCMap.Make(CCInt) 33 | let save k s m = 34 | if Segment.is_empty s then m 35 | else add k (Segment.memoize s) m 36 | end 37 | 38 | (** Precomputed full language. Used to replace "Everything" when need *) 39 | module Sigma_star = struct 40 | type t = (Segment.t, CCVector.rw) CCVector.t 41 | 42 | let v : t = CCVector.make 1 segmentEpsilon 43 | 44 | let rec complete_from_to i j = 45 | if i > j then () 46 | else 47 | let s = Segment.append Sigma.sigma (CCVector.get v (i-1)) in 48 | CCVector.push v @@ Segment.memoize s; 49 | complete_from_to (i+1) j 50 | 51 | let get i = 52 | assert (i >= 0); 53 | let l = CCVector.size v in 54 | if i < l then CCVector.get v i 55 | else begin 56 | CCVector.ensure_with ~init:Segment.empty v (i+1); 57 | complete_from_to l i ; 58 | CCVector.get v i 59 | end 60 | 61 | let rec iter n k = 62 | k (get n) ; 63 | iter (n+1) k 64 | end 65 | 66 | let pp_item = 67 | Fmt.parens @@ 68 | Fmt.hbox @@ 69 | Fmt.iter ~sep:(Fmt.unit ", ") (CCFun.flip Segment.to_iter) Word.pp 70 | let pp fmt (l : lang) = 71 | let pp_sep = Fmt.unit "@." in 72 | let rec pp fmt l = 73 | pp_sep fmt (); 74 | match l() with 75 | | Nothing -> Fmt.pf fmt "(Nothing)" 76 | | Everything -> Fmt.pf fmt "(Everything)" 77 | | Cons (x,l') -> 78 | pp_item fmt x ; pp fmt l' 79 | in 80 | pp fmt l 81 | 82 | let of_list l = 83 | let rec aux n l () = match l with 84 | | [] -> nothing () 85 | | _ -> 86 | let x, rest = CCList.partition (fun s -> Word.length s = n) l in 87 | Cons (Segment.of_list x, aux (n+1) rest) 88 | in aux 0 l 89 | 90 | (** Classic operations *) 91 | 92 | let rec union s1 s2 () = match s1(), s2() with 93 | | Everything, _ | _, Everything -> Everything 94 | | Nothing, x | x, Nothing -> x 95 | | Cons (x1, next1), Cons (x2, next2) -> 96 | Cons (Segment.union x1 x2, union next1 next2) 97 | 98 | let rec inter s1 s2 () = match s1(), s2() with 99 | | Everything, x | x, Everything -> x 100 | | Nothing, _ | _, Nothing -> Nothing 101 | | Cons (x1, next1), Cons (x2, next2) -> 102 | Cons (Segment.inter x1 x2, inter next1 next2) 103 | 104 | let rec difference_aux i s1 s2 () = match s1(), s2() with 105 | | Nothing, _ -> Nothing 106 | | _, Everything -> Nothing 107 | | x, Nothing -> x 108 | | Everything, Cons (x2, next2) -> 109 | let x1 = Sigma_star.get i and next1 = everything in 110 | Cons (Segment.diff x1 x2, difference_aux (i+1) next1 next2) 111 | | Cons (x1, next1), Cons (x2, next2) -> 112 | Cons (Segment.diff x1 x2, difference_aux (i+1) next1 next2) 113 | 114 | let difference = difference_aux 0 115 | let compl = difference everything 116 | 117 | (** Concatenation *) 118 | 119 | (** Invariants for each language: 120 | - [nbSeg = List.length indices] 121 | - After [explode_head], [CCVector.size vec >= n] 122 | - if [bound = Some n] then [n >= nbSeg] and [seqₙ = Nothing]. 123 | *) 124 | 125 | let[@inline] explode_head vec (seq, bound, nbSeg, indices) n = 126 | match bound with 127 | | Some _ -> nothing, bound, nbSeg, indices 128 | | None -> match seq() with 129 | | Nothing -> nothing, Some n, nbSeg, indices 130 | | Everything -> 131 | begin if n = CCVector.size vec then 132 | CCVector.push vec @@ Sigma_star.get n 133 | end ; 134 | everything, None, nbSeg+1, n :: indices 135 | | Cons (segm, s) -> 136 | begin if n = CCVector.size vec then 137 | CCVector.push vec @@ Segment.memoize segm 138 | end; 139 | let b = Segment.is_empty segm in 140 | let indices' = if b then indices else n :: indices in 141 | let nbSeg' = if b then nbSeg else nbSeg+1 in 142 | s, None, nbSeg', indices' 143 | 144 | let[@inline] concat_subterms_of_length ~n ~f validIndicesA vecA vecB = 145 | let rec combine_segments acc = function 146 | | [] -> acc 147 | | i :: l -> 148 | (* indices are in decreasing order, we can bail early. *) 149 | if n - i >= CCVector.size vecB then acc 150 | else 151 | combine_segments 152 | (f ~a:(CCVector.get vecA i) (CCVector.get vecB (n - i)) :: acc) 153 | l 154 | in 155 | validIndicesA 156 | |> combine_segments [] 157 | |> Segment.merge 158 | 159 | let[@inline] combine_segments_left ~n indL vecL vecR = 160 | concat_subterms_of_length 161 | ~n ~f:(fun ~a b -> Segment.append a b) indL vecL vecR 162 | let[@inline] combine_segments_right ~n vecL indR vecR = 163 | concat_subterms_of_length 164 | ~n ~f:(fun ~a b -> Segment.append b a) indR vecR vecL 165 | 166 | let concatenate seqL0 seqR0 = 167 | let vecL = CCVector.make 0 Segment.empty in 168 | let vecR = CCVector.make 0 Segment.empty in 169 | let[@specialize] rec collect n descL descR () = 170 | let (_, boundL, nbSegL, indL) as descL = explode_head vecL descL n in 171 | let (_, boundR, nbSegR, indR) as descR = explode_head vecR descR n in 172 | match boundL, boundR with 173 | | Some bL, Some bR when n >= bL + bR - 1 -> Nothing 174 | | Some _, _ when nbSegL = 0 -> Nothing 175 | | _, Some _ when nbSegR = 0 -> Nothing 176 | | _ -> 177 | let head = 178 | if nbSegL <= nbSegR then 179 | combine_segments_left ~n indL vecL vecR 180 | else 181 | combine_segments_right ~n vecL indR vecR 182 | in 183 | let tail = collect (n+1) descL descR in 184 | Cons (head, tail) 185 | in 186 | collect 0 (seqL0, None, 0, []) (seqR0, None, 0, []) 187 | 188 | 189 | (** Star *) 190 | 191 | let star_subterms_of_length ~max validIndices mapS = 192 | let combine_segments (i, segm) = 193 | match IMap.get (max - i) mapS with 194 | | None -> Segment.empty 195 | | Some s -> Segment.append segm s 196 | in 197 | validIndices 198 | |> List.rev_map combine_segments 199 | |> Segment.merge 200 | 201 | let star = 202 | let rec collect n mapS seq validIndices () = match seq () with 203 | | Everything -> Everything 204 | | Nothing -> 205 | let segmS = star_subterms_of_length ~max:n validIndices mapS in 206 | let mapS = IMap.save n segmS mapS in 207 | Cons (segmS, collect (n+1) mapS seq validIndices) 208 | | Cons (segm, seq) -> 209 | let validIndices = 210 | if Segment.is_empty segm 211 | then validIndices 212 | else (n, segm) :: validIndices 213 | in 214 | let segmS = star_subterms_of_length ~max:n validIndices mapS in 215 | let mapS = IMap.save n segmS mapS in 216 | Cons (segmS, collect (n+1) mapS seq validIndices) 217 | in 218 | fun s () -> match s() with 219 | | Nothing -> Cons (segmentEpsilon, nothing) 220 | | Everything as v -> v 221 | | Cons (_, seq) -> 222 | match seq() with 223 | | Nothing -> Cons (segmentEpsilon, nothing) 224 | | seq -> 225 | let mS = IMap.singleton 0 segmentEpsilon in 226 | Cons (segmentEpsilon, collect 1 mS (fun () -> seq) []) 227 | 228 | let add_epsilonX i x = if i = 0 then union x langEpsilon else x 229 | let dec k = max (k-1) 0 230 | let rec rep_with_acc acc i j lang = 231 | match i, j with 232 | | 0, None -> concatenate acc (star lang) 233 | | 0, Some 0 -> acc 234 | | i, j -> 235 | let acc = 236 | concatenate (add_epsilonX i lang) acc 237 | in 238 | rep_with_acc acc (dec i) (CCOpt.map dec j) lang 239 | let rep i j lang = match i,j with 240 | | 0, None -> star lang 241 | | 0, Some 0 -> langEpsilon 242 | | i, j -> 243 | let acc = add_epsilonX i lang in 244 | rep_with_acc acc (dec i) (CCOpt.map dec j) lang 245 | 246 | (** Others *) 247 | 248 | let charset b l = match l with 249 | | [] when b -> nothing 250 | | l -> 251 | let set = Segment.of_list @@ List.map Word.singleton l in 252 | let segm1 = 253 | if b then set else Segment.diff Sigma.sigma set 254 | in 255 | Segment.empty @: segm1 @: nothing 256 | 257 | 258 | (****) 259 | 260 | let rec gen : Word.char Regex.t -> lang = function 261 | | Set (b, l) -> charset b l 262 | | One -> langEpsilon 263 | | Seq (r1, r2) -> concatenate (gen r1) (gen r2) 264 | | Or (r1, r2) -> union (gen r1) (gen r2) 265 | | And (r1, r2) -> inter (gen r1) (gen r2) 266 | | Not r -> compl (gen r) 267 | | Rep (i, j, r) -> rep i j (gen r) 268 | 269 | (** Exporting *) 270 | 271 | let rec flatten_from n s k = match s () with 272 | | Nothing -> () 273 | | Everything -> 274 | Sigma_star.iter n (fun s -> Segment.to_iter s k) 275 | | Cons (x, s) -> 276 | Segment.to_iter x k; 277 | flatten_from (n+1) s k 278 | 279 | let flatten s = flatten_from 0 s 280 | 281 | type res = Done | Finite | GaveUp 282 | 283 | (** [sample ~skip ~n lang] returns a sequence of on average [n] elements. 284 | [lang] is only consumed when needed. 285 | 286 | We sample one element every [k], where [k] follows a power law of 287 | average [skip]. Furthermore, if we consume more than [sqrt k] empty segments, 288 | we assume that the rest of the segments will be infinitely empty and 289 | stop. 290 | 291 | If [firsts] is provided, we always output the [firsts] first elements. 292 | *) 293 | exception ExitSample 294 | let sample ?(st=Random.State.make_self_init ()) ?n ?(firsts=0) ~skip lang (k : _ -> unit) = 295 | let i = ref (-1) in 296 | 297 | (* The number of element to always take at the beginning. *) 298 | let rem_firsts = ref firsts in 299 | 300 | (* Draw the amount of element we skip and store the next element to take. *) 301 | let draw_skip st = 302 | let f = !rem_firsts in 303 | if f > 0 304 | then (decr rem_firsts ; 0) 305 | else 306 | let u = Random.State.float st 1. in 307 | 1 + int_of_float (-. (float skip) *. log1p (-. u)) 308 | in 309 | let next = ref (draw_skip st) in 310 | 311 | (* Our chance to continue after each sample. *) 312 | let continue st = 313 | match n with 314 | | Some n -> Random.State.float st 1. > (1. /. float n) 315 | | None -> true 316 | in 317 | 318 | (* Our "empty segment" budget. If we exceed this, we stop. *) 319 | let budget_of_skip n = 2 + (int_of_float @@ sqrt @@ float n) in 320 | let budget = ref (budget_of_skip !next) in 321 | 322 | let onSegm x = 323 | incr i ; 324 | if i < next then () 325 | else begin 326 | k x ; 327 | if not (continue st) then raise ExitSample 328 | else begin 329 | let newskip = draw_skip st in 330 | next := !next + newskip ; 331 | budget := budget_of_skip newskip ; 332 | end 333 | end 334 | in 335 | let rec walk_lang n seq = 336 | let i0 = !i in 337 | let next segm seq = 338 | match Segment.to_iter segm onSegm with 339 | | exception ExitSample -> Done 340 | | () -> 341 | let i1 = !i in 342 | if i0 <> i1 then walk_lang (n+1) seq 343 | else if !budget < 0 then GaveUp 344 | else begin 345 | decr budget ; 346 | walk_lang (n+1) seq 347 | end 348 | in 349 | match seq () with 350 | | Nothing -> Finite 351 | | Everything -> 352 | let segm = Sigma_star.get n in 353 | next segm everything 354 | | Cons (segm, seq) -> 355 | next segm seq 356 | in 357 | walk_lang 0 lang 358 | 359 | end 360 | 361 | let arbitrary 362 | (type t) (type char) 363 | (module W : Word.S with type char = char and type t = t) 364 | (module S : Segments.S with type elt = W.t) 365 | ?(skip=8) 366 | ~compl 367 | ~pp 368 | ~samples 369 | (alphabet : W.char list) = 370 | let sigma = S.of_list @@ List.map W.singleton alphabet in 371 | let module Sigma = struct type t = S.t let sigma = sigma end in 372 | let module L = Make (W) (S) (Sigma) in 373 | let gen st = 374 | let open QCheck.Gen in 375 | let re = Regex.gen ~compl (oneofl alphabet) st in 376 | Fmt.epr "Regex: %a@." (Regex.pp pp) re; 377 | let print_samples s l = 378 | Fmt.epr "@[<2>%s:@ %a@]@." s Fmt.(list ~sep:(unit ",@ ") W.pp) l 379 | in 380 | let lang = L.gen re in 381 | let f s l = 382 | l 383 | |> L.sample ~st ~skip ~n:samples 384 | |> CCFun.(%) ignore 385 | |> Iter.to_list 386 | |> CCFun.tap (print_samples s) 387 | in 388 | let pos_examples = f "Pos" lang in 389 | let neg_examples = f "Neg" @@ L.compl lang in 390 | (re, pos_examples, neg_examples) 391 | in 392 | let pp fmt (x, l , l') = 393 | Fmt.pf fmt "@[<2>Regex:@ %a@]@.@[Pos:@ %a@]@.@[Neg:@ %a@]" 394 | (Regex.pp pp) x Fmt.(list W.pp) l Fmt.(list W.pp) l' 395 | in 396 | let print = Fmt.to_to_string pp in 397 | let small (x, _, _) = Regex.size x in 398 | let shrink = QCheck.Shrink.(triple nil list list) in 399 | QCheck.make ~print ~small ~shrink gen 400 | 401 | -------------------------------------------------------------------------------- /lib/parsing.ml: -------------------------------------------------------------------------------- 1 | (** Posix parser, borrowed from Re *) 2 | 3 | exception Parse_error 4 | exception Not_supported 5 | 6 | let parse s = 7 | let i = ref 0 in 8 | let l = String.length s in 9 | let eos () = !i = l in 10 | let test c = not (eos ()) && s.[!i] = c in 11 | let accept c = let r = test c in if r then incr i; r in 12 | let get () = let r = s.[!i] in incr i; r in 13 | let unget () = decr i in 14 | 15 | let rec regexp () = regexp' (branch ()) 16 | and regexp' left = 17 | if accept '|' then regexp' (Regex.alt left (branch ())) 18 | else if accept '&' then regexp' (Regex.inter left (branch ())) 19 | else left 20 | and branch () = branch' [] 21 | and branch' left = 22 | if eos () || test '|' || test '&' || test ')' then Regex.seq (List.rev left) 23 | else branch' (piece () :: left) 24 | and piece () = 25 | let not_op = 26 | if accept '~' then Regex.compl else fun x -> x 27 | in 28 | let r = not_op @@ atom () in 29 | if accept '*' then Regex.star r else 30 | if accept '+' then Regex.plus r else 31 | if accept '?' then Regex.opt r else 32 | if accept '{' then 33 | match integer () with 34 | | Some i -> 35 | let j = if accept ',' then integer () else Some i in 36 | if not (accept '}') then raise Parse_error; 37 | begin match j with 38 | Some j when j < i -> raise Parse_error | _ -> () 39 | end; 40 | Regex.rep i j r 41 | | None -> 42 | unget (); r 43 | else 44 | r 45 | and atom () = 46 | if accept '.' then begin 47 | raise Not_supported 48 | (* if newline then Re.notnl else Re.any *) 49 | end else if accept '(' then begin 50 | let r = regexp () in 51 | if not (accept ')') then raise Parse_error; 52 | r 53 | end else 54 | if accept '^' then begin 55 | raise Not_supported 56 | (* if newline then Re.bol else Re.bos *) 57 | end else if accept '$' then begin 58 | raise Not_supported 59 | (* if newline then Re.eol else Re.eos *) 60 | end else if accept '[' then begin 61 | if accept '^' then 62 | Regex.complset (bracket []) 63 | else 64 | Regex.charset (bracket []) 65 | end else 66 | if accept '\\' then begin 67 | if eos () then raise Parse_error; 68 | match get () with 69 | '|' | '&' | '(' | ')' | '*' | '+' | '?' | '~' 70 | | '[' | '.' | '^' | '$' | '{' | '\\' as c -> Regex.char c 71 | | _ -> raise Parse_error 72 | end else begin 73 | if eos () then raise Parse_error; 74 | match get () with 75 | '*' | '+' | '?' | '{' | '\\' -> raise Parse_error 76 | | c -> Regex.char c 77 | end 78 | and integer () = 79 | if eos () then None else 80 | match get () with 81 | '0'..'9' as d -> integer' (Char.code d - Char.code '0') 82 | | _ -> unget (); None 83 | and integer' i = 84 | if eos () then Some i else 85 | match get () with 86 | '0'..'9' as d -> 87 | let i' = 10 * i + (Char.code d - Char.code '0') in 88 | if i' < i then raise Parse_error; 89 | integer' i' 90 | | _ -> 91 | unget (); Some i 92 | and bracket s = 93 | if s <> [] && accept ']' then s else begin 94 | let c = char () in 95 | if accept '-' then begin 96 | if accept ']' then c :: '-' :: s else begin 97 | let c' = char () in 98 | match Regex.enumerate c c' with 99 | | None -> raise Parse_error 100 | | Some l -> bracket (l @ s) 101 | end 102 | end else 103 | bracket (c :: s) 104 | end 105 | and char () = 106 | if eos () then raise Parse_error; 107 | let c = get () in 108 | if c = '[' then begin 109 | if accept '=' then raise Not_supported 110 | else if accept ':' then begin 111 | raise Not_supported (*XXX*) 112 | end else if accept '.' then begin 113 | if eos () then raise Parse_error; 114 | let c = get () in 115 | if not (accept '.') then raise Not_supported; 116 | if not (accept ']') then raise Parse_error; 117 | c 118 | end else 119 | c 120 | end else 121 | c 122 | in 123 | let res = regexp () in 124 | if not (eos ()) then raise Parse_error; 125 | res 126 | 127 | let parse s = 128 | try Ok (parse s) with 129 | | Parse_error -> Error `Parse_error 130 | | Not_supported -> Error `Not_supported 131 | -------------------------------------------------------------------------------- /lib/regenerate.ml: -------------------------------------------------------------------------------- 1 | module Regex = Regex 2 | module Word = Word 3 | module Segments = Segments 4 | 5 | type 'a regex = 'a Regex.t 6 | 7 | let parse = Parsing.parse 8 | 9 | include Langgen 10 | -------------------------------------------------------------------------------- /lib/regenerate.mli: -------------------------------------------------------------------------------- 1 | (** Regenerate is a library to generate test cases for regular expression engines. 2 | 3 | Here is a typical use of the library, for creating a test harness 4 | with {!QCheck}. 5 | {[ 6 | let test = 7 | (* The alphabet is [abc] *) 8 | let alphabet = ['a'; 'b'; 'c'] in 9 | 10 | (* Words are made of regular strings. *) 11 | let module Word = Regenerate.Word.String in 12 | 13 | (* Streams are made of ThunkLists. *) 14 | let module Stream = Regenerate.Segments.ThunkList(Word) in 15 | 16 | let generator = 17 | Regenerate.arbitrary 18 | (module Word) 19 | (module Stream) 20 | ~compl:false (* Do not generate complement operators. *) 21 | ~pp:Fmt.char (* Printer for characters. *) 22 | ~samples:100 (* We want on average 100 samples for each regex. *) 23 | alphabet 24 | in 25 | 26 | QCheck.Test.make generator check (* Test the [check] function. *) 27 | ]} 28 | *) 29 | 30 | type 'a regex = 'a Regex.t 31 | (** The type of regular expressions on characters of type ['a]. *) 32 | 33 | val arbitrary: 34 | (module Word.S with type char = 'char and type t = 'word) -> 35 | (module Segments.S with type elt = 'word) -> 36 | ?skip:int -> 37 | compl:bool -> 38 | pp:'char Fmt.t -> 39 | samples:int -> 40 | 'char list -> 41 | ('char Regex.t * 'word list * 'word list) QCheck.arbitrary 42 | (** [Regenerate.arbitrary (module W) (module S) ~compl ~pp ~samples alpha] 43 | creates a {!QCheck} generator that generates triples containing 44 | a {!regex} and a list of positive and negative samples. 45 | 46 | @param W is a module implementing operation on words. See {!Word} for some predefined modules. 47 | @param S is a module implementing a data-structure enumerating words. See {!Segments} for some predefined modules. We recommend {!Segments.ThunkList}. 48 | @param skip specifies how many samples should we skip on average. Default is [8]. 49 | @param compl specifies if we generate regex containing the complement operator. 50 | @param pp specifies how to print individual characters. 51 | @param alpha describes the alphabet as a list of characters. 52 | 53 | *) 54 | 55 | val parse : 56 | string -> 57 | (char regex, [> `Not_supported | `Parse_error ]) result 58 | (** [Regenerate.parse s] returns the associated {!regex}. 59 | It recognizes the Posix Extended Regular Expression syntax plus complement ([~a]) and intersection ([a&b]). 60 | Character classes are not supported. 61 | *) 62 | 63 | module Regex = Regex 64 | (** Definition of Regular expressions and associated utilities. *) 65 | 66 | module Word = Word 67 | (** Generic definitions of words on which regular expression can match. *) 68 | 69 | module Segments = Segments 70 | (** Streaming data-structures that will contain the generate samples. *) 71 | 72 | 73 | (** {2:functor Functorial API} 74 | 75 | This API allows full access to generators and regular operators. 76 | For casual use of the library, consider using {!arbitrary} instead. 77 | *) 78 | 79 | module type SIGMA = sig 80 | type t 81 | val sigma : t 82 | end 83 | 84 | (** [Regenerate.Make(W)(S)(A)] is a module that implements 85 | sample generation for words implemented by the module [W] with the alphabet [A]. 86 | [S] describes the data structure used internally for the enumeration of 87 | words. 88 | 89 | For casual use of the library, consider using {!arbitrary} instead. 90 | *) 91 | module Make 92 | (Word : Word.S) 93 | (Segment : Segments.S with type elt = Word.t) 94 | (Sigma : SIGMA with type t = Segment.t) : sig 95 | 96 | type node = 97 | | Nothing 98 | | Everything 99 | | Cons of Segment.t * lang 100 | and lang = unit -> node 101 | (** A language is a lazy stream of words segmented by growing length. *) 102 | 103 | val pp : Format.formatter -> lang -> unit 104 | (** [pp fmt lang] pretty print the language [lang]. *) 105 | 106 | val gen : Word.char Regex.t -> lang 107 | (** [gen regex] returns the language recognized by [regex]. *) 108 | 109 | (** {2 Sampling} *) 110 | 111 | type res = 112 | | Done 113 | | Finite 114 | | GaveUp 115 | exception ExitSample 116 | 117 | val sample : 118 | ?st:Random.State.t -> 119 | ?n:int -> 120 | ?firsts:int -> 121 | skip:int -> lang -> (Segment.elt -> unit) -> res 122 | (** [sample ~skip ~n lang] returns a sequence of on average [n] elements. 123 | [lang] is only consumed when needed. 124 | 125 | We sample one element every [k], where [k] follows a power law of 126 | average [skip]. Furthermore, if we consume more than [sqrt k] empty segments, 127 | we assume that the rest of the segments will be infinitely empty and 128 | stop. 129 | 130 | If [firsts] is provided, we always output the [firsts] first elements. 131 | *) 132 | 133 | (** {2 Operations on languages} *) 134 | 135 | val flatten : lang -> Segment.elt Iter.t 136 | (** [flatten lang] returns the sequence of its segments. *) 137 | 138 | (** {3 Regular operations} *) 139 | 140 | val union : lang -> lang -> lang 141 | val inter : lang -> lang -> lang 142 | val difference : lang -> lang -> lang 143 | val compl : lang -> lang 144 | val concatenate : lang -> lang -> lang 145 | val star : (unit -> node) -> unit -> node 146 | val rep : int -> int option -> lang -> lang 147 | val charset : bool -> Word.char list -> unit -> node 148 | 149 | end 150 | -------------------------------------------------------------------------------- /lib/regex.ml: -------------------------------------------------------------------------------- 1 | type 'a cset = 'a list 2 | 3 | type 'a t 4 | = One 5 | | Set of bool * 'a cset 6 | | Seq of 'a t * 'a t 7 | | Or of 'a t * 'a t 8 | | And of 'a t * 'a t 9 | | Not of 'a t 10 | | Rep of int * int option * 'a t 11 | 12 | (** Smart constructors *) 13 | 14 | let epsilon = One 15 | let void = Set (true, []) 16 | let atom c = Set (true, [c]) 17 | let char c = atom c 18 | let charset cs = Set (true, cs) 19 | let complset cs = Set (false, cs) 20 | let enumerate c1 c2 = 21 | if c1 > c2 then None 22 | else 23 | let rec aux i m = 24 | if i > m then [] 25 | else Char.chr i :: aux (i+1) m 26 | in 27 | Some (aux (Char.code c1) (Char.code c2)) 28 | 29 | let rec reduce init f = function 30 | | [] -> init 31 | | [x] -> x 32 | | x :: l -> f x (reduce init f l) 33 | 34 | let seq l = reduce One (fun x y -> Seq (x,y)) l 35 | let alt x y = Or (x,y) 36 | let inter x y = And (x,y) 37 | let compl x = Not x 38 | 39 | let rep i j x = Rep (i, j, x) 40 | let star x = rep 0 None x 41 | let plus x = rep 1 None x 42 | let opt x = rep 0 (Some 1) x 43 | 44 | (** QCheck utilities *) 45 | 46 | let rec size = function 47 | | One -> 1 48 | | Set _ -> 1 49 | | Rep (_,_,a) 50 | | Not a -> size a + 1 51 | | Or (a,b) 52 | | And (a,b) 53 | | Seq (a,b) -> size a + size b + 1 54 | 55 | let prio = function 56 | | And (_,_) -> 1 57 | | Or (_,_) -> 2 58 | | Seq (_,_) -> 3 59 | | Not _ -> 4 60 | | Rep (_,_,_) -> -1 61 | | One 62 | | Set _ -> 6 63 | 64 | let rec pp ?(epsilon=true) ppalpha fmt x = 65 | let f fmt y = 66 | if prio y < prio x || prio y = -1 67 | then Fmt.parens (pp ~epsilon ppalpha) fmt y 68 | else pp ~epsilon ppalpha fmt y 69 | in 70 | match x with 71 | | One -> Fmt.pf fmt (if epsilon then "ε" else "") 72 | | Set (true,[x]) -> Fmt.pf fmt "%a" ppalpha x 73 | | Set (b,l) -> Fmt.pf fmt "[%s%a]" 74 | (if b then "" else "^") (Fmt.list ~sep:Fmt.nop ppalpha) l 75 | | Seq (a,b) -> Fmt.pf fmt "%a%a" f a f b 76 | | Or (a,b) -> Fmt.pf fmt "%a|%a" f a f b 77 | | And (a,b) -> Fmt.pf fmt "%a&%a" f a f b 78 | | Not a -> Fmt.pf fmt "~%a" f a 79 | | Rep (0,None,a) -> Fmt.pf fmt "%a*" f a 80 | | Rep (1,None,a) -> Fmt.pf fmt "%a+" f a 81 | | Rep (i,None,a) -> Fmt.pf fmt "%a{%i,}" f a i 82 | | Rep (i,Some j,a) when i = j -> Fmt.pf fmt "%a{%i}" f a i 83 | | Rep (i,Some j,a) -> Fmt.pf fmt "%a{%i,%i}" f a i j 84 | 85 | let gen ~compl:with_compl alphabet = 86 | let open QCheck.Gen in 87 | let opt a = frequency [ 1, pure None ; 1, map CCOpt.return a] in 88 | 89 | let proba_compl = if with_compl then 3 else 0 in 90 | let gatom = alphabet >|= atom in 91 | let gset = 92 | bool >>= fun b -> 93 | map 94 | (fun l -> Set (b, CCList.uniq ~eq:(=) l)) 95 | (list_size (1 -- 10) alphabet) 96 | in 97 | let gbase = frequency [ 98 | (* 1, pure void ; *) 99 | 1, pure epsilon ; 100 | 8, gatom ; 101 | 5, gset ; 102 | ] in 103 | let rec gen nbRep n st = 104 | if n <= 1 then gbase st else 105 | frequency [ 106 | 1, gbase ; 107 | proba_compl, gcompl nbRep n ; 108 | 3, gbin nbRep n alt ; 109 | 2, gbin nbRep n inter ; 110 | 5, gbin nbRep n (fun x y -> Seq (x,y)) ; 111 | nbRep * 2, grep nbRep n ; 112 | ] st 113 | and grep nbRep n = 114 | int_bound 3 >>= fun i -> 115 | opt (int_range i 5) >>= fun j -> 116 | gen (nbRep - 1) (n-1) >|= fun a -> 117 | rep i j a 118 | and gcompl nbRep n = gen nbRep (n-1) >|= compl 119 | and gbin nbRep n f = 120 | gen nbRep ((n-1)/2) >>= fun a -> 121 | gen nbRep ((n-1)/2) >|= fun b -> 122 | f a b 123 | in 124 | sized_size (int_range 2 20) (gen 2) 125 | -------------------------------------------------------------------------------- /lib/word.ml: -------------------------------------------------------------------------------- 1 | module type S = sig 2 | type char 3 | type t 4 | val empty : t 5 | val singleton : char -> t 6 | val length : t -> int 7 | val append : t -> t -> t 8 | val cons : char -> t -> t 9 | val pp : Format.formatter -> t -> unit 10 | end 11 | 12 | module List (C : sig include Set.OrderedType val pp : t Fmt.t end) = struct 13 | type char = C.t 14 | type t = C.t list 15 | let empty = [] 16 | let singleton x = [x] 17 | let length = List.length 18 | let append = List.append 19 | let cons x l = x :: l 20 | let pp = CCFormat.(list ~sep:(fun _ () -> ()) C.pp) 21 | end 22 | 23 | module String = struct 24 | type nonrec char = char 25 | include CCString 26 | let empty = "" 27 | let singleton = make 1 28 | let cons c s = singleton c ^ s 29 | let append = (^) 30 | let compare_char = Char.compare 31 | let pp fmt s = 32 | if s = "" then 33 | Format.pp_print_string fmt "ε" 34 | else 35 | Format.pp_print_string fmt s 36 | let to_seq s k = String.iter k s 37 | end 38 | -------------------------------------------------------------------------------- /regenerate.descr: -------------------------------------------------------------------------------- 1 | Regenerate is a tool to generate test-cases for regular expression engines. 2 | 3 | Regenerate takes a regular expression and generates strings that match it. 4 | It handles most posix extended regular expressions along with 5 | complement (~a) and intersection (a&b). 6 | Since it handles complement, it can also generate strings that 7 | *don't* match a given regular expression. -------------------------------------------------------------------------------- /regenerate.opam: -------------------------------------------------------------------------------- 1 | opam-version: "2.0" 2 | version: "0.2" 3 | maintainer: "Drup " 4 | authors: "Drup " 5 | license: "ISC" 6 | homepage: "https://github.com/regex-generate/regenerate" 7 | bug-reports: "https://github.com/regex-generate/regenerate/issues" 8 | dev-repo: "git+https://github.com/regex-generate/regenerate.git" 9 | doc: "https://regex-generate.github.io/regenerate/doc/0.2/" 10 | 11 | depends: [ 12 | "dune" {build} 13 | "cmdliner" {= "1.0.4"} 14 | "fmt" {= "0.8.0"} 15 | "containers" {= "2.8.1"} 16 | "mtime" {= "1.4.0"} 17 | "oseq" 18 | "iter" 19 | "qcheck" 20 | "re" {with-test} 21 | "ocaml" {>= "4.03"} 22 | ] 23 | build: [ 24 | ["dune" "subst"] {pinned} 25 | ["dune" "build" "-p" name "-j" jobs] 26 | ["dune" "runtest" "-p" name "-j" jobs] {with-test} 27 | ] 28 | synopsis: 29 | "Regenerate is a tool to generate test-cases for regular expression engines" 30 | description: """ 31 | Regenerate takes a regular expression and generates strings that match it. 32 | It handles most posix extended regular expressions along with 33 | complement (~a) and intersection (a&b). 34 | Since it handles complement, it can also generate strings that 35 | *don't* match a given regular expression.""" 36 | -------------------------------------------------------------------------------- /segments/Heap.ml: -------------------------------------------------------------------------------- 1 | type 'a t = { 2 | tree : 'a tree; 3 | cmp : 'a -> 'a -> int; 4 | merge : 'a -> 'a -> 'a; 5 | } (** A pairing tree heap with the given comparison function *) 6 | 7 | and 'a tree = 8 | | Empty 9 | | Node of 'a * 'a tree * 'a tree 10 | 11 | let empty ~cmp ~merge = { 12 | tree = Empty; 13 | cmp; 14 | merge; 15 | } 16 | 17 | let is_empty h = 18 | match h.tree with 19 | | Empty -> true 20 | | Node _ -> false 21 | 22 | let rec union ({cmp;merge;_} as h) t1 t2 = match t1, t2 with 23 | | Empty, _ -> t2 24 | | _, Empty -> t1 25 | | Node (x1, l1, r1), Node (x2, l2, r2) -> 26 | let c = cmp x1 x2 in 27 | if c = 0 then Node (merge x1 x2, union h r1 r2, union h l1 l2) 28 | else if c < 0 then Node (x1, union h t2 r1, l1) 29 | else Node (x2, union h t1 r2, l2) 30 | 31 | let insert h x = 32 | { h with tree = union h (Node (x, Empty, Empty)) h.tree } 33 | 34 | let pop h = match h.tree with 35 | | Empty -> raise Not_found 36 | | Node (x, l, r) -> 37 | x, { h with tree = union h l r } 38 | -------------------------------------------------------------------------------- /segments/LazyList.ml: -------------------------------------------------------------------------------- 1 | module L = CCLazy_list 2 | 3 | let next = Lazy.force 4 | 5 | module Make (K : Sigs.OrderedMonoid) 6 | : Sigs.S with type elt = K.t and type t = K.t L.t 7 | = struct 8 | 9 | type elt = K.t 10 | type t = elt L.t 11 | 12 | let empty = L.empty 13 | let is_empty = L.is_empty 14 | let return = L.return 15 | 16 | let of_list l = L.of_list @@ CCList.sort_uniq ~cmp:K.compare l 17 | let iter f l = 18 | let rec aux l = match next l with 19 | | L.Cons (x, t) -> (f x; aux t) 20 | | Nil -> () 21 | in aux l 22 | let to_iter x f = iter f x 23 | 24 | 25 | let memoize x = x 26 | (* let pp = pp *) 27 | 28 | type drop = Drop | Keep 29 | let dropX s s' = function Drop -> s' | Keep -> s 30 | let rec merge_with l r f s1 s2 = lazy (match Lazy.force s1, Lazy.force s2 with 31 | | L.Nil, L.Nil -> L.Nil 32 | | Cons _, Nil -> l s1 33 | | Nil, Cons _ -> r s2 34 | | Cons (x1, s1'), Cons (x2, s2') -> 35 | let d1, d2, res = f x1 x2 in 36 | let k = merge_with l r f (dropX s1 s1' d1) (dropX s2 s2' d2) in 37 | match res with 38 | | Some x -> Cons (x, k) 39 | | None -> Lazy.force k 40 | ) 41 | let keep = Lazy.force 42 | let drop _ = L.Nil 43 | 44 | let union = 45 | let f x y = 46 | let i = K.compare x y in 47 | if i = 0 then Drop, Drop, Some x 48 | else if i < 0 then Drop, Keep, Some x 49 | else Keep, Drop, Some y 50 | in 51 | merge_with keep keep f 52 | 53 | let inter = 54 | let f x y = 55 | let i = K.compare x y in 56 | if i = 0 then Drop, Drop, Some x 57 | else if i < 0 then Drop, Keep, None 58 | else Keep, Drop, None 59 | in 60 | merge_with drop drop f 61 | 62 | let diff = 63 | let f x y = 64 | let i = K.compare x y in 65 | if i = 0 then Drop, Drop, None 66 | else if i < 0 then Drop, Keep, Some x 67 | else Keep, Drop, None 68 | in 69 | merge_with keep drop f 70 | 71 | let append l1 l2 = 72 | let open CCLazy_list.Infix in 73 | l1 >>= fun x1 -> l2 >|= fun x2 -> K.append x1 x2 74 | 75 | let merge l = 76 | let cmp (v1,_) (v2,_) = K.compare v1 v2 in 77 | let merge (x1, s1) (_, s2) = (x1, s1@s2) in 78 | let push h s = 79 | match Lazy.force s with L.Nil -> h | Cons (x, s') -> Heap.insert h (x, [s']) 80 | in 81 | let h0 = List.fold_left push (Heap.empty ~cmp ~merge) l in 82 | let rec next heap = 83 | lazy ( 84 | if Heap.is_empty heap then L.Nil else begin 85 | let (x, seq), heaps = Heap.pop heap in 86 | let new_heap = List.fold_left push heaps seq in 87 | L.Cons (x, next new_heap) 88 | end 89 | ) 90 | in 91 | next h0 92 | 93 | end 94 | -------------------------------------------------------------------------------- /segments/Segments.ml: -------------------------------------------------------------------------------- 1 | include Sigs 2 | 3 | module ThunkList = ThunkList.Make 4 | module ThunkListMemo = ThunkListMemo.Make 5 | module LazyList = LazyList.Make 6 | module StrictSet = StrictSet.Make 7 | module Trie = Trie 8 | -------------------------------------------------------------------------------- /segments/Sigs.ml: -------------------------------------------------------------------------------- 1 | module type OrderedMonoid = sig 2 | include Set.OrderedType 3 | val append : t -> t -> t 4 | end 5 | 6 | module type S = sig 7 | type elt 8 | type t 9 | val empty : t 10 | val is_empty : t -> bool 11 | val return : elt -> t 12 | 13 | val append: t -> t -> t 14 | val union : t -> t -> t 15 | val inter : t -> t -> t 16 | val diff : t -> t -> t 17 | val merge : t list -> t 18 | 19 | val of_list : elt list -> t 20 | val to_iter : t -> elt Iter.t 21 | 22 | val memoize : t -> t 23 | end 24 | -------------------------------------------------------------------------------- /segments/StrictSet.ml: -------------------------------------------------------------------------------- 1 | module Make (Elt : Sigs.OrderedMonoid) 2 | : Sigs.S with type elt = Elt.t and type t = Set.Make(Elt).t 3 | = struct 4 | include Set.Make(Elt) 5 | 6 | 7 | let merge = List.fold_left union empty 8 | let memoize x = x 9 | let return = singleton 10 | let append s1 s2 = 11 | fold (fun x1 acc1 -> 12 | fold (fun x2 acc2 -> add (Elt.append x1 x2) acc2) s2 acc1) s1 empty 13 | let to_iter e k = iter k e 14 | end 15 | -------------------------------------------------------------------------------- /segments/ThunkList.ml: -------------------------------------------------------------------------------- 1 | module Make (K : Sigs.OrderedMonoid) 2 | : Sigs.S with type elt = K.t and type t = K.t OSeq.t 3 | = struct 4 | open OSeq 5 | 6 | type elt = K.t 7 | type t = elt OSeq.t 8 | 9 | let empty = empty 10 | let is_empty = is_empty 11 | let of_list l = of_list @@ CCList.sort_uniq ~cmp:K.compare l 12 | let return = return 13 | let to_iter x f = iter f x 14 | let memoize x = x 15 | 16 | type drop = Drop | Keep 17 | let dropX s s' = function Drop -> s' | Keep -> s 18 | let rec merge_with l r f s1 s2 () = match s1 (), s2 () with 19 | | Nil, Nil -> Nil 20 | | Cons _, Nil -> l s1 () 21 | | Nil, Cons _ -> r s2 () 22 | | Cons (x1, s1'), Cons (x2, s2') -> 23 | let d1, d2, res = f x1 x2 in 24 | let k = merge_with l r f (dropX s1 s1' d1) (dropX s2 s2' d2) in 25 | match res with 26 | | Some x -> Cons (x, k) 27 | | None -> k () 28 | let keep x = x 29 | let drop _ = empty 30 | 31 | let union = 32 | let f x y = 33 | let i = K.compare x y in 34 | if i = 0 then Drop, Drop, Some x 35 | else if i < 0 then Drop, Keep, Some x 36 | else Keep, Drop, Some y 37 | in 38 | merge_with keep keep f 39 | 40 | let inter = 41 | let f x y = 42 | let i = K.compare x y in 43 | if i = 0 then Drop, Drop, Some x 44 | else if i < 0 then Drop, Keep, None 45 | else Keep, Drop, None 46 | in 47 | merge_with drop drop f 48 | 49 | let diff = 50 | let f x y = 51 | let i = K.compare x y in 52 | if i = 0 then Drop, Drop, None 53 | else if i < 0 then Drop, Keep, Some x 54 | else Keep, Drop, None 55 | in 56 | merge_with keep drop f 57 | 58 | let append l1 l2 = 59 | l1 >>= fun x -> l2 >|= fun y -> K.append x y 60 | 61 | let merge l = 62 | let cmp (v1,_) (v2,_) = K.compare v1 v2 in 63 | let merge (x1, s1) (_, s2) = (x1, s1@s2) in 64 | let push h s = 65 | match s() with Nil -> h | Cons (x, s') -> Heap.insert h (x, [s']) 66 | in 67 | let h0 = List.fold_left push (Heap.empty ~cmp ~merge) l in 68 | let rec next heap () = 69 | if Heap.is_empty heap then Nil else begin 70 | let (x, seq), heaps = Heap.pop heap in 71 | let new_heap = List.fold_left push heaps seq in 72 | Cons (x, next new_heap) 73 | end 74 | in 75 | next h0 76 | 77 | end 78 | -------------------------------------------------------------------------------- /segments/ThunkListMemo.ml: -------------------------------------------------------------------------------- 1 | module Make (K : Sigs.OrderedMonoid) 2 | : Sigs.S with type elt = K.t and type t = ThunkList.Make(K).t 3 | = struct 4 | open OSeq 5 | include ThunkList.Make(K) 6 | 7 | let memoize f = 8 | let r = CCVector.create () in 9 | let rec f' i seq () = 10 | if i < CCVector.length r 11 | then CCVector.get r i 12 | else 13 | let l = match seq() with 14 | | Nil -> Nil 15 | | Cons (x, tail) -> Cons (x, f' (i+1) tail) 16 | in 17 | CCVector.push r l; 18 | l 19 | in 20 | f' 0 f 21 | end 22 | -------------------------------------------------------------------------------- /segments/Trie.ml: -------------------------------------------------------------------------------- 1 | 2 | (* This file is free software, part of containers. See file "license" for more details. *) 3 | 4 | (** {1 Prefix Tree} *) 5 | 6 | (** {2 Signatures} *) 7 | 8 | (** {6 A Composite Word} 9 | 10 | Words are made of characters, who belong to a total order *) 11 | 12 | module type WORD = sig 13 | type t 14 | type char 15 | 16 | val compare_char : char -> char -> int 17 | val append : t -> t -> t 18 | val to_iter : t -> char Iter.t 19 | val of_list : char list -> t 20 | end 21 | 22 | module Make(W : WORD) 23 | (* : Sigs.S with type elt = W.t *) 24 | = struct 25 | type char = W.char 26 | type elt = W.t 27 | 28 | module M = Map.Make(struct 29 | type t = char 30 | let compare = W.compare_char 31 | end) 32 | 33 | type +'a trie = 34 | | Empty 35 | | Leaf of 'a 36 | | Node of 'a trie M.t 37 | 38 | type t = unit trie 39 | 40 | let empty = Empty 41 | 42 | let rec _check_invariants = function 43 | | Empty | Leaf (_ , _) -> true 44 | | Node map -> 45 | not (M.is_empty map) && 46 | M.for_all (fun _ v -> _check_invariants v) map 47 | 48 | let is_empty = function 49 | | Empty -> true 50 | | _ -> false 51 | 52 | let _id x = x 53 | 54 | (** Smart constructors *) 55 | 56 | (* sub-tree t prefixed with c *) 57 | let _cons c t = if is_empty t then Empty else Node (M.singleton c t) 58 | 59 | let _leaf x = Leaf x 60 | 61 | (* build a Node value *) 62 | let _node map = 63 | if M.is_empty map then Empty 64 | else 65 | if M.cardinal map = 1 66 | then 67 | let c, sub = M.min_binding map in 68 | _cons c sub 69 | else Node map 70 | 71 | let _node2 c1 t1 c2 t2 = 72 | match is_empty t1, is_empty t2 with 73 | | true, true -> Empty 74 | | true, false -> _cons c2 t2 75 | | false, true -> _cons c1 t1 76 | | false, false -> 77 | let map = M.add c1 t1 M.empty in 78 | let map = M.add c2 t2 map in 79 | _node map 80 | 81 | (** Inserting/Removing *) 82 | 83 | (* fold [f] on [iter] with accumulator [acc], and call [finish] 84 | on the accumulator once [iter] is exhausted *) 85 | let _fold_iter_and_then f ~finish acc iter = 86 | let acc = ref acc in 87 | iter (fun x -> acc := f !acc x); 88 | finish !acc 89 | 90 | let update key f t = 91 | (* first arg: current subtree and rebuild function; [c]: current char *) 92 | let goto (t, rebuild) c = 93 | match t with 94 | | Empty | Leaf _ -> t, fun t -> rebuild (_cons c t) 95 | | Node map -> 96 | try 97 | let t' = M.find c map in 98 | (* rebuild: we modify [t], so we put the new version in [map] 99 | if it's not empty, and make the node again *) 100 | let rebuild' new_child = 101 | rebuild ( 102 | if is_empty new_child 103 | then _node (M.remove c map) 104 | else _node (M.add c new_child map) 105 | ) 106 | in 107 | t', rebuild' 108 | with Not_found -> 109 | let rebuild' new_child = 110 | if is_empty new_child 111 | then rebuild t (* ignore *) 112 | else 113 | let map' = M.add c new_child map in 114 | rebuild (_node map') 115 | in 116 | empty, rebuild' 117 | in 118 | let leaf_or_empty rebuild o = match f o with 119 | | None -> rebuild (_node M.empty) 120 | | Some x' -> rebuild (_leaf x') 121 | in 122 | let finish (t,rebuild) = match t with 123 | | Leaf x -> leaf_or_empty rebuild @@ Some x 124 | | Empty -> leaf_or_empty rebuild @@ None 125 | | Node map -> rebuild (_node map) 126 | in 127 | let word = W.to_iter key in 128 | _fold_iter_and_then goto ~finish (t, _id) word 129 | 130 | let add k v t = update k (fun _ -> Some v) t 131 | (* let remove k t = update k (fun _ -> None) t *) 132 | 133 | let singleton k v = add k v Empty 134 | 135 | (** Iter/Fold *) 136 | 137 | type 'a difflist = 'a list -> 'a list 138 | 139 | let _difflist_add 140 | : 'a difflist -> 'a -> 'a difflist 141 | = fun f x -> fun l' -> f (x :: l') 142 | 143 | (* fold that also keeps the path from the root, so as to provide the list 144 | of chars that lead to a value. The path is a difference list, ie 145 | a function that prepends a list to some suffix *) 146 | let rec _fold f path t acc = match t with 147 | | Empty -> acc 148 | | Leaf v -> f acc path v 149 | (* | Cons (c, t') -> _fold f (_difflist_add path c) t' acc *) 150 | | Node map -> 151 | M.fold 152 | (fun c t' acc -> _fold f (_difflist_add path c) t' acc) 153 | map acc 154 | 155 | (* let fold f acc t = 156 | * _fold 157 | * (fun acc path v -> 158 | * let key = W.of_list (path []) in 159 | * f acc key v) 160 | * _id t acc *) 161 | 162 | (*$T 163 | T.fold (fun acc k v -> (k,v) :: acc) [] t1 \ 164 | |> List.sort Pervasives.compare = List.sort Pervasives.compare l1 165 | *) 166 | 167 | let iter f t = 168 | _fold 169 | (fun () path y -> f (W.of_list (path [])) y) 170 | _id t () 171 | 172 | (* let rec fold_values f acc t = match t with 173 | * | Empty -> acc 174 | * | Leaf v -> f acc v 175 | * (\* | Cons (_, t') -> fold_values f acc t' *\) 176 | * | Node map -> 177 | * M.fold 178 | * (fun _c t' acc -> fold_values f acc t') 179 | * map acc 180 | * 181 | * let iter_values f t = fold_values (fun () x -> f x) () t *) 182 | 183 | 184 | (** Merging operations *) 185 | let _mk = function Some x -> _leaf x | None -> empty 186 | 187 | let[@specialize] rec merge_with ~f ~left ~right t1 t2 = match t1, t2 with 188 | | Empty, Empty -> f None None 189 | | Empty, Node _ -> right t2 190 | | Node _, Empty -> left t1 191 | | Leaf v, Empty -> f (Some v) None 192 | | Empty, Leaf v -> f None (Some v) 193 | | Leaf v, Leaf v' -> f (Some v) (Some v') 194 | | Leaf _, Node _ | Node _, Leaf _ -> assert false 195 | (* | Cons (c1,t1'), Cons (c2,t2') -> 196 | * if W.compare_char c1 c2 = 0 197 | * then _cons c1 (merge_with ~f ~left ~right t1' t2') 198 | * else _node2 c1 (left t1') c2 (right t2') 199 | * 200 | * | Cons (c1, t1'), Node (value, map) -> 201 | * begin try 202 | * (\* collision *\) 203 | * let t2' = M.find c1 map in 204 | * let new_t = merge_with ~f ~left ~right t1' t2' in 205 | * let map' = if is_empty new_t 206 | * then M.remove c1 map 207 | * else M.add c1 new_t map 208 | * in 209 | * _node value map' 210 | * with Not_found -> 211 | * (\* no collision *\) 212 | * assert (not(is_empty t1')); 213 | * let t1' = left t1' in 214 | * let map' = if is_empty t1' then map else M.add c1 t1' map in 215 | * Node (value, map') 216 | * end 217 | * | Node (value, map), Cons (c2, t2') -> 218 | * begin try 219 | * (\* collision *\) 220 | * let t1' = M.find c2 map in 221 | * let new_t = merge_with ~f ~left ~right t1' t2' in 222 | * let map' = if is_empty new_t 223 | * then M.remove c2 map 224 | * else M.add c2 new_t map 225 | * in 226 | * _node value map' 227 | * with Not_found -> 228 | * (\* no collision *\) 229 | * assert (not(is_empty t2')); 230 | * let t2' = left t2' in 231 | * let map' = if is_empty t2' then map else M.add c2 t2' map in 232 | * Node (value, map') 233 | * end *) 234 | | Node map1, Node map2 -> 235 | (* let v = f v1 v2 in *) 236 | let as_option t = if is_empty t then None else Some t in 237 | let map' = M.merge 238 | (fun _c t1 t2 -> match t1, t2 with 239 | | None, None -> assert false 240 | | Some t, None -> as_option @@ left t 241 | | None, Some t -> as_option @@ right t 242 | | Some t1, Some t2 -> 243 | let new_t = merge_with ~f ~left ~right t1 t2 in 244 | as_option new_t 245 | ) map1 map2 246 | in 247 | _node map' 248 | 249 | let keep x = x 250 | let drop _ = Empty 251 | 252 | let union l l' = 253 | let left = keep and right = keep and f a b = match a,b with 254 | | Some _, _ -> _mk a 255 | | None, _ -> _mk b 256 | in 257 | merge_with ~f ~left ~right l l' 258 | 259 | let inter l l' = 260 | let left = drop and right = drop and f a b = match a,b with 261 | | Some _, Some _ -> _mk a 262 | | _ -> empty 263 | in 264 | merge_with ~f ~left ~right l l' 265 | 266 | let diff l l' = 267 | let left = keep and right = drop and f a b = match a,b with 268 | | Some _, None -> _mk a 269 | | _ -> empty 270 | in 271 | merge_with ~f ~left ~right l l' 272 | 273 | let merge l = 274 | List.fold_left union Empty l 275 | 276 | (** Grafting/flatmap *) 277 | 278 | (* let map f t = 279 | * let rec map_ = function 280 | * | Empty -> Empty 281 | * (\* | Cons (c, t') -> Cons (c, map_ t') *\) 282 | * | Leaf x -> Leaf (f x) 283 | * | Node map -> 284 | * let map' = M.map map_ map 285 | * in Node map' 286 | * in map_ t *) 287 | 288 | let rec append t t0 = match t with 289 | | Empty -> Empty 290 | | Leaf _v -> t0 291 | (* | Cons (c, t') -> Cons (c, append t' t0) *) 292 | | Node map -> 293 | let map = M.map (fun t' -> append t' t0) map in 294 | Node map 295 | 296 | (** Misc *) 297 | 298 | (* let rec size t = match t with 299 | * | Empty -> 0 300 | * | Cons (_, t') -> size t' 301 | * | Node (v, map) -> 302 | * let s = if v=None then 0 else 1 in 303 | * M.fold 304 | * (fun _ t' acc -> size t' + acc) 305 | * map s *) 306 | 307 | let of_list l = 308 | List.fold_left (fun acc v -> add v () acc) empty l 309 | 310 | let to_iter t k = iter (fun x () -> k x) t 311 | 312 | (** External API *) 313 | 314 | let return x = singleton x () 315 | let memoize x = x 316 | 317 | end 318 | 319 | module type ORDERED = sig 320 | type t 321 | val compare : t -> t -> int 322 | end 323 | 324 | module MakeArray(X : ORDERED) = Make(struct 325 | type t = X.t array 326 | type char = X.t 327 | let append = Array.append 328 | let compare_char = X.compare 329 | let to_iter a k = Array.iter k a 330 | let of_list = Array.of_list 331 | end) 332 | 333 | module MakeList(X : ORDERED) = Make(struct 334 | type t = X.t list 335 | type char = X.t 336 | let append = List.append 337 | let compare_char = X.compare 338 | let to_iter a k = List.iter k a 339 | let of_list l = l 340 | end) 341 | 342 | module String = Make(struct 343 | type t = string 344 | type nonrec char = char 345 | let append = (^) 346 | let compare_char = Char.compare 347 | let to_iter s k = String.iter k s 348 | let of_list l = 349 | let buf = Buffer.create (List.length l) in 350 | List.iter (fun c -> Buffer.add_char buf c) l; 351 | Buffer.contents buf 352 | end) 353 | -------------------------------------------------------------------------------- /segments/Trie.mli: -------------------------------------------------------------------------------- 1 | 2 | module type WORD = sig 3 | type t 4 | type char 5 | 6 | val compare_char : char -> char -> int 7 | val append : t -> t -> t 8 | val to_iter : t -> char Iter.t 9 | val of_list : char list -> t 10 | end 11 | 12 | 13 | module Make(W : WORD) : Sigs.S with type elt = W.t 14 | module String : Sigs.S with type elt = string 15 | -------------------------------------------------------------------------------- /segments/dune: -------------------------------------------------------------------------------- 1 | (library 2 | (name segments) 3 | (public_name regenerate.segments) 4 | (libraries fmt iter containers.data containers.iter oseq containers) 5 | (ocamlopt_flags :standard "-O3") 6 | ) 7 | -------------------------------------------------------------------------------- /test/dune: -------------------------------------------------------------------------------- 1 | (library 2 | (name test) 3 | (modules test) 4 | (libraries mtime.clock.os regenerate) 5 | (ocamlopt_flags :standard "-O3" "-inlining-report") 6 | ) 7 | 8 | (executable 9 | (name running_profile) 10 | (modules running_profile) 11 | (libraries test) 12 | (ocamlopt_flags :standard "-O3" "-inlining-report") 13 | ) 14 | -------------------------------------------------------------------------------- /test/re.gnuplot: -------------------------------------------------------------------------------- 1 | # Gnuplot script file for plotting data in file "force.dat" 2 | # This file is called force.p 3 | set autoscale # scale axes automatically 4 | unset log # remove any log-scaling 5 | unset label # remove any previous labels 6 | set xtic auto # set xtics automatically 7 | set ytic auto # set ytics automatically 8 | #set title "Force Deflection Data for a Beam and a Column" 9 | set ylabel "Count" 10 | set xlabel "Time (s)" 11 | #set logscale y 12 | 13 | set terminal png size 1200,900 enhanced 14 | set output 're_gen.png' 15 | 16 | plot "re_stream_0.csv" using 2:1 title 'Re 0 Stream' with linespoints , \ 17 | "re_stream_1.csv" using 2:1 title 'Re 1 Stream' with linespoints , \ 18 | "re_stream_2.csv" using 2:1 title 'Re 2 Stream' with linespoints , \ 19 | "re_stream_3.csv" using 2:1 title 'Re 3 Stream' with linespoints, \ 20 | "re_trie_0.csv" using 2:1 title 'Re 0 Trie' with linespoints , \ 21 | "re_trie_1.csv" using 2:1 title 'Re 1 Trie' with linespoints , \ 22 | "re_trie_2.csv" using 2:1 title 'Re 2 Trie' with linespoints , \ 23 | "re_trie_3.csv" using 2:1 title 'Re 3 Trie' with linespoints, \ 24 | "re_set_0.csv" using 2:1 title 'Re 0 Set' with linespoints , \ 25 | "re_set_1.csv" using 2:1 title 'Re 1 Set' with linespoints , \ 26 | "re_set_2.csv" using 2:1 title 'Re 2 Set' with linespoints , \ 27 | "re_set_3.csv" using 2:1 title 'Re 3 Set' with linespoints -------------------------------------------------------------------------------- /test/re/dune: -------------------------------------------------------------------------------- 1 | (test 2 | (name test_re) 3 | (libraries regenerate regenerate.segments re) 4 | (action (run %{test} -v --long)) 5 | ) 6 | -------------------------------------------------------------------------------- /test/re/test_re.ml: -------------------------------------------------------------------------------- 1 | let () = Printexc.record_backtrace true 2 | 3 | (* Turn a regular expression from Regenerate into an Re one. *) 4 | let rec to_re = let open Regenerate.Regex in function 5 | | One -> Re.epsilon 6 | | Set (true, l) -> Re.set @@ CCString.of_list l 7 | | Set (false, l) -> Re.compl [Re.set @@ CCString.of_list l] 8 | | Seq (re, re') -> Re.seq [to_re re; to_re re'] 9 | | Or (re, re') -> Re.alt [to_re re; to_re re'] 10 | | And (re, re') -> Re.inter [to_re re; to_re re'] 11 | | Not re -> assert false (* Re does not handle arbitrary complement. *) 12 | | Rep (i,j,re) -> Re.repn (to_re re) i j 13 | 14 | (* Check positive and negative samples. *) 15 | let check (re, pos, neg) = 16 | (* 1. Compile the regular expression. *) 17 | let cre = 18 | try 19 | Re.compile @@ Re.whole_string @@ to_re re 20 | with _ -> 21 | (* Discard regular expressions that Re does not handle. *) 22 | QCheck.assume_fail () 23 | in 24 | (* 2. Test! *) 25 | List.for_all (fun s -> Re.execp cre s) pos && 26 | List.for_all (fun s -> not @@ Re.execp cre s) neg 27 | 28 | let test = 29 | let alphabet = ['a'; 'b'; 'c'] in 30 | let module Word = Regenerate.Word.String in 31 | let module Stream = Segments.ThunkList(Word) in 32 | let generator = 33 | Regenerate.arbitrary 34 | (module Word) (* Datastructure for words *) 35 | (module Stream) (* Datastructure for streams of words *) 36 | ~compl:false (* Should we generate complement operations? *) 37 | ~pp:Fmt.char (* Printer for characters *) 38 | ~samples:100 (* Average number of sampes for each regular expression *) 39 | alphabet (* Alphabet *) 40 | in 41 | QCheck.Test.make generator check 42 | 43 | let () = QCheck_runner.run_tests_main [test] 44 | -------------------------------------------------------------------------------- /test/running_profile.ml: -------------------------------------------------------------------------------- 1 | open Test 2 | 3 | let f i l = 4 | let file = Fmt.strf "re_trie_%i.csv" i in 5 | let oc = open_out file in 6 | let n = 7 | L.gen l 8 | |> Test.measure_until 9 | ~limit:(Mtime.Span.of_uint64_ns (5_000_000_000L)) 10 | ~interval:20 11 | oc 12 | in 13 | Gc.full_major (); 14 | Fmt.pr "Re %i: %i elements@." i n 15 | 16 | let () = 17 | Array.iteri f langs 18 | 19 | -------------------------------------------------------------------------------- /test/test.ml: -------------------------------------------------------------------------------- 1 | open Regenerate 2 | 3 | (* module W = Word.List(Char) *) 4 | module W = Word.String 5 | module S = Segments.ThunkList(W) 6 | module Sigma = struct type t = S.t 7 | let sigma = S.of_list ["a"; "b" ; "c"] 8 | end 9 | module L = Make (W) (S) (Sigma) 10 | 11 | let assert_sorted s = 12 | let rec aux x = function 13 | | OSeq.Nil -> () 14 | | Cons (y, s) -> 15 | if W.compare x y >= 0 then 16 | Fmt.invalid_arg "Not sorted: %a %a" W.pp x W.pp y 17 | else aux y (s()) 18 | in 19 | match s () with 20 | | OSeq.Nil -> () 21 | | Cons (x, s) -> aux x (s()) 22 | 23 | (* let (!!) = of_list 24 | * 25 | * let sigma = S.of_list ['a';'b';'c'] 26 | * let full = sigma_star sigma 27 | * let l = !!["a"; "ab"; "c" ;"abc"] 28 | * let a = !!["a"] *) 29 | 30 | let langs = Regex.[| 31 | seq [compl (char 'a'); char 'a'] ; 32 | star (atom 'a') ; 33 | star (Seq (atom 'a', star (atom 'b'))) ; 34 | star (Seq (Or (atom 'a', One), star (atom 'b'))) ; 35 | |] 36 | 37 | let id x = x 38 | 39 | let rec take n s () = match s() with 40 | | L.Everything | L.Nothing as v -> v 41 | | L.Cons (seg, s) -> 42 | if n <= 0 then L.Nothing 43 | else L.Cons (seg, take (n-1) s) 44 | 45 | let print_all ?n (lang : L.lang) = 46 | lang 47 | (* |> L.flatten *) 48 | |> (match n with Some n -> take n | None -> id) 49 | |> Fmt.pr "%a@." L.pp 50 | 51 | let time_up_to_gen n lang = 52 | let i = lang 53 | |> take n 54 | |> L.flatten 55 | |> Iter.length 56 | in 57 | Fmt.pr "Max length: %i@.Count: %i@.Time: %a@." n i 58 | Mtime.Span.pp (Mtime_clock.elapsed()) 59 | 60 | let time_up_to_length n lang = 61 | let i = lang 62 | |> L.flatten 63 | |> Iter.take n 64 | |> Iter.length 65 | in 66 | Fmt.pr "Max count: %i@.Actual Count: %i@.Time: %a@." n i 67 | Mtime.Span.pp (Mtime_clock.elapsed()) 68 | 69 | let measure_until ~limit ~interval oc lang = 70 | let c = Mtime_clock.counter () in 71 | let r = ref 0 in 72 | let fmt = Format.formatter_of_out_channel oc in 73 | let output i s = Fmt.pf fmt "%i\t%f@." i (Mtime.Span.to_s s) in 74 | let f _ = 75 | incr r ; 76 | let i = !r in 77 | if i mod interval = 0 then begin 78 | let t = Mtime_clock.count c in 79 | if Mtime.Span.compare limit t < 0 80 | then raise Exit 81 | else output i t 82 | end 83 | in 84 | (try Iter.iter f (L.flatten lang) with Exit -> ()); 85 | close_out oc ; 86 | !r 87 | 88 | (* 89 | 90 | open Regenerate ;; 91 | open Test ;; 92 | #install_printer pp_seg ;; 93 | #install_printer pp ;; 94 | 95 | *) 96 | -------------------------------------------------------------------------------- /web/dune: -------------------------------------------------------------------------------- 1 | (executable 2 | (name regenerate_web) 3 | (libraries regenerate js_of_ocaml) 4 | (preprocess (pps js_of_ocaml.ppx)) 5 | (js_of_ocaml) 6 | ) 7 | -------------------------------------------------------------------------------- /web/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Regenerate : Generate languages from regular expressions 9 | 10 | 11 | 12 | 13 | 14 | 15 |
16 |
17 |
18 |
19 |

Regenerate

20 | 21 |

22 | Regenerate is a tool to generate test-cases for regular expression engines. 23 |

24 |

25 | Regenerate takes a regular expression and generates strings that match it. 26 | It handles most posix extended regular expressions along with 27 | complement (~a) and intersection (a&b). See 28 | the help for details. 29 | Since it handles complement, it can also generate strings that 30 | don't match a given regular expression. 31 | Below, we generate both positive and negative examples for the 32 | language composed of the letter 'a' and 'b'. 33 | See our paper for details. 34 |

35 |

36 | Regenerate is made with OCaml and a couple 37 | of cool 38 | OCaml libraries. The native tool and the library are available on 39 | opam. 40 | This page was made with the awesome 41 | js_of_ocaml compiler. 42 | The source code is available on github. 43 |

44 | 45 |
46 |
47 | 50 | ? 51 |
52 | 53 | 57 |
58 |
59 | 60 |
61 | 65 |
66 | 67 |
68 |
69 |
70 | 71 |
72 |
73 |
74 |
75 | 76 | 88 | 89 |
90 | 91 | 99 | 100 |
101 |

Which regular expressions?

102 |

You can use most posix extended regular expressions plus complement and intersection. 103 | More precisely:

104 |
    105 |
  • a: A single char.
  • 106 |
  • [ab]: Character sets. 107 |
      108 |
    • [^ab]: Complement Character sets.
    • 109 |
    • [a-c]: Character ranges.
    • 110 |
    111 |
  • 112 |
  • ab: Sequence.
  • 113 |
  • a|b: Alternative.
  • 114 |
  • a&b: Intersection.
  • 115 |
  • ~a: Complement.
  • 116 |
  • a{i,j}: Repetition. 117 |
      118 |
    • a* is a{0,}
    • 119 |
    • a+ is a{1,}
    • 120 |
    • a? is a{0,1}
    • 121 |
    122 |
  • 123 |
124 | 125 |

Back references, character classes, anchoring and lookahead/lookbehind are not 126 | supported.

127 | 128 | 131 |
132 | 133 | 134 |
135 | 140 |
141 | 142 | 145 | 147 | 150 | 153 | 154 | 155 | -------------------------------------------------------------------------------- /web/regenerate.css: -------------------------------------------------------------------------------- 1 | .instances { 2 | font-size: 1.5em; 3 | } 4 | 5 | .positive { 6 | text-align:right; 7 | } 8 | .negative { 9 | text-align:left; 10 | } 11 | 12 | .positive > h2 { 13 | color: #5da423; 14 | } 15 | .negative > h2 { 16 | color: #c60f13; 17 | } 18 | 19 | #neg-msg, #pos-msg { 20 | color: #707070; 21 | font-size: 0.8em; 22 | } 23 | 24 | #pos-instances #pos-msg { 25 | float:right; 26 | } 27 | #neg-instances #neg-msg { 28 | float:left; 29 | } 30 | 31 | .fork-me { 32 | background-color: #000; 33 | } 34 | -------------------------------------------------------------------------------- /web/regenerate_web.ml: -------------------------------------------------------------------------------- 1 | open Js_of_ocaml 2 | open Regenerate 3 | 4 | (* let alphabet = CCOpt.get_exn @@ Regex.enumerate ' ' '~' *) 5 | let alphabet = ['a';'b'] 6 | module W = Word.String 7 | module S = Segments.ThunkList(W) 8 | let sigma = S.of_list @@ List.map W.singleton alphabet 9 | module Sigma = struct type t = S.t let sigma = sigma end 10 | module L = Make (W) (S) (Sigma) 11 | 12 | (** Web part *) 13 | 14 | let (!$) = Dom_html.getElementById 15 | let ($$) id c = CCOpt.get_exn @@ Dom_html.getElementById_coerce id c 16 | 17 | let instances = !$"instances" 18 | let pos_instances = !$"pos-instances" 19 | let neg_instances = !$"neg-instances" 20 | let re_form = !$"re-form" 21 | let fail_div = !$"fail" 22 | let pos_msg = !$"pos-msg" 23 | let neg_msg = !$"neg-msg" 24 | let re_list = !$"re-list" 25 | 26 | let re_input = "re-input" $$ Dom_html.CoerceTo.input 27 | let mode_select = "mode-select" $$ Dom_html.CoerceTo.select 28 | 29 | let createLicode s = 30 | let itemLi = Dom_html.createLi Dom_html.document in 31 | let item = Dom_html.createCode Dom_html.document in 32 | item##.textContent := (Js.some @@ Js.string s) ; 33 | let _ = Dom.appendChild itemLi item in 34 | itemLi 35 | 36 | type mode = All | Sample 37 | let get_mode () = 38 | match Js.to_string mode_select##.value with 39 | | "all" -> All 40 | | "sample" -> Sample 41 | | _ -> assert false 42 | 43 | (** Prepare the page to show new instances. *) 44 | let clear_children s = s##.innerHTML := Js.string "" 45 | let clear () = 46 | instances##.classList##remove (Js.string "is-hidden") ; 47 | List.iter clear_children [ 48 | pos_instances ; 49 | neg_instances ; 50 | pos_msg ; 51 | neg_msg ; 52 | fail_div ; 53 | ]; 54 | () 55 | 56 | (** Push a new instance. *) 57 | let push b s = 58 | let elem = createLicode s in 59 | let parent = if b then pos_instances else neg_instances in 60 | let _ = Dom.appendChild parent elem in 61 | () 62 | 63 | (** On failure. *) 64 | let fail s = 65 | instances##.classList##add (Js.string "is-hidden") ; 66 | let html = 67 | Fmt.strf {|
%s
|} s 68 | in 69 | fail_div##.innerHTML := Js.string html ; 70 | () 71 | 72 | (** When a stream is done, we show a message. *) 73 | let show_note elem res = 74 | let s = match res with 75 | | L.Done -> "" 76 | | L.GaveUp -> 77 | {|

I give up! It doesn't look like I will produce more strings for this 78 | regular expression. 79 | Maybe the next string is very long, or maybe there isn't any more strings. 80 | If you want me to try harder, use the native version!

|} 81 | | L.Finite -> 82 | {|

That's it! 83 | This regular expression recognizes a finite number of strings.

|} 84 | in 85 | elem##.innerHTML := Js.string s 86 | 87 | (** Reimplementation of Sequence.take with an exit state. *) 88 | exception ExitTake 89 | let take n seq k = 90 | let count = ref 0 in 91 | try 92 | seq 93 | (fun x -> 94 | if !count = n then raise ExitTake; 95 | incr count; 96 | k x) 97 | with ExitTake -> L.Done 98 | 99 | let st = Random.State.make_self_init () 100 | let gen mode re = 101 | let firsts, n = match mode with 102 | | Sample -> 5, 20 103 | | All -> 200, 200 104 | in 105 | let f l = take n @@ L.sample ~st ~firsts ~skip:5 l in 106 | let lang = L.gen re in 107 | 108 | let pos = f lang in 109 | let pos_res = pos (push true) in 110 | show_note pos_msg pos_res ; 111 | 112 | let neg = f @@ L.compl lang in 113 | let neg_res = neg (push false) in 114 | show_note neg_msg neg_res ; 115 | 116 | () 117 | 118 | let handler_generate _ _ = 119 | let s = re_input##.value in 120 | let mode = get_mode () in 121 | clear (); 122 | begin match parse @@ Js.to_string s with 123 | | Error `Not_supported -> fail "This feature is not supported." 124 | | Error `Parse_error -> fail "The parsing of your regular expression failed." 125 | | Ok re -> 126 | try gen mode re 127 | with exn -> 128 | Firebug.console##error (Js.string @@ Printexc.to_string exn) 129 | end; 130 | false 131 | 132 | 133 | (** Regular expression list and generator. *) 134 | 135 | let re_examples = [ 136 | "(b(ab*a)*b|a)*"; 137 | "(ab*)*"; 138 | "~(a*)|a*"; 139 | "~(a*)&a*"; 140 | "(b*ab*ab*a)*b*"; 141 | ] 142 | let () = 143 | let add_re_to_list re = 144 | let handler _ _ = 145 | re_input##.value := Js.string re ; 146 | false 147 | in 148 | (* let elem = createLicode re in *) 149 | let button = Dom_html.createA Dom_html.document in 150 | button##setAttribute (Js.string "href") (Js.string "#") ; 151 | button##setAttribute (Js.string "data-close") (Js.string "") ; 152 | (* let _ = Dom.appendChild button elem in *) 153 | button##.textContent := Js.Opt.return (Js.string re) ; 154 | let _ = Dom_events.listen button Dom_events.Typ.click handler in 155 | let _ = Dom.appendChild re_list button in 156 | () 157 | in 158 | List.iter add_re_to_list re_examples 159 | 160 | 161 | let re_gen_button = !$"re-gen" 162 | let handler_gen_re _ _ = 163 | let re = 164 | Regex.gen ~compl:true (QCheck.Gen.oneofl alphabet) st 165 | in 166 | re_input##.value := 167 | Fmt.kstrf Js.string "%a@." (Regex.pp ~epsilon:false Fmt.char) re ; 168 | false 169 | 170 | 171 | let () = 172 | ignore @@ 173 | Dom_events.listen re_form Dom_events.Typ.submit handler_generate ; 174 | ignore @@ 175 | Dom_events.listen re_gen_button Dom_events.Typ.click handler_gen_re ; 176 | () 177 | -------------------------------------------------------------------------------- /web/static/fork.min.css: -------------------------------------------------------------------------------- 1 | /*! fork-me-on-github v2.1.0 | (c) 2014, 2014 | MIT License | https://github.com/edull24/fork-me-on-github */.fork-me-wrapper{width:170px;height:170px;position:absolute;top:0;right:0;overflow:hidden;z-index:9999}.fork-me{width:42px;height:42px;position:absolute;top:5px;right:5px;background-color:#a00000;background-image:url("");background-repeat:no-repeat;background-position:center center;overflow:hidden;white-space:nowrap}@media only screen and (min-width: 40.063em){.fork-me{width:auto;height:auto;top:40px;right:-55px;padding:2px 0;box-shadow:0 0 10px 0 #888888;background-color:#a00000;background-image:none !important;-webkit-transform:rotate(45deg);transform:rotate(45deg)}}.fork-me-link,.fork-me-link:link,.fork-me-link:visited,.fork-me-link:hover,.fork-me-link:focus,.fork-me-link:active{display:block;height:100%;text-decoration:none;border:none}@media only screen and (min-width: 40.063em){.fork-me-link,.fork-me-link:link,.fork-me-link:visited,.fork-me-link:hover,.fork-me-link:focus,.fork-me-link:active{font-family:"Helvetica Neue", "Helvetica", Helvetica, Arial, sans-serif;text-align:center;color:white;padding:10px 50px;border:1px solid white;text-shadow:0 0 5px #444444}}.fork-me-link:focus{text-decoration:underline}.fork-me-text{display:none}@media only screen and (min-width: 40.063em){.fork-me-text{display:inline}}.fork-me-dark{background-image:url("")}.fork-me-left{left:0;right:auto}.fork-me-left .fork-me{left:5px;right:auto}@media only screen and (min-width: 40.063em){.fork-me-left .fork-me{left:-55px;right:auto;-webkit-transform:rotate(-45deg);transform:rotate(-45deg)}}.fork-me-fixed{position:fixed} 2 | -------------------------------------------------------------------------------- /web/static/gpce18.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/regex-generate/regenerate/b534e04c12f59c384932130886c228ff713b2d4b/web/static/gpce18.pdf --------------------------------------------------------------------------------