├── .clang_complete ├── .gitignore ├── LICENCE ├── README.md ├── SConstruct ├── glossary ├── include └── rejit.h ├── sample ├── SConscript ├── basic.cc ├── jrep.cc ├── regexdna-multithread.cc └── regexdna.cc ├── src ├── allocation.h ├── assembler-base.cc ├── assembler-base.h ├── assembler.h ├── checks.cc ├── checks.h ├── codegen.cc ├── codegen.h ├── config.h ├── cpu.h ├── flags.cc ├── flags.h ├── globals.h ├── macro-assembler-base.h ├── macro-assembler.h ├── memory.h ├── parser.cc ├── parser.h ├── platform.h ├── platform │ ├── platform-linux.cc │ ├── platform-macos.cc │ ├── platform-posix.cc │ └── platform-posix.h ├── regexp.cc ├── regexp.h ├── rejit.cc ├── suffix_trees.cc ├── suffix_trees.h ├── utils.cc ├── utils.h └── x64 │ ├── assembler-x64-inl.h │ ├── assembler-x64.cc │ ├── assembler-x64.h │ ├── codegen-x64.cc │ ├── macro-assembler-x64-inl.h │ ├── macro-assembler-x64.cc │ └── macro-assembler-x64.h └── tools ├── analysis ├── SConscript └── compinfo.cc ├── benchmarks ├── SConscript ├── engines │ ├── bench_engine.cc │ ├── bench_engine.h │ ├── pcre │ │ ├── SConscript │ │ └── engine.cc │ ├── re2 │ │ ├── SConscript │ │ └── engine.cc │ ├── rejit │ │ ├── SConscript │ │ └── engine.cc │ └── v8 │ │ ├── SConscript │ │ └── engine ├── gjrep.py ├── resources │ ├── html │ │ ├── bench_plot.js │ │ ├── benchmarks_results.html.footer │ │ ├── benchmarks_results.html.header │ │ └── main.css │ └── sample_bench_complex.png └── run.py ├── debug └── lldb_rejit.py ├── tests ├── run.py └── test.cc └── utils.py /.clang_complete: -------------------------------------------------------------------------------- 1 | -std=c++11 -Wall -pedantic -Werror -Isrc/ -Iinclude/ -DREJIT_TARGET_ARCH_X64 -g -DDEBUG 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.o 2 | *.pyc 3 | *.swp 4 | .DS_Store 5 | .sconf_temp/ 6 | .sconsign.dblite 7 | .tags 8 | build 9 | config.log 10 
| tools/benchmarks/benchmarks_results.html 11 | tools/benchmarks/engines/pcre/engine 12 | tools/benchmarks/engines/pcre/svn.pcre/ 13 | tools/benchmarks/engines/re2/engine 14 | tools/benchmarks/engines/re2/hg.re2/ 15 | tools/benchmarks/engines/rejit/engine 16 | tools/benchmarks/resources/html/flot/ 17 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Rejit is a prototype of a non-backtracking, just-in-time, SIMD-able regular 2 | expression compiler developed in our free time. It is available under the GPLv3 3 | licence. It currently only supports the x86_64 architecture. 4 | 5 | ## Documentation 6 | 7 | Documentation and further information are available [here][coreperf rejit]. Below are 8 | some sample benchmark results. 9 | 10 | ## Benchmarks 11 | 12 | The results below were produced on a machine with a quad-core Intel Core 13 | i5-2400 CPU @ 3.10GHz with 4GiB RAM on Fedora 17 (3.8.13-100.fc17.x86\_64). 14 | The CPU supports SSE4.2. 15 | 16 | Results are reported for the following engine versions: 17 | 18 | ``` 19 | GNU grep version 2.14, commit: 599f2b15bc152cdf19022c7803f9a5f336c25e65 20 | Rejit commit: b29ea4af1a3ae86dcb25bf961bc716029430c9b1 21 | V8 version 3.20.9, commit: 455bb9c2ab8af080aa15e0fbf4838731f45241e8 22 | Re2 commit: aa957b5e3374 23 | ``` 24 | 25 | #### Grepping recursively through the Linux kernel sources. 26 | 27 | ``` 28 | $ CMD='grep -R regexp linux-3.10.6/'; $CMD > /dev/null && time $CMD > /dev/null 29 | real 0m0.622s 30 | user 0m0.356s 31 | sys 0m0.260s 32 | ``` 33 | 34 | ```jrep``` is a grep-like utility powered by rejit. 35 | 36 | ``` 37 | $ CMD='jrep -R regexp linux-3.10.6/'; $CMD > /dev/null && time $CMD > /dev/null 38 | real 0m0.370s 39 | user 0m0.101s 40 | sys 0m0.263s 41 | ``` 42 | 43 | The `jrep` utility performs 1.68 times faster than GNU grep in this very real 44 | use-case! 
The time spent in `sys` is equivalent, but Rejit spends about 3.5 times less 45 | time in `user` code. 46 |
`jrep` is part of the sample programs in the rejit repository (see the 47 | wiki). It is of course far behind grep in terms of features, but 48 | supports searching for multi-line patterns and has initial multi-threading 49 | support. 50 | 51 | #### DNA matching benchmark. 52 | 53 | From the "[Computer Language Benchmarks Game][2]", this benchmark performs some DNA matching operations using regular expressions. 54 | 55 | The tables below show performance (`real` running time) for different input sizes, for 56 | 57 | the 58 | [fastest registered single-threaded implementation][cpu_bench single threaded fastest] 59 | (V8) and a single-threaded Rejit-powered implementation. 60 | 61 | ``` 62 | input size V8 Rejit 63 | 50.000 (500KB) 0.034s 0.015s 64 | 500.000 ( 5MB) 0.217s 0.130s 65 | 5.000.000 ( 50MB) 2.054s 1.246s 66 | 50.000.000 (500MB) (out of memory) 14.624s 67 | ``` 68 | 69 | the 70 | [second fastest registered multi-threaded implementation][cpu_bench multi threaded fastest] 71 | (Re2) and a multi-threaded Rejit-powered implementation. (A quick attempt at running 72 | the first listed implementation raised failures.) 73 | 74 | ``` 75 | input size Re2 Rejit 76 | 50.000 (500KB) 0.022s 0.011s 77 | 500.000 ( 5MB) 0.183s 0.087s 78 | 5.000.000 ( 50MB) 1.629s 0.971s 79 | 50.000.000 (500MB) 20.693s 11.594s 80 | ``` 81 | 82 | See performance for various engines and languages for [single-core][4] and [quad-core][5] implementations. 83 | The rejit programs used to run these benchmarks are also part of the rejit 84 | sample programs (see the wiki). 85 | 86 | #### Complex regular expression matching 87 | This is an example taken from rejit's benchmark suite. It shows the performance of finding all left-most longest matches of the regular expression ```([complex]|(regexp)){2,7}abcdefgh(at|the|[e-nd]as well)``` in randomly generated texts of various sizes. The performance reported is ```( /