├── .clang_complete
├── .gitignore
├── LICENCE
├── README.md
├── SConstruct
├── glossary
├── include
    └── rejit.h
├── sample
    ├── SConscript
    ├── basic.cc
    ├── jrep.cc
    ├── regexdna-multithread.cc
    └── regexdna.cc
├── src
    ├── allocation.h
    ├── assembler-base.cc
    ├── assembler-base.h
    ├── assembler.h
    ├── checks.cc
    ├── checks.h
    ├── codegen.cc
    ├── codegen.h
    ├── config.h
    ├── cpu.h
    ├── flags.cc
    ├── flags.h
    ├── globals.h
    ├── macro-assembler-base.h
    ├── macro-assembler.h
    ├── memory.h
    ├── parser.cc
    ├── parser.h
    ├── platform.h
    ├── platform
    │   ├── platform-linux.cc
    │   ├── platform-macos.cc
    │   ├── platform-posix.cc
    │   └── platform-posix.h
    ├── regexp.cc
    ├── regexp.h
    ├── rejit.cc
    ├── suffix_trees.cc
    ├── suffix_trees.h
    ├── utils.cc
    ├── utils.h
    └── x64
    │   ├── assembler-x64-inl.h
    │   ├── assembler-x64.cc
    │   ├── assembler-x64.h
    │   ├── codegen-x64.cc
    │   ├── macro-assembler-x64-inl.h
    │   ├── macro-assembler-x64.cc
    │   └── macro-assembler-x64.h
└── tools
    ├── analysis
        ├── SConscript
        └── compinfo.cc
    ├── benchmarks
        ├── SConscript
        ├── engines
        │   ├── bench_engine.cc
        │   ├── bench_engine.h
        │   ├── pcre
        │   │   ├── SConscript
        │   │   └── engine.cc
        │   ├── re2
        │   │   ├── SConscript
        │   │   └── engine.cc
        │   ├── rejit
        │   │   ├── SConscript
        │   │   └── engine.cc
        │   └── v8
        │   │   ├── SConscript
        │   │   └── engine
        ├── gjrep.py
        ├── resources
        │   ├── html
        │   │   ├── bench_plot.js
        │   │   ├── benchmarks_results.html.footer
        │   │   ├── benchmarks_results.html.header
        │   │   └── main.css
        │   └── sample_bench_complex.png
        └── run.py
    ├── debug
        └── lldb_rejit.py
    ├── tests
        ├── run.py
        └── test.cc
    └── utils.py
/.clang_complete:
--------------------------------------------------------------------------------
1 | -std=c++11 -Wall -pedantic -Werror -Isrc/ -Iinclude/ -DREJIT_TARGET_ARCH_X64 -g -DDEBUG
2 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *.o
2 | *.pyc
3 | *.swp
4 | .DS_Store
5 | .sconf_temp/
6 | .sconsign.dblite
7 | .tags
8 | build
9 | config.log
10 | tools/benchmarks/benchmarks_results.html
11 | tools/benchmarks/engines/pcre/engine
12 | tools/benchmarks/engines/pcre/svn.pcre/
13 | tools/benchmarks/engines/re2/engine
14 | tools/benchmarks/engines/re2/hg.re2/
15 | tools/benchmarks/engines/rejit/engine
16 | tools/benchmarks/resources/html/flot/
17 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Rejit is a prototype of a non-backtracking, just-in-time, SIMD-able regular
2 | expression compiler developed in our free time. It is available under the GPLv3
3 | licence. It currently only supports the x86_64 architecture.
4 |
5 | ## Documentation
6 |
7 | Documentation and information are available [here][coreperf rejit]. Below are
8 | some sample benchmark results.
9 |
10 | ## Benchmarks
11 |
12 | The results below were produced on a machine with a quad-core Intel Core
13 | i5-2400 CPU @ 3.10GHz with 4GiB RAM on Fedora 17 (3.8.13-100.fc17.x86\_64).
14 | It supports SSE4.2.
15 |
16 | Results are reported for the following engine versions:
17 |
18 | ```
19 | GNU grep version 2.14, commit: 599f2b15bc152cdf19022c7803f9a5f336c25e65
20 | Rejit commit: b29ea4af1a3ae86dcb25bf961bc716029430c9b1
21 | V8 version 3.20.9, commit: 455bb9c2ab8af080aa15e0fbf4838731f45241e8
22 | Re2 commit: aa957b5e3374
23 | ```
24 |
25 | #### Grepping recursively through the Linux kernel sources.
26 |
27 | ```
28 | $ CMD='grep -R regexp linux-3.10.6/'; $CMD > /dev/null && time $CMD > /dev/null
29 | real 0m0.622s
30 | user 0m0.356s
31 | sys 0m0.260s
32 | ```
33 |
34 | ```jrep``` is a grep-like utility powered by rejit.
35 |
36 | ```
37 | $ CMD='jrep -R regexp linux-3.10.6/'; $CMD > /dev/null && time $CMD > /dev/null
38 | real 0m0.370s
39 | user 0m0.101s
40 | sys 0m0.263s
41 | ```
42 |
43 | The `jrep` utility runs 1.68 times faster than GNU grep in this very real
44 | use-case! The time spent in `sys` is equivalent, but Rejit spends about 3.5
45 | times less time in `user` code.
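
As a quick check of the figures above, the ratios can be recomputed from the `time` output. The following Python sketch is not part of rejit; the dictionary names are illustrative, and the timings are simply copied from the two runs shown above.

```python
# Timings in seconds, copied from the grep and jrep runs above.
grep = {"real": 0.622, "user": 0.356, "sys": 0.260}
jrep = {"real": 0.370, "user": 0.101, "sys": 0.263}

# A ratio above 1 means jrep spent less time than grep in that category.
ratios = {key: grep[key] / jrep[key] for key in grep}

for key, ratio in ratios.items():
    print(f"{key}: grep/jrep = {ratio:.2f}")
```

The `real` ratio comes out at roughly 1.68, the `user` ratio at roughly 3.5, and the `sys` ratio close to 1, which matches the summary above.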
46 |
47 | `jrep` is part of the sample programs in the rejit repository (see the wiki). It
48 | is of course far behind grep in terms of features, but supports searching for
49 | multi-line patterns and has initial multi-threading support.
50 |
51 | #### DNA matching benchmark.
52 |
53 | From the "[Computer Language Benchmarks Game][2]", this benchmark performs some DNA matching operations using regular expressions.
54 |
55 | The tables below show performance (`real` running time) for different input sizes.
56 |
57 | The first table compares the
58 | [fastest registered single threaded implementation][cpu_bench single threaded fastest]
59 | (V8) with a single-threaded Rejit-powered implementation.
60 |
61 | ```
62 | input size V8 Rejit
63 | 50.000 (500KB) 0.034s 0.015s
64 | 500.000 ( 5MB) 0.217s 0.130s
65 | 5.000.000 ( 50MB) 2.054s 1.246s
66 | 50.000.000 (500MB) (out of memory) 14.624s
67 | ```
68 |
69 | The second table compares the
70 | [second fastest registered multi threaded implementation][cpu_bench multi threaded fastest]
71 | (Re2) with a multi-threaded Rejit-powered implementation. (A quick attempt at running
72 | the fastest listed implementation resulted in failures.)
73 |
74 | ```
75 | input size Re2 Rejit
76 | 50.000 (500KB) 0.022s 0.011s
77 | 500.000 ( 5MB) 0.183s 0.087s
78 | 5.000.000 ( 50MB) 1.629s 0.971s
79 | 50.000.000 (500MB) 20.693s 11.594s
80 | ```
81 |
82 | See performance for various engines and languages for [single-core][4] and [quad-core][5] implementations.
83 | The rejit programs used to run these benchmarks are also part of the rejit
84 | sample programs (see the wiki).
85 |
86 | #### Complex regular expression matching
87 | This is an example taken from rejit's benchmark suite. It shows how fast all left-most longest matches of the regular expression ```([complex]|(regexp)){2,7}abcdefgh(at|the|[e-nd]as well)``` are found in randomly generated texts of various sizes. The performance reported is ```( /