├── .gitignore ├── .hgignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── MANIFEST ├── Makefile ├── README ├── README.rst ├── _re2.cc ├── re2.py ├── setup.py ├── tests ├── __init__.py ├── test_compile.py └── test_match.py └── tox.ini /.gitignore: -------------------------------------------------------------------------------- 1 | /build 2 | /dist 3 | *.so 4 | *.py[co] 5 | *.egg-info 6 | .tox/ 7 | -------------------------------------------------------------------------------- /.hgignore: -------------------------------------------------------------------------------- 1 | (?:build|dist) 2 | 3 | glob:*.so 4 | glob:*.py[cd] 5 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | Facebook has adopted a Code of Conduct that we expect project participants to adhere to. Please [read the full text](https://code.facebook.com/codeofconduct) so that you can understand what actions will and will not be tolerated. 4 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | 2 | # Contributing to pyre2 3 | We want to make contributing to this project as easy and transparent as 4 | possible. 5 | 6 | ## Code of Conduct 7 | The code of conduct is described in [`CODE_OF_CONDUCT.md`](CODE_OF_CONDUCT.md). 8 | 9 | ## Pull Requests 10 | We actively welcome your pull requests. 11 | 12 | 1. Fork the repo and create your branch from `master`. 13 | 2. If you've added code that should be tested, add tests. 14 | 3. If you've changed APIs, update the documentation. 15 | 4. Ensure the test suite passes. 16 | 5. Make sure your code lints. 17 | 6. If you haven't already, complete the Contributor License Agreement ("CLA"). 
18 | 19 | ## Contributor License Agreement ("CLA") 20 | In order to accept your pull request, we need you to submit a CLA. You only need 21 | to do this once to work on any of Facebook's open source projects. 22 | 23 | Complete your CLA here: 24 | 25 | ## Issues 26 | We use GitHub issues to track public bugs. Please ensure your description is 27 | clear and has sufficient instructions to be able to reproduce the issue. 28 | 29 | Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe 30 | disclosure of security bugs. In those cases, please go through the process 31 | outlined on that page and do not file a public issue. 32 | 33 | ## License 34 | By contributing to pyre2, you agree that your contributions will be licensed 35 | under the LICENSE file in the root directory of this source tree. 36 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved. 2 | 3 | Redistribution and use in source and binary forms, with or without 4 | modification, are permitted provided that the following conditions 5 | are met: 6 | * Redistributions of source code must retain the above copyright 7 | notice, this list of conditions and the following disclaimer. 8 | * Redistributions in binary form must reproduce the above copyright 9 | notice, this list of conditions and the following disclaimer in the 10 | documentation and/or other materials provided with the distribution. 11 | * Neither the name of Facebook nor the names of its contributors 12 | may be used to endorse or promote products derived from this software 13 | without specific prior written permission. 
14 | 15 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 16 | "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 17 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A 18 | PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT 19 | HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 20 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 21 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 22 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 23 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 24 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 25 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 26 | -------------------------------------------------------------------------------- /MANIFEST: -------------------------------------------------------------------------------- 1 | LICENSE 2 | Makefile 3 | README.rst 4 | _re2.cc 5 | re2.py 6 | setup.py 7 | tests/test_match.py 8 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | PYTHON = python 2 | SETUP = $(PYTHON) setup.py 3 | RUNTESTS = PYTHONHASHSEED=random PYTHONPATH=.:$(PYTHONPATH) nosetests $(TEST_OPTIONS) 4 | 5 | build:: 6 | $(SETUP) build 7 | $(SETUP) build_ext -i 8 | 9 | check:: build 10 | $(RUNTESTS) tests/ 11 | -------------------------------------------------------------------------------- /README: -------------------------------------------------------------------------------- 1 | README.rst -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | ===== 2 | pyre2 3 | ===== 4 | 5 | .. 
contents:: 6 | 7 | Summary 8 | ======= 9 | 10 | pyre2 is a Python extension that wraps 11 | `Google's RE2 regular expression library 12 | `_. 13 | It implements many of the features of Python's built-in 14 | ``re`` module with compatible interfaces. 15 | 16 | 17 | New Features 18 | ============ 19 | 20 | * ``Regexp`` objects have a ``fullmatch`` method that works like ``match``, 21 | but anchors the match at both the start and the end. 22 | * ``Regexp`` objects have 23 | ``test_search``, ``test_match``, and ``test_fullmatch`` 24 | methods that work like ``search``, ``match``, and ``fullmatch``, 25 | but only return ``True`` or ``False`` to indicate 26 | whether the match was successful. 27 | These methods should be faster than the full versions, 28 | especially for patterns with capturing groups. 29 | 30 | 31 | Missing Features 32 | ================ 33 | 34 | * No substitution methods. 35 | * No flags. 36 | * No ``split``, ``findall``, or ``finditer``. 37 | * No top-level convenience functions like ``search`` and ``match``. 38 | (Just use ``compile``.) 39 | * No compile cache. 40 | (If you care enough about performance to use RE2, 41 | you probably care enough to cache your own patterns.) 42 | * No ``lastindex`` or ``lastgroup`` on ``Match`` objects. 43 | 44 | 45 | Current Status 46 | ============== 47 | 48 | pyre2 has only received basic testing, 49 | and I am by no means a Python extension expert, 50 | so it is quite possible that it contains bugs. 51 | I'd guess the most likely are reference leaks in error cases. 52 | 53 | RE2 doesn't build with ``-fPIC``, so I had to build it with 54 | 55 | :: 56 | 57 | make CFLAGS='-fPIC -c -Wall -Wno-sign-compare -O3 -g -I.' 
58 | 59 | I also had to add it to my compiler search path when building the module 60 | with a command like 61 | 62 | :: 63 | 64 | env CPPFLAGS='-I/path/to/re2' LDFLAGS='-L/path/to/re2/obj' ./setup.py build 65 | 66 | 67 | Contact 68 | ======= 69 | 70 | You can file bug reports on GitHub, or email the author: 71 | David Reiss . 72 | 73 | 74 | License 75 | ======= 76 | 77 | See the ``_ file for more information. 78 | -------------------------------------------------------------------------------- /_re2.cc: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) Facebook, Inc. and its affiliates. 3 | * 4 | * Redistribution and use in source and binary forms, with or without 5 | * modification, are permitted provided that the following conditions 6 | * are met: 7 | * * Redistributions of source code must retain the above copyright 8 | * notice, this list of conditions and the following disclaimer. 9 | * * Redistributions in binary form must reproduce the above copyright 10 | * notice, this list of conditions and the following disclaimer in the 11 | * documentation and/or other materials provided with the distribution. 12 | * * Neither the name of Facebook nor the names of its contributors 13 | * may be used to endorse or promote products derived from this software 14 | * without specific prior written permission. 15 | * 16 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 17 | * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 18 | * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A 19 | * PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT 20 | * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 21 | * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 22 | * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 23 | * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 24 | * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 25 | * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 26 | * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 27 | */ 28 | #define PY_SSIZE_T_CLEAN 29 | #include 30 | 31 | #include 32 | 33 | #include 34 | #include 35 | using std::nothrow; 36 | 37 | #include 38 | #include 39 | using re2::RE2; 40 | using re2::StringPiece; 41 | 42 | #include 43 | 44 | 45 | typedef struct _RegexpObject2 { 46 | PyObject_HEAD 47 | RE2* re2_obj; 48 | Py_ssize_t groups; 49 | PyObject* groupindex; 50 | PyObject* pattern; 51 | } RegexpObject2; 52 | 53 | typedef struct _MatchObject2 { 54 | PyObject_HEAD 55 | PyObject* re; 56 | PyObject* string; 57 | Py_ssize_t pos; /* beginning of target slice */ 58 | Py_ssize_t endpos; /* end of target slice */ 59 | // There are several possible approaches to storing the matched groups: 60 | // 1. Fully materialize the groups tuple at match time. 61 | // 2. Cache allocated PyBytes objects when groups are requested. 62 | // 3. Always allocate new PyBytes objects on demand. 63 | // I've chosen to go with #3. It's the simplest, and I'm pretty sure it's 64 | // optimal in all cases where no group is fetched more than once. 
65 | StringPiece* groups; 66 | } MatchObject2; 67 | 68 | 69 | // Forward declaration of getter functions 70 | static PyObject* match_pos_get(MatchObject2* self); 71 | static PyObject* match_endpos_get(MatchObject2* self); 72 | static PyObject* match_re_get(MatchObject2* self); 73 | static PyObject* match_string_get(MatchObject2* self); 74 | static PyObject* regexp_groups_get(RegexpObject2* self); 75 | static PyObject* regexp_groupindex_get(RegexpObject2* self); 76 | static PyObject* regexp_pattern_get(RegexpObject2* self); 77 | 78 | static PyGetSetDef regexp_getset[] = { 79 | {(char *)"groupindex", (getter)regexp_groupindex_get, (setter)NULL}, 80 | {(char *)"groups", (getter)regexp_groups_get, (setter)NULL}, 81 | {(char *)"pattern", (getter)regexp_pattern_get, (setter)NULL}, 82 | {NULL} 83 | }; 84 | 85 | static PyGetSetDef match_getset[] = { 86 | {(char *)"endpos", (getter)match_endpos_get, (setter)NULL}, 87 | {(char *)"pos", (getter)match_pos_get, (setter)NULL}, 88 | {(char *)"re", (getter)match_re_get, (setter)NULL}, 89 | {(char *)"string", (getter)match_string_get, (setter)NULL}, 90 | {NULL} 91 | }; 92 | 93 | typedef struct _RegexpSetObject2 { 94 | PyObject_HEAD 95 | // True iff re2_set_obj has been compiled. 96 | bool compiled; 97 | RE2::Set* re2_set_obj; 98 | } RegexpSetObject2; 99 | 100 | 101 | // Forward declarations of methods, creators, and destructors. 
102 | static void regexp_dealloc(RegexpObject2* self); 103 | static PyObject* create_regexp(PyObject* self, PyObject* pattern, PyObject* error_class); 104 | static PyObject* regexp_search(RegexpObject2* self, PyObject* args, PyObject* kwds); 105 | static PyObject* regexp_match(RegexpObject2* self, PyObject* args, PyObject* kwds); 106 | static PyObject* regexp_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds); 107 | static PyObject* regexp_test_search(RegexpObject2* self, PyObject* args, PyObject* kwds); 108 | static PyObject* regexp_test_match(RegexpObject2* self, PyObject* args, PyObject* kwds); 109 | static PyObject* regexp_test_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds); 110 | static void match_dealloc(MatchObject2* self); 111 | static PyObject* create_match(PyObject* re, PyObject* string, long pos, long endpos, StringPiece* groups); 112 | static PyObject* match_group(MatchObject2* self, PyObject* args); 113 | static PyObject* match_groups(MatchObject2* self, PyObject* args, PyObject* kwds); 114 | static PyObject* match_groupdict(MatchObject2* self, PyObject* args, PyObject* kwds); 115 | static PyObject* match_start(MatchObject2* self, PyObject* args); 116 | static PyObject* match_end(MatchObject2* self, PyObject* args); 117 | static PyObject* match_span(MatchObject2* self, PyObject* args); 118 | static void regexp_set_dealloc(RegexpSetObject2* self); 119 | static PyObject* regexp_set_new(PyTypeObject* type, PyObject* args, PyObject* kwds); 120 | static PyObject* regexp_set_add(RegexpSetObject2* self, PyObject* pattern); 121 | static PyObject* regexp_set_compile(RegexpSetObject2* self); 122 | static PyObject* regexp_set_match(RegexpSetObject2* self, PyObject* text); 123 | 124 | 125 | static PyMethodDef regexp_methods[] = { 126 | {"search", (PyCFunction)regexp_search, METH_VARARGS | METH_KEYWORDS, 127 | "search(string[, pos[, endpos]]) --> match object or None.\n" 128 | " Scan through string looking for a match, and return a 
corresponding\n" 129 | " MatchObject instance. Return None if no position in the string matches." 130 | }, 131 | {"match", (PyCFunction)regexp_match, METH_VARARGS | METH_KEYWORDS, 132 | "match(string[, pos[, endpos]]) --> match object or None.\n" 133 | " Matches zero or more characters at the beginning of the string." 134 | }, 135 | {"fullmatch", (PyCFunction)regexp_fullmatch, METH_VARARGS | METH_KEYWORDS, 136 | "fullmatch(string[, pos[, endpos]]) --> match object or None.\n" 137 | " Matches the entire string." 138 | }, 139 | {"test_search", (PyCFunction)regexp_test_search, METH_VARARGS | METH_KEYWORDS, 140 | "test_search(string[, pos[, endpos]]) --> bool.\n" 141 | " Like 'search', but only returns whether a match was found." 142 | }, 143 | {"test_match", (PyCFunction)regexp_test_match, METH_VARARGS | METH_KEYWORDS, 144 | "test_match(string[, pos[, endpos]]) --> bool.\n" 145 | " Like 'match', but only returns whether a match was found." 146 | }, 147 | {"test_fullmatch", (PyCFunction)regexp_test_fullmatch, METH_VARARGS | METH_KEYWORDS, 148 | "test_fullmatch(string[, pos[, endpos]]) --> bool.\n" 149 | " Like 'fullmatch', but only returns whether a match was found." 
150 | }, 151 | {NULL} /* Sentinel */ 152 | }; 153 | 154 | static PyMethodDef match_methods[] = { 155 | {"group", (PyCFunction)match_group, METH_VARARGS, 156 | NULL 157 | }, 158 | {"groups", (PyCFunction)match_groups, METH_VARARGS | METH_KEYWORDS, 159 | NULL 160 | }, 161 | {"groupdict", (PyCFunction)match_groupdict, METH_VARARGS | METH_KEYWORDS, 162 | NULL 163 | }, 164 | {"start", (PyCFunction)match_start, METH_VARARGS, 165 | NULL 166 | }, 167 | {"end", (PyCFunction)match_end, METH_VARARGS, 168 | NULL 169 | }, 170 | {"span", (PyCFunction)match_span, METH_VARARGS, 171 | NULL 172 | }, 173 | {NULL} /* Sentinel */ 174 | }; 175 | 176 | static PyMethodDef regexp_set_methods[] = { 177 | {"add", (PyCFunction)regexp_set_add, METH_O, 178 | "add(pattern) --> index or raises an exception.\n" 179 | " Add a pattern to the set, raising if the pattern doesn't parse." 180 | }, 181 | {"compile", (PyCFunction)regexp_set_compile, METH_NOARGS, 182 | "compile() --> None or raises an exception.\n" 183 | " Compile the set to prepare it for matching." 184 | }, 185 | {"match", (PyCFunction)regexp_set_match, METH_O, 186 | "match(text) --> list\n" 187 | " Match text against the set, returning the indexes of the added patterns." 188 | }, 189 | {NULL} /* Sentinel */ 190 | }; 191 | 192 | 193 | // Simple method to block setattr. 
194 | static int 195 | _no_setattr(PyObject* obj, PyObject* name, PyObject* v) { 196 | (void)name; 197 | (void)v; 198 | PyErr_Format(PyExc_AttributeError, 199 | "'%s' object attributes are read-only", 200 | obj->ob_type->tp_name); 201 | return -1; 202 | } 203 | 204 | #if PY_MAJOR_VERSION >= 3 205 | #define REGEX_OBJECT_TYPE &PyUnicode_Type 206 | #else 207 | #define REGEX_OBJECT_TYPE &PyString_Type 208 | #endif 209 | 210 | static PyTypeObject Regexp_Type2 = { 211 | PyObject_HEAD_INIT(NULL) 212 | #if PY_MAJOR_VERSION < 3 213 | 0, /*ob_size*/ 214 | #endif 215 | "_re2.RE2_Regexp", /*tp_name*/ 216 | sizeof(RegexpObject2), /*tp_basicsize*/ 217 | 0, /*tp_itemsize*/ 218 | (destructor)regexp_dealloc, /*tp_dealloc*/ 219 | 0, /*tp_print*/ 220 | 0, /*tp_getattr*/ 221 | 0, /*tp_setattr*/ 222 | 0, /*tp_compare*/ 223 | 0, /*tp_repr*/ 224 | 0, /*tp_as_number*/ 225 | 0, /*tp_as_sequence*/ 226 | 0, /*tp_as_mapping*/ 227 | 0, /*tp_hash*/ 228 | 0, /*tp_call*/ 229 | 0, /*tp_str*/ 230 | 0, /*tp_getattro*/ 231 | _no_setattr, /*tp_setattro*/ 232 | 0, /*tp_as_buffer*/ 233 | Py_TPFLAGS_DEFAULT, /*tp_flags*/ 234 | "RE2 regexp objects", /*tp_doc*/ 235 | 0, /*tp_traverse*/ 236 | 0, /*tp_clear*/ 237 | 0, /*tp_richcompare*/ 238 | 0, /*tp_weaklistoffset*/ 239 | 0, /*tp_iter*/ 240 | 0, /*tp_iternext*/ 241 | regexp_methods, /*tp_methods*/ 242 | 0, /*tp_members*/ 243 | regexp_getset, /*tp_getset*/ 244 | 0, /*tp_base*/ 245 | 0, /*tp_dict*/ 246 | 0, /*tp_descr_get*/ 247 | 0, /*tp_descr_set*/ 248 | 0, /*tp_dictoffset*/ 249 | 0, /*tp_init*/ 250 | 0, /*tp_alloc*/ 251 | 0, /*tp_new*/ 252 | }; 253 | 254 | static PyTypeObject Match_Type2 = { 255 | PyObject_HEAD_INIT(NULL) 256 | #if PY_MAJOR_VERSION < 3 257 | 0, /*ob_size*/ 258 | #endif 259 | "_re2.RE2_Match", /*tp_name*/ 260 | sizeof(MatchObject2), /*tp_basicsize*/ 261 | 0, /*tp_itemsize*/ 262 | (destructor)match_dealloc, /*tp_dealloc*/ 263 | 0, /*tp_print*/ 264 | 0, /*tp_getattr*/ 265 | 0, /*tp_setattr*/ 266 | 0, /*tp_compare*/ 267 | 0, /*tp_repr*/ 268 | 
0, /*tp_as_number*/ 269 | 0, /*tp_as_sequence*/ 270 | 0, /*tp_as_mapping*/ 271 | 0, /*tp_hash*/ 272 | 0, /*tp_call*/ 273 | 0, /*tp_str*/ 274 | 0, /*tp_getattro*/ 275 | _no_setattr, /*tp_setattro*/ 276 | 0, /*tp_as_buffer*/ 277 | Py_TPFLAGS_DEFAULT, /*tp_flags*/ 278 | "RE2 match objects", /*tp_doc*/ 279 | 0, /*tp_traverse*/ 280 | 0, /*tp_clear*/ 281 | 0, /*tp_richcompare*/ 282 | 0, /*tp_weaklistoffset*/ 283 | 0, /*tp_iter*/ 284 | 0, /*tp_iternext*/ 285 | match_methods, /*tp_methods*/ 286 | 0, /*tp_members*/ 287 | match_getset, /*tp_getset*/ 288 | 0, /*tp_base*/ 289 | 0, /*tp_dict*/ 290 | 0, /*tp_descr_get*/ 291 | 0, /*tp_descr_set*/ 292 | 0, /*tp_dictoffset*/ 293 | 0, /*tp_init*/ 294 | 0, /*tp_alloc*/ 295 | 0, /*tp_new*/ 296 | }; 297 | 298 | static PyTypeObject RegexpSet_Type2 = { 299 | PyObject_HEAD_INIT(NULL) 300 | #if PY_MAJOR_VERSION < 3 301 | 0, /*ob_size*/ 302 | #endif 303 | "_re2.RE2_Set", /*tp_name*/ 304 | sizeof(RegexpSetObject2), /*tp_basicsize*/ 305 | 0, /*tp_itemsize*/ 306 | (destructor)regexp_set_dealloc, /*tp_dealloc*/ 307 | 0, /*tp_print*/ 308 | 0, /*tp_getattr*/ 309 | 0, /*tp_setattr*/ 310 | 0, /*tp_compare*/ 311 | 0, /*tp_repr*/ 312 | 0, /*tp_as_number*/ 313 | 0, /*tp_as_sequence*/ 314 | 0, /*tp_as_mapping*/ 315 | 0, /*tp_hash*/ 316 | 0, /*tp_call*/ 317 | 0, /*tp_str*/ 318 | 0, /*tp_getattro*/ 319 | _no_setattr, /*tp_setattro*/ 320 | 0, /*tp_as_buffer*/ 321 | Py_TPFLAGS_DEFAULT, /*tp_flags*/ 322 | "RE2 regexp set objects", /*tp_doc*/ 323 | 0, /*tp_traverse*/ 324 | 0, /*tp_clear*/ 325 | 0, /*tp_richcompare*/ 326 | 0, /*tp_weaklistoffset*/ 327 | 0, /*tp_iter*/ 328 | 0, /*tp_iternext*/ 329 | regexp_set_methods, /*tp_methods*/ 330 | 0, /*tp_members*/ 331 | 0, /*tp_getset*/ 332 | 0, /*tp_base*/ 333 | 0, /*tp_dict*/ 334 | 0, /*tp_descr_get*/ 335 | 0, /*tp_descr_set*/ 336 | 0, /*tp_dictoffset*/ 337 | 0, /*tp_init*/ 338 | 0, /*tp_alloc*/ 339 | regexp_set_new, /*tp_new*/ 340 | }; 341 | 342 | // getters for MatchObject2 343 | static PyObject* 344 | 
match_pos_get(MatchObject2* self) 345 | { 346 | return PyLong_FromSsize_t(self->pos); 347 | } 348 | 349 | static PyObject* 350 | match_endpos_get(MatchObject2* self) 351 | { 352 | return PyLong_FromSsize_t(self->endpos); 353 | } 354 | 355 | static PyObject* 356 | match_re_get(MatchObject2* self) 357 | { 358 | Py_INCREF(self->re); 359 | return self->re; 360 | } 361 | 362 | static PyObject* 363 | match_string_get(MatchObject2* self) 364 | { 365 | Py_INCREF(self->string); 366 | return self->string; 367 | } 368 | 369 | // getters for RegexpObject2 370 | static PyObject* 371 | regexp_groups_get(RegexpObject2* self) 372 | { 373 | return PyLong_FromSsize_t(self->groups); 374 | } 375 | 376 | static PyObject* 377 | regexp_groupindex_get(RegexpObject2* self) 378 | { 379 | if (self->groupindex == NULL) { 380 | PyObject* groupindex = PyDict_New(); 381 | if (groupindex == NULL) { 382 | return NULL; 383 | } 384 | 385 | const std::map<std::string, int>& name_map = self->re2_obj->NamedCapturingGroups(); 386 | for (std::map<std::string, int>::const_iterator it = name_map.begin(); it != name_map.end(); ++it) { 387 | // This used to return an int() on Py2, but now returns a long() to be 388 | // consistent across Py3 and Py2. 
389 | PyObject* index = PyLong_FromLong(it->second); 390 | if (index == NULL) { 391 | Py_DECREF(groupindex); 392 | return NULL; 393 | } 394 | 395 | int res = PyDict_SetItemString(groupindex, it->first.c_str(), index); 396 | Py_DECREF(index); 397 | if (res < 0) { 398 | Py_DECREF(groupindex); 399 | return NULL; 400 | } 401 | } 402 | self->groupindex = groupindex; 403 | } 404 | 405 | Py_INCREF(self->groupindex); 406 | return self->groupindex; 407 | } 408 | 409 | static PyObject* regexp_pattern_get(RegexpObject2* self) 410 | { 411 | Py_INCREF(self->pattern); 412 | return self->pattern; 413 | } 414 | 415 | static void 416 | regexp_dealloc(RegexpObject2* self) 417 | { 418 | delete self->re2_obj; 419 | Py_XDECREF(self->pattern); 420 | Py_XDECREF(self->groupindex); 421 | PyObject_Del(self); 422 | } 423 | 424 | static PyObject* 425 | create_regexp(PyObject* self, PyObject* pattern, PyObject* error_class) 426 | { 427 | RegexpObject2* regexp = PyObject_New(RegexpObject2, &Regexp_Type2); 428 | if (regexp == NULL) { 429 | return NULL; 430 | } 431 | regexp->pattern = NULL; 432 | regexp->re2_obj = NULL; 433 | regexp->groupindex = NULL; 434 | 435 | Py_ssize_t len_pattern; 436 | #if PY_MAJOR_VERSION >= 3 437 | const char* raw_pattern = PyUnicode_AsUTF8AndSize(pattern, &len_pattern); 438 | #else 439 | const char* raw_pattern = PyString_AS_STRING(pattern); 440 | len_pattern = PyString_GET_SIZE(pattern); 441 | #endif 442 | 443 | RE2::Options options; 444 | options.set_log_errors(false); 445 | 446 | regexp->re2_obj = new(nothrow) RE2(StringPiece(raw_pattern, (int)len_pattern), options); 447 | 448 | if (regexp->re2_obj == NULL) { 449 | PyErr_NoMemory(); 450 | Py_DECREF(regexp); 451 | return NULL; 452 | } 453 | 454 | if (!regexp->re2_obj->ok()) { 455 | const std::string& msg = regexp->re2_obj->error(); 456 | #if PY_MAJOR_VERSION >= 3 457 | PyObject* value = PyUnicode_FromStringAndSize(msg.data(), msg.length()); 458 | #else 459 | long code = (long)regexp->re2_obj->error_code(); 460 | 
PyObject* value = Py_BuildValue("ls#", code, msg.data(), msg.length()); 461 | #endif 462 | if (value == NULL) { 463 | Py_DECREF(regexp); 464 | return NULL; 465 | } 466 | PyErr_SetObject(error_class, value); 467 | Py_DECREF(regexp); 468 | return NULL; 469 | } 470 | 471 | Py_INCREF(pattern); 472 | regexp->pattern = pattern; 473 | regexp->groups = regexp->re2_obj->NumberOfCapturingGroups(); 474 | regexp->groupindex = NULL; 475 | return (PyObject*)regexp; 476 | } 477 | 478 | static PyObject* 479 | _do_search(RegexpObject2* self, PyObject* args, PyObject* kwds, RE2::Anchor anchor, bool return_match) 480 | { 481 | PyObject* string; 482 | long pos = 0; 483 | long endpos = LONG_MAX; 484 | 485 | static const char* kwlist[] = { 486 | "string", 487 | "pos", 488 | "endpos", 489 | NULL}; 490 | 491 | // Using O instead of s# here, because we want to stash the original 492 | // PyObject* in the match object on a successful match. 493 | if (!PyArg_ParseTupleAndKeywords(args, kwds, "O|ll", (char**)kwlist, 494 | &string, 495 | &pos, &endpos)) { 496 | return NULL; 497 | } 498 | 499 | const char *subject; 500 | Py_ssize_t slen; 501 | #if PY_MAJOR_VERSION >= 3 502 | if (PyUnicode_Check(string)) { 503 | subject = PyUnicode_AsUTF8AndSize(string, &slen); 504 | } else if (PyBytes_Check(string)) { 505 | subject = PyBytes_AS_STRING(string); 506 | slen = PyBytes_GET_SIZE(string); 507 | } else { 508 | PyErr_SetString(PyExc_TypeError, "can only operate on unicode or bytes"); 509 | return NULL; 510 | } 511 | #else 512 | subject = PyString_AsString(string); 513 | if (subject == NULL) { 514 | return NULL; 515 | } 516 | slen = PyString_GET_SIZE(string); 517 | #endif 518 | if (pos < 0) pos = 0; 519 | if (pos > slen) pos = slen; 520 | if (endpos < pos) endpos = pos; 521 | if (endpos > slen) endpos = slen; 522 | 523 | // Don't bother allocating these if we are just doing a test. 
524 | int n_groups = 0; 525 | StringPiece* groups = NULL; 526 | if (return_match) { 527 | n_groups = self->re2_obj->NumberOfCapturingGroups() + 1; 528 | groups = new(nothrow) StringPiece[n_groups]; 529 | 530 | if (groups == NULL) { 531 | PyErr_NoMemory(); 532 | return NULL; 533 | } 534 | } 535 | 536 | bool matched = self->re2_obj->Match( 537 | StringPiece(subject, (int)slen), 538 | (int)pos, 539 | (int)endpos, 540 | anchor, 541 | groups, 542 | n_groups); 543 | 544 | if (!return_match) { 545 | if (matched) { 546 | Py_RETURN_TRUE; 547 | } 548 | Py_RETURN_FALSE; 549 | } 550 | 551 | if (!matched) { 552 | delete[] groups; 553 | Py_RETURN_NONE; 554 | } 555 | 556 | return create_match((PyObject*)self, string, pos, endpos, groups); 557 | } 558 | 559 | static PyObject* 560 | regexp_search(RegexpObject2* self, PyObject* args, PyObject* kwds) 561 | { 562 | return _do_search(self, args, kwds, RE2::UNANCHORED, true); 563 | } 564 | 565 | static PyObject* 566 | regexp_match(RegexpObject2* self, PyObject* args, PyObject* kwds) 567 | { 568 | return _do_search(self, args, kwds, RE2::ANCHOR_START, true); 569 | } 570 | 571 | static PyObject* 572 | regexp_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds) 573 | { 574 | return _do_search(self, args, kwds, RE2::ANCHOR_BOTH, true); 575 | } 576 | 577 | static PyObject* 578 | regexp_test_search(RegexpObject2* self, PyObject* args, PyObject* kwds) 579 | { 580 | return _do_search(self, args, kwds, RE2::UNANCHORED, false); 581 | } 582 | 583 | static PyObject* 584 | regexp_test_match(RegexpObject2* self, PyObject* args, PyObject* kwds) 585 | { 586 | return _do_search(self, args, kwds, RE2::ANCHOR_START, false); 587 | } 588 | 589 | static PyObject* 590 | regexp_test_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds) 591 | { 592 | return _do_search(self, args, kwds, RE2::ANCHOR_BOTH, false); 593 | } 594 | 595 | 596 | static void 597 | match_dealloc(MatchObject2* self) 598 | { 599 | Py_DECREF(self->re); 600 | 
Py_DECREF(self->string); 601 | delete[] self->groups; 602 | PyObject_Del(self); 603 | } 604 | 605 | static PyObject* 606 | create_match(PyObject* re, PyObject* string, 607 | long pos, long endpos, 608 | StringPiece* groups) 609 | { 610 | MatchObject2* match = PyObject_New(MatchObject2, &Match_Type2); 611 | if (match == NULL) { 612 | delete[] groups; 613 | return NULL; 614 | } 615 | match->re = NULL; 616 | match->groups = NULL; 617 | match->string = NULL; 618 | 619 | match->groups = groups; 620 | Py_INCREF(re); 621 | match->re = re; 622 | Py_INCREF(string); 623 | match->string = string; 624 | match->pos = pos; 625 | match->endpos = endpos; 626 | 627 | return (PyObject*)match; 628 | } 629 | 630 | /** 631 | * Attempt to convert an untrusted group index (PyObject* group) into 632 | * a trusted one (*idx_p). Return false on failure (exception). 633 | */ 634 | static bool 635 | _group_idx(MatchObject2* self, PyObject* group, long* idx_p) 636 | { 637 | if (group == NULL) { 638 | return false; 639 | } 640 | PyErr_Clear(); // Is this necessary? 641 | long idx = PyLong_AsLong(group); 642 | if (idx == -1 && PyErr_Occurred() != NULL) { 643 | return false; 644 | } 645 | // TODO: Consider caching NumberOfCapturingGroups. 646 | if (idx < 0 || idx > ((RegexpObject2*)self->re)->re2_obj->NumberOfCapturingGroups()) { 647 | PyErr_SetString(PyExc_IndexError, "no such group"); 648 | return false; 649 | } 650 | *idx_p = idx; 651 | return true; 652 | } 653 | 654 | /** 655 | * Extract the start and end indexes of a pre-checked group number. 656 | * Sets both to -1 if it did not participate in the match. 657 | */ 658 | static bool 659 | _group_span(MatchObject2* self, long idx, Py_ssize_t* o_start, Py_ssize_t* o_end) 660 | { 661 | // "idx" is expected to be verified. 
StringPiece& piece = self->groups[idx]; 663 | if (piece.data() == NULL) { 664 | *o_start = -1; 665 | *o_end = -1; 666 | return false; 667 | } 668 | Py_ssize_t start; 669 | #if PY_MAJOR_VERSION >= 3 670 | if (PyBytes_Check(self->string)) { 671 | start = piece.data() - PyBytes_AS_STRING(self->string); 672 | } else { 673 | start = piece.data() - PyUnicode_AsUTF8AndSize(self->string, NULL); 674 | } 675 | #else 676 | start = piece.data() - PyString_AS_STRING(self->string); 677 | #endif 678 | *o_start = start; 679 | *o_end = start + piece.length(); 680 | return true; 681 | } 682 | 683 | /** 684 | * Return a pre-checked group number as a string, or default_obj 685 | * if it didn't participate in the match. 686 | */ 687 | static PyObject* 688 | _group_get_i(MatchObject2* self, long idx, PyObject* default_obj) 689 | { 690 | Py_ssize_t start; 691 | Py_ssize_t end; 692 | if (!_group_span(self, idx, &start, &end)) { 693 | Py_INCREF(default_obj); 694 | return default_obj; 695 | } 696 | return PySequence_GetSlice(self->string, start, end); 697 | } 698 | 699 | /** 700 | * Return an unchecked group number as a string. 701 | */ 702 | static PyObject* 703 | _group_get_o(MatchObject2* self, PyObject* group) 704 | { 705 | long idx; 706 | if (!_group_idx(self, group, &idx)) { 707 | return NULL; 708 | } 709 | return _group_get_i(self, idx, Py_None); 710 | } 711 | 712 | 713 | static PyObject* 714 | match_group(MatchObject2* self, PyObject* args) 715 | { 716 | long idx = 0; 717 | Py_ssize_t nargs = PyTuple_GET_SIZE(args); 718 | switch (nargs) { 719 | case 1: 720 | if (!_group_idx(self, PyTuple_GET_ITEM(args, 0), &idx)) { 721 | return NULL; 722 | } 723 | // Fall through. 
case 0: 725 | return _group_get_i(self, idx, Py_None); 726 | default: 727 | PyObject* ret = PyTuple_New(nargs); 728 | if (ret == NULL) { 729 | return NULL; 730 | } 731 | 732 | for (int i = 0; i < nargs; i++) { 733 | PyObject* group = _group_get_o(self, PyTuple_GET_ITEM(args, i)); 734 | if (group == NULL) { 735 | Py_DECREF(ret); 736 | return NULL; 737 | } 738 | PyTuple_SET_ITEM(ret, i, group); 739 | } 740 | return ret; 741 | } 742 | } 743 | 744 | static PyObject* 745 | match_groups(MatchObject2* self, PyObject* args, PyObject* kwds) 746 | { 747 | static const char* kwlist[] = { 748 | "default", 749 | NULL}; 750 | 751 | PyObject* default_obj = Py_None; 752 | 753 | if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O", (char**)kwlist, 754 | &default_obj)) { 755 | return NULL; 756 | } 757 | 758 | int ngroups = ((RegexpObject2*)self->re)->re2_obj->NumberOfCapturingGroups(); 759 | 760 | PyObject* ret = PyTuple_New(ngroups); 761 | if (ret == NULL) { 762 | return NULL; 763 | } 764 | 765 | for (int i = 1; i <= ngroups; i++) { 766 | PyObject* group = _group_get_i(self, i, default_obj); 767 | if (group == NULL) { 768 | Py_DECREF(ret); 769 | return NULL; 770 | } 771 | PyTuple_SET_ITEM(ret, i-1, group); 772 | } 773 | 774 | return ret; 775 | } 776 | 777 | static PyObject* 778 | match_groupdict(MatchObject2* self, PyObject* args, PyObject* kwds) 779 | { 780 | static const char* kwlist[] = { 781 | "default", 782 | NULL}; 783 | 784 | PyObject* default_obj = Py_None; 785 | 786 | if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O", (char**)kwlist, 787 | &default_obj)) { 788 | return NULL; 789 | } 790 | 791 | PyObject* ret = PyDict_New(); 792 | if (ret == NULL) { 793 | return NULL; 794 | } 795 | 796 | const std::map<std::string, int>& name_map = ((RegexpObject2*)self->re)->re2_obj->NamedCapturingGroups(); 797 | for (std::map<std::string, int>::const_iterator it = name_map.begin(); it != name_map.end(); ++it) { 798 | PyObject* group = _group_get_i(self, it->second, default_obj); 799 | if (group == NULL) { 800 | 
Py_DECREF(ret); 801 | return NULL; 802 | } 803 | // TODO: Group names with embedded zeroes? 804 | int res = PyDict_SetItemString(ret, it->first.data(), group); 805 | Py_DECREF(group); 806 | if (res < 0) { 807 | Py_DECREF(ret); 808 | return NULL; 809 | } 810 | } 811 | 812 | return ret; 813 | } 814 | 815 | enum span_mode_t { START, END, SPAN }; 816 | 817 | static PyObject* 818 | _do_span(MatchObject2* self, PyObject* args, const char* name, span_mode_t mode) 819 | { 820 | long idx = 0; 821 | PyObject* group = NULL; 822 | if (!PyArg_UnpackTuple(args, name, 0, 1, 823 | &group)) { 824 | return NULL; 825 | } 826 | if (group != NULL) { 827 | if (!_group_idx(self, group, &idx)) { 828 | return NULL; 829 | } 830 | } 831 | 832 | Py_ssize_t start = - 1; 833 | Py_ssize_t end = - 1; 834 | 835 | (void)_group_span(self, idx, &start, &end); 836 | switch (mode) { 837 | case START : return Py_BuildValue("n", start ); 838 | case END : return Py_BuildValue("n", end ); 839 | case SPAN: 840 | return Py_BuildValue("nn", start, end); 841 | } 842 | 843 | // Make gcc happy. 
844 | return NULL; 845 | } 846 | 847 | static PyObject* 848 | match_start(MatchObject2* self, PyObject* args) 849 | { 850 | return _do_span(self, args, "start", START); 851 | } 852 | 853 | static PyObject* 854 | match_end(MatchObject2* self, PyObject* args) 855 | { 856 | return _do_span(self, args, "end", END); 857 | } 858 | 859 | static PyObject* 860 | match_span(MatchObject2* self, PyObject* args) 861 | { 862 | return _do_span(self, args, "span", SPAN); 863 | } 864 | 865 | 866 | static void 867 | regexp_set_dealloc(RegexpSetObject2* self) 868 | { 869 | delete self->re2_set_obj; 870 | PyObject_Del(self); 871 | } 872 | 873 | static PyObject* 874 | regexp_set_new(PyTypeObject* type, PyObject* args, PyObject* kwds) 875 | { 876 | int anchoring = RE2::UNANCHORED; 877 | 878 | if (!PyArg_ParseTuple(args, "|I", &anchoring)) { 879 | anchoring = -1; 880 | } 881 | 882 | switch (anchoring) { 883 | case RE2::UNANCHORED: 884 | case RE2::ANCHOR_START: 885 | case RE2::ANCHOR_BOTH: 886 | break; 887 | default: 888 | PyErr_SetString(PyExc_ValueError, 889 | "anchoring must be one of re2.UNANCHORED, re2.ANCHOR_START, or re2.ANCHOR_BOTH"); 890 | return NULL; 891 | } 892 | 893 | RegexpSetObject2* self = (RegexpSetObject2*)type->tp_alloc(type, 0); 894 | 895 | if (self == NULL) { 896 | return NULL; 897 | } 898 | self->compiled = false; 899 | self->re2_set_obj = NULL; 900 | 901 | RE2::Options options; 902 | options.set_log_errors(false); 903 | 904 | self->re2_set_obj = new(nothrow) RE2::Set(options, (RE2::Anchor)anchoring); 905 | 906 | if (self->re2_set_obj == NULL) { 907 | PyErr_NoMemory(); 908 | Py_DECREF(self); 909 | return NULL; 910 | } 911 | 912 | return (PyObject*)self; 913 | } 914 | 915 | static PyObject* 916 | regexp_set_add(RegexpSetObject2* self, PyObject* pattern) 917 | { 918 | if (self->compiled) { 919 | PyErr_SetString(PyExc_RuntimeError, "Can't add() on an already compiled Set"); 920 | return NULL; 921 | } 922 | 923 | Py_ssize_t len_pattern; 924 | #if PY_MAJOR_VERSION >= 3 
925 | const char* raw_pattern = PyUnicode_AsUTF8AndSize(pattern, &len_pattern); 926 | if (!raw_pattern) { 927 | return NULL; 928 | } 929 | #else 930 | const char* raw_pattern = PyString_AsString(pattern); 931 | if (!raw_pattern) { 932 | return NULL; 933 | } 934 | len_pattern = PyString_GET_SIZE(pattern); 935 | #endif 936 | std::string add_error; 937 | int seq = self->re2_set_obj->Add(StringPiece(raw_pattern, (int)len_pattern), &add_error); 938 | 939 | if (seq < 0) { 940 | PyErr_SetString(PyExc_ValueError, add_error.c_str()); 941 | return NULL; 942 | } 943 | 944 | return PyLong_FromLong(seq); 945 | } 946 | 947 | static PyObject* 948 | regexp_set_compile(RegexpSetObject2* self) 949 | { 950 | if (self->compiled) { 951 | Py_RETURN_NONE; 952 | } 953 | 954 | bool compiled = self->re2_set_obj->Compile(); 955 | 956 | if (!compiled) { 957 | PyErr_SetString(PyExc_MemoryError, "Ran out of memory during regexp compile"); 958 | return NULL; 959 | } 960 | 961 | self->compiled = true; 962 | Py_RETURN_NONE; 963 | } 964 | 965 | static PyObject* 966 | regexp_set_match(RegexpSetObject2* self, PyObject* text) 967 | { 968 | if (!self->compiled) { 969 | PyErr_SetString(PyExc_RuntimeError, "Can't match() on an uncompiled Set"); 970 | return NULL; 971 | } 972 | 973 | const char* raw_text; 974 | Py_ssize_t len_text; 975 | #if PY_MAJOR_VERSION >= 3 976 | if (PyUnicode_Check(text)) { 977 | raw_text = PyUnicode_AsUTF8AndSize(text, &len_text); 978 | } else if (PyBytes_Check(text)) { 979 | raw_text = PyBytes_AS_STRING(text); 980 | len_text = PyBytes_GET_SIZE(text); 981 | } else { 982 | PyErr_SetString(PyExc_TypeError, "expected str or bytes"); 983 | return NULL; 984 | } 985 | #else 986 | raw_text = PyString_AsString(text); 987 | if (raw_text == NULL) { 988 | return NULL; 989 | } 990 | len_text = PyString_GET_SIZE(text); 991 | #endif 992 | 993 | std::vector<int> idxes; 994 | bool matched = self->re2_set_obj->Match(StringPiece(raw_text, (int)len_text), &idxes); 995 | 996 | if (matched) { 997 | 
PyObject* match_indexes = PyList_New(idxes.size()); 998 | 999 | for(std::vector<int>::size_type i = 0; i < idxes.size(); ++i) { 1000 | PyList_SET_ITEM(match_indexes, (Py_ssize_t)i, PyLong_FromLong(idxes[i])); 1001 | } 1002 | 1003 | return match_indexes; 1004 | } else { 1005 | return PyList_New(0); 1006 | } 1007 | } 1008 | 1009 | 1010 | static PyObject* 1011 | _compile(PyObject* self, PyObject* args) 1012 | { 1013 | PyObject *pattern; 1014 | PyObject *error_class; 1015 | 1016 | if (!PyArg_ParseTuple(args, "O!O:_compile", 1017 | REGEX_OBJECT_TYPE, &pattern, 1018 | &error_class)) { 1019 | return NULL; 1020 | } 1021 | 1022 | return create_regexp(self, pattern, error_class); 1023 | } 1024 | 1025 | static PyObject* 1026 | escape(PyObject* self, PyObject* args) 1027 | { 1028 | char *str; 1029 | Py_ssize_t len; 1030 | 1031 | if (!PyArg_ParseTuple(args, "s#:escape", &str, &len)) { 1032 | return NULL; 1033 | } 1034 | 1035 | std::string esc(RE2::QuoteMeta(StringPiece(str, (int)len))); 1036 | 1037 | #if PY_MAJOR_VERSION >= 3 1038 | return PyUnicode_FromStringAndSize(esc.c_str(), esc.size()); 1039 | #else 1040 | return PyString_FromStringAndSize(esc.c_str(), esc.size()); 1041 | #endif 1042 | } 1043 | 1044 | static PyMethodDef methods[] = { 1045 | {"_compile", (PyCFunction)_compile, METH_VARARGS | METH_KEYWORDS, NULL}, 1046 | {"escape", (PyCFunction)escape, METH_VARARGS, 1047 | "Escape all potentially meaningful regexp characters."}, 1048 | {NULL} /* Sentinel */ 1049 | }; 1050 | 1051 | 1052 | #if PY_MAJOR_VERSION >= 3 1053 | static struct PyModuleDef moduledef = { 1054 | PyModuleDef_HEAD_INIT, 1055 | "_re2", 1056 | NULL, 1057 | 0, 1058 | methods, 1059 | NULL, 1060 | NULL, // myextension_traverse, 1061 | NULL, // myextension_clear, 1062 | NULL 1063 | }; 1064 | 1065 | #define INITERROR return NULL 1066 | #else 1067 | #define INITERROR return 1068 | #endif 1069 | 1070 | 1071 | PyMODINIT_FUNC 1072 | #if PY_MAJOR_VERSION >= 3 1073 | PyInit__re2(void) 1074 | #else 1075 | init_re2(void) 
1076 | #endif 1077 | { 1078 | if (PyType_Ready(&Regexp_Type2) < 0) { 1079 | INITERROR; 1080 | } 1081 | 1082 | if (PyType_Ready(&Match_Type2) < 0) { 1083 | INITERROR; 1084 | } 1085 | 1086 | if (PyType_Ready(&RegexpSet_Type2) < 0) { 1087 | INITERROR; 1088 | } 1089 | 1090 | #if PY_MAJOR_VERSION >= 3 1091 | PyObject* mod = PyModule_Create(&moduledef); 1092 | #else 1093 | PyObject* mod = Py_InitModule("_re2", methods); 1094 | #endif 1095 | 1096 | Py_INCREF(&RegexpSet_Type2); 1097 | PyModule_AddObject(mod, "Set", (PyObject*)&RegexpSet_Type2); 1098 | 1099 | PyModule_AddIntConstant(mod, "UNANCHORED", RE2::UNANCHORED); 1100 | PyModule_AddIntConstant(mod, "ANCHOR_START", RE2::ANCHOR_START); 1101 | PyModule_AddIntConstant(mod, "ANCHOR_BOTH", RE2::ANCHOR_BOTH); 1102 | #if PY_MAJOR_VERSION >= 3 1103 | return mod; 1104 | #endif 1105 | } 1106 | -------------------------------------------------------------------------------- /re2.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # 3 | # Redistribution and use in source and binary forms, with or without 4 | # modification, are permitted provided that the following conditions 5 | # are met: 6 | # * Redistributions of source code must retain the above copyright 7 | # notice, this list of conditions and the following disclaimer. 8 | # * Redistributions in binary form must reproduce the above copyright 9 | # notice, this list of conditions and the following disclaimer in the 10 | # documentation and/or other materials provided with the distribution. 11 | # * Neither the name of Facebook nor the names of its contributors 12 | # may be used to endorse or promote products derived from this software 13 | # without specific prior written permission. 
14 | # 15 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 16 | # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 17 | # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A 18 | # PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT 19 | # HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 20 | # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 21 | # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 22 | # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 23 | # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 24 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 25 | # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 26 | 27 | import _re2 28 | import sre_constants 29 | 30 | __all__ = [ 31 | "error", 32 | "escape", 33 | "compile", 34 | "search", 35 | "match", 36 | "fullmatch", 37 | "Set", 38 | "UNANCHORED", 39 | "ANCHOR_START", 40 | "ANCHOR_BOTH", 41 | ] 42 | 43 | # Module-private compilation function, for future caching, other enhancements 44 | _compile = _re2._compile 45 | 46 | error = sre_constants.error 47 | escape = _re2.escape 48 | Set = _re2.Set 49 | UNANCHORED = _re2.UNANCHORED 50 | ANCHOR_START = _re2.ANCHOR_START 51 | ANCHOR_BOTH = _re2.ANCHOR_BOTH 52 | 53 | 54 | def compile(pattern): 55 | "Compile a regular expression pattern, returning a pattern object." 
56 | return _compile(pattern, error) 57 | 58 | def search(pattern, string): 59 | """Scan through string looking for a match to the pattern, returning 60 | a match object, or None if no match was found.""" 61 | return _compile(pattern, error).search(string) 62 | 63 | def match(pattern, string): 64 | """Try to apply the pattern at the start of the string, returning 65 | a match object, or None if no match was found.""" 66 | return _compile(pattern, error).match(string) 67 | 68 | def fullmatch(pattern, string): 69 | """Try to apply the pattern to the entire string, returning 70 | a match object, or None if no match was found.""" 71 | return _compile(pattern, error).fullmatch(string) 72 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # Copyright (c) Facebook, Inc. and its affiliates. 3 | 4 | from setuptools import setup, Extension 5 | 6 | setup( 7 | name="fb-re2", 8 | version="1.0.7", 9 | url="https://github.com/facebook/pyre2", 10 | description="Python wrapper for Google's RE2", 11 | classifiers=[ 12 | "Intended Audience :: Developers", 13 | "License :: OSI Approved :: BSD License", 14 | "Development Status :: 5 - Production/Stable", 15 | ], 16 | author="David Reiss", 17 | author_email="dreiss@fb.com", 18 | maintainer="Siddharth Agarwal", 19 | maintainer_email="sid0@fb.com", 20 | py_modules = ["re2"], 21 | test_suite = "tests", 22 | ext_modules = [Extension("_re2", 23 | sources = ["_re2.cc"], 24 | libraries = ["re2"], 25 | extra_compile_args=['-std=c++11'], 26 | )], 27 | ) 28 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/facebook/pyre2/053612cd79dab923444454d0035835422e99a632/tests/__init__.py 
-------------------------------------------------------------------------------- /tests/test_compile.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | import unittest 3 | import re2 4 | 5 | class TestCompile(unittest.TestCase): 6 | def test_raise(self): 7 | with self.assertRaisesRegexp( 8 | re2.error, 9 | 'no argument for repetition operator: \\*' 10 | ): 11 | re2.compile('*') 12 | -------------------------------------------------------------------------------- /tests/test_match.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | import unittest 3 | import re2 4 | 5 | class TestMatch(unittest.TestCase): 6 | def test_const_match(self): 7 | m = re2.match('abc', 'abc') 8 | self.assertIsNotNone(m) 9 | self.assertEqual(m.start(), 0) 10 | self.assertEqual(m.end(), 3) 11 | self.assertEqual(m.span(), (0, 3)) 12 | self.assertEqual(m.groups(), tuple()) 13 | self.assertEqual(m.groupdict(), {}) 14 | self.assertEqual(m.pos, 0) 15 | self.assertEqual(m.endpos, 3) 16 | self.assertEqual(m.string, 'abc') 17 | self.assertIsNotNone(m.re) 18 | self.assertEqual(m.re.pattern, 'abc') 19 | self.assertEqual(m.re.groups, 0) 20 | self.assertEqual(m.re.groupindex, {}) 21 | 22 | def test_group_match(self): 23 | m = re2.match('ab([cde]fg)', 'abdfghij') 24 | self.assertIsNotNone(m) 25 | self.assertEqual(m.start(), 0) 26 | self.assertEqual(m.end(), 5) 27 | self.assertEqual(m.span(), (0, 5)) 28 | self.assertEqual(m.groups(), ('dfg',)) 29 | self.assertEqual(m.groupdict(), {}) 30 | self.assertEqual(m.pos, 0) 31 | self.assertEqual(m.endpos, 8) 32 | self.assertEqual(m.string, 'abdfghij') 33 | self.assertIsNotNone(m.re) 34 | self.assertEqual(m.re.pattern, 'ab([cde]fg)') 35 | self.assertEqual(m.re.groups, 1) 36 | self.assertEqual(m.re.groupindex, {}) 37 | 38 | def test_named_group_match(self): 39 | m = 
re2.match('ab(?P<testgroup>[cde]fg)', 'abdfghij') 40 | self.assertIsNotNone(m) 41 | self.assertEqual(m.start(), 0) 42 | self.assertEqual(m.end(), 5) 43 | self.assertEqual(m.span(), (0, 5)) 44 | self.assertEqual(m.groups(), ('dfg',)) 45 | self.assertEqual(m.groupdict(), {'testgroup': 'dfg'}) 46 | self.assertEqual(m.pos, 0) 47 | self.assertEqual(m.endpos, 8) 48 | self.assertEqual(m.string, 'abdfghij') 49 | self.assertIsNotNone(m.re) 50 | self.assertEqual(m.re.pattern, 'ab(?P<testgroup>[cde]fg)') 51 | self.assertEqual(m.re.groups, 1) 52 | self.assertEqual(m.re.groupindex, {'testgroup': 1}) 53 | 54 | def test_compiled_match(self): 55 | r = re2.compile('ab([cde]fg)') 56 | m = r.match('abdfghij') 57 | self.assertIsNotNone(m) 58 | self.assertEqual(m.start(), 0) 59 | self.assertEqual(m.end(), 5) 60 | self.assertEqual(m.span(), (0, 5)) 61 | self.assertEqual(m.groups(), ('dfg',)) 62 | self.assertEqual(m.groupdict(), {}) 63 | self.assertEqual(m.pos, 0) 64 | self.assertEqual(m.endpos, 8) 65 | self.assertEqual(m.string, 'abdfghij') 66 | self.assertIsNotNone(m.re) 67 | self.assertEqual(m.re.pattern, 'ab([cde]fg)') 68 | self.assertEqual(m.re.groups, 1) 69 | self.assertEqual(m.re.groupindex, {}) 70 | 71 | def test_match_raise(self): 72 | '''test that using the API incorrectly fails''' 73 | r = re2.compile('ab([cde]fg)') 74 | self.assertRaises(TypeError, lambda: re2.match(r, 'abdfghij')) 75 | 76 | def test_match_bytes(self): 77 | ''' test that we can match things in the bytes type ''' 78 | r = re2.compile('(\\x09)') 79 | m = r.match(b'\x09') 80 | self.assertIsNotNone(m) 81 | g = m.groups() 82 | self.assertTrue(isinstance(g, tuple)) 83 | self.assertTrue(isinstance(g[0], bytes)) 84 | self.assertEqual(b'\x09', g[0]) 85 | self.assertEqual(m.pos, 0) 86 | self.assertEqual(m.endpos, 1) 87 | self.assertEqual(m.string, b'\x09') 88 | self.assertIsNotNone(m.re) 89 | self.assertEqual(m.re.pattern, '(\\x09)') 90 | self.assertEqual(m.re.groups, 1) 91 | self.assertEqual(m.re.groupindex, {}) 92 | 93 | def 
test_match_str(self): 94 | ''' test that we can match binary things in the str type ''' 95 | r = re2.compile('(\\x09)') 96 | m = r.match('\x09') 97 | self.assertIsNotNone(m) 98 | g = m.groups() 99 | self.assertTrue(isinstance(g, tuple)) 100 | self.assertTrue(isinstance(g[0], str)) 101 | self.assertEqual('\x09', g[0]) 102 | 103 | def test_match_bad_utf8_bytes(self): 104 | ''' Validate that we just return None on invalid utf-8 ''' 105 | r = re2.compile('\\x80') 106 | m = r.match(b'\x80') 107 | self.assertIsNone(m) 108 | 109 | def test_invalid_pattern(self): 110 | ''' Verify that bad patterns raise an exception ''' 111 | self.assertRaises(Exception, lambda: re2.compile(')')) 112 | 113 | def test_span_type(self): 114 | ''' verify that start/end return the native literal integer type ''' 115 | r = re2.compile('abc') 116 | m = r.match('abc') 117 | self.assertTrue(isinstance(m.start(), type(1))) 118 | self.assertTrue(isinstance(m.end(), type(1))) 119 | 120 | def test_set_unanchored(self): 121 | s = re2.Set(re2.UNANCHORED) 122 | s_with_default_anchoring = re2.Set() 123 | 124 | for re2_set in [s, s_with_default_anchoring]: 125 | re2_set.add('foo') 126 | re2_set.add('bar') 127 | re2_set.add('baz') 128 | re2_set.compile() 129 | 130 | self.assertEqual(re2_set.match('foo'), [0]) 131 | self.assertEqual(re2_set.match('bar'), [1]) 132 | self.assertEqual(re2_set.match('baz'), [2]) 133 | self.assertEqual(re2_set.match('afoobaryo'), [0, 1]) 134 | 135 | self.assertEqual(re2_set.match('ooba'), []) 136 | 137 | def test_set_anchor_both(self): 138 | s = re2.Set(re2.ANCHOR_BOTH) 139 | self.assertEqual(s.add('foo'), 0) 140 | self.assertEqual(s.add('bar'), 1) 141 | s.compile() 142 | 143 | self.assertEqual(s.match('foobar'), []) 144 | self.assertEqual(s.match('fooba'), []) 145 | self.assertEqual(s.match('oobar'), []) 146 | self.assertEqual(s.match('foo'), [0]) 147 | self.assertEqual(s.match('bar'), [1]) 148 | 149 | def test_set_anchor_start(self): 150 | s = re2.Set(re2.ANCHOR_START) 151 | 
self.assertEqual(s.add('foo'), 0) 152 | self.assertEqual(s.add('bar'), 1) 153 | s.compile() 154 | 155 | self.assertEqual(s.match('foobar'), [0]) 156 | self.assertEqual(s.match('oobar'), []) 157 | self.assertEqual(s.match('foo'), [0]) 158 | self.assertEqual(s.match('ofoobaro'), []) 159 | self.assertEqual(s.match('baro'), [1]) 160 | 161 | def test_bad_anchoring(self): 162 | with self.assertRaisesRegexp(ValueError, 'anchoring must be one of.*'): 163 | re2.Set(None) 164 | 165 | with self.assertRaisesRegexp(ValueError, 'anchoring must be one of.*'): 166 | re2.Set(15) 167 | 168 | with self.assertRaisesRegexp(ValueError, 'anchoring must be one of.*'): 169 | re2.Set({}) 170 | 171 | def test_match_without_compile(self): 172 | s = re2.Set() 173 | s.add('foo') 174 | 175 | with self.assertRaisesRegexp(RuntimeError, 'Can\'t match\(\) on an.*'): 176 | s.match('bar') 177 | 178 | def test_add_after_compile(self): 179 | s = re2.Set() 180 | s.add('foo') 181 | s.compile() 182 | 183 | with self.assertRaisesRegexp(RuntimeError, 184 | 'Can\'t add\(\) on an already compiled Set'): 185 | s.add('bar') 186 | 187 | # Segfaults on Debian Jessie 188 | #def test_compile_on_empty(self): 189 | # s = re2.Set() 190 | # s.compile() 191 | # self.assertEqual(s.match('nah'), []) 192 | 193 | def test_double_compile(self): 194 | s = re2.Set() 195 | s.add('foo') 196 | s.compile() 197 | s.compile() 198 | 199 | def test_add_with_bad_pattern(self): 200 | s = re2.Set() 201 | 202 | with self.assertRaisesRegexp(ValueError, 'missing \)'): 203 | s.add('(') 204 | 205 | with self.assertRaises(TypeError): 206 | s.add(3) 207 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | [tox] 2 | envlist = py27, py36, py37 3 | 4 | 5 | [testenv] 6 | commands = python setup.py test 7 | --------------------------------------------------------------------------------