├── .gitignore
├── .travis.yml
├── CHANGELOG.rst
├── LICENSE
├── MANIFEST.in
├── README.rst
├── dev-requirements.txt
├── hypothesis_regex.py
├── setup.cfg
├── setup.py
├── tests
│   └── test_regex.py
└── tox.ini
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | *.pyo
3 | /build/
4 | /dist/
5 | /.tox/
6 | /.eggs/
7 | /.cache/
8 | /.hypothesis/
9 | /hypothesis_regex.egg-info/
10 |
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
1 | # Config file for automatic testing at travis-ci.org
2 |
3 | language: python
4 | # http://blog.travis-ci.com/2014-12-17-faster-builds-with-container-based-infrastructure/
5 | sudo: false
6 | python:
7 | - "3.6"
8 | - "3.5"
9 | - "3.4"
10 | - "3.3"
11 | - "2.7"
12 | - "pypy"
13 |
14 | before_install:
15 | - pip install -U pip
16 |
17 | install:
18 | - pip install -U .[reco]
19 | - pip install -U -r dev-requirements.txt
20 |
21 | script: python setup.py test
22 |
--------------------------------------------------------------------------------
/CHANGELOG.rst:
--------------------------------------------------------------------------------
1 | Changelog
2 | ---------
3 |
4 | 0.1 (2017-05-15)
5 | ++++++++++++++++
6 |
7 | * Initial release.
8 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2017 Maxim Kulkin
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include *.rst LICENSE
2 | recursive-include tests *
3 | recursive-exclude tests *.pyc
4 | recursive-exclude tests *.pyo
5 |
--------------------------------------------------------------------------------
/README.rst:
--------------------------------------------------------------------------------
1 | ****************
2 | hypothesis-regex
3 | ****************
4 |
5 | .. image:: https://img.shields.io/pypi/l/hypothesis-regex.svg
6 | :target: https://github.com/maximkulkin/hypothesis-regex/blob/master/LICENSE
7 | :alt: License: MIT
8 |
9 | .. image:: https://img.shields.io/travis/maximkulkin/hypothesis-regex.svg
10 | :target: https://travis-ci.org/maximkulkin/hypothesis-regex
11 | :alt: Build Status
12 |
13 | .. image:: https://img.shields.io/pypi/v/hypothesis-regex.svg
14 | :target: https://pypi.python.org/pypi/hypothesis-regex
15 | :alt: PyPI
16 |
17 | `Hypothesis <https://hypothesis.readthedocs.io>`_ extension
18 | for generating strings that match a given regex. Useful when you have a schema
19 | (e.g. JSON Schema) that already validates data with regular expressions.
20 |
21 | Example
22 | =======
23 |
24 | .. code:: python
25 |
26 | from hypothesis import given
27 | from hypothesis_regex import regex
28 | import requests
29 |
30 | EMAIL_REGEX = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]{2,}\.[a-zA-Z0-9-.]{2,}$"
31 |
32 | @given(regex(EMAIL_REGEX))
33 | def test_registering_user(email):
34 |     response = requests.post('/signup', json={'email': email})
35 |     assert response.status_code == 201
36 |
37 | Features
38 | ========
39 |
40 | The regex strategy returns strings that always match the given regex (this is
41 | enforced by a final filter) and tries to generate matching strings directly, so
42 | that fewer generated examples are filtered out. However, some regex constructs
43 | may decrease strategy efficiency and should be used with caution:
44 |
45 | * "^" and "$" in the middle of a string do not do anything.
46 | * "\\b" and "\\B" (word boundary and non-word-boundary) do not do anything and
47 |   instead rely on the top-level regex match filter to filter out non-matching
48 |   examples.
49 | * positive lookaheads and lookbehinds simply generate the data they should match
50 |   (as if they were part of the preceding/following parts).
51 | * negative lookaheads and lookbehinds do not do anything, so they rely on the
52 |   preceding/following parts to generate correct strings (otherwise the example
53 |   will be filtered out).
54 | * "(?(id)yes-pattern|no-pattern)" does not actually check whether the group with
55 |   the given id was used and instead just generates either the yes- or no-pattern.
56 |
57 | The regex strategy deliberately generates unusual data (e.g. "$" at the end of a
58 | string either generates nothing or generates a newline). The idea is not to
59 | produce nice-looking strings but rather any crazy, unexpected combination that
60 | still matches your given regex, so that you can prepare for such inputs and
61 | handle them appropriately.
62 |
63 | You can use regex flags to get more control over the strategy:
64 |
65 | * re.IGNORECASE - literals and literal ranges generate both lowercase and uppercase
66 |   letters. E.g. `r'a'` will generate both `"a"` and `"A"`, and `'[a-z]'` will
67 |   generate both lowercase and uppercase English characters.
68 | * re.DOTALL - the "." char will be able to generate newlines.
69 | * re.UNICODE - character categories ("\\w", "\\d" or "\\s" and their negations)
70 |   will generate unicode characters. This is the default on Python 3; see
71 |   re.ASCII to reverse it.
72 |
73 | There are two ways to pass regex flags:
74 |
75 | 1. By passing a compiled regex with those flags: `regex(re.compile('abc', re.IGNORECASE))`
76 | 2. By using inline flags syntax: `regex('(?i)abc')`
77 |
78 | Installation
79 | ============
80 | ::
81 |
82 | $ pip install hypothesis-regex
83 |
84 | Requirements
85 | ============
86 |
87 | - Python 2.7 or 3.3 to 3.6
88 | - `hypothesis <https://pypi.python.org/pypi/hypothesis>`__ >= 3.8
89 |
90 | Project Links
91 | =============
92 |
93 | - PyPI: https://pypi.python.org/pypi/hypothesis-regex
94 | - Issues: https://github.com/maximkulkin/hypothesis-regex/issues
95 |
96 | License
97 | =======
98 |
99 | MIT licensed. See the bundled `LICENSE <https://github.com/maximkulkin/hypothesis-regex/blob/master/LICENSE>`_ file for more details.
100 |
--------------------------------------------------------------------------------
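The README's note above, that a trailing "$" may generate a newline, mirrors standard `re` semantics; a quick stdlib sketch of that behavior:

```python
import re

# '$' matches at the end of the string, and also just before a single
# trailing newline -- so a strategy for 'abc$' may legitimately
# generate 'abc\n' as well as 'abc'.
assert re.match('abc$', 'abc')
assert re.match('abc$', 'abc\n')
assert re.match('abc$', 'abc\n\n') is None  # only a single trailing newline counts
```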
/dev-requirements.txt:
--------------------------------------------------------------------------------
1 | pytest>=2.9
2 | tox>=1.5
3 |
--------------------------------------------------------------------------------
/hypothesis_regex.py:
--------------------------------------------------------------------------------
1 | from collections import namedtuple
2 | import re
3 | import six
4 | import six.moves
5 | import string
6 | import sre_parse as sre
7 | import sys
8 | import hypothesis.errors as he
9 | import hypothesis.strategies as hs
10 |
11 | __all__ = ['regex']
12 |
13 |
14 | HAS_SUBPATTERN_FLAGS = sys.version_info[:2] >= (3, 6)
15 |
16 |
17 | UNICODE_CATEGORIES = set([
18 | 'Cf', 'Cn', 'Co', 'LC', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu',
19 | 'Mc', 'Me', 'Mn', 'Nd', 'Nl', 'No', 'Pc', 'Pd', 'Pe',
20 | 'Pf', 'Pi', 'Po', 'Ps', 'Sc', 'Sk', 'Sm', 'So', 'Zl',
21 | 'Zp', 'Zs',
22 | ])
23 |
24 |
25 | SPACE_CHARS = u' \t\n\r\f\v'
26 | UNICODE_SPACE_CHARS = SPACE_CHARS + u'\x1c\x1d\x1e\x1f\x85'
27 | UNICODE_DIGIT_CATEGORIES = set(['Nd'])
28 | UNICODE_SPACE_CATEGORIES = set(['Zs', 'Zl', 'Zp'])
29 | UNICODE_LETTER_CATEGORIES = set(['LC', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu'])
30 | UNICODE_WORD_CATEGORIES = UNICODE_LETTER_CATEGORIES | set(['Nd', 'Nl', 'No'])
31 |
32 | HAS_WEIRD_WORD_CHARS = (2, 7) <= sys.version_info[:2] < (3, 4)
33 | UNICODE_WEIRD_NONWORD_CHARS = u'\U00012432\U00012433\U00012456\U00012457'
34 |
35 |
36 | class Context(object):
37 | __slots__ = ['groups', 'flags']
38 |
39 | def __init__(self, groups=None, flags=0):
40 | self.groups = groups or {}
41 | self.flags = flags
42 |
43 |
44 | class CharactersBuilder(object):
45 | '''
46 | Helper object for configuring a `characters()` strategy with various
47 | unicode categories and characters. Also supports negating the configured set.
48 |
49 | :param negate: If True, configure `characters()` to match anything other than
50 | configured character set
51 | :param flags: Regex flags. They affect how and which characters are matched
52 | '''
53 | def __init__(self, negate=False, flags=0):
54 | self._categories = set()
55 | self._whitelist_chars = set()
56 | self._blacklist_chars = set()
57 | self._negate = negate
58 | self._ignorecase = flags & re.IGNORECASE
59 | self._unicode = (not flags & re.ASCII) \
60 | if six.PY3 else bool(flags & re.UNICODE)
61 |
62 | @property
63 | def strategy(self):
64 | 'Returns resulting strategy that generates configured char set'
65 | max_codepoint = None if self._unicode else 127
66 |
67 | strategies = []
68 | if self._negate:
69 | if self._categories or self._whitelist_chars:
70 | strategies.append(
71 | hs.characters(
72 | blacklist_categories=self._categories | set(['Cc', 'Cs']),
73 | blacklist_characters=self._whitelist_chars,
74 | max_codepoint=max_codepoint,
75 | )
76 | )
77 | if self._blacklist_chars:
78 | strategies.append(
79 | hs.sampled_from(
80 | list(self._blacklist_chars - self._whitelist_chars)
81 | )
82 | )
83 | else:
84 | if self._categories or self._blacklist_chars:
85 | strategies.append(
86 | hs.characters(
87 | whitelist_categories=self._categories,
88 | blacklist_characters=self._blacklist_chars,
89 | max_codepoint=max_codepoint,
90 | )
91 | )
92 | if self._whitelist_chars:
93 | strategies.append(
94 | hs.sampled_from(
95 | list(self._whitelist_chars - self._blacklist_chars)
96 | )
97 | )
98 |
99 | return hs.one_of(*strategies) if strategies else hs.just(u'')
100 |
101 | def add_category(self, category):
102 | '''
103 | Add unicode category to set
104 |
105 | Unicode categories are strings like 'Ll', 'Lu', 'Nd', etc.
106 | See `unicodedata.category()`
107 | '''
108 | if category == sre.CATEGORY_DIGIT:
109 | self._categories |= UNICODE_DIGIT_CATEGORIES
110 | elif category == sre.CATEGORY_NOT_DIGIT:
111 | self._categories |= UNICODE_CATEGORIES - UNICODE_DIGIT_CATEGORIES
112 | elif category == sre.CATEGORY_SPACE:
113 | self._categories |= UNICODE_SPACE_CATEGORIES
114 | for c in (UNICODE_SPACE_CHARS if self._unicode else SPACE_CHARS):
115 | self._whitelist_chars.add(c)
116 | elif category == sre.CATEGORY_NOT_SPACE:
117 | self._categories |= UNICODE_CATEGORIES - UNICODE_SPACE_CATEGORIES
118 | for c in (UNICODE_SPACE_CHARS if self._unicode else SPACE_CHARS):
119 | self._blacklist_chars.add(c)
120 | elif category == sre.CATEGORY_WORD:
121 | self._categories |= UNICODE_WORD_CATEGORIES
122 | self._whitelist_chars.add(u'_')
123 | if HAS_WEIRD_WORD_CHARS and self._unicode:
124 | for c in UNICODE_WEIRD_NONWORD_CHARS:
125 | self._blacklist_chars.add(c)
126 | elif category == sre.CATEGORY_NOT_WORD:
127 | self._categories |= UNICODE_CATEGORIES - UNICODE_WORD_CATEGORIES
128 | self._blacklist_chars.add(u'_')
129 | if HAS_WEIRD_WORD_CHARS and self._unicode:
130 | for c in UNICODE_WEIRD_NONWORD_CHARS:
131 | self._whitelist_chars.add(c)
132 |
133 | def add_chars(self, chars):
134 | 'Add given chars to char set'
135 | for c in chars:
136 | if self._ignorecase:
137 | self._whitelist_chars.add(c.lower())
138 | self._whitelist_chars.add(c.upper())
139 | else:
140 | self._whitelist_chars.add(c)
141 |
142 |
143 | @hs.defines_strategy
144 | def regex(regex):
145 | """Return strategy that generates strings that match given regex.
146 |
147 | Regex can be either a string or compiled regex (through `re.compile()`).
148 |
149 | You can use regex flags (such as `re.IGNORECASE`, `re.DOTALL` or `re.UNICODE`)
150 | to control generation. Flags can be passed either in compiled regex (specify
151 | flags in call to `re.compile()`) or inside pattern with (?iLmsux) group.
152 |
153 | Some tricky regular expressions are partly supported or not supported at all.
154 | "^" and "$" do not affect generation. Positive lookahead/lookbehind groups
155 | are generated as normal groups. Negative lookahead/lookbehind groups do not
156 | generate anything. Conditional groups ('(?(id)yes-pattern|no-pattern)') do not
157 | check whether the referenced group matched and simply generate either pattern.
158 | """
159 | if not hasattr(regex, 'pattern'):
160 | regex = re.compile(regex)
161 |
162 | pattern = regex.pattern
163 | flags = regex.flags
164 |
165 | codes = sre.parse(pattern)
166 |
167 | return _strategy(codes, Context(flags=flags)).filter(regex.match)
168 |
169 |
170 | def _strategy(codes, context):
171 | """
172 | Convert SRE regex parse tree to strategy that generates strings matching that
173 | regex represented by that parse tree.
174 |
175 | `codes` is either a list of SRE regex elements representations or a particular
176 | element representation. Each element is a tuple of element code (as string) and
177 | parameters. E.g. regex 'ab[0-9]+' compiles to following elements:
178 |
179 | [
180 | ('literal', 97),
181 | ('literal', 98),
182 | ('max_repeat', (1, 4294967295, [
183 | ('in', [
184 | ('range', (48, 57))
185 | ])
186 | ]))
187 | ]
188 |
189 | The function recursively traverses the regex element tree and converts each
190 | element to a strategy that generates strings matching that element.
191 |
192 | Context stores:
193 | 1. List of groups (for backreferences)
194 | 2. Active regex flags (e.g. IGNORECASE, DOTALL, UNICODE; they affect the
195 |    behavior of various inner strategies)
196 | """
197 | if not isinstance(codes, tuple):
198 | # List of codes
199 | strategies = []
200 |
201 | i = 0
202 | while i < len(codes):
203 | if codes[i][0] == sre.LITERAL and not (context.flags & re.IGNORECASE):
204 | # Merge subsequent "literals" into one `just()` strategy
205 | # that generates corresponding text if no IGNORECASE
206 | j = i + 1
207 | while j < len(codes) and codes[j][0] == sre.LITERAL:
208 | j += 1
209 |
210 | strategies.append(hs.just(
211 | u''.join([six.unichr(charcode) for (_, charcode) in codes[i:j]])
212 | ))
213 |
214 | i = j
215 | else:
216 | strategies.append(_strategy(codes[i], context))
217 | i += 1
218 |
219 | return hs.tuples(*strategies).map(u''.join)
220 | else:
221 | # Single code
222 | code, value = codes
223 | if code == sre.LITERAL:
224 | # Regex 'a' (single char)
225 | c = six.unichr(value)
226 | if context.flags & re.IGNORECASE:
227 | return hs.sampled_from([c.lower(), c.upper()])
228 | else:
229 | return hs.just(c)
230 |
231 | elif code == sre.NOT_LITERAL:
232 | # Regex '[^a]' (negation of a single char)
233 | c = six.unichr(value)
234 | blacklist = set([c.lower(), c.upper()]) \
235 | if context.flags & re.IGNORECASE else [c]
236 | return hs.characters(blacklist_characters=blacklist)
237 |
238 | elif code == sre.IN:
239 | # Regex '[abc0-9]' (set of characters)
240 | charsets = value
241 |
242 | builder = CharactersBuilder(negate=charsets[0][0] == sre.NEGATE,
243 | flags=context.flags)
244 |
245 | for charset_code, charset_value in charsets:
246 | if charset_code == sre.NEGATE:
247 | # Regex '[^...]' (negation)
248 | pass
249 | elif charset_code == sre.LITERAL:
250 | # Regex '[a]' (single char)
251 | builder.add_chars(six.unichr(charset_value))
252 | elif charset_code == sre.RANGE:
253 | # Regex '[a-z]' (char range)
254 | low, high = charset_value
255 | for x in six.moves.range(low, high+1):
256 | builder.add_chars(six.unichr(x))
257 | elif charset_code == sre.CATEGORY:
258 | # Regex '[\w]' (char category)
259 | builder.add_category(charset_value)
260 | else:
261 | raise he.InvalidArgument(
262 | 'Unknown charset code: %s' % charset_code
263 | )
264 |
265 | return builder.strategy
266 |
267 | elif code == sre.ANY:
268 | # Regex '.' (any char)
269 | if context.flags & re.DOTALL:
270 | return hs.characters()
271 | else:
272 | return hs.characters(blacklist_characters="\n")
273 |
274 | elif code == sre.AT:
275 | # Regexes like '^...', '...$', '\bfoo', '\Bfoo'
276 | if value == sre.AT_END:
277 | return hs.one_of(hs.just(u''), hs.just(u'\n'))
278 | return hs.just(u'')
279 |
280 | elif code == sre.SUBPATTERN:
281 | # Various groups: '(...)', '(?:...)' or '(?P<name>...)'
282 | old_flags = context.flags
283 | if HAS_SUBPATTERN_FLAGS:
284 | context.flags = (context.flags | value[1]) & ~value[2]
285 |
286 | strat = _strategy(value[-1], context)
287 |
288 | context.flags = old_flags
289 |
290 | if value[0]:
291 | context.groups[value[0]] = strat
292 | strat = hs.shared(strat, key=value[0])
293 |
294 | return strat
295 |
296 | elif code == sre.GROUPREF:
297 | # Regex '\\1' or '(?P=name)' (group reference)
298 | return hs.shared(context.groups[value], key=value)
299 |
300 | elif code == sre.ASSERT:
301 | # Regex '(?=...)' or '(?<=...)' (positive lookahead/lookbehind)
302 | return _strategy(value[1], context)
303 |
304 | elif code == sre.ASSERT_NOT:
305 | # Regex '(?!...)' or '(?<!...)' (negative lookahead/lookbehind)

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
23 | 'hypothesis>=3.8',
24 | 'six>=1.10',
25 | ],
26 | setup_requires=['pytest-runner'],
27 | tests_require=['pytest'],
28 | classifiers=[
29 | 'Development Status :: 4 - Beta',
30 | 'Intended Audience :: Developers',
31 | 'License :: OSI Approved :: MIT License',
32 | 'Programming Language :: Python :: 2',
33 | 'Programming Language :: Python :: 2.7',
34 | 'Programming Language :: Python :: 3',
35 | 'Programming Language :: Python :: 3.3',
36 | 'Programming Language :: Python :: 3.4',
37 | 'Programming Language :: Python :: 3.5',
38 | 'Programming Language :: Python :: 3.6',
39 | 'Programming Language :: Python :: Implementation :: CPython',
40 | 'Programming Language :: Python :: Implementation :: PyPy',
41 | ],
42 | )
43 |
--------------------------------------------------------------------------------
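The Unicode category sets defined in hypothesis_regex.py (UNICODE_DIGIT_CATEGORIES, UNICODE_SPACE_CATEGORIES, UNICODE_WORD_CATEGORIES) are built on `unicodedata.category()` values; a small illustration of that mapping:

```python
import unicodedata

# \d maps to category 'Nd' (decimal digit), \s to the Z* separator
# categories (plus a few literal control characters), and \w to the
# letter categories plus Nd/Nl/No and the underscore.
assert unicodedata.category(u'5') == 'Nd'       # ASCII digit
assert unicodedata.category(u'\u0661') == 'Nd'  # ARABIC-INDIC DIGIT ONE
assert unicodedata.category(u' ') == 'Zs'       # space separator
assert unicodedata.category(u'a') == 'Ll'       # lowercase letter
assert unicodedata.category(u'_') == 'Pc'       # why '_' is whitelisted for \w
```

This is why CharactersBuilder whitelists `u'_'` explicitly for CATEGORY_WORD: the underscore is connector punctuation ('Pc'), not a letter or digit.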
/tests/test_regex.py:
--------------------------------------------------------------------------------
1 | import hypothesis as h
2 | import hypothesis.errors as he
3 |
4 | from hypothesis_regex import regex, UNICODE_CATEGORIES, UNICODE_DIGIT_CATEGORIES, \
5 | UNICODE_SPACE_CATEGORIES, UNICODE_WORD_CATEGORIES, UNICODE_WEIRD_NONWORD_CHARS, \
6 | SPACE_CHARS, UNICODE_SPACE_CHARS, HAS_WEIRD_WORD_CHARS
7 | import pytest
8 | import re
9 | import six
10 | import six.moves
11 | import sys
12 | import unicodedata
13 |
14 |
15 | def is_ascii(s):
16 | return all(ord(c) < 128 for c in s)
17 |
18 |
19 | def is_digit(s):
20 | return all(unicodedata.category(c) in UNICODE_DIGIT_CATEGORIES for c in s)
21 |
22 |
23 | def is_space(s):
24 | return all(c in SPACE_CHARS for c in s)
25 |
26 |
27 | def is_unicode_space(s):
28 | return all(
29 | unicodedata.category(c) in UNICODE_SPACE_CATEGORIES or \
30 | c in UNICODE_SPACE_CHARS
31 | for c in s
32 | )
33 |
34 |
35 | def is_word(s):
36 | return all(
37 | c == '_' or (
38 | (not HAS_WEIRD_WORD_CHARS or c not in UNICODE_WEIRD_NONWORD_CHARS) and
39 | unicodedata.category(c) in UNICODE_WORD_CATEGORIES
40 | )
41 | for c in s
42 | )
43 |
44 |
45 | def ascii_regex(pattern):
46 | flags = re.ASCII if six.PY3 else 0
47 | return re.compile(pattern, flags)
48 |
49 |
50 | def unicode_regex(pattern):
51 | return re.compile(pattern, re.UNICODE)
52 |
53 |
54 | class TestRegexUnicodeMatching:
55 | def _test_matching_pattern(self, pattern, isvalidchar, unicode=False):
56 | r = unicode_regex(pattern) if unicode else ascii_regex(pattern)
57 |
58 | codepoints = six.moves.range(0, sys.maxunicode+1) \
59 | if unicode else six.moves.range(1, 128)
60 | for c in [six.unichr(x) for x in codepoints]:
61 | if isvalidchar(c):
62 | assert r.match(c), (
63 | '"%s" is supposed to match "%s" (%r, category "%s"), '
64 | 'but it does not' % (pattern, c, c, unicodedata.category(c))
65 | )
66 | else:
67 | assert not r.match(c), (
68 | '"%s" is not supposed to match "%s" (%r, category "%s"), '
69 | 'but it does' % (pattern, c, c, unicodedata.category(c))
70 | )
71 |
72 | def test_matching_ascii_word_chars(self):
73 | self._test_matching_pattern(r'\w', is_word)
74 |
75 | def test_matching_unicode_word_chars(self):
76 | self._test_matching_pattern(r'\w', is_word, unicode=True)
77 |
78 | def test_matching_ascii_non_word_chars(self):
79 | self._test_matching_pattern(r'\W', lambda s: not is_word(s))
80 |
81 | def test_matching_unicode_non_word_chars(self):
82 | self._test_matching_pattern(r'\W', lambda s: not is_word(s), unicode=True)
83 |
84 | def test_matching_ascii_digits(self):
85 | self._test_matching_pattern(r'\d', is_digit)
86 |
87 | def test_matching_unicode_digits(self):
88 | self._test_matching_pattern(r'\d', is_digit, unicode=True)
89 |
90 | def test_matching_ascii_non_digits(self):
91 | self._test_matching_pattern(r'\D', lambda s: not is_digit(s))
92 |
93 | def test_matching_unicode_non_digits(self):
94 | self._test_matching_pattern(r'\D', lambda s: not is_digit(s), unicode=True)
95 |
96 | def test_matching_ascii_spaces(self):
97 | self._test_matching_pattern(r'\s', is_space)
98 |
99 | def test_matching_unicode_spaces(self):
100 | self._test_matching_pattern(r'\s', is_unicode_space, unicode=True)
101 |
102 | def test_matching_ascii_non_spaces(self):
103 | self._test_matching_pattern(r'\S', lambda s: not is_space(s))
104 |
105 | def test_matching_unicode_non_spaces(self):
106 | self._test_matching_pattern(r'\S', lambda s: not is_unicode_space(s),
107 | unicode=True)
108 |
109 |
110 | def assert_all_examples(strategy, predicate):
111 | '''
112 | Checks that the given strategy produces no examples
113 | that fail the predicate.
114 |
115 | :param strategy: Hypothesis strategy to check
116 | :param predicate: (callable) Predicate that takes string example and returns bool
117 | '''
118 | @h.settings(max_examples=1000, max_iterations=5000)
119 | @h.given(strategy)
120 | def assert_examples(s):
121 | assert predicate(s), 'Found %r using strategy %s which does not match' % (
122 | s, strategy,
123 | )
124 |
125 | assert_examples()
126 |
127 |
128 | def assert_can_generate(pattern):
129 | '''
130 | Checks that regex strategy for given pattern generates examples
131 | that match that regex pattern
132 | '''
133 | compiled_pattern = re.compile(pattern)
134 | strategy = regex(pattern)
135 |
136 | assert_all_examples(strategy, compiled_pattern.match)
137 |
138 |
139 | class TestRegexStrategy:
140 | @pytest.mark.parametrize('pattern', ['abc', '[a][b][c]'])
141 | def test_literals(self, pattern):
142 | assert_can_generate(pattern)
143 |
144 | @pytest.mark.parametrize('pattern', [re.compile('a', re.IGNORECASE), '(?i)a'])
145 | def test_literals_with_ignorecase(self, pattern):
146 | strategy = regex(pattern)
147 |
148 | h.find(strategy, lambda s: s == 'a')
149 | h.find(strategy, lambda s: s == 'A')
150 |
151 | def test_not_literal(self):
152 | assert_can_generate('[^a][^b][^c]')
153 |
154 | @pytest.mark.parametrize('pattern', [
155 | re.compile('[^a][^b]', re.IGNORECASE),
156 | '(?i)[^a][^b]'
157 | ])
158 | def test_not_literal_with_ignorecase(self, pattern):
159 | assert_all_examples(
160 | regex(pattern),
161 | lambda s: s[0] not in ('a', 'A') and s[1] not in ('b', 'B')
162 | )
163 |
164 | def test_any(self):
165 | assert_can_generate('.')
166 |
167 | def test_any_doesnt_generate_newline(self):
168 | assert_all_examples(regex('.'), lambda s: s != '\n')
169 |
170 | @pytest.mark.parametrize('pattern', [re.compile('.', re.DOTALL), '(?s).'])
171 | def test_any_with_dotall_generate_newline(self, pattern):
172 | h.find(regex(pattern), lambda s: s == '\n')
173 |
174 | def test_range(self):
175 | assert_can_generate('[a-z0-9_]')
176 |
177 | def test_negative_range(self):
178 | assert_can_generate('[^a-z0-9_]')
179 |
180 | @pytest.mark.parametrize('pattern', [r'\d', '[\d]', '[^\D]'])
181 | def test_ascii_digits(self, pattern):
182 | strategy = regex(ascii_regex(pattern))
183 |
184 | assert_all_examples(strategy, lambda s: is_digit(s) and is_ascii(s))
185 |
186 | @pytest.mark.parametrize('pattern', [r'\d', '[\d]', '[^\D]'])
187 | def test_unicode_digits(self, pattern):
188 | strategy = regex(unicode_regex(pattern))
189 |
190 | h.find(strategy, lambda s: is_digit(s) and is_ascii(s))
191 | h.find(strategy, lambda s: is_digit(s) and not is_ascii(s))
192 |
193 | assert_all_examples(strategy, is_digit)
194 |
195 | @pytest.mark.parametrize('pattern', [r'\D', '[\D]', '[^\d]'])
196 | def test_ascii_non_digits(self, pattern):
197 | strategy = regex(ascii_regex(pattern))
198 |
199 | assert_all_examples(strategy, lambda s: not is_digit(s) and is_ascii(s))
200 |
201 | @pytest.mark.parametrize('pattern', [r'\D', '[\D]', '[^\d]'])
202 | def test_unicode_non_digits(self, pattern):
203 | strategy = regex(unicode_regex(pattern))
204 |
205 | h.find(strategy, lambda s: not is_digit(s) and is_ascii(s))
206 | h.find(strategy, lambda s: not is_digit(s) and not is_ascii(s))
207 |
208 | assert_all_examples(strategy, lambda s: not is_digit(s))
209 |
210 | @pytest.mark.parametrize('pattern', [r'\s', '[\s]', '[^\S]'])
211 | def test_ascii_whitespace(self, pattern):
212 | strategy = regex(ascii_regex(pattern))
213 |
214 | assert_all_examples(strategy, lambda s: is_space(s) and is_ascii(s))
215 |
216 | @pytest.mark.parametrize('pattern', [r'\s', '[\s]', '[^\S]'])
217 | def test_unicode_whitespace(self, pattern):
218 | strategy = regex(unicode_regex(pattern))
219 |
220 | h.find(strategy, lambda s: is_unicode_space(s) and is_ascii(s))
221 | h.find(strategy, lambda s: is_unicode_space(s) and not is_ascii(s))
222 |
223 | assert_all_examples(strategy, is_unicode_space)
224 |
225 | @pytest.mark.parametrize('pattern', [r'\S', '[\S]', '[^\s]'])
226 | def test_ascii_non_whitespace(self, pattern):
227 | strategy = regex(ascii_regex(pattern))
228 |
229 | assert_all_examples(strategy, lambda s: not is_space(s) and is_ascii(s))
230 |
231 | @pytest.mark.parametrize('pattern', [r'\S', '[\S]', '[^\s]'])
232 | def test_unicode_non_whitespace(self, pattern):
233 | strategy = regex(unicode_regex(pattern))
234 |
235 | h.find(strategy, lambda s: not is_unicode_space(s) and is_ascii(s))
236 | h.find(strategy, lambda s: not is_unicode_space(s) and not is_ascii(s))
237 |
238 | assert_all_examples(strategy, lambda s: not is_unicode_space(s))
239 |
240 | @pytest.mark.parametrize('pattern', [r'\w', '[\w]', '[^\W]'])
241 | def test_ascii_word(self, pattern):
242 | strategy = regex(ascii_regex(pattern))
243 |
244 | assert_all_examples(strategy, lambda s: is_word(s) and is_ascii(s))
245 |
246 | @pytest.mark.parametrize('pattern', [r'\w', '[\w]', '[^\W]'])
247 | def test_unicode_word(self, pattern):
248 | strategy = regex(unicode_regex(pattern))
249 |
250 | h.find(strategy, lambda s: is_word(s) and is_ascii(s))
251 | h.find(strategy, lambda s: is_word(s) and not is_ascii(s))
252 |
253 | assert_all_examples(strategy, is_word)
254 |
255 | @pytest.mark.parametrize('pattern', [r'\W', '[\W]', '[^\w]'])
256 | def test_ascii_non_word(self, pattern):
257 | strategy = regex(ascii_regex(pattern))
258 |
259 | assert_all_examples(strategy, lambda s: not is_word(s) and is_ascii(s))
260 |
261 | @pytest.mark.parametrize('pattern', [r'\W', '[\W]', '[^\w]'])
262 | def test_unicode_non_word(self, pattern):
263 | strategy = regex(unicode_regex(pattern))
264 |
265 | h.find(strategy, lambda s: not is_word(s) and is_ascii(s))
266 | h.find(strategy, lambda s: not is_word(s) and not is_ascii(s))
267 |
268 | assert_all_examples(strategy, lambda s: not is_word(s))
269 |
270 | def test_question_mark_quantifier(self):
271 | assert_can_generate('ab?')
272 |
273 | def test_asterisk_quantifier(self):
274 | assert_can_generate('ab*')
275 |
276 | def test_plus_quantifier(self):
277 | assert_can_generate('ab+')
278 |
279 | def test_repeater(self):
280 | assert_can_generate('ab{5}')
281 | assert_can_generate('ab{5,10}')
282 | assert_can_generate('ab{,10}')
283 | assert_can_generate('ab{5,}')
284 |
285 | def test_branch(self):
286 | assert_can_generate('ab|cd|ef')
287 |
288 | def test_group(self):
289 | assert_can_generate('(foo)+')
290 |
291 | def test_group_backreference(self):
292 | assert_can_generate('([\'"])[a-z]+\\1')
293 |
294 | def test_non_capturing_group(self):
295 | assert_can_generate('(?:[a-z])([\'"])[a-z]+\\1')
296 |
297 | def test_named_groups(self):
298 | assert_can_generate('(?P<foo>[\'"])[a-z]+(?P=foo)')
299 |
300 | def test_beginning(self):
301 | assert_can_generate('^abc')
302 |
303 | def test_caret_in_the_middle_does_not_generate_anything(self):
304 | r = re.compile('a^b')
305 |
306 | with pytest.raises(he.NoSuchExample):
307 | h.find(regex(r), r.match)
308 |
309 | def test_end(self):
310 | strategy = regex('abc$')
311 |
312 | h.find(strategy, lambda s: s == 'abc')
313 | h.find(strategy, lambda s: s == 'abc\n')
314 |
315 | def test_groupref_exists(self):
316 | assert_all_examples(
317 | regex('^(<)?a(?(1)>)$'),
318 | lambda s: s in ('a', 'a\n', '', '\n')
319 | )
320 | assert_all_examples(
321 | regex('^(a)?(?(1)b|c)$'),
322 | lambda s: s in ('ab', 'ab\n', 'c', 'c\n')
323 | )
324 |
325 | @pytest.mark.skipif(sys.version_info[:2] < (3, 6), reason='requires Python 3.6')
326 | def test_subpattern_flags(self):
327 | strategy = regex('(?i)a(?-i:b)')
328 |
329 | # "a" is case insensitive
330 | h.find(strategy, lambda s: s[0] == 'a')
331 | h.find(strategy, lambda s: s[0] == 'A')
332 | # "b" is case sensitive
333 | h.find(strategy, lambda s: s[1] == 'b')
334 |
335 | with pytest.raises(he.NoSuchExample):
336 | h.find(strategy, lambda s: s[1] == 'B')
337 |
--------------------------------------------------------------------------------
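The backreference behavior exercised by test_group_backreference above follows standard `re` semantics: `\1` must reproduce exactly the text captured by group 1, which is why the strategy shares the group's strategy via `hs.shared()`. A stdlib sketch:

```python
import re

# Group 1 captures the opening quote; \1 requires the very same quote
# character to close the string.
pattern = re.compile(r'([\'"])[a-z]+\1')
assert pattern.match('"abc"')
assert pattern.match("'abc'")
assert pattern.match('"abc\'') is None  # mismatched quotes do not match
```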
/tox.ini:
--------------------------------------------------------------------------------
1 | [tox]
2 | envlist=py27,py33,py34,py35,py36,pypy
3 | [testenv]
4 | deps=
5 | -rdev-requirements.txt
6 | commands=
7 | python setup.py test
8 |
--------------------------------------------------------------------------------
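Returning to the README's two ways of passing flags: both are plain `re` mechanisms, so a compiled flag and the inline `(?i)` syntax accept exactly the same strings. A quick stdlib check:

```python
import re

# Flags attached via re.compile() and via inline '(?i)' syntax are
# equivalent; either way the literal matches both cases.
for pattern in (re.compile('abc', re.IGNORECASE), re.compile('(?i)abc')):
    assert pattern.match('abc')
    assert pattern.match('ABC')
```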