├── .gitignore ├── .landscape.yml ├── .travis.yml ├── LICENSE ├── doc └── source │ ├── about.rst │ ├── best_practices.rst │ ├── conf.py │ ├── howto.rst │ ├── index.rst │ ├── modules.rst │ ├── solar │ ├── layout.html │ └── theme.conf │ └── static │ ├── solar.css │ └── solarized-dark.css ├── examples ├── __init__.py ├── colortime │ ├── __init__.py │ ├── colortime.py │ ├── expressions.yaml │ ├── functions.py │ └── patterns.yaml ├── phone │ ├── __init__.py │ ├── expressions.yaml │ ├── functions.py │ ├── patterns.yaml │ └── phone.py └── readme.rst ├── nose.cfg ├── readme.rst ├── reparse ├── __init__.py ├── builders.py ├── config.py ├── expression.py ├── parsers.py ├── tools │ ├── __init__.py │ └── expression_checker.py ├── util.py └── validators.py ├── requirements-dev.txt ├── requirements.txt ├── setup.py ├── tests └── tests_expression.py └── tox.ini /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | .coverage 3 | .idea/* 4 | .tox/ 5 | TAGS 6 | dist/* 7 | doc/build/* 8 | doc/html/* 9 | reparse.egg-info/ 10 | -------------------------------------------------------------------------------- /.landscape.yml: -------------------------------------------------------------------------------- 1 | strictness: veryhigh 2 | pep8: 3 | full: true 4 | doc-warnings: false 5 | test-warnings: true 6 | max-line-length: 80 7 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: python 2 | sudo: false 3 | python: 4 | - 3.2 5 | - 3.3 6 | - 3.4 7 | - 3.5 8 | install: 9 | - pip install -r requirements-dev.txt 10 | script: nosetests -c nose.cfg 11 | notifications: 12 | email: false 13 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 
2015 Andy Chase 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /doc/source/about.rst: -------------------------------------------------------------------------------- 1 | About: Why another tool for parsing? 2 | ==================================== 3 | 4 | Reparse is simply a tool for combining regular expressions together 5 | and using a regular expression engine to scan/search/parse/process input for certain tasks. 6 | 7 | Larger parsing tools like YACC/Bison, ANTLR, and others are really 8 | good for structured input like computer code or xml. They aren't specifically 9 | designed for scanning and parsing semi-structured data from unstructured 10 | text (like books, or internet documents, or diaries). 
11 | 12 | Reparse is designed to work with exactly that kind of stuff (and is completely 13 | useless for the kinds of tasks any of the above is often used for). 14 | 15 | Parsing Spectrum 16 | ---------------- 17 | 18 | Reparse isn't the first parser of its kind. A hypothetical spectrum 19 | of parsers from pattern-finding only 20 | all the way to highly-featured, structured grammars might look something like this:: 21 | 22 | v- Reparse v- YACC/Bison 23 | UNSTRUCTURED |-------------------------| STRUCTURED 24 | ^- Regex ^- Parboiled/PyParsing 25 | 26 | Reparse is in fact very featureless. It's only a little better 27 | than plain regular expressions. Still, you might find it ideal 28 | for the kinds of tasks it was designed to deal with (like dates and addresses). 29 | 30 | 31 | What kind of things might Reparse be useful for parsing? 32 | -------------------------------------------------------- 33 | 34 | Any kind of semi-structured format: 35 | 36 | - URIs 37 | - Numbers 38 | - Strings 39 | - Dates 40 | - Times 41 | - Addresses 42 | - Phone numbers 43 | 44 | Or in other words, anything you might consider parsing with Regex, you might consider parsing with Reparse, 45 | especially if you are considering combining multiple regular expressions together. 46 | 47 | Why Regular Expressions 48 | ----------------------- 49 | 50 | PyParsing (Python) and Parboiled (JVM) also have use-cases very similar 51 | to Reparse, and they are much more feature-filled. They have their own (much more powerful) 52 | DSL for parsing text. 
53 | 54 | Reparse uses Regular Expressions, which have some advantages: 55 | 56 | - Short, minimal Syntax 57 | - Universal (with some minor differences between different engines) 58 | - Standard 59 | - Moderately Easy-to-learn (Though this is highly subjective) 60 | - Many programmers already know the basics 61 | - Skills can be carried elsewhere 62 | - **Regular Expressions can be harvested elsewhere and used within Reparse** 63 | - Decent performance over large inputs 64 | - Ability to use fuzzy matching regex engines 65 | 66 | 67 | Limitations of Reparse 68 | ---------------------- 69 | 70 | Regular Expressions have been known to catch input that was unexpected, 71 | or miss input that was expected due to unforeseen edge cases. 72 | Reparse provides tools to help alleviate this by checking the expressions against expected matching 73 | inputs, and against expected non-matching inputs. 74 | 75 | This library is very limited in what it can parse. If you realize 76 | you need something like a recursive grammar, you might want to try PyParsing or something greater 77 | (though Reparse might be helpful as a 'first step' matching and transforming the parse-able data before it is properly 78 | parsed by a different library). -------------------------------------------------------------------------------- /doc/source/best_practices.rst: -------------------------------------------------------------------------------- 1 | Best practices for Regular Expressions 2 | ====================================== 3 | 4 | As your Regexs grow and form into beautiful pattern-matching powerhouses, it is important 5 | to take good care of them, as they have a tendency to grow unruly as they mature. 6 | 7 | Here are several safe practices for handling your Regexs so that 8 | they can have a long productive life without getting out of control: 9 | 10 | - Use whitespace and comments in your regular expressions. 
11 | Whitespace is set to be ignored by default (use \s to match whitespace) so use 12 | spaces and newlines to break up sections freely. Regex comments look like (?# this ). 13 | - Never let a regex become too big to be easily understood. Split up a big regex 14 | into smaller expressions. (Sensible splits won't hurt them). 15 | - Maintain Matches and Non-Matches fields 16 | - Reparse can use these to test your Regexs to make sure they are matching properly 17 | - They help maintainers quickly see which regular expressions match what 18 | - They help show your intention with each expression, so that others can confidently improve or modify them 19 | - Maintain a description which talks about what you are trying to match with each regex, 20 | what you are not matching and why, and possibly a url where readers might learn more 21 | about that specific format. 22 | - Having each regex author list their name can be a great boon. It gives them 23 | credit for their work, it encourages them to put forth their best effort, and is an easy way 24 | to name the regexs. 25 | I often name the regex after the author so I don't have to come up with unique names 26 | for all my regexs, since they are often really similar. 27 | 28 | 29 | For more information about maintaining a regex-safe environment visit: 30 | 31 | http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html 32 | -------------------------------------------------------------------------------- /doc/source/conf.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # Reparse documentation build configuration file, created by 4 | # sphinx-quickstart on Mon Aug 12 16:05:44 2013. 5 | # 6 | # This file is execfile()d with the current directory set to its containing dir. 7 | # 8 | # Note that not all possible configuration values are present in this 9 | # autogenerated file. 
10 | # 11 | # All configuration values have a default; values that are commented out 12 | # serve to show the default. 13 | 14 | import sys 15 | import os 16 | 17 | # If extensions (or modules to document with autodoc) are in another directory, 18 | # add these directories to sys.path here. If the directory is relative to the 19 | # documentation root, use os.path.abspath to make it absolute, like shown here. 20 | sys.path.insert(0, os.path.abspath('../..')) 21 | 22 | # -- General configuration ----------------------------------------------------- 23 | 24 | # If your documentation needs a minimal Sphinx version, state it here. 25 | #needs_sphinx = '1.0' 26 | 27 | # Add any Sphinx extension module names here, as strings. They can be extensions 28 | # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. 29 | extensions = ['sphinx.ext.autodoc'] 30 | 31 | # Add any paths that contain templates here, relative to this directory. 32 | templates_path = ['_templates'] 33 | 34 | # The suffix of source filenames. 35 | source_suffix = '.rst' 36 | 37 | # The encoding of source files. 38 | #source_encoding = 'utf-8-sig' 39 | 40 | # The master toctree document. 41 | master_doc = 'index' 42 | 43 | # General information about the project. 44 | project = u'Reparse' 45 | copyright = u'2013, Andy Chase' 46 | 47 | # The version info for the project you're documenting, acts as replacement for 48 | # |version| and |release|, also used in various other places throughout the 49 | # built documents. 50 | # 51 | # The short X.Y version. 52 | version = '3.0' 53 | # The full version, including alpha/beta/rc tags. 54 | release = '3.0' 55 | 56 | # The language for content autogenerated by Sphinx. Refer to documentation 57 | # for a list of supported languages. 
58 | #language = None 59 | 60 | # There are two options for replacing |today|: either, you set today to some 61 | # non-false value, then it is used: 62 | #today = '' 63 | # Else, today_fmt is used as the format for a strftime call. 64 | #today_fmt = '%B %d, %Y' 65 | 66 | # List of patterns, relative to source directory, that match files and 67 | # directories to ignore when looking for source files. 68 | exclude_patterns = [] 69 | 70 | # The reST default role (used for this markup: `text`) to use for all documents. 71 | #default_role = None 72 | 73 | # If true, '()' will be appended to :func: etc. cross-reference text. 74 | #add_function_parentheses = True 75 | 76 | # If true, the current module name will be prepended to all description 77 | # unit titles (such as .. function::). 78 | #add_module_names = True 79 | 80 | # If true, sectionauthor and moduleauthor directives will be shown in the 81 | # output. They are ignored by default. 82 | #show_authors = False 83 | 84 | # The name of the Pygments (syntax highlighting) style to use. 85 | pygments_style = 'sphinx' 86 | 87 | # A list of ignored prefixes for module index sorting. 88 | #modindex_common_prefix = [] 89 | 90 | # If true, keep warnings as "system message" paragraphs in the built documents. 91 | #keep_warnings = False 92 | 93 | 94 | # -- Options for HTML output --------------------------------------------------- 95 | 96 | # The theme to use for HTML and HTML Help pages. See the documentation for 97 | # a list of builtin themes. 98 | html_theme = 'solar' 99 | html_theme_path = ["."] 100 | 101 | # Theme options are theme-specific and customize the look and feel of a theme 102 | # further. For a list of options available for each theme, see the 103 | # documentation. 104 | #html_theme_options = {} 105 | 106 | # Add any paths that contain custom themes here, relative to this directory. 107 | #html_theme_path = [] 108 | 109 | # The name for this set of Sphinx documents. 
If None, it defaults to 110 | # " v documentation". 111 | #html_title = None 112 | 113 | # A shorter title for the navigation bar. Default is the same as html_title. 114 | #html_short_title = None 115 | 116 | # The name of an image file (relative to this directory) to place at the top 117 | # of the sidebar. 118 | #html_logo = None 119 | 120 | # The name of an image file (within the static path) to use as favicon of the 121 | # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 122 | # pixels large. 123 | #html_favicon = None 124 | 125 | # Add any paths that contain custom static files (such as style sheets) here, 126 | # relative to this directory. They are copied after the builtin static files, 127 | # so a file named "default.css" will overwrite the builtin "default.css". 128 | html_static_path = ['static'] 129 | 130 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, 131 | # using the given strftime format. 132 | #html_last_updated_fmt = '%b %d, %Y' 133 | 134 | # If true, SmartyPants will be used to convert quotes and dashes to 135 | # typographically correct entities. 136 | #html_use_smartypants = True 137 | 138 | # Custom sidebar templates, maps document names to template names. 139 | #html_sidebars = {} 140 | 141 | # Additional templates that should be rendered to pages, maps page names to 142 | # template names. 143 | #html_additional_pages = {} 144 | 145 | # If false, no module index is generated. 146 | html_domain_indices = False 147 | 148 | # If false, no index is generated. 149 | html_use_index = False 150 | 151 | # If true, the index is split into individual pages for each letter. 152 | #html_split_index = False 153 | 154 | # If true, links to the reST sources are added to the pages. 155 | html_show_sourcelink = False 156 | 157 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. 158 | html_show_sphinx = False 159 | 160 | # If true, "(C) Copyright ..." 
is shown in the HTML footer. Default is True. 161 | html_show_copyright = False 162 | 163 | # If true, an OpenSearch description file will be output, and all pages will 164 | # contain a tag referring to it. The value of this option must be the 165 | # base URL from which the finished HTML is served. 166 | #html_use_opensearch = '' 167 | 168 | # This is the file name suffix for HTML files (e.g. ".xhtml"). 169 | #html_file_suffix = None 170 | 171 | # Output file base name for HTML help builder. 172 | htmlhelp_basename = 'Reparsedoc' 173 | 174 | 175 | # -- Options for LaTeX output -------------------------------------------------- 176 | 177 | latex_elements = { 178 | # The paper size ('letterpaper' or 'a4paper'). 179 | #'papersize': 'letterpaper', 180 | 181 | # The font size ('10pt', '11pt' or '12pt'). 182 | #'pointsize': '10pt', 183 | 184 | # Additional stuff for the LaTeX preamble. 185 | #'preamble': '', 186 | } 187 | 188 | # Grouping the document tree into LaTeX files. List of tuples 189 | # (source start file, target name, title, author, documentclass [howto/manual]). 190 | latex_documents = [ 191 | ('index', 'Reparse.tex', u'Reparse Documentation', 192 | u'Andy Chase', 'manual'), 193 | ] 194 | 195 | # The name of an image file (relative to this directory) to place at the top of 196 | # the title page. 197 | #latex_logo = None 198 | 199 | # For "manual" documents, if this is true, then toplevel headings are parts, 200 | # not chapters. 201 | #latex_use_parts = False 202 | 203 | # If true, show page references after internal links. 204 | #latex_show_pagerefs = False 205 | 206 | # If true, show URL addresses after external links. 207 | #latex_show_urls = False 208 | 209 | # Documents to append as an appendix to all manuals. 210 | #latex_appendices = [] 211 | 212 | # If false, no module index is generated. 
213 | #latex_domain_indices = True 214 | 215 | 216 | # -- Options for manual page output -------------------------------------------- 217 | 218 | # One entry per manual page. List of tuples 219 | # (source start file, name, description, authors, manual section). 220 | man_pages = [ 221 | ('index', 'reparse', u'Reparse Documentation', 222 | [u'Andy Chase'], 1) 223 | ] 224 | 225 | # If true, show URL addresses after external links. 226 | #man_show_urls = False 227 | 228 | 229 | # -- Options for Texinfo output ------------------------------------------------ 230 | 231 | # Grouping the document tree into Texinfo files. List of tuples 232 | # (source start file, target name, title, author, 233 | # dir menu entry, description, category) 234 | texinfo_documents = [ 235 | ('index', 'Reparse', u'Reparse Documentation', 236 | u'Andy Chase', 'Reparse', 'One line description of project.', 237 | 'Miscellaneous'), 238 | ] 239 | 240 | # Documents to append as an appendix to all manuals. 241 | #texinfo_appendices = [] 242 | 243 | # If false, no module index is generated. 244 | #texinfo_domain_indices = True 245 | 246 | # How to display URL addresses: 'footnote', 'no', or 'inline'. 247 | #texinfo_show_urls = 'footnote' 248 | 249 | # If true, do not generate a @detailmenu in the "Top" node's menu. 250 | #texinfo_no_detailmenu = False 251 | -------------------------------------------------------------------------------- /doc/source/howto.rst: -------------------------------------------------------------------------------- 1 | Howto: How to use Reparse 2 | ========================= 3 | 4 | 5 | You will need 6 | ------------- 7 | 8 | #. A Python environment & some Regular Expression knowledge. Some resources: RegexOne_, Regex-Info_, W3Schools_. 9 | 10 | #. Some example texts that you will want to parse and their solutions. 11 | This will be useful to check your parser and will help you put together the expressions and patterns. 12 | 13 | 1. 
Setup Python & Reparse 14 | ------------------------- 15 | 16 | See :ref:`installation-howto` for instructions on how to install Reparse. 17 | 18 | 2. Layout of an example Reparse parser 19 | -------------------------------------- 20 | 21 | Reparse needs 3 things in its operation: 22 | 23 | 1. Functions: A dictionary with String Key -> Function Value mapping. 24 | 25 | .. code-block:: python 26 | 27 | {'cool func': func} 28 | 29 | 2. Expressions: A dictionary with Groups (Dict) -> Expressions (Dict) -> Expression (String/Regex). 30 | I typically format this in YAML since it's easy to read and write, but you can do it in JSON, or even straight 31 | in Python. 32 | 33 | .. code-block:: python 34 | 35 | {'my cool group': {'my cool expression': {'expression': '[a-z]+'}}} 36 | 37 | 3. Patterns: A dictionary with Patterns (Dict) -> Pattern (String/Regex) (Ditto about the YAML). 38 | 39 | .. code-block:: python 40 | 41 | {'my cool pattern': {'pattern': ".*" }} 42 | 43 | 44 | As I mentioned, I typically just use YAML, so my file structure looks like this:: 45 | 46 | my_parser.py <- The actual parser 47 | functions.py <- All the functions and a dictionary mapping them at the end 48 | expressions.yaml <- All of the Regex pieces nice and structured 49 | patterns.yaml <- All the Regex patterns 50 | 51 | This isn't forced: you could have them all in one file, or split them up 52 | across many, many files. This is just how I organize my parsing resources. 53 | 54 | 3. Writing your first expressions.yaml file 55 | ------------------------------------------- 56 | 57 | *I'll be parsing Phone numbers in this example.* 58 | 59 | .. Tip:: 60 | 61 | You wanna know a secret? I don't do all the hard work myself, I often borrow Regexs off other sites 62 | like http://regexlib.com/. 63 | 64 | .. 
code-block:: yaml 65 | 66 | Phone: # <- This is the expression group name, it's like the 'type' of your regex 67 | # All the expressions in one group should be able to be substituted for each other 68 | Senthil Gunabalan: # <- This is the expression 69 | # I use the author's name because I couldn't be asked to come up with names for all of them myself 70 | Expression: | 71 | [+]([0-9] \d{2}) - (\d{3}) - (\d{4}) 72 | # Whitespace is ignored, so you can use it to make your regexs readable 73 | Description: This is a basic telephone number validation [...] 74 | Matches: +974-584-5656 | +000-000-0000 | +323-343-3453 75 | Non-Matches: 974-584-5656 | +974 000 0000 76 | Groups: 77 | - AreaCode 78 | - Prefix 79 | - Body 80 | # The keys in the Groups: field have to match up with the capture groups (stuff in parentheses ()) in the Expression 81 | # They are used as keyword arguments to the function that processes this expression 82 | # (Expression groups 'Phone' and capture groups () are different) 83 | 84 | So that's a basic expression file. The hierarchy goes: 85 | Group: 86 | Expression: 87 | Detail: 88 | Detail: 89 | Detail: 90 | Group: 91 | Expression: 92 | Detail: 93 | Detail: 94 | Detail: 95 | Expression: 96 | Detail: 97 | 98 | 4. Writing your first patterns.yaml file 99 | ---------------------------------------- 100 | 101 | There aren't any capture groups in patterns. All the capture groups should be done 102 | in expressions and merely *combined* in patterns. 103 | 104 | .. code-block:: yaml 105 | 106 | Basic Phone: 107 | Pattern: <Phone> 108 | Order: 1 109 | 110 | Fax Phone: 111 | Pattern: | 112 | Fax: \s <Phone> 113 | Order: 2 114 | # I could have used <Basic Phone> instead to use a pattern inside a pattern but it wouldn't have made a difference really (just an extra function call). 115 | 116 | The order field tells Reparse which pattern to pick if multiple patterns match. 
117 | Generally speaking, the more specific patterns should be ordered higher than the more general ones 118 | (you wouldn't want someone to try and call a fax machine!). 119 | 120 | I could have split the expression above into 4 expression groups: Country Code, Area Code, 3-digit prefix, 4-digit body, 121 | and combined them in the patterns file, and that would have looked like this: 122 | 123 | .. code-block:: yaml 124 | 125 | Mega International: 126 | Pattern: [+]<Country Code>-<Area Code>-<3-digit prefix>-<4-digit body> 127 | 128 | Done this way, I could have had 3 different formats for Area Code and the pattern would have matched 129 | on any of them. I didn't here because that'd be overkill for phone numbers. 130 | 131 | 5. Writing your functions.py file 132 | --------------------------------- 133 | 134 | Reparse matches text and also does some parsing using functions. 135 | 136 | The order in which the functions are run and results passed is as follows: 137 | 138 | #. The Function mapped to the Expression name is called with keyword arguments named in the ``Groups:`` key 139 | ('Senthil Gunabalan' in this example). 140 | 141 | #. The output of that function is passed to the function mapped to the Expression Group ('Phone' in this example). 142 | 143 | #. The output of that function is passed to the function mapped to the Pattern name ('Basic Phone' or 'Fax Phone'). 144 | 145 | #. (Optional) If you used *Patterns inside Patterns* then the output bubbles up to the top. 146 | 147 | #. The output of that function is returned. 148 | 149 | .. 
code-block:: python 150 | 151 | from collections import namedtuple 152 | Phone = namedtuple('phone', 'area_code prefix body fax') 153 | 154 | 155 | def senthill(AreaCode, Prefix, Body): 156 | return Phone(area_code=AreaCode, prefix=Prefix, body=Body, fax=False) 157 | 158 | 159 | def phone(p): 160 | return p[0] 161 | 162 | 163 | def basic_phone(p): 164 | return p 165 | 166 | 167 | def fax_phone(p): 168 | return p[0]._replace(fax=True) 169 | 170 | functions = { 171 | 'Senthil Gunabalan' : senthill, 172 | 'Phone' : phone, 173 | 'Basic Phone' : basic_phone, 174 | 'Fax Phone' : fax_phone 175 | } 176 | 177 | I used namedtuples here, but you can parse your output any way you want to. 178 | 179 | 6. Combining it all together! 180 | ----------------------------- 181 | 182 | The builders.py module contains some functions to build a Reparse system together. 183 | Here's how I'd put together my phone number parser (``path`` is the directory holding the two YAML files): 184 | 185 | .. code-block:: python 186 | 187 | from examples.phone.functions import functions 188 | import reparse 189 | 190 | phone_parser = reparse.parser( 191 | parser_type=reparse.basic_parser, 192 | expressions_yaml_path=path + "expressions.yaml", 193 | patterns_yaml_path=path + "patterns.yaml", 194 | functions=functions 195 | ) 196 | 197 | 198 | print(phone_parser('+974-584-5656')) 199 | # => [phone(area_code='974', prefix='584', body='5656', fax=False)] 200 | print(phone_parser('Fax: +974-584-5656')) 201 | # => [phone(area_code='974', prefix='584', body='5656', fax=True)] 202 | 203 | 7. More info 204 | ------------ 205 | 206 | Yeah, so this was all basically straight out of the examples/phone directory 207 | where you can run it yourself and see if it actually works. 208 | 209 | There's more (or at least one more) example in there for further insight. 210 | 211 | Happy parsing! 212 | 213 | .. _RegexOne: http://regexone.com/ 214 | .. _Regex-Info: http://www.regular-expressions.info/tutorial.html 215 | .. 
_W3Schools: http://w3schools.com/jsref/jsref_obj_regexp.asp 216 | -------------------------------------------------------------------------------- /doc/source/index.rst: -------------------------------------------------------------------------------- 1 | .. toctree:: 2 | :maxdepth: 2 3 | :hidden: 4 | 5 | about 6 | howto 7 | best_practices 8 | modules 9 | 10 | .. include:: ../../readme.rst 11 | 12 | -------------------------------------------------------------------------------- /doc/source/modules.rst: -------------------------------------------------------------------------------- 1 | Here lies the embedded docblock documentation for the various parts of Reparse. 2 | 3 | expression 4 | ========== 5 | 6 | .. automodule:: reparse.expression 7 | :members: 8 | 9 | 10 | builders 11 | ======== 12 | 13 | .. automodule:: reparse.builders 14 | :members: 15 | 16 | 17 | 18 | tools 19 | ===== 20 | 21 | expression_checker 22 | ------------------ 23 | 24 | .. automodule:: reparse.tools.expression_checker 25 | :members: -------------------------------------------------------------------------------- /doc/source/solar/layout.html: -------------------------------------------------------------------------------- 1 | {% extends "basic/layout.html" %} 2 | 3 | {%- block doctype -%} 4 | 5 | {%- endblock -%} 6 | 7 | {%- block extrahead -%} 8 | 9 | 10 | {%- endblock -%} 11 | 12 | {# put the sidebar before the body #} 13 | {% block sidebar1 %}{{ sidebar() }}{% endblock %} 14 | {% block sidebar2 %}{% endblock %} 15 | 16 | {%- block sidebartoc %} 17 |

Topics

18 |
    19 | {% for nav_item in [ 20 | ('index', 'Intro'), 21 | ('about', 'About'), 22 | ('howto', 'Howto'), 23 | ('best_practices', 'Best practices'), 24 | ('modules', 'Modules'), 25 | ]%} 26 | {% if nav_item[0] == pagename %} 27 |
  • {{ nav_item[1] }}
  • 28 | {% else %} 29 |
  • {{ nav_item[1] }}
  • 30 | {% endif %} 31 | {% endfor %} 32 |
33 | {%- include "localtoc.html" %} 34 | {%- endblock %} 35 | 36 | {%- block footer %} 37 | {%- endblock %} 38 | {%- block relbar2 %} 39 | 40 | {%- endblock %} -------------------------------------------------------------------------------- /doc/source/solar/theme.conf: -------------------------------------------------------------------------------- 1 | [theme] 2 | inherit = basic 3 | stylesheet = solar.css 4 | pygments_style = none 5 | -------------------------------------------------------------------------------- /doc/source/static/solar.css: -------------------------------------------------------------------------------- 1 | /* solar.css 2 | * Modified from sphinxdoc.css of the sphinxdoc theme. 3 | */ 4 | 5 | @import url("basic.css"); 6 | 7 | /* -- page layout ----------------------------------------------------------- */ 8 | 9 | body { 10 | font-family: sans-serif; 11 | font-size: 14px; 12 | line-height: 150%; 13 | text-align: center; 14 | color: #002b36; 15 | padding: 0; 16 | margin: 0px 80px 0px 80px; 17 | min-width: 740px; 18 | } 19 | 20 | div.document { 21 | text-align: left; 22 | } 23 | 24 | div.bodywrapper { 25 | margin: 0 240px 0 0; 26 | border-right: 1px dotted #eee8d5; 27 | } 28 | 29 | div.body { 30 | margin: 0; 31 | padding: 0.5em 20px 20px 20px; 32 | } 33 | 34 | div.related { 35 | font-size: 1em; 36 | background: black; 37 | color: #839496; 38 | padding: 5px 0px; 39 | } 40 | 41 | div.related ul { 42 | height: 2em; 43 | margin: 2px; 44 | } 45 | 46 | div.related ul li { 47 | margin: 0; 48 | padding: 0; 49 | height: 2em; 50 | float: left; 51 | } 52 | 53 | div.related ul li.right { 54 | float: right; 55 | margin-right: 5px; 56 | } 57 | 58 | div.related ul li a { 59 | margin: 0; 60 | padding: 2px 5px; 61 | line-height: 2em; 62 | text-decoration: none; 63 | color: #839496; 64 | } 65 | 66 | div.related ul li a:hover { 67 | background-color: #073642; 68 | -webkit-border-radius: 2px; 69 | -moz-border-radius: 2px; 70 | border-radius: 2px; 71 | } 72 | 73 | 
div.sphinxsidebarwrapper { 74 | padding: 0; 75 | } 76 | 77 | div.sphinxsidebar { 78 | margin: 0; 79 | padding: 0.5em 15px 15px 0; 80 | width: 210px; 81 | float: right; 82 | font-size: 0.9em; 83 | text-align: left; 84 | } 85 | 86 | div.sphinxsidebar h3, div.sphinxsidebar h4 { 87 | margin: 1em 0 0.5em 0; 88 | font-size: 1em; 89 | padding: 0.7em; 90 | background-color: #eeeff1; 91 | } 92 | 93 | div.sphinxsidebar h3 a { 94 | color: #2E3436; 95 | } 96 | 97 | div.sphinxsidebar ul { 98 | padding-left: 1.5em; 99 | margin-top: 7px; 100 | padding: 0; 101 | line-height: 150%; 102 | color: #586e75; 103 | } 104 | 105 | div.sphinxsidebar ul ul { 106 | margin-left: 20px; 107 | } 108 | 109 | div.sphinxsidebar input { 110 | border: 1px solid #eee8d5; 111 | } 112 | 113 | div.footer { 114 | background-color: #93a1a1; 115 | color: #eee; 116 | padding: 3px 8px 3px 0; 117 | clear: both; 118 | font-size: 0.8em; 119 | text-align: right; 120 | } 121 | 122 | div.footer a { 123 | color: #eee; 124 | text-decoration: none; 125 | } 126 | 127 | /* -- body styles ----------------------------------------------------------- */ 128 | 129 | p { 130 | margin: 0.8em 0 0.5em 0; 131 | } 132 | 133 | div.body a, div.sphinxsidebarwrapper a { 134 | color: #268bd2; 135 | text-decoration: none; 136 | } 137 | 138 | div.body a:hover, div.sphinxsidebarwrapper a:hover { 139 | border-bottom: 1px solid #268bd2; 140 | } 141 | 142 | h1, h2, h3, h4, h5, h6 { 143 | font-family: "Open Sans", sans-serif; 144 | font-weight: 300; 145 | } 146 | 147 | h1 { 148 | margin: 0; 149 | padding: 0.7em 0 0.3em 0; 150 | line-height: 1.2em; 151 | color: black; 152 | } 153 | 154 | h2 { 155 | margin: 1.3em 0 0.2em 0; 156 | padding: 0 0 10px 0; 157 | color: #073642; 158 | border-bottom: 1px solid #eee; 159 | } 160 | 161 | h3 { 162 | margin: 1em 0 -0.3em 0; 163 | padding-bottom: 5px; 164 | } 165 | 166 | h3, h4, h5, h6 { 167 | color: #073642; 168 | border-bottom: 1px dotted #eee; 169 | } 170 | 171 | div.body h1 a, div.body h2 a, div.body h3 
a, div.body h4 a, div.body h5 a, div.body h6 a { 172 | color: #657B83!important; 173 | } 174 | 175 | h1 a.anchor, h2 a.anchor, h3 a.anchor, h4 a.anchor, h5 a.anchor, h6 a.anchor { 176 | display: none; 177 | margin: 0 0 0 0.3em; 178 | padding: 0 0.2em 0 0.2em; 179 | color: #aaa!important; 180 | } 181 | 182 | h1:hover a.anchor, h2:hover a.anchor, h3:hover a.anchor, h4:hover a.anchor, 183 | h5:hover a.anchor, h6:hover a.anchor { 184 | display: inline; 185 | } 186 | 187 | h1 a.anchor:hover, h2 a.anchor:hover, h3 a.anchor:hover, h4 a.anchor:hover, 188 | h5 a.anchor:hover, h6 a.anchor:hover { 189 | color: #777; 190 | background-color: #eee; 191 | } 192 | 193 | a.headerlink { 194 | color: #c60f0f!important; 195 | font-size: 1em; 196 | margin-left: 6px; 197 | padding: 0 4px 0 4px; 198 | text-decoration: none!important; 199 | } 200 | 201 | a.headerlink:hover { 202 | background-color: #ccc; 203 | color: white!important; 204 | } 205 | 206 | 207 | cite, code, tt { 208 | font-family: 'Source Code Pro', monospace; 209 | font-size: 0.9em; 210 | letter-spacing: 0.01em; 211 | background-color: #eeeff2; 212 | font-style: normal; 213 | } 214 | 215 | hr { 216 | border: 1px solid #eee; 217 | margin: 2em; 218 | } 219 | 220 | .highlight { 221 | -webkit-border-radius: 2px; 222 | -moz-border-radius: 2px; 223 | border-radius: 2px; 224 | } 225 | 226 | pre { 227 | font-family: 'Source Code Pro', monospace; 228 | font-style: normal; 229 | font-size: 0.9em; 230 | letter-spacing: 0.015em; 231 | line-height: 120%; 232 | padding: 0.7em; 233 | white-space: pre-wrap; /* css-3 */ 234 | white-space: -moz-pre-wrap; /* Mozilla, since 1999 */ 235 | white-space: -pre-wrap; /* Opera 4-6 */ 236 | white-space: -o-pre-wrap; /* Opera 7 */ 237 | word-wrap: break-word; /* Internet Explorer 5.5+ */ 238 | } 239 | 240 | pre a { 241 | color: inherit; 242 | text-decoration: underline; 243 | } 244 | 245 | td.linenos pre { 246 | padding: 0.5em 0; 247 | } 248 | 249 | div.quotebar { 250 | background-color: #f8f8f8; 251 | 
max-width: 250px; 252 | float: right; 253 | padding: 2px 7px; 254 | border: 1px solid #ccc; 255 | } 256 | 257 | div.topic { 258 | background-color: #f8f8f8; 259 | } 260 | 261 | table { 262 | border-collapse: collapse; 263 | margin: 0 -0.5em 0 -0.5em; 264 | } 265 | 266 | table td, table th { 267 | padding: 0.2em 0.5em 0.2em 0.5em; 268 | } 269 | 270 | div.admonition { 271 | font-size: 0.9em; 272 | margin: 1em 0 1em 0; 273 | border: 1px solid #eee; 274 | background-color: #f7f7f7; 275 | padding: 0; 276 | -moz-box-shadow: 0px 8px 6px -8px #93a1a1; 277 | -webkit-box-shadow: 0px 8px 6px -8px #93a1a1; 278 | box-shadow: 0px 8px 6px -8px #93a1a1; 279 | } 280 | 281 | div.admonition p { 282 | margin: 0.5em 1em 0.5em 1em; 283 | padding: 0.2em; 284 | } 285 | 286 | div.admonition pre { 287 | margin: 0.4em 1em 0.4em 1em; 288 | } 289 | 290 | div.admonition p.admonition-title 291 | { 292 | margin: 0; 293 | padding: 0.2em 0 0.2em 0.6em; 294 | color: white; 295 | border-bottom: 1px solid #eee8d5; 296 | font-weight: bold; 297 | background-color: #268bd2; 298 | } 299 | 300 | div.warning p.admonition-title, 301 | div.important p.admonition-title { 302 | background-color: #cb4b16; 303 | } 304 | 305 | div.hint p.admonition-title, 306 | div.tip p.admonition-title { 307 | background-color: #859900; 308 | } 309 | 310 | div.caution p.admonition-title, 311 | div.attention p.admonition-title, 312 | div.danger p.admonition-title, 313 | div.error p.admonition-title { 314 | background-color: #dc322f; 315 | } 316 | 317 | div.admonition ul, div.admonition ol { 318 | margin: 0.1em 0.5em 0.5em 3em; 319 | padding: 0; 320 | } 321 | 322 | div.versioninfo { 323 | margin: 1em 0 0 0; 324 | border: 1px solid #eee; 325 | background-color: #DDEAF0; 326 | padding: 8px; 327 | line-height: 1.3em; 328 | font-size: 0.9em; 329 | } 330 | 331 | div.viewcode-block:target { 332 | background-color: #f4debf; 333 | border-top: 1px solid #eee; 334 | border-bottom: 1px solid #eee; 335 | } 336 | 337 | h1, h2, h3 338 | { 339 | 
font-family: 'Strait', sans-serif; 340 | } 341 | 342 | h1 343 | { 344 | font-size: 23pt; 345 | } 346 | 347 | h2 348 | { 349 | font-size: 16pt; 350 | } 351 | 352 | .headerlink { 353 | float: right; 354 | } 355 | 356 | #re-parse h1 357 | { 358 | font-size: 90pt; 359 | z-index: -2; 360 | padding: 0; 361 | text-shadow: none; 362 | } 363 | 364 | #re-parse h1 a 365 | { 366 | display: none; 367 | } 368 | 369 | #re-parse .external img 370 | { 371 | position: relative; 372 | top: 5px; 373 | margin: 0 3px; 374 | } 375 | -------------------------------------------------------------------------------- /doc/source/static/solarized-dark.css: -------------------------------------------------------------------------------- 1 | /* solarized dark style for solar theme */ 2 | 3 | /*style pre scrollbar*/ 4 | pre::-webkit-scrollbar, .highlight::-webkit-scrollbar { 5 | height: 0.5em; 6 | background: #073642; 7 | } 8 | 9 | pre::-webkit-scrollbar-thumb { 10 | border-radius: 1em; 11 | background: #93a1a1; 12 | } 13 | 14 | /* pygments style */ 15 | .highlight .hll { background-color: #ffffcc } 16 | .highlight { background: #002B36!important; color: #93A1A1 } 17 | .highlight .c { color: #586E75 } /* Comment */ 18 | .highlight .err { color: #93A1A1 } /* Error */ 19 | .highlight .g { color: #93A1A1 } /* Generic */ 20 | .highlight .k { color: #859900 } /* Keyword */ 21 | .highlight .l { color: #93A1A1 } /* Literal */ 22 | .highlight .n { color: #93A1A1 } /* Name */ 23 | .highlight .o { color: #859900 } /* Operator */ 24 | .highlight .x { color: #CB4B16 } /* Other */ 25 | .highlight .p { color: #93A1A1 } /* Punctuation */ 26 | .highlight .cm { color: #586E75 } /* Comment.Multiline */ 27 | .highlight .cp { color: #859900 } /* Comment.Preproc */ 28 | .highlight .c1 { color: #586E75 } /* Comment.Single */ 29 | .highlight .cs { color: #859900 } /* Comment.Special */ 30 | .highlight .gd { color: #2AA198 } /* Generic.Deleted */ 31 | .highlight .ge { color: #93A1A1; font-style: italic } /* Generic.Emph 
*/ 32 | .highlight .gr { color: #DC322F } /* Generic.Error */ 33 | .highlight .gh { color: #CB4B16 } /* Generic.Heading */ 34 | .highlight .gi { color: #859900 } /* Generic.Inserted */ 35 | .highlight .go { color: #93A1A1 } /* Generic.Output */ 36 | .highlight .gp { color: #93A1A1 } /* Generic.Prompt */ 37 | .highlight .gs { color: #93A1A1; font-weight: bold } /* Generic.Strong */ 38 | .highlight .gu { color: #CB4B16 } /* Generic.Subheading */ 39 | .highlight .gt { color: #93A1A1 } /* Generic.Traceback */ 40 | .highlight .kc { color: #CB4B16 } /* Keyword.Constant */ 41 | .highlight .kd { color: #268BD2 } /* Keyword.Declaration */ 42 | .highlight .kn { color: #859900 } /* Keyword.Namespace */ 43 | .highlight .kp { color: #859900 } /* Keyword.Pseudo */ 44 | .highlight .kr { color: #268BD2 } /* Keyword.Reserved */ 45 | .highlight .kt { color: #DC322F } /* Keyword.Type */ 46 | .highlight .ld { color: #93A1A1 } /* Literal.Date */ 47 | .highlight .m { color: #2AA198 } /* Literal.Number */ 48 | .highlight .s { color: #2AA198 } /* Literal.String */ 49 | .highlight .na { color: #93A1A1 } /* Name.Attribute */ 50 | .highlight .nb { color: #B58900 } /* Name.Builtin */ 51 | .highlight .nc { color: #268BD2 } /* Name.Class */ 52 | .highlight .no { color: #CB4B16 } /* Name.Constant */ 53 | .highlight .nd { color: #268BD2 } /* Name.Decorator */ 54 | .highlight .ni { color: #CB4B16 } /* Name.Entity */ 55 | .highlight .ne { color: #CB4B16 } /* Name.Exception */ 56 | .highlight .nf { color: #268BD2 } /* Name.Function */ 57 | .highlight .nl { color: #93A1A1 } /* Name.Label */ 58 | .highlight .nn { color: #93A1A1 } /* Name.Namespace */ 59 | .highlight .nx { color: #93A1A1 } /* Name.Other */ 60 | .highlight .py { color: #93A1A1 } /* Name.Property */ 61 | .highlight .nt { color: #268BD2 } /* Name.Tag */ 62 | .highlight .nv { color: #268BD2 } /* Name.Variable */ 63 | .highlight .ow { color: #859900 } /* Operator.Word */ 64 | .highlight .w { color: #93A1A1 } /* Text.Whitespace */ 65 | 
.highlight .mf { color: #2AA198 } /* Literal.Number.Float */ 66 | .highlight .mh { color: #2AA198 } /* Literal.Number.Hex */ 67 | .highlight .mi { color: #2AA198 } /* Literal.Number.Integer */ 68 | .highlight .mo { color: #2AA198 } /* Literal.Number.Oct */ 69 | .highlight .sb { color: #586E75 } /* Literal.String.Backtick */ 70 | .highlight .sc { color: #2AA198 } /* Literal.String.Char */ 71 | .highlight .sd { color: #93A1A1 } /* Literal.String.Doc */ 72 | .highlight .s2 { color: #2AA198 } /* Literal.String.Double */ 73 | .highlight .se { color: #CB4B16 } /* Literal.String.Escape */ 74 | .highlight .sh { color: #93A1A1 } /* Literal.String.Heredoc */ 75 | .highlight .si { color: #2AA198 } /* Literal.String.Interpol */ 76 | .highlight .sx { color: #2AA198 } /* Literal.String.Other */ 77 | .highlight .sr { color: #DC322F } /* Literal.String.Regex */ 78 | .highlight .s1 { color: #2AA198 } /* Literal.String.Single */ 79 | .highlight .ss { color: #2AA198 } /* Literal.String.Symbol */ 80 | .highlight .bp { color: #268BD2 } /* Name.Builtin.Pseudo */ 81 | .highlight .vc { color: #268BD2 } /* Name.Variable.Class */ 82 | .highlight .vg { color: #268BD2 } /* Name.Variable.Global */ 83 | .highlight .vi { color: #268BD2 } /* Name.Variable.Instance */ 84 | .highlight .il { color: #2AA198 } /* Literal.Number.Integer.Long */ 85 | -------------------------------------------------------------------------------- /examples/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andychase/reparse/5f46cdd0fc4e239c0ddeca4b542e48a5ae95c508/examples/__init__.py -------------------------------------------------------------------------------- /examples/colortime/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andychase/reparse/5f46cdd0fc4e239c0ddeca4b542e48a5ae95c508/examples/colortime/__init__.py 
-------------------------------------------------------------------------------- /examples/colortime/colortime.py: -------------------------------------------------------------------------------- 1 | """ Example from docs: 2 | 3 | >>> colortime_parser("~ ~ ~ go to the store ~ buy green at 11pm! ~ ~") 4 | [('green', datetime.time(23, 0))] 5 | 6 | In this case the processing functions weren't specified but you 7 | still get a useful result as a default. 8 | >>> colortime_parser("~ ~ ~ Crazy 2pm green ~ ~") # doctest: +IGNORE_UNICODE 9 | [['green']] 10 | """ 11 | from __future__ import unicode_literals 12 | 13 | # Example stuff ----------------------------------------------------- 14 | # Have to add the parent directory just in case you 15 | # run this file in the demo directory without installing Reparse 16 | import sys 17 | 18 | sys.path.append('../..') 19 | 20 | # If file was imported, include that path 21 | path = "" 22 | if '__file__' in globals(): 23 | import os 24 | 25 | path = str(os.path.dirname(__file__)) 26 | if path: 27 | path += "/" 28 | 29 | # Reparse ---------------------------------------------------------- 30 | from examples.colortime.functions import functions 31 | import reparse 32 | 33 | colortime_parser = reparse.parser( 34 | parser_type=reparse.basic_parser, 35 | expressions_yaml_path=path + "expressions.yaml", 36 | patterns_yaml_path=path + "patterns.yaml", 37 | functions=functions 38 | ) 39 | 40 | if __name__ == "__main__": 41 | print(colortime_parser("~ ~ ~ go to the store ~ buy green at 11pm! 
~ ~")) 42 | -------------------------------------------------------------------------------- /examples/colortime/expressions.yaml: -------------------------------------------------------------------------------- 1 | # Base Regular Expression types to be used with patterns 2 | 3 | Time: 4 | Greg Burns: # This is the author of the expression 5 | # I name the expression after him to give him credit 6 | # (It's also difficult to come up with unique names for expressions). 7 | 8 | # Whitespace is ignored. Currently, regexes are run case-insensitively. 9 | Expression: | 10 | ([0-9]|[1][0-2]) \s? (am|pm) 11 | Description: > 12 | Simple hour and am/pm time. 13 | Matches: 14 | 8am | 8 am 15 | Non-Matches: 16 | 8a | 8 a | 8:00 am 17 | Groups: 18 | - Hour 19 | - AMPM 20 | 21 | Color: 22 | Basic Color: 23 | Expression: (Orange|Green) 24 | Matches: Orange | Green 25 | Non-Matches: Blue 26 | Groups: 27 | - Color 28 | -------------------------------------------------------------------------------- /examples/colortime/functions.py: -------------------------------------------------------------------------------- 1 | """ 2 | This file contains the parsing functions related to parsers 3 | 4 | The point is that each function can take in part of the results and 5 | use custom logic to parse a meaningful output from it. 6 | """ 7 | 8 | from datetime import time 9 | 10 | 11 | # --------------- Functions ------------------ 12 | def color_time(Color=None, Time=None): 13 | Color, Hour, Period = Color[0], int(Time[0]) % 12, Time[1].lower() 14 | if Period == 'pm': # Hour is 0-11 here, so 12pm becomes 12 and 12am stays 0 15 | Hour += 12 16 | Time = time(hour=Hour) 17 | 18 | return Color, Time 19 | 20 | # --------------- Function list ------------------ 21 | # This is the dictionary that is used by the Reparse 22 | # expression builder. The key is the same value used in the patterns.yaml 23 | # file under ``Function: ``. The value is a reference to the function. 
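The name-to-function mapping described in the comment above can be sketched in isolation. This is only an illustration of the lookup idea, under assumptions: `apply_function`, `shout`, and `functions_sketch` are hypothetical names made up for this example and are not part of Reparse, and the fallback of returning the raw groups is an assumed stand-in for whatever default behaviour the library provides.

```python
# Hypothetical sketch of a name -> callable table; not Reparse's actual code.

def shout(Color=None):
    # Receives the matched groups for "Color" as a list and returns one value.
    return Color[0].upper()


# Keys play the role of the names written under ``Function:`` in a patterns file.
functions_sketch = {"Shout": shout}


def apply_function(name, table, **groups):
    """Look up `name` in `table` and call it with the matched groups.

    If nothing is registered under `name`, return the groups unchanged
    (an assumed default, chosen only for this illustration).
    """
    func = table.get(name)
    if func is None:
        return groups
    return func(**groups)


result = apply_function("Shout", functions_sketch, Color=["green"])
fallback = apply_function("Missing", functions_sketch, Color=["green"])
```

Because the table holds references rather than names resolved at call time, the functions have to exist before the dictionary is built, which is why a dictionary like this is placed at the end of its module.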
24 | 25 | # Defining functions at each level is often useful, 26 | # but there are default functions to save time. 27 | 28 | # This has to go last in this file so that the functions are 29 | # actually defined in Python when we get to this point. 30 | functions = { 31 | # Groups 32 | # Expressions 33 | # Type 34 | # Patterns 35 | 'BasicColorTime': color_time 36 | } 37 | -------------------------------------------------------------------------------- /examples/colortime/patterns.yaml: -------------------------------------------------------------------------------- 1 | # Patterns: weave the regex expression groups together 2 | 3 | BasicColorTime: 4 | # Order: If you have multiple patterns, you can use this field to 'weight' them. 5 | # For example, if you have 3 matching patterns, the most complex one 6 | # is the one you probably want, so put the highest number in Order: on that one. 7 | Order: 1 8 | 9 | # Pattern: Also a regular expression 10 | # An angle-bracket tag names the Expression class you want at that position. 11 | # Each class contains one or more groups |ed together. 12 | Pattern: | 13 | (?: \s? at \s? )?