├── .gitignore ├── LICENSE.txt ├── docs.md ├── docs ├── Makefile ├── _static │ └── img │ │ ├── lango_example.png │ │ ├── rule_tree_1.png │ │ ├── rule_tree_2.png │ │ ├── rule_tree_3.png │ │ ├── sent_tree.png │ │ └── sent_tree_pp.png ├── conf.py ├── index.rst ├── installation.rst ├── lango.matcher.rst ├── lango.parser.rst ├── lango.rst ├── make.bat ├── matching.rst └── modules.rst ├── examples ├── matching.py ├── multimatch.py └── parser_input.py ├── lango ├── __init__.py ├── matcher.py └── parser.py ├── readme.md ├── requirements.txt └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | stanford-parser* 3 | stanford-corenlp* 4 | build* 5 | dist* 6 | Lango.egg-info* 7 | _build* 8 | _templates* 9 | *.pyc -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 2, June 1991 3 | 4 | Copyright (C) 1989, 1991 Free Software Foundation, Inc., 5 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA 6 | Everyone is permitted to copy and distribute verbatim copies 7 | of this license document, but changing it is not allowed. 8 | 9 | Preamble 10 | 11 | The licenses for most software are designed to take away your 12 | freedom to share and change it. By contrast, the GNU General Public 13 | License is intended to guarantee your freedom to share and change free 14 | software--to make sure the software is free for all its users. This 15 | General Public License applies to most of the Free Software 16 | Foundation's software and to any other program whose authors commit to 17 | using it. (Some other Free Software Foundation software is covered by 18 | the GNU Lesser General Public License instead.) You can apply it to 19 | your programs, too. 20 | 21 | When we speak of free software, we are referring to freedom, not 22 | price. 
Our General Public Licenses are designed to make sure that you 23 | have the freedom to distribute copies of free software (and charge for 24 | this service if you wish), that you receive source code or can get it 25 | if you want it, that you can change the software or use pieces of it 26 | in new free programs; and that you know you can do these things. 27 | 28 | To protect your rights, we need to make restrictions that forbid 29 | anyone to deny you these rights or to ask you to surrender the rights. 30 | These restrictions translate to certain responsibilities for you if you 31 | distribute copies of the software, or if you modify it. 32 | 33 | For example, if you distribute copies of such a program, whether 34 | gratis or for a fee, you must give the recipients all the rights that 35 | you have. You must make sure that they, too, receive or can get the 36 | source code. And you must show them these terms so they know their 37 | rights. 38 | 39 | We protect your rights with two steps: (1) copyright the software, and 40 | (2) offer you this license which gives you legal permission to copy, 41 | distribute and/or modify the software. 42 | 43 | Also, for each author's protection and ours, we want to make certain 44 | that everyone understands that there is no warranty for this free 45 | software. If the software is modified by someone else and passed on, we 46 | want its recipients to know that what they have is not the original, so 47 | that any problems introduced by others will not reflect on the original 48 | authors' reputations. 49 | 50 | Finally, any free program is threatened constantly by software 51 | patents. We wish to avoid the danger that redistributors of a free 52 | program will individually obtain patent licenses, in effect making the 53 | program proprietary. To prevent this, we have made it clear that any 54 | patent must be licensed for everyone's free use or not licensed at all. 
55 | 56 | The precise terms and conditions for copying, distribution and 57 | modification follow. 58 | 59 | GNU GENERAL PUBLIC LICENSE 60 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 61 | 62 | 0. This License applies to any program or other work which contains 63 | a notice placed by the copyright holder saying it may be distributed 64 | under the terms of this General Public License. The "Program", below, 65 | refers to any such program or work, and a "work based on the Program" 66 | means either the Program or any derivative work under copyright law: 67 | that is to say, a work containing the Program or a portion of it, 68 | either verbatim or with modifications and/or translated into another 69 | language. (Hereinafter, translation is included without limitation in 70 | the term "modification".) Each licensee is addressed as "you". 71 | 72 | Activities other than copying, distribution and modification are not 73 | covered by this License; they are outside its scope. The act of 74 | running the Program is not restricted, and the output from the Program 75 | is covered only if its contents constitute a work based on the 76 | Program (independent of having been made by running the Program). 77 | Whether that is true depends on what the Program does. 78 | 79 | 1. You may copy and distribute verbatim copies of the Program's 80 | source code as you receive it, in any medium, provided that you 81 | conspicuously and appropriately publish on each copy an appropriate 82 | copyright notice and disclaimer of warranty; keep intact all the 83 | notices that refer to this License and to the absence of any warranty; 84 | and give any other recipients of the Program a copy of this License 85 | along with the Program. 86 | 87 | You may charge a fee for the physical act of transferring a copy, and 88 | you may at your option offer warranty protection in exchange for a fee. 89 | 90 | 2.
You may modify your copy or copies of the Program or any portion 91 | of it, thus forming a work based on the Program, and copy and 92 | distribute such modifications or work under the terms of Section 1 93 | above, provided that you also meet all of these conditions: 94 | 95 | a) You must cause the modified files to carry prominent notices 96 | stating that you changed the files and the date of any change. 97 | 98 | b) You must cause any work that you distribute or publish, that in 99 | whole or in part contains or is derived from the Program or any 100 | part thereof, to be licensed as a whole at no charge to all third 101 | parties under the terms of this License. 102 | 103 | c) If the modified program normally reads commands interactively 104 | when run, you must cause it, when started running for such 105 | interactive use in the most ordinary way, to print or display an 106 | announcement including an appropriate copyright notice and a 107 | notice that there is no warranty (or else, saying that you provide 108 | a warranty) and that users may redistribute the program under 109 | these conditions, and telling the user how to view a copy of this 110 | License. (Exception: if the Program itself is interactive but 111 | does not normally print such an announcement, your work based on 112 | the Program is not required to print an announcement.) 113 | 114 | These requirements apply to the modified work as a whole. If 115 | identifiable sections of that work are not derived from the Program, 116 | and can be reasonably considered independent and separate works in 117 | themselves, then this License, and its terms, do not apply to those 118 | sections when you distribute them as separate works. 
But when you 119 | distribute the same sections as part of a whole which is a work based 120 | on the Program, the distribution of the whole must be on the terms of 121 | this License, whose permissions for other licensees extend to the 122 | entire whole, and thus to each and every part regardless of who wrote it. 123 | 124 | Thus, it is not the intent of this section to claim rights or contest 125 | your rights to work written entirely by you; rather, the intent is to 126 | exercise the right to control the distribution of derivative or 127 | collective works based on the Program. 128 | 129 | In addition, mere aggregation of another work not based on the Program 130 | with the Program (or with a work based on the Program) on a volume of 131 | a storage or distribution medium does not bring the other work under 132 | the scope of this License. 133 | 134 | 3. You may copy and distribute the Program (or a work based on it, 135 | under Section 2) in object code or executable form under the terms of 136 | Sections 1 and 2 above provided that you also do one of the following: 137 | 138 | a) Accompany it with the complete corresponding machine-readable 139 | source code, which must be distributed under the terms of Sections 140 | 1 and 2 above on a medium customarily used for software interchange; or, 141 | 142 | b) Accompany it with a written offer, valid for at least three 143 | years, to give any third party, for a charge no more than your 144 | cost of physically performing source distribution, a complete 145 | machine-readable copy of the corresponding source code, to be 146 | distributed under the terms of Sections 1 and 2 above on a medium 147 | customarily used for software interchange; or, 148 | 149 | c) Accompany it with the information you received as to the offer 150 | to distribute corresponding source code. 
(This alternative is 151 | allowed only for noncommercial distribution and only if you 152 | received the program in object code or executable form with such 153 | an offer, in accord with Subsection b above.) 154 | 155 | The source code for a work means the preferred form of the work for 156 | making modifications to it. For an executable work, complete source 157 | code means all the source code for all modules it contains, plus any 158 | associated interface definition files, plus the scripts used to 159 | control compilation and installation of the executable. However, as a 160 | special exception, the source code distributed need not include 161 | anything that is normally distributed (in either source or binary 162 | form) with the major components (compiler, kernel, and so on) of the 163 | operating system on which the executable runs, unless that component 164 | itself accompanies the executable. 165 | 166 | If distribution of executable or object code is made by offering 167 | access to copy from a designated place, then offering equivalent 168 | access to copy the source code from the same place counts as 169 | distribution of the source code, even though third parties are not 170 | compelled to copy the source along with the object code. 171 | 172 | 4. You may not copy, modify, sublicense, or distribute the Program 173 | except as expressly provided under this License. Any attempt 174 | otherwise to copy, modify, sublicense or distribute the Program is 175 | void, and will automatically terminate your rights under this License. 176 | However, parties who have received copies, or rights, from you under 177 | this License will not have their licenses terminated so long as such 178 | parties remain in full compliance. 179 | 180 | 5. You are not required to accept this License, since you have not 181 | signed it. However, nothing else grants you permission to modify or 182 | distribute the Program or its derivative works. 
These actions are 183 | prohibited by law if you do not accept this License. Therefore, by 184 | modifying or distributing the Program (or any work based on the 185 | Program), you indicate your acceptance of this License to do so, and 186 | all its terms and conditions for copying, distributing or modifying 187 | the Program or works based on it. 188 | 189 | 6. Each time you redistribute the Program (or any work based on the 190 | Program), the recipient automatically receives a license from the 191 | original licensor to copy, distribute or modify the Program subject to 192 | these terms and conditions. You may not impose any further 193 | restrictions on the recipients' exercise of the rights granted herein. 194 | You are not responsible for enforcing compliance by third parties to 195 | this License. 196 | 197 | 7. If, as a consequence of a court judgment or allegation of patent 198 | infringement or for any other reason (not limited to patent issues), 199 | conditions are imposed on you (whether by court order, agreement or 200 | otherwise) that contradict the conditions of this License, they do not 201 | excuse you from the conditions of this License. If you cannot 202 | distribute so as to satisfy simultaneously your obligations under this 203 | License and any other pertinent obligations, then as a consequence you 204 | may not distribute the Program at all. For example, if a patent 205 | license would not permit royalty-free redistribution of the Program by 206 | all those who receive copies directly or indirectly through you, then 207 | the only way you could satisfy both it and this License would be to 208 | refrain entirely from distribution of the Program. 209 | 210 | If any portion of this section is held invalid or unenforceable under 211 | any particular circumstance, the balance of the section is intended to 212 | apply and the section as a whole is intended to apply in other 213 | circumstances. 
214 | 215 | It is not the purpose of this section to induce you to infringe any 216 | patents or other property right claims or to contest validity of any 217 | such claims; this section has the sole purpose of protecting the 218 | integrity of the free software distribution system, which is 219 | implemented by public license practices. Many people have made 220 | generous contributions to the wide range of software distributed 221 | through that system in reliance on consistent application of that 222 | system; it is up to the author/donor to decide if he or she is willing 223 | to distribute software through any other system and a licensee cannot 224 | impose that choice. 225 | 226 | This section is intended to make thoroughly clear what is believed to 227 | be a consequence of the rest of this License. 228 | 229 | 8. If the distribution and/or use of the Program is restricted in 230 | certain countries either by patents or by copyrighted interfaces, the 231 | original copyright holder who places the Program under this License 232 | may add an explicit geographical distribution limitation excluding 233 | those countries, so that distribution is permitted only in or among 234 | countries not thus excluded. In such case, this License incorporates 235 | the limitation as if written in the body of this License. 236 | 237 | 9. The Free Software Foundation may publish revised and/or new versions 238 | of the General Public License from time to time. Such new versions will 239 | be similar in spirit to the present version, but may differ in detail to 240 | address new problems or concerns. 241 | 242 | Each version is given a distinguishing version number. If the Program 243 | specifies a version number of this License which applies to it and "any 244 | later version", you have the option of following the terms and conditions 245 | either of that version or of any later version published by the Free 246 | Software Foundation. 
If the Program does not specify a version number of 247 | this License, you may choose any version ever published by the Free Software 248 | Foundation. 249 | 250 | 10. If you wish to incorporate parts of the Program into other free 251 | programs whose distribution conditions are different, write to the author 252 | to ask for permission. For software which is copyrighted by the Free 253 | Software Foundation, write to the Free Software Foundation; we sometimes 254 | make exceptions for this. Our decision will be guided by the two goals 255 | of preserving the free status of all derivatives of our free software and 256 | of promoting the sharing and reuse of software generally. 257 | 258 | NO WARRANTY 259 | 260 | 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY 261 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN 262 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES 263 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED 264 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 265 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS 266 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE 267 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, 268 | REPAIR OR CORRECTION. 269 | 270 | 12. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 271 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR 272 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, 273 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING 274 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED 275 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY 276 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER 277 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE 278 | POSSIBILITY OF SUCH DAMAGES. 279 | 280 | END OF TERMS AND CONDITIONS 281 | 282 | How to Apply These Terms to Your New Programs 283 | 284 | If you develop a new program, and you want it to be of the greatest 285 | possible use to the public, the best way to achieve this is to make it 286 | free software which everyone can redistribute and change under these terms. 287 | 288 | To do so, attach the following notices to the program. It is safest 289 | to attach them to the start of each source file to most effectively 290 | convey the exclusion of warranty; and each file should have at least 291 | the "copyright" line and a pointer to where the full notice is found. 292 | 293 | <one line to give the program's name and a brief idea of what it does.> 294 | Copyright (C) <year> <name of author> 295 | 296 | This program is free software; you can redistribute it and/or modify 297 | it under the terms of the GNU General Public License as published by 298 | the Free Software Foundation; either version 2 of the License, or 299 | (at your option) any later version. 300 | 301 | This program is distributed in the hope that it will be useful, 302 | but WITHOUT ANY WARRANTY; without even the implied warranty of 303 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 304 | GNU General Public License for more details.
305 | 306 | You should have received a copy of the GNU General Public License along 307 | with this program; if not, write to the Free Software Foundation, Inc., 308 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 309 | 310 | Also add information on how to contact you by electronic and paper mail. 311 | 312 | If the program is interactive, make it output a short notice like this 313 | when it starts in an interactive mode: 314 | 315 | Gnomovision version 69, Copyright (C) year name of author 316 | Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 317 | This is free software, and you are welcome to redistribute it 318 | under certain conditions; type `show c' for details. 319 | 320 | The hypothetical commands `show w' and `show c' should show the appropriate 321 | parts of the General Public License. Of course, the commands you use may 322 | be called something other than `show w' and `show c'; they could even be 323 | mouse-clicks or menu items--whatever suits your program. 324 | 325 | You should also get your employer (if you work as a programmer) or your 326 | school, if any, to sign a "copyright disclaimer" for the program, if 327 | necessary. Here is a sample; alter the names: 328 | 329 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program 330 | `Gnomovision' (which makes passes at compilers) written by James Hacker. 331 | 332 | <signature of Ty Coon>, 1 April 1989 333 | Ty Coon, President of Vice 334 | 335 | This General Public License does not permit incorporating your program into 336 | proprietary programs. If your program is a subroutine library, you may 337 | consider it more useful to permit linking proprietary applications with the 338 | library. If this is what you want to do, use the GNU Lesser General 339 | Public License instead of this License.
340 | -------------------------------------------------------------------------------- /docs.md: -------------------------------------------------------------------------------- 1 | # Docs 2 | 3 | Pip Installs 4 | ``` 5 | sphinx-autobuild==0.6.0 6 | sphinx-rtd-theme==0.1.9 7 | sphinxcontrib-napoleon==0.5.0 8 | ``` 9 | 10 | Generate docs 11 | ``` 12 | sphinx-apidoc -f -e -o docs lango 13 | cd docs 14 | make html 15 | ``` 16 | 17 | ## Development 18 | 19 | ``` 20 | python setup.py develop 21 | ``` -------------------------------------------------------------------------------- /docs/Makefile: -------------------------------------------------------------------------------- 1 | # Makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = sphinx-build 7 | PAPER = 8 | BUILDDIR = _build 9 | 10 | # User-friendly check for sphinx-build 11 | ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) 12 | $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/) 13 | endif 14 | 15 | # Internal variables. 16 | PAPEROPT_a4 = -D latex_paper_size=a4 17 | PAPEROPT_letter = -D latex_paper_size=letter 18 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 19 | # the i18n builder cannot share the environment and doctrees with the others 20 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 
21 | 22 | .PHONY: help 23 | help: 24 | @echo "Please use \`make <target>' where <target> is one of" 25 | @echo " html to make standalone HTML files" 26 | @echo " dirhtml to make HTML files named index.html in directories" 27 | @echo " singlehtml to make a single large HTML file" 28 | @echo " pickle to make pickle files" 29 | @echo " json to make JSON files" 30 | @echo " htmlhelp to make HTML files and a HTML help project" 31 | @echo " qthelp to make HTML files and a qthelp project" 32 | @echo " applehelp to make an Apple Help Book" 33 | @echo " devhelp to make HTML files and a Devhelp project" 34 | @echo " epub to make an epub" 35 | @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" 36 | @echo " latexpdf to make LaTeX files and run them through pdflatex" 37 | @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" 38 | @echo " text to make text files" 39 | @echo " man to make manual pages" 40 | @echo " texinfo to make Texinfo files" 41 | @echo " info to make Texinfo files and run them through makeinfo" 42 | @echo " gettext to make PO message catalogs" 43 | @echo " changes to make an overview of all changed/added/deprecated items" 44 | @echo " xml to make Docutils-native XML files" 45 | @echo " pseudoxml to make pseudoxml-XML files for display purposes" 46 | @echo " linkcheck to check all external links for integrity" 47 | @echo " doctest to run all doctests embedded in the documentation (if enabled)" 48 | @echo " coverage to run coverage check of the documentation (if enabled)" 49 | 50 | .PHONY: clean 51 | clean: 52 | rm -rf $(BUILDDIR)/* 53 | 54 | .PHONY: html 55 | html: 56 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html 57 | @echo 58 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." 59 | 60 | .PHONY: dirhtml 61 | dirhtml: 62 | $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml 63 | @echo 64 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
65 | 66 | .PHONY: singlehtml 67 | singlehtml: 68 | $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml 69 | @echo 70 | @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." 71 | 72 | .PHONY: pickle 73 | pickle: 74 | $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle 75 | @echo 76 | @echo "Build finished; now you can process the pickle files." 77 | 78 | .PHONY: json 79 | json: 80 | $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json 81 | @echo 82 | @echo "Build finished; now you can process the JSON files." 83 | 84 | .PHONY: htmlhelp 85 | htmlhelp: 86 | $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp 87 | @echo 88 | @echo "Build finished; now you can run HTML Help Workshop with the" \ 89 | ".hhp project file in $(BUILDDIR)/htmlhelp." 90 | 91 | .PHONY: qthelp 92 | qthelp: 93 | $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp 94 | @echo 95 | @echo "Build finished; now you can run "qcollectiongenerator" with the" \ 96 | ".qhcp project file in $(BUILDDIR)/qthelp, like this:" 97 | @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/lango.qhcp" 98 | @echo "To view the help file:" 99 | @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/lango.qhc" 100 | 101 | .PHONY: applehelp 102 | applehelp: 103 | $(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp 104 | @echo 105 | @echo "Build finished. The help book is in $(BUILDDIR)/applehelp." 106 | @echo "N.B. You won't be able to view it unless you put it in" \ 107 | "~/Library/Documentation/Help or install it in your application" \ 108 | "bundle." 109 | 110 | .PHONY: devhelp 111 | devhelp: 112 | $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp 113 | @echo 114 | @echo "Build finished." 
115 | @echo "To view the help file:" 116 | @echo "# mkdir -p $$HOME/.local/share/devhelp/lango" 117 | @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/lango" 118 | @echo "# devhelp" 119 | 120 | .PHONY: epub 121 | epub: 122 | $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub 123 | @echo 124 | @echo "Build finished. The epub file is in $(BUILDDIR)/epub." 125 | 126 | .PHONY: latex 127 | latex: 128 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 129 | @echo 130 | @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." 131 | @echo "Run \`make' in that directory to run these through (pdf)latex" \ 132 | "(use \`make latexpdf' here to do that automatically)." 133 | 134 | .PHONY: latexpdf 135 | latexpdf: 136 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 137 | @echo "Running LaTeX files through pdflatex..." 138 | $(MAKE) -C $(BUILDDIR)/latex all-pdf 139 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 140 | 141 | .PHONY: latexpdfja 142 | latexpdfja: 143 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 144 | @echo "Running LaTeX files through platex and dvipdfmx..." 145 | $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja 146 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 147 | 148 | .PHONY: text 149 | text: 150 | $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text 151 | @echo 152 | @echo "Build finished. The text files are in $(BUILDDIR)/text." 153 | 154 | .PHONY: man 155 | man: 156 | $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man 157 | @echo 158 | @echo "Build finished. The manual pages are in $(BUILDDIR)/man." 159 | 160 | .PHONY: texinfo 161 | texinfo: 162 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 163 | @echo 164 | @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." 165 | @echo "Run \`make' in that directory to run these through makeinfo" \ 166 | "(use \`make info' here to do that automatically)." 
167 | 168 | .PHONY: info 169 | info: 170 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 171 | @echo "Running Texinfo files through makeinfo..." 172 | make -C $(BUILDDIR)/texinfo info 173 | @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." 174 | 175 | .PHONY: gettext 176 | gettext: 177 | $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale 178 | @echo 179 | @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." 180 | 181 | .PHONY: changes 182 | changes: 183 | $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes 184 | @echo 185 | @echo "The overview file is in $(BUILDDIR)/changes." 186 | 187 | .PHONY: linkcheck 188 | linkcheck: 189 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck 190 | @echo 191 | @echo "Link check complete; look for any errors in the above output " \ 192 | "or in $(BUILDDIR)/linkcheck/output.txt." 193 | 194 | .PHONY: doctest 195 | doctest: 196 | $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest 197 | @echo "Testing of doctests in the sources finished, look at the " \ 198 | "results in $(BUILDDIR)/doctest/output.txt." 199 | 200 | .PHONY: coverage 201 | coverage: 202 | $(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage 203 | @echo "Testing of coverage in the sources finished, look at the " \ 204 | "results in $(BUILDDIR)/coverage/python.txt." 205 | 206 | .PHONY: xml 207 | xml: 208 | $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml 209 | @echo 210 | @echo "Build finished. The XML files are in $(BUILDDIR)/xml." 211 | 212 | .PHONY: pseudoxml 213 | pseudoxml: 214 | $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml 215 | @echo 216 | @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml." 
217 | -------------------------------------------------------------------------------- /docs/_static/img/lango_example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ayoungprogrammer/Lango/0c4284c153abc2d8de4b03a86731bd84385e6afa/docs/_static/img/lango_example.png -------------------------------------------------------------------------------- /docs/_static/img/rule_tree_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ayoungprogrammer/Lango/0c4284c153abc2d8de4b03a86731bd84385e6afa/docs/_static/img/rule_tree_1.png -------------------------------------------------------------------------------- /docs/_static/img/rule_tree_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ayoungprogrammer/Lango/0c4284c153abc2d8de4b03a86731bd84385e6afa/docs/_static/img/rule_tree_2.png -------------------------------------------------------------------------------- /docs/_static/img/rule_tree_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ayoungprogrammer/Lango/0c4284c153abc2d8de4b03a86731bd84385e6afa/docs/_static/img/rule_tree_3.png -------------------------------------------------------------------------------- /docs/_static/img/sent_tree.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ayoungprogrammer/Lango/0c4284c153abc2d8de4b03a86731bd84385e6afa/docs/_static/img/sent_tree.png -------------------------------------------------------------------------------- /docs/_static/img/sent_tree_pp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ayoungprogrammer/Lango/0c4284c153abc2d8de4b03a86731bd84385e6afa/docs/_static/img/sent_tree_pp.png 
-------------------------------------------------------------------------------- /docs/conf.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # lango documentation build configuration file, created by 4 | # sphinx-quickstart on Wed May 25 00:07:47 2016. 5 | # 6 | # This file is execfile()d with the current directory set to its 7 | # containing dir. 8 | # 9 | # Note that not all possible configuration values are present in this 10 | # autogenerated file. 11 | # 12 | # All configuration values have a default; values that are commented out 13 | # serve to show the default. 14 | 15 | import sys 16 | import os 17 | import lango 18 | 19 | # If extensions (or modules to document with autodoc) are in another directory, 20 | # add these directories to sys.path here. If the directory is relative to the 21 | # documentation root, use os.path.abspath to make it absolute, like shown here. 22 | #sys.path.insert(0, os.path.abspath('.')) 23 | 24 | # -- General configuration ------------------------------------------------ 25 | 26 | # If your documentation needs a minimal Sphinx version, state it here. 27 | #needs_sphinx = '1.0' 28 | 29 | # Add any Sphinx extension module names here, as strings. They can be 30 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom 31 | # ones. 32 | extensions = [ 33 | 'sphinx.ext.autodoc', 34 | 'sphinxcontrib.napoleon', 35 | 'sphinx.ext.todo', 36 | 'sphinx.ext.viewcode', 37 | ] 38 | 39 | # Add any paths that contain templates here, relative to this directory. 40 | templates_path = ['_templates'] 41 | 42 | # The suffix(es) of source filenames. 43 | # You can specify multiple suffix as a list of string: 44 | # source_suffix = ['.rst', '.md'] 45 | source_suffix = '.rst' 46 | 47 | # The encoding of source files. 48 | #source_encoding = 'utf-8-sig' 49 | 50 | # The master toctree document. 51 | master_doc = 'index' 52 | 53 | # General information about the project. 
54 | project = 'lango' 55 | copyright = '2016, Michael Young' 56 | author = 'Michael Young' 57 | 58 | # The version info for the project you're documenting, acts as replacement for 59 | # |version| and |release|, also used in various other places throughout the 60 | # built documents. 61 | # 62 | # The short X.Y version. 63 | version = lango.__version__ 64 | # The full version, including alpha/beta/rc tags. 65 | release = version 66 | 67 | # The language for content autogenerated by Sphinx. Refer to documentation 68 | # for a list of supported languages. 69 | # 70 | # This is also used if you do content translation via gettext catalogs. 71 | # Usually you set "language" from the command line for these cases. 72 | language = 'en' 73 | 74 | # There are two options for replacing |today|: either, you set today to some 75 | # non-false value, then it is used: 76 | #today = '' 77 | # Else, today_fmt is used as the format for a strftime call. 78 | #today_fmt = '%B %d, %Y' 79 | 80 | # List of patterns, relative to source directory, that match files and 81 | # directories to ignore when looking for source files. 82 | exclude_patterns = ['_build'] 83 | 84 | # The reST default role (used for this markup: `text`) to use for all 85 | # documents. 86 | #default_role = None 87 | 88 | # If true, '()' will be appended to :func: etc. cross-reference text. 89 | #add_function_parentheses = True 90 | 91 | # If true, the current module name will be prepended to all description 92 | # unit titles (such as .. function::). 93 | #add_module_names = True 94 | 95 | # If true, sectionauthor and moduleauthor directives will be shown in the 96 | # output. They are ignored by default. 97 | #show_authors = False 98 | 99 | # The name of the Pygments (syntax highlighting) style to use. 100 | pygments_style = 'sphinx' 101 | 102 | # A list of ignored prefixes for module index sorting. 
103 | #modindex_common_prefix = [] 104 | 105 | # If true, keep warnings as "system message" paragraphs in the built documents. 106 | #keep_warnings = False 107 | 108 | # If true, `todo` and `todoList` produce output, else they produce nothing. 109 | todo_include_todos = True 110 | 111 | 112 | # -- Options for HTML output ---------------------------------------------- 113 | 114 | # The theme to use for HTML and HTML Help pages. See the documentation for 115 | # a list of builtin themes. 116 | html_theme = 'classic' 117 | 118 | # Theme options are theme-specific and customize the look and feel of a theme 119 | # further. For a list of options available for each theme, see the 120 | # documentation. 121 | #html_theme_options = {} 122 | 123 | # Add any paths that contain custom themes here, relative to this directory. 124 | #html_theme_path = [] 125 | 126 | # The name for this set of Sphinx documents. If None, it defaults to 127 | # " v documentation". 128 | #html_title = None 129 | 130 | # A shorter title for the navigation bar. Default is the same as html_title. 131 | #html_short_title = None 132 | 133 | # The name of an image file (relative to this directory) to place at the top 134 | # of the sidebar. 135 | #html_logo = None 136 | 137 | # The name of an image file (within the static path) to use as favicon of the 138 | # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 139 | # pixels large. 140 | #html_favicon = None 141 | 142 | # Add any paths that contain custom static files (such as style sheets) here, 143 | # relative to this directory. They are copied after the builtin static files, 144 | # so a file named "default.css" will overwrite the builtin "default.css". 145 | html_static_path = ['_static'] 146 | 147 | # Add any extra paths that contain custom files (such as robots.txt or 148 | # .htaccess) here, relative to this directory. These files are copied 149 | # directly to the root of the documentation. 
150 | #html_extra_path = [] 151 | 152 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, 153 | # using the given strftime format. 154 | #html_last_updated_fmt = '%b %d, %Y' 155 | 156 | # If true, SmartyPants will be used to convert quotes and dashes to 157 | # typographically correct entities. 158 | #html_use_smartypants = True 159 | 160 | # Custom sidebar templates, maps document names to template names. 161 | #html_sidebars = {} 162 | 163 | # Additional templates that should be rendered to pages, maps page names to 164 | # template names. 165 | #html_additional_pages = {} 166 | 167 | # If false, no module index is generated. 168 | #html_domain_indices = True 169 | 170 | # If false, no index is generated. 171 | #html_use_index = True 172 | 173 | # If true, the index is split into individual pages for each letter. 174 | #html_split_index = False 175 | 176 | # If true, links to the reST sources are added to the pages. 177 | #html_show_sourcelink = True 178 | 179 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. 180 | #html_show_sphinx = True 181 | 182 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. 183 | #html_show_copyright = True 184 | 185 | # If true, an OpenSearch description file will be output, and all pages will 186 | # contain a tag referring to it. The value of this option must be the 187 | # base URL from which the finished HTML is served. 188 | #html_use_opensearch = '' 189 | 190 | # This is the file name suffix for HTML files (e.g. ".xhtml"). 191 | #html_file_suffix = None 192 | 193 | # Language to be used for generating the HTML full-text search index. 194 | # Sphinx supports the following languages: 195 | # 'da', 'de', 'en', 'es', 'fi', 'fr', 'hu', 'it', 'ja' 196 | # 'nl', 'no', 'pt', 'ro', 'ru', 'sv', 'tr' 197 | #html_search_language = 'en' 198 | 199 | # A dictionary with options for the search language support, empty by default. 
200 | # Now only 'ja' uses this config value 201 | #html_search_options = {'type': 'default'} 202 | 203 | # The name of a javascript file (relative to the configuration directory) that 204 | # implements a search results scorer. If empty, the default will be used. 205 | #html_search_scorer = 'scorer.js' 206 | 207 | # Output file base name for HTML help builder. 208 | htmlhelp_basename = 'langodoc' 209 | 210 | # -- Options for LaTeX output --------------------------------------------- 211 | 212 | latex_elements = { 213 | # The paper size ('letterpaper' or 'a4paper'). 214 | #'papersize': 'letterpaper', 215 | 216 | # The font size ('10pt', '11pt' or '12pt'). 217 | #'pointsize': '10pt', 218 | 219 | # Additional stuff for the LaTeX preamble. 220 | #'preamble': '', 221 | 222 | # Latex figure (float) alignment 223 | #'figure_align': 'htbp', 224 | } 225 | 226 | # Grouping the document tree into LaTeX files. List of tuples 227 | # (source start file, target name, title, 228 | # author, documentclass [howto, manual, or own class]). 229 | latex_documents = [ 230 | (master_doc, 'lango.tex', 'lango Documentation', 231 | 'Michael Young', 'manual'), 232 | ] 233 | 234 | # The name of an image file (relative to this directory) to place at the top of 235 | # the title page. 236 | #latex_logo = None 237 | 238 | # For "manual" documents, if this is true, then toplevel headings are parts, 239 | # not chapters. 240 | #latex_use_parts = False 241 | 242 | # If true, show page references after internal links. 243 | #latex_show_pagerefs = False 244 | 245 | # If true, show URL addresses after external links. 246 | #latex_show_urls = False 247 | 248 | # Documents to append as an appendix to all manuals. 249 | #latex_appendices = [] 250 | 251 | # If false, no module index is generated. 252 | #latex_domain_indices = True 253 | 254 | 255 | # -- Options for manual page output --------------------------------------- 256 | 257 | # One entry per manual page. 
List of tuples 258 | # (source start file, name, description, authors, manual section). 259 | man_pages = [ 260 | (master_doc, 'lango', 'lango Documentation', 261 | [author], 1) 262 | ] 263 | 264 | # If true, show URL addresses after external links. 265 | #man_show_urls = False 266 | 267 | 268 | # -- Options for Texinfo output ------------------------------------------- 269 | 270 | # Grouping the document tree into Texinfo files. List of tuples 271 | # (source start file, target name, title, author, 272 | # dir menu entry, description, category) 273 | texinfo_documents = [ 274 | (master_doc, 'lango', 'lango Documentation', 275 | author, 'lango', 'One line description of project.', 276 | 'Miscellaneous'), 277 | ] 278 | 279 | # Documents to append as an appendix to all manuals. 280 | #texinfo_appendices = [] 281 | 282 | # If false, no module index is generated. 283 | #texinfo_domain_indices = True 284 | 285 | # How to display URL addresses: 'footnote', 'no', or 'inline'. 286 | #texinfo_show_urls = 'footnote' 287 | 288 | # If true, do not generate a @detailmenu in the "Top" node's menu. 289 | #texinfo_no_detailmenu = False 290 | 291 | 292 | # -- Options for Epub output ---------------------------------------------- 293 | 294 | # Bibliographic Dublin Core info. 295 | epub_title = project 296 | epub_author = author 297 | epub_publisher = author 298 | epub_copyright = copyright 299 | 300 | # The basename for the epub file. It defaults to the project name. 301 | #epub_basename = project 302 | 303 | # The HTML theme for the epub output. Since the default themes are not 304 | # optimized for small screen space, using the same theme for HTML and epub 305 | # output is usually not wise. This defaults to 'epub', a theme designed to save 306 | # visual space. 307 | #epub_theme = 'epub' 308 | 309 | # The language of the text. It defaults to the language option 310 | # or 'en' if the language is not set. 311 | #epub_language = '' 312 | 313 | # The scheme of the identifier. 
Typical schemes are ISBN or URL. 314 | #epub_scheme = '' 315 | 316 | # The unique identifier of the text. This can be a ISBN number 317 | # or the project homepage. 318 | #epub_identifier = '' 319 | 320 | # A unique identification for the text. 321 | #epub_uid = '' 322 | 323 | # A tuple containing the cover image and cover page html template filenames. 324 | #epub_cover = () 325 | 326 | # A sequence of (type, uri, title) tuples for the guide element of content.opf. 327 | #epub_guide = () 328 | 329 | # HTML files that should be inserted before the pages created by sphinx. 330 | # The format is a list of tuples containing the path and title. 331 | #epub_pre_files = [] 332 | 333 | # HTML files that should be inserted after the pages created by sphinx. 334 | # The format is a list of tuples containing the path and title. 335 | #epub_post_files = [] 336 | 337 | # A list of files that should not be packed into the epub file. 338 | epub_exclude_files = ['search.html'] 339 | 340 | # The depth of the table of contents in toc.ncx. 341 | #epub_tocdepth = 3 342 | 343 | # Allow duplicate toc entries. 344 | #epub_tocdup = True 345 | 346 | # Choose between 'default' and 'includehidden'. 347 | #epub_tocscope = 'default' 348 | 349 | # Fix unsupported image types using the Pillow. 350 | #epub_fix_images = False 351 | 352 | # Scale large images. 353 | #epub_max_image_width = 0 354 | 355 | # How to display URL addresses: 'footnote', 'no', or 'inline'. 356 | #epub_show_urls = 'inline' 357 | 358 | # If false, no index is generated. 359 | #epub_use_index = True 360 | -------------------------------------------------------------------------------- /docs/index.rst: -------------------------------------------------------------------------------- 1 | .. lango documentation master file, created by 2 | sphinx-quickstart on Wed May 25 00:07:47 2016. 3 | You can adapt this file completely to your liking, but it should at least 4 | contain the root `toctree` directive. 
5 | 6 | Welcome to Lango's documentation! 7 | ================================= 8 | 9 | .. toctree:: 10 | 11 | installation 12 | matching 13 | 14 | Reference 15 | ========== 16 | 17 | .. toctree:: 18 | :maxdepth: 4 19 | 20 | lango 21 | 22 | 23 | Indices and tables 24 | ================== 25 | 26 | * :ref:`genindex` 27 | * :ref:`modindex` 28 | * :ref:`search` 29 | 30 | -------------------------------------------------------------------------------- /docs/installation.rst: -------------------------------------------------------------------------------- 1 | Installation 2 | ================================= 3 | 4 | Install package with pip 5 | ~~~~~~~~~~~~~~~~~~~~~~~~ 6 | 7 | :: 8 | 9 | pip install lango 10 | 11 | Download Stanford CoreNLP 12 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 13 | 14 | Make sure you have Java installed for the Stanford CoreNLP to work. 15 | 16 | `Download Stanford CoreNLP`_ 17 | 18 | Extract to any folder 19 | 20 | Run Server 21 | ~~~~~~~~~~~~~~~~~~~~~~~~~ 22 | 23 | In extracted folder, run the following command to start the server: 24 | 25 | :: 26 | 27 | java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer 28 | 29 | .. _Download Stanford CoreNLP: http://stanfordnlp.github.io/CoreNLP/#download -------------------------------------------------------------------------------- /docs/lango.matcher.rst: -------------------------------------------------------------------------------- 1 | lango.matcher module 2 | ==================== 3 | 4 | .. automodule:: lango.matcher 5 | :members: 6 | :undoc-members: 7 | :show-inheritance: 8 | -------------------------------------------------------------------------------- /docs/lango.parser.rst: -------------------------------------------------------------------------------- 1 | lango.parser module 2 | =================== 3 | 4 | .. 
automodule:: lango.parser 5 | :members: 6 | :undoc-members: 7 | :show-inheritance: 8 | -------------------------------------------------------------------------------- /docs/lango.rst: -------------------------------------------------------------------------------- 1 | lango package 2 | ============= 3 | 4 | Submodules 5 | ---------- 6 | 7 | .. toctree:: 8 | 9 | lango.matcher 10 | lango.parser 11 | 12 | Module contents 13 | --------------- 14 | 15 | .. automodule:: lango 16 | :members: 17 | :undoc-members: 18 | :show-inheritance: 19 | -------------------------------------------------------------------------------- /docs/make.bat: -------------------------------------------------------------------------------- 1 | @ECHO OFF 2 | 3 | REM Command file for Sphinx documentation 4 | 5 | if "%SPHINXBUILD%" == "" ( 6 | set SPHINXBUILD=sphinx-build 7 | ) 8 | set BUILDDIR=_build 9 | set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% . 10 | set I18NSPHINXOPTS=%SPHINXOPTS% . 11 | if NOT "%PAPER%" == "" ( 12 | set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% 13 | set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% 14 | ) 15 | 16 | if "%1" == "" goto help 17 | 18 | if "%1" == "help" ( 19 | :help 20 | echo.Please use `make ^<target^>` where ^<target^> is one of 21 | echo. html to make standalone HTML files 22 | echo. dirhtml to make HTML files named index.html in directories 23 | echo. singlehtml to make a single large HTML file 24 | echo. pickle to make pickle files 25 | echo. json to make JSON files 26 | echo. htmlhelp to make HTML files and a HTML help project 27 | echo. qthelp to make HTML files and a qthelp project 28 | echo. devhelp to make HTML files and a Devhelp project 29 | echo. epub to make an epub 30 | echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter 31 | echo. text to make text files 32 | echo. man to make manual pages 33 | echo. texinfo to make Texinfo files 34 | echo. gettext to make PO message catalogs 35 | echo. 
changes to make an overview over all changed/added/deprecated items 36 | echo. xml to make Docutils-native XML files 37 | echo. pseudoxml to make pseudoxml-XML files for display purposes 38 | echo. linkcheck to check all external links for integrity 39 | echo. doctest to run all doctests embedded in the documentation if enabled 40 | echo. coverage to run coverage check of the documentation if enabled 41 | goto end 42 | ) 43 | 44 | if "%1" == "clean" ( 45 | for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i 46 | del /q /s %BUILDDIR%\* 47 | goto end 48 | ) 49 | 50 | 51 | REM Check if sphinx-build is available and fallback to Python version if any 52 | %SPHINXBUILD% 1>NUL 2>NUL 53 | if errorlevel 9009 goto sphinx_python 54 | goto sphinx_ok 55 | 56 | :sphinx_python 57 | 58 | set SPHINXBUILD=python -m sphinx.__init__ 59 | %SPHINXBUILD% 2> nul 60 | if errorlevel 9009 ( 61 | echo. 62 | echo.The 'sphinx-build' command was not found. Make sure you have Sphinx 63 | echo.installed, then set the SPHINXBUILD environment variable to point 64 | echo.to the full path of the 'sphinx-build' executable. Alternatively you 65 | echo.may add the Sphinx directory to PATH. 66 | echo. 67 | echo.If you don't have Sphinx installed, grab it from 68 | echo.http://sphinx-doc.org/ 69 | exit /b 1 70 | ) 71 | 72 | :sphinx_ok 73 | 74 | 75 | if "%1" == "html" ( 76 | %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html 77 | if errorlevel 1 exit /b 1 78 | echo. 79 | echo.Build finished. The HTML pages are in %BUILDDIR%/html. 80 | goto end 81 | ) 82 | 83 | if "%1" == "dirhtml" ( 84 | %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml 85 | if errorlevel 1 exit /b 1 86 | echo. 87 | echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. 88 | goto end 89 | ) 90 | 91 | if "%1" == "singlehtml" ( 92 | %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml 93 | if errorlevel 1 exit /b 1 94 | echo. 95 | echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. 
96 | goto end 97 | ) 98 | 99 | if "%1" == "pickle" ( 100 | %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle 101 | if errorlevel 1 exit /b 1 102 | echo. 103 | echo.Build finished; now you can process the pickle files. 104 | goto end 105 | ) 106 | 107 | if "%1" == "json" ( 108 | %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json 109 | if errorlevel 1 exit /b 1 110 | echo. 111 | echo.Build finished; now you can process the JSON files. 112 | goto end 113 | ) 114 | 115 | if "%1" == "htmlhelp" ( 116 | %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp 117 | if errorlevel 1 exit /b 1 118 | echo. 119 | echo.Build finished; now you can run HTML Help Workshop with the ^ 120 | .hhp project file in %BUILDDIR%/htmlhelp. 121 | goto end 122 | ) 123 | 124 | if "%1" == "qthelp" ( 125 | %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp 126 | if errorlevel 1 exit /b 1 127 | echo. 128 | echo.Build finished; now you can run "qcollectiongenerator" with the ^ 129 | .qhcp project file in %BUILDDIR%/qthelp, like this: 130 | echo.^> qcollectiongenerator %BUILDDIR%\qthelp\lango.qhcp 131 | echo.To view the help file: 132 | echo.^> assistant -collectionFile %BUILDDIR%\qthelp\lango.ghc 133 | goto end 134 | ) 135 | 136 | if "%1" == "devhelp" ( 137 | %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp 138 | if errorlevel 1 exit /b 1 139 | echo. 140 | echo.Build finished. 141 | goto end 142 | ) 143 | 144 | if "%1" == "epub" ( 145 | %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub 146 | if errorlevel 1 exit /b 1 147 | echo. 148 | echo.Build finished. The epub file is in %BUILDDIR%/epub. 149 | goto end 150 | ) 151 | 152 | if "%1" == "latex" ( 153 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex 154 | if errorlevel 1 exit /b 1 155 | echo. 156 | echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. 
157 | goto end 158 | ) 159 | 160 | if "%1" == "latexpdf" ( 161 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex 162 | cd %BUILDDIR%/latex 163 | make all-pdf 164 | cd %~dp0 165 | echo. 166 | echo.Build finished; the PDF files are in %BUILDDIR%/latex. 167 | goto end 168 | ) 169 | 170 | if "%1" == "latexpdfja" ( 171 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex 172 | cd %BUILDDIR%/latex 173 | make all-pdf-ja 174 | cd %~dp0 175 | echo. 176 | echo.Build finished; the PDF files are in %BUILDDIR%/latex. 177 | goto end 178 | ) 179 | 180 | if "%1" == "text" ( 181 | %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text 182 | if errorlevel 1 exit /b 1 183 | echo. 184 | echo.Build finished. The text files are in %BUILDDIR%/text. 185 | goto end 186 | ) 187 | 188 | if "%1" == "man" ( 189 | %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man 190 | if errorlevel 1 exit /b 1 191 | echo. 192 | echo.Build finished. The manual pages are in %BUILDDIR%/man. 193 | goto end 194 | ) 195 | 196 | if "%1" == "texinfo" ( 197 | %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo 198 | if errorlevel 1 exit /b 1 199 | echo. 200 | echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. 201 | goto end 202 | ) 203 | 204 | if "%1" == "gettext" ( 205 | %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale 206 | if errorlevel 1 exit /b 1 207 | echo. 208 | echo.Build finished. The message catalogs are in %BUILDDIR%/locale. 209 | goto end 210 | ) 211 | 212 | if "%1" == "changes" ( 213 | %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes 214 | if errorlevel 1 exit /b 1 215 | echo. 216 | echo.The overview file is in %BUILDDIR%/changes. 217 | goto end 218 | ) 219 | 220 | if "%1" == "linkcheck" ( 221 | %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck 222 | if errorlevel 1 exit /b 1 223 | echo. 224 | echo.Link check complete; look for any errors in the above output ^ 225 | or in %BUILDDIR%/linkcheck/output.txt. 
226 | goto end 227 | ) 228 | 229 | if "%1" == "doctest" ( 230 | %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest 231 | if errorlevel 1 exit /b 1 232 | echo. 233 | echo.Testing of doctests in the sources finished, look at the ^ 234 | results in %BUILDDIR%/doctest/output.txt. 235 | goto end 236 | ) 237 | 238 | if "%1" == "coverage" ( 239 | %SPHINXBUILD% -b coverage %ALLSPHINXOPTS% %BUILDDIR%/coverage 240 | if errorlevel 1 exit /b 1 241 | echo. 242 | echo.Testing of coverage in the sources finished, look at the ^ 243 | results in %BUILDDIR%/coverage/python.txt. 244 | goto end 245 | ) 246 | 247 | if "%1" == "xml" ( 248 | %SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml 249 | if errorlevel 1 exit /b 1 250 | echo. 251 | echo.Build finished. The XML files are in %BUILDDIR%/xml. 252 | goto end 253 | ) 254 | 255 | if "%1" == "pseudoxml" ( 256 | %SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml 257 | if errorlevel 1 exit /b 1 258 | echo. 259 | echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml. 260 | goto end 261 | ) 262 | 263 | :end 264 | -------------------------------------------------------------------------------- /docs/matching.rst: -------------------------------------------------------------------------------- 1 | Matching 2 | -------- 3 | 4 | Matching is done by comparing a set of rules against a parse 5 | tree. You can see parse trees for sentences by running 6 | examples/parser\_input.py. 7 | 8 | The set of rules is recursive and can match multiple parts of the parse 9 | tree. 10 | 11 | Rules can be broken down into smaller parts: Tag, Token, Token Tree, 12 | and Rules. 13 | 14 | Tag 15 | ~~~ 16 | 17 | A tag is a POS (part of speech) tag to match. A list of POS tags used by 18 | the Stanford Parser can be found `here`_. 
19 | 20 | :: 21 | 22 | Format: 23 | tag = string 24 | 25 | Example: 26 | 'NP' 27 | 'VP' 28 | 'PP' 29 | 30 | Token 31 | ~~~~~ 32 | 33 | A token is a string consisting of a tag plus optional modifiers and labels for matching. The match_label names the context key under which the matched subtree is stored. The opts control how a string is extracted from the matched tree. The eq value requires the matched tree to equal the given string. 34 | 35 | :: 36 | 37 | Example string: 38 | The red car 39 | 40 | opts 41 | -o Get object by removing "a", "the", etc. (Ex. red car) 42 | -r Get raw string (Ex. The red car) 43 | :: 44 | 45 | Format: (only tag is required) 46 | token = tag:match_label-opts=eq 47 | 48 | 49 | Example: 50 | 'VP' 51 | 'NP:subject-o' 52 | 'NP:np' 53 | 'VP=run' 54 | 'VP:action=run' 55 | 56 | Token Tree 57 | ~~~~~~~~~~ 58 | 59 | A token tree is a recursive tree of tokens. The tree matches the 60 | structure of a parse tree. 61 | 62 | :: 63 | 64 | Format: 65 | token_tree = ( token token_tree token_tree ... ) 66 | 67 | Examples: 68 | '( NP ( DT ) ( NP:subject-o ) )' 69 | '( NP )' 70 | '( PP ( TO=to ) ( NP:object-o ) )' 71 | 72 | Rules 73 | ~~~~~ 74 | 75 | Rules are a dictionary mapping token trees to dictionaries that map match labels 76 | to nested sets of rules. 77 | 78 | 79 | :: 80 | 81 | Format: 82 | rules = {token_tree: {match_label: rules}} 83 | 84 | Example: 85 | { 86 | '( S ( NP:np ) ( VP ( VBD:action-o ) ( PP:pp ) ) )': { 87 | 'np': { 88 | '( NP:subject-o )': {} 89 | }, 90 | 'pp': { 91 | '( PP ( TO=to ) ( NP:to_object-o ) )': {}, 92 | '( PP ( IN=from ) ( NP:from_object-o ) )': {}, 93 | } 94 | }, 95 | } 96 | 97 | When matching a rule to a parse tree, the token tree is matched first. 98 | Then each labeled subtree is matched against the nested rules registered 99 | under its match label. 100 | 101 | Every nested match label must have a subrule that matches, or the rule as a whole does not 102 | match. 103 | 104 | The first rule to match is returned, so matching order follows 105 | key ordering (use OrderedDict if order matters). 
Once a rule is matched, 106 | the callback function is called with the context as keyword arguments. 107 | 108 | Example 109 | ~~~~~~~ 110 | 111 | Suppose we have the sentence “Sam ran to his house” and we want to 112 | match the subject (“Sam”), the object of “to” (“his house”) and the action 113 | (“ran”). 114 | 115 | Sample parse tree for “Sam ran to his house” from the Stanford Parser. 116 | 117 | :: 118 | 119 | (S 120 | (NP 121 | (NNP Sam) 122 | ) 123 | (VP 124 | (VBD ran) 125 | (PP 126 | (TO to) 127 | (NP 128 | (PRP$ his) 129 | (NN house) 130 | ) 131 | ) 132 | ) 133 | ) 134 | 135 | Simplified image of tree: 136 | 137 | .. figure:: /_static/img/sent_tree.png 138 | :alt: tree 139 | 140 | tree 141 | 142 | :: 143 | 144 | Matching: 145 | Parse Tree: 146 | (S (NP (NNP Sam) ) (VP (VBD ran) (PP (TO to) (NP (PRP$ his) (NN house))))) 147 | 148 | Matched token tree: '( S ( NP:np ) ( VP ( VBD:action-o ) ( PP:pp ) ) )' 149 | Matched context: 150 | np: (NP (NNP Sam)) 151 | action-o: 'ran' 152 | pp: (PP (TO to) (NP (PRP$ his) (NN house))) 153 | 154 | Rule for ‘( S ( NP:np ) ( VP ( VBD:action-o ) ( PP:pp ) ) )’: 155 | 156 | .. figure:: /_static/img/rule_tree_1.png 157 | :alt: tree 158 | 159 | tree 160 | 161 | Matching ‘NP’ matches the whole NP tree and converts it to a string: 162 | 163 | .. _here: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html 164 | 165 | :: 166 | 167 | Matched token tree for np: '( NP:subject-o )' 168 | Matched context: 169 | subject-o: 'Sam' 170 | 171 | Matching ‘PP’ requires matching the nested rules: 172 | 173 | :: 174 | 175 | Match token tree for pp: '( PP ( TO=to ) ( NP:to_object-o ) )' 176 | Match context: 177 | to_object-o: 'his house' 178 | 179 | Match token tree for pp: '( PP ( IN=from ) ( NP:from_object-o ) )' 180 | No match found 181 | 182 | PP of the sample sentence: 183 | 184 | .. 
figure:: /_static/img/sent_tree_pp.png 185 | :alt: tree 186 | 187 | tree 188 | 189 | Nested PP rules: 190 | 191 | |tree2| |tree3| 192 | 193 | Only the first rule matches for ‘PP’. 194 | 195 | Now that we have a match for all nested rules, we can return the 196 | context: 197 | 198 | :: 199 | 200 | Returned context: 201 | action: 'ran' 202 | subject: 'sam' 203 | to_object: 'his house' 204 | 205 | Full code: 206 | 207 | .. code:: python 208 | 209 | from lango.parser import StanfordLibParser 210 | from lango.matcher import match_rules 211 | 212 | parser = StanfordLibParser() 213 | 214 | rules = { 215 | '( S ( NP:np ) ( VP ( VBD:action-o ) ( PP:pp ) ) )': { 216 | 'np': { 217 | '( NP:subject-o )': {} 218 | }, 219 | 'pp': { 220 | '( PP ( TO=to ) ( NP:to_object-o ) )': {}, 221 | '( PP ( IN=from ) ( NP:from_object-o ) )': {} 222 | } 223 | } 224 | } 225 | 226 | def fun(subject, action, to_object=None, from_object=None): 227 | print("%s,%s,%s,%s" % (subject, action, to_object, from_object)) 228 | 229 | tree = parser.parse('Sam ran to his house') 230 | match_rules(tree, rules, fun) 231 | # output should be: sam,ran,his house,None 232 | 233 | tree = parser.parse('Billy walked from his apartment') 234 | match_rules(tree, rules, fun) 235 | # output should be: billy,walked,None,his apartment 236 | 237 | .. |tree2| image:: /_static/img/rule_tree_2.png 238 | .. |tree3| image:: /_static/img/rule_tree_3.png 239 | 240 | -------------------------------------------------------------------------------- /docs/modules.rst: -------------------------------------------------------------------------------- 1 | lango 2 | ===== 3 | 4 | .. 
toctree:: 5 | :maxdepth: 4 6 | 7 | lango 8 | -------------------------------------------------------------------------------- /examples/matching.py: -------------------------------------------------------------------------------- 1 | 2 | from collections import OrderedDict 3 | import os 4 | from lango.parser import StanfordServerParser 5 | from lango.matcher import match_rules 6 | 7 | 8 | 9 | parser = StanfordServerParser() 10 | 11 | sents = [ 12 | 'Call me an Uber.', 13 | 'Get my mother some flowers.', 14 | 'Find me a pizza with extra cheese.', 15 | 'Give Sam\'s dog a biscuit from Petshop.' 16 | ] 17 | 18 | """ 19 | me.call({'item': u'uber'}) 20 | my.mother.get({'item': u'flowers'}) 21 | me.order({'item': u'pizza', u'with': u'extra cheese'}) 22 | sam.dog.give({'item': u'biscuit', u'from': u'petshop'}) 23 | """ 24 | 25 | subj_obj_rules = { 26 | 'subj_t': OrderedDict([ 27 | # my brother / my mother 28 | ('( NP ( PRP$:subject-o=my ) ( NN:relation-o ) )', {}), 29 | # Sam's dog 30 | ('( NP ( NP ( NNP:subject-o ) ( POS ) ) ( NN:relation-o ) )', {}), 31 | # me 32 | ('( NP:subject-o )', {}), 33 | ]), 34 | 'obj_t': OrderedDict([ 35 | # pizza with onions 36 | ('( NP ( NP:item-O ) ( PP ( IN:item_in-O ) ( NP:item_addon-O ) ) )', {}), 37 | # pizza 38 | ('( NP:item-O )', {}), 39 | ]) 40 | } 41 | 42 | rules = { 43 | # Get me a pizza 44 | '( S ( VP ( VB:action-o ) ( S ( NP:subj_t ) ( NP:obj_t ) ) ) )': subj_obj_rules, 45 | # Get my mother flowers 46 | '( S ( VP ( VB:action-o ) ( NP:subj_t ) ( NP:obj_t ) ) )': subj_obj_rules, 47 | } 48 | 49 | def perform_action(action, item, subject, relation=None, 50 | item_addon=None, item_in=None): 51 | 52 | entity = subject 53 | if entity == "my": 54 | entity = "me" 55 | if relation: 56 | entity = '{0}.{1}'.format(entity, relation) 57 | 58 | item_props = {'item': item} 59 | if item_in and item_addon: 60 | item_props[item_in] = item_addon 61 | 62 | return '{0}.{1}({2})'.format(entity, action, item_props) 63 | 64 | for sent in sents: 65 | tree = 
parser.parse(sent) 66 | print(match_rules(tree, rules, perform_action)) 67 | -------------------------------------------------------------------------------- /examples/multimatch.py: -------------------------------------------------------------------------------- 1 | 2 | from collections import OrderedDict 3 | import os 4 | from lango.parser import StanfordServerParser 5 | from lango.matcher import match_rules 6 | 7 | parser = StanfordServerParser() 8 | 9 | sents = [ 10 | 'What religion is the President of the United States?' 11 | ] 12 | 13 | rules = { 14 | '( SBARQ ( WHNP/WHADVP:wh_t ) ( SQ ( VBZ ) ( NP:np_t ) ) )': { 15 | 'np_t': { 16 | '( NP ( NP:subj-o ) ( PP ( IN:subj_in-o ) ( NP:obj-o ) ) )': {}, 17 | '( NP:subj-o )': {}, 18 | }, 19 | 'wh_t': { 20 | '( WHNP:whnp ( WDT ) ( NN:prop-o ) )': {}, 21 | '( WHNP/WHADVP:qtype-o )': {}, 22 | } 23 | }, 24 | '( SBARQ:subj-o )': {}, 25 | } 26 | 27 | keys = ['subj', 'subj_in', 'obj', 'prop', 'qtype'] 28 | 29 | for sent in sents: 30 | tree = parser.parse(sent) 31 | contexts = match_rules(tree, rules, multi=True) 32 | for context in contexts: 33 | print(", ".join(['%s:%s' % (k, context.get(k)) for k in keys])) 34 | 35 | """ 36 | 5 possible matches: 37 | subj:president of united states, subj_in:None, obj:None, prop:religion, qtype:None 38 | subj:president of united states, subj_in:None, obj:None, prop:None, qtype:what religion 39 | subj:president, subj_in:of, obj:united states, prop:religion, qtype:None 40 | subj:president, subj_in:of, obj:united states, prop:None, qtype:what religion 41 | subj:what religion is president of united states ?, subj_in:None, obj:None, prop:None, qtype:None 42 | """ -------------------------------------------------------------------------------- /examples/parser_input.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | from lango.parser import StanfordServerParser 4 | from lango.matcher import match_rules 5 | 6 | def main(): 7 | parser = 
StanfordServerParser() 8 | while True: 9 | try: 10 | line = input("Enter line: ") 11 | tree = parser.parse(line) 12 | tree.pretty_print() 13 | except EOFError: 14 | print("Bye!") 15 | break 16 | 17 | if __name__ == "__main__": 18 | main() 19 | -------------------------------------------------------------------------------- /lango/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Lango is a natural language framework for matching parse trees 3 | and modeling conversations. 4 | """ 5 | __version__ = '0.21.0' -------------------------------------------------------------------------------- /lango/matcher.py: -------------------------------------------------------------------------------- 1 | from nltk import Tree 2 | import logging 3 | 4 | logger = logging.getLogger(__name__) 5 | 6 | def match_rules(tree, rules, fun=None, multi=False): 7 | """Matches a Tree structure with the given query rules. 8 | 9 | Query rules are represented as a dictionary of template to action. 
10 | Action is either a function, or a dictionary of subtemplate parameter to rules:: 11 | 12 | rules = { 'template' : { 'key': rules } } 13 | | { 'template' : {} } 14 | 15 | Args: 16 | tree (Tree): Parsed tree structure 17 | rules (dict): A dictionary of query rules 18 | fun (function): Function to call with context (set to None if you want to return context) 19 | multi (Bool): If True, returns all matched contexts, else returns first matched context 20 | Returns: 21 | Contexts from matched rules 22 | """ 23 | if multi: 24 | context = match_rules_context_multi(tree, rules) 25 | else: 26 | context = match_rules_context(tree, rules) 27 | if not context: 28 | return None 29 | 30 | if fun: 31 | args = fun.__code__.co_varnames 32 | if multi: 33 | res = [] 34 | for c in context: 35 | action_context = {} 36 | for arg in args: 37 | if arg in c: 38 | action_context[arg] = c[arg] 39 | res.append(fun(**action_context)) 40 | return res 41 | else: 42 | action_context = {} 43 | for arg in args: 44 | if arg in context: 45 | action_context[arg] = context[arg] 46 | return fun(**action_context) 47 | else: 48 | return context 49 | 50 | def match_rules_context(tree, rules, parent_context={}): 51 | """Recursively matches a Tree structure with rules and returns context 52 | 53 | Args: 54 | tree (Tree): Parsed tree structure 55 | rules (dict): See match_rules 56 | parent_context (dict): Context of parent call 57 | Returns: 58 | dict: Context matched dictionary of matched rules or 59 | None if no match 60 | """ 61 | for template, match_rules in rules.items(): 62 | context = parent_context.copy() 63 | if match_template(tree, template, context): 64 | for key, child_rules in match_rules.items(): 65 | child_context = match_rules_context(context[key], child_rules, context) 66 | if child_context: 67 | for k, v in child_context.items(): 68 | context[k] = v 69 | else: 70 | return None 71 | return context 72 | return None 73 | 74 | def cross_context(contextss): 75 | """ 76 | Cross product of all 
contexts 77 | [[a], [b], [c]] -> [[a] x [b] x [c]] 78 | 79 | """ 80 | if not contextss: 81 | return [] 82 | 83 | product = [{}] 84 | 85 | for contexts in contextss: 86 | tmp_product = [] 87 | for c in contexts: 88 | for ce in product: 89 | c_copy = c.copy() 90 | c_copy.update(ce) 91 | tmp_product.append(c_copy) 92 | product = tmp_product 93 | return product 94 | 95 | def match_rules_context_multi(tree, rules, parent_context={}): 96 | """Recursively matches a Tree structure with rules and returns all matching contexts 97 | 98 | Args: 99 | tree (Tree): Parsed tree structure 100 | rules (dict): See match_rules 101 | parent_context (dict): Context of parent call 102 | Returns: 103 | list: Context dictionaries, one per matching combination of rules; 104 | empty list if no match 105 | """ 106 | all_contexts = [] 107 | for template, match_rules in rules.items(): 108 | context = parent_context.copy() 109 | if match_template(tree, template, context): 110 | child_contextss = [] 111 | if not match_rules: 112 | all_contexts += [context] 113 | else: 114 | for key, child_rules in match_rules.items(): 115 | child_contextss.append(match_rules_context_multi(context[key], child_rules, context)) 116 | all_contexts += cross_context(child_contextss) 117 | return all_contexts 118 | 119 | def match_template(tree, template, args=None): 120 | """Check if match string matches Tree structure 121 | 122 | Args: 123 | tree (Tree): Parsed Tree structure of a sentence 124 | template (str): String template to match. 
Example: "( S ( NP ) )" 125 | Returns: 126 | bool: If they match or not 127 | """ 128 | tokens = get_tokens(template.split()) 129 | cur_args = {} 130 | if match_tokens(tree, tokens, cur_args): 131 | if args is not None: 132 | for k, v in cur_args.items(): 133 | args[k] = v 134 | logger.debug('MATCHED: {0}'.format(template)) 135 | return True 136 | else: 137 | return False 138 | 139 | 140 | def match_tokens(tree, tokens, args): 141 | """Check if stack of tokens matches the Tree structure 142 | 143 | Special matching rules that can be specified in the template:: 144 | 145 | ':label': Label a token, the token will be returned as part of the context with key 'label'. 146 | '-@': Additional single letter argument determining return format of labeled token. Valid options are: 147 | '-r': Return token as word 148 | '-o': Return token as object 149 | '=word|word2|....|wordn': Force match raw lower case 150 | '$': Match end of tree 151 | 152 | Args: 153 | tree : Parsed tree structure 154 | tokens : Stack of tokens 155 | Returns: 156 | Boolean if they match or not 157 | """ 158 | arg_type_to_func = { 159 | 'r': get_raw_lower, 160 | 'R': get_raw, 161 | 'o': get_object_lower, 162 | 'O': get_object, 163 | } 164 | 165 | if len(tokens) == 0: 166 | return True 167 | 168 | if not isinstance(tree, Tree): 169 | return False 170 | 171 | root_token = tokens[0] 172 | 173 | # Equality 174 | if root_token.find('=') >= 0: 175 | eq_tokens = root_token.split('=')[1].lower().split('|') 176 | root_token = root_token.split('=')[0] 177 | word = get_raw_lower(tree) 178 | if word not in eq_tokens: 179 | return False 180 | 181 | # Get arg 182 | if root_token.find(':') >= 0: 183 | arg_tokens = root_token.split(':')[1].split('-') 184 | if len(arg_tokens) == 1: 185 | arg_name = arg_tokens[0] 186 | args[arg_name] = tree 187 | else: 188 | arg_name = arg_tokens[0] 189 | arg_type = arg_tokens[1] 190 | args[arg_name] = arg_type_to_func[arg_type](tree) 191 | root_token = root_token.split(':')[0] 192 | 193 | 
# Does not match wild card and label does not match 194 | if root_token != '.' and tree.label() not in root_token.split('/'): 195 | return False 196 | 197 | # Check end symbol 198 | if tokens[-1] == '$': 199 | if len(tree) != len(tokens[:-1]) - 1: 200 | return False 201 | else: 202 | tokens = tokens[:-1] 203 | 204 | # Check # of tokens 205 | if len(tree) < len(tokens) - 1: 206 | return False 207 | 208 | for i in range(len(tokens) - 1): 209 | if not match_tokens(tree[i], tokens[i + 1], args): 210 | return False 211 | return True 212 | 213 | 214 | def get_tokens(tokens): 215 | """Recursively gets tokens from a match list 216 | 217 | Args: 218 | tokens : List of tokens ['(', 'S', '(', 'NP', ')', ')'] 219 | Returns: 220 | Stack of tokens 221 | """ 222 | tokens = tokens[1:-1] 223 | ret = [] 224 | start = 0 225 | stack = 0 226 | for i in range(len(tokens)): 227 | if tokens[i] == '(': 228 | if stack == 0: 229 | start = i 230 | stack += 1 231 | elif tokens[i] == ')': 232 | stack -= 1 233 | if stack < 0: 234 | raise Exception('Bracket mismatch: ' + str(tokens)) 235 | if stack == 0: 236 | ret.append(get_tokens(tokens[start:i + 1])) 237 | else: 238 | if stack == 0: 239 | ret.append(tokens[i]) 240 | if stack != 0: 241 | raise Exception('Bracket mismatch: ' + str(tokens)) 242 | return ret 243 | 244 | 245 | def get_object(tree): 246 | """Get the object in the tree object. 
247 | 248 | Method should remove unnecessary letters and words:: 249 | 250 | the 251 | a/an 252 | 's 253 | 254 | Args: 255 | tree (Tree): Parsed tree structure 256 | Returns: 257 | Resulting string of tree ``(Ex: "red car")`` 258 | """ 259 | if isinstance(tree, Tree): 260 | if tree.label() == 'DT' or tree.label() == 'POS': 261 | return '' 262 | words = [] 263 | for child in tree: 264 | words.append(get_object(child)) 265 | return ' '.join([_f for _f in words if _f]) 266 | else: 267 | return tree 268 | 269 | 270 | def get_object_lower(tree): 271 | return get_object(tree).lower() 272 | 273 | 274 | def get_raw(tree): 275 | """Get the exact words in the tree object, preserving case. 276 | 277 | Args: 278 | tree (Tree): Parsed tree structure 279 | Returns: 280 | Resulting string of tree ``(Ex: "The red car")`` 281 | """ 282 | if isinstance(tree, Tree): 283 | words = [] 284 | for child in tree: 285 | words.append(get_raw(child)) 286 | return ' '.join(words) 287 | else: 288 | return tree 289 | 290 | 291 | def get_raw_lower(tree): 292 | return get_raw(tree).lower() -------------------------------------------------------------------------------- /lango/parser.py: -------------------------------------------------------------------------------- 1 | from nltk.parse.stanford import StanfordParser, GenericStanfordParser 2 | from nltk.internals import find_jars_within_path 3 | from nltk.tree import Tree 4 | from pycorenlp import StanfordCoreNLP 5 | 6 | 7 | class Parser: 8 | """Abstract Parser class""" 9 | def __init__(self): 10 | pass 11 | 12 | def parse(self, sent): 13 | pass 14 | 15 | 16 | class OldStanfordLibParser(Parser): 17 | """For StanfordParser < 3.6.0""" 18 | 19 | def __init__(self): 20 | self.parser = StanfordParser() 21 | 22 | def parse(self, line): 23 | """Returns tree objects from a sentence 24 | 25 | Args: 26 | line: Sentence to be parsed into a tree 27 | 28 | Returns: 29 | Tree object representing parsed sentence 30 | None if parse fails 31 | """ 32 | tree = 
list(self.parser.raw_parse(line))[0] 33 | tree = tree[0] 34 | return tree 35 | 36 | 37 | class StanfordLibParser(OldStanfordLibParser): 38 | """For StanfordParser == 3.6.0""" 39 | def __init__(self): 40 | self.parser = StanfordParser( 41 | model_path='edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz') 42 | stanford_dir = self.parser._classpath[0].rpartition('/')[0] 43 | self.parser._classpath = tuple(find_jars_within_path(stanford_dir)) 44 | 45 | 46 | class StanfordServerParser(Parser, GenericStanfordParser): 47 | """Follow the readme to setup the Stanford CoreNLP server""" 48 | def __init__(self, host='localhost', port=9000, properties={}): 49 | url = 'http://{0}:{1}'.format(host, port) 50 | self.nlp = StanfordCoreNLP(url) 51 | 52 | if not properties: 53 | self.properties = { 54 | 'annotators': 'parse', 55 | 'outputFormat': 'json', 56 | } 57 | else: 58 | self.properties = properties 59 | 60 | def _make_tree(self, result): 61 | return Tree.fromstring(result) 62 | 63 | def parse(self, sent): 64 | output = self.nlp.annotate(sent, properties=self.properties) 65 | 66 | # Got random html, return empty tree 67 | if isinstance(output, str): 68 | return Tree('', []) 69 | 70 | parse_output = output['sentences'][0]['parse'] + '\n\n' 71 | tree = next(next(self._parse_trees_output(parse_output)))[0] 72 | return tree -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | # Lango 2 | 3 | [![Gitter](https://badges.gitter.im/lango-nlp/Lobby.svg)](https://gitter.im/lango-nlp/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge) 4 | 5 | Lango is a natural language processing library for working with the building blocks of language. It includes tools for: 6 | 7 | * matching [constituent parse trees](https://en.wikipedia.org/wiki/Parse_tree#Constituency-based_parse_trees). 8 | * modeling conversations (TODO) 9 | 10 | Need help? 
Ask me for help on [Gitter](https://gitter.im/lango-nlp/Lobby) 11 | 12 | ## Installation 13 | 14 | ### Install package with pip 15 | 16 | ``` 17 | pip install lango 18 | ``` 19 | 20 | ### Download Stanford CoreNLP 21 | 22 | Stanford CoreNLP requires Java, so make sure it is installed. 23 | 24 | [Download Stanford CoreNLP](http://stanfordnlp.github.io/CoreNLP/#download) 25 | 26 | Extract the archive to any folder. 27 | 28 | ### Run the Stanford CoreNLP server 29 | 30 | Run the following command in the folder where you extracted Stanford CoreNLP: 31 | ``` 32 | java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer 33 | ``` 34 | 35 | ## Docs 36 | 37 | - [Blog Post](http://blog.ayoungprogrammer.com/2016/07/natural-language-understanding-by.html/) 38 | - [Read the docs](http://lango.readthedocs.io/en/latest/) 39 | - [Examples](http://github.com/ayoungprogrammer/lango/tree/master/examples) 40 | 41 | ## Matching 42 | 43 | Matching is done by comparing a set of rules against a parse tree. You 44 | can print the parse tree for a sentence with examples/parser_input.py. 45 | 46 | The set of rules is recursive and can match multiple parts of the parse tree. 47 | 48 | Rules can be broken down into smaller parts: 49 | - Tag 50 | - Token 51 | - Token Tree 52 | - Rules 53 | 54 | ### Tag 55 | 56 | A tag is a POS (part of speech) or phrase tag to match. A list of POS tags used by the Stanford Parser can be found [here](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html). 57 | 58 | ``` 59 | Format: 60 | tag = string 61 | 62 | Example: 63 | 'NP' 64 | 'VP' 65 | 'PP' 66 | ``` 67 | 68 | ### Token 69 | 70 | A token is a string consisting of a tag plus optional modifiers/labels for matching. A match_label names the matched subtree so it is returned in the context under that key. The opts control how the string is extracted from the tree. An eq restricts the match to trees whose text equals the given string. 71 | 72 | ``` 73 | Example string: 74 | The red car 75 | 76 | opts: 77 | -o Get object by removing "a", "the", etc. (Ex. 
red car) 78 | -r Get raw string in lowercase (Ex. the red car) 79 | ``` 80 | 81 | ``` 82 | Format: (only tag is required) 83 | token = tag:match_label-opts=eq 84 | 85 | Example: 86 | 'VP' 87 | 'NP:subject-o' 88 | 'NP:np' 89 | 'VP=run' 90 | 'VP:action=run' 91 | ``` 92 | 93 | ### Token Tree 94 | 95 | A token tree is a recursive tree of tokens. The tree matches the structure of a parse tree. 96 | 97 | ``` 98 | Format: 99 | token_tree = ( token token_tree token_tree ... ) 100 | 101 | Examples: 102 | '( NP ( DT ) ( NP:subject-o ) )' 103 | '( NP )' 104 | '( PP ( TO=to ) ( NP:object-o ) )' 105 | ``` 106 | 107 | ### Rules 108 | 109 | Rules are a dictionary that maps each token tree to a dictionary from match labels to a 110 | nested set of rules. 111 | 112 | ``` 113 | Format: 114 | rules = {token_tree: {match_label: rules}} 115 | 116 | Example: 117 | { 118 | '( S ( NP:np ) ( VP ( VBD:action-o ) ( PP:pp ) ) )': { 119 | 'np': { 120 | '( NP:subject-o )': {} 121 | }, 122 | 'pp': { 123 | '( PP ( TO=to ) ( NP:to_object-o ) )': {}, 124 | '( PP ( IN=from ) ( NP:from_object-o ) )': {}, 125 | } 126 | }, 127 | } 128 | ``` 129 | 130 | When matching a rule to a parse tree, the token tree is matched first. Then each 131 | labeled subtree is matched against the nested rules registered under its match label. 132 | 133 | Every match label that has nested rules must match one of them, or the rule as a whole does not match. 134 | 135 | The first rule that matches is returned, so match priority follows key 136 | ordering (use OrderedDict if order matters). Once a rule matches, the callback 137 | function is called with the context as keyword arguments. 138 | 139 | ### Example 140 | 141 | Suppose we have the sentence "Sam ran to his house" and we want to match the 142 | subject ("Sam"), the object ("his house") and the action ("ran"). 143 | 144 | Sample parse tree for "Sam ran to his house" from the Stanford Parser: 
145 | 146 | ``` 147 | (S 148 | (NP 149 | (NNP Sam) 150 | ) 151 | (VP 152 | (VBD ran) 153 | (PP 154 | (TO to) 155 | (NP 156 | (PRP$ his) 157 | (NN house) 158 | ) 159 | ) 160 | ) 161 | ) 162 | ``` 163 | 164 | Simplified image of tree: 165 | 166 | ![tree](/docs/_static/img/sent_tree.png) 167 | 168 | ``` 169 | Matching: 170 | Parse Tree: 171 | (S (NP (NNP Sam)) (VP (VBD ran) (PP (TO to) (NP (PRP$ his) (NN house))))) 172 | 173 | Matched token tree: '( S ( NP:np ) ( VP ( VBD:action-o ) ( PP:pp ) ) )' 174 | Matched context: 175 | np: (NP (NNP Sam)) 176 | action: 'ran' 177 | pp: (PP (TO to) (NP (PRP$ his) (NN house))) 178 | ``` 179 | 180 | Rule for '( S ( NP:np ) ( VP ( VBD:action-o ) ( PP:pp ) ) )': 181 | 182 | ![tree](/docs/_static/img/rule_tree_1.png) 183 | 184 | Matching 'NP' matches the whole NP tree and converts it to a lowercase string: 185 | 186 | ``` 187 | Matched token tree for np: '( NP:subject-o )' 188 | Matched context: 189 | subject: 'sam' 190 | ``` 191 | 192 | Matching 'PP' requires matching one of the nested rules: 193 | 194 | ``` 195 | Match token tree for pp: '( PP ( TO=to ) ( NP:to_object-o ) )' 196 | Match context: 197 | to_object: 'his house' 198 | 199 | Match token tree for pp: '( PP ( IN=from ) ( NP:from_object-o ) )' 200 | No match found 201 | ``` 202 | PP of the sample sentence: 203 | 204 | ![tree](/docs/_static/img/sent_tree_pp.png) 205 | 206 | Nested PP rules: 207 | 208 | ![tree](/docs/_static/img/rule_tree_2.png) 209 | ![tree](/docs/_static/img/rule_tree_3.png) 210 | 211 | Only the first rule matches for 'PP'. 
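The first-match behaviour of the two nested PP rules can be tried without a running CoreNLP server. The snippet below is only a simplified sketch, not Lango's implementation: it models parse trees as nested tuples instead of nltk `Tree` objects, and `PP_RULES`, `words` and `match_pp` are hypothetical names introduced for illustration. Unlike the `-o` option, it does not strip determiners such as "the".

```python
# Simplified sketch (not Lango's code): a tree is a (label, child, ...) tuple
# and a leaf is a plain string.
PP_TREE = ('PP', ('TO', 'to'), ('NP', ('PRP$', 'his'), ('NN', 'house')))

# Hypothetical flattened form of the two nested 'pp' rules:
# (tag of the first child, required word, context key for the NP)
PP_RULES = [
    ('TO', 'to', 'to_object'),
    ('IN', 'from', 'from_object'),
]


def words(tree):
    """Join the leaf words of a tuple tree, lowercased."""
    if isinstance(tree, str):
        return tree.lower()
    return ' '.join(words(child) for child in tree[1:])


def match_pp(tree, rules):
    """Return the context of the first rule that matches, else None."""
    label, head, noun_phrase = tree
    if label != 'PP':
        return None
    for tag, word, key in rules:
        # Both the tag and the '=word' equality check must hold.
        if head[0] == tag and words(head) == word and noun_phrase[0] == 'NP':
            return {key: words(noun_phrase)}
    return None


print(match_pp(PP_TREE, PP_RULES))
```

Because the rules are tried in order, the `TO=to` rule wins for this sentence and the `IN=from` rule is never reached; a PP headed by "from" would produce a `from_object` entry instead.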
212 | 213 | Now that we have a match for all nested rules, we can return the context: 214 | ``` 215 | Returned context: 216 | action: 'ran' 217 | subject: 'sam' 218 | to_object: 'his house' 219 | ``` 220 | 221 | Full code: 222 | 223 | ```python 224 | from lango.parser import StanfordServerParser 225 | from lango.matcher import match_rules 226 | 227 | parser = StanfordServerParser() 228 | 229 | rules = { 230 | '( S ( NP:np ) ( VP ( VBD:action-o ) ( PP:pp ) ) )': { 231 | 'np': { 232 | '( NP:subject-o )': {} 233 | }, 234 | 'pp': { 235 | '( PP ( TO=to ) ( NP:to_object-o ) )': {}, 236 | '( PP ( IN=from ) ( NP:from_object-o ) )': {} 237 | } 238 | } 239 | } 240 | 241 | def fun(subject, action, to_object=None, from_object=None): 242 | print("%s, %s, %s, %s" % (subject, action, to_object, from_object)) 243 | 244 | tree = parser.parse('Sam ran to his house') 245 | match_rules(tree, rules, fun) 246 | # output should be: sam, ran, his house, None 247 | 248 | tree = parser.parse('Billy walked from his apartment') 249 | match_rules(tree, rules, fun) 250 | # output should be: billy, walked, None, his apartment 251 | ``` 252 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | nltk==3.1 2 | pycorenlp==0.3.0 -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import find_packages, setup 2 | import lango 3 | 4 | setup( 5 | name='Lango', 6 | version=lango.__version__, 7 | description='Natural Language Framework for Matching Parse Trees and Modeling Conversation', 8 | packages=find_packages(), 9 | author='Michael Young', 10 | author_email='michaelyoung1995@gmail.com', 11 | url='https://github.com/ayoungprogrammer/lango', 12 | scripts=[], 13 | install_requires=[ 14 | 'nltk', 15 | 'pycorenlp' 16 | ], 17 | ) 18 | 
--------------------------------------------------------------------------------