├── .gitignore ├── COPYING ├── ChangeLog ├── README.md ├── ToDo.FAQ ├── ToDo.md ├── create_windows_distributions.py ├── examples └── md2epub.py ├── gleetex ├── __init__.py ├── __main__.py ├── cachedconverter.py ├── caching.py ├── htmlhandling.py ├── image.py ├── pandoc.py ├── parser.py ├── sink.py ├── typesetting.py └── unicode.py ├── manpage.md ├── pyproject.toml ├── runtests ├── setup.cfg ├── setup.py ├── tests ├── test_cachedconverter.py ├── test_caching.py ├── test_htmlhandling.py ├── test_imagecreation.py └── test_typesetting.py └── update_unicode_table.py /.gitignore: -------------------------------------------------------------------------------- 1 | *build* 2 | dist 3 | doc/gleetex.*html 4 | doc/index.html 5 | gladtex.1 6 | GladTeX.egg-info/ 7 | *.pyc 8 | *pycache* 9 | *.swp 10 | *.zip 11 | -------------------------------------------------------------------------------- /COPYING: -------------------------------------------------------------------------------- 1 | GNU LESSER GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | 9 | This version of the GNU Lesser General Public License incorporates 10 | the terms and conditions of version 3 of the GNU General Public 11 | License, supplemented by the additional permissions listed below. 12 | 13 | 0. Additional Definitions. 14 | 15 | As used herein, "this License" refers to version 3 of the GNU Lesser 16 | General Public License, and the "GNU GPL" refers to version 3 of the GNU 17 | General Public License. 18 | 19 | "The Library" refers to a covered work governed by this License, 20 | other than an Application or a Combined Work as defined below. 21 | 22 | An "Application" is any work that makes use of an interface provided 23 | by the Library, but which is not otherwise based on the Library. 
24 | Defining a subclass of a class defined by the Library is deemed a mode 25 | of using an interface provided by the Library. 26 | 27 | A "Combined Work" is a work produced by combining or linking an 28 | Application with the Library. The particular version of the Library 29 | with which the Combined Work was made is also called the "Linked 30 | Version". 31 | 32 | The "Minimal Corresponding Source" for a Combined Work means the 33 | Corresponding Source for the Combined Work, excluding any source code 34 | for portions of the Combined Work that, considered in isolation, are 35 | based on the Application, and not on the Linked Version. 36 | 37 | The "Corresponding Application Code" for a Combined Work means the 38 | object code and/or source code for the Application, including any data 39 | and utility programs needed for reproducing the Combined Work from the 40 | Application, but excluding the System Libraries of the Combined Work. 41 | 42 | 1. Exception to Section 3 of the GNU GPL. 43 | 44 | You may convey a covered work under sections 3 and 4 of this License 45 | without being bound by section 3 of the GNU GPL. 46 | 47 | 2. Conveying Modified Versions. 48 | 49 | If you modify a copy of the Library, and, in your modifications, a 50 | facility refers to a function or data to be supplied by an Application 51 | that uses the facility (other than as an argument passed when the 52 | facility is invoked), then you may convey a copy of the modified 53 | version: 54 | 55 | a) under this License, provided that you make a good faith effort to 56 | ensure that, in the event an Application does not supply the 57 | function or data, the facility still operates, and performs 58 | whatever part of its purpose remains meaningful, or 59 | 60 | b) under the GNU GPL, with none of the additional permissions of 61 | this License applicable to that copy. 62 | 63 | 3. Object Code Incorporating Material from Library Header Files. 
64 | 65 | The object code form of an Application may incorporate material from 66 | a header file that is part of the Library. You may convey such object 67 | code under terms of your choice, provided that, if the incorporated 68 | material is not limited to numerical parameters, data structure 69 | layouts and accessors, or small macros, inline functions and templates 70 | (ten or fewer lines in length), you do both of the following: 71 | 72 | a) Give prominent notice with each copy of the object code that the 73 | Library is used in it and that the Library and its use are 74 | covered by this License. 75 | 76 | b) Accompany the object code with a copy of the GNU GPL and this license 77 | document. 78 | 79 | 4. Combined Works. 80 | 81 | You may convey a Combined Work under terms of your choice that, 82 | taken together, effectively do not restrict modification of the 83 | portions of the Library contained in the Combined Work and reverse 84 | engineering for debugging such modifications, if you also do each of 85 | the following: 86 | 87 | a) Give prominent notice with each copy of the Combined Work that 88 | the Library is used in it and that the Library and its use are 89 | covered by this License. 90 | 91 | b) Accompany the Combined Work with a copy of the GNU GPL and this license 92 | document. 93 | 94 | c) For a Combined Work that displays copyright notices during 95 | execution, include the copyright notice for the Library among 96 | these notices, as well as a reference directing the user to the 97 | copies of the GNU GPL and this license document. 
98 | 99 | d) Do one of the following: 100 | 101 | 0) Convey the Minimal Corresponding Source under the terms of this 102 | License, and the Corresponding Application Code in a form 103 | suitable for, and under terms that permit, the user to 104 | recombine or relink the Application with a modified version of 105 | the Linked Version to produce a modified Combined Work, in the 106 | manner specified by section 6 of the GNU GPL for conveying 107 | Corresponding Source. 108 | 109 | 1) Use a suitable shared library mechanism for linking with the 110 | Library. A suitable mechanism is one that (a) uses at run time 111 | a copy of the Library already present on the user's computer 112 | system, and (b) will operate properly with a modified version 113 | of the Library that is interface-compatible with the Linked 114 | Version. 115 | 116 | e) Provide Installation Information, but only if you would otherwise 117 | be required to provide such information under section 6 of the 118 | GNU GPL, and only to the extent that such information is 119 | necessary to install and execute a modified version of the 120 | Combined Work produced by recombining or relinking the 121 | Application with a modified version of the Linked Version. (If 122 | you use option 4d0, the Installation Information must accompany 123 | the Minimal Corresponding Source and Corresponding Application 124 | Code. If you use option 4d1, you must provide the Installation 125 | Information in the manner specified by section 6 of the GNU GPL 126 | for conveying Corresponding Source.) 127 | 128 | 5. Combined Libraries. 
129 | 130 | You may place library facilities that are a work based on the 131 | Library side by side in a single library together with other library 132 | facilities that are not Applications and are not covered by this 133 | License, and convey such a combined library under terms of your 134 | choice, if you do both of the following: 135 | 136 | a) Accompany the combined library with a copy of the same work based 137 | on the Library, uncombined with any other library facilities, 138 | conveyed under the terms of this License. 139 | 140 | b) Give prominent notice with the combined library that part of it 141 | is a work based on the Library, and explaining where to find the 142 | accompanying uncombined form of the same work. 143 | 144 | 6. Revised Versions of the GNU Lesser General Public License. 145 | 146 | The Free Software Foundation may publish revised and/or new versions 147 | of the GNU Lesser General Public License from time to time. Such new 148 | versions will be similar in spirit to the present version, but may 149 | differ in detail to address new problems or concerns. 150 | 151 | Each version is given a distinguishing version number. If the 152 | Library as you received it specifies that a certain numbered version 153 | of the GNU Lesser General Public License "or any later version" 154 | applies to it, you have the option of following the terms and 155 | conditions either of that published version or of any later version 156 | published by the Free Software Foundation. If the Library as you 157 | received it does not specify a version number of the GNU Lesser 158 | General Public License, you may choose any version of the GNU Lesser 159 | General Public License ever published by the Free Software Foundation. 
160 | 161 | If the Library as you received it specifies that a proxy can decide 162 | whether future versions of the GNU Lesser General Public License shall 163 | apply, that proxy's public statement of acceptance of any version is 164 | permanent authorization for you to choose that version for the 165 | Library. 166 | -------------------------------------------------------------------------------- /ChangeLog: -------------------------------------------------------------------------------- 1 | 4.0 (UNRELEASED) 2 | 3 | - Enable standard UNIX quoting for `GLADTEX_ARGS` that allows passing 4 | arguments with spaces when GladTeX is used as a Pandoc filter. 5 | - Add --epub flag to produce HTML output more suitable for EPUBs. 6 | - Avoid option clashes with the xcolor package if extending the xcolor 7 | options using `-p`. Recommend `\PassOptionsToPackage`. 8 | - Add `-interactive=nonstopmode` to each LaTeX invocation to stop even if 9 | a file was not found. This is relevant when using e.g. `-p "\input{foo}" 10 | and foo.tex doesn't exist. 11 | - Rework excluded formula handling: 12 | - Remove parser for file that contains excluded formulas: the file is 13 | auto-generated by GladTeX and it is more reliable and easier to 14 | overwrite the generated file with an updated one, instead of parsing 15 | the old contents. 16 | - Restructure library with a cleaner formatter hierarchy. 
17 | 18 | 3.1 19 | 20 | - [source only] move gladtex.py to gleetex/main.py 21 | 22 | 3.0.1 23 | 24 | - fix AttributeError when specifying `-E` 25 | 26 | 27 | 3.0.0 28 | 29 | - new features and incompatible changes: 30 | - add `-P` command-line switch to be used as a Pandoc document filter, 31 | see 32 | - add environment variable `GLADTEX_ARGS` to pass command-line 33 | switches when used as pandocfilter where passing additional 34 | arguments is impossible 35 | - redefine colour handling: use xcolor package, therefore handling 36 | text and background colour the same way for both PNG and SVG 37 | - add SVG support for scalable images 38 | - use SVG output by default 39 | - gleetex.htmlhandling.HtmlImageFormatter: rename link_path to 40 | link_prefix 41 | - bug fixes: 42 | - correctly parse HTML5 file encoding declarations 43 | - add more exceptions to the unicode table for the unicode replacement 44 | mode (see -R) 45 | - treat -d as a relative path 46 | 47 | 2.3.1 Avoid useless spaces 48 | 49 | - When formula replacement with `-R` is requested, it could happen that 50 | additional spaces were inserted, even if not necessary. "für" would for 51 | instance become "f\"{u} r". Fixed. 52 | 53 | 2.3 Fix formula sizing 54 | 55 | - It seems as if 16px / 12pt were the default font size these days for 56 | browsers. Therefore, the default resolution has been set to 115 DPI. 57 | Furthermore, the DPI switch now accepts pt values for fontsizes and will 58 | calculate the corresponding DPI itself. 59 | - When the environment variable `DEBUG=1` is set, the full backtrace will be 60 | printed. 61 | - Extend unicode table creation script to allow blacklisting of certain 62 | commands. 
63 | 64 | 2.2.1 - fix handling of non-ascii alphabetical characters 65 | 66 | - replace characters with diacritics in the LaTeX source, but keep the 67 | unmodified character in image alt attribute (for better readability) 68 | 69 | 2.2 - make alternative text of formulas more readable 70 | 71 | - replace formatting commands in alt attribute; this shortens the 72 | formula and makes it more readable 73 | - replace unicode signs also in alt attribute (good for screen reader 74 | users and text-mode browsers) 75 | - recognize upper-case `ENV` attribute of `EQ` tag (so that e.g. 76 | displaymath is recognized correctly) 77 | 78 | 2.1.1 Bug Fix Release 79 | 80 | - treat eq element content as verbatim 81 | - decode HTML entities within formula tags 82 | 83 | 2.1 add support for unicode math with translation table 84 | - handle subprocess stdin and stdout encoding properly 85 | - set UTF-8 as encoding for all LaTeX documents 86 | - add -R option (replace non ascii characters) 87 | - formulas in .htex documents may now contain umlauts or unicode math 88 | characters; conversion will work without adjustments, only -R has to 89 | be specified 90 | - handle encoding better and more strictly for LaTeX 2E 91 | 92 | 2.0.1 Bug Fix Release 93 | - show user a meaningful error message if LaTeX or dvipng is missing 94 | - setup.py: build manual page, if pandoc present 95 | - freeze multiprocessing on Windows, to make executables distributable 96 | 97 | 2.0 - make GladTeX truly platform independent 98 | - add formula number in error output; makes tracking of formulas easier in 99 | error case 100 | - write man page 101 | - set css class correctly for display math formulas 102 | - HTML label/id generation: 103 | - do not create overlong id's 104 | - do only generate id's starting with an alphabetical character 105 | - squeeze multiple identical characters 106 | - reparse outsourced formulas correctly (was a mixture of formatted vs. 
107 | unformatted formulas) 108 | - do not use absolute links when operating on file which is not in current 109 | working directory 110 | - be more careful with backslashes vs. slashes 111 | - allow formulas consisting only of numbers (i.e. example calculations) by 112 | prefixing "form_" in front of the HTML id (id must start with a letter 113 | but may be followed by digits) 114 | - allow removal of unreadable caches with the `-n` switch (extend library 115 | with this functionality) 116 | - introduce `-m` switch to print the output in a less concise, but more 117 | machine-parseable format 118 | 119 | 1.6 - complete rewrite 120 | - rewrite GladTeX fully in Python 121 | - allows easy compilation into a binary for a specific platform 122 | - comes with a new library to use GladTeX functionality within other 123 | applications 124 | - fully unit-tested 125 | - enable piping support; GladTeX can read from stdin and write to stdout 126 | - drop -t switch; image is either transparent by default or has a background 127 | color which can be set with -b 128 | - drop -s switch 129 | - introduce -o (output) option 130 | - introduce new cache format containing version numbers; json, so 131 | interface to other programming languages 132 | 133 | 1.5 - Introduce options to make embedding of GladTeX easier 134 | - Try to parse LaTeX's error output and display it to help users to find the 135 | issue quicker and to make GladTeX better embeddable. 136 | - Add option to remove error log file, produced by LaTeX, automatically. 137 | - Rewrite some help messages. 138 | - Add signal handling to get meaningful error messages if GladTeX hangs. 139 | 140 | 1.4.2 - bug fix release 141 | - Add some eval's to cope with some failures. 142 | - Since there were some incompatibilities between Perl 5.10 and 5.18 in how 143 | the cache of the generated images is stored, GladTeX now removes this file 144 | along with all images starting with "eqn" and generates them again. 
145 | 146 | 1.4.1 - bug fix release 147 | - Remove desc.html if created and empty 148 | 149 | 1.4 - put LaTeX equations into alt tag for text-mode browsers and blind people 150 | (and disabled images) 151 | - If requested (-a), exclude equations longer than 80 characters in an extra 152 | file and make the equation image a link to the longer, excluded image 153 | alternative 154 | - eqn2img: patch to allow building on windows 155 | - Change build system from make to cmake 156 | - Refactored gladtex code a lot to allow the usage of "use strict/use 157 | warnings" 158 | - Fix bug where multiple equations couldn't be on a single (html) line 159 | - Rework manpage 160 | 161 | 1.3 - Un-escape common entities before processing equations 162 | - Update man page with CSS class options 163 | - Add support for setting the CSS class of images when the 164 | environment is "math" or "displaymath" 165 | - eqn2img: changed redirection syntax (from dvips to /dev/null) 166 | for portability 167 | - GladTeX: exit with status 1 when a closing EQ tag is missing 168 | - GladTeX: print error messages to stderr instead of stdout 169 | - Fix environment-passing to eqn2img 170 | - Add support for a "dpi" attribute on EQ tags to customize the 171 | DPI used for each equation 172 | 173 | 1.2 - Fixed a serious memory allocation error, pointed out by Eric J. 174 | Francois. Also fixed several leaks. 175 | - Added full alpha channel to PNG files (also suggested by Eric) 176 | - The -e option was ignored, fixed (pointed out by Andr\'e Schleife) 177 | - Added man page, contributed by Volker Schatz 178 | 179 | 1.1 - Portability fixes: Do not assume a specific location for perl 180 | (use "env" in the shebang line) and do not rely on the bash style 181 | "&>" redirection. 
182 | 183 | 1.0 - Image alignment workaround (most browsers interpret 184 | "ALIGN=MIDDLE" somewhat strangely, so it has been changed 185 | to "STYLE=vertical-align: -xx") 186 | - Added cache file, so that gladtex doesn't have to regenerate 187 | images for equations that haven't changed. 188 | - Added ENV option (as in ) to support environments 189 | other than "displaymath". 190 | - Bug fixes. 191 | 192 | 0.3 - Added BoundingBox workaround (dvips sometimes outputs wrong 193 | BoundingBox, for instance when using \mathbb{}) 194 | - Moved the whole "LaTeX eqn to image" conversion into the C code, 195 | turning the C program (renamed from pngmodify to eqn2img) into 196 | a standalone utility (e.g. echo '\sqrt{2}' | eqn2img -o 197 | eqn.png). 198 | - Added colour options (-c -b and -t). 199 | - Fixed bug causing segfault when adding space _above_ an image. 200 | - Fixed image reusing bug (in 0.2 image reuse didn't work across 201 | separate files when processing files outside startup cwd). 202 | - And some other minor bugs and cosmetic changes. 203 | - Makefile added to distribution 204 | 205 | 0.2 - First official release, completely rewritten code. 206 | 207 | 0.1 - Used only internally at the Dept. of Mathematics at the Univ. of Oslo 208 | July-August 1999 (for the project "Matteknekker'n") under the name 209 | htmleqn. 210 | # vim: set expandtab sts=4 ts=4 sw=4 expandtab: 211 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | GladTeX 2 | ======= 3 | 4 | GladTeX is a utility and library to display formulas on the Web and in 5 | HTML-based formats such as EPUB. Formulas are embedded within `<eq>` tags 6 | and 7 | converted automatically to a scalable SVG image using LaTeX. The images 8 | integrate seamlessly into the output documents, work with any browser and are 9 | accessible for visually impaired and blind users as well. 
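As a rough illustration of the input format, the sketch below pulls the LaTeX source out of `<eq>` elements with a regular expression. It is not GladTeX's own parser (that lives in `gleetex.parser`); `EQ_PATTERN`, `find_formulas` and the sample document are hypothetical names made up for this example, and the `env` attribute is taken from the ChangeLog entries about the `EQ` tag.

```python
import re

# Illustration only: a simplified stand-in for GladTeX's real parser.
# Matches <eq ...>formula</eq>, case-insensitively and across line breaks.
EQ_PATTERN = re.compile(r'<eq\b[^>]*>(.*?)</eq>', re.IGNORECASE | re.DOTALL)


def find_formulas(html):
    """Return the LaTeX source of every <eq>...</eq> element in `html`."""
    return [match.group(1).strip() for match in EQ_PATTERN.finditer(html)]


# Hypothetical input: one inline formula and one display-math formula.
document = (
    '<p>The roots are <eq>x = \\frac{-b \\pm \\sqrt{b^2 - 4ac}}{2a}</eq>, '
    'and <eq env="displaymath">e^{i\\pi} + 1 = 0</eq> is famous.</p>'
)

for number, formula in enumerate(find_formulas(document), start=1):
    print(number, formula)
```

In a real conversion GladTeX replaces each such element with an image and keeps the LaTeX source as alternative text; the sketch only shows the extraction step.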
10 | 11 | Features 12 | -------- 13 | 14 | - LaTeX-quality formulas with partial Unicode maths support 15 | - [Pandoc](http://pandoc.org) support to convert from any format with 16 | LaTeX formulas (Markdown, …) to any HTML-based format, e.g. EPUB 17 | - Cache formulas to speed up subsequent document conversion 18 | - Python library GleeTeX to embed into other applications or to tailor to a 19 | specific workflow 20 | - Cross-platform, written in Python, comes with Windows executables. 21 | 22 | License 23 | ------- 24 | 25 | - (C) 1999-2010 Martin G. Gulbrandsen (Perl version) 26 | - (C) 2011-2013 Jonathan Daugherty (especially release 1.3) (Perl version) 27 | - (C) 2013-2020 Sebastian Humenda (Python version) 28 | 29 | This program is distributed under the LGPL-3, or at your option, any later 30 | version of the license; for details see the accompanying file COPYING. 31 | 32 | The official project homepage is at <http://humenda.github.io/GladTeX> 33 | 34 | Installation 35 | ============ 36 | 37 | ### Debian/Ubuntu 38 | 39 | On all derivatives of Debian (such as Ubuntu or Mint), installing GladTeX is as 40 | easy as 41 | 42 | # apt-get install gladtex 43 | 44 | ### Windows 45 | 46 | If you want to use the program without the Python library, you should download a 47 | pre-compiled binary from . 48 | 49 | Just unzip the archive and move the files to a directory within `%PATH%`. 
50 | 51 | ### From Source 52 | 53 | The following is required for installing GladTeX: 54 | 55 | - Python >= 3.4 56 | - LaTeX (2e), dvisvgm (optionally dvipng for PNG output) 57 | - the LaTeX package preview.sty 58 | 59 | 60 | #### Debian / Ubuntu 61 | 62 | On Debian/Ubuntu systems the following command will satisfy the dependencies: 63 | 64 | # apt-get install python3-all texlive-fonts-recommended texlive-latex-recommended preview-latex-style dvipng 65 | 66 | The package can then be installed using 67 | 68 | # python3 setup.py install 69 | 70 | Note: If your system ships `python` as the command for Python 3, you have to use 71 | `python` in the above command instead. 72 | 73 | #### OS X 74 | 75 | You need to install a LaTeX distribution on your Mac. GladTeX was successfully 76 | run with [MacTeX](http://www.tug.org/mactex/). 77 | 78 | You can download a zip source archive from 79 | [GitHub](https://github.com/humenda/GladTeX) or use git: 80 | 81 | $ git clone https://github.com/humenda/GladTeX.git 82 | 83 | Use `cd` to change to the GladTeX source directory and issue 84 | 85 | $ python setup.py install 86 | 87 | 88 | 89 | Documentation 90 | ------------- 91 | 92 | Please use `man gladtex` for further instructions or have a look at the file 93 | [manpage.md](manpage.md). 94 | 95 | Contribute 96 | ---------- 97 | 98 | Contributions are welcome. Please use 99 | [PyFormat](https://pypi.org/project/pyformat/) to 100 | format the code. 101 | -------------------------------------------------------------------------------- /ToDo.FAQ: -------------------------------------------------------------------------------- 1 | A few items which could go into a FAQ: 2 | 3 | What do the error messages mean? 4 | Why is my formula positioned awkwardly within the text? (most probably displaymath instead of inline math) 5 | Why are limits over sums and similar not correctly set? 
(use displaymath, sometimes `\\limits`) 6 | Why do my special characters, such as (Unicode) math symbols or umlauts, not work? 7 | -------------------------------------------------------------------------------- /ToDo.md: -------------------------------------------------------------------------------- 1 | To Do 2 | ===== 3 | 4 | This list contains things to be implemented in GladTeX. If you have additions or 5 | even feel like implementing one of them, feel free to drop me an email: `shumenda |aT| 6 | gmx //dot-- de`. 7 | 8 | Uncategorized 9 | ------------- 10 | 11 | - introduce command line option which will check whether all formulas in a 12 | cache are used and if not, remove the formula (only useful for caches 13 | corresponding to a single document) 14 | 15 | Gettext 16 | ------- 17 | 18 | 19 | Gettext should be integrated to localize messages (especially errors). 20 | 21 | Compressed Cache 22 | ---------------- 23 | 24 | The cache stores the path, the formula and the positioning of an image. For 25 | large documents, this might be quite big, hence it makes sense to compress it. 26 | 27 | To make things easier, the cache should have a .gz extension. 28 | 29 | -------------------------------------------------------------------------------- /create_windows_distributions.py: -------------------------------------------------------------------------------- 1 | """This file builds windows distributions, zip files with GladTeX and all other 2 | files.""" 3 | import os 4 | import shutil 5 | import stat 6 | import sys 7 | import zipfile 8 | import gleetex 9 | 10 | 11 | def exec_setup_py(arg_string): 12 | """Execute `python setup.py` as a subprocess. 13 | 14 | Use Wine, if necessary. 
15 | """ 16 | ret = None 17 | if sys.platform.startswith('win'): 18 | ret = os.system('python setup.py ' + arg_string) 19 | else: 20 | if not shutil.which('wine'): 21 | print('Error: Wine is not installed, aborting…') 22 | sys.exit(5) 23 | ret = os.system('wine python setup.py ' + arg_string) 24 | if ret: 25 | if sys.platform.startswith('win'): 26 | print('Aborting at command `python setup.py %s`.' % arg_string) 27 | else: 28 | print('Aborting at command `wine python setup.py %s`.' % arg_string) 29 | sys.exit(7) 30 | 31 | 32 | def get_python_version(): 33 | """Return the python version as a string.""" 34 | import re 35 | import subprocess 36 | 37 | args = ['python', '--version'] 38 | if not sys.platform.startswith('win'): 39 | args = ['wine'] + args 40 | proc = subprocess.Popen(args, stdout=subprocess.PIPE) 41 | stdout = proc.communicate()[0].decode(sys.getdefaultencoding()) 42 | if proc.wait(): 43 | raise TypeError( 44 | 'Abnormal subprocess termination while querying python version.' 45 | ) 46 | return re.search(r'.*?(\d+\.\d+\.\d+)', stdout).groups()[0] 47 | 48 | 49 | def get_executable_name(label): 50 | """Construct the name of an executable.""" 51 | return 'gladtex-win64-%s-py_%s-%s.zip' % ( 52 | gleetex.VERSION, 53 | get_python_version(), 54 | label, 55 | ) 56 | 57 | 58 | def bundle_files(src, output_name): 59 | """Bundle the compiled binary files with README, ChangeLog and COPYING.""" 60 | if os.path.exists(output_name): 61 | shutil.rmtree(output_name) 62 | os.rename(src, output_name) 63 | # add README.first 64 | with open(os.path.join(output_name, 'README.first.txt'), 'w') as f: 65 | f.write('GladTeX for Windows\r\n===================\r\n\r\n') 66 | f.write( 67 | 'This program has been compiled with python 3.4.4. 
If you want to embed it in binary form with your binary python application, the version numbers HAVE TO match.\r\n' 68 | ) 69 | f.write( 70 | '\r\nFor more information, see the file README.md or http://humenda.github.io/GladTeX\r\n' 71 | ) 72 | 73 | # copy README and other files 74 | for file in ['README.md', 'COPYING', 'ChangeLog']: 75 | dest = os.path.join(output_name, file) 76 | # append .txt if the file name has no extension 77 | if '.' not in dest[-5:]: 78 | dest += '.txt' 79 | shutil.copy(file, dest) 80 | 81 | files = [ 82 | os.path.join(root, file) 83 | for root, _, files in os.walk(output_name) 84 | for file in files 85 | ] 86 | with zipfile.ZipFile(output_name + '.zip', 'w', zipfile.ZIP_DEFLATED) as z: 87 | for file in files: 88 | z.write(file) 89 | shutil.rmtree(output_name) 90 | 91 | 92 | class TemporaryBuildDirectory: 93 | """Context manager to guard the build process. 94 | 95 | Upon entering the context, the source is copied to a temporary 96 | directory and the program changes to this directory. After all build 97 | actions have been done, the output file is copied back to the 98 | original directory, the program resets the current working directory 99 | and deletes the temporary directory. 100 | """ 101 | 102 | def __init__(self, output_file_name): 103 | self.orig_cwd = os.getcwd() 104 | self.tmpdir = None 105 | self.output_file_name = output_file_name 106 | 107 | def __enter__(self): 108 | self.tmpdir = self.get_temp_directory() 109 | shutil.copytree(os.getcwd(), self.tmpdir) 110 | os.chdir(self.tmpdir) 111 | return self 112 | 113 | def __exit__(self, _a, _b, _c): 114 | os.chdir(self.orig_cwd) 115 | shutil.copy( 116 | os.path.join( 117 | self.tmpdir, self.output_file_name), self.output_file_name 118 | ) 119 | shutil.rmtree(self.tmpdir, onerror=self.__onerror) 120 | 121 | def get_temp_directory(self): 122 | """Find a temporary directory to work in. 
123 | 124 | The checks are done to find a directory which does not reside 125 | within the user's path, because py2exe includes absolute paths 126 | for python scripts (in their tracebacks). It is not desirable to 127 | show the whole world the directory layout of the computer where 128 | the source code was built on. 129 | """ 130 | tmp_base = None 131 | if os.path.exists('/tmp'): 132 | tmp_base = '/tmp' 133 | elif os.path.exists('\\temp'): 134 | tmp_base = '\\temp' 135 | elif os.path.exists('\\windows\\temp'): 136 | tmp_base = '\\windows\\temp' 137 | else: 138 | import tempfile 139 | 140 | tmp_base = tempfile.gettempdir() 141 | tmpdir = os.path.join(tmp_base, 'gladtex.build') 142 | if os.path.exists(tmpdir): 143 | shutil.rmtree(tmpdir, onerror=self.__onerror) 144 | return tmpdir 145 | 146 | def __onerror(self, func, path, exc_info): 147 | """Error handler for ``shutil.rmtree``. 148 | 149 | If the error is due to an access error (read only file) it attempts to 150 | add write permission and then retries. If the error is for another reason it re-raises the error. 151 | Usage : ``shutil.rmtree(path, onerror=onerror)``. 152 | """ 153 | if not os.access(path, os.W_OK): 154 | # Is the error an access error ? 
155 | os.chmod(path, stat.S_IWUSR) 156 | func(path) 157 | else: 158 | raise exc_info[1]  # exc_info is the (type, value, traceback) tuple passed by shutil.rmtree 159 | 160 | 161 | if __name__ == '__main__': 162 | with TemporaryBuildDirectory(get_executable_name('embeddable')) as tb: 163 | # build embeddable release, where all files are separate DLLs; if somebody 164 | # distributes a python app, these DLL files can be shared 165 | exec_setup_py('py2exe -c -O 2 -i gleetex --bundle-files 3') 166 | bundle_files('dist', os.path.splitext(tb.output_file_name)[0]) 167 | 168 | # create a stand-alone version of GladTeX 169 | with TemporaryBuildDirectory(get_executable_name('standalone')) as tb: 170 | exec_setup_py('py2exe -i gleetex -c -O 2 --bundle-files 1') 171 | bundle_files('dist', os.path.splitext(tb.output_file_name)[0]) 172 | -------------------------------------------------------------------------------- /examples/md2epub.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | """This demo script converts from Markdown to EPUB using GladTeX. It requires 3 | Pandoc for the conversion. 4 | 5 | Throughout this script, the abbreviation AST for Abstract Syntax Tree is 6 | used. 
7 | """ 8 | 9 | import json 10 | import os 11 | import shutil 12 | import subprocess 13 | import sys 14 | 15 | import gleetex 16 | 17 | 18 | def transform_ast(ast): 19 | # extract formulas from Pandoc document AST 20 | formulas = gleetex.pandoc.extract_formulas(ast) 21 | # converter using a cache, avoids converting the same formula twice 22 | conv = gleetex.cachedconverter.CachedConverter('.', True, encoding='UTF-8') 23 | # automatically handle unicode 24 | conv.set_replace_nonascii(True) 25 | # go parallel 26 | conv.convert_all('.', formulas) 27 | 28 | # a converted image has information like image depth and height, adjust 29 | # data structure for write-back 30 | formulas = [conv.get_data_for(eqn, style) for _p, style, eqn in formulas] 31 | # get a formatter instance 32 | with gleetex.htmlhandling.HtmlImageFormatter('.') as img_fmt: 33 | # non-ascii sequences will be replaced in the alternative text 34 | img_fmt.set_replace_nonascii(True) 35 | # this alters the AST reference, so no return value required 36 | gleetex.pandoc.replace_formulas_in_ast( 37 | img_fmt, ast['blocks'], formulas) 38 | 39 | 40 | def cleanup(path): 41 | # remove images and cache, relevant data is included within the EPUB 42 | for file in os.listdir(path): 43 | if file.endswith('.png') or file.endswith('.cache'): 44 | os.remove(os.path.join(path, file)) 45 | 46 | 47 | def main(): 48 | for prog in ('pandoc', 'gladtex'): 49 | if not shutil.which(prog): 50 | sys.stderr.write( 51 | 'This script requires %s, please install it and rerun this script.\n' 52 | % prog 53 | ) 54 | sys.exit(1) 55 | 56 | usage = False 57 | if len(sys.argv) < 2: 58 | print('Missing command arguments.') 59 | usage = True 60 | elif len(sys.argv) > 2 or (len(sys.argv) == 2 and not os.path.exists(sys.argv[1])): 61 | print('Exactly one input path required') 62 | usage = True 63 | if usage: 64 | print( 65 | '%s \n\nConvert given file to epub using GladTeX.' 
% sys.argv[0] 66 | ) 67 | sys.exit(0) 68 | 69 | inputfile = sys.argv[1] 70 | outputfile = '%s.epub' % os.path.splitext(inputfile)[0] 71 | # get the document AST 72 | proc = subprocess.Popen( 73 | ['pandoc', '-t', 'json', inputfile], stdout=subprocess.PIPE) 74 | ast = json.loads(proc.communicate()[0].decode(sys.getdefaultencoding())) 75 | if proc.wait() != 0: 76 | sys.exit(2) 77 | 78 | # the actual GleeTeX calls are here 79 | transform_ast(ast) 80 | 81 | # write back to stdin of pandoc 82 | proc = subprocess.Popen( 83 | ['pandoc', '-o', outputfile, '-f', 'json', '-t', 'epub'], stdin=subprocess.PIPE 84 | ) 85 | proc.communicate(json.dumps(ast).encode(sys.getdefaultencoding())) 86 | if proc.wait(): 87 | sys.exit(2) 88 | cleanup('.') 89 | 90 | 91 | if __name__ == '__main__': 92 | main() 93 | -------------------------------------------------------------------------------- /gleetex/__init__.py: -------------------------------------------------------------------------------- 1 | # (c) 2013-2021 Sebastian Humenda 2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for 3 | # more details. 4 | from . import caching 5 | from . import cachedconverter 6 | from . import htmlhandling 7 | from . import image 8 | from . import pandoc 9 | from . import parser 10 | from . import sink 11 | from . import typesetting 12 | 13 | VERSION = '3.1.0' 14 | 15 | __all__ = [ 16 | 'caching', 17 | 'cachedconverter', 18 | 'htmlhandling', 19 | 'image', 20 | 'pandoc', 21 | 'parser', 22 | 'sink', 23 | 'typesetting', 24 | 'unicode', 25 | 'VERSION', 26 | ] 27 | -------------------------------------------------------------------------------- /gleetex/__main__.py: -------------------------------------------------------------------------------- 1 | # (c) 2013-2021 Sebastian Humenda 2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for 3 | # more details. 
4 | import argparse 5 | import multiprocessing 6 | import os 7 | import shlex 8 | import posixpath 9 | import sys 10 | import textwrap 11 | 12 | from . import ( 13 | caching, 14 | cachedconverter, 15 | htmlhandling, 16 | pandoc, 17 | parser, 18 | sink, 19 | typesetting, 20 | VERSION, 21 | ) 22 | 23 | 24 | class HelpfulCmdParser(argparse.ArgumentParser): 25 | """This variant of arg parser always prints the full help whenever an error 26 | occurs.""" 27 | 28 | def error(self, message): 29 | sys.stderr.write('error: %s\n' % message) 30 | self.print_help() 31 | sys.exit(2) 32 | 33 | 34 | class Main: 35 | """This class parses command line arguments and deals with the conversion. 36 | 37 | Only the run method needs to be called. 38 | """ 39 | 40 | def __init__(self): 41 | self.__encoding = 'utf-8' 42 | 43 | def _parse_args(self, args): 44 | """Parse command line arguments and return option instance.""" 45 | epilog = 'GladTeX %s, http://humenda.github.io/GladTeX' % VERSION 46 | description = ( 47 | 'GladTeX is a preprocessor that enables the use of LaTeX' 48 | ' maths within HTML files. The maths, embedded in ... ' 49 | 'tags, as if within \\(..\\) in LaTeX (or $...$ in TeX), is fed ' 50 | 'through latex and replaced by images.\n\nPlease also see the ' 51 | 'documentation on the web or from the manual page for more ' 52 | 'information, especially on environment variables.' 
53 | ) 54 | cmd = HelpfulCmdParser(epilog=epilog, description=description) 55 | cmd.add_argument( 56 | '-a', 57 | default=sink.EXCLUSION_FILE_NAME, 58 | dest='exclusionfile', 59 | help='Path to the file to which excluded formulas are written: ' 60 | + 'formulas which are too long for the alt attribute are saved in a ' 61 | + 'single separate file and the images link to it', 62 | ) 63 | cmd.add_argument( 64 | '-b', 65 | dest='background_color', 66 | help=( 67 | 'Set background color for resulting images ' 68 | '(default transparent, use hex)' 69 | ), 70 | ) 71 | cmd.add_argument( 72 | '-c', 73 | dest='foreground_color', 74 | help=('Set foreground color for resulting images (default ' '000000, hex)'), 75 | ) 76 | cmd.add_argument( 77 | '-d', 78 | default='', 79 | dest='img_directory', 80 | help='Directory in which to' 81 | + ' store generated images (relative to the output file)', 82 | ) 83 | cmd.add_argument( 84 | '-e', 85 | dest='latex_maths_env', 86 | help='Set custom maths environment to surround the formula' 87 | + ' (e.g. 
flalign)', 88 | ) 89 | cmd.add_argument( 90 | '-f', 91 | metavar='SIZE', 92 | dest='fontsize', 93 | default=12, 94 | help='Set font size in pt (default 12)', 95 | ) 96 | cmd.add_argument( 97 | '-E', 98 | dest='encoding', 99 | default=None, 100 | help='Overwrite encoding to use (default UTF-8)', 101 | ) 102 | cmd.add_argument( 103 | '--epub', 104 | dest='is_epub', 105 | default=False, 106 | action='store_true', 107 | help='Optimise output for epub, for instance round height/width of ' 108 | 'images', 109 | ) 110 | cmd.add_argument( 111 | '-i', 112 | metavar='CLASS', 113 | dest='inlinemath', 114 | help="CSS class to assign to inline math (default: 'inlinemath')", 115 | ) 116 | cmd.add_argument( 117 | '-l', 118 | metavar='CLASS', 119 | dest='displaymath', 120 | help="CSS class to assign to block-level math (default: 'displaymath')", 121 | ) 122 | cmd.add_argument( 123 | '-K', 124 | dest='keep_latex_source', 125 | action='store_true', 126 | default=False, 127 | help='keep LaTeX file(s) when converting formulas (useful for debugging)', 128 | ) 129 | cmd.add_argument( 130 | '-m', 131 | dest='machinereadable', 132 | action='store_true', 133 | default=False, 134 | help='Print output in machine-readable format (less concise, better parseable)', 135 | ) 136 | cmd.add_argument( 137 | '-n', 138 | action='store_true', 139 | dest='notkeepoldcache', 140 | help=( 141 | 'Purge unreadable caches along with all eqn*.png files. ' 142 | 'Caches can be unreadable if the used GladTeX version is ' 143 | 'incompatible. If this option is unset, GladTeX will ' 144 | 'simply fail when the cache is unreadable.' 
145 | ), 146 | ) 147 | cmd.add_argument( 148 | '-o', 149 | metavar='FILENAME', 150 | dest='output', 151 | help=( 152 | "Set output file name; '-' will print text to stdout (by " 153 | 'default input file name is used and .htex extension changed ' 154 | 'to .html)' 155 | ), 156 | ) 157 | cmd.add_argument( 158 | '-p', 159 | metavar='LATEX_STATEMENT', 160 | dest='preamble', 161 | help='Add given LaTeX code to the preamble of the LaTeX ' 162 | + 'document that is used to generate the embedded images. ' 163 | + 'In order to add the contents of a file to the preamble, ' 164 | + 'use `-p "\\input{FILE}"`.', 165 | ) 166 | cmd.add_argument( 167 | '-P', 168 | dest='pandocfilter', 169 | action='store_true', 170 | help='Use GladTeX as a Pandoc filter: read a Pandoc JSON AST ' 171 | 'from stdin, convert the images, change math blocks to ' 172 | 'images and write JSON to stdout; ' 173 | 'see the man page on how to pass args to GladTeX in this mode', 174 | ) 175 | cmd.add_argument( 176 | '--png', 177 | action='store_true', 178 | dest='png', 179 | help='Use PNG instead of SVG for images', 180 | ) 181 | cmd.add_argument( 182 | '-r', 183 | '--resolution', 184 | metavar='DPI', 185 | dest='dpi', 186 | default=None, 187 | help=( 188 | 'Set resolution in DPI, only available if PNG output ' 189 | 'selected; also see `-f`' 190 | ), 191 | ) 192 | cmd.add_argument( 193 | '-R', 194 | action='store_true', 195 | dest='replace_nonascii', 196 | default=False, 197 | help='Replace non-ascii characters in formulas ' 198 | 'with their LaTeX commands', 199 | ) 200 | cmd.add_argument( 201 | '-u', 202 | metavar='URL', 203 | dest='url', 204 | help='URL to image files (relative links are default)', 205 | ) 206 | cmd.add_argument( 207 | 'input', 208 | help='Input .htex file with LaTeX ' 209 | + 'formulas (if omitted or -, stdin will be read)', 210 | ) 211 | return cmd.parse_args(args) 212 | 213 | def exit(self, text, status): 214 | """Exit function. 
215 | 216 | Could be used to register any clean up action. 217 | """ 218 | sys.stderr.write(text) 219 | if not text.endswith('\n'): 220 | sys.stderr.write('\n') 221 | sys.exit(status) 222 | 223 | def validate_options(self, opts): 224 | """Validate certain arguments supplied on the command line. 225 | 226 | The user will get a (hopefully) helpful error message if they 227 | gave an invalid parameter. 228 | """ 229 | if opts.fontsize and opts.dpi: 230 | print("Options -f and -r can't be used at the same time.") 231 | sys.exit(14) 232 | if opts.dpi and not opts.png: 233 | print(('Impossible to set resolution when using SVG as output, ' 'try -f')) 234 | sys.exit(14) 235 | 236 | def get_input_output(self, options): 237 | """Determine whether GladTeX is reading from stdin/file, writing to 238 | stdout/file and determine base_directory if files are in another 239 | directory. 240 | 241 | If no output file name is given and there is an input file to 242 | read from, output is written to a file ending in .html instead 243 | of .htex. The returned document is either string or bytes, the 244 | latter if the encoding is unknown. 245 | """ 246 | data = None 247 | output = options.output or '-' 248 | if options.input == '-': 249 | data = sys.stdin.read() 250 | else: 251 | try: 252 | # if an encoding was specified or GladTeX runs as a pandoc filter, 253 | # read the document as text (the pandoc filter always uses UTF-8) 254 | if options.encoding or options.pandocfilter: 255 | encoding = 'UTF-8' if options.pandocfilter else options.encoding 256 | with open(options.input, encoding=encoding) as f: 257 | data = f.read() 258 | else: # read as binary and guess from HTML meta charset 259 | with open(options.input, 'rb') as file: 260 | data = file.read() 261 | except UnicodeDecodeError as e: 262 | self.exit( 263 | ( 264 | f'Error while reading from {options.input}: {e}\nProbably this ' 265 | 'file has a different encoding, try specifying -E.' 
266 | ), 267 | 88, 268 | ) 269 | except IsADirectoryError: 270 | self.exit(f'Error: cannot open {options.input} for reading: is a directory.', 19) 271 | except FileNotFoundError: 272 | self.exit(f'Error: file {options.input} not found.', 20) 273 | 274 | # check which output file name to use 275 | base_path = '' 276 | if options.output: 277 | base_path = os.path.dirname(options.output) 278 | elif options.input != '-': 279 | output = os.path.splitext(options.input)[0] + '.html' 280 | base_path = os.path.dirname(options.input) 281 | 282 | if base_path: # if finally a basepath found:, strip \\ if on Windows 283 | base_path = posixpath.join(*(base_path.split('\\'))) 284 | # the basepath needs to be relative to the output file 285 | return (data, base_path, output) 286 | 287 | def run(self, args): 288 | options = self._parse_args(args[1:]) 289 | self.validate_options(options) 290 | self.__encoding = options.encoding 291 | fmt = 'pandocfilter' if options.pandocfilter else 'html' 292 | doc, base_path, output = self.get_input_output(options) 293 | try: 294 | # doc is either a list of raw HTML chunks and formulas or a tuple of 295 | # (document AST, list of formulas) if options.pandocfilter 296 | self.__encoding, doc = parser.parse_document(doc, fmt) 297 | except parser.ParseException as e: 298 | input_fn = 'stdin' if options.input == '-' else options.input 299 | self.exit(f'Error while parsing {input_fn}: {e}', 5) 300 | 301 | processed = self.convert_images( 302 | doc, base_path, options.img_directory, options) 303 | img_fmt = htmlhandling.HtmlImageFormatter( 304 | base_path=os.path.join(base_path, options.img_directory), 305 | link_prefix=options.url, 306 | exclusion_file_path=options.exclusionfile, 307 | is_epub=options.is_epub, 308 | ) 309 | if options.replace_nonascii: 310 | img_fmt.set_replace_nonascii(True) 311 | if options.url: 312 | img_fmt.set_url(options.url) 313 | if options.inlinemath: 314 | img_fmt.set_inline_math_css_class(options.inlinemath) 315 | if 
options.displaymath: 316 | img_fmt.set_display_math_css_class(options.displaymath) 317 | 318 | # pass formatter to document sinks; the formatter will accumulate 319 | # formulas that were too long to write them out later 320 | with ( 321 | sys.stdout if output == '-' else open( 322 | output, 'w', encoding=self.__encoding) 323 | ) as file: 324 | if options.pandocfilter: 325 | pandoc.write_pandoc_ast(file, processed, img_fmt) 326 | else: 327 | htmlhandling.write_html(file, processed, img_fmt) 328 | # ToDo: make sink type an argument 329 | sink_type = sink.SinkType.html_file 330 | try: 331 | sink.EXCLUSION_FORMULA_SINKS[sink_type]( 332 | img_fmt.get_exclusion_file_path(), img_fmt.get_excluded()) 333 | except KeyError: 334 | raise NotImplementedError() from None 335 | 336 | def convert_images(self, parsed_document, base_path, img_dir, options): 337 | """Convert all formulas to images and store file path and equation in a 338 | list to be processed later on.""" 339 | base_path = '' if not base_path or base_path == '.' else base_path 340 | img_dir = '' if not img_dir or img_dir == '.' 
else img_dir 341 | result = [] 342 | try: 343 | conv = cachedconverter.CachedConverter( 344 | base_path, 345 | not options.notkeepoldcache, 346 | encoding=self.__encoding, 347 | img_dir=img_dir, 348 | ) 349 | except caching.JsonParserException as e: 350 | self.exit(e.args[0], 78) 351 | 352 | self.set_options(conv, options) 353 | if options.pandocfilter: 354 | formulas = parsed_document[1] 355 | else: # HTML chunks from EqnParser 356 | formulas = [ 357 | c for c in parsed_document if isinstance(c, (tuple, list))] 358 | try: 359 | conv.convert_all(formulas) 360 | except cachedconverter.ConversionException as e: 361 | self.emit_latex_error( 362 | e, options.machinereadable, options.replace_nonascii) 363 | 364 | if options.pandocfilter: 365 | # return (ast, formulas), just with formulas being replaced with the 366 | # conversion data 367 | return ( 368 | parsed_document[0], 369 | [conv.get_data_for(eqn, style) for _p, style, eqn in formulas], 370 | ) 371 | for chunk in parsed_document: 372 | # output of EqnParser: list-alike is formula, str is raw HTML 373 | if isinstance(chunk, (tuple, list)): 374 | _p, displaymath, formula = chunk 375 | try: 376 | result.append(conv.get_data_for(formula, displaymath)) 377 | except KeyError as e: 378 | # formula is usually tuple(str, bool) 379 | formula = e.args[0] 380 | if isinstance(formula, (list, tuple)): 381 | formula = e.args[0][0] # ignore bool(displaymath) 382 | raise KeyError( 383 | ( 384 | "formula '{}' not found; that means it was " 385 | 'not converted which should usually not happen.' 
386 | ).format(formula) 387 | ) from e 388 | else: 389 | result.append(chunk) 390 | return result 391 | 392 | def set_options(self, conv, options): 393 | """Apply options from command line parser to the converter.""" 394 | # set options 395 | options_to_query = [ 396 | 'preamble', 397 | 'latex_maths_env', 398 | 'png', 399 | 'keep_latex_source', 400 | 'foreground_color', 401 | 'background_color', 402 | 'is_epub', 403 | ] 404 | for option_str in options_to_query: 405 | option = getattr(options, option_str) 406 | if option: 407 | if option in ('True', 'False', 'false', 'true'): 408 | option = option.lower() == 'true' # bool('False') would be True 409 | conv.set_option(option_str, option) 410 | if options.dpi: 411 | conv.set_option('dpi', float(options.dpi)) 412 | elif options.fontsize: 413 | conv.set_option('fontsize', options.fontsize) 414 | if options.replace_nonascii: 415 | conv.set_replace_nonascii(True) 416 | 417 | def emit_latex_error(self, err, machine_readable, escape): 418 | """Format a LaTeX error in a meaningful way. 419 | 420 | The argument escape specifies whether the -R switch has been 421 | passed. If the pandocfilter mode is active, formula positions 422 | will be omitted; this makes the code more complex. 423 | """ 424 | if 'DEBUG' in os.environ and os.environ['DEBUG'] == '1': 425 | raise err 426 | escaped = err.formula 427 | if escape: 428 | escaped = typesetting.escape_unicode_maths(err.formula) 429 | msg = None 430 | additional = '' 431 | if 'Package inputenc' in err.args[0]: 432 | additional += ( 433 | 'Add the switch `-R` to automatically replace unicode ' 434 | 'characters with LaTeX command sequences.' 
435 | ) 436 | if machine_readable: 437 | msg = 'Number: {}\nFormula: {}{}\nMessage: {}'.format( 438 | err.formula_count, 439 | err.formula, 440 | ( 441 | '' 442 | if escaped == err.formula 443 | else '\nLaTeXified formula: %s' % escaped 444 | ), 445 | err.cause, 446 | ) 447 | if err.src_line_number and err.src_pos_on_line: 448 | msg = ('Line: {}, {}\n' + msg).format( 449 | err.src_line_number, err.src_pos_on_line 450 | ) 451 | if additional: 452 | msg += '; ' + additional 453 | else: 454 | formula = ' ' + err.formula.replace('\n', '\n ') 455 | escaped = ( 456 | ' ' + escaped.replace('\n', '\n ') 457 | if escaped != err.formula 458 | else '' 459 | ) 460 | msg = 'Error while converting formula %d' % err.formula_count 461 | if err.src_line_number and err.src_pos_on_line: 462 | msg = msg.rstrip() + ' at line %d, %d:\n' % ( 463 | err.src_line_number, 464 | err.src_pos_on_line, 465 | ) 466 | msg += '%s%s\n%s' % ( 467 | formula, 468 | ( 469 | '' 470 | if not escaped or escaped == err.formula 471 | else '\nFormula without unicode symbols:\n%s' % escaped 472 | ), 473 | err.cause, 474 | ) 475 | if additional: 476 | 477 | msg += ' undefined.\n' + \ 478 | '\n'.join(textwrap.wrap(additional, 80)) 479 | self.exit(msg, 91) 480 | 481 | 482 | def main(): 483 | """Entry point for setuptools.""" 484 | # enable multiprocessing on Windows, see python docs 485 | multiprocessing.freeze_support() 486 | m = Main() 487 | # run as pandoc filter? 
488 | args = sys.argv[1:] # fallback if no environment variable set 489 | if 'GLADTEX_ARGS' in os.environ: 490 | args = shlex.split(os.environ['GLADTEX_ARGS']) 491 | if '-P' not in args: 492 | args = ['-P'] + args 493 | m.run([sys.argv[0]] + args) 494 | 495 | 496 | if __name__ == '__main__': 497 | main() 498 | -------------------------------------------------------------------------------- /gleetex/cachedconverter.py: -------------------------------------------------------------------------------- 1 | # (c) 2013-2022 Sebastian Humenda 2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for 3 | # more details. 4 | """ 5 | A preconfigured image converter that caches conversion results. 6 | 7 | This converter with caching ability is less flexible than using the image 8 | converter directly, but automatically uses a cache if available to avoid 9 | conversion if the formula image is already present.""" 10 | 11 | import concurrent.futures 12 | import multiprocessing 13 | import os 14 | import subprocess 15 | 16 | from . import caching, image, typesetting 17 | from .caching import normalize_formula 18 | from .image import Format 19 | 20 | 21 | class ConversionException(Exception): 22 | """This exception is raised whenever a problem occurs during conversion. 23 | 24 | Example: 25 | c = ConversionException("cause", "\\tau", 5, 10, 38) 26 | assert c.cause == "cause" 27 | assert c.formula == '\\tau' 28 | assert c.src_line_number == 10 # line number in source document (counting from 1) 29 | assert c.src_pos_on_line == 38 # position of formula in source line, counting from 1 30 | assert c.formula_count == 5 # fifth formula in document (starting from 1) 31 | """ 32 | 33 | # mind your own business mr. 
pylint: 34 | # pylint: disable=too-many-arguments 35 | def __init__( 36 | self, cause, formula, formula_count, src_line_number=None, src_pos_on_line=None 37 | ): 38 | # provide a default error message 39 | if src_line_number and src_pos_on_line: 40 | super().__init__( 41 | 'LaTeX failed at formula line {}, {}, no. {}: {}'.format( 42 | src_line_number, src_pos_on_line, formula_count, cause 43 | ) 44 | ) 45 | else: 46 | super().__init__( 47 | 'LaTeX failed at formula no. {}: {}'.format( 48 | formula_count, cause) 49 | ) 50 | # provide attributes for upper level error handling 51 | self.cause = cause 52 | self.formula = formula 53 | self.src_line_number = src_line_number 54 | self.src_pos_on_line = src_pos_on_line 55 | self.formula_count = formula_count 56 | 57 | 58 | class CachedConverter: 59 | """Convert formulas to images. 60 | 61 | Cache the resulting images to reuse those for subsequent runs or for 62 | recurring instances in the same document. 63 | 64 | c = CachedConverter(base_path) 65 | c.convert_all(formulas) # tuples of (pos, displaymath, formula) 66 | data = c.get_data_for(formula, displaymath) 67 | ... 68 | 69 | The formula is either converted or retrieved from a cache in the same 70 | directory as the images. 71 | 72 | :param base_path directory of the output HTML file; link references in the 73 | HTML document will link relative to it 74 | :param keep_old_cache If an existing cache cannot be read (incompatible 75 | GladTeX version, ...) and the flag is set, the program will simply 76 | crash and tell the user to remove the cache (default). If set to False, 77 | the program will instead remove the cache and all eqn* files and 78 | recreate the cache. 
79 | :param encoding The encoding for the LaTeX document, default None 80 | :param img_dir directory for images (default ., equivalent to base_path) 81 | For example "images" would put it in `base_path`/images and "../img" 82 | would put it in "base_path/../img" 83 | """ 84 | 85 | GLADTEX_CACHE_FILE_NAME = 'gladtex.cache' 86 | 87 | def __init__(self, base_path, keep_old_cache=True, encoding=None, img_dir=''): 88 | empty_path = lambda p: ('' if not p or p.strip(os.sep) == '.' else p) 89 | self.__output_path = empty_path(base_path) # path for converted document 90 | self.__img_dir = empty_path(img_dir) # relative to base_path 91 | # cache path is **relative** to base_path 92 | cache_path = os.path.join( 93 | self.__img_dir, CachedConverter.GLADTEX_CACHE_FILE_NAME 94 | ) 95 | self.__is_epub = False 96 | self.__cache = caching.ImageCache( 97 | cache_path, 98 | keep_old_cache=keep_old_cache, 99 | base_path=empty_path(self.__output_path), 100 | ) 101 | self.__converter = None 102 | self.__options = { 103 | 'dpi': None, 104 | 'transparency': None, 105 | 'fontsize': None, 106 | 'background_color': None, 107 | 'foreground_color': None, 108 | 'preamble': None, 109 | 'latex_maths_env': None, 110 | 'keep_latex_source': False, 111 | 'png': False, 112 | 'is_epub': False, 113 | } 114 | self.__encoding = encoding 115 | self.__replace_nonascii = False 116 | 117 | def set_option(self, option, value): 118 | """Set one of the options accepted for gleetex.image.Tex2img. 119 | 120 | It is a proxy function. `option` must be one of dpi, fontsize, 121 | transparency, background_color, foreground_color, preamble, 122 | latex_maths_env, keep_latex_source, png, is_epub. 123 | """ 124 | if option not in self.__options: 125 | raise ValueError( 126 | 'Option must be one of ' + ', '.join(self.__options.keys()) 127 | ) 128 | self.__options[option] = value 129 | 130 | def set_replace_nonascii(self, flag): 131 | """If set, GladTeX will convert all non-ascii characters to LaTeX 132 | commands. 
133 | 134 | This setting is passed through to typesetting.LaTeXDocument. 135 | """ 136 | self.__replace_nonascii = flag 137 | 138 | def convert_all(self, formulas): 139 | """convert_all(formulas) Convert all formulas concurrently using 140 | self.__convert. 141 | 142 | Each element of `formulas` must be a tuple (pos, displaymath, 143 | formula). Formulas already contained in the cache are not 144 | converted. 145 | """ 146 | formulas_to_convert = self._get_formulas_to_convert(formulas) 147 | if formulas_to_convert: 148 | self.__converter = image.Tex2img( 149 | Format.Png if self.__options['png'] else Format.Svg 150 | ) 151 | # apply configured image output options 152 | for option, value in self.__options.items(): 153 | if value and hasattr(self.__converter, 'set_' + option): 154 | if isinstance(value, str): # only try string -> number 155 | try: # some values are numbers 156 | value = float(value) 157 | except ValueError: 158 | pass 159 | getattr(self.__converter, 'set_' + option)(value) 160 | self._convert_concurrently(formulas_to_convert) 161 | 162 | def _get_formulas_to_convert(self, formulas): 163 | """Build up a pipeline (list) of formulas for conversion. 
164 | Formulas that are already in the cache or occur twice in the pipeline are dropped.""" 165 | pipeline = [] # find as many file names as equations 166 | file_ext = Format.Png.value if self.__options['png'] else Format.Svg.value 167 | eqn_path = lambda x: os.path.join(self.__img_dir, 'eqn%03d.%s' % (x, file_ext)) 168 | abs_eqn_path = lambda x: os.path.join(self.__output_path, eqn_path(x)) 169 | 170 | # is (formula, display_math) already in the list of formulas to convert; 171 | # displaymath is important since formulas look different in inline maths 172 | formula_was_converted = lambda f, dsp: \ 173 | (normalize_formula(f), dsp) in ((normalize_formula(u[0]), u[3]) for u in pipeline) 174 | # find enough free file names 175 | file_name_count = 0 176 | used_file_names = [] # track which file names have been assigned 177 | for formula_count, (pos, dsp, formula) in enumerate(formulas): 178 | # ToDo: this belongs in the cache 179 | if not self.__cache.contains(formula, dsp) and not formula_was_converted( 180 | formula, dsp 181 | ): 182 | while ( 183 | os.path.exists(abs_eqn_path(file_name_count)) 184 | or eqn_path(file_name_count) in used_file_names 185 | ): 186 | file_name_count += 1 187 | used_file_names.append(eqn_path(file_name_count)) 188 | pipeline.append( 189 | (formula, pos, eqn_path(file_name_count), dsp, formula_count + 1) 190 | ) 191 | return pipeline 192 | 193 | def _convert_concurrently(self, formulas_to_convert): 194 | """The actual concurrent conversion process. 195 | 196 | Method is intended to be called from convert_all(). 
197 | """ 198 | imgdir_full = os.path.join(self.__output_path, self.__img_dir) 199 | if imgdir_full and not os.path.exists(imgdir_full): 200 | # create directory *before* it is required in the concurrent 201 | # formula creation step 202 | os.makedirs(imgdir_full) 203 | 204 | thread_count = int(multiprocessing.cpu_count() * 2) 205 | # convert missing formulas 206 | with concurrent.futures.ThreadPoolExecutor( 207 | max_workers=thread_count 208 | ) as executor: 209 | # start conversion and mark each thread with its formula, position 210 | # in the source file and formula_count (index into a global list of 211 | # formulas) 212 | jobs = { 213 | executor.submit(self.__convert, eqn, path, dsp): (eqn, pos, count) 214 | for (eqn, pos, path, dsp, count) in formulas_to_convert 215 | } 216 | error_occurred = None 217 | for future in concurrent.futures.as_completed(jobs): 218 | # cancel all pending requests 219 | if error_occurred and not future.done(): 220 | future.cancel() 221 | continue 222 | formula, pos_in_src, formula_count = jobs[future] 223 | error_occurred = self._handle_job_output(future, formula, pos_in_src, formula_count) 224 | # pylint: disable=raising-bad-type 225 | if error_occurred: 226 | raise error_occurred 227 | 228 | def _handle_job_output(self, future, formula, pos_in_src, formula_count): 229 | """Process the output as produced by each conversion future. 
230 | Handle the error case by signalling the error to end all other 231 | conversions.""" 232 | try: 233 | data = future.result() 234 | except subprocess.SubprocessError as e: 235 | # retrieve the position (line, pos on line) in the source document 236 | # from original formula list 237 | if pos_in_src: # missing for the pandocfilter case 238 | pos_in_src = [p + 1 for p in pos_in_src] # line/pos count from 1 239 | self.__cache.write() # write back cache with valid entries 240 | if pos_in_src: # position known, i.e. not the pandocfilter case 241 | return ConversionException( 242 | str(e.args[0]), 243 | formula, 244 | formula_count, 245 | pos_in_src[0], 246 | pos_in_src[1], 247 | ) 248 | else: 249 | return ConversionException( 250 | str(e.args[0]), formula, formula_count 251 | ) 252 | else: 253 | self.__cache.add_formula( 254 | formula, data['pos'], data['path'], data['displaymath'] 255 | ) 256 | self.__cache.write() 257 | 258 | def __convert(self, formula, img_path, displaymath=False): 259 | """convert(formula, img_path, displaymath=False) Convert given formula 260 | with displaymath/inlinemath. This method wraps the formula in a tex 261 | document, executes all the steps to produce an image and returns the 262 | positioning information for the HTML output. It does not check the 263 | cache. 
264 | 265 | :param formula formula to convert 266 | :param img_path image output path (relative to the configured base_path, 267 | see __init__) 268 | :param displaymath whether or not to use displaymath during the conversion 269 | :return dictionary with position (pos), image path (path) and formula 270 | style (displaymath, boolean); the dictionary keys are given in 271 | parentheses 272 | """ 273 | latex = typesetting.LaTeXDocument(formula) 274 | latex.set_displaymath(displaymath) 275 | 276 | def set(opt, setter): 277 | if self.__options[opt]: 278 | getattr(latex, 'set_' + setter)(self.__options[opt]) 279 | 280 | set('preamble', 'preamble_string') 281 | set('latex_maths_env', 'latex_environment') 282 | set('background_color', 'background_color') 283 | set('foreground_color', 'foreground_color') 284 | if self.__encoding: 285 | latex.set_encoding(self.__encoding) 286 | if self.__replace_nonascii: 287 | latex.set_replace_nonascii(True) 288 | # dvipng needs the additional indication of transparency (enabled by 289 | # default) when setting a background colour 290 | if self.__options['background_color']: 291 | self.__converter.set_transparency(False) 292 | pos = self.__converter.convert( 293 | latex, os.path.join(self.__output_path, 294 | os.path.splitext(img_path)[0]) 295 | ) 296 | return { 297 | 'pos': pos, 298 | 'path': img_path, # relative to self.__output_path (!) 299 | 'displaymath': displaymath, 300 | } 301 | 302 | def get_data_for(self, formula, display_math): 303 | """Simple wrapper around ImageCache, enriching the returned data with 304 | the information provided as arguments to this function. 305 | 306 | This helps when using a formula without its context. 
307 | """ 308 | data = self.__cache.get_data_for(formula, display_math).copy() 309 | data.update({'formula': formula, 'displaymath': display_math}) 310 | return data 311 | -------------------------------------------------------------------------------- /gleetex/caching.py: -------------------------------------------------------------------------------- 1 | # (c) 2013-2022 Sebastian Humenda 2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for 3 | # more details. 4 | """This module contains the ImageCache, caching formulas which have already 5 | been converted. This allows images to be re-used for formulas which occur multiple 6 | times within a document or multiple documents in a directory. Furthermore, it 7 | can significantly speed up incremental document creation, because the cache is 8 | remembered across GladTeX runs. 9 | 10 | Cache format: 11 | 12 | { # dict of formulas 13 | 'some formula': # formula as key into dictionary 14 | { # dict of display math / inline maths variants 15 | True: # displaymath = True 16 | { # dictionary of values describing formula 17 | 'path': 'some/path', 18 | 'pos': { # positioning within the HTML document 19 | 'height': ..., 'width': ..., 'depth': ... 20 | } 21 | } 22 | } 23 | } 24 | 25 | 26 | The spacing in formulas is normalised to avoid converting the same formula with 27 | different spacing. 28 | """ 29 | 30 | import contextlib 31 | import json 32 | import os 33 | 34 | CACHE_VERSION = '2.0' 35 | 36 | 37 | def normalize_formula(formula): 38 | """Normalise the spacing of a formula. 39 | 40 | This squeezes multiple whitespace characters into a single one, replaces tabs by spaces 41 | and strips leading and trailing spaces. 
42 | """ 43 | return ( 44 | formula.replace('{}', ' ') 45 | .replace('\t', ' ') 46 | .replace('  ', ' ') 47 | .rstrip() 48 | .lstrip() 49 | ) 50 | 51 | 52 | def recover_bools(object): 53 | """After JSON is read from disk, the keys False and True have been serialized 54 | to 'false' and 'true', but they're not recovered by the json parser. 55 | 56 | This function converts these keys back to booleans; note: it 57 | only works with references, so this function doesn't return 58 | anything. 59 | """ 60 | if isinstance(object, dict): 61 | for key in ['false', 'true']: 62 | if key in object: 63 | val = object[key] # store value 64 | # save it with boolean representation 65 | object[key == 'true'] = val 66 | del object[key] # remove string key 67 | # iterate recursively through dict 68 | for value in object.values(): 69 | recover_bools(value) 70 | if isinstance(object, list): 71 | for item in object: 72 | recover_bools(item) 73 | 74 | 75 | class JsonParserException(Exception): 76 | """Specialized exception class for handling errors while parsing the JSON 77 | cache.""" 78 | 79 | pass 80 | 81 | 82 | class ImageCache: 83 | """This cache stores formulas which have been converted already and don't 84 | need to be converted again. This is both a disk usage and performance 85 | improvement. The cache can be written and read from disk. 86 | 87 | If the argument keep_old_cache is True, the cache will raise a 88 | JsonParserException if that file could not be read (e.g. incompatible 89 | GladTeX version). If set to False, it'll discard the cache along with all 90 | eqn* files and start with a clean cache. 

    Example:

        cache = ImageCache()
        # the image file must already exist, otherwise OSError is raised
        cache.add_formula('\\tau',  # the formula
                {'height': 1, 'depth': 2, 'width': 3},  # positioning information for the output document
                'eqn042.svg', displaymath=True)
        assert len(cache) == 1  # one entry
        cache.write()
        assert os.path.exists('gladtex.cache')

    The optional argument base_path adds the ability to add a base directory to
    each file path. The base_path is used to simulate a different working
    directory. Imagine you have a directory chapter01 and you want your images
    to be in a subdirectory img. Changing the current working directory isn't
    possible because of parallelism, therefore you initialise the cache like
    this:

        cache = ImageCache(path='img/gladtex.cache', base_path='chapter01')
        cache.add_formula(…, 'img/eqn001.svg')  # will result in chapter01/img/eqn001.svg
    """

    VERSION_STR = 'GladTeX__cache__version'

    def __init__(self, path='gladtex.cache', keep_old_cache=True, base_path=''):
        self.__cache = {}
        self.__set_version(CACHE_VERSION)
        self.__cache_name = os.path.join(base_path, path)
        self.__base_path = base_path
        if os.path.exists(self.__cache_name):
            try:
                self._read()
            except JsonParserException:
                if keep_old_cache:
                    raise
                else:
                    self._remove_old_cache_and_files()

    def __len__(self):
        """Return number of formulas in the cache."""
        # ignore version
        return len(self.__cache) - 1

    def __set_version(self, version):
        """Set version of cache (data structure format)."""
        self.__cache[ImageCache.VERSION_STR] = version

    def write(self):
        """Write cache to disk.

        The file name will be the one configured during initialisation
        of the cache.
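
A note on the on-disk format, sketched with only the standard `json` module: JSON object keys are always strings, so the boolean display-math keys are written as the strings "true" and "false". The formula and path below are made up for illustration.

```python
import json

# made-up cache entry: formula -> displaymath flag -> image data
entry = {'x^2': {True: {'path': 'eqn000.svg'}}}
restored = json.loads(json.dumps(entry))

# After the round trip the key is a string, no longer a boolean.
print(list(restored['x^2'].keys()))
```

This is why boolean keys have to be converted back after the cache file has been read.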
143 | """ 144 | if not self.__cache: 145 | return 146 | with open(self.__cache_name, 'w', encoding='UTF-8') as file: 147 | file.write(json.dumps(self.__cache)) 148 | 149 | def _read(self): 150 | """Read Json from disk into cache, if file exists. 151 | 152 | :raises JsonParserException if json could not be parsed 153 | """ 154 | 155 | def raise_error(msg): 156 | raise JsonParserException( 157 | msg 158 | + '\nPlease delete the cache (and' 159 | + ' the images) and rerun the program.' 160 | ) 161 | 162 | if os.path.exists(self.__cache_name): 163 | # pylint: disable=broad-except 164 | try: 165 | with open(self.__cache_name) as file: 166 | self.__cache = json.load(file) 167 | except Exception as e: 168 | msg = 'error while reading cache from %s: ' % os.path.abspath( 169 | self.__cache_name 170 | ) 171 | if isinstance(e, (ValueError, OSError)): 172 | msg += str(e.args[0]) 173 | elif isinstance(e, UnicodeDecodeError): 174 | msg += ( 175 | 'expected UTF-8 encoding, erroneous byte ' 176 | + '{0} at {1}:{2} ({3})'.format(*(e.args[1:])) 177 | ) 178 | else: 179 | msg += str(e.args[0]) 180 | raise_error(msg) 181 | if not isinstance(self.__cache, dict): 182 | raise_error('Decoded Json is not a dictionary.') 183 | if not self.__cache.get(ImageCache.VERSION_STR): 184 | self.__set_version(CACHE_VERSION) 185 | cur_version = self.__cache.get(ImageCache.VERSION_STR) 186 | if cur_version != CACHE_VERSION: 187 | raise_error( 188 | 'Cache in %s has version %s, expected %s.' 189 | % (self.__cache_name, cur_version, CACHE_VERSION) 190 | ) 191 | recover_bools(self.__cache) 192 | 193 | def _remove_old_cache_and_files(self): 194 | os.remove(self.__cache_name) 195 | directory = os.path.dirname(self.__cache_name) 196 | if not directory: 197 | directory = '.' 
198 | # remove all files starting with eqn* 199 | for file in os.listdir(directory): 200 | if not file.startswith('eqn'): 201 | continue 202 | file = os.path.join(directory, file) 203 | if os.path.isfile(file): 204 | os.remove(file) 205 | 206 | def add_formula(self, formula, pos, file_path, displaymath=False): 207 | """Add formula to cache. 208 | 209 | The pos argument contains the positioning info for the output 210 | document and is a dict with 'height', 'width' and 'depth'. Keep 211 | in mind that formulas set with displaymath are not the same as 212 | those set iwth inlinemath. This method raises OSError if 213 | specified image doesn't exist or if it got an absolute 214 | file_path. 215 | 216 | If a file path already exists, the cache entry will be overridden. 217 | """ 218 | if os.path.isabs(file_path): 219 | raise OSError(f"image path in cache may not be absolute: {file_path}") 220 | if '\\' in file_path: 221 | file_path = file_path.replace('\\', '/') 222 | if not os.path.exists(os.path.join(self.__base_path, file_path)): 223 | raise OSError( 224 | "cannot add %s to the cache: doesn't exist" 225 | % os.path.join(self.__base_path, file_path) 226 | ) 227 | if not pos or not formula or not file_path: 228 | raise ValueError('the supplied arguments may not be empty/none') 229 | if not isinstance(displaymath, bool): 230 | raise ValueError('displaymath must be a boolean') 231 | formula = normalize_formula(formula) 232 | if not formula in self.__cache: 233 | self.__cache[formula] = {} 234 | val = self.__cache[formula] 235 | if not displaymath in val: 236 | val[displaymath] = { 237 | 'pos': pos, 238 | 'path': file_path, 239 | } 240 | 241 | def remove_formula(self, formula, displaymath): 242 | """This method removes the given formula from the cache. 243 | 244 | A KeyError is raised, if the formula did not exist. Internally, 245 | formulas are normalized to detect similarities. 
246 | """ 247 | formula = normalize_formula(formula) 248 | if not formula in self.__cache: 249 | raise KeyError('key %s not in cache' % formula) 250 | else: 251 | value = self.__cache[formula] 252 | if displaymath in value: 253 | with contextlib.suppress(FileNotFoundError): 254 | os.remove( 255 | os.path.join(self.__base_path, 256 | value[displaymath]['path']) 257 | ) 258 | del self.__cache[formula][displaymath] 259 | if not self.__cache[formula]: 260 | del self.__cache[formula] 261 | else: 262 | raise KeyError('key %s (%s) not in cache' % 263 | (formula, displaymath)) 264 | 265 | def contains(self, formula, displaymath): 266 | """Check whether a formula was already cached and return True if 267 | found.""" 268 | try: 269 | return bool(self.get_data_for(formula, displaymath)) 270 | except KeyError: 271 | return False 272 | 273 | def get_data_for(self, formula, displaymath): 274 | """Retrieve meta data about a formula from the cache. 275 | 276 | The meta information is used to embed the formula in the HTML 277 | document. It is a dictionary with the keys 'pos' and 'path'. The 278 | positioning info is described in the documentation of this 279 | class. This method raises a KeyError if the formula wasn't 280 | found. 
281 | """ 282 | formula = normalize_formula(formula) 283 | if not formula in self.__cache: 284 | raise KeyError(formula, displaymath) 285 | else: 286 | # check whether file still exists 287 | value = self.__cache[formula] 288 | if displaymath in value.keys(): 289 | # if file doesn't exist anymore, outdated and hence removed from 290 | # cache 291 | if not os.path.exists( 292 | os.path.join(self.__base_path, value[displaymath]['path']) 293 | ): 294 | del self.__cache[formula] 295 | raise KeyError((formula, displaymath)) 296 | else: 297 | return value[displaymath] 298 | else: 299 | raise KeyError((formula, displaymath)) 300 | 301 | -------------------------------------------------------------------------------- /gleetex/htmlhandling.py: -------------------------------------------------------------------------------- 1 | # (c) 2013-2023 Sebastian Humenda 2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for 3 | # more details. 4 | """ 5 | GleeTeX is designed to allow the re-use of the image creation code 6 | independently of the HTML conversion code. Therefore, this module contains the 7 | code required to parse equations from HTML, to write converted HTML documents 8 | back and to handle the exclusion of formulas too long for an HTML alt tag to 9 | an external HTML file. 10 | The pandoc module contains similar functions for using GleeTeX as a pandoc 11 | filter, without the using HTML as destination format. 12 | """ 13 | 14 | from abc import abstractmethod 15 | import collections 16 | import enum 17 | import html 18 | import os 19 | import posixpath 20 | import re 21 | 22 | from . import sink 23 | from . 
import typesetting 24 | 25 | # match HTML 4 and 5 26 | CHARSET_PATTERN = re.compile( 27 | rb'(?:content="text/html; charset=(.*?)"|charset="(.*?)")') 28 | 29 | 30 | class ParseException(Exception): 31 | """Exception to propagate a parsing error.""" 32 | 33 | def __init__(self, msg, pos=None): 34 | self.msg = msg 35 | self.pos = pos 36 | super().__init__(msg, pos) 37 | 38 | def __str__(self): 39 | if self.pos: 40 | return f'line {self.pos[0]}, {self.pos[1]}: {self.msg}' 41 | else: 42 | return self.msg 43 | 44 | 45 | def get_position(document, index): 46 | """This returns the line number and position on line for the given String. 47 | 48 | Note: lines and positions are counted from 0. 49 | """ 50 | line = document[: index + 1].count('\n') 51 | if document[index] == '\n': 52 | return (line, 0) 53 | newline = document[: index + 1].rfind('\n') 54 | newline = newline if newline >= 0 else 0 55 | return (line, len(document[newline:index])) 56 | 57 | 58 | def find_anycase(where, what): 59 | """Find with both lower or upper case.""" 60 | lower = where.find(what.lower()) 61 | upper = where.find(what.upper()) 62 | if lower >= 0: 63 | return lower 64 | return upper 65 | 66 | 67 | class EqnParser: 68 | """This parser parses ... in an HTML of a document. 69 | 70 | It's not an HTML parser, because the content within .* is 71 | parsed verbatim. It also parses comments, to not consider formulas 72 | within comments. All other cases are unhandled. Especially CData is 73 | problematic, although it seems like a rare use case. 74 | """ 75 | 76 | class State(enum.Enum): # ([\s\S]*?) 
also matches newlines 77 | Comment = re.compile(r'', re.MULTILINE) 78 | Equation = re.compile( 79 | r'<\s*(?:eq|EQ)\s*(.*?)?>([\s\S.]+?)<\s*/\s*(?:eq|EQ)>', re.MULTILINE 80 | ) 81 | 82 | HTML_ENTITY = re.compile(r'(&(:?#\d+|[a-zA-Z]+);)') 83 | 84 | def __init__(self): 85 | self.__document = None 86 | self.__data = [] 87 | self.__encoding = None 88 | 89 | def feed(self, document): 90 | """Feed a string or a bytes instance and start parsing. 91 | 92 | If a bytes instance is fed, an HTML encoding header has to be 93 | present, so that the encoding can be extracted. 94 | """ 95 | if isinstance(document, bytes): # try to guess encoding 96 | try: 97 | encoding = next( 98 | filter(bool, CHARSET_PATTERN.search(document).groups()) 99 | ).decode('ascii') 100 | document = document.decode(encoding) 101 | except AttributeError as e: 102 | raise ParseException( 103 | ( 104 | 'Could not determine encoding of ' 105 | 'document, no charset information in the HTML header ' 106 | 'found.' 107 | ) 108 | ) from e 109 | self.__encoding = encoding 110 | self.__document = document[:] 111 | self._parse() 112 | 113 | def find_with_offset(self, doc, start, what): 114 | """This find method searches in the document for a given string, 115 | staking the offset into account. 116 | 117 | Returned is the absolute position (so offset + relative match 118 | position) or -1 for no hit. 
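
The same absolute-position contract can be sketched without slicing the document, assuming that `str.find` with a start index and a compiled pattern's `search` with a start position are acceptable substitutes. `find_from` is a hypothetical name, not part of this module.

```python
import re


def find_from(doc, start, what):
    # Hypothetical standalone variant: `what` is either a plain string
    # or a compiled regular expression. Both branches return an index
    # relative to the whole document, or -1 if nothing was found.
    if isinstance(what, str):
        return doc.find(what, start)
    match = what.search(doc, start)
    return match.start() if match else -1


print(find_from('abccabc', 3, 'a'))
print(find_from('abccabc', 4, re.compile('c+')))
```

One difference to keep in mind: with a start position, a pattern anchored with `^` still refers to the real beginning of the string, whereas searching a slice treats the slice start as the beginning.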
119 | """ 120 | if isinstance(what, str): 121 | pos = doc[start:].find(what) 122 | else: 123 | match = what.search(doc[start:]) 124 | pos = -1 if not match else match.span()[0] 125 | return pos if pos == -1 else pos + start 126 | 127 | def _parse(self): 128 | """This function parses the document, while maintaining state using the 129 | State enum.""" 130 | def in_document(x): return not x == -1 131 | # maintain a lower-case copy, which eases searching, but doesn't affect 132 | # the handler methods 133 | doc = self.__document[:].lower() 134 | 135 | end = len(self.__document) - 1 136 | eq_start = re.compile(r'<\s*eq\s*(.*?)>') 137 | 138 | start_pos = 0 139 | while start_pos < end: 140 | comment = self.find_with_offset(doc, start_pos, '' % match.groups()[0]) 208 | return start_pos + match.span()[1] # return end of match 209 | 210 | def get_encoding(self): 211 | """Return the parsed encoding from the HTML meta data. 212 | 213 | If none was set, UTF-8 is assumed. 214 | """ 215 | return self.__encoding 216 | 217 | def get_data(self): 218 | """Return parsed chunks. 219 | 220 | These are either strings or tuples with formula information, see 221 | class documentation. 222 | """ 223 | return [x for x in self.__data if x] # filter empty bits 224 | 225 | 226 | def generate_label(formula): 227 | """Generate an id for identifying a formula as an anchor in a document. 228 | 229 | The generated ID is guaranteed to be valid in an XML attribute and 230 | it won't exceed a certain length. If you happen to have a lot of 231 | formulas > 150 characters with exactly the same content in the 232 | document, that'll cause a clash of id's. 
233 | """ 234 | # for some characters we just use a simple replacement (otherwise the 235 | # would be lost) 236 | mapped = {'{': '_', '}': '_', 237 | '(': '-', ')': '-', '\\': '.', '^': ',', '*': '_'} 238 | id = [] 239 | prevchar = '' 240 | for c in formula: 241 | if prevchar == c: 242 | continue # avoid multiple same characters 243 | if c in mapped: 244 | id.append(mapped[c]) 245 | elif c.isalpha() or c.isdigit(): 246 | id.append(c) 247 | prevchar = c 248 | # id's must start with an alphabetical character, so prefix the formula with 249 | # "formula" to make it a valid html id 250 | if id and not id[0].isalpha(): 251 | id = ['f', 'o', 'r', 'm', '_'] + id 252 | if not id: # is empty 253 | raise ValueError( 254 | "For the formula '%s' no referencable id could be generated." % formula 255 | ) 256 | return ''.join(id[:150]) 257 | 258 | 259 | def format_formula_paragraph(formula): 260 | """Format a formula to appear as if it would have been excluded into an 261 | external HTML file.""" 262 | return '

%s

\n' % (generate_label(formula), formula) 263 | 264 | 265 | # pylint: disable=too-many-instance-attributes 266 | class ImageFormatter: # ToDo: localisation 267 | """ImageFormatter(is_epub=False) 268 | 269 | Format converted formula to be included into HTML. A typical image 270 | attribute will contain the path to the image, style information, a CSS class 271 | to be used in custom CSS style sheets and an alternative text (the LaTeX 272 | source) for people who disabled images or for blind screen reader users. 273 | If set, LaTeX formulas exceeding a configurable maximum length will be 274 | excluded. The image will be a link which leads to the excluded image text. 275 | The alt attribute is a text-only attribute and e.g. line breaks will be lost 276 | for screen reader users, so it makes sense for longer formulas to be 277 | external to be easily readable. Furthermore the alt attribute is limited in 278 | size, so formulas that are too long need to be treated differently. 279 | If that behavior is not wanted, it can be disabled and 280 | nothing will be excluded. 281 | 282 | Keyword arguments 283 | 284 | * `base_path=""`: base path where images are stored, e.g. "images" 285 | * `link_prefix=""`: a prefix which should be added to generated links, e.g. 286 | `"https://example.com/img/"` 287 | * `exclusion_file_path=""`: the path which formula descriptions are 288 | written to which exceed a certain threshold that doesn't fit into the 289 | alt tag of the `img` tag 290 | * `is_epub`: round height/width of the linked images to comply with the 291 | EPUB standard. 292 | 293 | Intended usage: 294 | 295 | fmt = ImageFormatter() # use one of the children classes 296 | # values as returned by Tex2img 297 | fmt.format(pos, formula, img_path, displaymath=False) 298 | fmt.format(pos2, formula2, img_path2, displaymath=True) 299 | ... 
300 | img.get_excluded() # a list of formulas that were too long for the alt tag 301 | """ 302 | 303 | def __init__(self, base_path=None, link_prefix='', 304 | exclusion_file_path=sink.EXCLUSION_FILE_NAME, is_epub=False): 305 | self.__inline_maxlength = 100 306 | self._excluded_formulas = collections.OrderedDict() 307 | self.__url = '' 308 | self._is_epub = is_epub 309 | self._css = {'inline': 'inlinemath', 'display': 'displaymath'} 310 | self.__replace_nonascii = False 311 | self._link_prefix = link_prefix if link_prefix else '' 312 | base_path = ("" if not base_path else base_path) 313 | self._exclusion_filepath = posixpath.join( 314 | base_path, exclusion_file_path) 315 | if os.path.exists(self._exclusion_filepath) and not os.access( 316 | self._exclusion_filepath, os.W_OK 317 | ): 318 | raise OSError(f'file {self._exclusion_filepath} not writable') 319 | 320 | def get_exclusion_file_path(self): 321 | """Return the path to the file to which formulas will be excluded too 322 | if their description exceeds the alt attribute length. 323 | 324 | May be None. 325 | """ 326 | return self._exclusion_filepath if self._exclusion_filepath else None 327 | 328 | def set_replace_nonascii(self, flag): 329 | """If True, non-ascii characters will be replaced through their LaTeX 330 | command. 331 | 332 | Note that alphabetical characters will not be replaced, to allow 333 | easier readibility. 334 | """ 335 | self.__replace_nonascii = flag 336 | 337 | def set_max_formula_length(self, length): 338 | """Set maximum length of a formula before it gets excluded into a 339 | separate file.""" 340 | self.__inline_maxlength = length 341 | 342 | def set_inline_math_css_class(self, css): 343 | """set css class for inline math.""" 344 | self._css['inline'] = css 345 | 346 | @abstractmethod 347 | def _generate_link_label(self, formula): 348 | """Generate the link to an excluded formula, consisting either of path 349 | and label or just a label. 
350 | 351 | The label is generated uniquely for each label by this function. 352 | This function needs to be customised by implementors, e.g. to 353 | return "foo.html#formula" or "#formula", etc. 354 | """ 355 | 356 | def set_display_math_css_class(self, css): 357 | """set css class for display math.""" 358 | self._css['display'] = css 359 | 360 | def set_is_epub(self, flag): 361 | """Active rounding of height and weight attribute of the formula images 362 | to comply with the EPUB standard.""" 363 | self._is_epub = flag 364 | 365 | def set_url(self, prefix): 366 | """Set URL prefix which is used as a prefix to the image file in the 367 | HTML link.""" 368 | self.__url = prefix 369 | 370 | def get_excluded(self): 371 | """Return a list of LaTeX formulas that did not fit the alt tag and 372 | were hence formatted separately, e.g. into a separate document.""" 373 | return self._excluded_formulas 374 | 375 | def _process_image(self, pos, formula, img_path, displaymath=False): 376 | """Process positioning of the image and the various URI-related 377 | parameters into formatting information. 378 | 379 | :param pos dictionary containing keys depth, height and width 380 | :param formula LaTeX alternative text 381 | :param img_path: path to image 382 | :param displaymath display or inline math (default False, inline maths) 383 | :returns a dictionary with the information about the image; its keys 384 | correspond to HTML image attributes, except for "url" and "image". 
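
As an illustration of the positioning step only, here is a minimal sketch of how a depth value could be turned into a CSS style string, assuming depth is given in pixels below the baseline. `vertical_align_style` is a hypothetical helper, not a method of this class.

```python
def vertical_align_style(depth, is_epub=False):
    # Hypothetical helper: the depth becomes a negative vertical offset.
    offset = float(depth) * -1
    # EPUB output uses whole pixels, other output keeps two decimals.
    value = str(int(offset)) if is_epub else f'{offset:.2f}'
    return f'vertical-align: {value}px; margin: 0;'


print(vertical_align_style(3.5))
print(vertical_align_style('2', is_epub=True))
```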
385 | """ 386 | image = {'formula': formula} 387 | full_url = img_path 388 | if self.__url: 389 | full_url = self.__url.rstrip('/') + '/' + img_path 390 | image['url'] = full_url 391 | # depth is a negative offset (float, first, str later) 392 | depth = float(pos['depth']) * -1 393 | if self._is_epub: 394 | depth = str(int(depth)) 395 | else: 396 | depth = f'{depth:.2f}' 397 | image['style'] = f'vertical-align: {depth}px; margin: 0;' 398 | 399 | image['class'] = self._css['display'] if displaymath else self._css['inline'] 400 | if self._is_epub: 401 | image.update( 402 | {'height': str(int(pos['height'])), 403 | 'width': str(int(pos['width']))} 404 | ) 405 | else: 406 | image.update( 407 | {'height': f"{pos['height']:.2f}", 408 | 'width': f"{pos['width']:.2f}"} 409 | ) 410 | return image 411 | 412 | @abstractmethod 413 | def add_excluded(self, image): 414 | """Add a formula to the list of excluded formulas.""" 415 | 416 | @abstractmethod 417 | def format_internal(self, image, link_label=None): 418 | """Format an internal formula for the target output (defined by the 419 | class). 420 | 421 | :param image formula information as returned by _process_image; formula 422 | will have been shortened if it were too long 423 | :param link_label if not None, the formula image will contian a reference 424 | or link to the long version of the formula (e.g. because it didn't fit 425 | the alt attribute) 426 | """ 427 | 428 | def format(self, pos, formula, img_path, displaymath=False): 429 | """This method formats a formula. It invokes the abstract methods 430 | `format_internal` and `add_excluded`. `add_excluded` is only invoked if 431 | the formula is too long and if exclusion has been configured. This 432 | method returns the formatted image. The formatted image will contain a 433 | reference to the excluded formula source, if applicable. The formatted 434 | excluded formulas can be retrieved using get_excluded(). 
435 | 436 | :param pos dictionary containing keys depth, height and width 437 | :param formula LaTeX alternative text 438 | :param img_path: path to image 439 | :param displaymath whether or not formula is in display math (default: no) 440 | :returns a tuple containing the formatted image and, if applicable, the 441 | excluded image alternate text. 442 | """ 443 | formula = typesetting.increase_readability( 444 | formula, self.__replace_nonascii) 445 | processed_data = self._process_image( 446 | pos, formula, img_path, displaymath) 447 | shortened_data = processed_data.copy() 448 | shortened_data['formula'] = formula 449 | link_destination = None 450 | if len(formula) > self.__inline_maxlength: 451 | shortened_data['formula'] = f"{formula[:self.__inline_maxlength]}..." 452 | link_destination = self._generate_link_destination(processed_data) 453 | # builds up internal list of formatted excluded formulas 454 | self.add_excluded(processed_data) 455 | return self.format_internal(shortened_data, link_destination) 456 | 457 | 458 | class HtmlImageFormatter(ImageFormatter): 459 | """Format formulas for HTML file output. 460 | 461 | See ImageFormatter for information about the usage of the class. 
462 | """ 463 | 464 | def __init__(self, *args, **kwargs): 465 | super().__init__(*args, **kwargs) 466 | 467 | def _generate_link_destination(self, formula): 468 | html_label = generate_label(formula['formula']) 469 | exclusion_filelink = posixpath.join( 470 | self._link_prefix, self._exclusion_filepath 471 | ) 472 | return f'{exclusion_filelink}#{html_label}' 473 | 474 | def format_internal(self, image, link_label=None): 475 | link_start, link_end = ('', '') 476 | if link_label: 477 | link_start = f'' if link_label else '' 478 | link_end = '' if link_label else '' 479 | escaped_formula = html.escape(image['formula'], quote=True) 480 | return ( 481 | link_start 482 | + ( 483 | f'' 486 | ) 487 | + link_end 488 | ) 489 | 490 | # Todo: this function is useless: if should be merged with format and it 491 | # should build up a dictionary of id, full formula; the formatting should go 492 | # to a separate function; link prefix and such details should be part of 493 | # super class; ToDo, btw, link prefix also for image paths, probably not 494 | # used yet in format strings of format_internal 495 | def add_excluded(self, image): 496 | self._excluded_formulas[generate_label( 497 | image['formula'])] = image['formula'] 498 | 499 | 500 | def write_html(file, document, formatter): 501 | """Processed HTML documents are made up of raw HTML chunks which are 502 | written back unaltered and of a processed image. 503 | 504 | A processed image is a former formula converted to an image with 505 | additional meta data. This is passed to the format function of the 506 | supplied formatter and the result is written to the given (open) 507 | file handle. 
508 | """ 509 | for chunk in document: 510 | if isinstance(chunk, dict): 511 | is_displaymath = chunk['displaymath'] 512 | file.write( 513 | formatter.format( 514 | chunk['pos'], chunk['formula'], chunk['path'], is_displaymath 515 | ) 516 | ) 517 | else: 518 | file.write(chunk) 519 | -------------------------------------------------------------------------------- /gleetex/image.py: -------------------------------------------------------------------------------- 1 | # (c) 2013-2021 Sebastian Humenda 2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for 3 | # more details. 4 | """This module takes care of the actual image creation process. 5 | 6 | Each formula is saved as an image, either as PNG or SVG. SVG is advised, 7 | since it is a properly scalable format. 8 | """ 9 | 10 | import enum 11 | import os 12 | import re 13 | import shutil 14 | import subprocess 15 | import sys 16 | 17 | from .typesetting import LaTeXDocument 18 | 19 | DVIPNG_REGEX = re.compile(r'^ depth=(-?\d+) height=(\d+) width=(\d+)') 20 | DVISVGM_DEPTH_REGEX = re.compile( 21 | r'^\s*width=.*?pt, height=.*?pt, depth=(.*?)pt') 22 | DVISVGM_SIZE_REGEX = re.compile(r'^\s*graphic size: (.*?)pt x (.*?)pt') 23 | 24 | 25 | def remove_all(*files): 26 | """Guarded remove of files (rm -f); no exception is thrown if a file 27 | couldn't be removed.""" 28 | for file in files: 29 | try: 30 | os.remove(file) 31 | except OSError: 32 | pass 33 | 34 | 35 | def proc_call(cmd, cwd=None, install_recommends=True): 36 | """Execute cmd (list of arguments) as a subprocess. 37 | 38 | Returned is a tuple with stdout and stderr, decoded if not None. If 39 | the return value is not equal 0, a subprocess error is raised. 40 | Timeouts will happen after 20 seconds. 
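
A comparable contract can be sketched with `subprocess.run`, assuming Python 3.7 or newer for `capture_output`. `run_with_timeout` is a hypothetical helper for illustration, not the function documented here, and it merges stdout and stderr into one string.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout=20):
    # Hypothetical helper: returns the decoded output of the command and
    # raises subprocess.SubprocessError on a timeout or non-zero exit.
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired as e:
        raise subprocess.SubprocessError(
            'execution timed out after %s s: %s' % (e.timeout, ' '.join(e.cmd))
        ) from None
    output = (proc.stdout + proc.stderr).decode(errors='replace')
    if proc.returncode:
        raise subprocess.SubprocessError(
            'Error while executing %s: %s' % (' '.join(cmd), output)
        )
    return output


print(run_with_timeout([sys.executable, '-c', 'print(1 + 1)']))
```

`subprocess.run` kills the child process itself when the timeout expires, so no explicit `kill()` call is needed in this variant.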
    """
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        cwd=cwd,
    ) as proc:
        data = []
        try:
            data = [
                d.decode(sys.getdefaultencoding(), errors='surrogateescape')
                for d in proc.communicate(timeout=20)
                if d
            ]
            if proc.wait():
                raise subprocess.SubprocessError(
                    'Error while executing %s\n%s\n' % (
                        ' '.join(cmd), '\n'.join(data))
                )
        except subprocess.TimeoutExpired as e:
            proc.kill()
            note = 'Subprocess expired with time out: ' + str(cmd) + '\n'
            poll = proc.poll()
            if poll:
                note += str(poll) + '\n'
            if data:
                # data is a list of strings, so join it before concatenating
                raise subprocess.SubprocessError('\n'.join(data) + '\n' + note)
            else:
                raise subprocess.SubprocessError(
                    'execution timed out after '
                    + str(e.args[1])
                    + ' s: '
                    + ' '.join(e.args[0])
                )
        except KeyboardInterrupt:
            sys.stderr.write('\nInterrupted; ')
            import traceback

            traceback.print_exc(file=sys.stderr)
        except FileNotFoundError:
            # program missing, try to help
            text = 'Command `%s` not found.' % cmd[0]
            # only suggest a package if a package name (a string) was given
            if isinstance(install_recommends, str) and shutil.which('dpkg'):
                text += ' Install it using `sudo apt install %s`.' % install_recommends
            else:
                text += ' Install a TeX distribution of your choice, e.g. MiKTeX or TeX Live.'
            raise subprocess.SubprocessError(text) from None
    if isinstance(data, list):
        return '\n'.join(data)
    return data


# pylint: disable=too-few-public-methods
class Format(enum.Enum):
    """Choose the image output format."""

    Png = 'png'
    Svg = 'svg'


class Tex2img:
    """Convert a TeX document string into an image file (PNG or SVG). This
    class interacts with the LaTeX and dvipng/dvisvgm sub processes. Upon
    error the methods throw a SubprocessError with all necessary information
    to fix the issue.

    On PNG: The background of the PNG files will be transparent by
    default. If you set a background colour within the LaTeX
    document, you need to turn off transparency in this converter
    manually.
    """

    def __init__(self, fmt, encoding='UTF-8'):
        if not isinstance(fmt, Format):
            raise ValueError('Enumeration of type Format expected, got ' + str(fmt))
        self.__format = fmt
        self.__encoding = encoding
        self.__parsed_data = None
        self.__size = [115, None]  # [dpi, font size in pt]
        self.__background = 'transparent'
        self.__keep_latex_source = False
        self.__is_epub = False

    def set_is_epub(self, val):
        """Enable or disable EPUB-conforming image creation."""
        self.__is_epub = val

    def set_dpi(self, dpi):
        """Set output resolution for formula images.

        This has no effect if the output format is SVG. Note that a font
        size set through set_fontsize takes precedence when the image is
        created.
        """
        if not isinstance(dpi, (int, float)):
            raise TypeError('Dpi must be an integer or floating point number')
        self.__size[0] = int(dpi)

    def set_fontsize(self, size):
        """Set font size for formulas.

        This will be automatically translated into a DPI resolution for
        PNG images and taken literally for SVG graphics.
        """
        if not isinstance(size, (int, float)):
            raise TypeError('Font size must be an integer or floating point number')
        self.__size[1] = float(size)

    def set_transparency(self, flag):
        """Set whether or not to use background colour information from the DVI
        file.

        This is only relevant for PNG output. If a background colour
        other than "transparent" is required, this setter should be
        called with False. The default is True, resulting in a
        transparent background.
155 | """ 156 | self.__background = 'transparent' if flag else 'not transparent' 157 | 158 | def set_keep_latex_source(self, flag): 159 | """Set whether LaTeX source document should be kept.""" 160 | if not isinstance(flag, bool): 161 | raise TypeError('boolean object required, got %s.' % repr(flag)) 162 | self.__keep_latex_source = flag 163 | 164 | def create_dvi(self, tex_document, dvi_fn): 165 | """Call LaTeX to produce a dvi file with the given LaTeX document. 166 | 167 | Temporary files will be removed, even in the case of a LaTeX 168 | error. This method raises a SubprocessError with the helpful 169 | part of LaTeX's error output. 170 | """ 171 | path = os.path.dirname(dvi_fn) 172 | if path and not os.path.exists(path): 173 | os.makedirs(path) 174 | if not path: 175 | path = os.getcwd() 176 | 177 | def new_extension(x): return os.path.splitext(dvi_fn)[0] + '.' + x 178 | 179 | if self.__size[1]: # font size in pt 180 | tex_document.set_fontsize(self.__size[1]) 181 | tex_fn = new_extension('tex') 182 | aux_fn = new_extension('aux') 183 | log_fn = new_extension('log') 184 | cmd = None 185 | encoding = self.__encoding 186 | with open(tex_fn, mode='w', encoding=encoding) as tex: 187 | tex.write(str(tex_document)) 188 | cmd = [ 189 | 'latex', 190 | '-interaction=nonstopmode', 191 | '-halt-on-error', 192 | os.path.basename(tex_fn), 193 | ] 194 | try: 195 | proc_call(cmd, cwd=path, install_recommends='texlive-recommended') 196 | except subprocess.SubprocessError as e: 197 | remove_all(dvi_fn) 198 | msg = '' 199 | if e.args: 200 | data = self.parse_latex_log(e.args[0]) 201 | if data: 202 | msg += data 203 | else: 204 | msg += str(e.args[0]) 205 | raise subprocess.SubprocessError(msg) # propagate subprocess error 206 | finally: 207 | if self.__keep_latex_source: 208 | remove_all(aux_fn, log_fn) 209 | else: 210 | remove_all(tex_fn, aux_fn, log_fn) 211 | 212 | def create_image(self, dvi_fn): 213 | """Create the image containing the formula, using either dvisvgm or 214 | 
dvipng.""" 215 | dirname = os.path.dirname(dvi_fn) 216 | if dirname and not os.path.exists(dirname): 217 | os.makedirs(dirname) 218 | 219 | output_fn = '%s.%s' % (os.path.splitext(dvi_fn) 220 | [0], self.__format.value) 221 | if self.__format == Format.Png: 222 | dpi = fontsize2dpi( 223 | self.__size[1]) if self.__size[1] else self.__size[0] 224 | return create_png(dvi_fn, output_fn, dpi, self.__background) 225 | if not self.__size[1]: 226 | self.__size[1] = 12 # 12 pt 227 | return create_svg(dvi_fn, output_fn) 228 | 229 | def convert(self, tex_document, base_name): 230 | """Convert the given TeX document into an image. 231 | 232 | The base name is used to create the required intermediate files 233 | and the resulting file will be made of the base_name and the 234 | format-specific file extension. This function returns the 235 | positioning information used in the CSS style attribute. 236 | """ 237 | if not isinstance(tex_document, LaTeXDocument): 238 | raise TypeError( 239 | ('expected object of type typesetting.LaTeXDocument,' ' got %s') 240 | % type(tex_document) 241 | ) 242 | dvi = '%s.dvi' % base_name 243 | try: 244 | self.create_dvi(tex_document, dvi) 245 | dimensions = self.create_image(dvi) 246 | if self.__is_epub: 247 | for key, val in dimensions.items(): 248 | dimensions[key] = int(round(val)) 249 | return dimensions 250 | except OSError: 251 | remove_all('%s.%s' % (base_name, self.__format.value)) 252 | raise 253 | 254 | def parse_latex_log(self, logdata): 255 | """Parse the LaTeX error output and return the relevant part of it.""" 256 | if not logdata: 257 | return None 258 | line = None 259 | for line in logdata.split('\n'): 260 | if line.startswith('! 
'): 261 | line = line[2:] 262 | break 263 | if line: # try to remove LaTeX line numbers 264 | lineno = re.search(r'\s*on input line \d+', line) 265 | if lineno: 266 | line = line[: lineno.span()[0]] + line[lineno.span()[1]:] 267 | return line 268 | return None 269 | 270 | 271 | def fontsize2dpi(size_pt): 272 | """This function calculates the DPI for the resulting image. Depending on 273 | the font size, a different resolution needs to be used. According to the 274 | dvipng manual page, the formula is: 275 | 276 | <dpi> = <font_px> * 72.27 / 10 [px * TeXpt/in / TeXpt] 277 | """ 278 | size_px = size_pt * 1.3333333 # and more 3s! 279 | return size_px * 72.27 / 10 280 | 281 | 282 | def create_png(dvi_fn, output_name, dpi, background): 283 | """Create a PNG file from a given dvi file. The side effect is the PNG file 284 | being written to disk. By default, the background of the resulting image is 285 | transparent; setting any other value will make it use whatever is set 286 | in the DVI file. 287 | 288 | :param dvi_fn Dvi file name 289 | :param output_name Output file name 290 | :param dpi Output resolution 291 | :param background Background colour (default: transparent) 292 | :return dimensions for embedding into an HTML document 293 | :raises ValueError raised whenever dvipng output couldn't be parsed 294 | """ 295 | if not output_name: 296 | raise ValueError('Empty output_name') 297 | cmd = ['dvipng', '-q*', '-D', str(dpi)] 298 | if background == 'transparent': 299 | cmd += ['-bg', background] 300 | cmd += [ 301 | '--height*', 302 | '--depth*', 303 | '--width*', # print information for embedding 304 | '-o', 305 | output_name, 306 | dvi_fn, 307 | ] 308 | data = None 309 | try: 310 | data = proc_call(cmd, install_recommends='dvipng') 311 | except subprocess.SubprocessError: 312 | remove_all(output_name) 313 | raise 314 | finally: 315 | remove_all(dvi_fn) 316 | for line in data.split('\n'): 317 | found = DVIPNG_REGEX.search(line) 318 | if found: 319 | return dict(zip(['depth', 
'height', 'width'], map(float, found.groups()))) 320 | raise ValueError('Could not parse dvi output: ' + repr(data)) 321 | 322 | 323 | def create_svg(dvi_fn, output_name): 324 | """Create a SVG file from a given dvi file. The side effect is the SVG file 325 | being written to disk. 326 | 327 | :param dvi_fn Dvi file name 328 | :param output_name Output file name 329 | :param size font size in pt 330 | :return dimensions for embedding into an HTML document 331 | :raises ValueError raised whenever dvipng output couldn't be parsed 332 | """ 333 | if not output_name: 334 | raise ValueError('Empty output_name') 335 | cmd = [ 336 | 'dvisvgm', 337 | '--exact', 338 | '--no-fonts', 339 | '-o', 340 | output_name, 341 | '--bbox=preview', 342 | dvi_fn, 343 | ] 344 | data = None 345 | try: 346 | data = proc_call(cmd, install_recommends='texlive-binaries') 347 | except subprocess.SubprocessError: 348 | remove_all(output_name) 349 | raise 350 | finally: 351 | remove_all(dvi_fn) 352 | pos = {} 353 | for line in data.split('\n'): 354 | if not pos: 355 | found = DVISVGM_DEPTH_REGEX.search(line) 356 | if found: 357 | # convert from pt to px (assuming 96 dpi) 358 | pos['depth'] = float(found.groups()[0]) * 1.3333333 359 | else: 360 | found = DVISVGM_SIZE_REGEX.search(line) 361 | if found: 362 | pos.update( 363 | dict( 364 | zip( 365 | ['width', 'height'], 366 | # convert from pt to px (assuming 96 dpi) 367 | (float(v) * 1.3333333 for v in found.groups()), 368 | ) 369 | ) 370 | ) 371 | return pos 372 | raise ValueError('Could not parse dvisvgm output: ' + repr(data)) 373 | -------------------------------------------------------------------------------- /gleetex/pandoc.py: -------------------------------------------------------------------------------- 1 | # (c) 2013-2018 Sebastian Humenda 2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for 3 | # more details. 
4 | """This module contains functionality to parse formulas from a given Pandoc 5 | document AST and to replace them with formatted HTML equations. 6 | 7 | It works in these passes: 8 | 9 | 1. Extract all math elements from the Pandoc AST. 10 | 2. Convert all formulas to images. 11 | * LaTeX is the slowest bit in this process, therefore the formulas are 12 | collected and then converted in parallel. 13 | 3. Replace all math tags in the pandoc AST by raw HTML inline formatting 14 | instructions that reference the converted images and position them 15 | correctly. Note that this cannot be made HTML-independent because of the 16 | requirement to use vertical alignment that is not supported by the Pandoc 17 | AST and is hence expressed as a CSS styling instruction. 18 | """ 19 | 20 | import json 21 | 22 | from .htmlhandling import ParseException 23 | 24 | 25 | def __extract_formulas(formulas, ast): 26 | """Recursively extract 'Math' elements from the given AST and add them to 27 | `formulas (list)`.""" 28 | if isinstance(ast, list): 29 | for item in ast: 30 | __extract_formulas(formulas, item) 31 | elif isinstance(ast, dict): 32 | if 't' in ast and ast['t'] == 'Math': 33 | style, formula = ast['c'] 34 | # style = {'t': 'blah'} -> we want blah 35 | style = next(iter(style.values())) 36 | if style not in ['InlineMath', 'DisplayMath']: 37 | raise ParseException( 38 | '[pandoc] unknown formula formatting: ' + repr(ast['c']) 39 | ) 40 | style = style == 'DisplayMath' 41 | # position is None (only applicable for HTML parsing) 42 | formulas.append((None, style, formula)) 43 | elif 'c' in ast: 44 | __extract_formulas(formulas, ast['c']) 45 | # ^ all other cases do not matter 46 | 47 | 48 | def extract_formulas(ast): 49 | """Extract formulas from a given Pandoc document AST. The returned formulas 50 | are typed like those from the HTML parser, therefore the first element of 51 | the tuple is unused and hence None. 
52 | 53 | :param ast Structure of lists and dicts representing a Pandoc document AST 54 | :return a list of formulas where each formula is (None, style, formula) 55 | """ 56 | formulas = [] 57 | __extract_formulas(formulas, ast['blocks']) 58 | return formulas 59 | 60 | 61 | def replace_formulas_in_ast(formatter, ast, formulas): 62 | """Replace 'Math' elements in the given AST with a formatted variant. Each 63 | 'Math' element found in the Pandoc AST will be replaced by a formatted 64 | (HTML) image link. 65 | 66 | The formulas are taken from the supplied formulas list. The number 67 | of formulas in the document has to match the number of formulas in 68 | the list. 69 | """ 70 | if not formulas: 71 | return 72 | if isinstance(ast, list): 73 | for item in ast: 74 | replace_formulas_in_ast(formatter, item, formulas) 75 | elif isinstance(ast, dict): 76 | if 't' in ast and ast['t'] == 'Math': 77 | ast['t'] = 'RawInline' # raw HTML 78 | eqn = formulas.pop(0) 79 | ast['c'] = [ 80 | 'html', 81 | formatter.format( 82 | eqn['pos'], eqn['formula'], eqn['path'], eqn['displaymath'] 83 | ), 84 | ] 85 | elif 'c' in ast: 86 | replace_formulas_in_ast(formatter, ast['c'], formulas) 87 | # ^ ignore all other cases 88 | 89 | 90 | def write_pandoc_ast(file, document, formatter): 91 | """Replace 'Math' elements from a Pandoc AST with 'RawInline' elements, 92 | containing formatted HTML image tags. 
93 | 94 | :param file Writable file object that receives the JSON-encoded AST 95 | :param document A tuple (ast, formulas); each formula provides pos, formula, path and displaymath 96 | :param formatter A formatter offering the "format" method (see ImageFormatter) 97 | """ 98 | ast, formulas = document 99 | replace_formulas_in_ast(formatter, ast['blocks'], formulas) 100 | file.write(json.dumps(ast)) 101 | -------------------------------------------------------------------------------- /gleetex/parser.py: -------------------------------------------------------------------------------- 1 | # (c) 2013-2021 Sebastian Humenda 2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for 3 | # more details. 4 | """Top-level API to parse input documents. 5 | 6 | The main point of the parsing is to extract formulas from a given input 7 | document, while preserving the remaining formatting. The returned parsed 8 | document structure is highly dependent on the input format and hence 9 | documented in the respective functions. 10 | """ 11 | 12 | import enum 13 | import json 14 | import sys 15 | 16 | from . import htmlhandling 17 | from . import pandoc 18 | 19 | ParseException = ( 20 | htmlhandling.ParseException 21 | ) # re-export for consistent API from outside 22 | 23 | 24 | class Format(enum.Enum): 25 | HTML = 0 26 | # while this is json, we never know what other applications might decide to 27 | # use json as their intermediate representation ;) 28 | PANDOCFILTER = 1 29 | 30 | @staticmethod 31 | def parse(string): 32 | string = string.lower() 33 | if string == 'html': 34 | return Format.HTML 35 | if string == 'pandocfilter': 36 | return Format.PANDOCFILTER 37 | raise ValueError('unrecognised format: %s' % string) 38 | 39 | 40 | def parse_document(doc, fmt): 41 | """This function parses an input document (string or bytes) with the given 42 | format specifier. 
For HTML, the returned "parsed" document is a list of 43 | chunks, where raw chunks are just plain HTML instructions and data and 44 | formula chunks are parsed from the '' tags. If the input document is a 45 | pandoc AST, the formulas will be extracted and the document is a tuple of 46 | (pandoc AST, formulas). 47 | 48 | :param doc input of bytes or string to parse 49 | :param fmt either the enum type `Format` or a string understood by Format.parse 50 | :return (encoding, document) (a tuple) 51 | """ 52 | if isinstance(fmt, str): 53 | fmt = Format.parse(fmt) 54 | encoding = None 55 | if fmt == Format.HTML: 56 | docparser = htmlhandling.EqnParser() 57 | docparser.feed(doc) 58 | encoding = docparser.get_encoding() 59 | encoding = encoding if encoding else 'utf-8' 60 | doc = docparser.get_data() 61 | elif fmt == Format.PANDOCFILTER: 62 | if isinstance(doc, bytes): 63 | doc = doc.decode(sys.getdefaultencoding()) 64 | ast = json.loads(doc) 65 | formulas = pandoc.extract_formulas(ast) 66 | doc = (ast, formulas) # ← see doc string 67 | if not encoding: 68 | encoding = sys.getdefaultencoding() 69 | return encoding, doc 70 | -------------------------------------------------------------------------------- /gleetex/sink.py: -------------------------------------------------------------------------------- 1 | """ 2 | Sink functionality for outputs. 3 | 4 | 5 | GladTeX is capable of writing to different output formats, called a sink. A sink 6 | may parse the source and process the formulas in the document, replacing it with 7 | its converted equivalent. Tis decouples the GleeTeX-internal logic from HTML and 8 | allows using it e.g. as a filter for Pandoc (JSON-encoded). 9 | """ 10 | 11 | import enum 12 | import html 13 | 14 | EXCLUSION_FILE_NAME = 'excluded-descriptions.html' 15 | 16 | # Todo: localisation 17 | HTML_TEMPLATE_HEAD = """ 18 | 20 | \n 21 | 22 | Excluded Formulas 23 | 24 | 25 | """ 26 | 27 | 28 | class SinkType(enum.Enum): 29 | """The type of sink to use. 
""" 30 | drop = 0 31 | html_body = 1 32 | html_file = 2 33 | json_file = 2 34 | inline = 3 35 | 36 | def html_write_excluded_file(exclusion_filename, formatted_excluded_formulas): 37 | """Write back list of excluded formulas. 38 | Formulas that are too long or too complex for the alt tag are excluded to a 39 | separate file. This function initiates the writing process to the external 40 | file.""" 41 | with open(exclusion_filename, 'w', encoding='UTF-8') as file: 42 | file.write(HTML_TEMPLATE_HEAD) 43 | _html_write_excluded(file, formatted_excluded_formulas) 44 | file.write('\n\n\n') 45 | 46 | 47 | def html_write_excluded_body(exclusion_filename, formatted_excluded_formulas): 48 | with open(exclusion_filename, 'w', encoding='UTF-8') as file: 49 | _html_write_excluded(file, formatted_excluded_formulas) 50 | 51 | 52 | def _html_write_excluded(file_obj, formatted_excluded_formulas): 53 | for label, formula in formatted_excluded_formulas.items(): 54 | escaped_formula = html.escape(formula) 55 | file_obj.write(f'
{escaped_formula}
\n') 56 | 57 | 58 | # Map the sink type to their processing function. 59 | EXCLUSION_FORMULA_SINKS = { 60 | SinkType.html_file: html_write_excluded_file, 61 | SinkType.html_body: html_write_excluded_body, 62 | } 63 | -------------------------------------------------------------------------------- /gleetex/typesetting.py: -------------------------------------------------------------------------------- 1 | # (c) 2013-2021 Sebastian Humenda 2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for 3 | # more details. 4 | """This module contains functionality to typeset formulas for the usage in a 5 | LaTeX document (e.g. creating the preamble, replacing non-ascii letters) and to 6 | typeset LaTeX formulas in a more readable way as alternate description of the 7 | resulting image.""" 8 | 9 | import inspect 10 | import re 11 | 12 | from . import unicode 13 | 14 | FORMATTING_COMMANDS = [ 15 | '\\ ', 16 | '\\,', 17 | '\\;', 18 | '\\big', 19 | '\\Big', 20 | '\\left', 21 | '\\right', 22 | '\\limits', 23 | ] 24 | 25 | # A list of LaTeX math environments which place their content in math 26 | # mode but can't be used in math mode themselves (i.e. be nested). Used 27 | # to prevent bad math environment nesting while rendering (see #21). 28 | # 29 | # Assembled from 30 | # - `https://docs.mathjax.org/en/latest/input/tex/macros/index.html#environments` 31 | # - `https://en.wikibooks.org/wiki/LaTeX/Advanced_Mathematics`, 32 | # filtered by custom tests to see if LaTeX compiles with nesting. Some 33 | # environments might still be missing, but these should cover the most 34 | # common use cases. 
35 | NON_NESTABLE_MATH_ENVS = [ 36 | 'align*', 37 | 'align', 38 | 'alignat*', 39 | 'alignat', 40 | 'displaymath', 41 | 'empheq', 42 | 'eqnarray*', 43 | 'eqnarray', 44 | 'equation*', 45 | 'equation', 46 | 'flalign*', 47 | 'flalign', 48 | 'gather*', 49 | 'gather', 50 | 'math', 51 | 'multline*', 52 | 'multline', 53 | 'numcases', 54 | 'prooftree', 55 | 'subnumcases', 56 | 'xalignat*', 57 | 'xalignat', 58 | 'xxalignat', 59 | ] 60 | # The pattern used to detect the presence of one of the environments 61 | # above in a given formula. We look for such an environment opening 62 | # after ignoring all initial space characters and LaTeX `%` comments 63 | # *only*, as the it is supposed to only be a formula and to avoid 64 | # complex and error-prone parsing, while still supporting the presumably 65 | # most common use cases. 66 | MATH_ENV_DETECTION_PATTERN = re.compile( 67 | r'\s*(%.*(\n|\r\n?)\s*)*\\begin\{{({})\}}' 68 | .format('|'.join(re.escape(env) for env in NON_NESTABLE_MATH_ENVS)), 69 | ) 70 | 71 | 72 | class DocumentSerializationException(Exception): 73 | """This error is raised whenever a non-ascii character contained in a 74 | formula could not be replaced by a LaTeX command. It provides the following 75 | attributes: 76 | 77 | formula - the formula 78 | index - position in formula 79 | upoint - unicode point. 80 | """ 81 | 82 | def __init__(self, formula, index, upoint): 83 | self.formula = formula 84 | self.index = index 85 | self.upoint = upoint 86 | super().__init__(formula, index, upoint) 87 | 88 | def __str__(self): 89 | return ( 90 | 'could not find LaTeX replacement command for unicode ' 91 | 'character %d, index %d in formula %s' 92 | ) % (self.upoint, self.index, self.formula) 93 | 94 | 95 | def escape_unicode_maths(formula, replace_alphabeticals=True): 96 | """This function uses the unicode table to replace any non-ascii character 97 | (identified with its unicode code point) with a LaTeX command. 98 | 99 | It also parses the formula for commands as e.g. 
\\\text or \\mbox 100 | and applies text-mode commands within them. This allows the 101 | conversion of formulas with unicode maths with old-style LaTeX2e, 102 | which gleetex depends on. 103 | """ 104 | if not any(ord(ch) > 160 for ch in formula): 105 | return formula # no umlauts, no replacement 106 | 107 | # characters in math mode need a different replacement than in text mode. 108 | # Therefore, the string has to be split into parts of math and text mode. 109 | chunks = [] 110 | if not ('\\text' in formula or '\\mbox' in formula): 111 | # no text mode, so tread a 112 | chunks = [formula] 113 | else: 114 | start = 0 115 | while '\\text' in formula[start:] or '\\mbox' in formula[start:]: 116 | index = formula[start:].find('\\text') 117 | if index < 0: 118 | index = formula[start:].find('\\mbox') 119 | opening_brace = formula[start + index:].find('{') + start + index 120 | # add text before text-alike command and the command itself to chunks 121 | chunks.append(formula[start:opening_brace]) 122 | closing_brace = get_matching_brace(formula, opening_brace) 123 | # add text-mode stuff 124 | chunks.append(formula[opening_brace: closing_brace + 1]) 125 | start = closing_brace + 1 126 | # add last chunk 127 | chunks.append(formula[start:]) 128 | 129 | is_math = True 130 | for index, chunk in enumerate(chunks): 131 | try: 132 | chunks[index] = replace_unicode_characters( 133 | chunk, is_math, replace_alphabeticals=replace_alphabeticals 134 | ) 135 | except ValueError as e: # unicode point missing 136 | index = int(e.args[0]) 137 | raise DocumentSerializationException( 138 | formula, index, ord(formula[index]) 139 | ) from None 140 | is_math = not is_math 141 | return ''.join(chunks) 142 | 143 | 144 | def replace_unicode_characters(characters, is_math, replace_alphabeticals=True): 145 | """Replace all non-ascii characters within the given string with their 146 | LaTeX equivalent. 
The boolean is_math indicates, whether text-mode commands 147 | (like in \\text{}) or the amsmath equivalents should be used. When 148 | replace_alphabeticals is False, alphabetical characters will not be 149 | replaced through their LaTeX command when in text mode, so that text 150 | within. 151 | 152 | \\text{} (and similar) is not garbled. For instance, \\text{für} is 153 | be replaced by \\text{f\"{u}r} when replace_alphabeticals=True. This 154 | is useful for the alt attribute of an image, where the reader might 155 | want to read the normal text as such. This function raises a 156 | ValueError if a unicode point is not in the table. The first 157 | argument of the ValueError is the index within the string, where the 158 | unknown unicode character has been encountered. 159 | """ 160 | result = [] 161 | for idx, character in enumerate(characters): 162 | if ( 163 | ord(character) < 168 164 | ): # ignore normal ascii character and unicode control sequences 165 | result.append(character) 166 | # treat alphanumerical characters differently when in text mode, see doc 167 | # string; don't replace alphabeticals if specified 168 | elif character.isalpha() and not replace_alphabeticals: 169 | result.append(character) 170 | else: 171 | mode = unicode.LaTeXMode.mathmode if is_math else unicode.LaTeXMode.textmode 172 | commands = unicode.unicode_table.get(ord(character)) 173 | if not commands: # unicode point missing in table 174 | # is catched one level above; provide index for more concise error output 175 | raise ValueError(characters.index(character)) 176 | # if math mode and only a text alternative exists, add \\text{} 177 | # around it 178 | if mode == unicode.LaTeXMode.mathmode and mode not in commands: 179 | result.append('\\text{%s}' % 180 | commands[unicode.LaTeXMode.textmode]) 181 | else: 182 | result.append(commands[mode]) 183 | # if the next character is alphabetical, add space 184 | if ( 185 | (idx + 1) < len(characters) 186 | and characters[idx + 
1].isalpha() 187 | and commands[mode][-1].isalpha() 188 | ): 189 | result.append(' ') 190 | return ''.join(result) 191 | 192 | 193 | def get_matching_brace(string, pos_of_opening_brace): 194 | if string[pos_of_opening_brace] != '{': 195 | raise ValueError( 196 | 'index %s in string %s: not a opening brace' 197 | % (pos_of_opening_brace, repr(string)) 198 | ) 199 | counter = 1 200 | for index, ch in enumerate(string[pos_of_opening_brace + 1:]): 201 | if ch == '{': 202 | counter += 1 203 | elif ch == '}': 204 | counter -= 1 205 | if counter == 0: 206 | return pos_of_opening_brace + index + 1 207 | if counter != 0: 208 | raise ValueError('Unbalanced braces in formula ' + repr(string)) 209 | 210 | 211 | # pylint: disable=too-many-instance-attributes 212 | class LaTeXDocument: 213 | """This class represents a LaTeX document. 214 | 215 | It is intended to contain an equation as main content and properties 216 | to customize it. Its main purpose is to provide a str method which 217 | will serialize it to a full LaTeX document. 218 | """ 219 | 220 | def __init__(self, eqn): 221 | self.__encoding = None 222 | self.__equation = eqn 223 | self.__displaymath = False 224 | self.__fontsize = 12 225 | self.__background_color = None 226 | self.__foreground_color = None 227 | self._preamble = '' 228 | self.__maths_env = None 229 | self.__replace_nonascii = False 230 | 231 | def _parse_color(self, color): 232 | # could be a valid color name 233 | try: # hex number? 234 | return int(color, 16) 235 | except ValueError: 236 | return color # treat as normal dvips compatible colour name 237 | 238 | def set_background_color(self, color): 239 | """Set the background color. 240 | 241 | The `color` can be either a valid dvips name or a tuple with RGB 242 | values between 0 and 1. If unset, the image will be transparent. 243 | """ 244 | self.__background_color = self._parse_color(color) 245 | 246 | def set_foreground_color(self, color): 247 | """Set the foreground color. 
248 | 249 | The `color` can be either a valid dvips name or a tuple with RGB 250 | values between 0 and 1. If unset, the text will be black. 251 | """ 252 | self.__foreground_color = self._parse_color(color) 253 | 254 | def set_replace_nonascii(self, flag): 255 | """If True, all non-ascii character will be replaced through a LaTeX 256 | command.""" 257 | self.__replace_nonascii = flag 258 | 259 | def set_latex_environment(self, env): 260 | """Set maths environment name like `displaymath` or `flalign*`.""" 261 | self.__maths_env = env 262 | 263 | def get_latex_environment(self): 264 | return self.__maths_env 265 | 266 | def get_encoding(self): 267 | """Return encoding for the document (or None).""" 268 | return self.__encoding 269 | 270 | def set_preamble_string(self, p): 271 | """Set the string to add to the preamble of the LaTeX document.""" 272 | self._preamble = p 273 | 274 | def set_encoding(self, encoding): 275 | """Set the encoding as used by the inputenc package.""" 276 | if encoding.lower().startswith('utf') and '8' in encoding: 277 | self.__encoding = 'utf8' 278 | elif ( 279 | encoding.lower().startswith('iso') and '8859' in encoding 280 | ) or encoding.lower() == 'latin1': 281 | self.__encoding = 'latin1' 282 | else: 283 | # if you plan to add an encoding, you have to adjust the str 284 | # function, which also loads the fontenc package 285 | raise ValueError( 286 | ( 287 | 'Encoding %s is not supported at the moment. If ' 288 | 'you want to use LaTeX 2e, you should report a bug at the home ' 289 | 'page of GladTeX.' 
290 | ) 291 | % encoding 292 | ) 293 | 294 | def set_displaymath(self, flag): 295 | """Set whether the formula is set in displaymath.""" 296 | if not isinstance(flag, bool): 297 | raise TypeError('Displaymath parameter must be of type bool.') 298 | self.__displaymath = flag 299 | 300 | def is_displaymath(self): 301 | return self.__displaymath 302 | 303 | def _get_encoding_preamble(self): 304 | # first check whether there are umlauts within the formula and if so, an 305 | # encoding has been set 306 | if any(ord(ch) > 128 for ch in self.__equation) and not self.__replace_nonascii: 307 | if not self.__encoding: 308 | raise ValueError( 309 | ( 310 | 'No encoding set, but non-ascii characters ' 311 | 'present. Please specify an encoding.' 312 | ) 313 | ) 314 | encoding_preamble = '' 315 | if self.__encoding: 316 | # try to guess language and hence character set (fontenc) 317 | import locale 318 | 319 | language = locale.getdefaultlocale() 320 | if language and language[0]: # extract just the language code 321 | language = language[0].split('_')[0] 322 | if not language or not language[0]: 323 | language = 'en' 324 | # check whether language on computer is within T1 and hence whether 325 | # it should be loaded; I know that this can be a misleading 326 | # assumption, but there's no better way that I know of 327 | if language in ['fr', 'es', 'it', 'de', 'nl', 'ro', 'en']: 328 | encoding_preamble += '\n\\usepackage[T1]{fontenc}' 329 | else: 330 | raise ValueError( 331 | ( 332 | 'Language not supported by T1 fontenc ' 333 | 'encoding; please report this to the GladTeX project.' 
334 | ) 335 | ) 336 | return encoding_preamble 337 | 338 | def set_fontsize(self, size_in_pt): 339 | """Set fontsize in pt, 12 pt by default.""" 340 | self.__fontsize = size_in_pt 341 | 342 | def get_fontsize(self): 343 | return self.__fontsize 344 | 345 | def __str__(self): 346 | preamble = ( 347 | self._get_encoding_preamble() 348 | + ('\n\\usepackage[utf8]{inputenc}\n\\usepackage{amsmath, amssymb}' '\n') 349 | + (self._preamble if self._preamble else '') 350 | ) 351 | return self._format_document(preamble) 352 | 353 | def _format_color_definition(self, which): 354 | color = getattr(self, '_%s__%s_color' % 355 | (self.__class__.__name__, which)) 356 | if not color or isinstance(color, str): 357 | return '' 358 | return '\\definecolor{%s}{HTML}{%s}' % (which, hex(color)[2:].upper().zfill(6)) 359 | 360 | def _format_colors(self): 361 | color_defs = ( 362 | self._format_color_definition('background'), 363 | self._format_color_definition('foreground'), 364 | ) 365 | color_body = '' 366 | if self.__background_color: 367 | color_body += '\\pagecolor{%s}' % ( 368 | 'background' if color_defs[0] else self.__background_color 369 | ) 370 | if self.__foreground_color: 371 | # opening brace isn't required here, inserted automatically 372 | color_body += '\\color{%s}' % ( 373 | 'foreground' if color_defs[1] else self.__foreground_color 374 | ) 375 | return (''.join(color_defs), color_body) 376 | 377 | def _format_document(self, preamble): 378 | """Return a formatted LaTeX document with the specified formula 379 | embedded.""" 380 | formula = self.__equation.lstrip().rstrip() 381 | if self.__replace_nonascii: 382 | formula = escape_unicode_maths(formula, replace_alphabeticals=True) 383 | # Try to detect and support the usage of math environments which 384 | # cannot be nested in other math environments in order to 385 | # prevent invalid nesting. I.e., when found, such an environment 386 | # is not wrapped (fixing #21). 
387 | if MATH_ENV_DETECTION_PATTERN.match(formula): 388 | opening = closing = '' 389 | elif self.__maths_env: 390 | opening = '\\begin{%s}' % self.__maths_env 391 | closing = '\\end{%s}' % self.__maths_env 392 | else: 393 | # determine characters with which to surround the formula 394 | opening = '\\[' if self.__displaymath else '\\(' 395 | closing = '\\]' if self.__displaymath else '\\)' 396 | fontsize = 'fontsize=%ipt' % self.__fontsize 397 | color_preamble, color_body = self._format_colors() 398 | return inspect.cleandoc( 399 | f""" 400 | \\PassOptionsToPackage{{dvipsnames}}{{xcolor}}\n 401 | \\documentclass[{fontsize}, fleqn]{{scrartcl}}\n 402 | {preamble} 403 | \\usepackage{{xcolor}} 404 | {color_preamble} 405 | {color_body} 406 | % tightpage must be last, see its package docs 407 | \\usepackage[active,textmath,displaymath,tightpage]{{preview}}\n 408 | \\begin{{document}}\n 409 | \\noindent% 410 | \\begin{{preview}}{{%s 411 | {opening}{formula}{closing}}}\\end{{preview}}\n 412 | \\end{{document}}\n 413 | """ 414 | ) 415 | 416 | 417 | def increase_readability(formula, replace_nonascii=False): 418 | """In alternate texts for non-image users or those using a screen reader, 419 | the LaTeX code should be as readable as possible. 420 | 421 | Therefore the formula should not contain unicode characters or 422 | formatting instructions. 
423 | """ 424 | if replace_nonascii: 425 | # keep umlauts, etc; makes the alt more readable, yet wouldn't compile 426 | formula = escape_unicode_maths(formula, replace_alphabeticals=False) 427 | # replace formatting-only symbols which distract the reader 428 | formula_changed = True 429 | while formula_changed: 430 | formula_changed = False 431 | for command in FORMATTING_COMMANDS: 432 | idx = formula.find(command) 433 | # only replace if it's not after a \\ and not part of a longer command 434 | if (idx > 0 and formula[idx - 1] != '\\') or idx == 0: 435 | end = idx + len(command) 436 | # following conditions for replacement must be met: 437 | # command doesn't end on alphabet. char. and is followed by same 438 | # category OR end of string reached OR command does not # end on 439 | # alphabetical char. at all 440 | if ( 441 | end >= len(formula) 442 | or not command[-1].isalpha() 443 | or not formula[end].isalpha() 444 | ): 445 | formula = formula[:idx] + ' ' + \ 446 | formula[idx + len(command):] 447 | formula = formula.replace(' ', ' ') 448 | formula_changed = True 449 | return formula 450 | -------------------------------------------------------------------------------- /manpage.md: -------------------------------------------------------------------------------- 1 | % GLADTEX(1) 2 | % Sebastian Humenda 3 | % 5th of June 2021 4 | 5 | # NAME 6 | 7 | **GladTeX** - generate HTML with LaTeX formulas embedded as images 8 | 9 | # SYNOPSIS 10 | 11 | **gladtex** [OPTIONS] 12 | 13 | 14 | # DESCRIPTION 15 | 16 | **GladTeX** is a formula preprocessor for HTML files. It recognizes a special tag 17 | (`...`) marking formulas for conversion. The converted vector images 18 | are integrated into the output HTML document. 19 | This eases the process of creating HTML 20 | documents (or web sites) containing formulas.\ 21 | The generated images are saved in a cache to not render the same image over 22 | and over again. 
This speeds up the process when formulas occur multiple times or 23 | when a document is extended gradually. 24 | 25 | The LaTeX formulas are preserved in the alt attribute of the embedded images, 26 | hence screen reader users benefit from an accessible HTML version of the 27 | document. 28 | 29 | Furthermore it can be used with Pandoc to convert Markdown documents and other 30 | formats with LaTeX formulas to HTML, EPUB and in fact to any HTML-based format, 31 | see the option `-P`. 32 | 33 | See [FILE FORMAT](#file-format) for an explanation of the file format and 34 | [EXAMPLES](#examples) for examples on how to use GladTeX on its own or with 35 | Pandoc. 36 | 37 | # OPTIONS 38 | 39 | **INPUT FILE NAME** 40 | : Input .htex file with LaTeX formulas (if omitted or -, stdin will be read). 41 | 42 | **-h** **--help** 43 | : Show this help message and exit. 44 | 45 | **-a** 46 | : Save text alternatives for images which are too long for the alt attribute 47 | into a single separate file and link images to it. 48 | 49 | **-b** _BACKGROUND_COLOR_ 50 | : Set background color for resulting images (default transparent). GladTeX 51 | understands colors as provided by the `dvips` option of the xcolor LaTeX 52 | package. Alternatively, a 6-digit hexadecimal value can be provided (as used 53 | e.g. in HTML/CSS). 54 | 55 | **-c** _`FOREGROUND_COLOR`_ 56 | : Set foreground color for resulting images. See the option above for a more 57 | in-depth explanation. 58 | 59 | **-d** _DIRECTORY_ 60 | : Directory in which to store the generated images (relative path).\ 61 | The given path is interpreted relative to the input file. For instance: 62 | 63 | gladtex -d img dir/file.htex 64 | 65 | will create a `dir/img` directory and link to it accordingly from the output document. 66 | 67 | **-e** _`LATEX_MATHS_ENV`_ 68 | : Set custom maths environment to surround the formula (e.g. flalign). 69 | 70 | **-E** _ENCODING_ 71 | : Overwrite encoding to use (default UTF-8). 
72 | 73 | **--epub** 74 | : Make embedded formula image more EPUB-compliant, i.e. round pixel sizes to 75 | integers. 76 | 77 | **-f** _FONTSIZE_ 78 | : Overwrite the default font size of 12pt. 12pt is the default in most 79 | browsers and hence changing this might lead to less-portable documents. 80 | 81 | **-i** _CLASS_ 82 | : CSS class to assign to inline math (default: 'inlinemath'). 83 | 84 | **-K** 85 | : Keep LaTeX file(s) when converting formulas. 86 | 87 | By default, the generated LaTeX document, containing the formula to be 88 | converted, is removed after the conversion (no matter whether it was 89 | successful or not). If it wasn't successful, it is sometimes helpful to look 90 | at the complete document. This option will keep the file. 91 | 92 | **-l** _CLASS_ 93 | : CSS class to assign to block-level math (default: 'displaymath'). 94 | 95 | **-n** 96 | : Purge unreadable caches along with all eqn*.png files. 97 | 98 | Caches can be unreadable if the used GladTeX version is incompatible. If 99 | this option is unset, GladTeX will simply fail when the cache is unreadable. 100 | 101 | **-m** 102 | : Print error output in machine-readable format (less concise, better parseable). 103 | 104 | Each line will start with a key, followed by a colon, followed by the value, 105 | e.g. `line: 5`. 106 | 107 | **-o** _FILENAME_ 108 | : Set output file name. '-' will print text to stdout. By default, the input file 109 | name is used and the `.htex` extension is replaced by `.html`. 110 | 111 | **-p** _`LATEX_STATEMENT`_ 112 | : Add given LaTeX code to the preamble of the LaTeX document that is used to 113 | generate the embedded images. In order to add the contents of a file to the 114 | preamble, use `-p "\input{FILE}"`. 115 | 116 | **-P** 117 | : Act as a pandoc filter. In this mode, input is expected to be a Pandoc JSON 118 | AST and the output will be a modified AST, with all formulas replaced 119 | by HTML image tags.
It makes sense to use `-` as the input file for 120 | this option. 121 | This option implies `-E UTF-8`. Also see [GLADTEX_ARGS](#gladtex_args) on 122 | how to invoke GladTeX as a pandoc filter and how to pass arguments in this 123 | mode. 124 | 125 | **--png** 126 | : Switch from SVG to PNG as image output. This image format has several known issues, 127 | one of them being that images won't resize when zooming into the document. 128 | It is also harder to work with for visually impaired users. 129 | 130 | **-r** _DPI_ 131 | : Set resolution (size of images) to 'dpi' (115 by default). This is only 132 | available with the `--png` option. Also see the `-f` option. 133 | 134 | **-R** 135 | : Replace non-ASCII (Unicode) characters by LaTeX commands. 136 | 137 | GladTeX can automatically detect non-ASCII characters in formulas and 138 | replace them with their appropriate LaTeX commands. In the alt attribute 139 | of the resulting image, alphabetical characters won't be replaced. That 140 | means that the alt text of the image is not exactly the same as the 141 | code used for generating the image, but it is far more readable. 142 | 143 | For instance, the formula \$\\text{für alle} a\$ would be compiled as 144 | \$\\text{f\\ddot{u}r alle} a\$ and displayed as "\\text{für alle} a" in the alt 145 | attribute. 146 | 147 | 148 | **-u** _URL_ 149 | : Base URL to image files (relative links are default). 150 | 151 | # FILE FORMAT 152 | 153 | A .htex file is essentially an HTML file containing LaTeX formulas. The formulas 154 | have to be surrounded by `<eq>` and `</eq>`. 155 | 156 | By default, formulas are rendered as inline maths, so they are squeezed to the 157 | height of the line. It is possible to render a formula as display maths by 158 | setting the env attribute to displaymath, i.e. `<eq env="displaymath">...</eq>`.
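
As a rough illustration of the file format described above, the following Python sketch lists the formulas found in a small document and reports whether each one would be rendered inline or as display maths. It assumes that formulas are marked with `<eq>` tags and that display maths is requested with `env="displaymath"`, as in upstream GladTeX. The names `SAMPLE_HTEX`, `EQ_PATTERN` and `list_formulas` are hypothetical, and the regular expression is a simplification for demonstration only, not the parser GladTeX itself uses.

```python
import re

# Hypothetical sample document; the <eq> tag name is an assumption (see above).
SAMPLE_HTEX = r"""<html><body>
<p>Circumference of a circle: <eq>u = \pi\cdot d</eq></p>
<p><eq env="displaymath">A = \pi r^2</eq></p>
</body></html>"""

# Simplified pattern: an opening <eq ...> tag, the formula, a closing </eq>.
EQ_PATTERN = re.compile(r'<eq\b(?P<attrs>[^>]*)>(?P<formula>.*?)</eq>', re.DOTALL)


def list_formulas(document):
    """Return (is_displaymath, formula) pairs in document order."""
    found = []
    for match in EQ_PATTERN.finditer(document):
        # A formula counts as display maths only if the env attribute says so.
        display = 'env="displaymath"' in match.group('attrs')
        found.append((display, match.group('formula')))
    return found


if __name__ == '__main__':
    for display, formula in list_formulas(SAMPLE_HTEX):
        print('display' if display else 'inline', formula)
```

A real document should be processed with `gladtex` itself; the sketch is only meant to make the inline versus display distinction concrete.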
159 | 160 | # ENVIRONMENT VARIABLES 161 | 162 | GladTeX can be customised by environment variables: 163 | 164 | `DEBUG` 165 | : If this is set to 1, a full Python traceback, instead of a human-readable 166 | error message, will be displayed. 167 | [`GLADTEX_ARGS`:]{#gladtex_args} 168 | : When this environment variable is set, GladTeX switches into 169 | the **pandoc filter** mode: input is read from standard input, output 170 | written to standard output and the `-P` and `-E UTF-8` options are assumed. 171 | The contents of this variable are parsed as command-line switches. Quoting 172 | can be done in POSIX-shell compatible syntax: 173 | 174 | ``` 175 | export GLADTEX_ARGS='-d "image directory"' 176 | ``` 177 | 178 | It may be empty as well, which will just imply `-P`. 179 | See an example in [Output As EPUB](#output-as-epub). 180 | 181 | # EXAMPLES 182 | 183 | ## Sample HTEX document 184 | 185 | A sample HTEX document could look like this: 186 | 187 | ~~~~ 188 | 189 | 190 |

Some text

191 |

Circumference of a circle: u = \pi\cdot d

192 |

A useful matrix: \begin{pmatrix} 193 | 1 &2 &3 &4\\ 194 | 5 &6 &7 &8\\ 195 | 9 &10&11&12 196 | \end{pmatrix}

197 | 198 | ~~~~ 199 | 200 | This can be converted using 201 | 202 | gladtex file.htex 203 | 204 | and the result will be an HTML document called `file.html` along with two files 205 | `eqn0000.png` and `eqn0001.png` in the same directory. 206 | 207 | ## Markdown To HTML 208 | 209 | GladTeX can be used together with Pandoc. That can be handy to create an online 210 | version of a scientific paper written in Markdown. The Markdown document would 211 | look like this: 212 | 213 | ~~~~ 214 | Some text 215 | ========= 216 | 217 | Circumference of a circle: $u = \pi\cdot d$ 218 | 219 | A useful matrix: $$\begin{pmatrix} 220 | 1 &2 &3 &4\\ 221 | 5 &6 &7 &8\\ 222 | 9 &10&11&12 \end{pmatrix}$$ 223 | ~~~~ 224 | 225 | The conversion is as easy as typing on the command-line: 226 | 227 | pandoc -s -t html --gladtex file.md | gladtex -o file.html - 228 | 229 | ## Output as EPUB 230 | 231 | GladTeX can be used together with Pandoc, the Swiss Army knife of format conversion. 232 | In short, any format that Pandoc understands can be converted to EPUB using 233 | GladTeX: 234 | 235 | pandoc -t json myexample.md | gladtex -d "img dir" -P --epub - | 236 | pandoc -f json -o book.epub 237 | 238 | GladTeX can also be called directly by Pandoc, by setting the environment 239 | variable `GLADTEX_ARGS`, which automatically implies `-P`: 240 | 241 | export GLADTEX_ARGS='-d "img dir" --epub' 242 | pandoc -o book.epub --filter gladtex myexample.md 243 | 244 | 245 | # KNOWN LIMITATIONS 246 | 247 | LaTeX2E is ***not*** Unicode aware. If you have any Unicode (more precisely, 248 | non-ASCII) characters in your documents, you can do one of 249 | the following: 250 | 251 | 1. Look up the symbol in one of the many LaTeX formula listings and replace the 252 | symbol with the appropriate command. 253 | 2. Use the `-R` switch to let GladTeX replace the umlauts for you. 254 | 255 | PLEASE NOTE: It is impossible to use GladTeX with LuaLaTeX.
At the time of writing, dvipng 256 | does not support the extended font features of the lualatex engine. 257 | 258 | 259 | # PROJECT HOME 260 | 261 | The project home is at . The source can be 262 | found at . 263 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.black] 2 | line-length = 88 3 | target-version = ['py36'] 4 | -------------------------------------------------------------------------------- /runtests: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | # change above to python, if your system doesn't have the 3 suffix for python 3 3 | set -e 4 | PYTHON=python 5 | # check whether a program called "python3" exists, else use "python" 6 | if command -v python3 >/dev/null 2>&1 7 | then 8 | PYTHON=python3 9 | fi 10 | 11 | $PYTHON -m unittest discover tests 12 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [metadata] 2 | description-file = README.md 3 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | import os 2 | import setuptools 3 | 4 | with open(os.path.join('gleetex', '__init__.py')) as f: 5 | global VERSION 6 | import re 7 | 8 | for line in f.read().split('\n'): 9 | vers = re.search(r'^VERSION\s+=\s+.(\d+\.\d+\.\d+)', line) 10 | if vers: 11 | VERSION = vers.groups()[0] 12 | if not VERSION: 13 | 14 | class SetupError(Exception): 15 | pass 16 | 17 | raise SetupError('Error parsing package version') 18 | 19 | setuptools.setup( 20 | name='GladTeX', 21 | version=VERSION, 22 | description='Formula typesetting for the web using LaTeX', 23 | long_description="""Formula typesetting for the web 24 | 25 | This package (and 
command-line application) allows to create web documents with 26 | properly typeset formulas. It uses the embedded LaTeX formulas from the source 27 | document to place SVG images at the right positions on the web page. For people 28 | not able to see the images (due to a poor internet connection or because of a 29 | disability), the LaTeX formula is preserved in the alt tag of the SVG image. 30 | GladTeX hence combines proper math typesetting on the web with the creation of 31 | accessible scientific documents.""", 32 | long_description_content_type='text/markdown', 33 | author='Sebastian Humenda', 34 | author_email='shumenda@gmx.de', 35 | url='https://humenda.github.io/GladTeX', 36 | packages=['gleetex'], 37 | entry_points={'console_scripts': ['gladtex = gleetex.__main__:main']}, 38 | license='LGPL3.0', 39 | ) 40 | -------------------------------------------------------------------------------- /tests/test_cachedconverter.py: -------------------------------------------------------------------------------- 1 | # pylint: disable=too-many-public-methods,import-error,too-few-public-methods,missing-docstring,unused-variable 2 | import os 3 | import shutil 4 | import tempfile 5 | import unittest 6 | from unittest.mock import patch 7 | from gleetex import cachedconverter, image 8 | from gleetex.caching import JsonParserException 9 | from gleetex.image import remove_all 10 | 11 | 12 | def get_number_of_files(path): 13 | return len(os.listdir(path)) 14 | 15 | 16 | def mk_eqn(eqn, count=0, pos=(1, 1)): 17 | """Create formula. 18 | 19 | Each formula must look like this: (eqn, pos, path, dsp, count) for 20 | self._convert_concurrently, this is a shorthand with mocking a few 21 | values. 
22 | """ 23 | return (pos, False, eqn) 24 | 25 | 26 | def write(path, content='dummy'): 27 | with open(path, 'w', encoding='utf-8') as f: 28 | f.write(str(content)) 29 | 30 | 31 | class Tex2imgMock: 32 | """Could use a proper mock, but this one allows a bit more tricking.""" 33 | 34 | def __init__(self, fmt): 35 | self.__format = fmt 36 | self.set_dpi = ( 37 | self.set_transparency 38 | ) = ( 39 | self.set_foreground_color 40 | ) = self.set_background_color = lambda x: None # do nothing 41 | 42 | def create_dvi(self, dvi_fn): 43 | with open(dvi_fn, 'w') as f: 44 | f.write('dummy') 45 | 46 | def create_image(self, dvi_fn): 47 | if os.path.exists(dvi_fn): 48 | os.remove(dvi_fn) 49 | write(os.path.splitext(dvi_fn)[0] + '.' + self.__format.value) 50 | 51 | def convert(self, tx, basename): 52 | write(basename + '.tex', tx) 53 | dvi = basename + '.dvi' 54 | self.create_dvi(dvi) 55 | self.create_image(dvi) 56 | remove_all(dvi, basename + '.tex', basename + 57 | '.log', basename + '.aux') 58 | return {'depth': 9, 'height': 8, 'width': 7} 59 | 60 | def parse_log(self, _logdata): 61 | return {} 62 | 63 | 64 | class TestCachedConverter(unittest.TestCase): 65 | # pylint: disable=protected-access 66 | def setUp(self): 67 | self.original_directory = os.getcwd() 68 | self.tmpdir = tempfile.mkdtemp() 69 | os.chdir(self.tmpdir) 70 | 71 | # pylint: disable=protected-access 72 | def tearDown(self): 73 | # restore static reference to converter 74 | cachedconverter.CachedConverter._converter = image.Tex2img 75 | os.chdir(self.original_directory) 76 | shutil.rmtree(self.tmpdir, ignore_errors=True) 77 | 78 | @patch('gleetex.image.Tex2img', Tex2imgMock) 79 | def test_that_subdirectory_is_created(self): 80 | c = cachedconverter.CachedConverter('subdirectory') 81 | formula = ({}, True, '\\textbf{FOO!}') 82 | c.convert_all([formula]) 83 | # one directory exists 84 | self.assertEqual( 85 | get_number_of_files('.'), 86 | 1, 87 | "Found the following files, expected only 'subdirectory': " 88 | 
+ ', '.join(os.listdir('.')), 89 | ) 90 | # subdirectory contains 1 image and a cache 91 | self.assertEqual( 92 | get_number_of_files('subdirectory'), 93 | 2, 94 | 'expected two' 95 | + ' files, found instead ' 96 | + repr(os.listdir('subdirectory')), 97 | ) 98 | 99 | def test_that_unknown_options_trigger_exception(self): 100 | c = cachedconverter.CachedConverter('subdirectory') 101 | self.assertRaises(ValueError, c.set_option, 'cxzbiucxzbiuxzb', 'muh') 102 | 103 | def test_that_invalid_caches_trigger_error_by_default(self): 104 | with open('gladtex.cache', 'w') as f: 105 | f.write('invalid cache') 106 | with self.assertRaises(JsonParserException): 107 | c = cachedconverter.CachedConverter('') 108 | 109 | @patch('gleetex.image.Tex2img', Tex2imgMock) 110 | def test_that_invalid_caches_get_removed_if_specified(self): 111 | formulas = [mk_eqn('tau')] 112 | with open('gladtex.cache', 'w') as f: 113 | f.write('invalid cache') 114 | c = cachedconverter.CachedConverter('.', keep_old_cache=False) 115 | c.convert_all(formulas) 116 | # cache got overridden 117 | with open('gladtex.cache') as f: 118 | self.assertFalse('invalid' in f.read()) 119 | 120 | @patch('gleetex.image.Tex2img', Tex2imgMock) 121 | def test_that_converted_formulas_are_cached(self): 122 | formulas = [mk_eqn('\\tau')] 123 | c = cachedconverter.CachedConverter('.') 124 | c.convert_all(formulas) 125 | self.assertTrue(c.get_data_for('\\tau', False)) 126 | 127 | @patch('gleetex.image.Tex2img', Tex2imgMock) 128 | def test_that_file_names_are_correctly_picked(self): 129 | formulas = [mk_eqn('\\tau')] 130 | write('eqn000.svg') 131 | write('eqn001.svg') 132 | c = cachedconverter.CachedConverter('') 133 | to_convert = c._get_formulas_to_convert(formulas) 134 | self.assertTrue(len(to_convert), 1) 135 | self.assertEqual(to_convert[0][2], 'eqn002.svg') 136 | 137 | @patch('gleetex.image.Tex2img', Tex2imgMock) 138 | def test_that_all_converted_formulas_are_in_cache_and_meta_info_correct(self): 139 | formulas = 
[mk_eqn('a_{%d}' % i, pos=(i, i), count=i) 140 | for i in range(4)] 141 | c = cachedconverter.CachedConverter('.') 142 | c.convert_all(formulas) 143 | # expect all formulas and a gladtex cache to exist 144 | self.assertEqual( 145 | get_number_of_files('.'), 146 | len(formulas) + 1, 147 | 'present files:\n' + ', '.join(os.listdir('.')), 148 | ) 149 | for pos, dsp, formula in formulas: 150 | data = c.get_data_for(formula, False) 151 | self.assertEqual( 152 | data['pos'], 153 | {'depth': 9, 'height': 8, 'width': 7}, 154 | 'expected the pos as defined in the dummy class', 155 | ) 156 | 157 | @patch('gleetex.image.Tex2img', Tex2imgMock) 158 | def test_that_inline_math_and_display_math_results_in_different_formulas(self): 159 | # two formulas, second is displaymath 160 | formula = r'\sum_{i=0}^n x_i' 161 | formulas = [((1, 1), False, formula), ((3, 1), True, formula)] 162 | c = cachedconverter.CachedConverter('.') 163 | c.convert_all(formulas) 164 | # expect all formulas and a gladtex cache to exist 165 | self.assertEqual( 166 | get_number_of_files('.'), 167 | len(formulas) + 1, 168 | 'present files:\n%s' % ', '.join(os.listdir('.')), 169 | ) 170 | -------------------------------------------------------------------------------- /tests/test_caching.py: -------------------------------------------------------------------------------- 1 | # pylint: disable=too-many-public-methods,import-error,too-few-public-methods,missing-docstring,unused-variable 2 | import os 3 | import shutil 4 | import tempfile 5 | import unittest 6 | from gleetex import caching 7 | 8 | 9 | def write(path, content): 10 | with open(path, 'w', encoding='utf-8') as f: 11 | f.write(content) 12 | 13 | 14 | class test_caching(unittest.TestCase): 15 | def setUp(self): 16 | self.pos = {'height': 8, 'depth': 2, 'width': 666} 17 | self.original_directory = os.getcwd() 18 | self.tmpdir = tempfile.mkdtemp() 19 | os.chdir(self.tmpdir) 20 | 21 | def tearDown(self): 22 | os.chdir(self.original_directory) 23 | 
shutil.rmtree(self.tmpdir, ignore_errors=True) 24 | 25 | def test_differently_spaced_formulas_are_the_same(self): 26 | form1 = r'\tau \pi' 27 | form2 = '\tau\\pi' 28 | self.assertTrue( 29 | caching.normalize_formula(form1), caching.normalize_formula(form2) 30 | ) 31 | 32 | def test_trailing_and_leading_spaces_and_tabs_are_no_problem(self): 33 | u = caching.normalize_formula 34 | form1 = ' hi' 35 | form2 = 'hi ' 36 | form3 = '\thi' 37 | self.assertEqual(u(form1), u(form2)) 38 | self.assertEqual(u(form1), u(form3)) 39 | 40 | def test_that_empty_braces_are_ignored(self): 41 | u = caching.normalize_formula 42 | form1 = r'\sin{}x' 43 | form2 = r'\sin x' 44 | form3 = r'\sin{} x' 45 | self.assertEqual(u(form1), u(form2)) 46 | self.assertEqual(u(form1), u(form3)) 47 | self.assertEqual(u(form2), u(form3)) 48 | 49 | def test_empty_cache_works_fine(self): 50 | write('foo.png', 'muha') 51 | c = caching.ImageCache('file.png') 52 | formula = r'f(x) = \ln(x)' 53 | c.add_formula(formula, self.pos, 'foo.png') 54 | self.assertTrue(c.contains(formula, False)) 55 | 56 | def test_that_invalid_cach_entries_are_detected(self): 57 | # entry is invalid if file doesn't exist 58 | c = caching.ImageCache() 59 | formula = r'f(x) = \ln(x)' 60 | self.assertRaises(OSError, c.add_formula, 61 | formula, self.pos, 'file.png') 62 | 63 | def test_that_correct_pos_and_path_are_returned_after_writing_the_cache_back(self): 64 | c = caching.ImageCache() 65 | formula = r'g(x) = \ln(x)' 66 | write('file.png', 'dummy') 67 | c.add_formula(formula, self.pos, 'file.png', displaymath=False) 68 | c.write() 69 | c = caching.ImageCache() 70 | self.assertTrue(c.contains(formula, False)) 71 | data = c.get_data_for(formula, False) 72 | self.assertEqual(data['pos'], self.pos) 73 | self.assertEqual(data['path'], 'file.png') 74 | 75 | def test_formulas_are_not_added_twice(self): 76 | form1 = r'\ln(x) \neq e^x' 77 | write('spass.png', 'binaryBinary_binary') 78 | c = caching.ImageCache() 79 | for i in range(1, 10): 80 | 
c.add_formula(form1, self.pos, 'spass.png') 81 | self.assertEqual(len(c), 1) 82 | 83 | def test_that_remove_actually_removes(self): 84 | form1 = '\\int e^x dy' 85 | write('happyness.png', 'binaryBinary_binary') 86 | c = caching.ImageCache() 87 | c.add_formula(form1, self.pos, 'happyness.png') 88 | c.remove_formula(form1, False) 89 | self.assertEqual(len(c), 0) 90 | 91 | def test_removal_of_non_existing_formula_raises_exception(self): 92 | c = caching.ImageCache() 93 | self.assertRaises(KeyError, c.remove_formula, 'Haha!', False) 94 | 95 | def test_that_invalid_version_is_detected(self): 96 | c = caching.ImageCache('gladtex.cache') 97 | c._ImageCache__set_version('invalid.stuff') 98 | c.write() 99 | self.assertRaises( 100 | caching.JsonParserException, caching.ImageCache, 'gladtex.cache' 101 | ) 102 | 103 | def test_that_invalid_style_is_detected(self): 104 | write('foo.png', 'dummy') 105 | c = caching.ImageCache('gladtex.cache') 106 | c.add_formula('\\tau', self.pos, 'foo.png', False) 107 | c.add_formula('\\theta', self.pos, 'foo.png', True) 108 | self.assertRaises( 109 | ValueError, c.add_formula, '\\gamma', self.pos, 'foo.png', 'some stuff' 110 | ) 111 | 112 | def test_that_backslash_in_path_is_replaced_through_slash(self): 113 | c = caching.ImageCache('gladtex.cache') 114 | os.mkdir('bilder') 115 | write(os.path.join('bilder', 'foo.png'), str(0xDEADBEEF)) 116 | c.add_formula('\\tau', self.pos, 'bilder\\foo.png', False) 117 | self.assertTrue('/' in c.get_data_for('\\tau', False)['path']) 118 | 119 | def test_that_absolute_paths_trigger_OSError(self): 120 | c = caching.ImageCache('gladtex.cache') 121 | write('foo.png', 'dummy') 122 | fn = os.path.abspath('foo.png') 123 | self.assertRaises(OSError, c.add_formula, '\\tau', self.pos, fn, False) 124 | 125 | def test_that_invalid_caches_are_removed_automatically_if_desired(self): 126 | def file_was_removed(x): return self.assertFalse( 127 | os.path.exists(x), 128 | 'expected that file %s was removed, but it still 
exists' % x, 129 | ) 130 | write('gladtex.cache', 'some non-json rubbish') 131 | c = caching.ImageCache('gladtex.cache', keep_old_cache=False) 132 | file_was_removed('gladtex.cache') 133 | # try the same in a subdirectory 134 | os.mkdir('foo') 135 | cache_path = os.path.join('foo', 'gladtex.cache') 136 | eqn1_path = os.path.join('foo', 'eqn000.png') 137 | eqn2_path = os.path.join('foo', 'eqn003.png') 138 | write(cache_path, 'some non-json rubbish') 139 | write(eqn1_path, 'binary') 140 | write(eqn2_path, 'more binary') 141 | c = caching.ImageCache(cache_path, keep_old_cache=False) 142 | file_was_removed(cache_path) 143 | file_was_removed(eqn1_path) 144 | file_was_removed(eqn2_path) 145 | 146 | def test_that_formulas_in_cache_with_no_file_raise_key_error(self): 147 | c = caching.ImageCache('gladtex.cache', keep_old_cache=False) 148 | write('foo.png', 'dummy') 149 | c.add_formula('\\tau', self.pos, 'foo.png') 150 | c.write() 151 | os.remove('foo.png') 152 | c = caching.ImageCache('gladtex.cache', keep_old_cache=False) 153 | with self.assertRaises(KeyError): 154 | c.get_data_for('foo.png', 'False') 155 | -------------------------------------------------------------------------------- /tests/test_htmlhandling.py: -------------------------------------------------------------------------------- 1 | # pylint: disable=too-many-public-methods 2 | from functools import reduce 3 | import io 4 | import os 5 | import re 6 | import shutil 7 | import tempfile 8 | import unittest 9 | from gleetex import htmlhandling, sink 10 | 11 | 12 | excl_filename = sink.EXCLUSION_FILE_NAME 13 | 14 | HTML_SKELETON = """ 15 | {1}""" 16 | 17 | 18 | def read(file_name, mode='r', encoding='utf-8'): 19 | """Read the file, return the string. 20 | 21 | Close file properly. 
22 | """ 23 | with open(file_name, mode, encoding=encoding) as handle: 24 | return handle.read() 25 | 26 | 27 | class HtmlparserTest(unittest.TestCase): 28 | def setUp(self): 29 | self.p = htmlhandling.EqnParser() 30 | 31 | def test_start_tags_are_parsed_literally(self): 32 | self.p.feed("

") 33 | self.assertEqual( 34 | self.p.get_data()[0], 35 | "

", 36 | 'The HTML parser should copy start tags literally.', 37 | ) 38 | 39 | def test_that_end_tags_are_copied_literally(self): 40 | self.p.feed('

') 41 | self.assertEqual(''.join(self.p.get_data()), '

') 42 | 43 | def test_entities_are_unchanged(self): 44 | self.p.feed(' ') 45 | self.assertEqual(self.p.get_data()[0], ' ') 46 | 47 | def test_charsets_are_copied(self): 48 | self.p.feed('>→') 49 | self.assertEqual(''.join(self.p.get_data()[0]), '>→') 50 | 51 | def test_without_eqn_all_blocks_are_strings(self): 52 | self.p.feed('\n

42

blah

') 53 | self.assertTrue( 54 | reduce(lambda x, y: x and isinstance(y, str), self.p.get_data()), 55 | 'all chunks have to be strings', 56 | ) 57 | 58 | def test_equation_is_detected(self): 59 | self.p.feed('foo \\pi') 60 | self.assertTrue(isinstance(self.p.get_data()[0], (tuple, list))) 61 | self.assertEqual(self.p.get_data()[0][2], 'foo \\pi') 62 | 63 | def test_tag_followed_by_eqn_is_correctly_recognized(self): 64 | self.p.feed('

bar') 65 | self.assertEqual(self.p.get_data()[0], '

') 66 | self.assertTrue( 67 | isinstance(self.p.get_data(), list), 68 | 'second item of data must be equation data list', 69 | ) 70 | 71 | def test_document_with_tag_then_eqn_then_tag_works(self): 72 | self.p.feed('

bar
baz') 73 | eqn = None 74 | # test should not depend on a specific position of equation, search for 75 | # it 76 | data = self.p.get_data() 77 | for chunk in data: 78 | if isinstance(chunk, (tuple, list)): 79 | eqn = chunk 80 | break 81 | self.assertTrue(isinstance(data[0], str)) 82 | self.assertTrue( 83 | eqn is not None, 'No equation found, must be tuple/list object.' 84 | ) 85 | self.assertTrue(isinstance(data[-1], str)) 86 | 87 | def test_equation_is_copied_literally(self): 88 | self.p.feed('my\nlittle\n\\tau') 89 | self.assertEqual(self.p.get_data()[0][2], 'my\nlittle\n\\tau') 90 | 91 | def test_unclosed_eqns_are_detected(self): 92 | self.assertRaises( 93 | htmlhandling.ParseException, self.p.feed, '

\\endless\\formula' 94 | ) 95 | 96 | def test_nested_formulas_trigger_exception(self): 97 | self.assertRaises( 98 | htmlhandling.ParseException, self.p.feed, '\\pi' 99 | ) 100 | self.assertRaises( 101 | htmlhandling.ParseException, self.p.feed, '\\pi

' 102 | ) 103 | 104 | def test_formulas_without_displaymath_attribute_are_detected(self): 105 | self.p.feed('

\frac12
bar

') 106 | formulas = [c for c in self.p.get_data() if isinstance(c, 107 | (tuple, list))] 108 | self.assertEqual(len(formulas), 2) # there should be _2_ formulas 109 | self.assertEqual(formulas[0][1], False) # no displaymath 110 | self.assertEqual(formulas[1][1], False) # no displaymath 111 | 112 | def test_that_unclosed_formulas_detected(self): 113 | self.assertRaises(htmlhandling.ParseException, 114 | self.p.feed, '\\pi

') 115 | self.assertRaises(htmlhandling.ParseException, self.p.feed, '\\pi') 116 | 117 | def test_formula_contains_only_formula(self): 118 | p = htmlhandling.EqnParser() 119 | p.feed('

1

') 120 | formula = next(e for e in p.get_data() if isinstance(e, (list, tuple))) 121 | self.assertEqual(formula[-1], '1test

') 125 | formula = next(e for e in p.get_data() if isinstance(e, (list, tuple))) 126 | self.assertEqual(formula[-1], 'test') 127 | 128 | p = htmlhandling.EqnParser() 129 | p.feed('

1

') 130 | formula = next(e for e in p.get_data() if isinstance(e, (list, tuple))) 131 | self.assertEqual(formula[-1], '1a>b
') 135 | formula = self.p.get_data()[0] 136 | self.assertEqual(formula[-1], 'a>b') 137 | 138 | def test_displaymath_is_recognized(self): 139 | self.p.feed( 140 | '\\sum\limits_{n=1}^{e^i} a^nl^n') 141 | self.assertEqual(self.p.get_data()[0][1], True) # displaymath flag set 142 | 143 | def test_encoding_is_parsed_from_HTML4(self): 144 | iso8859_1 = HTML_SKELETON.format( 145 | 'iso-8859-15', 'öäüß').encode('iso-8859-1') 146 | self.p.feed(iso8859_1) 147 | self.assertEqual(self.p._EqnParser__encoding, 'iso-8859-15') 148 | 149 | def test_encoding_is_parsed_from_HTML5(self): 150 | document = r""" 151 | 152 | 153 | 154 |

hi

""" 155 | self.p.feed(document.encode('utf-8')) 156 | self.assertEqual(self.p._EqnParser__encoding.lower(), 'utf-8') 157 | 158 | def test_strings_can_be_passed_tO_parser_as_well(self): 159 | # no exception - everything is working as expected 160 | self.p.feed(HTML_SKELETON.format('utf-8', 'æø')) 161 | 162 | 163 | class GetPositionTest(unittest.TestCase): 164 | def test_that_line_number_is_correct(self): 165 | self.assertEqual(htmlhandling.get_position('jojo', 0)[0], 0) 166 | self.assertEqual(htmlhandling.get_position('jojo', 3)[0], 0) 167 | self.assertEqual(htmlhandling.get_position('a\njojo', 3)[0], 1) 168 | self.assertEqual(htmlhandling.get_position('a\n\njojo', 3)[0], 2) 169 | 170 | def test_that_position_on_line_is_correct(self): 171 | self.assertEqual(htmlhandling.get_position('jojo', 0)[1], 0) 172 | self.assertEqual(htmlhandling.get_position('jojo', 3)[1], 3) 173 | self.assertEqual(htmlhandling.get_position('a\njojo', 3)[1], 2) 174 | self.assertEqual(htmlhandling.get_position('a\n\njojo', 3)[1], 1) 175 | 176 | 177 | class HtmlImageTest(unittest.TestCase): 178 | def setUp(self): 179 | self.pos = {'depth': 99, 'height': 88, 'width': 77} 180 | self.original_directory = os.getcwd() 181 | self.tmpdir = tempfile.mkdtemp() 182 | os.chdir(self.tmpdir) 183 | 184 | def tearDown(self): 185 | os.chdir(self.original_directory) 186 | shutil.rmtree(self.tmpdir, ignore_errors=True) 187 | 188 | def test_that_no_file_is_written_if_no_content(self): 189 | ht = htmlhandling.HtmlImageFormatter('foo') 190 | self.assertFalse(os.path.exists('foo.html')) 191 | 192 | def test_file_if_written_when_content_exists(self): 193 | img = htmlhandling.HtmlImageFormatter() 194 | img.format(self.pos, '\\tau\\tau' * 20, 'foo.png') 195 | self.assertTrue(len(img.get_excluded()) == 1) 196 | 197 | def test_id_contains_no_special_characters(self): 198 | data = htmlhandling.generate_label("\\tau!'{}][~^") 199 | for character in {'!', "'", '\\', '{', '}'}: 200 | self.assertFalse(character in data) 201 | 
202 | def test_formula_can_consist_only_of_numbers_and_id_is_generated(self): 203 | data = htmlhandling.generate_label('9*8*7=504') 204 | self.assertTrue(data.startswith('form')) 205 | self.assertTrue(data.endswith('504')) 206 | 207 | def test_that_empty_ids_raise_exception(self): 208 | self.assertRaises(ValueError, htmlhandling.generate_label, '') 209 | 210 | def test_that_same_characters_are_not_repeated(self): 211 | id = htmlhandling.generate_label('jo{{{{{{{{ha') 212 | self.assertEqual(id, 'jo_ha') 213 | 214 | def test_that_ids_are_max_150_characters_wide(self): 215 | id = htmlhandling.generate_label('\\alpha\\cdot\\gamma + ' * 999) 216 | self.assertTrue(len(id) == 150) 217 | 218 | def test_that_ids_start_with_letter(self): 219 | id = htmlhandling.generate_label('{}\\[]ÖÖÖö9343...·tau') 220 | self.assertTrue(id[0].isalpha()) 221 | 222 | def test_that_link_to_external_image_points_to_file_and_formula(self): 223 | formula = '\\tau\\tau\gamma\delta' * 42 224 | img = htmlhandling.HtmlImageFormatter() 225 | formatted_img = img.format(self.pos, formula, 'foo.png') 226 | expected_id = htmlhandling.generate_label(formula) 227 | sink.html_write_excluded_file(excl_filename, img.get_excluded()) 228 | external_file = read(excl_filename, 'r', encoding='utf-8') 229 | # find linked formula path 230 | href = re.search('href="(.*?)"', formatted_img) 231 | self.assertTrue(href != None) 232 | # extract path and id from it 233 | self.assertTrue('#' in href.groups()[0]) 234 | path, id = href.groups()[0].split('#') 235 | self.assertEqual(path, excl_filename) 236 | self.assertEqual(id, expected_id) 237 | 238 | # check external file 239 | self.assertTrue(' patch level 3 20 | Babel <3.9r> and hyphenation patterns for 10 language(s) loaded. 21 | (/usr/share/texlive/texmf-dist/tex/latex/base/article.cls 22 | Document Class: article 2014/09/29 v1.4h Standard LaTeX document class 23 | (/usr/share/texlive/texmf-dist/tex/latex/base/size10.clo)) (./bla.aux) 24 | ! Undefined control sequence. 
25 | \foo 26 | 27 | l.3 $\foo 28 | $ 29 | No pages of output. 30 | Transcript written on bla.log. 31 | """ 32 | 33 | 34 | def call_dummy(_lklklklklk, **blah): 35 | """Dummy to prohibit subprocess execution.""" 36 | return str(blah) 37 | 38 | 39 | # pylint: disable=unused-argument 40 | def latex_error_mock(_cmd, **quark): 41 | """Mock an error case.""" 42 | raise SubprocessError(LATEX_ERROR_OUTPUT) 43 | 44 | 45 | # pylint: disable=unused-argument 46 | def dvipng_mock(cmd, **kwargs): 47 | """Mock an error case.""" 48 | fn = None 49 | try: 50 | fn = next(e for e in cmd if e.endswith('.png')) 51 | except StopIteration: 52 | try: 53 | fn = next(e for e in cmd if e.endswith('.dvi')) 54 | except StopIteration: 55 | pass 56 | if fn: 57 | with open(fn, 'w') as f: 58 | f.write('test case') 59 | return ( 60 | 'This is dvipng 1.14 Copyright 2002-2010 Jan-Ake Larsson\n ' 61 | + 'depth=3 height=9 width=22' 62 | ) 63 | 64 | 65 | def touch(files): 66 | for file in files: 67 | dirname = os.path.dirname(file) 68 | if dirname and not os.path.exists(dirname): 69 | os.makedirs(dirname) 70 | with open(file, 'w') as f: 71 | f.write('\n') 72 | 73 | 74 | class test_imagecreation(unittest.TestCase): 75 | def setUp(self): 76 | self.original_directory = os.getcwd() 77 | self.tmpdir = tempfile.mkdtemp() 78 | os.chdir(self.tmpdir) 79 | image.Tex2img.call = call_dummy 80 | 81 | def tearDown(self): 82 | os.chdir(self.original_directory) 83 | shutil.rmtree(self.tmpdir, ignore_errors=True) 84 | 85 | @patch('gleetex.image.proc_call', latex_error_mock) 86 | def test_that_error_of_incorrect_formula_is_parsed_correctly(self): 87 | i = image.Tex2img(Format.Png) 88 | try: 89 | i.create_dvi(doc('\\foo'), 'foo.png') 90 | except SubprocessError as e: 91 | # expect undefined control sequence in error output 92 | self.assertTrue('Undefined' in e.args[0]) 93 | 94 | @patch('gleetex.image.proc_call', call_dummy) 95 | def test_that_intermediate_files_are_removed_after_successful_run(self): 96 | files = 
['foo.log', 'foo.aux', 'foo.tex'] 97 | touch(files) 98 | i = image.Tex2img(Format.Png) 99 | i.create_dvi(doc('\\frac\\pi\\tau'), 'foo.png') 100 | for intermediate_file in files: 101 | self.assertFalse( 102 | os.path.exists(intermediate_file), 103 | 'File ' + intermediate_file + ' should not exist.', 104 | ) 105 | 106 | @patch('gleetex.image.proc_call', latex_error_mock) 107 | def test_that_intermediate_files_are_removed_when_exception_is_raised(self): 108 | files = ['foo.log', 'foo.aux', 'foo.tex'] 109 | touch(files) 110 | # error case 111 | i = image.Tex2img(Format.Png) 112 | try: 113 | i.convert(doc('\\foo'), 'foo') 114 | except SubprocessError as e: 115 | for intermediate_file in files: 116 | self.assertFalse( 117 | os.path.exists(intermediate_file), 118 | 'File ' + intermediate_file + ' should not exist.', 119 | ) 120 | 121 | @patch('gleetex.image.proc_call', dvipng_mock) 122 | def test_intermediate_files_are_removed(self): 123 | files = ['foo.tex', 'foo.log', 'foo.aux', 'foo.dvi'] 124 | touch(files) 125 | i = image.Tex2img(Format.Png) 126 | i.convert(doc('\\hat{x}'), 'foo') 127 | for intermediate_file in files: 128 | self.assertFalse(os.path.exists(intermediate_file)) 129 | 130 | @patch('gleetex.image.proc_call', latex_error_mock) 131 | def test_intermediate_files_are_removed_when_exception_raised(self): 132 | files = ['foo.tex', 'foo.log', 'foo.aux', 'foo.dvi'] 133 | touch(files) 134 | i = image.Tex2img(Format.Png) 135 | try: 136 | i.convert(doc('\\hat{x}'), 'foo') 137 | except SubprocessError: 138 | self.assertFalse(os.path.exists('foo.tex')) 139 | self.assertFalse(os.path.exists('foo.dvi')) 140 | self.assertFalse(os.path.exists('foo.log')) 141 | self.assertFalse(os.path.exists('foo.aux')) 142 | 143 | @patch( 144 | 'gleetex.image.proc_call', 145 | lambda *x, **y: 'This is dvipng 1.14 ' 146 | + 'Copyright 2002-2010 Jan-Ake Larsson\n depth=3 height=9 width=22', 147 | ) 148 | def test_that_values_for_positioning_png_are_returned(self): 149 | i = 
image.Tex2img(Format.Png) 150 | posdata = i.create_image('foo.dvi') 151 | self.assertTrue('height' in posdata) 152 | self.assertTrue('width' in posdata) 153 | 154 | @patch('gleetex.image.proc_call', dvipng_mock) 155 | def test_that_output_file_names_with_paths_are_ok_and_log_is_removed(self): 156 | def fname(f): return os.path.join('bilder', 'farce.' + f) 157 | touch([fname('log'), fname('png')]) 158 | t = image.Tex2img(Format.Png) 159 | t.convert(doc(r'\hat{es}\pi\pi\ldots'), fname('')[:-1]) 160 | self.assertFalse(os.path.exists('farce.log')) 161 | self.assertTrue( 162 | os.path.exists(fname('png')), 163 | "couldn't find file {}, directory structure:\n{}".format( 164 | fname('png'), ''.join(pprint.pformat(list(os.walk('.')))) 165 | ), 166 | ) 167 | self.assertFalse(os.path.exists(fname('log'))) 168 | 169 | 170 | class TestImageResolutionCorrectlyCalculated(unittest.TestCase): 171 | def test_sizes_are_correctly_calculated(self): 172 | self.assertEqual(int(image.fontsize2dpi(12)), 115) 173 | self.assertEqual(int(image.fontsize2dpi(10)), 96) 174 | -------------------------------------------------------------------------------- /tests/test_typesetting.py: -------------------------------------------------------------------------------- 1 | # pylint: disable=too-many-public-methods,import-error,too-few-public-methods,missing-docstring,unused-variable 2 | import unittest 3 | from gleetex.typesetting import LaTeXDocument 4 | import gleetex.typesetting as typesetting 5 | 6 | 7 | class test_typesetting(unittest.TestCase): 8 | def test_formula_is_embedded(self): 9 | formula = 'E = m \\cdot c^2' 10 | doc = LaTeXDocument(formula) 11 | self.assertTrue( 12 | formula in str(doc), 13 | 'formula must be contained in LaTeX typesetting as it was inserted.', 14 | ) 15 | 16 | def test_if_displaymath_unset_correct_env_used(self): 17 | doc = LaTeXDocument(r'A = \pi r^2') 18 | doc.set_displaymath(False) 19 | self.assertTrue('\\(' in str(doc)) 20 | self.assertTrue('\\)' in str(doc)) 21 | 
22 | def test_if_displaymath_is_set_correct_env_used(self): 23 | doc = LaTeXDocument(r'A = \pi r^2') 24 | doc.set_displaymath(True) 25 | self.assertTrue('\\[' in str(doc)) 26 | self.assertTrue('\\]' in str(doc)) 27 | 28 | def test_preamble_is_included(self): 29 | preamble = '\\usepackage{eurosym}' 30 | doc = LaTeXDocument('moooo') 31 | doc.set_preamble_string(preamble) 32 | self.assertTrue(preamble in str(doc)) 33 | 34 | def test_obviously_wrong_encoding_trigger_exception(self): 35 | doc = LaTeXDocument('f00') 36 | self.assertRaises(ValueError, doc.set_encoding, 'latin1:') 37 | self.assertRaises(ValueError, doc.set_encoding, 'utf66') 38 | # the following passes (assertRaisesNot) 39 | doc.set_encoding('utf-8') 40 | 41 | def test_that_latex_maths_env_is_used(self): 42 | doc = LaTeXDocument('f00') 43 | doc.set_latex_environment('flalign*') 44 | self.assertTrue(r'\begin{flalign*}' in str(doc)) 45 | self.assertTrue(r'\end{flalign*}' in str(doc)) 46 | 47 | def test_non_nestable_math_envs_are_not_wrapped(self): 48 | doc = LaTeXDocument(r'\begin{align}foobar\end{align}') 49 | doc.set_displaymath(True) 50 | self.assertNotIn(r'\[', str(doc)) 51 | self.assertNotIn(r'\]', str(doc)) 52 | 53 | doc.set_displaymath(False) 54 | self.assertNotIn(r'\(', str(doc)) 55 | self.assertNotIn(r'\)', str(doc)) 56 | 57 | def test_non_nestable_math_envs_ignore_spaces(self): 58 | doc = LaTeXDocument(""" \t \t 59 | 60 | \v \r 61 | \\begin{align*}blabla 62 | 63 | \\end{align*} 64 | """) 65 | doc.set_displaymath(True) 66 | self.assertNotIn(r'\[', str(doc)) 67 | self.assertNotIn(r'\]', str(doc)) 68 | 69 | doc.set_displaymath(False) 70 | self.assertNotIn(r'\(', str(doc)) 71 | self.assertNotIn(r'\)', str(doc)) 72 | 73 | doc = LaTeXDocument(""" \t \r \t 74 | 75 | \t 76 | \t\v \\begin{array}{r} 77 | blabla 78 | 79 | \\end{array} 80 | """) 81 | doc.set_displaymath(True) 82 | self.assertIn(r'\[', str(doc)) 83 | self.assertIn(r'\]', str(doc)) 84 | 85 | doc.set_displaymath(False) 86 | self.assertIn(r'\(', 
str(doc)) 87 | self.assertIn(r'\)', str(doc)) 88 | 89 | def test_non_nestable_math_envs_ignore_comments_and_spaces(self): 90 | doc = LaTeXDocument(""" \r \t \t % klap30alf;''vak309u1$&la 91 | %\\begin{pmatrix}asdfl;3\v 92 | 93 | % 94 | \\begin{eqnarray}blabla 95 | 96 | \\end{eqnarray} 97 | """) 98 | doc.set_displaymath(True) 99 | self.assertNotIn(r'\[', str(doc)) 100 | self.assertNotIn(r'\]', str(doc)) 101 | 102 | doc.set_displaymath(False) 103 | self.assertNotIn(r'\(', str(doc)) 104 | self.assertNotIn(r'\)', str(doc)) 105 | 106 | doc = LaTeXDocument(""" \t % a3k0' a'ef4FAS \t 107 | \v \r 108 | \t % \\begin{align}asd3%%33-uadsfjl3;lkja04 109 | 110 | 111 | \t \\begin{array}{r} 112 | blabla 113 | 114 | \\end{array} 115 | """) 116 | doc.set_displaymath(True) 117 | self.assertIn(r'\[', str(doc)) 118 | self.assertIn(r'\]', str(doc)) 119 | 120 | doc.set_displaymath(False) 121 | self.assertIn(r'\(', str(doc)) 122 | self.assertIn(r'\)', str(doc)) 123 | 124 | def test_non_nestable_math_envs_ignore_comments_and_spaces_only(self): 125 | doc = LaTeXDocument(""" \r \t \t % klap30alf;''vak309u1$&la 126 | %\\begin{align*}asdfl;3\v 127 | 128 | f % 129 | \\begin{eqnarray}blabla 130 | 131 | \\end{eqnarray} 132 | """) 133 | doc.set_displaymath(True) 134 | self.assertIn(r'\[', str(doc)) 135 | self.assertIn(r'\]', str(doc)) 136 | 137 | doc.set_displaymath(False) 138 | self.assertIn(r'\(', str(doc)) 139 | self.assertIn(r'\)', str(doc)) 140 | 141 | doc = LaTeXDocument(""" \t % a3k0' a'ef4FAS \t 142 | \\ \v \r 143 | \t % \\begin{align}asd3%%33-uadsfjl3;lkja04 144 | 145 | 146 | \t \\begin{array}{r} 147 | blabla 148 | """) 149 | doc.set_displaymath(True) 150 | self.assertIn(r'\[', str(doc)) 151 | self.assertIn(r'\]', str(doc)) 152 | 153 | doc.set_displaymath(False) 154 | self.assertIn(r'\(', str(doc)) 155 | self.assertIn(r'\)', str(doc)) 156 | 157 | 158 | ################################################################################ 159 | 160 | 161 | class 
test_replace_unicode_characters(unittest.TestCase): 162 | def test_that_ascii_strings_are_returned_verbatim(self): 163 | for string in ['abc.\\', '`~[]}{:<>']: 164 | textmode = typesetting.replace_unicode_characters(string, False) 165 | self.assertEqual( 166 | textmode, string, 'expected %s, got %s' % (string, textmode) 167 | ) 168 | mathmode = typesetting.replace_unicode_characters(string, True) 169 | self.assertEqual( 170 | mathmode, string, 'expected %s, got %s' % (string, mathmode) 171 | ) 172 | 173 | def test_that_alphabetical_characters_are_replaced_by_default(self): 174 | textmode = typesetting.replace_unicode_characters('ö', False) 175 | self.assertTrue('\\"' in textmode) 176 | mathmode = typesetting.replace_unicode_characters('ö', True) 177 | self.assertTrue('\\ddot' in mathmode) 178 | 179 | def test_that_alphabetical_characters_are_kept_in_text_mode_if_specified(self): 180 | self.assertEqual( 181 | typesetting.replace_unicode_characters( 182 | 'ö', False, replace_alphabeticals=False # text mode 183 | ), 184 | 'ö', 185 | ) 186 | self.assertEqual( 187 | typesetting.replace_unicode_characters( 188 | 'æ', False, replace_alphabeticals=False 189 | ), 190 | 'æ', 191 | ) 192 | 193 | def test_that_alphanumericals_are_replaced_in_mathmode_even_if_replace_alphabeticals_set( 194 | self, 195 | ): 196 | self.assertNotEqual( 197 | typesetting.replace_unicode_characters( 198 | 'öäü', True, replace_alphabeticals=True 199 | ), 200 | 'öäü', 201 | ) 202 | self.assertNotEqual( 203 | typesetting.replace_unicode_characters( 204 | 'æø', True, replace_alphabeticals=True 205 | ), 206 | 'æø', 207 | ) 208 | 209 | def test_that_charachters_not_present_in_file_raise_exception(self): 210 | with self.assertRaises(ValueError): 211 | typesetting.replace_unicode_characters('€', True) 212 | 213 | def test_that_formulas_are_replaced(self): 214 | self.assertNotEqual( 215 | typesetting.replace_unicode_characters('π', True), 'π') 216 | self.assertNotEqual( 217 | 
typesetting.replace_unicode_characters('π', False), 'π') 218 | 219 | 220 | class test_get_matching_brace(unittest.TestCase): 221 | def test_closing_brace_found_when_only_one_brace_present(self): 222 | text = 'text{ok}' 223 | self.assertEqual(typesetting.get_matching_brace( 224 | text, 4), len(text) - 1) 225 | self.assertEqual(typesetting.get_matching_brace( 226 | text + 'foo', 4), len(text) - 1) 227 | 228 | def test_outer_brace_found(self): 229 | text = 'text{o, bla\\"{o}dfdx.}ds' 230 | self.assertEqual(typesetting.get_matching_brace( 231 | text, 4), len(text) - 3) 232 | 233 | def test_inner_brace_is_matched(self): 234 | text = 'text{o, bla\\"{o}dfdx.}ds' 235 | self.assertEqual(typesetting.get_matching_brace(text, 13), 15) 236 | 237 | def test_that_unmatched_braces_raise_exception(self): 238 | with self.assertRaises(ValueError): 239 | typesetting.get_matching_brace('text{foooooooo', 4) 240 | with self.assertRaises(ValueError): 241 | typesetting.get_matching_brace('text{jo"{o....}', 4) 242 | 243 | def test_wrong_position_for_opening_brace_raises(self): 244 | with self.assertRaises(ValueError): 245 | typesetting.get_matching_brace('moo', 1) 246 | 247 | 248 | class test_escape_unicode_maths(unittest.TestCase): 249 | """These tests assume that the tests written above work!""" 250 | 251 | def test_that_mathmode_and_textmode_are_treated_differently(self): 252 | math = typesetting.escape_unicode_maths('ö') 253 | self.assertNotEqual(math, 'ö') 254 | text = typesetting.escape_unicode_maths('\\text{ö}') 255 | self.assertFalse('ö' in text) 256 | # check whether characters got transcribed differently; it's enough to 257 | # check one character of the generated sequence, they should differ 258 | self.assertNotEqual(math[:2], text[6:8]) 259 | 260 | def test_that_flag_to_preserve_alphas_is_passed_through(self): 261 | res = typesetting.escape_unicode_maths( 262 | '\\text{ö}', replace_alphabeticals=False) 263 | self.assertEqual(res, '\\text{ö}') 264 | 265 | def 
test_that_all_characters_are_preserved_when_no_replacements_happen(self): 266 | text = 'This is a \\text{test} mate.' 267 | self.assertEqual(typesetting.escape_unicode_maths(text), text) 268 | self.assertEqual( 269 | typesetting.escape_unicode_maths( 270 | text, replace_alphabeticals=False), text 271 | ) 272 | text = 'But yeah but no' * 20 + ', oh my god!' 273 | self.assertEqual(typesetting.escape_unicode_maths(text), text) 274 | self.assertEqual( 275 | typesetting.escape_unicode_maths( 276 | text, replace_alphabeticals=False), text 277 | ) 278 | 279 | def test_that_everything_around_surrounded_character_is_preserved(self): 280 | text = 'This is a \\text{über} test. ;)' 281 | result = typesetting.escape_unicode_maths( 282 | text, replace_alphabeticals=True) 283 | ue_pos = text.index('ü') 284 | # text in front is unchanged 285 | self.assertEqual(result[:ue_pos], text[:ue_pos]) 286 | # find b character, which is the start of the remaining string 287 | b_pos = result[ue_pos:].find('b') + ue_pos 288 | # check that text after umlaut matches 289 | self.assertEqual(result[b_pos:], text[ue_pos + 1:]) 290 | 291 | text = 'But yeah but no' * 20 + ', oh my god!ø' 292 | o_strok_pos = text.index('ø') 293 | res = typesetting.escape_unicode_maths(text) 294 | self.assertEqual(res[:o_strok_pos], text[:o_strok_pos]) 295 | 296 | def test_that_unknown_unicode_characters_raise_exception(self): 297 | # you know that Santa Claus character? Seriously, if you don't know it, 298 | # you should have a look. 
LaTeX does indeed not have a command for this 299 | # (2016, one never knows) 300 | santa = chr(127877) 301 | with self.assertRaises(typesetting.DocumentSerializationException): 302 | typesetting.escape_unicode_maths(santa) 303 | 304 | def test_that_two_text_environments_preserve_all_characters(self): 305 | text = r'a\cdot b \text{equals} b\cdot c} \mbox{ is not equal } u^{v\cdot k}' 306 | self.assertEqual(typesetting.escape_unicode_maths(text), text) 307 | 308 | def test_color_names_in_backgroundare_accepted(self): 309 | doc = LaTeXDocument(r'A = \pi r^2') 310 | doc.set_background_color('cyan') 311 | doc = str(doc) 312 | self.assertTrue( 313 | 'pagecolor{cyan}' in doc, 'Expected \\pagecolor in document, got: %s' % doc 314 | ) 315 | self.assertTrue('\\color' not in doc) 316 | self.assertFalse('definecolor' in doc) 317 | 318 | def test_color_names_in_foregroundare_accepted(self): 319 | doc = LaTeXDocument(r'A = \pi r^2') 320 | doc.set_foreground_color('cyan') 321 | doc = str(doc) 322 | self.assertTrue( 323 | 'pagecolor' not in doc, 'Expected no \\pagecolor in document, got: %s' % doc 324 | ) 325 | self.assertTrue('\\color{cyan' in doc, 326 | 'expected \\color{cyan, got:\n' + doc) 327 | self.assertFalse('definecolor' in doc) 328 | 329 | def test_hex_colours_with_leading_0s_work(self): 330 | doc = LaTeXDocument(r'A = \pi r^2') 331 | doc.set_background_color('00FFCC') 332 | doc = str(doc) 333 | self.assertTrue( 334 | 'pagecolor{background}' in doc, 335 | 'Expected \\pagecolor in document, got: %s' % doc, 336 | ) 337 | self.assertTrue('definecolor' in doc) 338 | self.assertTrue('00FFCC' in doc) 339 | 340 | def test_color_rgb_in_foregroundare_accepted(self): 341 | doc = LaTeXDocument(r'A = \pi r^2') 342 | doc.set_foreground_color('FFAACC') 343 | doc = str(doc) 344 | self.assertTrue( 345 | 'pagecolor{}' not in doc, 'Expected no empty \\pagecolor in document, got: %s' % doc 346 | ) 347 | self.assertTrue( 348 | '\\color{foreground' in doc, 'document misses \\color command: %s' % doc 349 
| ) 350 | self.assertTrue('definecolor' in doc) 351 | self.assertTrue('FFAACC' in doc) 352 | 353 | def test_color_rgb_in_backgroundare_accepted(self): 354 | doc = LaTeXDocument(r'A = \pi r^2') 355 | doc.set_background_color('FFAACC') 356 | doc = str(doc) 357 | self.assertTrue( 358 | 'pagecolor{background}' in doc, 359 | 'Expected \\pagecolor in document, got: %s' % doc, 360 | ) 361 | self.assertTrue('\\color' not in doc) 362 | self.assertTrue('definecolor' in doc) 363 | self.assertTrue('FFAACC' in doc) 364 | 365 | def test_no_colors_no_color_definitions(self): 366 | doc = str(LaTeXDocument(r'A = \pi r^2')) 367 | self.assertFalse('pagecolor' in doc) 368 | self.assertFalse('\\color' in doc) 369 | self.assertFalse('definecolor' in doc) 370 | -------------------------------------------------------------------------------- /update_unicode_table.py: -------------------------------------------------------------------------------- 1 | # (c) 2013-2021 Sebastian Humenda 2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for 3 | # more details. 4 | """This script auto-generates gleetex/unicode.py. 5 | 6 | The purpose is to provide a table with mappings from unicode points to 7 | their LaTeX equivalent. This way, formulas can be converted using 8 | LaTeX2e, but the end-user can still use unicode in their formulas. The 9 | unicode version can also be used to make the alternative text more 10 | readable. 11 | """ 12 | 13 | import collections 14 | import enum 15 | import os 16 | import shlex 17 | import shutil 18 | import sys 19 | import urllib.request 20 | import xml.etree.ElementTree as ET 21 | 22 | ################################################################################ 23 | # Constants 24 | 25 | 26 | class LaTeXMode(enum.Enum): 27 | # exception: not a constant, but required for one of the constants 28 | """Represent either math or text mode. 29 | 30 | Math mode in LaTeX is e.g. everything between $ and $. 
31 | """ 32 | textmode = 0 33 | mathmode = 1 34 | 35 | 36 | # URL to XML file, which is used to generate the python source file 37 | UNICODE_TABLE_URL = ( 38 | 'https://raw.githubusercontent.com/w3c/xml-entities/gh-pages/unicode.xml' 39 | ) 40 | # a list of commands to replace, if found 41 | BAD_COMMANDS = { 42 | # decimal_codepoint: {LaTeXMode: new version} 43 | 178: {LaTeXMode.mathmode: '^2'}, 44 | 179: {LaTeXMode.mathmode: '^3'}, 45 | 181: {LaTeXMode.mathmode: '\\mu'}, 46 | 185: {LaTeXMode.mathmode: '^1'}, 47 | 8211: {LaTeXMode.mathmode: '\\mathrm{\\textendash}'}, 48 | 8722: {LaTeXMode.mathmode: '-'}, 49 | } 50 | 51 | ################################################################################ 52 | 53 | 54 | def get_unicode_table_xml(): 55 | with urllib.request.urlopen(UNICODE_TABLE_URL) as u: 56 | return ET.fromstring(u.read()) 57 | 58 | 59 | def create_unicode_latex_table(root): 60 | """This function iterates over the XML tree and extracts all characters for 61 | the unicode table. 62 | 63 | The resulting table will have the decimal unicode point as key. The 64 | value is again a dict with the possible keys from LaTeX and the 65 | LaTeX commands as string. Certain unicode points are ignored, to 66 | prevent replacing normal or control characters. 
67 | """ 68 | unicode_table = {} 69 | for character in root.find('charlist').iterfind('character'): 70 | childtags = set(node.tag for node in character) 71 | # skip characters without LaTeX alternative 72 | if ( 73 | 'latex' not in childtags 74 | and 'AMS' not in childtags 75 | and 'mathlatex' not in childtags 76 | ): 77 | continue # skip this character 78 | attr = character.attrib.get 79 | # if no mode (text or math) was specified, ignore character 80 | if attr('mode') not in ('text', 'math', 'mixed', 'other'): 81 | continue 82 | 83 | # a defined character may have multiple codepoints (called ids); add 84 | # each of the ids as a separate entry to the table 85 | ids = tuple(map(int, attr('dec').split('-'))) 86 | if any(elem for elem in ids if elem < 161): 87 | continue # ignore ASCII and a few control unicode characters 88 | 89 | # extract textmode, mathmode and AMS commands: 90 | commands = {} 91 | if 'latex' in childtags: 92 | commands[LaTeXMode.textmode] = next( 93 | character.iterfind('latex')).text 94 | if 'AMS' in childtags: 95 | commands[LaTeXMode.mathmode] = next(character.iterfind('AMS')).text 96 | # only take LaTeX command from <mathlatex>, if no AMS tag present and 97 | # no set was specified. A `set` is an attempt to specify the LaTeX 98 | # package which needs to be loaded. 
99 | if 'mathlatex' in childtags and LaTeXMode.mathmode not in commands: 100 | mathnode = next(character.iterfind('mathlatex')) 101 | if 'set' not in mathnode.attrib: 102 | commands[LaTeXMode.mathmode] = mathnode.text 103 | 104 | if commands: # if a usable textmode and a mathmode without unicode-math found: 105 | for identification in ids: 106 | # some code points are not usable for our purposes, so update 107 | # the control sequences, if appropriate 108 | if identification in BAD_COMMANDS: 109 | commands.update(BAD_COMMANDS[identification]) 110 | unicode_table[identification] = commands 111 | return unicode_table 112 | 113 | 114 | def serialize_table(table): 115 | """Serialize the given unicode table to a python table, which could be 116 | directly executed by eval. 117 | 118 | The decimal code points, serving as a key in the dictionary, are 119 | sorted for the output. 120 | """ 121 | ordered_table = collections.OrderedDict() 122 | for key in sorted(table.keys()): 123 | ordered_table[key] = table[key] 124 | python_string = ['unicode_table = {'] 125 | def reprmode(m, v): return 'LaTeXMode.%s: %s' % (m.name, repr(v[m])) 126 | for code_point, replacements in ordered_table.items(): 127 | # serialize by hand to have a fixed order of items; helpful for a 128 | # minimal git diff 129 | commands = '' 130 | if LaTeXMode.textmode in replacements: 131 | commands = reprmode(LaTeXMode.textmode, replacements) 132 | if LaTeXMode.mathmode in replacements: 133 | if commands: 134 | commands += ', ' 135 | commands += reprmode(LaTeXMode.mathmode, replacements) 136 | python_string.append('%s: {%s},' % (code_point, commands)) 137 | return '\n '.join(python_string) + '\n }\n' 138 | 139 | 140 | def generate_python_src_file(table, python_table): 141 | """Generate a fully importable python source file, by dumping the enum 142 | declarations, python imports, doc strings and the given python string with 143 | the unicode table into the source and returning it as a whole string.""" 144 | 
enum_def = 'class LaTeXMode(enum.Enum):\n """%s"""\n ' % LaTeXMode.__doc__ 145 | enum_values = tuple(e for e in dir(LaTeXMode) if not e.startswith('_')) 146 | enum_def += '\n '.join( 147 | '%s = %s' % (name, getattr(LaTeXMode, name).value) for name in enum_values 148 | ) 149 | return """\"\"\" 150 | DO NOT ALTER THIS FILE IN ANY WAY, IT IS GENERATED AUTOMATICALLY. SEE THE SCRIPT 151 | `update_unicode_table.py` FOR MORE INFORMATION. 152 | 153 | This file contains a table of unicode code point to LaTeX command mapping. It 154 | has %s entries and was derived from 155 | <%s>.\"\"\" 156 | #pylint: disable=too-many-lines,missing-docstring\n\n 157 | import enum\n 158 | %s\n\n%s\n""" % ( 159 | len(table), 160 | UNICODE_TABLE_URL, 161 | enum_def, 162 | python_table, 163 | ) 164 | 165 | 166 | def main(): 167 | if not os.path.exists('gleetex'): 168 | sys.exit('Error: Generator script must be run from GladTeX source root.') 169 | table = create_unicode_latex_table(get_unicode_table_xml()) 170 | python_table = serialize_table(table) 171 | path = os.path.join('gleetex', 'unicode.py') 172 | with open(path, 'w', encoding='utf-8') as f: 173 | f.write(generate_python_src_file(table, python_table)) 174 | exit_code = 0 175 | if shutil.which('black'): 176 | exit_code = os.system(f'black {shlex.quote(path)}') 177 | sys.exit(exit_code) 178 | 179 | 180 | if __name__ == '__main__': 181 | main() 182 | --------------------------------------------------------------------------------
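The table produced by `serialize_table` maps decimal code points to a dict keyed by `LaTeXMode`. The following is a minimal sketch of how a table of that shape might be consulted. The two sample entries, the helper name `replace_with_table` and its error handling are illustrative assumptions; none of them is taken from the gleetex sources.

```python
import enum


class LaTeXMode(enum.Enum):
    """Mirror of the enum declared in update_unicode_table.py."""
    textmode = 0
    mathmode = 1


# Hand-written stand-in for the generated table (assumption): the real table
# is derived from unicode.xml and is far larger.
SAMPLE_TABLE = {
    181: {LaTeXMode.textmode: '\\textmu', LaTeXMode.mathmode: '\\mu'},
    960: {LaTeXMode.mathmode: '\\pi'},
}


def replace_with_table(text, mode, table=SAMPLE_TABLE):
    """Hypothetical helper: swap non-ASCII characters for LaTeX commands.

    Code points below 161 are kept as they are, matching the threshold the
    generator uses when it skips ASCII and control characters. Anything else
    must have an entry for the requested mode, otherwise ValueError is raised.
    """
    out = []
    for char in text:
        code_point = ord(char)
        if code_point < 161:
            out.append(char)
            continue
        commands = table.get(code_point, {})
        if mode not in commands:
            raise ValueError('no LaTeX command for U+%04X in %s'
                             % (code_point, mode.name))
        out.append(commands[mode])
    return ''.join(out)


if __name__ == '__main__':
    print(replace_with_table('2\u03c0 r', LaTeXMode.mathmode))
```

Keying the inner dict by mode keeps text-mode and math-mode commands for the same code point side by side, so one lookup serves both contexts.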