├── .gitignore
├── COPYING
├── ChangeLog
├── README.md
├── ToDo.FAQ
├── ToDo.md
├── create_windows_distributions.py
├── examples
│   └── md2epub.py
├── gleetex
│   ├── __init__.py
│   ├── __main__.py
│   ├── cachedconverter.py
│   ├── caching.py
│   ├── htmlhandling.py
│   ├── image.py
│   ├── pandoc.py
│   ├── parser.py
│   ├── sink.py
│   ├── typesetting.py
│   └── unicode.py
├── manpage.md
├── pyproject.toml
├── runtests
├── setup.cfg
├── setup.py
├── tests
│   ├── test_cachedconverter.py
│   ├── test_caching.py
│   ├── test_htmlhandling.py
│   ├── test_imagecreation.py
│   └── test_typesetting.py
└── update_unicode_table.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *build*
2 | dist
3 | doc/gleetex.*html
4 | doc/index.html
5 | gladtex.1
6 | GladTeX.egg-info/
7 | *.pyc
8 | *pycache*
9 | *.swp
10 | *.zip
11 |
--------------------------------------------------------------------------------
/COPYING:
--------------------------------------------------------------------------------
1 | GNU LESSER GENERAL PUBLIC LICENSE
2 | Version 3, 29 June 2007
3 |
4 | Copyright (C) 2007 Free Software Foundation, Inc.
5 | Everyone is permitted to copy and distribute verbatim copies
6 | of this license document, but changing it is not allowed.
7 |
8 |
9 | This version of the GNU Lesser General Public License incorporates
10 | the terms and conditions of version 3 of the GNU General Public
11 | License, supplemented by the additional permissions listed below.
12 |
13 | 0. Additional Definitions.
14 |
15 | As used herein, "this License" refers to version 3 of the GNU Lesser
16 | General Public License, and the "GNU GPL" refers to version 3 of the GNU
17 | General Public License.
18 |
19 | "The Library" refers to a covered work governed by this License,
20 | other than an Application or a Combined Work as defined below.
21 |
22 | An "Application" is any work that makes use of an interface provided
23 | by the Library, but which is not otherwise based on the Library.
24 | Defining a subclass of a class defined by the Library is deemed a mode
25 | of using an interface provided by the Library.
26 |
27 | A "Combined Work" is a work produced by combining or linking an
28 | Application with the Library. The particular version of the Library
29 | with which the Combined Work was made is also called the "Linked
30 | Version".
31 |
32 | The "Minimal Corresponding Source" for a Combined Work means the
33 | Corresponding Source for the Combined Work, excluding any source code
34 | for portions of the Combined Work that, considered in isolation, are
35 | based on the Application, and not on the Linked Version.
36 |
37 | The "Corresponding Application Code" for a Combined Work means the
38 | object code and/or source code for the Application, including any data
39 | and utility programs needed for reproducing the Combined Work from the
40 | Application, but excluding the System Libraries of the Combined Work.
41 |
42 | 1. Exception to Section 3 of the GNU GPL.
43 |
44 | You may convey a covered work under sections 3 and 4 of this License
45 | without being bound by section 3 of the GNU GPL.
46 |
47 | 2. Conveying Modified Versions.
48 |
49 | If you modify a copy of the Library, and, in your modifications, a
50 | facility refers to a function or data to be supplied by an Application
51 | that uses the facility (other than as an argument passed when the
52 | facility is invoked), then you may convey a copy of the modified
53 | version:
54 |
55 | a) under this License, provided that you make a good faith effort to
56 | ensure that, in the event an Application does not supply the
57 | function or data, the facility still operates, and performs
58 | whatever part of its purpose remains meaningful, or
59 |
60 | b) under the GNU GPL, with none of the additional permissions of
61 | this License applicable to that copy.
62 |
63 | 3. Object Code Incorporating Material from Library Header Files.
64 |
65 | The object code form of an Application may incorporate material from
66 | a header file that is part of the Library. You may convey such object
67 | code under terms of your choice, provided that, if the incorporated
68 | material is not limited to numerical parameters, data structure
69 | layouts and accessors, or small macros, inline functions and templates
70 | (ten or fewer lines in length), you do both of the following:
71 |
72 | a) Give prominent notice with each copy of the object code that the
73 | Library is used in it and that the Library and its use are
74 | covered by this License.
75 |
76 | b) Accompany the object code with a copy of the GNU GPL and this license
77 | document.
78 |
79 | 4. Combined Works.
80 |
81 | You may convey a Combined Work under terms of your choice that,
82 | taken together, effectively do not restrict modification of the
83 | portions of the Library contained in the Combined Work and reverse
84 | engineering for debugging such modifications, if you also do each of
85 | the following:
86 |
87 | a) Give prominent notice with each copy of the Combined Work that
88 | the Library is used in it and that the Library and its use are
89 | covered by this License.
90 |
91 | b) Accompany the Combined Work with a copy of the GNU GPL and this license
92 | document.
93 |
94 | c) For a Combined Work that displays copyright notices during
95 | execution, include the copyright notice for the Library among
96 | these notices, as well as a reference directing the user to the
97 | copies of the GNU GPL and this license document.
98 |
99 | d) Do one of the following:
100 |
101 | 0) Convey the Minimal Corresponding Source under the terms of this
102 | License, and the Corresponding Application Code in a form
103 | suitable for, and under terms that permit, the user to
104 | recombine or relink the Application with a modified version of
105 | the Linked Version to produce a modified Combined Work, in the
106 | manner specified by section 6 of the GNU GPL for conveying
107 | Corresponding Source.
108 |
109 | 1) Use a suitable shared library mechanism for linking with the
110 | Library. A suitable mechanism is one that (a) uses at run time
111 | a copy of the Library already present on the user's computer
112 | system, and (b) will operate properly with a modified version
113 | of the Library that is interface-compatible with the Linked
114 | Version.
115 |
116 | e) Provide Installation Information, but only if you would otherwise
117 | be required to provide such information under section 6 of the
118 | GNU GPL, and only to the extent that such information is
119 | necessary to install and execute a modified version of the
120 | Combined Work produced by recombining or relinking the
121 | Application with a modified version of the Linked Version. (If
122 | you use option 4d0, the Installation Information must accompany
123 | the Minimal Corresponding Source and Corresponding Application
124 | Code. If you use option 4d1, you must provide the Installation
125 | Information in the manner specified by section 6 of the GNU GPL
126 | for conveying Corresponding Source.)
127 |
128 | 5. Combined Libraries.
129 |
130 | You may place library facilities that are a work based on the
131 | Library side by side in a single library together with other library
132 | facilities that are not Applications and are not covered by this
133 | License, and convey such a combined library under terms of your
134 | choice, if you do both of the following:
135 |
136 | a) Accompany the combined library with a copy of the same work based
137 | on the Library, uncombined with any other library facilities,
138 | conveyed under the terms of this License.
139 |
140 | b) Give prominent notice with the combined library that part of it
141 | is a work based on the Library, and explaining where to find the
142 | accompanying uncombined form of the same work.
143 |
144 | 6. Revised Versions of the GNU Lesser General Public License.
145 |
146 | The Free Software Foundation may publish revised and/or new versions
147 | of the GNU Lesser General Public License from time to time. Such new
148 | versions will be similar in spirit to the present version, but may
149 | differ in detail to address new problems or concerns.
150 |
151 | Each version is given a distinguishing version number. If the
152 | Library as you received it specifies that a certain numbered version
153 | of the GNU Lesser General Public License "or any later version"
154 | applies to it, you have the option of following the terms and
155 | conditions either of that published version or of any later version
156 | published by the Free Software Foundation. If the Library as you
157 | received it does not specify a version number of the GNU Lesser
158 | General Public License, you may choose any version of the GNU Lesser
159 | General Public License ever published by the Free Software Foundation.
160 |
161 | If the Library as you received it specifies that a proxy can decide
162 | whether future versions of the GNU Lesser General Public License shall
163 | apply, that proxy's public statement of acceptance of any version is
164 | permanent authorization for you to choose that version for the
165 | Library.
166 |
--------------------------------------------------------------------------------
/ChangeLog:
--------------------------------------------------------------------------------
1 | 4.0 (UNRELEASED)
2 |
3 | - Enable standard UNIX quoting for `GLADTEX_ARGS` that allows passing
4 | arguments with spaces when GladTeX is used as a Pandoc filter.
5 | - Add --epub flag to produce HTML output more suitable for EPUBs.
6 | - Avoid option clashes with the xcolor package if extending the xcolor
7 | options using `-p`. Recommend `\PassOptionsToPackage`.
8 | - Add `-interactive=nonstopmode` to each LaTeX invocation to stop even if
9 | a file was not found. This is relevant when using e.g. `-p "\input{foo}"`
10 | and foo.tex doesn't exist.
11 | - Rework excluded formula handling:
12 | - Remove parser for file that contains excluded formulas: the file is
13 | auto-generated by GladTeX and it is more reliable and easier to
14 | overwrite the generated file with an updated one, instead of parsing
15 | the old contents.
16 | - Restructure library with a cleaner formatter hierarchy.
17 |
18 | 3.1
19 |
20 | - [source only] move gladtex.py to gleetex/main.py
21 |
22 | 3.0.1
23 |
24 | - fix AttributeError when specifying `-E`
25 |
26 |
27 | 3.0.0
28 |
29 | - new features and incompatible changes:
30 | - add `-P` command-line switch to be used as a Pandoc document filter,
31 | see
32 | - add environment variable `GLADTEX_ARGS` to pass command-line
33 | switches when used as pandocfilter where passing additional
34 | arguments is impossible
35 | - redefine colour handling: use xcolor package, therefore handling
36 | text and background colour the same way for both PNG and SVG
37 | - add SVG support for scalable images
38 | - use SVG output by default
39 | - gleetex.htmlhandling.HtmlImageFormatter: rename link_path to
40 | link_prefix
41 | - bug fixes:
42 | - correctly parse HTML5 file encoding declarations
43 | - add more exceptions to the unicode table for the unicode replacement
44 | mode (see -R)
45 | - treat -d as a relative path
46 |
47 | 2.3.1 Avoid useless spaces
48 |
49 | - When formula replacement with `-R` is requested, it could happen that
50 | additional spaces were inserted, even if not necessary. "für" would for
51 | instance become "f\"{u} r". Fixed.
52 |
53 | 2.3 Fix formula sizing
54 |
55 | - It seems as if 16px / 12pt were the default font size these days for
56 | browsers. Therefore, the default resolution has been set to 115 DPI.
57 | Furthermore, the DPI switch now accepts pt values for fontsizes and will
58 | calculate the corresponding DPI itself.
59 | - When the environment variable `DEBUG=1` is set, the full backtrace will be
60 | printed.
61 | - Extend unicode table creation script to allow blacklisting of certain
62 | commands.
63 |
64 | 2.2.1 - fix handling of non-ascii alphabetical characters
65 |
66 | - replace characters with diacritics in the LaTeX source, but keep the
67 | unmodified character in image alt attribute (for better readability)
68 |
69 | 2.2 - make alternative text of formulas more readable
70 |
71 | - replace formatting commands in alt attribute; this shortens the
72 | formula and makes it more readable
73 | - replace unicode signs also in alt attribute (good for screen reader
74 | users and text-mode browsers)
75 | - recognize upper-case `ENV` attribute of `EQ` tag (so that e.g.
76 | displaymath is recognized correctly)
77 |
78 | 2.1.1 Bug Fix Release
79 |
80 | - treat eq element content as verbatim
81 | - decode HTML entities within formula tags
82 |
83 | 2.1 add support for unicode math with translation table
84 | - handle subprocess stdin and stdout encoding properly
85 | - set UTF-8 as encoding for all LaTeX documents
86 | - add -R option (replace non ascii characters)
87 | - formulas in .htex documents may now contain umlauts or unicode math
88 | characters; conversion will work without adjustments, only -R has to
89 | be specified
90 | - handle encoding better and more strictly for LaTeX 2E
91 |
92 | 2.0.1 Bug Fix Release
93 | - show user a meaningful error message if LaTeX or dvipng is missing
94 | - setup.py: build manual page, if pandoc present
95 | - freeze multiprocessing on Windows, to make executables distributable
96 |
97 | 2.0 - make GladTeX truly platform independent
98 | - add formula number in error output; makes tracking of formulas easier in
99 | error case
100 | - write man page
101 | - set css class correctly for display math formulas
102 | - HTML label/id generation:
103 | - do not create overlong id's
104 | - do only generate id's starting with an alphabetical character
105 | - squeeze multiple identical characters
106 | - reparse outsourced formulas correctly (was a mixture of formatted vs.
107 | unformatted formulas)
108 | - do not use absolute links when operating on file which is not in current
109 | working directory
110 | - be more careful with backslashes vs. slashes
111 | - allow formulas consisting only of numbers (i.e. example calculations) by
112 | prefixing "form_" in front of the HTML id (id must start with a letter
113 | but may be followed by digits)
114 | - allow removal of unreadable caches with the `-n` switch (extend library
115 | with this functionality)
116 | - introduce `-m` switch to print the output in a less concise, but more
117 | machine-parseable format
118 |
119 | 1.6 - complete rewrite
120 | - rewrite GladTeX fully in Python
121 | - allows easy compilation into a binary for a specific platform
122 | - comes with a new library to use GladTeX functionality within other
123 | applications
124 | - fully unit-tested
125 | - enable piping support; GladTeX can read from stdin and write to stdout
126 | - drop -t switch; image is either transparent by default or has a background
127 | color which can be set with -b
128 | - drop -s switch
129 | - introduce -o (output) option
130 | - introduce new cache format containing version numbers; json, so
131 | interface to other programming languages
132 |
133 | 1.5 - Introduce options to make embedding of GladTeX easier
134 | - Try to parse LaTeX's error output and display it to help users to find the
135 | issue quicker and to make GladTeX better embeddable.
136 | - Add option to remove error log file, produced by LaTeX, automatically.
137 | - Rewrite some help messages.
138 | - Add signal handling to get meaningful error messages if GladTeX hangs.
139 |
140 | 1.4.2 - bug fix release
141 | - Add some eval's to cope with some failures.
142 | - Since there were some incompatibilities between Perl 5.10 and 5.18 in how
143 | the cache of the generated images is stored, GladTeX now removes this file
144 | along with all images starting with "eqn" and generates them again.
145 |
146 | 1.4.1 - bug fix release
147 | - Remove desc.html if created and empty
148 |
149 | 1.4 - put LaTeX equations into alt tag for text-mode browsers and blind people
150 | (and disabled images)
151 | - If requested (-a), exclude equations longer than 80 characters in an extra
152 | file and make the equation image a link to the longer, excluded image
153 | alternative
154 | - eqn2img: patch to allow building on windows
155 | - Change build system from make to cmake
156 | - Refactored gladtex code a lot to allow the usage of "use strict/use
157 | warnings"
158 | - Fix bug where multiple equations couldn't be on a single (html) line
159 | - Rework manpage
160 |
161 | 1.3 - Un-escape common entities before processing equations
162 | - Update man page with CSS class options
163 | - Add support for setting the CSS class of images when the
164 | environment is "math" or "displaymath"
165 | - eqn2img: changed redirection syntax (from dvips to /dev/null)
166 | for portability
167 | - GladTeX: exit with status 1 when a closing EQ tag is missing
168 | - GladTeX: print error messages to stderr instead of stdout
169 | - Fix environment-passing to eqn2img
170 | - Add support for a "dpi" attribute on EQ tags to customize the
171 | DPI used for each equation
172 |
173 | 1.2 - Fixed a serious memory allocation error, pointed out by Eric J.
174 | Francois. Also fixed several leaks.
175 | - Added full alpha channel to PNG files (also suggested by Eric)
176 | - The -e option was ignored, fixed (pointed out by Andr\'e Schleife)
177 | - Added man page, contributed by Volker Schatz
178 |
179 | 1.1 - Portability fixes: Do not assume a specific location for perl
180 | (use "env" in the shebang line) and do not rely on the bash style
181 | "&>" redirection.
182 |
183 | 1.0 - Image alignment workaround (most browsers interpret
184 | "ALIGN=MIDDLE" somewhat strangely, so it has been changed
185 | to "STYLE=vertical-align: -xx")
186 | - Added cache file, so that gladtex doesn't have to regenerate
187 | images for equations that haven't changed.
188 | - Added ENV option (as in ) to support environments
189 | other than "displaymath".
190 | - Bug fixes.
191 |
192 | 0.3 - Added BoundingBox workaround (dvips sometimes outputs wrong
193 | BoundingBox, for instance when using \mathbb{})
194 | - Moved the whole "LaTeX eqn to image" conversion into the C code,
195 | turning the C program (renamed from pngmodify to eqn2img) into
196 | a standalone utility (e.g. echo '\sqrt{2}' | eqn2img -o
197 | eqn.png).
198 | - Added colour options (-c -b and -t).
199 | - Fixed bug causing segfault when adding space _above_ an image.
200 | - Fixed image reusing bug (in 0.2 image reuse didn't work across
201 | separate files when processing files outside startup cwd).
202 | - And some other minor bugs and cosmetic changes.
203 | - Makefile added to distribution
204 |
205 | 0.2 - First official release, completely rewritten code.
206 |
207 | 0.1 - Used only internally at the Dept. of Mathematics at the Univ. of Oslo
208 | July-August 1999 (for the project "Matteknekker'n") under the name
209 | htmleqn.
210 | # vim: set expandtab sts=4 ts=4 sw=4 expandtab:
211 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | GladTeX
2 | =======
3 |
4 | GladTeX is a utility and library to display formulas on the Web and in
5 | HTML-based formats such as EPUB. Formulas are embedded within `…` tags
6 | and
7 | converted automatically to a scalable SVG image using LaTeX. The images
8 | integrate seamlessly into the output documents, work with any browser and are
9 | accessible for visually impaired and blind users as well.
10 |
11 | Features
12 | --------
13 |
14 | - LaTeX-quality formulas with partial unicode maths support
15 | - [Pandoc](http://pandoc.org) support to convert from any format with
16 | LaTeX-formulas (MarkDown, …) to any HTML-based format, e.g. EPUB
17 | - Cache formulas to speed up subsequent document conversion
18 | - Python library GleeTeX to embed into other applications or to tailor to a
19 | specific workflow
20 | - cross-platform, written in Python, comes with Windows executables.
21 |
22 | License
23 | -------
24 |
25 | - (C) 1999-2010 Martin G. Gulbrandsen (Perl version)
26 | - (C) 2011-2013 Jonathan Daugherty (especially release 1.3) (Perl version)
27 | - (C) 2013-2020 Sebastian Humenda (Python version)
28 |
29 | This program is distributed under the LGPL-3, or at your option, any later
30 | version of the license; for details see the accompanying file COPYING.
31 |
32 | The official project homepage is at
33 |
34 | Installation
35 | ============
36 |
37 | ### Debian/Ubuntu
38 |
39 | On Debian and its derivatives (such as Ubuntu or Mint), installing GladTeX is as
40 | easy as
41 |
42 |     # apt-get install gladtex
43 |
44 | ### Windows
45 |
46 | If you want to use the program without the Python library, you should download a
47 | pre-compiled binary from .
48 |
49 | Just unzip the archive and move the files to a directory within `%PATH%`.
50 |
51 | ### From Source
52 |
53 | The following is required for installing GladTeX:
54 |
55 | - Python >= 3.4
56 | - LaTeX (2e), dvisvgm (optionally dvipng)
57 | - the LaTeX package preview.sty
58 |
59 |
60 | #### Debian / Ubuntu
61 |
62 | On Debian/Ubuntu systems the following commands will satisfy the dependencies:
63 |
64 |     # apt-get install python3-all texlive-fonts-recommended texlive-latex-recommended preview-latex-style dvipng
65 |
66 | The package can then be installed using
67 |
68 |     # python3 setup.py install
69 |
70 | Note: If your system ships `python` as the command for Python 3, you have to use
71 | `python` in the above command instead.
72 |
73 | #### OS X
74 |
75 | You need to install a LaTeX distribution on your Mac. GladTeX was successfully
76 | run with [MacTex](http://www.tug.org/mactex/).
77 |
78 | You can download a zip source archive from
79 | [GitHub](https://github.com/humenda/GladTeX) or use git:
80 |
81 |     $ git clone https://github.com/humenda/GladTeX.git
82 |
83 | Use `cd` to change to the GladTeX source directory and issue
84 |
85 |     $ python setup.py install
86 |
87 |
88 |
89 | Documentation
90 | -------------
91 |
92 | Please use `man gladtex` for further instructions or have a look at the file
93 | [manpage.md](manpage.md).
94 |
95 | Contribute
96 | ----------
97 |
98 | Contributions are welcome. Please use
99 | [PyFormat](https://pypi.org/project/pyformat/) to
100 | format the code.
101 |
--------------------------------------------------------------------------------
/ToDo.FAQ:
--------------------------------------------------------------------------------
1 | A few items which could go into a FAQ:
2 |
3 | What do the error messages mean?
4 | Why is my formula positioned awkwardly within the text? (most probably displaymath instead of inline math)
5 | Why are limits over sums and similar not correctly set? (use displaymath, sometimes `\\limits`)
6 | Why do my special characters like (unicode) math symbols or umlauts not work?
7 |
--------------------------------------------------------------------------------
/ToDo.md:
--------------------------------------------------------------------------------
1 | To Do
2 | =====
3 |
4 | This list contains things to be implemented in GladTeX. If you have additions or
5 | even feel like you want to do it, feel free to drop me an email: `shumenda |aT|
6 | gmx //dot-- de`.
7 |
8 | Uncategorized
9 | -------------
10 |
11 | - introduce a command-line option which checks whether all formulas in a
12 | cache are used and, if not, removes the unused formulas (only useful for
13 | caches corresponding to a single document)
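
A minimal sketch of such a clean-up, assuming the cache can be viewed as a dictionary keyed by formula. The name `prune_unused` and the entry layout are hypothetical and not part of the current GleeTeX API:

```python
def prune_unused(cache, used_formulas):
    """Split a cache into the entries still in use and the dropped formulas.

    `cache` maps a formula string to its cache entry (hypothetical layout);
    `used_formulas` is an iterable of the formulas found in the document.
    """
    used = set(used_formulas)
    # keep only entries whose formula still occurs in the document
    pruned = {formula: entry for formula, entry in cache.items() if formula in used}
    # report what was dropped so the caller can delete the image files
    removed = sorted(set(cache) - used)
    return pruned, removed


cache = {
    r'\tau': {'path': 'eqn000.svg'},
    r'x^2': {'path': 'eqn001.svg'},
}
pruned, removed = prune_unused(cache, [r'\tau'])
```

Returning the removed formulas separately leaves it to the caller to decide whether the corresponding images should be deleted as well.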
14 |
15 | Gettext
16 | -------
17 |
18 |
19 | Gettext should be integrated to localize messages (especially errors).
20 |
21 | Compressed Cache
22 | ----------------
23 |
24 | The cache stores the path, the formula and the positioning of an image. For
25 | large documents, it can grow quite big, so it makes sense to compress it.
26 |
27 | To make things easier, the cache should have a .gz extension.
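
One possible shape for this, sketched under the assumption that the cache remains a JSON document (as introduced in 1.6). The helper names `open_cache`, `write_cache` and `read_cache` are hypothetical and do not exist in GleeTeX:

```python
import gzip
import json


def open_cache(path, mode):
    """Open a cache file in text mode; use gzip if the path ends in .gz."""
    if path.endswith('.gz'):
        # 't' selects text mode, so json can read and write str directly
        return gzip.open(path, mode + 't', encoding='utf-8')
    return open(path, mode, encoding='utf-8')


def write_cache(path, entries):
    """Serialise the cache entries as JSON, compressed if requested."""
    with open_cache(path, 'w') as f:
        json.dump(entries, f)


def read_cache(path):
    """Load cache entries from a plain or gzip-compressed JSON file."""
    with open_cache(path, 'r') as f:
        return json.load(f)
```

Choosing the format by file extension keeps existing uncompressed caches readable without a separate switch.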
28 |
29 |
--------------------------------------------------------------------------------
/create_windows_distributions.py:
--------------------------------------------------------------------------------
1 | """This file builds windows distributions, zip files with GladTeX and all other
2 | files."""
3 | import os
4 | import shutil
5 | import stat
6 | import sys
7 | import zipfile
8 | import gleetex
9 |
10 |
11 | def exec_setup_py(arg_string):
12 |     """Execute `python setup.py` as a subprocess.
13 |
14 |     Use Wine, if necessary.
15 |     """
16 |     ret = None
17 |     if sys.platform.startswith('win'):
18 |         ret = os.system('python setup.py ' + arg_string)
19 |     else:
20 |         if not shutil.which('wine'):
21 |             print('Error: Wine is not installed, aborting…')
22 |             sys.exit(5)
23 |         ret = os.system('wine python setup.py ' + arg_string)
24 |     if ret:
25 |         if sys.platform.startswith('win'):
26 |             print('Aborting at command `python setup.py %s`.' % arg_string)
27 |         else:
28 |             print('Aborting at command `wine python setup.py %s`.' % arg_string)
29 |         sys.exit(7)
30 |
31 |
32 | def get_python_version():
33 |     """Return the python version as a string."""
34 |     import re
35 |     import subprocess
36 |
37 |     args = ['python', '--version']
38 |     if not sys.platform.startswith('win'):
39 |         args = ['wine'] + args
40 |     proc = subprocess.Popen(args, stdout=subprocess.PIPE)
41 |     stdout = proc.communicate()[0].decode(sys.getdefaultencoding())
42 |     if proc.wait():
43 |         raise TypeError(
44 |             'Abnormal subprocess termination while querying python version.'
45 |         )
46 |     return re.search(r'.*?(\d+\.\d+\.\d+)', stdout).groups()[0]
47 |
48 |
49 | def get_executable_name(label):
50 |     """Construct the name of an executable."""
51 |     return 'gladtex-win64-%s-py_%s-%s.zip' % (
52 |         gleetex.VERSION,
53 |         get_python_version(),
54 |         label,
55 |     )
56 |
57 |
58 | def bundle_files(src, output_name):
59 |     """Bundle the compiled binary files with README, ChangeLog and COPYING."""
60 |     if os.path.exists(output_name):
61 |         shutil.rmtree(output_name)
62 |     os.rename(src, output_name)
63 |     # add README.first
64 |     with open(os.path.join(output_name, 'README.first.txt'), 'w') as f:
65 |         f.write('GladTeX for Windows\r\n===================\r\n\r\n')
66 |         f.write(
67 |             'This program has been compiled with python 3.4.4. If you want to embed it in binary form with your binary python application, the version numbers HAVE TO match.\r\n'
68 |         )
69 |         f.write(
70 |             '\r\nFor more information, see the file README.md or http://humenda.github.io/GladTeX\r\n'
71 |         )
72 |
73 |     # copy README and other files
74 |     for file in ['README.md', 'COPYING', 'ChangeLog']:
75 |         dest = os.path.join(output_name, file)
76 |         # append .txt if the file name has no extension
77 |         if '.' not in dest[-5:]:
78 |             dest += '.txt'
79 |         shutil.copy(file, dest)
80 |
81 |     files = [
82 |         os.path.join(root, file)
83 |         for root, _, files in os.walk(output_name)
84 |         for file in files
85 |     ]
86 |     with zipfile.ZipFile(output_name + '.zip', 'w', zipfile.ZIP_DEFLATED) as z:
87 |         for file in files:
88 |             z.write(file)
89 |     shutil.rmtree(output_name)
90 |
91 |
92 | class TemporaryBuildDirectory:
93 |     """Context handler to guard the build process.
94 |
95 |     Upon entering the context, the source is copied to a temporary
96 |     directory and the program changes to this directory. After all build
97 |     actions have been done, the output file is copied back to the
98 |     original directory, the program resets the current working directory
99 |     and deletes the temporary directory.
100 |     """
101 |
102 |     def __init__(self, output_file_name):
103 |         self.orig_cwd = os.getcwd()
104 |         self.tmpdir = None
105 |         self.output_file_name = output_file_name
106 |
107 |     def __enter__(self):
108 |         self.tmpdir = self.get_temp_directory()
109 |         shutil.copytree(os.getcwd(), self.tmpdir)
110 |         os.chdir(self.tmpdir)
111 |         return self
112 |
113 |     def __exit__(self, _a, _b, _c):
114 |         os.chdir(self.orig_cwd)
115 |         shutil.copy(
116 |             os.path.join(
117 |                 self.tmpdir, self.output_file_name), self.output_file_name
118 |         )
119 |         shutil.rmtree(self.tmpdir, onerror=self.__onerror)
120 |
121 |     def get_temp_directory(self):
122 |         """Find a temporary directory to work in.
123 |
124 |         The checks are done to find a directory which does not reside
125 |         within the user's path, because py2exe includes absolute paths
126 |         for python scripts (in their tracebacks). It is not desirable to
127 |         show the whole world the directory layout of the computer where
128 |         the source code was built on.
129 |         """
130 |         tmp_base = None
131 |         if os.path.exists('/tmp'):
132 |             tmp_base = '/tmp'
133 |         elif os.path.exists('\\temp'):
134 |             tmp_base = '\\temp'
135 |         elif os.path.exists('\\windows\\temp'):
136 |             tmp_base = '\\windows\\temp'
137 |         else:
138 |             import tempfile
139 |
140 |             tmp_base = tempfile.gettempdir()
141 |         tmpdir = os.path.join(tmp_base, 'gladtex.build')
142 |         if os.path.exists(tmpdir):
143 |             shutil.rmtree(tmpdir, onerror=self.__onerror)
144 |         return tmpdir
145 |
146 |     def __onerror(self, func, path, exc_info):
147 |         """Error handler for ``shutil.rmtree``.
148 |
149 |         If the error is due to an access error (read-only file), it attempts to
150 |         add write permission and then retries. For any other error, it re-raises the error.
151 |         Usage: ``shutil.rmtree(path, onerror=onerror)``.
152 |         """
153 |         if not os.access(path, os.W_OK):
154 |             # Is the error an access error?
155 |             os.chmod(path, stat.S_IWUSR)
156 |             func(path)
157 |         else:
158 |             raise exc_info[1]  # exc_info is a (type, value, traceback) tuple
159 |
160 |
161 | if __name__ == '__main__':
162 |     with TemporaryBuildDirectory(get_executable_name('embeddable')) as tb:
163 |         # build embeddable release, where all files are separate DLLs; if somebody
164 |         # distributes a python app, these DLL files can be shared
165 |         exec_setup_py('py2exe -c -O 2 -i gleetex --bundle-files 3')
166 |         bundle_files('dist', os.path.splitext(tb.output_file_name)[0])
167 |
168 |     # create a stand-alone version of GladTeX
169 |     with TemporaryBuildDirectory(get_executable_name('standalone')) as tb:
170 |         exec_setup_py('py2exe -i gleetex -c -O 2 --bundle-files 1')
171 |         bundle_files('dist', os.path.splitext(tb.output_file_name)[0])
172 |
--------------------------------------------------------------------------------
/examples/md2epub.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | """This demo script converts from markdown to Epub using GladTeX. It requires
3 | Pandoc for the conversion.
4 |
5 | Throughout this script, the abbreviation AST for Abstract Syntax Tree is
6 | used.
7 | """
8 |
9 | import json
10 | import os
11 | import shutil
12 | import subprocess
13 | import sys
14 |
15 | import gleetex
16 |
17 |
18 | def transform_ast(ast):
19 |     # extract formulas from Pandoc document AST
20 |     formulas = gleetex.pandoc.extract_formulas(ast)
21 |     # converter using a cache, avoids converting the same formula twice
22 |     conv = gleetex.cachedconverter.CachedConverter('.', True, encoding='UTF-8')
23 |     # automatically handle unicode
24 |     conv.set_replace_nonascii(True)
25 |     # go parallel
26 |     conv.convert_all('.', formulas)
27 |
28 |     # a converted image has information like image depth and height, adjust
29 |     # data structure for write-back
30 |     formulas = [conv.get_data_for(eqn, style) for _p, style, eqn in formulas]
31 |     # get a formatter instance
32 |     with gleetex.htmlhandling.HtmlImageFormatter('.') as img_fmt:
33 |         # non-ascii sequences will be replaced in the alternative text
34 |         img_fmt.set_replace_nonascii(True)
35 |         # this alters the AST reference, so no return value required
36 |         gleetex.pandoc.replace_formulas_in_ast(
37 |             img_fmt, ast['blocks'], formulas)
38 |
39 |
40 | def cleanup(path):
41 |     # remove images and cache, relevant data is included within the EPUB
42 |     for file in os.listdir(path):
43 |         if file.endswith('.png') or file.endswith('.cache'):
44 |             os.remove(os.path.join(path, file))
45 |
46 |
47 | def main():
48 |     for prog in ('pandoc', 'gladtex'):
49 |         if not shutil.which(prog):
50 |             sys.stderr.write(
51 |                 ('This script requires %s, please install and rerun ' 'this script.')
52 |                 % prog
53 |             )
54 |             sys.exit(1)
55 |
56 |     usage = False
57 |     if len(sys.argv) < 2:
58 |         print('Missing command arguments.')
59 |         usage = True
60 |     elif len(sys.argv) > 2 or (len(sys.argv) == 2 and not os.path.exists(sys.argv[1])):
61 |         print('Exactly one input path required')
62 |         usage = True
63 |     if usage:
64 |         print(
65 |             '%s \n\nConvert given file to epub using GladTeX.' % sys.argv[0]
66 |         )
67 |         sys.exit(0)
68 |
69 |     inputfile = sys.argv[1]
70 |     outputfile = '%s.epub' % os.path.splitext(inputfile)[0]
71 |     # get the document AST
72 |     proc = subprocess.Popen(
73 |         ['pandoc', '-t', 'json', inputfile], stdout=subprocess.PIPE)
74 |     ast = json.loads(proc.communicate()[0].decode(sys.getdefaultencoding()))
75 |     if proc.wait() != 0:
76 |         sys.exit(2)
77 |
78 |     # the actual GleeTeX calls are here
79 |     transform_ast(ast)
80 |
81 |     # write back to stdin of pandoc
82 |     proc = subprocess.Popen(
83 |         ['pandoc', '-o', outputfile, '-f', 'json', '-t', 'epub'], stdin=subprocess.PIPE
84 |     )
85 |     proc.communicate(json.dumps(ast).encode(sys.getdefaultencoding()))
86 |     if proc.wait():
87 |         sys.exit(2)
88 |     cleanup('.')
89 |
90 |
91 | if __name__ == '__main__':
92 | main()
93 |
--------------------------------------------------------------------------------
/gleetex/__init__.py:
--------------------------------------------------------------------------------
1 | # (c) 2013-2021 Sebastian Humenda
2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for
3 | # more details.
4 | from . import caching
5 | from . import cachedconverter
6 | from . import htmlhandling
7 | from . import image
8 | from . import pandoc
9 | from . import parser
10 | from . import sink
11 | from . import typesetting
12 |
13 | VERSION = '3.1.0'
14 |
15 | __all__ = [
16 | 'caching',
17 | 'cachedconverter',
18 | 'htmlhandling',
19 | 'image',
20 | 'pandoc',
21 | 'parser',
22 | 'sink',
23 | 'typesetting',
24 | 'unicode',
25 | 'VERSION',
26 | ]
27 |
--------------------------------------------------------------------------------
/gleetex/__main__.py:
--------------------------------------------------------------------------------
1 | # (c) 2013-2021 Sebastian Humenda
2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for
3 | # more details.
4 | import argparse
5 | import multiprocessing
6 | import os
7 | import shlex
8 | import posixpath
9 | import sys
10 | import textwrap
11 |
12 | from . import (
13 | caching,
14 | cachedconverter,
15 | htmlhandling,
16 | pandoc,
17 | parser,
18 | sink,
19 | typesetting,
20 | VERSION,
21 | )
22 |
23 |
24 | class HelpfulCmdParser(argparse.ArgumentParser):
25 | """This variant of arg parser always prints the full help whenever an error
26 | occurs."""
27 |
28 | def error(self, message):
29 | sys.stderr.write('error: %s\n' % message)
30 | self.print_help()
31 | sys.exit(2)
32 |
33 |
34 | class Main:
35 | """This class parses command line arguments and deals with the conversion.
36 |
37 | Only the run method needs to be called.
38 | """
39 |
40 | def __init__(self):
41 | self.__encoding = 'utf-8'
42 |
43 | def _parse_args(self, args):
44 | """Parse command line arguments and return option instance."""
45 | epilog = 'GladTeX %s, http://humenda.github.io/GladTeX' % VERSION
46 | description = (
47 | 'GladTeX is a preprocessor that enables the use of LaTeX'
48 | ' maths within HTML files. The maths, embedded in <EQ>...</EQ> '
49 | 'tags, as if within \\(..\\) in LaTeX (or $...$ in TeX), is fed '
50 | 'through latex and replaced by images.\n\nPlease also see the '
51 | 'documentation on the web or from the manual page for more '
52 | 'information, especially on environment variables.'
53 | )
54 | cmd = HelpfulCmdParser(epilog=epilog, description=description)
55 | cmd.add_argument(
56 | '-a',
57 | default=sink.EXCLUSION_FILE_NAME,
58 | dest='exclusionfile',
59 | help='Path to the file for excluded formulas: formulas which are '
60 | + 'too long for the alt attribute are written into this '
61 | + 'single separate file and the images link to it',
62 | )
63 | cmd.add_argument(
64 | '-b',
65 | dest='background_color',
66 | help=(
67 | 'Set background color for resulting images '
68 | '(default transparent, use hex)'
69 | ),
70 | )
71 | cmd.add_argument(
72 | '-c',
73 | dest='foreground_color',
74 | help=('Set foreground color for resulting images (default ' '000000, hex)'),
75 | )
76 | cmd.add_argument(
77 | '-d',
78 | default='',
79 | dest='img_directory',
80 | help='Directory in which to'
81 | + ' store generated images (relative to the output file)',
82 | )
83 | cmd.add_argument(
84 | '-e',
85 | dest='latex_maths_env',
86 | help='Set custom maths environment to surround the formula'
87 | + ' (e.g. flalign)',
88 | )
89 | cmd.add_argument(
90 | '-f',
91 | metavar='SIZE',
92 | dest='fontsize',
93 | default=12,
94 | help='Set font size in pt (default 12)',
95 | )
96 | cmd.add_argument(
97 | '-E',
98 | dest='encoding',
99 | default=None,
100 | help='Overwrite encoding to use (default UTF-8)',
101 | )
102 | cmd.add_argument(
103 | '--epub',
104 | dest='is_epub',
105 | default=False,
106 | action='store_true',
107 | help='Optimise output for epub, for instance round height/width of '
108 | 'images',
109 | )
110 | cmd.add_argument(
111 | '-i',
112 | metavar='CLASS',
113 | dest='inlinemath',
114 | help="CSS class to assign to inline math (default: 'inlinemath')",
115 | )
116 | cmd.add_argument(
117 | '-l',
118 | metavar='CLASS',
119 | dest='displaymath',
120 | help="CSS class to assign to block-level math (default: 'displaymath')",
121 | )
122 | cmd.add_argument(
123 | '-K',
124 | dest='keep_latex_source',
125 | action='store_true',
126 | default=False,
127 | help='keep LaTeX file(s) when converting formulas (useful for debugging)',
128 | )
129 | cmd.add_argument(
130 | '-m',
131 | dest='machinereadable',
132 | action='store_true',
133 | default=False,
134 | help='Print output in machine-readable format (less concise, better parseable)',
135 | )
136 | cmd.add_argument(
137 | '-n',
138 | action='store_true',
139 | dest='notkeepoldcache',
140 | help=(
141 | 'Purge unreadable caches along with all eqn*.png files. '
142 | 'Caches can be unreadable if the used GladTeX version is '
143 | 'incompatible. If this option is unset, GladTeX will '
144 | 'simply fail when the cache is unreadable.'
145 | ),
146 | )
147 | cmd.add_argument(
148 | '-o',
149 | metavar='FILENAME',
150 | dest='output',
151 | help=(
152 | "Set output file name; '-' will print text to stdout (by "
153 | 'default input file name is used and .htex extension changed '
154 | 'to .html)'
155 | ),
156 | )
157 | cmd.add_argument(
158 | '-p',
159 | metavar='LATEX_STATEMENT',
160 | dest='preamble',
161 | help='Add given LaTeX code to the preamble of the LaTeX '
162 | + 'document that is used to generate the embedded images. '
163 | + 'In order to add the contents of a file to the preamble, '
164 | + 'use `-p "\\input{FILE}"`.',
165 | )
166 | cmd.add_argument(
167 | '-P',
168 | dest='pandocfilter',
169 | action='store_true',
170 | help='Use GladTeX as a Pandoc filter: read a Pandoc JSON AST '
171 | 'from stdin, convert the images, change math blocks to '
172 | 'images and write JSON to stdout; '
173 | 'see the man page on how to pass args to GladTeX in this mode',
174 | )
175 | cmd.add_argument(
176 | '--png',
177 | action='store_true',
178 | dest='png',
179 | help='Use PNG instead of SVG for images',
180 | )
181 | cmd.add_argument(
182 | '-r',
183 | '--resolution',
184 | metavar='DPI',
185 | dest='dpi',
186 | default=None,
187 | help=(
188 | 'Set resolution in DPI, only available if PNG output '
189 | 'selected; also see `-f`'
190 | ),
191 | )
192 | cmd.add_argument(
193 | '-R',
194 | action='store_true',
195 | dest='replace_nonascii',
196 | default=False,
197 | help='Replace non-ASCII characters in formulas '
198 | 'with their LaTeX commands',
199 | )
200 | cmd.add_argument(
201 | '-u',
202 | metavar='URL',
203 | dest='url',
204 | help='URL to image files (relative links are default)',
205 | )
206 | cmd.add_argument(
207 | 'input',
208 | help='Input .htex file with LaTeX '
209 | + 'formulas (if omitted or -, stdin will be read)',
210 | )
211 | return cmd.parse_args(args)
212 |
213 | def exit(self, text, status):
214 | """Exit function.
215 |
216 | Could be used to register any clean up action.
217 | """
218 | sys.stderr.write(text)
219 | if not text.endswith('\n'):
220 | sys.stderr.write('\n')
221 | sys.exit(status)
222 |
223 | def validate_options(self, opts):
224 | """Validate certain arguments supplied on the command line.
225 |
226 | The user will get a (hopefully) helpful error message if they
227 | gave an invalid parameter.
228 | """
229 | if opts.fontsize and opts.dpi:
230 | print("Options -f and -r can't be used at the same time.")
231 | sys.exit(14)
232 | if opts.dpi and not opts.png:
233 | print(('Impossible to set resolution when using SVG as output, ' 'try -f'))
234 | sys.exit(14)
235 |
236 | def get_input_output(self, options):
237 | """Determine whether GladTeX is reading from stdin/file, writing to
238 | stdout/file and determine base_directory if files are in another
239 | directory.
240 |
241 | If no output file name is given and there is an input file to
242 | read from, output is written to a file ending in .html instead
243 | of .htex. The returned document is either string or byte, the
244 | latter if encoding is unknown.
245 | """
246 | data = None
247 | output = '-'
248 | if options.input == '-':
249 | data = sys.stdin.read()
250 | else:
251 | try:
252 | # if an encoding was specified or GladTeX runs as a Pandoc filter,
253 | # read the document as text (the filter always uses UTF-8)
254 | if options.encoding or options.pandocfilter:
255 | encoding = 'UTF-8' if options.pandocfilter else options.encoding
256 | with open(options.input, encoding=encoding) as f:
257 | data = f.read()
258 | else: # read as binary and guess from HTML meta charset
259 | with open(options.input, 'rb') as file:
260 | data = file.read()
261 | except UnicodeDecodeError as e:
262 | self.exit(
263 | (
264 | f'Error while reading from {options.input}: {e}\nProbably this '
265 | 'file has a different encoding, try specifying -E.'
266 | ),
267 | 88,
268 | )
269 | except IsADirectoryError:
270 | self.exit(f'Error: cannot open {options.input} for reading: is a directory.', 19)
271 | except FileNotFoundError:
272 | self.exit(f'Error: file {options.input} not found.', 20)
273 |
274 | # check which output file name to use
275 | base_path = ''
276 | if options.output:
277 | base_path = os.path.dirname(options.output)
278 | elif options.input != '-':
279 | output = os.path.splitext(options.input)[0] + '.html'
280 | base_path = os.path.dirname(options.input)
281 |
282 | if base_path: # a base path was found; replace \\ (Windows) with /
283 | base_path = posixpath.join(*(base_path.split('\\')))
284 | # the basepath needs to be relative to the output file
285 | return (data, base_path, output)
286 |
287 | def run(self, args):
288 | options = self._parse_args(args[1:])
289 | self.validate_options(options)
290 | self.__encoding = options.encoding
291 | fmt = 'pandocfilter' if options.pandocfilter else 'html'
292 | doc, base_path, output = self.get_input_output(options)
293 | try:
294 | # doc is either a list of raw HTML chunks and formulas or a tuple of
295 | # (document AST, list of formulas) if options.pandocfilter
296 | self.__encoding, doc = parser.parse_document(doc, fmt)
297 | except parser.ParseException as e:
298 | input_fn = 'stdin' if options.input == '-' else options.input
299 | self.exit(f'Error while parsing {input_fn}: {e}', 5)
300 |
301 | processed = self.convert_images(
302 | doc, base_path, options.img_directory, options)
303 | img_fmt = htmlhandling.HtmlImageFormatter(
304 | base_path=os.path.join(base_path, options.img_directory),
305 | link_prefix=options.url,
306 | exclusion_file_path=options.exclusionfile,
307 | is_epub=options.is_epub,
308 | )
309 | if options.replace_nonascii:
310 | img_fmt.set_replace_nonascii(True)
311 | if options.url:
312 | img_fmt.set_url(options.url)
313 | if options.inlinemath:
314 | img_fmt.set_inline_math_css_class(options.inlinemath)
315 | if options.displaymath:
316 | img_fmt.set_display_math_css_class(options.displaymath)
317 |
318 | # pass formatter to document sinks; the formatter will accumulate
319 | # formulas that were too long to write them out later
320 | with (
321 | sys.stdout if output == '-' else open(
322 | output, 'w', encoding=self.__encoding)
323 | ) as file:
324 | if options.pandocfilter:
325 | pandoc.write_pandoc_ast(file, processed, img_fmt)
326 | else:
327 | htmlhandling.write_html(file, processed, img_fmt)
328 | # ToDo: make sink type an argument
329 | sink_type = sink.SinkType.html_file
330 | try:
331 | sink.EXCLUSION_FORMULA_SINKS[sink_type](
332 | img_fmt.get_exclusion_file_path(), img_fmt.get_excluded())
333 | except KeyError:
334 | raise NotImplementedError() from None
335 |
336 | def convert_images(self, parsed_document, base_path, img_dir, options):
337 | """Convert all formulas to images and store file path and equation in a
338 | list to be processed later on."""
339 | base_path = '' if not base_path or base_path == '.' else base_path
340 | img_dir = '' if not img_dir or img_dir == '.' else img_dir
341 | result = []
342 | try:
343 | conv = cachedconverter.CachedConverter(
344 | base_path,
345 | not options.notkeepoldcache,
346 | encoding=self.__encoding,
347 | img_dir=img_dir,
348 | )
349 | except caching.JsonParserException as e:
350 | self.exit(e.args[0], 78)
351 |
352 | self.set_options(conv, options)
353 | if options.pandocfilter:
354 | formulas = parsed_document[1]
355 | else: # HTML chunks from EqnParser
356 | formulas = [
357 | c for c in parsed_document if isinstance(c, (tuple, list))]
358 | try:
359 | conv.convert_all(formulas)
360 | except cachedconverter.ConversionException as e:
361 | self.emit_latex_error(
362 | e, options.machinereadable, options.replace_nonascii)
363 |
364 | if options.pandocfilter:
365 | # return (ast, formulas), just with formulas being replaced with the
366 | # conversion data
367 | return (
368 | parsed_document[0],
369 | [conv.get_data_for(eqn, style) for _p, style, eqn in formulas],
370 | )
371 | for chunk in parsed_document:
372 | # output of EqnParser: list-alike is formula, str is raw HTML
373 | if isinstance(chunk, (tuple, list)):
374 | _p, displaymath, formula = chunk
375 | try:
376 | result.append(conv.get_data_for(formula, displaymath))
377 | except KeyError as e:
378 | # formula is usually tuple(str, bool)
379 | formula = e.args[0]
380 | if isinstance(formula, (list, tuple)):
381 | formula = e.args[0][0] # ignore bool(displaymath)
382 | raise KeyError(
383 | (
384 | "formula '{}' not found; that means it was "
385 | 'not converted which should usually not happen.'
386 | ).format(formula)
387 | ) from e
388 | else:
389 | result.append(chunk)
390 | return result
391 |
392 | def set_options(self, conv, options):
393 | """Apply options from command line parser to the converter."""
394 | # set options
395 | options_to_query = [
396 | 'preamble',
397 | 'latex_maths_env',
398 | 'png',
399 | 'keep_latex_source',
400 | 'foreground_color',
401 | 'background_color',
402 | 'is_epub',
403 | ]
404 | for option_str in options_to_query:
405 | option = getattr(options, option_str)
406 | if option:
407 | if option in ('True', 'False', 'false', 'true'):
408 | option = option.lower() == 'true' # bool('False') would be True
409 | conv.set_option(option_str, option)
410 | if options.dpi:
411 | conv.set_option('dpi', float(options.dpi))
412 | elif options.fontsize:
413 | conv.set_option('fontsize', options.fontsize)
414 | if options.replace_nonascii:
415 | conv.set_replace_nonascii(True)
416 |
417 | def emit_latex_error(self, err, machine_readable, escape):
418 | """Format a LaTeX error in a meaningful way.
419 |
420 | The argument escape specifies, whether the -R switch had been
421 | passed. If the pandocfilter mode is active, formula positions
422 | will be omitted; this makes the code more complex.
423 | """
424 | if 'DEBUG' in os.environ and os.environ['DEBUG'] == '1':
425 | raise err
426 | escaped = err.formula
427 | if escape:
428 | escaped = typesetting.escape_unicode_maths(err.formula)
429 | msg = None
430 | additional = ''
431 | if 'Package inputenc' in err.args[0]:
432 | additional += (
433 | 'Add the switch `-R` to automatically replace unicode '
434 | 'characters with LaTeX command sequences.'
435 | )
436 | if machine_readable:
437 | msg = 'Number: {}\nFormula: {}{}\nMessage: {}'.format(
438 | err.formula_count,
439 | err.formula,
440 | (
441 | ''
442 | if escaped == err.formula
443 | else '\nLaTeXified formula: %s' % escaped
444 | ),
445 | err.cause,
446 | )
447 | if err.src_line_number and err.src_pos_on_line:
448 | msg = ('Line: {}, {}\n' + msg).format(
449 | err.src_line_number, err.src_pos_on_line
450 | )
451 | if additional:
452 | msg += '; ' + additional
453 | else:
454 | formula = ' ' + err.formula.replace('\n', '\n ')
455 | escaped = (
456 | ' ' + escaped.replace('\n', '\n ')
457 | if escaped != err.formula
458 | else ''
459 | )
460 | msg = 'Error while converting formula %d' % err.formula_count
461 | if err.src_line_number and err.src_pos_on_line:
462 | msg = msg.rstrip() + ' at line %d, %d:\n' % (
463 | err.src_line_number,
464 | err.src_pos_on_line,
465 | )
466 | msg += '%s%s\n%s' % (
467 | formula,
468 | (
469 | ''
470 | if not escaped or escaped == err.formula
471 | else '\nFormula without unicode symbols:\n%s' % escaped
472 | ),
473 | err.cause,
474 | )
475 | if additional:
476 |
477 | msg += ' undefined.\n' + \
478 | '\n'.join(textwrap.wrap(additional, 80))
479 | self.exit(msg, 91)
480 |
481 |
482 | def main():
483 | """Entry point for setuptools."""
484 | # enable multiprocessing on Windows, see python docs
485 | multiprocessing.freeze_support()
486 | m = Main()
487 | # run as pandoc filter?
488 | args = sys.argv[1:] # fallback if no environment variable set
489 | if 'GLADTEX_ARGS' in os.environ:
490 | args = shlex.split(os.environ['GLADTEX_ARGS'])
491 | if '-P' not in args:
492 | args = ['-P'] + args
493 | m.run([sys.argv[0]] + args)
494 |
495 |
496 | if __name__ == '__main__':
497 | main()
498 |
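The argument handling in `main()` above can be sketched in isolation. This is a minimal illustration, not part of GladTeX: the helper name `build_args` is hypothetical, and it assumes that the `-P` check is nested under the `GLADTEX_ARGS` check (the indentation is not visible in this listing).

```python
import shlex


def build_args(argv, environ):
    """Return the argument list that would be handed to Main.run().

    Hypothetical helper: mirrors the logic of main() under the assumption
    that -P is only forced when GLADTEX_ARGS is set.
    """
    args = list(argv[1:])  # fallback if no environment variable is set
    if 'GLADTEX_ARGS' in environ:
        # the variable replaces the command line and implies filter mode
        args = shlex.split(environ['GLADTEX_ARGS'])
        if '-P' not in args:
            args = ['-P'] + args
    return [argv[0]] + args


print(build_args(['gladtex', 'doc.htex'], {}))
print(build_args(['gladtex'], {'GLADTEX_ARGS': '-d img -R'}))
```

Passing the environment as a parameter instead of reading `os.environ` directly keeps the sketch easy to test.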
--------------------------------------------------------------------------------
/gleetex/cachedconverter.py:
--------------------------------------------------------------------------------
1 | # (c) 2013-2022 Sebastian Humenda
2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for
3 | # more details.
4 | """
5 | A preconfigured image converter that caches conversion results.
6 |
7 | This converter with caching ability is less flexible than using the image
8 | converter directly, but automatically uses a cache if available to avoid
9 | conversion if the formula image is already present."""
10 |
11 | import concurrent.futures
12 | import multiprocessing
13 | import os
14 | import subprocess
15 |
16 | from . import caching, image, typesetting
17 | from .caching import normalize_formula
18 | from .image import Format
19 |
20 |
21 | class ConversionException(Exception):
22 | """This exception is raised whenever a problem occurs during conversion.
23 |
24 | Example:
25 | c = ConversionException("cause", "\\tau", 5, 10, 38)
26 | assert c.cause == "cause"
27 | assert c.formula == '\\tau'
28 | assert c.src_line_number == 10 # line number in source document (counting from 1)
29 | assert c.src_pos_on_line == 38 # position of formula in source line, counting from 1
30 | assert c.formula_count == 5 # fifth formula in document (starting from 1)
31 | """
32 |
33 | # mind your own business mr. pylint:
34 | # pylint: disable=too-many-arguments
35 | def __init__(
36 | self, cause, formula, formula_count, src_line_number=None, src_pos_on_line=None
37 | ):
38 | # provide a default error message
39 | if src_line_number and src_pos_on_line:
40 | super().__init__(
41 | 'LaTeX failed at formula line {}, {}, no. {}: {}'.format(
42 | src_line_number, src_pos_on_line, formula_count, cause
43 | )
44 | )
45 | else:
46 | super().__init__(
47 | 'LaTeX failed at formula no. {}: {}'.format(
48 | formula_count, cause)
49 | )
50 | # provide attributes for upper level error handling
51 | self.cause = cause
52 | self.formula = formula
53 | self.src_line_number = src_line_number
54 | self.src_pos_on_line = src_pos_on_line
55 | self.formula_count = formula_count
56 |
57 |
58 | class CachedConverter:
59 | """Convert formulas to images.
60 |
61 | Cache the resulting images to reuse those for subsequent runs or for
62 | recurring instances in the same document.
63 |
64 | c = CachedConverter(base_path)
65 | c.convert_all(formulas) # elements: (pos, displaymath, formula)
66 | for _pos, displaymath, formula in formulas:
67 | data = c.get_data_for(formula, displaymath)
68 |
69 | The formula is either converted or retrieved from a cache in the same
70 | directory as the images.
71 |
72 | :param base_path directory of the output HTML file; link references in the
73 | HTML document will link relative to it
74 | :param keep_old_cache If an existing cache cannot be read (incompatible
75 | GladTeX version, ...) and the flag is set, the program will simply
76 | crash and tell the user to remove the cache (default). If set to False,
77 | the program will instead remove the cache and all eqn* files and
78 | recreate the cache.
79 | :param encoding The encoding for the LaTeX document, default None
80 | :param img_dir directory for images (default ., equivalent to base_path)
81 | For example "images" would put it in `base_path`/images and "../img"
82 | would put it in "base_path/../img"
83 | """
84 |
85 | GLADTEX_CACHE_FILE_NAME = 'gladtex.cache'
86 |
87 | def __init__(self, base_path, keep_old_cache=True, encoding=None, img_dir=''):
88 | empty_path = lambda p: ('' if not p or p.strip(os.sep) == '.' else p)
89 | self.__output_path = empty_path(base_path) # path for converted document
90 | self.__img_dir = empty_path(img_dir) # relative to base_path
91 | # cache path is **relative** to base_path
92 | cache_path = os.path.join(
93 | self.__img_dir, CachedConverter.GLADTEX_CACHE_FILE_NAME
94 | )
95 | self.__is_epub = False
96 | self.__cache = caching.ImageCache(
97 | cache_path,
98 | keep_old_cache=keep_old_cache,
99 | base_path=empty_path(self.__output_path),
100 | )
101 | self.__converter = None
102 | self.__options = {
103 | 'dpi': None,
104 | 'transparency': None,
105 | 'fontsize': None,
106 | 'background_color': None,
107 | 'foreground_color': None,
108 | 'preamble': None,
109 | 'latex_maths_env': None,
110 | 'keep_latex_source': False,
111 | 'png': False,
112 | 'is_epub': False,
113 | }
114 | self.__encoding = encoding
115 | self.__replace_nonascii = False
116 |
117 | def set_option(self, option, value):
118 | """Set one of the options accepted for gleetex.image.Tex2img.
119 |
120 | It is a proxy function. `option` must be one of dpi, fontsize,
121 | transparency, background_color, foreground_color, preamble,
122 | latex_maths_env, keep_latex_source, png.
123 | """
124 | if option not in self.__options:
125 | raise ValueError(
126 | 'Option must be one of ' + ', '.join(self.__options.keys())
127 | )
128 | self.__options[option] = value
129 |
130 | def set_replace_nonascii(self, flag):
131 | """If set, GladTeX will convert all non-ascii character to LaTeX
132 | commands.
133 |
134 | This setting is passed through to typesetting.LaTeXDocument.
135 | """
136 | self.__replace_nonascii = flag
137 |
138 | def convert_all(self, formulas):
139 | """Convert all formulas concurrently.
140 |
141 | Each element of `formulas` must be a tuple of the form (pos,
142 | displaymath, formula); pos is the position in the source document
143 | and is missing in the pandocfilter case. Formulas already contained
144 | in the cache are not converted.
145 | """
146 | formulas_to_convert = self._get_formulas_to_convert(formulas)
147 | if formulas_to_convert:
148 | self.__converter = image.Tex2img(
149 | Format.Png if self.__options['png'] else Format.Svg
150 | )
151 | # apply configured image output options
152 | for option, value in self.__options.items():
153 | if value and hasattr(self.__converter, 'set_' + option):
154 | if isinstance(value, str): # only try string -> number
155 | try: # some values are numbers
156 | value = float(value)
157 | except ValueError:
158 | pass
159 | getattr(self.__converter, 'set_' + option)(value)
160 | self._convert_concurrently(formulas_to_convert)
161 |
162 | def _get_formulas_to_convert(self, formulas):
163 | """Build up a pipeline (list) of formulas for conversion.
164 | Formulas that are in the cache or are doubled in the pipeline are dropped."""
165 | pipeline = [] # find as many file names as equations
166 | file_ext = Format.Png.value if self.__options['png'] else Format.Svg.value
167 | eqn_path = lambda x: os.path.join(self.__img_dir, 'eqn%03d.%s' % (x, file_ext))
168 | abs_eqn_path = lambda x: os.path.join(self.__output_path, eqn_path(x))
169 |
170 | # is (formula, display_math) already in the list of formulas to convert;
171 | # displaymath is important since formulas look different in inline maths
172 | formula_was_converted = lambda f, dsp: \
173 | (normalize_formula(f), dsp) in ( (normalize_formula(u[0]), u[3]) for u in pipeline)
174 | # find enough free file names
175 | file_name_count = 0
176 | used_file_names = [] # track which file names have been assigned
177 | for formula_count, (pos, dsp, formula) in enumerate(formulas):
178 | # ToDo: this belongs in the cache
179 | if not self.__cache.contains(formula, dsp) and not formula_was_converted(
180 | formula, dsp
181 | ):
182 | while (
183 | os.path.exists(abs_eqn_path(file_name_count))
184 | or eqn_path(file_name_count) in used_file_names
185 | ):
186 | file_name_count += 1
187 | used_file_names.append(eqn_path(file_name_count))
188 | pipeline.append(
189 | (formula, pos, eqn_path(file_name_count), dsp, formula_count + 1)
190 | )
191 | return pipeline
192 |
193 | def _convert_concurrently(self, formulas_to_convert):
194 | """The actual concurrent conversion process.
195 |
196 | Method is intended to be called from convert_all().
197 | """
198 | imgdir_full = os.path.join(self.__output_path, self.__img_dir)
199 | if imgdir_full and not os.path.exists(imgdir_full):
200 | # create directory *before* it is required in the concurrent
201 | # formula creation step
202 | os.makedirs(imgdir_full)
203 |
204 | thread_count = int(multiprocessing.cpu_count() * 2)
205 | # convert missing formulas
206 | with concurrent.futures.ThreadPoolExecutor(
207 | max_workers=thread_count
208 | ) as executor:
209 | # start conversion and mark each thread with its formula, position
210 | # in the source file and formula_count (index into a global list of
211 | # formulas)
212 | jobs = {
213 | executor.submit(self.__convert, eqn, path, dsp): (eqn, pos, count)
214 | for (eqn, pos, path, dsp, count) in formulas_to_convert
215 | }
216 | error_occurred = None
217 | for future in concurrent.futures.as_completed(jobs):
218 | # cancel all pending requests
219 | if error_occurred and not future.done():
220 | future.cancel()
221 | continue
222 | formula, pos_in_src, formula_count = jobs[future]
223 | error_occurred = self._handle_job_output(future, formula, pos_in_src, formula_count)
224 | # pylint: disable=raising-bad-type
225 | if error_occurred:
226 | raise error_occurred
227 |
228 | def _handle_job_output(self, future, formula, pos_in_src, formula_count):
229 | """Process the output as produced by each conversion future.
230 | Handle the error case by signalling the error to end all other
231 | conversions."""
232 | try:
233 | data = future.result()
234 | except subprocess.SubprocessError as e:
235 | # retrieve the position (line, pos on line) in the source document
236 | # from original formula list
237 | if pos_in_src: # missing for the pandocfilter case
238 | pos_in_src = [p + 1 for p in pos_in_src] # line/pos count from 1
239 | self.__cache.write() # write back cache with valid entries
240 | if pos_in_src: # position known, i.e. not the pandocfilter case
241 | return ConversionException(
242 | str(e.args[0]),
243 | formula,
244 | formula_count,
245 | pos_in_src[0],
246 | pos_in_src[1],
247 | )
248 | else:
249 | return ConversionException(
250 | str(e.args[0]), formula, formula_count
251 | )
252 | else:
253 | self.__cache.add_formula(
254 | formula, data['pos'], data['path'], data['displaymath']
255 | )
256 | self.__cache.write()
257 |
258 | def __convert(self, formula, img_path, displaymath=False):
259 | """convert(formula, img_path, displaymath=False) Convert given formula
260 | with displaymath/inlinemath. This method wraps the formula in a tex
261 | document, executes all the steps to produce a image and return the
262 | positioning information for the HTML output. It does not check the
263 | cache.
264 |
265 | :param formula formula to convert
266 | :param img_path image output path (relative to the configured base_path,
267 | see __init__)
268 | :param displaymath whether or not to use displaymath during the conversion
269 | :return dictionary with the keys 'pos' (positioning information),
270 | 'path' (image path) and 'displaymath' (formula style as a
271 | boolean)
272 | """
273 | latex = typesetting.LaTeXDocument(formula)
274 | latex.set_displaymath(displaymath)
275 |
276 | def set(opt, setter):
277 | if self.__options[opt]:
278 | getattr(latex, 'set_' + setter)(self.__options[opt])
279 |
280 | set('preamble', 'preamble_string')
281 | set('latex_maths_env', 'latex_environment')
282 | set('background_color', 'background_color')
283 | set('foreground_color', 'foreground_color')
284 | if self.__encoding:
285 | latex.set_encoding(self.__encoding)
286 | if self.__replace_nonascii:
287 | latex.set_replace_nonascii(True)
288 | # dvipng needs the additional indication of transparency (enabled by
289 | # default) when setting a background colour
290 | if self.__options['background_color']:
291 | self.__converter.set_transparency(False)
292 | pos = self.__converter.convert(
293 | latex, os.path.join(self.__output_path,
294 | os.path.splitext(img_path)[0])
295 | )
296 | return {
297 | 'pos': pos,
298 | 'path': img_path, # relative to self.__output_path (!)
299 | 'displaymath': displaymath,
300 | }
301 |
302 | def get_data_for(self, formula, display_math):
303 | """Simple wrapper around ImageCache, enriching the returned data with
304 | the information provided as arguments to this function.
305 |
306 | This helps when using a formula without its context.
307 | """
308 | data = self.__cache.get_data_for(formula, display_math).copy()
309 | data.update({'formula': formula, 'displaymath': display_math})
310 | return data
311 |
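The pipeline building in `_get_formulas_to_convert` above can be sketched without a file system or cache. This is a simplified illustration under stated assumptions: `plan_conversions` is a hypothetical name, existing files and cached entries are passed in as plain collections, and whitespace collapsing stands in for `normalize_formula`.

```python
def plan_conversions(formulas, cached=(), existing_files=(), ext='svg'):
    """Return (formula, pos, file name, displaymath, number) tuples.

    Hypothetical stand-in for the pipeline builder: skips cached and
    duplicated formulas and assigns the next free eqnNNN file name.
    """
    cached = set(cached)          # (normalised formula, displaymath) pairs
    taken = set(existing_files)   # file names that must not be reused
    queued = set()                # keys already scheduled in this run
    pipeline = []
    counter = 0
    for number, (pos, displaymath, formula) in enumerate(formulas, start=1):
        # simplified normalisation (assumption): collapse all whitespace
        key = (' '.join(formula.split()), displaymath)
        if key in cached or key in queued:
            continue
        while 'eqn%03d.%s' % (counter, ext) in taken:
            counter += 1
        name = 'eqn%03d.%s' % (counter, ext)
        taken.add(name)
        queued.add(key)
        pipeline.append((formula, pos, name, displaymath, number))
    return pipeline


formulas = [((0, 0), False, 'x'), ((1, 4), True, 'x'), ((2, 0), False, 'x ')]
for job in plan_conversions(formulas, existing_files={'eqn000.svg'}):
    print(job)
```

The display-math flag is part of the key because the same formula is typeset differently inline and in display style, so both variants need their own image.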
--------------------------------------------------------------------------------
/gleetex/caching.py:
--------------------------------------------------------------------------------
1 | # (c) 2013-2022 Sebastian Humenda
2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for
3 | # more details.
4 | """This module contains the ImageCache, caching formulas which have already
5 | been converted. This allows re-using images for formulas which occur multiple
6 | times within a document or multiple documents in a directory. Furthermore, it
7 | can significantly speed up incremental document creation, because the cache is
8 | remembered across GladTeX runs.
9 |
10 | Cache format:
11 |
12 | { # dict of formulas
13 | 'some formula': # formula as key into dictionary
14 | { # dict of display math / inline maths variants
15 | True: # displaymath = True
16 | { # dictionary of values describing formula
17 | 'path': 'some/path',
18 | 'pos': { # positioning within the HTML document
19 | 'height': ..., 'width': ..., 'depth': ...
20 | }
21 | }
22 | }
23 | }
24 |
25 |
26 | The spacing in formulas is normalised to avoid converting the same formula with
27 | different spacing.
28 | """
29 |
30 | import contextlib
31 | import json
32 | import os
33 |
34 | CACHE_VERSION = '2.0'
35 |
36 |
37 | def normalize_formula(formula):
38 | """Normalise the spacing of a formula.
39 |
40 | This replaces empty groups ({}) and tabs by spaces, squeezes double spaces
41 | into a single one and strips leading and trailing whitespace.
42 | """
43 | return (
44 | formula.replace('{}', ' ')
45 | .replace('\t', ' ')
46 | .replace('  ', ' ')
47 | .rstrip()
48 | .lstrip()
49 | )
50 |
51 |
52 | def recover_bools(object):
53 | """After JSON is read from disk, keys as False or True have been serialized
54 | to 'false' and 'true', but they're not recovered by the json parser.
55 |
56 | This function converts these keys back to booleans; note: it
57 | only works with references, so this function doesn't return
58 | anything.
59 | """
60 | if isinstance(object, dict):
61 | for key in ['false', 'true']:
62 | if key in object:
63 | val = object[key] # store value
64 | # save it with boolean representation
65 | object[key == 'true'] = val
66 | del object[key] # remove string key
67 | # iterate recursively through dict
68 | for value in object.values():
69 | recover_bools(value)
70 | if isinstance(object, list):
71 | for item in object:
72 | recover_bools(item)
73 |
74 |
75 | class JsonParserException(Exception):
76 | """Specialized exception class for handling errors while parsing the JSON
77 | cache."""
78 |
79 | pass
80 |
81 |
82 | class ImageCache:
83 | """This cache stores formulas which have been converted already and don't
84 | need to be converted again. This is both a disk usage and performance
85 | improvement. The cache can be written and read from disk.
86 |
87 | If the argument keep_old_cache is True, the cache will raise a
88 | JsonParserException if that file could not be read (i.e. incompatible
89 | GladTeX version). If set to False, it'll discard the cache along with all
90 | eqn* files and start with a clean cache.
91 |
92 | Example:
93 |
94 | cache = ImageCache()
95 | cache.add_formula('\\tau', # the formula
96 | {'height': 1, 'depth': 2, 'width': 3}, # the positioning information for the output document
97 | 'eqn042.svg', displaymath=True)
98 | assert len(cache) == 1 # one entry
99 | cache.write()
100 | assert os.path.exists('gladtex.cache')
101 |
102 | The optional argument base_path adds the ability to add a base directory to
103 | each file path. The base_path is used to simulate a different working
104 | directory. Imagine you have a directory chapter01 and you want your images
105 | to be in a subdirectory img. Changing the current working directory isn't
106 | possible because of parallelism, therefore you initialise the cache like
107 | this:
108 |
109 | cache = ImageCache(path='img/gladtex.cache', base_path='chapter01')
110 | cache.add_formula(…, 'img/eqn001.svg') # will result in chapter01/img/eqn001.svg
111 | """
112 |
113 | VERSION_STR = 'GladTeX__cache__version'
114 |
115 | def __init__(self, path='gladtex.cache', keep_old_cache=True, base_path=''):
116 | self.__cache = {}
117 | self.__set_version(CACHE_VERSION)
118 | self.__cache_name = os.path.join(base_path, path)
119 | self.__base_path = base_path
120 | if os.path.exists(os.path.join(base_path, path)):
121 | try:
122 | self._read()
123 | except JsonParserException:
124 | if keep_old_cache:
125 | raise
126 | else:
127 | self._remove_old_cache_and_files()
128 |
129 | def __len__(self):
130 | """Return number of formulas in the cache."""
131 | # ignore version
132 | return len(self.__cache) - 1
133 |
134 | def __set_version(self, version):
135 | """Set version of cache (data structure format)."""
136 | self.__cache[ImageCache.VERSION_STR] = version
137 |
138 | def write(self):
139 | """Write cache to disk.
140 |
141 | The file name will be the one configured during initialisation
142 | of the cache.
143 | """
144 | if not self.__cache:
145 | return
146 | with open(self.__cache_name, 'w', encoding='UTF-8') as file:
147 | file.write(json.dumps(self.__cache))
148 |
149 | def _read(self):
150 | """Read Json from disk into cache, if file exists.
151 |
152 | :raises JsonParserException if json could not be parsed
153 | """
154 |
155 | def raise_error(msg):
156 | raise JsonParserException(
157 | msg
158 | + '\nPlease delete the cache (and'
159 | + ' the images) and rerun the program.'
160 | )
161 |
162 | if os.path.exists(self.__cache_name):
163 | # pylint: disable=broad-except
164 | try:
165 | with open(self.__cache_name, encoding='UTF-8') as file:
166 | self.__cache = json.load(file)
167 | except Exception as e:
168 | msg = 'error while reading cache from %s: ' % os.path.abspath(
169 | self.__cache_name
170 | )
171 | if isinstance(e, UnicodeDecodeError):
172 | # checked first: UnicodeDecodeError is a subclass of ValueError
173 | msg += (
174 | 'expected UTF-8 encoding, erroneous byte '
175 | + '%r at %d:%d (%s)'
176 | % (e.object[e.start:e.end], e.start, e.end, e.reason)
177 | )
178 | else:
179 | msg += str(e.args[0])
180 | raise_error(msg)
181 | if not isinstance(self.__cache, dict):
182 | raise_error('Decoded Json is not a dictionary.')
183 | if not self.__cache.get(ImageCache.VERSION_STR):
184 | self.__set_version(CACHE_VERSION)
185 | cur_version = self.__cache.get(ImageCache.VERSION_STR)
186 | if cur_version != CACHE_VERSION:
187 | raise_error(
188 | 'Cache in %s has version %s, expected %s.'
189 | % (self.__cache_name, cur_version, CACHE_VERSION)
190 | )
191 | recover_bools(self.__cache)
192 |
193 | def _remove_old_cache_and_files(self):
194 | os.remove(self.__cache_name)
195 | directory = os.path.dirname(self.__cache_name)
196 | if not directory:
197 | directory = '.'
198 | # remove all files starting with eqn*
199 | for file in os.listdir(directory):
200 | if not file.startswith('eqn'):
201 | continue
202 | file = os.path.join(directory, file)
203 | if os.path.isfile(file):
204 | os.remove(file)
205 |
206 | def add_formula(self, formula, pos, file_path, displaymath=False):
207 | """Add formula to cache.
208 |
209 | The pos argument contains the positioning info for the output
210 | document and is a dict with 'height', 'width' and 'depth'. Keep
211 | in mind that formulas set with displaymath are not the same as
212 | those set with inlinemath. This method raises OSError if
213 | specified image doesn't exist or if it got an absolute
214 | file_path.
215 |
216 | If an entry for this formula and display mode already exists, it is kept.
217 | """
218 | if os.path.isabs(file_path):
219 | raise OSError(f"image path in cache may not be absolute: {file_path}")
220 | if '\\' in file_path:
221 | file_path = file_path.replace('\\', '/')
222 | if not os.path.exists(os.path.join(self.__base_path, file_path)):
223 | raise OSError(
224 | "cannot add %s to the cache: doesn't exist"
225 | % os.path.join(self.__base_path, file_path)
226 | )
227 | if not pos or not formula or not file_path:
228 | raise ValueError('the supplied arguments may not be empty/none')
229 | if not isinstance(displaymath, bool):
230 | raise ValueError('displaymath must be a boolean')
231 | formula = normalize_formula(formula)
232 | if not formula in self.__cache:
233 | self.__cache[formula] = {}
234 | val = self.__cache[formula]
235 | if not displaymath in val:
236 | val[displaymath] = {
237 | 'pos': pos,
238 | 'path': file_path,
239 | }
240 |
241 | def remove_formula(self, formula, displaymath):
242 | """This method removes the given formula from the cache.
243 |
244 | A KeyError is raised, if the formula did not exist. Internally,
245 | formulas are normalized to detect similarities.
246 | """
247 | formula = normalize_formula(formula)
248 | if not formula in self.__cache:
249 | raise KeyError('key %s not in cache' % formula)
250 | else:
251 | value = self.__cache[formula]
252 | if displaymath in value:
253 | with contextlib.suppress(FileNotFoundError):
254 | os.remove(
255 | os.path.join(self.__base_path,
256 | value[displaymath]['path'])
257 | )
258 | del self.__cache[formula][displaymath]
259 | if not self.__cache[formula]:
260 | del self.__cache[formula]
261 | else:
262 | raise KeyError('key %s (%s) not in cache' %
263 | (formula, displaymath))
264 |
265 | def contains(self, formula, displaymath):
266 | """Check whether a formula was already cached and return True if
267 | found."""
268 | try:
269 | return bool(self.get_data_for(formula, displaymath))
270 | except KeyError:
271 | return False
272 |
273 | def get_data_for(self, formula, displaymath):
274 | """Retrieve meta data about a formula from the cache.
275 |
276 | The meta information is used to embed the formula in the HTML
277 | document. It is a dictionary with the keys 'pos' and 'path'. The
278 | positioning info is described in the documentation of this
279 | class. This method raises a KeyError if the formula wasn't
280 | found.
281 | """
282 | formula = normalize_formula(formula)
283 | if not formula in self.__cache:
284 | raise KeyError(formula, displaymath)
285 | else:
286 | # check whether file still exists
287 | value = self.__cache[formula]
288 | if displaymath in value.keys():
289 | # if file doesn't exist anymore, outdated and hence removed from
290 | # cache
291 | if not os.path.exists(
292 | os.path.join(self.__base_path, value[displaymath]['path'])
293 | ):
294 | del self.__cache[formula]
295 | raise KeyError((formula, displaymath))
296 | else:
297 | return value[displaymath]
298 | else:
299 | raise KeyError((formula, displaymath))
300 |
301 |
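The cache keys its variants by the booleans True and False, but JSON object keys are always strings, which is why the module needs `recover_bools` after loading. The following self-contained sketch shows the round trip; `recover_bool_keys` is a hypothetical stand-in written for this illustration, and the sample cache entry is made up.

```python
import json


def recover_bool_keys(obj):
    """Recursively turn the string keys 'true'/'false' back into booleans.

    The structure is modified in place; nothing is returned.
    """
    if isinstance(obj, dict):
        for key in ('false', 'true'):
            if key in obj:
                obj[key == 'true'] = obj.pop(key)
        for value in obj.values():
            recover_bool_keys(value)
    elif isinstance(obj, list):
        for item in obj:
            recover_bool_keys(item)


# made-up cache content in the documented format
cache = {'\\tau': {True: {'path': 'eqn000.svg',
                          'pos': {'height': 1, 'width': 3, 'depth': 2}}}}
serialized = json.dumps(cache)    # the key True is written as "true"
restored = json.loads(serialized)
keys_before = list(restored['\\tau'])   # string key straight after loading
recover_bool_keys(restored)
keys_after = list(restored['\\tau'])    # boolean key again
```

Without the recovery step, a lookup such as `restored['\\tau'][True]` would raise a KeyError, because the loaded dictionary only contains the string key `'true'`.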
--------------------------------------------------------------------------------
/gleetex/htmlhandling.py:
--------------------------------------------------------------------------------
1 | # (c) 2013-2023 Sebastian Humenda
2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for
3 | # more details.
4 | """
5 | GleeTeX is designed to allow the re-use of the image creation code
6 | independently of the HTML conversion code. Therefore, this module contains the
7 | code required to parse equations from HTML, to write converted HTML documents
8 | back and to handle the exclusion of formulas too long for an HTML alt tag to
9 | an external HTML file.
10 | The pandoc module contains similar functions for using GleeTeX as a pandoc
11 | filter, without using HTML as destination format.
12 | """
13 |
14 | from abc import abstractmethod
15 | import collections
16 | import enum
17 | import html
18 | import os
19 | import posixpath
20 | import re
21 |
22 | from . import sink
23 | from . import typesetting
24 |
25 | # match HTML 4 and 5
26 | CHARSET_PATTERN = re.compile(
27 | rb'(?:content="text/html; charset=(.*?)"|charset="(.*?)")')
28 |
29 |
30 | class ParseException(Exception):
31 | """Exception to propagate a parsing error."""
32 |
33 | def __init__(self, msg, pos=None):
34 | self.msg = msg
35 | self.pos = pos
36 | super().__init__(msg, pos)
37 |
38 | def __str__(self):
39 | if self.pos:
40 | return f'line {self.pos[0]}, {self.pos[1]}: {self.msg}'
41 | else:
42 | return self.msg
43 |
44 |
45 | def get_position(document, index):
46 | """This returns the line number and position on line for the given String.
47 |
48 | Note: lines and positions are counted from 0.
49 | """
50 | line = document[: index + 1].count('\n')
51 | if document[index] == '\n':
52 | return (line, 0)
53 | newline = document[: index + 1].rfind('\n')
54 | newline = newline if newline >= 0 else 0
55 | return (line, len(document[newline:index]))
56 |
57 |
58 | def find_anycase(where, what):
59 | """Find with both lower or upper case."""
60 | lower = where.find(what.lower())
61 | upper = where.find(what.upper())
62 | if lower >= 0:
63 | return lower
64 | return upper
65 |
66 |
67 | class EqnParser:
68 | """This parser parses <eq>...</eq> in the HTML of a document.
69 |
70 | It's not an HTML parser, because the content within <eq>.*</eq> is
71 | parsed verbatim. It also parses comments, to not consider formulas
72 | within comments. All other cases are unhandled. Especially CData is
73 | problematic, although it seems like a rare use case.
74 | """
75 |
76 | class State(enum.Enum): # ([\s\S]*?) also matches newlines
77 | Comment = re.compile(r'<!--.*?-->', re.MULTILINE)
78 | Equation = re.compile(
79 | r'<\s*(?:eq|EQ)\s*(.*?)?>([\s\S.]+?)<\s*/\s*(?:eq|EQ)>', re.MULTILINE
80 | )
81 |
82 | HTML_ENTITY = re.compile(r'(&(:?#\d+|[a-zA-Z]+);)')
83 |
84 | def __init__(self):
85 | self.__document = None
86 | self.__data = []
87 | self.__encoding = None
88 |
89 | def feed(self, document):
90 | """Feed a string or a bytes instance and start parsing.
91 |
92 | If a bytes instance is fed, an HTML encoding header has to be
93 | present, so that the encoding can be extracted.
94 | """
95 | if isinstance(document, bytes): # try to guess encoding
96 | try:
97 | encoding = next(
98 | filter(bool, CHARSET_PATTERN.search(document).groups())
99 | ).decode('ascii')
100 | document = document.decode(encoding)
101 | except AttributeError as e:
102 | raise ParseException(
103 | (
104 | 'Could not determine encoding of '
105 | 'document, no charset information in the HTML header '
106 | 'found.'
107 | )
108 | ) from e
109 | self.__encoding = encoding
110 | self.__document = document[:]
111 | self._parse()
112 |
113 | def find_with_offset(self, doc, start, what):
114 | """This find method searches in the document for a given string,
115 | taking the offset into account.
116 |
117 | Returned is the absolute position (so offset + relative match
118 | position) or -1 for no hit.
119 | """
120 | if isinstance(what, str):
121 | pos = doc[start:].find(what)
122 | else:
123 | match = what.search(doc[start:])
124 | pos = -1 if not match else match.span()[0]
125 | return pos if pos == -1 else pos + start
126 |
127 | def _parse(self):
128 | """This function parses the document, while maintaining state using the
129 | State enum."""
130 | def in_document(x): return not x == -1
131 | # maintain a lower-case copy, which eases searching, but doesn't affect
132 | # the handler methods
133 | doc = self.__document[:].lower()
134 |
135 | end = len(self.__document) - 1
136 | eq_start = re.compile(r'<\s*eq\s*(.*?)>')
137 |
138 | start_pos = 0
139 | while start_pos < end:
140 | comment = self.find_with_offset(doc, start_pos, '' % match.groups()[0])
208 | return start_pos + match.span()[1] # return end of match
209 |
210 | def get_encoding(self):
211 | """Return the parsed encoding from the HTML meta data.
212 |
213 | If none was set, UTF-8 is assumed.
214 | """
215 | return self.__encoding
216 |
217 | def get_data(self):
218 | """Return parsed chunks.
219 |
220 | These are either strings or tuples with formula information, see
221 | class documentation.
222 | """
223 | return [x for x in self.__data if x] # filter empty bits
224 |
225 |
226 | def generate_label(formula):
227 | """Generate an id for identifying a formula as an anchor in a document.
228 |
229 | The generated ID is guaranteed to be valid in an XML attribute and
230 | it won't exceed a certain length. If you happen to have a lot of
231 | formulas > 150 characters with exactly the same content in the
232 | document, that'll cause a clash of id's.
233 | """
234 | # for some characters we just use a simple replacement (otherwise they
235 | # would be lost)
236 | mapped = {'{': '_', '}': '_',
237 | '(': '-', ')': '-', '\\': '.', '^': ',', '*': '_'}
238 | id = []
239 | prevchar = ''
240 | for c in formula:
241 | if prevchar == c:
242 | continue # avoid multiple same characters
243 | if c in mapped:
244 | id.append(mapped[c])
245 | elif c.isalpha() or c.isdigit():
246 | id.append(c)
247 | prevchar = c
248 | # id's must start with an alphabetical character, so prefix the id with
249 | # "form_" to make it a valid html id
250 | if id and not id[0].isalpha():
251 | id = ['f', 'o', 'r', 'm', '_'] + id
252 | if not id: # is empty
253 | raise ValueError(
254 | "For the formula '%s' no referencable id could be generated." % formula
255 | )
256 | return ''.join(id[:150])
257 |
258 |
259 | def format_formula_paragraph(formula):
260 | """Format a formula to appear as if it would have been excluded into an
261 | external HTML file."""
262 | return '<p id="%s"><pre>%s</pre></p>\n' % (generate_label(formula), formula)
263 |
264 |
265 | # pylint: disable=too-many-instance-attributes
266 | class ImageFormatter: # ToDo: localisation
267 | """ImageFormatter(is_epub=False)
268 |
269 | Format converted formula to be included into HTML. A typical image
270 | attribute will contain the path to the image, style information, a CSS class
271 | to be used in custom CSS style sheets and an alternative text (the LaTeX
272 | source) for people who disabled images or for blind screen reader users.
273 | If set, LaTeX formulas exceeding a configurable maximum length will be
274 | excluded. The image will be a link which leads to the excluded image text.
275 | The alt attribute is a text-only attribute and e.g. line breaks will be lost
276 | for screen reader users, so it makes sense for longer formulas to be
277 | external to be easily readable. Furthermore the alt attribute is limited in
278 | size, so formulas that are too long need to be treated differently.
279 | If that behavior is not wanted, it can be disabled and
280 | nothing will be excluded.
281 |
282 | Keyword arguments
283 |
284 | * `base_path=""`: base path where images are stored, e.g. "images"
285 | * `link_prefix=""`: a prefix which should be added to generated links, e.g.
286 | `"https://example.com/img/"`
287 | * `exclusion_file_path=""`: the path which formula descriptions are
288 | written to which exceed a certain threshold that doesn't fit into the
289 | alt tag of the `img` tag
290 | * `is_epub`: round height/width of the linked images to comply with the
291 | EPUB standard.
292 |
293 | Intended usage:
294 |
295 | fmt = ImageFormatter() # use one of the children classes
296 | # values as returned by Tex2img
297 | fmt.format(pos, formula, img_path, displaymath=False)
298 | fmt.format(pos2, formula2, img_path2, displaymath=True)
299 | ...
300 | fmt.get_excluded() # the formulas that were too long for the alt attribute
301 | """
302 |
303 | def __init__(self, base_path=None, link_prefix='',
304 | exclusion_file_path=sink.EXCLUSION_FILE_NAME, is_epub=False):
305 | self.__inline_maxlength = 100
306 | self._excluded_formulas = collections.OrderedDict()
307 | self.__url = ''
308 | self._is_epub = is_epub
309 | self._css = {'inline': 'inlinemath', 'display': 'displaymath'}
310 | self.__replace_nonascii = False
311 | self._link_prefix = link_prefix if link_prefix else ''
312 | base_path = ("" if not base_path else base_path)
313 | self._exclusion_filepath = posixpath.join(
314 | base_path, exclusion_file_path)
315 | if os.path.exists(self._exclusion_filepath) and not os.access(
316 | self._exclusion_filepath, os.W_OK
317 | ):
318 | raise OSError(f'file {self._exclusion_filepath} not writable')
319 |
320 | def get_exclusion_file_path(self):
321 | """Return the path to the file to which formulas will be excluded
322 | if their description exceeds the alt attribute length.
323 |
324 | May be None.
325 | """
326 | return self._exclusion_filepath if self._exclusion_filepath else None
327 |
328 | def set_replace_nonascii(self, flag):
329 | """If True, non-ascii characters will be replaced through their LaTeX
330 | command.
331 |
332 | Note that alphabetical characters will not be replaced, to allow
333 | easier readability.
334 | """
335 | self.__replace_nonascii = flag
336 |
337 | def set_max_formula_length(self, length):
338 | """Set maximum length of a formula before it gets excluded into a
339 | separate file."""
340 | self.__inline_maxlength = length
341 |
342 | def set_inline_math_css_class(self, css):
343 | """set css class for inline math."""
344 | self._css['inline'] = css
345 |
346 | @abstractmethod
347 | def _generate_link_destination(self, formula):
348 | """Generate the link to an excluded formula, consisting either of path
349 | and label or just a label.
350 |
351 | The label is generated uniquely for each formula by this function.
352 | This function needs to be customised by implementors, e.g. to
353 | return "foo.html#formula" or "#formula", etc.
354 | """
355 |
356 | def set_display_math_css_class(self, css):
357 | """set css class for display math."""
358 | self._css['display'] = css
359 |
360 | def set_is_epub(self, flag):
361 | """Activate rounding of the height and width attributes of the formula
362 | images to comply with the EPUB standard."""
363 | self._is_epub = flag
364 |
365 | def set_url(self, prefix):
366 | """Set URL prefix which is used as a prefix to the image file in the
367 | HTML link."""
368 | self.__url = prefix
369 |
370 | def get_excluded(self):
371 | """Return a mapping (label to LaTeX formula) of the formulas that did not fit the
372 | alt attribute and were hence formatted separately, e.g. into a separate document."""
373 | return self._excluded_formulas
374 |
375 | def _process_image(self, pos, formula, img_path, displaymath=False):
376 | """Process positioning of the image and the various URI-related
377 | parameters into formatting information.
378 |
379 | :param pos dictionary containing keys depth, height and width
380 | :param formula LaTeX alternative text
381 | :param img_path: path to image
382 | :param displaymath display or inline math (default False, inline maths)
383 | :returns a dictionary with the information about the image; its keys
384 | correspond to HTML image attributes, except for "url" and "image".
385 | """
386 | image = {'formula': formula}
387 | full_url = img_path
388 | if self.__url:
389 | full_url = self.__url.rstrip('/') + '/' + img_path
390 | image['url'] = full_url
391 | # depth is a negative offset (float, first, str later)
392 | depth = float(pos['depth']) * -1
393 | if self._is_epub:
394 | depth = str(int(depth))
395 | else:
396 | depth = f'{depth:.2f}'
397 | image['style'] = f'vertical-align: {depth}px; margin: 0;'
398 |
399 | image['class'] = self._css['display'] if displaymath else self._css['inline']
400 | if self._is_epub:
401 | image.update(
402 | {'height': str(int(pos['height'])),
403 | 'width': str(int(pos['width']))}
404 | )
405 | else:
406 | image.update(
407 | {'height': f"{pos['height']:.2f}",
408 | 'width': f"{pos['width']:.2f}"}
409 | )
410 | return image
411 |
412 | @abstractmethod
413 | def add_excluded(self, image):
414 | """Add a formula to the list of excluded formulas."""
415 |
416 | @abstractmethod
417 | def format_internal(self, image, link_label=None):
418 | """Format an internal formula for the target output (defined by the
419 | class).
420 |
421 | :param image formula information as returned by _process_image; formula
422 | will have been shortened if it were too long
423 | :param link_label if not None, the formula image will contain a reference
424 | or link to the long version of the formula (e.g. because it didn't fit
425 | the alt attribute)
426 | """
427 |
428 | def format(self, pos, formula, img_path, displaymath=False):
429 | """This method formats a formula. It invokes the abstract methods
430 | `format_internal` and `add_excluded`. `add_excluded` is only invoked if
431 | the formula is too long and if exclusion has been configured. This
432 | method returns the formatted image. The formatted image will contain a
433 | reference to the excluded formula source, if applicable. The formatted
434 | excluded formulas can be retrieved using get_excluded().
435 |
436 | :param pos dictionary containing keys depth, height and width
437 | :param formula LaTeX alternative text
438 | :param img_path: path to image
439 | :param displaymath whether or not formula is in display math (default: no)
440 | :returns the formatted image; if the formula was excluded, it links to
441 | the excluded formula source.
442 | """
443 | formula = typesetting.increase_readability(
444 | formula, self.__replace_nonascii)
445 | processed_data = self._process_image(
446 | pos, formula, img_path, displaymath)
447 | shortened_data = processed_data.copy()
448 | shortened_data['formula'] = formula
449 | link_destination = None
450 | if len(formula) > self.__inline_maxlength:
451 | shortened_data['formula'] = f"{formula[:self.__inline_maxlength]}..."
452 | link_destination = self._generate_link_destination(processed_data)
453 | # builds up internal list of formatted excluded formulas
454 | self.add_excluded(processed_data)
455 | return self.format_internal(shortened_data, link_destination)
456 |
457 |
458 | class HtmlImageFormatter(ImageFormatter):
459 | """Format formulas for HTML file output.
460 |
461 | See ImageFormatter for information about the usage of the class.
462 | """
463 |
464 | def __init__(self, *args, **kwargs):
465 | super().__init__(*args, **kwargs)
466 |
467 | def _generate_link_destination(self, formula):
468 | html_label = generate_label(formula['formula'])
469 | exclusion_filelink = posixpath.join(
470 | self._link_prefix, self._exclusion_filepath
471 | )
472 | return f'{exclusion_filelink}#{html_label}'
473 |
474 | def format_internal(self, image, link_label=None):
475 | link_start, link_end = ('', '')
476 | if link_label:
477 | link_start = f'<a href="{link_label}">'
478 | link_end = '</a>'
479 | escaped_formula = html.escape(image['formula'], quote=True)
480 | return (
481 | link_start
482 | + (
483 | f''
486 | )
487 | + link_end
488 | )
489 |
490 | # Todo: this function is useless: it should be merged with format and it
491 | # should build up a dictionary of id, full formula; the formatting should go
492 | # to a separate function; link prefix and such details should be part of
493 | # super class; ToDo, btw, link prefix also for image paths, probably not
494 | # used yet in format strings of format_internal
495 | def add_excluded(self, image):
496 | self._excluded_formulas[generate_label(
497 | image['formula'])] = image['formula']
498 |
499 |
500 | def write_html(file, document, formatter):
501 | """Processed HTML documents are made up of raw HTML chunks which are
502 | written back unaltered and of a processed image.
503 |
504 | A processed image is a former formula converted to an image with
505 | additional meta data. This is passed to the format function of the
506 | supplied formatter and the result is written to the given (open)
507 | file handle.
508 | """
509 | for chunk in document:
510 | if isinstance(chunk, dict):
511 | is_displaymath = chunk['displaymath']
512 | file.write(
513 | formatter.format(
514 | chunk['pos'], chunk['formula'], chunk['path'], is_displaymath
515 | )
516 | )
517 | else:
518 | file.write(chunk)
519 |
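To illustrate how the `Equation` pattern of `EqnParser.State` separates formulas from the surrounding HTML, here is a reduced sketch. The helper `split_equations` is hypothetical and not part of the module; unlike `EqnParser`, it does not skip HTML comments, does not track positions and returns plain tuples rather than the parser's own chunk format.

```python
import re

# Same pattern as EqnParser.State.Equation in the listing above.
EQUATION = re.compile(
    r'<\s*(?:eq|EQ)\s*(.*?)?>([\s\S.]+?)<\s*/\s*(?:eq|EQ)>', re.MULTILINE
)


def split_equations(document):
    """Split a document into raw HTML strings and (attributes, formula) tuples."""
    chunks = []
    last = 0
    for match in EQUATION.finditer(document):
        chunks.append(document[last:match.start()])  # HTML before the formula
        chunks.append((match.group(1) or '', match.group(2)))
        last = match.end()
    chunks.append(document[last:])  # trailing HTML after the last formula
    return [c for c in chunks if c]  # drop empty strings


chunks = split_equations(
    '<p>Let <eq>\\tau</eq> be <EQ env="displaymath">a^2</EQ>.</p>'
)
```

The formula text is taken verbatim, which is the reason the class documentation stresses that this is not a general HTML parser.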
--------------------------------------------------------------------------------
/gleetex/image.py:
--------------------------------------------------------------------------------
1 | # (c) 2013-2021 Sebastian Humenda
2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for
3 | # more details.
4 | """This module takes care of the actual image creation process.
5 |
6 | Each formula is saved as an image, either as PNG or SVG. SVG is advised,
7 | since it is a properly scalable format.
8 | """
9 |
10 | import enum
11 | import os
12 | import re
13 | import shutil
14 | import subprocess
15 | import sys
16 |
17 | from .typesetting import LaTeXDocument
18 |
19 | DVIPNG_REGEX = re.compile(r'^ depth=(-?\d+) height=(\d+) width=(\d+)')
20 | DVISVGM_DEPTH_REGEX = re.compile(
21 | r'^\s*width=.*?pt, height=.*?pt, depth=(.*?)pt')
22 | DVISVGM_SIZE_REGEX = re.compile(r'^\s*graphic size: (.*?)pt x (.*?)pt')
23 |
24 |
25 | def remove_all(*files):
26 | """Guarded remove of files (rm -f); no exception is thrown if a file
27 | couldn't be removed."""
28 | for file in files:
29 | try:
30 | os.remove(file)
31 | except OSError:
32 | pass
33 |
34 |
35 | def proc_call(cmd, cwd=None, install_recommends=True):
36 | """Execute cmd (list of arguments) as a subprocess.
37 |
38 | Returned is a tuple with stdout and stderr, decoded if not None. If
39 | the return value is not equal to 0, a subprocess error is raised.
40 | Timeouts will happen after 20 seconds.
41 | """
42 | with subprocess.Popen(
43 | cmd,
44 | stdout=subprocess.PIPE,
45 | stderr=subprocess.PIPE,
46 | cwd=cwd,
47 | ) as proc:
48 | data = []
49 | try:
50 | data = [
51 | d.decode(sys.getdefaultencoding(), errors='surrogateescape')
52 | for d in proc.communicate(timeout=20)
53 | if d
54 | ]
55 | if proc.wait():
56 | raise subprocess.SubprocessError(
57 | 'Error while executing %s\n%s\n' % (
58 | ' '.join(cmd), '\n'.join(data))
59 | )
60 | except subprocess.TimeoutExpired as e:
61 | proc.kill()
62 | note = 'Subprocess expired with time out: ' + str(cmd) + '\n'
63 | poll = proc.poll()
64 | if poll:
65 | note += str(poll) + '\n'
66 | if data:
67 | raise subprocess.SubprocessError('\n'.join(data) + '\n' + note)
68 | else:
69 | raise subprocess.SubprocessError(
70 | 'execution timed out after '
71 | + str(e.args[1])
72 | + ' s: '
73 | + ' '.join(e.args[0])
74 | )
75 | except KeyboardInterrupt as e:
76 | sys.stderr.write('\nInterrupted; ')
77 | import traceback
78 |
79 | traceback.print_exc(file=sys.stderr)
80 | except FileNotFoundError:
81 | # program missing, try to help
82 | text = 'Command `%s` not found.' % cmd[0]
83 | if isinstance(install_recommends, str) and shutil.which('dpkg'):
84 | text += ' Install it using `sudo apt install ' + install_recommends + '`.'
85 | else:
86 | text += ' Install a TeX distribution of your choice, e.g. MiKTeX or TeX Live.'
87 | raise subprocess.SubprocessError(text) from None
88 | if isinstance(data, list):
89 | return '\n'.join(data)
90 | return data
91 |
92 |
93 | # pylint: disable=too-few-public-methods
94 | class Format(enum.Enum):
95 | """Choose the image output format."""
96 |
97 | Png = 'png'
98 | Svg = 'svg'
99 |
100 |
101 | class Tex2img:
102 | """Convert a TeX document string into an image file (PNG or SVG). This class interacts with
103 | the LaTeX and dvipng/dvisvgm sub processes. Upon error the methods throw a
104 | SubprocessError with all necessary information to fix the issue.
105 |
106 | On PNG: The background of the PNG files will be transparent by
107 | default. If you set a background colour within the LaTeX
108 | document, you need to turn off transparency in this converter
109 | manually.
110 | """
111 |
112 | def __init__(self, fmt, encoding='UTF-8'):
113 | if not isinstance(fmt, Format):
114 | raise ValueError('Enumeration of type Format expected.' + str(fmt))
115 | self.__format = fmt
116 | self.__encoding = encoding
117 | self.__parsed_data = None
118 | self.__size = [115, None]
119 | self.__background = 'transparent'
120 | self.__keep_latex_source = False
121 | self.__is_epub = False
122 |
123 | def set_is_epub(self, val):
124 | """Enable or disable Epub-conforming image creation."""
125 | self.__is_epub = val
126 |
127 | def set_dpi(self, dpi):
128 | """Set output resolution for formula images.
129 |
130 | This has no effect if the output format is SVG. It will
131 | automatically overwrite a font size, if set.
132 | """
133 | if not isinstance(dpi, (int, float)):
134 | raise TypeError('Dpi must be an integer or floating point number')
135 | self.__size[0] = int(dpi)
136 |
137 | def set_fontsize(self, size):
138 | """Set font size for formulas.
139 |
140 | This will be automatically translated into a DPI resolution for
141 | PNG images and taken literally for SVG graphics.
142 | """
143 | if not isinstance(size, (int, float)):
144 | raise TypeError('Font size must be an integer or floating point number')
145 | self.__size[1] = float(size)
146 |
147 | def set_transparency(self, flag):
148 | """Set whether or not to use background colour information from the DVI
149 | file.
150 |
151 | This is only relevant for PNG output and if a background colour
152 | other than "transparent" is required, in this case this setter
153 | should be set to False. It defaults to True, resulting in a
154 | transparent background.
155 | """
156 | self.__background = 'transparent' if flag else 'not transparent'
157 |
158 | def set_keep_latex_source(self, flag):
159 | """Set whether LaTeX source document should be kept."""
160 | if not isinstance(flag, bool):
161 | raise TypeError('boolean object required, got %s.' % repr(flag))
162 | self.__keep_latex_source = flag
163 |
164 | def create_dvi(self, tex_document, dvi_fn):
165 | """Call LaTeX to produce a dvi file with the given LaTeX document.
166 |
167 | Temporary files will be removed, even in the case of a LaTeX
168 | error. This method raises a SubprocessError with the helpful
169 | part of LaTeX's error output.
170 | """
171 | path = os.path.dirname(dvi_fn)
172 | if path and not os.path.exists(path):
173 | os.makedirs(path)
174 | if not path:
175 | path = os.getcwd()
176 |
177 | def new_extension(x): return os.path.splitext(dvi_fn)[0] + '.' + x
178 |
179 | if self.__size[1]: # font size in pt
180 | tex_document.set_fontsize(self.__size[1])
181 | tex_fn = new_extension('tex')
182 | aux_fn = new_extension('aux')
183 | log_fn = new_extension('log')
184 | cmd = None
185 | encoding = self.__encoding
186 | with open(tex_fn, mode='w', encoding=encoding) as tex:
187 | tex.write(str(tex_document))
188 | cmd = [
189 | 'latex',
190 | '-interaction=nonstopmode',
191 | '-halt-on-error',
192 | os.path.basename(tex_fn),
193 | ]
194 | try:
195 | proc_call(cmd, cwd=path, install_recommends='texlive-recommended')
196 | except subprocess.SubprocessError as e:
197 | remove_all(dvi_fn)
198 | msg = ''
199 | if e.args:
200 | data = self.parse_latex_log(e.args[0])
201 | if data:
202 | msg += data
203 | else:
204 | msg += str(e.args[0])
205 | raise subprocess.SubprocessError(msg) # propagate subprocess error
206 | finally:
207 | if self.__keep_latex_source:
208 | remove_all(aux_fn, log_fn)
209 | else:
210 | remove_all(tex_fn, aux_fn, log_fn)
211 |
212 | def create_image(self, dvi_fn):
213 | """Create the image containing the formula, using either dvisvgm or
214 | dvipng."""
215 | dirname = os.path.dirname(dvi_fn)
216 | if dirname and not os.path.exists(dirname):
217 | os.makedirs(dirname)
218 |
219 | output_fn = '%s.%s' % (os.path.splitext(dvi_fn)
220 | [0], self.__format.value)
221 | if self.__format == Format.Png:
222 | dpi = fontsize2dpi(
223 | self.__size[1]) if self.__size[1] else self.__size[0]
224 | return create_png(dvi_fn, output_fn, dpi, self.__background)
225 | if not self.__size[1]:
226 | self.__size[1] = 12 # 12 pt
227 | return create_svg(dvi_fn, output_fn)
228 |
229 | def convert(self, tex_document, base_name):
230 | """Convert the given TeX document into an image.
231 |
232 | The base name is used to create the required intermediate files
233 | and the resulting file will be made of the base_name and the
234 | format-specific file extension. This function returns the
235 | positioning information used in the CSS style attribute.
236 | """
237 | if not isinstance(tex_document, LaTeXDocument):
238 | raise TypeError(
239 | ('expected object of type typesetting.LaTeXDocument,' ' got %s')
240 | % type(tex_document)
241 | )
242 | dvi = '%s.dvi' % base_name
243 | try:
244 | self.create_dvi(tex_document, dvi)
245 | dimensions = self.create_image(dvi)
246 | if self.__is_epub:
247 | for key, val in dimensions.items():
248 | dimensions[key] = int(round(val))
249 | return dimensions
250 | except OSError:
251 | remove_all('%s.%s' % (base_name, self.__format.value))
252 | raise
253 |
254 | def parse_latex_log(self, logdata):
255 | """Parse the LaTeX error output and return the relevant part of it."""
256 | if not logdata:
257 | return None
258 | line = None
259 | for line in logdata.split('\n'):
260 | if line.startswith('! '):
261 | line = line[2:]
262 | break
263 | if line: # try to remove LaTeX line numbers
264 | lineno = re.search(r'\s*on input line \d+', line)
265 | if lineno:
266 | line = line[: lineno.span()[0]] + line[lineno.span()[1]:]
267 | return line
268 | return None
269 |
270 |
271 | def fontsize2dpi(size_pt):
272 | """This function calculates the DPI for the resulting image. Depending on
273 | the font size, a different resolution needs to be used. According to the
274 | dvipng manual page, the formula is:
275 |
276 | <dpi> = <font_px> * 72.27 / 10 [px * TeXpt/in / TeXpt]
277 | """
278 | size_px = size_pt * 1.3333333 # and more 3s!
279 | return size_px * 72.27 / 10
280 |
281 |
282 | def create_png(dvi_fn, output_name, dpi, background):
283 | """Create a PNG file from a given dvi file. The side effect is the PNG file
284 | being written to disk. By default, the background of the resulting image is
285 | transparent; setting any other value will make it use whatever is set
286 | in the DVI file.
287 |
288 | :param dvi_fn Dvi file name
289 | :param output_name Output file name
290 | :param dpi Output resolution
291 | :param background Background colour (default: transparent)
292 | :return dimensions for embedding into an HTML document
293 | :raises ValueError raised whenever dvipng output couldn't be parsed
294 | """
295 | if not output_name:
296 | raise ValueError('Empty output_name')
297 | cmd = ['dvipng', '-q*', '-D', str(dpi)]
298 | if background == 'transparent':
299 | cmd += ['-bg', background]
300 | cmd += [
301 | '--height*',
302 | '--depth*',
303 | '--width*', # print information for embedding
304 | '-o',
305 | output_name,
306 | dvi_fn,
307 | ]
308 | data = None
309 | try:
310 | data = proc_call(cmd, install_recommends='dvipng')
311 | except subprocess.SubprocessError:
312 | remove_all(output_name)
313 | raise
314 | finally:
315 | remove_all(dvi_fn)
316 | for line in data.split('\n'):
317 | found = DVIPNG_REGEX.search(line)
318 | if found:
319 | return dict(zip(['depth', 'height', 'width'], map(float, found.groups())))
320 | raise ValueError('Could not parse dvipng output: ' + repr(data))
321 |
322 |
323 | def create_svg(dvi_fn, output_name):
324 | """Create a SVG file from a given dvi file. The side effect is the SVG file
325 | being written to disk.
326 |
327 | :param dvi_fn Dvi file name
328 | :param output_name Output file name
329 | :param size font size in pt
330 | :return dimensions for embedding into an HTML document
331 | :raises ValueError raised whenever dvisvgm output couldn't be parsed
332 | """
333 | if not output_name:
334 | raise ValueError('Empty output_name')
335 | cmd = [
336 | 'dvisvgm',
337 | '--exact',
338 | '--no-fonts',
339 | '-o',
340 | output_name,
341 | '--bbox=preview',
342 | dvi_fn,
343 | ]
344 | data = None
345 | try:
346 | data = proc_call(cmd, install_recommends='texlive-binaries')
347 | except subprocess.SubprocessError:
348 | remove_all(output_name)
349 | raise
350 | finally:
351 | remove_all(dvi_fn)
352 | pos = {}
353 | for line in data.split('\n'):
354 | if not pos:
355 | found = DVISVGM_DEPTH_REGEX.search(line)
356 | if found:
357 | # convert from pt to px (assuming 96 dpi)
358 | pos['depth'] = float(found.groups()[0]) * 1.3333333
359 | else:
360 | found = DVISVGM_SIZE_REGEX.search(line)
361 | if found:
362 | pos.update(
363 | dict(
364 | zip(
365 | ['width', 'height'],
366 | # convert from pt to px (assuming 96 dpi)
367 | (float(v) * 1.3333333 for v in found.groups()),
368 | )
369 | )
370 | )
371 | return pos
372 | raise ValueError('Could not parse dvisvgm output: ' + repr(data))
373 |
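As a worked example of the conversion performed by `fontsize2dpi` above, the following minimal sketch restates the two steps: points are first converted to pixels (assuming 96 px per inch, so 4/3 px per pt), then the dvipng formula is applied. The name `fontsize_to_dpi` is hypothetical and not part of gleetex; the exact fraction 4/3 stands in for the literal `1.3333333` used in the module.

```python
def fontsize_to_dpi(size_pt):
    """Translate a font size in pt into a dvipng resolution (sketch)."""
    # 1 pt = 1/72 in; at an assumed 96 px/in this is 96/72 = 4/3 px per pt
    size_px = size_pt * 4 / 3
    # dvipng manual: <dpi> = <font_px> * 72.27 / 10
    return size_px * 72.27 / 10


# Print the resolution for a few common font sizes.
for size in (10, 12, 14):
    print(size, round(fontsize_to_dpi(size), 2))
```

For 12 pt this yields about 115.6, which is close to the default resolution of 115 dpi stored in `self.__size[0]`.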
--------------------------------------------------------------------------------
/gleetex/pandoc.py:
--------------------------------------------------------------------------------
1 | # (c) 2013-2018 Sebastian Humenda
2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for
3 | # more details.
4 | """This module contains functionality to parse formulas from a given Pandoc
5 | document AST and to replace these through formatted HTML equations.
6 |
7 | It works in these passes:
8 |
9 | 1. Extract all math elements from the Pandoc AST.
10 | 2. Convert all formulas to images
11 | * LaTeX is the slowest bit in this process, therefore the formulas are
12 | collected and then converted in parallel.
13 | 3. Replace all math tags in the pandoc AST by raw HTML inline formatting
14 | instructions that reference the converted images and position them
15 | correctly. Note that this cannot be made HTML-independent because of the
16 | requirement to use vertical alignment that is not supported by the Pandoc
17 | AST and is hence expressed as a CSS styling instruction.
18 | """
19 |
20 | import json
21 |
22 | from .htmlhandling import ParseException
23 |
24 |
25 | def __extract_formulas(formulas, ast):
26 | """Recursively extract 'Math' elements from the given AST and add them to
27 | `formulas (list)`."""
28 | if isinstance(ast, list):
29 | for item in ast:
30 | __extract_formulas(formulas, item)
31 | elif isinstance(ast, dict):
32 | if 't' in ast and ast['t'] == 'Math':
33 | style, formula = ast['c']
34 | # style = {'t': 'blah'} -> we want blah
35 | style = next(iter(style.values()))
36 | if style not in ['InlineMath', 'DisplayMath']:
37 | raise ParseException(
38 | '[pandoc] unknown formula formatting: ' + repr(ast['c'])
39 | )
40 | style = True if style == 'DisplayMath' else False
41 | # position is None (only applicable for HTML parsing)
42 | formulas.append((None, style, formula))
43 | elif 'c' in ast:
44 | __extract_formulas(formulas, ast['c'])
45 | # ^ all other cases do not matter
46 |
47 |
48 | def extract_formulas(ast):
49 | """Extract formulas from a given Pandoc document AST. The returned formulas
50 | are typed like those from the HTML parser, therefore the first argument of
51 | the tuple is unused and hence None.
52 |
53 | :param ast Structure of lists and dicts representing a Pandoc document AST
54 | :return a list of formulas where each formula is (None, style, formula)
55 | """
56 | formulas = []
57 | __extract_formulas(formulas, ast['blocks'])
58 | return formulas
59 |
60 |
61 | def replace_formulas_in_ast(formatter, ast, formulas):
62 | """Replace 'Math' elements from the given AST with a formatted variant. Each
63 | 'Math' element found in the Pandoc AST will be replaced by a formatted
64 | (HTML) image link.
65 |
66 | The formulas are taken from the supplied formulas list. The number
67 | of formulas in the document has to match the number of formulas from
68 | the list.
69 | """
70 | if not formulas:
71 | return
72 | if isinstance(ast, list):
73 | for item in ast:
74 | replace_formulas_in_ast(formatter, item, formulas)
75 | elif isinstance(ast, dict):
76 | if 't' in ast and ast['t'] == 'Math':
77 | ast['t'] = 'RawInline' # raw HTML
78 | eqn = formulas.pop(0)
79 | ast['c'] = [
80 | 'html',
81 | formatter.format(
82 | eqn['pos'], eqn['formula'], eqn['path'], eqn['displaymath']
83 | ),
84 | ]
85 | elif 'c' in ast:
86 | replace_formulas_in_ast(formatter, ast['c'], formulas)
87 | # ^ ignore all other cases
88 |
89 |
90 | def write_pandoc_ast(file, document, formatter):
91 | """Replace 'Math' elements from a Pandoc AST with 'RawInline' elements,
92 | containing formatted HTML image tags.
93 |
94 | :param file A writable file object that receives the JSON-encoded AST
95 | :param document A tuple (ast, formulas); each formula provides pos, formula, path and displaymath
96 | :param formatter A formatter offering the "format" method (see ImageFormatter)
97 | """
98 | ast, formulas = document
99 | replace_formulas_in_ast(formatter, ast['blocks'], formulas)
100 | file.write(json.dumps(ast))
101 |
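To illustrate the extraction pass described in this module, here is a minimal, self-contained sketch that walks a hand-written fragment of a Pandoc AST and collects its 'Math' nodes in the same `(None, displaymath, formula)` shape. The helper name `collect_math` and the sample document are assumptions made for illustration; the sketch mirrors the logic above but does not import gleetex.

```python
def collect_math(node, found):
    """Recursively append (None, is_displaymath, formula) tuples to found."""
    if isinstance(node, list):
        for item in node:
            collect_math(item, found)
    elif isinstance(node, dict):
        if node.get('t') == 'Math':
            style, formula = node['c']
            # style looks like {'t': 'InlineMath'} or {'t': 'DisplayMath'}
            found.append((None, style['t'] == 'DisplayMath', formula))
        elif 'c' in node:
            collect_math(node['c'], found)
    # plain strings and numbers carry no formulas and are ignored


# Hand-written sample resembling the JSON that Pandoc passes to a filter.
SAMPLE_AST = {
    'blocks': [
        {'t': 'Para', 'c': [
            {'t': 'Str', 'c': 'Euler:'},
            {'t': 'Space'},
            {'t': 'Math', 'c': [{'t': 'InlineMath'}, 'e^{i\\pi} + 1 = 0']},
        ]},
        {'t': 'Para', 'c': [
            {'t': 'Math', 'c': [{'t': 'DisplayMath'}, '\\sum_{k=1}^n k']},
        ]},
    ]
}

formulas = []
collect_math(SAMPLE_AST['blocks'], formulas)
print(formulas)
```

The first tuple element stays `None` because a position is only meaningful for the HTML parser; the order of the list is the document order, which is what the replacement pass relies on when it pops formulas one by one.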
--------------------------------------------------------------------------------
/gleetex/parser.py:
--------------------------------------------------------------------------------
1 | # (c) 2013-2021 Sebastian Humenda
2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for
3 | # more details.
4 | """Top-level API to parse input documents.
5 |
6 | The main point of the parsing is to extract formulas from a given input
7 | document, while preserving the remaining formatting. The returned parsed
8 | document structure is highly dependent on the input format and hence
9 | documented in their respective functions.
10 | """
11 |
12 | import enum
13 | import json
14 | import sys
15 |
16 | from . import htmlhandling
17 | from . import pandoc
18 |
19 | ParseException = (
20 | htmlhandling.ParseException
21 | ) # re-export for consistent API from outside
22 |
23 |
24 | class Format(enum.Enum):
25 | HTML = 0
26 | # while this is json, we never know what other applications might decide to
27 | # use json as their intermediate representation ;)
28 | PANDOCFILTER = 1
29 |
30 | @staticmethod
31 | def parse(string):
32 | string = string.lower()
33 | if string == 'html':
34 | return Format.HTML
35 | if string == 'pandocfilter':
36 | return Format.PANDOCFILTER
37 | raise ValueError('unrecognised format: %s' % string)
38 |
39 |
40 | def parse_document(doc, fmt):
41 | """This function parses an input document (string or bytes) with the given
42 | format specifier. For HTML, the returned "parsed" document is a list of
43 | chunks, where raw chunks are just plain HTML instructions and data and
44 | formula chunks are parsed from the '<eq>' tags. If the input document is a
45 | pandoc AST, the formulas will be extracted and the document is a tuple of
46 | (pandoc AST, formulas).
47 |
48 | :param doc input of bytes or string to parse
49 | :param fmt either the enum type `Format` or a string understood by Format.parse
50 | :return (encoding, document) (a tuple)
51 | """
52 | if isinstance(fmt, str):
53 | fmt = Format.parse(fmt)
54 | encoding = None
55 | if fmt == Format.HTML:
56 | docparser = htmlhandling.EqnParser()
57 | docparser.feed(doc)
58 | encoding = docparser.get_encoding()
59 | encoding = encoding if encoding else 'utf-8'
60 | doc = docparser.get_data()
61 | elif fmt == Format.PANDOCFILTER:
62 | if isinstance(doc, bytes):
63 | doc = doc.decode(sys.getdefaultencoding())
64 | ast = json.loads(doc)
65 | formulas = pandoc.extract_formulas(ast)
66 | doc = (ast, formulas) # ← see doc string
67 | if not encoding:
68 | encoding = sys.getdefaultencoding()
69 | return encoding, doc
70 |
--------------------------------------------------------------------------------
/gleetex/sink.py:
--------------------------------------------------------------------------------
1 | """
2 | Sink functionality for outputs.
3 |
4 |
5 | GladTeX is capable of writing to different output formats, called sinks. A sink
6 | may parse the source and process the formulas in the document, replacing them with
7 | their converted equivalents. This decouples the GleeTeX-internal logic from HTML and
8 | allows using it e.g. as a filter for Pandoc (JSON-encoded).
9 | """
10 |
11 | import enum
12 | import html
13 |
14 | EXCLUSION_FILE_NAME = 'excluded-descriptions.html'
15 |
16 | # Todo: localisation
17 | HTML_TEMPLATE_HEAD = """
18 |
20 | \n
21 |
22 | Excluded Formulas
23 |
24 |
25 | """
26 |
27 |
28 | class SinkType(enum.Enum):
29 | """The type of sink to use. """
30 | drop = 0
31 | html_body = 1
32 | html_file = 2
33 | json_file = 3
34 | inline = 4
35 |
36 | def html_write_excluded_file(exclusion_filename, formatted_excluded_formulas):
37 | """Write back list of excluded formulas.
38 | Formulas that are too long or too complex for the alt tag are excluded to a
39 | separate file. This function initiates the writing process to the external
40 | file."""
41 | with open(exclusion_filename, 'w', encoding='UTF-8') as file:
42 | file.write(HTML_TEMPLATE_HEAD)
43 | _html_write_excluded(file, formatted_excluded_formulas)
44 | file.write('\n\n\n')
45 |
46 |
47 | def html_write_excluded_body(exclusion_filename, formatted_excluded_formulas):
48 | with open(exclusion_filename, 'w', encoding='UTF-8') as file:
49 | _html_write_excluded(file, formatted_excluded_formulas)
50 |
51 |
52 | def _html_write_excluded(file_obj, formatted_excluded_formulas):
53 | for label, formula in formatted_excluded_formulas.items():
54 | escaped_formula = html.escape(formula)
55 | file_obj.write(f'
\n')
56 |
57 |
58 | # Map the sink type to their processing function.
59 | EXCLUSION_FORMULA_SINKS = {
60 | SinkType.html_file: html_write_excluded_file,
61 | SinkType.html_body: html_write_excluded_body,
62 | }
63 |
--------------------------------------------------------------------------------
/gleetex/typesetting.py:
--------------------------------------------------------------------------------
1 | # (c) 2013-2021 Sebastian Humenda
2 | # This code is licenced under the terms of the LGPL-3+, see the file COPYING for
3 | # more details.
4 | """This module contains functionality to typeset formulas for the usage in a
5 | LaTeX document (e.g. creating the preamble, replacing non-ascii letters) and to
6 | typeset LaTeX formulas in a more readable way as alternate description of the
7 | resulting image."""
8 |
9 | import inspect
10 | import re
11 |
12 | from . import unicode
13 |
14 | FORMATTING_COMMANDS = [
15 | '\\ ',
16 | '\\,',
17 | '\\;',
18 | '\\big',
19 | '\\Big',
20 | '\\left',
21 | '\\right',
22 | '\\limits',
23 | ]
24 |
25 | # A list of LaTeX math environments which place their content in math
26 | # mode but can't be used in math mode themselves (i.e. be nested). Used
27 | # to prevent bad math environment nesting while rendering (see #21).
28 | #
29 | # Assembled from
30 | # - `https://docs.mathjax.org/en/latest/input/tex/macros/index.html#environments`
31 | # - `https://en.wikibooks.org/wiki/LaTeX/Advanced_Mathematics`,
32 | # filtered by custom tests to see if LaTeX compiles with nesting. Some
33 | # environments might still be missing, but these should cover the most
34 | # common use cases.
35 | NON_NESTABLE_MATH_ENVS = [
36 | 'align*',
37 | 'align',
38 | 'alignat*',
39 | 'alignat',
40 | 'displaymath',
41 | 'empheq',
42 | 'eqnarray*',
43 | 'eqnarray',
44 | 'equation*',
45 | 'equation',
46 | 'flalign*',
47 | 'flalign',
48 | 'gather*',
49 | 'gather',
50 | 'math',
51 | 'multline*',
52 | 'multline',
53 | 'numcases',
54 | 'prooftree',
55 | 'subnumcases',
56 | 'xalignat*',
57 | 'xalignat',
58 | 'xxalignat',
59 | ]
60 | # The pattern used to detect the presence of one of the environments
61 | # above in a given formula. We look for such an environment opening
62 | # after ignoring all initial space characters and LaTeX `%` comments
63 | # *only*, as the input is supposed to only be a formula and to avoid
64 | # complex and error-prone parsing, while still supporting the presumably
65 | # most common use cases.
66 | MATH_ENV_DETECTION_PATTERN = re.compile(
67 | r'\s*(%.*(\n|\r\n?)\s*)*\\begin\{{({})\}}'
68 | .format('|'.join(re.escape(env) for env in NON_NESTABLE_MATH_ENVS)),
69 | )
70 |
71 |
72 | class DocumentSerializationException(Exception):
73 | """This error is raised whenever a non-ascii character contained in a
74 | formula could not be replaced by a LaTeX command. It provides the following
75 | attributes:
76 |
77 | formula - the formula
78 | index - position in formula
79 | upoint - unicode point.
80 | """
81 |
82 | def __init__(self, formula, index, upoint):
83 | self.formula = formula
84 | self.index = index
85 | self.upoint = upoint
86 | super().__init__(formula, index, upoint)
87 |
88 | def __str__(self):
89 | return (
90 | 'could not find LaTeX replacement command for unicode '
91 | 'character %d, index %d in formula %s'
92 | ) % (self.upoint, self.index, self.formula)
93 |
94 |
95 | def escape_unicode_maths(formula, replace_alphabeticals=True):
96 | """This function uses the unicode table to replace any non-ascii character
97 | (identified with its unicode code point) with a LaTeX command.
98 |
99 | It also parses the formula for commands as e.g. \\text or \\mbox
100 | and applies text-mode commands within them. This allows the
101 | conversion of formulas with unicode maths with old-style LaTeX2e,
102 | which gleetex depends on.
103 | """
104 | if not any(ord(ch) > 160 for ch in formula):
105 | return formula # no umlauts, no replacement
106 |
107 | # characters in math mode need a different replacement than in text mode.
108 | # Therefore, the string has to be split into parts of math and text mode.
109 | chunks = []
110 | if not ('\\text' in formula or '\\mbox' in formula):
111 | # no text mode, so treat the whole formula as a single math chunk
112 | chunks = [formula]
113 | else:
114 | start = 0
115 | while '\\text' in formula[start:] or '\\mbox' in formula[start:]:
116 | index = formula[start:].find('\\text')
117 | if index < 0:
118 | index = formula[start:].find('\\mbox')
119 | opening_brace = formula[start + index:].find('{') + start + index
120 | # add text before text-alike command and the command itself to chunks
121 | chunks.append(formula[start:opening_brace])
122 | closing_brace = get_matching_brace(formula, opening_brace)
123 | # add text-mode stuff
124 | chunks.append(formula[opening_brace: closing_brace + 1])
125 | start = closing_brace + 1
126 | # add last chunk
127 | chunks.append(formula[start:])
128 |
129 | is_math = True
130 | for index, chunk in enumerate(chunks):
131 | try:
132 | chunks[index] = replace_unicode_characters(
133 | chunk, is_math, replace_alphabeticals=replace_alphabeticals
134 | )
135 | except ValueError as e: # unicode point missing
136 | index = int(e.args[0])
137 | raise DocumentSerializationException(
138 | formula, index, ord(formula[index])
139 | ) from None
140 | is_math = not is_math
141 | return ''.join(chunks)
142 |
143 |
144 | def replace_unicode_characters(characters, is_math, replace_alphabeticals=True):
145 | """Replace all non-ascii characters within the given string with their
146 | LaTeX equivalent. The boolean is_math indicates whether the amsmath
147 | (math-mode) commands or the text-mode commands (as used within
148 | \\text{}) should be used.
149 |
150 | When replace_alphabeticals is True, \\text{für} is replaced by
151 | \\text{f\"{u}r}. When it is False, alphabetical characters are not
152 | replaced by their LaTeX command, so that text within \\text{} (and
153 | similar) is not garbled. This is useful for the alt attribute of an
154 | image, where the reader might want to read the normal text as such.
155 |
156 | This function raises a ValueError if a unicode point is not in the
157 | table. The first argument of the ValueError is the index within the
158 | string where the unknown unicode character has been encountered.
159 | """
160 | result = []
161 | for idx, character in enumerate(characters):
162 | if (
163 | ord(character) < 168
164 | ): # ignore normal ascii character and unicode control sequences
165 | result.append(character)
166 | # treat alphanumerical characters differently when in text mode, see doc
167 | # string; don't replace alphabeticals if specified
168 | elif character.isalpha() and not replace_alphabeticals:
169 | result.append(character)
170 | else:
171 | mode = unicode.LaTeXMode.mathmode if is_math else unicode.LaTeXMode.textmode
172 | commands = unicode.unicode_table.get(ord(character))
173 | if not commands: # unicode point missing in table
174 | # is caught one level above; provide index for more concise error output
175 | raise ValueError(characters.index(character))
176 | # if math mode and only a text alternative exists, add \\text{}
177 | # around it
178 | if mode == unicode.LaTeXMode.mathmode and mode not in commands:
179 | result.append('\\text{%s}' %
180 | commands[unicode.LaTeXMode.textmode])
181 | else:
182 | result.append(commands[mode])
183 | # if the next character is alphabetical, add space
184 | if (
185 | (idx + 1) < len(characters)
186 | and characters[idx + 1].isalpha()
187 | and result[-1][-1].isalpha()
188 | ):
189 | result.append(' ')
190 | return ''.join(result)
191 |
192 |
193 | def get_matching_brace(string, pos_of_opening_brace):
194 | if string[pos_of_opening_brace] != '{':
195 | raise ValueError(
196 | 'index %s in string %s: not an opening brace'
197 | % (pos_of_opening_brace, repr(string))
198 | )
199 | counter = 1
200 | for index, ch in enumerate(string[pos_of_opening_brace + 1:]):
201 | if ch == '{':
202 | counter += 1
203 | elif ch == '}':
204 | counter -= 1
205 | if counter == 0:
206 | return pos_of_opening_brace + index + 1
207 | if counter != 0:
208 | raise ValueError('Unbalanced braces in formula ' + repr(string))
209 |
210 |
211 | # pylint: disable=too-many-instance-attributes
212 | class LaTeXDocument:
213 | """This class represents a LaTeX document.
214 |
215 | It is intended to contain an equation as main content and properties
216 | to customize it. Its main purpose is to provide a str method which
217 | will serialize it to a full LaTeX document.
218 | """
219 |
220 | def __init__(self, eqn):
221 | self.__encoding = None
222 | self.__equation = eqn
223 | self.__displaymath = False
224 | self.__fontsize = 12
225 | self.__background_color = None
226 | self.__foreground_color = None
227 | self._preamble = ''
228 | self.__maths_env = None
229 | self.__replace_nonascii = False
230 |
231 | def _parse_color(self, color):
232 | # could be a valid color name
233 | try: # hex number?
234 | return int(color, 16)
235 | except ValueError:
236 | return color # treat as normal dvips compatible colour name
237 |
238 | def set_background_color(self, color):
239 | """Set the background color.
240 |
241 | The `color` can be either a valid dvips colour name or a hexadecimal
242 | RGB value such as `FF0000`. If unset, the image will be transparent.
243 | """
244 | self.__background_color = self._parse_color(color)
245 |
246 | def set_foreground_color(self, color):
247 | """Set the foreground color.
248 |
249 | The `color` can be either a valid dvips colour name or a hexadecimal
250 | RGB value such as `FF0000`. If unset, the text will be black.
251 | """
252 | self.__foreground_color = self._parse_color(color)
253 |
254 | def set_replace_nonascii(self, flag):
255 | """If True, all non-ascii characters will be replaced by a LaTeX
256 | command."""
257 | self.__replace_nonascii = flag
258 |
259 | def set_latex_environment(self, env):
260 | """Set maths environment name like `displaymath` or `flalign*`."""
261 | self.__maths_env = env
262 |
263 | def get_latex_environment(self):
264 | return self.__maths_env
265 |
266 | def get_encoding(self):
267 | """Return encoding for the document (or None)."""
268 | return self.__encoding
269 |
270 | def set_preamble_string(self, p):
271 | """Set the string to add to the preamble of the LaTeX document."""
272 | self._preamble = p
273 |
274 | def set_encoding(self, encoding):
275 | """Set the encoding as used by the inputenc package."""
276 | if encoding.lower().startswith('utf') and '8' in encoding:
277 | self.__encoding = 'utf8'
278 | elif (
279 | encoding.lower().startswith('iso') and '8859' in encoding
280 | ) or encoding.lower() == 'latin1':
281 | self.__encoding = 'latin1'
282 | else:
283 | # if you plan to add an encoding, you have to adjust the str
284 | # function, which also loads the fontenc package
285 | raise ValueError(
286 | (
287 | 'Encoding %s is not supported at the moment. If '
288 | 'you want to use LaTeX 2e, you should report a bug at the home '
289 | 'page of GladTeX.'
290 | )
291 | % encoding
292 | )
293 |
294 | def set_displaymath(self, flag):
295 | """Set whether the formula is set in displaymath."""
296 | if not isinstance(flag, bool):
297 | raise TypeError('Displaymath parameter must be of type bool.')
298 | self.__displaymath = flag
299 |
300 | def is_displaymath(self):
301 | return self.__displaymath
302 |
303 | def _get_encoding_preamble(self):
304 | # first check whether there are umlauts within the formula and if so, an
305 | # encoding has been set
306 | if any(ord(ch) > 128 for ch in self.__equation) and not self.__replace_nonascii:
307 | if not self.__encoding:
308 | raise ValueError(
309 | (
310 | 'No encoding set, but non-ascii characters '
311 | 'present. Please specify an encoding.'
312 | )
313 | )
314 | encoding_preamble = ''
315 | if self.__encoding:
316 | # try to guess language and hence character set (fontenc)
317 | import locale
318 |
319 | language = locale.getdefaultlocale()
320 | if language and language[0]: # extract just the language code
321 | language = language[0].split('_')[0]
322 | if not language or not language[0]:
323 | language = 'en'
324 | # check whether language on computer is within T1 and hence whether
325 | # it should be loaded; I know that this can be a misleading
326 | # assumption, but there's no better way that I know of
327 | if language in ['fr', 'es', 'it', 'de', 'nl', 'ro', 'en']:
328 | encoding_preamble += '\n\\usepackage[T1]{fontenc}'
329 | else:
330 | raise ValueError(
331 | (
332 | 'Language not supported by T1 fontenc '
333 | 'encoding; please report this to the GladTeX project.'
334 | )
335 | )
336 | return encoding_preamble
337 |
338 | def set_fontsize(self, size_in_pt):
339 | """Set fontsize in pt, 12 pt by default."""
340 | self.__fontsize = size_in_pt
341 |
342 | def get_fontsize(self):
343 | return self.__fontsize
344 |
345 | def __str__(self):
346 | preamble = (
347 | self._get_encoding_preamble()
348 | + ('\n\\usepackage[utf8]{inputenc}\n\\usepackage{amsmath, amssymb}' '\n')
349 | + (self._preamble if self._preamble else '')
350 | )
351 | return self._format_document(preamble)
352 |
353 | def _format_color_definition(self, which):
354 | color = getattr(self, '_%s__%s_color' %
355 | (self.__class__.__name__, which))
356 | if not color or isinstance(color, str):
357 | return ''
358 | return '\\definecolor{%s}{HTML}{%s}' % (which, hex(color)[2:].upper().zfill(6))
359 |
360 | def _format_colors(self):
361 | color_defs = (
362 | self._format_color_definition('background'),
363 | self._format_color_definition('foreground'),
364 | )
365 | color_body = ''
366 | if self.__background_color:
367 | color_body += '\\pagecolor{%s}' % (
368 | 'background' if color_defs[0] else self.__background_color
369 | )
370 | if self.__foreground_color:
371 | # opening brace isn't required here, inserted automatically
372 | color_body += '\\color{%s}' % (
373 | 'foreground' if color_defs[1] else self.__foreground_color
374 | )
375 | return (''.join(color_defs), color_body)
376 |
377 | def _format_document(self, preamble):
378 | """Return a formatted LaTeX document with the specified formula
379 | embedded."""
380 | formula = self.__equation.lstrip().rstrip()
381 | if self.__replace_nonascii:
382 | formula = escape_unicode_maths(formula, replace_alphabeticals=True)
383 | # Try to detect and support the usage of math environments which
384 | # cannot be nested in other math environments in order to
385 | # prevent invalid nesting. I.e., when found, such an environment
386 | # is not wrapped (fixing #21).
387 | if MATH_ENV_DETECTION_PATTERN.match(formula):
388 | opening = closing = ''
389 | elif self.__maths_env:
390 | opening = '\\begin{%s}' % self.__maths_env
391 | closing = '\\end{%s}' % self.__maths_env
392 | else:
393 | # determine characters with which to surround the formula
394 | opening = '\\[' if self.__displaymath else '\\('
395 | closing = '\\]' if self.__displaymath else '\\)'
396 | fontsize = 'fontsize=%ipt' % self.__fontsize
397 | color_preamble, color_body = self._format_colors()
398 | return inspect.cleandoc(
399 | f"""
400 | \\PassOptionsToPackage{{dvipsnames}}{{xcolor}}\n
401 | \\documentclass[{fontsize}, fleqn]{{scrartcl}}\n
402 | {preamble}
403 | \\usepackage{{xcolor}}
404 | {color_preamble}
405 | {color_body}
406 | % tightpage must be last, see its package docs
407 | \\usepackage[active,textmath,displaymath,tightpage]{{preview}}\n
408 | \\begin{{document}}\n
409 | \\noindent%
410 | \\begin{{preview}}{{%s
411 | {opening}{formula}{closing}}}\\end{{preview}}\n
412 | \\end{{document}}\n
413 | """
414 | )
415 |
416 |
417 | def increase_readability(formula, replace_nonascii=False):
418 | """In alternate texts for non-image users or those using a screen reader,
419 | the LaTeX code should be as readable as possible.
420 |
421 | Therefore the formula should not contain unicode characters or
422 | formatting instructions.
423 | """
424 | if replace_nonascii:
425 | # keep umlauts, etc; makes the alt more readable, yet wouldn't compile
426 | formula = escape_unicode_maths(formula, replace_alphabeticals=False)
427 | # replace formatting-only symbols which distract the reader
428 | formula_changed = True
429 | while formula_changed:
430 | formula_changed = False
431 | for command in FORMATTING_COMMANDS:
432 | idx = formula.find(command)
433 | # only replace if it's not after a \\ and not part of a longer command
434 | if (idx > 0 and formula[idx - 1] != '\\') or idx == 0:
435 | end = idx + len(command)
436 | # one of the following conditions for replacement must be met:
437 | # end of string reached OR command does not end on an alphabetical
438 | # character OR command is not followed by an alphabetical character
439 | # (otherwise it would be the prefix of a longer command)
440 | if (
441 | end >= len(formula)
442 | or not command[-1].isalpha()
443 | or not formula[end].isalpha()
444 | ):
445 | formula = formula[:idx] + ' ' + \
446 | formula[idx + len(command):]
447 | formula = formula.replace(' ', ' ')
448 | formula_changed = True
449 | return formula
450 |
--------------------------------------------------------------------------------
/manpage.md:
--------------------------------------------------------------------------------
1 | % GLADTEX(1)
2 | % Sebastian Humenda
3 | % 5th of June 2021
4 |
5 | # NAME
6 |
7 | **GladTeX** - generate HTML with LaTeX formulas embedded as images
8 |
9 | # SYNOPSIS
10 |
11 | **gladtex** [OPTIONS]
12 |
13 |
14 | # DESCRIPTION
15 |
16 | **GladTeX** is a formula preprocessor for HTML files. It recognizes a special tag
17 | (`<eq>...</eq>`) marking formulas for conversion. The converted vector images
18 | are integrated into the output HTML document.
19 | This eases the process of creating HTML
20 | documents (or web sites) containing formulas.\
21 | The generated images are saved in a cache to not render the same image over
22 | and over again. This speeds up the process when formulas occur multiple times or
23 | when a document is extended gradually.
24 |
25 | The LaTeX formulas are preserved in the alt attribute of the embedded images,
26 | hence screen reader users benefit from an accessible HTML version of the
27 | document.
28 |
29 | Furthermore it can be used with Pandoc to convert Markdown documents and other
30 | formats with LaTeX formulas to HTML, EPUB and in fact to any HTML-based format,
31 | see the option `-P`.
32 |
33 | See [FILE FORMAT](#file-format) for an explanation of the file format and
34 | [EXAMPLES](#examples) for examples on how to use GladTeX on its own or with
35 | Pandoc.
36 |
37 | # OPTIONS
38 |
39 | **INPUT FILE NAME**
40 | : Input .htex file with LaTeX formulas (if omitted or -, stdin will be read).
41 |
42 | **-h** **--help**
43 | : Show this help message and exit.
44 |
45 | **-a**
46 | : Save text alternatives for images which are too long for the alt attribute
47 | into a single separate file and link images to it.
48 |
49 | **-b** _BACKGROUND_COLOR_
50 | : Set background color for resulting images (default transparent). GladTeX
51 | understands colors as provided by the `dvips` option of the xcolor LaTeX
52 | package. Alternatively, a 6-digit hexadecimal value can be provided (as used
53 | e.g. in HTML/CSS).
54 |
55 | **-c** _`FOREGROUND_COLOR`_
56 | : Set foreground color for resulting images. See the option above for a more
57 | in-depth explanation.
58 |
59 | **-d** _DIRECTORY_
60 | : Directory in which to store the generated images (relative path).\
61 |     The given path is interpreted relative to the input file. For instance:
62 |
63 |         gladtex -d img dir/file.htex
64 |
65 |     will create a `dir/img` directory and link to it accordingly in `dir/file.html`.
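
As a rough illustration of how the image directory is resolved relative to the input file, the following Python sketch mirrors the example above. The helper name `image_directory` is hypothetical and not part of GladTeX; `posixpath` is used so that the result does not depend on the platform.

```python
import posixpath


def image_directory(input_file, directory):
    """Join the -d argument onto the directory that contains the input file.

    Hypothetical helper for illustration only; GladTeX's own code may differ.
    """
    return posixpath.join(posixpath.dirname(input_file), directory)


print(image_directory('dir/file.htex', 'img'))  # dir/img
```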
66 |
67 | **-e** _`LATEX_MATHS_ENV`_
68 | : Set custom maths environment to surround the formula (e.g. flalign).
69 |
70 | **-E** _ENCODING_
71 | : Override the encoding to use (default UTF-8).
72 |
73 | **--epub**
74 | : Make embedded formula image more EPUB-compliant, i.e. round pixel sizes to
75 | integers.
76 |
77 | **-f** _FONTSIZE_
78 | : Override the default font size of 12pt. 12pt is the default in most
79 | browsers and hence changing this might lead to less-portable documents.
80 |
81 | **-i** _CLASS_
82 | : CSS class to assign to inline math (default: 'inlinemath').
83 |
84 | **-K**
85 | : Keep LaTeX file(s) when converting formulas.
86 |
87 | By default, the generated LaTeX document containing the formula to be
88 | converted is removed after the conversion (no matter whether it was
89 | successful or not). If it wasn't successful, it is sometimes helpful to look
90 | at the complete document. This option will keep the file.
91 |
92 | **-l** _CLASS_
93 | : CSS class to assign to block-level math (default: 'displaymath').
94 |
95 | **-n**
96 | : Purge unreadable caches along with all eqn*.png files.
97 |
98 | Caches can be unreadable if the used GladTeX version is incompatible. If
99 | this option is unset, GladTeX will simply fail when the cache is unreadable.
100 |
101 | **-m**
102 | : Print error output in machine-readable format (less concise, easier to parse).
103 |
104 | Each line will start with a key, followed by a colon, followed by the value,
105 | e.g. `line: 5`.
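
A minimal sketch of how a calling program might consume this format, assuming only what is stated above (one `key: value` pair per line). The function name `parse_machine_readable` and the `message` key in the sample are hypothetical; only `line: 5` comes from this manual.

```python
def parse_machine_readable(output):
    """Collect `key: value` lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in output.splitlines():
        # split at the first colon only, so values may contain colons themselves
        key, separator, value = line.partition(':')
        if separator:
            fields[key.strip()] = value.strip()
    return fields


# hypothetical sample; the `message` key is an assumption for illustration
sample = 'line: 5\nmessage: undefined control sequence'
print(parse_machine_readable(sample)['line'])  # 5
```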
106 |
107 | **-o** _FILENAME_
108 | : Set output file name. '-' will print text to stdout. By default, the input file
109 | name is used and the `.htex` extension is replaced by `.html`.
110 |
111 | **-p** _`LATEX_STATEMENT`_
112 | : Add given LaTeX code to the preamble of the LaTeX document that is used to
113 | generate the embedded images. In order to add the contents of a file to the
114 | preamble, use `-p "\input{FILE}"`.
115 |
116 | **-P**
117 | : Act as a pandoc filter. In this mode, input is expected to be a Pandoc JSON
118 | AST and the output will be a modified AST, with all formulas replaced
119 | by HTML image tags. It makes sense to use `-` as the input file for
120 | this option.
121 | This option implies `-E UTF-8`. Also see [GLADTEX_ARGS](#gladtex_args) on
122 | how to invoke GladTeX as a pandoc filter and how to pass arguments in this
123 | mode.
124 |
125 | **--png**
126 | : Switch from SVG to PNG as image output. This image format has several known issues,
127 | one of them being that images won't resize when zooming into the document.
128 | It is also harder to work with for visually impaired users.
129 |
130 | **-r** _DPI_
131 | : Set resolution (size of images) to 'dpi' (115 by default). This is only
132 | available with the `--png` option. Also see the `-f` option.
133 |
134 | **-R**
135 | : Replace non-ascii (unicode) characters by LaTeX commands.
136 |
137 | GladTeX can automatically detect non-ascii characters in formulas and
138 | replace them with their appropriate LaTeX commands. In the alt attribute
139 | of the resulting image, alphabetical characters won't be replaced. That
140 | means that the alt text of the image is not exactly the same as the
141 | code used for generating the image, but it is far more readable.
142 |
143 | For instance, the formula \$\\text{für alle} a\$ would be compiled as
144 | \$\\text{f\\ddot{u}r alle} a\$ and displayed as "\\text{für alle} a" in the alt
145 | attribute.
146 |
147 |
148 | **-u** _URL_
149 | : Base URL to image files (relative links are default).
150 |
151 | # FILE FORMAT
152 |
153 | A .htex file is essentially an HTML file containing LaTeX formulas. The formulas
154 | have to be surrounded by `<eq>` and `</eq>`.
155 |
156 | By default, formulas are rendered as inline maths, so they are squeezed to the
157 | height of the line. It is possible to render a formula as display maths by
158 | setting the env attribute to displaymath, i.e. `<eq env="displaymath">...</eq>`.
159 |
160 | # ENVIRONMENT VARIABLES
161 |
162 | GladTeX can be customised by environment variables:
163 |
164 | `DEBUG`
165 | : If this is set to 1, a full Python traceback, instead of a human-readable
166 | error message, will be displayed.
167 | [`GLADTEX_ARGS`:]{#gladtex_args}
168 | : When this environment variable is set, GladTeX switches into
169 | the **pandoc filter** mode: input is read from standard input, output
170 | written to standard output and the `-P` and `-E UTF-8` options are assumed.
171 | The contents of this variable are parsed as command-line switches. Quoting
172 | can be done in POSIX-shell compatible syntax:
173 |
174 | ```
175 | export GLADTEX_ARGS='-d "image directory"'
176 | ```
177 |
178 | It may be empty as well, which will just imply `-P`.
179 | See an example in [Output As EPUB](#output-as-epub).
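
To illustrate what POSIX-shell compatible quoting means for the value of `GLADTEX_ARGS`, the sketch below splits the example value with Python's standard `shlex` module. Whether GladTeX uses `shlex` internally is not stated in this manual; the helper name `split_gladtex_args` is hypothetical.

```python
import shlex


def split_gladtex_args(value):
    """Split a GLADTEX_ARGS-style string into switches using POSIX quoting rules."""
    return shlex.split(value)


# the quoted directory name stays a single argument despite the space
print(split_gladtex_args('-d "image directory"'))  # ['-d', 'image directory']
# an empty value yields no extra switches
print(split_gladtex_args(''))  # []
```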
180 |
181 | # EXAMPLES
182 |
183 | ## Sample HTEX document
184 |
185 | A sample HTEX document could look like this:
186 |
187 | ~~~~
188 |
189 |
190 |
')
106 | formulas = [c for c in self.p.get_data() if isinstance(c,
107 | (tuple, list))]
108 | self.assertEqual(len(formulas), 2) # there should be _2_ formulas
109 | self.assertEqual(formulas[0][1], False) # no displaymath
110 | self.assertEqual(formulas[1][1], False) # no displaymath
111 |
112 | def test_that_unclosed_formulas_detected(self):
113 | self.assertRaises(htmlhandling.ParseException,
114 | self.p.feed, '\\pi')
115 | self.assertRaises(htmlhandling.ParseException, self.p.feed, '\\pi')
116 |
117 | def test_formula_contains_only_formula(self):
118 | p = htmlhandling.EqnParser()
119 | p.feed('
1
')
120 | formula = next(e for e in p.get_data() if isinstance(e, (list, tuple)))
121 | self.assertEqual(formula[-1], '1test')
125 | formula = next(e for e in p.get_data() if isinstance(e, (list, tuple)))
126 | self.assertEqual(formula[-1], 'test')
127 |
128 | p = htmlhandling.EqnParser()
129 | p.feed('