├── CHANGES.txt
├── .gitignore
├── MANIFEST.in
├── MANIFEST
├── setup.py
├── bin
│   ├── converted_latex_sample.md
│   └── latex_sample.tex
├── README.md
├── README.txt
└── latex2markdown.py
/CHANGES.txt:
--------------------------------------------------------------------------------
1 | v0.2, 2012-03-31 -- Fixed release.
2 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *.egg-info
2 | *.rst
3 | build/*
4 | dist/*
5 | *.log
6 | *.pdf
7 | *.aux
8 |
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include *.txt
2 | recursive-include docs *.txt
3 | recursive-include bin *.tex
4 | recursive-include bin *.md
5 | recursive-include bin *.pdf
--------------------------------------------------------------------------------
/MANIFEST:
--------------------------------------------------------------------------------
1 | # file GENERATED by distutils, do NOT edit
2 | CHANGES.txt
3 | README.txt
4 | latex2markdown.py
5 | setup.py
6 | bin/converted_latex_sample.md
7 | bin/latex_sample.pdf
8 | bin/latex_sample.tex
9 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from distutils.core import setup
2 |
3 | setup(
4 | name='latex2markdown',
5 | author="Andrew Tulloch",
6 | author_email="andrew@tullo.ch",
7 | version='0.2.1',
8 | py_modules=['latex2markdown'],
9 | scripts=['bin/converted_latex_sample.md','bin/latex_sample.tex'],
10 | url="https://github.com/ajtulloch/LaTeX2Markdown",
11 | description="An AMS-LaTeX compatible converter that maps a subset of LaTeX to Markdown/MathJaX.",
12 | classifiers=[
13 | "Development Status :: 3 - Alpha",
14 | "Environment :: Console",
15 | "Programming Language :: Python",
16 | "Topic :: Scientific/Engineering :: Mathematics",
17 | "Topic :: Software Development :: Documentation",
18 | "Topic :: Text Processing :: Markup",
19 | "Topic :: Text Processing :: Markup :: LaTeX",
20 | "Topic :: Text Processing :: Markup :: HTML"
21 | ],
22 | long_description=open("README.txt").read()
23 | )
24 |
--------------------------------------------------------------------------------
/bin/converted_latex_sample.md:
--------------------------------------------------------------------------------
1 | ### Simple Examples
2 |
3 |
4 | This section introduces the usage of the LaTeX2Markdown tool, showing an example of the various environments available.
5 |
6 | #### Theorem 1 (Euclid, 300 BC)
7 |
8 | > There are infinitely many primes.
9 |
10 |
11 | #### Proof
12 |
13 | Suppose that $p_1 < p_2 < \dots < p_n$ are all of the primes. Let $P = 1 + \prod_{i=1}^n p_i$ and let $p$ be a prime dividing $P$.
14 |
15 | Then $p$ can not be any of $p_i$, for otherwise $p$ would divide the difference $P - \left(\prod_{i=1}^n p_i \right) - 1$, which is impossible. So this prime $p$ is still another prime, and $p_1, p_2, \dots p_n$ cannot be all of the primes.
16 |
17 |
18 | #### Exercise 1
19 |
20 | > Give an alternative proof that there are an infinite number of prime numbers.
21 |
22 |
23 | To solve this exercise, we first introduce the following lemma.
24 | #### Lemma 1
25 |
26 | > The Fermat numbers $F_n = 2^{2^{n}} + 1$ are pairwise relatively prime.
27 |
28 |
29 | #### Proof
30 |
31 | It is easy to show by induction that
32 | \[ F_m - 2 = F_0 F_1 \dots F_{m-1}. \]
33 | This means that if $d$ divides both $F_n$ and $F_m$ (with $n < m$), then $d$ also divides $F_m - 2$. Hence, $d$ divides 2. But every Fermat number is odd, so $d$ is necessarily one. This proves the lemma.
34 |
35 |
36 | We can now provide a solution to the exercise.
37 |
38 | #### Theorem 2 (Goldbach, 1750)
39 |
40 | > There are infinitely many prime numbers.
41 |
42 |
43 | #### Proof
44 |
45 | Choose a prime divisor $p_n$ of each Fermat number $F_n$. By the lemma we know these primes are all distinct, showing there are infinitely many primes.
46 |
47 |
48 | ### Demonstration of the environments
49 |
50 |
51 | We can format *italic text*, **bold text**, and `code` blocks.
52 |
53 |
54 |
55 | 1. A numbered list item
56 | 1. Another numbered list item
57 |
58 |
59 |
60 |
61 | * A bulleted list item
62 | * Another bulleted list item
63 |
64 |
65 | #### Theorem 3
66 |
67 | > This is a theorem. It contains an `align` block.
68 | >
69 | > All math environments supported by MathJaX should work with LaTeX - a full list is available on the MathJaX homepage.
70 | >
71 | > Maxwell's equations, differential form.
72 | > \begin{align}
73 | > \nabla \cdot \mathbf{E} &= \frac {\rho} {\varepsilon_0} \\\\
74 | > \nabla \cdot \mathbf{B} &= 0 \\\\
75 | > \nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}} {\partial t} \\\\
76 | > \nabla \times \mathbf{B} &= \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}} {\partial t} \\\\
77 | > \end{align}
78 |
79 |
80 | #### Theorem 4 (Theorem name)
81 |
82 | > This is a named theorem.
83 |
84 |
85 | #### Lemma 2
86 |
87 | > This is a lemma.
88 |
89 |
90 | #### Proposition 1
91 |
92 | > This is a proposition
93 |
94 |
95 | #### Proof
96 |
97 | This is a proof.
98 |
99 |
100 |
101 |
102 | This is a code listing.
103 | One line of code
104 | Another line of code
--------------------------------------------------------------------------------
/bin/latex_sample.tex:
--------------------------------------------------------------------------------
1 | \documentclass[12pt]{amsart}
2 | \usepackage{amsthm, amsmath, amssymb}
3 | \usepackage{setspace}
4 | \usepackage{listings}
5 | \onehalfspacing
6 |
7 | \theoremstyle{plain}% default
8 | \newtheorem{thm}{Theorem}[section]
9 | \newtheorem{lem}[thm]{Lemma}
10 | \newtheorem{prop}[thm]{Proposition}
11 | \newtheorem{exer}[thm]{Exercise}
12 |
13 | \title{LaTeX2Markdown Examples}
14 | \author{Andrew Tulloch}
15 | \begin{document}
16 |
17 | % LaTeX2Markdown IGNORE
18 | \maketitle
19 | % LaTeX2Markdown END
20 |
21 | \section{Simple Examples}
22 |
23 | This section introduces the usage of the LaTeX2Markdown tool, showing an example of the various environments available.
24 |
25 | \begin{thm}[Euclid, 300 BC]
26 | There are infinitely many primes.
27 | \end{thm}
28 |
29 | \begin{proof}
30 | Suppose that $p_1 < p_2 < \dots < p_n$ are all of the primes. Let $P = 1 + \prod_{i=1}^n p_i$ and let $p$ be a prime dividing $P$.
31 |
32 | Then $p$ can not be any of $p_i$, for otherwise $p$ would divide the difference $P - \left(\prod_{i=1}^n p_i \right) - 1$, which is impossible. So this prime $p$ is still another prime, and $p_1, p_2, \dots p_n$ cannot be all of the primes.
33 | \end{proof}
34 |
35 | \begin{exer}
36 | Give an alternative proof that there are an infinite number of prime numbers.
37 | \end{exer}
38 |
39 | To solve this exercise, we first introduce the following lemma.
40 | \begin{lem}
41 | The Fermat numbers $F_n = 2^{2^{n}} + 1$ are pairwise relatively prime.
42 | \end{lem}
43 |
44 | \begin{proof}
45 | It is easy to show by induction that
46 | \[ F_m - 2 = F_0 F_1 \dots F_{m-1}. \]
47 | This means that if $d$ divides both $F_n$ and $F_m$ (with $n < m$), then $d$ also divides $F_m - 2$. Hence, $d$ divides 2. But every Fermat number is odd, so $d$ is necessarily one. This proves the lemma.
48 | \end{proof}
49 |
50 | We can now provide a solution to the exercise.
51 |
52 | \begin{thm}[Goldbach, 1750]
53 | There are infinitely many prime numbers.
54 | \end{thm}
55 |
56 | \begin{proof}
57 | Choose a prime divisor $p_n$ of each Fermat number $F_n$. By the lemma we know these primes are all distinct, showing there are infinitely many primes.
58 | \end{proof}
59 |
60 | \section{Demonstration of the environments}
61 |
62 | We can format \emph{italic text}, \textbf{bold text}, and \texttt{code} blocks.
63 |
64 | \begin{enumerate}
65 | \item A numbered list item
66 | \item Another numbered list item
67 | \end{enumerate}
68 |
69 | \begin{itemize}
70 | \item A bulleted list item
71 | \item Another bulleted list item
72 | \end{itemize}
73 |
74 | \begin{thm}
75 | This is a theorem. It contains an \texttt{align} block.
76 |
77 | All math environments supported by MathJaX should work with LaTeX - a full list is available on the MathJaX homepage.
78 |
79 | Maxwell's equations, differential form.
80 | \begin{align*}
81 | \nabla \cdot \mathbf{E} &= \frac {\rho} {\varepsilon_0} \\
82 | \nabla \cdot \mathbf{B} &= 0 \\
83 | \nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}} {\partial t} \\
84 | \nabla \times \mathbf{B} &= \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}} {\partial t} \\
85 | \end{align*}
86 | \end{thm}
87 |
88 | \begin{thm}[Theorem name]
89 | This is a named theorem.
90 | \end{thm}
91 |
92 | \begin{lem}
93 | This is a lemma.
94 | \end{lem}
95 |
96 | \begin{prop}
97 | This is a proposition
98 | \end{prop}
99 |
100 | \begin{proof}
101 | This is a proof.
102 | \end{proof}
103 |
104 | \begin{lstlisting}
105 | This is a code listing.
106 | One line of code
107 | Another line of code
108 | \end{lstlisting}
109 |
110 | \end{document}
111 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # LaTeX2Markdown
2 |
3 | An [AMS-LaTeX][amslatex] compatible converter from (a subset of) [LaTeX][latex] to [MathJaX][mathjax] compatible [Markdown][markdown].
4 |
5 | [amslatex]: http://en.wikipedia.org/wiki/AMS-LaTeX
6 | [latex]: http://www.latex-project.org/
7 | [mathjax]: http://www.mathjax.org/
8 | [markdown]: http://daringfireball.net/projects/markdown/
9 | [pandoc]: http://johnmacfarlane.net/pandoc/
10 | ## Who should use this?
11 |
12 | Anyone who writes LaTeX documents using the AMS-LaTeX packages (`amsmath`, `amsthm`, `amssymb`) and wants to convert these documents to Markdown format to use with MathJaX. These Markdown files can then be easily added to any web platform - Jekyll blogs, Wordpress, basic HTML sites, etc.
13 |
14 | In short, if you seek to use MathJaX to view your LaTeX documents online, then you might be interested in this.
15 |
16 | ## Demonstration
17 |
18 | Check out [tullo.ch/projects/LaTeX2Markdown](http://tullo.ch/projects/LaTeX2Markdown) for a live demonstration of the converter.
19 |
20 |
21 | ## Getting Started
22 |
23 | ### Installation
24 |
25 | The project is available on PyPI, so getting it is as simple as using
26 |
27 | pip install latex2markdown
28 |
29 | or
30 |
31 | easy_install latex2markdown
32 |
33 | ### Usage
34 |
35 | The utility can be called from the command line, or from within a Python script.
36 |
37 | For the command line, the syntax to convert a LaTeX file to a Markdown file is as follows:
38 |
39 | python -m latex2markdown path/to/latex/file path/to/output/markdown/file
40 |
41 | For example, to compile a LaTeX file `sample.tex` into a Markdown file `sample.md`, call
42 |
43 | python -m latex2markdown sample.tex sample.md
44 |
45 | To use it within a Python script (to extend it, modify output, etc.), you can use it as follows:
46 |
47 | import latex2markdown
48 | with open("latex_file.tex", "r") as f:
49 |     latex_string = f.read()
50 |
51 | l2m = latex2markdown.LaTeX2Markdown(latex_string)
52 |
53 | markdown_string = l2m.to_markdown()
54 |
55 | with open("markdown_file.md", "w") as f:
56 |     f.write(markdown_string)
57 |
58 | Finally, add the following snippet to your HTML when loading this document.
59 |
60 |
71 |
73 |
74 | For a working example, have a look at the source of the [tullo.ch](http://tullo.ch) homepage [here](https://github.com/ajtulloch/ajtulloch.github.com).
75 |
76 | ## Why not use Pandoc?
77 |
78 | [Pandoc][pandoc] is an excellent document converter for less complex LaTeX documents. Indeed, I've used it to convert this README document to a reST version for use on PyPI.
79 |
80 | Unfortunately, it is not designed to deal with documents that use the AMSTeX extensions - which include the theorem, lemma, proof, and exercise environments that are heavily used for typesetting papers, lecture notes, and other documents.
81 |
82 | As neither Pandoc nor MathJaX can deal with these documents, I hacked together a set of regular expressions that convert a subset of LaTeX to Markdown, and used a few more to convert that Markdown to MathJaX-convertible Markdown.
83 |
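The regular-expression approach described above can be illustrated with a minimal, self-contained sketch. It is not the package's actual code: `BLOCK_RE`, `PRETTY_NAMES`, and `convert_blocks` are hypothetical names, and only the `thm` and `lem` environments are handled. It assumes each block becomes a numbered `####` heading followed by a blockquoted body, as in the sample output.

```python
import re

# Hypothetical, simplified pattern: \begin{thm}[optional title] ... \end{thm}
BLOCK_RE = re.compile(
    r"""\\begin\{(?P<block_name>thm|lem)\}   # environment name
        (\[(?P<block_title>.*?)\])?          # optional title in brackets
        (?P<block_contents>.*?)              # non-greedy body
        \\end\{(?P=block_name)\}             # matching close
    """,
    flags=re.DOTALL | re.VERBOSE,
)

# Assumed display names for the two environments covered by this sketch
PRETTY_NAMES = {"thm": "Theorem", "lem": "Lemma"}


def convert_blocks(latex):
    """Replace each matched environment with a heading and a blockquote."""
    counters = {}  # per-call counters, one per environment name

    def replace(match):
        name = match.group("block_name")
        counters[name] = counters.get(name, 0) + 1
        header = "#### {} {}".format(PRETTY_NAMES[name], counters[name])
        title = match.group("block_title")
        if title:
            header += " ({})".format(title)
        lines = match.group("block_contents").strip().splitlines()
        body = "\n".join("> " + line.strip() for line in lines)
        return header + "\n\n" + body + "\n"

    return BLOCK_RE.sub(replace, latex)


sample = "\\begin{thm}[Euclid]\nThere are infinitely many primes.\n\\end{thm}"
print(convert_blocks(sample))
```

The backreference `(?P=block_name)` makes sure a `\begin{thm}` is only closed by `\end{thm}`, and the non-greedy body stops at the first matching close, so consecutive blocks are converted one at a time.
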
84 | ## Example
85 |
86 | As an example, the following LaTeX code:
87 |
88 | \section{Example Section}
89 | \begin{thm}[Euclid]
90 | There are infinitely many primes.
91 | \end{thm}
92 |
93 | \begin{proof}
94 | Suppose that $p_1 < p_2 < \dots < p_n$ are all of the primes.
95 | Let $P = 1 + \prod_{i=1}^n p_i$ and let $p$ be a prime dividing $P$.
96 |
97 | Then $p$ can not be any of $p_i$, for otherwise $p$ would divide the
98 | difference $P - \left(\prod_{i=1}^n p_i \right) - 1$, which is impossible.
99 | So this prime $p$ is still another prime, and $p_1, p_2, \dots p_n$
100 | cannot be all of the primes.
101 | \end{proof}
102 |
103 | is converted into the following Markdown:
104 |
105 | ### Example Section
106 | #### Theorem 1 (Euclid)
107 |
108 | > There are infinitely many primes.
109 |
110 | #### Proof
111 |
112 | Suppose that $p_1 < p_2 < \dots < p_n$ are all of the primes.
113 | Let $P = 1 + \prod_{i=1}^n p_i$ and let $p$ be a prime dividing $P$.
114 |
115 | Then $p$ can not be any of $p_i$, for otherwise $p$ would divide the difference
116 | $P - \left(\prod_{i=1}^n p_i \right) - 1$, which is impossible. So this prime
117 | $p$ is still another prime, and $p_1, p_2, \dots p_n$ cannot be all of the primes.
118 |
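The heading step of the example above can be sketched on its own. This is an illustrative sketch rather than the package's API: `HEADING_LEVELS`, `HEADER_RE`, and `convert_headers` are hypothetical names, and the heading levels are assumed to follow the sample output (`\section` becomes `###`).

```python
import re

# Assumed mapping from LaTeX sectioning command to Markdown heading marker
HEADING_LEVELS = {"chapter": "##", "section": "###", "subsection": "####"}

# Hypothetical pattern: \section{Title}, \subsection{Title}, \chapter{Title}
HEADER_RE = re.compile(
    r"\\(?P<header_name>chapter|section|subsection)\{(?P<header_contents>.*?)\}"
)


def convert_headers(latex):
    """Rewrite sectioning commands as Markdown headings."""
    def replace(match):
        marker = HEADING_LEVELS[match.group("header_name")]
        return "{} {}".format(marker, match.group("header_contents"))

    return HEADER_RE.sub(replace, latex)


print(convert_headers(r"\section{Example Section}"))
```

Note that a title containing nested braces would be cut at the first closing brace; a regex-only approach like this trades that generality for simplicity.
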
119 |
120 | ## Supported LaTeX/AMSTeX Environments
121 |
122 | * `emph`, `textbf`, `texttt`
123 | * `thm`
124 | * `prop`
125 | * `lem`
126 | * `exer`
127 | * `proof`
128 | * `chapter`
129 | * `section`
130 | * `subsection`
131 | * `itemize`
132 | * `enumerate`
133 |
134 | along with everything supported by MathJax - list available [online](http://www.mathjax.org/docs/2.0/tex.html#supported-latex-commands).
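
As a rough illustration of how the inline commands in the list above map to Markdown, here is a small self-contained sketch. `INLINE_RULES` and `convert_inline` are hypothetical names introduced for this example, and the sketch assumes the command arguments contain no nested braces.

```python
import re

# Each rule pairs a pattern for one inline command with its Markdown form
INLINE_RULES = [
    (re.compile(r"\\emph\{(.*?)\}"), r"*\1*"),      # italics
    (re.compile(r"\\textbf\{(.*?)\}"), r"**\1**"),  # bold
    (re.compile(r"\\texttt\{(.*?)\}"), r"`\1`"),    # inline code
]


def convert_inline(text):
    """Apply every inline rule in order and return the rewritten text."""
    for pattern, replacement in INLINE_RULES:
        text = pattern.sub(replacement, text)
    return text


print(convert_inline(r"We can format \emph{italic text}, \textbf{bold text}, and \texttt{code} blocks."))
```

Math is left untouched by rules like these, which is what lets MathJax render the `$...$` and `\[...\]` spans later in the browser.
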
135 |
136 |
138 |
139 |
--------------------------------------------------------------------------------
/README.txt:
--------------------------------------------------------------------------------
1 | LaTeX2Markdown
2 | ==============
3 |
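4 | An `AMS-LaTeX <http://en.wikipedia.org/wiki/AMS-LaTeX>`_ compatible
5 | converter from (a subset of) `LaTeX <http://www.latex-project.org/>`_ to
6 | `MathJaX <http://www.mathjax.org/>`_ compatible
7 | `Markdown <http://daringfireball.net/projects/markdown/>`_.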
8 |
9 | Anyone who writes LaTeX documents using the AMS-LaTeX packages
10 | (``amsmath``, ``amsthm``, ``amssymb``) and wants to convert these
11 | documents to Markdown format to use with MathJaX. These Markdown files
12 | can then be easily added to any web platform - Jekyll blogs, Wordpress,
13 | basic HTML sites, etc.
14 |
15 | In short, if you seek to use MathJaX to view your LaTeX documents
16 | online, then you might be interested in this.
17 |
18 | Demonstration
19 | -------------
20 |
21 | Check out
22 | `tullo.ch/projects/LaTeX2Markdown <http://tullo.ch/projects/LaTeX2Markdown>`_
23 | for a live demonstration of the converter.
24 |
25 | Getting Started
26 | ---------------
27 |
28 | Installation
29 | ~~~~~~~~~~~~
30 |
31 | The project is available on PyPI, so getting it is as simple as using
32 |
33 | ::
34 |
35 | pip install latex2markdown
36 |
37 | or
38 |
39 | ::
40 |
41 | easy_install latex2markdown
42 |
43 | Usage
44 | ~~~~~
45 |
46 | The utility can be called from the command line, or from within a Python
47 | script.
48 |
49 | For the command line, the syntax to convert a LaTeX file to a Markdown
50 | file is as follows:
51 |
52 | ::
53 |
54 | python -m latex2markdown path/to/latex/file path/to/output/markdown/file
55 |
56 | For example, to compile a LaTeX file ``sample.tex`` into a Markdown file
57 | ``sample.md``, call
58 |
59 | ::
60 |
61 | python -m latex2markdown sample.tex sample.md
62 |
63 | To use it within a Python script (to extend it, modify output, etc.),
64 | you can use it as follows:
65 |
66 | ::
67 |
68 | import latex2markdown
69 | with open("latex_file.tex", "r") as f:
70 |     latex_string = f.read()
71 |
72 | l2m = latex2markdown.LaTeX2Markdown(latex_string)
73 |
74 | markdown_string = l2m.to_markdown()
75 |
76 | with open("markdown_file.md", "w") as f:
77 |     f.write(markdown_string)
78 |
79 | Finally, add the following snippet to your HTML when loading this
80 | document.
81 |
82 | ::
83 |
84 |
95 |
97 |
98 | For a working example, have a look at the source of the
99 | `tullo.ch <http://tullo.ch>`_ homepage
100 | `here <https://github.com/ajtulloch/ajtulloch.github.com>`_.
101 |
102 | Why not use Pandoc?
103 | -------------------
104 |
105 | `Pandoc <http://johnmacfarlane.net/pandoc/>`_
106 | is an excellent document converter for less complex LaTeX documents.
107 | Indeed, I've used it to convert this README document to a reST version
108 | for use on PyPI.
109 |
110 | Unfortunately, it is not designed to deal with documents that use the
111 | AMSTeX extensions - which include the theorem, lemma, proof, and
112 | exercise environments that are heavily used for typesetting papers,
113 | lecture notes, and other documents.
114 |
115 | As neither Pandoc nor MathJaX can deal with these documents, I hacked
116 | together a set of regular expressions that convert a subset of LaTeX
117 | to Markdown, and used a few more to convert that Markdown to
118 | MathJaX-convertible Markdown.
119 |
120 | Example
121 | -------
122 |
123 | As an example, the following LaTeX code:
124 |
125 | ::
126 |
127 | \section{Example Section}
128 | \begin{thm}[Euclid]
129 | There are infinitely many primes.
130 | \end{thm}
131 |
132 | \begin{proof}
133 | Suppose that $p_1 < p_2 < \dots < p_n$ are all of the primes.
134 | Let $P = 1 + \prod_{i=1}^n p_i$ and let $p$ be a prime dividing $P$.
135 |
136 | Then $p$ can not be any of $p_i$, for otherwise $p$ would divide the
137 | difference $P - \left(\prod_{i=1}^n p_i \right) - 1$, which is impossible.
138 | So this prime $p$ is still another prime, and $p_1, p_2, \dots p_n$
139 | cannot be all of the primes.
140 | \end{proof}
141 |
142 | is converted into the following Markdown:
143 |
144 | ::
145 |
146 | ### Example Section
147 | #### Theorem 1 (Euclid)
148 |
149 | > There are infinitely many primes.
150 |
151 | #### Proof
152 |
153 | Suppose that $p_1 < p_2 < \dots < p_n$ are all of the primes.
154 | Let $P = 1 + \prod_{i=1}^n p_i$ and let $p$ be a prime dividing $P$.
155 |
156 | Then $p$ can not be any of $p_i$, for otherwise $p$ would divide the difference
157 | $P - \left(\prod_{i=1}^n p_i \right) - 1$, which is impossible. So this prime
158 | $p$ is still another prime, and $p_1, p_2, \dots p_n$ cannot be all of the primes.
159 |
160 | Supported LaTeX/AMSTeX Environments
161 | -----------------------------------
162 |
163 | - ``emph``, ``textbf``, ``texttt``
164 | - ``thm``
165 | - ``prop``
166 | - ``lem``
167 | - ``exer``
168 | - ``proof``
169 | - ``chapter``
170 | - ``section``
171 | - ``subsection``
172 | - ``itemize``
173 | - ``enumerate``
174 |
175 | along with everything supported by MathJax - list available
176 | `online <http://www.mathjax.org/docs/2.0/tex.html#supported-latex-commands>`_.
177 |
--------------------------------------------------------------------------------
/latex2markdown.py:
--------------------------------------------------------------------------------
1 | import re
2 | from collections import defaultdict
3 |
4 | #------------------------------------------------------------------------------
5 |
6 | # Basic configuration - modify this to change output formatting
7 | _block_configuration = {
8 | "chapter": {
9 | "markdown_heading": "##",
10 | "pretty_name": "",
11 | "show_count": False
12 | },
13 | "enumerate": {
14 | "line_indent_char": "",
15 | "list_heading": "1. ",
16 | "markdown_heading": "",
17 | "pretty_name": "",
18 | "show_count": False
19 | },
20 | "exer": {
21 | "line_indent_char": "> ",
22 | "markdown_heading": "####",
23 | "pretty_name": "Exercise",
24 | "show_count": True
25 | },
26 | "itemize": {
27 | "line_indent_char": "",
28 | "list_heading": "* ",
29 | "markdown_heading": "",
30 | "pretty_name": "",
31 | "show_count": False
32 | },
33 | "lem": {
34 | "line_indent_char": "> ",
35 | "markdown_heading": "####",
36 | "pretty_name": "Lemma",
37 | "show_count": True
38 | },
39 | "lstlisting": {
40 | "line_indent_char": " ",
41 | "markdown_heading": "",
42 | "pretty_name": "",
43 | "show_count": False
44 | },
45 | "proof": {
46 | "line_indent_char": "",
47 | "markdown_heading": "####",
48 | "pretty_name": "Proof",
49 | "show_count": False
50 | },
51 | "prop": {
52 | "line_indent_char": "> ",
53 | "markdown_heading": "####",
54 | "pretty_name": "Proposition",
55 | "show_count": True
56 | },
57 | "section": {
58 | "markdown_heading": "###",
59 | "pretty_name": "",
60 | "show_count": False
61 | },
62 | "subsection": {
63 | "markdown_heading": "####",
64 | "pretty_name": "",
65 | "show_count": False
66 | },
67 | "thm": {
68 | "line_indent_char": "> ",
69 | "markdown_heading": "####",
70 | "pretty_name": "Theorem",
71 | "show_count": True
72 | }
73 | }
74 |
75 | #------------------------------------------------------------------------------
76 |
77 | class LaTeX2Markdown(object):
78 | """Initialise with a LaTeX string - see the main routine for examples of
79 | reading this string from an existing .tex file.
80 |
81 | To modify the outputted markdown, modify the _block_configuration variable
82 | before initializing the LaTeX2Markdown instance."""
83 | def __init__(self, latex_string,
84 | block_configuration = _block_configuration,
85 | block_counter = None):
86 | # A fresh counter per instance keeps numbering from leaking between conversions
87 | self._block_configuration = block_configuration
88 | self._latex_string = latex_string
89 | self._block_counter = defaultdict(lambda: 1) if block_counter is None else block_counter
90 |
91 | # Precompile the regexes
92 |
93 | # Select everything in the main matter
94 | self._main_re = re.compile(r"""\\begin{document}
95 | (?P<main>.*)
96 | \\end{document}""",
97 | flags=re.DOTALL + re.VERBOSE)
98 |
99 | # Select all our block materials.
100 | self._block_re = re.compile(r"""\\begin{(?P<block_name>exer|proof|thm|lem|prop)} # block name
101 | (\[(?P<block_title>.*?)\])? # Optional block title
102 | (?P<block_contents>.*?) # Non-greedy block contents
103 | \\end{(?P=block_name)}""", # closing block
104 | flags=re.DOTALL + re.VERBOSE)
105 |
106 | # Select all our list blocks
107 | self._lists_re = re.compile(r"""\\begin{(?P<block_name>enumerate|itemize)} # list name
108 | (\[.*?\])? # Optional enumerate settings i.e. (a)
109 | (?P<block_contents>.*?) # Non-greedy list contents
110 | \\end{(?P=block_name)}""", # closing list
111 | flags=re.DOTALL + re.VERBOSE)
112 |
113 | # Select all our headers
114 | self._header_re = re.compile(r"""\\(?P<header_name>chapter|section|subsection) # Header
115 | {(?P<header_contents>.*?)}""", # Header title
116 | flags=re.DOTALL + re.VERBOSE)
117 |
118 | # Select all our 'auxiliary blocks' - these need special treatment
119 | # for future use - e.g. pygments highlighting instead of code blocks
120 | # in Markdown
121 | self._aux_block_re = re.compile(r"""\\begin{(?P<block_name>lstlisting)} # block name
122 | (?P<block_contents>.*?) # Non-greedy block contents
123 | \\end{(?P=block_name)}""", # closing block
124 | flags=re.DOTALL + re.VERBOSE)
125 |
126 | def _replace_header(self, matchobj):
127 | """Creates a header string for a section/subsection/chapter match.
128 | For example, "### 2 - Integral Calculus\n" """
129 |
130 | header_name = matchobj.group('header_name')
131 | header_contents = matchobj.group('header_contents')
132 |
133 | header = self._format_block_name(header_name)
134 |
135 | block_config = self._block_configuration[header_name]
136 |
137 | # If we have a count, separate the title from the count with a dash
138 | separator = "-" if block_config.get("show_count") else ""
139 |
140 | output_str = "{header} {separator} {title}\n".format(
141 | header=header,
142 | title=header_contents,
143 | separator=separator)
144 |
145 | return output_str
146 |
147 | def _replace_block(self, matchobj):
148 | """Create a string that replaces an entire block.
149 | The string consists of a header (e.g. ### Exercise 1)
150 | and a block, containing the LaTeX code.
151 |
152 | The block may be optionally indented, blockquoted, etc.
153 | These settings are customizable through the
154 | _block_configuration dictionary"""
155 |
156 | block_name = matchobj.group('block_name')
157 | block_contents = matchobj.group('block_contents')
158 | # Block title may not exist, so use .get method
159 | block_title = matchobj.groupdict().get('block_title')
160 |
161 | # We have to format differently for lists
162 | if block_name in {"itemize", "enumerate"}:
163 | formatted_contents = self._format_list_contents(block_name,
164 | block_contents)
165 | else:
166 | formatted_contents = self._format_block_contents(block_name,
167 | block_contents)
168 |
169 | header = self._format_block_name(block_name, block_title)
170 |
171 | output_str = "{header}\n\n{block_contents}".format(
172 | header=header,
173 | block_contents=formatted_contents)
174 | return output_str
175 |
176 |
177 | def _format_block_contents(self, block_name, block_contents):
178 | """Format the contents of a block with configuration parameters
179 | provided in the self._block_configuration attribute"""
180 |
181 | block_config = self._block_configuration[block_name]
182 |
183 | line_indent_char = block_config["line_indent_char"]
184 |
185 | output_str = ""
186 | for line in block_contents.lstrip().rstrip().split("\n"):
187 | line = line.lstrip().rstrip()
188 | indented_line = line_indent_char + line + "\n"
189 | output_str += indented_line
190 | return output_str
191 |
192 | def _format_list_contents(self, block_name, block_contents):
193 | r"""To format a list, we must remove the \item declaration in the
194 | LaTeX source. All else is as in the _format_block_contents method."""
195 | block_config = self._block_configuration[block_name]
196 |
197 | list_heading = block_config["list_heading"]
198 |
199 | output_str = ""
200 | for line in block_contents.lstrip().rstrip().split("\n"):
201 | line = line.lstrip().rstrip()
202 | markdown_list_line = line.replace(r"\item", list_heading)
203 | output_str += markdown_list_line + "\n"
204 | return output_str
205 |
206 | def _format_block_name(self, block_name, block_title=None):
207 | """Format the Markdown header associated with a block.
208 | Due to the optional block_title, we split the string construction
209 | into two parts."""
210 |
211 | block_config = self._block_configuration[block_name]
212 | pretty_name = block_config["pretty_name"]
213 | show_count = block_config["show_count"]
214 | markdown_heading = block_config["markdown_heading"]
215 |
216 | block_count = self._block_counter[block_name] if show_count else ""
217 | self._block_counter[block_name] += 1
218 |
219 | output_str = "{markdown_heading} {pretty_name} {block_count}".format(
220 | markdown_heading=markdown_heading,
221 | pretty_name=pretty_name,
222 | block_count=block_count)
223 |
224 | if block_title:
225 | output_str = "{output_str} ({block_title})".format(
226 | output_str=output_str,
227 | block_title=block_title)
228 |
229 | return output_str.lstrip().rstrip()
230 |
231 | def _latex_to_markdown(self):
232 | """Main function, returns the formatted Markdown as a string.
233 | Uses a lot of custom regexes to fix a lot of content - you may have
234 | to add or remove some regexes to suit your own needs."""
235 |
236 | # Get main content, skipping preamble and closing tags.
237 | try:
238 | output = self._main_re.search(self._latex_string).group("main")
239 | except AttributeError:
240 | output = self._latex_string
241 |
242 | # Reformat, lists, blocks, and headers.
243 | output = self._lists_re.sub(self._replace_block, output)
244 | output = self._block_re.sub(self._replace_block, output)
245 | output = self._header_re.sub(self._replace_header, output)
246 | output = self._aux_block_re.sub(self._replace_block, output)
247 |
248 | # Fix \\ formatting for line breaks in align blocks
249 | output = re.sub(r" \\\\", r" \\\\\\\\", output)
250 | # Convert align* block to align - this fixes formatting
251 | output = re.sub(r"align\*", r"align", output)
252 |
253 | # Fix emph, textbf, texttt formatting
254 | output = re.sub(r"\\emph{(.*?)}", r"*\1*", output)
255 | output = re.sub(r"\\textbf{(.*?)}", r"**\1**", output)
256 | output = re.sub(r"\\texttt{(.*?)}", r"`\1`", output)
257 |
258 | # Fix \% formatting
259 | output = re.sub(r"\\%", r"%", output)
260 | # Fix argmax, etc.
261 | output = re.sub(r"\\arg(max|min)", r"\\text{arg\1}", output)
262 |
263 | # Throw away content in IGNORE/END block
264 | output = re.sub(r"% LaTeX2Markdown IGNORE(.*?)\% LaTeX2Markdown END",
265 | "", output, flags=re.DOTALL)
266 | return output.lstrip().rstrip()
267 |
268 | def to_markdown(self):
269 | return self._latex_to_markdown()
270 |
271 | def to_latex(self):
272 | return self._latex_string
273 |
274 | #------------------------------------------------------------------------------
275 |
276 | if __name__ == '__main__':
277 | import sys
278 | if len(sys.argv) == 1:
279 | input_file = "bin/latex_sample.tex"
280 | output_file = "bin/converted_latex_sample.md"
281 | else:
282 | input_file, output_file = sys.argv[1], sys.argv[2]
283 |
284 | with open(input_file, 'r') as f:
285 | latex_string = f.read()
286 | y = LaTeX2Markdown(latex_string)
287 | markdown_string = y.to_markdown()
288 | with open(output_file, 'w') as f_out:
289 | f_out.write(markdown_string)
290 |
--------------------------------------------------------------------------------