├── CHANGES.txt
├── .gitignore
├── MANIFEST.in
├── MANIFEST
├── setup.py
├── bin
    ├── converted_latex_sample.md
    └── latex_sample.tex
├── README.md
├── README.txt
└── latex2markdown.py

/CHANGES.txt:
--------------------------------------------------------------------------------
1 | v0.2, 2012-03-31 -- Fixed release.
2 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *.egg-info
2 | *.rst
3 | build/*
4 | dist/*
5 | *.log
6 | *.pdf
7 | *.aux
8 |
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include *.txt
2 | recursive-include docs *.txt
3 | recursive-include bin *.tex
4 | recursive-include bin *.md
5 | recursive-include bin *.pdf
--------------------------------------------------------------------------------
/MANIFEST:
--------------------------------------------------------------------------------
1 | # file GENERATED by distutils, do NOT edit
2 | CHANGES.txt
3 | README.txt
4 | latex2markdown.py
5 | setup.py
6 | bin/converted_latex_sample.md
7 | bin/latex_sample.pdf
8 | bin/latex_sample.tex
9 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from distutils.core import setup
2 |
3 | setup(
4 |     name='latex2markdown',
5 |     author="Andrew Tulloch",
6 |     author_email="andrew@tullo.ch",
7 |     version='0.2.1',
8 |     py_modules=['latex2markdown'],
9 |     scripts=['bin/converted_latex_sample.md','bin/latex_sample.tex'],
10 |     url="https://github.com/ajtulloch/LaTeX2Markdown",
11 |     description="An AMS-LaTeX compatible converter that maps a subset of LaTeX to Markdown/MathJaX.",
12 |     classifiers=[
13 |         "Development Status :: 3 - Alpha",
14 |         "Environment :: Console",
15 |         "Programming Language :: Python",
16 |         "Topic :: Scientific/Engineering :: Mathematics",
17 |         "Topic :: Software Development :: Documentation",
18 |         "Topic :: Text Processing :: Markup",
19 |         "Topic :: Text Processing :: Markup :: LaTeX",
20 |         "Topic :: Text Processing :: Markup :: HTML"
21 |     ],
22 |     long_description=open("README.txt").read()
23 | )
24 |
--------------------------------------------------------------------------------
/bin/converted_latex_sample.md:
--------------------------------------------------------------------------------
1 | ### Simple Examples
2 |
3 |
4 | This section introduces the usage of the LaTeX2Markdown tool, showing an example of the various environments available.
5 |
6 | #### Theorem 1 (Euclid, 300 BC)
7 |
8 | > There are infinitely many primes.
9 |
10 |
11 | #### Proof
12 |
13 | Suppose that $p_1 < p_2 < \dots < p_n$ are all of the primes. Let $P = 1 + \prod_{i=1}^n p_i$ and let $p$ be a prime dividing $P$.
14 |
15 | Then $p$ can not be any of $p_i$, for otherwise $p$ would divide the difference $P - \left(\prod_{i=1}^n p_i \right) - 1$, which is impossible. So this prime $p$ is still another prime, and $p_1, p_2, \dots p_n$ cannot be all of the primes.
16 |
17 |
18 | #### Exercise 1
19 |
20 | > Give an alternative proof that there are an infinite number of prime numbers.
21 |
22 |
23 | To solve this exercise, we first introduce the following lemma.
24 | #### Lemma 1
25 |
26 | > The Fermat numbers $F_n = 2^{2^{n}} + 1$ are pairwise relatively prime.
27 |
28 |
29 | #### Proof
30 |
31 | It is easy to show by induction that
32 | \[ F_m - 2 = F_0 F_1 \dots F_{m-1}. \]
33 | This means that if $d$ divides both $F_n$ and $F_m$ (with $n < m$), then $d$ also divides $F_m - 2$. Hence, $d$ divides 2. But every Fermat number is odd, so $d$ is necessarily one. This proves the lemma.
34 |
35 |
36 | We can now provide a solution to the exercise.
37 |
38 | #### Theorem 2 (Goldbach, 1750)
39 |
40 | > There are infinitely many prime numbers.
41 |
42 |
43 | #### Proof
44 |
45 | Choose a prime divisor $p_n$ of each Fermat number $F_n$. By the lemma we know these primes are all distinct, showing there are infinitely many primes.
46 |
47 |
48 | ### Demonstration of the environments
49 |
50 |
51 | We can format *italic text*, **bold text**, and `code` blocks.
52 |
53 |
54 |
55 | 1. A numbered list item
56 | 1. Another numbered list item
57 |
58 |
59 |
60 |
61 | * A bulleted list item
62 | * Another bulleted list item
63 |
64 |
65 | #### Theorem 3
66 |
67 | > This is a theorem. It contains an `align` block.
68 | >
69 | > All math environments supported by MathJaX should work with LaTeX - a full list is available on the MathJaX homepage.
70 | >
71 | > Maxwell's equations, differential form.
72 | > \begin{align}
73 | > \nabla \cdot \mathbf{E} &= \frac {\rho} {\varepsilon_0} \\\\
74 | > \nabla \cdot \mathbf{B} &= 0 \\\\
75 | > \nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}} {\partial t} \\\\
76 | > \nabla \times \mathbf{B} &= \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}} {\partial t} \\\\
77 | > \end{align}
78 |
79 |
80 | #### Theorem 4 (Theorem name)
81 |
82 | > This is a named theorem.
83 |
84 |
85 | #### Lemma 2
86 |
87 | > This is a lemma.
88 |
89 |
90 | #### Proposition 1
91 |
92 | > This is a proposition
93 |
94 |
95 | #### Proof
96 |
97 | This is a proof.
98 |
99 |
100 |
101 |
102 |     This is a code listing.
103 |     One line of code
104 |     Another line of code
--------------------------------------------------------------------------------
/bin/latex_sample.tex:
--------------------------------------------------------------------------------
1 | \documentclass[12pt]{amsart}
2 | \usepackage{amsthm, amsmath, amssymb}
3 | \usepackage{setspace}
4 | \usepackage{listings}
5 | \onehalfspacing
6 |
7 | \theoremstyle{plain}% default
8 | \newtheorem{thm}{Theorem}[section]
9 | \newtheorem{lem}[thm]{Lemma}
10 | \newtheorem{prop}[thm]{Proposition}
11 | \newtheorem{exer}[thm]{Exercise}
12 |
13 | \title{LaTeX2Markdown Examples}
14 | \author{Andrew Tulloch}
15 | \begin{document}
16 |
17 | % LaTeX2Markdown IGNORE
18 | \maketitle
19 | % LaTeX2Markdown END
20 |
21 | \section{Simple Examples}
22 |
23 | This section introduces the usage of the LaTeX2Markdown tool, showing an example of the various environments available.
24 |
25 | \begin{thm}[Euclid, 300 BC]
26 | There are infinitely many primes.
27 | \end{thm}
28 |
29 | \begin{proof}
30 | Suppose that $p_1 < p_2 < \dots < p_n$ are all of the primes. Let $P = 1 + \prod_{i=1}^n p_i$ and let $p$ be a prime dividing $P$.
31 |
32 | Then $p$ can not be any of $p_i$, for otherwise $p$ would divide the difference $P - \left(\prod_{i=1}^n p_i \right) - 1$, which is impossible. So this prime $p$ is still another prime, and $p_1, p_2, \dots p_n$ cannot be all of the primes.
33 | \end{proof}
34 |
35 | \begin{exer}
36 | Give an alternative proof that there are an infinite number of prime numbers.
37 | \end{exer}
38 |
39 | To solve this exercise, we first introduce the following lemma.
40 | \begin{lem}
41 | The Fermat numbers $F_n = 2^{2^{n}} + 1$ are pairwise relatively prime.
42 | \end{lem}
43 |
44 | \begin{proof}
45 | It is easy to show by induction that
46 | \[ F_m - 2 = F_0 F_1 \dots F_{m-1}. \]
47 | This means that if $d$ divides both $F_n$ and $F_m$ (with $n < m$), then $d$ also divides $F_m - 2$. Hence, $d$ divides 2. But every Fermat number is odd, so $d$ is necessarily one. This proves the lemma.
48 | \end{proof}
49 |
50 | We can now provide a solution to the exercise.
51 |
52 | \begin{thm}[Goldbach, 1750]
53 | There are infinitely many prime numbers.
54 | \end{thm}
55 |
56 | \begin{proof}
57 | Choose a prime divisor $p_n$ of each Fermat number $F_n$. By the lemma we know these primes are all distinct, showing there are infinitely many primes.
58 | \end{proof}
59 |
60 | \section{Demonstration of the environments}
61 |
62 | We can format \emph{italic text}, \textbf{bold text}, and \texttt{code} blocks.
63 |
64 | \begin{enumerate}
65 | \item A numbered list item
66 | \item Another numbered list item
67 | \end{enumerate}
68 |
69 | \begin{itemize}
70 | \item A bulleted list item
71 | \item Another bulleted list item
72 | \end{itemize}
73 |
74 | \begin{thm}
75 | This is a theorem. It contains an \texttt{align} block.
76 |
77 | All math environments supported by MathJaX should work with LaTeX - a full list is available on the MathJaX homepage.
78 |
79 | Maxwell's equations, differential form.
80 | \begin{align*}
81 | \nabla \cdot \mathbf{E} &= \frac {\rho} {\varepsilon_0} \\
82 | \nabla \cdot \mathbf{B} &= 0 \\
83 | \nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}} {\partial t} \\
84 | \nabla \times \mathbf{B} &= \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}} {\partial t} \\
85 | \end{align*}
86 | \end{thm}
87 |
88 | \begin{thm}[Theorem name]
89 | This is a named theorem.
90 | \end{thm}
91 |
92 | \begin{lem}
93 | This is a lemma.
94 | \end{lem}
95 |
96 | \begin{prop}
97 | This is a proposition
98 | \end{prop}
99 |
100 | \begin{proof}
101 | This is a proof.
102 | \end{proof}
103 |
104 | \begin{lstlisting}
105 | This is a code listing.
106 | One line of code
107 | Another line of code
108 | \end{lstlisting}
109 |
110 | \end{document}
111 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # LaTeX2Markdown
2 |
3 | An [AMS-LaTeX][amslatex] compatible converter from (a subset of) [LaTeX][latex] to [MathJaX][mathjax] compatible [Markdown][markdown].
4 |
5 | [amslatex]: http://en.wikipedia.org/wiki/AMS-LaTeX
6 | [latex]: http://www.latex-project.org/
7 | [mathjax]: http://www.mathjax.org/
8 | [markdown]: http://daringfireball.net/projects/markdown/
9 | [pandoc]: http://johnmacfarlane.net/pandoc/
10 | ## Who should use this?
11 |
12 | Anyone who writes LaTeX documents using the AMS-LaTeX packages (`amsmath`, `amsthm`, `amssymb`) and wants to convert these documents to Markdown format to use with MathJaX. These Markdown files can then be easily added to any web platform - Jekyll blogs, Wordpress, basic HTML sites, etc.
13 |
14 | In short, if you seek to use MathJaX to view your LaTeX documents online, then you might be interested in this.
15 |
16 | ## Demonstration
17 |
18 | Check out [tullo.ch/projects/LaTeX2Markdown](http://tullo.ch/projects/LaTeX2Markdown) for a live demonstration of the converter.
19 |
20 |
21 | ## Getting Started
22 |
23 | ### Installation
24 |
25 | The project is available on PyPI, so getting it is as simple as using
26 |
27 |     pip install latex2markdown
28 |
29 | or
30 |
31 |     easy_install latex2markdown
32 |
33 | ### Usage
34 |
35 | The utility can be called from the command line, or from within a Python script.
36 |
37 | For the command line, the syntax to convert a LaTeX file to a Markdown file is as follows:
38 |
39 |     python -m latex2markdown path/to/latex/file path/to/output/markdown/file
40 |
41 | For example, to compile a LaTeX file `sample.tex` into a Markdown file `sample.md`, call
42 |
43 |     python -m latex2markdown sample.tex sample.md
44 |
45 | To use it within a Python script (to extend it, modify output, etc.), you can use it as follows:
46 |
47 |     import latex2markdown
48 |     with open("latex_file.tex", "r") as f:
49 |         latex_string = f.read()
50 |
51 |     l2m = latex2markdown.LaTeX2Markdown(latex_string)
52 |
53 |     markdown_string = l2m.to_markdown()
54 |
55 |     with open("markdown_file.md", "w") as f:
56 |         f.write(markdown_string)
57 |
58 | Finally, add the following snippet to your HTML when loading this document.
59 |
60 |
71 |
73 |
74 | For a working example, have a look at the source of the [tullo.ch](http://tullo.ch) homepage [here](https://github.com/ajtulloch/ajtulloch.github.com).
75 |
76 | ## Why not use Pandoc?
77 |
78 | [Pandoc][pandoc] is an excellent document converter for less complex LaTeX documents. Indeed, I've used it to convert this README document to a reST version for use on PyPI.
79 |
80 | Unfortunately, it is not designed to deal with documents that use the AMSTeX extensions - which include the theorem, lemma, proof, and exercise environments that are heavily used for typesetting papers, lecture notes, and other documents.
81 |
82 | As neither Pandoc nor MathJaX can deal with these documents, I hacked together a set of regular expressions that can convert a subset of LaTeX to Markdown, and used a few more to convert the Markdown to MathJaX-convertible Markdown.
83 |
84 | ## Example
85 |
86 | As an example, the following LaTeX code:
87 |
88 |     \section{Example Section}
89 |     \begin{thm}[Euclid]
90 |     There are infinitely many primes.
91 |     \end{thm}
92 |
93 |     \begin{proof}
94 |     Suppose that $p_1 < p_2 < \dots < p_n$ are all of the primes.
95 |     Let $P = 1 + \prod_{i=1}^n p_i$ and let $p$ be a prime dividing $P$.
96 |
97 |     Then $p$ can not be any of $p_i$, for otherwise $p$ would divide the
98 |     difference $P - \left(\prod_{i=1}^n p_i \right) - 1$, which is impossible.
99 |     So this prime $p$ is still another prime, and $p_1, p_2, \dots p_n$
100 |     cannot be all of the primes.
101 |     \end{proof}
102 |
103 | is converted into the following Markdown:
104 |
105 |     ### Example Section
106 |     #### Theorem 1 (Euclid)
107 |
108 |     > There are infinitely many primes.
109 |
110 |     #### Proof
111 |
112 |     Suppose that $p_1 < p_2 < \dots < p_n$ are all of the primes.
113 |     Let $P = 1 + \prod_{i=1}^n p_i$ and let $p$ be a prime dividing $P$.
114 |
115 |     Then $p$ can not be any of $p_i$, for otherwise $p$ would divide the difference
116 |     $P - \left(\prod_{i=1}^n p_i \right) - 1$, which is impossible. So this prime
117 |     $p$ is still another prime, and $p_1, p_2, \dots p_n$ cannot be all of the primes.
118 |
119 |
120 | ## Supported LaTeX/AMSTeX Environments
121 |
122 | * `emph`, `textbf`, `texttt`
123 | * `thm`
124 | * `prop`
125 | * `lem`
126 | * `exer`
127 | * `proof`
128 | * `chapter`
129 | * `section`
130 | * `subsection`
131 | * `itemize`
132 | * `enumerate`
133 |
134 | along with everything supported by MathJax - list available [online](http://www.mathjax.org/docs/2.0/tex.html#supported-latex-commands).
135 |
136 |
137 | [![Bitdeli Badge](https://d2weczhvl823v0.cloudfront.net/ajtulloch/latex2markdown/trend.png)](https://bitdeli.com/free "Bitdeli Badge")
138 |
139 |
--------------------------------------------------------------------------------
/README.txt:
--------------------------------------------------------------------------------
1 | LaTeX2Markdown
2 | ==============
3 |
4 | An `AMS-LaTeX <http://en.wikipedia.org/wiki/AMS-LaTeX>`_ compatible
5 | converter from (a subset of) `LaTeX <http://www.latex-project.org/>`_ to
6 | `MathJaX <http://www.mathjax.org/>`_ compatible
7 | `Markdown <http://daringfireball.net/projects/markdown/>`_.
8 |
9 | Anyone who writes LaTeX documents using the AMS-LaTeX packages
10 | (``amsmath``, ``amsthm``, ``amssymb``) and wants to convert these
11 | documents to Markdown format to use with MathJaX. These Markdown files
12 | can then be easily added to any web platform - Jekyll blogs, Wordpress,
13 | basic HTML sites, etc.
14 |
15 | In short, if you seek to use MathJaX to view your LaTeX documents
16 | online, then you might be interested in this.
17 |
18 | Demonstration
19 | -------------
20 |
21 | Check out
22 | `tullo.ch/projects/LaTeX2Markdown <http://tullo.ch/projects/LaTeX2Markdown>`_
23 | for a live demonstration of the converter.
24 |
25 | Getting Started
26 | ---------------
27 |
28 | Installation
29 | ~~~~~~~~~~~~
30 |
31 | The project is available on PyPI, so getting it is as simple as using
32 |
33 | ::
34 |
35 |     pip install latex2markdown
36 |
37 | or
38 |
39 | ::
40 |
41 |     easy_install latex2markdown
42 |
43 | Usage
44 | ~~~~~
45 |
46 | The utility can be called from the command line, or from within a Python
47 | script.
48 |
49 | For the command line, the syntax to convert a LaTeX file to a Markdown
50 | file is as follows:
51 |
52 | ::
53 |
54 |     python -m latex2markdown path/to/latex/file path/to/output/markdown/file
55 |
56 | For example, to compile a LaTeX file ``sample.tex`` into a Markdown file
57 | ``sample.md``, call
58 |
59 | ::
60 |
61 |     python -m latex2markdown sample.tex sample.md
62 |
63 | To use it within a Python script (to extend it, modify output, etc.),
64 | you can use it as follows:
65 |
66 | ::
67 |
68 |     import latex2markdown
69 |     with open("latex_file.tex", "r") as f:
70 |         latex_string = f.read()
71 |
72 |     l2m = latex2markdown.LaTeX2Markdown(latex_string)
73 |
74 |     markdown_string = l2m.to_markdown()
75 |
76 |     with open("markdown_file.md", "w") as f:
77 |         f.write(markdown_string)
78 |
79 | Finally, add the following snippet to your HTML when loading this
80 | document.
81 |
82 | ::
83 |
84 |
95 |
97 |
98 | For a working example, have a look at the source of the
99 | `tullo.ch <http://tullo.ch>`_ homepage
100 | `here <https://github.com/ajtulloch/ajtulloch.github.com>`_.
101 |
102 | Why not use Pandoc?
103 | -------------------
104 |
105 | `Pandoc <http://johnmacfarlane.net/pandoc/>`_
106 | is an excellent document converter for less complex LaTeX documents.
107 | Indeed, I've used it to convert this README document to a reST version
108 | for use on PyPI.
109 |
110 | Unfortunately, it is not designed to deal with documents that use the
111 | AMSTeX extensions - which include the theorem, lemma, proof, and
112 | exercise environments that are heavily used for typesetting papers,
113 | lecture notes, and other documents.
114 |
115 | As neither Pandoc nor MathJaX can deal with these documents, I hacked
116 | together a set of regular expressions that can convert a subset of LaTeX
117 | to Markdown, and used a few more to convert the Markdown to
118 | MathJaX-convertible Markdown.
119 |
120 | Example
121 | -------
122 |
123 | As an example, the following LaTeX code:
124 |
125 | ::
126 |
127 |     \section{Example Section}
128 |     \begin{thm}[Euclid]
129 |     There are infinitely many primes.
130 |     \end{thm}
131 |
132 |     \begin{proof}
133 |     Suppose that $p_1 < p_2 < \dots < p_n$ are all of the primes.
134 |     Let $P = 1 + \prod_{i=1}^n p_i$ and let $p$ be a prime dividing $P$.
135 |
136 |     Then $p$ can not be any of $p_i$, for otherwise $p$ would divide the
137 |     difference $P - \left(\prod_{i=1}^n p_i \right) - 1$, which is impossible.
138 |     So this prime $p$ is still another prime, and $p_1, p_2, \dots p_n$
139 |     cannot be all of the primes.
140 |     \end{proof}
141 |
142 | is converted into the following Markdown:
143 |
144 | ::
145 |
146 |     ### Example Section
147 |     #### Theorem 1 (Euclid)
148 |
149 |     > There are infinitely many primes.
150 |
151 |     #### Proof
152 |
153 |     Suppose that $p_1 < p_2 < \dots < p_n$ are all of the primes.
154 |     Let $P = 1 + \prod_{i=1}^n p_i$ and let $p$ be a prime dividing $P$.
155 |
156 |     Then $p$ can not be any of $p_i$, for otherwise $p$ would divide the difference
157 |     $P - \left(\prod_{i=1}^n p_i \right) - 1$, which is impossible. So this prime
158 |     $p$ is still another prime, and $p_1, p_2, \dots p_n$ cannot be all of the primes.
159 |
160 | Supported LaTeX/AMSTeX Environments
161 | -----------------------------------
162 |
163 | - ``emph``, ``textbf``, ``texttt``
164 | - ``thm``
165 | - ``prop``
166 | - ``lem``
167 | - ``exer``
168 | - ``proof``
169 | - ``chapter``
170 | - ``section``
171 | - ``subsection``
172 | - ``itemize``
173 | - ``enumerate``
174 |
175 | along with everything supported by MathJax - list available
176 | `online <http://www.mathjax.org/docs/2.0/tex.html#supported-latex-commands>`_.
177 |
--------------------------------------------------------------------------------
/latex2markdown.py:
--------------------------------------------------------------------------------
1 | import re
2 | from collections import defaultdict
3 |
4 | #------------------------------------------------------------------------------
5 |
6 | # Basic configuration - modify this to change output formatting
7 | _block_configuration = {
8 |     "chapter": {
9 |         "markdown_heading": "##",
10 |         "pretty_name": "",
11 |         "show_count": False
12 |     },
13 |     "enumerate": {
14 |         "line_indent_char": "",
15 |         "list_heading": "1. ",
16 |         "markdown_heading": "",
17 |         "pretty_name": "",
18 |         "show_count": False
19 |     },
20 |     "exer": {
21 |         "line_indent_char": "> ",
22 |         "markdown_heading": "####",
23 |         "pretty_name": "Exercise",
24 |         "show_count": True
25 |     },
26 |     "itemize": {
27 |         "line_indent_char": "",
28 |         "list_heading": "* ",
29 |         "markdown_heading": "",
30 |         "pretty_name": "",
31 |         "show_count": False
32 |     },
33 |     "lem": {
34 |         "line_indent_char": "> ",
35 |         "markdown_heading": "####",
36 |         "pretty_name": "Lemma",
37 |         "show_count": True
38 |     },
39 |     "lstlisting": {
40 |         "line_indent_char": "    ",
41 |         "markdown_heading": "",
42 |         "pretty_name": "",
43 |         "show_count": False
44 |     },
45 |     "proof": {
46 |         "line_indent_char": "",
47 |         "markdown_heading": "####",
48 |         "pretty_name": "Proof",
49 |         "show_count": False
50 |     },
51 |     "prop": {
52 |         "line_indent_char": "> ",
53 |         "markdown_heading": "####",
54 |         "pretty_name": "Proposition",
55 |         "show_count": True
56 |     },
57 |     "section": {
58 |         "markdown_heading": "###",
59 |         "pretty_name": "",
60 |         "show_count": False
61 |     },
62 |     "subsection": {
63 |         "markdown_heading": "####",
64 |         "pretty_name": "",
65 |         "show_count": False
66 |     },
67 |     "thm": {
68 |         "line_indent_char": "> ",
69 |         "markdown_heading": "####",
70 |         "pretty_name": "Theorem",
71 |         "show_count": True
72 |     }
73 | }
74 |
75 | #------------------------------------------------------------------------------
76 |
77 | class LaTeX2Markdown(object):
78 |     """Initialise with a LaTeX string - see the main routine for examples of
79 |     reading this string from an existing .tex file.
80 |
81 |     To modify the outputted markdown, modify the _block_configuration variable
82 |     before initializing the LaTeX2Markdown instance."""
83 |     def __init__(self, latex_string,
84 |                  block_configuration = _block_configuration,
85 |                  block_counter = None):  # None gives each instance a fresh counter
86 |
87 |         self._block_configuration = block_configuration
88 |         self._latex_string = latex_string
89 |         self._block_counter = defaultdict(lambda: 1) if block_counter is None else block_counter
90 |
91 |         # Precompile the regexes
92 |
93 |         # Select everything in the main matter
94 |         self._main_re = re.compile(r"""\\begin{document}
95 |                                     (?P<main>.*)
96 |                                     \\end{document}""",
97 |                                    flags=re.DOTALL + re.VERBOSE)
98 |
99 |         # Select all our block materials.
100 |         self._block_re = re.compile(r"""\\begin{(?P<block_name>exer|proof|thm|lem|prop)} # block name
101 |                                     (\[(?P<block_title>.*?)\])? # Optional block title
102 |                                     (?P<block_contents>.*?) # Non-greedy block contents
103 |                                     \\end{(?P=block_name)}""", # closing block
104 |                                     flags=re.DOTALL + re.VERBOSE)
105 |
106 |         # Select all our list blocks
107 |         self._lists_re = re.compile(r"""\\begin{(?P<block_name>enumerate|itemize)} # list name
108 |                                     (\[.*?\])? # Optional enumerate settings i.e. (a)
109 |                                     (?P<block_contents>.*?) # Non-greedy list contents
110 |                                     \\end{(?P=block_name)}""", # closing list
111 |                                     flags=re.DOTALL + re.VERBOSE)
112 |
113 |         # Select all our headers
114 |         self._header_re = re.compile(r"""\\(?P<header_name>chapter|section|subsection) # Header
115 |                                     {(?P<header_contents>.*?)}""", # Header title
116 |                                     flags=re.DOTALL + re.VERBOSE)
117 |
118 |         # Select all our 'auxiliary blocks' - these need special treatment
119 |         # for future use - e.g. pygments highlighting instead of code blocks
120 |         # in Markdown
121 |         self._aux_block_re = re.compile(r"""\\begin{(?P<block_name>lstlisting)} # block name
122 |                                     (?P<block_contents>.*?) # Non-greedy block contents
123 |                                     \\end{(?P=block_name)}""", # closing block
124 |                                     flags=re.DOTALL + re.VERBOSE)
125 |
126 |     def _replace_header(self, matchobj):
127 |         """Creates a header string for a section/subsection/chapter match.
128 |         For example, "### 2 - Integral Calculus\n" """
129 |
130 |         header_name = matchobj.group('header_name')
131 |         header_contents = matchobj.group('header_contents')
132 |
133 |         header = self._format_block_name(header_name)
134 |
135 |         block_config = self._block_configuration[header_name]
136 |
137 |         # If we have a count, separate the title from the count with a dash
138 |         separator = "-" if block_config.get("show_count") else ""
139 |
140 |         output_str = "{header} {separator} {title}\n".format(
141 |             header=header,
142 |             title=header_contents,
143 |             separator=separator)
144 |
145 |         return output_str
146 |
147 |     def _replace_block(self, matchobj):
148 |         """Create a string that replaces an entire block.
149 |         The string consists of a header (e.g. ### Exercise 1)
150 |         and a block, containing the LaTeX code.
151 |
152 |         The block may be optionally indented, blockquoted, etc.
153 |         These settings are customizable through the
154 |         _block_configuration dictionary."""
155 |
156 |         block_name = matchobj.group('block_name')
157 |         block_contents = matchobj.group('block_contents')
158 |         # Block title may not exist, so use .get method
159 |         block_title = matchobj.groupdict().get('block_title')
160 |
161 |         # We have to format differently for lists
162 |         if block_name in {"itemize", "enumerate"}:
163 |             formatted_contents = self._format_list_contents(block_name,
164 |                                                             block_contents)
165 |         else:
166 |             formatted_contents = self._format_block_contents(block_name,
167 |                                                              block_contents)
168 |
169 |         header = self._format_block_name(block_name, block_title)
170 |
171 |         output_str = "{header}\n\n{block_contents}".format(
172 |             header=header,
173 |             block_contents=formatted_contents)
174 |         return output_str
175 |
176 |
177 |     def _format_block_contents(self, block_name, block_contents):
178 |         """Format the contents of a block with configuration parameters
179 |         provided in the self._block_configuration attribute"""
180 |
181 |         block_config = self._block_configuration[block_name]
182 |
183 |         line_indent_char = block_config["line_indent_char"]
184 |
185 |         output_str = ""
186 |         for line in block_contents.strip().split("\n"):
187 |             line = line.strip()
188 |             indented_line = line_indent_char + line + "\n"
189 |             output_str += indented_line
190 |         return output_str
191 |
192 |     def _format_list_contents(self, block_name, block_contents):
193 |         r"""To format a list, we must remove the \item declaration in the
194 |         LaTeX source. All else is as in the _format_block_contents method."""
195 |         block_config = self._block_configuration[block_name]
196 |
197 |         list_heading = block_config["list_heading"]
198 |
199 |         output_str = ""
200 |         for line in block_contents.strip().split("\n"):
201 |             line = line.strip()
202 |             markdown_list_line = line.replace(r"\item", list_heading)
203 |             output_str += markdown_list_line + "\n"
204 |         return output_str
205 |
206 |     def _format_block_name(self, block_name, block_title=None):
207 |         """Format the Markdown header associated with a block.
208 |         Due to the optional block_title, we split the string construction
209 |         into two parts."""
210 |
211 |         block_config = self._block_configuration[block_name]
212 |         pretty_name = block_config["pretty_name"]
213 |         show_count = block_config["show_count"]
214 |         markdown_heading = block_config["markdown_heading"]
215 |
216 |         block_count = self._block_counter[block_name] if show_count else ""
217 |         self._block_counter[block_name] += 1
218 |
219 |         output_str = "{markdown_heading} {pretty_name} {block_count}".format(
220 |             markdown_heading=markdown_heading,
221 |             pretty_name=pretty_name,
222 |             block_count=block_count)
223 |
224 |         if block_title:
225 |             output_str = "{output_str} ({block_title})".format(
226 |                 output_str=output_str,
227 |                 block_title=block_title)
228 |
229 |         return output_str.strip()
230 |
231 |     def _latex_to_markdown(self):
232 |         """Main function, returns the formatted Markdown as a string.
233 |         Uses a lot of custom regexes to fix a lot of content - you may have
234 |         to add or remove some regexes to suit your own needs."""
235 |
236 |         # Get main content, skipping preamble and closing tags.
237 |         try:
238 |             output = self._main_re.search(self._latex_string).group("main")
239 |         except AttributeError:
240 |             output = self._latex_string
241 |
242 |         # Reformat lists, blocks, and headers.
243 |         output = self._lists_re.sub(self._replace_block, output)
244 |         output = self._block_re.sub(self._replace_block, output)
245 |         output = self._header_re.sub(self._replace_header, output)
246 |         output = self._aux_block_re.sub(self._replace_block, output)
247 |
248 |         # Fix \\ formatting for line breaks in align blocks
249 |         output = re.sub(r" \\\\", r" \\\\\\\\", output)
250 |         # Convert align* block to align - this fixes formatting
251 |         output = re.sub(r"align\*", r"align", output)
252 |
253 |         # Fix emph, textbf, texttt formatting
254 |         output = re.sub(r"\\emph{(.*?)}", r"*\1*", output)
255 |         output = re.sub(r"\\textbf{(.*?)}", r"**\1**", output)
256 |         output = re.sub(r"\\texttt{(.*?)}", r"`\1`", output)
257 |
258 |         # Fix \% formatting
259 |         output = re.sub(r"\\%", r"%", output)
260 |         # Fix argmax, etc.
261 |         output = re.sub(r"\\arg(max|min)", r"\\text{arg\1}", output)
262 |
263 |         # Throw away content in IGNORE/END block
264 |         output = re.sub(r"% LaTeX2Markdown IGNORE(.*?)\% LaTeX2Markdown END",
265 |                         "", output, flags=re.DOTALL)
266 |         return output.strip()
267 |
268 |     def to_markdown(self):
269 |         return self._latex_to_markdown()
270 |
271 |     def to_latex(self):
272 |         return self._latex_string
273 |
274 | #------------------------------------------------------------------------------
275 |
276 | if __name__ == '__main__':
277 |     import sys
278 |     if len(sys.argv) == 1:
279 |         input_file = "bin/latex_sample.tex"
280 |         output_file = "bin/converted_latex_sample.md"
281 |     else:
282 |         input_file, output_file = sys.argv[1], sys.argv[2]
283 |
284 |     with open(input_file, 'r') as f:
285 |         latex_string = f.read()
286 |         y = LaTeX2Markdown(latex_string)
287 |         markdown_string = y.to_markdown()
288 |         with open(output_file, 'w') as f_out:
289 |             f_out.write(markdown_string)
290 |
--------------------------------------------------------------------------------
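The block-conversion step in `latex2markdown.py` can be illustrated in isolation. The following is a minimal sketch of the same regex technique, not the module's own code: a named-group pattern matches a theorem-like environment, and a replacement callback builds a numbered Markdown header followed by a blockquoted body. The names `PRETTY_NAMES`, `BLOCK_RE` and `convert_blocks` are hypothetical, introduced only for this example, and only `thm`, `lem` and `prop` are handled.

```python
import re
from collections import defaultdict

# Hypothetical, trimmed-down configuration: environment name -> display name.
PRETTY_NAMES = {"thm": "Theorem", "lem": "Lemma", "prop": "Proposition"}

# Verbose regex with named groups, mirroring the structure of the
# block regex in the module (an assumption-labelled simplification).
BLOCK_RE = re.compile(
    r"""\\begin\{(?P<block_name>thm|lem|prop)\}   # opening tag
        (\[(?P<block_title>.*?)\])?               # optional title in brackets
        (?P<block_contents>.*?)                   # non-greedy body
        \\end\{(?P=block_name)\}""",              # matching closing tag
    flags=re.DOTALL | re.VERBOSE,
)


def convert_blocks(latex_string):
    """Replace theorem-like environments with Markdown headers and blockquotes."""
    counter = defaultdict(lambda: 1)  # fresh counter on every call

    def replace(match):
        name = match.group("block_name")
        title = match.group("block_title")
        header = "#### {} {}".format(PRETTY_NAMES[name], counter[name])
        counter[name] += 1
        if title:
            header += " ({})".format(title)
        lines = match.group("block_contents").strip().split("\n")
        body = "\n".join("> " + line.strip() for line in lines)
        return header + "\n\n" + body

    return BLOCK_RE.sub(replace, latex_string)


sample = r"""\begin{thm}[Euclid]
There are infinitely many primes.
\end{thm}"""

print(convert_blocks(sample))
```

Creating the counter inside the function keeps numbering independent between calls, so converting two documents in one process starts each at 1. Passing a callback to `re.sub` is what lets the header depend on both the matched environment name and the running count.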