├── .gitignore
├── README.md
├── examples
│   ├── README.md
│   ├── blogpost.md
│   ├── header-attrs.md
│   ├── headers.org
│   ├── html.md
│   ├── link.md
│   ├── mult-authors.md
│   ├── nav.md
│   ├── nav.org
│   ├── nba2.org
│   ├── no-headers.org
│   ├── ol.md
│   ├── paragraphs.md
│   ├── paragraphs.org
│   ├── plain-list.org
│   ├── sample.md
│   ├── test.md
│   ├── worknotes.md
│   └── worknotes.org
├── pandoc_opml
│   └── __init__.py
└── setup.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.egg-info
2 | *.pyc
3 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | pandoc-opml
2 | ===========
3 |
4 | pandoc-opml generates [OPML] files from [Markdown] with the help of [pandoc].
5 |
6 | [OPML]: http://dev.opml.org/spec2.html
7 | [Markdown]: http://johnmacfarlane.net/pandoc/README.html#pandocs-markdown
8 | [pandoc]: http://johnmacfarlane.net/pandoc/
9 |
10 | Demo
11 | ----
12 |
13 | Imagine this Markdown document:
14 |
15 | ```markdown
16 | ---
17 | title: Demo Document
18 | author: Eric Davis
19 | ---
20 |
21 | # Hello World!
22 |
23 | This is a child of the "Hello World!" header.
24 | ```
25 |
26 | After running it through `pandoc-opml`, you'd have this OPML document:
27 |
28 | ```xml
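29 | <?xml version='1.0' encoding='UTF-8'?>
30 | <opml version="2.0">
31 |   <!-- OPML generated by pandoc-opml v0.1 on Tue, 13 Jan 2015 04:21:33 GMT -->
32 |   <head>
33 |     <title>Demo Document</title>
34 |     <ownerName>Eric Davis</ownerName>
35 |     <dateModified>Tue, 13 Jan 2015 04:21:33 GMT</dateModified>
36 |     <generator>https://github.com/edavis/pandoc-opml</generator>
37 |     <docs>https://github.com/edavis/pandoc-opml#docs</docs>
38 |   </head>
39 |   <body>
40 |     <outline level="1" name="hello-world" text="Hello World!">
41 |       <outline text="This is a child of the &quot;Hello World!&quot; header."/>
42 |     </outline>
43 |   </body>
44 | </opml>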
45 | ```
46 |
47 | Alright, so I've taken the simplicity of Markdown and turned it into a
48 | jumble of XML. What's so great about this?
49 |
50 | Well, think of what an XML version of your Markdown now enables.
51 |
52 | Say you wanted to grab all level 1 and level 2 headlines from a
53 | Markdown document to put together a table of contents.
54 |
55 | All the widely used Markdown libraries seem to focus primarily on
56 | transforming Markdown into HTML, so no help there. Beyond that, you
57 | could try writing a regex to extract the headers but [that path is
58 | brittle and full of pain][regex quote].
59 |
60 | What if instead you could transform your Markdown into XML and gain
61 | with it all the tools and libraries that natively work with XML? Then
62 | your "grab all level 1 and level 2 headers" task would be a breeze.
63 |
64 | pandoc-opml is the tool to do just that.
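
As a rough sketch of that "grab all level 1 and level 2 headers" task, here is one way to do it with nothing but Python's standard library. The sample OPML and the `table_of_contents` helper are made up for illustration; the only thing borrowed from pandoc-opml is the `level` attribute it puts on header outlines.

```python
from xml.etree import ElementTree as ET

# Hypothetical OPML, shaped like pandoc-opml output for a small document.
OPML = """<opml version="2.0">
  <head><title>Demo Document</title></head>
  <body>
    <outline level="1" name="hello-world" text="Hello World!">
      <outline text="A paragraph under the first header."/>
      <outline level="2" name="details" text="Details">
        <outline level="3" name="fine-print" text="Fine print"/>
      </outline>
    </outline>
  </body>
</opml>"""


def table_of_contents(opml_text, max_level=2):
    """Return (level, text) pairs for headers at or above max_level."""
    root = ET.fromstring(opml_text)
    toc = []
    # iter() walks the tree in document order, so the headers come back
    # in the order they appear in the source.
    for outline in root.iter('outline'):
        level = outline.get('level')  # only header outlines carry a level
        if level is not None and int(level) <= max_level:
            toc.append((int(level), outline.get('text')))
    return toc


toc = table_of_contents(OPML)
for level, text in toc:
    print('  ' * (level - 1) + text)
```

Paragraphs and list items have no `level` attribute, so they drop out without any special handling.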
65 |
66 | [regex quote]: http://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/
67 |
68 | Installation
69 | ------------
70 |
71 | I'll eventually toss this up on PyPI, but for now:
72 |
73 | ```bash
74 | $ pip install git+https://github.com/edavis/pandoc-opml
75 | ```
76 |
77 | The only external requirement is [pandoc].
78 |
79 | Running
80 | -------
81 |
82 | ```bash
83 | $ pandoc-opml [-o <output>] <input>
84 | ```
85 |
86 | If `-o/--output` is not provided, the output is written to stdout.
87 |
88 | Docs
89 | ----
90 |
91 | pandoc-opml makes every effort to follow the [OPML v2.0][OPML]
92 | specification as closely as possible.
93 |
94 | However, Markdown is a rich format, so some additional information
95 | about the source elements is stored as attributes.
96 |
97 | A good OPML parser should ignore anything it doesn't understand, so
98 | none of this should be a problem. Please file a bug report if any
99 | problems do arise.
100 |
101 | ### Headers
102 |
103 | The OPML of Markdown [headline elements][headlines] includes two
104 | attributes: `level` and `name`.
105 |
106 | The `level` attribute is the HTML level for the given header
107 | element. For example, `1` for h1, `2` for h2, etc.
108 |
109 | The `name` attribute is the unique identifier assigned according to
110 | [these rules][unique ids].
111 |
112 | [headlines]: http://johnmacfarlane.net/pandoc/README.html#headers
113 | [unique ids]: http://johnmacfarlane.net/pandoc/README.html#extension-auto_identifiers
114 |
115 | To override the `name` attribute, explicitly set the unique identifier:
116 |
117 | ```markdown
118 | # Hello World {#custom-id}
119 | ```
120 |
121 | ```xml
122 | <outline level="1" name="custom-id" text="Hello World"/>
123 | ```
124 |
125 | ### Attributes
126 |
127 | If you specify [header attributes], pandoc-opml will include them in
128 | the resulting OPML:
129 |
130 | ```markdown
131 | # Hello World {#custom-id .draft category=demo}
132 | ```
133 |
134 | ```xml
135 | <outline category="demo" draft="true" level="1" name="custom-id" text="Hello World"/>
136 | ```
137 |
138 | Class header attributes have the value of "true" while key/value
139 | header attributes are included as-is.
140 |
141 | Later attributes overwrite earlier ones. For example:
142 |
143 | ```markdown
144 | # Hello World {#unique-id .name name=example}
145 | ```
146 |
147 | First, `name=unique-id`. Then, the class attribute sets
148 | `name=true`. Then, the key/value attribute sets `name=example`. In the
149 | resulting OPML, `name` will equal `example`.
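
The overwrite order falls out of building one dictionary in three steps. The sketch below mirrors what `extract_header_attributes` in `pandoc_opml/__init__.py` does; the standalone `header_attributes` name and its argument names are mine, chosen for illustration.

```python
def header_attributes(identifier, classes, key_values):
    """Merge pandoc header attributes into a single dict, later wins."""
    attrs = {}
    if identifier:
        attrs['name'] = identifier      # 1. the unique identifier
    for cls in classes:
        attrs[cls] = 'true'             # 2. each class becomes key="true"
    attrs.update(dict(key_values))      # 3. key/value pairs are applied last
    return attrs


# Equivalent of: # Hello World {#unique-id .name name=example}
attrs = header_attributes('unique-id', ['name'], [['name', 'example']])
print(attrs)
```

Because every step writes into the same dictionary, a class or key/value pair called `name` silently replaces the identifier.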
150 |
151 | [header attributes]: http://johnmacfarlane.net/pandoc/README.html#extension-header_attributes
152 |
153 | ### Lists
154 |
155 | [Unordered list items][unordered lists] have a `list` attribute set to
156 | `unordered`.
157 |
158 | [Ordered list items][ordered lists] have a `list` attribute set to
159 | `ordered` and an `ordinal` attribute set to the ordinal number of the
160 | list item.
161 |
162 | Example:
163 |
164 | ```markdown
165 | - Hello World
166 | - This is a test
167 |
168 | 1) Hello World
169 | 2) This is a test
170 | ```
171 |
172 | ```xml
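173 | <outline list="unordered" text="Hello World"/>
174 | <outline list="unordered" text="This is a test"/>
175 |
176 | <outline list="ordered" ordinal="1" text="Hello World"/>
177 | <outline list="ordered" ordinal="2" text="This is a test"/>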
178 | ```
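
The `list` and `ordinal` attributes carry enough information to rebuild the original list markers. Here is a minimal sketch, assuming list items are serialized as sibling `outline` elements with the attributes described above; the `FRAGMENT` string and the `to_markdown` helper are hypothetical.

```python
from xml.etree import ElementTree as ET

# Hypothetical fragment: two unordered items followed by two ordered ones.
FRAGMENT = """<body>
  <outline list="unordered" text="Hello World"/>
  <outline list="unordered" text="This is a test"/>
  <outline list="ordered" ordinal="1" text="Hello World"/>
  <outline list="ordered" ordinal="2" text="This is a test"/>
</body>"""


def to_markdown(body):
    """Turn the direct outline children of body back into list lines."""
    lines = []
    for outline in body.findall('outline'):
        kind = outline.get('list')
        if kind == 'ordered':
            marker = outline.get('ordinal') + ')'
        elif kind == 'unordered':
            marker = '-'
        else:
            marker = ''  # not a list item, e.g. a paragraph
        lines.append((marker + ' ' + outline.get('text')).strip())
    return lines


lines = to_markdown(ET.fromstring(FRAGMENT))
print('\n'.join(lines))
```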
179 |
180 | [list elements]: http://johnmacfarlane.net/pandoc/README.html#lists
181 | [unordered lists]: http://johnmacfarlane.net/pandoc/README.html#bullet-lists
182 | [ordered lists]: http://johnmacfarlane.net/pandoc/README.html#ordered-lists
183 |
184 | ### Metadata
185 |
186 | If `description` is included in the [metadata], it is included as a
187 | `<description>` element in the OPML's `<head>`.
188 |
189 | If `date` is included, it is included as the `<dateCreated>` element
190 | in the OPML's `<head>`.
191 |
192 | The `<dateModified>` element is the timestamp of when `pandoc-opml`
193 | created the OPML.
194 |
195 | All the other metadata (e.g., title, author, email) maps to the
196 | standard OPML `<head>` elements.
197 |
198 | If more than one author is provided, a single `<ownerName>` element is
199 | created with the names comma-delimited.
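
To make the multiple-author rule concrete, here is a small sketch of the joining step. It assumes the metadata shape found in the older pandoc JSON output that this project reads (a `MetaList` of `MetaInlines`, or a bare `MetaInlines`); newer pandoc releases lay out their JSON differently. The `plain_text` and `owner_name` helpers are illustrative and only handle `Str` and `Space`.

```python
def plain_text(inlines):
    """Flatten a list of inline nodes, handling only Str and Space."""
    parts = []
    for inline in inlines:
        if inline.get('t') == 'Str':
            parts.append(inline.get('c'))
        elif inline.get('t') == 'Space':
            parts.append(' ')
    return ''.join(parts)


def owner_name(author):
    """Collapse one or more authors into a single comma-delimited string."""
    if author.get('t') == 'MetaList':
        names = [plain_text(entry['c']) for entry in author['c']]
    elif author.get('t') == 'MetaInlines':
        names = [plain_text(author['c'])]
    else:
        names = []
    return ', '.join(names)


# Assumed AST for two authors: "Eric Davis" and "Davis Eric".
author_meta = {
    't': 'MetaList',
    'c': [
        {'t': 'MetaInlines', 'c': [
            {'t': 'Str', 'c': 'Eric'}, {'t': 'Space', 'c': []}, {'t': 'Str', 'c': 'Davis'}]},
        {'t': 'MetaInlines', 'c': [
            {'t': 'Str', 'c': 'Davis'}, {'t': 'Space', 'c': []}, {'t': 'Str', 'c': 'Eric'}]},
    ],
}
owner = owner_name(author_meta)
print(owner)
```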
200 |
201 | [metadata]: http://johnmacfarlane.net/pandoc/README.html#metadata-blocks
202 |
203 | ### HTML
204 |
205 | If the source Markdown contains formatting, the respective OPML `text`
206 | attribute will contain encoded HTML markup:
207 |
208 | ```markdown
209 | This paragraph contains *emphasis* and **strong** formatting along
210 | with `code` and H~2~O (subscripts) and 2^10^ (superscripts) and last,
211 | but not least, ~~deleted text~~.
212 | ```
213 |
214 | ```xml
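215 | <outline text="This paragraph contains &lt;em&gt;emphasis&lt;/em&gt; and &lt;strong&gt;strong&lt;/strong&gt; formatting along with &lt;code&gt;code&lt;/code&gt; and H&lt;sub&gt;2&lt;/sub&gt;O (subscripts) and 2&lt;sup&gt;10&lt;/sup&gt; (superscripts) and last, but not least, &lt;del&gt;deleted text&lt;/del&gt;."/>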
216 | ```
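
On the reading side, any XML parser undoes the encoding for you, so the `text` attribute comes back as ordinary HTML markup. The sketch below shows that round trip with the standard library; the sample `LINE` is hypothetical, and wrapping the markup in a `span` to pull out plain text only works because the generated tags are well-formed.

```python
from xml.etree import ElementTree as ET

# Hypothetical outline element with encoded HTML in its text attribute.
LINE = ('<outline text="Contains &lt;em&gt;emphasis&lt;/em&gt; and '
        '&lt;strong&gt;strong&lt;/strong&gt; formatting."/>')

outline = ET.fromstring(LINE)
markup = outline.get('text')   # entities are decoded by the parser
print(markup)

# Parse the decoded markup itself to drop the tags and keep the words.
wrapper = ET.fromstring('<span>' + markup + '</span>')
plain = ''.join(wrapper.itertext())
print(plain)
```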
217 |
218 | Background
219 | ----------
220 |
221 | I've long been interested in OPML as a file format, but I was always
222 | more comfortable using a text editor than any of the available OPML
223 | editors.
224 |
225 | So I started toying with the idea of using a regular text editor and
226 | exporting plain text files to OPML instead of editing OPML
227 | directly.
228 |
229 | I knew the hardest part was going to be parsing the plain text input
230 | files. Looking for alternatives to writing that code myself, I found
231 | pandoc and was thrilled to see it provided access to the abstract
232 | syntax tree (AST) that represented the input file's headers,
233 | paragraphs, list items, etc. Plus, by using pandoc, I could write the
234 | input files in any of the [many file formats it understands][inputs].
235 |
236 | [inputs]: http://johnmacfarlane.net/pandoc/README.html#description
237 |
--------------------------------------------------------------------------------
/examples/README.md:
--------------------------------------------------------------------------------
1 | I often need little documents to test out certain bits of pandoc-opml
2 | functionality. These are those documents.
3 |
4 | I'll probably phase these out once I have unit tests in place, but for
5 | now it's better than nothing.
6 |
--------------------------------------------------------------------------------
/examples/blogpost.md:
--------------------------------------------------------------------------------
1 | % Eric's Blog OPML
2 | % Eric Davis
3 | % 2015-01-15
4 |
5 | # NBA {type=include url=http://files.davising.com.s3.amazonaws.com/2015/01/11/nba.opml}
6 | # Google {type=link url=http://google.com/}
7 |
--------------------------------------------------------------------------------
/examples/header-attrs.md:
--------------------------------------------------------------------------------
1 | # Header 1 {type=howto domain=opml.ericdavis.org}
2 |
3 | Hello world
4 |
5 | ## Header 2 {#custom-name type=worknote}
6 |
7 | - Hello
8 |
--------------------------------------------------------------------------------
/examples/headers.org:
--------------------------------------------------------------------------------
1 | #+TITLE: Hello World
2 | #+DESCRIPTION: Test headers in the OPML head element
3 | #+AUTHOR: Eric Davis; Davis Eric
4 | #+EMAIL: edavis@eresources.com
5 | #+DATE: 2014-01-01
6 |
7 | - Item 1
8 | - Item 2
9 |   - Item 2.1
10 | - Item 3
11 |
--------------------------------------------------------------------------------
/examples/html.md:
--------------------------------------------------------------------------------
1 | Emph: *emph*
2 |
3 | Strong: **strong**
4 |
5 | Code: `code 123`
6 |
7 | Sub and super: H~2~O is a liquid. 2^10^ is 1024.
8 |
9 | Strikeout: This is ~~deleted text~~.
10 |
--------------------------------------------------------------------------------
/examples/link.md:
--------------------------------------------------------------------------------
1 | This is a paragraph with a link in it: [Hello World]
2 |
3 | This is an inline link <http://example.com/>.
4 |
5 | Same as above, but implicit: http://example.com/
6 |
7 | (Have to add brackets or nothing happens, it looks like).
8 |
9 | And with a [title](http://example.com/ "example title").
10 |
11 | [Hello World]: http://example.com/
12 |
--------------------------------------------------------------------------------
/examples/mult-authors.md:
--------------------------------------------------------------------------------
1 | % Testing multiple authors
2 | % Eric Davis
3 |
4 | Hello World!
5 |
--------------------------------------------------------------------------------
/examples/nav.md:
--------------------------------------------------------------------------------
1 | % Hello World
2 | % Eric Davis; Davis Eric
3 | % 2015-01-01
4 |
5 | # Header 1
6 |
7 | Paragraph below 1 in markdown
8 |
9 | ## Header 2
10 |
11 | Author info doesn't work for head in markdown
12 |
--------------------------------------------------------------------------------
/examples/nav.org:
--------------------------------------------------------------------------------
1 | #+DESCRIPTION: See how pandoc-opml deals with org-mode navigation headers
2 |
3 | * Hello World
4 | Paragraph hw
5 |
6 | - Testing
7 | - Testing 123
8 | - Testing 321
9 |
10 | Last paragraph
11 |
12 | ** World Hello
13 | Paragraph
14 |
15 | - Item 1
16 | - Item 2
17 |   - Item 2.1
18 | - Item 3
19 |
20 | Final graf
21 |
--------------------------------------------------------------------------------
/examples/nba2.org:
--------------------------------------------------------------------------------
1 | #+TITLE: NBA Teams
2 | #+AUTHOR: Eric Davis
3 | #+EMAIL: eric@davising.com
4 | #+DESCRIPTION: List of all NBA teams
5 |
6 | * Eastern Conference
7 | ** Atlantic Division
8 | - Boston Celtics
9 | - Brooklyn Nets
10 | - New York Knicks
11 | - Philadelphia 76ers
12 | - Toronto Raptors
13 | ** Central Division
14 | - Chicago Bulls
15 | - Cleveland Cavaliers
16 | - Detroit Pistons
17 | - Indiana Pacers
18 | - Milwaukee Bucks
19 | ** Southeast Division
20 | - Atlanta Hawks
21 | - Charlotte Bobcats
22 | - Miami Heat
23 | - Orlando Magic
24 | - Washington Wizards
25 | * Western Conference
26 | ** Southwest Division
27 | - Dallas Mavericks
28 | - Houston Rockets
29 | - Memphis Grizzlies
30 | - New Orleans Pelicans
31 | - San Antonio Spurs
32 | ** Northwest Division
33 | - Denver Nuggets
34 | - Minnesota Timberwolves
35 | - Portland Trail Blazers
36 | - Oklahoma City Thunder
37 | - Utah Jazz
38 | ** Pacific Division
39 | - Golden State Warriors
40 | - Los Angeles Clippers
41 | - Los Angeles Lakers
42 | - Phoenix Suns
43 | - Sacramento Kings
44 |
--------------------------------------------------------------------------------
/examples/no-headers.org:
--------------------------------------------------------------------------------
1 | - Hello World
2 |
--------------------------------------------------------------------------------
/examples/ol.md:
--------------------------------------------------------------------------------
1 | 1) Hello
2 | 2) World
3 |     1) Eric
4 |     2) James
5 |     3) Davis
6 | 3) Testing
7 |
--------------------------------------------------------------------------------
/examples/paragraphs.md:
--------------------------------------------------------------------------------
1 | - Item 1
2 |
3 | - Test
4 |
5 | - Test 2
6 |
7 | - Test 3
8 |
--------------------------------------------------------------------------------
/examples/paragraphs.org:
--------------------------------------------------------------------------------
1 | * Header
2 |
3 | Paragraph 1
4 |
5 | - Item 1
6 |   - Item 1.1
7 | - Item 2
8 | - Item 3
9 |
10 | Paragraph 2
11 |
--------------------------------------------------------------------------------
/examples/plain-list.org:
--------------------------------------------------------------------------------
1 | - NBA
2 |   - Eastern Conference
3 |     - Atlantic Division
4 |       - Boston Celtics
5 |       - Brooklyn Nets
6 |       - New York Knicks
7 |       - Philadelphia 76ers
8 |       - Toronto Raptors
9 |     - Central Division
10 |       - Chicago Bulls
11 |       - Cleveland Cavaliers
12 |       - Detroit Pistons
13 |       - Indiana Pacers
14 |       - Milwaukee Bucks
15 |     - Southeast Division
16 |       - Atlanta Hawks
17 |       - Charlotte Bobcats
18 |       - Miami Heat
19 |       - Orlando Magic
20 |       - Washington Wizards
21 |   - Western Conference
22 |     - Southwest Division
23 |       - Dallas Mavericks
24 |       - Houston Rockets
25 |       - Memphis Grizzlies
26 |       - New Orleans Pelicans
27 |       - San Antonio Spurs
28 |     - Northwest Division
29 |       - Denver Nuggets
30 |       - Minnesota Timberwolves
31 |       - Portland Trail Blazers
32 |       - Oklahoma City Thunder
33 |       - Utah Jazz
34 |     - Pacific Division
35 |       - Golden State Warriors
36 |       - Los Angeles Clippers
37 |       - Los Angeles Lakers
38 |       - Phoenix Suns
39 |       - Sacramento Kings
40 |
--------------------------------------------------------------------------------
/examples/sample.md:
--------------------------------------------------------------------------------
1 | - Item 1
2 |     - Item 1.1
3 |     - Item 1.2
4 | - Item 2
5 |     - Item 2.1
6 |         - Item 2.1.1
7 |     - Item 2.2
8 | - Item 3
9 |     - Item 3.1
10 |     - Item 3.2
11 |     - Item 3.3
12 |         - Item 3.3.1
13 |
--------------------------------------------------------------------------------
/examples/test.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/edavis/pandoc-opml/94eb14b4df92fffa72e41e728dc9f4c599dbb7c3/examples/test.md
--------------------------------------------------------------------------------
/examples/worknotes.md:
--------------------------------------------------------------------------------
1 | # 2015
2 | ## January 2015
3 | ### January 11, 2015
4 |
5 | Hello world!
6 |
--------------------------------------------------------------------------------
/examples/worknotes.org:
--------------------------------------------------------------------------------
1 | * 2015
2 | ** January 2015
3 | *** January 11, 2015
4 |
5 | Body text, yo
6 |
--------------------------------------------------------------------------------
/pandoc_opml/__init__.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import json
3 | import time
4 | import itertools
5 | import subprocess
6 | from datetime import datetime
7 | from xml.etree import ElementTree as ET
8 |
9 | __version__ = '0.1'
10 |
11 | def gmt(s=None):
12 |     utc = time.gmtime(s)
13 |     d = datetime(*utc[:6])
14 |     return d.strftime('%a, %d %b %Y %H:%M:%S') + ' GMT'
15 |
16 | class Node(object):
17 |     def __init__(self, text, attr=None):
18 |         self.text = text
19 |         self.attr = attr or {}
20 |         self.children = []
21 |
22 |     def append(self, node):
23 |         self.children.append(node)
24 |
25 | class PandocOPML(object):
26 |     def __init__(self, json_ast=None):
27 |         if json_ast is None:
28 |             self.head, self.body = json.loads(sys.stdin.read())
29 |         else:
30 |             self.head, self.body = json.loads(json_ast)
31 |         self.head = self.head['unMeta']
32 |         self.depth = 0
33 |         self.el = None
34 |         self.nodes = self.parse()
35 |
36 |     def parse(self):
37 |         nodes = []
38 |
39 |         def add_node(node):
40 |             try:
41 |                 nodes[self.depth].append(node)
42 |             except IndexError:
43 |                 nodes.append([node])
44 |
45 |             if self.depth > 0:
46 |                 parent = nodes[self.depth - 1][-1]
47 |                 parent.append(node)
48 |
49 |         def inner(content):
50 |             for obj in content:
51 |                 if obj.get('t') in {'Para', 'Plain'}:
52 |                     node = Node(self.extract(obj.get('c')))
53 |                     add_node(node)
54 |                     self.el = obj.get('t')
55 |
56 |                 elif obj.get('t') == 'OrderedList':
57 |                     info, contents = obj.get('c')
58 |                     counter = itertools.count(info[0])
59 |                     if self.el in {'Header', 'Para'} or self.el is None:
60 |                         for element in contents:
61 |                             inner(element)
62 |                             n = nodes[self.depth][-1] # most recently added node
63 |                             n.attr.update({
64 |                                 'ordinal': str(next(counter)),
65 |                                 'list': 'ordered',
66 |                             })
67 |                     else:
68 |                         self.depth += 1
69 |                         for element in contents:
70 |                             inner(element)
71 |                             n = nodes[self.depth][-1]
72 |                             n.attr.update({
73 |                                 'ordinal': str(next(counter)),
74 |                                 'list': 'ordered',
75 |                             })
76 |                         self.depth -= 1
77 |
78 |                 elif obj.get('t') == 'BulletList':
79 |                     if self.el in {'Header', 'Para'} or self.el is None:
80 |                         # Don't increase the depth when a BulletList
81 |                         # follows a Header or Para object.
82 |                         #
83 |                         # If the last object was Header, it has
84 |                         # already incremented the depth.
85 |                         for element in obj.get('c'):
86 |                             inner(element)
87 |                             n = nodes[self.depth][-1]
88 |                             n.attr['list'] = 'unordered'
89 |                     else:
90 |                         # But do increase the depth when a BulletList
91 |                         # follows anything else.
92 |                         #
93 |                         # This makes nested BulletLists work.
94 |                         self.depth += 1
95 |                         for element in obj.get('c'):
96 |                             inner(element)
97 |                             n = nodes[self.depth][-1]
98 |                             n.attr['list'] = 'unordered'
99 |                         self.depth -= 1
100 |
101 |                 elif obj.get('t') == 'Header':
102 |                     level, attr, content = obj.get('c')
103 |                     outline_attr = self.extract_header_attributes(attr)
104 |                     outline_attr['level'] = str(level)
105 |                     node = Node(self.extract(content), outline_attr)
106 |                     self.depth = level - 1
107 |
108 |                     add_node(node)
109 |
110 |                     # the next elements are children of this header
111 |                     self.depth += 1
112 |                     self.el = 'Header'
113 |
114 |         inner(self.body)
115 |         return nodes
116 |
117 |     def write(self, output):
118 |         def process(parent, node):
119 |             for child in node.children:
120 |                 params = {'text': child.text}
121 |                 params.update(child.attr)
122 |                 el = ET.SubElement(parent, 'outline', **params)
123 |                 process(el, child)
124 |
125 |         root = ET.Element('opml', version='2.0')
126 |         head = ET.SubElement(root, 'head')
127 |         body = ET.SubElement(root, 'body')
128 |         now = gmt()
129 |
130 |         def header(key, value):
131 |             ET.SubElement(head, key).text = value
132 |
133 |         if 'title' in self.head:
134 |             header('title', self.extract(self.head['title']['c']))
135 |
136 |         if 'description' in self.head:
137 |             header('description', self.extract(self.head['description']['c']))
138 |
139 |         if 'author' in self.head:
140 |             # Markdown returns a MetaList of MetaInlines while org-mode returns MetaInlines.
141 |             authors = []
142 |             if self.head['author'].get('t') == 'MetaList':
143 |                 for author in self.head['author']['c']:
144 |                     authors.append(self.extract(author['c']))
145 |             elif self.head['author'].get('t') == 'MetaInlines':
146 |                 authors = [self.extract(self.head['author']['c'])]
147 |             header('ownerName', ', '.join(authors))
148 |
149 |         if 'email' in self.head:
150 |             header('ownerEmail', self.extract(self.head['email']['c']))
151 |
152 |         if 'date' in self.head:
153 |             header('dateCreated', self.extract(self.head['date']['c']))
154 |
155 |         header('dateModified', now)
156 |         header('generator', 'https://github.com/edavis/pandoc-opml')
157 |         header('docs', 'https://github.com/edavis/pandoc-opml#docs')
158 |
159 |         generated = ET.Comment(' OPML generated by pandoc-opml v%s on %s ' % (__version__, now))
160 |         root.insert(0, generated)
161 |
162 |         for summit in self.nodes.pop(0):
163 |             params = {'text': summit.text}
164 |             params.update(summit.attr)
165 |             el = ET.SubElement(body, 'outline', **params)
166 |             process(el, summit)
167 |
168 |         content = ET.ElementTree(root)
169 |         content.write(
170 |             open(output, 'wb') if output else sys.stdout,
171 |             encoding = 'UTF-8',
172 |             xml_declaration = True,
173 |         )
174 |
175 |     def extract_header_attributes(self, attr):
176 |         outline_attr = {}
177 |         name, args, kwargs = attr
178 |         if name:
179 |             outline_attr['name'] = name
180 |         for arg in args:
181 |             outline_attr[arg] = 'true'
182 |         outline_attr.update(dict(kwargs))
183 |         return outline_attr
184 |
185 |     def extract(self, contents):
186 |         ret = []
187 |         html_map = {
188 |             'Emph': 'em',
189 |             'Strong': 'strong',
190 |             'Subscript': 'sub',
191 |             'Superscript': 'sup',
192 |             'Strikeout': 'del',
193 |         }
194 |
195 |         for obj in contents:
196 |             if obj.get('t') == 'Str':
197 |                 ret.append(obj.get('c'))
198 |             elif obj.get('t') == 'Space':
199 |                 ret.append(' ')
200 |             elif obj.get('t') == 'Link':
201 |                 content, (link_url, link_title) = obj.get('c')
202 |                 text = self.extract(content)
203 |                 if link_title:
204 |                     ret.append(r'<a href="%s" title="%s">%s</a>' % (link_url, link_title, text))
205 |                 else:
206 |                     ret.append(r'<a href="%s">%s</a>' % (link_url, text))
207 |             elif obj.get('t') in html_map:
208 |                 tag = html_map[obj.get('t')]
209 |                 ret.append(
210 |                     r'<%s>%s</%s>' % (tag, self.extract(obj.get('c')), tag)
211 |                 )
212 |             elif obj.get('t') == 'Code':
213 |                 (_, code) = obj.get('c')
214 |                 ret.append(r'<code>%s</code>' % code)
215 |
216 |         return ''.join(ret)
217 |
218 | def main():
219 |     import argparse
220 |     parser = argparse.ArgumentParser()
221 |     parser.add_argument('-o', '--output')
222 |     parser.add_argument('input')
223 |     args = parser.parse_args()
224 |
225 |     json_ast = subprocess.check_output(['pandoc', '-t', 'json', args.input])
226 |
227 |     p = PandocOPML(json_ast)
228 |     p.write(args.output)
229 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup, find_packages
2 | from pandoc_opml import __version__
3 |
4 | setup(
5 |     name = 'pandoc-opml',
6 |     version = __version__,
7 |     packages = find_packages(),
8 |     author = 'Eric Davis',
9 |     author_email = 'eric@davising.com',
10 |     url = 'https://github.com/edavis/pandoc-opml',
11 |     entry_points = {
12 |         'console_scripts': [
13 |             'pandoc-opml = pandoc_opml:main',
14 |         ],
15 |     },
16 | )
17 |
--------------------------------------------------------------------------------